php-general Digest 26 Oct 2011 14:39:39 -0000 Issue 7538
Topics (messages 315427 through 315440):
Re: Exporting large data from mysql to html using php
315427 by: Jim Giner
315428 by: Jason Pruim
315429 by: Jim Giner
315430 by: Jason Pruim
315431 by: Tommy Pham
315432 by: Lester Caine
315433 by: Tommy Pham
315434 by: Lester Caine
315435 by: Tommy Pham
315436 by: Tommy Pham
315437 by: Lester Caine
Displaying variables in text - clarification needed
315438 by: Rick Dwyer
315439 by: Lester Caine
315440 by: Shawn McKenzie
Administrivia:
To subscribe to the digest, e-mail:
php-general-digest-subscr...@lists.php.net
To unsubscribe from the digest, e-mail:
php-general-digest-unsubscr...@lists.php.net
To post to the list, e-mail:
php-gene...@lists.php.net
----------------------------------------------------------------------
--- Begin Message ---
"David Robley" <robl...@aapt.net.au> wrote in message
news:49.50.34068.1b567...@pb1.pair.com...
>
> Consider running EXPLAIN on all your queries to see if there is something
> Mysql thinks could be done to improve performance.
>
Why do so many responders seem to think the problem here is in the
preparation of the query?
It is the approach that is the problem. It needs to be re-addressed, not
simply tinkered with. I don't care if someone figures out how to spew out
89M+ records in 5 seconds flat. Why are you pulling up that many detail
records at one time? Who is going to look at them? You? If so, see my
previous post on how long it is going to take you just to examine (!) the
first million alone.
This problem needs a re-working of the data processing and user interfaces
to allow for the selection of an appropriate number of individual records in
any result set, or, failing that, the preparation of a summary of some sort.
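A minimal sketch of the "summary first, detail on demand" idea. The table name `addresses`, the `state` column, and the page size are illustrative, not from the thread:

```php
<?php
// Sketch only: 'addresses', 'state' and the page size are hypothetical.

// Summary first: one row per state instead of millions of detail rows.
function summaryQuery()
{
    return "SELECT state, COUNT(*) AS total FROM addresses GROUP BY state";
}

// Detail on demand: build the SQL for one manageable page of one state.
function detailPageQuery($page, $perPage = 50)
{
    $offset = ($page - 1) * $perPage;
    return "SELECT * FROM addresses WHERE state = ? "
         . "ORDER BY id LIMIT $perPage OFFSET $offset";
}
```

The user sees the per-state totals immediately and only ever pulls 50 detail rows at a time.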
--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com
On Oct 25, 2011, at 9:58 PM, Jim Giner wrote:
> "David Robley" <robl...@aapt.net.au> wrote in message
> news:49.50.34068.1b567...@pb1.pair.com...
>>
>> Consider running EXPLAIN on all your queries to see if there is something
>> Mysql thinks could be done to improve performance.
>>
>
> Why do so many responders seem to think the problem here is in the
> preparation of the query?
>
> It is the approach that is the problem. It needs to be re-addressed, not
> simply tinkered with. I don't care if someone figures out how to spew out
> 89M+ records in 5 seconds flat. Why are you pulling up that many detail
> records at one time? Who is going to look at them? You? If so, see my
> previous post on how long it is going to take you just to examine (!) the
> first million alone.
>
> This problem needs a re-working of the data processing and user interfaces
> to allow for the selection of an appropriate number of individual records in
> any result set, or, failing that, the preparation of a summary of some sort.
It turns out the issue was actually in the pagination... I'm reworking the
whole thing and streamlining it... But the pagination code that I found on the
internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the COUNT was
killing the time... Once that was removed, I was displaying records faster than
I could imagine... So it's off to pagination land to fix it! And possibly redo
the entire thing!
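One common way to drop the per-page COUNT(*) entirely: fetch one row more than the page size, and use the extra row only to decide whether a "next" link is needed. A sketch (the actual fetch is assumed to happen elsewhere with `... LIMIT $perPage + 1 OFFSET $offset`):

```php
<?php
// COUNT(*)-free pager: given the ($perPage + 1) rows you fetched,
// return the rows to display plus a flag for the "next" link.
function slicePage(array $fetched, $perPage)
{
    $hasMore = count($fetched) > $perPage;
    return array(array_slice($fetched, 0, $perPage), $hasMore);
}
```

You lose the "page 3 of 1,780,000" label, but as discussed elsewhere in this thread, that number is rarely worth what it costs.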
--- End Message ---
--- Begin Message ---
----- Original Message -----
From: "Jason Pruim" <li...@pruimphotography.com>
To: "Jim Giner" <jim.gi...@albanyhandball.com>
Cc: <php-gene...@lists.php.net>
Sent: Tuesday, October 25, 2011 10:06 PM
Subject: Re: [PHP] Exporting large data from mysql to html using php
It turns out the issue was actually in the pagination... I'm reworking the
whole thing and streamlining it... But the pagination code that I found on
the internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the
COUNT was killing the time... Once that was removed, I was displaying
records faster than I could imagine... So it's off to pagination land to fix
it! And possibly redo the entire thing!
************
So you're still going to generate pages and pages of an unmanageable volume
of data?
Sorry, but I offer my congratulations on producing a technically marvelous
solution, and my condolences for a completely worthless application.
--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com
On Oct 25, 2011, at 10:10 PM, Jim Giner wrote:
>
> ----- Original Message ----- From: "Jason Pruim" <li...@pruimphotography.com>
> To: "Jim Giner" <jim.gi...@albanyhandball.com>
> Cc: <php-gene...@lists.php.net>
> Sent: Tuesday, October 25, 2011 10:06 PM
> Subject: Re: [PHP] Exporting large data from mysql to html using php
>
>
> It turns out the issue was actually in the pagination... I'm reworking the
> whole thing and streamlining it... But the pagination code that I found on the
> internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the COUNT
> was killing the time... Once that was removed, I was displaying records
> faster than I could imagine... So it's off to pagination land to fix it! And
> possibly redo the entire thing!
> ************
>
> So you're still going to generate pages and pages of an unmanageable volume
> of data?
>
> Sorry, but I offer my congratulations on producing a technically marvelous
> solution, and my condolences for a completely worthless application.
It's walking the fine line between what I want and what the people who are
paying for it want... What I always try to do is give them both what they ask
for and what they want... because what they ask for != what they want MOST
of the time... So most likely I will end up pulling the pagination and really
limiting the searching...
But until I convince the right people... I go on as planned.
--- End Message ---
--- Begin Message ---
On Tue, Oct 25, 2011 at 7:06 PM, Jason Pruim <li...@pruimphotography.com>wrote:
>
> It turns out the issue was actually in the pagination... I'm reworking the
> whole thing and stream lining it... But in the pagination that I found on
> the internet it used a "SELECT COUNT(*) WHERE state='{$state}'"; and the
> COUNT was killing the time... Once that was removed, I was displaying
> records faster then I could imagine... So it's off to pagination land to fix
> it! And possibly redo the entire thing!
>
>
If you're encountering performance issues while doing SELECT COUNT(*), it
sounds like you have a serious disk I/O performance issue. Is the DB on a RAID
subsystem? If not, why not? If so, what level? Also, what type of HDDs?
For something like this, it should be RAID 10 with HDDs spinning at least
7200 RPM (10,000+ RPM recommended), connected to a good RAID controller, like
3ware's. Also, the controller should be in a PCI-X or, preferably, PCI-e
slot.
Regards,
Tommy
--- End Message ---
--- Begin Message ---
Tommy Pham wrote:
It turns out the issue was actually in the pagination... I'm reworking the
> whole thing and stream lining it... But in the pagination that I found on
> the internet it used a "SELECT COUNT(*) WHERE state='{$state}'"; and the
> COUNT was killing the time... Once that was removed, I was displaying
> records faster then I could imagine... So it's off to pagination land to fix
> it! And possibly redo the entire thing!
>
If you're encountering performance issues while doing SELECT COUNT(*), it
sounds like you have a serious disk I/O performance issue. Is the DB on a RAID
subsystem? If not, why not? If so, what level? Also, what type of HDDs?
For something like this, it should be RAID 10 with HDDs spinning at least
7200 RPM (10,000+ RPM recommended), connected to a good RAID controller, like
3ware's. Also, the controller should be in a PCI-X or, preferably, PCI-e
slot.
What a load of twaddle ...
SELECT COUNT(*) is a problem on any fully transactional database, since it has
to be generated from the currently active view of the data.
Rather than trying to get the database engine to access every record faster, the
correct action is either to avoid the count altogether or, more practically, to
maintain a separate table of useful counts generated from the committed data.
Jason, it is not unreasonable that an initial view will display unfiltered
data, so you just need to restrict the number of records displayed. As you have
found out, telling the user how many more records there are is the real problem,
so if it's not important ... don't, but if it's useful to know, then keep a
'cache' of counts that link to your initial filter options. Once you are down to
a few thousand records, a SELECT COUNT(*) may be appropriate ;) Where it
becomes a problem is when the user can set up a more complex filter that
the cache does not cover ...
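The 'cache' of counts can be sketched as a toy in-memory class. In practice this would be a small table updated by triggers or by the same code paths that insert and delete rows; the filter-key strings here are made up:

```php
<?php
// Toy in-memory version of the 'cache of counts' idea; the real thing
// would live in its own table, maintained from committed data.
class CountCache
{
    private $counts = array();

    // Call alongside each INSERT: bump the cached count for this filter key.
    public function add($key)
    {
        $this->counts[$key] = $this->total($key) + 1;
    }

    // Call alongside each DELETE.
    public function remove($key)
    {
        $this->counts[$key] = max(0, $this->total($key) - 1);
    }

    // Pagination reads this instead of running SELECT COUNT(*).
    public function total($key)
    {
        return isset($this->counts[$key]) ? $this->counts[$key] : 0;
    }
}
```

The count becomes an O(1) lookup for every filter the cache covers, at the cost of a tiny bit of bookkeeping on writes.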
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
--- End Message ---
--- Begin Message ---
On Wed, Oct 26, 2011 at 12:52 AM, Lester Caine <les...@lsces.co.uk> wrote:
> Tommy Pham wrote:
>
>>> It turns out the issue was actually in the pagination... I'm reworking
>>> the whole thing and streamlining it... But the pagination code that I
>>> found on the internet used a "SELECT COUNT(*) WHERE state='{$state}'",
>>> and the COUNT was killing the time... Once that was removed, I was
>>> displaying records faster than I could imagine... So it's off to
>>> pagination land to fix it! And possibly redo the entire thing!
>>
>> If you're encountering performance issues while doing SELECT COUNT(*), it
>> sounds like you have a serious disk I/O performance issue. Is the DB on a
>> RAID subsystem? If not, why not? If so, what level? Also, what type of
>> HDDs? For something like this, it should be RAID 10 with HDDs spinning at
>> least 7200 RPM (10,000+ RPM recommended), connected to a good RAID
>> controller, like 3ware's. Also, the controller should be in a PCI-X or,
>> preferably, PCI-e slot.
>>
>
> What a load of twaddle ...
>
>
I wonder ... The real question is what's the purpose of the DB? Is it for
OLAP or OLTP? ;)
As for dealing with a DB having millions of rows, you're crossing over into
DBA territory.
> SELECT COUNT(*) is a problem on any fully transactional database, since it
> has to be generated from the currently active view of the data.
> Rather than trying to get the database engine access every record faster,
> the correct action is to either avoid the count altogether, or more
> practically maintain a separate table with useful counts that have been
> generated from the committed data.
>
> Jason, it is not unreasonable that an initial view will be displaying
> unfiltered data, so you just need to restrict the number of records
> displayed. As you have found out, telling the user how many more records
> there are is the real problem, so if it's not important ... don't, but if
> it's useful to know, then keep a 'cache' of counts that link to your initial
> filter options. Once you are down to a few thousand records, then a SELECT
> COUNT(*) may be appropriate ;) Where it becomes a problem is when the
> user can set up a more complex filter that the cache does not cover ...
--- End Message ---
--- Begin Message ---
Tommy Pham wrote:
I wonder ... The real question is what's the purpose of the DB? Is it for OLAP
or OLTP? ;)
As for dealing with DB having millions of rows, you're crossing over into DBA
area.
Many of my customers have close to 20 years of data available. There has
been a debate about transferring historic data to a separate database, but having
it available is not causing a problem, except for some counts and larger search
actions, and being able to see how long a client has been visiting is often
useful. Statistical analysis is always done on a separate machine, with a
replicated copy of the data, so as not to affect the active users ...
That said, we are now using the much more detailed LLPG address data rather than
simple postcode, and that has added another order of magnitude to data that is
being searched live ...
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
--- End Message ---
--- Begin Message ---
On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <les...@lsces.co.uk> wrote:
> Tommy Pham wrote:
>
>> I wonder ... The real question is what's the purpose of the DB? Is it for
>> OLAP
>> or OLTP? ;)
>> As for dealing with DB having millions of rows, you're crossing over into
>> DBA area.
>>
>
> Many of my customers have close to 20 years of data available. There
> has been a debate about transferring historic data to a separate database, but
> having it available is not causing a problem, except for some counts and
> larger search actions, and being able to see how long a client has been
> visiting is often useful. Statistical analysis is always done on a separate
> machine, with a replicated copy of the data, so as not to affect the active
> users ...
>
>
What kind of counts/filters? What kind of RAID subsystem is the storage?
What's the total size of the DB? Up to 20 years of data should be in the
peta range. In that peta range, if you're not having performance issues and
not using RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
application and database design in detail. :)
> That said, we are now using the much more detailed LLPG address data rather
> than simple postcode, and that has added another order of magnitude to data
> that is being searched live ...
>
Since Jason didn't disclose sufficient information, I suggested that RAID
storage based on the worst-case scenario being this: as a business analyst,
I'd want to do drill-downs, filters, and counts of the following for an
upcoming marketing campaign:
* county(ies)
* city(ies)
* zip code(s)
* address type (business and/or residential)
* of business (commercial, industrial/manufacturing, etc.)
* of residential (homes - single/townhouses - or apartments/condos)
The filters and counts would be any combination of the above, e.g. 5 random zip
codes within the given state, listing all other criteria and breakdowns.
As Jason mentioned having 89 million rows for a given state, how long would it
take to run the drill-downs if the DB isn't sitting on a fast storage
medium? That 89 million is most likely the average count in the USA.
For California and New York, the number can easily double. And that's only
the basic filtering. What of a specific business industry such as
landscaping? What of filtering by yearly income and/or real estate
value? BTW, as a business analyst, I don't want to wait hours for the info
to update every time I change a criterion/filter to get the counts before I
look into a few random individual records from the results.
As I've mentioned, something of this magnitude is better left to a DBA;
work together with that DBA. Either hire/contract one or become one
:)
Regards,
Tommy
--- End Message ---
--- Begin Message ---
On Wed, Oct 26, 2011 at 3:23 AM, Tommy Pham <tommy...@gmail.com> wrote:
> What kind of counts/filters? What kind of RAID subsystem is the storage?
> What's the total size of the DB? Up to 20 years of data should be in the
> peta range. In that peta range, if you're not having performance issues and
> not using RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
> application and database design in detail. :)
Brain fart... that should be tera, not peta ... lol..
--- End Message ---
--- Begin Message ---
Tommy Pham wrote:
>> Many of my customers have close to 20 years of data available. There has
>> been a debate about transferring historic data to a separate database, but
>> having it available is not causing a problem, except for some counts and
>> larger search actions, and being able to see how long a client has been
>> visiting is often useful. Statistical analysis is always done on a separate
>> machine, with a replicated copy of the data, so as not to affect the active
>> users ...
> What kind of counts/filters? What kind of RAID subsystem is the storage?
> What's the total size of the DB? Up to 20 years of data should be in the peta
> range. In that peta range, if you're not having performance issues and not
> using RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the application and
> database design in detail. :)
We are still only in the hundreds of Mb, and historic data has less detail than
the current 'transactions'. The current postcode table is 500Mb, and while the
LLPG data would increase that by a factor of 100, it's currently
restricted to a council's immediate working area, so we keep the problem
contained, dropping back to postcode for out-of-area enquiries. Users complain
if an enquiry takes more than a few seconds, and Firebird is giving me more than
adequate performance, and allows shadow data to be created via triggers to
reduce the need for 'counting'.
I have a new 'application' which is using the same search criteria, but the data
volume is growing a lot faster - 10Gb on the test system here - yet I am still
seeing the same search speeds once the correct indexes have been generated. And
it will take a few more years before that starts reaching the 100Gb level :)
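A hedged way to check that 'the correct indexes have been generated', for the MySQL side of this thread (Firebird's plan output differs): look at each row of EXPLAIN output, where a `type` of `ALL` means a full table scan, i.e. no usable index. A tiny helper, with the row shape following MySQL's EXPLAIN columns and the actual EXPLAIN call left out:

```php
<?php
// MySQL-flavoured sketch: in EXPLAIN output, type === 'ALL' marks a
// full table scan; anything else (ref, range, const, ...) uses an index
// or better. The $explainRow array mimics one fetched EXPLAIN row.
function usesFullScan(array $explainRow)
{
    return isset($explainRow['type']) && $explainRow['type'] === 'ALL';
}
```

Run this over each row of `EXPLAIN SELECT ...` after adding an index, and a remaining `ALL` row tells you which table the optimizer still scans in full.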
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
--- End Message ---
--- Begin Message ---
Hello all.
I inherited some PHP pages about a year ago. They have been fine all
along, but now a bunch of errors and erroneous results are popping up.
I traced it to the way the variables were being used on the page...
for example, the following SQL statement (with a space between ' and " for
clarity):
$sql = "select name from mytable where name=$myvar and display='yes' ";
This has worked in the past but is now returning errors for some
records and working for others. I changed the above to the following
and now all is good:
$sql = "select name from mytable where name=' ".$myvar." ' and
display='yes' ";
What would explain why the former is suddenly causing problems? The
version of PHP is 5.2.3 and, from what I can tell, hasn't been updated
since February of 2011.
Thanks,
--Rick
--- End Message ---
--- Begin Message ---
Rick Dwyer wrote:
Hello all.
I inherited some PHP pages about a year ago. They have been fine all along, but
now a bunch of errors and erroneous results are popping up. I traced it to the
way the variables were being used on the page... for example, the following SQL
statement (with a space between ' and " for clarity):
$sql = "select name from mytable where name=$myvar and display='yes' ";
This has worked in the past but is now returning errors for some records and
working for others. I changed the above to the following and now all is good:
$sql = "select name from mytable where name=' ".$myvar." ' and display='yes' ";
What would explain why the former is suddenly causing problems? The version of
PHP is 5.2.3 and, from what I can tell, hasn't been updated since February of
2011.
I would have a closer look at the contents of $myvar when it's giving an
error. I suspect it might contain a " or a '? The first version is not
taking care of any characters that may need escaping, so you may well have been
lucky in the past.
$sql = "select name from mytable where name='$myvar' and display='yes' " is another
option for a different 'result', but personally I prefer to pass variables like
this as parameters, so the query just has name=? and you pass $myvar in an
array of variables. This helps prevent suspect SQL getting into the query as
well - not such a problem here, but sensible practice anyway.
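The parameterised approach can be sketched with PDO. SQLite in-memory is used here only so the example is self-contained; the table and data are illustrative, and with MySQL you would just change the DSN:

```php
<?php
// Parameterised query: $myvar is passed as data via a placeholder,
// never spliced into the SQL text, so quotes in it are harmless.
function findNames(PDO $db, $myvar)
{
    $stmt = $db->prepare(
        "SELECT name FROM mytable WHERE name = ? AND display = 'yes'"
    );
    $stmt->execute(array($myvar));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Self-contained demo against in-memory SQLite.
$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE mytable (name TEXT, display TEXT)");
$db->exec("INSERT INTO mytable VALUES ('Tip O''Neill', 'yes')");

// The apostrophe that breaks the interpolated query is no problem here.
$found = findNames($db, "Tip O'Neill");
```

The same code also closes the SQL-injection hole that string interpolation leaves open.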
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
--- End Message ---
--- Begin Message ---
On 10/26/2011 07:20 AM, Rick Dwyer wrote:
> Hello all.
>
> I inherited some PHP pages about a year ago. They have been fine all
> along, but now a bunch of errors and erroneous results are popping up. I
> traced it to the way the variables were being used on the page... for
> example, the following SQL statement (with a space between ' and " for clarity):
>
>
> $sql = "select name from mytable where name=$myvar and display='yes' ";
>
> This has worked in the past but is now returning errors for some records
> and working for others. I changed the above to the following and now
> all is good:
>
> $sql = "select name from mytable where name=' ".$myvar." ' and
> display='yes' ";
>
> What would explain why the former is suddenly causing problems? The
> version of PHP is 5.2.3 and, from what I can tell, hasn't been updated
> since February of 2011.
>
> Thanks,
>
> --Rick
>
>
In addition to escaping the variable with mysql_real_escape_string() or
similar, values for text columns need to be quoted, as you did in your
second example. Consider this as a name:
name=Shawn McKenzie
Without quoting, the space breaks the query, so it would need to be:
name='Shawn McKenzie'
And the escaping would take care of the embedded quote in:
name='Tip O\'Neill'
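If prepared statements aren't an option, PDO::quote() does the quoting and escaping in one step. Sketched against in-memory SQLite just to have a live handle; SQLite doubles the embedded quote, while MySQL's driver backslash-escapes it, but either way the value comes back safely wrapped in quotes:

```php
<?php
// PDO::quote() wraps the value in quotes AND escapes what's inside.
// SQLite in-memory is used here only so the sketch is self-contained.
$db = new PDO('sqlite::memory:');

$name = "Tip O'Neill";
$sql  = "select name from mytable where name=" . $db->quote($name)
      . " and display='yes'";
// 'Shawn McKenzie' gains quotes (fixing the space problem), and the
// apostrophe in "Tip O'Neill" is escaped (doubled, under SQLite).
```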
--
Thanks!
-Shawn
http://www.spidean.com
--- End Message ---