php-general Digest 26 Oct 2011 14:39:39 -0000 Issue 7538

Topics (messages 315427 through 315440):

Re: Exporting large data from mysql to html using php
        315427 by: Jim Giner
        315428 by: Jason Pruim
        315429 by: Jim Giner
        315430 by: Jason Pruim
        315431 by: Tommy Pham
        315432 by: Lester Caine
        315433 by: Tommy Pham
        315434 by: Lester Caine
        315435 by: Tommy Pham
        315436 by: Tommy Pham
        315437 by: Lester Caine

Displaying variables in text - clarification needed
        315438 by: Rick Dwyer
        315439 by: Lester Caine
        315440 by: Shawn McKenzie

Administrivia:

To subscribe to the digest, e-mail:
        php-general-digest-subscr...@lists.php.net

To unsubscribe from the digest, e-mail:
        php-general-digest-unsubscr...@lists.php.net

To post to the list, e-mail:
        php-gene...@lists.php.net


----------------------------------------------------------------------
--- Begin Message ---
"David Robley" <robl...@aapt.net.au> wrote in message 
news:49.50.34068.1b567...@pb1.pair.com...
>
> Consider running EXPLAIN on all your queries to see if there is something
> Mysql thinks could be done to improve performance.
>

Why do so many responders seem to think the problem here is in the 
preparation of the query?

It is the approach that is the problem.  It needs to be re-addressed, not 
simply tinkered with.  I don't care if someone figures out how to spew out 
89M+ records in 5 seconds flat.  What are you doing pulling up that many 
detail records at one time for? Who is going to look at them?  You?  If so, 
see my previous post on how long that is going to take you just to examine 
(!) the first million alone.

This problem needs a re-working of the data processing and user interfaces 
to allow for the selection of an appropriate number of individual records in 
any result set, or otherwise simply prepare a summary of some sort. 
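A "summary of some sort" can be as small as one GROUP BY query. Below is a minimal sketch of that idea; the table and column names are invented, and an in-memory SQLite database stands in for MySQL only so the snippet is self-contained:

```php
<?php
// Sketch: present per-state totals instead of millions of detail rows.
// "addresses"/"state" are made-up names; sqlite::memory: is used only so
// the snippet runs anywhere -- the SQL is the same under MySQL.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE addresses (id INTEGER PRIMARY KEY, state TEXT)");

$ins = $pdo->prepare("INSERT INTO addresses (state) VALUES (?)");
foreach (array('MI', 'MI', 'MI', 'NY') as $s) {
    $ins->execute(array($s));
}

// One row per state -- this is what the user drills into, not the detail.
$summary = $pdo->query(
    "SELECT state, COUNT(*) AS n FROM addresses GROUP BY state ORDER BY state"
)->fetchAll(PDO::FETCH_KEY_PAIR);

echo "MI={$summary['MI']} NY={$summary['NY']}\n";
```

The user then clicks a state (or county, or zip) to fetch one page of detail rows for that group alone.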



--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com



On Oct 25, 2011, at 9:58 PM, Jim Giner wrote:

> "David Robley" <robl...@aapt.net.au> wrote in message 
> news:49.50.34068.1b567...@pb1.pair.com...
>> 
>> Consider running EXPLAIN on all your queries to see if there is something
>> Mysql thinks could be done to improve performance.
>> 
> 
> Why do so many responders seem to think the problem here is in the 
> preparation of the query?
> 
> It is the approach that is the problem.  It needs to be re-addressed, not 
> simply tinkered with.  I don't care if someone figures out how to spew out 
> 89M+ records in 5 seconds flat.  What are you doing pulling up that many 
> detail records at one time for? Who is going to look at them?  You?  If so, 
> see my previous post on how long that is going to take you just to examine 
> (!) the first million alone.
> 
> This problem needs a re-working of the data processing and user interfaces 
> to allow for the selection of an appropriate number of individual records in 
> any result set, or otherwise simply prepare a summary of some sort. 


It turns out the issue was actually in the pagination... I'm reworking the 
whole thing and streamlining it... But the pagination code that I found on the 
internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the COUNT was 
killing the time... Once that was removed, I was displaying records faster than 
I could imagine... So it's off to pagination land to fix it! And possibly redo 
the entire thing!
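For anyone following along, the usual trick for paging without a COUNT(*) is to fetch one row more than the page size; the extra row only signals whether a "next" link is needed. A rough sketch (table/column names invented, in-memory SQLite standing in for MySQL so it runs as-is):

```php
<?php
// Paging without COUNT(*): ask for $perPage + 1 rows; if the extra row
// comes back, there is a next page. All names here are invented.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE addresses (id INTEGER PRIMARY KEY, state TEXT)");
$ins = $pdo->prepare("INSERT INTO addresses (state) VALUES (?)");
for ($i = 0; $i < 25; $i++) {
    $ins->execute(array('MI'));
}

$perPage = 10;
$page    = 2;                       // 1-based page number
$offset  = ($page - 1) * $perPage;

$q = $pdo->prepare(
    "SELECT id FROM addresses WHERE state = ? LIMIT ? OFFSET ?"
);
$q->bindValue(1, 'MI');
$q->bindValue(2, $perPage + 1, PDO::PARAM_INT);  // one extra row
$q->bindValue(3, $offset, PDO::PARAM_INT);
$q->execute();
$rows = $q->fetchAll(PDO::FETCH_ASSOC);

$hasNext = count($rows) > $perPage;          // extra row => next page exists
$rows    = array_slice($rows, 0, $perPage);  // display only the page itself

echo count($rows) . " rows, next page: " . ($hasNext ? "yes" : "no") . "\n";
```

This gives previous/next links with no count at all; if a grand total is still wanted, it can come from a separately maintained counts table instead.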



--- End Message ---
--- Begin Message ---

----- Original Message ----- From: "Jason Pruim" <li...@pruimphotography.com>
To: "Jim Giner" <jim.gi...@albanyhandball.com>
Cc: <php-gene...@lists.php.net>
Sent: Tuesday, October 25, 2011 10:06 PM
Subject: Re: [PHP] Exporting large data from mysql to html using php


It turns out the issue was actually in the pagination... I'm reworking the whole thing and streamlining it... But the pagination code that I found on the internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the COUNT was killing the time... Once that was removed, I was displaying records faster than I could imagine... So it's off to pagination land to fix it! And possibly redo the entire thing!
************

So you're still going to generate pages and pages of an unmanageable volume of data?

Sorry, but I offer my congratulations on producing a technically marvelous solution, and my condolences for a completely worthless application.







--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com



On Oct 25, 2011, at 10:10 PM, Jim Giner wrote:

> 
> ----- Original Message ----- From: "Jason Pruim" <li...@pruimphotography.com>
> To: "Jim Giner" <jim.gi...@albanyhandball.com>
> Cc: <php-gene...@lists.php.net>
> Sent: Tuesday, October 25, 2011 10:06 PM
> Subject: Re: [PHP] Exporting large data from mysql to html using php
> 
> 
> It turns out the issue was actually in the pagination... I'm reworking the 
> whole thing and streamlining it... But the pagination code that I found on the 
> internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the COUNT 
> was killing the time... Once that was removed, I was displaying records 
> faster than I could imagine... So it's off to pagination land to fix it! And 
> possibly redo the entire thing!
> ************
> 
> So you're still going to generate pages and pages of an unmanageable volume 
> of data?
> 
> Sorry, but I offer my congratulations on producing a technically marvelous 
> solution, and my condolences for a completely worthless application.

It's walking the fine line between what I want and what the people who are 
paying for it want... What I always try to do is give them what they ask for, 
but also give them what they want... because what they ask for != what they 
want MOST of the time...  So most likely I will end up pulling the pagination 
and really limiting the searching... 

But until I convince the right people... I go on as planned.


--- End Message ---
--- Begin Message ---
On Tue, Oct 25, 2011 at 7:06 PM, Jason Pruim <li...@pruimphotography.com> wrote:

>
> It turns out the issue was actually in the pagination... I'm reworking the
> whole thing and stream lining it... But in the pagination that I found on
> the internet it used a "SELECT COUNT(*) WHERE state='{$state}'"; and the
> COUNT was killing the time... Once that was removed, I was displaying
> records faster then I could imagine... So it's off to pagination land to fix
> it! And possibly redo the entire thing!
>
>
If you're encountering performance issues while doing SELECT COUNT(*), it
sounds like you have a serious disk I/O performance issue.  Is the DB on a RAID
subsystem?  If not, why not?  If so, what level?  Also, what type of HDDs?
For something like this, it should be RAID 10 with HDDs spinning at least
7200 RPM (10,000+ RPM recommended), connected to a good RAID controller, like
3ware's.  Also, the controller should be in a PCI-X or, preferably, PCI-e
slot.

Regards,
Tommy

--- End Message ---
--- Begin Message ---
Tommy Pham wrote:
>  It turns out the issue was actually in the pagination... I'm reworking the
>  whole thing and streamlining it... But the pagination code that I found on
>  the internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the
>  COUNT was killing the time... Once that was removed, I was displaying
>  records faster than I could imagine... So it's off to pagination land to fix
>  it! And possibly redo the entire thing!
>
If you're encountering performance issues while doing SELECT COUNT(*), it
sounds like you have serious disk IO performance issue.  Is the DB on RAID
subsystem?  If not, why not? If so, what level?  Also, what type of HDDs?
For something like this, it should be RAID 10 with HDDs spinning at least
7200RPM, 10,000+ RPM recommended, connected to a good RAID controller, like
3ware's.  Also, the controller should be in slot PCI-X or, preferably,
PCI-e.

What a load of twaddle ...

SELECT COUNT(*) is a problem on any fully transactional database, since it has to be generated from the currently active view of the data. Rather than trying to get the database engine to access every record faster, the correct action is either to avoid the count altogether or, more practically, to maintain a separate table of useful counts generated from the committed data.

Jason, it is not unreasonable that an initial view will be displaying unfiltered data, so you just need to restrict the number of records displayed. As you have found out, telling the user how many more records there are is the real problem, so if it's not important ... don't, but if it's useful to know, then keep a 'cache' of counts that link to your initial filter options. Once you are down to a few thousand records, then a SELECT COUNT(*) may be appropriate ;) Where it becomes a problem is when the user can set up a more complex filter that the cache does not cover ...
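The 'cache' of counts can be as simple as a second table bumped in the same transaction as each insert. A rough sketch of the idea, with invented table and column names; in-memory SQLite is used so the snippet is self-contained, though the SQL is equally valid on MySQL:

```php
<?php
// Maintain a small counts table alongside the big one, inside one
// transaction, so pagination reads the total cheaply instead of running
// COUNT(*) over millions of committed rows. All names are invented.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE addresses (id INTEGER PRIMARY KEY, state TEXT)");
$pdo->exec("CREATE TABLE state_counts (state TEXT PRIMARY KEY, n INTEGER)");

function addAddress(PDO $pdo, $state) {
    $pdo->beginTransaction();
    $pdo->prepare("INSERT INTO addresses (state) VALUES (?)")
        ->execute(array($state));
    // Portable "upsert": try the UPDATE first, INSERT if nothing matched.
    $upd = $pdo->prepare("UPDATE state_counts SET n = n + 1 WHERE state = ?");
    $upd->execute(array($state));
    if ($upd->rowCount() === 0) {
        $pdo->prepare("INSERT INTO state_counts (state, n) VALUES (?, 1)")
            ->execute(array($state));
    }
    $pdo->commit();
}

addAddress($pdo, 'MI');
addAddress($pdo, 'MI');
addAddress($pdo, 'NY');

// The cheap read that replaces SELECT COUNT(*) on the big table:
$q = $pdo->prepare("SELECT n FROM state_counts WHERE state = ?");
$q->execute(array('MI'));
$total = (int) $q->fetchColumn();
echo "MI rows: $total\n";
```

Under real concurrency you would let a trigger or the committed data maintain the counts rather than racing UPDATEs from PHP, but the shape of the cache is the same.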

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--- End Message ---
--- Begin Message ---
On Wed, Oct 26, 2011 at 12:52 AM, Lester Caine <les...@lsces.co.uk> wrote:

> Tommy Pham wrote:
>
>> It turns out the issue was actually in the pagination... I'm reworking the
>> whole thing and streamlining it... But the pagination code that I found on
>> the internet used a "SELECT COUNT(*) WHERE state='{$state}'", and the
>> COUNT was killing the time... Once that was removed, I was displaying
>> records faster than I could imagine... So it's off to pagination land to
>> fix it! And possibly redo the entire thing!
>> If you're encountering performance issues while doing SELECT COUNT(*), it
>> sounds like you have a serious disk I/O performance issue.  Is the DB on a
>> RAID subsystem?  If not, why not?  If so, what level?  Also, what type of
>> HDDs?  For something like this, it should be RAID 10 with HDDs spinning at
>> least 7200 RPM (10,000+ RPM recommended), connected to a good RAID
>> controller, like 3ware's.  Also, the controller should be in a PCI-X or,
>> preferably, PCI-e slot.
>>
>
> What a load of twoddle ...
>
>
I wonder ... The real question is what's the purpose of the DB?  Is it for
OLAP or OLTP? ;)
As for dealing with a DB having millions of rows, you're crossing over into
DBA area.


> SELECT COUNT(*) is a problem on any fully transactional database, since it
> has to be generated from the currently active view of the data.
> Rather than trying to get the database engine access every record faster,
> the correct action is to either avoid the count altogether, or more
> practically maintain a separate table with useful counts that have been
> generated from the committed data.
>
> Jason, it is not unreasonable that an initial view will be displaying
> unfiltered data, so you just need to restrict the number of records
> displayed. As you have found out, telling the user how many more records
> there are is the real problem, so if it's not important ... don't, but if
> it's useful to know, then keep a 'cache' of counts that link to your initial
> filter options. Once you are down to a few thousand records, then a SELECT
> COUNT(*) may be appropriate ;) Where it becomes a problem is when there the
> user can set up a more complex filter that the cache does not cover ...
>
> --
> Lester Caine - G8HFL
> -----------------------------
> Contact - http://lsces.co.uk/wiki/?page=contact
> L.S.Caine Electronic Services - http://lsces.co.uk
> EnquirySolve - http://enquirysolve.com/
> Model Engineers Digital Workshop - http://medw.co.uk//
> Firebird - http://www.firebirdsql.org/index.php
>
>

--- End Message ---
--- Begin Message ---
Tommy Pham wrote:
I wonder ... The real question is what's the purpose of the DB?  Is it for OLAP
or OLTP? ;)
As for dealing with DB having millions of rows, you're crossing over into DBA 
area.

Many of my customers are coming up on 20 years of data available. There has been a debate about transferring historic data to a separate database, but having it available is not causing a problem, except for some counts and larger search actions, and being able to see how long a client has been visiting is often useful. Statistical analysis is always done on a separate machine, with a replicated copy of the data, so as not to affect the active users ...

That said, we are now using the much more detailed LLPG address data rather than simple postcode, and that has added another order of magnitude to data that is being searched live ...

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--- End Message ---
--- Begin Message ---
On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <les...@lsces.co.uk> wrote:

> Tommy Pham wrote:
>
>> I wonder ... The real question is what's the purpose of the DB?  Is it for
>> OLAP
>> or OLTP? ;)
>> As for dealing with DB having millions of rows, you're crossing over into
>> DBA area.
>>
>
> Many of my customers are coming up on 20 years of data available. There
> has been a debate on transferring historic data to a separate database, but
> having it available is not causing a problem, except for some counts and
> larger search actions, and being able to see how long a client has been
> visiting is often useful. Statistical analysis is always done on a separate
> machine, with a replicated copy of the data, so as not to affect the active
> users ...
>
>
What kind of counts/filters?  What kind of RAID subsystem is the storage on?
What's the total size of the DB?  Up to 20 years of data should be in the
peta range.  In that peta range, if you're not having performance issues and
not using either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
application and database design in detail. :)


> That said, we are now using the much more detailed LLPG address data rather
> than simple postcode, and that has added another order of magnitude to data
> that is being searched live ...
>
>
> --
> Lester Caine - G8HFL
> -----------------------------
> Contact - http://lsces.co.uk/wiki/?page=contact
> L.S.Caine Electronic Services - http://lsces.co.uk
> EnquirySolve - http://enquirysolve.com/
> Model Engineers Digital Workshop - http://medw.co.uk//
> Firebird - http://www.firebirdsql.org/index.php
>
>
Since Jason didn't disclose sufficient information, I suggested RAID storage
based on the worst-case scenario being this: as a business analyst, I'd want
to do drill-downs, filters, and counts of the following for an upcoming
marketing campaign:

* county(ies)
* city(ies)
* zip code(s)
* address type (business and/or residential)
* of business (commercial, industrial/manufacturing, etc)
* of residential ( homes - single/townhouses - or apartments/condos )

The filters and counts will be any combination of the above, e.g.: 5 random zip
codes within the given state that list all other criteria and breakdowns.
As Jason mentioned having 89 million rows for a given state, how long would it
take to run the drill-downs if the DB isn't sitting on a fast storage
medium?  That 89 million is most likely the average count in the USA.
For California and New York, the number can easily double.  That's only
the basic filtering.  What of a specific business industry such as
landscaping?  What of filtering by yearly income and/or real estate
value?  BTW, as a business analyst, I don't want to wait hours for the info
to update every time I change a criterion/filter to get the counts before I
look into a few random individual records from the results.

As I've mentioned, something of this magnitude is better left to a DBA,
working together with that DBA.  Either hire/contract one, or become one
:)


Regards,
Tommy

--- End Message ---
--- Begin Message ---
On Wed, Oct 26, 2011 at 3:23 AM, Tommy Pham <tommy...@gmail.com> wrote:

> On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <les...@lsces.co.uk> wrote:
>
>> Tommy Pham wrote:
>>
>>> I wonder ... The real question is what's the purpose of the DB?  Is it
>>> for OLAP
>>> or OLTP? ;)
>>> As for dealing with DB having millions of rows, you're crossing over into
>>> DBA area.
>>>
>>
>> Many of my customers have coming up on 20 years of data available. There
>> has been a debate on transferring historic data to a separate database, but
>> having it available is not causing a problem, except for some counts and
>> larger search actions, and being able to see how long a client has been
>> visiting is often useful. Statistical analysis is always done on a separate
>> machine, with a replicated copy of the data, so as not to affect the active
>> users ...
>>
>>
> What kind of counts/filters?  What kind of RAID subsystem is the storage on?
> What's the total size of the DB?  Up to 20 years of data should be in the
> peta range.  In that peta range, if you're not having performance issues and
> not using either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
> application and database design in detail. :)
>

Brain fart... that should be tera, not peta ... lol.


>
>
>> That said, we are now using the much more detailed LLPG address data
>> rather than simple postcode, and that has added another order of magnitude
>> to data that is being searched live ...
>>
>>
>> --
>> Lester Caine - G8HFL
>> -----------------------------
>> Contact - 
>> http://lsces.co.uk/wiki/?page=**contact<http://lsces.co.uk/wiki/?page=contact>
>> L.S.Caine Electronic Services - http://lsces.co.uk
>> EnquirySolve - http://enquirysolve.com/
>> Model Engineers Digital Workshop - http://medw.co.uk//
>> Firebird - 
>> http://www.firebirdsql.org/**index.php<http://www.firebirdsql.org/index.php>
>>
>>
> Since Jason didn't disclose sufficient information, I suggested that RAID
> storage based on the worst case scenario being this:  as a business analyst,
> I'd want to do drill-downs, filters, counts of a the following for an
> up-coming marketing campaign:
>
> * county(ies)
> * city(ies)
> * zip code(s)
> * address type (business and/or residential)
> * of business (commercial, industrial/manufacturing, etc)
> * of residential ( homes - single/townhouses - or apartments/condos )
>
> The filters and counts will any combinations of the above, ie: 5 random zip
> codes within the given state that lists all other criteria and break down.
> As Jason mention having 89million rows for a given state, how long would it
> take to run the drill-downs if the DB isn't sitting on a fast storage
> medium?  That 89 million is the most likely the average count in the USA.
> For California and New York, the number can double that easily.  That's only
> the basic filtering.  What of a specific business industry such as
> landscaping?  What of the filtering by yearly income and/or real estate
> value?  BTW, as a business analyst, I don't want to wait hours for the info
> to update every time I change a criteria/filter to get the counts before I
> look into a few random individual records from the results.
>
> As I've mentioned, something of this magnitude is better to leave it to the
> DBA and work together with that DBA.  Either hire/contract one or become one
> :)
>
>
> Regards,
> Tommy
>

--- End Message ---
--- Begin Message ---
Tommy Pham wrote:

    Many of my customers have coming up on 20 years of data available. There has
    been a debate on transferring historic data to a separate database, but
    having it available is not causing a problem, except for some counts and
    larger search actions, and being able to see how long a client has been
    visiting is often useful. Statistical analysis is always done on a separate
    machine, with a replicated copy of the data, so as not to affect the active
    users ...


What kind of counts/filters?  What kind of RAID subsystem is the storage on?
What's the total size of the DB?  Up to 20 years of data should be in the peta
range.  In that peta range, if you're not having performance issues and not using
either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the application and
database design in detail. :)

We are still only in the hundreds of MB, and the historic data has less detail than the current 'transactions'. The current postcode table is 500MB, and while the LLPG data would increase that by a factor of 100, it's currently restricted to a council's immediate working area, so we keep the problem contained, dropping back to postcode for out-of-area enquiries. Users complain if an enquiry takes more than a few seconds, and Firebird is giving me more than adequate performance, and allows shadow data to be created via triggers to reduce the need for 'counting'.

I have a new 'application' which is using the same search criteria, but the data volume is growing a lot faster: 10GB on the test system here, and I am still seeing the same search speeds once the correct indexes have been generated. But it will take a few more years before that starts reaching the 100GB level :)

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--- End Message ---
--- Begin Message ---
Hello all.

I inherited some PHP pages about a year ago. They have been fine all along, but now a bunch of errors and erroneous results are popping up. I traced it to the way the variables were being used on the page... for example, the following SQL statement (with a space between ' and " for clarity):


$sql="select name from mytable where name=$myvar and display='yes' ";

This has worked in the past but is now returning errors for some records and working for others. I changed the above to the following and now all is good:

$sql="select name from mytable where name=' ".$myvar." ' and display='yes' ";

What would explain why the former is suddenly causing problems? The version of PHP is 5.2.3 and from what I can tell, hasn't been updated since February of 2011.

Thanks,

 --Rick



--- End Message ---
--- Begin Message ---
Rick Dwyer wrote:
Hello all.

I inherited some PHP pages about a year ago. They have been fine all along but
now a bunch of erroneous errors and results are popping up. I traced it to the
way the variables were being used on the page... for example, the following SQL
statement (a space between ' and " for clarity):

$sql="select name from mytable where name=$myvar and display='yes' ";

This has worked in the past but is now returning errors for some records and
working for others. I changed the above to the following and now all is good:

$sql="select name from mytable where name=' ".$myvar." ' and display='yes' ";

What would explain why the former is suddenly causing problems? The version of
PHP is 5.2.3 and from what I can tell, hasn't been updated since February of 
2011.

I would have a closer look at what the contents of $myvar are when it's giving an error. I would suspect it might contain a " or a '? The first version is not taking care of any characters that may need escaping, so you may well have been lucky in the past?

$sql="select name from mytable where name='$myvar' and display='yes' " is another option for a different 'result', but personally I prefer to pass variables like this as a parameter, so the query just has name=? and you pass $myvar in an array of variables. This helps prevent suspect SQL getting into the query as well; not such a problem here, but sensible practice anyway.
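In PDO terms (available since PHP 5.1), the parameterised version looks like this; "mytable" matches Rick's query, and the in-memory SQLite driver is used only so the sketch runs standalone:

```php
<?php
// The query carries a ? placeholder and the value travels separately, so
// quotes inside $myvar can no longer break, or inject into, the SQL.
// sqlite::memory: is just to make the sketch self-contained.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE mytable (name TEXT, display TEXT)");
$pdo->prepare("INSERT INTO mytable (name, display) VALUES (?, ?)")
    ->execute(array("Miles O'Brien", 'yes'));

$myvar = "Miles O'Brien";   // an apostrophe that breaks hand-built SQL

$stmt = $pdo->prepare(
    "SELECT name FROM mytable WHERE name = ? AND display = 'yes'"
);
$stmt->execute(array($myvar));
$name = $stmt->fetchColumn();
echo $name . "\n";
```

The driver does the quoting and escaping itself, so the value comes back intact no matter what characters it contains.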

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--- End Message ---
--- Begin Message ---
On 10/26/2011 07:20 AM, Rick Dwyer wrote:
> Hello all.
> 
> I inherited some PHP pages about a year ago.  They have been fine all
> along but now a bunch of erroneous errors and results are popping up.  I
> traced it to the way the variables were being used on the page... for
> example, the following SQL statement (a space between ' and " for clarity):
> 
> 
> $sql="select name from mytable where name=$myvar and display='yes' ";
> 
> This has worked in the past but is now returning errors for some records
> and working for others.  I changed the above to the following and now
> all is good:
> 
> $sql="select name from mytable where name=' ".$myvar." ' and
> display='yes' ";
> 
> What would explain why the former is suddenly causing problems?  The
> version of PHP is 5.2.3 and from what I can tell, hasn't been updated
> since February of 2011.
> 
> Thanks,
> 
>  --Rick
> 
> 

In addition to escaping the var with mysql_real_escape_string() or
similar, values for text columns need to be quoted as you did in your
second example.  Consider this as a name:

name=Shawn McKenzie

Without quoting, the space breaks the query, so it would need to be:

name='Shawn McKenzie'

And the escaping would take care of:

name='Tip O'Neill'
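Both requirements, quoting the value and escaping what's inside it, can be seen in one step with PDO::quote(); mysql_real_escape_string() needs a live MySQL connection, so this self-contained sketch uses the SQLite PDO driver instead, but the principle is identical:

```php
<?php
// PDO::quote() wraps the value in quotes AND escapes the quotes inside it,
// covering both cases above. No table is needed -- we only build the SQL
// string here to show what would reach the server.

$pdo = new PDO('sqlite::memory:');

foreach (array('Shawn McKenzie', "Tip O'Neill") as $n) {
    $sql = "SELECT name FROM mytable WHERE name = " . $pdo->quote($n);
    echo $sql . "\n";
}
// The space in "Shawn McKenzie" survives because the value is quoted, and
// the apostrophe in O'Neill is doubled by the SQLite driver ('Tip O''Neill';
// MySQL's driver escapes it in its own style instead).
```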

-- 
Thanks!
-Shawn
http://www.spidean.com

--- End Message ---
