php-general Digest 27 Oct 2011 07:53:15 -0000 Issue 7539
Topics (messages 315441 through 315445):
Re: Exporting large data from mysql to html using php
315441 by: Tommy Pham
315442 by: Jason Pruim
315443 by: Jim Giner
315444 by: Jason Pruim
315445 by: Lester Caine
Administrivia:
To subscribe to the digest, e-mail:
php-general-digest-subscr...@lists.php.net
To unsubscribe from the digest, e-mail:
php-general-digest-unsubscr...@lists.php.net
To post to the list, e-mail:
php-gene...@lists.php.net
----------------------------------------------------------------------
--- Begin Message ---
On Wed, Oct 26, 2011 at 4:14 AM, Lester Caine <les...@lsces.co.uk> wrote:
> Tommy Pham wrote:
>
>>
>> Many of my customers have coming up on 20 years of data available.
>> There has
>> been a debate on transferring historic data to a separate database, but
>> having it available is not causing a problem, except for some counts
>> and
>> larger search actions, and being able to see how long a client has been
>> visiting is often useful. Statistical analysis is always done on a
>> separate
>> machine, with a replicated copy of the data, so as not to affect the
>> active
>> users ...
>>
>>
>> What kind of counts/filters? What kind of RAID subsystem is the storage?
>> What's the total size of the DB? Up to 20 years of data should be in the
>> peta
>> range. In that peta range, if you're not having performance issue and not
>> using
>> either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the application
>> and
>> database design in details. :)
>>
>
> We are still only in hundreds on Mb and historic data is has less detail
> than the current 'transactions'. The current postcode table is 500Mb, and
> while the LLPG data would increase that by the order of 100, it's currently
> only restricted to a councils immediate working area, so we keep the problem
> contained. Dropping back to postcode for out of area enquiries. Users
> complain if an enquiry takes more than a few seconds, and Firebird is giving
> me more than adequate performance, and allows shadow data to be created via
> triggers to reduce the need for 'counting'.
>
> I have a new 'application' which is using the same search criteria but the
> data volume is growing a lot faster, 10Gb on the test system here, but I am
> still seeing the same search speeds once the correct indexes have been
> generated. But it will take a few more years before that starts reaching the
> 100Gb level :)
>
>
> --
> Lester Caine - G8HFL
> -----------------------------
> Contact -
> http://lsces.co.uk/wiki/?page=**contact<http://lsces.co.uk/wiki/?page=contact>
> L.S.Caine Electronic Services - http://lsces.co.uk
> EnquirySolve - http://enquirysolve.com/
> Model Engineers Digital Workshop - http://medw.co.uk//
> Firebird -
> http://www.firebirdsql.org/**index.php<http://www.firebirdsql.org/index.php>
>
>
>
I'm just curious. What's the total rows count? Data accumulated in 20
years and only taking that much space doesn't seem like there's a lot going
on each year over the years. All the DBAs that I know they deal with
minimum addition/import of 1 million rows per week, currently. I didn't
bother asking them how far back they keep the data as that amount of rows is
too overwhelming for me, for the moment, as I'm not a DBA :)
--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com
On Oct 26, 2011, at 6:23 AM, Tommy Pham wrote:
> On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <les...@lsces.co.uk> wrote:
>
>> Tommy Pham wrote:
>>
>>> I wonder ... The real question is what's the purpose of the DB? Is it for
>>> OLAP
>>> or OLTP? ;)
>>> As for dealing with DB having millions of rows, you're crossing over into
>>> DBA area.
>>>
>>
>> Many of my customers have coming up on 20 years of data available. There
>> has been a debate on transferring historic data to a separate database, but
>> having it available is not causing a problem, except for some counts and
>> larger search actions, and being able to see how long a client has been
>> visiting is often useful. Statistical analysis is always done on a separate
>> machine, with a replicated copy of the data, so as not to affect the active
>> users ...
>>
>>
> What kind of counts/filters? What kind of RAID subsystem is the storage?
> What's the total size of the DB? Up to 20 years of data should be in the
> peta range. In that peta range, if you're not having performance issue and
> not using either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
> application and database design in details. :)
>
>
>> That said, we are now using the much more detailed LLPG address data rather
>> than simple postcode, and that has added another order of magnitude to data
>> that is being searched live ...
>>
>>
>> --
>> Lester Caine - G8HFL
>> -----------------------------
>> Contact -
>> http://lsces.co.uk/wiki/?page=**contact<http://lsces.co.uk/wiki/?page=contact>
>> L.S.Caine Electronic Services - http://lsces.co.uk
>> EnquirySolve - http://enquirysolve.com/
>> Model Engineers Digital Workshop - http://medw.co.uk//
>> Firebird -
>> http://www.firebirdsql.org/**index.php<http://www.firebirdsql.org/index.php>
>>
>>
> Since Jason didn't disclose sufficient information, I suggested that RAID
> storage based on the worst case scenario being this: as a business analyst,
> I'd want to do drill-downs, filters, counts of a the following for an
> up-coming marketing campaign:
>
> * county(ies)
> * city(ies)
> * zip code(s)
> * address type (business and/or residential)
> * of business (commercial, industrial/manufacturing, etc)
> * of residential ( homes - single/townhouses - or apartments/condos )
>
> The filters and counts will any combinations of the above, ie: 5 random zip
> codes within the given state that lists all other criteria and break down.
> As Jason mention having 89million rows for a given state, how long would it
> take to run the drill-downs if the DB isn't sitting on a fast storage
> medium? That 89 million is the most likely the average count in the USA.
> For California and New York, the number can double that easily. That's only
> the basic filtering. What of a specific business industry such as
> landscaping? What of the filtering by yearly income and/or real estate
> value? BTW, as a business analyst, I don't want to wait hours for the info
> to update every time I change a criteria/filter to get the counts before I
> look into a few random individual records from the results.
The server that's running it is a home computer with a VPS installed... It's
not my dev environment :)
The information being searched is specifically phone numbers, and the bosses
want to provide the public away to browse them, hence the pagination... Once I
removed a COUNT from mysql it started working alot better... So I still need to
provide a better pagination system, but it's working.
Oh, and the 89 million is just for one state :) We are talking the possibility
of I believe 10 billion numbers to cover the US, not to mention Canada which I
believe uses the same numbering system as we do so that could be another 10
billion...
>
> As I've mentioned, something of this magnitude is better to leave it to the
> DBA and work together with that DBA. Either hire/contract one or become one
> :)
I'm working on becoming one ontop of web designer and programmer :)
>
>
> Regards,
> Tommy
--- End Message ---
--- Begin Message ---
Your boss wants to give access to phone numbers to the public in general?
Then what?
Glad mine's unlisted.
--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com
On Oct 26, 2011, at 9:09 PM, Jim Giner wrote:
> Your boss wants to give access to phone numbers to the public in general?
> Then what?
>
> Glad mine's unlisted.
There's no identifying information on the phone numbers... Simply just the
phone number... Eventually maybe some info about who owns the phone line... And
a few other things.
The rest is covered under an NDA until the site goes live :) but there
shouldn't be any privacy related issues...
--- End Message ---
--- Begin Message ---
Jim Giner wrote:
Your boss wants to give access to phone numbers to the public in general?
Then what?
Glad mine's unlisted.
Is it?
Does it start 518248 ?
I often forget to get a phone number when a parcel has to go by carrier, and
paypal does not include that info, but only rarely does one not come up on
google ... even if in your case I had to check the cashed copy of the the result ;)
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
--- End Message ---