php-general Digest 27 Oct 2011 07:53:15 -0000 Issue 7539

Topics (messages 315441 through 315445):

Re: Exporting large data from mysql to html using php
        315441 by: Tommy Pham
        315442 by: Jason Pruim
        315443 by: Jim Giner
        315444 by: Jason Pruim
        315445 by: Lester Caine

Administrivia:

To subscribe to the digest, e-mail:
        php-general-digest-subscr...@lists.php.net

To unsubscribe from the digest, e-mail:
        php-general-digest-unsubscr...@lists.php.net

To post to the list, e-mail:
        php-gene...@lists.php.net


----------------------------------------------------------------------
--- Begin Message ---
On Wed, Oct 26, 2011 at 4:14 AM, Lester Caine <les...@lsces.co.uk> wrote:

> Tommy Pham wrote:
>
>>
>>    Many of my customers have coming up on 20 years of data available.
>> There has
>>    been a debate on transferring historic data to a separate database, but
>>    having it available is not causing a problem, except for some counts
>> and
>>    larger search actions, and being able to see how long a client has been
>>    visiting is often useful. Statistical analysis is always done on a
>> separate
>>    machine, with a replicated copy of the data, so as not to affect the
>> active
>>    users ...
>>
>>
>> What kind of counts/filters?  What kind of RAID subsystem is the storage?
>> What's the total size of the DB?  Up to 20 years of data should be in the
>> peta
>> range.  In that peta range, if you're not having performance issue and not
>> using
>> either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the application
>> and
>> database design in details. :)
>>
>
> We are still only in hundreds on Mb and historic data is has less detail
> than the current 'transactions'. The current postcode table is 500Mb, and
> while the LLPG data would increase that by the order of 100, it's currently
> only restricted to a councils immediate working area, so we keep the problem
> contained. Dropping back to postcode for out of area enquiries. Users
> complain if an enquiry takes more than a few seconds, and Firebird is giving
> me more than adequate performance, and allows shadow data to be created via
> triggers to reduce the need for 'counting'.
>
> I have a new 'application' which is using the same search criteria but the
> data volume is growing a lot faster, 10Gb on the test system here, but I am
> still seeing the same search speeds once the correct indexes have been
> generated. But it will take a few more years before that starts reaching the
> 100Gb level :)
>
>
> --
> Lester Caine - G8HFL
> -----------------------------
> Contact - 
> http://lsces.co.uk/wiki/?page=**contact<http://lsces.co.uk/wiki/?page=contact>
> L.S.Caine Electronic Services - http://lsces.co.uk
> EnquirySolve - http://enquirysolve.com/
> Model Engineers Digital Workshop - http://medw.co.uk//
> Firebird - 
> http://www.firebirdsql.org/**index.php<http://www.firebirdsql.org/index.php>
>
>
>
I'm just curious.  What's the total rows count?  Data accumulated in 20
years and only taking that much space doesn't seem like there's a lot going
on each year over the years.  All the DBAs that I know they deal with
minimum addition/import of 1 million rows per week, currently.  I didn't
bother asking them how far back they keep the data as that amount of rows is
too overwhelming for me, for the moment, as I'm not a DBA :)

--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com



On Oct 26, 2011, at 6:23 AM, Tommy Pham wrote:

> On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <les...@lsces.co.uk> wrote:
> 
>> Tommy Pham wrote:
>> 
>>> I wonder ... The real question is what's the purpose of the DB?  Is it for
>>> OLAP
>>> or OLTP? ;)
>>> As for dealing with DB having millions of rows, you're crossing over into
>>> DBA area.
>>> 
>> 
>> Many of my customers have coming up on 20 years of data available. There
>> has been a debate on transferring historic data to a separate database, but
>> having it available is not causing a problem, except for some counts and
>> larger search actions, and being able to see how long a client has been
>> visiting is often useful. Statistical analysis is always done on a separate
>> machine, with a replicated copy of the data, so as not to affect the active
>> users ...
>> 
>> 
> What kind of counts/filters?  What kind of RAID subsystem is the storage?
> What's the total size of the DB?  Up to 20 years of data should be in the
> peta range.  In that peta range, if you're not having performance issue and
> not using either RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
> application and database design in details. :)
> 
> 
>> That said, we are now using the much more detailed LLPG address data rather
>> than simple postcode, and that has added another order of magnitude to data
>> that is being searched live ...
>> 
>> 
>> --
>> Lester Caine - G8HFL
>> -----------------------------
>> Contact - 
>> http://lsces.co.uk/wiki/?page=**contact<http://lsces.co.uk/wiki/?page=contact>
>> L.S.Caine Electronic Services - http://lsces.co.uk
>> EnquirySolve - http://enquirysolve.com/
>> Model Engineers Digital Workshop - http://medw.co.uk//
>> Firebird - 
>> http://www.firebirdsql.org/**index.php<http://www.firebirdsql.org/index.php>
>> 
>> 
> Since Jason didn't disclose sufficient information, I suggested that RAID
> storage based on the worst case scenario being this:  as a business analyst,
> I'd want to do drill-downs, filters, counts of a the following for an
> up-coming marketing campaign:
> 
> * county(ies)
> * city(ies)
> * zip code(s)
> * address type (business and/or residential)
> * of business (commercial, industrial/manufacturing, etc)
> * of residential ( homes - single/townhouses - or apartments/condos )
> 
> The filters and counts will any combinations of the above, ie: 5 random zip
> codes within the given state that lists all other criteria and break down.
> As Jason mention having 89million rows for a given state, how long would it
> take to run the drill-downs if the DB isn't sitting on a fast storage
> medium?  That 89 million is the most likely the average count in the USA.
> For California and New York, the number can double that easily.  That's only
> the basic filtering.  What of a specific business industry such as
> landscaping?  What of the filtering by yearly income and/or real estate
> value?  BTW, as a business analyst, I don't want to wait hours for the info
> to update every time I change a criteria/filter to get the counts before I
> look into a few random individual records from the results.

The server that's running it is a home computer with a VPS installed... It's 
not my dev environment :)  

The information being searched is specifically phone numbers, and the bosses 
want to provide the public away to browse them, hence the pagination... Once I 
removed a COUNT from mysql it started working alot better... So I still need to 
provide a better pagination system, but it's working.

Oh, and the 89 million is just for one state :) We are talking the possibility 
of I believe 10 billion numbers to cover the US, not to mention Canada which I 
believe uses the same numbering system as we do so that could be another 10 
billion... 

> 
> As I've mentioned, something of this magnitude is better to leave it to the
> DBA and work together with that DBA.  Either hire/contract one or become one
> :)

I'm working on becoming one ontop of web designer and programmer :)
> 
> 
> Regards,
> Tommy


--- End Message ---
--- Begin Message ---
Your boss wants to give access to phone numbers to the public in general? 
Then what?

Glad mine's unlisted.



--- End Message ---
--- Begin Message ---
Jason Pruim
li...@pruimphotography.com



On Oct 26, 2011, at 9:09 PM, Jim Giner wrote:

> Your boss wants to give access to phone numbers to the public in general? 
> Then what?
> 
> Glad mine's unlisted.

There's no identifying information on the phone numbers... Simply just the 
phone number... Eventually maybe some info about who owns the phone line... And 
a few other things. 

The rest is covered under an NDA until the site goes live :) but there 
shouldn't be any privacy related issues...



--- End Message ---
--- Begin Message ---
Jim Giner wrote:
Your boss wants to give access to phone numbers to the public in general?
Then what?

Glad mine's unlisted.

Is it?
Does it start 518248 ?
I often forget to get a phone number when a parcel has to go by carrier, and paypal does not include that info, but only rarely does one not come up on google ... even if in your case I had to check the cashed copy of the the result ;)

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--- End Message ---

Reply via email to