Hi Aaron,

I have written similar CLI scripts that parse multiple tables with ~9M records just fine under a 64MB PHP memory limit, so it sounds like you have some kind of memory leak. Dealing with large data sets requires a few special considerations:

- Do 'virtual paging' over the data using MySQL LIMIT and OFFSET, iterating through blocks of something like 2048 rows per set. Otherwise you consume all the memory just pulling the result set back from MySQL (see the first sketch after this list).
- If you are working with the data on each iteration, try not to copy it into a new array. Work with it directly, and use unset() or re-use variables.
- If you have some kind of ORM, you may have to explicitly unset() variables; with Doctrine you have to use its built-in method (second sketch below).
- If you are using a framework like symfony, force the script to run in a 'CLI' environment rather than 'dev' or 'prod', to turn off any logging that could cause memory leaks (settings example below).
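To illustrate the paging idea, here is a minimal sketch using PDO. The table and column names (users, id, email, name) and the connection details are made up for the example; adjust them to your schema:

    <?php
    // Page through the table in fixed-size blocks so the full result
    // set never sits in PHP memory at once.
    $pdo = new PDO('mysql:host=localhost;dbname=site_a', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $blockSize = 2048;
    $offset    = 0;

    do {
        $stmt = $pdo->prepare(
            'SELECT id, email, name FROM users ORDER BY id LIMIT :lim OFFSET :off'
        );
        $stmt->bindValue(':lim', $blockSize, PDO::PARAM_INT);
        $stmt->bindValue(':off', $offset, PDO::PARAM_INT);
        $stmt->execute();
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            // ... check site B for a match, then UPDATE or INSERT ...
        }

        $count   = count($rows);
        $offset += $blockSize;
        unset($rows, $stmt);   // release the block before fetching the next one
    } while ($count === $blockSize);

Note the ORDER BY: without a stable ordering, LIMIT/OFFSET can skip or repeat rows between blocks. Also, if the offsets get very large, paging by key (WHERE id > $lastId LIMIT 2048) is cheaper for MySQL than a big OFFSET.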
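On the Doctrine point: in Doctrine 1.x the built-in method is free() on records and collections. A sketch assuming a Doctrine 1.x project with a hypothetical 'User' model already bootstrapped:

    <?php
    // Free hydrated objects after each block, otherwise the ORM's
    // internal references keep every record alive for the whole run.
    $offset = 0;
    do {
        $users = Doctrine_Query::create()
            ->from('User u')
            ->orderBy('u.id')
            ->limit(2048)
            ->offset($offset)
            ->execute();

        foreach ($users as $user) {
            // ... process $user ...
        }

        $count = count($users);
        $users->free(true);   // deep free: also releases related records
        unset($users);
        $offset += 2048;
    } while ($count === 2048);

If you only need to read the data, array hydration (Doctrine_Core::HYDRATE_ARRAY) is far lighter than full objects. On Doctrine 2 the equivalent housekeeping call is $entityManager->clear() after each block.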
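And on the symfony point (assuming symfony 1.x): the framework logger and Doctrine's query profiler keep their messages in memory for the whole run, which looks exactly like a leak in a long-running task. You can add a dedicated environment with logging switched off in apps/frontend/config/settings.yml ('cli' and 'frontend' are just example names):

    cli:
      .settings:
        logging_enabled: false   # stop sfLogger accumulating messages
        web_debug:       false

and then pass --env=cli when running your task. The environment name is only a convention; any environment with logging disabled will do.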
Hope this helps,

Cheers,
Stig

On 12 June 2011 11:52, AaronC <[email protected]> wrote:
> Hi all,
>
> This is more of a logic/efficiency question than a purely technical one.
>
> I am in the process of migrating two similar sites into one (as far
> as user data is concerned). It is possible for some user data to be
> present on both sites.
>
> I have written a script to run in CLI that cycles through records in
> various tables from site A and checks for matching records in one
> site B table. If a match is present, data from site A is updated into
> the site B table. If no match is present, data from site A is
> inserted into the site B table. Here are the record counts:
>
> Site A
> Table 1 - 350,000 records
> Table 2 - 1.3 million records
>
> Site B
> Table 1 - 172,000 records
>
> Running SQL queries and looping through such large data sets obviously
> raises the issue of memory exhaustion. In the case of Site A, Table 1, a
> memory limit of 500M is exhausted at around 220,000 records.
>
> No, the nature of the data does not allow me to reverse the import
> procedure and just run on Site B's records instead.
>
> I am trying to come up with a way of effectively releasing memory as
> the script processes.
>
> 1. I could insert the update/insert queries into another table to be
> run individually by a cron (like a queue). I don't see why this
> would help, however, as the exhaustion above occurs with insert/update
> queries disabled. All the script is doing is outputting a short
> message on the console (one per record).
> 2. I could use a start/limit in my queries when selecting data from
> the Site A tables. This seems to be the most obvious choice.
>
> Does anyone have any other ideas? This is a rather tricky thing to
> find on Google relating to pure PHP.
>
> Cheers
> Aaron

--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to [email protected]
To unsubscribe, send email to [email protected]
