PHP does expose System V shared-memory APIs (the shm_* functions):
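
Untested, but a minimal sketch with those functions might look like
this (the key, segment size, and variable slot are just illustrative):

    <?php
    // Derive an IPC key from this file and attach a 1 MB segment.
    $key = ftok(__FILE__, 'a');
    $shm = shm_attach($key, 1024 * 1024);

    // Variables inside the segment are addressed by integer slots.
    shm_put_var($shm, 1, 'hello');
    if (shm_has_var($shm, 1)) {
        echo shm_get_var($shm, 1), "\n";   // prints "hello"
    }

    shm_detach($shm);   // shm_remove($shm) would delete the segment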

If you already have APC installed, you could also try:

APC also lets you store user-specific data (it will live in shared
memory).
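
For the string-to-bigint map described below, a rough, untested sketch
using APC's user cache could be:

    <?php
    // Placeholder data: a 32-byte string key mapped to an integer id.
    $stringKey = md5('example');   // 32 hex chars
    $bigintId  = 42;

    apc_store('idx:' . $stringKey, $bigintId);

    // apc_fetch's by-ref $success flag doubles as an existence check,
    // and hits stay in local shared memory (no network hop).
    $value = (int) apc_fetch('idx:' . $stringKey, $success);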

Haven't tried these myself, so I would run some quick tests to check
whether they meet your performance requirements. In theory, they should
be faster than Berkeley DB-style solutions (also another option, though
it sounds like something similar, MongoDB, was not good enough?).
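
If you do end up trying the Berkeley DB route, PHP's dba extension can
wrap it; an untested sketch, assuming the db4 handler is compiled in:

    <?php
    // Open (or create) a local Berkeley DB file for read-write.
    // $stringKey / $bigintId as in the APC sketch above.
    $db = dba_open('/tmp/hash.db', 'c', 'db4');

    // Keys and values are plain strings; cast the id on the way out.
    dba_replace($stringKey, (string) $bigintId, $db);
    $exists = dba_exists($stringKey, $db);
    $value  = (int) dba_fetch($stringKey, $db);

    dba_close($db);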

I am curious to know whether someone here has run these tests. Note
that with memcached installed locally (on the same box running PHP), it
can be surprisingly efficient: use pconnect(), cache the handle in a
static variable for a given request cycle, etc.
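
Something along these lines (again untested; host and port are the
usual local defaults):

    <?php
    // Persistent connection plus a per-request static handle, so each
    // request pays the connect cost at most once.
    function cache()
    {
        static $mc = null;
        if ($mc === null) {
            $mc = new Memcache();
            $mc->pconnect('127.0.0.1', 11211);
        }
        return $mc;
    }

    cache()->set('foo', 'bar', 0, 300);   // flags = 0, expire in 300s
    $val = cache()->get('foo');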


On Sun, Jan 24, 2010 at 9:39 AM, D. Dante Lorenso <> wrote:
> shiplu wrote:
>> On Sun, Jan 24, 2010 at 3:11 AM, D. Dante Lorenso <>
>> wrote:
>>> All,
>>> I'm loading millions of records into a backend PHP cli script that I
>>> need to build a hash index from to optimize key lookups for data that
>>> I'm importing into a MySQL database.  The problem is that storing this
>>> data in a PHP array is not very memory efficient and my millions of
>>> records are consuming about 4-6 GB of ram.
>> What are you storing? An array of row objects?
>> In that case, storing only the row id will reduce the memory use.
> I am querying a MySQL database which contains 40 million records and mapping
> string columns to numeric ids.  You might think of it as normalizing the data.
> Then, I am importing a new 40 million records and comparing the new values
> to the old values.  Where the value matches, I update records, but where
> they do not match, I insert new records, and finally I go back and delete
> old records.  So, the net result is that I have a database with 40 million
> records that I need to "sync" on a daily basis.
>> If you are loading full row objects, it will take a lot of memory.
>> But if you just load the row id values, it will significantly reduce
>> the memory usage.
> For what I am trying to do, I just need to map a string value (32 bytes) to
> a bigint value (8 bytes) in a fast-lookup hash.
>> Besides, you can load row ids on a chunk-by-chunk basis. If you have
>> 10 million rows to process, load 10,000 rows as a chunk, process
>> them, then load the next chunk.  This will significantly reduce memory
>> usage.
> When importing the fresh 40 million records, I need to compare each record
> against 4 different indexes that map the record to other existing records,
> or to a "group_id" that the record also belongs to.  My current solution
> uses a trigger in MySQL that does the lookups inside MySQL, but this is
> extremely slow.  Pre-loading the MySQL indexes into PHP RAM and processing
> them that way is thousands of times faster.
> I just need an efficient way to hold my hash tables in PHP RAM.  PHP arrays
> are very fast, but as my original post says, they consume way too much
> RAM.
>> A good algorithm can solve your problem anytime. ;-)
> It currently takes about 5-10 minutes to build my hash indexes in PHP RAM,
> which is more than made up for by the 10,000x speedup on key lookups that I
> get later on.  I just want to avoid using the whole 6 GB of RAM to do it.  I
> need an efficient hashing API that supports something like:
>        $value = (int) fasthash_get((string) $key);
>        $exists = (bool) fasthash_exists((string) $key);
>        fasthash_set((string) $key, (int) $value);
> Or ... it feels like a "memcached" API, but where the data is stored locally
> instead of accessed over a network.  That is how my search led me to what
> appears to be the dead "lchash" extension.
> -- Dante
> ----------
> D. Dante Lorenso
> 972-333-4139
