Hello.

I have a somewhat large database table in MySQL. It's a MyISAM table
with about 21,000,000 rows. It was working great at around 18,000,000
rows, and I've written a ton of code using Rose::DB... but now that
the table has that many rows, I can't do anything without running out
of memory:

Out of memory (Needed 1461356 bytes)
DBD::mysql::st execute failed: MySQL client ran out of memory at
/usr/local/lib/perl5/site_perl/5.8.8/Rose/DB/Object/Manager.pm line 1680.
get_objects() - DBD::mysql::st execute failed: MySQL client ran out of
memory at /usr/local/lib/perl5/site_perl/5.8.8/Rose/DB/Object/Manager.pm
line 1680.
at /usr/local/lib/perl5/site_perl/5.8.8/Rose/DB/Object/Manager.pm line 2662
        Rose::DB::Object::Manager::get_objects('MalwareLog::Manager', 'id',
        'HASH(0x804d16c)', 'object_class', 'MalwareLog') called at
        /usr/local/lib/perl5/site_perl/5.8.8/Rose/DB/Object/Manager.pm line 202
        Rose::DB::Object::Manager::__ANON__('MalwareLog::Manager', 'id',
        'HASH(0x804d16c)') called at ./test.pl line 8

I have rewritten a package that implements just the bare basics needed
to demonstrate this error. I will try to keep this as short as possible
without smothering you with details.

Here is a quick description of the package space:

Intellifound::DB - I'm going to be using multiple databases, and I'd
  prefer to manage them in separate, simple classes.
Intellifound::DB::ISDB - Rose::DB-derived class for a MySQL database
  called 'isdb'.
Intellifound::DB::ISDB::Object - Rose::DB::Object-derived class.
Intellifound::ISDB::MalwareLog - ORM interface to isdb.malware_logs.
  This is the huge table.
Intellifound::ISDB::MalwareLog::Manager - Rose::DB::Object::Manager-derived
  class with MalwareLog as the object_class. The manager methods are
  named 'malwarelogs'.
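
In stripped-down form, those classes look roughly like this (the
connection details are placeholders, the column list simply mirrors the
malware_logs table shown below, and in reality each package lives in
its own .pm file):

use strict;
use warnings;

package Intellifound::DB;
use base 'Rose::DB';
__PACKAGE__->use_private_registry;

package Intellifound::DB::ISDB;
use base 'Intellifound::DB';
__PACKAGE__->register_db(
    driver   => 'mysql',
    database => 'isdb',
    host     => 'localhost',    # placeholder
    username => 'someuser',     # placeholder
    password => 'somepass',     # placeholder
);

package Intellifound::DB::ISDB::Object;
use base 'Rose::DB::Object';
sub init_db { Intellifound::DB::ISDB->new }

package Intellifound::ISDB::MalwareLog;
use base 'Intellifound::DB::ISDB::Object';
__PACKAGE__->meta->setup(
    table   => 'malware_logs',
    columns => [ qw(id repos_key add_time entry_time ownership
                    login_credential string_data guid user_ip
                    host url data) ],
    primary_key_columns => ['id'],
);

package Intellifound::ISDB::MalwareLog::Manager;
use base 'Rose::DB::Object::Manager';
sub object_class { 'Intellifound::ISDB::MalwareLog' }
__PACKAGE__->make_manager_methods('malwarelogs');

1;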

This is what the 21,000,000 row table I am working with looks like:
+------------------+---------------------+------+-----+---------------------+----------------+
| Field            | Type                | Null | Key | Default             | Extra          |
+------------------+---------------------+------+-----+---------------------+----------------+
| id               | int(10) unsigned    | NO   | PRI | NULL                | auto_increment |
| repos_key        | bigint(20) unsigned | NO   | MUL |                     |                |
| add_time         | timestamp           | YES  | MUL | CURRENT_TIMESTAMP   |                |
| entry_time       | timestamp           | YES  | MUL | 0000-00-00 00:00:00 |                |
| ownership        | varchar(100)        | YES  | MUL | NULL                |                |
| login_credential | tinyint(1)          | YES  | MUL | NULL                |                |
| string_data      | tinyint(1)          | YES  | MUL | NULL                |                |
| guid             | varchar(64)         | NO   | MUL |                     |                |
| user_ip          | int(10) unsigned    | NO   | MUL |                     |                |
| host             | text                | YES  | MUL | NULL                |                |
| url              | tinytext            | YES  | MUL | NULL                |                |
| data             | mediumtext          | NO   |     |                     |                |
+------------------+---------------------+------+-----+---------------------+----------------+

I don't expect to be able to say:

my $objects = MalwareLog::Manager->get_malwarelogs();

and have a reference to an array of 21,000,000 loaded objects. That's
almost 12 GB in memory (the 'data' column is pretty big for each row).

The last row in the table has an id of 23503496. Am I wrong to think
I can say:

my $objects = MalwareLog::Manager->get_malwarelogs(id => { gt => 23503495 });

and get an array of two objects back? Because that's what produces the
error above.
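
For reference, test.pl itself is nothing fancy; it boils down to
something like this (the stripped-down test uses the short package
names that appear in the trace above):

#!/usr/bin/perl

use strict;
use warnings;

use MalwareLog;
use MalwareLog::Manager;

# only a handful of rows should match, but this is the call
# that dies with the out-of-memory error above
my $objects = MalwareLog::Manager->get_malwarelogs(id => { gt => 23503495 });

print scalar(@$objects), " objects\n";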

Ideally, I will be making iterators from SQL calls. The SQL will
limit the number of rows as much as possible, but there will be times
I need to iterate over every row. I read in one of the CPAN docs that
Rose::DB::Object::Iterator returns 'true iterators', i.e. the rows are
not loaded until you call next(), so it seems like I should also be
able to say:

my $sql = "SELECT * FROM malware_logs";
my $iterator = MalwareLog::Manager->get_objects_iterator_from_sql($sql);

while (my $object = $iterator->next) {
        ## the current row gets loaded here
        ## I do stuff with $object
        ## memory is released at the end of this block
}

Is that correct? That also does not work; I get the same memory
error. Either way, if I can't even get the array of two loaded objects
from the previous example to work, I don't suppose anything will.

When I create a new table (called 'abc') from the same schema
definition and load it with 1,000,000 rows from the malware_logs
table, everything works fine. Is it just a limitation of Rose::DB that
it won't work with tables of this size, or is it a DBI issue?
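
For what it's worth, the 'abc' test table was built with something
along these lines (via the Rose::DB handle, though the plain mysql
client gives the same result):

my $dbh = Intellifound::DB::ISDB->new->dbh;

# copy the schema (indexes included), then load the first 1,000,000 rows
$dbh->do('CREATE TABLE abc LIKE malware_logs');
$dbh->do('INSERT INTO abc SELECT * FROM malware_logs LIMIT 1000000');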

Any suggestions would be very much appreciated. Like I said, I've
spent tons of time writing software using this ORM scheme, and it
worked great until the table got this big, so I'm quite desperate for
a way to make this work.

For the record, this is FreeBSD 6.1-RELEASE, mysql 14.12 Distrib
5.1.11-beta (from ports), and Rose::DB 0.758.

Thank you for reading,
Neal
