Re: DBD::CSV much slower on osX ?
On Wed, Feb 16, 2005 at 09:44:08AM -0800, Jeff Zucker wrote:
> [EMAIL PROTECTED] wrote:
> >     Total Elapsed Time = 63.19465 Seconds
> >       User+System Time = 43.46465 Seconds
> >     Exclusive Times
> >     %Time ExclSec CumulS #Calls sec/call Csec/c  Name
> >      66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
>
> I'm the maintainer of DBD::CSV and while I don't have time this week to get into this, I am following the conversation and will certainly eventually revise the module if any useful information is discovered.

Sorry for wasting your time with my problems, but since I changed the i386 computer, I can no longer reproduce the speed difference. And I no longer have access to the computer where the query ran faster, so I cannot double-check whether everything there was exactly the same as on the new one.

As I had better spend my time analysing some data with my script rather than analysing the script itself, I eventually switched to DBD::SQLite, which solved my problem by performing the query in a few seconds. The query was something like:

    my $query = "SELECT * FROM $db_table WHERE SYMBOL1 LIKE '%${Search}%' OR SYMBOL2 LIKE '%${Search}%'";
    my $sth = $dbh->prepare($query);
    $sth->execute;

(Actually it used CLIKE, before the switch.)

You may wonder what the purpose of this mail is, as there is no crucial new information. The reason is that I do not like not knowing the end of the story when I browse archived threads.

Anyway, many thanks to all who tried to help me.

--
Charles
Re: DBD::CSV much slower on osX ?
On Fri, Apr 01, 2005 at 03:09:12PM +0900, Charles Plessy wrote:
> You may wonder what the purpose of this mail is, as there is no crucial new information. The reason is that I do not like not knowing the end of the story when I browse archived threads.

Thanks Charles.

Tim.
Re: DBD::CSV much slower on osX ?
Charles Plessy wrote:
> As I had better spend my time analysing some data with my script rather than analysing the script itself, I eventually switched to DBD::SQLite, which solved my problem by performing the query in a few seconds.

There's no question, SQLite is faster than DBD::CSV for most things. If someone asks me for a recommendation for a database to use and they have large and/or complex data, don't care about the format of the data, and speed is an issue, I never recommend my own modules :-). I hope you're using mod_perl or something similar, because connecting to the database is one area where SQLite is slower. It's also slower for inserts.

I'd also like to point out that SQL::Statement, the underlying engine for DBD::CSV, is undergoing some major changes and is becoming faster on each benchmark. The purpose of the pure-Perl DBDs like DBD::CSV is to provide access to human-readable data and unconventional data sources, and to support platforms and contexts where compilation is not an option, not to rival the speed of RDBMSs written in C.

> The query was something like:
>
>     my $query = "SELECT * FROM $db_table WHERE SYMBOL1 LIKE '%${Search}%' OR SYMBOL2 LIKE '%${Search}%'";

LIKE and CLIKE with wildcards are full-text searches and are always going to be slow relative to other kinds of searches.

> You may wonder what the purpose of this mail is, as there is no crucial new information. The reason is that I do not like not knowing the end of the story when I browse archived threads.

Thanks, I appreciate hearing back. Good luck!

--
Jeff
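To make the point about wildcards concrete, here is a minimal pure-Perl sketch of the work a `LIKE '%term%'` condition implies when nothing can narrow the search: every row is visited and each column is scanned for the substring. The function name `like_anywhere_scan` and the sample rows are hypothetical, and this is not SQL::Statement's actual code; it only illustrates why the cost grows with the row count (compare the 109445 calls to eval_where in the profile quoted earlier in the thread).

```perl
use strict;
use warnings;

# Hypothetical sketch of evaluating
#   WHERE SYMBOL1 LIKE '%term%' OR SYMBOL2 LIKE '%term%'
# with no usable index. The leading '%' means a match may start anywhere
# in the value, so every row has to be visited and tested.
# NOT SQL::Statement's implementation; it only mirrors the cost shape.
sub like_anywhere_scan {
    my ($rows, $term) = @_;
    my $needle = lc $term;              # CLIKE-style: case-insensitive
    my ($tested, @hits) = (0);
    for my $row (@$rows) {
        $tested++;                      # one WHERE evaluation per row
        if (   index(lc $row->{SYMBOL1}, $needle) >= 0
            || index(lc $row->{SYMBOL2}, $needle) >= 0) {
            push @hits, $row;
        }
    }
    return ($tested, \@hits);
}

# Usage with made-up rows; column names are taken from the query above.
my @rows = (
    { SYMBOL1 => 'ABC1', SYMBOL2 => 'XYZ'   },
    { SYMBOL1 => 'DEF',  SYMBOL2 => 'abc22' },
    { SYMBOL1 => 'GHI',  SYMBOL2 => 'JKL'   },
);
my ($tested, $hits) = like_anywhere_scan(\@rows, 'abc');
print "tested $tested rows, matched ", scalar(@$hits), "\n";   # tested 3 rows, matched 2
```

Every row costs at least one comparison regardless of how many rows match, which is consistent with eval_where being called once per row in the profile.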
Re: DBD::CSV much slower on osX ?
On Thu, Feb 03, 2005 at 10:18:43AM +, Tim Bunce wrote:
> On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> > Dear list,
> >
> > I wrote a simple CGI script using DBD::CSV on a Linux computer, and then installed it on an iMac G5. Its execution time is now almost 10 times slower. Using print statements, I traced the bottleneck to the following statement:
> >
> >     $sth->execute;
> >
> > Now I am a bit stuck, as I do not know how to investigate inside the DBD::CSV module to find the slow instruction.
>
> The Devel::DProf module (and/or other code profiling modules) may help.

Thank you for pointing out this module. I have used it to analyse my script:

    Total Elapsed Time = 63.19465 Seconds
      User+System Time = 43.46465 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
     11.1   4.848  6.132 109447   0.     0.0001  Text::CSV_XS::getline
     7.96   3.458  3.458 218890   0.     0.      SQL::Statement::get_row_value
     6.90   2.998  9.129 109447   0.     0.0001  DBD::CSV::Table::fetch_row
     5.80   2.520  7.545 109445   0.     0.0001  SQL::Statement::process_predicate
     4.36   1.894  1.894 109445   0.     0.      SQL::Statement::is_matched
     2.95   1.284  1.284 109447   0.     0.      IO::Handle::getline
     2.29   0.997 46.596      1   0.9965 46.595  SQL::Statement::SELECT
     0.16   0.069  0.076     27   0.0026 0.0028  CGI::_compile
     0.16   0.068  0.123      4   0.0170 0.0307  main::BEGIN
     0.09   0.040  0.079      4   0.0099 0.0198  DBI::SQL::Nano::BEGIN
     0.09   0.040  0.039      7   0.0057 0.0056  SQL::Statement::BEGIN
     0.07   0.029  0.164      1   0.0295 0.1645  DBI::install_driver
     0.02   0.010  0.010      1   0.0100 0.0100  Fcntl::bootstrap
     0.02   0.010  0.010      1   0.0100 0.0100  SQL::Parser::dialect

It seems that there is no bottleneck. I have run the script on a 3 GHz Linux box, and it took 19 seconds to complete. So I do not understand how I managed to run it in 7 seconds on a 1.8 GHz Athlon laptop (which unfortunately I no longer own).
As somebody kindly pointed out to me that hardware differences can have a strong impact on the results, I will assume that this is the reason for the speed inconsistencies I have seen, unless somebody finds something insightful in the dprofpp output excerpt in this mail.

Thanks to those who answered me and offered their help.

--
Charles
Re: DBD::CSV much slower on osX ?
On Thu, Feb 03, 2005 at 11:33:10AM -0800, Henri Asseily wrote:
> Also note that DBD::CSV is significantly impacted by I/O speed. If your iMac G5 has a 4200 rpm drive and your Linux box has a 10k rpm one, that makes quite a large difference.

Would that mean that running the script twice should dramatically speed up execution, because the file would then be cached in memory? (That does not work; I just tried.)

Best,

--
Charles
Re: DBD::CSV much slower on osX ?
On 2005-02-16 19:44:49 +0900, [EMAIL PROTECTED] wrote:
> On Thu, Feb 03, 2005 at 10:18:43AM +, Tim Bunce wrote:
> > On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> > > Dear list,
> > >
> > > I wrote a simple CGI script using DBD::CSV on a Linux computer, and then installed it on an iMac G5. Its execution time is now almost 10 times slower.
> [...]
> > The Devel::DProf module (and/or other code profiling modules) may help.
>
> Thank you for pointing out this module. I have used it to analyse my script:
>
>     Total Elapsed Time = 63.19465 Seconds
>       User+System Time = 43.46465 Seconds

That's strange. Unless your computer is busy doing something else, it is waiting almost 20 seconds for I/O. Unless the lines of your CSV file are really long, I cannot imagine that simply reading a file with 109447 lines can take 20 seconds. (Even less so if that is a join over several files.)

>     Exclusive Times
>     %Time ExclSec CumulS #Calls sec/call Csec/c  Name
>      66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
>     [...]
>      2.95   1.284  1.284 109447   0.     0.      IO::Handle::getline
>      2.29   0.997 46.596      1   0.9965 46.595  SQL::Statement::SELECT
>     [...]
>
> It seems that there is no bottleneck.

Most of the CPU time is spent in SQL::Statement::eval_where. Unless you can change your query, you probably can't make it much faster (except maybe by moving from CSV to a real RDBMS or maybe SQLite).

    hp

--
Peter J. Holzer | Sysadmin WSR / LUGA | [EMAIL PROTECTED] | http://www.hjp.at/
"If the code is old but the problem is new, then the code probably isn't the problem."
    -- Tim Bunce on dbi-users, 2004-11-05
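The "almost 20 seconds" estimate comes from subtracting the CPU time from the wall-clock time in the profile, and the eval_where row gives a per-row cost. A small Perl sketch of that arithmetic, using only the numbers quoted above; the variable names are illustrative, and treating all off-CPU time as I/O wait is the same assumption Peter makes (it could also be time spent waiting for other processes).

```perl
use strict;
use warnings;

# Numbers copied from the dprofpp output quoted above.
my $elapsed     = 63.19465;   # Total Elapsed Time (wall clock), seconds
my $user_sys    = 43.46465;   # User+System Time (CPU), seconds
my $where_excl  = 28.91;      # ExclSec of SQL::Statement::eval_where
my $where_calls = 109445;     # #Calls of SQL::Statement::eval_where

# Wall-clock time not spent on the CPU: I/O wait, or other processes running.
my $not_on_cpu = $elapsed - $user_sys;

# Average exclusive cost of one WHERE evaluation (one row), in microseconds.
my $usec_per_row = $where_excl / $where_calls * 1e6;

printf "time off CPU: %.2f s\n", $not_on_cpu;            # time off CPU: 19.73 s
printf "eval_where per row: %.0f us\n", $usec_per_row;   # eval_where per row: 264 us
```

Roughly a quarter of a millisecond of Perl-level work per row, times about 110,000 rows, accounts for the bulk of the CPU time; that is why changing the query or the engine matters more than the disk.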
Re: Re: DBD::CSV much slower on osX ?
From: Charles Plessy [EMAIL PROTECTED]
Date: 2005/02/16 Wed AM 04:58:36 CST

> Would that mean that running the script twice should dramatically speed up execution, because the file would then be cached in memory? (That does not work; I just tried.)
>
> Best,
> Charles

Keep in mind that it's not just your script that must be cached, if indeed that did work. There's also Perl, the modules, and your data (if in CSV form).

If you want to test whether it is just I/O, find out if your OS supports a RAM disk, copy all the files involved there, and run a test. If your run time is significantly reduced, it may indeed point to I/O issues...

HTH,
amonotod

--
amonotod@charter.net | sun|perl|windows | sysadmin|dba
Re: DBD::CSV much slower on osX ?
On Feb 16, 2005, at 2:58 AM, Charles Plessy wrote:
> On Thu, Feb 03, 2005 at 11:33:10AM -0800, Henri Asseily wrote:
> > Also note that DBD::CSV is significantly impacted by I/O speed. If your iMac G5 has a 4200 rpm drive and your Linux box has a 10k rpm one, that makes quite a large difference.
>
> Would that mean that running the script twice should dramatically speed up execution, because the file would then be cached in memory? (That does not work; I just tried.)

Looks like everything is normal from your dprof output; it just takes a while to access the file. Assuming your iMac G5 has enough available RAM to use as file buffers when you load the file, try the following command just before running the script:

    time cat /path/to/csv/file > /dev/null

This should load the file into memory buffers and tell you how long it took. Then run the same command again. It should be almost instantaneous, telling you that the file is in RAM. Only then run your Perl script and see it fly.

FYI, on my PowerBook a 'cat' to /dev/null of a 230 MB file takes 11 seconds. The next run takes 0.8 seconds.

H.
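The same cache check can be run from Perl, which keeps the measurement close to how the script itself reads the file. A rough sketch, assuming a hypothetical helper `time_full_read` and the core Time::HiRes module for sub-second timing: it reads the file twice and prints how long each pass took. With a cold cache the second pass should be much faster. If both passes are equally slow, the file is probably not staying in the cache (for example, because there is not enough free RAM).

```perl
use strict;
use warnings;
use File::Temp ();
use Time::HiRes ();

# Read a whole file and discard the data, like `cat file > /dev/null`.
# Returns (elapsed seconds, bytes read).
sub time_full_read {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my ($bytes, $buf) = (0, '');
    my $start = Time::HiRes::time();
    while (1) {
        my $n = read $fh, $buf, 1 << 20;    # 1 MB chunks
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;                    # end of file
        $bytes += $n;
    }
    close $fh;
    return (Time::HiRes::time() - $start, $bytes);
}

# Pass the CSV file on the command line. Without an argument, fall back to a
# small temporary file so the sketch still runs; freshly written data is
# usually already cached, so expect little difference between passes there.
my $tmp;    # kept at file scope so the temporary file is not deleted early
my $path = shift @ARGV;
unless (defined $path) {
    $tmp = File::Temp->new;
    print {$tmp} "SYMBOL1,SYMBOL2\n", ("ABC,DEF\n") x 1000;
    close $tmp;
    $path = $tmp->filename;
}

for my $pass (1, 2) {
    my ($secs, $bytes) = time_full_read($path);
    printf "pass %d: %d bytes in %.3f s\n", $pass, $bytes, $secs;
}
```

This only measures raw reading; it does not include the CSV parsing or the WHERE evaluation that dominate the profile, so a fast second pass here does not by itself mean the script will be fast.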
Re: DBD::CSV much slower on osX ?
[EMAIL PROTECTED] wrote:
>     Total Elapsed Time = 63.19465 Seconds
>       User+System Time = 43.46465 Seconds
>     Exclusive Times
>     %Time ExclSec CumulS #Calls sec/call Csec/c  Name
>      66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where

I'm the maintainer of DBD::CSV and while I don't have time this week to get into this, I am following the conversation and will certainly eventually revise the module if any useful information is discovered.

My only comment so far is that if, as the above shows, eval_where is what is taking the time, then I'd like to know what SQL query you are running. Are you doing full-text searches with LIKE? Or many comparisons? Do you use a prepare-once, execute-many approach? This may be a red herring, but I'd be curious.

--
Jeff
Re: DBD::CSV much slower on osX ?
On Wed, Feb 16, 2005 at 07:44:49PM +0900, [EMAIL PROTECTED] wrote:
> As somebody kindly pointed out to me that hardware differences can have a strong impact on the results, I will assume that this is the reason for the speed inconsistencies I have seen, unless somebody finds something insightful in the dprofpp output excerpt in this mail.

Peter had some insightful comments. And you've not addressed the issues I raised a few days ago:

: A couple of quick guesses:
: - perl config differences, e.g. configured for threads or not
: - perhaps on OS X you're using Unicode data

You could start by posting the perl -V output for the two perls.

Tim.
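Posting the full `perl -V` output is the simplest way to compare the two installations. As a hedged sketch, the Perl below prints only a subset tied to the two guesses, using the core Config module: thread support (`useithreads`, `usemultiplicity`) plus the locale variables `LANG`, `LC_ALL` and `LC_CTYPE`. Including the locale variables is an assumption about where Unicode handling might differ between the machines, and the selection of keys is illustrative rather than a diagnosis.

```perl
use strict;
use warnings;
use Config;

# Build settings relevant to the "threads or not" guess, plus the locale
# environment, which may affect how text data is handled (the Unicode guess).
# `perl -V` prints all of these and much more; this is only a quick subset.
my @config_keys = qw(version archname useithreads usemultiplicity optimize);
my @env_keys    = qw(LANG LC_ALL LC_CTYPE);

sub build_summary {
    my %summary;
    $summary{$_} = defined $Config{$_} ? $Config{$_} : '(undef)' for @config_keys;
    $summary{"ENV $_"} = defined $ENV{$_} ? $ENV{$_} : '(unset)' for @env_keys;
    return %summary;
}

# Run this with each perl (Linux and OS X) and compare the two listings.
my %summary = build_summary();
printf "%-16s %s\n", $_, $summary{$_} for sort keys %summary;
```

A `useithreads` value of `define` on one machine and `(undef)` on the other would confirm a threaded/unthreaded difference; it would not by itself explain a tenfold slowdown.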
DBD::CSV much slower on osX ?
Dear list,

I wrote a simple CGI script using DBD::CSV on a Linux computer, and then installed it on an iMac G5. Its execution time is now almost 10 times slower. Using print statements, I traced the bottleneck to the following statement:

    $sth->execute;

Now I am a bit stuck, as I do not know how to investigate inside the DBD::CSV module to find the slow instruction.

I installed the modules with cpan on OS X and with apt-get on Debian testing.

Can anyone help me identify the source of the problem?

Best,

--
Charles
Re: DBD::CSV much slower on osX ?
On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> Dear list,
>
> I wrote a simple CGI script using DBD::CSV on a Linux computer, and then installed it on an iMac G5. Its execution time is now almost 10 times slower. Using print statements, I traced the bottleneck to the following statement:
>
>     $sth->execute;
>
> Now I am a bit stuck, as I do not know how to investigate inside the DBD::CSV module to find the slow instruction.

The Devel::DProf module (and/or other code profiling modules) may help.

A couple of quick guesses:
- perl config differences, e.g. configured for threads or not
- perhaps on OS X you're using Unicode data

Tim.
Re: DBD::CSV much slower on osX ?
On Feb 3, 2005, at 2:18 AM, Tim Bunce wrote:
> On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> > Dear list,
> >
> > I wrote a simple CGI script using DBD::CSV on a Linux computer, and then installed it on an iMac G5. Its execution time is now almost 10 times slower. Using print statements, I traced the bottleneck to the following statement:
> >
> >     $sth->execute;
> >
> > Now I am a bit stuck, as I do not know how to investigate inside the DBD::CSV module to find the slow instruction.
>
> The Devel::DProf module (and/or other code profiling modules) may help.
>
> A couple of quick guesses:
> - perl config differences, e.g. configured for threads or not
> - perhaps on OS X you're using Unicode data
>
> Tim.

We need to know a little more about your config: Which OS X version? Which Perl version? I know that Perl is threaded by default on OS X. What type of Linux box are you benchmarking against?

Also note that DBD::CSV is significantly impacted by I/O speed. If your iMac G5 has a 4200 rpm drive and your Linux box has a 10k rpm one, that makes quite a large difference.

Send me the code and data off-list and I'll test it.