Re: DBD::CSV much slower on osX ?

2005-04-01 Thread Charles Plessy
On Wed, Feb 16, 2005 at 09:44:08AM -0800, Jeff Zucker wrote :
> [EMAIL PROTECTED] wrote:
>
> Total Elapsed Time = 63.19465 Seconds
>  User+System Time = 43.46465 Seconds
> Exclusive Times
> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
> 66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
>
> I'm the maintainer of DBD::CSV and while I don't have time this week to
> get into this, I am following the conversation and will certainly
> eventually revise the module if any useful information is discovered.

Sorry for wasting your time with my problems, but since I
changed the i386 computer, I cannot reproduce the speed difference
anymore. And I no longer have access to the computer where the
query was running faster, so I cannot double-check whether everything
is exactly the same as on the new one.

As I had better spend my time analysing some data with
my script rather than analysing the script itself, I eventually
switched to DBD::SQLite, which solved my problem by performing the
query in a few seconds.

The query was something like :

$query = "SELECT * FROM $db_table WHERE SYMBOL1 LIKE \'%${Search}%\' OR
SYMBOL2 LIKE \'%${Search}%\'";
my $sth = $dbh->prepare($query);
$sth->execute;

(actually CLIKE, before the transition)


You may wonder about the purpose of this mail, as there is no new
crucial information. The reason is that I do not like not knowing the
end of the story when I browse archived threads.

Anyway, many thanks to all who tried to help me.

-- 
Charles


Re: DBD::CSV much slower on osX ?

2005-04-01 Thread Tim Bunce
On Fri, Apr 01, 2005 at 03:09:12PM +0900, Charles Plessy wrote:
 
> You may wonder about the purpose of this mail, as there is no new
> crucial information. The reason is that I do not like not knowing the
> end of the story when I browse archived threads.

Thanks Charles.

Tim.


Re: DBD::CSV much slower on osX ?

2005-04-01 Thread Jeff Zucker
Charles Plessy wrote:
> As I had better spend my time analysing some data with
> my script rather than analysing the script itself, I eventually
> switched to DBD::SQLite, which solved my problem by performing the
> query in a few seconds.

There's no question, SQLite is faster than DBD::CSV for most things.  If
someone asks me for a recommendation for a database to use and they have large
and/or complex data, don't care about the format of the data, and speed is an
issue, I never recommend my own modules :-).  I hope you're using mod_perl or
something because connecting to the database is one area where SQLite is
slower.  It's also slower for inserts.  I'd also like to point out that
SQL::Statement, the underlying engine for DBD::CSV, is undergoing some major
changes and is becoming faster on each benchmark.  The purpose of the
pure-Perl DBDs like DBD::CSV is to provide access to human-readable data and
unconventional datasources, and to support platforms and contexts
where compilation is not an option, not to try to rival the speed of RDBMSs
written in C.

> The query was something like :
> $query = "SELECT * FROM $db_table WHERE SYMBOL1 LIKE \'%${Search}%\' OR  SYMBOL2 LIKE \'%${Search}%\'";
 

LIKE and CLIKE with wildcards are full text searches and are always 
going to be slow relative to other kinds of searches.
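
To illustrate why a pattern that starts with a wildcard has to be tested against every row, here is a minimal sketch in plain Perl. It is not how SQL::Statement implements eval_where; the helpers like_to_regex and scan_rows and the sample rows are made up for the illustration, and treating CLIKE as a case-insensitive match is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a SQL LIKE pattern into a Perl regex.
# '%' matches any run of characters, '_' matches exactly one.
sub like_to_regex {
    my ($pattern, $case_insensitive) = @_;
    my $re = join '', map {
          $_ eq '%' ? '.*'
        : $_ eq '_' ? '.'
        :             quotemeta($_)
    } split //, $pattern;
    return $case_insensitive ? qr/\A$re\z/is : qr/\A$re\z/s;
}

# Hypothetical helper: apply "SYMBOL1 LIKE ... OR SYMBOL2 LIKE ..." to
# every row. With a leading '%' there is nothing to narrow the search,
# so the number of rows tested always equals the size of the table.
sub scan_rows {
    my ($rows, $search) = @_;
    my $re     = like_to_regex("%$search%", 0);
    my $tested = 0;
    my @hits;
    for my $row (@$rows) {
        $tested++;
        push @hits, $row
            if $row->{SYMBOL1} =~ $re or $row->{SYMBOL2} =~ $re;
    }
    return ($tested, \@hits);
}

# Made-up sample data.
my @rows = (
    { SYMBOL1 => 'ABC1', SYMBOL2 => 'XYZ'  },
    { SYMBOL1 => 'DEF2', SYMBOL2 => 'GHI'  },
    { SYMBOL1 => 'JKL3', SYMBOL2 => 'XBCX' },
);

my ($tested, $hits) = scan_rows(\@rows, 'BC');
printf "tested %d rows, matched %d\n", $tested, scalar @$hits;
```

With 109445 rows, as in the profile earlier in the thread, the loop body runs 109445 times whatever the search string is.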

> You may wonder about the purpose of this mail, as there is no new
> crucial information. The reason is that I do not like not knowing the
> end of the story when I browse archived threads.
 

Thanks, I appreciate hearing back. Good luck!
--
Jeff


Re: DBD::CSV much slower on osX ?

2005-02-16 Thread charles-perl
On Thu, Feb 03, 2005 at 10:18:43AM +, Tim Bunce wrote :
> On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> > Dear list,
> >
> > I wrote a simple CGI script using DBD::CSV on a Linux
> > computer, and then installed it on an iMac G5. Its execution time is
> > now almost 10 times slower. Using print instructions, I traced the
> > bottleneck to the following instruction :
> >
> > $sth->execute;
> >
> > Now I am a bit stuck, as I do not know how to investigate in
> > the DBD::CSV module to find where the slow instruction is.
>
> The Devel::DProf module (and/or other code profiling modules) may help.
 

Thank you for pointing out this module. I have used it to analyse my script :

Total Elapsed Time = 63.19465 Seconds
  User+System Time = 43.46465 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
 11.1   4.848  6.132 109447   0.0000 0.0001  Text::CSV_XS::getline
 7.96   3.458  3.458 218890   0.0000 0.0000  SQL::Statement::get_row_value
 6.90   2.998  9.129 109447   0.0000 0.0001  DBD::CSV::Table::fetch_row
 5.80   2.520  7.545 109445   0.0000 0.0001  SQL::Statement::process_predicate
 4.36   1.894  1.894 109445   0.0000 0.0000  SQL::Statement::is_matched
 2.95   1.284  1.284 109447   0.0000 0.0000  IO::Handle::getline
 2.29   0.997 46.596      1   0.9965 46.595  SQL::Statement::SELECT
 0.16   0.069  0.076     27   0.0026 0.0028  CGI::_compile
 0.16   0.068  0.123      4   0.0170 0.0307  main::BEGIN
 0.09   0.040  0.079      4   0.0099 0.0198  DBI::SQL::Nano::BEGIN
 0.09   0.040  0.039      7   0.0057 0.0056  SQL::Statement::BEGIN
 0.07   0.029  0.164      1   0.0295 0.1645  DBI::install_driver
 0.02   0.010  0.010      1   0.0100 0.0100  Fcntl::bootstrap
 0.02   0.010  0.010      1   0.0100 0.0100  SQL::Parser::dialect


It seems that there is no bottleneck. I have run the script on
a 3 GHz Linux box, and it took 19 seconds to complete. So I do not
understand how I managed to run it in 7 seconds on a 1.8 GHz Athlon
laptop (which unfortunately I do not own anymore).

As somebody kindly pointed out to me that hardware differences
could have a strong impact on the results, I will suppose that this is
the reason why I have seen such speed inconsistencies, unless somebody
finds something insightful in the dprofpp output extract in this mail.

Thanks to those who answered me and offered their help.

-- 
Charles


Re: DBD::CSV much slower on osX ?

2005-02-16 Thread Charles Plessy
On Thu, Feb 03, 2005 at 11:33:10AM -0800, Henri Asseily wrote :

> Also note that DBD::CSV is significantly impacted by I/O speed. If your
> iMac G5 has a 4200 rpm drive and your Linux box has a 10k rpm one, that
> makes quite a large difference.


Would that mean that running the script twice should
dramatically accelerate the execution because the file would then be
cached in memory? (which does not work, I just tried).

Best,

-- 
Charles


Re: DBD::CSV much slower on osX ?

2005-02-16 Thread Peter J. Holzer
On 2005-02-16 19:44:49 +0900, [EMAIL PROTECTED] wrote:
> On Thu, Feb 03, 2005 at 10:18:43AM +, Tim Bunce wrote :
> > On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> > > Dear list,
> > >
> > > I wrote a simple CGI script using DBD::CSV on a Linux
> > > computer, and then installed it on an iMac G5. Its execution time is
> > > now almost 10 times slower.
[...]
> > The Devel::DProf module (and/or other code profiling modules) may help.
>
> Thank you for pointing out this module. I have used it to analyse my script :
>
> Total Elapsed Time = 63.19465 Seconds
>   User+System Time = 43.46465 Seconds

That's strange. Unless your computer is busy doing something else, it is
waiting almost 20 seconds for I/O. Unless the lines of your CSV file are
really long, I cannot imagine that simply reading a file with 109447
lines can take 20 seconds. (Even less if that is a join over several
files)
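
The arithmetic behind that remark can be checked with a few lines of Perl. This is only a sketch using the two header figures and the getline call count from the quoted dprofpp output; spreading the gap evenly over the lines read is an assumption made for illustration, not something the profiler measures.

```perl
use strict;
use warnings;

# Figures copied from the dprofpp output quoted in this thread.
my $elapsed = 63.19465;    # "Total Elapsed Time", wall-clock seconds
my $cpu     = 43.46465;    # "User+System Time", seconds on the CPU
my $lines   = 109447;      # calls to IO::Handle::getline

# Time the process spent neither in user nor in system mode.
my $not_on_cpu = $elapsed - $cpu;

printf "time not on the CPU: %.2f s (%.0f%% of elapsed)\n",
    $not_on_cpu, 100 * $not_on_cpu / $elapsed;

# Assumption for illustration: the whole gap is spread evenly over the reads.
printf "that is about %.0f microseconds per line read\n",
    1e6 * $not_on_cpu / $lines;
```

The gap comes to 19.73 seconds, which is the "almost 20 seconds" mentioned above.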

> Exclusive Times
> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
>  66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
[...]
>  2.95   1.284  1.284 109447   0.0000 0.0000  IO::Handle::getline
>  2.29   0.997 46.596      1   0.9965 46.595  SQL::Statement::SELECT
[...]
> It seems that there is no bottleneck.

Most of the CPU time is spent in SQL::Statement::eval_where. Unless you
can change your query, you probably can't make it much faster (except
maybe by moving from CSV to a real RDBMS or maybe SQLite).

hp

-- 
   _  | Peter J. Holzer  | If the code is old but the problem is new
|_|_) | Sysadmin WSR / LUGA  | then the code probably isn't the problem.
| |   | [EMAIL PROTECTED]|
__/   | http://www.hjp.at/   | -- Tim Bunce on dbi-users, 2004-11-05




Re: Re: DBD::CSV much slower on osX ?

2005-02-16 Thread amonotod
> From: Charles Plessy [EMAIL PROTECTED]
> Date: 2005/02/16 Wed AM 04:58:36 CST
>
> Would that mean that running the script twice should
> dramatically accelerate the execution because the file would then be
> cached in memory? (which does not work, I just tried).

Keep in mind that it's not just your script that must be cached, if indeed that 
did work.  There's also Perl, the modules, and your data (if in CSV form).

If you want to test to see if it is just I/O, find out if your O/S supports 
RAM-disk, copy all files involved there, and run a test.  If your run time is 
significantly reduced, it may indeed point to I/O issues...

> Best,
> Charles

HTH,
amonotod


--

`\|||/ amonotod@| sun|perl|windows
  (@@) charter.net  | sysadmin|dba
  ooO_(_)_Ooo
  _|_|_|_|_|_|_|_|



Re: DBD::CSV much slower on osX ?

2005-02-16 Thread Henri Asseily
On Feb 16, 2005, at 2:58 AM, Charles Plessy wrote:
> On Thu, Feb 03, 2005 at 11:33:10AM -0800, Henri Asseily wrote :
> > Also note that DBD::CSV is significantly impacted by I/O speed. If
> > your iMac G5 has a 4200 rpm drive and your Linux box has a 10k rpm
> > one, that makes quite a large difference.
>
> Would that mean that running the script twice should
> dramatically accelerate the execution because the file would then be
> cached in memory? (which does not work, I just tried).

Looks like everything is normal from your dprof output. It just takes a
while to access the file.

Assuming your iMac G5 has enough available RAM to use as file buffers
when you load the file, try the following command just before running
the script:

time cat /path/to/csv/file > /dev/null

This should load the file into memory buffers and tell you the time it
took.
Then run the same command again. It should be almost instantaneous,
telling you that the file is in RAM.
Only then, run your perl script and see it fly.

FYI on my powerbook, a 'cat' to /dev/null of a 230 meg file takes 11 
seconds. The next run takes 0.8 seconds.
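
The same cold-versus-warm comparison can be made from Perl itself. The following is a minimal sketch, not part of the original advice: the helper timed_read is made up, the default path is the same placeholder as in the command above, and the result depends on the operating system having enough free memory to keep the file cached.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: read a file to the end, discard the data,
# and return (seconds taken, bytes read).
sub timed_read {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    my $start = time;
    my $bytes = 0;
    my $buf;
    while (my $n = read($fh, $buf, 1 << 20)) {    # 1 MiB chunks
        $bytes += $n;
    }
    close $fh;
    return (time - $start, $bytes);
}

# Placeholder path; pass the real CSV file as the first argument.
my $path = shift(@ARGV) // '/path/to/csv/file';

if (-r $path) {
    # The first run may hit the disk, the second should come from the cache.
    for my $run (1, 2) {
        my ($secs, $bytes) = timed_read($path);
        printf "run %d: %d bytes in %.3f s\n", $run, $bytes, $secs;
    }
}
else {
    warn "skipping: $path is not readable\n";
}
```

If the second run is not much faster than the first, the file was probably cached already, and disk speed is unlikely to explain the slow query.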

H.


Re: DBD::CSV much slower on osX ?

2005-02-16 Thread Jeff Zucker
[EMAIL PROTECTED] wrote:
> Total Elapsed Time = 63.19465 Seconds
>  User+System Time = 43.46465 Seconds
> Exclusive Times
> %Time ExclSec CumulS #Calls sec/call Csec/c  Name
> 66.5   28.91 36.463 109445   0.0003 0.0003  SQL::Statement::eval_where
 

I'm the maintainer of DBD::CSV and while I don't have time this week to 
get into this, I am following the conversation and will certainly 
eventually revise the module if any useful information is discovered.  
My only comment so far is that if, as the above shows, the eval_where is 
what is taking the time, then I'd like to know what SQL query you are 
running.  Are you doing full text searches with LIKE? or many 
comparisons?  Do you use a prepare-once, execute many approach?  This 
may be a red herring, but I'd be curious.

--
Jeff


Re: DBD::CSV much slower on osX ?

2005-02-16 Thread Tim Bunce
On Wed, Feb 16, 2005 at 07:44:49PM +0900, [EMAIL PROTECTED] wrote:
 
> As somebody kindly pointed out to me that hardware differences
> could have a strong impact on the results, I will suppose that this is
> the reason why I have seen such speed inconsistencies, unless somebody
> finds something insightful in the dprofpp output extract in this mail.

Peter had some insightful comments.

And you've not addressed the issues I raised a few days ago:

: A couple of quick guesses:
:   - perl config differences - eg configured for threads or not
:   - perhaps on OS X you're using unicode data

You could start by posting the perl -V output for the two perls.
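
For a quick side-by-side comparison, the few settings relevant to the two guesses can also be printed with the core Config module. This is only a sketch: the choice of keys and environment variables is an assumption about what matters here, and it does not replace the full perl -V output.

```perl
use strict;
use warnings;
use Config;

# %Config exposes the build-time settings that `perl -V` reports.
my @keys = qw(version archname osname usethreads useithreads useperlio);
my %report = map { $_ => $Config{$_} // 'undef' } @keys;

printf "%-12s %s\n", $_, $report{$_} for @keys;

# Locale and Unicode-related environment settings can differ between hosts.
for my $var (qw(LANG LC_ALL LC_CTYPE PERL_UNICODE)) {
    printf "%-12s %s\n", $var, $ENV{$var} // '(not set)';
}
```

Running it with each of the two perls and comparing the output shows at a glance whether one is built with threads and the other is not.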

Tim.


DBD::CSV much slower on osX ?

2005-02-03 Thread charles-perl
Dear list,

I wrote a simple CGI script using DBD::CSV on a Linux
computer, and then installed it on an iMac G5. Its execution time is
now almost 10 times slower. Using print instructions, I traced the
bottleneck to the following instruction :

$sth->execute;

Now I am a bit stuck, as I do not know how to investigate in
the DBD::CSV module to find where the slow instruction is.

I installed the modules by cpan on osX and by apt-get on
debian testing.

Can anyone help me to identify the source of the problem ?

Best,

-- 
Charles


Re: DBD::CSV much slower on osX ?

2005-02-03 Thread Tim Bunce
On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> Dear list,
>
> I wrote a simple CGI script using DBD::CSV on a Linux
> computer, and then installed it on an iMac G5. Its execution time is
> now almost 10 times slower. Using print instructions, I traced the
> bottleneck to the following instruction :
>
> $sth->execute;
>
> Now I am a bit stuck, as I do not know how to investigate in
> the DBD::CSV module to find where the slow instruction is.

The Devel::DProf module (and/or other code profiling modules) may help.

A couple of quick guesses:
  - perl config differences - eg configured for threads or not
  - perhaps on OS X you're using unicode data

Tim.


Re: DBD::CSV much slower on osX ?

2005-02-03 Thread Henri Asseily
On Feb 3, 2005, at 2:18 AM, Tim Bunce wrote:
> On Thu, Feb 03, 2005 at 06:27:05PM +0900, [EMAIL PROTECTED] wrote:
> > Dear list,
> >
> > I wrote a simple CGI script using DBD::CSV on a Linux
> > computer, and then installed it on an iMac G5. Its execution time is
> > now almost 10 times slower. Using print instructions, I traced the
> > bottleneck to the following instruction :
> >
> > $sth->execute;
> >
> > Now I am a bit stuck, as I do not know how to investigate in
> > the DBD::CSV module to find where the slow instruction is.
>
> The Devel::DProf module (and/or other code profiling modules) may help.
>
> A couple of quick guesses:
>   - perl config differences - eg configured for threads or not
>   - perhaps on OS X you're using unicode data
>
> Tim.

We need to know a little more about your config:
OS X version?
What Perl version?
I know that Perl by default is threaded on OS X.
What type of Linux box are you benchmarking against?

Also note that DBD::CSV is significantly impacted by I/O speed. If your
iMac G5 has a 4200 rpm drive and your Linux box has a 10k rpm one, that
makes quite a large difference.

Send me the code and data off-list and I'll test it.