Jim-

You have been very helpful in doing this so far (especially for free!).
Please understand that I have been away until today.  I will work on getting
you a better picture of how much time is spent in each step, but I suspect
that our numbers are about the same.  It takes you ~4 seconds to create 17
graphs.  I am creating around 75 graphs and, in addition, saving them to the
hard drive, so the estimate of around 18 to 19 seconds for that portion
seems about right; perhaps there isn't any way to speed that aspect up
significantly.
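
For concreteness, a stripped-down version of the kind of loop I mean (stand-in data and filenames, not my actual code): each graph is written straight to a png() file device rather than drawn on a screen device first.

```r
## Stand-in sketch: write each graph directly to a png() file device.
## The data, filenames, and output directory are made up for illustration.
out.dir <- file.path(tempdir(), "graphs")
dir.create(out.dir, showWarnings = FALSE)
dat <- replicate(75, rnorm(50), simplify = FALSE)   # stand-in for the real data
for (i in seq_along(dat)) {
    png(file.path(out.dir, sprintf("graph%02d.png", i)))  # off-screen device
    plot(dat[[i]], type = "l", main = sprintf("Graph %d", i))
    dev.off()                                       # flush and close the file
}
```

Nothing here has to appear on screen, which may matter given that the graphs currently pop up in windows.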

As for the other 10 seconds, you may be right on the money again regarding
data transfer and type conversion.  I have already tried to optimize my SQL
queries to speed this part up, and it makes a difference, but obviously not
to the extent I am hoping for.
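
To see whether the conversion itself (rather than the transfer) is the bottleneck, one thing I can try is timing the character-to-numeric conversion in isolation, along these lines (the sizes and data are illustrative):

```r
## Time the character -> numeric conversion alone, separate from any
## database transfer; 720,000 values roughly matches the scale discussed.
n <- 720000
chars <- format(runif(n), digits = 15)              # stand-in character data
t_conv <- system.time(nums <- as.numeric(chars))["elapsed"]
cat("conversion took", t_conv, "seconds\n")
```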

To address your final suggestion:  In the big picture, I am looking to have
an application which can take raw data from a SQL database server, analyze
it with set algorithms, place the results of those algorithms back in the
database, AND use the same raw data to create graphs which are saved on a
hard drive.  Ideally, this whole process should take under 10 seconds.

Thanks for your time.

leo

P.S.  I am using only one instance of R - that actually sped up the process
quite a bit :-0



On 3/24/06, jim holtman <[EMAIL PROTECTED]> wrote:
>
> R may be 'interpreted', but the base functions are written in
> C++/FORTRAN.  If you are doing a lot of matrix operations (selection,
> computing, etc.), you are not going to run any faster, since you are
> already using optimized code.
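
(Jim's point is easy to see directly: an R-level loop versus the compiled sum() primitive over the same vector.  Exact timings will vary by machine.)

```r
## An R-level loop versus the compiled sum() primitive; both give the
## same answer, but the primitive runs in optimized C code.
x <- runif(1e6)
loop_sum <- function(v) { s <- 0; for (e in v) s <- s + e; s }
t_loop <- system.time(s1 <- loop_sum(x))["elapsed"]
t_vec  <- system.time(s2 <- sum(x))["elapsed"]
stopifnot(isTRUE(all.equal(s1, s2)))
```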
>
> There are routines for profiling R.  Do you know where in your code you are
> spending most of the time?  Here is an example of some output from my script
> that reads in some data and then generates about 16 plots.  This is done by
> putting some 'print' statements in the code to output the amount of CPU,
> elapsed time, and memory being used; I have added some comments to show
> what the progress is.  From this I know which portion of my program needs to
> be optimized:
>
> > source('C:/Perf/bin/Trace CPU by PID (POSIX).r')
> Read 720726 records
> read - my.stats : < 8.1 8.2 > 33.54 2.43 664.04  :  108.2 MB
> # this has read in 720,726 lines of data with 9 columns, and it took
> # 8.1 seconds of CPU time (first number in the <>); the CPU and elapsed
> # times (second number in the <>) are cumulative from the start.
> # This is mostly the conversion of character data to reals/integers;
> # you probably are not going to get faster unless your data is already
> # in binary.
>
> time conversion - my.stats : < 8.6 8.6 > 34.01 2.43 664.51  :  110.1 MB
> # this 0.5 seconds was the conversion of 720,726 character strings to
> # factors, for faster processing in later parts of the program
>
> badTimes - my.stats : < 12.4 12.5 > 37.81 2.48 668.39  :  143.5 MB
>  # this is converting the character string "mm/dd/yy hh:mm:ss" from
> # character to binary.  This took an additional 3.8 CPU seconds
> # (12.4 - 8.6).  For 720,726 character strings, I would say this is
> # pretty fast; compiled code is probably not going to be a lot faster.
>
> done; make approx functions - my.stats : < 13.5 13.6 > 38.84 2.51 669.45  :  112.2 MB
> # another 1.1 CPU seconds to make five passes through 720,726 data points,
> # cleaning up some more data
>
> start plots - my.stats : < 16.8 17.1 > 42.1 2.53 672.93  :  116.4 MB
> # the time to here was another 3.1 CPU seconds to partition the data by
> # 'command' (this is computer performance data) into 159 groups and sum
> # up the CPU time that the command in each group used.  Again, I would
> # guess compiled code would not be much faster.
> plot commands - my.stats : < 17.4 17.8 > 42.79 2.53 673.64  :  117.6 MB
> # it took 0.6 seconds to generate an 'area' plot of the top 20
> # commands, with all the rest grouped in 'other'
>
> indiv commands - my.stats : < 21.1 21.5 > 46.4 2.56 677.35  :  100.1 MB
> # took another 3.7 CPU seconds to create individual plots for the top 15
> # commands, as area graphs broken down by PID.
>
> done Trace - my.stats : < 23.8 24.3 > 49.15 2.57 680.18  :  124.9 MB
> # so we used 23.8 CPU seconds in 24.3 seconds of elapsed time
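
[The checkpoint lines above come from something of roughly this shape, I assume; this is a reconstruction, not Jim's actual my.stats, and gc() stands in for whatever he uses for the memory figure:]

```r
## Guess at a my.stats-style checkpoint: label, cumulative CPU and elapsed
## seconds since startup, and current memory use in MB.
my.stats <- function(label) {
    tm  <- proc.time()               # user, system, elapsed (cumulative)
    mem <- round(sum(gc()[, 2]), 1)  # MB in use across both heaps
    cat(label, "- my.stats : <", round(sum(tm[1:2]), 1),
        round(tm[3], 1), "> : ", mem, "MB\n")
}
my.stats("read")
```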
>
> So once you get something like this, you can see where the time is being
> spent.  Also, how many data points are we talking about?  I would guess
> that a lot of your time may be in the interface with the database, so can
> you provide some details like this to help us understand where your
> problem is?  For my scripts, I would guess I would not get much better
> performance out of compiled code, since I am probably spending most of my
> time in the 'base' (compiled) functions in R; the time is due to the
> amount of data I have to process.  What is the size of the data you are
> processing?  It is details like this that may help you reduce the time.
> You also have a startup cost of 1-2 seconds for R itself.  Can you just
> keep a copy of R running that you send scripts to?
>
> So I will put to you one of my favorite comments that I make to development
> organizations when reviewing their architecture:  "Tell me what you want to
> do, not how to do it".
>
> There may be other alternatives to consider once we all understand the
> problem.
>
>
>
>
> On 3/24/06, Leo Espindle <[EMAIL PROTECTED] > wrote:
> >
> > Thanks for the reply.
> >
> > Basically, it is taking too long to run through a series of R scripts
> > when using the DCOM interface (StatConnector) to the R environment from a
> > .Net application.  Right now, it takes about 30 seconds to finish the
> > routine, which includes firing up R using StatConnector, performing a series
> > of calculations using .R scripts residing on the hard drive, inserting
> > results into a SQL Server Database (using sqlSave), and then generating 70+
> > graphs using the default graphics device and saving those graphs to the hard
> > drive (also using .R scripts residing on the hard drive).
> >
> > Admittedly, it's running on a somewhat older CPU, and not as a service
> > (so the graphs actually appear on screen), but we are looking to cut this
> > time down to under 5 seconds if at all possible.
> >
> > I have not spent a lot of time optimizing the code, but I suspect the
> > problem lies more in the fact that we have to rely on the interpreted R
> > environment, and we have to read .R scripts (existing on the hard drive)
> > using StatConnector.
> >
> > Any suggestions as to how to speed up graphics generation and subsequent
> > writing of the graphs, for instance, would be helpful, as that seems to take
> > up the bulk of the time.
> >
> > I have not tried writing C++ programs that call the required R routines,
> > unless you are referring to the functionality in StatConnector.
> >
> > Leo
> >
> > On 3/24/06, jim holtman <[EMAIL PROTECTED] > wrote:
> > >
> > > What are your performance issues?  What are your performance targets?
> > > How far are you off the targets?  Have you optimized your R code?  How
> > > are you calling/using R?  What is your interface?  Have you tried to
> > > write C++ programs that call the required R routines?
> > >
> > > Please provide some more information.
> > >
> > >
> > >  On 3/24/06, Leo Espindle < [EMAIL PROTECTED]> wrote:
> > >
> > > > I am currently working on a project that involves using R and .Net.
> > > > We're having performance issues with R, and we're wondering if there
> > > > is a way to get around the R interpreter, particularly by compiling R
> > > > directly for the .Net CLR.  We're wondering if there are any
> > > > initiatives to build such a compiler.
> > >
> > > Thanks,
> > > Leo
> > >
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> > >
> > >
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 646 9390 (Cell)
> > > +1 513 247 0281 (Home)
> > >
> > > What is the problem you are trying to solve?
> > >
> >
> >
> >
>
>
>

