Paul Meagher wrote:
Below is the test I ran a while back on invoking R as a system call.  It
might be faster if you had a C extension to R, but before I went that route I
would want to know 1) roughly how fast Python and Perl are in returning
results with their C bindings/embedded stuff/DCOM stuff, 2) whether R can be
run as a daemon process so you don't incur startup costs, and 3) whether R
can act as a math server in the sense that it will fork children or threads
as multiple users establish sessions with it.  I agree it would be nice to
have a better interface to R than via a system call.


I'm doing something similar with Postgres, R, and PHP, using PL/R (an R procedural language handler for Postgres that I wrote). In Postgres 7.4 (currently at beta3), or with a back-patched copy of 7.3, you can preload the R interpreter when the Postgres postmaster first starts, so R is essentially running as part of the Postgres daemon. Whenever a connection is made to the database, the forked backend process already has an initialized copy of R inside it. The startup savings I see are similar to what you found (about 2.2 seconds versus about 0.01 seconds for the first call):


------------------------------------------------------------------
Function -- intentionally very simple:
--------------------------------------
create or replace function echo(text) returns text as 'print(arg1)' language 'plr';


Without preloading (first function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
 Total runtime: 2195.35 msec

Without preloading (second function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
 Total runtime: 0.55 msec

With preloading (first function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
 Total runtime: 9.74 msec

With preloading (second function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
 Total runtime: 0.59 msec
------------------------------------------------------------------


In both cases the second (and subsequent) function calls are even faster because the PL/R function itself has been precompiled and cached.
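
For reference, preloading is switched on in postgresql.conf. Here is a minimal sketch for 7.4, assuming PL/R is installed as plr in the standard library directory and that plr_init is its initialization function (check both against your own build):

```conf
# postgresql.conf (Postgres 7.4)
# Assumption: library name and init function match a default PL/R install.
preload_libraries = '$libdir/plr:plr_init'
```

The postmaster has to be restarted for the setting to take effect, since the library is loaded once at server start.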


I call the PL/R function from PHP to read my data directly from the database, process it, and generate whatever charts I need. Here's a very simple example:


The PL/R function:
------------------------------------------------------------------
create type histtup as ( break float8, count int );

create or replace function hist(text, text)
returns setof histtup as '
  # arg1: attribute id to plot; arg2: output JPEG path, or NA for no chart
  sql <- paste("select id_val from sample_numeric_data ",
               "where ia_id=''", arg1, "''", sep="")
  rs <- pg.spi.exec(sql)

  if (!is.na(arg2)) {
    x11(display=":5")
    jpeg(file=arg2, width = 480, height = 480,
         pointsize = 12, quality = 75)
    par(ask = FALSE, bg = "#F8F8F8")
    sql <- paste("select ia_attname as val from atts ",
                 "where ia_id=''", arg1, "''", sep="")
    attname <- pg.spi.exec(sql)
    h <- hist(rs[,1], col = "blue",
              main = paste("Histogram of", attname$val),
              xlab = attname$val)
    dev.off()
    system(paste("chmod 666 ", arg2, sep=""),
           intern = FALSE, ignore.stderr = TRUE)
  } else {
    h <- hist(rs[,1], plot = FALSE)
  }

  # drop the last break so that breaks and counts have the same length
  result <- data.frame(breaks = h$breaks[-length(h$breaks)],
                       count = h$counts)

  return(result)
' language 'plr';
------------------------------------------------------------------
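
To try the function from psql before wiring it into a page, a call along these lines should work. The attribute id 'some_attribute' is a placeholder made up for illustration. Substitute an ia_id that exists in your own atts table and an output path the Postgres server can write to:

```sql
-- hypothetical attribute id; the path matches the one the PHP page below builds
select * from hist('some_attribute', '/tmp/charts/hist1.jpg');
```

Each returned row is one histogram bin: the lower break of the bin and the count of values in it. The JPEG is written as a side effect of the call.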

The PHP page:
------------------------------------------------------------------
<HTML><BODY>
<?PHP
echo "
<FORM ACTION='{$_SERVER['PHP_SELF']}' METHOD='post' NAME='proto_form'>
<TABLE WIDTH='482' CELLSPACING='0' CELLPADDING='1' BORDER='0'>
  <TR>
    <TD>Data</TD>
    <TD><INPUT TYPE='text' NAME='userdata' value='' size='80'></TD>
  </TR>
  <TR>
    <TD colspan='2'>
      <INPUT TYPE='submit' NAME='submit' value='Submit'>
    </TD>
  </TR>
</TABLE>
</FORM>
";

if (isset($_POST['submit']) && $_POST['submit'] == "Submit")
{
  $tmpfilename = 'charts/hist1.jpg';
  $conn = pg_connect("dbname=oscon user=postgres");
  // pass the user input as a query parameter instead of pasting it into the SQL
  $sql = 'select * from hist($1, $2)';
  $rs = pg_query_params($conn, $sql,
                        array($_POST['userdata'], "/tmp/" . $tmpfilename));
  echo "<img src='$tmpfilename' border=0>";
}
?>
</BODY></HTML>
------------------------------------------------------------------


Hopefully this gives you some ideas about what is possible. If you're interested in PL/R, you can grab a copy (along with a patched 7.3.4 source RPM for Postgres) here: http://www.joeconway.com/


HTH,

Joe

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
