Re: [Rd] CHAR () and Rmpi

2007-09-28 Thread Prof Brian Ripley
I'm not sure what your sticking point is here.  If MPI does not modify 
the data behind a (char *) pointer, then that really is a (const char *) 
pointer, and the headers are being unhelpful in not telling the compiler 
that the data are constant.

If that is the case, you need to use casts to (char *), and the following 
private define may be useful to you:

#define CHAR_RW(x) ((char *) CHAR(x))


However, you ask

> Is there an easy way to get a char pointer to STRING_ELT((sexp_rdata),0) 
> that is also backward compatible with old R versions?

and the answer is that there is no such way: (const char *) and 
(char *) are not the same thing, and any package that wants to alter the 
contents of a string element needs to create a new CHARSXP to be that 
element.


BTW, you still have not changed Rmpi to fix the configure problems on 
64-bit systems (including the assumption that libs are in /usr/lib rather 
than /usr/lib64) that I pointed out a long time ago.


On Fri, 28 Sep 2007, Hao Yu wrote:

> Hi. I am the maintainer of the Rmpi package. I now have a problem regarding
> the change to CHAR() in R 2.6.0. According to the R 2.6.0 NEWS:
> ***
> CHAR() now returns (const char *) since CHARSXPs should no
>longer be modified in place.  This change allows compilers to
>warn or error about improper modification.  Thanks to Herve
>Pages for the suggestion.
> ***
> Unfortunately this causes Rmpi to fail, since MPI requires char pointers
> rather than const char pointers. Normally I use
>CHAR(STRING_ELT((sexp_rdata),0))
> to get the pointer, for MPI, to where an R character vector (in the C sense) is stored.
> Because of the change, all character messages fail. Is there an easy way
> to get a char pointer to STRING_ELT((sexp_rdata),0) that is also backward
> compatible with old R versions? BTW, Rmpi does not do any modification of
> characters at the C level.
>
> Thanks
> Hao Yu
>
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK          Fax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] CHAR () and Rmpi

2007-09-28 Thread Hao Yu
Hi. I am the maintainer of the Rmpi package. I now have a problem regarding
the change to CHAR() in R 2.6.0. According to the R 2.6.0 NEWS:
***
CHAR() now returns (const char *) since CHARSXPs should no
longer be modified in place.  This change allows compilers to
warn or error about improper modification.  Thanks to Herve
Pages for the suggestion.
***
Unfortunately this causes Rmpi to fail, since MPI requires char pointers
rather than const char pointers. Normally I use
CHAR(STRING_ELT((sexp_rdata),0))
to get the pointer, for MPI, to where an R character vector (in the C sense) is stored.
Because of the change, all character messages fail. Is there an easy way
to get a char pointer to STRING_ELT((sexp_rdata),0) that is also backward
compatible with old R versions? BTW, Rmpi does not do any modification of
characters at the C level.

Thanks
Hao Yu

-- 
Department of Statistics & Actuarial Sciences
Fax Phone#:(519)-661-3813
The University of Western Ontario
Office Phone#:(519)-661-3622
London, Ontario N6A 5B7
http://www.stats.uwo.ca/faculty/yu



Re: [Rd] R-Server remotecontrolled via browser-GUI

2007-09-28 Thread Jeffrey Horner
idontwant googeltospyafterme wrote on 09/28/2007 09:33 AM:
> dear r-community,
> 
> currently i have a workflow as following:
> data in online-MYSQL-database via .csv into R for analysis, results
> back to mysql.
> generating graphics for reports also.
> 
> that's ok, but i need some process-optimization.
> 
> i'd like to run r directly on the webserver. analysis should work
> automatically, after the appropriate r-script was selected in a
> browser-gui for R.
> 
> until now, after collecting some information, i think the best
> approach to reach that goal is:
> * using mod_R / RApache to run multiple instances of r on the server,
> * build a website that will serve as frontend/gui for r either with
> html/php or some ajax-framework,
> * connect R to the database with one of the available db-packages to
> fetch the survey-data
> * put the r-scripts for analysis somewhere on the server
> * use cairo for generation of the images
> * and see what happens...
> 
> i would like to know, if my construction seems good to you, if you
> have other recommendations or constructive criticism and what you think
> about the effort for configuring mod_R/RAPACHE, cairo and the
> db-package for r.

Hi Josuah,

I'm planning on releasing the 1.0 version of rapache sometime after the 
release of R 2.6.0. Work's been really busy around here, so I haven't had 
time to finish up the last of the documentation and packaging.

After that, I hope to have time to provide recommendations on how to use 
rapache in various situations. Expect to see progress here:

http://biostat.mc.vanderbilt.edu/rapache

near the middle of October.

Jeff
-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner



Re: [Rd] R-Server remotecontrolled via browser-GUI

2007-09-28 Thread Bos, Roger
In response to Martin's suggestion, if you DID want a fairly simple way
to provide light-weight web access to R, Rpad would be a good choice.  I
actually run SQL Server 2005, Apache, and R/Rpad on the same server, but
the usage is very light in terms of number of users.  It was pretty
quick to set up and making a new Rpad page is really easy.

HTH,

Roger
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin Morgan
Sent: Friday, September 28, 2007 11:28 AM
To: idontwant googeltospyafterme
Cc: r-devel@r-project.org
Subject: Re: [Rd] R-Server remotecontrolled via browser-GUI

Hi Josuah

RWebServices might be a different approach. It allows R functions and
methods to be deployed as SOAP-based web services, and relies on an
underlying Java architecture that uses JMS to separate the web services
front end from a collection of R workers (on different machines). This
seems more scalable than putting the R workers on the same machine as
the Apache server.

Entry point here:

http://bioconductor.org/packages/2.0/bioc/html/RWebServices.html

There is quite a learning curve and technology commitment associated
with this, so not the best solution for providing light-weight web
access to R.

Martin

"idontwant googeltospyafterme" <[EMAIL PROTECTED]>
writes:

> dear r-community,
>
> currently i have a workflow as following:
> data in online-MYSQL-database via .csv into R for analysis, results 
> back to mysql.
> generating graphics for reports also.
>
> that's ok, but i need some process-optimization.
>
> i'd like to run r directly on the webserver. analysis should work 
> automatically, after the appropriate r-script was selected in a 
> browser-gui for R.
>
> until now, after collecting some information, i think the best 
> approach to reach that goal is:
> * using mod_R / RApache to run multiple instances of r on the server,
> * build a website that will serve as frontend/gui for r either with 
> html/php or some ajax-framework,
> * connect R to the database with one of the available db-packages to 
> fetch the survey-data
> * put the r-scripts for analysis somewhere on the server
> * use cairo for generation of the images
> * and see what happens...
>
> i would like to know, if my construction seems good to you, if you 
> have other recommendations or constructive criticism and what you think 
> about the effort for configuring mod_R/RAPACHE, cairo and the 
> db-package for r.
>
> thanks a lot in advance for your help!
>
> cheers,
>
> josuah r.
>

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org




Re: [Rd] modifying large R objects in place

2007-09-28 Thread Peter Dalgaard
Duncan Murdoch wrote:
> On 9/28/2007 7:45 AM, Petr Savicky wrote:
>   
>> On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
>> 
>   ...
>   
>>> Longer-term, I still have some hope for better reference counting, but 
>>> the semantics of environments make it really ugly -- an environment can 
>>> contain an object that contains the environment, a simple example being 
>>>
>>> f <- function()
>>>g <- function() 0
>>> f()
>>>
>>> At the end of f(), we should decide whether to destroy f's evaluation 
>>> environment. In the present example, what we need to be able to see is 
>>> that this would remove all references to g and that the reference from g 
>>> to f can therefore be ignored.  Complete logic for sorting this out is 
>>> basically equivalent to a new garbage collector, and one can suspect 
>>> that applying the logic upon every function return is going to be 
>>> terribly inefficient. However, partial heuristics might apply.
>>>   
>> I have to say that I do not quite understand the example.
>> What is the input and output of f? Is g only defined inside f, or
>> also used?
>> 
>
> f has no input; its output is the function g, whose environment is the 
> evaluation environment of f.  g is never used, but it is returned as the 
> value of f.  Thus we have the loop:
>
> g refers to the environment.
> the environment contains g.
>
> Even though the result of f() was never saved, two things (the 
> environment and g) got created and each would have non-zero reference 
> count.
>
> In a more complicated situation you might want to save the result of the 
> function and then modify it.  But because of the loop above, you would 
> always think there's another reference to the object, so every in-place 
> modification would require a copy first.
>
>   
Thanks Duncan. It was way past my bedtime when I wrote that...

I had actually missed the point about the return value,  but the point 
remains even if you let f return something other than g: You get a 
situation where the two objects both have a refcount of 1, so by 
standard refcounting semantics neither can be removed even though 
neither object is reachable.

Put differently, standard refcounting assumes that references between 
objects of the language form a directed acyclic graph, but when 
environments are involved, there can be cycles in R-like languages.

-p

> Duncan Murdoch
>


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [Rd] modifying large R objects in place

2007-09-28 Thread Prof Brian Ripley
On Fri, 28 Sep 2007, Luke Tierney wrote:

> On Fri, 28 Sep 2007, Petr Savicky wrote:

[...]

>> This leads me to a question. Some of the tests, which I did, suggest
>> that gc() may not free all the memory, even if I remove all data
>> objects by rm() before calling gc(). Is this possible or I must have
>> missed something?

> Not impossible, but very unlikely given the use gc gets. There are a
> few internal tables that are grown but not shrunk at the moment, but
> that should not usually cause much total growth.  If you are looking at
> system memory use then that is a malloc issue -- there was a thread
> about this a month or so ago.

A likely explanation is lazy-loading.  Almost all the package code is 
stored externally until used: 2.6.0 is better at not bringing in unused 
code.  E.g. (2.6.0, 64-bit system)

> gc()
  used (Mb) gc trigger (Mb) max used (Mb)
Ncells 141320  7.6 35 18.7   35 18.7
Vcells 130043  1.0 786432  6.0   561893  4.3
> for(s in search()) lsf.str(s)
> gc()
  used (Mb) gc trigger (Mb) max used (Mb)
Ncells 424383 22.7 531268 28.4   437511 23.4
Vcells 228005  1.8 786432  6.0   700955  5.4

'if I remove all data objects by rm()' presumably means clearing the 
user workspace: there are lots of other environments containing objects 
('data' or otherwise), many of which are needed to run R.

Otherwise, the footer to every R-help message applies.

>> A possible solution to the unwanted increase of NAMED due to temporary
>> calculations could be to give the user the possibility
>> to store the NAMED attribute of an object before a call to a function
>> and restore it after the call. To use this, the user should be
>> confident that no new reference to the object persists after the
>> function is completed.
>
> This would be too dangerous for general use. Some more structured
> approach may be possible. A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success but I'm moderately hopeful.

I am not persuaded that the difference between NAMED=1/2 makes much 
difference in general use of R, and I recall Ross saying that he no longer 
believed that this was a worthwhile optimization.  It's not just 
'user-defined' replacement functions, but also all the system-defined 
closures (including all methods for the generic replacement functions 
which are primitive) that are unable to benefit from it.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK          Fax:  +44 1865 272595



Re: [Rd] R-Server remotecontrolled via browser-GUI

2007-09-28 Thread Martin Morgan
Hi Josuah

RWebServices might be a different approach. It allows R functions and
methods to be deployed as SOAP-based web services, and relies on an
underlying Java architecture that uses JMS to separate the web services
front end from a collection of R workers (on different machines). This
seems more scalable than putting the R workers on the same machine as
the Apache server.

Entry point here:

http://bioconductor.org/packages/2.0/bioc/html/RWebServices.html

There is quite a learning curve and technology commitment associated
with this, so not the best solution for providing light-weight
web access to R.

Martin

"idontwant googeltospyafterme" <[EMAIL PROTECTED]> writes:

> dear r-community,
>
> currently i have a workflow as following:
> data in online-MYSQL-database via .csv into R for analysis, results
> back to mysql.
> generating graphics for reports also.
>
> that's ok, but i need some process-optimization.
>
> i'd like to run r directly on the webserver. analysis should work
> automatically, after the appropriate r-script was selected in a
> browser-gui for R.
>
> until now, after collecting some information, i think the best
> approach to reach that goal is:
> * using mod_R / RApache to run multiple instances of r on the server,
> * build a website that will serve as frontend/gui for r either with
> html/php or some ajax-framework,
> * connect R to the database with one of the available db-packages to
> fetch the survey-data
> * put the r-scripts for analysis somewhere on the server
> * use cairo for generation of the images
> * and see what happens...
>
> i would like to know, if my construction seems good to you, if you
> have other recommendations or constructive criticism and what you think
> about the effort for configuring mod_R/RAPACHE, cairo and the
> db-package for r.
>
> thanks a lot in advance for your help!
>
> cheers,
>
> josuah r.
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org



Re: [Rd] R-Server remotecontrolled via browser-GUI

2007-09-28 Thread Greg Snow
Have you read section 4 of the FAQ?  If not, that would be a good place
to start.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of idontwant 
> googeltospyafterme
> Sent: Friday, September 28, 2007 8:33 AM
> To: r-devel@r-project.org
> Subject: [Rd] R-Server remotecontrolled via browser-GUI
> 
> dear r-community,
> 
> currently i have a workflow as following:
> data in online-MYSQL-database via .csv into R for analysis, 
> results back to mysql.
> generating graphics for reports also.
> 
> that's ok, but i need some process-optimization.
> 
> i'd like to run r directly on the webserver. analysis should 
> work automatically, after the appropriate r-script was 
> selected in a browser-gui for R.
> 
> until now, after collecting some information, i think the 
> best approach to reach that goal is:
> * using mod_R / RApache to run multiple instances of r on the server,
> * build a website that will serve as frontend/gui for r 
> either with html/php or some ajax-framework,
> * connect R to the database with one of the available 
> db-packages to fetch the survey-data
> * put the r-scripts for analysis somewhere on the server
> * use cairo for generation of the images
> * and see what happens...
> 
> i would like to know, if my construction seems good to you, 
> if you have other recommendations or constructive criticism and 
> what you think about the effort for configuring 
> mod_R/RAPACHE, cairo and the db-package for r.
> 
> thanks a lot in advance for your help!
> 
> cheers,
> 
> josuah r.
> 



[Rd] R-Server remotecontrolled via browser-GUI

2007-09-28 Thread idontwant googeltospyafterme
dear r-community,

currently i have a workflow as following:
data in online-MYSQL-database via .csv into R for analysis, results
back to mysql.
generating graphics for reports also.

that's ok, but i need some process-optimization.

i'd like to run r directly on the webserver. analysis should work
automatically, after the appropriate r-script was selected in a
browser-gui for R.

until now, after collecting some information, i think the best
approach to reach that goal is:
* using mod_R / RApache to run multiple instances of r on the server,
* build a website that will serve as frontend/gui for r either with
html/php or some ajax-framework,
* connect R to the database with one of the available db-packages to
fetch the survey-data
* put the r-scripts for analysis somewhere on the server
* use cairo for generation of the images
* and see what happens...

i would like to know, if my construction seems good to you, if you
have other recommendations or constructive criticism and what you think
about the effort for configuring mod_R/RAPACHE, cairo and the
db-package for r.

thanks a lot in advance for your help!

cheers,

josuah r.



[Rd] this is just a test...

2007-09-28 Thread idontwant googeltospyafterme
...because my messages get rejected due to a filter-rule match every time...
sorry for any inconvenience.
josuah r.



Re: [Rd] modifying large R objects in place

2007-09-28 Thread Duncan Murdoch
On 9/28/2007 7:45 AM, Petr Savicky wrote:
> On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
  ...
>> Longer-term, I still have some hope for better reference counting, but 
>> the semantics of environments make it really ugly -- an environment can 
>> contain an object that contains the environment, a simple example being 
>> 
>> f <- function()
>>g <- function() 0
>> f()
>> 
>> At the end of f(), we should decide whether to destroy f's evaluation 
>> environment. In the present example, what we need to be able to see is 
>> that this would remove all references to g and that the reference from g 
>> to f can therefore be ignored.  Complete logic for sorting this out is 
>> basically equivalent to a new garbage collector, and one can suspect 
>> that applying the logic upon every function return is going to be 
>> terribly inefficient. However, partial heuristics might apply.
> 
> I have to say that I do not quite understand the example.
> What is the input and output of f? Is g only defined inside f, or
> also used?

f has no input; its output is the function g, whose environment is the 
evaluation environment of f.  g is never used, but it is returned as the 
value of f.  Thus we have the loop:

g refers to the environment.
the environment contains g.

Even though the result of f() was never saved, two things (the 
environment and g) got created and each would have non-zero reference 
count.

In a more complicated situation you might want to save the result of the 
function and then modify it.  But because of the loop above, you would 
always think there's another reference to the object, so every in-place 
modification would require a copy first.

Duncan Murdoch



Re: [Rd] modifying large R objects in place

2007-09-28 Thread Luke Tierney
On Fri, 28 Sep 2007, Petr Savicky wrote:

> On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
> [...]
>>> nrow <- function(...) dim(...)[1]
>>> ncol <- function(...) dim(...)[2]
>>>
>>> At least in my environment, the new versions preserved NAMED == 1.

I believe this is a bug in the evaluation of ... arguments.  The
intent in the code, I believe, is that all promise evaluations result in
NAMED==2 for safety.  That may be overly conservative, but I would not
want to change it without some very careful thought -- I prefer to
wait a little longer for the right answer than to get a wrong one
quickly.

>>>
>> Yes, but changing the formal arguments is a bit messy, is it not?
>
> Specifically for nrow, ncol, I think not much, since almost nobody needs
> to know (or even knows) that the name of the formal argument is "x".
>
> However, there is another argument against the ... solution: it solves
> the problem only in the simplest cases like nrow, ncol, but is not
> usable in others, like colSums, rowSums. These functions also increase
> NAMED of their arguments, although their output does not contain any
> reference to the original content of their arguments.
>
> I think that a systematic solution of this problem may be helpful.
> However, making these functions Internal or Primitive would
> not be good in my opinion. It is advantageous that these functions
> contain an R level part, which
> makes the basic decisions before a call to .Internal.
> If nothing else, this serves as a sort of documentation.
>
> For my purposes, I replaced calls to "colSums" and "matrix" by the
> corresponding calls to .Internal in my script. The result is that
> now I can complete several runs of my calculation in a cycle instead
> of restarting R after each of the runs.
>
> This leads me to a question. Some of the tests, which I did, suggest
> that gc() may not free all the memory, even if I remove all data
> objects by rm() before calling gc(). Is this possible or I must have
> missed something?

Not impossible, but very unlikely given the use gc gets. There are a
few internal tables that are grown but not shrunk at the moment, but
that should not usually cause much total growth.  If you are looking at
system memory use then that is a malloc issue -- there was a thread
about this a month or so ago.

> A possible solution to the unwanted increase of NAMED due to temporary
> calculations could be to give the user the possibility
> to store the NAMED attribute of an object before a call to a function
> and restore it after the call. To use this, the user should be
> confident that no new reference to the object persists after the
> function is completed.

This would be too dangerous for general use. Some more structured
approach may be possible. A related issue is that user-defined
assignment functions always see a NAMED of 2 and hence cannot modify
in place. We've been trying to come up with a reasonable solution to
this, so far without success but I'm moderately hopeful.

>> Presumably, nrow <- function(x) eval.parent(substitute(dim(x)[1])) works
>> too, but if the gain is important enough to warrant that sort of
>> programming, you might as well make nrow a .Primitive.
>
> You are right. This indeed works.
>
>> Longer-term, I still have some hope for better reference counting, but
>> the semantics of environments make it really ugly -- an environment can
>> contain an object that contains the environment, a simple example being
>>
>> f <- function()
>>g <- function() 0
>> f()
>>
>> At the end of f(), we should decide whether to destroy f's evaluation
>> environment. In the present example, what we need to be able to see is
>> that this would remove all references to g and that the reference from g
>> to f can therefore be ignored.  Complete logic for sorting this out is
>> basically equivalent to a new garbage collector, and one can suspect
>> that applying the logic upon every function return is going to be
>> terribly inefficient. However, partial heuristics might apply.
>
> I have to say that I do not quite understand the example.
> What is the input and output of f? Is g only defined inside f, or
> also used?
>
> Let me ask the following question. I assume that gc() scans the whole
> memory and determines for each part of data, whether a reference
> to it still exists or not. In my understanding, this is equivalent to
> determining whether its NAMED may be dropped to zero or not.
> Structures for which this succeeds are then removed. Am I right?
> If yes, is it possible during gc() to determine also cases,
> when NAMED may be dropped from 2 to 1? How much would this increase
> the complexity of gc()?

Probably not impossible, but it would be a fair bit of work for probably
not much gain: the NAMED values would still be high until the next
gc of the appropriate level, which will probably be a fair time away,
since an object being modified is likely to be older, so the interval in
which there would be a benefit is short.

The basic f

Re: [Rd] modifying large R objects in place

2007-09-28 Thread Petr Savicky
On Fri, Sep 28, 2007 at 12:39:30AM +0200, Peter Dalgaard wrote:
[...]
> >nrow <- function(...) dim(...)[1]
> >ncol <- function(...) dim(...)[2]
> >
> >At least in my environment, the new versions preserved NAMED == 1.
> >  
> Yes, but changing the formal arguments is a bit messy, is it not?

Specifically for nrow, ncol, I think not much, since almost nobody needs
to know (or even knows) that the name of the formal argument is "x".

However, there is another argument against the ... solution: it solves
the problem only in the simplest cases like nrow, ncol, but is not
usable in others, like colSums, rowSums. These functions also increase
NAMED of their arguments, although their output does not contain any
reference to the original content of their arguments.

I think that a systematic solution of this problem may be helpful.
However, making these functions Internal or Primitive would
not be good in my opinion. It is advantageous that these functions
contain an R level part, which
makes the basic decisions before a call to .Internal.
If nothing else, this serves as a sort of documentation.

For my purposes, I replaced calls to "colSums" and "matrix" by the
corresponding calls to .Internal in my script. The result is that
now I can complete several runs of my calculation in a cycle instead
of restarting R after each of the runs.

This leads me to a question. Some of the tests, which I did, suggest
that gc() may not free all the memory, even if I remove all data
objects by rm() before calling gc(). Is this possible or I must have
missed something?

A possible solution to the unwanted increase of NAMED due to temporary
calculations could be to give the user the possibility
to store the NAMED attribute of an object before a call to a function
and restore it after the call. To use this, the user should be
confident that no new reference to the object persists after the
function is completed.

> Presumably, nrow <- function(x) eval.parent(substitute(dim(x)[1])) works 
> too, but if the gain is important enough to warrant that sort of 
> programming, you might as well make nrow a .Primitive.

You are right. This indeed works.

> Longer-term, I still have some hope for better reference counting, but 
> the semantics of environments make it really ugly -- an environment can 
> contain an object that contains the environment, a simple example being 
> 
> f <- function()
>g <- function() 0
> f()
> 
> At the end of f(), we should decide whether to destroy f's evaluation 
> environment. In the present example, what we need to be able to see is 
> that this would remove all references to g and that the reference from g 
> to f can therefore be ignored.  Complete logic for sorting this out is 
> basically equivalent to a new garbage collector, and one can suspect 
> that applying the logic upon every function return is going to be 
> terribly inefficient. However, partial heuristics might apply.

I have to say that I do not quite understand the example.
What is the input and output of f? Is g only defined inside f, or
also used?

Let me ask the following question. I assume that gc() scans the whole
memory and determines for each part of data, whether a reference
to it still exists or not. In my understanding, this is equivalent to
determining whether its NAMED may be dropped to zero or not.
Structures for which this succeeds are then removed. Am I right?
If yes, is it possible during gc() to determine also cases,
when NAMED may be dropped from 2 to 1? How much would this increase
the complexity of gc()?

Thank you in advance for your kind reply.

Petr Savicky.
