Re: [Rd] Serializing many small objects efficiently

2012-03-22 Thread Whit Armstrong
Here's a snip from r-hpc. You can probably find it in the archive:

From: Michael Spiegel
Date: Thu, Sep 29, 2011 at 11:38 AM
Subject: RE: [R-sig-hpc] [zeromq-dev] rzmq package

Calling serialize/unserialize from C/C++ is not too convoluted. You can
find a good example in
https://github.com/mspiegel/PiebaldMPI/blob/master/src/lapply_workers_helpers.c,
look for the function "generateReturnList". I'm doing both
serialization and unserialization in that function, but you'll be able
to tease apart the two calls.




On Thu, Mar 22, 2012 at 12:34 PM, Antonio Piccolboni wrote:
> Hi,
> sorry if this question is trivial or unclear, this is my first venture into
> mixed C/R programming (I am reasonably experienced in each separately).
> I am trying to write a serialization function for a format called
> typedbytes, which is used as an interchange format in Hadoop circles. Since
> I would need to serialize many small R objects according to the internal R
> format, I looked at the C interface
>
> void R_Serialize(SEXP s, R_outpstream_t ops);
> SEXP R_Unserialize(R_inpstream_t ips);
>
> If I look at the source for, e.g., unserialize I see a
>
>  .Call("R_unserialize", connection, refhook, PACKAGE = "base")
>
> which, despite the name of the second argument, accepts as 'connection' a
> raw vector. Is there any way to call that function from C -- without
> calling the R function? Failing that, from what I've read I gather that it
> is not possible to get a C stream from a connection, so unless I am wrong
> using R_serialize directly is not possible. If all else fails I would
> probably have to use a hack requiring knowledge of the serialization format,
> which I'd much rather avoid. Suggestions? Thanks
>
>
> Antonio
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Serializing many small objects efficiently

2012-03-22 Thread Antonio Piccolboni
Hi,
sorry if this question is trivial or unclear, this is my first venture into
mixed C/R programming (I am reasonably experienced in each separately).
I am trying to write a serialization function for a format called
typedbytes, which is used as an interchange format in Hadoop circles. Since
I would need to serialize many small R objects according to the internal R
format, I looked at the C interface

void R_Serialize(SEXP s, R_outpstream_t ops);
SEXP R_Unserialize(R_inpstream_t ips);

If I look at the source for, e.g., unserialize I see a

 .Call("R_unserialize", connection, refhook, PACKAGE = "base")

which, despite the name of the second argument, accepts as 'connection' a
raw vector. Is there any way to call that function from C -- without
calling the R function? Failing that, from what I've read I gather that it
is not possible to get a C stream from a connection, so unless I am wrong
using R_serialize directly is not possible. If all else fails I would
probably have to use a hack requiring knowledge of the serialization format,
which I'd much rather avoid. Suggestions? Thanks


Antonio



Re: [Rd] .Call ref card

2012-03-22 Thread Terry Therneau

On 03/22/2012 11:03 AM, peter dalgaard wrote:

> Don't know how useful it is any more, but back in the days, I gave this talk
> in Vienna
>
> http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Dalgaard.pdf
>
> Looking at it now, perhaps it moves a little too quickly into the hairy stuff.
> On the other hand, those were the things that I had found important to figure
> out at the time. At a quick glance, I didn't spot anything obviously outdated.


Peter,
  I just looked at this, and I'd say it moved into the hairy stuff
way too quickly.  Much of what it covered I would never expect to use.
Some I didn't understand.  Part of this, of course, is that slides for a
talk are rarely very useful without the talker.


 Something simpler for the layman would be good.

Terry T.



Re: [Rd] R-devel Digest, Vol 109, Issue 22

2012-03-22 Thread Terry Therneau
On 03/22/2012 09:38 AM, Simon Urbanek wrote:
>
> On Mar 22, 2012, at 9:45 AM, Terry Therneau wrote:
>
>>>>> strongly disagree. I'm appalled to see that sentence here.
>>>>>
>>>>> Come on!
>>>>>
>>>>> The overhead is significant for any large vector and it is in
>>>>> particular unnecessary since in .C you have to allocate *and copy*
>>>>> space even for results (twice!). Also it is very error-prone, because
>>>>> you have no information about the length of vectors so it's easy to
>>>>> run out of bounds and there is no way to check. IMHO .C should not be
>>>>> used for any code written in this century (the only exception may be
>>>>> if you are passing no data, e.g. if all you do is to pass a flag and
>>>>> expect no result, you can get away with it even if it is more
>>>>> dangerous). It is a legacy interface that dates way back and is
>>>>> essentially just re-named .Fortran interface. Again, I would strongly
>>>>> recommend the use of .Call in any recent code because it is safer and
>>>>> more efficient (if you don't care about either attribute, well, feel
>>>>> free ;)).
>>>>
>>>> So aleph will not support the .C interface? ;-)
>>>>
>>> It will look at the timestamp of the source file and delete the package if 
>>> it is not before 1980 ;). Otherwise it will send a request for punch cards 
>>> with ".C is deprecated, please upgrade to .Call" stamped out :P At that 
>>> point I'll be flaming about using the native Aleph interface and not the R 
>>> compatibility layer ;)
>>>
>>> Cheers,
>>> S
>> I'll dissent -- I don't think .C is inherently any more dangerous 
>> than .Call and prefer its simplicity in many cases.  Calling C at 
>> all is what is inherently dangerous -- I can reference beyond the end 
>> of a vector, write over objects that should be read only, and branch 
>> to random places using either interface.
>
> You can always do so deliberately, but with .C you have no way of 
> preventing it since you don't even know what is the length! That is 
> certainly far more dangerous than .Call where you can simply loop over 
> the length, check that the lengths are compatible etc. Also for types 
> like strings .C is a minefield that is hard to not blow up whereas 
> .Call it is even more safe than scalar arrays. You can do none of that 
> with .C which relies entirely on conventions with no recorded semantics.
>
I've overrun arrays in both .C and .Call routines, and I assure you that 
it was never deliberate.  Very effective at crashing R with strange 
error messages though.
I will have .C("somefun", as.integer(length(x)), x); the .Call version
will skip the second argument and add a line in the C code; no real
difference from my point of view.  (Though the spelling is harder to
remember in .Call.  Does R core use dice to decide which things are
upper, lower, and mixed case: LENGTH, asInteger, ncols?)  R strings are
a PITA in C and I mostly avoid them, so I have no argument about .C vs
.Call there.
Much of the survival library is .C for historical reasons, of course, but
I think it shows that you can be safe in .C, though you can't be sloppy.
>
>> If you are dealing with large objects and worry about memory 
>> efficiency then .Call puts more tools at your disposal and is worth 
>> the effort.  However, I did not find the .Call interface at all easy 
>> to use at first
>
> I guess this depends on the developer and is certainly a factor. 
> Personally, I find the subset of the R API needed for .Call fairly 
> small and intuitive (in particular when you are just writing a safer 
> replacement for .C), but I'm obviously biased. Maybe in a separate 
> thread we could discuss this - I'd be happy to write a ref card or 
> cheat sheet if I find out what people find challenging on .Call. 
> Nonetheless, my point is that it is more than worth investing the 
> effort both in safety and performance.
>
I'm giving a short course at UseR on the design of the survival packages 
which promises a document "containing all the details", currently being 
written.  It has examples of .Call with discussion of what each action 
is doing.  The final result will certainly be added to the survival 
package; hopefully it will be useful enough to earn a place on the CRAN 
documentation page as well.
>
>> and we should keep that in mind before getting too pompous in our 
>> lectures to the "sinners of .C".  (Mostly because the things I needed 
>> to know are scattered about in multiple places.)
>>
>> I might have to ask for an exemption on that timestamp -- the first 
>> bits of the survival package only reach back to 1986.  And I've had 
>> to change source code systems multiple times which plays hob with the 
>> file times, though I did try to preserve the changelog history to 
>> forestall some future litigious soul who claims they wrote it first  
>> (sccs -> rcs -> cvs -> svn -> mercurial).   :-)
>>
>
> ;) Maybe the rule should be based on the date of the first appearance of the
> package, fair enough :)

Re: [Rd] .Call ref card [was Re: R-devel Digest, Vol 109, Issue 22]

2012-03-22 Thread peter dalgaard
Don't know how useful it is any more, but back in the days, I gave this talk in 
Vienna

http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Dalgaard.pdf

Looking at it now, perhaps it moves a little too quickly into the hairy stuff. 
On the other hand, those were the things that I had found important to figure 
out at the time. At a quick glance, I didn't spot anything obviously outdated. 


On Mar 22, 2012, at 16:15 , Ramon Diaz-Uriarte wrote:

> 
> 
> 
> On Thu, 22 Mar 2012 10:38:55 -0400, Simon Urbanek wrote:
> 
>> On Mar 22, 2012, at 9:45 AM, Terry Therneau wrote:
>
>>>>>> strongly disagree. I'm appalled to see that sentence here.
>>>>>>
>>>>>> Come on!
>>>>>>
>>>>>> The overhead is significant for any large vector and it is in
>>>>>> particular unnecessary since in .C you have to allocate *and copy*
>>>>>> space even for results (twice!). Also it is very error-prone, because
>>>>>> you have no information about the length of vectors so it's easy to
>>>>>> run out of bounds and there is no way to check. IMHO .C should not be
>>>>>> used for any code written in this century (the only exception may be
>>>>>> if you are passing no data, e.g. if all you do is to pass a flag and
>>>>>> expect no result, you can get away with it even if it is more
>>>>>> dangerous). It is a legacy interface that dates way back and is
>>>>>> essentially just re-named .Fortran interface. Again, I would strongly
>>>>>> recommend the use of .Call in any recent code because it is safer and
>>>>>> more efficient (if you don't care about either attribute, well, feel
>>>>>> free ;)).
>>>>>
>>>>> So aleph will not support the .C interface? ;-)
>>>>>
>>>> It will look at the timestamp of the source file and delete the package if
>>>> it is not before 1980 ;). Otherwise it will send a request for punch cards
>>>> with ".C is deprecated, please upgrade to .Call" stamped out :P At that
>>>> point I'll be flaming about using the native Aleph interface and not the R
>>>> compatibility layer ;)
>>>>
>>>> Cheers,
>>>> S
>>> I'll dissent -- I don't think .C is inherently any more dangerous than 
>>> .Call and prefer its simplicity in many cases.  Calling C at all is what 
>>> is inherently dangerous -- I can reference beyond the end of a vector, 
>>> write over objects that should be read only, and branch to random places 
>>> using either interface. 
> 
>> You can always do so deliberately, but with .C you have no way of preventing 
>> it since you don't even know what is the length! That is certainly far more 
>> dangerous than .Call where you can simply loop over the length, check that 
>> the lengths are compatible etc. Also for types like strings .C is a 
>> minefield that is hard to not blow up whereas .Call it is even more safe 
>> than scalar arrays. You can do none of that with .C which relies entirely on 
>> conventions with no recorded semantics.
> 
> 
>>> If you are dealing with large objects and worry about memory efficiency 
>>> then .Call puts more tools at your disposal and is worth the effort.  
>>> However, I did not find the .Call interface at all easy to use at first
> 
>> I guess this depends on the developer and is certainly a
>> factor. Personally, I find the subset of the R API needed for .Call
>> fairly small and intuitive (in particular when you are just writing a
>> safer replacement for .C), but I'm obviously biased. Maybe in a separate
>> thread we could discuss this - I'd be happy to write a ref card or cheat
>> sheet if I find out what people find challenging on .Call. Nonetheless,
>> my point is that it is more than worth investing the effort both in
>> safety and performance.
> 
> 
> After your previous email I made a mental note "try to finally learn to
> use .Call since I often deal with large objects". So, yes, I'd love to see
> a ref card and cheat sheet: I have tried learning to use .Call a few
> times, but have always gone back to .C since (it seems that) all I needed
> to know are just a couple of conventions, and the rest is "C as usual".
> 
> 
> 
> You say "if I find out what people find challenging on
> .Call". Hummm... can I answer "basically everything"?  I think Terry
> Therneau says, "the things I needed to know are scattered about in
> multiple places". When I see the convolve example (5.2 in Writing R
> extensions) I understand the C code; when I see the convolve2 example in
> 5.10.1 I think I can guess what lines "PROTECT(a ..." to "xab =
> NUMERIC_POINTER ..."  might be doing, but I would not know how to do that
> on my own. Yes, I can go to 5.9.1 to read about PROTECT, then search for
> ... But, at that point, I've gone back to .C. Of course, this might just
> be my laziness/incompetence/whatever.
> 
> 
> 
> Best,
> 
> 
> R.
> 
> 
> 
> 
> 
> 
> 
> 
> 
>>> and we should keep that in mind before getting too pompous in our lectures 
>>> to the "sinners of .C".  (Mostly because the things I needed to know are 
>>> scattered about in multiple places.)

[Rd] .Call ref card [was Re: R-devel Digest, Vol 109, Issue 22]

2012-03-22 Thread Ramon Diaz-Uriarte



On Thu, 22 Mar 2012 10:38:55 -0400, Simon Urbanek wrote:

> On Mar 22, 2012, at 9:45 AM, Terry Therneau  wrote:

> > > > > strongly disagree. I'm appalled to see that sentence here.
> > > > >
> > > > > Come on!
> > > > >
> > > > > The overhead is significant for any large vector and it is in
> > > > > particular unnecessary since in .C you have to allocate *and copy*
> > > > > space even for results (twice!). Also it is very error-prone, because
> > > > > you have no information about the length of vectors so it's easy to
> > > > > run out of bounds and there is no way to check. IMHO .C should not be
> > > > > used for any code written in this century (the only exception may be
> > > > > if you are passing no data, e.g. if all you do is to pass a flag and
> > > > > expect no result, you can get away with it even if it is more
> > > > > dangerous). It is a legacy interface that dates way back and is
> > > > > essentially just re-named .Fortran interface. Again, I would strongly
> > > > > recommend the use of .Call in any recent code because it is safer and
> > > > > more efficient (if you don't care about either attribute, well, feel
> > > > > free ;)).
> > > >
> > > > So aleph will not support the .C interface? ;-)
> > > >
> > > It will look at the timestamp of the source file and delete the package if
> > > it is not before 1980 ;). Otherwise it will send a request for punch cards
> > > with ".C is deprecated, please upgrade to .Call" stamped out :P At that
> > > point I'll be flaming about using the native Aleph interface and not the R
> > > compatibility layer ;)
> > >
> > > Cheers,
> > > S
> > I'll dissent -- I don't think .C is inherently any more dangerous than 
> > .Call and prefer its simplicity in many cases.  Calling C at all is what 
> > is inherently dangerous -- I can reference beyond the end of a vector, 
> > write over objects that should be read only, and branch to random places 
> > using either interface. 

> You can always do so deliberately, but with .C you have no way of preventing 
> it since you don't even know what is the length! That is certainly far more 
> dangerous than .Call where you can simply loop over the length, check that 
> the lengths are compatible etc. Also for types like strings .C is a minefield 
> that is hard to not blow up whereas .Call it is even more safe than scalar 
> arrays. You can do none of that with .C which relies entirely on conventions 
> with no recorded semantics.


> > If you are dealing with large objects and worry about memory efficiency 
> > then .Call puts more tools at your disposal and is worth the effort.  
> > However, I did not find the .Call interface at all easy to use at first

> I guess this depends on the developer and is certainly a
> factor. Personally, I find the subset of the R API needed for .Call
> fairly small and intuitive (in particular when you are just writing a
> safer replacement for .C), but I'm obviously biased. Maybe in a separate
> thread we could discuss this - I'd be happy to write a ref card or cheat
> sheet if I find out what people find challenging on .Call. Nonetheless,
> my point is that it is more than worth investing the effort both in
> safety and performance.


After your previous email I made a mental note "try to finally learn to
use .Call since I often deal with large objects". So, yes, I'd love to see
a ref card and cheat sheet: I have tried learning to use .Call a few
times, but have always gone back to .C since (it seems that) all I needed
to know are just a couple of conventions, and the rest is "C as usual".



You say "if I find out what people find challenging on
.Call". Hummm... can I answer "basically everything"?  I think Terry
Therneau says, "the things I needed to know are scattered about in
multiple places". When I see the convolve example (5.2 in Writing R
extensions) I understand the C code; when I see the convolve2 example in
5.10.1 I think I can guess what lines "PROTECT(a ..." to "xab =
NUMERIC_POINTER ..."  might be doing, but I would not know how to do that
on my own. Yes, I can go to 5.9.1 to read about PROTECT, then search for
... But, at that point, I've gone back to .C. Of course, this might just
be my laziness/incompetence/whatever.

 

Best,


R.









> > and we should keep that in mind before getting too pompous in our lectures 
> > to the "sinners of .C".  (Mostly because the things I needed to know are 
> > scattered about in multiple places.)
> > 
> > I might have to ask for an exemption on that timestamp -- the first bits of 
> > the survival package only reach back to 1986.  And I've had to change 
> > source code systems multiple times which plays hob with the file times, 
> > though I did try to preserve the changelog history to forestall some future 
> > litigious soul who claims they wrote it first  (sccs -> rcs -> cvs -> svn 
> > -> mercurial).   :-)
> > 

> ;) Maybe the rule should be based on the date of the first appearance of the
> package, fair enough :)

Re: [Rd] R-devel Digest, Vol 109, Issue 22

2012-03-22 Thread Simon Urbanek

On Mar 22, 2012, at 9:45 AM, Terry Therneau  wrote:

>>>> strongly disagree. I'm appalled to see that sentence here.
>>>>
>>>> Come on!
>>>>
>>>> The overhead is significant for any large vector and it is in
>>>> particular unnecessary since in .C you have to allocate *and copy*
>>>> space even for results (twice!). Also it is very error-prone, because
>>>> you have no information about the length of vectors so it's easy to
>>>> run out of bounds and there is no way to check. IMHO .C should not be
>>>> used for any code written in this century (the only exception may be
>>>> if you are passing no data, e.g. if all you do is to pass a flag and
>>>> expect no result, you can get away with it even if it is more
>>>> dangerous). It is a legacy interface that dates way back and is
>>>> essentially just re-named .Fortran interface. Again, I would strongly
>>>> recommend the use of .Call in any recent code because it is safer and
>>>> more efficient (if you don't care about either attribute, well, feel
>>>> free ;)).
>>>
>>> So aleph will not support the .C interface? ;-)
>>>
>> It will look at the timestamp of the source file and delete the package if 
>> it is not before 1980 ;). Otherwise it will send a request for punch cards 
>> with ".C is deprecated, please upgrade to .Call" stamped out :P At that 
>> point I'll be flaming about using the native Aleph interface and not the R 
>> compatibility layer ;)
>> 
>> Cheers,
>> S
> I'll dissent -- I don't think .C is inherently any more dangerous than .Call 
> and prefer its simplicity in many cases.  Calling C at all is what is 
> inherently dangerous -- I can reference beyond the end of a vector, write 
> over objects that should be read only, and branch to random places using 
> either interface. 

You can always do so deliberately, but with .C you have no way of preventing it 
since you don't even know what is the length! That is certainly far more 
dangerous than .Call where you can simply loop over the length, check that the 
lengths are compatible etc. Also for types like strings .C is a minefield that 
is hard to not blow up whereas .Call it is even more safe than scalar arrays. 
You can do none of that with .C which relies entirely on conventions with no 
recorded semantics.


> If you are dealing with large objects and worry about memory efficiency then 
> .Call puts more tools at your disposal and is worth the effort.  However, I 
> did not find the .Call interface at all easy to use at first

I guess this depends on the developer and is certainly a factor. Personally, I 
find the subset of the R API needed for .Call fairly small and intuitive (in 
particular when you are just writing a safer replacement for .C), but I'm 
obviously biased. Maybe in a separate thread we could discuss this - I'd be 
happy to write a ref card or cheat sheet if I find out what people find 
challenging on .Call. Nonetheless, my point is that it is more than worth 
investing the effort both in safety and performance.


> and we should keep that in mind before getting too pompous in our lectures to 
> the "sinners of .C".  (Mostly because the things I needed to know are 
> scattered about in multiple places.)
> 
> I might have to ask for an exemption on that timestamp -- the first bits of 
> the survival package only reach back to 1986.  And I've had to change source 
> code systems multiple times which plays hob with the file times, though I did 
> try to preserve the changelog history to forestall some future litigious soul 
> who claims they wrote it first  (sccs -> rcs -> cvs -> svn -> mercurial).   
> :-)
> 

;) Maybe the rule should be based on the date of the first appearance of the 
package, fair enough :)

Cheers,
Simon


Re: [Rd] R-devel Digest, Vol 109, Issue 22

2012-03-22 Thread Terry Therneau

>>> strongly disagree. I'm appalled to see that sentence here.
>>>
>>> Come on!
>>>
>>> The overhead is significant for any large vector and it is in
>>> particular unnecessary since in .C you have to allocate *and copy*
>>> space even for results (twice!). Also it is very error-prone, because
>>> you have no information about the length of vectors so it's easy to
>>> run out of bounds and there is no way to check. IMHO .C should not be
>>> used for any code written in this century (the only exception may be
>>> if you are passing no data, e.g. if all you do is to pass a flag and
>>> expect no result, you can get away with it even if it is more
>>> dangerous). It is a legacy interface that dates way back and is
>>> essentially just re-named .Fortran interface. Again, I would strongly
>>> recommend the use of .Call in any recent code because it is safer and
>>> more efficient (if you don't care about either attribute, well, feel
>>> free ;)).
>>
>> So aleph will not support the .C interface? ;-)
>>
> It will look at the timestamp of the source file and delete the package if it 
> is not before 1980 ;). Otherwise it will send a request for punch cards with 
> ".C is deprecated, please upgrade to .Call" stamped out :P At that point I'll 
> be flaming about using the native Aleph interface and not the R compatibility 
> layer ;)
>
> Cheers,
> S
I'll dissent -- I don't think .C is inherently any more dangerous than 
.Call and prefer its simplicity in many cases.  Calling C at all is 
what is inherently dangerous -- I can reference beyond the end of a 
vector, write over objects that should be read only, and branch to 
random places using either interface.  If you are dealing with large 
objects and worry about memory efficiency then .Call puts more tools at 
your disposal and is worth the effort.  However, I did not find the 
.Call interface at all easy to use at first and we should keep that in 
mind before getting too pompous in our lectures to the "sinners of .C".  
(Mostly because the things I needed to know are scattered about in 
multiple places.)

I might have to ask for an exemption on that timestamp -- the first bits 
of the survival package only reach back to 1986.  And I've had to change 
source code systems multiple times which plays hob with the file times, 
though I did try to preserve the changelog history to forestall some 
future litigious soul who claims they wrote it first  (sccs -> rcs -> 
cvs -> svn -> mercurial).   :-)

Terry T



Re: [Rd] R 2.14.1 memory management under Windows

2012-03-22 Thread Spencer Graves
Thanks for the replies and please excuse my failure to provide 
sessionInfo():



WINDOWS 7 WITH 8 GB RAM:


> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] fda_2.2.8 zoo_1.7-7

loaded via a namespace (and not attached):
[1] grid_2.14.1    lattice_0.20-0


FEDORA 13 LINUX WITH 4 GB RAM (copied manually, thereby increasing the 
risks of copying errors):



> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-redhat-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] fda_2.2.6  zoo_1.6-5

loaded via a namespace (and not attached):
[1] grid_2.12.0  lattice_0.19-30


  Thanks again,
  Spencer


On 3/22/2012 5:02 AM, Prof Brian Ripley wrote:

> On 22/03/2012 06:11, Peter Meilstrup wrote:
>
>> My guess would be that it's a matter of having swap space be a dedicated
>> partition or fixed-size file (Linux, usually) versus swapping to a regular
>> file that grows as needed (Windows and OS X, usually.) So if you
>> defragmented your drive and set Windows to have a fixed-size swap file, it
>> would probably behave more like your Linux machine.
>
> There is far more to the topic than that, but the answer here appears
> to be a complete failure to supply the relevant information.
>
> We haven't even been told the 'at a minimum' information required by
> the posting guide, so we do not know what architectures are in use.
> The messages suggest that 'Linux' is 32-bit and 'Windows' is 64-bit,
> in which case the tasks are simply not comparable.  On 32-bit R on
> Windows I got the message about 3.4GB after 0.05 sec.  Conversely,
> with 64-bit R on an 8GB Linux box with 16GB swap it swapped away for
> about 10 minutes.  On a 32GB box it succeeded after 270s, typically
> using 8-14GB.  The object SG tried to create is a bit over 7GB.
>
> But Windows' memory management is notoriously slow, and R actually
> adds a layer on top to make it tolerable for routine use of R.
>
> I have no idea why this was posted on R-devel: it did not involve R
> development nor programming, just a basic understanding of 32- vs
> 64-bit R.



>> Peter
>>
>> On Wed, Mar 21, 2012 at 10:14 PM, Spencer Graves
>> <spencer.gra...@prodsyse.com> wrote:
>>
>>> I computed "system.time(diag(3))" with R 2.12.0 on Fedora 13 Linux
>>> with 4 GB RAM and with R 2.14.1 on Windows 7 with 8 GB RAM:
>>>
>>> Linux (4 GB RAM):  0, 0.21, 0.21 -- a fifth of a second
>>>
>>> Windows 7 (8 GB RAM):  11.37 7.47 93.19 -- over 1.5 minutes.  Moreover,
>>> during most of that time, I could not switch windows or get any response
>>> from the system.  When I first encountered this, I thought Windows was
>>> hung permanently and the only way out was a hard reset and reboot.
>>>
>>> On both systems, diag(3) generated "Error: cannot allocate vector of
>>> size ___ Gb", with "___" = 3.4 for Linux with 4 GB RAM and 6.7 for
>>> Windows with 8 GB RAM.  Linux with half the RAM and an older version of
>>> R was done with this in 0.21 seconds.  Windows 7 went into suspension
>>> for over 93 seconds -- 1.5 minutes -- before giving an error message.
>>>
>>> I don't know how easy this would be to fix under Windows, but I felt a
>>> need to report it.
>>>
>>> Best Wishes,
>>> Spencer


--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com







--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com



Re: [Rd] uncompressed saves warning

2012-03-22 Thread Michael Friendly

On 3/21/2012 1:22 PM, Uwe Ligges wrote:

>> What is the equivalent R command to compress these files in my project
>> tree?
>
> Michael,
>
> if you use
>    R CMD build --resave-data
> to build the tar archive, the versions therein are recompressed.

But AFAIK, in StatET, R CMD build builds a separate .tar.gz file under
c:/eclipse, and does not affect the project directory where these files
are stored and sync'd with R-Forge.

> Otherwise, you can also open the files and resave them via save() and
> appropriate arguments.

I exported the .rda files to c:/R/data and ran

> load("gfrance.rda")
> load("gfrance85.rda")
> save(gfrance, file="gfrance.RData", compress="xz")
Error in xzfile(file, "wb", compression = 9) : cannot open the connection
In addition: Warning message:
In xzfile(file, "wb", compression = 9) :
  cannot initialize lzma encoder, error 5

Why doesn't this work?

> save(gfrance, file="gfrance.RData", compress=TRUE)

The above works, but only compresses a 300K file to 299K.

> Or use resaveRdaFiles() in package tools to run it on a whole folder
> automatically.




> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] p3d_0.02-4       mgcv_1.7-13      car_2.0-12       nnet_7.3-1
[5] rgl_0.92.798     vcd_1.2-13       colorspace_1.1-1 MASS_7.3-17

loaded via a namespace (and not attached):
[1] lattice_0.20-6 Matrix_1.0-4   nlme_3.1-103   tools_2.14.1
>

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [Rd] R 2.14.1 memory management under Windows

2012-03-22 Thread Prof Brian Ripley

On 22/03/2012 06:11, Peter Meilstrup wrote:

My guess would be that it's a matter of having swap space be a dedicated
partition or fixed-size file (Linux, usually) versus swapping to a regular
file that grows as needed (Windows and OS X, usually.) So if you
defragmented your drive and set Windows to have a fixed-size swap file, it
would probably behave more like your Linux machine.


There is far more to the topic than that, but the answer here appears to 
be a complete failure to supply the relevant information.


We haven't even been told the 'at a minimum' information required by the 
posting guide, so we do not know what architectures are in use.  The 
messages suggest that 'Linux' is 32-bit and 'Windows' is 64-bit, in 
which case the tasks are simply not comparable.  On 32-bit R on Windows 
I got the message about 3.4GB after 0.05 sec.  Conversely, with 64-bit R 
on an 8GB Linux box with 16GB swap it swapped away for about 10 minutes. 
 On a 32GB box it succeeded after 270s, typically using 8-14GB.  The 
object SG tried to create is a bit over 7GB.
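The figures Ripley quotes are consistent with a quick back-of-the-envelope check (the value n = 30000 is an assumption, chosen only because it reproduces the ~7 GB object and the 6.7 Gb error message; the diag() call in the thread appears to have lost its argument in transit):

```r
# Hedged sketch: estimating the memory a dense double matrix needs
# before trying to allocate it.  diag(n) returns an n-by-n matrix of
# doubles, at 8 bytes per element.
n <- 30000
bytes <- n^2 * 8      # 7.2e9 bytes, "a bit over 7GB" in decimal units
bytes / 1024^3        # about 6.7, matching the "cannot allocate
                      # vector of size 6.7 Gb" message on 64-bit Windows
```

Doing this arithmetic first explains the behaviour on both machines: the object simply cannot fit in a 32-bit address space, and on an 8 GB 64-bit box it forces heavy swapping.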


But Windows' memory management is notoriously slow, and R actually adds 
a layer on top to make it tolerable for routine use of R.


I have no idea why this was posted on R-devel: it did not involve R 
development nor programming, just a basic understanding of 32- vs 64-bit R.




Peter

On Wed, Mar 21, 2012 at 10:14 PM, Spencer Graves<
spencer.gra...@prodsyse.com>  wrote:


I computed "system.time(diag(3))" with R 2.12.0 on Fedora 13 Linux
with 4 GB RAM and with R 2.14.1 on Windows 7 with 8 GB RAM:


Linux (4 GB RAM):  0, 0.21, 0.21 -- a fifth of a second


Windows 7 (8 GB RAM):  11.37 7.47 93.19 -- over 1.5 minutes.  Moreover,
during most of that time, I could not switch windows or get any response
from the system.  When I first encountered this, I thought Windows was hung
permanently and the only way out was a hard reset and reboot.


  On both systems, diag(3) generated, "Error:  cannot allocate
vector of size ___ Gb", with "___" = 3.4 for Linux with 4 GB RAM and 6.7
for Windows with 8 GB RAM.  Linux with half the RAM and an older version of
R was done with this in 0.21 seconds.  Windows 7 went into suspension for
over 93 seconds -- 1.5 minutes before giving an error message.


   I don't know how easy this would be to fix under Windows, but I felt
a need to report it.


  Best Wishes,
  Spencer


--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Re: [Rd] R 2.14.1 memory management under Windows

2012-03-22 Thread Peter Meilstrup
My guess would be that it's a matter of having swap space be a dedicated
partition or fixed-size file (Linux, usually) versus swapping to a regular
file that grows as needed (Windows and OS X, usually.) So if you
defragmented your drive and set Windows to have a fixed-size swap file, it
would probably behave more like your Linux machine.

Peter

On Wed, Mar 21, 2012 at 10:14 PM, Spencer Graves <
spencer.gra...@prodsyse.com> wrote:

> I computed "system.time(diag(3))" with R 2.12.0 on Fedora 13 Linux
> with 4 GB RAM and with R 2.14.1 on Windows 7 with 8 GB RAM:
>
>
> Linux (4 GB RAM):  0, 0.21, 0.21 -- a fifth of a second
>
>
> Windows 7 (8 GB RAM):  11.37 7.47 93.19 -- over 1.5 minutes.  Moreover,
> during most of that time, I could not switch windows or get any response
> from the system.  When I first encountered this, I thought Windows was hung
> permanently and the only way out was a hard reset and reboot.
>
>
>  On both systems, diag(3) generated, "Error:  cannot allocate
> vector of size ___ Gb", with "___" = 3.4 for Linux with 4 GB RAM and 6.7
> for Windows with 8 GB RAM.  Linux with half the RAM and an older version of
> R was done with this in 0.21 seconds.  Windows 7 went into suspension for
> over 93 seconds -- 1.5 minutes before giving an error message.
>
>
>   I don't know how easy this would be to fix under Windows, but I felt
> a need to report it.
>
>
>  Best Wishes,
>  Spencer
>
>
> --
> Spencer Graves, PE, PhD
> President and Chief Technology Officer
> Structure Inspection and Monitoring, Inc.
> 751 Emerson Ct.
> San José, CA 95126
> ph:  408-655-4567
> web:  www.structuremonitoring.com
>
>
