[R] How to import the large dataset into R

2012-12-14 Thread Tammy Ma

HI, R Users,


I met the following problem:
I am trying to import the data from one table of an .accdb (Access) database via ODBC so that it can be read into R. The table contains 1,021,965 records. I always get the following error message, even if I change the drive:

The query can not be completed. Either the size of the query
result is larger than the maximum size of a database(2GB), or there is not
enough storage space on the disk to store the query result.

Do you have any hints as to why this happens? Is there another way to get all of the data in that table imported into R?

Thanks a lot.
Kind regards,
Tammy
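
One possible workaround, not from the original thread: export the table from Access to a CSV file and read it with read.csv(), or pull the table from the ODBC source in chunks so that no single query result has to hold all 1,021,965 records. A rough sketch with RODBC, where "myAccessDSN" and "bigTable" are hypothetical names and the max argument may need adjusting for your driver:

library(RODBC)

ch <- odbcConnect("myAccessDSN")
chunks <- list(sqlFetch(ch, "bigTable", max = 100000))
repeat {
  nxt <- sqlFetchMore(ch, max = 100000)        # next block of rows
  if (!is.data.frame(nxt)) break               # no more rows to fetch
  chunks[[length(chunks) + 1L]] <- nxt
}
odbcClose(ch)
big.table <- do.call(rbind, chunks)            # assemble the full table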



  


[R] Beta-coefficients for ZINB model

2012-12-14 Thread Jeremy Goss
Dear users,

Does anyone have any idea how to generate standardised beta coefficients
for a ZINB model in R to compare which explanatory variables are having the
greatest impact on the dependent variable?

Thanks,
Jeremy
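
One common workaround, not an answer from this thread: standardise the explanatory variables before fitting, so that the count-model coefficients are on comparable scales. A sketch using pscl::zeroinfl() with its bundled bioChemists data; the formula is purely illustrative:

library(pscl)

data("bioChemists", package = "pscl")
bioChemists$ment.z <- as.numeric(scale(bioChemists$ment))   # standardised predictors
bioChemists$phd.z  <- as.numeric(scale(bioChemists$phd))
fit <- zeroinfl(art ~ fem + ment.z + phd.z | 1,
                data = bioChemists, dist = "negbin")        # ZINB fit
summary(fit)   # coefficients of the scaled predictors can be compared directly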



[R] [R-pkgs] PST version 0.81

2012-12-14 Thread Alexis Gabadinho

Dear R users,

The PST package (version 0.81) is now available on CRAN. It enables the modelling and analysis of categorical sequences with probabilistic suffix trees (PST).


The package fits variable-length Markov chain (VLMC) models and stores them in PSTs. It is specifically adapted to the social sciences: it can learn VLMC models from sets of individual sequences that may contain missing values, and it can account for case weights.


Fitted PSTs can be used, among other things, for sequence prediction (pattern mining), imputation and the generation of artificial sequences. The package also includes original visualization tools for both the model and the outcomes of sequence prediction.


Alexis Gabadinho
NCCR LIVES
Institute for demographic and life course studies
University of Geneva
Switzerland



Re: [R] [R-sig-hpc] recursion depth limitations

2012-12-14 Thread Suzen, Mehmet
On 14 December 2012 12:13, Suzen, Mehmet su...@acm.org wrote:
 On 13 December 2012 23:21, Rui Barradas ruipbarra...@sapo.pt wrote:
 But it does, each recursive call will load another copy of the function, and
 another copy of the variables used.
 In fact, the cost can become quite large since everything is loaded in
 memory again.

 Hope this helps,


 Many thanks for the replies.

 What about tail-recursion?  I have seen that there were discussions
 about this:

 https://stat.ethz.ch/pipermail/r-help/2006-April/102873.html

 Since it was 6 years ago, maybe now things are a little different.


Isn't it logical to translate any recursive function to a tail-recursive one internally? The tail-recursive version returns a value, but the operation is essentially the same. I don't know how difficult it would be to do generically, but maybe there is an advantage to keeping recursion as it is. What would that advantage be?

For example, the tail-call version runs but the directly recursive version does not:

> options(expressions=50)
> f <- function(x) if(x > 0) f(x-1)
> system.time(f(1000))
Error: C stack usage is too close to the limit
Error: C stack usage is too close to the limit
> f1 <- function(x) x-1
> f <- function(x) f1(x)
> system.time(f(1000))
   user  system elapsed
      0       0       0



Re: [R] [R-sig-hpc] recursion depth limitations

2012-12-14 Thread Simon Urbanek

On Dec 14, 2012, at 6:25 AM, Suzen, Mehmet wrote:

 On 14 December 2012 12:13, Suzen, Mehmet su...@acm.org wrote:
 On 13 December 2012 23:21, Rui Barradas ruipbarra...@sapo.pt wrote:
 But it does, each recursive call will load another copy of the function, and
 another copy of the variables used.
 In fact, the cost can become quite large since everything is loaded in
 memory again.
 
 Hope this helps,
 
 
 Many thanks for the replies.
 
 What about tail-recursion?  I have seen that there were discussions
 about this:
 
 https://stat.ethz.ch/pipermail/r-help/2006-April/102873.html
 
 Since it was 6 years ago, maybe now things are little different.
 
 
 Isn't it logical to translate any recursive function to tail-recursive
 internally? While tail-recursive
 version returns a value but the operation is essentially the same. I don't 
 know
 how difficult to do it generically but maybe there is an advantage of
 keeping recursion
 as it is. What would that advantage be?
 
 For example, tail recursion would run but not the recursive version:
 
 > options(expressions=50)
 > f <- function(x) if(x > 0) f(x-1)
 > system.time(f(1000))
 Error: C stack usage is too close to the limit
 Error: C stack usage is too close to the limit
 > f1 <- function(x) x-1
 > f <- function(x) f1(x)
 > system.time(f(1000))
    user  system elapsed
       0       0       0
 
 

You may be a bit misinformed about what tail recursion means - it still needs to evaluate the function for each recursion step; the only difference is that in such a special case there is no additional information that needs to be stored -- and you have also proven why it's not as simple as you think:

> f <- function(x) if(x > 0) f(x-1)
> (f(100))
NULL

> f1 <- function(x) x-1
> f <- function(x) f1(x)
> (f(100))
[1] 99

Your latter version doesn't actually do anything anywhere close to the 
recursion you defined. R doesn't have tail recursion elimination and as Thomas 
pointed out in the cited post it would break some existing things if it did 
(call frame introspection etc.). [Whether it would be ok to break it for the 
sake of optimization is another question]

Note that in all such cases you can easily write the problem in an iterative fashion (admittedly it is less elegant -- but if you are dealing with such deep structures, chances are that you care about efficiency and will want to go to native code anyway).
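
A small illustration of that iterative rewrite (not part of Simon's message):

f.iter <- function(x) {          # same countdown as f(), written as a loop
  while (x > 0) x <- x - 1       # constant stack depth regardless of x
  invisible(x)
}
f.iter(1e6)                      # completes without approaching the C stack limit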

Cheers,
Simon



[R] Problem loading .Rdata file

2012-12-14 Thread ONKELINX, Thierry
Dear all,

I'm having troubles migrating a large matrix from one system to another.

#system 1: Ubuntu 12.04, 64-bit, running R 2.15.2
# do some simulations
# save the simulations
 save(Output, file = "Simulations.Rdata")
#Output is a numeric matrix with 6 columns and about 2M rows.

Use ftp to transfer the Simulations.Rdata file to system 2

#system2: Windows XP, 32-bit running R 2.15.2
#in a clean R session
 load("Simulations.Rdata")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'Simulations.Rdata' has magic number ''
   Use of save versions prior to 2 is deprecated

The load() function works correctly on system 1.

Any ideas how to solve this problem?

Best regards,

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en 
binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is 
door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the 
writer and may not be regarded as stating an official position of INBO, as long 
as the message is not confirmed by a duly signed document.



Re: [R] Problem loading .Rdata file

2012-12-14 Thread Duncan Murdoch

On 12-12-14 8:00 AM, ONKELINX, Thierry wrote:

Dear all,

I'm having troubles migrating a large matrix from one system to another.

#system 1: Ubuntu 12.04, 64-bit, running R 2.15.2
# do some simulations
# save the simulations

save(Output, file = "Simulations.Rdata")

#Output is a numeric matrix with 6 columns and about 2M rows.

Use ftp to transfer the Simulations.Rdata file to system 2

#system2: Windows XP, 32-bit running R 2.15.2
#in a clean R session

load("Simulations.Rdata")

Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file 'Simulations.Rdata' has magic number ''
Use of save versions prior to 2 is deprecated

The load() function works correctly on system 1.

Any ideas how to solve this problem?


I suspect you really did get a corrupted transfer.  Try computing an md5 
checksum before and after the transfer, using tools::md5sum.
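
For example (not part of Duncan's reply), on each system:

tools::md5sum("Simulations.Rdata")   # compare the value on system 1 and system 2
# a mismatch means the transfer (e.g. FTP in ASCII rather than binary mode)
# corrupted the file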


Duncan Murdoch



Best regards,

Thierry





Re: [R] Rprof causing R to crash

2012-12-14 Thread Jon Olav Skoien

Uwe,

I am unfortunately not able to upgrade to R 2.15.2 right now, but I have 
seen a similar problem with several older R versions. If you want to 
test with a shorter script, you can try the lines below. These provoke a 
crash from a fresh R session on my machine (R 2.15.1 Windows 7):


Rprof()
z = 1
for (i in 1:1e8) z = z+1/i

This runs without problems without the call to Rprof(). I was far from 
using all my memory with this call, but you could also check with a 
higher number of loops in case you can run these lines and it is somehow 
memory related. Just for comparison, I tried 1e9 loops after a call to 
Rprof() on one of our Linux servers (R 2.14.0) without any problems.


Jon

On 12-Dec-12 17:06, Uwe Ligges wrote:



On 12.12.2012 00:05, Marian Talbert wrote:
I'm trying to use Rprof() to identify bottlenecks and speed up a particularly slow section of code which reads in a portion of a tif file and compares each of the values to values of predictors used for model fitting. I've written up an example that anyone can run. Generally temp would be a section of a tif read into a data.frame and used later for other processing. The first portion, which just records the time, works in about 6 seconds; the second part causes RGui to immediately close with no error or warning. Any advice on how to get Rprof to work or how to speed up this code would be greatly appreciated. I'm using Windows 7 (which might be my problem) and R version 2.15.0.


The problem is rather the R version: I cannot reproduce errors with a 
recent R.


Uwe Ligges




CalcMESS <- function(tiff.entry, pred.vect){
   f <- sum(pred.vect < tiff.entry)/length(pred.vect)*100
   if(is.na(f)) return(NA)
   if(f==0)
      return((tiff.entry-min(pred.vect))/(max(pred.vect)-min(pred.vect))*100)
   if(0 < f & f <= 50) return(2*f)
   if(50 <= f & f < 100) return(2*(100-f))
   if(f==100)
      return((max(pred.vect)-tiff.entry)/(max(pred.vect)-min(pred.vect))*100)
   else return(NA)
}

train.dat <- data.frame(a=runif(200), b=runif(200), c=runif(200), d=runif(200))

temp <- data.frame(a=runif(13), b=runif(13), c=runif(13), d=runif(13))

pred.rng <- temp
vnames.final.mod <- names(train.dat)
nvars.final <- length(vnames.final.mod)

start.time <- Sys.time()
for(k in 1:nvars.final){
  pred.range <- train.dat[, match(vnames.final.mod[k], names(train.dat))]
  pred.rng[,k] <- mapply(CalcMESS,
                         tiff.entry = temp[, match(vnames.final.mod[k], names(temp))],
                         MoreArgs = list(pred.vect = pred.range))
}
Sys.time() - start.time

Rprof("C:\\temp\\mapply.out")
for(k in 1:nvars.final){
  pred.range <- train.dat[, match(vnames.final.mod[k], names(train.dat))]
  pred.rng[,k] <- mapply(CalcMESS,
                         tiff.entry = temp[, match(vnames.final.mod[k], names(temp))],
                         MoreArgs = list(pred.vect = pred.range))
}
Rprof(NULL)





[R] winBUGS with R2WinBUGS on a mac – path issue

2012-12-14 Thread Martin Batholdy
Hi,

I am trying to execute a WinBUGS model within R on a Mac, using Wine and the R2WinBUGS package.

Now I have a small problem with the path variables: the path to the BUGS directory includes parentheses, and because of that it won't run. The error message is:

sh: -c: line 0: syntax error near unexpected token `('

Is there any way to get WinBUGS to run even with a path that includes parentheses?


thanks for any suggestions!
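
One possible workaround, untested and not from this thread: copy or symlink WinBUGS to a parenthesis-free path and point bugs() at it through its bugs.directory argument. All paths below are hypothetical:

# in a shell, put WinBUGS somewhere without parentheses, e.g.
#   ln -s "/path/to/WinBUGS (1.4)" ~/WinBUGS14
library(R2WinBUGS)
# fit <- bugs(data, inits, parameters.to.save = params,
#             model.file = "model.txt",
#             bugs.directory = "~/WinBUGS14",   # no parentheses in this path
#             useWINE = TRUE)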


Re: [R] Beta-coefficients for ZINB model

2012-12-14 Thread Michael Dewey

At 08:33 14/12/2012, Jeremy Goss wrote:

Dear users,

Does anyone have any idea how to generate standardised beta coefficients
for a ZINB model in R to compare which explanatory variables are having the
greatest impact on the dependent variable?


You might like to look at Ulrike Grömping's work 
on relative importance in regression. Not sure 
how directly this translates into the non-linear case though.


J Stat Soft October 2006 vol 17 number 1



Thanks,
Jeremy



Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html



Re: [R] More efficient use of reshape?

2012-12-14 Thread John Kane
I think David was pointing out that reshape() is not a reshape2 function.  It 
is in the stats package.

I am not sure exactly what you are doing but perhaps something along the lines 
of 

library(reshape2)
mm <- melt(clim.data, id = Cs(yr_frac, yr_mn, AMO, NINO34, SSTA))

is a start?  

I also don't think that the more recent versions of ggplot2 automatically load 
reshape2 so it may be that you are working with a relatively old installation 
of ggplot and reshape?

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C   LC_TIME=en_CA.UTF-8  
 
 [4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8
LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=C LC_NAME=C  LC_ADDRESS=C 
 
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C  
 

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods   base  
   

other attached packages:
[1] lubridate_1.2.0directlabels_2.9   RColorBrewer_1.0-5 gridExtra_0.9.1
stringr_0.6.2 
[6] scales_0.2.3   plyr_1.8   reshape2_1.2.1 ggplot2_0.9.3 

loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0 gtable_0.1.2 
labeling_0.1
[6] MASS_7.3-22  munsell_0.4  proto_0.3-9.2tools_2.15.2






John Kane
Kingston ON Canada


 -Original Message-
 From: natemille...@gmail.com
 Sent: Thu, 13 Dec 2012 09:58:34 -0800
 To: dwinsem...@comcast.net
 Subject: Re: [R] More efficient use of reshape?
 
 Sorry David,
 
 In my attempt to simplify example and just include the code I felt was
 necessary I left out the loading of ggplot2, which then imports reshape2,
 and which was actually used in the code I provided. Sorry to the mistake
 and my misunderstanding of where the reshape function was coming from.
 Should have checked that more carefully.
 
 Thanks,
 Nate
 
 
 On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius
 dwinsem...@comcast.netwrote:
 
 
 On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
 
  Hi all,
 
 I have played a bit with the reshape package and function along with
 melt and cast, but I feel I still don't have a good handle on how
 to
 use them efficiently. Below I have included a application of reshape
 that
 is rather clunky and I'm hoping someone can offer advice on how to use
 reshape (or melt/cast) more efficiently.
 
 
 You do realize that the 'reshape' function is _not_ in the reshape
 package, right? And also that the reshape package has been superseded by
 the reshape2 package?
 
 --
 David.
 
 
 #For this example I am using climate change data available on-line
 
 file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
 clim.data <- read.csv(file, header=TRUE)
 
 library(lubridate)
 library(reshape)
 
 #I've been playing with the lubridate package a bit to work with dates,
 but
 as the climate dataset only uses year and month I have
 #added a day to each entry in the yr_mn column and then used dym
 from
 lubridate to generate the POSIXlt formatted dates in
 #a new column clim.data$date
 
 clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep="")
 clim.data$date <- dym(clim.data$yr_mn)
 
 #Now to the reshape. The dataframe is in a wide format. The columns
 GISS,
 HAD, NOAA, RSS, and UAH are all different sources
 #from which the global temperature anomaly has been calculated since
 1880
 (actually only 1978 for RSS and UAH). What I would like to
 #do is plot the temperature anomaly vs date and use ggplot to facet by
 the
 different data source (GISS, HAD, etc.). Thus I need the
 #data in long format with a date column, a temperature anomaly column,
 and
 a data source column. The code below works, but its
 #really very clunky and I'm sure I am not using these tools as
 efficiently
 as I can.
 
 #The varying=list(3:7) specifies the columns in the dataframe that
 corresponded to the sources (GISS, etc.), though then in the resulting
 #reshaped dataframe the sources are numbered 1-5, so I have to
 reassigned
 their names. In addition, the original dataframe has
 #additional data columns I do not want and so after reshaping I create
 another! dataframe with just the columns I need, and
 #then I have to rename them so that I can keep track of what everything
 is.
 Whew! Not the most elegant of code.
 
 d <- reshape(clim.data, varying=list(3:7), idvar="date",
              v.names="anomaly", direction="long")

 d$time <- ifelse(d$time==1, "GISS", d$time)
 d$time <- ifelse(d$time==2, "HAD", d$time)
 d$time <- ifelse(d$time==3, "NOAA", d$time)
 d$time <- ifelse(d$time==4, "RSS", d$time)
 d$time <- ifelse(d$time==5, "UAH", d$time)

 new.data <- data.frame(d$date, d$time, d$anomaly)
 names(new.data) <- c("date", "source", "anomaly")
 
 I realize this is a mess, though it works. I think with just some help
 on
 how better to work this 

[R] Hello R User

2012-12-14 Thread bibek sharma
Hello R User,
In the sample data given below, time is recorded sequentially for each id. For the analysis, for each id, I would like to set the first recorded time to zero and thereafter take the difference from the previous time; i.e. for ID==1, I would like to see Time = 0, 3, 1, 3, 6. This needs to be applied to a big data set.
Any suggestions are much appreciated!
Thanks,
Bibek

ID  Time
1   3
1   6
1   7
1   10
1   16
2   12
2   18
2   19
2   25
2   28
2   30



Re: [R] Beta-coefficients for ZINB model

2012-12-14 Thread Cade, Brian
I always liked Johan Bring's (1994.  How to standardize regression
coefficients.  Am. Stat. 48(3):209-213.) arguments for standardizing the
beta estimates in a linear model by their partial standard deviations which
equates to comparing ratios of t-statistics between variables to determine
their relative importance.   It seems to me that this approach ought to
work reasonably well for any of the generalized linear models.

Brian
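
A rough sketch of that comparison (not Brian's code), assuming fit is a pscl::zeroinfl object such as the illustrative one sketched earlier in this thread; the component names follow summary.zeroinfl()'s output:

ct <- summary(fit)$coefficients$count             # count-model coefficient table
sort(abs(ct[, "z value"]), decreasing = TRUE)     # rank predictors by |z| as a rough importance measure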


On Fri, Dec 14, 2012 at 8:00 AM, Michael Dewey i...@aghmed.fsnet.co.ukwrote:

 At 08:33 14/12/2012, Jeremy Goss wrote:

 Dear users,

 Does anyone have any idea how to generate standardised beta coefficients
 for a ZINB model in R to compare which explanatory variables are having
 the
 greatest impact on the dependent variable?


 You might like to look at Ulrike Grömping's work on relative importance in
 regression. Not sure how directly this translates into the non-linear case
 though.

 J Stat Soft October 2006 vol 17 number 1


  Thanks,
 Jeremy



 Michael Dewey
 i...@aghmed.fsnet.co.uk
 http://www.aghmed.fsnet.co.uk/home.html





-- 
Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  brian_c...@usgs.gov
tel:  970 226-9326



[R] Define a custom-built loss function and combine it with nls()

2012-12-14 Thread Christiana Anagnostou
Dear R helpers,

For an allometric analysis (allometric equation y = a*x^b) I would like 
to apply a non-linear regression instead of using log-log 
transformations of the measured parameters x and y and a Model II linear 
regression. Since both of the variables x and y are random, I would like 
to apply a Model II non-linear analog of either Reduced Major Axis or 
Major Axis Regression.

The allometric equation for the non-linear regression model would be: y 
= a*x^b, with y as the dimension of a body part (dependent variable) and 
x a dimension of body size (independent variable). With applying a 
non-linear regression I would like to get estimates for the parameters a 
and b.

Fortunately, I already have a recommended loss function, proposed by Ebert & Russel (1994, J. Theor. Biol.):

loss = abs(y*abs(x-(y/a)^(1/b))-a/(b+1)*abs(x^(b+1)-(y/a)^((b+1)/b)))

How may I define the loss function above and combine it with a 
non-linear regression nls() in R?



Example for a data frame with the measured variables x and y:

x <- 1:10

y <- c(0.3,0.7,0.9,1.3,2.0,2.4,3.3,3.8,5.0,5.8)

d <- data.frame(x,y)

Non-linear regression with the built-in loss function (sum of squared residuals):

nlmodel <- nls(y~a*x^b, data=d, start=list(a=1,b=0.1), trace=T)

Many thanks!!!

Christiana

-- 
Christiana Anagnostou
*
Department of Evolutionary Ecology and Genetics
Zoological Institute
Christian-Albrechts-University of Kiel
Am Botanischen Garten 9
24118 Kiel
Germany
Tel: +49-431-8804145 (work)
Tel: +49-431-5709649 (home)
Fax: +49-431-8802403
Email: canagnos...@zoologie.uni-kiel.de
Web: www.uni-kiel.de/zoologie/evoecogen/




Re: [R] More efficient use of reshape?

2012-12-14 Thread Nathan Miller
Thanks John,

I appreciate your help.

With the help of Dennis Murphy, code along the lines of this

climData_melt <- melt(clim.data, id = 'date',
                      measure = c("GISS", "HAD", "NOAA", "RSS", "UAH"))

actually gets the data into the correct form very simply, and from there it's easy to plot with faceting in ggplot2.

Thanks for helping clear up the reshape and reshape2 issues. I try to make
sure I have those types of things figured out before posting to the list.
Didn't mean to confuse the issue by incorrectly referring to the
packages/functions or their locations.

Thanks,
Nate




[R] Plotting from a matrix, where column 1 is your x value

2012-12-14 Thread masepot
Hello,

I'm very new to R, and have managed to plot xy graphs, but can't seem to
plot a matrix properly.

The first column is my time points, and the following columns 2-4 are the
replicates of my experiment.

I've tried using row.names=1 to make the first column the row names (whereas before I had a 1-31 sequence and then my 1st column), but I can't work out how to tell R that column 1 of my dataset holds the x values for the data in that row.

It's probably really simple, but I can't seem to figure it out!

Many thanks, any help appreciated!

Laura





[R] Problems with displaying graphics in Linux

2012-12-14 Thread Wibowo Arindrarto
Hi R users,

I've previously updated my R installation to R 2.15.2 in my Arch Linux
64 bit. I haven't used it in a while after updating, but now that I'm
using it again, I find that the graphics device is not functioning
properly.

For any plot that I create, R shows a blank window. When I click the window's close button, the plot that it's supposed to show flickers very briefly and then the window closes.

Running dev.list() when the window is visible (without any plots)
shows this output.
X11cairo
   2

It changes to NULL, as expected, if I run it after closing the window.

Does anyone have any pointers on how to fix this?

Thanks in advance,
Bow
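
One thing that may be worth trying, a suggestion not from this thread: switch the X11 device type away from the buffered cairo variants before plotting.

X11.options(type = "Xlib")   # or type = "nbcairo"; "cairo" is the default
plot(1:10)                   # test plot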



[R] COZIGAM removed from CRAN

2012-12-14 Thread Raeanne Miller
Hi All,

Another quick question - I noticed that COZIGAM has been removed from CRAN, and 
that you are referred to the archive for previous versions (last updated 23 
July 2012). Is this package still ok to use? Or is there an alternative which 
might also fit zero-inflated GAMs?

Thanks,

Raeanne
The Scottish Association for Marine Science (SAMS) is registered in Scotland as 
a Company Limited by Guarantee (SC009292) and is a registered charity (9206). 
SAMS has an actively trading wholly owned subsidiary company: SAMS Research 
Services Ltd a Limited Company (SC224404). All Companies in the group are 
registered in Scotland and share a registered office at Scottish Marine 
Institute, Oban Argyll PA37 1QA. The content of this message may contain 
personal views which are not the views of SAMS unless specifically stated. 
Please note that all email traffic is monitored for purposes of security and 
spam filtering. As such individual emails may be examined in more detail.



[R] plotting log regression

2012-12-14 Thread mrkooper
I want to plot the regression line of a log regression onto a plot of my normal, non-log data.

For example:

x <- (1,2,3,4,5)
y <- (6,7,8,9,10)

plot (x,y)

I tried a log-regression by

a <- glm(log(y) ~ log(x))

and then I tried to add the result to my graph, where the original values are shown:

abline(a, col="red")

Unfortunately it does not work.






Re: [R] [R-sig-hpc] recursion depth limitations

2012-12-14 Thread Suzen, Mehmet
On 13 December 2012 23:21, Rui Barradas ruipbarra...@sapo.pt wrote:
 But it does, each recursive call will load another copy of the function, and
 another copy of the variables used.
 In fact, the cost can become quite large since everything is loaded in
 memory again.

 Hope this helps,


Many thanks for the replies.

What about tail-recursion?  I have seen that there were discussions
about this:

https://stat.ethz.ch/pipermail/r-help/2006-April/102873.html

Since it was 6 years ago, maybe now things are a little different.

Best
-m



Mehmet



Re: [R] Combined Marimekko/heatmap

2012-12-14 Thread Thomas Stewart
Neal-

Perhaps the following code is a start for what you want.

-tgs

par(mar=c(1,1,1,1),
    oma = c(0,0,0,0),
    mgp=c(1.5,.2,0),
    tcl=0,
    cex.axis=.75,
    col.axis="black",
    pch=16)

Z <- textConnection(
"country A1 A2 A3
A 3 4 5
B 6 9 8
C 6 9 5")
ddd <- read.table(Z, header=TRUE)
close(Z)


CountryPcts <- rowSums(ddd[,-1]) / sum(ddd[,-1])

plot.new()
plot.window(ylim=c(0,1), xlim=c(0,1), xaxs = 'i', yaxs='i', las=1)
box()
abline(h = cumsum(CountryPcts), lwd=2)

labxats <- NULL
vlines <- ddd[,-1] / sum(ddd[,-1]) / CountryPcts
vlines <- t(apply(vlines, 1, cumsum))
yyy <- c(0, rep(cumsum(CountryPcts), each=2))
yyy <- head(yyy, -1)
for(i in 1:nrow(ddd)){
  xxx <- rep(vlines[,i], each=2)
  lines(xxx, yyy, col="red", lwd=3)
  labxats[i] <- rev(xxx)[1]
}

labxats <- (labxats + c(0, head(labxats,-1)))/2
labyats <- (cumsum(CountryPcts) + c(0, head(cumsum(CountryPcts),-1)))/2
axis(2, at=labyats, labels = ddd[,1], las=1)
axis(3, at=labxats, labels = colnames(ddd)[-1], las=1)


On Thu, Dec 13, 2012 at 6:09 PM, Neal Humphrey nhumph...@clasponline.orgwrote:

 Hi all,

 I'm trying to figure out a way to create a data graphic that I haven't
 ever seen an example of before, but hopefully there's an R package out
 there for it. The idea is to essentially create a heatmap, but to allow
 each column and/or row to be a different width, rather than having uniform
 column and row height. This is sort of like a Marimekko chart in
 appearance, except that rather than use a single color to represent the
 category, the color represents a value and all the y-axis heights in each
 column line up with each other. That way color represents one variable,
 while the area of the cell represents another.

 In my application, my heatmap has discrete categorical data rather than
 continuous. Rows are countries, columns are appliances, and I want to scale
 the width and height of each column to be the fraction of global energy
 consumed by the country and the fraction of energy use consumed by that
 appliance type. The color coding would then indicate whether or not that
 appliance is regulated in that country.

 Any ideas how to make such a chart, or even what it might be called?


 Neal Humphrey
 nhumph...@clasponline.org



Re: [R] Hello R User

2012-12-14 Thread arun
HI,
Try this:
dat1 <- read.table(text="
ID    Time
1     3
1     6
1     7
1     10
1     16
2     12
2     18
2     19
2     25
2     28
2     30
", sep="", header=TRUE)
dat1$Time1 <- ave(dat1$Time, dat1$ID, FUN=function(x) c(0, diff(x)))
head(dat1, 3)
#  ID Time Time1
#1  1    3     0
#2  1    6     3
#3  1    7     1

#or
dat2 <- unsplit(lapply(split(dat1, dat1$ID), function(x) {x$Time <- c(0, diff(x[,2])); return(x)}), dat1$ID)
head(dat2, 3)
#  ID Time
#1  1    0
#2  1    3
#3  1    1
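
For a really big data set, an additional option not in this reply is the data.table package, which does the same grouped difference quickly:

library(data.table)
DT <- as.data.table(dat1[, c("ID", "Time")])   # dat1 as defined above
DT[, Time1 := c(0, diff(Time)), by = ID]       # grouped first differences
head(DT, 3)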
A.K.




- Original Message -
From: bibek sharma mbhpat...@gmail.com
To: R help r-help@r-project.org
Cc: 
Sent: Friday, December 14, 2012 10:51 AM
Subject: [R] Hello R User

Hello R User,
In the sample data given below, time is recorded for each id
subsequently. For the analysis, for each id, I would like to set 1st
recorded time to zero and thereafter find the difference from previous
time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs
to be implemented to big data set.
Any suggestions are much appreciated!
Thanks,
Bibek

ID    Time
1    3
1    6
1    7
1    10
1    16
2    12
2    18
2    19
2    25
2    28
2    30



Re: [R] Hello R User

2012-12-14 Thread Jessica Streicher
> dataset <- data.frame(id=c(1,1,2,3,3,3), time=c(3,5,1,2,4,6))
> dataset
  id time
1  1    3
2  1    5
3  2    1
4  3    2
5  3    4
6  3    6
> ids <- unique(dataset$id)
> for(id in ids){
+   dataset$time[dataset$id==id] <- c(0, diff(dataset$time[dataset$id==id]))
+ }
> dataset
  id time
1  1    0
2  1    2
3  2    0
4  3    0
5  3    2
6  3    2

might not be the fastest though.


On 14.12.2012, at 16:51, bibek sharma wrote:

 Hello R User,
 In the sample data given below, time is recorded for each id
 subsequently. For the analysis, for each id, I would like to set 1st
 recorded time to zero and thereafter find the difference from previous
 time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs
 to be implemented to big data set.
 Any suggestions are much appreciated!
 Thanks,
 Bibek
 
 IDTime
 1 3
 1 6
 1 7
 1 10
 1 16
 2 12
 2 18
 2 19
 2 25
 2 28
 2 30
 


Re: [R] Hello R User

2012-12-14 Thread Eik Vettorazzi
Hi Bibek,
how about this?

dta <- read.table(textConnection("ID  Time
1   3
1   6
1   7
1   10
1   16
2   12
2   18
2   19
2   25
2   28
2   30"), header=TRUE)

dta$delta <- with(dta, ave(Time, ID, FUN=function(x) c(0, diff(x))))
dta

hth.

Am 14.12.2012 16:51, schrieb bibek sharma:
 Hello R User,
 In the sample data given below, time is recorded for each id
 subsequently. For the analysis, for each id, I would like to set 1st
 recorded time to zero and thereafter find the difference from previous
 time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs
 to be implemented to big data set.
 Any suggestions are much appreciated!
 Thanks,
 Bibek
 
 IDTime
 1 3
 1 6
 1 7
 1 10
 1 16
 2 12
 2 18
 2 19
 2 25
 2 28
 2 30
 


Re: [R] Plotting from a matrix, where column 1 is your x value

2012-12-14 Thread Rui Barradas

Hello,

There are several ways of doing this; perhaps the easiest is with ?matplot. You should provide a data example, as the posting guide says. Since you haven't, I've made up some.


y <- replicate(3, cumsum(rnorm(10)))
x <- matrix(c(1:10, y), ncol = 4)

matplot(x[, 1], x[, -1], type = "l")


Hope this helps,

Rui Barradas

Em 14-12-2012 11:28, masepot escreveu:

Hello,

I'm very new to R, and have managed to plot xy graphs, but can't seem to
plot a matrix properly.

The first column is my time points, and the following columns 2-4 are the
replicates of my experiment.

I've tried using row.names=1 to make the first column the value for that row
(whereas before I had 1-31 sequence) and then my 1st column) but I can't
work out how to tell it that column 1 of my dataset is the xvalue for the
data in that row?

It's probably really simple, but I can't seem to figure it out!

Many thanks, any help appreciated!

Laura





Re: [R] plotting log regression

2012-12-14 Thread Rui Barradas

Hello,

How can you expect to see the fit line if you are plotting x and y values, not their logarithms?

And your definitions of x and y are wrong, they should use c().

x <- c(1,2,3,4,5)
y <- c(6,7,8,9,10)

plot(log(x), log(y))

fit <- lm(log(y) ~ log(x))  # Same as glm
abline(fit)
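
And if you want the fitted curve drawn on the original, untransformed plot (an addition, not part of the reply above), back-transform the predictions:

plot(x, y)
xx <- seq(min(x), max(x), length.out = 100)
lines(xx, exp(predict(fit, newdata = data.frame(x = xx))), col = "red")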


Please read R-intro, in the doc folder of your installation of R.

Hope this helps,

Rui Barradas
Em 14-12-2012 13:58, mrkooper escreveu:

I want to plot the regression line of a log regression into a plot with my
normal, nonlog, data.

for example

x - (1,2,3,4,5)
y - (6,7,8,9,10)

plot (x,y)

I tried a log-regression by

a - glm (log(y) ~ log(x))

and then i tried to insert the answer to my graph, where the standard values
are shown:
abline (a, col=red)

unfortunately it does not work






Re: [R] Define a custom-built loss function and combine it with nls()

2012-12-14 Thread Bert Gunter
Inline.
Note: My comments subject to confirmation by true experts.

On Thu, Dec 13, 2012 at 11:09 PM, Christiana Anagnostou 
canagnos...@zoologie.uni-kiel.de wrote:

 Dear R helpers,

 For an allometric analysis (allometric equation y = a*x^b) I would like
 to apply a non-linear regression instead of using log-log
 transformations of the measured parameters x and y and a Model II linear
 regression. Since both of the variables x and y are random, I would like
 to apply a Model II non-linear analog of either Reduced Major Axis or
 Major Axis Regression.

 The allometric equation for the non-linear regression model would be: y
 = a*x^b, with y as the dimension of a body part (dependent variable) and
 x a dimension of body size (independent variable). With applying a
 non-linear regression I would like to get estimates for the parameters a
 and b.

 Fortunately, I already have a recommended loss function, proposed by
 Ebert  Russel 1994, J. Theor. Biol.:

 loss = abs(y*abs(x-(y/a)^(1/b))-a/(b+1)*abs(x^(b+1)-(y/a)^((b+1)/b)))

 How may I define the loss function above and combine it with a
 non-linear regression nls() in R?


-- I doubt that you can.
You'll need to use an explicit optimization function. See ?optim and the CRAN Optimization task view. Note that you cannot use a gradient-based optimizer, as your function is not continuously differentiable (due to abs()).
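
A sketch of that route (not Bert's code): sum the Ebert & Russel loss over the example data and minimise it with the derivative-free Nelder-Mead method, starting from the values used in the nls() call of the original post.

x <- 1:10
y <- c(0.3, 0.7, 0.9, 1.3, 2.0, 2.4, 3.3, 3.8, 5.0, 5.8)

loss.fn <- function(par) {                 # total loss for parameters c(a, b)
  a <- par[1]; b <- par[2]
  val <- sum(abs(y * abs(x - (y/a)^(1/b)) -
                 a/(b + 1) * abs(x^(b + 1) - (y/a)^((b + 1)/b))))
  if (!is.finite(val)) return(1e10)        # keep the search away from invalid a, b
  val
}

fit2 <- optim(c(a = 1, b = 0.1), loss.fn)  # Nelder-Mead by default
fit2$par                                   # estimates of a and b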

-- Bert




 Example for a data frame with the measured variables x and y:

 x-1:10

 y-c(0.3,0.7,0.9,1.3,2.0,2.4,3.3,3.8,5.0,5.8)

 d-data.frame(x,y)

 Non-linear regression with inbuilt loss-function (sum of squared
 residuals):

 nlmodel-nls(y~a*x^b,data=d,start=list(a=1,b=0.1),trace=T)

 Many thanks!!!

 Christiana

 --
 Christiana Anagnostou
 *
 Department of Evolutionary Ecology and Genetics
 Zoological Institute
 Christian-Albrechts-University of Kiel
 Am Botanischen Garten 9
 24118 Kiel
 Germany
 Tel: +49-431-8804145 (work)
 Tel: +49-431-5709649 (home)
 Fax: +49-431-8802403
 Email: canagnos...@zoologie.uni-kiel.de
 Web: www.uni-kiel.de/zoologie/evoecogen/






-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] COZIGAM removed from CRAN

2012-12-14 Thread Uwe Ligges



On 14.12.2012 16:55, Raeanne Miller wrote:

Hi All,

Another quick question - I noticed that COZIGAM has been removed from CRAN, and 
that you are referred to the archive for previous versions (last updated 23 
July 2012). Is this package still ok to use? Or is there an alternative which 
might also fit zero-inflated GAMs?


Please ask the package maintainer for details.

The CRAN team decides to archive packages on maintainer request, if the 
package appears to be unmaintained (no reaction to CRAN requests), if 
problems in the packages are not fixed after given deadlines passed, or 
possibly also for other reasons.


Best,
Uwe Ligges





Thanks,

Raeanne


Re: [R] Combined Marimekko/heatmap

2012-12-14 Thread Neal Humphrey
Thomas, 

This is a big help for getting me started. I'm brand new to R, so I'm unfamiliar with how to 'manually' draw graphs (instead of using packages).

The graph your code makes is more like a Marimekko chart. What I'm thinking of is like a heatmap, but each row has a different height and each column has a different width. For any particular column, the width is the same all the way down.

I used your code to figure out how to draw lines on a chart the way I need this 
to look. Now what I need to figure out is how to add color coding to the 
respective squares. In my example it's binary data (yes/no), but a more robust 
approach would allow for a true heatmap. 

Any help on getting this to the next step of color coding would be much 
appreciated! The code below will draw a grid that is scaled per the sample data 
on the x and y axes. Perhaps what is needed is to draw boxes rather than lines 
to make the grid? I'm not sure. 

Neal

par(mar=c(1,1,1,1),
    oma = c(0,0,0,0),
    mgp=c(1.5,.2,0),
    tcl=0,
    cex.axis=.75,
    col.axis="black",
    pch=16)

#Input the sample data---
Z <- textConnection(
"country TWh
Australia 244
Canada 522
China 3503
EU 2500
India 689.5
Japan 997
USA 3960
")
CountryEnergy <- read.table(Z, header=TRUE)
close(Z)

Z <- textConnection(
"Product TWh
Freezer 72.4
Refrigerator 379
HVAC 466
Lighting 123
Television 152
")
ProductEnergy <- read.table(Z, header=TRUE)
close(Z)

#-Binary data indicating whether that country/product combination has a standard
#-Rows correspond to countries (in the same order as the CountryEnergy matrix)--
#-Columns correspond to products (in the same order as the ProductEnergy matrix)
Z <- textConnection(
"0 1 0 0 1
0 1 1 1 0
1 0 1 0 0
1 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 0 0
")
ddd <- read.table(Z, header=FALSE)
close(Z)

#---rewrite the data table so that the vector is numbers only, and label the rows--
row.names(CountryEnergy) <- CountryEnergy$country
CountryEnergy <- CountryEnergy[,2:2]
row.names(ProductEnergy) <- ProductEnergy$Product
ProductEnergy <- ProductEnergy[,2:2]


#---plot the grid
plot.new()
plot.window(ylim=c(0,sum(CountryEnergy)), xlim=c(0,sum(ProductEnergy)),
            xaxs = 'i', yaxs='i', las=1)
box()
abline(h = cumsum(CountryEnergy), lwd=2, col="gray60") #lwd - line width
abline(v = cumsum(ProductEnergy), lwd=2, col="gray60")

labxats <- NULL

#--Use ddd data to code the cells as yes/no for having a 
standard-
#
#
#  this is the part I need help with
#
#
#
#--
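
One way to do the colour-coding step, a rough sketch rather than a tested solution: draw each cell with rect(), taking the fill colour from the corresponding 0/1 entry of ddd.

xleft   <- c(0, head(cumsum(ProductEnergy), -1))
xright  <- cumsum(ProductEnergy)
ybottom <- c(0, head(cumsum(CountryEnergy), -1))
ytop    <- cumsum(CountryEnergy)
cols    <- c("white", "steelblue")          # 0 = no standard, 1 = standard
for (i in seq_along(ytop)) {                # rows = countries
  for (j in seq_along(xright)) {            # columns = products
    rect(xleft[j], ybottom[i], xright[j], ytop[i],
         col = cols[ddd[i, j] + 1], border = "gray60")
  }
}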


Neal Humphrey
nhumph...@clasponline.org

From: Thomas Stewart [mailto:t...@live.unc.edu] 
Sent: Friday, December 14, 2012 10:36 AM
To: Neal Humphrey
Cc: r-help@r-project.org
Subject: Re: [R] Combined Marimekko/heatmap


Re: [R] Rprof causing R to crash

2012-12-14 Thread Uwe Ligges



On 14.12.2012 14:22, Jon Olav Skoien wrote:

Uwe,

I am unfortunately not able to upgrade to R 2.15.2 right now, but I have
seen a similar problem with several older R versions. If you want to
test with a shorter script, you can try the lines below. These provoke a
crash from a fresh R session on my machine (R 2.15.1 Windows 7):

Rprof()
z = 1
for (i in 1:1e8) z = z+1/i

This runs without problems without the call to Rprof(). I was far from
using all my memory with this call, but you could also check with a
higher number of loops in case you can run these lines and it is somehow
memory related. Just for comparison, I tried 1e9 loops after a call to
Rprof() on one of our Linux servers (R 2.14.0) without any problems.


Works for me.

Uwe



Jon

On 12-Dec-12 17:06, Uwe Ligges wrote:



On 12.12.2012 00:05, Marian Talbert wrote:

I'm trying to use Rprof() to identify bottlenecks and speed up a
particularly
slow section of code which reads in a portion of a tif file and compares
each of the values to values of predictors used for model fitting.  I've
written up an example that anyone can run.  Generally temp would be a
section of a tif read into a data.frame and used later for other
processing.
The first portion which just records the time works in about 6
seconds the
second part causes RGui to immediately close with no error or
warning.  Any
advice on how to get Rprof to work or how to speed up this code would be
greatly appreciated.  I'm using Windows 7 (which might be my problem)
and R
version 2.15.0.


The problem is rather the R version: I cannot reproduce errors with a
recent R.

Uwe Ligges




CalcMESS <- function(tiff.entry, pred.vect){
   f <- sum(pred.vect < tiff.entry)/length(pred.vect)*100
   if(is.na(f)) return(NA)
   if(f==0)
return((tiff.entry-min(pred.vect))/(max(pred.vect)-min(pred.vect))*100)
   if(0 < f && f <= 50) return(2*f)
   if(50 <= f && f < 100) return(2*(100-f))
   if(f==100)
return((max(pred.vect)-tiff.entry)/(max(pred.vect)-min(pred.vect))*100)
   else return(NA)
}

train.dat <-
data.frame(a=runif(200),b=runif(200),c=runif(200),d=runif(200))
temp <-
data.frame(a=runif(13),b=runif(13),c=runif(13),d=runif(13))

pred.rng <- temp
vnames.final.mod <- names(train.dat)
nvars.final <- length(vnames.final.mod)

start.time <- Sys.time()
  for(k in 1:nvars.final){

pred.range <- train.dat[,match(vnames.final.mod[k],names(train.dat))]

pred.rng[,k] <- mapply(CalcMESS,tiff.entry=temp[,match(vnames.final.mod[k],names(temp))],MoreArgs=list(pred.vect=pred.range))

  }
Sys.time()-start.time


Rprof("C:\\temp\\mapply.out")
  for(k in 1:nvars.final){

pred.range <- train.dat[,match(vnames.final.mod[k],names(train.dat))]

pred.rng[,k] <- mapply(CalcMESS,tiff.entry=temp[,match(vnames.final.mod[k],names(temp))],MoreArgs=list(pred.vect=pred.range))

  }
Rprof(NULL)
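# Once Rprof() runs to completion, the collected profile can be summarised with:
# summaryRprof("C:\\temp\\mapply.out")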



--
View this message in context:
http://r.789695.n4.nabble.com/Rprof-causing-R-to-crash-tp4652846.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rprof causing R to crash

2012-12-14 Thread Prof Brian Ripley

On 14/12/2012 13:22, Jon Olav Skoien wrote:

Uwe,

I am unfortunately not able to upgrade to R 2.15.2 right now, but I have


Why not?   Note that is part of the R-help contract: we only offer any 
support for the current version of R (see the posting guide).


The posting guide also asked for 'at a minimum' information: we do not 
know (amongst other things) if this is 32- or 64-bit R.  (Your example 
works for me in both.)



seen a similar problem with several older R versions. If you want to
test with a shorter script, you can try the lines below. These provoke a
crash from a fresh R session on my machine (R 2.15.1 Windows 7):

Rprof()
z = 1
for (i in 1:1e8) z = z+1/i

This runs without problems without the call to Rprof(). I was far from
using all my memory with this call, but you could also check with a
higher number of loops in case you can run these lines and it is somehow
memory related. Just for comparison, I tried 1e9 loops after a call to
Rprof() on one of our Linux servers (R 2.14.0) without any problems.


Which is irrelevant: a different version of R and a completely different 
implementation of Rprof.




Jon

On 12-Dec-12 17:06, Uwe Ligges wrote:



On 12.12.2012 00:05, Marian Talbert wrote:

I'm trying to use Rprof() to identify bottlenecks and speed up a
particularly
slow section of code which reads in a portion of a tif file and compares
each of the values to values of predictors used for model fitting.  I've
written up an example that anyone can run.  Generally temp would be a
section of a tif read into a data.frame and used later for other
processing.
The first portion which just records the time works in about 6
seconds the
second part causes RGui to immediately close with no error or
warning.  Any
advice on how to get Rprof to work or how to speed up this code would be
greatly appreciated.  I'm using Windows 7 (which might be my problem)
and R
version 2.15.0.


The problem is rather the R version: I cannot reproduce errors with a
recent R.

Uwe Ligges




CalcMESS <- function(tiff.entry, pred.vect){
   f <- sum(pred.vect < tiff.entry)/length(pred.vect)*100
   if(is.na(f)) return(NA)
   if(f==0)
return((tiff.entry-min(pred.vect))/(max(pred.vect)-min(pred.vect))*100)
   if(0 < f && f <= 50) return(2*f)
   if(50 <= f && f < 100) return(2*(100-f))
   if(f==100)
return((max(pred.vect)-tiff.entry)/(max(pred.vect)-min(pred.vect))*100)
   else return(NA)
}

train.dat <-
data.frame(a=runif(200),b=runif(200),c=runif(200),d=runif(200))
temp <-
data.frame(a=runif(13),b=runif(13),c=runif(13),d=runif(13))

pred.rng <- temp
vnames.final.mod <- names(train.dat)
nvars.final <- length(vnames.final.mod)

start.time <- Sys.time()
  for(k in 1:nvars.final){

pred.range <- train.dat[,match(vnames.final.mod[k],names(train.dat))]

pred.rng[,k] <- mapply(CalcMESS,tiff.entry=temp[,match(vnames.final.mod[k],names(temp))],MoreArgs=list(pred.vect=pred.range))

  }
Sys.time()-start.time


Rprof("C:\\temp\\mapply.out")
  for(k in 1:nvars.final){

pred.range <- train.dat[,match(vnames.final.mod[k],names(train.dat))]

pred.rng[,k] <- mapply(CalcMESS,tiff.entry=temp[,match(vnames.final.mod[k],names(temp))],MoreArgs=list(pred.vect=pred.range))

  }
Rprof(NULL)



--
View this message in context:
http://r.789695.n4.nabble.com/Rprof-causing-R-to-crash-tp4652846.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] A question on list and lapply

2012-12-14 Thread Christofer Bogaso

Dear all, let's say I have the following list:

Dat <- vector("list", length = 26)
names(Dat) <- LETTERS
My_Function <- function(x) return(rnorm(5))
Dat1 <- lapply(Dat, My_Function)


However I want to apply my function 'My_Function' to all elements of 
'Dat' except the elements having 'names(Dat) == "P"'. Here I have 
specified the name "P" just for illustration; however, this will be some 
name specified by the user.


Is there any direct way to achieve this, using 'lapply'?

Thanks for your help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A question on list and lapply

2012-12-14 Thread Sarah Goslee
What about:
lapply(Dat[names(Dat) != "P"], My_Function)

You could use %in% if you actually want to match a longer set of names.
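
For example, a minimal sketch (with a hypothetical user-supplied vector of
names to skip, here called 'skip'):

skip <- c("P", "Q")
Dat1 <- lapply(Dat[!names(Dat) %in% skip], My_Function)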

Sarah


On Fri, Dec 14, 2012 at 1:58 PM, Christofer Bogaso
bogaso.christo...@gmail.com wrote:
 Dear all, let say I have following list:

 Dat <- vector("list", length = 26)
 names(Dat) <- LETTERS
 My_Function <- function(x) return(rnorm(5))
 Dat1 <- lapply(Dat, My_Function)


 However I want to apply my function 'My_Function' for all elements of 'Dat'
 except the elements having 'names(Dat) == P'. Here I have specified the
 name P just for illustration however this will be some name specified by
 user.

 Is there any direct way to achieve this, using 'lapply'?

 Thanks for your help.


--
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A question on list and lapply

2012-12-14 Thread Mark Lamias
How about 

Dat1 <- lapply(subset(Dat, Dat != "P"), My_Function)

--Mark





 From: Christofer Bogaso bogaso.christo...@gmail.com
To: r-help@r-project.org 
Sent: Friday, December 14, 2012 1:58 PM
Subject: [R] A question on list and lapply
 
Dear all, let say I have following list:

Dat <- vector("list", length = 26)
names(Dat) <- LETTERS
My_Function <- function(x) return(rnorm(5))
Dat1 <- lapply(Dat, My_Function)


However I want to apply my function 'My_Function' for all elements of 'Dat' 
except the elements having 'names(Dat) == P'. Here I have specified the name 
P just for illustration however this will be some name specified by user.

Is there any direct way to achieve this, using 'lapply'?

Thanks for your help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hello R User

2012-12-14 Thread arun
Hi,

You could also use library(data.table) to do this faster.
dat1 <- read.table(text="
ID    Time
1    3
1    6
1    7
1    10
1    16
2    12
2    18
2    19
2    25
2    28
2    30
", sep="", header=TRUE)
library(data.table)
dat2 <- data.table(dat1)
res <- dat2[,Time1:=c(0,diff(Time)),by=ID]
 head(res,3)
 #  ID Time Time1
#1:  1    3 0
#2:  1    6 3
#3:  1    7 1

#Comparing different approaches:
set.seed(55)
dat3 <- data.frame(ID=rep(1:1000,each=500),Value=sample(1:800,5e5,replace=TRUE))
dat4 <- data.table(dat3)
system.time(dat3$Value1 <- ave(dat3$Value,dat3$ID,FUN=function(x) c(0,diff(x))))
#   user  system elapsed 
 # 0.312   0.000   0.313 

ids <- unique(dat3$ID)
 system.time({
   for(id in ids){
   dat3$Value[dat3$ID==id] <- c(0,diff(dat3$Value[dat3$ID==id]))
   } })
#   user  system elapsed 
# 36.938   0.868  37.873 

system.time(dat5 <- dat4[,Value1:=c(0,diff(Value)),by=ID])
#   user  system elapsed 
 # 0.036   0.000   0.037 
head(dat5)
#   ID Value Value1
#1:  1   439  0
#2:  1   175   -264
#3:  1    28   -147
#4:  1   634    606
#5:  1   449   -185
#6:  1    60   -389
 head(dat3)
#  ID Value Value1
#1  1 0  0
#2  1  -264   -264
#3  1  -147   -147
#4  1   606    606
#5  1  -185   -185
#6  1  -389   -389

A.K.












- Original Message -
From: bibek sharma mbhpat...@gmail.com
To: R help r-help@r-project.org
Cc: 
Sent: Friday, December 14, 2012 10:51 AM
Subject: [R] Hello R User

Hello R User,
In the sample data given below, time is recorded for each id
subsequently. For the analysis, for each id, I would like to set 1st
recorded time to zero and thereafter find the difference from previous
time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs
to be implemented to big data set.
Any suggestions are much appreciated!
Thanks,
Bibek

ID    Time
1    3
1    6
1    7
1    10
1    16
2    12
2    18
2    19
2    25
2    28
2    30

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hello R User

2012-12-14 Thread arun
HI,
Try:
dat2[,Time1:=c(0,diff(Time)),by=ID]
dat2[,CumSTime1:=cumsum(Time1),by=ID]


  head(dat2,4)
#   ID Time Time1 CumSTime1
#1:  1    3 0 0
#2:  1    6 3 3
#3:  1    7 1 4
#4:  1   10 3 7
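#The two steps could also be done in a single call, e.g. (a sketch):
# dat2[,CumSTime1:=cumsum(c(0,diff(Time))),by=ID]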
A.K.

- Original Message -
From: bibek sharma mbhpat...@gmail.com
To: arun smartpink...@yahoo.com
Cc: 
Sent: Friday, December 14, 2012 12:56 PM
Subject: Re: [R] Hello R User

Hi Arun,
Great!
Once we get time1, I then want to add each time to its previous value, for
example to get 0, 3, 4, etc.
Can I have a suggestion?

On Fri, Dec 14, 2012 at 9:42 AM, arun smartpink...@yahoo.com wrote:
 Hi,

 You could also use library(data.table) to do this faster.
 dat1 <- read.table(text="
 ID    Time
 1    3
 1    6
 1    7
 1    10
 1    16
 2    12
 2    18
 2    19
 2    25
 2    28
 2    30
 ", sep="", header=TRUE)
 library(data.table)
 dat2 <- data.table(dat1)
 res <- dat2[,Time1:=c(0,diff(Time)),by=ID]
  head(res,3)
  #  ID Time Time1
 #1:  1    3     0
 #2:  1    6     3
 #3:  1    7     1

 #Comparing different approaches:
 set.seed(55)
 dat3 <- data.frame(ID=rep(1:1000,each=500),Value=sample(1:800,5e5,replace=TRUE))
 dat4 <- data.table(dat3)
 system.time(dat3$Value1 <- ave(dat3$Value,dat3$ID,FUN=function(x) c(0,diff(x))))
 #   user  system elapsed
  # 0.312   0.000   0.313

 ids <- unique(dat3$ID)
  system.time({
    for(id in ids){
    dat3$Value[dat3$ID==id] <- c(0,diff(dat3$Value[dat3$ID==id]))
    } })
 #   user  system elapsed
 # 36.938   0.868  37.873

 system.time(dat5 <- dat4[,Value1:=c(0,diff(Value)),by=ID])
 #   user  system elapsed
  # 0.036   0.000   0.037
 head(dat5)
 #   ID Value Value1
 #1:  1   439      0
 #2:  1   175   -264
 #3:  1    28   -147
 #4:  1   634    606
 #5:  1   449   -185
 #6:  1    60   -389
  head(dat3)
 #  ID Value Value1
 #1  1     0      0
 #2  1  -264   -264
 #3  1  -147   -147
 #4  1   606    606
 #5  1  -185   -185
 #6  1  -389   -389

 A.K.












 - Original Message -
 From: bibek sharma mbhpat...@gmail.com
 To: R help r-help@r-project.org
 Cc:
 Sent: Friday, December 14, 2012 10:51 AM
 Subject: [R] Hello R User

 Hello R User,
 In the sample data given below, time is recorded for each id
 subsequently. For the analysis, for each id, I would like to set 1st
 recorded time to zero and thereafter find the difference from previous
 time. I.e. for ID==1, I would like to see Time=0,3,1,3,6. This needs
 to be implemented to big data set.
 Any suggestions are much appreciated!
 Thanks,
 Bibek

 ID    Time
 1    3
 1    6
 1    7
 1    10
 1    16
 2    12
 2    18
 2    19
 2    25
 2    28
 2    30

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R-sig-hpc] recursion depth limitations

2012-12-14 Thread Suzen, Mehmet
On 14 December 2012 13:46, Simon Urbanek simon.urba...@r-project.org wrote:
 You may be a bit misinformed about what tail recursion means - it still needs 
 to evaluate the function for each recursion step; the only difference is that 
 in such a special case there is no additional information that needs to be stored 
 -- and you have also proven why it's not as simple as you think:

Thank you Simon for your time and patient reply. I was thinking
calling another function over again is called indirect recursion.

 f <- function(x) if(x > 0) f(x-1)
 (f(100))
NULL

 f1 <- function(x) x-1
 f <- function(x) f1(x);
 (f(100))
[1] 99

 Your latter version doesn't actually do anything anywhere close to the 
 recursion you defined.

I see what you mean. I was confused. My intention was to have 100 calls
in the second one with f1 as well.
Maybe I can make the point by using another example via mutual
recursion, 'fibRping/pong' below.
I was thinking the 'fibRping' version would need less memory than the
plain recursion, which I now understand was mistaken. Every call counts,
then. I had the impression that calling the same function more than once,
as in fibRping/pong, would not create an additional memory requirement...
(Probably the issue is nothing to do with R's internals, though.)

Using Dirk's blog entry (http://dirk.eddelbuettel.com/blog/2011/09/08/) :
## R implementation of recursive Fibonacci sequence
fibR - function(n) {
if (n == 0) return(0)
if (n == 1) return(1)
return (fibR(n - 1) + fibR(n - 2))
}


# Now Mutual recursion
fibRping  - function(n) {
if (n == 0) return(0)
if (n == 1) return(1)
return (fibRpong(n - 1) + fibRpong(n - 2))
}
fibRpong  - function(n) {
if (n == 0) return(0)
if (n == 1) return(1)
return (fibRping(n - 1) + fibRping(n - 2))
}

options(expressions=50)
fibR(5)

> options(expressions=50)
> fibR(5)
Error: C stack usage is too close to the limit
> fibRping(5)
Error: C stack usage is too close to the limit
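
For completeness, an iterative sketch (an illustration, not from Dirk's post)
that computes the same values without deep recursion, and so never hits the C
stack limit:

fibI <- function(n) {
    if (n < 2) return(n)
    a <- 0; b <- 1
    for (i in 2:n) { tmp <- a + b; a <- b; b <- tmp }
    b
}
fibI(5)   # 5, the same value fibR(5) returns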

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] format.pval () and printCoefmat ()

2012-12-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Hi List,

My goal is to force R not to print in scientific notation in the sixth column 
(rel_diff - for the p-value) of my data frame (not a matrix).

I have used the format.pval () and printCoefmat () functions on the data frame. 
The R script is appended below.

The issue is that using the format.pval () and printCoefmat () functions on 
the data frame gives me the desired results, but coerces the character strings 
into NAs for the two character variables, because my object is a data frame, 
not a matrix. Please see the first output below (contrast_level1, 
contrast_level2).

Is there a way I could avoid printing the NAs in the character fields when 
using the format.pval () and printCoefmat () on the data frame?

I would appreciate receiving your help.

Thanks,

Pradip
setwd ("F:/PR1/R_PR1")

load (file = "sigtests_overall_withid.rdata")

#format.pval(tt$p.value, eps=0.0001)

# keep only selected columns from the above data frame
keep_cols1 <- c("contrast_level1", "contrast_level2", "mean_level1",
                "mean_level2", "rel_diff",
                "p_mean", "cohens_d")

#subset the data frame
y0410_1825_mf_alc <- subset (sigtests_overall_withid,
  years=="0410" & age_group=="1825"
  & gender_group=="all" & drug=="alc"
  & contrast_level1=="wh",
  select=keep_cols1)
#change the row.names
row.names (y0410_1825_mf_alc)= 1:dim(y0410_1825_mf_alc)[1]

#force
format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

#print the observations from the sub-data frame
options (width=120,digits=3 )
#y0410_1825_mf_alc

printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

### When format.pval () and printCoefmat () used


   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d

1               NA              NA      18.744      11.911    0.574   0.00    0.175
2               NA              NA      18.744      14.455    0.297   0.00    0.110
3               NA              NA      18.744      13.540    0.384   0.00    0.133
4               NA              NA      18.744       6.002    2.123   0.00    0.333
5               NA              NA      18.744       5.834    2.213   0.00    0.349
6               NA              NA      18.744       7.933    1.363   0.00    0.279
7               NA              NA      18.744      10.849    0.728   0.00    0.203
8               NA              NA      18.744       7.130    1.629   0.00    0.298
9               NA              NA      18.744       9.720    0.928   0.00    0.242
10              NA              NA      18.744       9.600    0.952   0.00    0.242
11              NA              NA      18.744      16.135    0.162   0.17    0.067 .
12              NA              NA      18.744          NA       NA     NA       NA
13              NA              NA      18.744      10.465    0.791   0.00    0.213
14              NA              NA      18.744      15.149    0.237   0.02    0.092 .

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Warning messages:

1: In data.matrix(x) : NAs introduced by coercion

2: In data.matrix(x) : NAs introduced by coercion




### When format.pval () and printCoefmat () not used

   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922



Pradip K. Muhuri, PhD
Statistician
Substance Abuse  Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.govmailto:pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health 

Re: [R] format.pval () and printCoefmat ()

2012-12-14 Thread David Winsemius

On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi List,
 
 My goal is to force R not to print in scientific notation in the sixth column 
 (rel_diff - for the p-value) of my data frame (not a matrix).
 
 I have used the format.pval () and printCoefmat () functions on the data 
 frame. The R script is appended below.
 
 This issue is that use of the format.pval () and printCoefmat () functions on 
 the data frame gives me the desired results, but coerces the character string 
 into NAs for the two character variables, because my object is a data frame, 
 not a matrix. Please see the first output below: contrast_level1 
 contrast_level2).
 
 Is there a way I could have avoid printing the NAs in the character fields

They are probably factor columns.

 when using the format.pval () and printCoefmat () on the data frame?
 
 I would appreciate receiving your help.
 
 Thanks,
 
 Pradip
 setwd (F:/PR1/R_PR1)
 
 load (file = sigtests_overall_withid.rdata)
 
 #format.pval(tt$p.value, eps=0.0001)
 
 # keep only selected columns from the above data frame
 keep_cols1 - c(contrast_level1, contrast_level2,mean_level1,
mean_level2, rel_diff,
  p_mean, cohens_d)
 
 #subset the data frame
 y0410_1825_mf_alc - subset (sigtests_overall_withid,
  years==0410  age_group==1825
   gender_group==all  drug==alc
   contrast_level1==wh,
  select=keep_cols1)
 #change the row.names
 row.names (y0410_1825_mf_alc)= 1:dim(y0410_1825_mf_alc)[1]
 
 #force
 format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

Presumably that call will produce desired results since it is on only one 
column. (I'm not sure why you think format.pval contributed to your NA output.)
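
(Also note that format.pval() simply returns a character vector -- it does not
change the column in place -- so to keep the formatted values you would assign
them back, e.g.
  y0410_1825_mf_alc$p_mean <- format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)
before printing.)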

 
 #print the observations from the sub-data frame
 options (width=120,digits=3 )
 #y0410_1825_mf_alc
 
 printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

Why not use `cbind.data.frame` rather than trying to get `printCoefmat` to do 
something it (apparently) wasn't designed to do?

cbind(  y0410_1825_mf_alc[ 1:2],  
printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, 
eps.Pvalue=0.0001)
 )
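
Or, as a sketch that leaves the character columns untouched: format only the
p-value column and print the data frame as usual.

y_print <- y0410_1825_mf_alc
y_print$p_mean <- format.pval(y_print$p_mean, eps=0.0001, digits=3)
y_print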

-- 
David.

 
 ### When format.pval () and printCoefmat () used
 
 
 contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean 
 cohens_d
 
 1   NA  NA  18.744  11.9110.574   0.00
 0.175
 2   NA  NA  18.744  14.4550.297   0.00
 0.110
 3   NA  NA  18.744  13.5400.384   0.00
 0.133
 4   NA  NA  18.744   6.0022.123   0.00
 0.333
 5   NA  NA  18.744   5.8342.213   0.00
 0.349
 6   NA  NA  18.744   7.9331.363   0.00
 0.279
 7   NA  NA  18.744  10.8490.728   0.00
 0.203
 8   NA  NA  18.744   7.1301.629   0.00
 0.298
 9   NA  NA  18.744   9.7200.928   0.00
 0.242
 10  NA  NA  18.744   9.6000.952   0.00
 0.242
 11  NA  NA  18.744  16.1350.162   0.17
 0.067 .
 12  NA  NA  18.744  NA   NA NA
NA
 13  NA  NA  18.744  10.4650.791   0.00
 0.213
 14  NA  NA  18.744  15.1490.237   0.02
 0.092 .
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 Warning messages:
 1: In data.matrix(x) : NAs introduced by coercion
 2: In data.matrix(x) : NAs introduced by coercion
 
 ### When format.pval () and printCoefmat () not used
 
 contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diffp_mean 
 cohens_d
 1   wh2+hi18.7   11.910.574  1.64e-05 
   0.1753
 2   wh2+rc18.7   14.460.297  9.24e-06 
   0.1101
 3   whaian18.7   13.540.384  9.01e-05 
   0.1335
 4   whasan18.76.002.123 2.20e-119 
   0.3326
 5   whblck18.75.832.213  0.00e+00 
   0.3490
 6   whcsam18.77.931.363  1.27e-47 
   0.2793
 7   wh cub18.7   10.850.728  6.12e-08 
   0.2025
 8   whdmcn18.77.131.629  1.59e-15 
   0.2981
 9   whhisp18.79.720.928 3.27e-125 
   0.2420
 10  wh mex18.79.600.952 8.81e-103 
   0.2420
 11  whnhpi18.7   16.140.162  1.74e-01 
   0.0669
 12  wh 

Re: [R] A question on list and lapply

2012-12-14 Thread arun
Hi,

If you want the list element P to be present as NULL in the result
you could use this:
set.seed(51)
lapply(lapply(Dat,My_Function),function(x) 
{if(names(Dat)[match.call()[[2]][[3]]] %in% "P") NULL else x})
A.K.




- Original Message -
From: Christofer Bogaso bogaso.christo...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, December 14, 2012 1:58 PM
Subject: [R] A question on list and lapply

Dear all, let say I have following list:

Dat <- vector("list", length = 26)
names(Dat) <- LETTERS
My_Function <- function(x) return(rnorm(5))
Dat1 <- lapply(Dat, My_Function)


However I want to apply my function 'My_Function' for all elements of 'Dat' 
except the elements having 'names(Dat) == P'. Here I have specified the name 
P just for illustration however this will be some name specified by user.

Is there any direct way to achieve this, using 'lapply'?

Thanks for your help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A question on list and lapply

2012-12-14 Thread William Dunlap
 lapply(lapply(Dat,My_Function),function(x) 
 {if(names(Dat)[match.call()[[2]][[3]]]%in%
 P) NULL else x})

match.call()[[2]][[3]], gack!

In lapply(X, FUN), FUN is applied to X[[i]], which has lost the names attribute 
that X
may have had.  X[i] retains a part of the names attribute (since it is a 
sublist of X, not an element
of X).  Hence FUN can look at the name associated with X[i] with code like the 
following:
 lapply(seq_along(X), FUN=function(i) { Xi <- X[i] ; names(Xi) })

E.g., to apply one sort of processing to elements named P and another sort to 
those
not named P you can do:
   bases <- list(O="Oak Harbor", P="Pensicola", Q="Quonset Point")
   lapply(seq_along(bases), function(i){ base <- bases[i] ; if (names(base) != 
"P") paste0("(", base, ")") else tolower(base) } )
  [[1]]
  [1] (Oak Harbor)  
  
  [[2]]
  [1] pensicola
  
  [[3]]
  [1] (Quonset Point)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of arun
 Sent: Friday, December 14, 2012 12:11 PM
 To: Christofer Bogaso
 Cc: R help
 Subject: Re: [R] A question on list and lapply
 
 Hi,
 
 If you want the list element P to be present as NULL in the result
 you could use this:
 set.seed(51)
 lapply(lapply(Dat,My_Function),function(x) 
 {if(names(Dat)[match.call()[[2]][[3]]]%in%
 P) NULL else x})
 A.K.
 
 
 
 
 - Original Message -
 From: Christofer Bogaso bogaso.christo...@gmail.com
 To: r-help@r-project.org
 Cc:
 Sent: Friday, December 14, 2012 1:58 PM
 Subject: [R] A question on list and lapply
 
 Dear all, let say I have following list:
 
 Dat - vector(list, length = 26)
 names(Dat) - LETTERS
 My_Function - function(x) return(rnorm(5))
 Dat1 - lapply(Dat, My_Function)
 
 
 However I want to apply my function 'My_Function' for all elements of 'Dat' 
 except the
 elements having 'names(Dat) == P'. Here I have specified the name P just 
 for
 illustration however this will be some name specified by user.
 
 Is there any direct way to achieve this, using 'lapply'?
 
 Thanks for your help.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A question on list and lapply

2012-12-14 Thread William Dunlap
Yes, my example should have either applied lapply to
 structure(seq_along(bases), names=names(bases))
instead of just
 seq_along(bases)
or added the names(bases) to the output of lapply (assuming
one wanted the names of 'bases' on the output).
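
I.e., a sketch of the first variant (which keeps the names on the result):

 res <- lapply(structure(seq_along(bases), names=names(bases)),
               function(i){ base <- bases[i]
                            if (names(base) != "P") paste0("(", base, ")") else tolower(base) })
 names(res)
 # [1] "O" "P" "Q"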

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: arun [mailto:smartpink...@yahoo.com]
 Sent: Friday, December 14, 2012 1:37 PM
 To: William Dunlap
 Cc: R help
 Subject: Re: [R] A question on list and lapply
 
 HI,
 
 By applying:
 bases
 #$O
 #[1] Oak Harbor
 
 #$P
 #[1] Pensicola
 #
 #$Q
 #[1] Quonset Point
 
 res2- lapply(bases,function(x) {if(names(bases)[match.call()[[2]][[3]]]%in% 
 P)
 tolower(x) else paste0((,x,))})
 res2
 #$O
 #[1] (Oak Harbor)
 #
 #$P
 #[1] pensicola
 #
 #$Q
 #[1] (Quonset Point)
 #the names of the list elements are not lost unless, it is named again.
 
 
 res1-lapply(seq_along(bases), function(i){ base - bases[i] ; if 
 (names(base) != P)
 paste0((,base,)) else tolower(base) } )
 names(res1)
 #NULL
  names(res2)
 #[1] O P Q
 A.K.
 
 
 
 - Original Message -
 From: William Dunlap wdun...@tibco.com
 To: arun smartpink...@yahoo.com; Christofer Bogaso
 bogaso.christo...@gmail.com
 Cc: R help r-help@r-project.org
 Sent: Friday, December 14, 2012 3:59 PM
 Subject: RE: [R] A question on list and lapply
 
  lapply(lapply(Dat,My_Function),function(x) 
  {if(names(Dat)[match.call()[[2]][[3]]]%in%
  P) NULL else x})
 
 match.call()[[2]][[3]], gack!
 
 In lapply(X, FUN), FUN is applied to X[[i]], which has lost the names 
 attribute that X
 may have had.  X[i] retains a part of the names attribute (since it is a 
 sublist of X, not an
 element
 of X).  Hence FUN can look at the name associated with X[i] with code like 
 the following:
      lapply(seq_along(X), FUN=function(i) { Xi - X[i] ; names(Xi) })
 
 E.g., to apply one sort of processing to elements named P and another sort 
 to those
 not named P you can do:
    bases - list(O=Oak Harbor,P=Pensicola,Q=Quonset Point)
    lapply(seq_along(bases), function(i){ base - bases[i] ; if (names(base) 
 != P)
 paste0((,base,)) else tolower(base) } )
   [[1]]
   [1] (Oak Harbor)
 
   [[2]]
   [1] pensicola
 
   [[3]]
   [1] (Quonset Point)
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
  Behalf
  Of arun
  Sent: Friday, December 14, 2012 12:11 PM
  To: Christofer Bogaso
  Cc: R help
  Subject: Re: [R] A question on list and lapply
 
  Hi,
 
  If you want the list element P to be present as NULL in the result
  you could use this:
  set.seed(51)
  lapply(lapply(Dat,My_Function),function(x) 
  {if(names(Dat)[match.call()[[2]][[3]]]%in%
  P) NULL else x})
  A.K.
 
 
 
 
  - Original Message -
  From: Christofer Bogaso bogaso.christo...@gmail.com
  To: r-help@r-project.org
  Cc:
  Sent: Friday, December 14, 2012 1:58 PM
  Subject: [R] A question on list and lapply
 
  Dear all, let say I have following list:
 
  Dat - vector(list, length = 26)
  names(Dat) - LETTERS
  My_Function - function(x) return(rnorm(5))
  Dat1 - lapply(Dat, My_Function)
 
 
  However I want to apply my function 'My_Function' for all elements of 'Dat' 
  except the
  elements having 'names(Dat) == P'. Here I have specified the name P 
  just for
  illustration however this will be some name specified by user.
 
  Is there any direct way to achieve this, using 'lapply'?
 
  Thanks for your help.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] format.pval () and printCoefmat ()

2012-12-14 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Hi David,

Thank you so much for helping me with the code.

Your suggested code gives me the following results. Please see below. I don't 
understand why I am getting two  blocks of prints (5 columns, and then 7 
columns), with some columns repeated.

Regards,

Pradip
#

 cbind(  y0410_1825_mf_alc[ 1:2],  
+ printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, 
eps.Pvalue=0.0001)
+ )
   mean_level1 mean_level2 rel_diff p_mean cohens_d  
1   18.744  11.9110.574   0.000.175  
2   18.744  14.4550.297   0.000.110  
3   18.744  13.5400.384   0.000.133  
4   18.744   6.0022.123   0.000.333  
5   18.744   5.8342.213   0.000.349  
6   18.744   7.9331.363   0.000.279  
7   18.744  10.8490.728   0.000.203  
8   18.744   7.1301.629   0.000.298  
9   18.744   9.7200.928   0.000.242  
10  18.744   9.6000.952   0.000.242  
11  18.744  16.1350.162   0.170.067 .
12  18.744  NA   NA NA   NA  
13  18.744  10.4650.791   0.000.213  
14  18.744  15.1490.237   0.020.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diffp_mean 
cohens_d
1   wh2+hi18.7   11.910.574  1.64e-05   
0.1753
2   wh2+rc18.7   14.460.297  9.24e-06   
0.1101
3   whaian18.7   13.540.384  9.01e-05   
0.1335
4   whasan18.76.002.123 2.20e-119   
0.3326
5   whblck18.75.832.213  0.00e+00   
0.3490
6   whcsam18.77.931.363  1.27e-47   
0.2793
7   wh cub18.7   10.850.728  6.12e-08   
0.2025
8   whdmcn18.77.131.629  1.59e-15   
0.2981
9   whhisp18.79.720.928 3.27e-125   
0.2420
10  wh mex18.79.600.952 8.81e-103   
0.2420
11  whnhpi18.7   16.140.162  1.74e-01   
0.0669
12  whothh18.7  NA   NANA   
NA
13  wh  pr18.7   10.470.791  3.64e-23   
0.2131
14  wh spn18.7   15.150.237  1.58e-02   
0.0922

Pradip K. Muhuri, PhD
Statistician
Substance Abuse  Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
 
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
 
The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Friday, December 14, 2012 3:22 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] format.pval () and printCoefmat ()


On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi List,
 
 My goal is to force R not to print in scientific notation in the sixth column 
 (rel_diff - for the p-value) of my data frame (not a matrix).
 
 I have used the format.pval () and printCoefmat () functions on the data 
 frame. The R script is appended below.
 
 This issue is that use of the format.pval () and printCoefmat () functions on 
 the data frame gives me the desired results, but coerces the character string 
 into NAs for the two character variables, because my object is a data frame, 
 not a matrix. Please see the first output below: contrast_level1 
 contrast_level2).
 
 Is there a way I could have avoid printing the NAs in the character fields

They are probably factor columns.

 when using the format.pval () and printCoefmat () on the data frame?
 
 I would appreciate receiving your help.
 
 Thanks,
 
 Pradip
 setwd (F:/PR1/R_PR1)
 
 load (file = sigtests_overall_withid.rdata)
 
 #format.pval(tt$p.value, eps=0.0001)
 
 # keep only selected columns from the above data frame
 keep_cols1 - c(contrast_level1, contrast_level2,mean_level1,
mean_level2, rel_diff,
  p_mean, cohens_d)
 
 #subset the data frame
 y0410_1825_mf_alc - subset (sigtests_overall_withid,
  years==0410  age_group==1825
   gender_group==all  drug==alc
   contrast_level1==wh,
  select=keep_cols1)
 #change the row.names
 row.names (y0410_1825_mf_alc)= 1:dim(y0410_1825_mf_alc)[1]
 
 #force
 format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

Presumably that 

Re: [R] Combined Marimekko/heatmap

2012-12-14 Thread Thomas Stewart
Here is a pretty simple function.
-tgs

ColorBlocks <- function(rows, columns, ColorMatrix, RowNames, ColumnNames,
TextMatrix = NULL){

# This function takes the following inputs:
# rows : a numeric vector denoting the relative heights of rows
# columns : a numeric vector denoting the relative widths of columns
# ColorMatrix : a numeric matrix of integers representing colors
# RowNames : a character vector of row names
# ColumnNames : a character vector of column names
# TextMatrix : optional, a character matrix of text to print in the center
of each rectangle

# Colors are selected from palette().  To change colors, see help(palette).

rows <- rows / sum(rows)
columns <- columns / sum(columns)
old.pars <- par(no.readonly=TRUE)
LeftMargin <- max(strwidth(RowNames, "inches")) * 1.04
TopMargin <- max(strheight(ColumnNames, "inches")) * 1.4
par(tcl = 0, mgp =
c(1,.2,0),mai=c(0.04,LeftMargin,TopMargin,0.04),cex.axis=.8)
plot.new()
plot.window(xlim=c(0,1),ylim=c(-1,0),xaxs='i',yaxs='i')
rrr <- -c(0,cumsum(rows))
ccc <- c(0,cumsum(columns))

for(i in 1:nrow(ColorMatrix)){
for(j in 1:ncol(ColorMatrix)){
rect(ccc[j],rrr[i+1],ccc[j+1],rrr[i],col=ColorMatrix[i,j],border="white")
if(!is.null(TextMatrix[i,j])) text( mean(ccc[j:(j+1)]), mean(rrr[i:(i+1)]),
TextMatrix[i,j])
}
}

axis(2,at = -(cumsum(rows) + head(-rrr,-1))/2, labels =
RowNames,line=0,las=1)
axis(3,at = (cumsum(columns) + head(ccc,-1))/2, labels = ColumnNames)
box()
par(old.pars)
}


Z <- textConnection("
country TWh
Australia 244
Canada 522
China 3503
EU 2500
India 689.5
Japan 997
USA 3960
")
CountryEnergy <- read.table(Z, header=TRUE)
close(Z)

Z <- textConnection("
Product TWh
Freezer 72.4
Refrigerator 379
HVAC 466
Lighting 123
Television 152
")
ProductEnergy <- read.table(Z, header=TRUE)
close(Z)

#Generate Random Color Matrix
set.seed(325034)
RandomColors <- matrix(sample(1:100,35),nrow=7,ncol=5)

#Choose the entire palette of colors
palette(colors())

#
ColorBlocks(
rows = CountryEnergy[,2],
columns = ProductEnergy[,2],
ColorMatrix = RandomColors,
RowNames = CountryEnergy[,1],
ColumnNames = ProductEnergy[,1]
)
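
For the yes/no colour coding Neal asked about, one sketch (assuming 'ddd' is
his 7 x 5 matrix of 0/1 standards indicators, with rows ordered like
CountryEnergy and columns like ProductEnergy) is to build the ColorMatrix
directly from it; colour names are also accepted by rect()'s col argument.
'StandardColors' is just an illustrative name:

StandardColors <- ifelse(as.matrix(ddd) == 1, "steelblue", "grey85")

ColorBlocks(
rows = CountryEnergy[,2],
columns = ProductEnergy[,2],
ColorMatrix = StandardColors,
RowNames = CountryEnergy[,1],
ColumnNames = ProductEnergy[,1]
)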







On Fri, Dec 14, 2012 at 4:12 PM, Neal Humphrey nhumph...@clasponline.org wrote:

 You're right, the way I've proposed the graphic, it simplifies the story. In
 this graph, the size of the box is approximately how much energy that
 country uses for that appliance - but not exactly. For most appliances it
 should be an okay estimate, although, for example, air conditioners in Russia
 would be an exception.

 I'm proposing this format mostly for readability - when there are 20-30
 countries and appliances, I'm concerned about misaligning the appliances
 from their labels. The main story I'm trying to tell with the graphic is
 how much standards coverage there is and for which appliances, while the
 amount of energy is a secondary variable. It's also secondary because the
 actual energy savings potential from a standard will vary widely based on
 how strict the standard is.

 I was planning to complement this with a treemap graphic
 (http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/),
 where you trade off readability of the secondary category for accurate
 representation of per-country, per-appliance energy use.

 Talking through it now, maybe you're right that just going with the
 Marimekko-like plot, with some careful formatting cleanup in Illustrator,
 is a better representation? I'd need to make some assumptions to fill holes
 in my data (it's hard to get per-country, per-appliance data at the level of
 granularity I'm aiming for).

 I'm open to critiques or suggestions about the approach. Either way, I
 still need to figure out how to do the true/false color coding, so any
 pointers on that are welcome.

 Neal Humphrey

 Tel: +1 202.662.7241 | Skype: neal.s.humphrey | nhumph...@clasponline.org

 www.clasponline.org

 From: tgstew...@gmail.com [mailto:tgstew...@gmail.com] On Behalf Of Thomas
 Stewart
 Sent: Friday, December 14, 2012 3:18 PM
 To: Neal Humphrey
 Subject: Re: [R] Combined Marimekko/heatmap

 Neal-

 Help me understand something:  In my mind, the size of each box---say
 Country A, appliance A1---should communicate the percentage of (total)
 energy consumption in country A using appliance A1.  The way you've set up
 your plot does not do that.  I guess my question is: What information does
 the size of the box communicate in your plot?

 I'd be happy to help once I understand.

 -tgs

 On Fri, Dec 14, 2012 at 1:03 PM, Neal Humphrey nhumph...@clasponline.org
 wrote:

 Thomas,

 This is a big help for getting me started. Brand new to R, so I'm
 unfamiliar with how to 'manually' draw graphs (instead of using packages).

 The graph your code makes is more like a Marimekko chart. What I'm
 thinking of is like a heatmap, but each row has a different width, and each
 column 

Re: [R] A question on list and lapply

2012-12-14 Thread arun
HI Bill,

bases
#$O
#[1] Oak Harbor

#$P
#[1] Pensicola
#
#$Q
#[1] Quonset Point

res2 <- lapply(bases,function(x) {if(names(bases)[match.call()[[2]][[3]]] %in% 
"P") tolower(x) else paste0("(", x, ")")})
res2
#$O
#[1] (Oak Harbor)
#
#$P
#[1] pensicola
#
#$Q
#[1] (Quonset Point)
#the names of the list elements are not lost  here, while in your solution, it 
is not there and needs to be named again.


res1 <- lapply(seq_along(bases),
 function(i){ base <- bases[i] ; if (names(base) != "P") 
paste0("(", base, ")") else tolower(base) } )
names(res1)
#NULL
 names(res2)
#[1] O P Q
A.K.



- Original Message -
From: William Dunlap wdun...@tibco.com
To: arun smartpink...@yahoo.com; Christofer Bogaso 
bogaso.christo...@gmail.com
Cc: R help r-help@r-project.org
Sent: Friday, December 14, 2012 3:59 PM
Subject: RE: [R] A question on list and lapply

 lapply(lapply(Dat,My_Function),function(x) 
 {if(names(Dat)[match.call()[[2]][[3]]]%in%
 P) NULL else x})

match.call()[[2]][[3]], gack!

In lapply(X, FUN), FUN is applied to X[[i]], which has lost the names attribute 
that X
may have had.  X[i] retains a part of the names attribute (since it is a 
sublist of X, not an element
of X).  Hence FUN can look at the name associated with X[i] with code like the 
following:
     lapply(seq_along(X), FUN=function(i) { Xi - X[i] ; names(Xi) })

E.g., to apply one sort of processing to elements named P and another sort to 
those
not named P you can do:
   bases - list(O=Oak Harbor,P=Pensicola,Q=Quonset Point)
   lapply(seq_along(bases), function(i){ base - bases[i] ; if (names(base) != 
P) paste0((,base,)) else tolower(base) } )
  [[1]]
  [1] (Oak Harbor)  
  
  [[2]]
  [1] pensicola
  
  [[3]]
  [1] (Quonset Point)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of arun
 Sent: Friday, December 14, 2012 12:11 PM
 To: Christofer Bogaso
 Cc: R help
 Subject: Re: [R] A question on list and lapply
 
 Hi,
 
 If you want the list element P to be present as NULL in the result
 you could use this:
 set.seed(51)
 lapply(lapply(Dat,My_Function),function(x) 
 {if(names(Dat)[match.call()[[2]][[3]]]%in%
 P) NULL else x})
 A.K.
 
 
 
 
 - Original Message -
 From: Christofer Bogaso bogaso.christo...@gmail.com
 To: r-help@r-project.org
 Cc:
 Sent: Friday, December 14, 2012 1:58 PM
 Subject: [R] A question on list and lapply
 
 Dear all, let say I have following list:
 
 Dat - vector(list, length = 26)
 names(Dat) - LETTERS
 My_Function - function(x) return(rnorm(5))
 Dat1 - lapply(Dat, My_Function)
 
 
 However I want to apply my function 'My_Function' for all elements of 'Dat' 
 except the
 elements having 'names(Dat) == P'. Here I have specified the name P just 
 for
 illustration however this will be some name specified by user.
 
 Is there any direct way to achieve this, using 'lapply'?
 
 Thanks for your help.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] format.pval () and printCoefmat ()

2012-12-14 Thread arun
Hi Pradip,

May be this helps:
dat1 <- read.table(text="
 contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1             wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2             wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3             wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4             wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5             wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6             wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7             wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8             wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9             wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10            wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11            wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12            wh            othh        18.7          NA       NA        NA       NA
13            wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14            wh             spn        18.7       15.15    0.237  1.58e-02   0.0922
", sep="", header=TRUE, stringsAsFactors=FALSE)
Lines1 <- capture.output(printCoefmat(dat1[,-c(1:2)],has.Pvalue=TRUE,eps.Pvalue=0.001))
Lines2 <- gsub("\\s+$","",gsub("\\.$","",Lines1[1:15]))
res <- data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE))
#or
# res <- cbind(dat1[,1:2],read.table(text=Lines2,header=TRUE))


 res
#   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean
#1   wh    2+hi    18.7   11.91    0.574 0.
#2   wh    2+rc    18.7   14.46    0.297 0.
#3   wh    aian    18.7   13.54    0.384 0.0001
- 

--

# cohens_d
#1    0.1753
#2    0.1101
#3    0.1335
-
-

 str(res)
#'data.frame':    14 obs. of  7 variables:
# $ contrast_level1: chr  wh wh wh wh ...
# $ contrast_level2: chr  2+hi 2+rc aian asan ...
# $ mean_level1    : num  18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 ...
# $ mean_level2    : num  11.91 14.46 13.54 6 5.83 ...
# $ rel_diff   : num  0.574 0.297 0.384 2.123 2.213 ...
# $ p_mean : num  0e+00 0e+00 1e-04 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 
0e+00 ...
# $ cohens_d   : num  0.175 0.11 0.134 0.333 0.349 ...


A.K.

- Original Message -
From: Muhuri, Pradip (SAMHSA/CBHSQ) pradip.muh...@samhsa.hhs.gov
To: 'David Winsemius' dwinsem...@comcast.net
Cc: R help r-help@r-project.org
Sent: Friday, December 14, 2012 5:18 PM
Subject: Re: [R] format.pval () and printCoefmat ()

Hi David,

Thank you so much for helping me with the code.

Your suggested code gives me the following results. Please see below. I don't 
understand why I am getting two  blocks of prints (5 columns, and then 7 
columns), with some columns repeated.

Regards,

Pradip
#

 cbind(  y0410_1825_mf_alc[ 1:2],  
+         printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, 
eps.Pvalue=0.0001)
+ )
   mean_level1 mean_level2 rel_diff p_mean cohens_d  
1       18.744      11.911    0.574   0.00    0.175  
2       18.744      14.455    0.297   0.00    0.110  
3       18.744      13.540    0.384   0.00    0.133  
4       18.744       6.002    2.123   0.00    0.333  
5       18.744       5.834    2.213   0.00    0.349  
6       18.744       7.933    1.363   0.00    0.279  
7       18.744      10.849    0.728   0.00    0.203  
8       18.744       7.130    1.629   0.00    0.298  
9       18.744       9.720    0.928   0.00    0.242  
10      18.744       9.600    0.952   0.00    0.242  
11      18.744      16.135    0.162   0.17    0.067 .
12      18.744          NA       NA     NA       NA  
13      18.744      10.465    0.791   0.00    0.213  
14      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean 
cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   
0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   
0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   
0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   
0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   
0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   
0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   
0.2025
8  

Re: [R] format.pval () and printCoefmat ()

2012-12-14 Thread David Winsemius

On Dec 14, 2012, at 2:18 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

 Hi David,
 
 Thank you so much for helping me with the code.
 
 Your suggested code gives me the following results. Please see below. I don't 
 understand why I am getting two  blocks of prints (5 columns, and then 7 
 columns), with some columns repeated.

I am not sure why you think we will be able to answer this question. Your data 
is on your drive and you have not provided it in a form that would allow one to 
assist any further than the apparently inaccurate guesses already offered. You 
should consult this function:

?dput   # use this with:  y0410_1825_mf_alc
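
For example (a minimal sketch):

dput(head(y0410_1825_mf_alc, 14))

which prints a plain-text representation of the object that can be pasted into
a reply.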

-- 
David.

 
 Regards,
 
 Pradip
 #
 
 cbind(  y0410_1825_mf_alc[ 1:2],  
 + printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, 
 eps.Pvalue=0.0001)
 + )
   mean_level1 mean_level2 rel_diff p_mean cohens_d  
 1   18.744  11.9110.574   0.000.175  
 2   18.744  14.4550.297   0.000.110  
 3   18.744  13.5400.384   0.000.133  
 4   18.744   6.0022.123   0.000.333  
 5   18.744   5.8342.213   0.000.349  
 6   18.744   7.9331.363   0.000.279  
 7   18.744  10.8490.728   0.000.203  
 8   18.744   7.1301.629   0.000.298  
 9   18.744   9.7200.928   0.000.242  
 10  18.744   9.6000.952   0.000.242  
 11  18.744  16.1350.162   0.170.067 .
 12  18.744  NA   NA NA   NA  
 13  18.744  10.4650.791   0.000.213  
 14  18.744  15.1490.237   0.020.092 .
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diffp_mean 
 cohens_d
 1   wh2+hi18.7   11.910.574  1.64e-05 
   0.1753
 2   wh2+rc18.7   14.460.297  9.24e-06 
   0.1101
 3   whaian18.7   13.540.384  9.01e-05 
   0.1335
 4   whasan18.76.002.123 2.20e-119 
   0.3326
 5   whblck18.75.832.213  0.00e+00 
   0.3490
 6   whcsam18.77.931.363  1.27e-47 
   0.2793
 7   wh cub18.7   10.850.728  6.12e-08 
   0.2025
 8   whdmcn18.77.131.629  1.59e-15 
   0.2981
 9   whhisp18.79.720.928 3.27e-125 
   0.2420
 10  wh mex18.79.600.952 8.81e-103 
   0.2420
 11  whnhpi18.7   16.140.162  1.74e-01 
   0.0669
 12  whothh18.7  NA   NANA 
   NA
 13  wh  pr18.7   10.470.791  3.64e-23 
   0.2131
 14  wh spn18.7   15.150.237  1.58e-02 
   0.0922
 
 Pradip K. Muhuri, PhD
 Statistician
 Substance Abuse  Mental Health Services Administration
 The Center for Behavioral Health Statistics and Quality
 Division of Population Surveys
 1 Choke Cherry Road, Room 2-1071
 Rockville, MD 20857
  
 Tel: 240-276-1070
 Fax: 240-276-1260
 e-mail: pradip.muh...@samhsa.hhs.gov
  
 The Center for Behavioral Health Statistics and Quality your feedback.  
 Please click on the following link to complete a brief customer survey:   
 http://cbhsqsurvey.samhsa.gov
 
 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net] 
 Sent: Friday, December 14, 2012 3:22 PM
 To: Muhuri, Pradip (SAMHSA/CBHSQ)
 Cc: R help
 Subject: Re: [R] format.pval () and printCoefmat ()
 
 
 On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
 
 Hi List,
 
 My goal is to force R not to print in scientific notation in the sixth 
 column (rel_diff - for the p-value) of my data frame (not a matrix).
 
 I have used the format.pval () and printCoefmat () functions on the data 
 frame. The R script is appended below.
 
 This issue is that use of the format.pval () and printCoefmat () functions 
 on the data frame gives me the desired results, but coerces the character 
 string into NAs for the two character variables, because my object is a data 
 frame, not a matrix. Please see the first output below: contrast_level1 
 contrast_level2).
 
 Is there a way I could have avoid printing the NAs in the character fields
 
 They are probably factor columns.
 
 when using the format.pval () and printCoefmat () on the data frame?
 
 I would appreciate receiving your help.
 
 Thanks,
 
 Pradip
 setwd (F:/PR1/R_PR1)
 
 load (file = sigtests_overall_withid.rdata)
 
 #format.pval(tt$p.value, eps=0.0001)
 
 # keep only selected columns from the above data frame
 keep_cols1 - c(contrast_level1, contrast_level2,mean_level1,
   mean_level2,