date:20131102

Re: [R] Efficient way to convert covariance to Euclidian distance matrix

2013-11-02 Thread Jari Oksanen

Rolf Turner r.turner at auckland.ac.nz writes:

 
 On 10/31/13 23:14, Takatsugu Kobayashi wrote:

 
  I am struggling to come up with an efficient vectorized way to convert
  20Kx20K covariance matrix to a Euclidian distance matrix as a surrogate for
  dissimilarity matrix. Hopefully I can apply multidimensional scaling for
  mapping these 20K points (commercial products).
 

 
 My suspicion is that with a 20K x 20K covariance matrix:
 
  * nothing will work
 
  * even if it did, the results would be meaningless numerical noise.
 
 I.e.  Get real.

FWIW, I have tried NMDS for 19.2 K observations. It worked. The results looked 
sensible (i.e., they were not meaningless numerical noise). It was not fast, 
though, but it can be done. 

Cheers, Jari Oksanen 

PS. Sorry for excessive pruning, but gmane does not allow me post if I don't
remove some quoted lines.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] ascii-grid export

2013-11-02 Thread Enzo Cocca

hi,

I want to export in ascii-grid file a semivariogram map that I have did
with gstat library.

How can I make it?

Somebody can help me?
 thanks!

enzo
-- 
Enzo Cocca (PhD Candidate)
Research Fellow
Università di Napoli L'Orientale
mail: enzo@gmail.com
cell: +393495087014

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] mapping data to a geographic map of Europe

2013-11-02 Thread Markus Kainu

Dear Claudia,

I have written a tutorial for this task here;

http://www.markuskainu.fi/r-tutorial/eurostat/spatial.html

It uses the shapefile provided by Gisco and plots data from Eurostat.
Packages rgdal and maptools are used to process the data and
SpatialPolygonDataFrame is fortified into data.frame and plotted using
ggplot2-package. Smaterpoland-package is used to access Eurostat data.

Best,
Markus Kainu

2013/11/1 palad...@trustindata.de

Hi Jean,
nevertheless this page R-bloggers looks realy interesting so I'll work
through the tutorial.
Thanks again for recommanding this web-site.

Best regards

Claudia

Zitat von Adams, Jean jvad...@usgs.gov:

Claudia,

I have not worked through the example myself. Since you seem to be
getting
errors, perhaps a different example would help. Here are some more
choropleth maps (although these use US states rather than European
countries).

http://blog.**revolutionanalytics.com/2009/**11/choropleth-challenge-**
result.htmlhttp://blog.revolutionanalytics.com/2009/11/choropleth-challenge-result.html

Jean

On Fri, Nov 1, 2013 at 12:04 PM, palad...@trustindata.de wrote:

Hi Jean,
thanks again for your response. As I told you I did the downloads and
double checked if I selected the right directory.
But I noticed right now what happend:
The command in the example is :

eurMap - readShapePoly(fn=NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_

2010)

But it should be :
eurMap - readShapePoly(fn=NUTS_2010_60M_SH/data/ggg/NUTS_RG_60M_**
2010)

Because if you do the downloads and unzip these data there is no such
think as a Shape directory.

Now eurMap - readShapePoly(fn=NUTS_2010_
60M_SH/data/NUTS_RG_60M_2010)
works.

What happens now is that after typing:
eurEduMapDf - merge(eurMapDf, eurEdu, by.x=id, by.y=GEO)
I get another error message because eurMapDf is unknown.

So I supposed it should be:
eurEduMapDf - merge(eurMap, eurEdu, by.x=id, by.y=GEO)

But this doesn't work either.The error message this time is: undefined
column selected.

Did I do something wrong?

Best regards

Claudia

Zitat von Adams, Jean jvad...@usgs.gov:

Claudia,

You should cc r-help on all correspondence so that others can follow the
thread.

In the second paragraph of the link I sent you

http://www.r-bloggers.com/maps-in-r-choropleth-maps/http://www.r-bloggers.com/**maps-in-r-choropleth-maps/
htt**p://www.r-bloggers.com/maps-**in-r-choropleth-maps/http://www.r-bloggers.com/maps-in-r-choropleth-maps/

a link is provided for the NUTS data,
The polygons for drawing the administrative boundaries were
obtained
from this link. In particular, the NUTS 2010 shapefile in the 1:60
million
scale was downloaded and used. The other available scales would allow
the
drawing of better defined maps, but at a computational cost. The zipped
file has to be extracted in a folder of choice for using it later.

http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_
Geographical_information_maps/popups/references/
administrative_units_
**statistical_units_1http://**epp.eurostat.ec.europa.eu/**
portal/page/portal/gisco_**Geographical_information_maps/**
popups/references/**administrative_units_**statistical_units_1http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_Geographical_information_maps/popups/references/administrative_units_statistical_units_1

If you want to follow the example, you will need to download this data
to
your computer and then make sure that you refer to the appropriate
directory when using the readShapePoly() function.

Jean

On Wed, Oct 30, 2013 at 11:14 AM, palad...@trustindata.de wrote:

Hi Jean,

thank you for your advice.
The page looks quite interesting and I tried the example in GNU R. I
did
all the downloads.
But
Just in the beginnig after typing

eurMap - readShapePoly(fn=NUTS_2010_**60M_SH/Shape/data/NUTS_RG_
**60M_

2010)
I get the following error message:

Error in getinfo.shape(filen) : Error opening SHP file

To you have an idea what I did wrong?

Thanks a lot and best regards

Claudia

Zitat von Adams, Jean jvad...@usgs.gov:

Check out this link for some examples

http://www.r-bloggers.com/**maps-in-r-choropleth-maps/http://www.r-bloggers.com/maps-in-r-choropleth-maps/
htt**p://www.r-bloggers.com/**maps-**in-r-choropleth-maps/http://www.r-bloggers.com/**maps-in-r-choropleth-maps/

htt**p://www.r-bloggers.com/**maps-**in-r-choropleth-maps/http://www.r-bloggers.com/maps-**in-r-choropleth-maps/
h**ttp://www.r-bloggers.com/maps-**in-r-choropleth-maps/http://www.r-bloggers.com/maps-in-r-choropleth-maps/

Jean

On Tue, Oct 29, 2013 at 12:02 PM, palad...@trustindata.de wrote:

Hello,

I would like to draw a map of Europe. Each country should be colored
depending on how it scores in an index called GPIndex.
Say a dark red for real bad countries a light red for those which are

Re: [R] ascii-grid export

2013-11-02 Thread Gergely Daróczi

If I understand what you are up to, my pander package might help. You
mean something like this?
I used the example form the package manual.

 library(pander)
 data(meuse)
 coordinates(meuse) = ~x+y
 pander(variogram(log(zinc)~1, meuse))


 np   dist   gamma   dir.hor   dir.ver   id
 -- --- - - 
 57  79.29  0.1234  0 0 var1

299   164   0.2162  0 0 var1

419  267.4  0.3028  0 0 var1

457  372.7  0.4121  0 0 var1

547  478.5  0.4634  0 0 var1

533  585.3  0.5647  0 0 var1

574  693.1   0.569  0 0 var1

564  796.2  0.6187  0 0 var1

589  903.1  0.6471  0 0 var1

543   1011  0.6916  0 0 var1

500   1118  0.7034  0 0 var1

477   1221  0.6039  0 0 var1

452   1329  0.6517  0 0 var1

457   1437  0.5665  0 0 var1

415   1543  0.5748  0 0 var1


Best,
Gergely


On 2 November 2013 09:55, Enzo Cocca enzo@gmail.com wrote:

 hi,

 I want to export in ascii-grid file a semivariogram map that I have did
 with gstat library.

 How can I make it?

 Somebody can help me?
  thanks!

 enzo
 --
 Enzo Cocca (PhD Candidate)
 Research Fellow
 Università di Napoli L'Orientale
 mail: enzo@gmail.com
 cell: +393495087014

 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R Packages in Mac

2013-11-02 Thread Jim Silverton

Hello everyone,
I am interested in finding out why some packages don't work on Mac like the
package 'vcd'
Is there a fix?

-- 
Thanks,
Jim.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Path Analysis

2013-11-02 Thread Sarah Rogers

 Hello,

I have just started to work on a path analysis (see attached image for the
diagram), but I have encountered an error message.




This is the code I have used:

cov_matrix-var(xdata)

library(sem)
model.xdata-specifyModel()
x1 - y2, xy12, NA
x2 - y1, xy21, NA
y1 - y2, yy12, NA
y2 - y3, yy23, NA
y2 - y4, yy24, NA
y3 - y4, yy34, NA
y2 - y2, y2error, NA
y1 - y1, y1error, NA
y3 - y3, y3error, NA
y4 - y4, y4error, NA

model.xdata.sem - sem(model.xdata, cov_matrix, nrow(xdata))

and the error message is:
Error in csem(model = model.description, start, opt.flag = 1, typsize =
typsize,  :
  The matrix is non-invertable.

I fear to have a problem in the data.
I would be very grateful if you could help me to solve this problem and
proceed with my analyses.

thank you in advance for your help!
Sarah

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to compute correlation between times (with buffer before and after times)?

2013-11-02 Thread Yla Savh

Dear forum members:
 
I have two series of times to correlate. Additionally, I have
to allow for a buffer before and after the times so that they can be within 20 
seconds
of each other and not seem discordant. 


Example of my data:


Time 1 Time 2 
9:26 9:26 
10:31 10:29 
9:22  
9:35 9:35 
9:38 9:37 
9:28 9:28 
9:17 9:16 
9:34 9:34 
9:08 9:07 
 
Thanks,
Julia
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R Packages in Mac

2013-11-02 Thread Prof Brian Ripley


On 02/11/2013 17:03, Jim Silverton wrote:

Hello everyone,
I am interested in finding out why some packages don't work on Mac like the
package 'vcd'
Is there a fix?


What does don't work mean?  That one does: see 
http://cran.r-project.org/web/packages/vcd/index.html and the results page.


Vague false accusations are unfair to the package maintainers and CRAN team.

You were asked not to send HTML mail and to ask on the correct list: see 
the posting guide.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R Packages in Mac

2013-11-02 Thread Ista Zahn

The vcd package works for me on a mac. Please be more specific,
describe what you did, and what happened that you found surprising.
Also include the output of sessionInfo()

Best,
Ista

On Sat, Nov 2, 2013 at 1:03 PM, Jim Silverton jim.silver...@gmail.com wrote:
 Hello everyone,
 I am interested in finding out why some packages don't work on Mac like the
 package 'vcd'
 Is there a fix?

 --
 Thanks,
 Jim.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Path Analysis

2013-11-02 Thread John Fox

Dear Sarah,

It's generally a good idea to include a reproducible example if you want to get 
help with a problem, but in this case it's a safe bet that the problem is that 
the model you specified has no variance or covariance parameters for the 
variables x1 and x2, which, I assume, you mean to be exogenous. The easiest way 
to include these variances and covariance in the model is to specify the 
argument fixed.x=c(x1, x2) in the call to sem().

In addition:

(1) Your model is fully recursive (guessing that all the x's and y's are 
observed variables), and so it amounts to four OLS regressions. You could just 
use lm() to fit the model.

(2) It's generally easier in the sem package to use specifyEquations() than 
specifyModel() for model specification.

(3) If you have the original data set, as you do, it's generally preferable to 
use the data argument to sem() than to pass it the covariance matrix for the 
observed variables.

I hope that this helps,
 John



John Fox
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Sat, 2 Nov 2013 11:02:31 +0100
 Sarah Rogers rogerssara...@gmail.com wrote:
  Hello,
 
 I have just started to work on a path analysis (see attached image for the
 diagram), but I have encountered an error message.
 
 
 
 
 This is the code I have used:
 
 cov_matrix-var(xdata)
 
 library(sem)
 model.xdata-specifyModel()
 x1 - y2, xy12, NA
 x2 - y1, xy21, NA
 y1 - y2, yy12, NA
 y2 - y3, yy23, NA
 y2 - y4, yy24, NA
 y3 - y4, yy34, NA
 y2 - y2, y2error, NA
 y1 - y1, y1error, NA
 y3 - y3, y3error, NA
 y4 - y4, y4error, NA
 
 model.xdata.sem - sem(model.xdata, cov_matrix, nrow(xdata))
 
 and the error message is:
 Error in csem(model = model.description, start, opt.flag = 1, typsize =
 typsize,  :
   The matrix is non-invertable.
 
 I fear to have a problem in the data.
 I would be very grateful if you could help me to solve this problem and
 proceed with my analyses.
 
 thank you in advance for your help!
 Sarah
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to obtain nonparametric baseline hazard estimates in the gamma frailty model?

2013-11-02 Thread Y

I can easily get the other parameter estimates by using coxph() but don't
know how to get the baseline hazard estimates from it. Does anyone know how
to obtain nonparametric baseline hazard estimates in the gamma frailty
model?

Thanks,
YH

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] (gam) formula: Why different results for terms being factor vs. numeric?

2013-11-02 Thread Marius Hofert

Dear Bert,

Thanks for helping.

Your questions 'answers' why I get the expected behavior if
'group' is a factor. My question was why I don't get the expected
behavior if 'group' is not a factor.

From a theoretical (non-programming) point of view, there is no
difference in a factor with two levels and a 0-1 (bool/integer)
variable (in my case the 1-2 variable 'group'). gam() interprets
these inputs differently, thus distinguishes these cases (and I
was wondering why; In my opinion, this is a purely R/mgcv related
question and belongs here).

As it turned out, the problem was merely the following: By using
factors and thus specifying a GAM, the intercept was 'hidden' in
the estimated coefficients. When using integers as group
variables, this is a glm and there one needs the intercept. The
examples below provide the details.

With best wishes,

Marius



require(mgcv)
n - 10
yrs - 2000+seq_len(n)
loss - c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n))


## Version 1: gam() with 'group' as factor #

set.seed(271)
dat - data.frame(year  = rep(yrs, 2),
  group = as.factor(rep(1:2, each=n)), # could also be A, B
  resp  = loss)
fit1 - glm(resp ~ year + group - 1, data=dat)
plot(yrs, fit1$fitted.values[seq_len(n)], type=l, ylim=range(dat$resp),
 xlab=Year, ylab=Response) # fit group A; mean over all
responses in this group
lines (yrs, fit1$fitted.values[n+seq_len(n)], col=blue) # fit group
B; mean over all responses in this group
points(yrs, dat$resp[seq_len(n)]) # actual response group A
points(yrs, dat$resp[n+seq_len(n)], col=blue) # actual response group B


## Version 2: gam() with 'group' as numeric (= glm) ###

set.seed(271)
dat - data.frame(year  = rep(yrs, 2),
  group = rep(1:2, each=n), # could also be 0:1
  resp  = loss)
fit2 - glm(resp ~ year + group - 1, data=dat) # (*)
plot(yrs, fit2$fitted.values[seq_len(n)], type=l, ylim=range(dat$resp),
 xlab=Year, ylab=Response) # fit group A; mean over all
responses in this group
lines (yrs, fit2$fitted.values[n+seq_len(n)], col=blue) # fit group
B; mean over all responses in this group
points(yrs, dat$resp[seq_len(n)]) # actual response group A
points(yrs, dat$resp[n+seq_len(n)], col=blue) # actual response group B

## Note: without '-1' (intercept) in (*), an unexpected behavior results
## Explanation:
## S. Wiki GAM (without beta_0):
##g(E(Y)) = f_1(x_1) + f_2(x_2)
## where f_i(x_i) may be functions with a specified parametric form
(for example a
## polynomial, or a coefficient depending on the levels of a factor variable)
## = for f_i's being coefficients (numbers) beta_i, this is a GLM:
##g(E(Y)) = beta_1 x_1 + beta_2 x_2 (x_1 = year, x_2 = group)
## Problem: (*) does not specify an intercept and thus the lines are
not picked up correctly
fit2$coefficients


## Version 3: Version 2 with intercept #

set.seed(271)
dat - data.frame(year  = rep(yrs, 2),
  group = rep(1:2, each=n), # could also be 0:1
  resp  = loss)
fit3 - glm(resp ~ year + group, data=dat) # now with intercept
plot(yrs, fit3$fitted.values[seq_len(n)], type=l, ylim=range(dat$resp),
 xlab=Year, ylab=Response) # fit group A; mean over all
responses in this group
lines (yrs, fit3$fitted.values[n+seq_len(n)], col=blue) # fit group
B; mean over all responses in this group
points(yrs, dat$resp[seq_len(n)]) # actual response group A
points(yrs, dat$resp[n+seq_len(n)], col=blue) # actual response group B
## = correct/as expected
fit3$coefficients

## Note: in Version 1, the intercept is already included in the group
coefficients:
fit1$coefficients



On Tue, Oct 29, 2013 at 9:31 PM, Bert Gunter gunter.ber...@gene.com wrote:
 Think about it. How can one define a smooth term with a factor???

 Further discussion is probably offtopic. Post on
 stats.stackexchange.com if it still isn't obvious.

 Cheers,
 Bert

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ascii-grid export

2013-11-02 Thread Barry Rowlingson

Or do you mean you want to write the gridded output of an
interpolation you did (perhaps using kriging) in gstat as an ESRI
ASCII Grid file for reading into a GIS?

If so, you can probably do it with writeGDAL from the rgdal package,
or writeRaster from the raster package.

I don't really know what a 'semivariogram map' is.


On Sat, Nov 2, 2013 at 8:55 AM, Enzo Cocca enzo@gmail.com wrote:
 hi,

 I want to export in ascii-grid file a semivariogram map that I have did
 with gstat library.

 How can I make it?

 Somebody can help me?
  thanks!

 enzo
 --
 Enzo Cocca (PhD Candidate)
 Research Fellow
 Università di Napoli L'Orientale
 mail: enzo@gmail.com
 cell: +393495087014

 [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Installing RCurl: 'configure' exists but is not executable

2013-11-02 Thread Michael Hannon

That seems OK.  Are you trying to install the RCurl package from source?
 IIRC, the configure script is commonly used in GNU-style,
compile-from-source installations.

-- Mike


On Fri, Nov 1, 2013 at 8:38 PM, Rainer Schuermann rainer.schuerm...@gmx.net
 wrote:

 #  ls -l `which curl-config`
 -rwxr-xr-x 1 root root 6327 Oct 20 15:25 /usr/bin/curl-config



 On Friday 01 November 2013 20:21:36 Michael Hannon wrote:
  The error message doesn't seem to refer to the tmp directory.  What do
 you
  get from:
 
  ls -l `which curl-config`
 
  -- Mike
 
 
  On Fri, Nov 1, 2013 at 7:43 PM, Rainer Schuermann 
 rainer.schuerm...@gmx.net
   wrote:
 
   I'm trying to install.packages( RCurl ) as root but get
   ERROR: 'configure' exists but is not executable
  
   I remember having had something like that before on another machine and
   tried in bash what is described here
   http://mazamascience.com/WorkingWithData/?p=1185
   and helped me before:
   # mkdir ~/tmp
   # export TMPDIR=~/tmp
   and added, just in case,
   # chmod u+x $TMPDIR
  
   which apparently does what it should
   # ls -ld $TMPDIR
   drwxrwxrwx 2 root root 4096 Nov  1 08:59 /root/tmp
  
   but it doesn't help, I get the same error.
  
   What else can I try?
  
   Thanks in advance,
   Rainer
  
  
sessionInfo()
   R version 3.0.2 (2013-09-25)
   Platform: x86_64-pc-linux-gnu (64-bit)
  
   locale:
[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8   LC_NAME=C
[9] LC_ADDRESS=C   LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
  
   attached base packages:
   [1] stats graphics  grDevices utils datasets  methods   base
  
   loaded via a namespace (and not attached):
   [1] tools_3.0.2
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Package(s) for making waffle plot-like figures?

2013-11-02 Thread Jim Lemon


On 11/02/2013 10:35 AM, Zhao Jin wrote:

Dear all,

I am trying to make a series of waffle plot-like figures for my data to
visualize the ratios of amino acid residues at each position. For each one
of 37 positions, there may be one to four different amino acid residues. So
the data consist of the positions, what residues are there, and the ratios
of residues. The ratios of residues at a position add up to 100, or close
to 100 (more on this soon)*. I am hoping to make a *square* waffle
plot-like figure for each position, and fill the 10 X 10 grids with colors
representing each amino acid residue and areas for grids of a certain color
corresponding to the ratio of that residue. Then I could line up all the
plots in one row from position 1 to position 37.
*: if the sum of the ratios is less than 100 at a position, that's because
of an unknown residue which I did not include in the table.

I am attaching the dput output for my data here:
structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c(A, C, D, E,
G, H, I, K, L, M, N, P, Q, R, S, T, V,
Y), class = factor), ratio = c(99L, 100L, 100L, 1L, 99L,
100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)), .Names =
c(position,
residue, ratio), class = data.frame, row.names = c(1,
2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15,
17, 18, 19, 20, 23, 25, 27, 28, 29, 30, 31,
32, 33, 34, 36, 37, 38, 39, 40, 42, 43, 44,
45, 46, 47, 48, 50, 51, 52, 53, 54, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65))

Inspired by a statexchange post, I am using these scripts to make the plots
:
library(ggplot2)
col4=c('#E66101','#FDB863','#B2ABD2','#5E3C99')
dflist=list()
for (i in 1:37){
residue_num=length(which(df$position==i))
dflist[[i]]=df[df$position==i,2:3]
waffle=expand.grid(y=1:residue_num,x=seq_len(ceiling(sum(dflist[[i]]$ratio)/residue_num)))
residuevec=rep(dflist[[i]]$residue,dflist[[i]]$ratio)
waffle$residue=c(as.vector(residuevec),rep(NA,nrow(waffle)-length(residuevec)))
png(paste('plot',i,'.png',sep=''))
print(ggplot(waffle, aes(x = x, y = y, fill = residue)) + geom_tile(color =
white) + scale_fill_manual(residue,values = col4) + coord_equal() +
theme(panel.grid.minor=element_blank(),panel.grid.major=element_blank())
+ theme(axis.ticks=element_blank()) +
theme(axis.text.x=element_blank(),axis.text.y=element_blank()) +
theme(axis.title.x=element_blank(),axis.title.y=element_blank())
)
dev.off()}

With my scripts, I could make a waffle plot, but not a *square* 10 X 10
waffle plot. Also, the grid size differs for positions with different
numbers of residues. I am suspecting that I didn't use coord_equal()
correctly.

So I wonder how I can make the plots like I described above in ggplot2 or
with some other packages. Also, is there a way to assign a color to
different residues, say, purple for alanine, blue for glycine, etc, and
incorporate that information in the for loop?


Hi Zhao,
By beginning with a 10x10 matrix of NA values and then replacing some of 
them with a color, I think you can do what you want. First you need a 
function to fill one corner of your matrix with values, leaving the rest 
uncolored (i.e. NA):


fill.corner-function(x,nrow,ncol) {
 xlen-length(x)
 if(nrow*ncol  xlen) {
  newmat-matrix(NA,nrow=nrow,ncol=ncol)
  xside-1
  while(xside*xside  xlen) xside-xside+1
  row=1
  col=1
  for(xindex in 1:xlen) {
   newmat[row,col]-x[xindex]
   if(row == xside) {
col-col+1
row-1
   }
   else row-row+1
  }
  return(newmat)
 }
 cat(Too many values in x for,xrow,by,xcol,\n)
}

Then you have to massage your data frame into 37 smaller data frames, 
create matrices with the values and colors to display on your 37 waffle 
plots:


library(plotrix)
# get an alphabet of colors
alphacol-rainbow(18)
# the actual values in the plotted matrix don't matter
fakemat-matrix(1:100,nrow=10)
# pick off the positions one by one
for(pos in 1:37) {
 posdf-zjdat[zjdat$position == pos,]
 for(res in 1:dim(posdf)[1]) {
  if(res == 1)
   rescol-rep(alphacol[as.numeric(posdf$residue[res])],
   posdf$ratio[res])
  else
   rescol-c(rescol,rep(alphacol[as.numeric(posdf$residue[res])],
   posdf$ratio[res]))
 }
 if(!is.null(resmat-fill.corner(rescol,10,10)))
  color2D.matplot(fakemat,border=lightgray,cellcolors=resmat,
   yrev=FALSE,main=c(pos,length(resmat)))
}

That might get you started. In fact, I might even write a waffle

Re: [R] plot time series data in wide format

2013-11-02 Thread Jim Lemon


On 11/02/2013 11:08 AM, Gary Dong wrote:

Dear R users,

I wonder if there is a way that I can plot a time series data which is in a
wide format like this:

CITY_NAME   2000Q12000Q2  2000Q32000Q4 2001Q1
2001Q2  2001Q3 2001Q4 2002Q1  2002Q2
CITY1100.5210   101.9667  103.24933   104.0506   104.4317
105.3921   106.7643   107.5202   107.2561   107.8184
CITY2100.0412   100.6146  103.20293   104.0867   104.6612
106.6126   109.3514   110.1943   110.9480   113.0071
CITY3 99.589599.2298   99.2694799.4101   100.5776
101.3719   101.5957   102.2411   103.4390   105.1745
CITY4 99.6491   101.5386  104.90953   106.1065   108.1785
110.6845   113.3746   114.1254   116.2121   119.1033
CITY5100.9828   103.6847  105.04793   106.5925   108.7437
110.5549   111.9343   112.6704   113.6201   115.3020

Ideally, each city of the five city is represented by a line in the plot.


Hi Gary,
I'm not sure that this is exactly what you want, but:

library(plotrix)
# use window() if you are using Windows
x11(width=10,height=5)
bumpchart(garydf[,2:11],rank=FALSE,labels=garydf[,1])

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Efficient way to convert covariance to Euclidian distance matrix

[R] ascii-grid export

Re: [R] mapping data to a geographic map of Europe

Re: [R] ascii-grid export

Re: [R] R Packages in Mac

[R] Path Analysis

[R] how to compute correlation between times (with buffer before and after times)?

Re: [R] R Packages in Mac

Re: [R] R Packages in Mac

Re: [R] Path Analysis

[R] How to obtain nonparametric baseline hazard estimates in the gamma frailty model?

Re: [R] (gam) formula: Why different results for terms being factor vs. numeric?

Re: [R] ascii-grid export

Re: [R] Installing RCurl: 'configure' exists but is not executable

Re: [R] Package(s) for making waffle plot-like figures?

Re: [R] plot time series data in wide format

16 matches

Site Navigation

Mail list logo

Footer information