[R] Generating a bivariate joint t distribution in R
Hi, I conduct a panel data estimation and obtain estimates for two of the coefficients, beta1 and beta2. R tells me the mean and covariance of the distribution of (beta1, beta2). Now I would like to find the distribution of the quotient beta1/beta2, and one way to do it is to simulate from the joint distribution of (beta1, beta2), where both beta1 and beta2 follow a t distribution. How can we generate a joint t distribution in R? Thanks, Miao

R-help@r-project.org mailing list: https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generating a bivariate joint t distribution in R
Dear Miao, Check require(MASS) and ?mvrnorm for some ideas. HTH, Jorge.
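A possible approach (a sketch, not the only one): the mvtnorm package provides rmvt() for multivariate t draws. The mean vector, covariance matrix and degrees of freedom below are placeholders; substitute the estimates reported by your panel estimation.

```r
# Sketch: simulate beta1/beta2 from a shifted bivariate t distribution.
library(mvtnorm)

mu    <- c(1.2, 0.8)                           # placeholder coefficient means
Sigma <- matrix(c(0.04, 0.01, 0.01, 0.09), 2)  # placeholder covariance matrix
df    <- 30                                    # placeholder degrees of freedom

set.seed(1)
draws <- rmvt(10000, sigma = Sigma, df = df, delta = mu)  # shifted bivariate t
ratio <- draws[, 1] / draws[, 2]
quantile(ratio, c(0.025, 0.5, 0.975))  # simulated distribution of beta1/beta2
```

Note that rmvt() distinguishes between the "shifted" and "Kshirsagar" types when a non-zero delta is supplied; see ?rmvt before relying on this for inference.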
[R] Matrixplot (VIM package): can I add a colour scale?
Dear R Help, I would like to know how to add a colour scale to a matrixplot please. Here is the code that I've used to construct the matrix plot:

library(VIM)
SpatialPlot <- function(YearxBlock, plot.title){
  # Years are columns, Blocks are rows in this matrix
  YearxBlock <- as.matrix(YearxBlock)
  # To set margins for the plot:
  par(yaxt = "n", oma = c(4,4,4,4), mar = c(1.5,1.5,1.5,1.5))
  # Data coded according to a continuous colour scheme, with the lowest value
  # light gray, the maximum value black, and missing values white:
  matrixplot(YearxBlock, col = c("lightgray", "black", "white"), axes = FALSE)
  axis(side = 1, col = "black", at = c(1,4,7,10,13,16,19),
       labels = as.character(c(1993,1996,1999,2002,2005,2008,2011)), cex.axis = 0.8)
  mtext(text = "block", side = 2, line = 1, outer = TRUE, font = 1)
  title(main = plot.title, adj = 0)
}

There may be some way to do this using the legend() function? Thank you. Regards, Ross Marriott, Ph.D., Senior Research Scientist, Stock Assessment and Data Analysis, Department of Fisheries WA, Western Australian Fisheries and Marine Research Laboratories, PO Box 20, North Beach WA 6920, Australia. Phone: +61 8 9203 0201 (office); 0434604131 (mobile). Fax: +61 8 9203 0199. Web: http://www.fish.wa.gov.au/
Re: [R] creating a loop if to create a column of 0 and 1.
Hi. Check your keyboard, your enter key must be broken. If I decrypt your message and assume that Nfiltered and Presyabs have the same length, then

Presyabs[Nfiltered == 0] <- 0

In case you have some missing values, use

Presyabs[which(Nfiltered == 0)] <- 0

Without knowledge of the structure of the CPOD objects it is difficult to elaborate further. Regards, Petr

-----Original Message----- From: Marco A. Pérez. Sent: Wednesday, April 03, 2013 1:00 AM. To: r-help@r-project.org. Subject: [R] creating a loop if to create a column of 0 and 1.

Hi, I am new to RStudio and I am having trouble with this program. I store 17 lists:

CPOD <- list()
CPOD[[1]] <- GB1A
CPOD[[2]] <- GB1C
CPOD[[3]] <- GB1D
CPOD[[4]] <- GB2A
CPOD[[5]] <- GB2B
CPOD[[6]] <- GB2C
CPOD[[7]] <- GB2D
CPOD[[8]] <- GB3A
CPOD[[9]] <- GB3C
CPOD[[10]] <- GB3D
CPOD[[11]] <- GB4A
CPOD[[12]] <- GB4C
CPOD[[13]] <- GB4D
CPOD[[14]] <- GB5A
CPOD[[15]] <- GB5B
CPOD[[16]] <- GB5C
CPOD[[17]] <- GB5D

In each file you can find a txt document that contains the columns: file, chuckend, Nfiltered, Nall, MinsOn and Presyabs. What I want to do is to create a loop for all the CPODs: if a row of the column Nfiltered is 0, then the corresponding row of the column Presyabs must be 0; if it is different from 0, it must be 1. I created this loop without success:

# Creating the loop (0-1 presence, absence)
for (i in 1:length(CPOD[[1]]$Presyabs)) {
  if (CPOD[[1]]$Nfiltered[i] > 1) {
    CPOD[[1]]$Presyabs[i] <- 1
  } else {
    CPOD[[1]]$Presyabs[[i]] <- 0
  }
}

Thank you for your prompt answer.
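Assuming each element of CPOD is a data frame with Nfiltered and Presyabs columns (the object names come from the original post), the loop can be replaced by one vectorized assignment applied across the whole list. A minimal sketch:

```r
# Sketch: set Presyabs to 1 where Nfiltered is non-zero and 0 where it is
# zero, for every data frame in the CPOD list at once (no explicit loop).
CPOD <- lapply(CPOD, function(d) {
  d$Presyabs <- as.integer(d$Nfiltered != 0)
  d
})
```

as.integer() on the logical comparison gives the 0/1 coding directly, which is both shorter and faster than testing each row inside a for loop.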
[R] Question: how to convert raw to numeric
I know that there is a function to convert binary (raw) data to a string, named rawToChar, but I wonder whether there is any similar function for integer and float. I need to read some binary files containing integer and float data. I can do the job this way (below): first convert 4 raw bytes to bits, then pack the bits back into an integer. But it does not work for float, and I worry about the performance.

raw4 <- raw_buffer[1:4]
bits <- rawToBits(raw4)
packBits(bits, type = "integer")

Best wishes. I really hope to get your response.
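A sketch of one answer: base R's readBin() converts raw bytes directly to integers or doubles (including 4-byte floats via size = 4), and it accepts a raw vector in place of a file connection, so no bit-level round trip is needed. The example bytes here are generated with writeBin() for illustration.

```r
# Sketch: readBin() decodes raw bytes as integer or float without packBits().
raw_int <- writeBin(42L, raw())                 # 4 example bytes (an integer)
readBin(raw_int, what = "integer", size = 4)    # -> 42

raw_flt <- writeBin(3.5, raw(), size = 4)       # 4 example bytes (a float)
readBin(raw_flt, what = "double", size = 4)     # -> 3.5
```

For a whole file, readBin(con, what = "integer", n = ..., size = 4) reads many values in one call, which is far faster than looping over packBits().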
[R] linear model coefficients by year and industry, fitted values, residuals, panel data
Hi R-helpers, My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example). The number of firms by year and industry is not always equal, and the number of years by industry is not always equal.

# reproducible example
firm1 <- sort(rep(1:10, 5), decreasing = FALSE)
year1 <- rep(2000:2004, 10)
industry1 <- rep(20, 50)
X <- rnorm(50); Y <- rnorm(50); Z <- rnorm(50)
data1 <- data.frame(firm1, year1, industry1, X, Y, Z)
colnames(data1) <- c("firm", "year", "industry", "X", "Y", "Z")

firm2 <- sort(rep(11:15, 3), decreasing = FALSE)
year2 <- rep(2001:2003, 5)
industry2 <- rep(30, 15)
X <- rnorm(15); Y <- rnorm(15); Z <- rnorm(15)
data2 <- data.frame(firm2, year2, industry2, X, Y, Z)
colnames(data2) <- c("firm", "year", "industry", "X", "Y", "Z")

firm3 <- sort(rep(16:20, 4), decreasing = FALSE)
year3 <- rep(2001:2004, 5)
industry3 <- rep(40, 20)
X <- rnorm(20); Y <- rnorm(20); Z <- rnorm(20)
data3 <- data.frame(firm3, year3, industry3, X, Y, Z)
colnames(data3) <- c("firm", "year", "industry", "X", "Y", "Z")

final1 <- rbind(data1, data2)
final2 <- rbind(final1, data3)
final3 <- final2[order(final2$industry, final2$year), ]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 by industry and year (for example, I need the b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values and the residuals by firm, so I need to keep b0, b1 and b2 in a way that lets me do something like

newdata1 <- transform(final3, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)

or another way to keep the fitted values and the residuals in a data frame with the columns firm and year. Until now I have been doing this in a very hard way, and because I need to do it several times, I need your help to find an easier way.
Thank you, Cecília Carmo, Universidade de Aveiro, Portugal
Re: [R] Generating a bivariate joint t distribution in R
On 03/04/13 07:57, jpm miao wrote (asking how to generate a joint t distribution in R):

library(fCopulae)
?rmvst

Sincerely, Marc Girondot

-- Marc Girondot, Pr, Laboratoire Ecologie, Systématique et Evolution, Equipe de Conservation des Populations et des Communautés, CNRS, AgroParisTech et Université Paris-Sud 11, UMR 8079, Bâtiment 362, 91405 Orsay Cedex, France. Tel: 33 (0)1.69.15.72.30. Fax: 33 (0)1.69.15.73.53. E-mail: marc.giron...@u-psud.fr. Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html. Skype: girondot
[R] qpplot.das
Hi, I am using qpplot.das to produce a probability plot of my data. Some of the data are negative values, and once logged, NaN values are produced. Does anybody know what happens to these NaN values? Are they just removed from the dataset before the other points are plotted? Thanks -- Shane
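A quick base-R sketch of the likely mechanism (this illustrates general R behaviour, not qpplot.das internals specifically): taking the log of a negative value yields NaN with a warning, and base plotting functions simply do not draw non-finite points.

```r
# Sketch: log of negative values gives NaN; plotting omits non-finite points.
x  <- c(-2, -1, 1, 10, 100)
lx <- log10(x)   # first two entries become NaN, with a "NaNs produced" warning
plot(lx)         # only the three finite points are drawn
```

So in effect the NaN observations silently drop out of the plot; if that matters for interpretation, filter x > 0 explicitly beforehand so you know how many points were lost.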
Re: [R] speedometer charts in R
On Tue, Apr 2, 2013 at 4:00 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Look at the R googleVis package.

Or read what Hadley W had to say on a similar question first: "The question would be why would you want to? You are trying to understand your data, not driving a race car or aeroplane." - http://r.789695.n4.nabble.com/Graphical-output-dials-and-meters-for-a-dashboard-td845090.html

But maybe you *are* creating a dashboard for an R-powered race car, in which case here's an R-native version of the googleVis speedometers: http://gastonsanchez.wordpress.com/2013/01/10/gauge-chart-in-r/

Can't wait to see the full source code for your race car:

require(engine)
block <- engine(cc = 2000, cylinders = 6)
require(downforce)
...

Barry
Re: [R] speedometer charts in R
Hi. Thanks for the help. In the meanwhile some of the contributors have already sent me the same link with examples. I agree with you; I am not trying to drive a race car, just to do what was asked of me :) Andrija
[R] R 3.0.0 is released
The build system rolled up R-3.0.0.tar.gz (codename "Masked Marvel") this morning. The list below details the changes in this release. You can get the source code from http://cran.r-project.org/src/base/R-3/R-3.0.0.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course. For the R Core Team, Peter Dalgaard.

These are the md5sums for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = cbf6da8f886ccd8d0dda0cc7ffd1b8ec
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 43fcae6a4c96e17313d11a0aaefb73f8
MD5 (INSTALL) = 37adac6d0fbadf25b5a40e3f7535415e
MD5 (NEWS) = ed5405acecb3ba4a2d9a3467bbcea7e5
MD5 (NEWS.html) = baea8a4f82a3aa9d29d1a73a34238aa1
MD5 (ONEWS) = 0c3e10eef74439786e5fceddd06dac71
MD5 (OONEWS) = b0d650eba25fc5664980528c147a20db
MD5 (R-latest.tar.gz) = 5fb80535b0e144a978f67aa2158015de
MD5 (README) = e259ae5dd943b8547f0b7719664e815b
MD5 (RESOURCES) = c7cb32499ebbf85deb064aab282f93a4
MD5 (THANKS) = d4b45e302b7cad0fc4bb50d2cfe69649
MD5 (R-3/R-3.0.0.tar.gz) = 5fb80535b0e144a978f67aa2158015de

This is the relevant part of the NEWS file.

CHANGES IN R 3.0.0:

SIGNIFICANT USER-VISIBLE CHANGES:

o Packages need to be (re-)installed under this version (3.0.0) of R.

o There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors, so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.

o It is now possible for 64-bit builds to allocate amounts of memory limited only by the OS. It may be wise to use OS facilities (e.g. ulimit in a bash shell, limit in csh) to set limits on overall memory consumption of an R process, particularly in a multi-user environment.
A number of packages need a limit of at least 4GB of virtual memory to load. 64-bit Windows builds of R are by default limited in memory usage to the amount of RAM installed: this limit can be changed by the command-line option --max-mem-size or by setting the environment variable R_MAX_MEM_SIZE.

o Negative numbers for colours are consistently an error: previously they were sometimes taken as transparent, sometimes mapped into the current palette and sometimes an error.

NEW FEATURES:

o identical() has a new argument, ignore.environment, used when comparing functions (with default FALSE as before).

o There is a new option, options(CBoundsCheck=), which controls how .C() and .Fortran() pass arguments to compiled code. If true (which can be enabled by setting the environment variable R_C_BOUNDS_CHECK to "yes"), raw, integer, double and complex arguments are always copied, and checked for writing off either end of the array on return from the compiled code (when a second copy is made). This also checks individual elements of character vectors passed to .C(). This is not intended for routine use, but can be very helpful in finding segfaults in package code.

o In layout(), the limits on the grid size have been raised (again).

o New simple provideDimnames() utility function.

o Where methods for length() return a double value which is representable as an integer (as often happens for package Matrix), this is converted to an integer.

o Matrix indexing of data frames by two-column numeric indices is now supported for replacement as well as extraction.

o setNames() now has a default for its object argument, useful for a character result.

o StructTS() has a revised additive constant in the loglik component of the result: the previous definition is returned as the loglik0 component. However, the help page has always warned of a lack of comparability of log-likelihoods for non-stationary models. (Suggested by Jouni Helske.)

o The logic in aggregate.formula() has been revised. It is now possible to use a formula stored in a variable; previously, it had to be given explicitly in the function call.

o install.packages() has a new argument quiet to reduce the amount of output shown.

o Setting an element of the graphics argument lwd to a negative or infinite value is now an error. Lines corresponding to elements with values NA or NaN are silently omitted. Previously the behaviour was device-dependent.

o Setting graphical parameters cex, col, lty, lwd and pch in par() now requires a length-one argument. Previously some silently took the first element of a longer vector, but not always when documented to do so.

o Sys.which() when
[R] Better way of writing R code
Dear R forum, (Please note this is not a finance problem.) I have two data.frames:

currency_df = data.frame(
  current_date = c("3/4/2013", "3/4/2013", "3/4/2013", "3/4/2013"),
  issue_date = c("27/11/2012", "9/12/2012", "14/01/2013", "28/02/2013"),
  maturity_date = c("27/04/2013", "3/5/2013", "14/6/2013", "28/06/2013"),
  currency = c("USD", "USD", "GBP", "SEK"),
  other_currency = c("EURO", "CAD", "CHF", "USD"),
  transaction = c("Buy", "Buy", "Sell", "Buy"),
  units_currency = c(10, 25000, 15, 4),
  units_other_currency = c(78000, 25350, 99200, 6150))

rate_df = data.frame(
  date = rep(c("28/3/2013", "27/3/2013", "26/3/2013", "25/3/2013"), 9),
  currency = c(rep("USD", 12), rep("GBP", 12), rep("EURO", 12)),
  tenor = rep(rep(c("1 day", "1 week", "2 weeks"), each = 4), 3),
  rate = c(0.156, 0.157, 0.157, 0.155, 0.1752, 0.1752, 0.1752, 0.1752,
           0.1752, 0.1752, 0.1752, 0.1752, 0.48625, 0.485, 0.48625, 0.4825,
           0.49, 0.49125, 0.4925, 0.49, 0.49375, 0.49125, 0.4925, 0.49125,
           0.02643, 0.02214, 0.02214, 0.01929, 0.034, 0.034, 0.034125, 0.034,
           0.044, 0.044, 0.041, 0.045))

# 1st data.frame, currency_df (printed):
  current_date issue_date maturity_date currency other_currency transaction units_currency units_other_currency
1     3/4/2013 27/11/2012    27/04/2013      USD           EURO         Buy             10                78000
2     3/4/2013  9/12/2012      3/5/2013      USD            CAD         Buy          25000                25350
3     3/4/2013 14/01/2013     14/6/2013      GBP            CHF        Sell             15                99200
4     3/4/2013 28/02/2013    28/06/2013      SEK            USD         Buy              4                 6150

# 2nd data.frame, rate_df (printed):
        date currency   tenor     rate
1  28/3/2013      USD   1 day 0.156000
2  27/3/2013      USD   1 day 0.157000
3  26/3/2013      USD   1 day 0.157000
4  25/3/2013      USD   1 day 0.155000
5  28/3/2013      USD  1 week 0.175200
6  27/3/2013      USD  1 week 0.175200
7  26/3/2013      USD  1 week 0.175200
8  25/3/2013      USD  1 week 0.175200
9  28/3/2013      USD 2 weeks 0.175200
10 27/3/2013      USD 2 weeks 0.175200
11 26/3/2013      USD 2 weeks 0.175200
12 25/3/2013      USD 2 weeks 0.175200
13 28/3/2013      GBP   1 day 0.486250
14 27/3/2013      GBP   1 day 0.485000
15 26/3/2013      GBP   1 day 0.486250
16 25/3/2013      GBP   1 day 0.482500
17 28/3/2013      GBP  1 week 0.490000
18 27/3/2013      GBP  1 week 0.491250
19 26/3/2013      GBP  1 week 0.492500
20 25/3/2013      GBP  1 week 0.490000
21 28/3/2013      GBP 2 weeks 0.493750
22 27/3/2013      GBP 2 weeks 0.491250
23 26/3/2013      GBP 2 weeks 0.492500
24 25/3/2013      GBP 2 weeks 0.491250
25 28/3/2013     EURO   1 day 0.026430
26 27/3/2013     EURO   1 day 0.022140
27 26/3/2013     EURO   1 day 0.022140
28 25/3/2013     EURO   1 day 0.019290
29 28/3/2013     EURO  1 week 0.034000
30 27/3/2013     EURO  1 week 0.034000
31 26/3/2013     EURO  1 week 0.034125
32 25/3/2013     EURO  1 week 0.034000
33 28/3/2013     EURO 2 weeks 0.044000
34 27/3/2013     EURO 2 weeks 0.044000
35 26/3/2013     EURO 2 weeks 0.041000
36 25/3/2013     EURO 2 weeks 0.045000

Using the plyr and reshape libraries, I have converted rate_df into tabular form:

       date USD_1 day USD_1 week USD_2 weeks GBP_1 day GBP_1 week GBP_2 weeks EURO_1 day EURO_1 week EURO_2 weeks
1 25/3/2013     0.155     0.1752      0.1752   0.48250    0.49000     0.49125    0.01929    0.034000        0.045
2 26/3/2013     0.157     0.1752      0.1752   0.48625    0.49250     0.49250    0.02214    0.034125        0.041
3 27/3/2013     0.157     0.1752      0.1752   0.48500    0.49125     0.49125    0.02214    0.034000        0.044
4 28/3/2013     0.156     0.1752      0.1752   0.48625    0.49000     0.49375    0.02643    0.034000        0.044

Depending on the maturity period, I have defined discount rates as:

# FOR USD
if (as.character(currency) == "USD") {
  if (as.character(other_currency) == "GBP" && days_to_maturity <= 1) {
    libor_rate1 = df_LIBOR_rates$USD_o_n
    libor_rate2 = df_LIBOR_rates$GBP_o_n
  } else if
[R] Packages on R 3.0.0
Hello all, I see that R 3.0.0 is announced (hurray!), and have a question regarding this line in the NEWS file: "Packages need to be (re-)installed under this version (3.0.0) of R." Assuming I copy my packages to the new library folder and run update.packages(), will that be enough? Or is there anything more to do? For example, for packages I have that are already the latest version, would I still need to use install.packages() on them? Thanks, Tal

Contact me: tal.gal...@gmail.com | Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
Re: [R] Iterative regression through a series
Hello, I've made a small change to the code, and with your example data it's now working without errors.

N <- nrow(example)
estim <- numeric(N)  # It's better to allocate the results
error <- numeric(N)  # vectors in advance
for (i in seq_len(N)) {
  regr <- lm(example$Price[1:i] ~ example$Time[1:i])
  estim[i] <- coef(regr)[2]
  if (is.na(coef(regr)[2])) error[i] <- NA
  else error[i] <- summary(regr)$coefficients[2, 2]
}

Hope this helps, Rui Barradas

Em 03-04-2013 01:57, triskell4-umbre...@yahoo.fr wrote: That is very helpful: I've run your code, and it works perfectly with the example data. However, I'm having some problems using my actual data, probably because my Time variable isn't actually a regular series (as it was in the example: Time = 1:100). When run, it's producing estim and error vectors of lengths much greater than N. Here's a subset of my actual data:

dput(example)
structure(list(Time = c(3075L, 3168L, 3318L, 3410L, 3534L, 3715L,
3776L, 3926L, 3987L, 4110L, 4232L, 4291L, 4413L, 4505L, 4536L,
4656L, 4782L, 4886L, 5018L, 5138L, 5187L, 5253L, 5384L, 5540L,
5669L, 5740L, 5796L, 5887L, 5963L, 6042L, 6126L, 6197L, 6280L,
6405L, 6464L, 6553L, 6659L, 6755L, 6847L, 6917L, 7001L, 7120L,
7216L), Price = c(2.08, 3.55, 5.75, 5.69, 4.47, 5.11, 2.74, 3.04,
3.87, 4.7, 6.61, 3.95, 4.63, 7.11, 3.08, 4.476628726, 7.472854559,
8.775893276, 6.34, 5.79, 3.98889, 4.01917, 3.69, 4.603636364,
5.242094366, 6.854871699, 5.163700257, 9.154206814, 8.712059541,
10.60635248, 10.58180221, 10.55396909, 10.67812007, 9.985298266,
10.57385693, 9.644945417, 11.62, 12.615, 13.61, 10.83, 8.38, 12.7,
8.94)), .Names = c("Time", "Price"), row.names = c(NA, -43L),
class = "data.frame")

Is this as simple as replacing the expression for (i in Time) { with for (i in 1:length(Time)) {, or somesuch?
Mendi

From: Rui Barradas ruipbarra...@sapo.pt
To: triskell4-umbre...@yahoo.fr
Cc: r-help@r-project.org
Sent: Tuesday, 2 April 2013, 11:51
Subject: Re: [R] Iterative regression through a series

Hello, The error comes from NAs where you would expect coefficients. Try the following.

set.seed(7511)  # Make the example reproducible
N <- 100
Time <- 1:N
Price <- rnorm(N, 8, 2)
estim <- numeric(N)  # It's better to allocate the results
error <- numeric(N)  # vectors in advance
for (i in Time) {
  regr <- lm(Price[1:i] ~ Time[1:i])
  estim[i] <- coef(regr)[2]
  if (is.na(coef(regr)[2])) error[i] <- NA
  else error[i] <- summary(regr)$coefficients[2, 2]
}

Hope this helps, Rui Barradas

Em 02-04-2013 17:44, triskell4-umbre...@yahoo.fr wrote: Hello, Some context: let's say I have a data series (let's call it PRICE, for simplicity), sample size N. I want to regress it on TIME, and then use the TIME and intercept coefficients to predict the price in the next period, and use the standard error to calculate a confidence interval. This is all very easy. However, what I need help with is calculating a confidence interval for each point in time: imagining that at the end of the 10th period I have 10 data points, I wish to regress them on the 10 periods to create a confidence interval for the next 'predicted' price, and so on for TIME[10:100]. So the first regression would be of PRICE[1:10] on TIME[1:10], the second of PRICE[1:11] on TIME[1:11], and so on up to PRICE[1:N] and TIME[1:N]. I'd like to be able to vary the starting point (so it would need to be an argument in the function; in this case it would be 10). The ultimate output of the code would be to save the TIME coefficients and standard errors it generates to two vectors, say TIME.coef and TIME.SE. I'm not sure if lapply() can be bent to my will, or if a for loop would be too inefficient, or what.
I'm not new to R, but I'm fairly new to this kind of programming. This is a bungled mess of a narrative, and I apologize. Please feel free to use TIME = 1:100 and PRICE = rnorm(100, 8, 2). Here's an attempt, which has failed for reasons I can only imagine. Any help getting this to work would be greatly appreciated. Any help doing this without loops would be even better.

Time <- 1:100
Price <- rnorm(100, 8, 2)
estim <- 0  # I'm hoping this will be the Time coefficient
error <- 0  # I'm hoping this will be the standard error of the Time coefficient
for (i in Time) {
  regr <- lm(Price[1:i] ~ Time[1:i])
  estim <- c(estim, coef(summary(regr))[2, 1])
  error <- c(error, coef(summary(regr))[2, 1])
}
Error: subscript out of bounds

Many, many thanks in advance. Mendi
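For the loop-free variant the original poster asked about, one sketch uses sapply() over the expanding window sizes, collecting the slope estimate and its standard error in one pass. The names TIME.coef and TIME.SE come from the post; "start" (the first window size, 10 in the description) is an assumed argument name.

```r
# Sketch: expanding-window regressions without growing vectors in a loop.
set.seed(42)
Time  <- 1:100
Price <- rnorm(100, 8, 2)
start <- 10   # first window size, as described in the post

fits <- sapply(start:length(Time), function(i) {
  s <- summary(lm(Price[1:i] ~ Time[1:i]))
  s$coefficients[2, 1:2]   # slope estimate and its standard error
})
TIME.coef <- fits[1, ]
TIME.SE   <- fits[2, ]
```

sapply() here returns a 2-row matrix (one column per window), so splitting it into the two requested vectors is a simple row extraction; starting at window size 10 also avoids the NA coefficients that a 1-point or 2-point "regression" would produce.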
Re: [R] Packages on R 3.0.0
On 03/04/2013 11:43, Tal Galili wrote: Assuming I copy my packages to the new library folder and run update.packages(), will it be enough? Or is there anything more to do?

No. It means what it says. You have to run update.packages(checkBuilt=TRUE) to force a re-install. Since you are the maintainer of a Windows-only package, I presume this is on Windows. In which case it is an FAQ: http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What_0027s-the-best-way-to-upgrade_003f

-- Brian D. Ripley, rip...@stats.ox.ac.uk, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford. Tel: +44 1865 272861 (self), +44 1865 272866 (PA). Fax: +44 1865 272595. 1 South Parks Road, Oxford OX1 3TG, UK.
Re: [R] Matrixplot (VIM package): can I add a colour scale?
On 04/03/2013 04:43 PM, Ross Marriott wrote, asking how to add a colour scale to a matrixplot (VIM package).

Hi Ross, You can do this with the color.legend function (plotrix). Jim
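A minimal sketch of that suggestion, with placeholder coordinates (the xl/yb/xr/yt values are in user units and would need adjusting to the actual matrixplot, and a simple scatter plot stands in for it here):

```r
# Sketch: add a gray-scale legend strip to an existing plot with
# plotrix::color.legend(); the coordinates below are placeholders.
library(plotrix)

plot(1:10, 1:10)                            # stand-in for the matrixplot
ramp <- gray(seq(0.9, 0, length.out = 50))  # light gray (low) to black (high)
color.legend(xl = 2, yb = 9, xr = 8, yt = 9.5,
             legend = c("low", "high"), rect.col = ramp, gradient = "x")
```

The rect.col vector supplies the colour ramp, so matching it to the matrixplot's lightgray-to-black scheme keeps the legend consistent with the plot.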
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
Cecilia, Thanks for providing a reproducible example. Excellent. You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then estimate the fitted and residual values. Jean

library(plyr)
coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data = dat)$coef)
names(coef) <- c("industry", "year", "b0", "b1", "b2")
final4 <- merge(final3, coef)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
plot(as.factor(newdata2$firm), newdata2$residual)
[R] Canadian political party colours in ggplot2
A stupid question, but does anyone know how to express the actual colours used by the main Canadian political parties? I want to do a couple of ggplot2 plots and have lines or rectangles that accurately reflect the party colours. I can probably play around with RColorBrewer or something to figure it out, but if someone already has them it would save me some time, especially with the NDP orange. Thanks John Kane Kingston ON Canada
Re: [R] Better way of writing R code
Katherine, You don't need to convert rate_df into tabular form. You just need to categorize each row in currency_df into a tenor. Then you can merge the two data frames (by currency and tenor). For example ...

# convert dates to R dates, to calculate the number of days to maturity
# I am assuming this is the number of days from the current date to the maturity date
currency_df$maturity <- as.Date(currency_df$maturity_date, "%d/%m/%Y")
currency_df$current <- as.Date(currency_df$current_date, "%d/%m/%Y")
currency_df$days2mature <- as.numeric(currency_df$maturity - currency_df$current)

# categorize the number of days to maturity as you wish
# you may need to change the breaks= option to suit your needs
# read about the cut function to make sure you get the cut points
# included in the proper category, ?cut
currency_df$tenor <- cut(currency_df$days2mature,
  breaks = c(0, 1, 7, 14, seq(from=30.5, length=12, by=30.5)),
  labels = c("1 day", "1 week", "2 weeks", "1 month", paste(2:12, "months")))

# merge the currency_df and rate_df
# this will work better with real data, since the example data you
# provided didn't have matching tenors
both <- merge(currency_df, rate_df, all.x=TRUE)

Jean

On Wed, Apr 3, 2013 at 5:21 AM, Katherine Gobin katherine_go...@yahoo.com wrote:

Dear R forum, (Please note this is not a finance problem.) I have two data.frames:

currency_df = data.frame(
  current_date = c("3/4/2013", "3/4/2013", "3/4/2013", "3/4/2013"),
  issue_date = c("27/11/2012", "9/12/2012", "14/01/2013", "28/02/2013"),
  maturity_date = c("27/04/2013", "3/5/2013", "14/6/2013", "28/06/2013"),
  currency = c("USD", "USD", "GBP", "SEK"),
  other_currency = c("EURO", "CAD", "CHF", "USD"),
  transaction = c("Buy", "Buy", "Sell", "Buy"),
  units_currency = c(10, 25000, 15, 4),
  units_other_currency = c(78000, 25350, 99200, 6150))

rate_df = data.frame(
  date = rep(c("28/3/2013", "27/3/2013", "26/3/2013", "25/3/2013"), 9),
  currency = rep(c("USD", "GBP", "EURO"), each = 12),
  tenor = rep(rep(c("1 day", "1 week", "2 weeks"), each = 4), 3),
  rate = c(0.156, 0.157, 0.157, 0.155, 0.1752, 0.1752, 0.1752, 0.1752,
           0.1752, 0.1752, 0.1752, 0.1752,
           0.48625, 0.485, 0.48625, 0.4825, 0.49, 0.49125, 0.4925, 0.49,
           0.49375, 0.49125, 0.4925, 0.49125,
           0.02643, 0.02214, 0.02214, 0.01929, 0.034, 0.034, 0.034125, 0.034,
           0.044, 0.044, 0.041, 0.045))

# 1st data.frame
currency_df
  current_date issue_date maturity_date currency other_currency transaction units_currency units_other_currency
1     3/4/2013 27/11/2012    27/04/2013      USD           EURO         Buy             10                78000
2     3/4/2013  9/12/2012      3/5/2013      USD            CAD         Buy          25000                25350
3     3/4/2013 14/01/2013     14/6/2013      GBP            CHF        Sell             15                99200
4     3/4/2013 28/02/2013    28/06/2013      SEK            USD         Buy              4                 6150

# 2nd data.frame
rate_df
        date currency   tenor     rate
1  28/3/2013      USD   1 day 0.156000
2  27/3/2013      USD   1 day 0.157000
3  26/3/2013      USD   1 day 0.157000
4  25/3/2013      USD   1 day 0.155000
5  28/3/2013      USD  1 week 0.175200
6  27/3/2013      USD  1 week 0.175200
7  26/3/2013      USD  1 week 0.175200
8  25/3/2013      USD  1 week 0.175200
9  28/3/2013      USD 2 weeks 0.175200
10 27/3/2013      USD 2 weeks 0.175200
11 26/3/2013      USD 2 weeks 0.175200
12 25/3/2013      USD 2 weeks 0.175200
13 28/3/2013      GBP   1 day 0.486250
14 27/3/2013      GBP   1 day 0.485000
15 26/3/2013      GBP   1 day 0.486250
16 25/3/2013      GBP   1 day 0.482500
17 28/3/2013      GBP  1 week 0.490000
18 27/3/2013      GBP  1 week 0.491250
19 26/3/2013      GBP  1 week 0.492500
20 25/3/2013      GBP  1 week 0.490000
21 28/3/2013      GBP 2 weeks 0.493750
22 27/3/2013      GBP 2 weeks 0.491250
23 26/3/2013      GBP 2 weeks 0.492500
24 25/3/2013      GBP 2 weeks 0.491250
25 28/3/2013     EURO   1 day 0.026430
26 27/3/2013     EURO   1 day 0.022140
27 26/3/2013     EURO   1 day 0.022140
28 25/3/2013     EURO   1 day 0.019290
29 28/3/2013     EURO  1 week 0.034000
30 27/3/2013     EURO  1 week 0.034000
31 26/3/2013     EURO  1 week 0.034125
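The cut() step in Jean's answer is the heart of the approach, so here is a tiny self-contained sketch of just that binning (the break points mirror the ones in the reply and are illustrative, not a market convention):

```r
# Minimal sketch of binning days-to-maturity into tenor labels with cut().
# Break points are illustrative; adjust breaks= to your own convention.
days2mature <- c(1, 5, 10, 24, 45, 86)
tenor <- cut(days2mature,
             breaks = c(0, 1, 7, 14, seq(from = 30.5, length = 12, by = 30.5)),
             labels = c("1 day", "1 week", "2 weeks", "1 month",
                        paste(2:12, "months")))
# With the default right = TRUE, 24 days falls in (14, 30.5], the "1 month" bucket
```

Once every row of currency_df carries a tenor label like this, merge(currency_df, rate_df) matches on the common currency and tenor columns automatically.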
Re: [R] Canadian political party colours in ggplot2
Hi John, How about this:

library(XML)
library(ggplot2)
party.info <- readHTMLTable("http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Political_parties_and_politicians_in_Canada/list_of_parties")
fed.party.info <- party.info[[3]]
fed.party.colors <- fed.party.info[, 2]
names(fed.party.colors) <- gsub("^.*\\|", "", fed.party.info[, 4])
tmp <- data.frame(x = 1:15, y = 1:15,
                  z = factor(rep(c("Canada Party", "NDP", "Socialist"), each = 5)))
ggplot(tmp, aes(x=x, y=y)) + geom_point(aes(color=z)) +
  scale_color_manual(values = fed.party.colors)

Best, Ista

On Wed, Apr 3, 2013 at 9:08 AM, John Kane jrkrid...@inbox.com wrote:
Re: [R] lmer, p-values and all that
My apologies for coming late into this conversation, but I'm curious about something in your response. You use the following code to perform a likelihood ratio test between an lm object and a mer object:

fm0 <- lm(distance ~ age, data=Orthodont)
fm2 <- lmer(distance ~ age + (1|Subject), data=Orthodont, REML=FALSE)
ddiff <- -2*c(logLik(fm0) - logLik(fm2))
pchisq(ddiff, 1, lower.tail=FALSE)

It seems like this would be simple to roll up into a function, such as

lrtestGeneric <- function(fit1, fit2){
  chisq <- -2 * c(logLik(fit1) - logLik(fit2))
  df <- abs(attributes(logLik(fit1))$df - attributes(logLik(fit2))$df)
  p <- pchisq(chisq, df, lower.tail=FALSE)
  lrtest <- data.frame(L.R.Chisq=chisq, d.f.=df, P=p)
  return(lrtest)
}
lrtestGeneric(fm0, fm2)

My question is about whether it is appropriate to use the degrees of freedom returned by logLik, or if I should just use 1 degree of freedom when comparing a model without the random effect to one with the random effect. For instance, logLik returns a difference of 3 between the degrees of freedom of the models below. Should I be using the 3 degrees of freedom in the likelihood ratio test, or is it better to go with 1?

fit0 <- lm(Reaction ~ Days, sleepstudy)
fit1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
lrtestGeneric(fit0, fit1)

Any education you can provide would be greatly appreciated. Thanks Benjamin Nutter | Biostatistician | Quantitative Health Sciences Cleveland Clinic | 9500 Euclid Ave.
| Cleveland, OH 44195 | (216) 445-1365

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker Sent: Wednesday, March 27, 2013 10:34 PM To: David Winsemius Cc: r-h...@stat.math.ethz.ch Subject: Re: [R] lmer, p-values and all that

On 13-03-27 10:10 PM, David Winsemius wrote: On Mar 27, 2013, at 7:00 PM, Ben Bolker wrote: Michael Grant michael.grant at colorado.edu writes: Dear Help: I am trying to follow Professor Bates' recommendation, quoted by Professor Crawley in The R Book, p. 629, to determine whether I should model data using the 'plain old' lm function or the mixed model function lmer by using the syntax anova(lmModel, lmerModel). Apparently I've not understood the recommendation or the proper likelihood ratio test in question (or both), for I get this error message: Error: $ operator not defined for this S4 class. I don't have The R Book handy (some more context would be extremely useful! I would think it would count as fair use to quote the passage you're referring to ...) This is the quoted R-help entry: http://tolstoy.newcastle.edu.au/R/help/05/01/10006.html (I'm unable to determine whether it applies to the question at hand.) OK, I misspoke -- sorry. I think the lmer()/lme() likelihoods are actually comparable; it's GLMMs (glmer(), with no analogue in lme()-land except for MASS::glmmPQL, which doesn't give you log-likelihoods at all) where the problem arises.
You can (1) use lme(), (2) look at http://glmm.wikidot.com/faq for suggestions about testing random-effects terms (including perhaps "don't do it at all"), or (3) construct the likelihood ratio test yourself as follows:

library(nlme)
data(Orthodont)
fm1 <- lme(distance ~ age, random = ~1|Subject, data=Orthodont)
fm0 <- lm(distance ~ age, data=Orthodont)
anova(fm1, fm0)[["p-value"]]
detach("package:nlme", unload=TRUE)
library(lme4.0)  ## stable version of lme4
fm2 <- lmer(distance ~ age + (1|Subject), data=Orthodont, REML=FALSE)
anova(fm2, fm0)  ## fails
ddiff <- -2*c(logLik(fm0) - logLik(fm2))
pchisq(ddiff, 1, lower.tail=FALSE)  ## not identical to above but close enough

Would someone be kind enough to point out my blunder? You should probably repost this to the r-sig-mixed-mod...@r-project.org mailing list. My short answer would be: (1) I don't think you can actually use anova() to compare likelihoods between lm() and lme()/lmer() fits in the way that you want: *maybe* for lme() [I don't recall], but almost certainly not for lmer(). See http://glmm.wikidot.com/faq for methods for testing significance/inclusion of random factors (short answer: you should *generally* try to decide whether to include random factors on _a priori_ grounds, not on the basis of statistical tests ...) Ben Bolker
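A side note on the chi-square reference distribution used above: because the null hypothesis (random-effect variance equal to zero) sits on the boundary of the parameter space, the naive chi-square p-value is conservative, and a commonly used correction halves it (treating the LR statistic as a 50:50 mixture of chi-square distributions). This is a sketch of that convention with made-up log-likelihood values, not a statement about the exact distribution in every design:

```r
# Sketch: likelihood-ratio test of a single variance component with the
# common boundary correction (halving the chi-square p-value), since the
# null value sigma^2 = 0 lies on the edge of the parameter space.
lrtest_boundary <- function(logLik0, logLik1, df = 1) {
  chisq <- -2 * (logLik0 - logLik1)
  p_naive     <- pchisq(chisq, df, lower.tail = FALSE)
  p_corrected <- p_naive / 2   # 50:50 mixture of chi^2_0 and chi^2_1
  c(chisq = chisq, p_naive = p_naive, p_corrected = p_corrected)
}

# Illustrative log-likelihoods for the null and alternative fits
lrtest_boundary(-120.5, -115.2)
```

The GLMM FAQ linked in the thread discusses when this correction applies and when testing the random effect is better avoided altogether.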
Re: [R] Canadian political party colours in ggplot2
On 03/04/2013 9:08 AM, John Kane wrote: A stupid question, but does anyone know how to express the actual colours used by the main Canadian political parties?

From this page http://www.ndp.ca/logos, NDP orange is CMYK = (0, 60, 100, 0). According to an online conversion tool, that's #FF6600 in #RGB notation.

Duncan Murdoch
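Putting Duncan's #FF6600 together with hex values for the other parties gives a named vector that plugs straight into scale_colour_manual()/scale_fill_manual(). The non-NDP codes below are commonly used approximations, not values confirmed in this thread, so treat them as placeholders and check each party's own style guide:

```r
# Approximate hex codes for the major federal party colours.
# Only the NDP value (#FF6600) comes from the thread; the others are
# common approximations and should be verified against official sources.
party_cols <- c(Liberal = "#D71920", Conservative = "#1A4782",
                NDP = "#FF6600", Bloc = "#33B2CC", Green = "#3D9B35")

# Quick swatch chart, written to a temporary PDF
pdf(tempfile(fileext = ".pdf"))
barplot(rep(1, length(party_cols)), col = party_cols,
        names.arg = names(party_cols), yaxt = "n", main = "Party colours")
dev.off()
```

In ggplot2 the same vector would be used as, e.g., scale_colour_manual(values = party_cols), with the factor levels of the party variable matching the vector's names.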
[R] arrayInd and which
Folks, I have Googled but not found much regarding arrayInd aside from the which help page. Are there any good examples or docs on what arrayInd does that is better or different from which()? In addition, take the following 20x10 matrix:

td <- structure(c(1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 1, 6,
6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6,
6, 5, 6, 6, 6, 6, 5, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6,
6, 6, 2, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6,
2, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6), .Dim = c(20L, 10L))

I want to find the cells which (hah!) are <= c(rep(5,5), rep(4,5)). That is, my bounds are by column. Is there a better way to do this other than:

bounds <- c(rep(5,5), rep(4,5))
idxs <- which(apply(td, 2, "<=", bounds), arr.ind = TRUE)
idxs
      row col
 [1,]   1   1
 [2,]  13   1
 [3,]  13   2
 [4,]   1   3
 [5,]   8   3
 [6,]  13   3
 [7,]   1   4
 [8,]  13   4
 [9,]   1   5
[10,]  13   5
[11,]   1   6
[12,]   4   6
[13,]  13   6
[14,]   4   7
[15,]  13   7
[16,]   1   8
[17,]   4   8
[18,]  13   8
[19,]   3   9
[20,]   1  10
[21,]  13  10

Lastly, can you explain these results?

td[idxs[10,]]
[1] 4 6
td[idxs[10,1]]
[1] 4
td[idxs[10,2]]
[1] 6
td[idxs[10,3]]
Error: subscript out of bounds

Thanks in advance for your help, KW
Re: [R] Generating a bivariate joint t distribution in R
On Apr 2, 2013, at 11:07 PM, Jorge I Velez wrote:
Dear Miao, Check

require(MASS)
?mvrnorm

for some ideas. HTH, Jorge.-

Also, package mvtnorm has both normal and t versions of its p, q, and r functions.

David Winsemius Alameda, CA, USA
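For concreteness, a bivariate t can also be simulated in base R via the standard representation t = mu + Z / sqrt(W/df), with Z multivariate normal and W chi-square; mvtnorm's rmvt() wraps the same idea. The mean vector, covariance matrix, and degrees of freedom below are illustrative stand-ins for the estimates R reported, not values from the thread:

```r
# Sketch: simulate (beta1, beta2) from a shifted bivariate t using only
# base R, then look at the distribution of the quotient beta1/beta2.
# mu, Sigma, and df are illustrative placeholders.
set.seed(1)
mu    <- c(2, 1)
Sigma <- matrix(c(0.10, 0.02, 0.02, 0.05), 2, 2)
df    <- 30
n     <- 10000

Z <- matrix(rnorm(n * 2), n, 2) %*% chol(Sigma)   # N(0, Sigma) draws
W <- rchisq(n, df)
tdraws <- sweep(Z / sqrt(W / df), 2, mu, "+")     # shifted multivariate t

ratio <- tdraws[, 1] / tdraws[, 2]                # simulated beta1/beta2
quantile(ratio, c(0.025, 0.5, 0.975))             # simulation interval
```

Note that a ratio of random variables can have very heavy tails when the denominator's distribution puts mass near zero, so the simulated quantiles are usually more useful than the simulated mean.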
[R] Superscript
Hi, How do I write a superscript within gsub? I have the following:

gsub("_mgkg", expression(paste("mg ", kg^{-1})), names[1])

Thanks -- Shane
Re: [R] Question: how to convert raw to numeric
See ?readBin - it works with raw objects too. Henrik

On Apr 3, 2013 1:18 AM, Mike Chen chenminyi1...@gmail.com wrote:
I know that there is a function to convert binary data to a string, named rawToChar, but I wonder whether there is any similar function for integer and float. I need to read some binary files containing integer and float data. I can do this job in the following way: first convert 4 raw bytes to bits, then pack the bits back into an integer. But it does not work for float, and I worry about the performance.

raw4 <- raw_buffer[1:4]
bits <- rawToBits(raw4)
packBits(bits, type = "integer")

Best wishes. I really hope to get your response.
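To make Henrik's pointer concrete: readBin() accepts a raw vector directly and handles both 4-byte integers and 4-byte (single-precision) floats, so neither the rawToBits()/packBits() round-trip nor a file connection is needed. A small self-contained round-trip, generating the raw bytes with writeBin():

```r
# Sketch: convert raw bytes to integer and single-precision float with
# readBin(), which accepts a raw vector directly. Endianness defaults to
# the platform's; set endian = explicitly when reading foreign files.
int_raw   <- writeBin(42L, raw(), size = 4)   # 4-byte integer as raw bytes
float_raw <- writeBin(3.5, raw(), size = 4)   # double stored as 4-byte float

readBin(int_raw,   what = "integer", size = 4, n = 1)
readBin(float_raw, what = "numeric", size = 4, n = 1)
```

For a whole buffer, set n to the number of values expected (e.g. n = length(raw_vec) / 4); readBin then returns a vector in one call, which also addresses the performance worry.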
Re: [R] Superscript
On 03/04/2013 11:01 AM, Shane Carey wrote: How do I write a superscript within gsub?

gsub() doesn't work with expressions; it works with character strings. You're going to need to split your string into the parts before and after the "_mgkg" string, and then put it together again as an expression.

Duncan Murdoch
Re: [R] Superscript
Ok, thanks.

On Wed, Apr 3, 2013 at 4:15 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

-- Shane
Re: [R] Superscript
gsub searches strings, not expressions.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
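Following Duncan's suggestion, one way to sketch the split-and-rebuild step is to strip the suffix with gsub() (plain strings) and then construct the plotmath label separately with bquote(). The column name "Arsenic_mgkg" is a hypothetical example, not from the thread:

```r
# Sketch: strip the "_mgkg" suffix with gsub() (string work), then build
# the superscripted axis label separately with bquote() (plotmath work).
nm   <- "Arsenic_mgkg"                     # hypothetical column name
base <- gsub("_mgkg", "", nm)              # -> "Arsenic"
lab  <- bquote(.(base) ~ "(mg" ~ kg^-1 * ")")

# use it where a plotmath expression is accepted, e.g.
# plot(1, xlab = lab)
```

The key point is that the superscript never passes through gsub(): gsub only manipulates the character string, and bquote() substitutes the cleaned string into an unevaluated plotmath call.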
Re: [R] arrayInd and which
KW, I don't know anything about arrayInd, so I can't help with that question. Whether there is a better way to do your comparison depends largely on what you want to do with the results. The way that you're doing it now seems fine to me, but I wonder what you want to use the indexes for. You could approach the issue from another angle, which preserves the original matrix structure:

compare <- cbind(td[, 1:5] <= 5, td[, 6:10] <= 4)
compare
td[compare]
td2 <- td
td2[!compare] <- NA
td2

You are having difficulty extracting because the indexes you are providing are all vectors and scalars:

td[idxs[10,]]    # same as td[c(13, 5)]
td[idxs[10,1]]   # same as td[13]

If you want to extract by row and column, you have to provide matrices:

td[idxs[10, , drop=FALSE]]
td[t(matrix(c(13, 5)))]

The error in your last attempt is caused by trying to extract row 10, column 3 from a matrix (idxs) that has only 2 columns:

td[idxs[10,3]]
idxs[10,3]
dim(idxs)

Hope this helps. Jean

On Wed, Apr 3, 2013 at 9:53 AM, Keith S Weintraub kw1...@gmail.com wrote:
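On the unanswered arrayInd question: arrayInd() converts linear (vector) indices into array indices, and which(x, arr.ind = TRUE) is essentially the find-and-convert in one step, so the two agree. A tiny sketch, also showing the two-column-matrix indexing Jean described:

```r
# arrayInd() converts linear indices to array (row, col) indices;
# which(x, arr.ind = TRUE) does the lookup and the conversion together.
m <- matrix(c(1, 6, 6, 4, 6, 6), nrow = 2)   # 2x3 toy matrix

lin  <- which(m < 5)                  # linear indices into m
idx1 <- arrayInd(lin, dim(m))         # converted to (row, col) pairs
idx2 <- which(m < 5, arr.ind = TRUE)  # same pairs in one call

# Indexing with a two-column matrix picks individual cells (row_i, col_i),
# which is exactly what idxs[10, , drop = FALSE] does in the thread.
m[idx1]
```

arrayInd() earns its keep when you already have linear indices from some other computation (e.g. which.max(), order()) and want their array coordinates; when you start from a logical condition, which(..., arr.ind = TRUE) is the shorter spelling.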
[R] Process substitution and read.table/scan
Hello, I asked the same question on Stack Overflow (http://stackoverflow.com/questions/15784373/process-substitution) but did not completely understand the issue, so I'm reporting it here. I've looked around at what puzzles me and only found this: http://stackoverflow.com/questions/4274171/do-some-programs-not-accept-process-substitution-for-input-files which is partially helpful, but I would really like to understand the full story. I noticed that some of my R scripts give different (i.e. wrong) results when I use process substitution. I tried to pinpoint the problem with a test case. This script:

#!/usr/bin/Rscript
args <- commandArgs(TRUE)
file <- args[1]
cat(file)
cat("\n")
data <- read.table(file, header=FALSE)
cat(mean(data$V1))
cat("\n")

with an input file generated in this way (the shell redirections were apparently eaten by the mail archive; the intent is 10 numbers repeated 500 times):

$ for i in `seq 1 10`; do echo $i >> p; done
$ for i in `seq 1 500`; do cat p >> test; done

leads me to this:

$ ./mean.R test
test
5.5
$ ./mean.R <(cat test)
/dev/fd/63
5.501476

Further tests reveal that some lines are lost... but I would like to understand why. Does read.table (scan gives the same results) use seek? PS: with a smaller test file (100 repetitions) an error is reported:

$ ./mean.R <(cat test3)
/dev/fd/63
Error in read.table(file, header = F) : no lines available in input
Execution halted

Other notes: with a modified script that uses scan, the results are the same. Printing the whole data.frame results in 5001 lines in the first case (which is correct) and only 3050 with the process substitution. I checked the read.table source code and saw that it goes back and forth in the file to check for column types and so on... I thought that this explained the problem, but I would prefer an error message rather than a result computed from partial data. Then someone on Stack Overflow pointed me to fifo(), which solves the problem (i.e. the mean is reported correctly even with process substitution), and therefore I'm even more puzzled: does fifo() allow seeks and peeks around a named pipe?
I'm willing to read the relevant code to understand what's really happening (and even help, if someone thinks this issue could represent a small bug), but I would really appreciate some pointers. Here is sessionInfo() and other possibly relevant details:

> sessionInfo()
R version 3.0.0 beta (2013-03-23 r62384)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

$ uname -a
Linux femto 3.6-trunk-amd64 #1 SMP Debian 3.6.9-1~experimental.1 x86_64 GNU/Linux

I use the Debian R package: r-base-core, 3.0.0~20130324-1

Thanks, Elena Grassi
PS: I started on R-help since this could be of general interest; sorry if that's a bad call.
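One robust workaround, independent of how read.table() probes its input: slurp the (possibly non-seekable) stream exactly once with readLines(), then parse the in-memory text. A sketch using textConnection() as a stand-in for a pipe or /dev/fd/63 connection:

```r
# Sketch of a workaround: read the non-seekable input once with
# readLines(), then let read.table() parse the in-memory text via text=.
# read.table() never needs to seek, so pipes and process substitution
# behave like regular files. textConnection() stands in for the real
# stream (e.g. file("/dev/fd/63") or file("stdin")).
con   <- textConnection(c("1", "2", "3", "4"))
lines <- readLines(con)
close(con)

data <- read.table(text = lines, header = FALSE)
mean(data$V1)
```

This trades memory for safety (the whole stream is held in RAM), which is usually fine for inputs of a few thousand lines like the test case above.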
[R] scanning data in R
Dear R users, may I seek your suggestion. I have a data file 'stop' to be scanned in R, but I want to ignore one specific number, 21, there. Putting it differently, I want to get all numbers in the file except 21. Is there any command to achieve this?

b <- scan("F:\\stop.txt")

Many thanks for your kind attention. Regards, Jamil.

15 15 16 15 17 16 16 15 17 15 16 17 19 17 19 15 16 16 19 18 19 17 17 19 16 17 16 20 15 16 16 15 18 19 17 18 16 18 16 17 21 21 17 15 15 20 16 20 17 21 19 15 15 15 18 20 16 17 16 15 16 15 16 16 18 16 15 21 20 15 16 17 18 18 20 17 18 18 21 16 15 16 20 17 19 17 18 19 17 21 21 16 15 19 21 15 20 20 16 16 20 18 21 21 15 15 19 15 16 15 17 15 21 20 16 15 16 16 18 17 16 18 16 16 16 21 16 15 20 16 19 17 14 17 15 16 15 17 16 15 15 20 15 16 15 15 16 15 20 19 15 15 17 17 15 18 16 15 14 21 19 20 15 17 15 15 17 19 15 15 16 18 17 16 16 16 16 18 21 15 16 16 21 19 17 15 16 15 18 16 20 17 16 16 21 15 17 20 21 16 16 17 21 16 16 20 19 17 15 16 18 16 16 15 16 18 19 16 15 21 20 19 20 20 20 16 18 16 15 20 16 16 18 19 16 18 21 16 21 19 19 17 18 15 18 16 19 17 20 20 16 18 16 19 20 21 15 16 18 19 16 17 16 15 17 16 15 21 16 15 18 18 15 21 19 16 19 15 15 17 18 17 16 15 16 18 18 19 17 16 21 16 19 16 15 16 20 17 17 21 21 20 16 17 19 16 15 19 18 17 15 15 15 19 20 18 21 16 15 15 20 20 17 19 15 19 16 15 15 19 16 16 19 15 15 19 18 21 15 15 18 21 16 16 17 17 16 15 18 16 15 20 17 20 15 16 15 15 20 15 16 17 16 17 16 21 17 18 17 15 20 18 16 17 16 17 15 15 17 20 17 18 15 16 20 16 15 15 18 16 18 16 18 15 16 18 15 17 18 18 16 15 19 16 19 21 15 20 17 18 15 16 19 17 16 18 18 15 19 15 15 17 15 19 19 16 20 16 16 15 19 15 16 15 16 15 15 17 20 15 20 16 20 15 17 16 15 16 16 16 17 16 16 21 21 15 15 17 16 16 19 20 16 16 16 16 15 16 16 20 16 16 20 18 19 15 21 16 15 15 21 20 20 19 17
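scan() itself has no "skip this value" option, so the simplest approach is to read everything and drop the unwanted number with logical subsetting afterwards. The file path is the poster's; a small in-memory stand-in is used here so the sketch is self-contained:

```r
# Sketch: read all the numbers, then drop every 21 with logical subsetting.
# b <- scan("F:\\stop.txt")              # the poster's file
b <- c(15, 15, 16, 21, 17, 21, 19)       # small stand-in for the file contents
b <- b[b != 21]
b
```

If several values should be excluded at once, `b[!b %in% c(21, 14)]` generalizes the same idea.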
[R] (no subject)
Hello, I want to perform a latent class analysis using the poLCA package. My formula is:

substances <- cbind(subs1, subs2, subs3, subs4, subs5, subs6) ~
  gender + age + education + income + occupation + urban + dbehavior +
  incarceration + treatment + depression + alcriteria

I want to include sample weights in the model. I have read that poLCA does not take weights into account, but when I introduce them, the model seems to run correctly. This is the command I am using:

lca2 <- poLCA(substances, ena, nclass=2, graphs=TRUE, maxiter=2000, (weights=p_adicc))

My question is: are the results really taking the weights into account? If not, how can I introduce weights into my analysis? Thank you for your help. Regards, Ana
[R] install packages and time-out
I use R / RStudio at work. Recently I tried to download the XLConnect package using

install.packages("XLConnect")

However, I got the following error message:

Installing package(s) into 'WORKCOMPUTER SPECIFIC STUFF HERE Documents/R/win-library/2.15' (as 'lib' is unspecified)
trying URL 'http://www.stats.bris.ac.uk/R/bin/windows/contrib/2.15/XLConnect_0.2-5.zip'
Warning in install.packages :
  InternetOpenUrl failed: 'The operation timed out'
Error in download.file(url, destfile, method, mode = "wb", ...) :
  cannot open URL 'http://www.stats.bris.ac.uk/R/bin/windows/contrib/2.15/XLConnect_0.2-5.zip'
Warning in install.packages : download of package 'XLConnect' failed

I think the reason for the problem is that I'm in a corporate environment and our firewall spends 2-3 minutes scanning the zip file before releasing it. I think the solution requires me to tell R to override (what I believe is) the 60-second time-out default setting when you run install.packages; at least, I think install.packages calls download.file. I need to extend the time-out window so that the firewall can do its thing. Some notes, which I think are relevant:

- Although I'm working in a corporate environment, I have worked with my IT team to ensure that HTTP/proxy measures allow on-the-fly package download, e.g. no problems with install.packages("ggplot2").
- I've also tried using setInternet2(TRUE) before the above, which made no difference.
- I've also tried options(timeout=300) before the install.packages command, in order to attempt to force a five-minute window; this had no effect.
- The target file does exist on Bristol's servers if you fetch it via Internet Explorer.
- I can download the package manually (although our security measures spend some time scanning this particular package). Once it is saved locally, I can successfully run install.packages to install the file from the local drive, but that's hardly the way forward if I want to keep things up to date!
- I have tried Imperial College's server, which also fails, so it's not a mirror issue.
- I have run the same code at home against the same Bristol mirror and get no issues.

A bit stuck, therefore!
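One workaround worth scripting, given that the manual download already succeeds: fetch the binary zip with download.file() yourself (with options(timeout=) raised), then install from the local file with repos = NULL. Whether the timeout option is honoured depends on the download method in use (the internal Windows method reported above may ignore it), so treat this as a sketch rather than a guaranteed fix; the URL is the one from the error message:

```r
# Sketch: fetch the binary zip explicitly, then install from the local
# file. Raising options(timeout=) affects download.file(); whether it
# helps depends on the download method your R build uses on Windows.
options(timeout = 300)
url  <- "http://www.stats.bris.ac.uk/R/bin/windows/contrib/2.15/XLConnect_0.2-5.zip"
dest <- file.path(tempdir(), basename(url))

# download.file(url, dest, mode = "wb")   # may still be held by the proxy scan
# install.packages(dest, repos = NULL)    # install from the local zip
```

The two action lines are commented out here because they need network access; uncommented, they reproduce the manual download-then-install workflow in one script, which also makes periodic updates less painful.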
[R] transformation maximizing the Kurtosis
Hello, I have multivariate data: a matrix X with n rows and p columns. I want to do a linear transformation V = XA, similar to PCA but maximizing the kurtosis instead of the variance. The purpose is to identify potential outliers. I have seen this paper (section 3.1): http://halweb.uc3m.es/esp/Personal/personas/dpena/articles/finaljasa.pdf but the result didn't give me orthogonal components. Thank you, Ronen
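One way to sketch this as a projection-pursuit problem in base R: numerically find a unit direction maximizing the kurtosis of the projection with optim(), then deflate (project the data onto the orthogonal complement) before searching for the next direction, which forces orthogonality by construction. This follows the general projection-pursuit recipe, not the exact algorithm of the cited paper:

```r
# Hedged sketch: one kurtosis-maximizing direction via optim(), with
# deflation to keep subsequent directions orthogonal. Illustrative data
# with a few planted outliers in the first coordinate.
set.seed(42)
X <- scale(matrix(rnorm(200 * 3), ncol = 3))
X[1:3, 1] <- X[1:3, 1] + 8                 # plant outliers

kurt <- function(v) mean(v^4) / mean(v^2)^2

max_kurtosis_dir <- function(X) {
  # maximize kurtosis of X %*% a over unit vectors a (local optimum only)
  obj <- function(a) -kurt(X %*% (a / sqrt(sum(a^2))))
  a <- optim(rnorm(ncol(X)), obj, method = "BFGS")$par
  a / sqrt(sum(a^2))
}

a1 <- max_kurtosis_dir(X)
v1 <- X %*% a1                             # first kurtosis-maximizing component
X2 <- X - X %*% a1 %*% t(a1)               # deflate: remove the a1 direction
# repeat max_kurtosis_dir(X2) for the next (orthogonal) direction
```

Extreme values of v1 flag candidate outliers. Note that optim() from a random start only finds a local optimum, so restarting from several initial vectors is advisable in practice.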
Re: [R] kernlab::kkmeans initial centers
I am not asking about k-means. I am asking about passing initial assignments to the kernel k-means algorithm. In kernel k-means, centroids are not defined explicitly. I tried passing initial centroids in the original feature space, but it did not work. The provided example just sets the number of clusters and lets the method assign random cluster memberships. I am also wondering about the output centers. Are they the centers in the eigenvector space? If so, what kind of initial centers am I supposed to pass to the method? --ahmed On Wed, Apr 3, 2013 at 12:54 AM, Pascal Oettli kri...@ymail.com wrote: Hi, I would say that if you know what the k-means algorithm is, you know the meaning of initial cluster centers. You can also check the output of the provided example. Regards, Pascal On 04/03/2013 09:27 AM, Ahmed Elgohary wrote: Hi, I am trying to pass initial cluster assignments to the kkmeans method (http://rss.acs.unt.edu/Rdoc/library/kernlab/html/kkmeans.html) of kernlab. It is not clear to me how I can set the parameter *centers* with initial cluster centers as stated in the documentation? thanks, --ahmed
Re: [R] kernlab::kkmeans initial centers
Hi, The output of the provided example shows you which kind of matrix you need for cluster centers. In the example, 3 centers are defined per variable (4 variables). Thus, the initial cluster centers are defined by a 3x4 matrix. HTH, Pascal De : Ahmed Elgohary aagoh...@gmail.com À : r-help@r-project.org Cc : Pascal Oettli kri...@ymail.com Envoyé le : Mercredi 3 avril 2013 22h53 Objet : Re: [R] kernlab::kkmeans initial centers [earlier message quoted in full, snipped]
[R] Ask help
Hello! I am eager to learn: if I only have data on the relationship between OTU and sample, can I use the function capscale() and then make a point plot with CAP1 on the x-axis and CAP2 on the y-axis? Besides, what function could I use to draw rarefaction curves in different colours in a single plot in R? Appreciate it very much. Sincerely! Echo
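A hedged sketch using the vegan package (assumed installed; `dune` is an example community matrix shipped with it). One caveat worth knowing: capscale() only labels axes CAP1/CAP2 when constraining variables are supplied; an unconstrained model (`~ 1`) yields MDS axes instead.

```r
library(vegan)                 # assumed installed; provides capscale() and rarecurve()
data(dune)                     # samples x species abundance matrix
ord <- capscale(dune ~ 1, distance = "bray")   # unconstrained: axes are MDS1, MDS2
sc <- scores(ord, display = "sites")
plot(sc[, 1], sc[, 2], xlab = colnames(sc)[1], ylab = colnames(sc)[2])
# rarefaction curves, one colour per sample, in a single plot
rarecurve(dune, step = 1, col = rainbow(nrow(dune)), label = FALSE)
```

With an OTU table and explanatory variables, something like `capscale(otu ~ var1 + var2, data = env)` gives CAP1/CAP2 site scores that can be plotted the same way.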
Re: [R] kernlab::kkmeans initial centers
thanks for your reply, but I am still confused about what kind of centroids the 3x4 matrix represents. In fact, I passed that 3x4 centers matrix as the initial centers to the kkmeans method and I am getting the error

Error in crossprod(x[vgr[[i]], vgr[[i]], drop = FALSE], w[vgr[[i]]]) : object 'vgr' not found

--ahmed On Wed, Apr 3, 2013 at 10:30 AM, Pascal Oettli kri...@ymail.com wrote: [earlier replies quoted in full, snipped]
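For the record, a hedged sketch of the documented interface (kernlab assumed installed): `centers` is either the number of clusters or a matrix of starting centres in the original input space, one row per cluster and one column per variable, i.e. a 3x4 matrix for iris. As the error above shows, matrix input triggered a bug in the kernlab version current at the time, so the integer form is the safe one.

```r
library(kernlab)                      # assumed installed
x <- as.matrix(iris[, 1:4])
km <- kkmeans(x, centers = 3)         # integer form: 3 randomly initialized clusters
# the documented alternative (a 3 x 4 matrix of starting centres) would be, e.g.:
# init <- x[c(1, 51, 101), ]; kkmeans(x, centers = init)
table(iris$Species, km@.Data)         # cluster membership per species
```

The returned "specc" object behaves like a vector of cluster assignments; `centers(km)` gives the output centers.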
Re: [R] scanning data in R
You can certainly do it after scanning all the numbers in with

b <- scan("F:\\stop.txt", what = integer())
b <- b[b != 21]

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Naser Jamil Sent: Wednesday, April 03, 2013 4:33 AM To: R help Subject: [R] scanning data in R Dear R-user, May I seek your suggestion. I have a data file 'stop' to be scanned in R. But I want to ignore one specific number, '21', there. Putting it differently, I want to get all numbers in the file except 21. Is there any command to achieve it? -- b <- scan("F:\\stop.txt") -- Many thanks for your kind attention. Regards, Jamil.
[R] Superscript and for loops
Hi, If I have data as follows:

DATA_names <- c("A mg kg", "B mg kg", "C mg kg", "D mg kg",
                "E mg kg", "F mg kg", "G mg kg", "H mg kg")

how do I convert each name to a label with a superscript, i.e.

A (mg kg^-1), B (mg kg^-1), ..., H (mg kg^-1)

(with the -1 rendered as a superscript)? I have lots of elements and I need to do this automatically, in a for loop or the like. Thanks, Shane
Re: [R] scanning data in R
On 4/3/2013 4:33 AM, Naser Jamil wrote: [question quoted, snipped] Well, I don't know what is in stop.txt, but I will assume it is a string of numbers or strings separated by the end-of-line character and then a terminating EOL. Read it in as you have it and then:

b <- b[-21]

If you are reading in a dataframe, simply use what you have and then do:

b <- b[-21, ]

and you should have what you want, or perhaps you want

b[21, ] <- NA # indicate row 21 as a missing value

-- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences Kirksville, MO 63501 USA
Re: [R] Superscript
Are you trying to convert a column name like "Na_mgkg" to a plot label like "Na (mg kg^-1)"? If so you will have to use both string manipulation functions like gsub() and expression manipulating functions like bquote(). E.g.,

f <- function (name) {
   # add other suffices and their corresponding plotmath expressions to the list
   env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
                        ugkg = bquote(mu * g ~ kg^{-1})),
                   parent = emptyenv())
   pattern <- paste0("_(", paste(objects(env), collapse="|"), ")")
   bquoteExpr <- parse(text=gsub(pattern, "~(.(\\1))", name))[[1]]
   # I use do.call() to work around the fact that bquote's first argument is not evaluated.
   do.call(bquote, list(bquoteExpr, env))
}
d <- data.frame(Na_mgkg=1:10, K_ugkg=10:1)
plot(Na_mgkg ~ K_ugkg, data=d, xlab=f("K_ugkg"), ylab=f("Na_mgkg"))

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Shane Carey Sent: Wednesday, April 03, 2013 8:02 AM To: r-help@r-project.org Subject: [R] Superscript Hi, How do I write a superscript within gsub? I have the following:

gsub("_mgkg", expression(paste("mg kg"^{-1})), names[1])

Thanks -- Shane
Re: [R] Model Selection based on individual t-values with the largest possible number of variables in regression
To say that these strategies represent bad statistical practice is to put it mildly. Frank

mister_O wrote: Dear R-Community, When writing my master's thesis, I was faced with a difficult issue. Analyzing the capital structure determinants, I have one dependent variable (total debt ratio = TD) and 15 independent ones. At the first stage I normalized my data by deleting outliers from each variable (pairwise deletion), and as a result every variable has a different length. Now, when selecting relevant variables for the best model, neither stepwise nor forward nor backward procedures work perfectly, since there are a number of other combinations of variables which also have high t-values. Thus, whichever model I pick, I never know whether it is trustworthy. I tried to calculate all possible combinations of independent variables, but since I have 15, there are thousands of such combinations and R simply refuses to calculate them! (the computer crashes) I wonder if you can help me write R code to find the model which includes as many variables as possible with significant t-values? cheers, Oleg

- Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] DUD (Does not Use Derivatives) for nonlinear
Date: Tue, 2 Apr 2013 06:59:13 -0500 From: Paul Johnson pauljoh...@gmail.com To: qi A send2...@gmail.com Cc: R-help r-help@r-project.org Subject: Re: [R] DUD (Does not Use Derivatives) for nonlinear regression in R? On Apr 1, 2013 1:10 AM, qi A send2...@gmail.com wrote: Hi, All SAS has DUD (Does not Use Derivatives)/Secant Method for nonlinear regression; does R offer this option for nonlinear regression? I have read the help file for nls() and could not find such an option, any suggestion? Nelder-Mead is the default algorithm in optim(). It does not use derivatives. DUD is from the same generation, but John Nash recommended the N-M method. Pj Thanks, Derek

I'm not sure where Paul is saying I recommended N-M, but I do think it is important with optimization methods to recommend a method FOR some particular class of problems or for a problem-solving situation. A blanket "this is good" recommendation cannot be made. I chose NM (slightly BEFORE DUD was released) as the only derivative-free method in my 1979 book, as it had the best balance of reliability and performance for an 8K machine (code and data) that I was using in 1975. It still works well as a first-try method for optimization, but generally is less efficient than gradient-based methods, in particular because it does not have a good way to know it is finished. As a derivative-free method, it is not too bad, particularly in the nmk version in the dfoptim package.
Indeed, I wish this version were put in optim() as the default, since it can deal with bounds constraints, though slightly less generally and less well than bobyqa or some other methods, and there are a couple of minor details it handles better than the N-M in optim() that give it better performance and reliability. Readers should notice that there are lots of conditional statements above. It's a matter of selecting the right tool for the job. If you have lots of compute power and don't mind wasting it, NM will likely get somewhere near some or other optimum of your problem. It won't do it terribly fast, and you'd better make sure you didn't just run out of iterations, or hit some other measure that stops the program, before it has decided it is done. Also check that the answer is the one you want. Most optimization problems have more than one answer, and the wrong ones often seem to be easier to find. JN
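To make the trade-off concrete, a small sketch (using the classic Rosenbrock test function, my choice for illustration) comparing derivative-free Nelder-Mead with gradient-based BFGS in optim():

```r
# Rosenbrock "banana" function: minimum value 0 at (1, 1)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

res_nm   <- optim(c(-1.2, 1), fr)                 # Nelder-Mead, optim()'s default
res_bfgs <- optim(c(-1.2, 1), fr, method = "BFGS")

res_nm$counts["function"]    # NM typically needs many more function evaluations,
res_bfgs$counts["function"]  # though it never evaluates a gradient
c(NM = res_nm$value, BFGS = res_bfgs$value)       # both should be near 0
```

The nmk() function in the dfoptim package mentioned above has the same derivative-free flavour but adds bounds handling (nmkb).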
Re: [R] scanning data in R
vec <- scan("stop.txt")
# Read 635 items
vec1 <- vec[vec != 21]
length(vec1)
# [1] 584
any(vec1 == 21)
# [1] FALSE

A.K. - Original Message - From: Naser Jamil jamilnase...@gmail.com To: R help r-help@r-project.org Cc: Sent: Wednesday, April 3, 2013 5:33 AM Subject: [R] scanning data in R [question quoted in full, snipped]
Re: [R] Superscript and for loops
On Apr 3, 2013, at 9:06 AM, Shane Carey wrote: [question quoted, snipped] You haven't described the task in very much detail, so Bill Dunlap's language-oriented (more expressive, as it were) solution may be what you really need. Nonetheless, this answer stays on the character plane of R's conceptual levels:

gsub("mg kg", "(mg kg)^-1", DATA_names)
[1] "A (mg kg)^-1" "B (mg kg)^-1" "C (mg kg)^-1" "D (mg kg)^-1" "E (mg kg)^-1" "F (mg kg)^-1"
[7] "G (mg kg)^-1" "H (mg kg)^-1"

If you wanted these in an expression vector suitable for labels on a barplot or such:

sapply(gsub("mg kg", "(mg kg)^-1", DATA_names), as.expression)
expression(`A (mg kg)^-1` = A (mg kg)^-1, `B (mg kg)^-1` = B (mg kg)^-1,
    `C (mg kg)^-1` = C (mg kg)^-1, `D (mg kg)^-1` = D (mg kg)^-1,
    `E (mg kg)^-1` = E (mg kg)^-1, `F (mg kg)^-1` = F (mg kg)^-1,
    `G (mg kg)^-1` = G (mg kg)^-1, `H (mg kg)^-1` = H (mg kg)^-1)

In practice:

pos <- barplot(1:length(DATA_names))
text(x=pos, y=-1, xpd=TRUE, srt=45,
     labels=sapply(gsub("mg kg", "(mg kg)^-1", DATA_names), as.expression))

-- David Winsemius Alameda, CA, USA
Re: [R] arrayInd and which
On Apr 3, 2013, at 7:53 AM, Keith S Weintraub wrote: Folks, I have Googled but not found much regarding arrayInd aside from the which help page. Any good examples or docs on what arrayInd does that is better or different from which()? In addition take the following 20x10 matrix:

td <- structure(c(1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6,
6, 3, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
5, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6,
6, 6, 2, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 3,
6, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 4, 6, 6, 6, 6, 6, 6,
6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6),
.Dim = c(20L, 10L))

I want to find the cells which (hah!) are <= c(rep(5,5), rep(4,5)). That is, my bounds are by column. Is there a better way to do this other than:

bounds <- c(rep(5,5), rep(4,5))
idxs <- which(apply(td, 2, "<=", bounds), arr.ind = TRUE)
idxs
      row col
 [1,]   1   1
 [2,]  13   1
 [3,]  13   2
 [4,]   1   3
 [5,]   8   3
 [6,]  13   3
 [7,]   1   4
 [8,]  13   4
 [9,]   1   5
[10,]  13   5
[11,]   1   6
[12,]   4   6
[13,]  13   6
[14,]   4   7
[15,]  13   7
[16,]   1   8
[17,]   4   8
[18,]  13   8
[19,]   3   9
[20,]   1  10
[21,]  13  10

Lastly can you explain these results:

td[idxs[10,]]
[1] 4 6
td[idxs[10,1]]
[1] 4
td[idxs[10,2]]
[1] 6
td[idxs[10,3]]
Error: subscript out of bounds

This has nothing to do with the behavior of arrayInd and everything to do with the behavior of "[".

td[idxs[10,drop=FALSE] ]
[1] 4

When extracting from a matrix, a result with a single row loses its matrix attributes and becomes a numeric vector. That behavior is prevented with drop=FALSE and the desired results accrue.
This would not have been a puzzle if you had chosen multiple rows at a time:

td[ idxs[1:2, ] ]
[1] 1 4
td[ idxs ]
[1] 1 4 1 1 3 3 1 5 3 4 2 1 1 5 3 2 2 4 1 2 3 3

-- David Winsemius Alameda, CA, USA
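On the original arrayInd question: which(x, arr.ind = TRUE) is essentially which(x) followed by arrayInd(), which converts linear (column-major) positions into (row, col) index pairs. A small self-contained check:

```r
m <- matrix(c(1, 6, 4, 6, 6, 2), nrow = 2)
lin <- which(m < 5)                  # linear positions: 1, 3, 6
arr <- arrayInd(lin, dim(m))         # corresponding (row, col) pairs
arr
# arrayInd() is what you reach for when you already have linear indices
# (e.g. from which.max() or order()) and need array subscripts back:
stopifnot(identical(arr, which(m < 5, arr.ind = TRUE, useNames = FALSE)))
```

So the two are interchangeable for logical tests; arrayInd() is the lower-level piece you use when the linear indices come from somewhere other than which().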
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
A few minor improvements to Jean's post suggested inline below. On 2013-04-03 05:41, Adams, Jean wrote: Cecilia, Thanks for providing a reproducible example. Excellent. You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then estimate the fitted and residual values. Jean

library(plyr)
coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data=dat)$coef)
names(coef) <- c("industry", "year", "b0", "b1", "b2")
final4 <- merge(final3, coef)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
plot(as.factor(newdata2$firm), newdata2$residual)

Suggestion 1: Use the extractor function coef() and also avoid using the name of an R function as a variable name:

Coef <- ddply(..., function(dat) coef(lm(...)))

Suggestion 2: Use plyr's mutate() to do both transforms at once:

newdata <- mutate(final4, Yhat = b0 + b1*X + b2*Z, residual = Y - Yhat)

[Or you could use within(), but I now find mutate handier, mainly because it doesn't 'reverse' the order of the new variables.]

Suggestion 3: Use the 'data=' argument in the plot:

boxplot(residual ~ firm, data = newdata)

Peter Ehlers

On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote: Hi R-helpers, My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example); the number of firms by year and industry is not always equal, and the number of years by industry is not always equal.

#reproducible example
firm1 <- sort(rep(1:10,5), decreasing=F)
year1 <- rep(2000:2004,10)
industry1 <- rep(20,50)
X <- rnorm(50)
Y <- rnorm(50)
Z <- rnorm(50)
data1 <- data.frame(firm1, year1, industry1, X, Y, Z)
colnames(data1) <- c("firm","year","industry","X","Y","Z")
firm2 <- sort(rep(11:15,3), decreasing=F)
year2 <- rep(2001:2003,5)
industry2 <- rep(30,15)
X <- rnorm(15)
Y <- rnorm(15)
Z <- rnorm(15)
data2 <- data.frame(firm2, year2, industry2, X, Y, Z)
colnames(data2) <- c("firm","year","industry","X","Y","Z")
firm3 <- sort(rep(16:20,4), decreasing=F)
year3 <- rep(2001:2004,5)
industry3 <- rep(40,20)
X <- rnorm(20)
Y <- rnorm(20)
Z <- rnorm(20)
data3 <- data.frame(firm3, year3, industry3, X, Y, Z)
colnames(data3) <- c("firm","year","industry","X","Y","Z")
final1 <- rbind(data1, data2)
final2 <- rbind(final1, data3)
final3 <- final2[order(final2$industry, final2$year),]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 by industry and year (for example I need to have b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values and the residuals by firm, so I need to keep b0, b1 and b2 in a way that I could do something like

newdata1 <- transform(final3, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)

or another way to keep the fitted values and the residuals in a dataframe with the columns firm and year. Until now I have been doing this in a very hard way, and because I need to do it several times, I need your help to find an easier way. Thank you, Cecília Carmo Universidade de Aveiro Portugal
Re: [R] arrayInd and which
On Apr 3, 2013, at 10:59 AM, David Winsemius wrote: [question and earlier answer quoted in full, snipped]

Arrgh. The explanation was correct, but there is a missing comma. It should be:

td[ idxs[10, , drop=FALSE] ]

The problem only shows up if you choose a different row:

td[ idxs[12, drop=FALSE] ]
[1] 6     WRONG
td[ idxs[12, , drop=FALSE] ]
[1] 1     Correct

-- David Winsemius Alameda, CA, USA
Re: [R] scanning data in R
On Apr 3, 2013, at 2:33 AM, Naser Jamil wrote: [question quoted, snipped]

b <- scan("~/Downloads/stop.txt", na.strings="21")
Read 635 items
b
 [1] 15 15 16 15 17 16 16 15 17 15 16 17 19 17 19 15 16 16 19 18 19 17 17 19 16 17 16 20 15 16
[31] 16 15 18 19 17 18 16 18 16 17 NA NA 17 15 15 20 16 20 17 NA 19 15 15 15 18 20 16 17 16 15

snipped remainder -- David Winsemius Alameda, CA, USA
Re: [R] Superscript and for loops
Yup, I want these as labels on plots, but I need it as D (mg kg^-1) rather than D (mg kg)^-1. Sorry for not being more clear, and thanks for your help. Cheers On Wed, Apr 3, 2013 at 6:44 PM, David Winsemius dwinsem...@comcast.net wrote: [earlier reply quoted in full, snipped] -- Shane
Re: [R] Superscript
Hi William, This is exactly what I'm trying to do. You're a star, thanks. On Wed, Apr 3, 2013 at 5:33 PM, William Dunlap wdun...@tibco.com wrote: [earlier reply quoted in full, snipped] -- Shane
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
Peter, For suggestion 1, what advantages are there to using coef() rather than $coef? For suggestion 2, thanks! I'm new to the plyr package and wasn't aware of the mutate() function. Jean

On Wed, Apr 3, 2013 at 1:01 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

A few minor improvements to Jean's post suggested inline below.

On 2013-04-03 05:41, Adams, Jean wrote:

Cecilia, Thanks for providing a reproducible example. Excellent. You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then estimate the fitted and residual values. Jean

library(plyr)
coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data=dat)$coef)
names(coef) <- c("industry", "year", "b0", "b1", "b2")
final4 <- merge(final3, coef)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
plot(as.factor(newdata2$firm), newdata2$residual)

Suggestion 1: Use the extractor function coef() and also avoid using the name of an R function as a variable name:

Coef <- ddply(..., function(dat) coef(lm(...)))

Suggestion 2: Use plyr's mutate() to do both transforms at once:

newdata <- mutate(final4, Yhat = b0 + b1*X + b2*Z, residual = Y - Yhat)

[Or you could use within(), but I now find mutate() handier, mainly because it doesn't 'reverse' the order of the new variables.]

Suggestion 3: Use the 'data=' argument in the plot:

boxplot(residual ~ firm, data = newdata)

Peter Ehlers

On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote:

Hi R-helpers, My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example). The number of firms by year and industry is not always equal, and the number of years by industry is not always equal.
#reproducible example
firm1 <- sort(rep(1:10,5), decreasing=F)
year1 <- rep(2000:2004,10)
industry1 <- rep(20,50)
X <- rnorm(50); Y <- rnorm(50); Z <- rnorm(50)
data1 <- data.frame(firm1, year1, industry1, X, Y, Z)
colnames(data1) <- c("firm","year","industry","X","Y","Z")
firm2 <- sort(rep(11:15,3), decreasing=F)
year2 <- rep(2001:2003,5)
industry2 <- rep(30,15)
X <- rnorm(15); Y <- rnorm(15); Z <- rnorm(15)
data2 <- data.frame(firm2, year2, industry2, X, Y, Z)
colnames(data2) <- c("firm","year","industry","X","Y","Z")
firm3 <- sort(rep(16:20,4), decreasing=F)
year3 <- rep(2001:2004,5)
industry3 <- rep(40,20)
X <- rnorm(20); Y <- rnorm(20); Z <- rnorm(20)
data3 <- data.frame(firm3, year3, industry3, X, Y, Z)
colnames(data3) <- c("firm","year","industry","X","Y","Z")
final1 <- rbind(data1, data2)
final2 <- rbind(final1, data3)
final3 <- final2[order(final2$industry, final2$year),]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 by industry and year (for example, I need the b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values and the residuals by firm, so I need to keep b0, b1 and b2 in a way that lets me do something like

newdata1 <- transform(final3, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)

or another way to keep the fitted values and residuals in a data frame with the columns firm and year. Until now I have been doing this in a very laborious way, and because I need to do it several times, I need your help to find an easier one. Thank you, Cecília Carmo Universidade de Aveiro Portugal
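For readers without plyr, the same per-cell fit can be sketched in base R with by(). This is a hypothetical minimal example on simulated data, not Cecilia's real panel:

```r
set.seed(1)
dat <- data.frame(
  industry = rep(c(20, 30), each = 20),
  year     = rep(rep(2000:2001, each = 10), 2),
  X = rnorm(40), Z = rnorm(40), Y = rnorm(40)
)

# Fit Y ~ X + Z within each industry/year cell and keep the fitted
# values and residuals next to the original rows
pieces <- by(dat, list(dat$industry, dat$year), function(d) {
  fit <- lm(Y ~ X + Z, data = d)
  transform(d, Yhat = unname(fitted(fit)), residual = unname(resid(fit)))
})
newdata <- do.call(rbind, pieces)

# Per-cell coefficients (b0, b1, b2), if they are needed separately
coefs <- do.call(rbind, by(dat, list(dat$industry, dat$year),
                           function(d) coef(lm(Y ~ X + Z, data = d))))
```

Within each cell the fitted value plus the residual reconstructs Y exactly, which is a useful sanity check on the bookkeeping.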
[R] Changing y-axis in ggplot to show proportions instead of density
Hi Everyone, I have frequency data, which I am displaying with an area-curve-like plot in ggplot2 using:

ggplot(dfs, aes(x=values)) + geom_density(aes(group=ind))

The y-axis that is returned is density, which is not really intuitive for my purposes, and I would like to change it to proportions (i.e. the count of values in each bin divided by the total number of values). I have found that I can change the y-axis to display counts or scaled proportions, but not just the proportions. Something like:

+ geom_density(aes(y = count/sum(count)))

This code does not work, but I wonder if something like it is possible? Thanks, Camilo

Camilo Mora, Ph.D. Department of Geography, University of Hawaii Currently available in Colombia Phone: Country code: 57 Provider code: 313 Phone 776 2282 From the USA or Canada you have to dial 011 57 313 776 2282 http://www.soc.hawaii.edu/mora/
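One commonly suggested approach, assuming binned proportions are acceptable instead of a smoothed density, is to let a histogram layer compute the counts and divide by the total inside aes(). The `..count..` notation is ggplot2's computed-statistic syntax from this era; newer versions spell it `after_stat(count)`. A sketch with made-up stand-in data for dfs:

```r
library(ggplot2)

# toy stand-in for dfs: two groups of values
dfs <- data.frame(values = c(rnorm(200), rnorm(200, 3)),
                  ind    = rep(c("a", "b"), each = 200))

# Each bar's height is count / total count, i.e. a proportion
p <- ggplot(dfs, aes(x = values)) +
  geom_histogram(aes(y = ..count.. / sum(..count..)), binwidth = 0.5) +
  ylab("Proportion")
p
```

The bar heights then sum to 1 across all bins, which is the property the density scale lacks.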
Re: [R] {Spam?} Re: linear model coefficients by year and industry, fitted values, residuals, panel data
On 2013-04-03 11:54, Adams, Jean wrote: Peter. For suggestion 1, what advantages are there to using coef() rather than $coef? Not much difference for lm models, but note that it does use the partial matching 'feature' of the '$' extractor, since the component name is actually 'coefficients'. But try it for an nls model. Mynlsmodel$coefficients won't work (well, it won't give an error but it will yield NULL). That's why there are special extractor functions such as coef.nls, coef.Arima, etc. For lm models, coef.default is used. Peter Ehlers For suggestion 2, thanks! I'm new to the plyr package and wasn't aware of the mutate() function. Jean [...snip...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
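Peter's point can be checked directly. A small sketch using the built-in mtcars data: $coefficients works for lm (and $coef only via partial matching), but yields NULL for nls, while coef() works for both:

```r
# lm: partial matching makes $coef work, but the real component is $coefficients
fit_lm <- lm(mpg ~ wt, data = mtcars)
identical(coef(fit_lm), fit_lm$coefficients)   # TRUE

# nls: the fitted object has no top-level 'coefficients' component,
# so $coefficients silently yields NULL, while coef() still works
fit_nls <- nls(mpg ~ a + b * wt, data = mtcars, start = list(a = 30, b = -5))
is.null(fit_nls$coefficients)   # TRUE
length(coef(fit_nls))           # 2
```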
[R] Make cdf package error Human Exon array
Hello everybody! I am not sure if this is the right way to ask, because I am new to this kind of help list, so please tell me if I am wrong. I am trying to make a cdf package for the Human Exon Array chip from Affymetrix (HuEx-1_0-st-v2). I have downloaded the file HuEx-10-st-v2.cdf from the Affymetrix website and run the following commands in R:

library(makecdfenv)
pkgpath <- tempdir()
make.cdf.package("HuEx-1_0-st-v2.text.cdf", cdf.path="/Working/", compress=FALSE,
                 species="Homo_sapiens", package.path=pkgpath)

It takes a lot of time and most of the RAM of my computer, and at the end the following message appears:

Reading CDF file. Error in .Call(reaD file, as.character(file), as.integer(3), as.integer(compress), : promise already under evaluation: recursive default argument reference or earlier problems?

I have no idea if I am doing anything wrong or if there is a better method for making the package, so any help will be really welcome. Thank you, Maria

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [R] scanning data in R
Many thanks, Prof. David. It's exactly what I wanted!

On 3 April 2013 17:26, David L Carlson dcarl...@tamu.edu wrote:

You can certainly do it after scanning all the numbers in with

b <- scan("F:\\stop.txt", what=integer())
b <- b[b != 21]

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Naser Jamil Sent: Wednesday, April 03, 2013 4:33 AM To: R help Subject: [R] scanning data in R

Dear R-user, May I seek your suggestion? I have a data file 'stop' to be scanned into R, but I want to ignore one specific number, 21. Put differently, I want to get all the numbers in the file except 21. Is there any command to achieve this?

b <- scan("F:\\stop.txt")

Many thanks for your kind attention. Regards, Jamil.
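A self-contained version of David's answer, using a temporary file instead of F:\stop.txt so it runs anywhere:

```r
# Write a small file of numbers, then read it back, dropping every 21
tf <- tempfile(fileext = ".txt")
writeLines(c("5 21 7", "21 9"), tf)

b <- scan(tf, what = integer())
b <- b[b != 21]
b
# [1] 5 7 9
```

Note that `what = integer()` makes scan() return integers; omit it (as in Jamil's original call) to read doubles.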
Re: [R] Make cdf package error Human Exon array
On 04/03/2013 10:15 AM, Maria Arnedo Munoz wrote: Hello everybody! I am not sure if this is the way for asking because I am new in this kind of help-website so, please if I am wrong tell me.

Hi Maria -- makecdfenv is a Bioconductor package, so ask on their mailing list http://bioconductor.org/help/mailing-list/ Best, Martin

[...snip...]

-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
[R] Help with lmRob function
Hi, I am fairly new to R and have encountered an issue with the lmRob function that I have been unable to resolve. I am trying to run a robust regression using the lmRob function which runs successfully, but the results are rather strange. I'm not sure it's important, but my model has 3 dichotomous categorical variables and 2 continuous variables in it. When I look at a summary of my model, 1 of my continuous variables and 1 of my categorical variables both have an Estimate value of 1. My other 2 categorical variables and 1 other continuous variables all have Estimate values of 0. These results seem strange to me. When I run a simple rlm function the model runs fine and produces the results I'd expect to see. Another potentially useful piece of information is that when I run a model with just the two variables that have an Estimate value of 1 (one categorical and one continuous), I get the following error message: 1: In lmRob.fit.compute(x, y, x1.idx = x1.idx, nrep = nrep, robust.control = robust.control, : Sum(psi.weight(wi)) less than 1e-06 in lmRob.ucovcoef. during initial estimation. 2: In lmRob.fit.compute(x, y, x1.idx = x1.idx, nrep = nrep, robust.control = robust.control, : Sum(psi.weight(wi)) less than 1e-06 in lmRob.ucovcoef. during final scale estimation. I do not understand what this error message means and have been unable to resolve it. I tried several different permutations of my five variables in varying combinations and sometimes the models work and sometimes I get this error message. There seems to be no pattern as to when the error message occurs and when it does not. If anyone has encountered this or has any help I'd appreciate it. Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with integrate function
Hello, The following code of mine is giving the error:

Error in integrate(fx[[2]], 0.056, 1) : maximum number of subdivisions reached

Can anyone help? Thanks and regards, Swagato

--

fv <- vector("list")
fx <- vector("list")
v <- 0; c <- 0; n <- 0; NOV <- 0; i <- 0
while(n < 200){
  fv[[1]] <- function(x) 1   # prior function
  fx[[1]] <- function(x){
    rho <- (0.2+6*x)/(.3+4*x)
    i <- 0
    N <- 0
    for(i in 1:length(x)){
      kopt <- function(K){(x[i]-0.05*(K-((1-rho[i]^K)/(1-rho[i])+(K*rho[i]^(K+1)-1))/(1-rho[i]^(K+1)))/(0.1+5*x[i]))^2}
      L <- optim(c(0), kopt, method="BFGS")
      N[i] <- round(L$par+1)}
    return(N)
  }
  # Calculate Kv+1
  fx[[2]] <- function(x)((0.2+6*x)/(.3+4*x))^n*(1-(0.2+6*x)/(.3+4*x))/(1-(0.2+6*x)/(.3+4*x))^fx[[1]](x)*fv[[1]](x)  # numerator of f'
  gx <- integrate(fx[[2]], 0, 0.045)    # denominator of f'
  hx <- integrate(fx[[2]], 0.056, 1)
  fv[[2]] <- function(x) x*fx[[2]](x)/(gx$value+hx$value)   # v.f'
  fx[[3]] <- function(x){
    NV <- 0; n0 <- 0
    rho <- (0.2+6*x)/(.3+4*x)
    i <- 0
    N <- 0
    for(i in 1:length(x)){
      kopt <- function(K){(x[i]-0.05*(K-((1-rho[i]^K)/(1-rho[i])+(K*rho[i]^(K+1)-1))/(1-rho[i]^(K+1)))/(0.1+5*x[i]))^2}
      L <- optim(c(0), kopt, method="BFGS")
      N[i] <- L$par+1
      j <- 0
      for(j in 0:N[i]) NV[i] <- NV[i] + j*(rho[i])^j*(1-rho[i])/(1-rho[i]^(N[i]+1))
      n0[i] <- n}
    return(0.05*(n0-NV)/(0.1+5*x))
  }  # cost function
  fv[[3]] <- function(x) fx[[3]](x)*fx[[2]](x)/(gx$value+hx$value)
  fv[[4]] <- function(x) (n-fx[[3]](x)*(0.1+5*x)/0.05)*fx[[2]](x)/(gx$value+hx$value)
  a1 <- integrate(fv[[2]], 0, 0.045)
  a2 <- integrate(fv[[2]], 0.056, 1)
  v[n] <- a1$value+a2$value     # expected value
  b1 <- integrate(fv[[3]], 0, 0.045)
  b2 <- integrate(fv[[3]], 0.056, 1)
  c[n] <- b1$value+b2$value     # expected cost
  d1 <- integrate(fv[[4]], 0, 0.045)
  d2 <- integrate(fv[[4]], 0.056, 1)
  NOV[n] <- d1$value+d2$value   # expected length
  n <- n+1
}
plot(v)
plot(c)
plot(NOV)
l <- 1/(1+exp(v-c))
plot(l)
Re: [R] DUD (Does not Use Derivatives) for nonlinear
On 04/04/2013 05:34 AM, Prof J C Nash (U30A) wrote: SNIP Most optimization problems have more than one answer, and the wrong ones often seem to be easier to find. Fortune? cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
On 04/04/2013 07:54 AM, Adams, Jean wrote: Peter. For suggestion 1, what advantages are there to using coef() rather than $coef? Just thought I'd chip in: It is considered, uh, politically correct to use extractor functions rather than digging out components of objects in a direct manner. The purported advantage of this is that it is robust against package changes which could restructure the object in question, which would break direct extraction but would (presumably) have no deleterious consequences for the use of extractor functions. cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can package plyr also calculate the mode?
I am trying to replicate SAS PROC UNIVARIATE in R. I got most of the stats I needed, for a by grouping in a data frame, using:

all1 <- ddply(all, "ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS),
              q25=quantile(COUNTS,.25), median=quantile(COUNTS,.50),
              q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90),
              q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99))

So I got the mean, median, std dev, quantiles, etc. Is there any way I can add the mode to the mix? Thanks ahead for any suggestions.

-- View this message in context: http://r.789695.n4.nabble.com/Can-package-plyr-also-calculate-the-mode-tp4663235.html Sent from the R help mailing list archive at Nabble.com.
[R] how to re-shape a matrix
Hi All, I have a matrix like

m <- matrix(letters[1:10], ncol=5)

How do I convert it to a 10 x 3 matrix, in which the first column is the row index of m, the second column is the column index of m, and the third column is the value of the matrix? Say

1 1 a
2 1 b
1 2 c
2 2 d
1 3 e

etc. Thanks.
Re: [R] Can package plyr also calculate the mode?
Sure, you can add the mode in, following the format by the other summary statistics. ?mode Sarah On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 - ddply(all,ACT_NAME, summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) ) So I got the mean, median std dev, quantiles etc. IS there any way I can add the mode to the mixt. Thanks ahead for any suggestions. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to re-shape a matrix
Thanks for the reproducible example. Because this mixes numeric and character data, it's better done as a data frame than a matrix.

data.frame(rowind=as.vector(row(m)), colind=as.vector(col(m)), value=as.vector(m))
   rowind colind value
1       1      1     a
2       2      1     b
3       1      2     c
4       2      2     d
5       1      3     e
6       2      3     f
7       1      4     g
8       2      4     h
9       1      5     i
10      2      5     j

Sarah

On Wed, Apr 3, 2013 at 5:28 PM, Hui Du hui...@dataventures.com wrote: [...snip...]

-- Sarah Goslee http://www.functionaldiversity.org
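An equivalent base-R route, for the record, uses arrayInd() to convert the linear indices 1..length(m) into (row, column) pairs; it gives the same long-format result as the row()/col() version:

```r
m <- matrix(letters[1:10], ncol = 5)

# arrayInd() maps linear (column-major) indices to (row, col) pairs
idx <- arrayInd(seq_along(m), dim(m))
long <- data.frame(rowind = idx[, 1], colind = idx[, 2], value = as.vector(m))
head(long, 3)
#   rowind colind value
# 1      1      1     a
# 2      2      1     b
# 3      1      2     c
```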
Re: [R] Can package plyr also calculate the mode?
If you type ?mode at an R prompt you will be able to read the help for the mode() function. On Wed, Apr 3, 2013 at 5:34 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: I tried mode=?mode(COUNTS) but that doesn't work. -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:32 PM To: Mossadegh, Ramine N. Cc: r-help Subject: Re: [R] Can package plyr also calculate the mode? Sure, you can add the mode in, following the format by the other summary statistics. ?mode Sarah On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 - ddply(all,ACT_NAME, summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) ) So I got the mean, median std dev, quantiles etc. IS there any way I can add the mode to the mixt. Thanks ahead for any suggestions. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] looking for help with clustering analysis
My data have good correlations with the Spearman method and bad correlations with the Pearson method. If I want to do a cluster analysis that reflects the Spearman correlation, what method should I use to calculate the distance matrix? Thanks. Xin
Re: [R] looking for help with clustering analysis
On Apr 3, 2013, at 2:38 PM, capricy gao wrote: My data have good correlations with spearman method and bad correlations with pearson method. If I want to do cluster analysis to reflect the sprearman correlation, what method should I use to calculate the distance matrix? Thanks. Difference in ranks? -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
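A common concrete choice, assuming the goal is to cluster the variables (columns), is to turn the Spearman correlation matrix into a dissimilarity with 1 - cor and feed that to hclust(); identical rankings then give distance 0 and opposite rankings give distance 2. A sketch on made-up data:

```r
set.seed(42)
x <- matrix(rnorm(50), nrow = 10)   # 10 observations, 5 variables

# Spearman correlation between columns, converted to a dissimilarity
d <- as.dist(1 - cor(x, method = "spearman"))

hc <- hclust(d, method = "average")
plot(hc)
```

Some prefer 1 - abs(cor) when strongly negative correlation should also count as "close"; that is a modeling decision, not an R one.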
Re: [R] Can package plyr also calculate the mode?
Of course it can. Use mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provide tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments:

all1 <- ddply(all, "ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS),
              q25=quantile(COUNTS,.25), median=quantile(COUNTS,.50),
              q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90),
              q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99))

On Wed, Apr 3, 2013 at 5:36 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: So you mean it cannot be calculated within plyr? [...snip...]
-- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with integrate function
Error in integrate(fx[[2]], 0.056, 1) : maximum number of subdivisions reached Can anyone help? At the risk of longer integration time, look at the 'subdivisions' argument in ?integrate and consider increasing it? S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
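A sketch of the suggestion; the integrand here is a made-up oscillatory example, not Swagato's. Larger values of 'subdivisions' (and, if needed, a looser 'rel.tol') are passed as extra arguments to integrate():

```r
# An oscillatory integrand that adaptive quadrature must split many times
f <- function(x) sin(1 / x)

# Raising 'subdivisions' lets the routine keep subdividing the interval;
# integrate() stops with an error if the limit is ever reached
res <- integrate(f, 0.001, 1, subdivisions = 10000L)
res$value
```

If even a large subdivision limit is exhausted, that usually signals a near-singular or infinitely oscillating integrand, and splitting the range by hand around the bad point is more robust than raising the limit further.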
[R] prop.test vs hand calculated confidence interval
Hi, This code:

n=40
x=17
phat=x/n
SE=sqrt(phat*(1-phat)/n)
zstar=qnorm(0.995)
E=zstar*SE
phat+c(-E,E)

gives this result:

[1] 0.2236668 0.6263332

The TI graphing calculator gives the same result. Whereas this test:

prop.test(x, n, conf.level=0.99, correct=FALSE)

gives this result: 0.2489036 0.6224374. I'm wondering why there is a difference. D.
Re: [R] prop.test vs hand calculated confidence interval
One of the joys of R is that it's open source: you can read the code for prop.test yourself and see what's happening. In this case, simply typing prop.test at the command line will provide it, although without comments. Sarah On Wed, Apr 3, 2013 at 6:10 PM, David Arnold dwarnol...@suddenlink.net wrote: Hi, This code: n=40 x=17 phat=x/n SE=sqrt(phat*(1-phat)/n) zstar=qnorm(0.995) E=zstar*SE phat+c(-E,E) Gives this result: [1] 0.2236668 0.6263332 The TI Graphing calculator gives the same result. Whereas this test: prop.test(x,n,conf.level=0.99,correct=FALSE) Give this result: 0.2489036 0.6224374 I'm wondering why there is a difference. D. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ask help
On 04/04/2013 01:31 AM, DONG Echo wrote: Hello! I am eager to learn if I only have a data about the relationship between otu and sample, could I use the function capscale, and final make a point plot that x-axis is CAP1 and y-axis is CAP2? Besides, what function could I use to make the different rarefaction curve with different color in the a plot in R? Appreciate very much. Sincerely! Hi Echo, This is such a great question that I am going to try to answer part of it. OTU tables are roughly observation by category tables, and as you mention the capscale (vegan) function, this will probably reduce to sampling unit by species. The CAPx scores can be plotted on 2D coordinates as in the first example from the capscale function. However, it does not seem to make any sense to overlay rarefaction curves on the CAP plot as the two are completely different measures. Displaying the two plots side by side would probably be a better idea. I tried to answer this for four reasons: 1) I didn't see another answer. 2) I thought it might be fun to find out what an OTU table was. 3) The question is an archetype of asking something highly specific without providing the necessary information for anyone but the questioner to understand it. 4) It struck me that the reverse often occurs, with answerers providing information that the questioner has little chance of understanding without a substantial amount of introduction. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] a question about clustering
Hi all. Sorry to bother you. I have a table like the following:

   A B C D E F
G1 0 1 1 0 1 1
G2 0 1 1 0 1 1
G3 0 0 0 0 0 1
H1 1 1 1 1 1 1
H2 1 0 1 1 0 1
H3 1 0 1 1 0 1

I already know G1, G2 and G3 belong to the same group and H1, H2 and H3 belong to the other group. I want to cluster A, B, C, D, E and F. It seems that I can't input the group information of G1-H3 into the 'hclust' function. Are there any other cluster functions into which I could input the group information of G1-H3? Thanks very much! Any help will be greatly appreciated. Best regards
Re: [R] prop.test vs hand calculated confidence interval
It might help to look at the documentation. Typing ?prop.test on the command line. That would reveal various items of interest, including a section labeled Details. A close reading of that section turns up the explanation: The confidence interval is computed by inverting the score test. There are also journal references given. albyn On Wed, Apr 03, 2013 at 06:17:56PM -0400, Sarah Goslee wrote: One of the joys of R is that it's open source: you can read the code for prop.test yourself and see what's happening. In this case, simply typing prop.test at the command line will provide it, although without comments. Sarah On Wed, Apr 3, 2013 at 6:10 PM, David Arnold dwarnol...@suddenlink.net wrote: Hi, This code: n=40 x=17 phat=x/n SE=sqrt(phat*(1-phat)/n) zstar=qnorm(0.995) E=zstar*SE phat+c(-E,E) Gives this result: [1] 0.2236668 0.6263332 The TI Graphing calculator gives the same result. Whereas this test: prop.test(x,n,conf.level=0.99,correct=FALSE) Give this result: 0.2489036 0.6224374 I'm wondering why there is a difference. D. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Albyn Jones Reed College jo...@reed.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
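The score (Wilson) interval can also be computed by hand, which shows exactly where prop.test's numbers come from. A sketch for the numbers in this thread:

```r
# The Wilson (score) interval that prop.test(..., correct = FALSE) inverts
n <- 40; x <- 17; phat <- x / n
z <- qnorm(0.995)            # 99% two-sided

centre <- phat + z^2 / (2 * n)
halfw  <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2))
wilson <- (centre + c(-halfw, halfw)) / (1 + z^2 / n)
wilson
# [1] 0.2489036 0.6224374

# Agrees with prop.test to rounding
ci <- prop.test(x, n, conf.level = 0.99, correct = FALSE)$conf.int
all.equal(as.numeric(ci), wilson, tolerance = 1e-6)
```

The hand calculation in the original post is the Wald interval, which centres on phat itself; the score interval centres on the shrunken estimate (x + z^2/2)/(n + z^2), hence the different endpoints.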
Re: [R] Can package plyr also calculate the mode?
On 2013-04-03 14:59, Sarah Goslee wrote: Of course it can. Use the mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provided tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments. ?mode will direct you to the help page for _storage_ mode of an object. That's not likely what the OP had in mind. It seems that what s/he wants is the most frequent value. This is (usually) a pretty useless piece of information, but there are a number of packages that do provide it. To the OP: Install package sos and then do findFn(mode) to see what's available. E.g. packages, pracma, asbio, dprep, rattle and many others. Do note that they handle the multimodal situation differently. Or, write your own, perhaps using table() and which.max(). Peter Ehlers all1 - ddply(all,ACT_NAME, summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) ) On Wed, Apr 3, 2013 at 5:36 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: So you mean it cannot be calculated within plyer? -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:36 PM To: Mossadegh, Ramine N.; r-help Subject: Re: [R] Can package plyr also calculate the mode? If you type ?mode at an R prompt you will be able to read the help for the mode() function. On Wed, Apr 3, 2013 at 5:34 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: I tried mode=?mode(COUNTS) but that doesn't work. -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:32 PM To: Mossadegh, Ramine N. Cc: r-help Subject: Re: [R] Can package plyr also calculate the mode? Sure, you can add the mode in, following the format by the other summary statistics. 
?mode Sarah

On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by-grouping in a data frame using:

all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75), q90 = quantile(COUNTS, .90),
              q95 = quantile(COUNTS, .95), q99 = quantile(COUNTS, .99))

So I got the mean, median, std dev, quantiles etc. Is there any way I can add the mode to the mix? Thanks ahead for any suggestions.

-- Sarah Goslee http://www.functionaldiversity.org
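Following Peter Ehlers' table()/which.max() suggestion, a minimal "most frequent value" helper (ties resolved in favor of the first value encountered; the function name stat_mode is made up to avoid clashing with base mode()):

```r
## Statistical mode: the most frequent value in x.
## table() counts occurrences; which.max() picks the first maximum.
stat_mode <- function(x) {
  tab <- table(x)
  names(tab)[which.max(tab)]
}

stat_mode(c(1, 2, 2, 3, 3, 3))  # "3"
```

Note that names() always returns character, so wrap the result in as.numeric() when x is numeric; the helper could then be passed to ddply() alongside the other summaries, e.g. mode = stat_mode(COUNTS).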
[R] Creating data frame from individual files
Dear Group: I have 72 files (.txt). Each file has 2 columns, and column 1 is always identical for all 70 files. Each file has 90,799 rows and is standard across all files. I want to create a matrix 40(rows) x 70 columns. I tried:

temp <- list.files(pattern = "*.txt")
named.list <- lapply(temp, read.delim)
library(data.table)
files.matrix <- rbindlist(named.list)
dim(files.matrix)
[1] 6537456 2

What happened here is that all 90K rows for the 72 files were rbind-ed. I want to cbind. Could anyone please help me? Thanks Adrian
Re: [R] Can package plyr also calculate the mode?
On 04/04/2013 09:56 AM, Peter Ehlers wrote: On 2013-04-03 14:59, Sarah Goslee wrote: Of course it can. Use mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provide tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments.

?mode will direct you to the help page for the _storage_ mode of an object. That's not likely what the OP had in mind. It seems that what s/he wants is the most frequent value. This is (usually) a pretty useless piece of information, but there are a number of packages that do provide it. To the OP: install package sos and then do findFn("mode") to see what's available, e.g. packages pracma, asbio, dprep, rattle and many others. Do note that they handle the multimodal situation differently. Or write your own, perhaps using table() and which.max().

The OP might find the Mode (note capital M) function in package prettyR helpful. Jim
[R] ploting several functions on the same plot
I want to superimpose two function plots on the same page. The functions L0 and L1 are defined below:

f0 <- function(mu, xm, ds, n) { 1 - pnorm((xm - mu)/(ds/sqrt(n))) }
f1 <- function(mu, n) f0(mu, 386.8, 48, n)
L0 <- function(mu) f1(mu, 36)
plot(L0, ylim = c(0, 1), xlim = c(360, 420))
L1 <- function(mu) f1(mu, 100)
lines(L1)

The plot of L0 works pretty well. However, when trying to plot the second function, L1, the interpreter issues an error: Error in as.double(y) : cannot coerce type 'closure' to vector of type 'double'. How can I solve my problem, since I want both plots to have the same quality? I mean, I don't want to produce an approximating polygonal line for the second function. Thanks, -Sergio.
Re: [R] ploting several functions on the same plot
On 13-04-03 7:35 PM, Julio Sergio wrote: I want to superimpose two function plots on the same page. The functions L0 and L1 are defined below:

f0 <- function(mu, xm, ds, n) { 1 - pnorm((xm - mu)/(ds/sqrt(n))) }
f1 <- function(mu, n) f0(mu, 386.8, 48, n)
L0 <- function(mu) f1(mu, 36)
plot(L0, ylim = c(0, 1), xlim = c(360, 420))
L1 <- function(mu) f1(mu, 100)
lines(L1)

The plot of L0 works pretty well. However, when trying to plot the second function, L1, the interpreter issues an error: Error in as.double(y) : cannot coerce type 'closure' to vector of type 'double'. How can I solve my problem, since I want both plots to have the same quality? I mean, I don't want to produce an approximating polygonal line for the second function.

curve(L1, add=TRUE) should handle it. Duncan Murdoch
Re: [R] Can package plyr also calculate the mode?
My apologies: that's what happens when I don't start a clean session with no packages loaded. (Also yet another argument for reproducible examples: I always start a clean session to actually create objects and run other people's code.) This might be of use (especially compared to my original answer!): http://stackoverflow.com/questions/2547402/standard-library-function-in-r-for-finding-the-mode Sarah

On Wednesday, April 3, 2013, Peter Ehlers wrote: On 2013-04-03 14:59, Sarah Goslee wrote: Of course it can. Use mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provide tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments.

?mode will direct you to the help page for the _storage_ mode of an object. That's not likely what the OP had in mind. It seems that what s/he wants is the most frequent value. This is (usually) a pretty useless piece of information, but there are a number of packages that do provide it. To the OP: install package sos and then do findFn("mode") to see what's available, e.g. packages pracma, asbio, dprep, rattle and many others. Do note that they handle the multimodal situation differently. Or write your own, perhaps using table() and which.max(). Peter Ehlers

all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75), q90 = quantile(COUNTS, .90),
              q95 = quantile(COUNTS, .95), q99 = quantile(COUNTS, .99))

On Wed, Apr 3, 2013 at 5:36 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: So you mean it cannot be calculated within plyr?

-Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:36 PM To: Mossadegh, Ramine N.; r-help Subject: Re: [R] Can package plyr also calculate the mode? If you type ?mode at an R prompt you will be able to read the help for the mode() function.
On Wed, Apr 3, 2013 at 5:34 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: I tried mode=?mode(COUNTS) but that doesn't work.

-Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:32 PM To: Mossadegh, Ramine N. Cc: r-help Subject: Re: [R] Can package plyr also calculate the mode? Sure, you can add the mode in, following the format used by the other summary statistics. ?mode Sarah

On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by-grouping in a data frame using:

all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75), q90 = quantile(COUNTS, .90),
              q95 = quantile(COUNTS, .95), q99 = quantile(COUNTS, .99))

So I got the mean, median, std dev, quantiles etc. Is there any way I can add the mode to the mix? Thanks ahead for any suggestions.

-- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org
Re: [R] Creating data frame from individual files
Hi, suppose you have 3 files with 2 columns:

named.list <- list(structure(list(col1 = 1:6, col2 = c(0.5, 0.2, 0.3, 0.3, 0.1, 0.2)),
                             .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -6L)),
                   structure(list(col1 = 1:6, col2 = c(0.9, 0.7, 0.5, 0.2, 0.5, 0.2)),
                             .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -6L)),
                   structure(list(col1 = 7:12, col2 = c(0.1, 0.5, 0.9, 0.3, 0.6, 0.4)),
                             .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -6L)))

named.list1 <- do.call(cbind, named.list)
mat1 <- as.matrix(named.list1[!duplicated(as.list(named.list1))])
dimnames(mat1) <- NULL
mat1
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1  0.5  0.9    7  0.1
#[2,]    2  0.2  0.7    8  0.5
#[3,]    3  0.3  0.5    9  0.9
#[4,]    4  0.3  0.2   10  0.3
#[5,]    5  0.1  0.5   11  0.6
#[6,]    6  0.2  0.2   12  0.4

Because you mentioned 72 files and you need 70 columns in the result matrix, I think you need only the first column. In that case:

named.list2 <- do.call(cbind, lapply(named.list, `[`, 1))
mat2 <- as.matrix(named.list2[!duplicated(as.list(named.list2))])
dimnames(mat2) <- NULL
mat2
#     [,1] [,2]
#[1,]    1    7
#[2,]    2    8
#[3,]    3    9
#[4,]    4   10
#[5,]    5   11
#[6,]    6   12

I am not sure this is what you wanted. A.K.

- Original Message - From: Adrian Johnson oriolebaltim...@gmail.com To: r-help r-help@r-project.org Sent: Wednesday, April 3, 2013 7:05 PM Subject: [R] Creating data frame from individual files

Dear Group: I have 72 files (.txt). Each file has 2 columns, and column 1 is always identical for all 70 files. Each file has 90,799 rows and is standard across all files. I want to create a matrix 40(rows) x 70 columns. I tried:

temp <- list.files(pattern = "*.txt")
named.list <- lapply(temp, read.delim)
library(data.table)
files.matrix <- rbindlist(named.list)
dim(files.matrix)
[1] 6537456 2

What happened here is that all 90K rows for the 72 files were rbind-ed. I want to cbind. Could anyone please help me? Thanks Adrian
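When every file really does share the same first column, a shorter route is to keep that key column once and cbind only the second column of each file. A self-contained sketch with three small stand-in data frames in place of the read.delim() results (with real files: named.list <- lapply(list.files(pattern = "\\.txt$"), read.delim) -- note the anchored regex, since list.files() takes a regular expression, not a shell glob):

```r
## Stand-ins for three files with a shared first column
named.list <- list(data.frame(key = 1:3, val = c(0.5, 0.2, 0.3)),
                   data.frame(key = 1:3, val = c(0.9, 0.7, 0.5)),
                   data.frame(key = 1:3, val = c(0.1, 0.5, 0.9)))

## keep the shared key once, then cbind column 2 of every file;
## sapply(..., `[[`, 2) returns one column per file as a matrix
files.matrix <- cbind(named.list[[1]][[1]],
                      sapply(named.list, `[[`, 2))
dim(files.matrix)  # rows x (1 + number of files)
```

This assumes the files are in identical row order; if they might not be, merge() on the key column would be safer.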
[R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY
Hello All, I need your help. I am analysing Affymetrix data and have to select the probe-set that has the median expression among all the probe-sets for the same gene. This way I want to remove the redundancy by keeping the analysis at the single-gene-entry level. I am fully aware that it is not a nice thing to do, but I just have to do it. To do so, I came across the 'findLargest' function of the 'genefilter' package, but it's not well documented, and I do not know how to implement the 'findLargest' function. At this point I have:

esetRMA <- rma(mydata)

Could anybody guide me on how I can select the single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others? Is there any other way to achieve this, i.e. other than using 'genefilter'? Genefilter package: http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html Thanks AK -- Atul Kakrana Delaware Technology Park
[R] Problem quitting
Can someone explain why all of a sudden I can't quit? q(n) returns: Error in gzfile(file, "wb") : cannot open the connection In addition: Warning message: In gzfile(file, "wb") : cannot open compressed file '.RDataTmp', probable reason 'Permission denied'. I'm running R 2.15.3 GUI 1.53 Leopard build 64-bit on a MacBook Pro, Processor 2.3 GHz Intel Core i7, Software Mac OS X Lion 10.7.5 (11G63). Julian Wells
[R] Deviance in Zero inflated models
Dear list, I am running some zero-inflated models and would like to know the deviance of the models. Unlike a normal GLM, where the deviance is displayed in the summary, all that is displayed in a summary of the zero-inflated model is the log-likelihood. I hope this isn't a read-the-manual question, and if it is I apologize for wasting your time, but if you could still send me a link to where I might find this information I would be very grateful! Thank you Lia
Re: [R] ploting several functions on the same plot
Duncan Murdoch murdoch.duncan at gmail.com writes: curve(L1, add=TRUE) should handle it.

Thanks Duncan, your solution worked great! However, I'm puzzled by a problem along the same lines: when I pass a function-producing function's result to plot, it works well; however, the same argument doesn't work with curve. See what I'm referring to:

# A particular normal distribution function:
dn8 <- function(z, m) dnorm(z, m, 8)  # normal density with sd = 8

# A function that produces functions: given a mean i, it returns a
# function that depends only on z
fi <- function(i) function(z) dn8(z, i)

plot(fi(370), xlim = c(360, 420))  # works well!
curve(fi(380), add = TRUE)         # this doesn't work

# But this way, it works!
ff <- fi(380)
curve(ff, add = TRUE)

I cannot understand this behaviour. Could anyone give me some feedback on this? Thanks, -Sergio.
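The behaviour comes from curve()'s non-standard evaluation: it captures its first argument unevaluated, and only a bare function *name* is treated as a function to call on x. The call fi(380) contains no x, so curve() cannot evaluate it as a function of x. Making the x explicit avoids the temporary variable:

```r
dn8 <- function(z, m) dnorm(z, m, 8)          # normal density, sd = 8
fi  <- function(i) function(z) dn8(z, i)      # function factory

plot(fi(370), xlim = c(360, 420))
curve(fi(380)(x), add = TRUE)  # evaluate the factory, then apply to x
```

Here fi(380)(x) is an expression in x, which is what curve() expects when the argument is not a simple function name.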
Re: [R] Deviance in Zero inflated models
Lia McKinnon l.mckinnon101 at gmail.com writes: I am running some zero-inflated models and would like to know the deviance of the models. Unlike a normal GLM, where the deviance is displayed in the summary, all that is displayed in a summary of the zero-inflated model is the log-likelihood. I hope this isn't a read-the-manual question, and if it is I apologize for wasting your time, but if you could still send me a link to where I might find this information I would be very grateful!

Are you looking for -2*logLik(model)? This is the usual definition of the deviance (there are sometimes some subtle issues about the baseline model/additive constant). It would also be helpful to say which package you're using (pscl?), since zero-inflated models are not part of base R. Ben Bolker
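A minimal sketch of Ben's suggestion, assuming (as he guesses) the pscl package; the model formula is made up for illustration, using pscl's bundled bioChemists data:

```r
## Assumption: the OP's models come from pscl::zeroinfl
library(pscl)
data("bioChemists", package = "pscl")

## toy zero-inflated Poisson fit (constant inflation part after "|")
fit <- zeroinfl(art ~ fem + mar | 1, data = bioChemists)

## -2 * log-likelihood, the usual deviance up to the saturated-model
## constant that summary.glm folds in
dev <- -2 * as.numeric(logLik(fit))
dev
```

For comparing nested zero-inflated fits, differences of these values (or lmtest::lrtest) are what matter, so the missing additive constant is harmless.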
Re: [R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY
On 04/03/2013 03:17 PM, Atul Kakrana wrote: Hello All, I need your help. I am analysing Affymetrix data and have to select the probe-set that has the median expression among all the probe-sets for the same gene. This way I want to remove the redundancy by keeping the analysis at the single-gene-entry level. I am fully aware that it is not a nice thing to do, but I just have to do it. To do so, I came across the 'findLargest' function of the 'genefilter' package, but it's not well documented, and I do not know how to implement the 'findLargest' function. At this point I have: esetRMA <- rma(mydata) Could anybody guide me on how I can select the single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others? Is there any other way to achieve this, i.e. other than using 'genefilter'? Genefilter package: http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html

Hi Atul -- It's a Bioconductor package, so you might as well ask on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ As a reproducible example, load the ALL sample ExpressionSet and the Biobase and genefilter packages:

library(Biobase)
library(ALL)
library(genefilter)

The three arguments to findLargest are the names of the probe sets, featureNames(ALL); the test statistic, rowMedians(ALL); and the chip on which the ExpressionSet is based, annotation(ALL). So the variable

idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))

identifies the probes, and

ALL1 = ALL[idx, ]

gets you the data you're interested in. Again, follow-up questions should go to the Bioconductor mailing list. Martin

Thanks AK -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
Re: [R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY
On 04/03/2013 08:34 PM, Martin Morgan wrote: On 04/03/2013 03:17 PM, Atul Kakrana wrote: Hello All, I need your help. I am analysing Affymetrix data and have to select the probe-set that has the median expression among all the probe-sets for the same gene. [...] At this point I have: esetRMA <- rma(mydata) Could anybody guide me on how I can select the single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others?

Hi Atul -- It's a Bioconductor package, so you might as well ask on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ As a reproducible example, load the ALL sample ExpressionSet and the Biobase and genefilter packages:

library(Biobase)
library(ALL)
library(genefilter)

The three arguments to findLargest are the names of the probe sets, featureNames(ALL); the test statistic, rowMedians(ALL); and the chip on which the ExpressionSet is based, annotation(ALL). So the variable

idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))

identifies the probes, and

ALL1 = ALL[idx, ]

gets you the data you're interested in. Again, follow-up questions should go to the Bioconductor mailing list.

Oops, a little quick off the draw there. That gives the probe set with the largest median expression across samples; I'm not really sure what you're after -- the 'closest-to-median' probe set when averaging expression across samples? At any rate you'll get a more considered response on the Bioc mailing list; sorry for being misleading. Martin

-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
[R] categorized complete list of R commands?
Every time I read the R release notes for the next release, I see many functions that I had forgotten about and many functions that I never knew existed to begin with. (Who knew there were BibTeX facilities in R? Obviously, everyone except me.) I wonder whether there is a complete list of all R commands (including the standard packages) somewhere, preferably each with its one-liner AND categorization(s). The one-liner can be generated from the documentation. I am thinking of one categorization for function area (e.g., programming-related for, say, deparse, and statistical-model-related for lm) and another categorization for importance (e.g., common for lm and obscure for ..). Such categorizations require intelligence. If I am going to do this for myself, I think a csv spreadsheet may be a good idea, to make it easy to re-sort by keys. regards, /iaw Ivo Welch (ivo.we...@gmail.com)
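The "one-liner from the documentation" part can be mechanized per package with tools::Rd_db(), which returns the parsed help pages; the \title section is the one-liner. A sketch, assuming the Rd object structure described in R's "Parsing Rd files" notes (the helper name rd_title and the output filename are made up):

```r
library(tools)

## pull the \title text out of one parsed Rd object
rd_title <- function(rd) {
  tags  <- vapply(rd, function(x) attr(x, "Rd_tag"), character(1))
  title <- rd[[which(tags == "\\title")[1]]]
  gsub("\\s+", " ", trimws(paste(unlist(title), collapse = "")))
}

## one row per help topic in package stats: topic + one-liner
db <- Rd_db("stats")
overview <- data.frame(topic = sub("\\.Rd$", "", names(db)),
                       title = vapply(db, rd_title, character(1)),
                       row.names = NULL)
write.csv(overview, "stats-functions.csv", row.names = FALSE)
```

The function-area and importance columns would still have to be filled in by hand, as the post says; this only automates the mechanical part.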
[R] node stack overflow when extracting labels from a dendrogram
It looks like R 3.0.0 has the same limitation as the 2.x series. When extracting labels from a dendrogram of about 3 nodes (x = labels(h)), R throws an error: Error in match.fun(FUN) : node stack overflow. Is there a way around it? Thanks, Alex
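A hedged sketch of things to try, not a confirmed fix: the overflow comes from the deeply recursive traversal in labels.dendrogram(), so raising the interpreter's recursion-related limits sometimes gives it enough room:

```r
## Assumption: the limits, not a bug, are what is being hit.
## 1. Start R with a larger pointer-protection stack (command-line
##    flag, shown here as a comment):
##      R --max-ppsize=500000
## 2. Then raise the evaluation-depth limit before retrying:
options(expressions = 5e5)
## x <- labels(h)   # retry the extraction on the dendrogram h
```

If that still fails, converting the object (e.g. labels(as.hclust(h))) or an iterative traversal from a contributed package may be needed; both are untested suggestions here.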
[R] MASS-like (non-parametric/semi-parametric) methods for survey and longitudinal data
Is there a package that provides equivalents of the MASS package, especially non-parametric and semi-parametric methods, for complex survey and longitudinal data? Is there a book someone can recommend that covers these topics with R (or Stata) examples? My web searches have not turned up much, except that these seem to be actively researched topics for some statisticians working on theory --- no packages or mentions of software were found in the few (I recall only one) empirical examples found. Anupam.
Re: [R] making scatter plot points fill semi-transparent
Hey hey, I know that this thread is old, but no one seemed to provide the answer. Yes, there is absolutely a way to bring in transparency while still using normal colour names. Just use this code when specifying your colour:

col = adjustcolor("colorname", alpha.f = 0.5)

The alpha.f value can range from 0 (fully transparent) to 1 (opaque). Cheers, www.kikatarsi.com
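A tiny self-contained illustration of the same idea, with made-up data (alpha.f is the documented argument name in grDevices::adjustcolor, which the partial match alpha= also reaches):

```r
## Overplotted points become readable with semi-transparent fill
set.seed(1)
x <- rnorm(500)
y <- rnorm(500)
plot(x, y, pch = 16, cex = 1.5,
     col = adjustcolor("steelblue", alpha.f = 0.3))
```

adjustcolor() simply returns an "#RRGGBBAA" hex string, so the result can be reused anywhere a colour is accepted.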
[R] help with kriging interpolation
All, I am new to using R and know some basics. I wish to use kriging in R to do the following: given data Y = f(X1, X2, X3, ..., Xn) -- a set of 1000+ irregularly measured data points -- I would like to be able to get a single value y for a single input set (x1, x2, x3, ..., xn). A Google search on this takes me literally to the same example involving analysis with soil sampling, and I cannot figure out how to extract a single-point interpolant. Any examples or pointers appreciated, Numeris.
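A hedged sketch of single-point prediction using the 'fields' package (one of several kriging options; 'gstat' and 'DiceKriging' are alternatives). All data here are made up for illustration:

```r
## Assumption: fields::Krig suits the irregular multivariate setting
library(fields)
set.seed(1)

X <- matrix(runif(300), ncol = 3)            # 100 points, 3 predictors
Y <- sin(X[, 1]) + X[, 2]^2 + rnorm(100, sd = 0.05)

fit <- Krig(X, Y)                            # fit the kriging surface

xnew <- matrix(c(0.5, 0.5, 0.5), nrow = 1)   # one new input point
predict(fit, xnew)                           # the single interpolated y
```

The key point for the question asked: predict() on the fitted object accepts a matrix of new locations, so a one-row matrix gives exactly one interpolated value rather than a gridded map as in the soil-sampling examples.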