Re: [R] Parsing JSON records to a dataframe
Jeroen Ooms wrote:
> What is the most efficient method of parsing a dataframe-like structure
> that has been JSON-encoded in record-based rather than vector-based
> format? For example, a structure like this:
> [ {"name":"joe",  "gender":"male",   "age":41},
>   {"name":"anna", "gender":"female", "age":23} ]
> RJSONIO parses this as a list of lists, to which I would then have to
> apply as.data.frame and append the results to an existing dataframe,
> which is terribly slow.

unlist() is pretty fast. The solution below assumes that you know what
your structure is, so it is not very flexible, but it should show you
that the conversion to data.frame is not the bottleneck.

# json
library(RJSONIO)
# [ {"name":"joe",  "gender":"male",   "age":41},
#   {"name":"anna", "gender":"female", "age":23} ]
n <- 30
d <- data.frame(name   = rep(c("joe", "anna"), n),
                gender = rep(c("male", "female"), n),
                age    = rep(c(41, 23), n))
dj <- toJSON(d)

system.time(d1 <- fromJSON(dj))
#  user  system elapsed
#  4.06    0.26    4.32

system.time(
  dd <- data.frame(name   = unlist(d1$name),
                   gender = unlist(d1$gender),
                   age    = as.numeric(unlist(d1$age)))
)
#  user  system elapsed
#  1.13    0.05    1.18

--
View this message in context: http://r.789695.n4.nabble.com/Parsing-JSON-records-to-a-dataframe-tp3178646p3178753.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
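For readers hitting the same record-oriented JSON problem, here is a minimal sketch of the one-unlist/vapply-per-column idea in isolation. The record list is built by hand, standing in for RJSONIO's fromJSON() output, so it runs without the package; the variable names are my own.

```r
# Sketch: a list of named lists with identical fields, as fromJSON()
# would return for record-oriented JSON (built by hand here).
records <- list(list(name = "joe",  gender = "male",   age = 41),
                list(name = "anna", gender = "female", age = 23))

# One vapply() per column avoids a per-record as.data.frame() call:
df <- data.frame(
  name   = vapply(records, function(r) r$name,   character(1)),
  gender = vapply(records, function(r) r$gender, character(1)),
  age    = vapply(records, function(r) r$age,    numeric(1)),
  stringsAsFactors = FALSE)

stopifnot(nrow(df) == 2, df$age[1] == 41, df$name[2] == "anna")
```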
Re: [R] Accessing data via url
?read.table says 'file' "can also be a complete URL". This is
implemented by url(): see the section on URLs on its help page. You
haven't followed the posting guide and told us your OS, and what that
section says does depend on the OS.

On Thu, 6 Jan 2011, John Kane wrote:
> # Can anyone suggest why this works
> datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> person.data <- read.table(datafilename, header=TRUE)
> # but this does not?
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> treedata <- read.table(dd, header=TRUE)
> === Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") : unsupported URL scheme
> # I can access both through a hyperlink in OOO Calc.
> # Thanks

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] R packages for R 2.11.1
On Thu, 6 Jan 2011, Joshua Wiley wrote:
> I would try using the R 2.12.1 packages first, but if that does not

On 32-bit Windows this will not work if compiled code is involved: both
the compiler and the package layout changed at 2.12.0. However, binary
packages for 2.11.x are still being run through the autobuilder and are
available on CRAN if they were built successfully (increasingly many
are not). See the summary table at
http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html

Nevertheless (as the posting guide makes clear) the R developers do not
support obsolete versions of R, so you are advised to update to R 2.12.1.

> work, then you can go here:
> http://cran.r-project.org/src/contrib/Archive/
> to get older versions of the tar balls. I think you might have to build
> them yourself; I rather doubt anyone is keeping entire duplicates of
> old CRAN packages in all forms. This is not that difficult, though. If
> the packages you want to install have no compiled code, I believe you
> can install them without any additional software. If there is compiled
> code (e.g., C or C++), then I think you'll need Rtools as well as a
> compatible compiler. See the installation manual for details:
> http://cran.r-project.org/doc/manuals/R-admin.html
> Cheers, Josh
>
> On Thu, Jan 6, 2011 at 8:48 PM, Raji raji.sanka...@gmail.com wrote:
>> Hi, I am using R 2.11.1. I need to download a few packages for it for
>> Windows, but on CRAN I see the latest packages for R 2.12.1 only. Can
>> you help me out with the locations where I can find the packages for
>> R 2.11.1 as Windows zip files?
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] Accessing data via url
John Kane-2 wrote:
> # Can anyone suggest why this works
> datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> person.data <- read.table(datafilename, header=TRUE)
> # but this does not?
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> treedata <- read.table(dd, header=TRUE)
> === Error in file(file, "rt") : cannot open the connection

Your original file is no longer there, but when I try RCurl with a png
file that is present, I get a certificate error:

Dieter

library(RCurl)
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
x <- getBinaryURL(dd)
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  SSL certificate problem, verify that the CA cert is OK. Details:
  error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] RCurl_1.5-0.1 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.12.1

--
View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3178773.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] problems with rJava
On Thu, 6 Jan 2011, karamoo wrote:
> Hi All, and Heberto,
> Did you ever resolve your installation problem with rJava? I have a new
> Windows 7 machine and can't seem to get it installed correctly. I do
> have Java installed. I download rJava without problem, then:

As you didn't follow the posting guide, we don't know if this is 32- or
64-bit R on 32- or 64-bit Windows. If you are running 64-bit R you need
64-bit Java, and similarly for 32-bit. The message you show indicates
that you do not have the appropriate Java installed.

As I have pointed out already this week, rJava has its own mailing list,
so please follow up there.

> install.packages("rJava")
> Installing package(s) into 'C:\Users\Patrick\Documents/R/win-library/2.12'
> (as 'lib' is unspecified)
> trying URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.12/rJava_0.8-8.zip'
> Content type 'application/zip' length 637125 bytes (622 Kb)
> opened URL
> downloaded 622 Kb
>
> The downloaded packages are in
> C:\Users\Patrick\AppData\Local\Temp\Rtmp9zcM7g\downloaded_packages
>
> Looks like it downloaded fine to my eyes. But in the next step I get an
> error:
>
> library(rJava)
> Error in utils::readRegistry(key, "HLM", 2) :
>   Registry key 'Software\JavaSoft\Java Runtime Environment' not found
> Error in utils::readRegistry(key, "HLM", 2) :
>   Registry key 'Software\JavaSoft\Java Development Kit' not found
> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>   call: fun(...)
>   error: JAVA_HOME cannot be found from the Registry
> Error: package/namespace load failed for 'rJava'
>
> I'm not sure how to check where R is looking for rJava vs. where it
> lives; this seems to be an issue in other posts, although they have
> different errors. I'm still a novice at R. I see a lot of posts on this
> topic, but few are recent. Help would be much appreciated. Thanks!
>
> Kara Moore
> Evolution and Ecology
> University of California, Davis, USA

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] algorithm help
On 01/06/2011 11:57 PM, (Ted Harding) wrote:
> On 06-Jan-11 22:16:38, array chip wrote:
>> Hi, I am seeking help on designing an algorithm to identify the
>> locations of stretches of 1s in a vector of 0s and 1s. Below is a
>> simple example:
>>
>> dat <- as.data.frame(cbind(
>>   a = c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T),
>>   b = c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
>> dat
>>     a  b
>> 1   0  4
>> 2   0 12
>> 3   1 13
>> 4   1 16
>> 5   1 18
>> 6   1 20
>> 7   0 28
>> 8   0 30
>> 9   1 34
>> 10  1 46
>> 11  0 47
>> 12  1 49
>> 13  1 61
>> 14  1 73
>> 15  1 77
>> 16  0 84
>> 17  0 87
>> 18  0 90
>> 19  0 95
>> 20  1 97
>>
>> In this dataset, b is sorted and denotes the location of each number
>> in a. I would like to find the starting and ending locations of each
>> stretch of 1s within a, also counting the number of 1s in each
>> stretch. The hoped-for result from the algorithm would be:
>>
>> stretch start end No.of.1s
>>       1    13  20        4
>>       2    34  46        2
>>       3    49  77        4
>>       4    97  97        1
>>
>> I can imagine for loops could do the job, but that doesn't feel like
>> a clever way to do this. Is there an efficient algorithm that can do
>> this fast? Thanks for any suggestions. John
>
> The basic information you need can be got using rle() (run length
> encoding). See '?rle'. In your example:
>
> rle(dat$a)
> # Run Length Encoding
> #   lengths: int [1:8] 2 4 2 2 1 4 4 1
> #   values : num [1:8] 0 1 0 1 0 1 0 1    ## Note: F -> 0, T -> 1
>
> The following has a somewhat twisted logic at the end, and may be
> flawed, but you can probably adapt it!
>
> L <- rle(dat$a)$lengths
> V <- rle(dat$a)$values
> pos <- c(1, cumsum(L))
> V1 <- c(-1, V)
> 1 + pos[V1 == 0]
> # [1]  3  9 12 20
> ## Positions in the series dat$a where each run of T (i.e. 1) starts
>
> Hoping this helps, Ted.
> E-Mail: (Ted Harding) ted.hard...@wlandres.net
> Fax-to-email: +44 (0)870 094 0861
> Date: 06-Jan-11  Time: 22:57:44

A different approach would be to use the diff() function. Where

diff(dat$a)
 [1]  0  1  0  0  0 -1  0  1  0 -1  1  0  0  0 -1  0  0  0  1

is not equal to 0, the value is changing from 0 to 1 or from 1 to 0.
The indices of the first new value can be found by:

which(diff(dat$a) != 0) + 1
[1]  3  7  9 11 12 16 20

Where it changes from 0 to 1:

which(diff(dat$a) == 1) + 1
[1]  3  9 12 20

Where it changes from 1 to 0:

which(diff(dat$a) == -1) + 1
[1]  7 11 16

By taking into consideration whether the first and last values are 0 or
1, you can calculate the lengths.

Cheers, Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Natural Sciences Building, Office Suite 2039
Stellenbosch University, Main Campus, Merriman Avenue
Stellenbosch, South Africa
email: rai...@krugs.de  Skype: RMkrug
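Ted's rle() hints can be assembled into the full start/end/count table the original poster asked for. A sketch (the intermediate variable names are mine), reproducing the hoped-for result from the example data:

```r
# Example data from the thread (a as logical, b as the locations).
a <- c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
b <- c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)

r      <- rle(a)
ends   <- cumsum(r$lengths)           # index where each run ends
starts <- ends - r$lengths + 1        # index where each run starts
keep   <- r$values                    # keep only the runs of TRUE (1)

res <- data.frame(start    = b[starts[keep]],
                  end      = b[ends[keep]],
                  No.of.1s = r$lengths[keep])

stopifnot(identical(res$start, c(13, 34, 49, 97)),
          identical(res$end,   c(20, 46, 77, 97)),
          identical(res$No.of.1s, c(4L, 2L, 4L, 1L)))
```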
Re: [R] Creating a Matrix from a vector with some conditions
On Thu, Jan 06, 2011 at 01:34:31PM -0800, ADias wrote:
> Hi,
> Suppose we have an object with strings:
> A <- c("a","b","c","d")
> Now I do:
> B <- matrix(A, 4, 4, byrow=FALSE)
> and I get
> a a a a
> b b b b
> c c c c
> d d d d
> But what I really want is:
> a b c d
> b c d a
> c d a b
> d a b c
> How can I do this?

Try the following:

A <- c("a","b","c","d")
B <- matrix(A, 5, 4)[1:4, ]
#      [,1] [,2] [,3] [,4]
# [1,] "a"  "b"  "c"  "d"
# [2,] "b"  "c"  "d"  "a"
# [3,] "c"  "d"  "a"  "b"
# [4,] "d"  "a"  "b"  "c"

Petr Savicky.
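Petr's recycling trick generalizes to a vector of any length n: filling an (n+1) x n matrix shifts each column by one, and dropping the last row leaves the rotated square. A hedged sketch; the function name `circulant` is my own, not from the thread:

```r
# Generalized version of the matrix(A, n+1, n)[1:n, ] trick.
# (The name `circulant` is a hypothetical helper, not a base function.)
circulant <- function(x) {
  n <- length(x)
  matrix(x, n + 1, n)[1:n, ]
}

m <- circulant(c("a", "b", "c", "d"))
stopifnot(identical(m[1, ], c("a", "b", "c", "d")),
          identical(m[2, ], c("b", "c", "d", "a")),
          identical(m[4, ], c("d", "a", "b", "c")))
```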
Re: [R] Help with IF operator
On Thu, Jan 06, 2011 at 12:21:33PM -0800, ADias wrote:
> Hi, I have a problem with how to compare values. My script is as
> follows:
>
> repeat{
>   cat("How many teams to use? (to end write 0) ")
>   nro <- scan(n=1)
>   if (nro == 0) break
>   cat("write the", nro, "teams names\n")
>   teams <- readLines(n=nro)
>   if (teams[1] == teams[2]) next else print(teams)
> }
>
> In this example I only compare team 1's name with team 2's name, and if
> they are the same the script starts again. If I had 10 teams, how could
> I compare all nro team names to check whether the same name has been
> written more than once? The idea is: if the same name is written more
> than once, it should give an error and start again by asking for the
> team names. Two more things: with `next` the script starts from the
> top, i.e. by asking the number of teams to use. Can I make it go
> directly to asking the team names?

Consider also using readline(), which reads a single line, and the %in%
operator to compare the new name to the previous ones immediately:

nro <- as.numeric(readline("no of teams "))
teams <- rep(NA, times=nro)
for (i in seq(length=nro)) {
  repeat {
    current <- readline(paste("team", i, ""))
    if (current %in% teams) {
      cat("error - repeated name\n")
    } else {
      break
    }
  }
  teams[i] <- current
}

Petr Savicky.
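If the names are read all at once with readLines(), base R's anyDuplicated() can check the whole batch in one shot instead of testing each name as it is entered. A sketch with a hard-coded vector standing in for the interactive readLines(n = nro) input (the team names are made up):

```r
# Stand-in for teams <- readLines(n = nro); "Ajax" appears twice.
teams <- c("Ajax", "Porto", "Ajax")

# anyDuplicated() returns 0 when all names are unique, otherwise the
# index of the first repeat, so it works directly as an if() condition.
msg <- if (anyDuplicated(teams)) "error - repeated name" else "ok"

stopifnot(identical(msg, "error - repeated name"))
stopifnot(anyDuplicated(c("Ajax", "Porto", "Benfica")) == 0)
```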
[R] Converting Fortran or C++ etc to R
I will wind this thread up with some happy comments. I have indeed
succeeded in constructing an R program to do the same thing as my
Fortran program for an EM algorithm. I have not done timings yet, but it
seems to run acceptably fast for my purposes.

The key code to be replaced was the E and M steps of the algorithm. I
decided to try to replace all the loops with matrix operations such as
%*%, t(), crossprod(), tcrossprod(). Other operations that I used were
of the form A + v, where dim(A) = c(a, b) and length(v) = a. Here the
vector v operates term by term down the columns, recycling for each new
column. [*, - and / also work similarly.] I was relieved that matrices
were as far as I needed to go; I had visions of having to use tensor
products of higher-dimensioned arrays. Fortunately it did not come to
that.

I didn't actually translate from Fortran to R. The original is itself a
translation of my underlying maths, and it was easier to translate the
maths into R directly. I preserved the form of my Fortran input and
output files so that I will be able to run either version on the same
files. As I mentioned earlier, the main point of doing all this is so
that I may try out some variants of the program. I expect this will be
much easier to do in R!

Thanks to all who replied. Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz  Fax 7 838 4155
Phone +64 7 838 4773 wk  Home +64 7 825 0441  Mobile 021 0200 8350
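The `A + v` recycling rule Murray describes, with dim(A) = c(a, b) and length(v) = a, can be verified in a few lines (example values are mine):

```r
# v is recycled down each column of A, i.e. v[i] is added to row i of
# every column.
A <- matrix(1:6, nrow = 2)   # 2 x 3: columns (1,2), (3,4), (5,6)
v <- c(10, 100)              # length(v) == nrow(A)
B <- A + v

stopifnot(identical(B[, 1], c(11, 102)),
          identical(B[, 2], c(13, 104)),
          identical(B[, 3], c(15, 106)))
```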
Re: [R] Stepwise SVM Variable selection
On 06/01/11 23:10:59, Noah Silverman wrote:
> I have a data set with about 30,000 training cases and 103 variables.
> I've trained an SVM (using the e1071 package) for a binary classifier
> {0,1}. The accuracy isn't great. I used a grid search over the C and
> gamma parameters with an RBF kernel to find the best settings.
[...]
> Can anyone suggest an approach to seek the ideal subset of variables
> for my SVM classifier?

The standard feature-selection stuff (backward/forward selection etc.)
is probably ruled out by the time it takes to compute all the sets and
subsets. What you could try is the following:

First, do a cross-validation setup: split your data set into a training
and a testing set (ratio 0.9 / 0.1 or so).

Second, train your SVM on the training set (try conservative parameters
first).

Third, have your trained SVM classify the test set and compute the
classification error.

Fourth, iterate over all variables and do the following:
  a) choose one variable and permute its values (only) in the test set
  b) have your trained SVM (from step 2) classify this test set and
     measure the classification error
  c) repeat a) and b) a (high) number of times to be significant
  d) go to the next variable

Fifth, you can get an impression of the importance of each variable by
comparing the errors on the permuted test sets with the non-permuted
test-set classification error. If permuting one variable drastically
increases the classification error, that variable is probably
important.

Sixth, repeat the cross-validation / random sampling a number of times
to be significant.

This is more of an ad-hoc approach and there are some pitfalls, but the
idea is easily explained and carries over to any other regression model
with cross-validation. The computational burden with SVMs is assumed to
be in the training step rather than the prediction step, and you only
need a relatively low number of training runs (sixth step) here.

Regards, Georg.

--
Research Assistant
Otto-von-Guericke-Universität Magdeburg
resea...@georgruss.de
http://research.georgruss.de
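Georg's permutation recipe can be sketched compactly. To keep the example self-contained (no e1071 dependency), a toy rule-based classifier stands in for the trained SVM: `predict_toy()` below is a hypothetical stand-in for calling predict() on a fitted svm object, and the data are synthetic.

```r
# Permutation importance, steps 3-5 above, on a toy problem where the
# class depends only on x1; x2 is pure noise.
set.seed(1)
n  <- 200
x1 <- rnorm(n)                  # informative variable
x2 <- rnorm(n)                  # noise variable
y  <- as.integer(x1 > 0)
test <- data.frame(x1, x2, y)

predict_toy <- function(d) as.integer(d$x1 > 0)   # stand-in for predict(svm_fit, d)
err <- function(d) mean(predict_toy(d) != d$y)

base_err <- err(test)           # step 3: error on the untouched test set

perm_err <- function(var) {     # steps 4a/4b: permute one variable, re-score
  d <- test
  d[[var]] <- sample(d[[var]])
  err(d)
}
e1 <- mean(replicate(50, perm_err("x1")))   # step 4c: many permutations
e2 <- mean(replicate(50, perm_err("x2")))

# Step 5: permuting the informative variable hurts; the noise one does not.
stopifnot(base_err == 0, e1 > 0.2, e2 == 0)
```

With a real SVM, only `predict_toy`/`err` change: fit once on the training set, then call predict() on each permuted copy of the test set.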
[R] Odp: Calculating Returns : (Extremely sorry for earlier incomplete mail)
Hi,

your code is quite complicated and I get an error:

spot_returns_table <- lapply(1:nrow(trans), function(z) with(trans[z, ],
    spot_trans(currency_trans=trans$currency_transacted)))
Error in if (currency_trans == "USD") { : argument is of length zero

It seems to me that you do not know what your code is doing. The
warnings come from the fact that the currency_trans value you feed to
the spot_trans function is longer than one, and `if` needs an input of
exactly one logical value. Maybe you could use debug() and see what the
values of your variables are during the computation, but I believe it is
better to use more convenient input objects together with *apply or
aggregate, or basic math could be the better solution:

rate1
      USD    GBP  EURO   CHF    AUD
1  112.05 171.52 42.71 41.50 109.55
2  112.90 168.27 42.68 41.47 102.52
3  110.85 169.03 41.86 42.84 114.91
4  109.63 169.64 44.71 43.44 122.48
5  108.08 169.29 44.14 43.69 122.12
6  111.23 169.47 44.58 42.30 123.96
7  112.49 170.90 41.07 42.05 100.36
8  108.87 168.69 42.23 41.23 110.19
9  109.33 170.90 44.55 42.76 121.58
10 111.88 169.96 41.12 43.79 103.46

log(rate1[-1, ] / rate1[-nrow(rate1), ])

Is this what you want?

Regards, Petr

r-help-boun...@r-project.org napsal dne 07.01.2011 07:36:28:

> Dear R forum helpers,
> I am extremely sorry for the receipt of my incomplete mail yesterday.
> There was a connectivity problem at my end, so I chose to send the mail
> through my cell, only to realize today how the mail had been
> transmitted. I am sending my complete mail again through the regular
> channel and sincerely apologize for the inconvenience caused.
>
> ## Here is my actual mail
>
> Dear R forum helpers,
> I have the following data:
>
> trans <- data.frame(
>   currency = c("EURO", "USD", "USD", "GBP", "USD", "AUD"),
>   position_amt = c(1, 25000, 2, 15000, 22000, 3))
> date <- c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010",
>           "12/27/2010", "12/24/2010", "12/23/2010", "12/22/2010",
>           "12/21/2010", "12/20/2010")
> USD  <- c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49,
>           108.87, 109.33, 111.88)
> GBP  <- c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9,
>           168.69, 170.9, 169.96)
> EURO <- c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23,
>           44.55, 41.12)
> CHF  <- c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23,
>           42.76, 43.79)
> AUD  <- c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36,
>           110.19, 121.58, 103.46)
>
> These are the exchange rates and I am trying to calculate the returns.
> I am giving only a small portion of the actual data, as I can't send it
> as an attachment. I am using a function because I want to generalize
> the code for any portfolio.
>
> # My Code
> trans <- read.table('transactions.csv', header=TRUE, sep=",",
>                     na.strings="NA", dec=".", strip.white=TRUE)
> # reading as table
> #currency <- read.table('currency.csv')
> #date <- currency$date
> #USD  <- currency$USD
> #GBP  <- currency$GBP
> #EURO <- currency$EURO
> #CHF  <- currency$CHF
> #AUD  <- currency$AUD
>
> # CREATION of Function. I am using a function as the number of
> # transactions is not constant.
> spot_trans <- function(currency_trans) {
>   if (currency_trans == "USD")  {rate <- USD}
>   # So if I am dealing with a USD transaction, I select the USD rates.
>   if (currency_trans == "GBP")  {rate <- GBP}
>   if (currency_trans == "EURO") {rate <- EURO}
>   if (currency_trans == "CHF")  {rate <- CHF}
>   if (currency_trans == "AUD")  {rate <- AUD}
>
>   # CURRENCY Rate RETURNS, i.e. log(today's rate / yesterday's rate);
>   # the data is in descending date order
>   currency_rate_returns <- NULL
>   for (i in 1:(length(rate)-1)) {
>     # if there are 10 elements, the total number of returns is 9
>     currency_rate_returns[i] <- log(rate[i]/rate[i+1])
>   }
>   return(data.frame(returns = currency_rate_returns))
> }
>
> spot_returns_table <- lapply(1:nrow(trans), function(z)
>   with(trans[z, ], spot_trans(currency_trans=trans$currency_transacted)))
> spot_returns_table
>
> This generates the output given below, with 30 warnings. Also, as there
> are six transactions, 6 outputs are generated, but the output in each
> case pertains only to the first transaction, i.e. the returns are
> generated 6 times for the first transaction (EURO).
>
> warnings()
> Warning messages:
> 1: In if (currency_trans == "USD") { ... :
>   the condition has length > 1 and only the first element will be used
> 2: In if (currency_trans == "GBP") { ... :
>   the condition has length > 1 and only the first element will be used
> 3: In if (currency_trans == "EURO") { ... :
>   the condition has length > 1 and only the first element will be used
> and so on
>
> The output is as given below.
> spot_returns_table
> [[1]]
>   spot_returns
> 1 0.0007026584
> 2 0.0193997094
> 3
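Petr's one-liner, made runnable on a two-currency slice of the rate table. Note that it computes log(next row / current row); with the data in descending date order, today/yesterday returns as the original poster defines them would be `log(rate1[-nrow(rate1), ] / rate1[-1, ])` instead.

```r
# Column-wise log returns for every currency at once (two columns of the
# thread's table used here for brevity).
rate1 <- data.frame(
  USD = c(112.05, 112.90, 110.85),
  GBP = c(171.52, 168.27, 169.03))

ret <- log(rate1[-1, ] / rate1[-nrow(rate1), ])

stopifnot(nrow(ret) == 2,
          isTRUE(all.equal(ret$USD[1], log(112.90 / 112.05))),
          isTRUE(all.equal(ret$GBP[2], log(169.03 / 168.27))))
```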
Re: [R] R packages for R 2.11.1
On 07.01.2011 09:21, Prof Brian Ripley wrote:
> On Thu, 6 Jan 2011, Joshua Wiley wrote:
>> I would try using the R 2.12.1 packages first, but if that does not
>
> On 32-bit Windows this will not work if compiled code is involved: both
> the compiler and the package layout changed at 2.12.0. However, binary
> packages for 2.11.x are still being run through the autobuilder and
> are available on CRAN if they were built successfully (increasingly
> many are not). See the summary table at
> http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html

Right, and if they were not built successfully, then the latest version
that passed the checks is still available there (which is the one you
want to use anyway then). Let me add that just typing

install.packages("packagename")

should do the trick and will use the 2.11 package repository at
your-CRAN-mirror/bin/windows/contrib/2.11

Best, Uwe

> Nevertheless (as the posting guide makes clear) the R developers do
> not support obsolete versions of R, so you are advised to update to
> R 2.12.1.
Re: [R] Match numeric vector against rows in a matrix?
On Wed, Jan 05, 2011 at 07:16:47PM +, Kevin Ummel wrote: Two posts in one day is not a good day...and this question seems like it should have an obvious answer: I have a matrix where rows are unique combinations of 1's and 0's: combs=as.matrix(expand.grid(c(0,1),c(0,1))) combs Var1 Var2 [1,]00 [2,]10 [3,]01 [4,]11 I want a single function that will give the row index containing an exact match with vector x: x=c(0,1) The solution needs to be applied many times, so I need something quick -- I was hoping a base function would do it, but I'm drawing a blank. If the matrix can have different number of columns, then also the following can be used combs - as.matrix(expand.grid(c(0,1),c(0,1),c(0,1))) x - c(0,1,1) which(rowSums(combs != rep(x, each=nrow(combs))) == 0) # [1] 7 Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cross validation for Ordinary Kriging
Pearl,

the error suggests that there is something wrong with x2, and that there
is a difference between the row names of the coordinates and of the
data. If you call str(x2), see whether the first element of @coords is
different from NULL, as this can cause problems when cross-validating.
If it is, try to figure out why. You can also set the row names to NULL
directly:

row.names(x2@coords) = NULL

although I don't think such manipulation of the slots of an object is
usually recommended.

Cheers, Jon

BTW, you will usually get more response to questions about spatial data
handling on the list r-sig-geo
(https://stat.ethz.ch/mailman/listinfo/r-sig-geo).

On 1/6/2011 4:00 PM, pearl may dela cruz wrote:
> Dear ALL,
> The last part of my thesis analysis is the cross validation. Right now
> I am having difficulty using the cross validation of gstat. Below are
> my commands, with tsport_ace as the variable:
>
> nfold <- 3
> part <- sample(1:nfold, 69, replace = TRUE)
> sel <- (part != 1)
> m.model <- x2[sel, ]
> m.valid <- x2[-sel, ]
> t <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
> cv69 <- krige.cv(tsport_ace ~ 1, x2, t, nfold = nrow(x2))
>
> The last line gives an error saying:
> Error in SpatialPointsDataFrame(coordinates(data),
>   data.frame(matrix(as.numeric(NA), :
>   row.names of data and coords do not match
>
> I don't know what is wrong. The x2 data is a SpatialPointsDataFrame,
> which is why I did not specify the location (it will be taken from the
> data). Here is the usage of the function krige.cv:
>
> krige.cv(formula, locations, data, model = NULL, beta = NULL,
>          nmax = Inf, nmin = 0, maxdist = Inf, nfold = nrow(data),
>          verbose = TRUE, ...)
>
> I hope you can help me on this. Thanks a lot.
> Best regards, Pearl
Re: [R] weighed mean of a data frame row-by-row
Thanks for the help, guys! For my purpose I think that rharlow2's
answer, i.e. the rowSums() function, is the most appropriate, since it
also takes care of the NAs.

Best, Vassilis

--
View this message in context: http://r.789695.n4.nabble.com/weighed-mean-of-a-data-frame-row-by-row-tp3177421p3178912.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Accessing data via url
Date: Fri, 7 Jan 2011 00:24:19 -0800 From: dieter.me...@menne-biomed.de To: r-help@r-project.org Subject: Re: [R] Accessing data via url John Kane-2 wrote: # Can anyone suggest why this works datafilename - http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data; person.data - read.table(datafilename,header=TRUE) # but this does not? dd - https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt; treedata - read.table(dd, header=TRUE) === Error in file(file, rt) : cannot open the connection Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter library(RCurl) sessionInfo() dd - https://sites.google.com/site/jrkrideau/home/general-stores/history.png; x = getBinaryURL(dd) - sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] RCurl_1.5-0.1 bitops_1.0-4.1 loaded via a namespace (and not attached): [1] tools_2.12.1 dd - https://sites.google.com/site/jrkrideau/home/general-stores/history.png; x = getBinaryURL(dd) Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed I think I replied to OP only using wget but puresumably there is similar option for rcurl as -k on cmd line version. Network IO is unpredictable, you really can use a few external tools from time to time. 
$ wget -O xxx -S -v --no-check-certificate --user-agent=Mozilla5.0 http://si tes.google.com/site/jrkrideau/home/general-stores/trees.txt --2011-01-06 16:00:01-- http://sites.google.com/site/jrkrideau/home/general-sto res/trees.txt Resolving sites.google.com (sites.google.com)... 74.125.229.3, 74.125.229.5, 74. 125.229.13, ... Connecting to sites.google.com (sites.google.com)|74.125.229.3|:80... connected. HTTP request sent, awaiting response... HTTP/1.0 404 Not Found Content-Type: text/html; charset=utf-8 Date: Thu, 06 Jan 2011 22:00:05 GMT Expires: Thu, 06 Jan 2011 22:00:05 GMT Cache-Control: private, max-age=0 X-Content-Type-Options: nosniff X-XSS-Protection: 1; mode=block Server: GSE 2011-01-06 16:00:01 ERROR 404: Not Found. $ wget -O xxx -S -v --no-check-certificate --user-agent=Mozilla5.0 http://si tes.google.com/site/jrkrideau/home/general-stores/history.png --2011-01-07 05:43:00-- http://sites.google.com/site/jrkrideau/home/general-sto res/history.png Resolving sites.google.com (sites.google.com)... 74.125.229.11, 74.125.229.6, 74 .125.229.14, ... Connecting to sites.google.com (sites.google.com)|74.125.229.11|:80... connected . HTTP request sent, awaiting response... 
  HTTP/1.0 200 OK
  Content-Type: image/png
  X-Robots-Tag: noarchive
  Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  Pragma: no-cache
  Expires: Fri, 01 Jan 1990 00:00:00 GMT
  Date: Fri, 07 Jan 2011 11:43:04 GMT
  Last-Modified: Wed, 28 Oct 2009 18:58:56 GMT
  ETag: 1256756336889
  Content-Length: 3817
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
  Connection: Keep-Alive
Length: 3817 (3.7K) [image/png]
Saving to: `xxx'

100%[======================================>] 3,817 --.-K/s in 0s

2011-01-07 05:43:00 (30.8 MB/s) - `xxx' saved [3817/3817]

$ curl -o xxx -k http://sites.google.com/site/jrkrideau/home/general-stores/history.png
  % Total % Received % Xferd Average Speed Time Time Time Current
                             Dload Upload Total Spent Left Speed
100 3817 100 3817 0 0 28916 0 --:--:-- --:--:-- --:--:-- 40606
$

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing data via url
With the ssl.verifypeer = FALSE argument it works:

x = getBinaryURL(dd, ssl.verifypeer = FALSE)

On Fri, Jan 7, 2011 at 6:24 AM, Dieter Menne dieter.me...@menne-biomed.de wrote:

> John Kane-2 wrote:
> # Can anyone suggest why this works
> datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> person.data <- read.table(datafilename, header=TRUE)
> # but this does not?
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> treedata <- read.table(dd, header=TRUE)
> === Error in file(file, "rt") : cannot open the connection
>
> Your original file is no longer there, but when I try RCurl with a png file
> that is present, I get a certificate error:
>
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
> x = getBinaryURL(dd)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>   SSL certificate problem, verify that the CA cert is OK. Details:
>   error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
>
> Dieter

--
View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3178773.html
Sent from the R help mailing list archive at Nabble.com.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] Extracting user specified variables from data frame to use as function arguments
Hello All;

I am writing an R program in the form of a function to find the summary statistics for response variable(s) in a data frame. I need to accept variable names from the data as arguments and use them inside the function for various tasks. The dataset name is also one of the arguments to the program. The tasks include merging two datasets, transposing a dataset, and finding the summary stats of response variable(s) using the input variables as the 'by' variables.

The variables specified as arguments to the function are taken to be character strings, and I have not been able to use these to extract the corresponding variables. As an example, suppose I have an argument 'merge_by' which is used to specify the variables used to merge two datasets. Typically the input is a quoted variable name such as "Period" (with the double quotes), but there could be more or fewer variables than that. How do I take in the user-specified variables and use them for tasks like merging, transposing etc.? A similar problem is with specifying the user-defined 'by' variables in summaryBy. I have played around with substitute, parse, eval etc. but without success. Any suggestions?

Thanks
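One common pattern, sketched below with hypothetical data frames and column names, is to keep the arguments as character strings throughout: merge() accepts a character vector for by= directly, and [[ ]] / single-bracket indexing select columns by name without any substitute()/eval() gymnastics.

```r
# Sketch: using character-string arguments to pick out columns (hypothetical names)
summarize_by <- function(data, response, by) {
  # 'response' is one column name; 'by' is a character vector of grouping columns.
  # data[by] returns a data frame (a list), which aggregate() accepts as 'by'.
  aggregate(data[[response]], by = data[by], FUN = mean)
}

merge_two <- function(d1, d2, merge_by) {
  merge(d1, d2, by = merge_by)  # merge() takes a character vector of key names
}

d1 <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
d2 <- data.frame(id = 1:4, g = c("a", "b", "a", "b"))
d  <- merge_two(d1, d2, "id")
summarize_by(d, "x", "g")
```

The same idea works for summaryBy-style tasks: build the formula from strings with as.formula(paste(response, "~", paste(by, collapse = "+"))).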
Re: [R] How to join matrices of different row length from a list
Dear Emma,

there is a 'cbind.na', 'rbind.na' and 'data.frame.na' function in my qpcR package.

library(qpcR)
matLis <- list(matrix(1:4, 2, 2), matrix(1:6, 3, 2), matrix(2:1, 1, 2))
do.call(cbind.na, matLis)

They are essentially the generic functions extended with an internal fill. You might also want to try these examples:

## binding
cbind.na(1, 1:7)  # the '1' (= shorter vector) is NOT recycled but filled
cbind.na(1:8, 1:7, 1:5, 1:10)  # same with many vectors
rbind.na(1:8, 1:7, 1:5, 1:10)  # or in rows

a <- matrix(rnorm(20), ncol = 4)  # unequal size matrices
b <- matrix(rnorm(20), ncol = 5)
cbind.na(a, b)  # works, in contrast to original cbind
rbind.na(a, b)  # works, in contrast to original rbind

## data frame with unequal size vectors
data.frame.na(A = 1:7, B = 1:5, C = letters[1:3], D = factor(c(1, 1, 2, 2)))

## convert a list with unequal length list items
## to a data frame
z <- list(a = 1:5, b = letters[1:3], c = matrix(rnorm(20), ncol = 2))
do.call(data.frame.na, z)

--
View this message in context: http://r.789695.n4.nabble.com/How-to-join-matrices-of-different-row-length-from-a-list-tp3177212p3178991.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Date: Thu, 6 Jan 2011 23:06:44 -0800
From: peter.langfel...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value, but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias.

You see this everywhere in one form or another, from medical to financial modelling. My solution here is simply to publish more raw data in a computer-readable form — in this case, of course, something easy to get with R — so disinterested or adversarial parties can run their own analysis. I think there was also a push to create a database for failed drug trials that may contain data of some value later. The value of R, with easily available data for a large cross section of users, could be to moderate problems like the one cited here.

I almost slammed a poster here earlier who wanted a simple rule for "when do I use this test" with something like "when your mom tells you to", since post hoc you do just about everything to assume you messed up and missed something, but a priori you hope you have designed a good hypothesis. And at the end of the day, a given p-value is one piece of evidence in the overall objective of learning about some system, not appeasing a sponsor. Personally I'm a big fan of post hoc analysis on biotech data in some cases, especially as more pathway or other theory is published, but it is easy to become deluded if you have a conclusion that you know JUST HAS TO BE RIGHT.
Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the data I've been able to get tends to make me side with the FDA. I still hate the idea of any regulation or access restrictions, but it seems to be the only way to keep sponsors honest to any extent. Your mileage may vary, however; take a look at some rather loud disagreement with the FDA over earlier DNDN panel results, possibly involving threats against critics. LOL.

That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :)

Peter

On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves wrote:

Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is on average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias.

A more insidious problem, which may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). For example, I've heard claims (which I cannot substantiate right now) that cell phone companies allegedly lobbied successfully to block funding for researchers they thought were likely to document health problems with their products. Related claims have been made by scientists in the US Food and Drug Administration that certain therapies were approved on political grounds in spite of substantive questions about the validity of the research backing the request for approval (e.g., www.naturalnews.com/025298_the_FDA_scientists.html).
Some of these accusations of political corruption may be groundless. However, as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that cannot be replicated — effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.)
Re: [R] How to export/save an mrpp object?
Nikos: I finally ran mrpp tests. I think all is fine but one very important issue: I have no idea how to export/save an mrpp object. Tried anything I know and searched the archives but found nothing.

David W: And what happened when you tried what seems like the obvious:

save(mrpp_obj, file="mrpp_store.Rdata")
# rm(list=ls())  # Only uncomment if you are ready for your workspace to clear
load("mrpp_store.Rdata")

Right, clearing did the trick.

> Any ideas? Is really copy-pasting the mrpp results the only way?

Many of us have no idea what such an object is, since you have not described the packages and functions used to create it. If you want an ASCII version then dput or dump are also available.

Multiresponse Permutation Procedures (MRPP) is implemented in the vegan package. The function mrpp() returns (an object of class mrpp) something like:

--%---
# check class
> class(samples_bitemporal_modis.0001.mrpp)
[1] "mrpp"

# check structure
> str(samples_bitemporal_modis.0001.mrpp)
List of 12
 $ call        : language mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])
 $ delta       : num 0.126
 $ E.delta     : num 0.202
 $ CS          : logi NA
 $ n           : Named int [1:5] 335 307 183 188 27
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ classdelta  : Named num [1:5] 0.1255 0.1045 0.1837 0.0981 0.1743
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ Pvalue      : num 0.001
 $ A           : num 0.378
 $ distance    : chr "euclidean"
 $ weight.type : num 1
 $ boot.deltas : num [1:999] 0.202 0.202 0.202 0.203 0.202 ...
 $ permutations: num 999
 - attr(*, "class")= chr "mrpp"
--%---

Now I've tried the following:

--%---
# 1. save(d) it
save(samples_bitemporal_modis.0001.mrpp, file="exported.mrpp.R")

# 2. loade(d) it in a new object...
loadedmrpp <- load("exported.mrpp.R")

# 3. (tried) to check it...
> str(loadedmrpp)
 chr "samples_bitemporal_modis.0001.mrpp"

# it did not cross my mind immediately to...
> get(loadedmrpp)

Call:
mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])

Dissimilarity index: euclidean
Weights for groups: n

Class means and counts:
      Urban  Vegetation Bare ground Burned Water
delta 0.1255 0.1045     0.1837      0.0981 0.1743
n     335    307        183         188    27

Chance corrected within-group agreement A: 0.3778
Based on observed delta 0.1258 and expected delta 0.2022
Significance of delta: 0.001
Based on 999 permutations

# ...or to work on a clean workspace!
--%---

Thank you David.

Cheers, Nikos
[R] Currency return calculations
Dear sir,

I am extremely sorry for messing up the logic asking for help w.r.t. my earlier mails. I have tried to explain below what I am looking for.

I have a database (say, currency_rates) storing datewise currency exchange rates with some base currency XYZ.

currency_rates <- data.frame(
  date = c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010", "12/27/2010",
           "12/24/2010", "12/23/2010", "12/22/2010", "12/21/2010", "12/20/2010"),
  USD  = c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49, 108.87, 109.33, 111.88),
  GBP  = c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9, 168.69, 170.9, 169.96),
  EURO = c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23, 44.55, 41.12),
  CHF  = c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23, 42.76, 43.79),
  AUD  = c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36, 110.19, 121.58, 103.46))

I have a portfolio consisting of some of these currencies. At this moment, suppose my portfolio has the following currency transactions, i.e. the following is my current portfolio and has 3 USD transactions, 2 EURO transactions and a CHF transaction.

portfolio_currency_names = c("USD", "USD", "EURO", "CHF", "EURO", "USD")

# My objective is AS PER THE PORTFOLIO, I need to generate a data.frame giving respective currency returns. Thus, I need to have an output like

    USD     USD    EURO     CHF    EURO     USD
-0.0076 -0.0076  0.0007  0.0007  0.0007 -0.0076
 0.0183  0.0183  0.0194 -0.0325  0.0194  0.0183
 0.0111  0.0111 -0.0659 -0.0139 -0.0659  0.0111
 0.0142  0.0142  0.0128 -0.0057  0.0128  0.0142
-0.0287 -0.0287 -0.0099  0.0323 -0.0099 -0.0287
-0.0113 -0.0113  0.0820  0.0059  0.0820 -0.0113
 0.0327  0.0327 -0.0279  0.0197 -0.0279  0.0327
-0.0042 -0.0042 -0.0535 -0.0364 -0.0535 -0.0042
-0.0231 -0.0231  0.0801 -0.0238  0.0801 -0.0231

Thus, my requirement is to have the dataframe as per the composition of my portfolio. Thus, if there are only 2 transactions, i.e. if my portfolio contains say only CHF and AUD, I need the return calculations done only for CHF and AUD.
    CHF     AUD
 0.0007  0.0663
-0.0325 -0.1141
-0.0139 -0.0638
-0.0057  0.0029
 0.0323 -0.0150
 0.0059  0.2112
 0.0197 -0.0934
-0.0364 -0.0984
-0.0238  0.1614

I once again apologize for not asking my requirement properly, thereby causing not only inconvenience to all of you, but also wasting your valuable time. It's not that I wasn't careful while asking for guidance for my requirement; I wasn't clear about it. I am sorry for the same once again.

I request you to please help me.

Amelia Vettori
Re: [R] Match numeric vector against rows in a matrix?
Try this:

which(colSums(t(combs) == x) == ncol(combs))

On Wed, Jan 5, 2011 at 5:16 PM, Kevin Ummel kevinum...@gmail.com wrote:

Two posts in one day is not a good day...and this question seems like it should have an obvious answer:

I have a matrix where rows are unique combinations of 1's and 0's:

> combs = as.matrix(expand.grid(c(0,1), c(0,1)))
> combs
     Var1 Var2
[1,]    0    0
[2,]    1    0
[3,]    0    1
[4,]    1    1

I want a single function that will give the row index containing an exact match with vector x:

> x = c(0,1)

The solution needs to be applied many times, so I need something quick -- I was hoping a base function would do it, but I'm drawing a blank.

Thanks!
Kevin

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
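A short sketch of why the one-liner works: each column of t(combs) is one row of combs, and `== x` recycles x down each column, so a column whose sum of matches equals ncol(combs) agrees with x in every position.

```r
combs <- as.matrix(expand.grid(c(0, 1), c(0, 1)))
x <- c(0, 1)
# columns of t(combs) are rows of combs; '== x' recycles x down each column
matches <- colSums(t(combs) == x) == ncol(combs)
which(matches)  # 3, i.e. the row c(0, 1)
```

For repeated lookups against the same matrix, an alternative is to precompute row keys once, e.g. match(paste(x, collapse = ","), apply(combs, 1, paste, collapse = ",")).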
[R] Odp: Currency return calculations
Hi

What is wrong with my suggestion then?

> rate1
      USD    GBP   EURO   CHF    AUD
1  112.05 171.52 42.71 41.50 109.55
2  112.90 168.27 42.68 41.47 102.52
3  110.85 169.03 41.86 42.84 114.91
4  109.63 169.64 44.71 43.44 122.48
5  108.08 169.29 44.14 43.69 122.12
6  111.23 169.47 44.58 42.30 123.96
7  112.49 170.90 41.07 42.05 100.36
8  108.87 168.69 42.23 41.23 110.19
9  109.33 170.90 44.55 42.76 121.58
10 111.88 169.96 41.12 43.79 103.46

portfolio <- c("USD", "USD", "CHF", "AUD", "USD")
log(rate1[-1, portfolio] / rate1[-nrow(rate1), portfolio])

            USD        USD.1          CHF          AUD        USD.2
2   0.007557271  0.007557271 -0.000723153 -0.066323165  0.007557271
3  -0.018324535 -0.018324535  0.032501971  0.114091312 -0.018324535
4  -0.011066876 -0.011066876  0.013908430  0.063798538 -0.011066876
5  -0.014239366 -0.014239366  0.005738567 -0.002943583 -0.014239366
6   0.028728436  0.028728436 -0.032332157  0.014954765  0.028728436
7   0.011264199  0.011264199 -0.005927700 -0.211195211  0.011264199
8  -0.032709819 -0.032709819 -0.019693240  0.093442427 -0.032709819
9   0.004216322  0.004216322  0.036436939  0.098366334  0.004216322
10  0.023056037  0.023056037  0.023802395 -0.161387418  0.023056037

As I said, instead of fiddling with several loop/if/function/variable attempts, it seems to me better to use powerful R indexing and the whole-object approach where it is possible.

Regards
Petr

Amelia Vettori amelia_vett...@yahoo.co.nz napsal dne 07.01.2011 13:46:53:

Dear sir, I am extremely sorry for messing up the logic asking for help w.r.t. my earlier mails. I have tried to explain below what I am looking for. I have a database (say, currency_rates) storing datewise currency exchange rates with some base currency XYZ.
currency_rates <- data.frame(
  date = c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010", "12/27/2010",
           "12/24/2010", "12/23/2010", "12/22/2010", "12/21/2010", "12/20/2010"),
  USD  = c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49, 108.87, 109.33, 111.88),
  GBP  = c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9, 168.69, 170.9, 169.96),
  EURO = c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23, 44.55, 41.12),
  CHF  = c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23, 42.76, 43.79),
  AUD  = c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36, 110.19, 121.58, 103.46))

I have a portfolio consisting of some of these currencies. At this moment, suppose my portfolio has the following currency transactions:

portfolio_currency_names = c("USD", "USD", "EURO", "CHF", "EURO", "USD")

# My objective is AS PER THE PORTFOLIO, I need to generate a data.frame giving respective currency returns. Thus, I need to have an output like

    USD     USD    EURO     CHF    EURO     USD
-0.0076 -0.0076  0.0007  0.0007  0.0007 -0.0076
 0.0183  0.0183  0.0194 -0.0325  0.0194  0.0183
 0.0111  0.0111 -0.0659 -0.0139 -0.0659  0.0111
 0.0142  0.0142  0.0128 -0.0057  0.0128  0.0142
-0.0287 -0.0287 -0.0099  0.0323 -0.0099 -0.0287
-0.0113 -0.0113  0.0820  0.0059  0.0820 -0.0113
 0.0327  0.0327 -0.0279  0.0197 -0.0279  0.0327
-0.0042 -0.0042 -0.0535 -0.0364 -0.0535 -0.0042
-0.0231 -0.0231  0.0801 -0.0238  0.0801 -0.0231

Thus, my requirement is to have the dataframe as per the composition of my portfolio. Thus, if there are only 2 transactions, i.e. if my portfolio contains say only CHF and AUD, I need the return calculations done only for CHF and AUD.

    CHF     AUD
 0.0007  0.0663
-0.0325 -0.1141
-0.0139 -0.0638
-0.0057  0.0029
 0.0323 -0.0150
 0.0059  0.2112
 0.0197 -0.0934
-0.0364 -0.0984
-0.0238  0.1614

I once again apologize for not asking my requirement properly, thereby causing not only inconvenience to all of you, but also wasting your valuable time.
It's not that I wasn't careful while asking for guidance for my requirement; I wasn't clear about it. I am sorry for the same once again. I request you to please help me.

Amelia Vettori
Re: [R] How to make a Cluster of Clusters
Hi Michael,

I agree with you and I will make this ordination. But I also want to check for spatial correlation of the variables, so I thought that comparing the dendrogram of the environmental variables with the dendrogram of the geographical distances of the lakes would indicate whether similar lakes are next to each other. But I have just one geographical coordinate for each lake, and 12 measures of environmental variables. How can I analyse this?

Thank you very much for the attention

Diego PJ
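One standard way to test this kind of question — whether environmentally similar lakes are also geographically close — is a Mantel test on the two distance matrices rather than a comparison of dendrograms. A sketch with the vegan package, using made-up data (20 hypothetical lakes, 12 environmental variables as columns, one coordinate pair per lake):

```r
library(vegan)

set.seed(1)
n_lakes <- 20
env    <- matrix(rnorm(n_lakes * 12), ncol = 12)   # 12 environmental measures per lake
coords <- cbind(runif(n_lakes), runif(n_lakes))    # one x/y coordinate per lake

env_d <- vegdist(scale(env), method = "euclidean") # environmental dissimilarity
geo_d <- dist(coords)                              # geographic distance

mantel(env_d, geo_d)  # permutation test of correlation between the two matrices
```

If the 12 measures are repeated observations of the same variables over time, they can first be averaged per lake before building env_d; the Mantel statistic then directly answers "are similar lakes near each other?" with a permutation p-value.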
Re: [R] Cross validation for Ordinary Kriging
On 1/7/2011 12:40 PM, Jon Olav Skoien wrote:

Pearl,

The error suggests that there is something wrong with x2, and that there is a difference between the row names of the coordinates and the data. If you call str(x2), see if the first element of @coords is different from NULL, as this can cause some problems when cross-validating. If it is, try to figure out why. You can also set the row.names equal to NULL directly:

row.names(x2@coords) = NULL

although I don't think such manipulation of the slots of an object is usually recommended.

Pearl,

It seems the problem was caused by a recent change in sp without updating gstat; the maintainer has fixed it and submitted a new version of gstat to CRAN. So you should be able to use your original script after downloading the new version, probably available in a couple of days. In the meantime the suggestion above should still work.

Cheers,
Jon

BTW, you will usually get more response to questions about spatial data handling using the list r-sig-geo (https://stat.ethz.ch/mailman/listinfo/r-sig-geo)

On 1/6/2011 4:00 PM, pearl may dela cruz wrote:

Dear ALL,

The last part of my thesis analysis is the cross validation. Right now I am having difficulty using the cross validation of gstat. Below are my commands with the tsport_ace as the variable:

nfold <- 3
part <- sample(1:nfold, 69, replace = TRUE)
sel <- (part != 1)
m.model <- x2[sel, ]
m.valid <- x2[-sel, ]
t <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
cv69 <- krige.cv(tsport_ace ~ 1, x2, t, nfold = nrow(x2))

The last line gives an error saying:

Error in SpatialPointsDataFrame(coordinates(data), data.frame(matrix(as.numeric(NA), :
  row.names of data and coords do not match

I don't know what is wrong. The x2 data is a SpatialPointsDataFrame, which is why I did not specify the location (as it will take it from the data).
Here is the usage of the function krige.cv:

krige.cv(formula, locations, data, model = NULL, beta = NULL, nmax = Inf,
         nmin = 0, maxdist = Inf, nfold = nrow(data), verbose = TRUE, ...)

I hope you can help me on this. Thanks a lot.

Best regards,
Pearl
Re: [R] Dont show zero values in line graph
On 07.01.2011 04:10, LCOG1 wrote:

Hey everyone,

I'm getting better at plotting my data but can't for the life of me figure out how to show a line graph with missing data that doesn't continue the line down to zero and then back up to the remaining values. Consider the following:

x <- c(1:5, 0, 0, 8:10)
y <- 1:10
plot(0, 0, xlim=c(0,10), ylim=c(0,10), type="n",
     main="Dont show the bloody 0 values!!")
lines(x ~ y, col="blue", lwd=2)

My data is missing the 6th and 7th values and they come in as NA's, so I change them to 0s, but then the plot has these ugly lines that dive toward the x axis and back up. I would do bar plots, but I need to show multiple sets of data on the same plot, and side-by-side bars don't do it for me. So I need a line graph that starts and stops where 0s or missing values exist. Thoughts?

If I understand correctly what you are going to do: just do not change the NAs to zero in advance. NAs are not plotted.

Uwe Ligges

JR
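Uwe's point can be seen directly with the poster's own data: leaving the missing values as NA makes lines() break the line at those positions instead of dipping to zero (a minimal sketch).

```r
x <- c(1:5, NA, NA, 8:10)  # keep the missing values as NA, not 0
y <- 1:10
plot(y, x, xlim = c(0, 10), ylim = c(0, 10), type = "n",
     main = "NAs leave a gap")
lines(y, x, col = "blue", lwd = 2)  # the line is interrupted at positions 6 and 7
```

If zeros are already in the data, convert them back first, e.g. x[x == 0] <- NA (assuming 0 is never a legitimate value in the series).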
Re: [R] Prediction error for Ordinary Kriging
Pearl,

You find the prediction error as the var1.var column in your result object, i.e., y in your script. For plotting:

spplot(y, 2)

or

spplot(y, "var1.var")

Jon

On 1/5/2011 9:28 PM, pearl may dela cruz wrote:

Hi ALL,

Can you please help me on how to determine the prediction error for ordinary kriging? Below are all the commands I used to generate the OK plot:

rsa2 <- readShapeSpatial("residentialsa", CRS("+proj=tmerc +lat_0=39.66 +lon_0=-8.1319062 +k=1 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs"))
x2 <- readShapeSpatial("ptna2", CRS("+proj=tmerc +lat_0=39.66 +lon_0=-8.1319062 +k=1 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs"))
bb <- bbox(rsa2)
cs <- c(1, 1)
cc <- bb[, 1] + (cs/2)
cd <- ceiling(diff(t(bb))/cs)
rsa2_grd <- GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
getClass("SpatialGrid")
p4s <- CRS(proj4string(rsa2))
x2_SG <- SpatialGrid(rsa2_grd, proj4string = p4s)
x2_SP <- SpatialPoints(cbind(x2$X, x2$Y))
v <- variogram(log1p(tsport_ace) ~ 1, x2, cutoff=100, width=9)
ve.fit <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
y <- krige(tsport_ace ~ 1, x2, x2_SG, model = ve.fit)
spplot(y, 1, col.regions = bpy.colors(100),
       sp.layout = list("sp.lines", as(rsa2, "SpatialLines"), no.clip = TRUE))

I'm looking forward to your response. Thanks.

Best regards,
Pearl dela Cruz
Re: [R] Cairo pdf canvas size
Thanks!

On Thu, Jan 6, 2011 at 7:13 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

On Thu, Jan 6, 2011 at 5:36 AM, Eduardo de Oliveira Horta eduardo.oliveiraho...@gmail.com wrote:

Peter, thank you, that's what I was looking for!

David, I forgot to tell you my OS. Sorry... it's Win7. I'm running an RKWard session. And this is strange:

Cairo("example.pdf", type="pdf", width=12, height=12, units="cm", dpi=300)
Error: could not find function "Cairo"

... maybe you're not using the Cairo package? http://cran.r-project.org/web/packages/Cairo/Cairo.pdf

And Dennis, thanks for the code. It worked, and I'm considering adopting data frames in the near future. By the way, I'm working with functional time series, so each observation is a function (or a vector representing that function evaluated on a grid) indexed by time. Any insights on how to implement data frames here?

I don't see a real issue. It would be easier to give you concrete information if there were an artificial example that mimics your situation, but it's not that hard. I'd suggest looking into the zoo package to create a series - it can handle both regular (zooreg()) and irregular (zoo()) series. Basically, a zoo object is a numeric vector with a time index. One can create multiple series with a single index, individual series with different indices that can be combined into data frames, etc. I've browsed through some of the code that accompanies Ramsey, Hooker and Graves' FDA book in R and Matlab, and occasionally they use the zoo package as well. Here's an example, but I expect that someone will show how to convert the zoo series to data frames much more efficiently for use in ggplot2...
library(zoo)
library(ggplot2)
library(lattice)

# Generate three daily series with different start times and lengths
a <- zoo(rnorm(450), as.Date("2005-01-01") + 0:449)
b <- zoo(rnorm(600, 1, 2), as.Date("2005-06-01") + 0:599)
d <- zoo(rnorm(300, 2, 1), as.Date("2004-09-01") + 0:299)

# Convert to data frame, make time index a variable and make sure it's a Date object
A <- as.data.frame(a)
B <- as.data.frame(b)
D <- as.data.frame(d)
A$Date <- as.Date(rownames(A))
B$Date <- as.Date(rownames(B))
D$Date <- as.Date(rownames(D))

# Give all three series the same name
names(A)[1] <- names(B)[1] <- names(D)[1] <- 'y'

# Stack the three data frames and create a series ID variable
comb <- rbind(A, B, D)
comb$Series <- rep(c('A', 'B', 'D'), c(nrow(A), nrow(B), nrow(D)))
str(comb)  # make sure that Date is a Date object

# ggplot of the three series
ggplot(comb, aes(x = Date, y = y, color = Series)) + geom_path()

# Stacked individual plots (faceted)
last_plot() + facet_grid(Series ~ .)

# lattice version
xyplot(y ~ Date, data = comb, groups = Series, type = 'l', col.line = 1:3)

# Stacked individual series
xyplot(y ~ Date | Series, data = comb, type = 'l', layout = c(1, 3))

If you need the grid coordinates, use expand.grid() - it can be used when creating a data frame, too. As Bert noted the other night in another thread, one can use xyplot directly on zoo objects, but I don't have any direct experience with that yet so will defer to others if they wish to contribute. ?xyplot.zoo provides some examples.
Hope this gives you some idea of what can be done,
Dennis

Best regards,
Eduardo

On Thu, Jan 6, 2011 at 1:47 AM, Peter Langfelder peter.langfel...@gmail.com wrote:

On Wed, Jan 5, 2011 at 7:35 PM, Eduardo de Oliveira Horta eduardo.oliveiraho...@gmail.com wrote:

Something like this:

u = seq(from=-pi, to=pi, length=1000)
f = sin(u)
Cairo("example.pdf", type="pdf", width=12, height=12, units="cm", dpi=300)
par(cex.axis=.6, col.axis="grey", ann=FALSE, lwd=.25, bty="n", las=1,
    tcl=-.2, mgp=c(3,.5,0))
xlim = c(-pi, pi)
ylim = round(c(min(f), max(f)))
plot(u, f, xlim, ylim, type="l", col="firebrick3", axes=FALSE)
axis(side=1, lwd=.25, col="darkgrey", at=seq(from=xlim[1], to=xlim[2], length=5))
axis(side=2, lwd=.25, col="darkgrey", at=seq(from=ylim[1], to=ylim[2], length=5))
abline(v=seq(from=xlim[1], to=xlim[2], length=5), lwd=.25, lty="dotted", col="grey")
abline(h=seq(from=ylim[1], to=ylim[2], length=5), lwd=.25, lty="dotted", col="grey")
dev.off()

Wow, you must like light colors :)

To the point, just set margins, for example

par(mar = c(2, 2, 0.5, 0.5))

(margins are bottom, left, top, right) after the Cairo command.

BTW, Cairo doesn't work for me either... but I tried your example by plotting to the screen.

Peter

Notice how the canvas' margins are relatively far from the plotting area.

Thanks, Eduardo

On Thu, Jan 6, 2011 at 1:00 AM, David Winsemius dwinsem...@comcast.net wrote:

On Jan 5, 2011, at 9:38 PM, Eduardo de Oliveira Horta wrote:

Hello,

I want to save a pdf plot using Cairo, but the canvas of the saved file seems too large when compared to the actual plotted area. Is there a way to control the relation between the canvas size and the size of the actual plotting area?
Re: [R] Parsing JSON records to a dataframe
On 01/07/2011 12:05 AM, Dieter Menne wrote: Jeroen Ooms wrote: What is the most efficient method of parsing a dataframe-like structure that has been JSON encoded in record-based format rather than vector-based? For example, a structure like this: [ {"name": "joe", "gender": "male", "age": 41}, {"name": "anna", "gender": "female", "age": 23} ] RJSONIO parses this as a list of lists, which I would then have to apply as.data.frame to and append to an existing dataframe, which is terribly slow. unlist is pretty fast. The solution below assumes that you know how your structure is, so it is not very flexible, but it should show you that the conversion to data.frame is not the bottleneck.

# json
library(RJSONIO)
# [ {"name": "joe", "gender": "male", "age": 41},
#   {"name": "anna", "gender": "female", "age": 23} ]
n <- 30
d <- data.frame(name = rep(c("joe", "anna"), n),
                gender = rep(c("male", "female"), n),
                age = rep(c(23, 41), n))
dj <- toJSON(d)

This doesn't create the required structure:

cat(dj)
{ "name": [ "joe", "anna", "joe", "anna" ],
  "gender": [ "male", "female", "male", "female" ],
  "age": [ 23, 41, 23, 41 ] }

instead

library(rjson)
n <- 1000
name <- apply(matrix(sample(letters, n * 5, TRUE), n), 1, paste, collapse = "")
gender <- sample(c("male", "female"), n, TRUE)
age <- ceiling(runif(n, 20, 60))
recs <- sprintf('{"name": "%s", "gender": "%s", "age": %d}', name, gender, age)
j <- sprintf("[%s]", paste(recs, collapse = ","))
lol <- fromJSON(j)

and then with

f <- function(lst) function(nm) unlist(lapply(lst, "[[", nm), use.names = FALSE)
oopt <- options(stringsAsFactors = FALSE)  # convenience for 'identical'
system.time({
+   df0 <- as.data.frame(Map(f(lol), names(lol[[1]])))
+ })
   user  system elapsed
  0.006   0.000   0.006

versus for instance

system.time({
+   df1 <- do.call(rbind, lapply(lol, data.frame))
+ })
   user  system elapsed
  1.497   0.000   1.500

identical(df0, df1)
[1] TRUE

Martin

system.time(d1 <- fromJSON(dj))
#  user  system elapsed
#  4.06    0.26    4.32
system.time(
  dd <- data.frame(name = unlist(d1$name),
                   gender = unlist(d1$gender),
                   age = as.numeric(unlist(d1$age)))
)
#  user  system elapsed
#  1.13    0.05    1.18
-- 
Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on scientific misconduct that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published. (2) private funding provides its own biases. Spencer # Savage and Vickers (2010) Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals, Scientific Data Sharing, added Apr. 26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2 http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/). 
On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another from medical to financial modelling. My solution here is simply to publish more raw data in a computer readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own analysis. I think there was also a push to create a data base for failed drug trials that may contain data of some value later. The value of R with easily available data for a large cross section of users could be to moderate problems like the one cited here. I almost slammed a poster here earlier who wanted a simple rule for when do I use this test with something like when your mom tells you to since post hoc you do just about everything to assume you messed up and missed something but a priori you hope you have designed a good hypothesis. And at the end of the day, a given p-value is one piece of evidence in the overall objective of learning about some system, not appeasing a sponsor. Personally I'm a big fan of post hoc analysis on biotech data in some cases, especially as more pathway or other theory is published, but it is easy to become deluded if you have a conclusion that you know JUST HAS TO BE RIGHT. 
Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the data I've been able to get tends to make me side with the FDA and I still hate the idea of any regulation or access restrictions but it seems to be the only way to keep sponsors honest to any extent. Your mileage may vary however, take a look at some rather loud disagreement with FDA over earlier DNDN panel results, possibly involving threats against critics. LOL. That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :) Peter On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves wrote: Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: Stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is in average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias. A more insidious problem, that may not
Re: [R] Different LLRs on multinomial logit models in R and SPSS
On Thu, 6 Jan 2011, David Winsemius wrote: On Jan 6, 2011, at 11:23 AM, Sören Vogel wrote: Thanks for your replies. I am no mathematician or statistician by far; however, it appears to me that the actual value of either of the two LLs is indeed important when it comes to the calculation of pseudo-R-squareds. If Rnagel divides by (some transformation of) the actual value of llnull, then any calculation of Rnagel should differ. How come? Or is my function wrong? And if my function is right, how can I calculate an R-squared independent of the software used? You have two models in that function, the null one with . ~ 1 and the original one, and you are getting a ratio on the likelihood scale (which is a difference on the log-likelihood or deviance scale). If this is the case, calculating 'fit' indices for those models must end up in different fit indices depending on software:

n <- 143
ll1 <- 135.02
ll2 <- 129.8
# Rcs
(Rcs <- 1 - exp( (ll2 - ll1) / n ))
# Rnagel
Rcs / (1 - exp(-ll1/n))

ll3 <- 204.2904
ll4 <- 199.0659
# Rcs
(Rcs <- 1 - exp( (ll4 - ll3) / n ))
# Rnagel
Rcs / (1 - exp(-ll3/n))

The Rcs values are equal; however, the Rnagel values are not. Of course, this is no question, but I am rather confused. When publishing results I am required to use fit indices, and editors would complain that they differ. Sören
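The arithmetic behind Sören's observation can be sketched directly: Cox-Snell depends only on the *difference* of the two log-likelihoods, while Nagelkerke's rescaling also uses the absolute value of the null log-likelihood, which changes when different software drops a different additive constant. A minimal sketch, not code from this thread (pseudoR2 is a made-up helper; log-likelihoods are taken on their natural negative scale):

```r
# Sketch: Cox-Snell R^2 is invariant to an additive shift of both
# log-likelihoods; Nagelkerke's rescaled version is not.
pseudoR2 <- function(llnull, llmodel, n) {
  rcs  <- 1 - exp(2 * (llnull - llmodel) / n)  # Cox-Snell R^2
  rmax <- 1 - exp(2 * llnull / n)              # its maximum attainable value
  c(CoxSnell = rcs, Nagelkerke = rcs / rmax)
}

n <- 143
pseudoR2(-135.02, -129.8, n)            # constants as reported by one program
pseudoR2(-135.02 - 50, -129.8 - 50, n)  # same difference, shifted by a constant
```

Both calls return the same Cox-Snell value but different Nagelkerke values, which is exactly the discrepancy described above.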
[R] Problems with glht function for lme object
Dear all, I'm trying to make multiple comparisons for an lme object. The data is from an experiment on parental work load in birds, in which adults at different sites were induced to work at one of three levels ('treat'; H, M, L). The response is 'feedings', which is a quantitative measure of nest provisioning per parent per chick per hour. Site is included as a random effect (one male/female pair per site). My final model takes the following form:

feedings ~ treat + year + date^2, random = ~1|site, data = feed.df

For this model, I would like to do multiple comparisons on 'treat', using the multcomp package:

summary(glht(m4.feed, linfct = mcp(treat = "Tukey")))

However, this does not work, and I get the error message below.

Error in if (is.null(pkg) | pkg == "nlme") terms(formula(x)) else slot(x, :
  argument is of length zero
Error in factor_contrasts(model) : no 'model.matrix' method for 'model' found!

I suspect this might have quite a straightforward solution, but I'm stuck at this point. Any help would be most appreciated. Sample data below. Kind regards, Andreas Nord Sweden
==
feedings sex site treat year date^2
1.8877888 M 838 H 2009 81
1.9102787 M 247 H 2009 81
1.4647229 M 674 H 2010 121
1.4160590 M 7009 M 2009 144
1.3106749 M 863 M 2010 196
1.2718121 M 61 M 2009 225
1.2799263 M 729 L 2009 256
1.5829564 M 629 L 2009 256
1.4847251 M 299 L 2010 324
1.2463151 M 569 L 2010 324
2.1694169 F 838 H 2009 81
1.5966899 F 247 H 2009 81
2.4136983 F 674 H 2010 121
1.7784873 F 7009 M 2009 144
1.6681317 F 863 M 2010 196
2.3691275 F 61 M 2009 225
2.0672192 F 729 L 2009 256
1.6389902 F 629 L 2009 256
0.9307536 F 299 L 2010 324
1.6786767 F 569 L 2010 324
==
-- View this message in context: http://r.789695.n4.nabble.com/Problems-with-glht-function-for-lme-object-tp3179128p3179128.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] Accessing data via url
Henrique Dallazuanna wrote: With the ssl.verifypeer = FALSE argument it works:

x <- getBinaryURL(dd, ssl.verifypeer = FALSE)

Thanks, good to know. It's only in the examples of ..., but it looks like a parameter important enough to be included in the docs of getBinaryURL. Digging through curl docs can be daunting. Dieter -- View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3179197.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] defining a formula method for a weighted lm()
Thanks, Martin. Now I understand the 'standard non-standard evaluation' magic, and the code in http://developer.r-project.org/model-fitting-functions.txt explains how this works. Still, I can't help but think of this as evil magic, for which some anti-magic would be extremely useful, so that a simple function like

my.lm <- function(formula, data, subset, weights, ...) {
  lm(formula, data, subset, weights, ...)
}

would work as expected. Oh dear, I think I need some syntactic sugar in my coffee! -Michael On 1/6/2011 2:16 PM, Martin Maechler wrote: Michael Friendly frien...@yorku.ca on Thu, 06 Jan 2011 09:33:25 -0500 writes: No one replied to this, so I'll try again, with a simple example. I calculate a set of log odds ratios and turn them into a data frame as follows:

library(vcdExtra)
(lor.CM <- loddsratio(CoalMiners))
log odds ratios for Wheeze and Breathlessness by Age
   25-29    30-34    35-39    40-44    45-49    50-54    55-59    60-64
3.695261 3.398339 3.140658 3.014687 2.782049 2.926395 2.440571 2.637954

(lor.CM.df <- as.data.frame(lor.CM))
  Wheeze Breathlessness   Age      LOR        ASE
1  W:NoW          B:NoB 25-29 3.695261 0.16471778
2  W:NoW          B:NoB 30-34 3.398339 0.07733658
3  W:NoW          B:NoB 35-39 3.140658 0.03341311
4  W:NoW          B:NoB 40-44 3.014687 0.02866111
5  W:NoW          B:NoB 45-49 2.782049 0.01875164
6  W:NoW          B:NoB 50-54 2.926395 0.01585918
7  W:NoW          B:NoB 55-59 2.440571 0.01452057
8  W:NoW          B:NoB 60-64 2.637954 0.02159903

Now I want to fit a linear model by WLS, LOR ~ Age, which I can do directly as

lm(LOR ~ as.numeric(Age), weights = 1/ASE, data = lor.CM.df)

Call:
lm(formula = LOR ~ as.numeric(Age), data = lor.CM.df, weights = 1/ASE)

Coefficients:
    (Intercept)  as.numeric(Age)
         3.5850          -0.1376

But I want to do the fitting in my own function; the simplest version is

my.lm <- function(formula, data, subset, weights) {
  lm(formula, data, subset, weights)
}

But there is obviously some magic about formula objects and evaluation environments, because I don't understand why this doesn't work. 
my.lm(LOR ~ as.numeric(Age), weights = 1/ASE, data = lor.CM.df)
Error in model.frame.default(formula = formula, data = data, subset = subset, :
  invalid type (closure) for variable '(weights)'

Yes, the magic has been called "standard non-standard evaluation" for a while (since August 2002, to be precise), and the http://developer.r-project.org/ web page has had two very relevant links since then, namely those mentioned in the following two lines there: # Description of the nonstandard evaluation rules in R 1.5.1 and some suggestions (updated). Also an R function and documentation for making model frames from multiple formulas. # Notes on model-fitting functions in R, and especially on how to enable all the safety features. For what you want, I think (but haven't tried) the second link, which is http://developer.r-project.org/model-fitting-functions.txt, is still very relevant. Many, many people (package authors) have had to use something like that, or just directly take the lm function as an example {{ but then probably failed on the more subtle points of how to program residuals(), predict(), etc., which you can also learn from model-fitting-functions.txt }} A second question: Age is a factor, and as.numeric(Age) gives me 1:8. What simple expression on lor.CM.df$Age would give me either the lower limits (here: seq(25, 60, by = 5)) or the midpoints of these Age intervals (here: seq(27, 62, by = 5))? 
With data(CoalMiners, package = "vcd") here are some variations:

(Astr <- dimnames(CoalMiners)[[3]])
[1] "25-29" "30-34" "35-39" "40-44" "45-49" "50-54" "55-59" "60-64"
sapply(lapply(strsplit(Astr, "-"), as.numeric), `[[`, 1)
[1] 25 30 35 40 45 50 55 60
sapply(lapply(strsplit(Astr, "-"), as.numeric), `[[`, 2)
[1] 29 34 39 44 49 54 59 64
sapply(lapply(strsplit(Astr, "-"), as.numeric), mean)
[1] 27 32 37 42 47 52 57 62

Or use the 2-row matrix and apply(*, 1) to that:

sapply(strsplit(Astr, "-"), as.numeric)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   25   30   35   40   45   50   55   60
[2,]   29   34   39   44   49   54   59   64

Regards, Martin Maechler, ETH Zurich -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Re: [R] Problem with timeSequence {timeDate} - wrong end date
Joshua Wiley jwiley.ps...@gmail.com on Thu, 6 Jan 2011 06:47:43 -0800 writes: On Thu, Jan 6, 2011 at 5:27 AM, Joshua Wiley jwiley.ps...@gmail.com wrote: timeSequence() ultimately is relying on seq.POSIXt(). If you look at My apologies, I spoke nonsense --- timeSequence() does NOT rely on seq.POSIXt(). The timeDate package has its own method defined for seq(), which is what gets dispatched. Still, the behavior is similar:

timeSequence(from = "2010-01-15", to = "2010-02-01", by = "1 month")
GMT
[1] [2010-01-15] [2010-02-15]

seq.POSIXt(from = as.POSIXct("2010-01-15"),
+          to = as.POSIXct("2010-02-01"), by = "1 month")
[1] "2010-01-15 PST" "2010-02-15 PST"

which is not surprising, because the code responsible for this behavior is similar:

## seq.timeDate ##
else if (valid == 6) {
    if (missing(to)) {
        mon <- seq.int(r1$mon, by = by, length.out = length.out)
    } else {
        to <- as.POSIXlt(to)
        mon <- seq.int(r1$mon, 12 * (to$year - r1$year) + to$mon, by)
    }
    r1$mon <- mon
    r1$isdst <- -1
    res <- as.POSIXct(r1)
}

## seq.POSIXt ##
else if (valid == 6L) {
    if (missing(to)) {
        mon <- seq.int(r1$mon, by = by, length.out = length.out)
    } else {
        to <- as.POSIXlt(to)
        mon <- seq.int(r1$mon, 12 * (to$year - r1$year) + to$mon, by)
    }
    r1$mon <- mon
    r1$isdst <- -1
    res <- as.POSIXct(r1)
}

and res is the object returned in both cases (I believe). Thank you, Joshua. Note that R 2.12.1 patched and R 2.13.0 unstable have this fixed (since yesterday) and hence will no longer overshoot. Also, the next release of the timeDate package will have it fixed. 
Martin Maechler, ETH Zurich

My system:
R version 2.12.1 (2010-12-16)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] timeDate_2130.91
[R] POSIXct issue
Hello, I have trouble getting my original datetime back; see below. I hope I am missing something. Please help.

tt <- as.POSIXct("2011-01-07 07:49:13", tz = "EST")
tt
[1] "2011-01-07 07:49:13 EST"
ttn <- as.numeric(tt)
ttn
[1] 1294404553
tt <- as.POSIXct(ttn, origin = '1970-01-01', tz = "EST")
tt
[1] "2011-01-07 12:49:13 EST"
[R] Contour plot with time on X-axis (day) and Y-axis (hours)
Hello all, I'd like to graphically represent an hourly temperature time series ( http://r.789695.n4.nabble.com/file/n3179098/data.csv data.csv , and see below for pre-processing of the data) with the R functions image + contour. To do that I wrote this script ( http://r.789695.n4.nabble.com/file/n3179098/Timetemp.r Timetemp.r ) which basically: - creates the x-axis, a vector which will contain the days, - creates the y-axis, a vector which will contain the hours of the day, - creates the z matrix, which has x columns and y rows. This works fine, except that I have some trouble with the time format: I can't set a date (day/month/year) on the x-axis of the resulting graph. Does anyone know where the problem is? Thanks! xav

Pre-processing of data.csv:

data <- na.omit(read.csv("data.csv", header = TRUE, sep = ",", dec = "."))
data <- data[, -1]
data[[2]] <- strptime(data[[2]], "%m/%d/%y %I:%M:%S %p", tz = "America/Santiago")

-- View this message in context: http://r.789695.n4.nabble.com/Contour-plot-with-time-on-X-axis-day-and-Y-axis-hours-tp3179098p3179098.html Sent from the R help mailing list archive at Nabble.com.
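No answer to this question appears in this digest, but one common approach is to draw the image with the x-axis suppressed and then add a date axis manually with axis.Date(), since image()/contour() treat x as plain numbers. A sketch under made-up data (the days/hours/z objects below are illustrative stand-ins for the poster's real matrix, not from the thread):

```r
# Sketch: calendar dates on the x-axis of image() + contour().
# 'z' is an hours-by-days temperature matrix; both it and 'days' are invented here.
days  <- seq(as.Date("2011-01-01"), by = "day", length.out = 30)
hours <- 0:23
z <- outer(hours, seq_along(days),
           function(h, d) 15 + 8 * sin((h - 6) * pi / 12) + 0.1 * d)

# image(x, y, z) wants dim(z) == c(length(x), length(y)), hence t(z)
image(as.numeric(days), hours, t(z), axes = FALSE,
      xlab = "Day", ylab = "Hour of day")
contour(as.numeric(days), hours, t(z), add = TRUE)
# x-coordinates are the numeric form of the Date vector, so axis.Date lines up
axis.Date(1, x = days, at = seq(min(days), max(days), by = "week"),
          format = "%d/%m/%y")
axis(2)
box()
```

The key point is that the numeric values used for the x-coordinates and the Date vector handed to axis.Date() must be the same numbers (days since 1970-01-01), so the labels land in the right places.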
Re: [R] vector of character with unequal width
Dear R users, thanks for your help. The recommendations were exactly what I was searching for. Below is the one I will use for my script. Thanks again, jose

nchar(xx)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

z1 <- rep("00000000", times = length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
xx <- paste(z2, xx, sep = "")
xx <- substring(xx, 6, 9)
nchar(xx)
 [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[38] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Date: Wed, 5 Jan 2011 19:50:28 +0100 From: savi...@cs.cas.cz To: r-help@r-project.org Subject: Re: [R] vector of character with unequal width On Wed, Jan 05, 2011, jose Bartolomei wrote: [...] I was thinking to create a character vector of 0's of length 9-nchar(xx), then paste it to xx.

9 - nchar(xx)
 [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[38] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 6 6 6 5 5 5 5 5 5 5 5
[75] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

Nevertheless, I have not been able to create this vector, nor do I know if this is the best option. Did you consider something like the following?

xx <- c("abc", "abcd", "abcde")
z1 <- rep("00000000", times = length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
yy <- paste(z2, xx, sep = "")
cbind(yy)
#      yy
# [1,] "000000abc"
# [2,] "00000abcd"
# [3,] "0000abcde"

Petr Savicky. 
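The pad-then-substring idiom above can also be written more directly in base R. A sketch, not from the thread (pad0 is a hypothetical helper name; sprintf's zero flag works only when the values are genuinely numeric):

```r
# Zero-padding numbers: sprintf's "0" flag does it in one call
sprintf("%09d", c(7, 123, 45678))   # widths all become 9

# Zero-padding arbitrary strings: a small helper (pad0 is made up here)
pad0 <- function(x, width) {
  zeros <- vapply(pmax(width - nchar(x), 0),
                  function(k) paste(rep("0", k), collapse = ""),
                  character(1))
  paste0(zeros, x)
}

xx <- c("abc", "abcd", "abcde")
pad0(xx, 9)
substring(pad0(xx, 9), 6, 9)   # keep only the last four characters, as above
```

This avoids building the separate z1/z2 intermediates and works for any target width.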
[R] how to run linear regression models at once
hey, folks, I have two very simple questions. I am not quite sure if I can do this using R. So, I am analyzing a large data frame with thousands of variables. For example: dependent variables: d1, d2, d3 (i.e., there are three dependent variables); independent variables: s1, s2, s3, ..., s1000 (i.e., there are 1000 independent variables). Now I want to run simple linear regression analyses of the dependent variables on the independent variables. A laborious way to do this is running 1000 linear regression models for each dependent variable separately. This would take a lot of time. My questions are: 1) is there an easy way to run all 1000 linear regression analyses for each dependent variable at once? 2) after running all 1000 linear regression analyses at once, how can I write the 3000 regression weights (1000 regression weights for each dependent variable) and their significance levels to a file (e.g., csv, excel, etc.)? Many thanks in advance!! -- View this message in context: http://r.789695.n4.nabble.com/how-to-run-linear-regression-models-at-once-tp3179256p3179256.html Sent from the R help mailing list archive at Nabble.com.
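No reply to this question appears in this digest. A common pattern for both questions is to loop over predictors with lapply, pull the slope row out of each summary, and write the assembled table with write.csv. A sketch under invented stand-in data (dep/indep/results and the file name are all illustrative, not from the thread):

```r
set.seed(1)
# Toy stand-ins for the real data: 3 responses, 10 predictors, 50 rows
dep   <- data.frame(d1 = rnorm(50), d2 = rnorm(50), d3 = rnorm(50))
indep <- as.data.frame(matrix(rnorm(50 * 10), 50,
                              dimnames = list(NULL, paste0("s", 1:10))))

# One simple regression per (dependent, independent) pair;
# keep the slope estimate and its p-value from each fit
results <- do.call(rbind, lapply(names(dep), function(d) {
  do.call(rbind, lapply(names(indep), function(s) {
    fit <- lm(dep[[d]] ~ indep[[s]])
    co  <- summary(fit)$coefficients[2, ]   # row 2 = the slope
    data.frame(dependent = d, independent = s,
               estimate = co[["Estimate"]], p.value = co[["Pr(>|t|)"]])
  }))
}))

write.csv(results, "slopes.csv", row.names = FALSE)
```

With 1000 predictors this is the same code with a wider indep; it runs in seconds because each fit is a one-predictor lm.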
Re: [R] POSIXct issue
On Fri, 7 Jan 2011, Rustamali Manesiya wrote: Hello, I have trouble getting my original datetime back; see below. I hope I am missing something. Please help.

The numeric form (which you should never use) has an origin at midnight in UTC, not EST. Note too that 'EST' is not very plausible for your timezone (or are you in Canada?): see ?Sys.timezone.

tt <- as.POSIXct(ttn, origin = '1970-01-01', tz = "UTC")
attr(tt, "tzone") <- "EST"
tt
[1] "2011-01-07 07:49:13 EST"

tt <- as.POSIXct("2011-01-07 07:49:13", tz = "EST")
tt
[1] "2011-01-07 07:49:13 EST"
ttn <- as.numeric(tt)
ttn
[1] 1294404553
tt <- as.POSIXct(ttn, origin = '1970-01-01', tz = "EST")
tt
[1] "2011-01-07 12:49:13 EST"

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Problem with timeSequence {timeDate} - wrong end date
Thank you, Martin (and all of R core), for all your work making such wonderful software available. Not only is it practical; using R and going through the source code and the documentation have been a huge learning experience for me. I am very grateful to you and everyone else for being so generous with your time. Josh On Fri, Jan 7, 2011 at 6:59 AM, Martin Maechler maech...@stat.math.ethz.ch wrote: Thank you, Joshua. Note that R 2.12.1 patched and R 2.13.0 unstable have this fixed (since yesterday) and hence will no longer overshoot. Also, the next release of the timeDate package will have it fixed. Martin Maechler, ETH Zurich
[R] anova vs aov commands for anova with repeated measures
Dear all, I need to understand something about the behaviour of the two functions aov and anova in the following case involving an ANOVA with repeated measures. If I use the following command I don't get any problem:

aov1 <- aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
summary(aov1)

Instead, if I try to fit the same model for the regression I get an error:

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards. I want to use the lm() form because I want to analyze the residuals in order to check normality, and see if the ANOVA assumption of normality still holds. Could you please help me understand how to do this? Thanks in advance. Best regards
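No reply appears in this digest. The error arises because Error() is an aov()-specific construct, not a function, so lm() cannot evaluate it. One way to get inspectable residuals for a repeated-measures design — a different technique from the aov() call above, sketched here with invented data (scrd and all variable names are illustrative stand-ins) — is to refit with nlme::lme, whose residuals() method works directly:

```r
library(nlme)

# Toy repeated-measures data standing in for the poster's 'scrd'
set.seed(42)
scrd <- expand.grid(subject   = factor(1:10),
                    stimulus  = factor(c("a", "b")),
                    condition = factor(c("x", "y")))
scrd$response <- rnorm(nrow(scrd))

# Same fixed effects as the aov() call, subject as a random intercept
fit1 <- lme(response ~ stimulus * condition, random = ~ 1 | subject, data = scrd)
res  <- residuals(fit1, type = "normalized")

# Normality checks on the within-subject residuals
qqnorm(res); qqline(res)
shapiro.test(res)
```

Note this random-effects structure is simpler than Error(subject/(stimulus*condition)); nested random terms can be added to the random = argument if the design warrants it.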
Re: [R] Problems with glht function for lme object
anord andreas.nord at zooekol.lu.se writes: Dear all, I'm trying to make multiple comparisons for an lme object. The data is from an experiment on parental work load in birds, in which adults at different sites were induced to work at one of three levels ('treat'; H, M, L). The response is 'feedings', which is a quantitative measure of nest provisioning per parent per chick per hour. Site is included as a random effect (one male/female pair per site). (For more complicated mixed model questions you might want to try the r-sig-mixed-models list instead.) It works for me:

feed.df <- read.table(textConnection("
feedings sex site treat year date
1.8877888 M 838 H 2009 81
1.9102787 M 247 H 2009 81
1.4647229 M 674 H 2010 121
1.4160590 M 7009 M 2009 144
1.3106749 M 863 M 2010 196
1.2718121 M 61 M 2009 225
1.2799263 M 729 L 2009 256
1.5829564 M 629 L 2009 256
1.4847251 M 299 L 2010 324
1.2463151 M 569 L 2010 324
2.1694169 F 838 H 2009 81
1.5966899 F 247 H 2009 81
2.4136983 F 674 H 2010 121
1.7784873 F 7009 M 2009 144
1.6681317 F 863 M 2010 196
2.3691275 F 61 M 2009 225
2.0672192 F 729 L 2009 256
1.6389902 F 629 L 2009 256
0.9307536 F 299 L 2010 324
1.6786767 F 569 L 2010 324"), header = TRUE)

library(nlme)
m4.feed <- lme(feedings ~ treat + year + date, random = ~1|site, data = feed.df)
library(multcomp)
gg <- glht(m4.feed, linfct = mcp(treat = "Tukey"))
plot(gg)
summary(gg)

It works for me (although it doesn't look like there's anything going on there ...)

Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lme.formula(fixed = feedings ~ treat + year + date, data = feed.df, random = ~1 | site)

Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
L - H == 0  -0.6452     0.7667  -0.842    0.592
M - H == 0  -0.3996     0.4276  -0.934    0.529
M - L == 0   0.2456     0.4229   0.581    0.771
(Adjusted p values reported -- single-step method)

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] multcomp_1.2-4 survival_2.36-3 mvtnorm_0.9-95 nlme_3.1-97

loaded via a namespace (and not attached):
[1] grid_2.12.1 lattice_0.19-18
Re: [R] Different LLRs on multinomial logit models in R and SPSS
On Jan 7, 2011, at 8:26 AM, sovo0...@gmail.com wrote: On Thu, 6 Jan 2011, David Winsemius wrote: On Jan 6, 2011, at 11:23 AM, Sören Vogel wrote: Thanks for your replies. I am no mathematician or statistician by far; however, it appears to me that the actual value of either of the two LLs is indeed important when it comes to the calculation of pseudo-R-squareds. If Rnagel divides by (some transformation of) the actual value of llnull, then any calculation of Rnagel should differ. How come? Or is my function wrong? And if my function is right, how can I calculate an R-squared independent of the software used? You have two models in that function, the null one with . ~ 1 and the original one, and you are getting a ratio on the likelihood scale (which is a difference on the log-likelihood or deviance scale). If this is the case, calculating 'fit' indices for those models must end up in different fit indices depending on software:

n <- 143
ll1 <- 135.02
ll2 <- 129.8
# Rcs
(Rcs <- 1 - exp( (ll2 - ll1) / n ))
# Rnagel
Rcs / (1 - exp(-ll1/n))

ll3 <- 204.2904
ll4 <- 199.0659
# Rcs
(Rcs <- 1 - exp( (ll4 - ll3) / n ))
# Rnagel
Rcs / (1 - exp(-ll3/n))

The Rcs values are equal; however, the Rnagel values are not. Of course, this is no question, but I am rather confused. When publishing results I am required to use fit indices, and editors would complain that they differ. It is well known that editors are sometimes confused about statistics, and if an editor is insistent on publishing indices that are in fact arbitrary, then that is a problem. I would hope that the editor would be open to education. (And often there is a statistical associate editor who will be more likely to have a solid grounding and to whom one can appeal in situations of initial obstinacy.) 
Perhaps you will be doing the world of science a favor by suggesting that said editor first review a first-year calculus text regarding the fact that indefinite integrals are only determined up to an arbitrary constant, and that one can only use the results in a practical setting by specifying the limits of integration. So it is with likelihoods. They are only meaningful when comparing two nested models. Sometimes the software obscures this fact, but it remains a statistical _fact_.

Whether your code is correct (and whether the Nagelkerke R^2 remains invariant with respect to such transformations) I cannot say. (I suspect that it would be, but I have never liked the Nagelkerke R^2 as a measure, and didn't really like R^2 as a measure in linear regression for that matter, either.) I focus on fitting functions to trends, examining predictions, and assessing confidence intervals for parameter estimates. The notion that model fit is well-summarized in a single number blinds one to other critical issues, such as the linearity and monotonicity assumptions implicit in much of regression (mal-)practice.

So, if someone who is more enamored of (or even more knowledgeably scornful of) the Nagelkerke R^2 measure wants to take over here, I will read what they say with interest and appreciation.

Sören

David Winsemius, MD
West Hartford, CT
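The software dependence discussed above can be demonstrated numerically. Below is a minimal sketch, assuming (as the thread's numbers suggest) that the reported log-likelihood values are deviances, i.e. -2*logLik. Adding the same constant to both deviances leaves Cox-Snell unchanged, because it only uses their difference, while Nagelkerke changes, because its denominator uses the null deviance alone:

```r
# Pseudo-R^2 from two deviances (a sketch; d_null and d_model are -2*logLik)
pseudoR2 <- function(d_null, d_model, n) {
  Rcs <- 1 - exp((d_model - d_null) / n)            # Cox & Snell
  c(Rcs = Rcs, Rnagel = Rcs / (1 - exp(-d_null / n)))
}

n <- 143
pseudoR2(135.02, 129.8, n)                  # one software's deviances
pseudoR2(135.02 + 69.27, 129.8 + 69.27, n)  # same fit, both shifted by a constant
# 'Rcs' agrees between the two calls; 'Rnagel' does not.
```

This reproduces the behaviour Sören observed: the arbitrary additive constant cancels in the difference but not in the null-deviance term.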
Re: [R] Odp: Currency return calculations
My mistake sir. I was literally engrossed in my stupid logic, and while doing so, overlooked the simple and very effective solution you had offered. Sorry once again sir, and I will certainly try to be very careful in future. Thanks again and have a great weekend sir. Regards, Amelia

--- On Fri, 7/1/11, Petr PIKAL petr.pi...@precheza.cz wrote: From: Petr PIKAL petr.pi...@precheza.cz Subject: Odp: Currency return calculations To: Amelia Vettori amelia_vett...@yahoo.co.nz Cc: r-help@r-project.org Received: Friday, 7 January, 2011, 12:59 PM

Hi

What is wrong with my suggestion then?

> rate1
      USD    GBP  EURO   CHF    AUD
1  112.05 171.52 42.71 41.50 109.55
2  112.90 168.27 42.68 41.47 102.52
3  110.85 169.03 41.86 42.84 114.91
4  109.63 169.64 44.71 43.44 122.48
5  108.08 169.29 44.14 43.69 122.12
6  111.23 169.47 44.58 42.30 123.96
7  112.49 170.90 41.07 42.05 100.36
8  108.87 168.69 42.23 41.23 110.19
9  109.33 170.90 44.55 42.76 121.58
10 111.88 169.96 41.12 43.79 103.46

> portfolio <- c("USD", "USD", "CHF", "AUD", "USD")
> log(rate1[-1, portfolio] / rate1[-nrow(rate1), portfolio])
            USD        USD.1          CHF          AUD        USD.2
2   0.007557271  0.007557271 -0.000723153 -0.066323165  0.007557271
3  -0.018324535 -0.018324535  0.032501971  0.114091312 -0.018324535
4  -0.011066876 -0.011066876  0.013908430  0.063798538 -0.011066876
5  -0.014239366 -0.014239366  0.005738567 -0.002943583 -0.014239366
6   0.028728436  0.028728436 -0.032332157  0.014954765  0.028728436
7   0.011264199  0.011264199 -0.005927700 -0.211195211  0.011264199
8  -0.032709819 -0.032709819 -0.019693240  0.093442427 -0.032709819
9   0.004216322  0.004216322  0.036436939  0.098366334  0.004216322
10  0.023056037  0.023056037  0.023802395 -0.161387418  0.023056037

As I said, instead of fiddling with several loop/if/function/variable attempts, it seems to me better to use powerful R indexing and the whole-object approach where possible.

Regards Petr

Amelia Vettori amelia_vett...@yahoo.co.nz napsal dne 07.01.2011 13:46:53:

Dear sir, I am extremely sorry for messing up the logic asking for help w.r.t.
my earlier mails. I have tried to explain below what I am looking for.

I have a database (say, currency_rates) storing datewise currency exchange rates with some base currency XYZ.

currency_rates <- data.frame(
  date = c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010", "12/27/2010",
           "12/24/2010", "12/23/2010", "12/22/2010", "12/21/2010", "12/20/2010"),
  USD  = c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49, 108.87, 109.33, 111.88),
  GBP  = c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9, 168.69, 170.9, 169.96),
  EURO = c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23, 44.55, 41.12),
  CHF  = c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23, 42.76, 43.79),
  AUD  = c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36, 110.19, 121.58, 103.46))

I have a portfolio consisting of some of these currencies. At this moment, suppose my portfolio has the following currency transactions, i.e. the following is my current portfolio, and it has 3 USD transactions, 2 EURO transactions and a CHF transaction.

portfolio_currency_names <- c("USD", "USD", "EURO", "CHF", "EURO", "USD")

# My objective is: AS PER THE PORTFOLIO, I need to generate a data.frame giving the respective currency returns. Thus, I need to have an output like

    USD     USD    EURO     CHF    EURO     USD
-0.0076 -0.0076  0.0007  0.0007  0.0007 -0.0076
 0.0183  0.0183  0.0194 -0.0325  0.0194  0.0183
 0.0111  0.0111 -0.0659 -0.0139 -0.0659  0.0111
 0.0142  0.0142  0.0128 -0.0057  0.0128  0.0142
-0.0287 -0.0287 -0.0099  0.0323 -0.0099 -0.0287
-0.0113 -0.0113  0.0820  0.0059  0.0820 -0.0113
 0.0327  0.0327 -0.0279  0.0197 -0.0279  0.0327
-0.0042 -0.0042 -0.0535 -0.0364 -0.0535 -0.0042
-0.0231 -0.0231  0.0801 -0.0238  0.0801 -0.0231

Thus, my requirement is to have the data.frame as per the composition of my portfolio. Thus, if there are only 2 transactions, i.e. if my portfolio contains say only CHF and AUD, I need the return calculations done only for CHF and AUD.
    CHF     AUD
 0.0007  0.0663
-0.0325 -0.1141
-0.0139 -0.0638
-0.0057  0.0029
 0.0323 -0.0150
 0.0059  0.2112
 0.0197 -0.0934
-0.0364 -0.0984
-0.0238  0.1614

I once again apologize for not asking my requirement properly, thereby causing
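For completeness, Petr's whole-object indexing can also be written with diff(), which differences each column of a matrix; this is a sketch (assuming rate1 and portfolio are as defined in his reply, with only numeric columns selected):

```r
# Column-wise log returns; equivalent to
# log(rate1[-1, portfolio] / rate1[-nrow(rate1), portfolio])
returns <- diff(log(as.matrix(rate1[, portfolio])))
```

Either form avoids any explicit loop over transactions, and selecting columns by the character vector `portfolio` automatically adapts the output to the portfolio's composition.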
[R] Summing over specific columns in a matrix
Hi, I would like to sum some specific columns in my matrix. For example, my matrix looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA   NA
[2,]    2    1   NA    1   NA
[3,]    3    2    1    2    1
[4,]    4    3    2    3    2
[5,]   NA   NA   NA    4    3
[6,]   NA   NA   NA    5   NA

I would like to find the sum of the first two columns, the second two columns and the last column: i.e. I am left with a vector of c(16, 18, 6). I know about colSums and sum overall - I just wondered if this type of grouping can be specified somehow in a vector such as c(2,2,1)? I don't really want to have to use a loop for this. Many thanks, Emma

-- View this message in context: http://r.789695.n4.nabble.com/Summing-over-specific-columns-in-a-matrix-tp3179400p3179400.html Sent from the R help mailing list archive at Nabble.com.
[R] how to calculate this natural logarithm
Hello, I want to calculate the natural logarithm of a sum of combinations, as follows (R code):

com_sum <- choose(200,482)*choose(100,118) + choose(200,483)*choose(100,117) + ... +
           choose(200,i)*choose(100,600-i) + ... + choose(200,600)*choose(100,0)  # calculate the sum
result <- log(com_sum)  # calculate the log of the sum

But every element of com_sum is Inf, so I can't get the result. Thank you in advance.

Yours sincerely, ZhaoXing
Department of Health Statistics, West China School of Public Health, Sichuan University
No.17 Section 3, South Renmin Road, Chengdu, Sichuan 610041, P.R.China
Re: [R] Problems with glht function for lme object
On Jan 7, 2011, at 8:28 AM, anord wrote:

Dear all, I'm trying to make multiple comparisons for an lme object. The data is for an experiment on parental work load in birds, in which adults at different sites were induced to work at one of three levels ('treat'; H, M, L). The response is 'feedings', which is a quantitative measure of nest provisioning per parent per chick per hour. Site is included as a random effect (one male/female pair per site). My final model takes the following form:

feedings ~ treat + year + data^2, random = ~1|site, data = feed.df

I am guessing that problems could easily arise as a result of your variable name containing ^. That is an invalid name in an un-back-quoted state. You have clearly not given us a copy of a console session, since your variable is date^2 below and data^2 above. -- David.

For this model, I would like to do multiple comparisons on 'treat', using the multcomp package:

summary(glht(m4.feed, linfct = mcp(treat = "Tukey")))

However, this does not work, and I get the error message below.

Error in if (is.null(pkg) | pkg == "nlme") terms(formula(x)) else slot(x, : argument is of length zero
Error in factor_contrasts(model) : no 'model.matrix' method for 'model' found!

I suspect this might have quite a straightforward solution, but I'm stuck at this point. Any help would be most appreciated. Sample data below.
Kind regards, Andreas Nord, Sweden

==
 feedings sex site treat year date^2
1.8877888   M  838     H 2009     81
1.9102787   M  247     H 2009     81
1.4647229   M  674     H 2010    121
1.4160590   M 7009     M 2009    144
1.3106749   M  863     M 2010    196
1.2718121   M   61     M 2009    225
1.2799263   M  729     L 2009    256
1.5829564   M  629     L 2009    256
1.4847251   M  299     L 2010    324
1.2463151   M  569     L 2010    324
2.1694169   F  838     H 2009     81
1.5966899   F  247     H 2009     81
2.4136983   F  674     H 2010    121
1.7784873   F 7009     M 2009    144
1.6681317   F  863     M 2010    196
2.3691275   F   61     M 2009    225
2.0672192   F  729     L 2009    256
1.6389902   F  629     L 2009    256
0.9307536   F  299     L 2010    324
1.6786767   F  569     L 2010    324
==

-- View this message in context: http://r.789695.n4.nabble.com/Problems-with-glht-function-for-lme-object-tp3179128p3179128.html Sent from the R help mailing list archive at Nabble.com.

David Winsemius, MD
West Hartford, CT
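Following David's diagnosis, one repair would be to give the squared-date column a syntactically valid name before fitting, so that glht() can rebuild the model matrix. The variable and data names below come from the post; the fix itself is an assumption, not something confirmed by the thread:

```r
library(nlme)
library(multcomp)

# rename the invalid column name `date^2` to a legal one
names(feed.df)[names(feed.df) == "date^2"] <- "date2"

m4.feed <- lme(feedings ~ treat + year + date2,
               random = ~ 1 | site, data = feed.df)
summary(glht(m4.feed, linfct = mcp(treat = "Tukey")))
```

Alternatively, keeping a plain `date` column and writing `I(date^2)` in the formula avoids storing a derived column under an invalid name at all.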
Re: [R] Summing over specific columns in a matrix
Try this:

rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE))

On Fri, Jan 7, 2011 at 2:29 PM, emj83 stp08...@shef.ac.uk wrote:

Hi, I would like to sum some specific columns in my matrix - for example, my matrix looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA   NA
[2,]    2    1   NA    1   NA
[3,]    3    2    1    2    1
[4,]    4    3    2    3    2
[5,]   NA   NA   NA    4    3
[6,]   NA   NA   NA    5   NA

I would like to find the sum of the first two columns, the second two columns and the last column: i.e. I am left with a vector of c(16, 18, 6). I know about colSums and sum overall - I just wondered if this type of grouping can be included somehow in a vector such as c(2,2,1)? I don't really want to have to use a loop for this. Many thanks Emma

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]
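Henrique's one-liner can be checked against the example matrix; this sketch also builds the group vector directly from the widths c(2, 2, 1), so it generalizes to other groupings:

```r
# the example matrix from the question
m <- rbind(c( 1, NA, NA, NA, NA),
           c( 2,  1, NA,  1, NA),
           c( 3,  2,  1,  2,  1),
           c( 4,  3,  2,  3,  2),
           c(NA, NA, NA,  4,  3),
           c(NA, NA, NA,  5, NA))

widths <- c(2, 2, 1)
grp <- rep(seq_along(widths), times = widths)   # 1 1 2 2 3

# rowsum() sums rows of t(m) (i.e. columns of m) within each group,
# then rowSums() collapses each group block to a single total
rowSums(rowsum(t(m), grp, na.rm = TRUE))        # 16 18 6
```

The same result can be had with tapply over the column sums, e.g. tapply(colSums(m, na.rm = TRUE), grp, sum), but the rowsum() form keeps everything in one vectorized call.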
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I have just recently written about this issue (i.e. open learning and data sharing) in a manuscript that is currently under review in a clinical journal. I have argued that data hoarding is unethical. Participants in research studies give their time, effort, saliva and blood in the altruistic hope that their sacrifice will benefit humankind. If they were to realize that the real (ulterior) motive of the study investigators is only to advance their careers, they would really think hard about participating in the studies. The study participants should only consent to participate if they can get a signed assurance from the investigators that the investigators will make their data available for scrutiny and for public use (under some reasonable conditions that are fair to the study investigators). As Vickers (Trials 2006) says, whose data is it anyway? I believe that we can achieve great progress in clinical research if and only if we make a concerted effort towards open learning. Stakeholders (i.e. patients, clinicians, policy-makers) should demand that all the data that is potentially relevant to addressing a critical clinical question should be made available in an open learning environment. Unless, we can achieve this we cannot solve the problems of publication bias and inefficient and sub-optimal use of data. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Friday, January 07, 2011 8:26 AM To: Mike Marchywka Cc: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. 
Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on scientific misconduct that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published; (2) private funding provides its own biases. Spencer

# Savage and Vickers (2010) Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals, Scientific Data Sharing, added Apr. 26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/).
On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another from medical to financial modelling. My solution here is simply to publish more raw data in a computer readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own analysis. I think there was also a push to create a data base for failed drug trials that may contain data of some value later. The value of R with easily available data for a large cross section of users could be to moderate problems like the one cited here. I almost slammed a poster here earlier
Re: [R] How to export/save an mrpp object?
On Fri, 2011-01-07 at 14:04 +0200, Nikos Alexandris wrote:

Nikos: I finally ran mrpp tests. I think all is fine but one very important issue: I have no idea how to export/save an mrpp object. Tried anything I know and searched the archives but found nothing.

David W: And what happened when you tried what seems like the obvious:

save(mrpp_obj, file=)
# rm(list=ls()) # Only uncomment if you are ready for your workspace to clear
load("mrpp_store.Rdata")

Right, clearing did the trick. Any ideas? Is really copy-pasting the mrpp results the only way?

Many of us have no idea what such an object is, since you have not described the packages and functions used to create it. If you want an ASCII version then dput or dump are also available.

Multiresponse Permutation Procedures (MRPP) is implemented in the vegan package. The function mrpp() returns (an object of class mrpp) something like:

--%---
# check class
> class(samples_bitemporal_modis.0001.mrpp)
[1] "mrpp"

# check structure
> str(samples_bitemporal_modis.0001.mrpp)
List of 12
 $ call        : language mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])
 $ delta       : num 0.126
 $ E.delta     : num 0.202
 $ CS          : logi NA
 $ n           : Named int [1:5] 335 307 183 188 27
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ classdelta  : Named num [1:5] 0.1255 0.1045 0.1837 0.0981 0.1743
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ Pvalue      : num 0.001
 $ A           : num 0.378
 $ distance    : chr "euclidean"
 $ weight.type : num 1
 $ boot.deltas : num [1:999] 0.202 0.202 0.202 0.203 0.202 ...
 $ permutations: num 999
 - attr(*, "class")= chr "mrpp"
--%---

Now I've tried the following:

--%---
# 1. save(d) it
save(samples_bitemporal_modis.0001.mrpp, file = "exported.mrpp.R")

You would normally use the .R extension for R code, not saved R objects. Instead, use .rda or .RData.

# 2. loade(d) it in a new object...
loadedmrpp <- load("exported.mrpp.R")

If you don't assign the output of load(), then the samples_bitemporal_modis.0001.mrpp object will end up in the global workspace. If you assign the output, you store the names of the objects loaded, not the objects themselves, which are still loaded into the global workspace by default. This is stated in ?load:

Value:
     A character vector of the names of objects created, invisibly.

# 3. (tried) to check it...
str(exported.mrpp.R)
 chr "samples_bitemporal_modis.0001.mrpp"

Hence the above, but I don't believe this. Surely you mean `str(loadedmrpp)`, but as I said this will only give the character string of the names of the objects loaded...

# it did not cross my mind immediately to...
get(loadedmrpp)

...hence the above works, but what is wrong with simply typing `samples_bitemporal_modis.0001.mrpp` (apart from the obvious - you must enjoy typing to give objects such long names...). I think you should read ?load to rethink what these functions do and how they do it.

HTH
G

Call:
mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])

Dissimilarity index: euclidean
Weights for groups: n

Class means and counts:
      Urban  Vegetation Bare ground Burned Water
delta 0.1255 0.1045     0.1837      0.0981 0.1743
n     335    307        183         188    27

Chance corrected within-group agreement A: 0.3778
Based on observed delta 0.1258 and expected delta 0.2022
Significance of delta: 0.001
Based on 999 permutations

# ...or to work on a clean workspace!
--%---

Thank you David. Cheers, Nikos

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr.
Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
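An alternative for this save/load confusion is to serialize a single object rather than a workspace image; reading it back returns the object as a value, so no name lookup is needed. A sketch (note: saveRDS()/readRDS() were exported to the user level only in later R versions; in older versions they exist as the internal .saveRDS()/.readRDS()):

```r
# save one object; readRDS() returns it, so you choose the name on reload
saveRDS(samples_bitemporal_modis.0001.mrpp, file = "mrpp_result.rds")
mrpp_copy <- readRDS("mrpp_result.rds")
str(mrpp_copy)   # same structure as the original mrpp object
```

This sidesteps the issue Gavin describes: load() restores objects under their original names and returns only those names, whereas readRDS() hands the object back directly.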
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Bert, consider the short rebuttal offered by George Musser in Scientific American: http://www.scientificamerican.com/blog/post.cfm?id=in-praise-of-scientific-error-2010-12-20 Perhaps a more realistic assessment of the (acknowledged) problem. Regards, Alan Kelly, Trinity College Dublin

On 7 Jan 2011, at 11:00, r-help-requ...@r-project.org wrote:

Message: 54 Date: Thu, 6 Jan 2011 10:56:34 -0800 From: Bert Gunter gunter.ber...@gene.com To: r-help@r-project.org Subject: [R] Wyy off topic...Statistical methods, pub bias, scientific validity

Folks: The following has NOTHING (obvious) to do with R. But I believe that all on this list would find it relevant and, I hope, informative. It is LONG. I apologize in advance to those who feel I have wasted their time. http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer Best regards to all, Bert -- Bert Gunter Genentech Nonclinical Biostatistics

[[alternative HTML version deleted]]
Re: [R] Match numeric vector against rows in a matrix?
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Petr Savicky Sent: Friday, January 07, 2011 2:11 AM To: r-help@r-project.org Subject: Re: [R] Match numeric vector against rows in a matrix?

On Wed, Jan 05, 2011 at 07:16:47PM +0000, Kevin Ummel wrote:

Two posts in one day is not a good day... and this question seems like it should have an obvious answer: I have a matrix where rows are unique combinations of 1's and 0's:

combs = as.matrix(expand.grid(c(0,1), c(0,1)))
> combs
     Var1 Var2
[1,]    0    0
[2,]    1    0
[3,]    0    1
[4,]    1    1

I want a single function that will give the row index containing an exact match with vector x:

x = c(0,1)

The solution needs to be applied many times, so I need something quick -- I was hoping a base function would do it, but I'm drawing a blank.

If the matrix can have a different number of columns, then also the following can be used:

combs <- as.matrix(expand.grid(c(0,1), c(0,1), c(0,1)))
x <- c(0,1,1)
which(rowSums(combs != rep(x, each=nrow(combs))) == 0)
# [1] 7

If the combs matrix always has that form, why not dispense with it and treat x as the binary expansion of the row number (+1)?

> 1 + sum( 2^(seq_along(x)-1) * x )
[1] 7

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Petr Savicky.
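When combs is not the full binary grid (so Bill's binary-expansion trick does not apply), one general-purpose sketch is to collapse each row to a key string and use match(); an unlikely separator character guards against ambiguity when values can have more than one digit:

```r
# collapse rows to key strings, then look the query up with match()
row_key   <- function(m) apply(m, 1, paste, collapse = "\r")
which_row <- function(x, combs) match(paste(x, collapse = "\r"), row_key(combs))

combs <- as.matrix(expand.grid(c(0, 1), c(0, 1), c(0, 1)))
which_row(c(0, 1, 1), combs)   # 7
```

For repeated queries against the same matrix, computing row_key(combs) once and reusing it keeps the per-query cost to a single match() on a character vector.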
Re: [R] how to calculate this natural logarithm
On 07-Jan-11 16:20:59, zhaoxing731 wrote:

Hello, I want to calculate the natural logarithm of a sum of combinations as follows (R code):

com_sum <- choose(200,482)*choose(100,118) +
           choose(200,483)*choose(100,117) + ... +
           choose(200,i)*choose(100,600-i) + ... +
           choose(200,600)*choose(100,0)  # calculate the sum
result <- log(com_sum)  # calculate the log of the sum

But every element of com_sum is Inf, so I can't get the result. Thank you in advance. Yours sincerely, ZhaoXing

You may find you can usefully adapt the technique I have described in: http://www.zen89632.zen.co.uk/R/FisherTest/extreme_fisher.pdf A somewhat different but closely related problem.

Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-11 Time: 17:42:48 -- XFMail --
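The generic device for this kind of problem is "log-sum-exp": compute every term on the log scale with lchoose(), subtract the maximum before exponentiating, and add it back after summing the now well-scaled terms. A sketch, demonstrated on a case with a known closed form (Vandermonde's identity) rather than the original poster's indices, since choose(200, i) is zero for i > 200 as written:

```r
# log( sum_i choose(200, i) * choose(100, 100 - i) ) without overflow
i <- 0:100
logterms <- lchoose(200, i) + lchoose(100, 100 - i)
m <- max(logterms)
logsum <- m + log(sum(exp(logterms - m)))

# Vandermonde's identity: the sum equals choose(300, 100),
# so logsum should match lchoose(300, 100)
all.equal(logsum, lchoose(300, 100))
```

Because every exp() argument is at most zero, no term overflows, and the largest term contributes exp(0) = 1, so precision is lost only in the comparatively tiny terms.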
Re: [R] Accessing data via url
This seems to get me something (the binary format), but what do I do with it? Thanks to everyone for the help. Results:

Original dd file on my hard drive:

mydd <- structure(list(AUA9363 = structure(c(1L, 1L, 2L, 3L, 3L, 4L, 4L,
4L, 6L, 5L, 5L), .Label = c("AUA9733", "AUA9734", "JAF104", "KLM686",
"KLM744", "KLM783"), class = "factor"), OE.LNQ = structure(c(5L, 5L,
5L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), .Label = c("OO-TUC", "PH-BFF",
"PH-BFH", "PH-BQD", "OE-LVK"), class = "factor"), X51.30 = c(50.82,
50.82, 50.74, 50.74, 50.75, 50.73, 50.84, 50.86, 50.94, 50.66, 50.75),
X.3.01 = c(-3.12, -3.14, -3.17, -3.18, -3.05, -3.96, -3.37, -3.28, -4,
-3.92, -3.25), X3575 = c(3775L, 3625L, 5575L, 38000L, 38000L, 38975L,
39000L, 39000L, 39000L, 37000L, 36300L), X09.21.44 = structure(c(4L,
5L, 6L, 2L, 3L, 8L, 10L, 11L, 1L, 7L, 9L), .Label = c("05:44:12",
"07:50:08", "07:50:44", "08:02:34", "08:02:47", "08:57:12", "13:22:37",
"13:23:11", "13:25:42", "13:25:51", "13:26:15"), class = "factor")),
.Names = c("AUA9363", "OE.LNQ", "X51.30", "X.3.01", "X3575",
"X09.21.44"), class = "data.frame", row.names = c(NA, -11L))

Downloaded result:

> x
  [1] 3c 48 54 4d 4c 3e 0a 3c 48 45 41 44 3e 0a 3c 54 49 54 4c 45 3e 4d 6f 76 65 64 20 54 65 6d 70 6f 72
 [34] 61 72 69 6c 79 3c 2f 54 49 54 4c 45 3e 0a 3c 2f 48 45 41 44 3e 0a 3c 42 4f 44 59 20 42 47 43 4f 4c
 [67] 4f 52 3d 22 23 46 46 46 46 46 46 22 20 54 45 58 54 3d 22 23 30 30 30 30 30 30 22 3e 0a 3c 48 31 3e
[100] 4d 6f 76 65 64 20 54 65 6d 70 6f 72 61 72 69 6c 79 3c 2f 48 31 3e 0a 54 68 65 20 64 6f 63 75 6d 65
[133] 6e 74 20 68 61 73 20 6d 6f 76 65 64 20 3c 41 20 48 52 45 46 3d 22 68 74 74 70 73 3a 2f 2f 73 69 74
[166] 65 73 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 2f 73 69 74 65 2f 6a 72 6b 72 69 64 65 61 75 2f 68 6f 6d 65
[199] 2f 67 65 6e 65 72 61 6c 2d 73 74 6f 72 65 73 2f 64 75 70 6c 69 63 61 74 65 73 2e 63 73 76 3f 61 74
[232] 74 72 65 64 69 72 65 63 74 73 3d 30 22 3e 68 65 72 65 3c 2f 41 3e 2e 0a 3c 2f 42 4f 44 59 3e 0a 3c
[265] 2f 48 54 4d 4c 3e 0a

--- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote:
From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: Dieter Menne dieter.me...@menne-biomed.de Cc: r-help@r-project.org Received: Friday, January 7, 2011, 7:07 AM

With the ssl.verifypeer = FALSE argument it works:

x = getBinaryURL(dd, ssl.verifypeer = FALSE)

On Fri, Jan 7, 2011 at 6:24 AM, Dieter Menne dieter.me...@menne-biomed.de wrote:

John Kane-2 wrote:

# Can anyone suggest why this works
datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
person.data <- read.table(datafilename, header=TRUE)
# but this does not?
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
treedata <- read.table(dd, header=TRUE)
===
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme

Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter

library(RCurl)
sessionInfo()
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
x = getBinaryURL(dd)

-----

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] RCurl_1.5-0.1 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.12.1

> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
> x = getBinaryURL(dd)
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  SSL certificate problem, verify that the CA cert is OK. Details:
  error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

-- View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3178773.html Sent from the R help mailing list archive at Nabble.com.
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]
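The bytes John received decode to an HTML "Moved Temporarily" page, i.e. a redirect that was not followed. A sketch of one way forward (untested against the site; the option names follow RCurl/libcurl conventions and should be treated as an assumption): fetch the resource as text while following redirects, then parse it with read.table() via a text connection.

```r
library(RCurl)

dd  <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
txt <- getURL(dd, ssl.verifypeer = FALSE, followlocation = TRUE)
treedata <- read.table(textConnection(txt), header = TRUE)

# For bytes already in hand, rawToChar() decodes a raw vector like the
# one shown above, which makes the redirect page human-readable:
# rawToChar(x)
```

Inspecting rawToChar(x) here would have revealed the "document has moved" message and the redirect target immediately.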
[R] Merging zoo objects
Hi, I have n zoo objects M1, M2, M3, ..., Mn that I want to merge, where n is a number calculated at run-time. I am struggling to find the correct syntax to do this. Assuming n is calculated as 10 (for example), I have tried:

n <- 10
# First Effort
alldata <- merge(paste("M", rep(1:n), sep=""), all=TRUE)
# Second Effort
alldata <- merge(noquote(toString(paste("M", rep(1:nrow(counts1)), sep=""))), all=TRUE)

Any insights would be gratefully received. Kind regards, Pete

-- View this message in context: http://r.789695.n4.nabble.com/Merging-zoo-objects-tp3184696p3184696.html Sent from the R help mailing list archive at Nabble.com.
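The attempts above pass character strings to merge(), but merge() needs the objects themselves; merge.zoo accepts any number of series through its '...' argument. A sketch using get() to fetch each object by name and do.call() to make a single multi-way merge:

```r
library(zoo)

n <- 10
# fetch the objects M1, ..., Mn by name into a list
series <- lapply(paste("M", 1:n, sep = ""), get)

# one multi-way merge; merge.zoo takes the series via '...'
alldata <- do.call(merge, c(series, all = TRUE))
```

Collecting run-time-numbered objects into a list like this (or creating them in a list in the first place, rather than as M1, M2, ...) is generally the easier structure to work with in R.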
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I applaud your efforts, Ravi. Regarding Whose data is it?, I humbly suggest that referees and editorial boards push (demand?) for rules that require the raw data be made available to the referees and concurrent with publication. Spencer On 1/7/2011 8:43 AM, Ravi Varadhan wrote: I have just recently written about this issue (i.e. open learning and data sharing) in a manuscript that is currently under review in a clinical journal. I have argued that data hoarding is unethical. Participants in research studies give their time, effort, saliva and blood in the altruistic hope that their sacrifice will benefit humankind. If they were to realize that the real (ulterior) motive of the study investigators is only to advance their careers, they would really think hard about participating in the studies. The study participants should only consent to participate if they can get a signed assurance from the investigators that the investigators will make their data available for scrutiny and for public use (under some reasonable conditions that are fair to the study investigators). As Vickers (Trials 2006) says, whose data is it anyway? I believe that we can achieve great progress in clinical research if and only if we make a concerted effort towards open learning. Stakeholders (i.e. patients, clinicians, policy-makers) should demand that all the data that is potentially relevant to addressing a critical clinical question should be made available in an open learning environment. Unless, we can achieve this we cannot solve the problems of publication bias and inefficient and sub-optimal use of data. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. 
(410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Friday, January 07, 2011 8:26 AM To: Mike Marchywka Cc: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on scientific misconduct that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published, and (2) private funding provides its own biases. Spencer # Savage and Vickers (2010) Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals, Scientific Data Sharing, added Apr.
26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/). On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another, from medical to financial modelling. My solution here is simply to publish more raw data in a computer-readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own analysis. I think there
Re: [R] Accessing data via url
--- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x <- getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks
Re: [R] Merging zoo objects
On Fri, Jan 7, 2011 at 1:01 PM, Pete B peter.breckn...@bp.com wrote: Hi, I have n zoo objects M1, M2, M3, ..., Mn that I want to merge, where n is a number calculated at run-time. I am struggling to find the correct syntax to do this. Assuming n is calculated as 10 (for example), I have tried: n <- 10 # First effort alldata <- merge(paste("M", rep(1:n), sep = ""), all = TRUE) # Second effort alldata <- merge(noquote(toString(paste("M", rep(1:nrow(counts1)), sep = ""))), all = TRUE) Try this, where the sapply creates a named list of the objects: library(zoo) M1 <- zoo(11:13); M2 <- zoo(21:24); M3 <- zoo(31:35) do.call(merge, sapply(ls(pattern = "^M"), get, simplify = FALSE)) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
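Gabor's ls()/get() idiom can be sketched end-to-end. The mget() call below is an equivalent base-R alternative (an assumption of this sketch, not Gabor's exact code), and note that paste0() requires a newer R than the 2.12 mentioned in this thread; paste(..., sep = "") is the contemporary spelling:

```r
library(zoo)  # assumed installed

# A few zoo objects with different (overlapping) index ranges
M1 <- zoo(11:13)
M2 <- zoo(21:24)
M3 <- zoo(31:35)

n <- 3  # in the original post, n is computed at run-time

# mget() returns a named list of the objects M1..Mn; do.call() then
# hands them to merge.zoo() as separate named arguments, so the
# merged columns are labelled M1, M2, M3
alldata <- do.call(merge, mget(paste0("M", 1:n)))
```

merge.zoo() takes the union of the indexes (here 1..5) and fills non-overlapping positions with NA, which is what the poster's all = TRUE was asking for.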
Re: [R] Accessing data via url
--- On Fri, 1/7/11, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: From: Prof Brian Ripley rip...@stats.ox.ac.uk Subject: Re: [R] Accessing data via url ?read.table says 'file' can also be a complete URL. This is implemented by url(): see the section on URLs on its help page. Thanks. You haven't followed the posting guide and told us your OS, and what the section says does depend on the OS. Sorry, I thought I had it in my sig file. Added now: R 2.12.0, Windows 7. On Thu, 6 Jan 2011, John Kane wrote: # Can anyone suggest why this works datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data" person.data <- read.table(datafilename, header=TRUE) # but this does not? dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt" treedata <- read.table(dd, header=TRUE) === Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : unsupported URL scheme # I can access both through a hyperlink in OOO Calc. # Thanks -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Merging zoo objects
Thanks Gabor. Much appreciated (as always). -- View this message in context: http://r.789695.n4.nabble.com/Merging-zoo-objects-tp3184696p3186113.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Accessing data via url
Take a look at the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x <- getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
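Henrique's suggestion works because getBinaryURL() returns a raw vector; rawToChar() turns those bytes back into text, which textConnection() can feed to read.csv(). A minimal offline sketch (the raw bytes here simulate what getBinaryURL() would return; the RCurl call is commented out since it needs network access and the thread's URL may no longer exist):

```r
# With network access, the RCurl route would be (assumption: URL still live):
# library(RCurl)
# x <- getBinaryURL("https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv",
#                   ssl.verifypeer = FALSE)

# Offline stand-in: raw bytes of a small CSV, as getBinaryURL() would return them
x <- charToRaw("name,age\njoe,41\nanna,23\n")

# rawToChar() recovers the text; read.csv() parses it via a text connection
dat <- read.csv(textConnection(rawToChar(x)))
dat
```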
Re: [R] Summing over specific columns in a matrix
On Fri, Jan 7, 2011 at 8:42 AM, Henrique Dallazuanna www...@gmail.com wrote: Try this: rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE)) On Fri, Jan 7, 2011 at 2:29 PM, emj83 stp08...@shef.ac.uk wrote: Hi, I would like to sum some specific columns in my matrix - for example, my matrix looks like this: [,1] [,2] [,3] [,4] [,5] [1,] 1 NA NA NA NA [2,] 2 1 NA 1 NA [3,] 3 2 1 2 1 [4,] 4 3 2 3 2 [5,] NA NA NA 4 3 [6,] NA NA NA 5 NA I would like to find the sum of the first two columns, the sum of the second two columns, and the sum of the last column, i.e. I am left with a vector of c(16, 18, 6). Can you help me extend this example? I'd like to get (PL_Pos - Costs) for each row in Data1, sum those results for each matching date in Result1, and put the result in a new column in Result1 called 'Daily'. Been messing with this for an hour now. Nothing comes close. Thanks, Mark Result1 = structure(list(TradeDates = structure(c(14249, 14250, 14251, 14252, 14253, 14256, 14257, 14258, 14259, 14260, 14263, 14264), class = "Date")), .Names = "TradeDates", row.names = c(NA, 12L), class = "data.frame") Data1 = structure(list(Trade = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), PosType = c(1, 1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1), EnDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274), class = "Date"), EnTime = c(1406, 1318, 838, 846, 846, 1038, 1102, 918, 838, 950, 1134, 1254, 1110, 846, 1318, 854, 950, 838, 1246, 838, 854, 902), ExDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274), class = "Date"), ExTime = c(1515, 1515, 1030, 942, 1030, 1046, 1110, 1515, 942, 1030, 1142, 1515, 1515, 1030, 1326, 1515, 1515, 1030, 1515, 1515, 1515, 1022), PL_Pos = c(133.5, -41.5, 171, 483.5, 333.5, -29, -54, -291.5, 596, -141.5, -54, 558.5, 533.5, 521, -41.5, 883.5, 358.5, -979, -191.5, 196, -791.5, 446), Costs = c(29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29), PL = c(104.5, -70.5, 142, 454.5, 304.5, -58, -83, -320.5, 567, -170.5, -83, 529.5, 504.5, 492, -70.5, 854.5, 329.5, -1008, -220.5, 167, -820.5, 417)), .Names = c("Trade", "PosType", "EnDate", "EnTime", "ExDate", "ExTime", "PL_Pos", "Costs", "PL"), row.names = c(NA, 22L), class = "data.frame") Result1 Data1
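Mark's follow-up question (summing PL_Pos - Costs per date into a Daily column of Result1) goes unanswered in the thread; one plausible sketch uses aggregate() on a formula and merge(). The toy data below is made up for illustration, with column names mirroring his Result1/Data1:

```r
# Toy stand-ins for Mark's Data1 and Result1 (values invented for illustration)
Data1 <- data.frame(
  ExDate = as.Date(c("2009-01-06", "2009-01-06", "2009-01-07")),
  PL_Pos = c(133.5, -41.5, 171),
  Costs  = c(29, 29, 29))
Result1 <- data.frame(TradeDates = as.Date(c("2009-01-06", "2009-01-07")))

# Sum (PL_Pos - Costs) within each exit date
daily <- aggregate(PL_Pos - Costs ~ ExDate, data = Data1, FUN = sum)
names(daily) <- c("TradeDates", "Daily")

# Attach the per-date totals to Result1, keeping all Result1 dates
Result1 <- merge(Result1, daily, all.x = TRUE)
Result1
```

merge(..., all.x = TRUE) leaves NA in Daily for dates with no trades, which seems the safer default for a trading calendar.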
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I think that the strategy of Editors simply telling the authors to share or perish is a bit naïve. There are a number of practical challenges that need to be addressed in order to create a fair and effective open-learning environment. Eysenbach (BMJ 2001) and Vickers (2006) discuss these and some partial solutions. We need more creative thinking that uses both carrots and sticks. We also need more empirical experience with this. Perhaps we can learn from fields, if there are any, that do a good job of data sharing and open learning. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: Spencer Graves [mailto:spencer.gra...@structuremonitoring.com] Sent: Friday, January 07, 2011 1:01 PM To: Ravi Varadhan Cc: 'Mike Marchywka'; r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I applaud your efforts, Ravi. Regarding Whose data is it?, I humbly suggest that referees and editorial boards push (demand?) for rules that require the raw data be made available to the referees and concurrent with publication. Spencer
Re: [R] Accessing data via url
Well, it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on Google servers? I had not realised there was something like rawToChar(). Amazing what R does. Thanks, --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look at the return of: rawToChar(x)
Re: [R] Assumptions for ANOVA: the right way to check the normality
A lot of this depends on what question you are really trying to answer. For one-way anova, replacing y-values with their ranks essentially transforms the distribution to uniform (under the null), and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works. But what are you actually testing? The most meaningful null that is being tested is that all data come from the exact same distribution. So what does it mean when you reject that null? It means that the groups are not all representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those.

Some point out that if you make certain assumptions, such as symmetry or shifts of the same distribution, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric, and the shift idea doesn't fit in my mind. Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal.

Now what does replacing by ranks do in two-way anova, where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not? I'm not sure on this one.

I have seen regression on ranks; it basically tests for some level of relationship, but regression is usually used for some type of prediction, and predicting from a rank-rank regression does not seem meaningful to me. Fitting the regression model does not require normality; it is the tests on the coefficients and the confidence and prediction intervals that assume normality (again, the CLT helps for large samples, but not for prediction intervals). Bootstrapping is an option for regression without assuming normality; transformations can also help. -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Ward Sent: Thursday, January 06, 2011 2:00 PM To: r-help@r-project.org Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality On 06/01/2011 20:29, Greg Snow wrote: Some would argue to always use the Kruskal-Wallis test, since we never know for sure if we have normality. Personally I am not sure that I understand what exactly that test is really testing. Plus in your case you are doing a two-way anova and kruskal.test does one-way, so it will not work for your case. There are other non-parametric options. Just read this and had queries of my own and comments on this subject: Would one of these options be to rank the data before doing whatever model or test you want to do? As I understand it, this keeps the place of the data the same, but pulls extreme cases closer to the rest. Not an expert though. I've been doing lm() for my work, and I don't know if that makes an assumption of normality (my data is not normal). And I'm unsure of any other assumptions, as my texts don't really discuss them. Although I can comfortably evaluate a model, say using residual vs fitted plots, F values turned to P, resampling and confidence intervals, and looking at how sums of squares terms add to the explanation of the model. I've tried the plot() function to help graphically evaluate a model, and I want to make sure I understand what it's showing me. I think the first is showing me the model's fitted values vs the residuals, and ideally, I think, the closer the points are to the red line the better. The next plot is a Q-Q plot; the closer the points to the line, the more normal the model coefficients (or perhaps the data). I'm not sure what the next two plots are, but the third is titled Scale-Location, and it looks to have the square root of standardized residuals on y and fitted model values on x.
Might this be similar to the first plot? The final one is titled Residuals vs Leverage, which has standardized residuals on y and leverage on x, and something called Cook's Distance is plotted as well. Thanks, Ben W. Whether to use anova and other normality-based tests is really a matter of what assumptions you are willing to live with and what level of close enough you are comfortable with. Consulting with a local consultant with experience in these areas is useful if you don't have enough experience to decide what you are comfortable with. For your description, I would try the proportional odds logistic regression, but again, you should probably consult with someone who has experience rather than trying that on your own until you have more training and experience. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: Frodo Jedi [mailto:frodo.j...@yahoo.com] Sent: Thursday,
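Greg's point that ranking followed by a one-way anova behaves much like the Kruskal-Wallis test can be sketched with simulated data (the groups and shift below are invented for illustration):

```r
set.seed(1)
g <- gl(3, 15)                                # three groups of 15
y <- rnorm(45) + rep(c(0, 0, 1), each = 15)   # third group shifted up by 1

# Kruskal-Wallis test on the raw values
p_kw <- kruskal.test(y ~ g)$p.value

# One-way anova on the ranks: under the null the ranks are uniform,
# so the CLT makes the F-test a close analogue of Kruskal-Wallis
p_rank <- anova(lm(rank(y) ~ g))[["Pr(>F)"]][1]

c(kruskal = p_kw, rank_anova = p_rank)  # typically very close to each other
```

As Greg notes, rejecting either null only says the groups do not share one distribution; it does not by itself say whether means, variances, or shapes differ.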
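The four diagnostics Ben describes are exactly what the plot() method for lm objects draws. A minimal sketch on the built-in cars data (any lm fit would do):

```r
# Fit a simple linear model on a built-in dataset
fit <- lm(dist ~ speed, data = cars)

# plot.lm() draws, in order: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, and Residuals vs Leverage (with Cook's distance contours).
# Note the Q-Q plot assesses normality of the *residuals*, not of the raw data.
par(mfrow = c(2, 2))
plot(fit)
```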
Re: [R] Accessing data via url
Great, but how did you do that? --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:31 PM Google redirects to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3Dattredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well, it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on Google servers? I had not realised there was something like rawToChar(). Amazing what R does.
Thanks, --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url Received: Friday, January 7, 2011, 1:21 PM Take a look at the return of: rawToChar(x) -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
--- On Fri, 1/7/11, Peter Langfelder peter.langfel...@gmail.com wrote: From: Peter Langfelder peter.langfel...@gmail.com Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity To: r-help@r-project.org r-help@r-project.org Received: Friday, January 7, 2011, 2:06 AM From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :) Totally a placebo effect :) Peter On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote: Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: Stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is in average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias. A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052URL_DO=DO_TOPICURL_SECTION=201.html). 
For example, I've heard claims (which I cannot substantiate right now) that cell phone companies allegedly lobbied successfully to block funding for researchers they thought were likely to document health problems with their products. Related claims have been made by scientists in the US Food and Drug Administration that certain therapies were approved on political grounds in spite of substantive questions about the validity of the research backing the request for approval (e.g., www.naturalnews.com/025298_the_FDA_scientists.html). Some of these accusations of political corruption may be groundless. However, as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that can not be replicated -- effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.) Spencer On 1/6/2011 2:39 PM, Carl Witthoft wrote: The next week's New Yorker has some decent rebuttal letters. The case is hardly as clear-cut as the author would like to believe. Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
On 01/07/2011 06:13 AM, Spencer Graves wrote: A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a funding bias* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot. * I think of bias on both sides: the funding agency selecting the studies to support and the researcher subconsciously complying to the expectations of the funding agency. On 01/07/2011 08:06 AM, Peter Langfelder wrote: From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question. I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry, this is: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the prevalence of good ideas of the publishing scientist (not even known for all scientists). And I'm not sure this is not decreasing if the scientist generates and tests more and more ideas. 
I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told I need to try in addition these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware and concerned about optimistically biased validation results. Which were of course very valid questions (and easy to comply), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples of course I run the risk of my ideas being overfit to our kind of samples - this is a cost that I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm analytical chemist. We have huge amounts of work going into round robin trials in order to measure the natural variability of different labs on very defined systems. - we also have huge amounts of work going into calibration transfer, i.e. making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some fields of problems at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me... 
;-)
- Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard deviation error bar. This is all about perception; of course I can reflect about the meaning. Such a reflection also tells me that one student having a really unlikely number of right guesses is unlikely but not impossible. There is no statistical law stating that unlikely events happen only with large sample sizes/numbers of tests. Yet the immediate perception is completely different.
- I happily agree with the ideas of publishing findings (conclusions) as well as the data and data analysis code I used to arrive there. But I'm aware that part of this agreement is due to the fact that I'm quite interested in the data analytical methods (I'd say as well as in the particular chemical-analytical problem at hand, but rather
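The sensitivity vs. positive-predictive-value point made above can be made concrete with Bayes' rule. The sketch below uses invented numbers (alpha, power, prevalence are all assumptions for illustration):

```r
# Illustrative only: how the "prevalence of good ideas" drives the
# positive predictive value of a significant result (numbers invented).
alpha <- 0.05        # false-positive rate (1 - specificity)
power <- 0.80        # sensitivity: P(significant | idea is true)
prevalence <- 0.10   # fraction of tested hypotheses that are actually true

# Bayes' rule: P(idea true | significant result)
ppv <- (power * prevalence) /
       (power * prevalence + alpha * (1 - prevalence))
ppv   # 0.64: about a third of "significant" findings would be false
```

With a lower prevalence of true hypotheses, the same alpha and power give a much lower PPV, which is the point of the clinical-chemistry analogy.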
Re: [R] Accessing data via url
I don't know how Henrique did it, but in Firefox one can go to the Downloads panel and right click on the downloaded file and choose Copy Download link (or something similar) and get: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7cpNemjCFz14tAP3IPYCsAnvo-JJbgPNnPEWN_evBHG2jEYaNFOIT6GZF4M3VuKzioPZwvX7QSvMDWfJ3pHac5JK5BHyflOGBLOo_v44C0oU2V6teTwnjeg4TFufeltT-i5T3ThkuyesCztr6g2yLl65YcckwlEGEDtS-L9yzVe1B6tFEu2n6sjAOV9EHokEFx8e-HDFyf-u5mVIGMPgCHvaQL8pupVz-1p1rEdPpS0f6pqApTc%3Dattredirects=0 -- David. On Jan 7, 2011, at 1:35 PM, John Kane wrote: Great, but how did you do that? --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne- biomed.de Received: Friday, January 7, 2011, 1:31 PM Google redirect to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3Dattredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well it seems to mean that the file has moved. It , to my untrained eye, seems to be in the same sport as it was yesterday BTW I am now talking about dd - https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv Would this indicate it's physically moving on googe servers? I had not realised there was something like rawToChar(). Amazing what R does. 
Thanks, --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look on the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x = getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
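The pieces discussed in this thread (getBinaryURL plus rawToChar) can be combined into a complete read. A sketch, using the URL from the thread (the file may no longer exist, and disabling peer verification is insecure in general):

```r
library(RCurl)

# URL from the thread; ssl.verifypeer = FALSE skips certificate
# verification and should only be a last resort
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv"
x <- getBinaryURL(dd, ssl.verifypeer = FALSE)

# convert the raw bytes to text and parse them as a CSV
con <- textConnection(rawToChar(x))
dat <- read.csv(con)
close(con)
str(dat)
```

For a whitespace-delimited file like trees.txt, `read.table(con, header = TRUE)` would replace `read.csv(con)`.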
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
The issue Spencer brings up is a problem whether the funding is private or public. Just as businesses fund studies that support their goals, government agencies fund studies that justify the need for their services and expansion of their powers and budgets. In fact, there's a whole field of study variously called public choice economics and the new institutional economics that studies these and related issues. On a related note, there is certainly a lot of self-selection bias in what fields of study people choose to enter. For just one example, it isn't too difficult to believe that of the pool of people talented and interested in statistics, those who choose to enter public health or epidemiology might be more likely to want research that justifies expansion of public health and environmental agencies' regulatory powers, and this might affect the research questions they ask, the ways they design and select their statistical models, and what results they choose to include and exclude from publications. AFAIK, there is substantial evidence that researchers, especially in non-experimental studies, tend to get results they expect or hope to find, even if they feel no conscious bias. This is likely one of the reasons observational studies are so frequently overturned by randomized controlled trials. RCTs provide less room for confirmation bias to rear its ugly head. Joel -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Thursday, January 06, 2011 9:13 PM To: Carl Witthoft Cc: r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). 
...as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that cannot be replicated -- effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.) Spencer On 1/6/2011 2:39 PM, Carl Witthoft wrote: The next week's New Yorker has some decent rebuttal letters. The case is hardly as clear-cut as the author would like to believe. Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Assumptions for ANOVA: the right way to check the normality
I believe what I'm doing is an ANCOVA, because I have two categorical and one numerical explanatory variable, and a numerical response variable (this is the same experiment as before, the bacteria), and I'm just, at the minute (because I'm only half way through), doing some modelling and seeing what I get with what I currently have. And I'm paying attention to 95% CI for the different terms of a model, as well as the coefficient, the explanatory power of the term, and the likelihood that the same result could be obtained at random through the P values, derived from F. To be honest I haven't checked much what my data distributions are like and such because I'm not finished collecting it yet. I mainly mentioned the ranking because it was given considerable mention in the section on hypothesis testing on models in one of my texts. On 07/01/2011 18:34, Greg Snow wrote: A lot of this depends on what question you are really trying to answer. For one way anova replacing y-values with their ranks essentially transforms the distribution to uniform (under the null) and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works, but what are you actually testing? The most meaningful null that is being tested is that all data come from the exact same distribution. So what does it mean when you reject that null? It means that all the groups are not representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those. Some point out that if you make certain assumptions such as symmetry or shifts of the same distributions, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric and the shift idea doesn't fit in my mind. 
Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal. Now what does replacing by ranks do in 2-way anova where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not? I'm not sure on this one. I have seen regression on ranks, it basically tests for some level of relationship, but regression is usually used for some type of prediction and predicting from a rank-rank regression does not seem meaningful to me. Fitting the regression model does not require normality, it is the tests on the coefficients and confidence and prediction intervals that assume normality (again the CLT helps for large samples (but not for prediction intervals)). Bootstrapping is an option for regression without assuming normality, transformations can also help. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Ben Ward Sent: Thursday, January 06, 2011 2:00 PM To: r-help@r-project.org Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality On 06/01/2011 20:29, Greg Snow wrote: Some would argue to always use the kruskal wallis test since we never know for sure if we have normality. Personally I am not sure that I understand what exactly that test is really testing. Plus in your case you are doing a two-way anova and kruskal.test does one-way, so it will not work for your case. There are other non-parametric options. Just read this and had queries of my own and comments on this subject: Would one of these options be to rank the data before doing whatever model or test you want to do? As I understand it makes the place of the data the same, but pulls extreme cases closer to the rest. Not an expert though. 
I've been doing lm() for my work, and I don't know if that makes an assumption of normality (my data is not normal). And I'm unsure of any other assumptions as my texts don't really discuss them. Although I can comfortably evaluate a model, say using residuals vs fitted values, F values turned to P, resampling and confidence intervals, and looking at how much the sums-of-squares terms add to the explanation of the model. I've tried the plot() function to help graphically evaluate a model, and I want to make sure I understand what it's showing me. I think the first plot is showing me the model's fitted values vs the residuals, and ideally, I think the closer the points are to the red line the better. The next plot is a Q-Q plot; the closer the points to the line, the more normal the model coefficients (or perhaps the data). I'm not sure what the next two plots are; the third is titled Scale-Location, and it looks to have the square root of standardized residuals on y, and fitted model values on x. Might this be similar to the first plot? The final one is titled Residuals vs Leverage, which has standardized residuals on y and leverage
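The four panels being described are the default diagnostics for an lm/aov fit. A minimal sketch on simulated data (all variable names here are invented for illustration); note the Q-Q plot assesses the *residuals*, not the coefficients or the raw response:

```r
# Simulated ANCOVA-like data: one numeric and one categorical predictor
set.seed(1)
growth <- data.frame(dose   = runif(40, 0, 10),
                     strain = gl(2, 20, labels = c("wt", "mut")))
growth$od <- 1 + 0.3 * growth$dose + rnorm(40, sd = 0.5)

fit <- lm(od ~ dose * strain, data = growth)
par(mfrow = c(2, 2))
plot(fit)  # 1: residuals vs fitted (look for non-random pattern)
           # 2: normal Q-Q of standardized residuals
           # 3: scale-location, sqrt(|std. resid.|) vs fitted
           #    (checks constant variance; similar axes to plot 1)
           # 4: residuals vs leverage, with Cook's distance contours
```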
Re: [R] plot without points overlap
On Fri, Jan 7, 2011 at 10:34 AM, JD joon...@gmail.com wrote: Hey, thanks for the suggestions but I'm still having some problem. I was able to modify the size of the pdf/ps file that I generate, This is a somewhat different question. The size of the finally created portable document or postscript file is determined by the width and height arguments when you start the respective device. I would just expand the size of the PDF/PS. If you have a particular aspect ratio you desire, then just be sure to expand height proportionally when you add width to accommodate the labels. but still labels of the right side of the heatmap are not plotted entirely. I tried par(mar=..) and par(pin=..) but still cannot do it, any ideas? Thanks! g 2011/1/4 Joshua Wiley jwiley.ps...@gmail.com Hi, You can set the device region in inches using the pin argument (see ?par maybe halfway down or so). You can also set the aspect ratio in plot(), but I am not sure that is really what you want (see ?plot.window for that). Two Examples ### par(pin = c(2, 4)) plot(1:10) dev.off() plot(1:10, asp = 2) ### Hope that helps, Josh On Tue, Jan 4, 2011 at 8:46 AM, joonR joon...@gmail.com wrote: Hi, I'm trying to plot a grid of points of different dimensions using the simple plot() function. I want to plot the points such that they DO NOT overlap, I guess there should be a way to set a maximum distance between the points, but I cannot find it. Can you help? Thanks a lot! g PS: Is it possible to produce device regions of different dimensions? (i.e. a rectangular one with height width) -- View this message in context: http://r.789695.n4.nabble.com/plot-without-points-overlap-tp3173894p3173894.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
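For the specific heatmap-label problem in this thread, note that heatmap() lays out its own plot regions, so `par(mar=)` is typically overridden; its own `margins` argument is the operative one. A sketch (file name, sizes, and labels are arbitrary):

```r
# Enlarge the device itself (inches), then give heatmap() room for labels
pdf("heatmap.pdf", width = 10, height = 7)

x <- matrix(rnorm(100), nrow = 10,
            dimnames = list(paste("a_rather_long_row_label_", 1:10, sep = ""),
                            paste("c", 1:10, sep = "")))

# margins = c(column, row): widen the second value so the row labels
# on the right-hand side are not clipped
heatmap(x, margins = c(5, 12))

dev.off()
```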
Re: [R] plot without points overlap
Hey, thanks for the suggestions but I'm still having some problem. I was able to modify the size of the pdf/ps file that I generate, but still labels of the right side of the heatmap are not plotted entirely. I tried par(mar=..) and par(pin=..) but still cannot do it, any ideas? Thanks! g 2011/1/4 Joshua Wiley jwiley.ps...@gmail.com Hi, You can set the device region in inches using the pin argument (see ?par maybe halfway down or so). You can also set the aspect ratio in plot(), but I am not sure that is really what you want (see ?plot.window for that). Two Examples ### par(pin = c(2, 4)) plot(1:10) dev.off() plot(1:10, asp = 2) ### Hope that helps, Josh On Tue, Jan 4, 2011 at 8:46 AM, joonR joon...@gmail.com wrote: Hi, I'm trying to plot a grid of points of different dimensions using the simple plot() function. I want to plot the points such that they DO NOT overlap, I guess there should be a way to set a maximum distance between the points, but I cannot find it. Can you help? Thanks a lot! g PS: Is it possible to produce device regions of different dimensions? (i.e. a rectangular one with height width) -- View this message in context: http://r.789695.n4.nabble.com/plot-without-points-overlap-tp3173894p3173894.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to export/save an mrpp object?
Gavin: [...] I think you should read ?load to rethink what these functions do and how they do it. [...] Absolutely correct. I will. Thank you for your time Gavin, Nikos __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rprofile.site
Hi all, I just installed R 2.12.1 and Tinn-R 2.3.7.1. My OS is Windows 7 32bit. Here is my question: Do I have to execute the rprofile.site manually every time I start an R/Tinn-R session? Because it seems that way. Here is the error code I receive after sending a selection (echo=TRUE): source(.trPaths[5], echo=TRUE, max.deparse.length=150) It doesn't do that after I configure (R > Configure > permanent (rprofile.site)) but after I closed R/Tinn-R I have to configure R again. Is that necessary? Thank you, Julian __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
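Background that answers the question: anything placed in `R_HOME/etc/Rprofile.site` is sourced at every R startup (see ?Startup), so once Tinn-R's `.trPaths` definition lives there it does not need to be re-configured each session. The exact lines ship with Tinn-R; the snippet below only illustrates the general shape and is an assumption, not the verbatim Tinn-R code:

```r
# Sketch of the kind of definition Tinn-R needs at startup: a vector of
# temp-file paths under %APPDATA%\Tinn-R\tmp (names are Tinn-R's, the
# exact list may differ between Tinn-R versions).
.trPaths <- paste(paste(Sys.getenv("APPDATA"), "\\Tinn-R\\tmp\\", sep = ""),
                  c("", "search.txt", "objects.txt", "file.r",
                    "selection.r", "block.r", "lines.r"), sep = "")
```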
[R] Compare the risk
I have a dataset which looks like this (columns: cbr = chronic bronchitis, dust, smoking, expo):

   cbr dust smoking expo
1    0 0.20       1    5
2    0 0.25       1    4
3    0 0.25       1    8
4    0 0.25       1    4
5    0 0.25       1    4
6    0 0.25       1    8
7    0 0.25       1    8
8    0 0.25       1    4
9    1 0.25       1    8
10   0 0.29       1    7
11   0 0.29       0    7
12   0 0.29       1    7
13   0 0.30       0   10
14   0 0.30       1   10
15
16
. . .

How can I compare the risk of getting bronchitis between smokers and non-smokers? -- View this message in context: http://r.789695.n4.nabble.com/Compare-the-risk-tp3189084p3189084.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
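One way to answer this question: cross-tabulate outcome by exposure for a crude relative risk, or model the outcome to adjust for the other columns. A sketch on an invented mini data frame using the column names from the post:

```r
# Invented mini data set with the thread's column names (illustration only)
dat <- data.frame(cbr     = c(0, 0, 0, 1, 1, 0, 1, 0),
                  smoking = c(1, 1, 0, 1, 0, 1, 1, 0),
                  dust    = c(0.20, 0.25, 0.25, 0.25, 0.29, 0.29, 0.30, 0.30),
                  expo    = c(5, 4, 8, 8, 7, 7, 10, 10))

# Crude comparison: P(cbr = 1) among smokers vs non-smokers
tab  <- table(smoking = dat$smoking, cbr = dat$cbr)
risk <- prop.table(tab, margin = 1)[, "1"]
rr   <- risk["1"] / risk["0"]   # relative risk, smokers vs non-smokers

# Adjusted comparison: logistic regression controlling for dust and expo;
# the smoking coefficient is then a log odds ratio
fit <- glm(cbr ~ smoking + dust + expo, family = binomial, data = dat)
summary(fit)
```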
Re: [R] vector of character with unequal width
On Fri, Jan 07, 2011 at 02:55:18PM +, jose Bartolomei wrote: Dear R users, Thanks for your help. The recommendations were exactly what I was searching for. Below the one I will use for my script. Thanks again, jose

nchar(xx)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

z1 <- rep(0, times=length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
xx <- paste(z2, xx, sep="")
xx <- substring(xx, 6, 9)

nchar(xx)
 [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[38] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Let me suggest a slightly simpler code, which produces the same output, if the input has length at most 9.

xx <- c("abc", "abcd", "abcde")
xx <- paste("000", xx, sep="")
xx <- substr(xx, nchar(xx) - 3, nchar(xx))
cbind(xx)
#     xx
#[1,] "0abc"
#[2,] "abcd"
#[3,] "bcde"

Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
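If the values being padded are really numeric codes rather than arbitrary strings, the pad-and-truncate idiom above can be replaced by a single formatting call:

```r
# Zero-pad numeric codes to a fixed width in one step
x <- c(7, 42, 583, 1234)
sprintf("%04d", x)                                # "0007" "0042" "0583" "1234"
formatC(x, width = 4, flag = "0", format = "d")   # same result
```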
[R] Random Effects Meta Regression
Hi All, I have run a series of random-effects meta-analyses on binomial outcomes using the metabin function in R. Now I would like to conduct some random-effects meta-regressions on the outcomes. Is there a command available which will allow me to test the impact of a certain variable on the relative treatment effect from my meta-analyses? Many Thanks, Steph -- View this message in context: http://r.789695.n4.nabble.com/Random-Effects-Meta-Regression-tp3180267p3180267.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
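If the meta package version in use provides metareg(), it fits a meta-regression directly on top of a metabin object (metafor::rma() is an alternative). A sketch with invented study data; the covariate `year` and all counts are placeholders, and the metareg() argument order may differ between meta versions:

```r
library(meta)

# Invented example: events/sample sizes per study plus a study-level covariate
dat <- data.frame(event.e = c(12, 15,  8, 20), n.e = c(50, 60, 40, 80),
                  event.c = c(18, 18, 14, 28), n.c = c(50, 62, 41, 79),
                  year    = c(1999, 2003, 2007, 2010))

m  <- metabin(event.e, n.e, event.c, n.c, data = dat, sm = "RR")
mr <- metareg(m, ~ year)   # does 'year' moderate the relative treatment effect?
mr
```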
Re: [R] how to calculate this natural logarithm
On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote: Hello I want to calculate the natural logarithm of a sum of combinations as follows: (R code) { com_sum=choose(200,482)*choose(100,118)+choose(200,483)*choose(100,117)+...+choose(200,i)*choose(100,600-i)+...+choose(200,600)*choose(100,0) #calculate the sum result=log(com_sum) #calculate the log of the sum } But every element of the com_sum is Inf, so I can't get the result Thank you in advance

The natural logarithm of choose() is lchoose(), so the natural logarithms of the products above are

i <- 482:600
logx <- lchoose(200,i) + lchoose(100,600-i)
maxlog <- max(logx) # [1] 5675.315

A sum of numbers of very different magnitudes may be approximated by its largest term, so max(logx) is an approximation of the required logarithm. A more accurate calculation can be done, for example, as follows

maxlog + log(sum(exp(logx - maxlog))) # [1] 5675.977

Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing data via url
Google redirects to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3D&attredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on google servers? I had not realised there was something like rawToChar(). Amazing what R does. Thanks, --- On *Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com* wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look on the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x = getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. 
Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]]
Re: [R] Accessing data via url
In Firefox, after the download, right-click on the download and copy the download link On Fri, Jan 7, 2011 at 4:35 PM, John Kane jrkrid...@yahoo.ca wrote: Great, but how did you do that? --- On *Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com* wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:31 PM Google redirects to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3D&attredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on google servers? I had not realised there was something like rawToChar(). Amazing what R does. 
Thanks, --- On *Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com* wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look on the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x = getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with 2-ways ANOVA interactions
Maybe a simple concrete example would help:

> tmpdat <- data.frame( One = rep( c('A','B'), each=10 ),
+   Two = rep( c('C','D'), each=5, length.out=20 ),
+   mu1 = rep( c(10, 11, 12, 16), each=5 ) )
> tmpdat$e <- with(tmpdat, ave( rnorm(20), One, Two, FUN=scale ) )
> tmpdat$y <- with(tmpdat, mu1 + e)
> # check the means
> tapply( tmpdat$y, tmpdat[, c('One','Two')], mean )
   Two
One  C  D
  A 10 11
  B 12 16
> # now fit the data
> fit <- aov( y ~ One*Two, data=tmpdat )
> # look at what was measured
> coef(fit)
(Intercept)        OneB        TwoD   OneB:TwoD
         10           2           1           3
> # notice:
> (16-12) - (11-10)
[1] 3
> (16-11) - (12-10)
[1] 3
> # another way of thinking
> model.tables(fit)
Tables of effects

 One
One
    A     B
-1.75  1.75

 Two
Two
    C     D
-1.25  1.25

 One:Two
   Two
One     C     D
  A  0.75 -0.75
  B -0.75  0.75

> model.tables(fit, 'means')
Tables of means
Grand mean 12.25

 One
One
   A    B
10.5 14.0

 Two
Two
   C    D
11.0 13.5

 One:Two
   Two
One  C  D
  A 10 11
  B 12 16

> fit2 <- aov( y ~ One + Two, data=tmpdat )
> model.tables(fit2)
Tables of effects

 One
One
    A     B
-1.75  1.75

 Two
Two
    C     D
-1.25  1.25

> model.tables(fit2, 'means')
Tables of means
Grand mean 12.25

 One
One
   A    B
10.5 14.0

 Two
Two
   C    D
11.0 13.5

> tmpdat2 <- expand.grid( One=c('A','B'), Two=c('C','D') )
> cbind( tmpdat2, fit=predict(fit, tmpdat2), fit2=predict(fit2, tmpdat2) )
  One Two fit  fit2
1   A   C  10  9.25
2   B   C  12 12.75
3   A   D  11 11.75
4   B   D  16 15.25

Now go back and replace the 16 with 13 and see how things change. Also, how are you accounting for people in your model? Did you have multiple people? Did each person report more than one outcome? You probably need to include person as some form of random effect to properly account for them. This is really getting to the point where you need a consultant (or several more classes). -- Gregory (Greg) L. Snow Ph.D. 
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: Frodo Jedi [mailto:frodo.j...@yahoo.com] Sent: Thursday, January 06, 2011 2:39 PM To: Greg Snow; r-help@r-project.org Subject: Re: [R] Problem with 2-ways ANOVA interactions Dear Greg, many many thanks, I still have a doubt: Before, I wrongly thought that if I do the analysis anova(response ~ stimulus*condition) I would get the comparison between the same stimulus in condition A and in condition AH (e.g. stimulus_1_A, stimulus_1_AH). Instead, apparently, the interaction stimulus:condition means that I find the differences between the stimuli keeping the condition fixed!! If this is true then doing the anova with the interaction stimulus:condition is equivalent to doing the ONE WAY ANOVA first on the subset where all the conditions are A and then on the subset where all the conditions are AH? Right? I think you are closer, but not quite there. The test on the interaction tests if the difference between A and AH is the same across the different stimuli. The main effect for condition tests if there is a difference between A and AH. So you mean that the interaction compares, for example, stimulus 1 in condition A with stimulus 2 in condition AH, right? Could you please also answer the question I asked at the end? That is what, in the end, I want to know: So if all before is correct, my final question is: how by means of ANOVA can I track the significant differences between the stimuli presented in A and AH condition without passing through the t-test? Indeed my goal was to find on the one hand if globally the condition AH brings better results than condition A, and on the other hand I needed to know for which stimuli the condition AH brings better results than condition A. 
Thanks Best regards [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
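A hedged sketch of one way to answer the final question above without running many separate t-tests (the data frame and column names mydata, response, stimulus, condition are assumed here, not taken from the thread): after fitting the two-way aov(), TukeyHSD() produces simultaneous pairwise comparisons with family-wise error control, including the A-vs-AH contrast within each stimulus via the interaction term.

```r
## Assumed layout: one row per observation, factors stimulus and condition
fit <- aov(response ~ stimulus * condition, data = mydata)

TukeyHSD(fit, which = "condition")            # overall A vs AH difference
TukeyHSD(fit, which = "stimulus:condition")   # all cell-wise comparisons,
                                              # incl. A vs AH per stimulus
```

The interaction table contains every pair of cells, so the per-stimulus A-vs-AH rows have to be picked out of it by hand.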
[R] survival analysis microarray expression data
For any given pre-specified gene or short list of genes, yes, the Cox model works fine. Two important caveats:

1. Remember the rule of thumb for a Cox model of 20 events per variable (not n=20). Many microarray studies will have very marginal sample size.

2. If you are looking at many genes then a completely different strategy is required. There is a large and growing literature; I like Newton et al, Annals of Applied Statistics, 2007, 85-106 as an intro; but expect to read much more.

Terry Therneau

begin included message -

I want to test the expression of a subset of genes for correlation with patient survival. I found out that the coxph function is appropriate for doing this since it works with continuous variables such as Affy mRNA expression values. I applied the following code:

cp <- coxph(Surv(t.rfs, !e.rfs) ~ ex, pData(eset.n0))
# t.rfs: time to relapse, status (0=alive, 1=dead), ex: expression value (continuous)

The results I get look sensible but I would appreciate any advice on the correctness and also any suggestions for any (better) alternative methods. Best wishes
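To make the single-gene case concrete, here is a hedged, self-contained sketch on simulated data (the variable names echo the poster's, but the values and the event coding are invented; with 1 variable and roughly 70 events, the 20-events-per-variable rule of thumb is comfortably met):

```r
library(survival)

set.seed(1)
ex    <- rnorm(100)                       # expression values for one gene
t.rfs <- rexp(100, rate = exp(0.3 * ex))  # simulated time to relapse
e.rfs <- rbinom(100, 1, 0.7)              # event indicator: 1 = relapse

fit <- coxph(Surv(t.rfs, e.rfs) ~ ex)
summary(fit)   # hazard ratio per unit of expression, Wald/LR/score tests
```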
[R] Fitting an Inverse Gamma Distribution to Survey Data
Hello, I've been attempting to fit the data below with an inverse gamma distribution. The reason for this is that outside proprietary software (@Risk) kicked back a Pearson5 (inverse gamma) as the best-fitting distribution, with a Chi-Sqr goodness-of-fit roughly 40% better than with a log-normal fit. Looking up Inverse gamma on this forum led me to the following post: http://r.789695.n4.nabble.com/Inverse-gamma-td825481.html#a825482 But I think I'm misunderstanding the suggestion made in that post. Is there a way to estimate the shape and rate parameters for an inverse gamma and then plot the PDF, as I have done below using other more readily available pdfs in R?

Thanks, Eric

library(MASS)
iniSal_US_forHist <- c(2.368000,3.532614,3.064330,3.347069,3.066333,4.233636,3.465650,2.858553,
2.946731,2.945417,2.415000,2.873019,5.521000,5.788148,5.314630,5.509672,6.032840,6.009310,
4.110833,6.073182,5.652833,4.425733,6.481852,4.076857,3.289310,4.524000,3.985811,5.399714,
4.490606,6.956729,5.270933,8.099107,5.058250,6.394500,5.644000,5.202459,5.67,3.152680,
3.220952,2.777381,3.115467,3.642759,3.488333,3.022439,2.610290,2.618571,3.218000,3.417634,
10.327317,7.344270,6.886154,4.015800,3.063103,6.832292,4.600238,2.939000,5.999027,7.894878,
4.411538,2.384762,6.816154,2.782500,2.475333,2.799138,2.739063,2.619917,2.892545,2.468167,
2.577079,2.821875,2.502500,2.969032,2.046023,3.073077,4.408000,3.411774,3.50,4.283607,
4.284000,4.276714,3.228103,2.639875,3.453194,2.821200,3.838723,1.714253,2.273750,2.611882,
2.321781,2.567500,2.557045,1.288875,2.175211,1.736000,2.250781,7.433366,7.033553,5.47,
7.132727,8.505937,9.174545,6.554487,7.060286,6.617160,8.210986,4.404045,6.062381,5.149625,
2.972105,5.358889,3.910968,3.715873,1.728966,2.843667,4.413906,3.016346,7.168636,3.839394,
3.930141,7.019882,3.459429,5.050250,3.492714,3.226667,3.987667,2.770227,3.661167,1.553000,
2.867391,2.897193,2.611707,2.577167,2.904697,2.733077,2.507241,11.044865,6.425484,8.567222,
8.552344,7.493396,4.807381,9.697869,9.471333,6.783175,4.563571,8.059649,9.448679,5.803778, 4.769423,4.424634,7.586042,4.451556,3.622373,6.390152,4.424375,4.135806,5.025400,5.410635, 7.012292,2.961071,3.192188,2.989643,3.471429,2.867966,1.980541,3.172344,2.574783,2.958983, 1.708140,3.604853,3.479000,2.845000,2.742603,2.923968,3.620308,2.452500,2.721375,3.166333, 2.742162,2.793000,3.337000,5.192025,5.365875,3.079000,8.415970,6.612277,6.734706,4.856857, 5.164783,7.743667,6.894151,4.666538,9.227167,8.077581,6.109833,6.621724,18.098182,12.705600, 15.490784,17.394750,12.422364,14.832727,8.326000,11.352400,3.431429,2.658261,3.219773,3.605185, 4.030299,3.262241,3.503250,3.522763,2.847312,2.996618,3.075769,3.387731,3.066923,3.078200, 2.466957,3.214167,2.707778,3.384839,2.283556,2.912258,3.378000,2.726750,2.95,2.195000, 4.819063,3.604578,3.694906,5.068000,4.676582,3.028831,4.261042,3.593235,4.501224,2.880317, 5.750333,3.257833,3.967458,2.522292,2.725738,2.549231,2.591389,2.990488,2.681222,2.685854, 2.284750,2.585938,2.432824,3.108875,2.611340,3.916667,2.418095,2.476406,2.801235,3.278000, 2.434921,2.617826,3.133939,2.774321,4.196173,3.764286,3.555833,5.317361,3.970800,4.136400, 4.487013,3.746393,4.754000,3.854316,3.742353,3.044079,2.817821,3.995179,3.643134,3.642593, 3.604533,2.935902,4.088310,5.344407,3.076883,3.287105,3.720870,2.032258,2.872593,5.787313, 6.017838,5.425205,4.880600,3.582295,4.90,3.489016,4.603030,5.344407,6.184286,4.047083, 4.788304,4.661325,4.815938,4.056790,3.765595,5.348772,5.200222,4.906311,3.900147,3.782897, 3.767313,3.417732,3.725455,2.888750,2.552333,2.521613,2.531522,2.510833,2.710208,2.445273, 2.619750,2.094737,2.399355,2.758000,2.317077,2.247755,3.594333,4.607805,2.69,3.084706, 3.529000,2.326200,3.309851,2.647805,2.802250,2.778667,3.231235,2.418065,3.134545,3.807843, 2.988372,2.709792,3.084035,3.633279,2.958750,2.170081,2.67,2.640706,2.841600,3.399219, 2.561373,2.574824,3.046447,2.647500,3.554875,1.865000,2.858333,2.355000,2.508082,2.376833, 
2.728710,1.752833,1.571967,1.800333,2.265455,2.353226,2.568028,2.359500,2.472639,1.675385, 2.667386,2.49,2.482632,2.593452,2.695510,2.466941,2.624211,3.835645,3.519667,2.661940, 3.516167,3.146585,3.420462,4.809833,3.028500,3.192297,4.256333,2.516897,3.033846,2.793359, 6.700714,5.441250,6.872500,4.528333,7.490328,4.673788,6.376885,3.023167,4.429167,4.317625, 16.729231,8.372500,6.279828,10.805098,8.359452,8.277576,8.008846,8.742308,12.155942,5.975063, 3.317333,2.883021,3.310822,3.119219,3.921190,3.552986,3.647162,4.017667,3.895342,3.381096, 2.769412,3.225294,4.169333,3.733919,2.859492,3.674875,3.401017,3.160267,4.109545,4.347867, 3.867000,3.672763,4.364844,3.804250,2.613000,4.289909,4.212059,4.785797,4.149687,6.299444, 5.640135,5.713448,4.766438,7.032027,5.610533,3.126111,6.322029,4.417692,6.392532,2.753175, 2.549655,3.456533,3.199000,3.852338,3.387549,3.098033,3.271733,3.679180,2.655484,6.858167, 5.808033,7.55,8.388387,5.108732,7.895152,5.223580,5.741714,8.026250,4.928421,2.797292,
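One hedged possibility (an editor's sketch, not from the thread, and assuming the full vector has been assigned to iniSal_US_forHist): MASS::fitdistr() accepts a hand-written density function, so an inverse-gamma fit only needs the density spelled out; the starting values below are rough guesses based on the sample mean and variance.

```r
library(MASS)

## Inverse-gamma density with shape a and scale s:
## f(x) = s^a / Gamma(a) * x^(-a-1) * exp(-s/x)
dinvgamma <- function(x, shape, scale) {
  exp(shape * log(scale) - lgamma(shape) - (shape + 1) * log(x) - scale / x)
}

fit <- fitdistr(iniSal_US_forHist, dinvgamma,
                start = list(shape = 6, scale = 24),
                method = "L-BFGS-B", lower = c(0.01, 0.01))
fit$estimate

## Overlay the fitted PDF on a histogram of the data
hist(iniSal_US_forHist, freq = FALSE, breaks = 40)
curve(dinvgamma(x, fit$estimate["shape"], fit$estimate["scale"]),
      add = TRUE, col = "red")
```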
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. Or, I would say: try to publish only good and mature ideas. Gauss said it best: "pauca sed matura" — "few, but ripe".

Ravi.

---
Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Claudia Beleites
Sent: Friday, January 07, 2011 1:46 PM
To: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

On 01/07/2011 06:13 AM, Spencer Graves wrote: A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research

Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a funding bias* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot.

* I think of bias on both sides: the funding agency selecting the studies to support and the researcher subconsciously complying with the expectations of the funding agency.

On 01/07/2011 08:06 AM, Peter Langfelder wrote: From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias.

Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question.
I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry, this is: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the prevalence of good ideas of the publishing scientist (not even known for all scientists). And I'm not sure this is not decreasing if the scientist generates and tests more and more ideas. I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told I need to try in addition these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware and concerned about optimistically biased validation results. Which were of course very valid questions (and easy to comply), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples of course I run the risk of my ideas being overfit to our kind of samples - this is a cost that I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm analytical chemist. 
We have huge amounts of work going into round robin trials in order to measure the natural variability of different labs on very defined systems. - we also have huge amounts of work going into calibration transfer, i.e. making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some fields of problems at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me... ;-) - Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard deviation.
Re: [R] how to run linear regression models at once
Hi:

On Fri, Jan 7, 2011 at 7:04 AM, wangwallace talentt...@gmail.com wrote:

hey, folks, I have two very simple questions. I am not quite sure if I can do this using R. So, I am analyzing a large data frame with thousands of variables. For example:

Dependent variables: d1, d2, d3 (i.e., there are three dependent variables)
Independent variables: s1, s2, s3, ... s1000 (i.e., there are 1000 independent variables)

Now I want to run simple linear regression analyses of the dependent variables on the independent variables. A laborious way to do this is running 1000 linear regression models for each dependent variable separately. This would take a lot of time. My questions are:

1) is there an easy way to run all 1000 linear regression analyses for each dependent variable at once?

Yes.

2) after running all 1000 linear regression analyses at once, how can I write the 3000 regression weights (1000 regression weights for each dependent variable) and their significance levels to a file (e.g., csv, excel, etc.)?

Define 'weights'. Do you mean the parameter estimates? Here's a simplified example, but the output is a separate list object for each response. Each component of each list is an lm object, produced from the lm() function.
# Generate some fake data: three responses and eight covariates
df <- data.frame(y1 = rnorm(50), y2 = rnorm(50), y3 = rnorm(50),
                 x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50),
                 x4 = rpois(50, 30), x5 = rpois(50, 20), x6 = rpois(50, 10),
                 x7 = runif(50), x8 = runif(50))

# Create a vector of covariate names
xs <- paste('x', 1:8, sep = '')

# Initialize a list whose length is that of the vector xs
rl1 <- vector('list', 8)
rl2 <- vector('list', 8)
rl3 <- vector('list', 8)

# The following loop regresses all three responses individually on a single
# covariate x[i] and exports the results to separate lists for each response.
# The first statement creates a formula with the name of the i-th covariate;
# the second statement does the regression and assigns the output to the i-th
# component of the list corresponding to the j-th response (j = 1, 2, 3)
for(i in 1:8) {
  fm1 <- as.formula(paste('y1', xs[i], sep = '~'))
  fm2 <- as.formula(paste('y2', xs[i], sep = '~'))
  fm3 <- as.formula(paste('y3', xs[i], sep = '~'))
  rl1[[i]] <- lm(fm1, data = df)
  rl2[[i]] <- lm(fm2, data = df)
  rl3[[i]] <- lm(fm3, data = df)
}

# The print method of lm() applied to the first component is
rl1[[1]]

Each component of each list will resemble the output object you would get from running lm() on a single response with one explanatory variable at a time. In each list, there are as many components as there are covariates. You can extract all sorts of things from each component of the list; I prefer using the ldply() function from package plyr, but there are other ways with base functions. Here are some examples:

library(plyr)
# R^2 values:
ldply(rl1, function(x) summary(x)$r.squared)
# Model coefficients:
ldply(rl1, function(x) coef(x))
# p-values of significance tests for intercept and slope
ldply(rl1, function(x) summary(x)$coefficients[, 4])
# residuals from each model
res1 <- t(ldply(rl1, function(x) resid(x)))   # produces a matrix

Some comments:

1.
If you want to run multivariate regression on the response matrix [y1 y2 y3], you only need one list object for output. Substitute 'cbind(y1, y2, y3)' for yi when creating the formula. Only one call is needed for multivariate regression rather than three for the univariate regressions. However, you would then need to be more careful about output structures and pulling them together over list components. For instance, in a multivariate regression, the coefficients are output as a matrix rather than a vector, so to combine them all, you would need to use a 3D array rather than a data frame.

2. The 'x' in the anonymous functions in ldply() corresponds to a generic list component. In this context, x is an lm object, so anything you could do with the output of an lm object could be mapped componentwise with ldply. This approach is very convenient if you want to pick off individual output pieces (e.g., R^2 or the root MSE) for all the regressions at once.

3. To associate the covariates with rows of output in each generated data frame above, you could use, e.g.,

r2y1 <- ldply(rl1, function(x) summary(x)$r.squared)
rownames(r2y1) <- xs

In the residual example, the data frame is 8 x 50; using the t() function [for transpose], the result is a 50 x 8 matrix whose columns correspond to the individual covariates, so in this case

colnames(res1) <- xs

4. Make sure you organize your code and desired output in advance; with 1000 covariates, things could get messy if you're not careful.

HTH, Dennis

Many thanks in advance!!

-- View this message in
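A hedged sketch of point 1, reusing the same fake data frame df and covariate-name vector xs from the example above:

```r
## One multivariate fit per covariate: cbind() the responses on the
## left-hand side of the formula
rl <- vector('list', 8)
for (i in 1:8) {
  fm <- as.formula(paste('cbind(y1, y2, y3)', xs[i], sep = '~'))
  rl[[i]] <- lm(fm, data = df)
}

coef(rl[[1]])   # 2 x 3 matrix: rows (Intercept) and x1, one column per response
```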
[R] Error in x %*% coef(object) : non-conformable arguments
Hello, and thanks in advance! What does the error message "Error in x %*% coef(object) : non-conformable arguments" indicate when predicting values for newdata with a model from bigglm (in package biglm), and how can I debug it? I am attempting to do Monte Carlo simulations, which may explain the somewhat interesting loop which follows. After the code I have included the output, which shows that a) the simulations are changing the values, and b) there are not any atypical values for the factors in the seventh iteration. At the end of the output is the aforementioned error message.

Code:
===
iter <- nrow(nov.2010)
predict.nov.2011 <- vector(mode='numeric', length=iter)
for (i in 1:iter) {
  iter.df <- nov.2010
  ##-- Update values of dynamic variables --
  iter.df$age <- iter.df$age + 12
  iter.df$pct_utilize <- iter.df$pct_utilize + mc.util.delta[i]
  iter.df$updated_varname1 <- ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])
  if(iter.df$state=="WI") iter.df$varname3 <- iter.df$varname3 + mc.wi.varname3.delta[i]
  if(iter.df$state=="MN") iter.df$varname3 <- iter.df$varname3 + mc.mn.varname3.delta[i]
  if(iter.df$state=="IL") iter.df$varname3 <- iter.df$varname3 + mc.il.varname3.delta[i]
  if(iter.df$state=="US") iter.df$varname3 <- iter.df$varname3 + mc.us.varname3.delta[i]
  ##--- Bin Variables --
  iter.df$bin_varname1 <- as.factor(recode(iter.df$updated_varname1,
    "300:499 = '300 - 499'; 500:549 = '500 - 549'; 550:599 = '550 - 599';
     600:649 = '600 - 649'; 650:699 = '650 - 699'; 700:749 = '700 - 749';
     750:799 = '750 - 799'; 800:849 = 'GE 800'; else = 'missing';"))
  iter.df$bin_age <- as.factor(recode(iter.df$age,
    "0:23 = '< 24mo.'; 24:72 = '24 - 72mo.'; 72:300 = '72 - 300mo'; else = 'missing';"))
  iter.df$bin_util <- as.factor(recode(iter.df$pct_utilize,
    "0.0:0.2 = ' 0 - 20%'; 0.2:0.4 = ' 20 - 40%'; 0.4:0.6 = ' 40 - 60%';
     0.6:0.8 = ' 60 - 80%'; 0.8:1.0 = ' 80 - 100%'; 1.0:1.2 = '100 - 120%';
     else = 'missing';"))
  iter.df$bin_varname2 <- as.factor(recode(iter.df$varname2_prop,
    "0:70 = '< 70%'; 70:85 = '70 - 85%'; 85:95 = '85 - 95%'; 95:110 = '95 - 110%';
     else = 'missing';"))
  iter.df$bin_varname1 <- relevel(iter.df$bin_varname1, 'missing')
  iter.df$bin_age <- relevel(iter.df$bin_age, 'missing')
  iter.df$bin_util <- relevel(iter.df$bin_util, 'missing')
  iter.df$bin_varname2 <- relevel(iter.df$bin_varname2, 'missing')
  #~ print(head(iter.df))
  if (i >= 6 & i <= 8){
    print('-')
    browser()
    print(i)
    print(table(iter.df$bin_varname1))
    print(table(iter.df$bin_age))
    print(table(iter.df$bin_util))
    print(table(iter.df$bin_varname2))
    #~ debug(predict.nov.2011[i] <-
    #~   sum(predict(logModel.1, newdata=iter.df, type='response')))
  }
  predict.nov.2011[i] <- sum(predict(logModel.1, newdata=iter.df, type='response'))
  print(predict.nov.2011[i])
}

Output
==
[1] 36.56073
[1] 561.4516
[1] 4.83483
[1] 5.01398
[1] 7.984146
[1] "-"
Called from: top level
Browse[1]
[1] 6

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799    GE 800
      842       283       690      1094      1695      3404      6659     18374     21562

   missing    < 24mo. 24 - 72mo. 72 - 300mo
        16       2997      19709      31881

   missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
     17906       4832       4599       5154       7205      14865         42

 missing    < 70% 70 - 85% 85 - 95% 95 - 110%
   10423    19429    10568     8350      5833

[1] 11.04090
[1] "-"
Called from: top level
Browse[1]
[1] 7

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799
      847       909      1059      1586      3214      6304     16349     24335

   missing    < 24mo. 24 - 72mo. 72 - 300mo
        16       2997      19709      31881

   missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
     17145       4972       4617       5020       6634      16139         76

 missing    < 70% 70 - 85% 85 - 95% 95 - 110%
   10423    19429    10568     8350      5833

Error in x %*% coef(object) : non-conformable arguments

Model
===
Large data regression model: bigglm(outcome ~ bin_varname1 + bin_varname2 + bin_age + bin_util + state + varname3 + varname3:state, family = binomial(link = logit), data = dev.data, maxit = 75, sandwich = FALSE)
Sample size = 1372250
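A plausible cause worth noting (an editor's hedged sketch, not an answer from the thread): the prediction multiplies the model matrix built from newdata by coef(object), so the two must conform column for column. Because as.factor() only creates levels for values actually present, a bin that exists in one iteration but not the next — such as the 'GE 800' bin present at iteration 6 but absent at iteration 7 above — shrinks the model matrix by one column. Pinning the factor levels to those of the fitting data avoids this:

```r
## Hypothetical fix: force every bin variable in newdata to carry the
## same level set as the fitting data (dev.data), even for empty bins.
for (v in c('bin_varname1', 'bin_varname2', 'bin_age', 'bin_util')) {
  iter.df[[v]] <- factor(iter.df[[v]], levels = levels(dev.data[[v]]))
}
```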
[R] Error: unexpected string constant
I want to analyze some point locations using ppp, but I have this problem:

Datos = read.table("puntos_texto.txt", dec=".", sep="\t", header=T)
summary(Datos)
       id               y                 x
 Min.   :   1.0   Min.   :1013581   Min.   :1177842
 1st Qu.: 821.2   1st Qu.:1014442   1st Qu.:1179658
 Median :1641.5   Median :1014455   Median :1179670
 Mean   :1641.8   Mean   :1014465   Mean   :1179652
 3rd Qu.:2462.8   3rd Qu.:1014473   3rd Qu.:1179680
 Max.   :3283.0   Max.   :1015575   Max.   :1180213

danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842,max(Datos$x)1180213),c(min(Datos$y)1013581,max(Datos$y)1015575))
Error: unexpected numeric constant in "danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842"

danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842,max(Datos$x)1180213),c(min(Datos$y)1013581,max(Datos$y)1015575))
Error: unexpected string constant in "danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842"

--
View this message in context: http://r.789695.n4.nabble.com/Error-unexpected-string-constant-tp3193987p3193987.html
Sent from the R help mailing list archive at Nabble.com.
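As a hedged editorial aside (not part of the original post): the parse error comes from expressions like min(Datos$x)1177842, where two values sit side by side with no operator or comma between them. A spatstat ppp() call of the form ppp(x, y, xrange, yrange) with two-element range vectors parses fine:

```r
library(spatstat)

## Either hard-code the observation window...
danta <- ppp(Datos$x, Datos$y, c(1177842, 1180213), c(1013581, 1015575))

## ...or derive it from the data:
danta <- ppp(Datos$x, Datos$y,
             c(min(Datos$x), max(Datos$x)),
             c(min(Datos$y), max(Datos$y)))
```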
Re: [R] Compare the risk
On Jan 7, 2011, at 1:58 PM, kiotoqq wrote:

I have a dataset which looks like this:

   cbr dust smoking expo
1    0 0.20       1    5
2    0 0.25       1    4
3    0 0.25       1    8
4    0 0.25       1    4
5    0 0.25       1    4
6    0 0.25       1    8
7    0 0.25       1    8
8    0 0.25       1    4
9    1 0.25       1    8
10   0 0.29       1    7
11   0 0.29       0    7
12   0 0.29       1    7
13   0 0.30       0   10
14   0 0.30       1   10
15   ...
16   ...
. . .

How can I compare the risk of getting bronchitis between smoker and no-smoker?

cbrsmk.tbl <- with(dfrm, table(cbr, smoking) )
cbrsmk.tbl
cbrsmk.tbl[2, ]/(cbrsmk.tbl[1, ]+cbrsmk.tbl[2, ])

(That homework set must be going awfully slowly.)

-- David Winsemius, MD West Hartford, CT
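A hedged, self-contained version of the same computation on made-up data (the data frame name, sizes, and probabilities below are invented), extended with the risk ratio:

```r
set.seed(42)
dfrm <- data.frame(cbr     = rbinom(100, 1, 0.2),   # bronchitis indicator
                   smoking = rbinom(100, 1, 0.4))   # smoking indicator

tbl  <- with(dfrm, table(cbr, smoking))
risk <- tbl[2, ] / colSums(tbl)   # P(cbr = 1) among non-smokers, smokers
risk["1"] / risk["0"]             # relative risk, smokers vs non-smokers
```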
Re: [R] Compare the risk
thank you.. but it says dfrm not found (sorry, I'm just very new here)

--
View this message in context: http://r.789695.n4.nabble.com/Compare-the-risk-tp3189084p3204440.html
Sent from the R help mailing list archive at Nabble.com.
[R] Trouble compiling R-2.10.0
Hi, I am having trouble compiling R-2.10.0 on the Solaris x86_64 platform using the native solaris-studio cc/F99 compilers. I am pretty sure that I have my environment set up for a 64-bit compile. However, when doing a make I get the following error:

/opt/solstudio12.2/bin/cc -m64 -I./api -I. -I../../../src/include -I../../../src/include -I/usr/local/include -DHAVE_CONFIG_H -KPIC -xO3 -c alone_decoder.c -o alone_decoder.o
"sysdefs.h", line 126: invalid type combination
"sysdefs.h", line 126: warning: useless declaration
"sysdefs.h", line 126: warning: typedef declares no type name
cc: acomp failed for alone_decoder.c
*** Error code 2
make: Fatal error: Command failed for target `alone_decoder.o'
Current working directory /gtools/src/R-2.10.0/src/extra/xz
*** Error code 1
The following command caused the error:
make liblzma.a
make: Fatal error: Command failed for target `R'
Current working directory /gtools/src/R-2.10.0/src/extra/xz
*** Error code 1
The following command caused the error:
(cd xz; make)
make: Fatal error: Command failed for target `make.xz'
Current working directory /gtools/src/R-2.10.0/src/extra
*** Error code 1
The following command caused the error:
for d in scripts include extra appl nmath unix main modules library; do \
  (cd ${d} && make R) || exit 1; \
done
make: Fatal error: Command failed for target `R'
Current working directory /gtools/src/R-2.10.0/src
*** Error code 1
The following command caused the error:
for d in m4 tools doc etc share src tests po; do \
  (cd ${d} && make R) || exit 1; \
done
make: Fatal error: Command failed for target `R'

I am perplexed as to what could be wrong. Any help would be much appreciated.

--
View this message in context: http://r.789695.n4.nabble.com/Trouble-compiling-R-2-10-0-tp3200883p3200883.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Compare the risk
On Jan 7, 2011, at 5:28 PM, kiotoqq wrote:

thank you.. but it says dfrm not found

Of course it says, dfrm not found. That's because it's on _my_ workspace. You did not say what you were calling your dataframe and I cannot read your mind. I had to make up some name, and the chances that I would call it the same name as you did are pretty small.

-- David Winsemius, MD West Hartford, CT
Re: [R] Compare the risk
thanks a lot! I got it!

--
View this message in context: http://r.789695.n4.nabble.com/Compare-the-risk-tp3189084p3204512.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Summing over specific columns in a matrix
Hi:

It's impolite on this list to hijack a thread to ask your own question. Please start a new thread in the future if you have a question of your own. Re your question, try this:

Data1$Daily <- with(Data1, PL_Pos - Costs)
Data1Sum <- aggregate(Daily ~ EnDate, data = Data1, FUN = sum)
# Merge the two data frames
merge(Result1, Data1Sum, by.x = 'TradeDates', by.y = 'EnDate', all.x = TRUE)

If this is the result you wanted, save it as Result1. If you don't want the NAs, remove the all.x = TRUE part in the merge() call.

HTH, Dennis

On Fri, Jan 7, 2011 at 10:21 AM, Mark Knecht markkne...@gmail.com wrote:

On Fri, Jan 7, 2011 at 8:42 AM, Henrique Dallazuanna www...@gmail.com wrote:

Try this:

rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE))

On Fri, Jan 7, 2011 at 2:29 PM, emj83 stp08...@shef.ac.uk wrote:

Hi, I would like to sum some specific columns in my matrix - for example, my matrix looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA   NA
[2,]    2    1   NA    1   NA
[3,]    3    2    1    2    1
[4,]    4    3    2    3    2
[5,]   NA   NA   NA    4    3
[6,]   NA   NA   NA    5   NA

I would like to find the sum of the first two columns, the second two columns and the last column: i.e. I am left with a vector of c(16, 18, 6).

Can you help me extend this example? I'd like to get (PL_Pos - Costs) for each row in Data1, sum those results for each matching date in Result1, and put the result in a new column in Result1 called 'Daily'. Been messing with this for an hour now. Nothing comes close.
Thanks, Mark Result1 = structure(list(TradeDates = structure(c(14249, 14250, 14251, 14252, 14253, 14256, 14257, 14258, 14259, 14260, 14263, 14264 ), class = Date)), .Names = TradeDates, row.names = c(NA, 12L), class = data.frame) Data1 = structure(list(Trade = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), PosType = c(1, 1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1), EnDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274), class = Date), EnTime = c(1406, 1318, 838, 846, 846, 1038, 1102, 918, 838, 950, 1134, 1254, 1110, 846, 1318, 854, 950, 838, 1246, 838, 854, 902), ExDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274 ), class = Date), ExTime = c(1515, 1515, 1030, 942, 1030, 1046, 1110, 1515, 942, 1030, 1142, 1515, 1515, 1030, 1326, 1515, 1515, 1030, 1515, 1515, 1515, 1022), PL_Pos = c(133.5, -41.5, 171, 483.5, 333.5, -29, -54, -291.5, 596, -141.5, -54, 558.5, 533.5, 521, -41.5, 883.5, 358.5, -979, -191.5, 196, -791.5, 446), Costs = c(29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29 ), PL = c(104.5, -70.5, 142, 454.5, 304.5, -58, -83, -320.5, 567, -170.5, -83, 529.5, 504.5, 492, -70.5, 854.5, 329.5, -1008, -220.5, 167, -820.5, 417)), .Names = c(Trade, PosType, EnDate, EnTime, ExDate, ExTime, PL_Pos, Costs, PL ), row.names = c(NA, 22L), class = data.frame) Result1 Data1 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
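Henrique's one-liner from earlier in the thread can be checked directly against emj83's example matrix (re-created here from the posted values):

```r
m <- rbind(c( 1, NA, NA, NA, NA),
           c( 2,  1, NA,  1, NA),
           c( 3,  2,  1,  2,  1),
           c( 4,  3,  2,  3,  2),
           c(NA, NA, NA,  4,  3),
           c(NA, NA, NA,  5, NA))

## group columns 1-2, 3-4, and 5, then sum within each group
rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE))
##  1  2  3
## 16 18  6
```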
[R] Trouble with installing Rmpi package
Hi, I am having a problem with installing the Rmpi package on a Red Hat Linux machine. The R I am using is version 2.10.1. Here's what happens.

install.packages( 'Rmpi' )
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.cnr.Berkeley.edu/src/contrib/Rmpi_0.5-9.tar.gz'
Content type 'application/x-gzip' length 87953 bytes (85 Kb)
opened URL
==
downloaded 85 Kb

* installing *source* package 'Rmpi' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking mpi.h usability... no
checking mpi.h presence... no
checking for mpi.h... no
configure: error: Cannot find mpi.h header file
ERROR: configuration failed for package 'Rmpi'
* removing '/usr/local/lib64/R/library/Rmpi'

The downloaded packages are in '/tmp/Rtmp1J1kDj/downloaded_packages'
Updating HTML index of packages in '.Library'
Warning message:
In install.packages(Rmpi) :
  installation of package 'Rmpi' had non-zero exit status

library ('Rmpi' )
Error in library(Rmpi) : there is no package called 'Rmpi'

I think it is upset because the file mpi.h is missing. Am I right? If so, how would I cure this problem?
And if not, what must I do? Please help. Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu
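A hedged editorial sketch of the usual remedy (paths and package names below are illustrative and depend on the system): install the distribution's MPI development package so that mpi.h exists, then point Rmpi's configure script at the MPI installation via configure.args.

```r
## The configure.args string is passed straight through to Rmpi's
## configure script; adjust the prefix to wherever your MPI lives.
install.packages("Rmpi",
                 configure.args = "--with-mpi=/usr/lib64/openmpi")
```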
[R] help with environments
Dear list,

Although I tried to make a simple example, this may be too lengthy/unclear of a question for anyone to want to provide an answer. I have been trying to write two functions. My goal is to have an outer function (foo) that will do some computations based on data created by the nested function (f). In particular I want to: 1) create a new data frame using the user's data, when specified; 2) replace the user's variable names with standard names (particularly the argument names); 3) when the user does not specify an argument, possibly use defaults. To do this, I have been trying to pass the results of match.call() (with some modifications if an argument was not specified) to eval(), using the user-specified data. Evidently, I do not have a very thorough understanding of environments in R and have met with a spectacular variety of failures. I am also open to suggestions of alternate ways to do this; from my rut I do not see any other elegant solution. As always, insight, suggestions, or recommended readings are greatly appreciated.

Josh

##
f <- function(x, y) {
  d <- as.vector(match.call()[-1L], "list")
  if (missing(y)) d$y <- 1
  output <- as.data.frame(d)
}

foo <- function(data, value) {eval(substitute(value), data)}

mydf <- data.frame(var1 = 1:10, var2 = letters[1:10])

## desired use
foo(data = mydf, value = f(var1))

## desired output
    x y
1   1 1
2   2 1
3   3 1
4   4 1
5   5 1
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1

## Actual output
# Error in data.frame(x = var1, y = 1, check.names = TRUE,
#   stringsAsFactors = TRUE) :
#   object 'var1' not found
##

It has the right names. This works:

eval(substitute(data.frame(x = var1, y = 1)), mydf)

## it is like I am one environment too nested

--
Joshua Wiley
Ph.D.
Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
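[Editor's note: a sketch of one way around the "one environment too nested" problem, under the assumption that this matches the intended design. The failure happens because as.data.frame(d) evaluates var1 inside f's own frame, where it does not exist. Instead, f can assemble the data.frame() call and evaluate it in parent.frame() -- which, when f is invoked via eval(substitute(value), data) in foo, is the environment built from the user's data:]

```r
# Sketch: build the data.frame() call from match.call() and evaluate it
# in the caller's frame, where foo has made the user's data visible.
f <- function(x, y) {
  d <- as.list(match.call())[-1L]        # unevaluated arguments, named x/y
  if (missing(y)) d$y <- 1               # default when y is not supplied
  cl <- as.call(c(quote(data.frame), d)) # data.frame(x = var1, y = 1)
  eval(cl, parent.frame())               # evaluate where var1 is visible
}

foo <- function(data, value) eval(substitute(value), data)

mydf <- data.frame(var1 = 1:10, var2 = letters[1:10])
foo(data = mydf, value = f(var1))
# a data frame with columns x = 1:10 and y = 1 (recycled)
```

This works because parent.frame() inside a function called during eval(expr, envir) is envir itself, so the data columns resolve there first, falling back to the enclosing scope for anything else.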
Re: [R] Plotting Factors -- Sorting x-axis
That rather depends on what kind of plot you want to use. Here is one option that you can use without any changes:

##
con <- textConnection("Months Prec
1 Jan 102.1
2 Feb 69.7
3 Mar 44.7
4 Apr 32.1
5 May 24.0
6 Jun 18.7
7 Jul 14.0
8 Aug 20.0
9 Sep 32.4
10 Oct 58.9
11 Nov 94.5
12 Dec 108.2")
tab <- read.table(con)
closeAllConnections()
rm(con)

with(tab, barplot(Prec, names = as.character(Months)))
##

If you want to adjust the factor so that the levels remain in natural month order, then:

tab <- within(tab, Months <- factor(Months, levels = month.abb))
with(tab, barplot(Prec, names = levels(Months)))

________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Taylor, Eric HLS:EX [eric.tay...@gov.bc.ca]
Sent: 07 January 2011 09:13
To: 'r-h...@stat.math.ethz.ch'
Subject: [R] Plotting Factors -- Sorting x-axis

Hello; How do I plot these data in R without the Months being ordered alphabetically?

Months Prec
1 Jan 102.1
2 Feb 69.7
3 Mar 44.7
4 Apr 32.1
5 May 24.0
6 Jun 18.7
7 Jul 14.0
8 Aug 20.0
9 Sep 32.4
10 Oct 58.9
11 Nov 94.5
12 Dec 108.2

Eric

Eric Taylor, Air Quality Meteorologist, Health Protection, Ministry of Health Services
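[Editor's note: the releveling trick above is worth seeing in isolation -- a minimal sketch, assuming nothing beyond base R, showing why the months plot alphabetically by default and how explicit levels fix the ordering:]

```r
months <- c("Jan","Feb","Mar","Apr","May","Jun",
            "Jul","Aug","Sep","Oct","Nov","Dec")

# By default, factor() sorts its levels alphabetically:
f1 <- factor(months)
head(levels(f1), 3)   # "Apr" "Aug" "Dec"

# Supplying levels explicitly keeps the natural calendar order;
# month.abb is a built-in constant holding the abbreviated month names:
f2 <- factor(months, levels = month.abb)
levels(f2)            # "Jan" "Feb" "Mar" ... "Dec"
```

Any plot that draws categories in level order (barplot, boxplot, lattice, ggplot2) will then follow the calendar rather than the alphabet.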
Re: [R] anova vs aov commands for anova with repeated measures
lm() and aov() are not fully equivalent. They both fit linear models, but they use different algorithms, and this allows aov, for example, to handle some simple multistratum models. The algorithm used by lm does not allow this, but it has other advantages for simpler models. If you want to fit a multistratum model, such as a repeated-measures model, you need to use aov. When it comes to finding the residuals, you have to be explicit about which residuals you mean, too: you get residuals for each stratum in a multistratum model. Using plain old resid() will not work, as that function can only be used when there is only one kind of residuals vector defined. (It could be made to do something sensible, but that's another issue. Right now, it doesn't.) The function proj() can be used on a fitted model object to obtain the projections, at each stratum, onto the subspaces defined by the terms in the model, and includes the residuals (the projection onto the orthogonal complement of the model space in R^n) as the final column of the matrix of projections. These are easy to dig out and you can analyse away. See ?proj for an example of its use, but you will need to dig out the appropriate column yourself.
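[Editor's note: a minimal sketch of digging out stratum residuals with proj(), using the npk dataset from the ?proj example as a stand-in for the poster's data; the poster's own model would substitute response ~ stimulus*condition + Error(subject/(stimulus*condition)) and the appropriate stratum name:]

```r
# Extract per-stratum residuals from a multistratum aov() fit.
# npk (a built-in dataset) plays the role of the poster's repeated-measures data.
fit <- aov(yield ~ N * P * K + Error(block), data = npk)

pr  <- proj(fit)                      # one projection matrix per stratum
names(pr)                             # "(Intercept)" "block" "Within"
res <- pr[["Within"]][, "Residuals"]  # residuals of the Within stratum

# Normality checks on the within-stratum residuals:
qqnorm(res); qqline(res)
shapiro.test(res)
```

The same pattern applies to any Error() model: pick the stratum of interest from the list proj() returns and take its "Residuals" column.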
________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Frodo Jedi [frodo.j...@yahoo.com]
Sent: 08 January 2011 01:51
To: r-help@r-project.org
Subject: [R] anova vs aov commands for anova with repeated measures

Dear all,

I need to understand something about the behaviour of the two functions aov and anova in the following case, involving an ANOVA with repeated measures. If I use the following command I don't get any problem:

aov1 <- aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
summary(aov1)

Instead, if I try to fit the same model for the regression I get an error:

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards. I want to use lm() because I want to analyse the residuals in order to check normality, and see if the ANOVA assumption of normality still holds. Could you please help me understand how to do this?

Thanks in advance
Best regards