[R] favorable mention and use of R on the net

2014-09-03 Thread Ross Boylan
http://equitablegrowth.org/2014/08/26/moving-corrected-calculations-last-weeks-shiller-stock-market-posts-r-afternoon-note/
Begins 

 Because only people who really, really, really want to make bad
 mistakes do things in the un-debuggable Excel (or Numbers)…
 
and then proceeds with an extended analysis in R.  He may have used
Excel in the earlier posts; the comment is certainly a reference to an
infamous series of errors in a spreadsheet behind a much-publicized
claim that it was disastrous for countries to have debt > 90% of GDP.  See
http://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt for details.

The background question to the blog post is whether current stock market
valuations indicate stocks are likely to be a poor medium-term
investment (10 years).

A quick look does not reveal to me what, if any, substantive changes to
the conclusions resulted from using R rather than Excel (or even whether
his previous conclusions were based on Excel).  In fact, it's rather hard
to tell what the conclusion is!


A couple of more technical comments:

DeLong (the post's author) uses simple regression (lm) and then observes

 The significance levels that R reports are wrong: its naive regression
 package assumes that each of the 1482 observed 10-year returns is
 independent of each of the others. They are not.
I assume R has some tools that can do proper time series analyses; I'm
not sure why he didn't use them.  DeLong is an economic historian, not
an econometrician.
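For what it's worth, R does have such tools.  Here is a minimal sketch of one standard approach, Newey-West (HAC) standard errors, which account for the serial correlation that overlapping 10-year return windows induce.  The sandwich and lmtest packages are assumed installed, and the cape/ret10 series below are simulated stand-ins, not DeLong's data:

```r
# Hedged sketch: HAC (Newey-West) standard errors for a regression whose
# residuals are serially correlated, as with overlapping 10-year returns.
# 'cape' and 'ret10' are simulated stand-ins for the actual series.
library(sandwich)  # NeweyWest()
library(lmtest)    # coeftest()

set.seed(1)
n     <- 200
cape  <- as.numeric(arima.sim(list(ar = 0.95), n))              # persistent predictor
ret10 <- 0.5 * cape + as.numeric(arima.sim(list(ar = 0.9), n))  # AR(1) errors

fit <- lm(ret10 ~ cape)
coeftest(fit)                         # naive (iid) standard errors
coeftest(fit, vcov = NeweyWest(fit))  # autocorrelation-robust standard errors
```

With strong residual autocorrelation the HAC standard errors are typically several times larger than the naive ones, which is exactly DeLong's point about the reported significance levels.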

One important observation stems from something even simpler than
regression:
 Basically what we know about expected returns is that on the one
 occasion when CAPE rose above 30, the dot-com crash of 2000 was in the
 near future and the housing crash of 2008 came into the ten-year
 return window. That is not much information on which to base a
 long-run sell decision.
 
Ross Boylan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] wilcox.test - difference between p-values of R and online calculators

2014-09-03 Thread Tal Galili
It seems your numbers have ties. What happens if you run wilcox.test with
correct=FALSE? Will the results then be the same as the online calculators?



Contact
Details:---
Contact me: tal.gal...@gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--



On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox bradk...@mit.edu wrote:

 Hi.

 I'm taking the long-overdue step of moving from using online calculators to
 compute results for Mann-Whitney U tests to a more streamlined system
 involving R.

 However, I'm finding that R computes a different result than the 3 online
 calculators that I've used before (all of which approximately agree). These
 calculators are here:

 http://elegans.som.vcu.edu/~leon/stats/utest.cgi
 http://vassarstats.net/utest.html
 http://www.socscistatistics.com/tests/mannwhitney/

 An example calculation is


 wilcox.test(c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,359,359,359,359,359,359,303,359,359,359),c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,105,359,359,298,359,359,359,28.6,359,359,128))

 which prints

 Wilcoxon rank sum test with continuity correction

 data:  c(359, 359, 359, 359, 359, 359, 335, 359, ...) and c(332, 85, 359, 359, 359, 220, 231, 300, ...)
 W = 485, p-value = 0.0002594
 alternative hypothesis: true location shift is not equal to 0

 Warning message:
 In wilcox.test.default(c(359, 359, 359, 359, 359, 359, 335, 359, :
   cannot compute exact p-value with ties


 However, all of the online calculators find p-values close to 0.0025, 10x
 the value output by R. All results are for a two-tailed case. Importantly,
 the W value computed by R *does agree* with the U values output by the
 first two online calculators listed above, yet it has a different p-value.

 Can anyone shed some light on how and why R's calculation differs from that
 of these online calculators? Thanks for your time.

 
 W. Bradley Knox, PhD
 http://bradknox.net
 bradk...@mit.edu





Re: [R] shiny datatables column filtering plugin

2014-09-03 Thread Charles Determan Jr
Thank you for checking, Yihui. On the off chance, are you familiar with any
other methods to filter on multiple conditions?


On Tue, Sep 2, 2014 at 11:07 PM, Yihui Xie x...@yihui.name wrote:

 I just tested it and this plugin does not seem to work with the new
 .DataTable() API in DataTables 1.10.x, so I guess it is unlikely to
 make it work in (the current development version of) shiny. It is not
 in the official list of plugins, either:
 http://www.datatables.net/extensions/index

 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Web: http://yihui.name


 On Tue, Sep 2, 2014 at 11:59 AM, Charles Determan Jr deter...@umn.edu
 wrote:
  Greetings,
 
  I am currently exploring some capabilities of the 'Shiny' package.  I am
  currently working with the most recent version of 'shiny' from the
 rstudio
  github repository (version - 0.10.1.9006) in order to use the most up to
  date datatables plugin.  Using the ggplot2 diamonds dataset, I can easily
  set columns as unsearchable (commented out below) and I could also subset
  out all the 'Ideal' diamonds for example, however I cannot filter out
  multiple conditions such as 'Ideal' and 'Fair' diamonds together.  From my
  searching, this multiple filtering can be done with checkboxes in the
  column using the jquery column filtering plugin
  (http://jquery-datatables-column-filter.googlecode.com/svn/trunk/checkbox.html).
  Despite this, I cannot get this plugin to work with my shiny app.  Any
  insight would be appreciated.
 
  library(shiny)
  library(ggplot2)
  runApp(
    list(ui = basicPage(
      h1('Diamonds DataTable with TableTools'),

      # added column filter plugin
      singleton(tags$head(tags$script(
        src = 'https://code.google.com/p/jquery-datatables-column-filter/source/browse/trunk/media/js/jquery.dataTables.columnFilter.js',
        type = 'text/javascript'))),
      dataTableOutput('mytable')
    )
    , server = function(input, output) {
      output$mytable = renderDataTable({
        diamonds[, 1:6]
      }, options = list(
        pageLength = 10,
        # columnDefs = I('[{"targets": [0,1], "searchable": false}]')
        columnFilter = I('[{
          "columnDefs": [{"targets": [0,1], "type": "checkbox"}]
        }]')
      ))
    }
  ))
 
 
 
  Charles



Charles



[R] .C and .Fortran

2014-09-03 Thread Filippo Monari

Hi,
I'd like to know what is the difference between the functions .C() and
.Fortran().
I noticed that by enclosing my F90 subroutines in modules, .Fortran()
can't find them any more in the load table, while .C() still can. I also
checked that the subroutine was loaded with the is.loaded() function...

So can anyone explain to me the difference and which is better to use?
Thanks in advance,
Filippo



Re: [R] frequencies of a discrete numeric variable, including zeros

2014-09-03 Thread Michael Friendly

Thanks to all who replied to this thread.

To summarize, John Fox and William Dunlap's suggestion amounts to this
plot, where it becomes *crucial* to eliminate the zeros (otherwise they
would not be distinguishable from the counts of 1, with points()):

# Fox/Dunlap plot, using the plot.table method
art.tab0 <- table(art)
plot(art.tab0, ylab="Frequency", xlab="Number of articles")
points(as.numeric(names(art.tab0)), art.tab0, pch=16)

Here, I actually prefer the barplot, using factor() to retain the zeros:

# coerce to a factor, then use table()
art.fac <- factor(art, levels = 0:19)
art.tab <- table(art.fac)
barplot(art.tab, ylab="Frequency", xlab="Number of articles")

However, the frequencies for small values of art dominate the display,
and I'm contemplating a Poisson regression anyway, so why not plot on a
log scale:

# plot on log scale, but start at 1 to avoid log(0)
barplot(art.tab+1, ylab="log(Frequency+1)", xlab="Number of articles",
        log="y")

# plot on log scale, directly
barplot(log(art.tab+1), ylab="log(Frequency+1)", xlab="Number of articles")

The first method, using log="y", gives axis labels on the scale of frequency.

best,
-Michael


On 9/2/2014 3:49 PM, John Fox wrote:

Hi Michael,

I think that histograms are intrinsically misleading for discrete data, and
that while bar graphs are an improvement, they also invite
misinterpretation. I generally do something like this:

f <- table(factor(art, levels=0:19))
plot(as.numeric(names(f)), as.numeric(f), type="h",
     xlab="art", ylab="frequency", axes=FALSE)
axis(1, pos=0, at=0:19)
axis(2)
points(as.numeric(names(f)), f, pch=16)
abline(h=0)


Actually, I prefer omitting the points corresponding to 0 counts, which is
even simpler:

f <- table(art)
plot(as.numeric(names(f)), as.numeric(f), type="h",
     xlab="art", ylab="frequency", axes=FALSE)
axis(1, pos=0, at=min(art):max(art))
axis(2)
points(as.numeric(names(f)), f, pch=16)
abline(h=0)


Best,
  John

---
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
project.org] On Behalf Of Michael Friendly
Sent: Tuesday, September 02, 2014 1:29 PM
To: R-help
Subject: [R] frequencies of a discrete numeric variable, including
zeros

The data vector, art, given below using dput(), gives a set of discrete
numeric values for 915 observations, in the range 0:19.  I want to make
some plots of the frequency distribution, but the standard tools (hist,
barplot, table) don't give me what I need for a custom plot, because
some of the counts in 0:19 have zero frequency.

table() excludes the values of art that occur with zero frequency, and
these are likewise missing from barplot():
> table(art)
art
  0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
275 246 178  84  67  27  17  12   1   2   1   1   2   1   1
> barplot(table(art))


A direct calculation, using colSums of outer() gives me the values I
want, but this seems unnecessarily
complicated for this simple task.

> (art.freq <- colSums(outer(art, 0:19, `==`)))
 [1] 275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0   0   1
barplot(art.freq, names.arg=0:19)


Moreover, I was surprised by the result of hist() on this data, because
the 0 & 1 counts from the above were combined in this call:

> art.hist <- hist(art, breaks=0:19, plot=FALSE)
> art.hist$breaks
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
> art.hist$counts
 [1] 521 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0   0   1

Is there some option I missed here?

The data:

> dput(art)
c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
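A likely answer to Michael's hist() puzzle, offered here as a sketch rather than a definitive diagnosis: with breaks = 0:19 the first cell is the closed interval [0,1] (include.lowest = TRUE and right = TRUE by default), so the 275 zeros and 246 ones are pooled into 521.  Half-integer breaks give one cell per integer value (art is reconstructed below from the table of frequencies quoted above):

```r
# Reconstruct art from the frequency table in the post (915 observations)
art <- rep(c(0:12, 16, 19),
           c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

# With breaks = 0:19, the first cell [0,1] is closed on both ends, pooling
# the 0s and 1s.  Half-integer breaks put each integer in its own cell:
art.hist2 <- hist(art, breaks = seq(-0.5, 19.5, by = 1), plot = FALSE)
art.hist2$counts
# [1] 275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0   0   1
```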

[R] strange date format after extracting calendar data with RSQLite

2014-09-03 Thread Nuno Prista
Hi,

I have recently downloaded RSQLite with the intention of using it to access
Google Calendar data stored in Thunderbird's cache.sqlite and
local.sqlite. Thunderbird exports the data just fine as a .ics file,
but when I access it directly from R using the RSQLite function dbGetQuery
I get strange dates. As examples:

a date stored as
20140818
appears to me in R as
1129528320

a date stored as
20140823
appears to me in R as
-1663476736

I have not been able to make sense of this and I have found no way to
convert it back. I am wondering if anyone has experienced these issues
or has any suggestions as to what is causing this and how to correct it.
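One hedged guess at what is happening: Lightning (the Thunderbird calendar) stores event times as PRTime, i.e. 64-bit microseconds since 1970-01-01 UTC, and values like -1663476736 look like such a 64-bit integer truncated to 32 bits on the way into R.  If so, the conversion for a value fetched intact would be:

```r
# Hedged sketch: PRTime is microseconds since the Unix epoch (UTC).
prtime <- 1408320000000000   # hypothetical intact value: 2014-08-18 00:00:00 UTC
as.POSIXct(prtime / 1e6, origin = "1970-01-01", tz = "UTC")

# If the driver truncates 64-bit integers, do the division inside SQLite,
# which works in 64 bits, e.g. (the column/table names here are assumptions
# about the Lightning schema):
# dbGetQuery(con, "SELECT event_start / 1000000 AS start_sec FROM cal_events")
```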

Thank you,

Nuno




[R] predictNLS {propagate} bug for multiple predictors?

2014-09-03 Thread Patrick Gauthier
Many R users have requested means to generate prediction/confidence
intervals for their non-linear models in R. The propagate package offers
this. However, I'm having issues when applying predictNLS to my model.
Has anyone else had issues with predictNLS?

Here is my simple problem.

t <- c(194.84, 95.62, 40.87, 31.64, 0.00)
m <- c(0.00, 2.29, 4.72, 10.95, 28.36)
t0 <- t[1]
m0 <- m[5]

dat <- data.frame(t = t, m = m)

nlls <- nls(m ~ (m0 ^ (1 / lambda) - (t * m0 / t0) ^ (1 / lambda)) ^ lambda,
            start = list(lambda = 1), data = dat)

xv <- seq(0, t0, length.out = 100)
library(propagate)

predictNLS forces me to redefine m0 and t0, which are constants in the
model.

predictNLS(nlls, newdata = data.frame(t = xv))

Error in predictNLS(nlls, newdata = data.frame(t = xv, t0 = t0, m0 = m0)) :
  newdata should have name(s) m0, t, t0!

OK, predict.nls doesn't have this issue. But let's just give it what it
wants. Notice the predictors need to be defined in the right order as
well.

predictNLS(nlls, newdata = data.frame(m0 = m0, t = xv, t0 = t0))

Propagating predictor value #1 ...
Error in propagate(expr = EXPR, data = DF, use.cov = COV, alpha = alpha,  :
  Names of input dataframe and var-cov matrix do not match!

I'm not sure what else I can do here to fix the problem, as the issue seems
to be related to names, which predictNLS has accepted up until its internal
use of propagate(). My model is quite simple (i.e., plot the data and see).
predictNLS works fine on the example provided in the package's
documentation, which has only one predictor. Could this be where
the error is being generated (i.e., multiple predictors)? What am I doing
wrong here?
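One possible workaround, offered as a sketch only (untested against propagate's internals): substitute the constants m0 and t0 into the formula before fitting, so that the fitted model contains only t and lambda, and predictNLS should then need only t in newdata:

```r
# Hedged workaround: bake the constants into the formula so the model
# has a single predictor, t.
t  <- c(194.84, 95.62, 40.87, 31.64, 0.00)
m  <- c(0.00, 2.29, 4.72, 10.95, 28.36)
t0 <- t[1]
m0 <- m[5]
dat <- data.frame(t = t, m = m)

# substitute() replaces the symbols m0 and t0 with their numeric values
form <- as.formula(substitute(
  m ~ (m0^(1/lambda) - (t * m0 / t0)^(1/lambda))^lambda,
  list(m0 = m0, t0 = t0)))

nlls <- nls(form, start = list(lambda = 1), data = dat)
# predictNLS(nlls, newdata = data.frame(t = seq(0, t0, length.out = 100)))
```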

Thanks for any help/advice,

-- 
Patrick Gauthier
PhD Candidate
Faculty of Natural Resources Management
Lakehead University
Office: BB-1007-D (tel. 807 766-7127)
Lab: BB-1071-A (tel. 807 343-8739)
http://abel.lakeheadu.ca/index.shtml
http://forward.lakeheadu.ca/gradstudents.htm



Re: [R] wilcox.test - difference between p-values of R and online calculators

2014-09-03 Thread David L Carlson
That does not change the results. The problem is likely to be the way ties are 
handled. The first sample has 25 values of which 23 are identical (359). The 
second sample has 26 values of which 12 are identical (359). The difference 
between the implementations may be a result of the way the ties are ranked. For 
example the R function rank() offers 5 different ways of handling the rank on 
tied observations. With so many ties, that could make a substantial difference.

Package coin has wilcox_test(), which can use Monte Carlo simulation to
estimate the null distribution (and hence the p-value).
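To make the tie handling concrete, here is a hedged sketch comparing wilcox.test's normal approximation with a Monte Carlo permutation p-value from coin (the coin package is assumed installed; the data are a reordering of the two samples from the original post):

```r
# Hedged sketch: normal approximation vs. Monte Carlo permutation p-value
# for heavily tied data (package 'coin' assumed installed).
library(coin)

x <- c(rep(359, 23), 335, 303)                     # sample 1: 25 values
y <- c(332, 85, 220, 231, 300, 237, 183, 286, 355,
       250, 105, 298, 28.6, 128, rep(359, 12))     # sample 2: 26 values

wilcox.test(x, y)   # normal approximation; warns about ties

dat <- data.frame(value = c(x, y),
                  group = factor(rep(c("x", "y"), c(length(x), length(y)))))
# Monte Carlo permutation of the observed (tied) ranks
# (older coin versions spell this approximate(B = 10000))
wilcox_test(value ~ group, data = dat,
            distribution = approximate(nresample = 10000))
```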

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Tal Galili
Sent: Wednesday, September 3, 2014 5:24 AM
To: W Bradley Knox
Cc: r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online 
calculators

It seems your numbers have ties. What happens if you run wilcox.test with
correct=FALSE? Will the results then be the same as the online calculators?



Contact
Details:---
Contact me: tal.gal...@gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--





Re: [R] shiny datatables column filtering plugin

2014-09-03 Thread Yihui Xie
The built-in version of DataTables in shiny already supports
numeric ranges. For a numeric column x in the data, if you type a,b in the
search box, the data will be filtered using a <= x <= b. Check
boxes are not supported, but you can use regular expressions (more
flexible) to achieve the same thing, e.g. (this example requires the
development version of shiny:
https://groups.google.com/forum/#!topic/shiny-discuss/-0u-wTnq_lA)

library(shiny)
runApp(list(
  ui = fluidPage(
    dataTableOutput('mytable')
  ),
  server = function(input, output) {
    output$mytable = renderDataTable(
      iris[sample(nrow(iris)), ],
      options = list(search = list(regex = TRUE))
    )
  }
))


Then you can search for ^setosa|versicolor$, which means both setosa
and versicolor in the iris data. Or 4,5 in the search box of
Sepal.Length to filter this column. Depending on what you want, this
may or may not be enough.

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Web: http://yihui.name





Re: [R] .C and .Fortran

2014-09-03 Thread Prof Brian Ripley

On 03/09/2014 13:28, Filippo Monari wrote:

Hi,
I'd like to know what is the difference between the functions .C() and
.Fortran().
I noticed that by enclosing my F90 subroutines in modules, .Fortran()
can't find them any more in the load table, while .C() still can. I also
checked that the subroutine was loaded with the is.loaded() function...
So can anyone explain to me the difference and which is better to use?


This is not the right list: see the posting guide.  But .Fortran is 
intended for *Fortran 77* code (as the help page says), and maps the 
supplied NAME argument in the same way as the Fortran 77 compiler does, 
which is often different from the way the C compiler does.


I would strongly recommend that new code use .Call and a C wrapper for the
F90 code: it is a safer and more portable route.
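To illustrate the route Prof. Ripley recommends, here is a hedged sketch of such a wrapper (the function name, signature, and Fortran module are invented for the example; the Fortran side uses iso_c_binding to export a C-callable symbol):

```c
/* Hedged sketch: a .Call wrapper around a Fortran module procedure.
 * Assumed Fortran side (not part of this file):
 *
 *   module mymod
 *     use iso_c_binding
 *   contains
 *     subroutine dscale(n, x) bind(C, name = "dscale")
 *       integer(c_int), value :: n
 *       real(c_double) :: x(n)
 *       x = 2.0_c_double * x
 *     end subroutine
 *   end module
 */
#include <R.h>
#include <Rinternals.h>

void dscale(int n, double *x);   /* provided by the Fortran module above */

SEXP C_dscale(SEXP x)
{
    /* copy the input so the caller's vector is not modified in place */
    SEXP out = PROTECT(duplicate(coerceVector(x, REALSXP)));
    dscale(LENGTH(out), REAL(out));
    UNPROTECT(1);
    return out;
}
/* R side, after compiling and dyn.load()-ing the shared object:
 *   .Call("C_dscale", as.numeric(1:3))
 */
```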


If you want any more help, you need to follow the posting guide and

- post to R-devel,
- supply the 'at a minimum information' asked for there,
- supply the minimal example asked for, and what the messages you see are.


Thanks in advance,
Filippo




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK



Re: [R] wilcox.test - difference between p-values of R and online calculators

2014-09-03 Thread David L Carlson
Since they all have the same W/U value, it seems likely that the difference is 
how the different versions adjust the standard error for ties. Here are a 
couple of posts addressing the issues of ties:

http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and

David C

From: wbradleyk...@gmail.com [mailto:wbradleyk...@gmail.com] On Behalf Of W 
Bradley Knox
Sent: Wednesday, September 3, 2014 9:20 AM
To: David L Carlson
Cc: Tal Galili; r-help@r-project.org
Subject: Re: [R] wilcox.test - difference between p-values of R and online 
calculators

Tal and David, thanks for your messages.

I should have added that I tried all variations of true/false values for the 
exact and correct parameters. Running with correct=FALSE makes only a tiny 
change, resulting in W = 485, p-value = 0.0002481.

At one point, I also thought that the discrepancy between R and these online 
calculators might come from how ties are handled, but the fact that R and two 
of the online calculators reach the same U/W values seems to indicate that ties 
aren't the issue, since (I believe) the U or W value contains all of the 
information needed to calculate the p-value, assuming the number of samples is 
also known for each condition. (However, it's been a while since I looked into 
how MWU tests work, so maybe now's the time to refresh.) If that's correct, the 
discrepancy seems to be based in what R does with the W value that is identical 
to the U values of two of the online calculators. (I'm also assuming that U and 
W have the same meaning, which seems likely.)

- Brad


W. Bradley Knox, PhD
http://bradknox.net
bradk...@mit.edu


Re: [R] shiny datatables column filtering plugin

2014-09-03 Thread Charles Determan Jr
Thank you Yihui, this would certainly work for me; however, I am having
trouble getting the regex to work appropriately.  I am using the
development version of shiny and have copied your code.  I launch the app
and the filtering of numbers works fine (i.e. 4,5), but searching for
setosa and versicolor gives me a blank datatable.  Is there some dependency
I am missing that would prevent this regex from working with shiny?


On Wed, Sep 3, 2014 at 11:27 AM, Yihui Xie x...@yihui.name wrote:

 The built-in version of DataTables in shiny has already supported
 numeric ranges. For a numeric column x in data, if you type a,b in the
 search box, the data will be filtered using a = x = b. The check
 boxes are not supported, but you can use regular expressions (more
 flexible) to achieve the same thing, e.g. (this example requires the
 development version of shiny:
 https://groups.google.com/forum/#!topic/shiny-discuss/-0u-wTnq_lA)

 library(shiny)
 runApp(list(
   ui = fluidPage(
     dataTableOutput('mytable')
   ),
   server = function(input, output) {
     output$mytable = renderDataTable(
       iris[sample(nrow(iris)), ],
       options = list(search = list(regex = TRUE))
     )
   }
 ))


 Then you can search for ^setosa|versicolor$, which matches both setosa
 and versicolor in the iris data, or type 4,5 in the search box of
 Sepal.Length to filter that column. Depending on what you want, this
 may or may not be enough.
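 (As a plain-R sanity check of that pattern, independent of shiny -- the
 alternation binds each anchor to its own branch, which is what makes it
 select exactly those two species:)

 ```r
 # "^setosa" matches strings starting with setosa; "versicolor$" matches
 # strings ending with versicolor; | combines the two alternatives
 grepl("^setosa|versicolor$", c("setosa", "versicolor", "virginica"))
 # TRUE TRUE FALSE
 ```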

 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Web: http://yihui.name


 On Wed, Sep 3, 2014 at 7:12 AM, Charles Determan Jr deter...@umn.edu
 wrote:
  Thank you for checking Yihui, on the off chance are you familiar with any
  other methods to filter on multiple conditions?
 
 
  On Tue, Sep 2, 2014 at 11:07 PM, Yihui Xie x...@yihui.name wrote:
 
  I just tested it and this plugin does not seem to work with the new
  .DataTable() API in DataTables 1.10.x, so I guess it is unlikely to
  make it work in (the current development version of) shiny. It is not
  in the official list of plugins, either:
  http://www.datatables.net/extensions/index
 
  Regards,
  Yihui
  --
  Yihui Xie xieyi...@gmail.com
  Web: http://yihui.name
 
 
  On Tue, Sep 2, 2014 at 11:59 AM, Charles Determan Jr deter...@umn.edu
  wrote:
    Greetings,
   
    I am currently exploring some capabilities of the 'Shiny' package.  I am
    currently working with the most recent version of 'shiny' from the
    rstudio github repository (version 0.10.1.9006) in order to use the most
    up to date datatables plugin.  Using the ggplot2 diamonds dataset, I can
    easily set columns as unsearchable (commented out below) and I could
    also subset out all the 'Ideal' diamonds for example; however, I cannot
    filter on multiple conditions such as 'Ideal' and 'Fair' diamonds
    together.  From my searching, this multiple filtering can be done with
    checkboxes in the column using the jquery column filtering plugin
    (http://jquery-datatables-column-filter.googlecode.com/svn/trunk/checkbox.html).
    Despite this, I cannot get this plugin to work with my shiny app.  Any
    insight would be appreciated.
  
    library(shiny)
    library(ggplot2)
    runApp(
      list(ui = basicPage(
        h1('Diamonds DataTable with TableTools'),

        # added column filter plugin
        singleton(tags$head(tags$script(
          src = 'https://code.google.com/p/jquery-datatables-column-filter/source/browse/trunk/media/js/jquery.dataTables.columnFilter.js',
          type = 'text/javascript'))),
        dataTableOutput('mytable')
      ),
      server = function(input, output) {
        output$mytable = renderDataTable({
          diamonds[, 1:6]
        }, options = list(
          pageLength = 10,
          # columnDefs = I('[{targets: [0,1], searchable: false}]')
          columnFilter = I('[{
            columnDefs: [targets: [0,1], type: "checkbox"]
          }]')
        ))
      }
    ))
  
  
  
   Charles
 
 
 
  Charles




-- 
Dr. Charles Determan, PhD
Integrated Biosciences

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] shiny datatables column filtering plugin

2014-09-03 Thread Yihui Xie
It looks like a problem of DataTables -- I cannot find a way to
specify the search.regex option for individual columns. You may ask
this question on the DataTables forum. Basically I was expecting this
to work:

.DataTable({
  search: { regex: true },
  columnDefs: [{ search: { regex: true }, targets: [0, 1, 2, 3, 4] }]
})

The global search box works, though.

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Web: http://yihui.name


On Wed, Sep 3, 2014 at 2:09 PM, Charles Determan Jr deter...@umn.edu wrote:
 Thank you Yihui, this would certainly work for me; however, I am having
 trouble getting the regex to work appropriately.  I am using the
 development version of shiny and have copied your code.  I launch the app
 and the filtering of numbers works fine (i.e. 4,5) but the search for setosa
 and versicolor gives me a blank datatable.  Is there some dependency that I
 am missing that would prevent this regex from working with shiny?


 On Wed, Sep 3, 2014 at 11:27 AM, Yihui Xie x...@yihui.name wrote:

 The built-in version of DataTables in shiny already supports
 numeric ranges. For a numeric column x in data, if you type a,b in the
 search box, the data will be filtered using a <= x <= b. The check
 boxes are not supported, but you can use regular expressions (more
 flexible) to achieve the same thing, e.g. (this example requires the
 development version of shiny:
 https://groups.google.com/forum/#!topic/shiny-discuss/-0u-wTnq_lA)

 library(shiny)
 runApp(list(
   ui = fluidPage(
     dataTableOutput('mytable')
   ),
   server = function(input, output) {
     output$mytable = renderDataTable(
       iris[sample(nrow(iris)), ],
       options = list(search = list(regex = TRUE))
     )
   }
 ))


 Then you can search for ^setosa|versicolor$, which matches both setosa
 and versicolor in the iris data, or type 4,5 in the search box of
 Sepal.Length to filter that column. Depending on what you want, this
 may or may not be enough.

 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Web: http://yihui.name






 --
 Dr. Charles Determan, PhD
 Integrated Biosciences

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Four available places on GLMM course in Banff

2014-09-03 Thread Highland Statistics Ltd

There are four remaining places on the following course:


Course: Introduction to MCMC, Linear mixed effects models and GLMM with R
When: 22-26 September, 2014
Where: Parks Canada, Banff, Canada
Flyer: http://www.highstat.com/Courses/Flyer2014_09Banff.pdf

Course website: http://www.highstat.com/statscourse.htm


Kind regards,

Alain Zuur

--
Dr. Alain F. Zuur

First author of:
1. Beginner's Guide to GAMM with R (2014).
2. Beginner's Guide to GLM and GLMM with R (2013).
3. Beginner's Guide to GAM with R (2012).
4. Zero Inflated Models and GLMM with R (2012).
5. A Beginner's Guide to R (2009).
6. Mixed effects models and extensions in ecology with R (2009).
7. Analysing Ecological Data (2007).

Highland Statistics Ltd.
9 St Clair Wynd
UK - AB41 6DZ Newburgh
Tel:   0044 1358 788177
Email: highs...@highstat.com
URL:   www.highstat.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] wilcox.test - difference between p-values of R and online calculators

2014-09-03 Thread peter dalgaard
Notice that correct=TRUE for wilcox.test refers to the continuity correction, 
not the correction for ties. 

You can fairly easily simulate from the exact distribution of W:

x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)
R <- rank(c(x,y))
sim <- replicate(1e6, sum(sample(R,25))) - 325

# With no ties, the ranks would be a permutation of 1:51, and we could do
sim2 <- replicate(1e6, sum(sample(1:51,25))) - 325

In either case, the p-value is the probability that W >= 485 or W <= 165, and

> mean(sim >= 485 | sim <= 165)
[1] 0.000151
> mean(sim2 >= 485 | sim2 <= 165)
[1] 0.002182

Also, try

plot(density(sim))
lines(density(sim2))

and notice that the distribution of sim is narrower than that of sim2 (hence 
the smaller p-value with tie correction), but also that the normal 
approximation is not nearly as good as for the untied case. The 
clumpiness is due to the fact that 35 of the ranks have the maximum value of 
34 (corresponding to the original 359's).
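(A quick self-contained way to see those tied midranks -- only the counts
matter here: 35 pooled values share the top value, so they occupy positions
17 to 51 and share the midrank mean(17:51) = 34:)

```r
# 16 distinct small values take ranks 1..16; the 35 tied maxima share
# the average of positions 17..51 under the default ties.method="average"
r <- rank(c(rep(359, 35), seq_len(16)))
unique(r[r == max(r)])  # 34
sum(r == 34)            # 35
```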

-pd 

On 03 Sep 2014, at 19:13 , David L Carlson dcarl...@tamu.edu wrote:

 Since they all have the same W/U value, it seems likely that the difference 
 is how the different versions adjust the standard error for ties. Here are a 
 couple of posts addressing the issues of ties:
 
 http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9200.html
 http://stats.stackexchange.com/questions/6127/which-permutation-test-implementation-in-r-to-use-instead-of-t-tests-paired-and
 
 David C
 
 From: wbradleyk...@gmail.com [mailto:wbradleyk...@gmail.com] On Behalf Of W 
 Bradley Knox
 Sent: Wednesday, September 3, 2014 9:20 AM
 To: David L Carlson
 Cc: Tal Galili; r-help@r-project.org
 Subject: Re: [R] wilcox.test - difference between p-values of R and online 
 calculators
 
 Tal and David, thanks for your messages.
 
 I should have added that I tried all variations of true/false values for the 
 exact and correct parameters. Running with correct=FALSE makes only a tiny 
 change, resulting in W = 485, p-value = 0.0002481.
 
 At one point, I also thought that the discrepancy between R and these online 
 calculators might come from how ties are handled, but the fact that R and two 
 of the online calculators reach the same U/W values seems to indicate that 
 ties aren't the issue, since (I believe) the U or W values contain all of the 
 information needed to calculate the p-value, assuming the number of samples 
 is also known for each condition. (However, it's been a while since I looked 
 into how MWU tests work, so maybe now's the time to refresh.) If that's 
 correct, the discrepancy seems to be based in what R does with the W value 
 that is identical to the U values of two of the online calculators. (I'm also 
 assuming that U and W have the same meaning, which seems likely.)
 
 - Brad
 
 
 W. Bradley Knox, PhD
 http://bradknox.net
 bradk...@mit.edu
 
 On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson dcarl...@tamu.edu wrote:
 That does not change the results. The problem is likely to be the way ties 
 are handled. The first sample has 25 values of which 23 are identical (359). 
 The second sample has 26 values of which 12 are identical (359). The 
 difference between the implementations may be a result of the way the ties 
 are ranked. For example the R function rank() offers 5 different ways of 
 handling the rank on tied observations. With so many ties, that could make a 
 substantial difference.
 
 Package coin has wilcox_test(), which uses Monte Carlo simulation to estimate 
 the confidence limits.
 
 -
 David L Carlson
 Department of Anthropology
 Texas A&M University
 College Station, TX 77840-4352
 
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Tal Galili
 Sent: Wednesday, September 3, 2014 5:24 AM
 To: W Bradley Knox
 Cc: r-help@r-project.org
 Subject: Re: [R] wilcox.test - difference between p-values of R and online 
 calculators
 
 It seems your numbers have ties. What happens if you run wilcox.test with
 correct=FALSE; will the results be the same as the online calculators?
 
 
 
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --
 
 
 
 On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox 
 

Re: [R] wilcox.test - difference between p-values of R and online calculators

2014-09-03 Thread W Bradley Knox
Tal and David, thanks for your messages.

I should have added that I tried all variations of true/false values for
the exact and correct parameters. Running with correct=FALSE makes only a
tiny change, resulting in W = 485, p-value = 0.0002481.

At one point, I also thought that the discrepancy between R and these
online calculators might come from how ties are handled, but the fact that
R and two of the online calculators reach the same U/W values seems to
indicate that ties aren't the issue, since (I believe) the U or W values
contain all of the information needed to calculate the p-value, assuming
the number of samples is also known for each condition. (However, it's been
a while since I looked into how MWU tests work, so maybe now's the time to
refresh.) If that's correct, the discrepancy seems to be based in what R
does with the W value that is identical to the U values of two of the
online calculators. (I'm also assuming that U and W have the same meaning,
which seems likely.)

- Brad


W. Bradley Knox, PhD
http://bradknox.net
bradk...@mit.edu


On Wed, Sep 3, 2014 at 9:10 AM, David L Carlson dcarl...@tamu.edu wrote:

 That does not change the results. The problem is likely to be the way ties
 are handled. The first sample has 25 values of which 23 are identical
 (359). The second sample has 26 values of which 12 are identical (359). The
 difference between the implementations may be a result of the way the ties
 are ranked. For example the R function rank() offers 5 different ways of
 handling the rank on tied observations. With so many ties, that could make
 a substantial difference.

  Package coin has wilcox_test(), which uses Monte Carlo simulation to
  estimate the confidence limits.

 -
 David L Carlson
 Department of Anthropology
  Texas A&M University
 College Station, TX 77840-4352


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Tal Galili
 Sent: Wednesday, September 3, 2014 5:24 AM
 To: W Bradley Knox
 Cc: r-help@r-project.org
 Subject: Re: [R] wilcox.test - difference between p-values of R and online
 calculators

  It seems your numbers have ties. What happens if you run wilcox.test with
 correct=FALSE, will the results be the same as the online calculators?



 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)

 --



 On Wed, Sep 3, 2014 at 3:54 AM, W Bradley Knox bradk...@mit.edu wrote:

  Hi.
 
  I'm taking the long-overdue step of moving from using online calculators
 to
  compute results for Mann-Whitney U tests to a more streamlined system
  involving R.
 
  However, I'm finding that R computes a different result than the 3 online
  calculators that I've used before (all of which approximately agree).
 These
  calculators are here:
 
  http://elegans.som.vcu.edu/~leon/stats/utest.cgi
  http://vassarstats.net/utest.html
  http://www.socscistatistics.com/tests/mannwhitney/
 
  An example calculation is
 
 
 
  wilcox.test(c(359,359,359,359,359,359,335,359,359,359,359,359,359,359,359,359,359,359,359,359,359,303,359,359,359),
              c(332,85,359,359,359,220,231,300,359,237,359,183,286,355,250,105,359,359,298,359,359,359,28.6,359,359,128))
 
  which prints
 
 
 
 
 
 
 
 
 
   Wilcoxon rank sum test with continuity correction
  
   data:  c(359, 359, 359, 359, 359, 359, 335, 359, 359, 359, 359, 359, 359,
   359, 359, 359, 359, 359, 359, 359, 359, 303, 359, 359, 359) and c(332, 85,
   359, 359, 359, 220, 231, 300, 359, 237, 359, 183, 286, 355, 250, 105, 359,
   359, 298, 359, 359, 359, 28.6, 359, 359, 128)
   W = 485, p-value = 0.0002594
   alternative hypothesis: true location shift is not equal to 0
  
   Warning message:
   In wilcox.test.default(c(359, 359, 359, 359, 359, 359, 335, 359, :
     cannot compute exact p-value with ties
 
 
  However, all of the online calculators find p-values close to 0.0025, 10x
  the value output by R. All results are for a two-tailed case.
 Importantly,
  the W value computed by R *does agree* with the U values output by the
  first two online calculators listed above, yet it has a different
 p-value.
 
  Can anyone shed some light on how and why R's calculation differs from
 that
  of these online calculators? Thanks for your time.
 
  
  W. Bradley Knox, PhD
  http://bradknox.net
  bradk...@mit.edu
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

[R] snow/Rmpi without MPI.spawn?

2014-09-03 Thread Jim Leek

I'm a programmer at a high-performance computing center.  I'm not very
familiar with R, but I have used MPI from C, C++, and Python.  I have to run
an R program provided by a colleague who knows R, but not MPI.  So, this
fellow used the R snow library to parallelize his R code (at least in
theory; I'm not actually sure what he did).  I need to get this code running
on our machines.

However, Rmpi and snow seem to require mpi spawn, which our computing center
doesn't support.  I even tried building Rmpi with MPICH1 instead of 2,
because Rmpi has that option, but it still tries to use spawn.

I can launch plenty of processes, but I have to launch them all at once at
the beginning. Is there any way to convince Rmpi to just use those processes
rather than trying to spawn its own?  I haven't found any documentation on
this issue, although I would've thought it would be quite common.

Thanks,
Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GLM Help

2014-09-03 Thread Kathy Haapala
Hi all,

I have a large set of data that looks something like this, although
this data frame is much smaller and includes made up numbers to make
my question easier.

> x.df <- data.frame(Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B",
+   "B", "B", "C", "C", "C", "C"), Group_ID = c(1:15), No_Offspring = c(3, 0, 4,
+   2, 1, 0, 3, 4, 3, 2, 2, 5, 4, 1, 3), M_Offspring = c(2, 0, 2, 1, 0, 0, 1, 1,
+   2, 0, 1, 3, 2, 1, 1), F_Offspring = c(1, 0, 2, 1, 1, 0, 2, 3, 1, 2, 1, 2, 2,
+   0, 2), No_Helpers = c(5, 0, 2, 1, 0, 1, 3, 4, 2, 3, 2, 3, 4, 0, 0))

> x.df
   Region Group_ID No_Offspring M_Offspring F_Offspring No_Helpers
1       A        1            3           2           1          5
2       A        2            0           0           0          0
3       A        3            4           2           2          2
4       A        4            2           1           1          1
5       A        5            1           0           1          0
6       B        6            0           0           0          1
7       B        7            3           1           2          3
8       B        8            4           1           3          4
9       B        9            3           2           1          2
10      B       10            2           0           2          3
11      B       11            2           1           1          2
12      C       12            5           3           2          3
13      C       13            4           2           2          4
14      C       14            1           1           0          0
15      C       15            3           1           2          0

I have been using GLMs to determine if the number of helpers
(No_Helpers) has an effect on the sex ratio of the offspring. Here's
the GLM I have been using:

> prop.male <- x.df$M_Offspring/x.df$No_Offspring
> glm1 <- glm(prop.male ~ No_Helpers, binomial, data = x.df)

However, now I'd like to fit a model with region-specific regressions
and see if this has more support than the model without
region-specificity. So, I'd like one model that generates a regression
for each region (A, B,  C).

I've tried treating No_Helpers and Region as covariates:
 glm2 = glm(prop.male~No_Helpers+Region-1,binomial,data=x.df)
which includes region-specificity in the intercepts, but not the
entire regression,
and as interaction terms:
 glm3 = glm(prop.male~No_Helpers*Region-1,binomial,data=x.df)
which also does not give me an intercept and slope for each region.

I'm not sure how else to adjust the formula, or if the adjustment
should be somewhere else in the GLM call.
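For concreteness, here is a sketch of the kind of call I have been
experimenting with to get one intercept and one slope per region (the
nesting operator and the weights argument are my guesses; I have not
confirmed this is the right specification):

```r
# Nesting Region over No_Helpers should give region-specific intercepts
# (RegionA, RegionB, RegionC) and slopes (RegionA:No_Helpers, ...);
# weights supplies the binomial denominators for the proportions
# (an assumption on my part)
glm4 <- glm(prop.male ~ Region / No_Helpers - 1,
            family = binomial, weights = No_Offspring, data = x.df)
coef(glm4)
```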

Thanks in advance for your help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] snow/Rmpi without MPI.spawn?

2014-09-03 Thread Martin Morgan

On 09/03/2014 03:25 PM, Jim Leek wrote:

I'm a programmer at a high-performance computing center.  I'm not very
familiar with R, but I have used MPI from C, C++, and Python.  I have to run
an R code provided by a guy who knows R, but not MPI.  So, this fellow used
the R snow library to parallelize his R code (theoretically, I'm not
actually sure what he did.)  I need to get this code running on our
machines.

However, Rmpi and snow seem to require mpi spawn, which our computing center
doesn't support.  I even tried building Rmpi with MPICH1 instead of 2,
because Rmpi has that option, but it still tries to use spawn.

I can launch plenty of processes, but I have to launch them all at once at
the beginning. Is there any way to convince Rmpi to just use those processes
rather than trying to spawn its own?  I haven't found any documentation on
this issue, although I would've thought it would be quite common.


This script

spawn.R
===
# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)
## Recent Rmpi bug -- should be mpi.universe.size()
nWorkers <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=nWorkers)
mpiRank <- function(i)
    c(i=i, rank=mpi.comm.rank())
mpi.parSapply(seq_len(2*nWorkers), mpiRank)
mpi.close.Rslaves()
mpi.quit()

can be run like the comment suggests

   salloc -n 12 orterun -n 1 R -f spawn.R

uses slurm (or whatever job manager) to allocate resources for 12 tasks and 
spawn within that allocation. Maybe that's 'good enough' -- spawning within the 
assigned allocation? Likely this requires minimal modification of the current code.


More extensive is to revise the manager/worker-style code to something more like 
single instruction, multiple data



simd.R
==
## salloc -n 4 orterun R --slave -f simd.R
sink("/dev/null") # don't capture output -- more care needed here
library(Rmpi)

TAGS = list(FROM_WORKER=1L)
.comm = 0L

## shared `work', here just determine rank and host
work = c(rank=mpi.comm.rank(.comm),
         host=system("hostname", intern=TRUE))

if (mpi.comm.rank(.comm) == 0) {
    ## manager
    mpi.barrier(.comm)
    nWorkers = mpi.comm.size(.comm)
    res = list(nWorkers)
    for (i in seq_len(nWorkers - 1L)) {
        res[[i]] <- mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
                                  comm=.comm)
    }
    res[[nWorkers]] = work
    sink() # start capturing output
    print(do.call(rbind, res))
} else {
    ## worker
    mpi.barrier(.comm)
    mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
}
mpi.quit()

but this likely requires some serious code revision; if going this route then 
http://r-pbd.org/ might be helpful (and from a similar HPC environment).


It's always worth asking whether the code is written to be efficient in R -- a 
typical 'mistake' is to write R-level explicit 'for' loops that 
copy-and-append results, along the lines of


   len <- 10
   result <- NULL
   for (i in seq_len(len))
       ## some complicated calculation, then...
       result <- c(result, sqrt(i))

whereas it's much better to pre-allocate and fill

result <- numeric(len)
for (i in seq_len(len))
    result[[i]] <- sqrt(i)

or

lapply(seq_len(len), sqrt)

and very much better still to 'vectorize'

result <- sqrt(seq_len(len))

(timings for me are about 1 minute for copy-and-append, 0.2 s for pre-allocate 
and fill, and 0.002 s for vectorize).


Pushing back on the guy providing the code (grep for for loops, and look for 
that copy-and-append pattern) might save you from having to use parallel 
evaluation at all.


Martin



Thanks,
Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Overwriting a procedure

2014-09-03 Thread MacQueen, Don
If you have more than one version of fixx(), then the one that is used is
the one that comes first in the search path. The search path is revealed
by the search() function. So if you can learn to control the search path,
then you can control which version of fixx() is used. That would be my
initial approach.
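(A minimal sketch of the masking behavior -- "mypkg" is a stand-in for your
actual package; a same-named function in the workspace sits ahead of any
attached package for calls made from the console:)

```r
# .GlobalEnv is always element 1 of the search path, ahead of attached
# packages, so this definition masks a package's fixx for console calls:
fixx <- function(x) list(x = x^2)
head(search(), 3)   # ".GlobalEnv" comes first
environment(fixx)   # <environment: R_GlobalEnv>, i.e. the masking copy
rm(fixx)            # unmask: the package version is found again
```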

As an aside, you can define your first version of fixx() more simply as

  fixx <- function(x) list(x=x)

and the second more simply as

fixx <- function(x) {
  x[,6] <- x[,5]^2/10
  list(x=x)
}

Using return() is completely unnecessary (but of course ok if preferred as
a matter of programming style).


Of course, this all assumes you truly need two different functions with
the same name. I would think that is unlikely, but since there's no
indication of what determines which one should be used, I can't say.
However, there must be some criterion that determines that the second
version should be used, so perhaps


fixx <- function(x, criterion) {
  ## criterion must be a logical value of length 1
  if (criterion) x[,6] <- x[,5]^2/10
  list(x=x)
}

would work.


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 9/2/14, 11:45 AM, Steven Yen sye...@gmail.com wrote:

Is there a way to over-write a procedure (subroutine)?

I include a default procedure fixx in a list of procedures which are
compiled into a package. By default, the procedure deliver the data
matrix x.

fixx <- function(x){
  result <- list(x=x)
  return(result)
}

In some applications, I have transformations (such as squared terms)
in some column(s) in x. So I include the following procedure in the
main (calling) program, hoping to over-write the default procedure
under the same name in the package (which is the way other languages
work, e.g., Gauss):

fixx <- function(x){
  x[,6] <- x[,5]^2/10
  result <- list(x=x)
  return(result)
}

This does not seem to work. The procedure in the main (calling)
program seems to get ignored. Any idea? Thanks.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Overwriting a procedure

2014-09-03 Thread Duncan Murdoch
On 03/09/2014, 8:58 PM, MacQueen, Don wrote:
 If you have more than one version of fixx(), then the one that is used is
 the one that comes first in the search path. The search path is revealed
 by the search() function. So if you can learn to control the search path,
 then you can control which version of fixx() is used. That would be my
 initial approach.

This is only partially correct information.  It only applies to uses of
fixx() from the console.  The original poster wanted to replace a
function in a package:  uses within the package will search the package
namespace first, regardless of the search path.
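(If the goal really is to have package-internal callers pick up the new
definition, utils::assignInNamespace can rebind it inside the namespace;
"mypkg" below is a stand-in for the actual package name:)

```r
# Replace the binding for fixx inside the package namespace itself,
# so callers within the package see the new version too:
newfixx <- function(x) {
  x[, 6] <- x[, 5]^2 / 10
  list(x = x)
}
assignInNamespace("fixx", newfixx, ns = "mypkg")
```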

Duncan Murdoch

 
 As an aside, you can define your first version of fixx() more simply as
 
    fixx <- function(x) list(x=x)
 
 and the second more simply as
 
  fixx <- function(x) {
    x[,6] <- x[,5]^2/10
    list(x=x)
  }
 
 Using return() is completely unnecessary (but of course ok if preferred as
 a matter of programming style).
 
 
  Of course, this all assumes you truly need two different functions with
  the same name. I would think that is unlikely, but since there's no
  indication of what determines which one should be used, I can't say.
  However, there must be some criterion that determines that the second
  version should be used, so perhaps
 
 
  fixx <- function(x, criterion) {
    ## criterion must be a logical value of length 1
    if (criterion) x[,6] <- x[,5]^2/10
    list(x=x)
  }
 
 would work.
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] depth of labels of axis

2014-09-03 Thread Jinsong Zhao

On 2014/9/1 20:39, David Winsemius wrote:


On Sep 1, 2014, at 4:40 PM, Jinsong Zhao wrote:


Hi there,

With the following code,

plot(1:5, xaxt = "n")
axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
"E", expression(E[t])))

you may notice that the E's within the labels of axis(1) are not at the
same depth, so the axis(1) labels look like a wave.

Is there a possible way to typeset the labels so that they all sit at
the same depth?


I'm not sure that we share an interpretation of the term "depth" in this
context. I'm interpreting your request to mean vertical alignment.


"depth" is not the accurate word for this situation. Maybe "baseline" 
is a better one.





Any suggestions will be really appreciated.


Read the help page, especially the paragraph about padj. I will admit
that I thought the description of the actions of padj=0 and padj=1 were
not what I experienced when I tried alternate versions. It did not seem
to me that padj=0 produced top alignment.


Thank you very much for pointing me to this. I did not notice this argument.

Best regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] depth of labels of axis

2014-09-03 Thread Jinsong Zhao

On 2014/9/2 11:50, David L Carlson wrote:

The bottom of the expression is set by the lowest character (which can even 
change for subscripted letters with descenders). The solution is to get axis() 
to align the tops of the axis labels and move the line up to reduce the space, 
e.g.

plot(1:5, xaxt = "n")
axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
"E", expression(E[t])), padj=1, mgp=c(3, .1, 0))
# Check alignment
abline(h=.7, xpd=TRUE, lty=3)


Yes, in this situation padj = 1 is the quick solution. However, if there 
are also superscripts, then it's hard to align all the labels.


If R provided a mechanism that aligned the labels in axis() or text() with 
the baseline of the characters, ignoring any super- and/or subscripts, that 
would be terrific.
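In the meantime, one possible workaround (my sketch, not from the thread; it relies on plotmath's phantom(), which reserves the space of its argument without drawing it) is to pad the plain labels with invisible subscripts, so every label gets the same bounding box and therefore the same apparent baseline:

```r
plot(1:5, xaxt = "n")
# phantom(g) occupies the space a subscript g would take, without drawing
# it, so the plain "E" labels get the same bounding box as E[g] etc.
axis(1, at = 1:5,
     labels = c(expression(E[g]), expression(E[phantom(g)]),
                expression(E[j]), expression(E[phantom(j)]),
                expression(E[t])))
```

The same trick extends to superscripts (e.g. E^phantom(2)) when labels mix both.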


Regards,
Jinsong



-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352







Re: [R] depth of labels of axis

2014-09-03 Thread Jinsong Zhao

On 2014/9/3 21:33, Jinsong Zhao wrote:

On 2014/9/2 11:50, David L Carlson wrote:

The bottom of the expression is set by the lowest character (which can
even change for subscripted letters with descenders). The solution is
to get axis() to align the tops of the axis labels and move the line
up to reduce the space, e.g.

plot(1:5, xaxt = "n")
axis(1, at = 1:5, labels = c(expression(E[g]), "E", expression(E[j]),
"E", expression(E[t])), padj=1, mgp=c(3, .1, 0))
# Check alignment
abline(h=.7, xpd=TRUE, lty=3)


Yes, in this situation padj = 1 is the quick solution. However, if there
are also superscripts, then it's hard to align all the labels.

If R provided a mechanism that aligned the labels in axis() or text() with
the baseline of the characters, ignoring any super- and/or subscripts, that
would be terrific.


it seems that the above wish is on the Graphics TODO list:
https://www.stat.auckland.ac.nz/~paul/R/graphicstodos.html

"Allow text adjustment for mathematical annotations which is relative to 
a text baseline (in addition to the current situation where adjustment 
is relative to the bounding box)."




Regards,
Jinsong







Re: [R] snow/Rmpi without MPI.spawn?

2014-09-03 Thread Leek, Jim
Thanks for the tips.  I'll take a look around for 'for' loops in the morning.

I think the example you provided worked for OpenMPI.  (The default on our 
machine is MPICH2, but it gave the same error about calling spawn.)  Anyway, 
with OpenMPI I got this:

 # salloc -n 12 orterun -n 1 R -f spawn.R
 library(Rmpi)
 ## Recent Rmpi bug -- should be mpi.universe.size()
 nWorkers <- mpi.universe.size()
 nslaves = 4
 mpi.spawn.Rslaves(nslaves)
Reported: 2 (out of 2) daemons - 4 (out of 4) procs

Then it hung there.  So things spawned anyway, which is progress.  I'm just not 
sure whether that is expected behavior for mpi.parSapply or not.

Jim

-Original Message-
From: Martin Morgan [mailto:mtmor...@fhcrc.org] 
Sent: Wednesday, September 03, 2014 5:08 PM
To: Leek, Jim; r-help@r-project.org
Subject: Re: [R] snow/Rmpi without MPI.spawn?

On 09/03/2014 03:25 PM, Jim Leek wrote:
 I'm a programmer at a high-performance computing center.  I'm not very 
 familiar with R, but I have used MPI from C, C++, and Python.  I have 
 to run R code provided by a guy who knows R, but not MPI.  So, this 
 fellow used the R snow library to parallelize his R code 
 (theoretically, I'm not actually sure what he did.)  I need to get 
 this code running on our machines.

 However, Rmpi and snow seem to require mpi spawn, which our computing 
 center doesn't support.  I even tried building Rmpi with MPICH1 
 instead of 2, because Rmpi has that option, but it still tries to use spawn.

 I can launch plenty of processes, but I have to launch them all at 
 once at the beginning. Is there any way to convince Rmpi to just use 
 those processes rather than trying to spawn its own?  I haven't found 
 any documentation on this issue, although I would've thought it would be 
 quite common.

This script

spawn.R
===
# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)
## Recent Rmpi bug -- should be mpi.universe.size()
nWorkers <- mpi.universe.size()
mpi.spawn.Rslaves(nslaves=nWorkers)
mpiRank <- function(i)
   c(i=i, rank=mpi.comm.rank())
mpi.parSapply(seq_len(2*nWorkers), mpiRank)
mpi.close.Rslaves()
mpi.quit()

can be run as the comment suggests:

salloc -n 12 orterun -n 1 R -f spawn.R

uses slurm (or whatever job manager) to allocate resources for 12 tasks and 
spawn within that allocation. Maybe that's 'good enough' -- spawning within the 
assigned allocation? Likely this requires minimal modification of the current 
code.

A more extensive option is to revise the manager/worker-style code to something 
more like single instruction, multiple data:


simd.R
==
## salloc -n 4 orterun R --slave -f simd.R
sink("/dev/null") # don't capture output -- more care needed here
library(Rmpi)

TAGS = list(FROM_WORKER=1L)
.comm = 0L

## shared `work', here just determine rank and host
work = c(rank=mpi.comm.rank(.comm),
  host=system("hostname", intern=TRUE))

if (mpi.comm.rank(.comm) == 0) {
 ## manager
 mpi.barrier(.comm)
 nWorkers = mpi.comm.size(.comm)
 res = vector("list", nWorkers)
 for (i in seq_len(nWorkers - 1L)) {
 res[[i]] <- mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
   comm=.comm)
 }
 res[[nWorkers]] = work
 sink() # start capturing output
 print(do.call(rbind, res))
} else {
 ## worker
 mpi.barrier(.comm)
 mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
}
mpi.quit()

but this likely requires some serious code revision; if going this route then 
http://r-pbd.org/ might be helpful (and from a similar HPC environment).
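A different avenue worth checking (my suggestion, not something raised above, and it assumes the compute nodes permit ssh between themselves within an allocation): socket clusters from the parallel package need no MPI spawn at all, so snow-style code can often run unchanged on a PSOCK cluster:

```r
library(parallel)
# Hypothetical host names; repeat a host once per worker wanted on it.
hosts <- c("node1", "node1", "node2", "node2")
cl <- makePSOCKcluster(hosts)   # workers started over ssh, no MPI involved
parSapply(cl, 1:8, sqrt)        # same parSapply-style interface as snow
stopCluster(cl)
```

Whether ssh between nodes is allowed depends on the cluster's configuration, so this is only worth pursuing if the center permits it.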

It's always worth asking whether the code is written to be efficient in R -- a 
typical 'mistake' is to write R-level explicit 'for' loops that 
copy-and-append results, along the lines of

len <- 10
result <- NULL
for (i in seq_len(len))
## some complicated calculation, then...
result <- c(result, sqrt(i))

whereas it's much better to pre-allocate and fill

 result <- integer(len)
 for (i in seq_len(len))
 result[[i]] = sqrt(i)

or

 lapply(seq_len(len), sqrt)

and very much better still to 'vectorize'

 result <- sqrt(seq_len(len))

(timings for me are about 1 minute for copy-and-append, 0.2 s for pre-allocate 
and fill, and 0.002 s for vectorize).
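The three variants can be timed side by side; a minimal sketch (I've bumped len up from the 10 in the example so the differences are visible -- that value is my choice, and absolute timings vary by machine):

```r
len <- 1e5

# copy-and-append: c() re-allocates `result` on every iteration (quadratic)
t1 <- system.time({
    result <- NULL
    for (i in seq_len(len)) result <- c(result, sqrt(i))
})

# pre-allocate and fill: one allocation up front (linear)
t2 <- system.time({
    result <- numeric(len)
    for (i in seq_len(len)) result[[i]] <- sqrt(i)
})

# vectorized: the loop happens in C inside sqrt()
t3 <- system.time(result <- sqrt(seq_len(len)))

rbind(append = t1, prealloc = t2, vectorized = t3)
```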

Pushing back on the guy providing the code (grep for 'for' loops, and look for 
that copy-and-append pattern) might save you from having to use parallel 
evaluation at all.

Martin


 Thanks,
 Jim



-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list

Re: [R] snow/Rmpi without MPI.spawn?

2014-09-03 Thread Leek, Jim
I spoke a bit too soon.  You may have noticed that it isn't hanging in 
parSapply; it's hanging in mpi.spawn.Rslaves().  It claims to have launched the 
slaves, but I can't see them by logging into the target node and running 'top'. 
 I only see the top-level R process (which is burning up 100% of a CPU).  So I 
don't know what's going on.  It never returns from the spawn call anyway.

[R-es] Programming question

2014-09-03 Thread Wences Alonso
Hello everyone,

I am new to this list and, above all, new to using R.

I have a problem I cannot solve: in a data.frame I have several variables, 
and I want to compute a statistic and have it returned broken down by one 
of those variables.
I have managed to do it when the statistic is the mean of a variable, but 
in my case it is a percentage, so I cannot simply take the mean; it would 
rather be a weighted mean, but I cannot get that to work either.

An example:

Campo    C1   C2   %C2/C1
A        10    1    10
A        10    9    90
A       200    4     2
B        50    7    14
B        10    7    70
B       100    2     2

Grouped:

         C1   C2   Promedio   % Real
A       220   14     34.0       6.4
B       160   16     28.7      10.0
Total   380   30     31.3       7.9

For example, if the simple mean were correct I would do it like this:

aggregate(Datos$C1, list(Datos$`%C2/C1`), mean)

Could you help me?

Thanks
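One way to get the weighted percentage per group (a sketch; the data.frame below is reconstructed from the example table, and the column name `pct` is made up):

```r
# Reconstructed from the example table in the post
Datos <- data.frame(Campo = c("A", "A", "A", "B", "B", "B"),
                    C1    = c(10, 10, 200, 50, 10, 100),
                    C2    = c(1, 9, 4, 7, 7, 2))

# Sum C1 and C2 within each group, then take the ratio: this gives the
# weighted ("% Real") figure, not the mean of the row percentages.
sums <- aggregate(cbind(C1, C2) ~ Campo, data = Datos, FUN = sum)
sums$pct <- 100 * sums$C2 / sums$C1
sums
```

The mean of the row percentages (the "Promedio" column) would instead come from aggregating the percentage column directly with mean.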

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


[R-es] How do you use the OR operator to select two or more levels of a vector?

2014-09-03 Thread eric
Dear all, I have a data.frame with a column that takes three different
levels (although the column is not strictly a factor; the values are just
three different letters), for example "c", "t" and "s", and I need to use
the rows that have "c" or "t". How should I do that?

I think I have sometimes used something like this:

dataframe <- dataframe[dataframe$columna == c("c", "t"), ]

but for some reason, when I use that form inside the code that creates
a plot, for example:

xyplot(are ~ con | sol, data = datEnd[datEnd$iso == c("c", "t"), ])

the result is not correct.

Any ideas?

Many thanks,

Eric.







-- 
Forest Engineer
Master in Environmental and Natural Resource Economics
Ph.D. student in Sciences of Natural Resources at La Frontera University
Member in AguaDeTemu2030, citizen movement for Temuco with green city
standards for living

Note: accents have been omitted to ensure compatibility with some
mail readers.

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R-es] How do you use the OR operator to select two or more levels of a vector?

2014-09-03 Thread Jorge I Velez
Hi Eric,

Have a look at

?"%in%"

Regards,
Jorge.-
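For illustration, a short sketch of why `==` against a two-element vector misbehaves while `%in%` does what is wanted (`x` below is made-up data standing in for the column):

```r
x <- c("c", "t", "s", "c", "t", "s")

# `==` recycles c("c","t") to c("c","t","c","t","c","t") and compares
# element-wise -- position-dependent and almost never what is wanted:
x == c("c", "t")     # TRUE TRUE FALSE FALSE FALSE FALSE

# `%in%` tests membership of each element, independent of position:
x %in% c("c", "t")   # TRUE TRUE FALSE TRUE TRUE FALSE
```

So the subset becomes datEnd[datEnd$iso %in% c("c", "t"), ].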




