from:"Peter Langfelder"

Re: [R] Looping

2024-02-18 Thread Peter Langfelder

Try

for (ind in 1:24)
{
   data = read.csv(paste0("data", ind, ".csv"))
   ...
}


Peter
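
A minimal sketch of a variation on the loop above, for the case where all 24 data frames should be kept rather than overwritten on each pass. The temporary directory, the toy file contents, and the names dataDir and dataList are illustrative assumptions, not part of the original question.

```r
# Assumption: create 24 small toy files so that the sketch is self-contained.
dataDir = tempfile("csvdemo")
dir.create(dataDir)
for (ind in 1:24)
   write.csv(data.frame(id = ind, value = ind^2),
             file.path(dataDir, paste0("data", ind, ".csv")), row.names = FALSE)

# Read each file into its own list element instead of reusing one variable.
fileNames = file.path(dataDir, paste0("data", 1:24, ".csv"))
dataList = lapply(fileNames, read.csv)
names(dataList) = paste0("data", 1:24)

# Individual data frames are then available as dataList$data1, dataList$data2, etc.
nrow(dataList$data24)
```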

On Mon, Feb 19, 2024 at 11:33 AM Steven Yen  wrote:
>
> I need to read csv files repeatedly, named data1.csv, data2.csv,… data24.csv, 
> 24 altogether. That is,
>
> data<-read.csv("data1.csv")
> …
> data<-read.csv("data24.csv")
> …
>
> Is there a way to do this in a loop? Thank you.
>
> Steven from iPhone
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count matrix of GSE146049

2023-04-02 Thread Peter Langfelder

It's a microarray data set, so I don't think you would want to apply
an RNA-seq pipeline. You'd be better off applying a normalization
appropriate for this type of microarray data.

HTH,

Peter

On Sun, Apr 2, 2023 at 11:09 PM Anas Jamshed  wrote:
>
> I want to get the count matrix of genes from
> https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146049. Is it
> possible for GSE146049? After getting counts, I want to do TMM
> normalization.

Re: [R] confusion about dev.prev()

2022-12-05 Thread Peter Langfelder

Ah, thanks, got it. Misread the help again...

Peter

On Mon, Dec 5, 2022 at 9:38 PM Ivan Krylov  wrote:
>
> On Mon, 5 Dec 2022 21:28:16 +0800
> Peter Langfelder wrote:
>
> > Open two devices, plot a plot, call dev.prev() and plot again. I
> > would expect the second plot to appear in the first device, but that
> > is not what happens; both plots appear in the second device.
>
> Unfortunately, dev.prev() and dev.next() only return the number of the
> respective device. You need dev.set() to actually make the change.
>
> --
> Best regards,
> Ivan
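
A minimal sketch of the point made in the quoted reply: dev.prev() only reports a device number, and dev.set() is what switches devices. It uses pdf(NULL) devices so that it also runs without a display (an assumption made for illustration); the variable names are hypothetical.

```r
# pdf(NULL) opens a device that draws nowhere; with x11() the logic is the same.
pdf(NULL); first = dev.cur()
pdf(NULL); second = dev.cur()

plot(1:10)                # drawn in the second (current) device
dev.prev()                # only returns a device number, nothing changes
stillSecond = dev.cur()   # the second device is still current

dev.set(dev.prev())       # this actually makes the first device current
nowCurrent = dev.cur()
plot(10:20)               # drawn in the first device

graphics.off()            # close all devices opened above
```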


[R] confusion about dev.prev()

2022-12-05 Thread Peter Langfelder

Hi all,

I'm either confused about dev.prev() or there's a bug in it. Open two
devices, plot a plot, call dev.prev() and plot again. I would expect
the second plot to appear in the first device, but that is not what
happens; both plots appear in the second device. Is this expected
behavior or a bug?
Example (on Linux):

x11()
par(mfrow = c(1,2));
x11()
par(mfrow = c(1,2));
plot(1:10)
dev.prev()
plot(10:20)

This happens both in R-4.2.0 (patched) and R-devel from 2022-09-13
(sorry, don't have a newer one handy).

sessionInfo() from the R-devel session:

R Under development (unstable) (2022-09-13 r82849)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora Linux 36 (Thirty Six)

Matrix products: default
BLAS:   /usr/local/lib64/R-devel/lib/libRblas.so
LAPACK: /usr/local/lib64/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0


Re: [R] Subsetting a vector using an index with all missing values

2022-07-02 Thread Peter Langfelder

Ah, thanks, that makes sense.

Peter

On Fri, Jul 1, 2022 at 10:01 PM Bill Dunlap  wrote:
>
> This has to do with the mode of the subscript - logical subscripts are 
> repeated to the length of x and integer/numeric ones are not.  NA is logical, 
> NA_integer_ is integer, so we get
>
> > x <- 1:10
> > x[ rep(NA_integer_, 3) ]
> [1] NA NA NA
> > x[ rep(NA, 3) ]
>  [1] NA NA NA NA NA NA NA NA NA NA
>
> -Bill
>
>
> On Fri, Jul 1, 2022 at 8:31 PM Peter Langfelder  
> wrote:
>>
>> Hi all,
>>
>> I stumbled on subsetting behavior that seems counterintuitive and
>> perhaps is a bug. Here's a simple example:
>>
>> > x = 1:10
>> > x[ rep(NA, 3)]
>>  [1] NA NA NA NA NA NA NA NA NA NA
>>
>> I would have expected 3 NAs (the length of the index), not 10 (all
>> values in x). Looked at the documentation for the subsetting operator
>> `[` but found nothing indicating that if the index contains all
>> missing data, the result is the entire vector.
>>
>> I can work around the issue for a general 'index' using a somewhat
>> clunky but straightforward construct along the lines of
>>
>> > index = rep(NA, 3)
>> > x[c(1, index)][-1]
>> [1] NA NA NA
>>
>> but I'm wondering if the behaviour above is intended.
>>
>> Thanks,
>>
>> Peter
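
A short sketch of the distinction described in the reply above: a logical NA index is recycled to the length of x, while an integer NA index is not. Converting the index with as.integer() is one possible alternative to the c(1, index)[-1] workaround; the variable names are illustrative.

```r
x = 1:10
index = rep(NA, 3)

typeof(index)                        # "logical": plain NA is a logical value
resLogical = x[index]                # logical index is recycled to length(x)
resInteger = x[as.integer(index)]    # integer index keeps its own length

length(resLogical)
length(resInteger)
```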

Re: [R] Checking a function for undeclared variables

2022-04-05 Thread Peter Langfelder

Thanks!

Peter

On Tue, Apr 5, 2022 at 6:01 PM Jeff Newmiller  wrote:
>
> ?codetools::findGlobals
>
> On April 5, 2022 5:36:54 PM PDT, Peter Langfelder 
>  wrote:
> >Hi all,
> >
> >I'd like to check a function for undeclared global variables using
> >something similar to what R CMD check does when "checking R code for
> >possible problems". My search came up empty but I hope there is a way to
> >do it without building a package just for this purpose, isn't there?
> >
> >Thanks!
> >
> >Peter
>
> --
> Sent from my phone. Please excuse my brevity.
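
A minimal sketch of how codetools::findGlobals() can be used for this. The function f and the names undeclaredVar and helperFn are made up for illustration; with merge = FALSE the result separates global functions from global variables.

```r
library(codetools)

# Hypothetical function that refers to an undeclared variable and an
# undeclared helper function.
f = function(x)
{
   y = x + undeclaredVar
   helperFn(y)
}

globals = findGlobals(f, merge = FALSE)
globals$variables    # global variables used by f
globals$functions    # global functions used by f, including operators such as +
```

Names listed under variables that are neither arguments nor defined elsewhere on purpose are the candidates to check.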


Re: [R] Sum every n (4) observations by group

2021-12-19 Thread Peter Langfelder

I'm not sure I understand the task, but if I do, assuming your data
frame is assigned to a variable named df, I would do something like

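# Sum consecutive blocks of n values in x; length(x) must be a multiple of n.
sumNs = function(x, n)
{
   if (length(x) %% n != 0) stop("Length of 'x' must be a multiple of 'n'.")
   n1 = length(x)/n
   # Block index: 1 repeated n times, then 2 repeated n times, etc.
   ind = rep(1:n1, each = n)
   tapply(x, ind, sum)
}
# Apply within each ID; the result holds one vector of block sums per ID.
sums = tapply(df$Value, df$ID, sumNs, 4)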

Peter
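
An alternative sketch of the same idea that returns a data frame, assuming the rows are already ordered by time within each ID. The toy data and the column name Block are assumptions made for illustration; ave() numbers the blocks of four within each ID and aggregate() sums them.

```r
# Toy data: two IDs with eight observations each.
df = data.frame(ID = rep(c("A", "B"), each = 8), Value = c(1:8, 11:18))

# Within each ID, label observations 1-4 as block 1, 5-8 as block 2, etc.
df$Block = ave(seq_along(df$ID), df$ID,
               FUN = function(i) (seq_along(i) - 1) %/% 4 + 1)

# One row per ID and block, with the sum of Value.
sums = aggregate(Value ~ ID + Block, data = df, FUN = sum)
sums
```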

On Sun, Dec 19, 2021 at 10:32 AM Miluji Sb  wrote:
>
> Dear all,
>
> I have a dataset (below) by ID and time sequence. I would like to sum every
> four observations by ID.
>
> I am confused how to combine the two conditions. Any help will be highly
> appreciated. Thank you!
>
> Best.
>
> Milu
>
> ## Dataset
> structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
> "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C",
> "C", "C", "C"), Date = c(4140L, 4141L, 4142L, 4143L, 4144L, 4145L,
> 4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L,
> 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L
> ), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
> 0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797,
> 0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326,
> 0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854,
> 0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383
> )), class = "data.frame", row.names = c(NA, -24L))

Re: [R] Random Forest: OOB performance = test set performance?

2021-04-10 Thread Peter Langfelder

I think the only thing you are doing wrong is not setting the random
seed (set.seed()) so your results are not reproducible. Depending on
the random sample used to select the training and test sets, you get
slightly varying accuracy for both, sometimes one is better and
sometimes the other.

HTH,

Peter
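
A small sketch of what setting the seed does to the train/test split. The seed value 42 is arbitrary and 400 stands in for nrow(Carseats); both are assumptions made so the example runs without extra packages.

```r
set.seed(42)
train1 = sample(1:400, 200)

set.seed(42)
train2 = sample(1:400, 200)    # same seed: the same split again

train3 = sample(1:400, 200)    # seed not reset: a different split

identical(train1, train2)
identical(train1, train3)
```

Repeating the whole analysis over several seeds gives an impression of how much the OOB and test-set accuracies vary from split to split.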

On Sat, Apr 10, 2021 at 8:49 PM  wrote:
>
> Hi ML,
>
> For random forest, I thought that the out-of-bag performance should be
> the same (or at least very similar) to the performance calculated on a
> separated test set.
>
> But this does not seem to be the case.
>
> In the following code, the accuracy computed on out-of-bag sample is
> 77.81%, while the one computed on a separated test set is 81%.
>
> Can you please check what I am doing wrong?
>
> Thanks in advance and best regards.
>
> library(randomForest)
> library(ISLR)
>
> Carseats$High <- ifelse(Carseats$Sales<=8,"No","Yes")
> Carseats$High <- as.factor(Carseats$High)
>
> train = sample(1:nrow(Carseats), 200)
>
> rf = randomForest(High~.-Sales,
>                   data=Carseats,
>                   subset=train,
>                   mtry=6,
>                   importance=T)
>
> acc <- (rf$confusion[1,1] + rf$confusion[2,2]) / sum(rf$confusion)
> print(paste0("Accuracy OOB: ", round(acc*100,2), "%"))
>
> yhat <- predict(rf, newdata=Carseats[-train,])
> y <- Carseats[-train,]$High
> conftest <- table(y, yhat)
> acctest <- (conftest[1,1] + conftest[2,2]) / sum(conftest)
> print(paste0("Accuracy test set: ", round(acctest*100,2), "%"))

Re: [R] about a p-value < 2.2e-16

2021-03-18 Thread Peter Langfelder

I think the answer is much simpler. The print method for hypothesis
tests (class htest) truncates the p-values. In the above example,
instead of using

wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the output, just print the p-value:

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value

[1] 2.988368e-32


I think this value is what the journal asks for.

HTH,

Peter
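
A reproducible variant of the above as a sketch. The seed is an arbitrary assumption, so the printed value will differ from the one shown earlier; format() is one way to turn the number into text for a report.

```r
set.seed(1)   # assumption: any seed will do, it only makes the run repeatable
tst = wilcox.test(rnorm(100), rnorm(100, 2), exact = TRUE)

tst$p.value                        # the numeric p-value, not truncated
format(tst$p.value, digits = 3)    # formatted as text, e.g. for a manuscript
```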

On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
 wrote:
>
> I would push back on that from two perspectives:
>
>
>  1.  I would study exactly what the journal said very
> carefully.  If they mandated "wilcox.test", that function has an
> argument called "exact".  If that's what they are asking, then using
> that argument gives the exact p-value, e.g.:
>
>
>  > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
>
>  Wilcoxon rank sum exact test
>
> data:  rnorm(100) and rnorm(100, 2)
> W = 691, p-value < 2.2e-16
>
>
>  2.  If that's NOT what they are asking, then I'm not
> convinced what they are asking makes sense:  There is no such thing
> as an "exact p value" except to the extent that certain assumptions
> hold, and all models are wrong (but some are useful), as George Box
> famously said years ago.[1]  Truth only exists in mathematics, and
> that's because it's a fiction to start with ;-)
>
>
> Hope this helps.
> Spencer Graves
>
>
> [1]
> https://en.wikipedia.org/wiki/All_models_are_wrong
>
>
> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
> >   
> > Dear all,
> >
> > i would appreciate having your advice on the following please :
> >
> > in R, the wilcox.test() provides "a p-value < 2.2e-16", when we compare
> > sets of 1000 genes expression (in the genomics field).
> >
> > however, the journal asks us to provide the exact p value ...
> >
> > would it be legitimate to write : "p-value = 0" ? thanks a lot,
> >
> > -- bogdan

Re: [R] Does anyone have any use for this?

2021-01-01 Thread Peter Langfelder

This would certainly simplify and make more readable some of my code where
I create multiple versions of the same plot calling the same function with
minor variations of a few of many arguments. Thanks!

Peter

On Fri, Jan 1, 2021 at 12:20 PM Bert Gunter  wrote:

> Hi all:
>
> In the course of playing around with other issues, I wrote the following
> little function that allows functions to keep state
> and easily replay and update their state (see comments after):
>
> memify <- function(f)
> {
>    if (is.primitive(f)) {
>       cat("Cannot memify primitive functions.\n")
>       return(NULL)
>    }
>    if (!inherits(f, "function"))
>       stop("Argument must inherit from class 'function'")
>    arglist <- list()
>    structure(
>       function(...) {
>          m <- tryCatch(
>             as.list(match.call(f)[-1]),
>             error = function(e) {
>                warning("Bad function call; cannot update arguments\n")
>                return(NULL)
>             }
>          )
>          nm <- names(m)
>          hasname <- nm != "" #logical index of named values
>          if (any(hasname)) {
>             if (anyDuplicated(nm, incomparables = ""))
>                warning("Duplicated names in call; only the first will be used.")
>             arglist <<- modifyList(arglist, m[hasname]) ## this is what does the job
>          }
>          do.call(f, modifyList(m, arglist))
>       },
>       class = c("memified", class(f)))
> }
>
> Examples:
>
>  x <- 1:9; y <- runif(9)
>  plt <- memify(plot)
>  x <- 1:9; y <- runif(9)
>  plt(x,y, col = "blue")  ## plt "remembers" these arguments; i.e. keeps state
>  plt( type = "b") ## all other arguments as previous
>  plt(col = "red") ## ditto
>
> So my question is: Beyond allowing one to easily change/add argument values
> and replay when there are lots of arguments floating around, which we often
> use an IDE's editor to do, is there any real use for this? I confess that,
> like Pygmalion, I have been charmed by this idea, but it could well be
> useless. By all means feel free to chastise me if so.
>
> 1. I am aware that many functions already have "update" methods to "update"
> their results without re-entering all arguments -- e.g. lattice graphics,
> glm, etc.
> 2. Several packages -- rlang and R6 anyway -- but very likely more, do this
> sort of thing and way more; the price is added complexity, of course.
> 3. I realize also that keeping state would be a bad idea in many
> circumstances, e.g. essentially changing documented defaults.
>
> Reply privately to praise or bury if you do not think this is of any
> interest to readers of this list. Publicly is fine, too. If it's dumb it's
> dumb.
>
> Cheers and best wishes for a better new year for us all,
>
> Bert Gunter

Re: [R] Error pvclust package: Error in hclust(distance, method = method.hclust)

2020-12-14 Thread Peter Langfelder

I don't use pvclust but from a cursory reading and from the error
indicating bootstrap I am guessing that pvclust carries out some sort of a
sampling of the features on which you cluster. Since you only retain two
features (coordinates), the sampling necessarily results in just one
feature being retained, leading to an error. Just my guess though;
understanding what pvclust does would help you diagnose the problem.

Peter

On Mon, Dec 14, 2020 at 2:48 PM Jovani T. de Souza 
wrote:

> This question was also made in
>
> https://stackoverflow.com/questions/65290436/error-pvclust-package-error-in-hclustdistance-method-method-hclust-must
>
> Question:
>
> Could you help me resolve the following error:
>
> Error in hclust(distance, method = method.hclust) :
>   must have n >= 2 objects to cluster
>
> I'm trying to use the `pvclust` package, but I'm not able to generate the
> dendrogram. If I use all data from the df database, I can generate it, but
> as I am restricting only to the coordinates (Latitude and Longitude), it is
> not working, it gives the error that I mentioned. Below, I entered an
> executable code.
>
> Thank you so much!
>
>
> ###USING PVCLUST
>
> library(rdist)
> library(pvclust)
> library(geosphere)
>
> df <- structure(
>   list(Industries = c(1,2,3,4,5,6,7), Latitude = c(-24.779225,
> -24.789635, -24.763461, -24.794394, -24.747102,-24.781307,-24.761081),
>Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
> -49.890796,-49.8875254,-49.8875254),
>Waste = c(526, 350, 526, 469, 285, 433, 456)),class =
> "data.frame", row.names = c(NA, -7L))
>
> mat <- as.data.frame.matrix(df)
> mat <- t(mat)
> fit <- pvclust(mat, method.hclust="average", method.dist="euclidean")
> fit
> plot(fit)
> pvrect(fit)
>
>
> # RESTRICTING ONLY TO COORDINATES
> coordinates<-subset(df,select=c("Latitude","Longitude"))
> mat <- as.data.frame.matrix(coordinates)
> mat <- t(mat)
> fit <- pvclust(mat, method.hclust="average", method.dist="euclidean")
> > fit <- pvclust(mat, method.hclust="average", method.dist="euclidean")
> Bootstrap (r = 0.5)... Error in hclust(distance, method =
> method.hclust) :
>   must have n >= 2 objects to cluster*

Re: [R] Assigning cores

2020-09-03 Thread Peter Langfelder

The big question is whether each worker or thread uses parallel
processing itself, or whether it uses resources like cache in which
case 20 threads fighting over the cache would slow you down
substantially. If your simulations use operations implemented in BLAS
or LAPACK, be aware that some R installations use custom fast BLAS
that can use multiple cores and the processor cache. You can see some
of it in sessionInfo().

The other issue is memory usage - if you exhaust your physical RAM,
your computer will slow down not so much because of CPU load but
rather because of memory management (swapping to and from disk).

I would run some smaller experimental runs that take just a minute or
two to finish with say 4, 8 and 12 workers and see how fast these go -
you may find no or very little speed up past 8 or perhaps even 4-6
workers.

HTH,

Peter
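
A rough sketch of such a timing experiment. It uses the base parallel package rather than doParallel, and simulateOnce is a hypothetical stand-in for one simulation replicate; worker counts of 1 and 2 keep the example small, and on a 20-core machine one would try, say, 4, 8 and 12.

```r
library(parallel)

# Hypothetical stand-in for one replicate of the real simulation.
simulateOnce = function(i)
{
   x = matrix(rnorm(100 * 100), 100, 100)
   sum(solve(crossprod(x) + diag(100)))
}

workerCounts = c(1, 2)
timings = sapply(workerCounts, function(nWorkers)
{
   cl = makeCluster(nWorkers)
   on.exit(stopCluster(cl))    # always shut the workers down
   system.time(parLapply(cl, 1:8, simulateOnce))[["elapsed"]]
})
names(timings) = workerCounts
timings    # elapsed seconds for each number of workers
```

If the elapsed time stops improving as workers are added, the extra cores are better left free for other work.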

On Thu, Sep 3, 2020 at 10:45 AM Leslie Rutkowski
 wrote:
>
> Hi all,
>
> I'm working on a large simulation and I'm using the doParallel package to
> parallelize my work. I have 20 cores on my machine and would like to
> preserve some for day-to-day activities - word processing, sending emails,
> etc.
>
> I started by saving 1 core and it was clear that *everything* was so slow
> as to be nearly unusable.
>
> Any suggestions on how many cores to hold back (e.g., not to put to work on
> the parallel process)?
>
> Thanks,
> Leslie

Re: [R] custom function gives unexpected result - for me

2020-04-17 Thread Peter Langfelder

You need 1:(rn-1) in your function. The operator : has higher precedence
than binary -, so 1:rn-1 is evaluated as (1:rn)-1:

> 1:3-1
[1] 0 1 2
> 1:(3-1)
[1] 1 2

Happened to me a few times as well before I remembered.

HTH,

Peter
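
A sketch of the function from the question with the loop range corrected. Using seq_len(rn - 1) instead of 1:(rn-1) is an additional suggestion, not part of the original answer; it also gives an empty loop when rn is 1.

```r
grr1 = function(rn)
{
   r.up = c()
   # seq_len(rn - 1) is 1, 2, ..., rn-1, and is empty when rn is 1.
   for (i in seq_len(rn - 1))
   {
      ru = if (i %% 2 == 0) seq(1, i) else seq(i, 1)
      r.up = c(r.up, ru)
   }
   r.up
}

grr1(3)    # 1 1 2, the result expected in the question
```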

On Fri, Apr 17, 2020 at 3:50 PM Monica Palaseanu-Lovejoy
 wrote:
>
> Hi,
>
> I wrote a relatively simple function. If i run the code inside the function
> line by line i am getting the result i was expecting, but if i run the
> function, i get a different result.
>
> The function:
>
> grr1 <- function(rn) {
> r.up <- c()
> for (i in 1:rn-1) {
> if (i%%2==0) ru <- seq(1,i) else ru <- seq(i,1)
> r.up <- c(r.up, ru)
> }
> return(r.up)
> }
>
> So, if rn is 3 for example i would expect to get 1 1 2
>
> grr1(3)
> [1] 1 0 1 1 2
>
> If i run it line by line inside the function:
> r.up <- c()
> > r.up
> NULL
>
> i=1
> if (i%%2==0) ru <- seq(1,i) else ru <- seq(i,1)
> > ru
> [1] 1
>
> r.up <- c(r.up, ru)
> r.up
> [1] 1
>
> i=2
> if (i%%2==0) ru <- seq(1,i) else ru <- seq(i,1)
> ru
> [1] 1 2
> r.up <- c(r.up, ru)
> > r.up
> [1] 1 1 2
>
> So - i am getting the result i am expecting. From where the 1 0 before what
> i expect as a result comes from? I am sure i am doing some very basic
> error, but it seems i cannot figure it out.
>
> I run R x64 3.6.2. I know it is not the latest version, but it should not
> give me unexpected results because of that i would think.
>
> sessionInfo()
> R version 3.6.2 (2019-12-12)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 17763)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.2
>
> Thanks,
> Monica

Re: [R] POSIX system oddities

2020-03-29 Thread Peter Langfelder

The time has changed from "standard" (EST) to "Daylight saving" (EDT) which
shaves off 1 hour.

Peter
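
A small sketch that makes the missing hour visible. It assumes the session in the question used a US Eastern time zone (suggested by the EST/EDT labels) and that the time zone database is available to R; the variable names are illustrative.

```r
nyDates = as.POSIXct(c("2019-03-10", "2019-03-11"), tz = "America/New_York")
utcDates = as.POSIXct(c("2019-03-10", "2019-03-11"), tz = "UTC")

# Hours between the two midnights in each time zone.
hoursNY = as.numeric(difftime(nyDates[2], nyDates[1], units = "hours"))
hoursUTC = as.numeric(difftime(utcDates[2], utcDates[1], units = "hours"))

c(NewYork = hoursNY, UTC = hoursUTC)    # 23 and 24
```

Working in UTC, or with Date objects when only calendar days matter, avoids the daylight saving jump.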

On Sun, Mar 29, 2020 at 5:03 PM Sebastien Bihorel via R-help <
r-help@r-project.org> wrote:

> Hi,
>
> Why are there fewer seconds on 03/10/2019 in the internal POSIX
> system? The difference from the previous or the next day seems to be
> exactly 1 hour. I could not find anything in the manuals on CRAN.
>
> > dates <- as.POSIXct(sprintf('03/%s/2019',9:12), format = '%m/%d/%Y')
> > dates
> [1] "2019-03-09 EST" "2019-03-10 EST" "2019-03-11 EDT" "2019-03-12 EDT"
> > diff(as.numeric(dates[1:2]))
> [1] 86400
> > diff(as.numeric(dates[2:3]))
> [1] 82800
> > diff(as.numeric(dates[3:4]))
> [1] 86400

Re: [R] Add labels to dendogram

2020-03-26 Thread Peter Langfelder

Your code does not work as intended because Tag is not numeric. You need to
exclude Tag from the data frame df and instead assign it as rownames. Also,
dist expects numeric data, so it is simplest to pass it a numeric matrix.

df = as.matrix(data.frame(Healthy, Tumour, Metastasis))
or
df = cbind(Healthy, Tumour, Metastasis)

rownames(df) = Tag

Then continue as in your code.

Peter
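
Put together, a minimal sketch of the whole procedure. It uses only the first five entries of the data from the question to stay short, and pdf(NULL) so that it also runs without a display; both are assumptions made for illustration.

```r
Tag = c("YP_008603282", "NP_054035", "BAA00606", "NP_054034", "NP_054033")
Healthy = c(12.15540751, 2.33103008, 1.46924258, 0.26274009, 0.95217008)
Tumour = c(12.51939026, 1.20983671, 0.61459705, 0.61459705, 0.81301027)
Metastasis = c(12.55328882, 1.04722513, 1.04722513, 0.70881149, 0.37039785)

# Numeric matrix with the tags as row names; dist() and hclust() carry
# the row names through as leaf labels.
df = cbind(Healthy, Tumour, Metastasis)
rownames(df) = Tag

d = dist(df, method = "euclidean")
hc1 = hclust(d, method = "complete")

pdf(NULL)     # drop this line to plot on the screen
plot(hc1)     # leaves are labelled with the tags
dev.off()
```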



On Thu, Mar 26, 2020 at 8:10 AM Luigi Marongiu 
wrote:

> Dear all,
> I have built a hierarchical clustering on some data as follows:
> ```
> Tag = c(
>   "YP_008603282", "NP_054035","BAA00606", "NP_054034",
>  "NP_054033",
>   "AAC17846" ,"NP_054036","YP_073767" ,   "BAQ20411", "P52455")
> Healthy = c(
>   12.15540751,  2.33103008,  1.46924258,  0.26274009,  0.95217008,
>   -0.08197491,  0.09038259, -0.08197491, -0.25433241, -0.08197491)
> Tumour = c(
>   12.51939026,  1.20983671,  0.61459705,  0.61459705,  0.81301027,
>   0.21777061,  -0.17905583, -0.17905583, -0.17905583,  0.01935739)
> Metastasis = c(
>   12.55328882,  1.04722513,  1.04722513,  0.70881149,  0.37039785,
>   0.20119103, 0.20119103,0.20119103,  0.20119103,  0.03198422)
> df = data.frame(Tag, Healthy, Tumour, Metastasis, stringsAsFactors = FALSE)
> d <- dist(df, method = "euclidean")
> hc1 <- hclust(d, method = "complete" )
> plot(hc1)
> ```
>
> Is there a way to add the Tag column instead of the numbers to the leaves
> of the dendrogram?
> Thank you
>
> --
> Best regards,
> Luigi

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-15 Thread Peter Langfelder

Try hclust(as.dist(1-calc.rho), method = "average").

Peter
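
For reference, a sketch of the full procedure with this fix applied, run on a made-up 4 x 4 correlation matrix. The SNP names and correlation values are assumptions chosen so that three variables are highly correlated and one is not.

```r
snps = c("rsA", "rsB", "rsC", "rsD")    # hypothetical names
calc.rho = matrix(c(1.00, 0.95, 0.90, 0.10,
                    0.95, 1.00, 0.92, 0.20,
                    0.90, 0.92, 1.00, 0.15,
                    0.10, 0.20, 0.15, 1.00),
                  4, 4, dimnames = list(snps, snps))

# hclust() needs a "dist" object, hence as.dist().
tree = hclust(as.dist(1 - calc.rho), method = "average")
clusts = cutree(tree, h = 0.2)
clustLevels = sort(unique(clusts))

# From each cluster keep the member with the highest summed correlation.
representatives = unlist(lapply(clustLevels, function(cl)
{
   inClust = which(clusts == cl)
   rho1 = calc.rho[inClust, inClust, drop = FALSE]
   inClust[ which.max(colSums(rho1)) ]
}))

rho.retained = calc.rho[representatives, representatives]
rho.retained    # rsB and rsD remain; their correlation is 0.2
```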

On Fri, Nov 15, 2019 at 10:02 AM Ana Marija  wrote:
>
> HI Peter,
>
> Thank you for getting back to me and shedding light on this. I see
> your point, doing Jim's method:
>
> > keeprows<-apply(calc.rho,1,function(x) return(sum(x>0.8)<3))
> > ro246.lt.8<-calc.rho[keeprows,keeprows]
> > ro246.lt.8[ro246.lt.8 == 1] <- NA
> > (mmax <- max(abs(ro246.lt.8), na.rm=TRUE))
> [1] 0.566
>
> Which is good in general, correlations in my matrix  should not be
> exceeding 0.8. I need to run Mendelian Randomization on it later on so
> I can not be having there highly correlated SNPs. But with Jim's
> method I am only left with 17 SNPs (out of 246) and that means that
> both pairs of highly correlated SNPs are removed and it would be good
> to keep one of those highly correlated ones.
>
> I tried to do your code:
> > tree = hclust(1-calc.rho, method = "average")
> Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor
> exceed 65536") :
>   missing value where TRUE/FALSE needed
>
> Please advise.
>
> Thanks
> Ana
>
> On Thu, Nov 14, 2019 at 7:37 PM Peter Langfelder
>  wrote:
> >
> > I suspect that you want to identify which variables are highly
> > correlated, and then keep only "representative" variables, i.e.,
> > remove redundant ones. This is a bit of a risky procedure but I have
> > done such things before as well sometimes to simplify large sets of
> > highly related variables. If your threshold of 0.8 is approximate, you
> > could simply use average linkage hierarchical clustering with
> > dissimilarity = 1-correlation, cut the tree at the appropriate height
> > (1-0.8=0.2), and from each cluster keep a single representative (e.g.,
> > the one with the highest mean correlation with other members of the
> > cluster). Something along these lines (untested)
> >
> > tree = hclust(1-calc.rho, method = "average")
> > clusts = cutree(tree, h = 0.2)
> > clustLevels = sort(unique(clusts))
> > representatives = unlist(lapply(clustLevels, function(cl)
> > {
> >   inClust = which(clusts==cl);
> >   rho1 = calc.rho[inClust, inClust, drop = FALSE];
> >   repr = inClust[ which.max(colSums(rho1)) ]
> >   repr
> > }))
> >
> > the variable representatives now contains indices of the variables you
> > want to retain, so you could subset the calc.rho matrix as
> > rho.retained = calc.rho[representatives, representatives]
> >
> > I haven't tested the code and it may contain bugs, but something along
> > these lines should get you where you want to be.
> >
> > Oh, and depending on how strict you want to be with the remaining
> > correlations, you could use complete linkage clustering (will retain
> > more variables, some correlations will be above 0.8).
> >
> > Peter
> >
> > On Thu, Nov 14, 2019 at 10:50 AM Ana Marija  
> > wrote:
> > >
> > > Hello,
> > >
> > > I have a data frame like this (a matrix):
> > > head(calc.rho)
> > > rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> > > rs56192520  0.903 0.268 0.327 0.327 0.327 0.582
> > > rs3764410   0.928 0.276 0.336 0.336 0.336 0.598
> > > rs145984817 0.975 0.309 0.371 0.371 0.371 0.638
> > > rs1807401   0.975 0.309 0.371 0.371 0.371 0.638
> > > rs1807402   0.975 0.309 0.371 0.371 0.371 0.638
> > > rs35350506  0.975 0.309 0.371 0.371 0.371 0.638
> > >
> > > > dim(calc.rho)
> > > [1] 246 246
> > >
> > > I would like to remove from this data all highly correlated variables,
> > > with correlation more than 0.8
> > >
> > > I tried this:
> > >
> > > > data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> > > > dim(data)
> > > [1] 246   0
> > >
> > > Can you please advise,
> > >
> > > Thanks
> > > Ana
> > >
> > > But this removes everything.
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Peter Langfelder

I suspect that you want to identify which variables are highly
correlated, and then keep only "representative" variables, i.e.,
remove redundant ones. This is a bit of a risky procedure but I have
done such things before as well sometimes to simplify large sets of
highly related variables. If your threshold of 0.8 is approximate, you
could simply use average linkage hierarchical clustering with
dissimilarity = 1-correlation, cut the tree at the appropriate height
(1-0.8=0.2), and from each cluster keep a single representative (e.g.,
the one with the highest mean correlation with other members of the
cluster). Something along these lines (untested)

tree = hclust(as.dist(1-calc.rho), method = "average")
clusts = cutree(tree, h = 0.2)
clustLevels = sort(unique(clusts))
representatives = unlist(lapply(clustLevels, function(cl)
{
  inClust = which(clusts==cl);
  rho1 = calc.rho[inClust, inClust, drop = FALSE];
  repr = inClust[ which.max(colSums(rho1)) ]
  repr
}))

the variable representatives now contains indices of the variables you
want to retain, so you could subset the calc.rho matrix as
rho.retained = calc.rho[representatives, representatives]

I haven't tested the code and it may contain bugs, but something along
these lines should get you where you want to be.

Oh, and depending on how strict you want to be with the remaining
correlations, you could use complete linkage clustering (will retain
more variables, some correlations will be above 0.8).
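
A self-contained sketch of the same idea on made-up data may help when checking the approach. Everything here (the simulated variables a1..c1, the name rho) is an assumption for illustration only; the one substantive point is that hclust expects a "dist" object, so the dissimilarity matrix goes through as.dist() first.

```r
# Simulated data (illustration only): two groups of near-duplicate
# variables plus one unrelated variable
set.seed(1)
base1 = rnorm(200)
base2 = rnorm(200)
x = cbind(a1 = base1 + rnorm(200, sd = 0.1),
          a2 = base1 + rnorm(200, sd = 0.1),
          a3 = base1 + rnorm(200, sd = 0.1),
          b1 = base2 + rnorm(200, sd = 0.1),
          b2 = base2 + rnorm(200, sd = 0.1),
          c1 = rnorm(200))
rho = cor(x)

# hclust works on a "dist" object, hence as.dist()
tree = hclust(as.dist(1 - rho), method = "average")
clusts = cutree(tree, h = 0.2)

# From each cluster keep the member with the highest summed correlation
representatives = sapply(sort(unique(clusts)), function(cl) {
  inClust = which(clusts == cl)
  rho1 = rho[inClust, inClust, drop = FALSE]
  inClust[which.max(colSums(rho1))]
})
rho.retained = rho[representatives, representatives]
rho.retained
```

With data like this one would expect one representative per group of near-duplicates, plus the unrelated variable.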

Peter

On Thu, Nov 14, 2019 at 10:50 AM Ana Marija  wrote:
>
> Hello,
>
> I have a data frame like this (a matrix):
> head(calc.rho)
> rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
> rs56192520  0.903 0.268 0.327 0.327 0.327 0.582
> rs3764410   0.928 0.276 0.336 0.336 0.336 0.598
> rs145984817 0.975 0.309 0.371 0.371 0.371 0.638
> rs1807401   0.975 0.309 0.371 0.371 0.371 0.638
> rs1807402   0.975 0.309 0.371 0.371 0.371 0.638
> rs35350506  0.975 0.309 0.371 0.371 0.371 0.638
>
> > dim(calc.rho)
> [1] 246 246
>
> I would like to remove from this data all highly correlated variables,
> with correlation more than 0.8
>
> I tried this:
>
> > data<- calc.rho[,!apply(calc.rho,2,function(x) any(abs(x) > 0.80))]
> > dim(data)
> [1] 246   0
>
> Can you please advise,
>
> Thanks
> Ana
>
> But this removes everything.

Re: [R] reading in csv files, some of which have column names and some of which don't

2019-08-13 Thread Peter Langfelder

If the data are numeric (or at least some columns are numeric), a
brute force solution is to read a file once with header = FALSE, check
the relevant column(s) for being numeric, and if they are not numeric,
re-read with header = TRUE. Alternatively, if you know the column
names (headers) beforehand, read with header = FALSE and check the
first row for being equal to the known column names; if it contains
the column names, re-read with header = TRUE.

With a total of 1600 records, reading each file (at most) twice should
not be a problem.
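
The second variant (known column names) could be sketched roughly as follows. The helper name readMaybeHeader, the column names and the two temporary files are all made up for the example; only read.csv and friends are real.

```r
# Hypothetical helper: peek at the first row without treating it as a header,
# then re-read with header = TRUE only if that row equals the known names
readMaybeHeader = function(filename, knownNames) {
  first = read.csv(filename, header = FALSE, colClasses = "character",
                   nrows = 1)
  hasHeader = identical(unname(unlist(first[1, ])), knownNames)
  out = read.csv(filename, header = hasHeader, colClasses = "character")
  names(out) = knownNames
  out
}

# Two small temporary files, one saved with a header row and one without
f1 = tempfile(fileext = ".csv")
f2 = tempfile(fileext = ".csv")
writeLines(c("id,result", "1,3.2", "2,4.1"), f1)
writeLines(c("3,5.0", "4,2.7"), f2)

combined = do.call(rbind, lapply(c(f1, f2), readMaybeHeader,
                                 knownNames = c("id", "result")))
combined
```

No record is sacrificed this way, at the cost of reading one extra line per file.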

Peter

On Tue, Aug 13, 2019 at 11:00 AM Christopher W Ryan
 wrote:
>
> Alas, we spend so much time and energy on data wrangling . . . .
>
> I'm given a collection of csv files to work with---"found data". They arose
> via saving Excel files to csv format. They all have the same column
> structure, except that some were saved with column names and some were not.
>
> I have a code snippet that I've used before to traverse a directory and
> read into R all the csv files of a certain filename pattern within it, and
> combine them all into a single dataframe:
>
> library(dplyr)
> ## specify the csv files that I will want to access
> files.to.read <- list.files(path = "H:/EH", pattern =
> "WICLeadLabOrdersDone.+", all.files = FALSE, full.names = TRUE, recursive =
> FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
>
> ## function to read csv files back in
> read.csv.files <- function(filename) {
> bb <- read.csv(filename, colClasses = "character", header = TRUE)
> bb
> }
>
> ## now read the csv files, as all character
> b <- lapply(files.to.read, read.csv.files)
>
> ddd <- bind_rows(b)
>
> But this assumes that all files have column names in their first row. In
> this case, some don't. Any advice how to handle it so that those with
> column names and those without are read in and combined properly? The only
> thing I've come up with so far is:
>
> ## function to read csv files back in
> ## Unfortunately, some of the csv files are saved with column headers, and
> some are saved without them.
> ## This presents a problem when defining the function to read them: header
> = TRUE or header = FALSE?
> ## The best solution I can think of as of 13 August 2019 is to use header =
> FALSE and skip the
> ## first row of every file. This will sacrifice one record from each csv of
> about 80 files
> read.csv.files <- function(filename) {
> bb <- read.csv(filename, colClasses = "character", header = FALSE, skip
> = 1)
> bb
> }
>
> This sacrifices about 80 out of about 1600 records. For my purposes in this
> instance, this may be acceptable, but of course I'd rather not.
>
> Thanks.
>
> --Chris Ryan

Re: [R] read

2019-08-08 Thread Peter Langfelder

I would remove the quotes using gsub, something like

# Read the file as text lines
text = readLines(yourFileName)
# Remove the offending quotes
text = gsub("'|\"", "", text)
# Concatenate and turn into a data frame
concat = paste(text, collapse = "\n")
df = read.table(text = concat, ...) # Change arguments as needed
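
Applied to the small example from the question, the steps above might look like this. The vector raw stands in for the lines readLines would return; it is typed in by hand here purely for illustration.

```r
# Lines as they would come back from readLines(), with stray quotes
raw = c("name prof", "  A  '4.5", "  B   \"3.2", "  C   5.5 ")

# Drop every single or double quote
clean = gsub("'|\"", "", raw)

# Glue the lines back together and parse them as a table
vld = read.table(text = paste(clean, collapse = "\n"), header = TRUE,
                 stringsAsFactors = FALSE)
vld
```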

HTH,

Peter

On Thu, Aug 8, 2019 at 5:41 PM Val  wrote:
>
> Thank you  all, I can read the text file but the problem was there is
> a single quote embedded  in  the first row of second column. This
> quote causes the problem
>
> vld<-read.table(text="name prof
>   A  '4.5
>   B   "3.2
>   C   5.5 ",header=TRUE)
>
> On Thu, Aug 8, 2019 at 7:24 PM Anaanthan Pillai
>  wrote:
> >
> > data <- read.table(header=TRUE, text='
> >  name prof
> >   A  4.5
> >   B  3.2
> >   C  5.5
> >  ')
> > > On 9 Aug 2019, at 8:11 AM, Val  wrote:
> > >
> > > Hi all,
> > >
> > > I am trying to red data where single and double quotes are embedded
> > > in some of the fields and prevented to read the data.   As an example
> > > please see below.
> > >
> > > vld<-read.table(text="name prof
> > >  A  '4.5
> > >  B   "3.2
> > >  C   5.5 ",header=TRUE)
> > >
> > > Error in read.table(text = "name prof \n  A  '4.5\n  B
> > > 3.2 \n  C   5.5 ",  :
> > >  incomplete final line found by readTableHeader on 'text'
> > >
> > > Is there a way how to  read this data and gt the following output
> > >  name prof
> > > 1A  4.5
> > > 2B  3.2
> > > 3C  5.5
> > >
> > > Thank you inadvertence

Re: [R] bizarre color space conversion problem

2019-07-18 Thread Peter Langfelder

Sarah, if you haven't done so already, please do us (OpenBLAS users) a
big favor and report the bug, either to Fedora or directly to OpenBLAS
maintainers.

Peter

On Thu, Jul 18, 2019 at 11:46 AM Sarah Goslee  wrote:
>
> Wow. You are entirely correct. I would not have been able to pinpoint
> the problem, or how to test it. Thank you.
>
> I am unhappy you are right, since these are the fast workstations I
> use for all of my heavy-duty analysis, and it's not even *possible* to
> rerun everything.
>
> Oh dear.
>
> Here, with the options you suggested, it produces the expected results:
>
> # env OPENBLAS_CORETYPE=Haswell R --vanilla
>
> > white <- c(x = 0.953205712549377, 1, y = 1.08538438164692)
> > red10.rgb <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L, 3L), .Dimnames = 
> > list( NULL, c("r", "g", "b")))
> > convertColor(red10.rgb, from = "sRGB", to = "Lab")
>  Lab
>  [1,] 53.48418 80.01027 67.38407
>  [2,] 53.48418 80.01027 67.38407
>  [3,] 53.48418 80.01027 67.38407
>  [4,] 53.48418 80.01027 67.38407
>  [5,] 53.48418 80.01027 67.38407
>  [6,] 53.48418 80.01027 67.38407
>  [7,] 53.48418 80.01027 67.38407
>  [8,] 53.48418 80.01027 67.38407
>  [9,] 53.48418 80.01027 67.38407
> [10,] 53.48418 80.01027 67.38407
>
>
> > red10.rgb %*% as.list(environment(grDevices::colorspaces$sRGB$toXYZ))$M
>[,1]  [,2]  [,3]
>  [1,] 0.4168213 0.2149235 0.0195385
>  [2,] 0.4168213 0.2149235 0.0195385
>  [3,] 0.4168213 0.2149235 0.0195385
>  [4,] 0.4168213 0.2149235 0.0195385
>  [5,] 0.4168213 0.2149235 0.0195385
>  [6,] 0.4168213 0.2149235 0.0195385
>  [7,] 0.4168213 0.2149235 0.0195385
>  [8,] 0.4168213 0.2149235 0.0195385
>  [9,] 0.4168213 0.2149235 0.0195385
> [10,] 0.4168213 0.2149235 0.0195385
>
> On Thu, Jul 18, 2019 at 1:59 PM Ivan Krylov  wrote:
> >
> > On Thu, 18 Jul 2019 13:30:09 -0400
> > Sarah Goslee  wrote:
> >
> > > I'm not even remotely a hardware expert: if the difference is due to
> > > changes in the instruction set, I assume that has potential
> > > consequences for other things, and I just happened to spot it in this
> > > particular case because it's visualization-based? (Yikes.)
> >
> > Yes, this might be bad. I have heard about OpenBLAS (specifically, the
> > matrix product routine) misbehaving on certain AVX-512 capable
> > processors, so much that they had to disable some optimizations in
> > 0.3.6 [*], which you already have installed. Still, would `env
> > OPENBLAS_CORETYPE=Haswell R --vanilla` give a better result?
> >
> > > As it says in my first email (but way at the bottom), I'd already
> > > gotten as far as locating the problem in this line from
> > > grDevices::convertColor()
> > >
> > > xyz <- from$toXYZ(color, from.ref.white)
> >
> > Thanks for confirming this! It felt that I had to make sure, since the
> > behaviour we observe is so confusing.
> >
> > > > (red.xyz <- grDevices::colorspaces$sRGB$toXYZ(red.rgb,
> > > > white.point))[1,]
> > > [1] 0.7733981 0.9280769 0.1383974
> > > > # [1] 0.4168213 0.2149235 0.0195385
> >
> > One last check: would
> >
> > red.rgb %*% as.list(environment(grDevices::colorspaces$sRGB$toXYZ))$M
> >
> > still produce different results on your computers?
> >
> > > blas.x86_64 3.8.0-12.fc30
> > > openblas.x86_64 0.3.6-2.fc30
> >
> > I do not know enough about Fedora's "alternatives" system, but it does
> > look like R is using OpenBLAS.
> >
> > > A: working
> > > Model name:  Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
> >
> > > B: not working
> > > Model name:  Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
> >
> > Again, this points in the direction of OpenBLAS not doing AVX-512 math
> > properly. Let's hope that OPENBLAS_CORETYPE=Haswell solves it.
> >
> > --
> > Best regards,
> > Ivan
> >
> > [*]
> > https://github.com/xianyi/OpenBLAS/issues/1955
> > https://github.com/xianyi/OpenBLAS/issues/2029
> > https://github.com/xianyi/OpenBLAS/issues/2168
> > https://github.com/xianyi/OpenBLAS/issues/2182
>
>
>
> --
> Sarah Goslee (she/her)
> http://www.numberwright.com

Re: [R] Define pch and color based on two different columns

2019-04-09 Thread Peter Langfelder

Glad to be of help.

Peter

On Tue, Apr 9, 2019 at 10:03 PM Matthew Snyder  wrote:

> You are not late to the party. And you solved it!
>
> Thank you very much. You just made my PhD a little closer to reality!
>
> Matt
>
>
>
> *Matthew R. Snyder*
> *~*
> PhD Candidate
> University Fellow
> University of Toledo
> Computational biologist, ecologist, and bioinformatician
> Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
> matthew.snyd...@rockets.utoledo.edu
> msnyder...@gmail.com
>
>
> On Tue, Apr 9, 2019 at 9:37 PM Peter Langfelder <
> peter.langfel...@gmail.com> wrote:
>
>> Sorry for being late to the party, but has anyone suggested a minor
>> but important modification of the code from stack exchange?
>>
>> xyplot(mpg ~ wt | cyl,
>>   panel = function(x, y, ..., groups, subscripts) {
>>   pch <- mypch[factor(carb)[subscripts]]
>>   col <- mycol[factor(gear)[subscripts]]
>>   grp <- c(gear,carb)
>>   panel.xyplot(x, y, pch = pch, col = col)
>>   }
>> )
>>
>> From the little I understand about what you're trying to do, this may
>> just do the trick.
>>
>> Peter
>>
>> On Tue, Apr 9, 2019 at 2:43 PM Matthew Snyder 
>> wrote:
>> >
>> > I am making a lattice plot and I would like to use the value in one
>> column
>> > to define the pch and another column to define color of points.
>> Something
>> > like:
>> >
>> > xyplot(mpg ~ wt | cyl,
>> >data=mtcars,
>> >col = gear,
>> >pch = carb
>> > )
>> >
>> > There are unique pch points in the second and third panels, but these
>> > points are only unique within the plots, not among all the plots (as
>> they
>> > should be). You can see this if you use the following code:
>> >
>> > xyplot(mpg ~ wt | cyl,
>> >data=mtcars,
>> >groups = carb
>> > )
>> >
>> > This plot looks great for one group, but if you try to invoke two groups
>> > using c(gear, carb) I think it simply takes unique combinations of those
>> > two variables and plots them as unique colors.
>> >
>> > Another solution given by a StackExchange user:
>> >
>> > mypch <- 1:6
>> > mycol <- 1:3
>> >
>> > xyplot(mpg ~ wt | cyl,
>> >   panel = function(x, y, ..., groups, subscripts) {
>> >   pch <- mypch[factor(carb[subscripts])]
>> >   col <- mycol[factor(gear[subscripts])]
>> >   grp <- c(gear,carb)
>> >   panel.xyplot(x, y, pch = pch, col = col)
>> >   }
>> > )
>> >
>> > This solution has the same problems as the code at the top. I think the
>> > issue causing problems with both solutions is that not every value for
>> each
>> > group is present in each panel, and they are almost never in the same
>> > order. I think R is just interpreting the appearance of unique values
>> as a
>> > signal to change to the next pch or color. My actual data file is very
>> > large, and it's not possible to sort my way out of this mess. It would
>> be
>> > best if I could just use the value in two columns to actually define a
>> > color or pch for each point on an entire plot. Is there a way to do
>> this?
>> >
>> > Ps, I had to post this via email because the Nabble site kept sending
>> me an
>> > error message: "Message rejected by filter rule match"
>> >
>> > Thanks,
>> > Matt
>> >
>> >
>> >
>> > *Matthew R. Snyder*
>> > *~*
>> > PhD Candidate
>> > University Fellow
>> > University of Toledo
>> > Computational biologist, ecologist, and bioinformatician
>> > Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
>> > matthew.snyd...@rockets.utoledo.edu
>> > msnyder...@gmail.com
>> >

Re: [R] Define pch and color based on two different columns

2019-04-09 Thread Peter Langfelder

Sorry for being late to the party, but has anyone suggested a minor
but important modification of the code from stack exchange?

xyplot(mpg ~ wt | cyl,
  panel = function(x, y, ..., groups, subscripts) {
  pch <- mypch[factor(carb)[subscripts]]
  col <- mycol[factor(gear)[subscripts]]
  grp <- c(gear,carb)
  panel.xyplot(x, y, pch = pch, col = col)
  }
)

From the little I understand about what you're trying to do, this may
just do the trick.
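
The reason the order of factor() and [subscripts] matters can be seen without lattice at all. In this small sketch the variable subscripts is built by hand to mimic the rows of one panel (the 8-cylinder cars); the names codeLocal and codeGlobal are made up for the illustration.

```r
carb = mtcars$carb
subscripts = which(mtcars$cyl == 8)   # rows that would land in one panel

# Levels computed from the panel's rows only: codes restart at 1 in each panel
codeLocal = as.integer(factor(carb[subscripts]))

# Levels computed from the whole column, then the panel's rows are picked:
# the same carb value gets the same code in every panel
codeGlobal = as.integer(factor(carb)[subscripts])

rbind(carb = carb[subscripts], local = codeLocal, global = codeGlobal)
```

Indexing mypch and mycol with the global codes is what keeps symbols and colors consistent across panels.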

Peter

On Tue, Apr 9, 2019 at 2:43 PM Matthew Snyder  wrote:
>
> I am making a lattice plot and I would like to use the value in one column
> to define the pch and another column to define color of points. Something
> like:
>
> xyplot(mpg ~ wt | cyl,
>data=mtcars,
>col = gear,
>pch = carb
> )
>
> There are unique pch points in the second and third panels, but these
> points are only unique within the plots, not among all the plots (as they
> should be). You can see this if you use the following code:
>
> xyplot(mpg ~ wt | cyl,
>data=mtcars,
>groups = carb
> )
>
> This plot looks great for one group, but if you try to invoke two groups
> using c(gear, carb) I think it simply takes unique combinations of those
> two variables and plots them as unique colors.
>
> Another solution given by a StackExchange user:
>
> mypch <- 1:6
> mycol <- 1:3
>
> xyplot(mpg ~ wt | cyl,
>   panel = function(x, y, ..., groups, subscripts) {
>   pch <- mypch[factor(carb[subscripts])]
>   col <- mycol[factor(gear[subscripts])]
>   grp <- c(gear,carb)
>   panel.xyplot(x, y, pch = pch, col = col)
>   }
> )
>
> This solution has the same problems as the code at the top. I think the
> issue causing problems with both solutions is that not every value for each
> group is present in each panel, and they are almost never in the same
> order. I think R is just interpreting the appearance of unique values as a
> signal to change to the next pch or color. My actual data file is very
> large, and it's not possible to sort my way out of this mess. It would be
> best if I could just use the value in two columns to actually define a
> color or pch for each point on an entire plot. Is there a way to do this?
>
> Ps, I had to post this via email because the Nabble site kept sending me an
> error message: "Message rejected by filter rule match"
>
> Thanks,
> Matt
>
>
>
> *Matthew R. Snyder*
> *~*
> PhD Candidate
> University Fellow
> University of Toledo
> Computational biologist, ecologist, and bioinformatician
> Sponsored Guest Researcher at NOAA PMEL, Seattle, WA.
> matthew.snyd...@rockets.utoledo.edu
> msnyder...@gmail.com
>

Re: [R] Help with gsub function

2019-03-15 Thread Peter Langfelder

If you want to remove just the hyphen, why not do

sub("-", "", tb2a$TID)

sub("-", "", "73-017323")
[1] "73017323"

Am I missing something?
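
For what it's worth, the zeros seem to disappear because the pattern "-[0-0]{0,7}" matches the hyphen plus up to seven zeros that follow it. A quick side-by-side on a few of the posted values (the vector tid is just a stand-in for tb2a$TID):

```r
tid = c("73-0700090", "77-0633896", "11-1352310")

# Original pattern: the hyphen AND any zeros right after it are removed
eaten = gsub("-[0-0]{0,7}", "", tid)

# Literal hyphen only: the leading zero of the second segment survives
kept = sub("-", "", tid, fixed = TRUE)

cbind(tid, eaten, kept)
```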

Peter

On Fri, Mar 15, 2019 at 12:46 PM Bill Poling  wrote:
>
> Good afternoon.
>
> sessionInfo()
> #R version 3.5.3 (2019-03-11)
> #Platform: x86_64-w64-mingw32/x64 (64-bit)
> #Running under: Windows >= 8 x64 (build 9200)
>
> I am using gsub function to remove a hyphen in a 9 character column of values 
> in order to convert it to integer.
>
> Works fine except where the second segment has a leading 0, then it is 
> eliminating the 0
>
> Example "73-0700090" becomes " 73700090"
>  "77-0633896" becomes "77633896"
>
> Is there a remedy for this?
>
> tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")
>
> head(tb2a$TID,n=10)
>  [1] "11-1352310" "45-2711804" "35-6001540" "77-0633896" "62-1762545" 
> "61-1029768" "73-0700090" "47-0376604" "47-0486026" "38-3833117"
> > head(tb2a$TID2,n=10)
>  [1] "111352310" "452711804" "356001540" "77633896"  "621762545" "611029768" 
> "73700090"  "47376604"  "47486026"  "383833117"
>
> I have googled the problem and have not found a solution.
>
> http://www.endmemo.com/program/R/gsub.php
> http://r.789695.n4.nabble.com/extracting-characters-from-string-td3298971.html
>
>
> Thank you.
>
> WHP

Re: [R] Confusion Table

2019-01-16 Thread Peter Langfelder

The lazy way is to do

tst_tab = tst_tab[c(2,1), c(2,1)]

The less lazy way is something like

tst_tab <- table(predicted = factor(tst_pred, levels = c("Yes",
"No")),  actual = factor(default_tst$default, levels = c("Yes",
"No")))

Peter
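
A toy version of the less lazy way, with made-up predictions and outcomes (pred, actual and lev are assumptions for the example), shows that table() follows the order of the factor levels:

```r
pred   = c("No", "Yes", "No", "No", "Yes", "No")
actual = c("No", "Yes", "Yes", "No", "No", "No")
lev = c("Yes", "No")   # desired order of rows and columns

tab = table(predicted = factor(pred, levels = lev),
            actual = factor(actual, levels = lev))
tab
```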

On Wed, Jan 16, 2019 at 4:39 PM  wrote:
>
> R-Help
>
>
>
> R-Help community is there an simple straight forward way  of changing my
> confusion table output to list "Yes" before "No" rather than "No" before
> "Yes" - R default.
>
>
>
> # Making predictions on the test set.
>
> tst_pred <- ifelse(predict(model_glm, newdata = default_tst, type =
> "response") > 0.5, "Yes", "No")
>
> tst_tab <- table(predicted = tst_pred, actual = default_tst$default)
>
> tst_tab
>
>
>
> ##actual
>
> ## predicted   No  Yes
>
> ##  No  4817  113
>
> ##  Yes  1852
>
>
>
> Jeff

Re: [R] randomForest out of bag prediction

2019-01-12 Thread Peter Langfelder

See inline.

On Sat, Jan 12, 2019 at 9:56 AM Witold E Wolski  wrote:

> ypred_oob <- predict(diachp.rf)

AFAIK these are, indeed, the out-of-bag predictions.

> dataX <- data %>% select(-quality) # remove response.
> ypred <- predict( diachp.rf, dataX )

These are not out of bag predictions. dataX is interpreted as new data
(argument newdata), and it is assumed to contain entirely new
observations. Each observation in dataX is fed through all of the
trees and the predictions are then pooled. There is no out-of-bag here
- all of the new data observations are assumed to be independent of
the training set.

>
> What I find even more disturbing is that 100% accuracy for ypred.
> Would you agree that this is rather unexpected?

It is expected (and not disturbing) if your training set had enough
variables (or signal) to create trees that fit the training data
perfectly.

HTH,

Peter


Re: [R] Using apply

2018-10-30 Thread Peter Langfelder

It should be said that for many basic statistics, there are faster
functions than apply, for example here you want

sum = colSums(x)

As already said, for sum of squares you would do colSums(x^2).

Many useful functions of this kind are implemented in package
matrixStats. Once you install it, either look at the package manual or
type ls("package:matrixStats") to see a list of functions. Most if not
all have self-explanatory names.
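
On the matrix from the question, the base R versions can be checked against apply directly (the names s1, s2 and viaApply are only for this illustration):

```r
x = matrix(1:10, nrow = 5)

s1 = colSums(x)      # column sums
s2 = colSums(x^2)    # column sums of squares

# Same numbers as the apply() route
viaApply = apply(x, 2, function(col) sum(col^2))

rbind(sum = s1, sumSq = s2, sumSqApply = viaApply)
```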

HTH,

Peter
On Tue, Oct 30, 2018 at 7:28 PM Steven Yen  wrote:
>
> I need help with "apply". Below, I have no problem getting the column sums.
> 1. How do I get the sum of squares?
> 2. In general, where do I look up these functions?
> Thanks.
>
> x<-matrix(1:10,nrow=5); x
> sum <- apply(x,2,sum); sum

Re: [R] remove text from nested list

2018-10-25 Thread Peter Langfelder

You should be more specific about what you want to replace and with
what. The pattern you use, namely "[0-9][0-9]/[0-9[0-9].*com", does
not (AFAICS) match any of the strings in your data, so don't be
surprised that your commands do not change anything.

If you have a correct pattern and replacement and all lists have depth
3, using something like

lapply(mylist, lapply, lapply, function(y) gsub(pattern, replacement, y))

should work. If your list has a variable depth, I would use a
recursive function, something like

recursiveGSub = function(x, pattern, replacement)
{
  if (is.atomic(x)) gsub(pattern, replacement, x) else lapply(x,
recursiveGSub, pattern, replacement)
}

Example:

lst = list("a001", list("b001", list("c001", "d001")))

lst
recursiveGSub(lst, "00", "")
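
Another option that may be worth a try is base R's rapply, which walks a nested list for you. This is only a sketch on the same toy list; I have not tried it on the actual data, and viaRapply is just a name chosen for the example.

```r
lst = list("a001", list("b001", list("c001", "d001")))

# how = "replace" keeps the nesting and swaps in the modified leaves;
# classes = "character" restricts the function to character leaves
viaRapply = rapply(lst, function(y) gsub("00", "", y),
                   classes = "character", how = "replace")
str(viaRapply)
```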


HTH,

Peter
On Thu, Oct 25, 2018 at 6:04 PM Ek Esawi  wrote:
>
> Hi All—
>
> I have a list that contains multiple sub-lists and each sub-list
> contains multiple  sub(sub-lists), each of the sub(sub-lists) is made
> up of matrices of text. I want to replace some of the text in some
> parts in the matrices on the list. I tried gsub and stringr,
> str_remove, but nothing seems to work
>
> I tried:
>
> lapply(mylist, function(x) lapply(x, function(y)
> gsub("[0-9][0-9]/[0-9[0-9].*com","",y)))
> lapply(mylist, function(x) str_remove(x,"[0-9][0-9]/[0-9[0-9].*com"))
>
> Any help is greatly apprercaited.
>
>
>
> mylist—this is just an example
>
> [[1]]
> [[1]][[1]]
> [[1]][[1]][[1]]
> [,1]  [,2]  [,3]  [,4] [,5]
> [1,] "12/30 12/30"  "ABABABABABAB"  "8.00"
> [2,] "01/02 01/02"  "”.   “99"
> [3,] "01/02 01/02"  "CACACACACACC” "55.97"
>
> [[1]][[1]][[2]]
> [,1]  [,2]
> [1,] "12/30 12/30" "DDD” “29"
> [2,] "12/30 12/30"  :GGG” “333”
>
> [[1]][[2]]
> [[1]][[2]][[1]]
> [,1]  [,2]  [,3] [,4]  [,5]
> [1,]  "01/02 01/02" "ThankYou" “23”
> [2,] "01/02 01/02"  "Standard data"  "251"

Re: [R] GLM Model Summary

2018-10-16 Thread Peter Langfelder

The coefficients are best obtained as summary(Model)$coefficients.
This is a matrix that can then be saved as a csv file and opened in Excel
or other spreadsheet software.
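
A minimal sketch, using a built-in data set since I don't have your model (the formula, the object names and the temporary output file are all assumptions for the example):

```r
# Stand-in model: logistic regression on the mtcars data
model = glm(am ~ wt, data = mtcars, family = binomial)

# Matrix with one row per term and one column per statistic
coefs = summary(model)$coefficients
colnames(coefs)

# Individual columns can be pulled out by name
estimates = coefs[, "Estimate"]

# Save the whole table; the file opens directly in a spreadsheet
outFile = tempfile(fileext = ".csv")
write.csv(coefs, file = outFile)
```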

HTH,

Peter
On Tue, Oct 16, 2018 at 9:44 AM Neslin, Scott A.
 wrote:
>
> R-Help:
>
> We are working with your GLM R package.  The Summary(Model) now gets printed 
> by the program as one object and we want to put the coefficient columns into 
> Excel.  We took an initial stab at this by counting the number of characters 
> occupied by each column.  But we have now learned that the number of 
> characters in a column depends on the length of the variable names, so is not 
> a constant number (e.g., 54 characters to a line).
>
> We therefore ask, is it possible for us to get the Summary(Model) column by 
> column, i.e., a separate object for each column?  That way we could assemble 
> an Excel table easily rather than having to count the number of characters.
>
> Is this possible for us to do by ourselves?  Or could you modify the package 
> in some way?
>
> We appreciate your attention.  Thank you!
>
> Scott Neslin
> Prasad Vana
>
> Dartmouth College
>


Re: [R] Set attributes for object known by name

2018-10-10 Thread Peter Langfelder

oops, I think the right code would be

x = get(varname)
attr(x, "foo") = "bar"
assign(varname, x)
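
For completeness, a self-contained sketch of the same get/modify/assign pattern, using the data frame from the original question; the attribute name "foo" and its value are placeholders.

```r
# The variable name is only known as a character string
varname = "myvarname"
assign(varname, data.frame(A = 1:5, B = 2:6))

# Fetch a copy of the object, set the attribute on the copy,
# then store the modified copy back under the same name
x = get(varname)
attr(x, "foo") = "bar"
assign(varname, x)

# The attribute is now visible through either route
fooViaGet = attr(get(varname), "foo")
fooViaName = attr(myvarname, "foo")
```

The intermediate variable is needed because get() has no replacement form, which is what the "get<-" error in the question is complaining about.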

On Wed, Oct 10, 2018 at 9:30 PM Peter Langfelder 
wrote:

> I would try something like
>
> x = get(myvarname)
> attr(x, "foo") = "bar"
> assign(varname, x)
>
> HTH,
>
> Peter
>
> On Wed, Oct 10, 2018 at 9:15 PM Marc Girondot via R-help <
> r-help@r-project.org> wrote:
>
>> Hello everybody,
>>
>> Has someone the solution to set attribute when variable is known by name ?
>>
>> Thanks a lot
>>
>> Marc
>>
>> Let see this exemple:
>>
>> # The variable name is stored as characters.
>>
>> varname <- "myvarname"
>> assign(x = varname, data.frame(A=1:5, B=2:6))
>> attributes(myvarname)
>>
>> $names
>> [1] "A" "B"
>> $class
>> [1] "data.frame"
>> $row.names
>> [1] 1 2 3 4 5
>>
>> # perfect
>>
>> attributes(get(varname))
>>
>> # It works also
>>
>> $names
>> [1] "A" "B"
>> $class
>> [1] "data.frame"
>>
>> $row.names
>> [1] 1 2 3 4 5
>>
>> attributes(myvarname)$NewAtt <- "MyAtt"
>>
>> # It works
>>
>> attributes(get(varname))$NewAtt2 <- "MyAtt2"
>> Error in attributes(get(varname))$NewAtt2 <- "MyAtt2" :
>>impossible de trouver la fonction "get<-"
>>
>> # Error...
>>
>


Re: [R] Set attributes for object known by name

2018-10-10 Thread Peter Langfelder

I would try something like

x = get(myvarname)
attr(x, "foo") = "bar"
assign(varname, x)

HTH,

Peter

On Wed, Oct 10, 2018 at 9:15 PM Marc Girondot via R-help <
r-help@r-project.org> wrote:

> Hello everybody,
>
> Has someone the solution to set attribute when variable is known by name ?
>
> Thanks a lot
>
> Marc
>
> Let see this exemple:
>
> # The variable name is stored as characters.
>
> varname <- "myvarname"
> assign(x = varname, data.frame(A=1:5, B=2:6))
> attributes(myvarname)
>
> $names
> [1] "A" "B"
> $class
> [1] "data.frame"
> $row.names
> [1] 1 2 3 4 5
>
> # perfect
>
> attributes(get(varname))
>
> # It works also
>
> $names
> [1] "A" "B"
> $class
> [1] "data.frame"
>
> $row.names
> [1] 1 2 3 4 5
>
> attributes(myvarname)$NewAtt <- "MyAtt"
>
> # It works
>
> attributes(get(varname))$NewAtt2 <- "MyAtt2"
> Error in attributes(get(varname))$NewAtt2 <- "MyAtt2" :
>impossible de trouver la fonction "get<-"
>
> # Error...
>


Re: [R] tibble question with a mean

2018-09-20 Thread Peter Langfelder

I don't know tibble, so I'll do the same with a plain data frame:

a =
data.frame(x=LETTERS[1:4],y=1:4,z=rnorm(4),a=c("dog","cat","tree","ferret"))
> a
  x y           z      a
1 A 1 -0.08264865    dog
2 B 2  0.32344426    cat
3 C 3 -0.80416061   tree
4 D 4  1.27052529 ferret
> mean(a[2:3])
[1] NA
Warning message:
In mean.default(a[2:3]) : argument is not numeric or logical: returning NA
> mean(as.matrix(a[2:3]))
[1] 1.338395

The reason you get NA (with a warning) from mean(a[2:3]) is that a[2:3] is
still a data frame (a special list) and you cannot simply apply mean to a
list. You need to first convert to a matrix or vector which can then be fed
to mean().
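
A short sketch of the alternatives, assuming a plain data frame with fixed numbers in place of rnorm() so that the results are reproducible; the variable names are illustrative.

```r
a = data.frame(x = LETTERS[1:4], y = 1:4, z = c(0.5, 1.5, 2.5, 3.5),
               a = c("dog", "cat", "tree", "ferret"))

# One mean over all values in the two numeric columns
overallMean = mean(as.matrix(a[2:3]))

# The same thing via a plain vector
overallMean2 = mean(unlist(a[2:3]))

# One mean per column
columnMeans = colMeans(a[2:3])
columnMeans2 = sapply(a[2:3], mean)
```

Which form is appropriate depends on whether a single pooled mean or per-column means are wanted.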

Peter


On Thu, Sep 20, 2018 at 5:50 PM Erin Hodgess 
wrote:

> Hello!
>
> Here is a toy tibble problem:
>
> xt <-
> tibble(x=LETTERS[1:4],y=1:4,z=rnorm(4),a=c("dog","cat","tree","ferret"))
> str(xt)
> Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of  4 variables:
>  $ x: chr  "A" "B" "C" "D"
>  $ y: int  1 2 3 4
>  $ z: num  0.3246 0.0504 0.339 0.4872
>  $ a: chr  "dog" "cat" "tree" "ferret"
> #No surprise
>  xt %>% mean
> [1] NA
> Warning message:
> In mean.default(.) : argument is not numeric or logical: returning NA
> #surprised!
> mean(xt[2:3])
> [1] NA
> Warning message:
> In mean.default(xt[2:3]) : argument is not numeric or logical: returning NA
>  xt[, 2:3] %>% mean
> [1] NA
> Warning message:
> In mean.default(.) : argument is not numeric or logical: returning NA
>
> I have a feeling that I'm doing something silly wrong.  Has anyone run into
> this, please?  I saw something like this on this list, but didn't see a
> solution.
>
> Thanks,
> Erin
>
>
> Erin Hodgess, PhD
> mailto: erinm.hodg...@gmail.com
>


Re: [R] sink() output to another directory

2018-09-13 Thread Peter Langfelder

Apologies if my advice wasn't clear: the file you want to write to goes in
the sink() function/command. You can put the file anywhere on your file
system; there is no need to write into the current directory and then move
the file.

The print command is completely unaware of the file you point to in sink().
Technically, print() sends output to a device called "standard output",
which is usually the screen, but it can be redirected to a file (_any_
writable file) using the sink() command.
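
A minimal sketch of the whole sequence, assuming a temporary directory in place of the real project layout; the directory and file names below are placeholders. Note that sink() does not create missing directories, so the target directory has to exist first.

```r
# Target directory and file (placeholders; any writable location works)
outDir = file.path(tempdir(), "stat-summaries")
dir.create(outDir, showWarnings = FALSE)
outFile = file.path(outDir, "example-summary.txt")

sink(outFile)                        # redirect standard output to the file
print(summary(c(1, 2, 3, 4, 100)))   # print() takes only the object, no path
sink(NULL)                           # restore output to the screen

# Read the file back to confirm the summary landed there
captured = readLines(outFile)
```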

Hope this helps,

Peter

On Thu, Sep 13, 2018 at 4:35 PM Rich Shepard 
wrote:

> On Thu, 13 Sep 2018, Peter Langfelder wrote:
>
> > Remove the / from the print command, it does not belong there.
>
> Peter,
>
>So the print() function cannot accept a relative path to a different
> directory for its output? This does seem to be the case:
>
> source('rainfall-dubois-crk-all.r')
> Error in source("rainfall-dubois-crk-all.r") :
>rainfall-dubois-crk-all.r:25:7: unexpected '/'
> 24: sink('stat-summaries/estacada-wnw-precip.txt')
> 25: print(/
>^
>
>Then I'll print to the cwd and move the files manually afterwards.
>
> Thanks,
>
> Rich
>


Re: [R] sink() output to another directory

2018-09-13 Thread Peter Langfelder

There is no path in print. The path (file) is set in sink().

Peter

On Thu, Sep 13, 2018 at 4:35 PM Rich Shepard 
wrote:

> On Thu, 13 Sep 2018, Peter Langfelder wrote:
>
> > Remove the / from the print command, it does not belong there.
>
> Peter,
>
>So the print() function cannot accept a relative path to a different
> directory for its output? This does seem to be the case:
>
> source('rainfall-dubois-crk-all.r')
> Error in source("rainfall-dubois-crk-all.r") :
>rainfall-dubois-crk-all.r:25:7: unexpected '/'
> 24: sink('stat-summaries/estacada-wnw-precip.txt')
> 25: print(/
>^
>
>Then I'll print to the cwd and move the files manually afterwards.
>
> Thanks,
>
> Rich
>


Re: [R] sink() output to another directory

2018-09-13 Thread Peter Langfelder

For the second time: Rich, there should be no slash in the print() command.

Use the form

sink("../directory/file")
print(summary(foo)) ### no slashes here
sink(NULL)

Peter

On Thu, Sep 13, 2018 at 7:12 PM Rich Shepard 
wrote:

> On Thu, 13 Sep 2018, Henrik Bengtsson wrote:
>
> >> sink('stat-summaries/estacada-se-precip.txt')
> >> print(summary(estacada_se_wx))
> >> sink()
> >>
> >> while accepting:
> >>
> >> pdf('../images/rainfall-estacada-se.pdf')
> >>   
> >> plot(rain_est_se)
> >> dev.off()
> >>
> >>Changing the sink() file to
> >> './stat-summaries/estacada-se-precip.txt'
> >>
> >> generates the same error
> >
> > "same error" as what? (ambiguity is the reason for not being able to
> > help you - all the replies in this thread this far are correct and on
> > the spot)
> >
> > BTW, not that it should matter, what is your operating system and
> version of R?
>
> Henrik,
>
>As I wrote in earlier messages:
>
> sink('stat-summaries/estacada-wnw-precip.txt')
> print(summary(estacada_se_wx))
> sink()
>
> results in
>
> 24: sink('stat-summaries/estacada-wnw-precip.txt')
> 25: print(/
> ^
> Does not matter if I use single or double quotes.
>
>The message that print() doesn't like the forward slash results when I
> specify 'stat-summaries/estacada-wnw-precip.txt' or
> './stat-summaries/estacada-wnw-precip.txt'.
>
>Running R-3.5.1 on Slackware-14.2.
>
> Rich
>


Re: [R] sink() output to another directory

2018-09-13 Thread Peter Langfelder

Remove the / from the print command, it does not belong there.

sink("../directory/file.txt");
print(summary(foo))
sink(NULL)


On Thu, Sep 13, 2018 at 4:03 PM Rich Shepard 
wrote:

> On Thu, 13 Sep 2018, Rich Shepard wrote:
>
> >  sink('example-output.txt')
> >  print(summary(df))
> >  sink()
>
>Let me expand on this. When the script contains
>
> # Open PDF device to save plot
> pdf('../images/rainfall-estacada-se.pdf')
> ...
> plot(rain_est_se)
> dev.off()
>
> the file, rainfall-estacada-se.pdf is placed in the images directory, which
> is on the same directory level as the one in which the script is being run.
> I thought the equivalent syntax with sink() would work, but the print
> command rejects the forward slash that plot() accepts:
>
> Error in source("rainfall-dubois-crk-all.r") :
>rainfall-dubois-crk-all.r:25:7: unexpected '/'
>
>Is this more clear?
>
> Thanks,
>
> Rich
>


Re: [R] histogram in GNU R....

2018-09-07 Thread Peter Langfelder

A simpler short-term solution is to execute dev.off() and look for the plot
in the file Rplots.pdf in the current directory. Depending on the OS of the
local computer, you should be able to point a file browser at the EC2
instance and simply click the file to open it in a PDF viewer on the local
machine.
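
A sketch of a slightly more explicit variant, assuming no display is available: open a PDF device on a named file instead of relying on the default Rplots.pdf. The file name is a placeholder, and the object returned by hist() also gives the bin counts as plain numbers that can be inspected at the command line.

```r
xht = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Send the plot to a named PDF file rather than to a screen device
plotFile = file.path(tempdir(), "histogram.pdf")
pdf(plotFile)
h = hist(xht)
invisible(dev.off())   # closing the device finishes writing the file

# The same information in text form: bin boundaries and counts per bin
binBreaks = h$breaks
binCounts = h$counts
```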

Peter

On Fri, Sep 7, 2018 at 10:31 AM William Dunlap via R-help <
r-help@r-project.org> wrote:

> You may have to install X11 stuff to your ec2 instance.  E.g., googling for
> "ec2 X11 forwarding" showed things like the following:
>
> Re: X11 forwarding to access AWS EC2 Linux instance
> Posted by: wilderfield
> 
> Posted on: Apr 5, 2018 11:31 AM
> In response to: LE M.
>
> sudo yum install xorg-x11-xauth
>
> The above is all I needed to get X11 forwarding working over ssh
>
> When ssh-ing to the instance, use the -X flag
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Sep 7, 2018 at 1:26 AM, akshay kulkarni 
> wrote:
>
> > dear members,
> >  I am running R on Linux AWS ec2 instance.
> > When I try to create a histogram in it, I am running into problems:
> >
> > > xht <- c(1,2,3,4,5,6,7,8,9,10)
> > >  hist(xht)
> > >
> >
> > when I type hist(xht), it goes to the next prompt. More importantly,
> there
> > is no error message. So, the most probable conclusion is that the command
> > gets executed. But there is no pop up screen with a histogram, and
> nothing
> > else...
> >
> > whats going on?
> >
> > How can I circumvent the help of histogram(which is not available in GNU
> > R)? summary(xht) would help, but not much. Any other function that can
> give
> > information, in LINUX R, that a histogram gives, in LINUX CLI?
> >
> > Very many thanks for your time and effort...
> > Yours sincerely,
> > AKSHAYM KULKARNI
> >


Re: [R] R shared library (/usr/lib64/R/lib/libR.so) not found.

2018-08-23 Thread Peter Langfelder

On Thu, Aug 23, 2018 at 7:33 AM Berwin A Turlach
 wrote:
>
> G'day Rolf,
>
> On Thu, 23 Aug 2018 23:34:38 +1200
> Rolf Turner  wrote:
>
> > I guess I should have said --- I did
> >
> >  sudo make prefix=/usr install
> >
> > which puts stuff into /usr rather than into /usr/local.
>
> ???
>
> I do not remember ever specifying "prefix=foo" at the make install
> stage.  Not for any software that uses autoconf 
>
> I thought the prefix should be specified to ./configure and after that
> just
> make
> make check
> make install
>
> I am pretty sure that the location of RHOME is set by the path
> specified (explicitly or implicitly) to ./configure.  If you then
> install R at another location with your construct, some problems seem
> to be pre-programmed.  But I could be wrong.

The manual, specifically

https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Installation

documents this way of choosing the installation directory.

Peter


Re: [R] [FORGED] Re: bar plot add space to group data

2018-08-19 Thread Peter Langfelder

On Sun, Aug 19, 2018 at 7:15 AM  wrote:
>
> August 19, 2018 4:58 AM, "Peter Langfelder"  
> wrote:
>
> > To the OP, try formatting the data to be plotted as a matrix, not as a
> > vector
>
> CSV data provided in a previous message; is not the data formatted as a 
> matrix?

I meant the data you give to barplot - your code supplies only the
third column of the data frame, so barplot only sees a vector. I would
try something like

plotData = do.call(cbind, tapply(csv.data$percentage, csv.data$year, identity))

barplot(plotData, )
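
A sketch of how this could look end to end, under stated assumptions: the percentages are the ones Jim Lemon posted earlier in the thread, the long-format data frame csv.data is rebuilt from them, and the grade labels grade1 to grade7 are placeholders since the real labels are not shown here.

```r
# Percentages as a 7 x 5 table (rows: grades, columns: years 2014 to 2018)
percentByGrade = matrix(c(9,9,8,8,8, 23,23,23,23,22, 27,27,27,25,24,
  19,19,19,20,20, 17,17,17,18,19, 8,8,8,9,9, 2,2,3,3,3), ncol = 5, byrow = TRUE)
years = 2014:2018
grades = paste0("grade", 1:7)   # placeholder labels

# Long format, one row per (year, grade), as it would come from a CSV file
csv.data = data.frame(year = rep(years, each = 7),
                      grade = rep(grades, times = 5),
                      percentage = as.vector(percentByGrade))

# One column per year, so that barplot() sees a matrix rather than a vector
plotData = do.call(cbind, tapply(csv.data$percentage, csv.data$year, identity))

# With a matrix and beside = TRUE, space = c(within group, between groups)
plotFile = file.path(tempdir(), "grades-barplot.pdf")
pdf(plotFile)
barMidpoints = barplot(plotData, beside = TRUE, space = c(0, 2),
                       names.arg = years, ylim = c(0, 30),
                       xlab = "year", ylab = "percentage of each grade")
invisible(dev.off())
```

The second element of space then controls the gap between the yearly groups.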

Peter


Re: [R] [FORGED] Re: bar plot add space to group data

2018-08-18 Thread Peter Langfelder

My guess is that space has no effect because (1) the first element is
zero and (2) the code in OP's message has
barplot(gceac[,3], ...

i.e. barplot does not see a matrix, only a vector.

To the OP, try formatting the data to be plotted as a matrix, not as a
vector, then the space argument should be useful to add space between
groups.

Peter


On Sat, Aug 18, 2018 at 4:53 PM Rolf Turner  wrote:
>
>
> Jim:
>
> (a) There's no legend.
>
> (b) I am still curious as to why the OP's code didn't work, in that
> the "space=c(0,2)" argument seemed to have no effect.
>
> cheers,
>
> Rolf
>
> On 18/08/18 20:45, Jim Lemon wrote:
> > Hi citc,
> > Try this:
> >
> > geac<-matrix(c(9,9,8,8,8,23,23,23,23,22,27,27,27,25,24,
> >   19,19,19,20,20,17,17,17,18,19,8,8,8,9,9,2,2,3,3,3),ncol=5,byrow=TRUE)
> > library(plotrix)
> > barp(geac,names.arg=2014:2018,main="A level grades chemistry",
> >   xlab="Year",ylab="Percentage of each grade",ylim=c(0,30),
> >   col=c("white","lightblue","blue","orange","green","red","pink"))
> >
> > Jim
> >
> > On Fri, Aug 17, 2018 at 9:55 PM,   wrote:
> >> R-users,
> >>
> >> Can someone please advise how to improve the code below that was used to 
> >> produce the graph shown at the following hyperlink 
> >> (https://chemistryinthecity.neocities.org/content/entry1808.html#17)? The 
> >> request is to add space between the annual data groups.
> >>
> >> barplot(gceac[,3], xlab='year', ylab='percentage of each grade', 
> >> col=c('aliceblue', 'aquamarine', 'blue', 'chocolate', 'darkgreen', 
> >> 'firebrick', 'violet'), legend=gceac[1:7,2], args.legend = list(x = 40, y 
> >> = 30, title='grades'), main='A-level grades, chemistry', beside=T, 
> >> space=c(0,2), ylim=c(0,30))
> >> years<-c(2014,2015,2016,2017,2018)
> >> mtext(years, side=1, at=c(5, 12, 19, 26, 33))


Re: [R] How deep into function calls does trycatch() work

2018-08-16 Thread Peter Langfelder

AFAIK a try or tryCatch will intercept the error thrown by stop(). Why
not try it and see if it works?
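
A quick way to check, as a sketch: errHandle() and foo() below are simplified stand-ins for the package code (no HTTP request is made), and the status codes and message are made up for the test.

```r
# Stand-in for the package's error handler: stops on a "bad" status
errHandle = function(status) {
  if (status > 201) stop("Not Found", call. = FALSE)
}

# Stand-in for the package function that calls the handler internally
foo = function(status) {
  errHandle(status)
  "ok"
}

# The error is raised two calls down, but tryCatch() still intercepts it
failed = tryCatch(foo(404),
                  error = function(e) paste("caught:", conditionMessage(e)))

# When no error occurs, the value of foo() is passed through unchanged
succeeded = tryCatch(foo(200),
                     error = function(e) paste("caught:", conditionMessage(e)))
```

The depth of the call stack between tryCatch() and stop() does not matter, as long as nothing in between handles the error first.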

Peter
On Thu, Aug 16, 2018 at 1:05 PM Roy Mendelssohn - NOAA Federal via
R-help  wrote:
>
> Hi All:
>
> I am using another package in a project I have. Because of that,  I have no 
> control on how that package behaves or what it returns.  This package has a 
> function foo()  that calls httr::GET(),  and if it gets an error from 
> httr::GET() it calls the following routine:
>
>
> err_handle2 <- function(x) {
>   if (x$status_code > 201) {
> tt <- content(x, "text")
> mssg <- xml_text(xml_find_all(read_html(tt), "//h1"))
> stop(paste0(mssg, collapse = "\n\n"), call. = FALSE)
>   }
> }
>
> My question is if I embed my call to foo() in try...catch will that override 
> the stop() call or am I a goner, or is there another way to override it,  
> given that I can't change the code to err_handle2().
>
> Thanks,
>
> -Roy
>
>
> **
> "The contents of this message do not reflect any position of the U.S. 
> Government or NOAA."
> **
> Roy Mendelssohn
> Supervisory Operations Research Analyst
> NOAA/NMFS
> Environmental Research Division
> Southwest Fisheries Science Center
> ***Note new street address***
> 110 McAllister Way
> Santa Cruz, CA 95060
> Phone: (831)-420-3666
> Fax: (831) 420-3980
> e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/
>
> "Old age and treachery will overcome youth and skill."
> "From those who have been given much, much will be expected"
> "the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
>


Re: [R] Fast matrix multiplication

2018-08-13 Thread Peter Langfelder

On Mon, Aug 13, 2018 at 12:18 PM Ista Zahn  wrote:
>
> On Mon, Aug 13, 2018 at 2:41 PM Ravi Varadhan  wrote:
> >
> > Hi Ista,
> > Thank you for the response.  I use Windows.  Is there a pre-compiled 
> > version of openBLAS for windows that would make it easy for me to use it?
>
> Not sure. If you want an easy way I would use MRO. More info at
> https://mran.microsoft.com/rro#intelmkl1

OpenBLAS is provided as a binary for Windows, see http://www.openblas.net/ .

You may need to compile R from source though, unless you can use an
equivalent of the Linux trick of replacing libRblas.so with a symlink to
the compiled OpenBLAS library.

Peter


Re: [R] Mysterious seg fault.

2018-08-11 Thread Peter Langfelder

Segfaults are not always repeatable. You may have an undefined pointer that
sometimes points into unreachable or unallocated memory, causing a segfault,
and sometimes points into valid memory, without causing a segfault.

You may want to read
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Checking-memory-access
for tips on how to diagnose such problems.

HTH,

Peter

On Sat, Aug 11, 2018 at 2:13 PM Rolf Turner  wrote:

>
> I am getting a seg fault from a package that I am working on, and I am
> totally flummoxed by it.  The fault presumably arises from dynamically
> loaded Fortran code, but I'm damned if I can see where the error lies.
>
> In an effort to diagnose the problem I created a "non-package" version
> of the code.  That is, I copied all the *.R files and *.f file into a
> new directory.  In that directory I created a *.so file using
> R CMD SHLIB.
>
> In the R code I removed all the "PACKAGE=" lines from the calls to
> .Fortran() and put in appropriate dyn.load() calls.
>
> I then started R in this new "clean" directory and sourced all of the
> *.R files.
>
> I then issued the command that produces the seg fault when run under the
> aegis of the package.  The command ran without a murmur of complaint.
> WTF?
>
> Can anyone suggest a reason why a seg fault might arise when the code is
> run in the context of a package, but not when it is run in "standalone
> mode"?
>
> I have checked and rechecked my init.c file --- which is the only thing
> that I can think of that might create a difference --- and cannot find
> any discrepancy between the declarations in the init.c file and the
> Fortran code.
>
> The package is a bit complicated, so giving more detail would be
> cumbersome.  Also I have no idea what aspects of detail would be
> relevant.  If anyone would like more info, feel free to ask.
>
> I would really appreciate it if someone could give me some suggestions
> before I go *completely* mad!
>
> cheers,
>
> Rolf Turner
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>


Re: [R] Trying to Generalize a Function in R

2018-08-09 Thread Peter Langfelder

Well, your function uses AVB$AVB.Close, so I assumed AVB is a list (a
data frame can be thought of as a special list). What do you get when
you type class(AVB)?

Peter
On Thu, Aug 9, 2018 at 2:24 PM rsherry8  wrote:
>
> Peter,
>
> Thanks for the response. I tired the following command:
>  AVB[["AVB.Close"]]
> and I got:
>  Error in AVB[["AVB.Close"]] : subscript out of bounds
> Are you assuming that AVB is a data frame? I do not think AVB is a data
> frame. Is there a way
> for me to check?
> Thanks,
> Bob
>
> On 8/9/2018 3:46 PM, Peter Langfelder wrote:
> > If I understand it correctly, the function getSymbols creates a
> > variable with the name being the stock symbol. Then use the function
> > get(symbol) to retrieve the value of the variable whose name is
> > contained in the character string `symbol'. Assign that to a variable
> > (e.g. AVB). You may also have to modify the names of the components
> > you retrieve from the list AVB. For that, you can use
> > AVB[["AVB.Close"]] instead of AVB$AVB.Close. You can then use
> > something like AVB[[paste0(symbol, ".Close")]] to generalize the
> > retrieval of list components.
> >
> > HTH,
> >
> > Peter
> > On Thu, Aug 9, 2018 at 12:40 PM rsherry8  wrote:
> >>
> >> I wrote the following function:
> >>
> >> # This method gets historical stock data for the stock Avalon Bay whose
> >> symbol is AVB.
> >> getReturns <- function(norm = FALSE)
> >> {
> >>   library(quantmod)
> >>
> >>   getSymbols("AVB", src = "yahoo", from = start, to = end)
> >>   length = length(  AVB$AVB.Close )
> >>   close = as.numeric( AVB$AVB.Close )
> >>   cat( "length = ", length(close ), "\n" )
> >>   for( i in 1:length-1 )
> >>   diff[i] = ((close[i+1] - close[i]) ) / close[i]
> >>   u = mean(diff)
> >>   stdDev = sd(diff)
> >>   cat( "stdDev = ", stdDev, "\n" )
> >>
> >>   if ( norm == TRUE ) {
> >>   diff = (diff - u)
> >>   diff = diff / stdDev
> >>   }
> >>   return (diff)
> >> }
> >>
> >> I would like to generalize it to work for any stock by passing in the
> >> stock symbol. So the header for the
> >> function would be:
> >>
> >> getReturns <- function(symbol, norm = FALSE)
> >>
> >> Now how do I update this line:
> >>   length = length(  AVB$AVB.Close )
> >> This statement will not work:
> >>   length = length(  symbol$AVB.Close )
> >> because the name that holds the closing price is a function of the stock
> >> symbol.
> >>
> >> Thanks,
> >> Bob
> >>
>


Re: [R] Trying to Generalize a Function in R

2018-08-09 Thread Peter Langfelder

If I understand it correctly, the function getSymbols creates a
variable with the name being the stock symbol. Then use the function
get(symbol) to retrieve the value of the variable whose name is
contained in the character string `symbol'. Assign that to a variable
(e.g. AVB). You may also have to modify the names of the components
you retrieve from the list AVB. For that, you can use
AVB[["AVB.Close"]] instead of AVB$AVB.Close. You can then use
something like AVB[[paste0(symbol, ".Close")]] to generalize the
retrieval of list components.
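
A sketch of the idea without any download, under stated assumptions: a small data frame with made-up prices stands in for what getSymbols() would create, and the names prices, closeName and returns are illustrative. Column indexing with [, name] is used here because it also works for matrix-like objects, which may matter if the real object turns out not to be a list.

```r
# Stand-in for getSymbols(symbol, ...): a variable named after the symbol
symbol = "AVB"
assign(symbol, data.frame(AVB.Open = c(100, 102, 101),
                          AVB.Close = c(102, 101, 104)))

# Retrieve the object whose name is stored in `symbol`
prices = get(symbol)

# Build the column name from the symbol instead of hard-coding it
closeName = paste0(symbol, ".Close")
close = as.numeric(prices[, closeName])

# Simple returns, computed without an explicit loop
n = length(close)
returns = (close[-1] - close[-n]) / close[-n]
```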

HTH,

Peter
On Thu, Aug 9, 2018 at 12:40 PM rsherry8  wrote:
>
>
> I wrote the following function:
>
> # This method gets historical stock data for the stock Avalon Bay whose
> symbol is AVB.
> getReturns <- function(norm = FALSE)
> {
>  library(quantmod)
>
>  getSymbols("AVB", src = "yahoo", from = start, to = end)
>  length = length(  AVB$AVB.Close )
>  close = as.numeric( AVB$AVB.Close )
>  cat( "length = ", length(close ), "\n" )
>  for( i in 1:length-1 )
>  diff[i] = ((close[i+1] - close[i]) ) / close[i]
>  u = mean(diff)
>  stdDev = sd(diff)
>  cat( "stdDev = ", stdDev, "\n" )
>
>  if ( norm == TRUE ) {
>  diff = (diff - u)
>  diff = diff / stdDev
>  }
>  return (diff)
> }
>
> I would like to generalize it to work for any stock by passing in the
> stock symbol. So the header for the
> function would be:
>
> getReturns <- function(symbol, norm = FALSE)
>
> Now how do I update this line:
>  length = length(  AVB$AVB.Close )
> This statement will not work:
>  length = length(  symbol$AVB.Close )
> because the name that holds the closing price is a function of the stock
> symbol.
>
> Thanks,
> Bob
>


Re: [R] subsetting ls() as per class...

2018-07-28 Thread Peter Langfelder

Looking at ?rm, my solution would be something like

rm(list = grep("\\.NS$", ls(), value = TRUE))

But test it since I have not tested it.
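
Since the original question also asks about subsetting by class, here is a small sketch of that variant. It works in a scratch environment rather than the global one, and the class name "xts_like" and the object names are made up for illustration.

```r
# Scratch environment so nothing in the real workspace is touched
e <- new.env()
e$AAA.NS <- structure(list(1:3), class = "xts_like")   # hypothetical class
e$BBB.NS <- structure(list(4:6), class = "xts_like")
e$keep   <- 42

# Names of all objects that inherit from the class(es) of interest
byClass <- Filter(function(nm) inherits(get(nm, envir = e), "xts_like"), ls(e))

# The name-based selection from the reply picks the same objects here
byName <- grep("\\.NS$", ls(e), value = TRUE)

rm(list = byClass, envir = e)
remaining <- ls(e)
print(remaining)
```

For the real case one would presumably replace "xts_like" with c("xts", "zoo") and drop the envir arguments to operate on the global environment.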

Peter


On Fri, Jul 27, 2018 at 10:58 PM akshay kulkarni  wrote:
>
> Dear members,
> I am using R in an AWS Linux instance for my
> research. I want to remove certain objects from the global environment to
> reduce my EBS cost. For example, I want to remove all objects of class "xts",
> "zoo". Is there any way to automate this, instead of removing the objects one
> by one?
>
> Basically, I want to subset  ls() according to class, and then remove that 
> subset by using rm function.
>
> I got to know about mget in SO, but that is not working in my case
>
> Also, all the above objects end with ".NS".  I came to know that you can 
> remove objects starting with a certain pattern; is there any way to remove 
> objects ending in a certain pattern?
>
> very many thanks for your time and effort...
> yours sincerely,
> AKSHAY M KULKARNI
>


Re: [R] OT --- grammar.

2018-06-24 Thread Peter Langfelder

I would use "the number of degrees of freedom is defined... ".

Peter
On Sun, Jun 24, 2018 at 2:46 PM Rolf Turner  wrote:
>
>
> Does/should one say "the degrees of freedom is defined to be" or "the
> degrees of freedom are defined to be"?
>
> Although value of "degrees of freedom" is a single number, the first
> formulation sounds very odd to my ear.
>
> I would like to call upon the collective wisdom of the R community to
> help me decide.
>
> Thanks, and my apologies for the off-topic post.
>
> cheers,
>
> Rolf Turner
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>


Re: [R] Hacked

2018-04-17 Thread Peter Langfelder

I got some spam emails after my last post to the list, and the emails
did not seem to go through r-help. The spammers may be subscribed to
r-help, or they may get posters' email addresses from one of the web
copies of this list (Nabble or similar).

Peter

On Tue, Apr 17, 2018 at 11:37 AM, Ulrik Stervbo  wrote:
> I asked the moderators about it. This is the reply
>
> "Other moderators have looked into this a bit and may be able to shed more
> light on it. This is a "new" tactic where the spammers appear to reply to
> the r-help post. They are not, however, going through the r-help server.
>
> It also seems that this does not happen to everyone.
>
> I am not sure how you can automatically block the spammers.
>
> Sorry I cannot be of more help."
>
> --Ulrik
>
> Jeff Newmiller  wrote on Tue, 17 Apr 2018,
> 14:59:
>
>> Likely a spammer has joined the mailing list and is auto-replying to posts
>> made to the list. Unlikely that the list itself has been "hacked". Agree
>> that it is obnoxious.
>>
>> On April 17, 2018 5:01:10 AM PDT, Neotropical bat risk assessments <
>> neotropical.b...@gmail.com> wrote:
>> >Hi all,
>> >
>> >Site has been hacked?
>> >Bad SPAM arriving
>> >
>>
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>>
>


[R] WGCNA package installation segmentation fault

2018-04-12 Thread Peter Langfelder

Hi all,

a user contacted me about a segfault when installing the WGCNA package
downloaded from CRAN. I also see a segfault like that on certain
installs of R.

The package passes all CRAN checks, so presumably this has something
to do with the R installation or environment. The R versions here are
not the newest but I would guess that this is not an R version issue.

I'm attaching two sessionInfo() outputs on systems where the
installation throws a segfault.

Any pointers/ideas as to what could be going wrong?

Thanks,

Peter

##

System 1:

trying URL 'https://mirrors.sorengard.com/cran/src/contrib/WGCNA_1.63.tar.gz'
Content type 'application/x-gzip' length 1153113 bytes (1.1 MB)
==
downloaded 1.1 MB

* installing *source* package ‘WGCNA’ ...
** package ‘WGCNA’ successfully unpacked and MD5 sums checked
** libs
gcc -I/mnt/mfs/cluster/bin/R-3.4/include -DNDEBUG -DWITH_THREADS
-I"/mnt/mfs/cluster/bin/R-3.4/library/Rcpp/include"
-I/usr/local/include   -fpic  -g -O2  -c corFunctions-utils.c -o
corFunctions-utils.o
gcc -I/mnt/mfs/cluster/bin/R-3.4/include -DNDEBUG -DWITH_THREADS
-I"/mnt/mfs/cluster/bin/R-3.4/library/Rcpp/include"
-I/usr/local/include   -fpic  -g -O2  -c corFunctions.c -o
corFunctions.o
gcc -I/mnt/mfs/cluster/bin/R-3.4/include -DNDEBUG -DWITH_THREADS
-I"/mnt/mfs/cluster/bin/R-3.4/library/Rcpp/include"
-I/usr/local/include   -fpic  -g -O2  -c myMatrixMultiplication.c -o
myMatrixMultiplication.o
gcc -I/mnt/mfs/cluster/bin/R-3.4/include -DNDEBUG -DWITH_THREADS
-I"/mnt/mfs/cluster/bin/R-3.4/library/Rcpp/include"
-I/usr/local/include   -fpic  -g -O2  -c networkFunctions.c -o
networkFunctions.o
g++  -I/mnt/mfs/cluster/bin/R-3.4/include -DNDEBUG -DWITH_THREADS
-I"/mnt/mfs/cluster/bin/R-3.4/library/Rcpp/include"
-I/usr/local/include   -fpic  -g -O2  -c parallelQuantile.cc -o
parallelQuantile.o
gcc -I/mnt/mfs/cluster/bin/R-3.4/include -DNDEBUG -DWITH_THREADS
-I"/mnt/mfs/cluster/bin/R-3.4/library/Rcpp/include"
-I/usr/local/include   -fpic  -g -O2  -c pivot.c -o pivot.o
g++ -shared -L/usr/local/lib -o WGCNA.so corFunctions-utils.o
corFunctions.o myMatrixMultiplication.o networkFunctions.o
parallelQuantile.o pivot.o -lpthread
installing to /mnt/mfs/cluster/bin/R-3.4/library/WGCNA/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
==
*
*  Package WGCNA 1.63 loaded.
*
*Important note: It appears that your system supports multi-threading,
*but it is not enabled within WGCNA in R.
*To allow multi-threading within WGCNA with all available cores, use
*
*  allowWGCNAThreads()
*
*within R. Use disableWGCNAThreads() to disable threading if necessary.
*Alternatively, set the following environment variable on your system:
*
*  ALLOW_WGCNA_THREADS=
*
*for example
*
*  ALLOW_WGCNA_THREADS=32
*
*To set the environment variable in linux bash shell, type
*
*   export ALLOW_WGCNA_THREADS=32
*
* before running R. Other operating systems or shells will
* have a similar command to achieve the same aim.
*
==



 *** caught segfault ***
address (nil), cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault
ERROR: loading failed
* removing ‘/mnt/mfs/cluster/bin/R-3.4/library/WGCNA’

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: q("no", status = status, runLast = FALSE)
 2: do_exit(status = status)
 3: do_exit_on_error()
 4: errmsg("loading failed")
 5: do_install_source(pkg_name, instdir, pkg, desc)
 6: do_install(pkg)
 7: tools:::.install_packages()
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault

The downloaded source packages are in
‘/tmp/RtmpQ3mLx7/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("WGCNA") :
  installation of package ‘WGCNA’ had non-zero exit status



> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /mnt/mfs/cluster/bin/R-3.4/lib/libRblas.so
LAPACK: /mnt/mfs/cluster/bin/R-3.4/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.2

Re: [R] parallel computing with foreach()

2017-12-06 Thread Peter Langfelder

Your code generates an error that has nothing to do with dopar. I have
no idea what your function stack is supposed to do; you may be
inadvertently calling utils::stack which would produce this kind of
error:

> stack(1:25, RAT = FALSE)
Error in data.frame(values = unlist(unname(x)), ind, stringsAsFactors = FALSE) :
  arguments imply differing number of rows: 25, 0
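
One way to check which stack a session (or a parallel worker) actually sees is sketched below. The suggestion that raster::stack was the intended function is an assumption based on the RAT argument in the posted code, not something confirmed in the thread.

```r
# Which `stack` does this session resolve? Without raster attached,
# the only candidate on the search path is the one from utils.
stackFun <- get("stack")
whereStack <- environmentName(environment(stackFun))
print(whereStack)

# Possible fix inside the %dopar% body (assumption: raster::stack was intended).
# Shown as comments because it needs the raster package and the raster files:
#   myExplc <- raster::stack(rasterc, RAT = FALSE)
# Alternatively, foreach() accepts a .packages argument, e.g. .packages = "raster",
# so that the package is loaded on each worker.
```

Parallel workers typically start as fresh R sessions, so packages attached in the master session are not necessarily available there; that would explain why the plain for loop works while the parallel version does not.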

HTH,

Peter

On Wed, Dec 6, 2017 at 10:03 PM, Kumar Mainali  wrote:
> I have used foreach() for parallel computing but in the current problem, it
> is not working. Given the volume and type of the data involved in the
> analysis, I will try to give below the complete code without reproducible
> example.
>
> In short, each R environment will draw a set of separate files, perform the
> analysis and dump in separate folders.
>
> splist <- c("juoc", "juos", "jusc", "pico", "pifl", "pipo", "pire", "psme")
> covset <- c("PEN", "Thorn")
>
> foreach(i = 1:length(splist)) %:%
> foreach(j = 1:length(covset)) %dopar% {
>
> spname <- splist[i]; spname
> myTorP <- covset[j]; myTorP
>
> DataSpecies = data.frame(prsabs = rep(1, 10), lon = rep(30, 10), lat =
> rep(80, 10))
> myResp = as.numeric(DataSpecies[,1])
> myRespXY = DataSpecies[, c("lon", "lat")]
> # directory of a bunch of raster files specific to each R environment
> rastdir <- paste0(rootdir, "Current/", myTorP); rastdir
> rasterc = list.files(rastdir, pattern="\\.tif$", full.names = T)
> print(rasterc)
> myExplc = stack(rasterc, RAT=FALSE)
> }
>
> I get the following error message that most likely generates while stacking
> rasters because there are 25 rasters in the folder of each environment.
> Also, in the normal for loop, this reads all fine.
> Error in { :
>   task 1 failed - "arguments imply differing number of rows: 25, 0"
>
> Thank you.


Re: [R] Rcpp, dyn.load and C++ problems

2017-12-03 Thread Peter Langfelder

I would go to the source, in this case Dirk Eddelbuettel's (I hope I
spelled it correctly) documentation for Rcpp:

http://dirk.eddelbuettel.com/code/rcpp/Rcpp-attributes.pdf

Note that you need to do

sourceCpp("logistic_map.cpp")

in R instead of building and dyn.load()-ing the object.
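
A rough sketch of the sourceCpp() workflow follows. The contents of logistic_map.cpp are not shown in the thread, so the C++ function below (logisticMap and its arguments) is purely hypothetical; only the file name comes from the discussion.

```r
# Hypothetical contents for logistic_map.cpp; the original file is not shown.
cppSource <- '
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector logisticMap(double r, double x0, int n) {
  Rcpp::NumericVector x(n);
  if (n > 0) x[0] = x0;
  // iterate x[i] = r * x[i-1] * (1 - x[i-1])
  for (int i = 1; i < n; ++i) x[i] = r * x[i - 1] * (1.0 - x[i - 1]);
  return x;
}
'
cppFile <- file.path(tempdir(), "logistic_map.cpp")
writeLines(cppSource, cppFile)

# Compile, load and export to R in one step.
# Needs the Rcpp package and a working C++ compiler, hence the guard.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  Rcpp::sourceCpp(cppFile)
  print(logisticMap(3.7, 0.5, 5L))
}
```

The // [[Rcpp::export]] attribute is what tells sourceCpp() to generate the R wrapper, so no manual R CMD SHLIB or dyn.load() step is needed.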

HTH,

Peter

On Sun, Dec 3, 2017 at 11:04 AM, Martin Møller Skarbiniks Pedersen
 wrote:
> On 3 December 2017 at 05:23, Eric Berger  wrote:
>
>>
>> Do a search on "Rcpp calling C++ functions from R"
>>
>
> Thanks. However search for "Rcpp calling C++ functions from R" gives a lot
> of result but I think
> some of them are outdated and others don't agree with each other.
>
> Can you point to a specific good on-line guide for me?
>
> Regards
> Martin
>


Re: [R] mystery "158"

2017-11-21 Thread Peter Langfelder

Your data frame fam contains factors. Turn it into character strings using

fam$Family = as.character(fam$Family)

and try again. It may be helpful if you read up on R's factors, see ?factor.
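
A small illustration of what goes wrong, using made-up two-row stand-ins for fam and least (the real data frames are much larger):

```r
# Made-up miniature versions of the two data frames from the question
fam <- data.frame(Family = factor(c("Scelionidae", "Aphid")), Sp = 1:2)
least <- data.frame(sp = c(1L, 3L), Family = NA_character_,
                    stringsAsFactors = FALSE)

# Assigning a factor element into a character column copies the
# factor's internal integer code (as text), not its label.
wrong <- least
wrong$Family[1] <- fam$Family[1]
print(wrong$Family)

# Converting to character first copies the label itself.
right <- least
right$Family[1] <- as.character(fam$Family[1])
print(right$Family)
```

In the original data the code happened to be 158 because "Scelionidae" was the 158th of the 180 factor levels.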

HTH,

Peter

On Tue, Nov 21, 2017 at 2:14 PM, Glen Forister  wrote:
> This is a simple problem, but a mystery to me.
> I'm trying to grab $Family "Scelionidae" from one dataframe and put it into
> another dataframe occupied with NA in $Family.  The result is a "158" ends
> up there instead of Scelionidae.
> Simply put  fam$Family[1] <- least$Family[1]
>
> If I have made a mistake here, can somebody point it out.  I've included
> the simple steps I got there showing the structure and heads of the objects.
> =  add a col of NA  = Family
>> least$Family <- NA; str(least)
> 'data.frame':243 obs. of  6 variables:
>  $ sp: int  1 3 5 6 8 11 13 15 18 19 ...
>  $ Fallon: int  14 11 109 6 1 44 70 23 4 100 ...
>  $ Dimen : int  10 13 52 2 1 19 18 0 2 116 ...
>  $ Farm  : int  6 2 3 0 0 2 0 1 2 1 ...
>  $ Sums  : int  30 26 164 8 2 65 88 24 8 217 ...
>  $ Family: logi  NA NA NA NA NA NA ...
>>head(least,2)
>   sp Fallon Dimen Farm Sums Family
> 1  1     14    10    6   30     NA
> 3  3     11    13    2   26     NA
>>
>> #next change the property logi to char
>> least$Family <- as.character(least$Family)
>> str(least)
> 'data.frame':243 obs. of  6 variables:
>  $ sp: int  1 3 5 6 8 11 13 15 18 19 ...
>  $ Fallon: int  14 11 109 6 1 44 70 23 4 100 ...
>  $ Dimen : int  10 13 52 2 1 19 18 0 2 116 ...
>  $ Farm  : int  6 2 3 0 0 2 0 1 2 1 ...
>  $ Sums  : int  30 26 164 8 2 65 88 24 8 217 ...
>  $ Family: chr  NA NA NA NA ...
>>#  This is where I will grab the info to put into the above.
>> head(fam,2)
>        Family Sp
> 1 Scelionidae  1
> 2       Aphid  2
>>#  This shows the id of my object I want to copy
>> fam$Family[1]
> [1] Scelionidae
> 180 Levels:  ? ? = 97 ? immature ? sp sample ??  1 2 3 ... wolf?
>>
>># This shows me copying Scelionidae into dataframe least
>> least$Family[1] <- fam$Family[1]
>>
>>#Here is where I don't get what I expect, but 158
>>str(least);
> 'data.frame':243 obs. of  6 variables:
>  $ sp: int  1 3 5 6 8 11 13 15 18 19 ...
>  $ Fallon: int  14 11 109 6 1 44 70 23 4 100 ...
>  $ Dimen : int  10 13 52 2 1 19 18 0 2 116 ...
>  $ Farm  : int  6 2 3 0 0 2 0 1 2 1 ...
>  $ Sums  : int  30 26 164 8 2 65 88 24 8 217 ...
>  $ Family: chr  "158" NA NA NA ...
>>head(least, 1)
>   sp Fallon Dimen Farm Sums Family
> 1  1     14    10    6   30    158
>>
>>#Showing what I wanted to copy still exists.
>> fam$Family[1]
> [1] Scelionidae
> 180 Levels:  ? ? = 97 ? immature ? sp sample ??  1 2 3 ... wolf?
>


Re: [R] tcltk problems

2017-11-17 Thread Peter Langfelder

Rolf,

looking at the configure script I believe you need to specify

--with-tcl-config=/usr/lib/tcl8.6/tclConfig.sh

and similarly

--with-tk-config=
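
Once R has been rebuilt, one quick way to confirm from within R whether Tcl/Tk support was compiled in is capabilities(). This is only a check after the fact, not part of the configure step itself.

```r
# TRUE if this R build has Tcl/Tk support, FALSE otherwise
hasTcltk <- capabilities("tcltk")
print(hasTcltk)
```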

HTH,

Peter


On Fri, Nov 17, 2017 at 8:43 PM, Rolf Turner  wrote:
> On 18/11/17 17:00, Erin Hodgess wrote:
>>
>> When I have compiled from sourced on Ubuntu, I did NOT include the
>> "with-tcltk" and it worked fine.  Did you try that, please?
>
>
> In the past I have configured without using the "--with-tcltk" flag,
> and R of course built just fine.  But it *did not* have tcltk capability.
> When I wanted that capability I had to start using the
> aforesaid flag.
>
> It makes absolutely no sense that one would get tcltk capability when
> configuring without the flag but *not* get it when configuring *with* the
> flag.  If that is indeed the case then this definitely constitutes a bug in
> the "configure" system.
>
> I cannot believe that it would work to leave out the flag, but I'll try it
> just for the sake of "completeness".
>
> cheers,
>
> Rolf
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>


Re: [R] valid package repositories

2017-10-02 Thread Peter Langfelder

On Mon, Oct 2, 2017 at 7:47 AM, Federico Calboli
 wrote:

>
> Thus my question: when can I consider a library to be properly published and 
> really publicly available?  CRAN and BioConductor are clearly gold standards. 
>  What about Github?  I am currently using the rule ‘not on CRAN == outright 
> rejection’.  If Github is as good as CRAN I will include it on my list of 
> ‘the code is available in a functional state as claimed’.

CRAN has certain rules that are necessary for CRAN to function but may
not be necessary for a package to be useful (e.g. size of data in a
non-data package, licensing, run time of examples etc). I would ask
two things from developers of a new package: 1. package is available
for download from somewhere public; 2. package passes R CMD check
without errors or warnings. Possibly also an explanation why they
cannot upload the package to CRAN or Bioconductor, but I would not
make the acceptance by CRAN or Bioconductor a condition for
publishing.

Just my humble opinion.

Peter


Re: [R] Converting SAS Code

2017-09-29 Thread Peter Langfelder

On Fri, Sep 29, 2017 at 2:32 PM, peter dalgaard  wrote:
>
>> On 29 Sep 2017, at 22:43 , MacQueen, Don  wrote:
>>
>> I used to use SAS a lot, but I don't know what the line
>>  *Yield Champagin;
>> does.
>
> Nothing. It's a comment...

Fortune nomination!

Peter


Re: [R] building random matrices from vectors of random parameters

2017-09-27 Thread Peter Langfelder

I would try something like

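n <- 5
a <- rnorm(n, 0.8, 0.1)    # plays the role of sa in the question
so <- rnorm(n, 0.5, 0.1)
m <- rnorm(n, 1.2, 0.1)
# one 2x2 matrix per triple (a[i], so[i], m[i]); SIMPLIFY = FALSE keeps a list
mats <- mapply(function(sa1, so1, m1) matrix(c(0, sa1*m1, so1, sa1), 2, 2, byrow = TRUE),
               a, so, m, SIMPLIFY = FALSE)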

> mats
[[1]]
          [,1]      [,2]
[1,] 0.0000000 0.9129962
[2,] 0.4963598 0.7067311

[[2]]
          [,1]      [,2]
[1,] 0.0000000 1.0150316
[2,] 0.5489887 0.8469046

[[3]]
          [,1]      [,2]
[1,] 0.0000000 0.9516137
[2,] 0.3724521 0.8306535

[[4]]
          [,1]      [,2]
[1,] 0.0000000 1.0525355
[2,] 0.8075108 0.8314638

[[5]]
          [,1]      [,2]
[1,] 0.0000000 0.9400074
[2,] 0.4803386 0.7901753
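
For larger matrices, passing every parameter as a separate mapply argument may get unwieldy. One possible variation, sketched here under the assumption that all parameters fit in one data frame with a row per matrix (the names params and buildMatrix are made up), is:

```r
set.seed(1)   # only to make the sketch reproducible
n <- 5

# One row of parameters per matrix to be built
params <- data.frame(sa = rnorm(n, 0.8, 0.1),
                     so = rnorm(n, 0.5, 0.1),
                     m  = rnorm(n, 1.2, 0.1))

# The symbolic structure is written once, in terms of the parameter names
buildMatrix <- function(p) {
  with(p, matrix(c(0, sa * m, so, sa), 2, 2, byrow = TRUE))
}

mats <- lapply(seq_len(n), function(i) buildMatrix(params[i, ]))
print(mats[[1]])
```

Adding a parameter then only means adding a column to params and using its name inside buildMatrix.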

On Wed, Sep 27, 2017 at 5:47 PM, Evan Cooch  wrote:
> Suppose I have interest in a matrix with the following symbolic structure
> (specified by 3 parameters: sa, so, m):
>
> matrix(c(0,sa*m,so,sa),2,2,byrow=T)
>
> What I can't figure out is how to construct a series of matrices, where the
> elements/parameters are rnorm values. I'd like to construct separate
> matrices, with each matrix in the series using the 'next random parameter
> value'. While the following works (for generating, say, 5 such random
> matrices)
>
> replicate(5,matrix(c(0,rnorm(1,0.8,0.1)*rnorm(1,1.2,0.1),rnorm(1,0.5,0.1),rnorm(1,0.8,0.1)),2,2,byrow=T))
>
> its inelegant, and a real pain if the matrix gets large (say, 20 x 20).
>
> I'm wondering if there is an easier way. I tried
>
>> sa <- rnorm(5,0.8,0.1)
>> so <- rnorm(5,0.5,0.1)
>> m <- rnorm(5,1.2,0.1)
>
> matrix(c(0,sa*m,so,sa),2,2,byrow=T)
>
> but that only returns a single matrix, not 5 matrices as I'd like. I also
> tried several variants of the 'replicate' approach (above), but didn't
> stumble across anything that seemed to work.
>
> So, is there a better way than something like:
>
> replicate(5,matrix(c(0,rnorm(1,0.8,0.1)*rnorm(1,1.2,0.1),rnorm(1,0.5,0.1),rnorm(1,0.8,0.1)),2,2,byrow=T))
>
> Many thanks in advance...
>


Re: [R] A problem with order() function in R

2017-07-17 Thread Peter Langfelder

I think you want rank, not order.

> x <- c(19,17,23,11)
> order(x)
[1] 4 2 1 3
> rank(x)
[1] 3 2 4 1

See help(order) and help(rank) for the difference.
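
To spell out the difference with the same vector: order() gives the positions that would sort x, while rank() gives each element's place in the sorted sequence.

```r
x <- c(19, 17, 23, 11)

ord <- order(x)   # positions of the smallest, 2nd smallest, ... element
rnk <- rank(x)    # rank of each element within x

sortedX <- x[ord]            # indexing by order() sorts the vector
rankFromOrder <- order(ord)  # with no ties, this reproduces rank(x)

print(ord)
print(rnk)
print(sortedX)
```

Here the smallest value (11) sits in position 4, so order(x) starts with 4, while the first element (19) is the third smallest, so rank(x) starts with 3.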

Peter

On Mon, Jul 17, 2017 at 7:58 PM, Jesadaporn Pupantragul
 wrote:
> Hello r-help
> I am learning R and use R-studio.
> I create vector x <- c(19,17,23,11) and use function order(x).
> The result show [1]  4 2 1 3. Why it doesn't show [1] 3 2 4 1.
> Follow picture that i attach.
> Thank you for you answer.
>


Re: [R] Error in WGCNA package

2017-07-09 Thread Peter Langfelder

First, please read WGCNA FAQ at
https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html
regarding using RNA-seq and other count data.

Second, if you insist on using WGCNA on raw count data (which I don't
recommend), use something like

storage.mode(datExpr) = "double"

and try again.
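
A toy illustration of what the storage.mode assignment does, using a small made-up integer matrix in place of the real datExpr:

```r
# Made-up 3 x 2 integer matrix standing in for the real expression data
datExpr <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("g1", "g2")))
typeBefore <- typeof(datExpr)

# Change the storage type in place; dimensions and dimnames are kept
storage.mode(datExpr) <- "double"
typeAfter <- typeof(datExpr)

print(c(before = typeBefore, after = typeAfter))
```

Unlike as.numeric(), which would drop the matrix dimensions, this keeps the object a matrix.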

Peter

On Sun, Jul 9, 2017 at 2:29 AM, Ankush Sharma  wrote:
> Dear all ,
>
> I would like to reconstruct coexpression networks from proteomic count data
> having integer values. Some internal function doesn't like to work well with
> integers.  How can this error be rectified?
>
>
>> adjacency = adjacency(datExpr, power = softPower, type = "signed");
> Error in cor(datExpr, use = "p") :
> REAL() can only be applied to a 'numeric', not a 'integer'
>
>
>
> Appreciate your help , Many thanks
>
> Best Regards,
> Ankush Sharma


Re: [R] Error in y - ymean : non-numeric argument to binary operator

2017-05-26 Thread Peter Langfelder

This is a bit of a shot in the dark since I haven't used randomForest
in several years, but I seem to recall that running randomForest
through the formula interface was asking for trouble... Try not using
the formula interface and specify the x, y, xtest arguments directly.
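
A sketch of the non-formula call, using the built-in iris data as a stand-in for trainrf. Converting the response to a factor is an extra step not mentioned above; it is included on the assumption (not confirmed in the thread) that the "y - ymean" error comes from a non-factor response being treated as a regression target.

```r
# Stand-in data: predictors and a nominal response
x <- iris[, 1:4]
y <- factor(iris$Species)   # a factor response requests classification

# The call itself needs the randomForest package, hence the guard
if (requireNamespace("randomForest", quietly = TRUE)) {
  model1 <- randomForest::randomForest(x = x, y = y, ntree = 50)
  print(model1)
}
```

For the original data this would presumably mean y = factor(trainrf$Cath) and x = all remaining columns of trainrf.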

Peter

On Fri, May 26, 2017 at 8:54 PM, ddkssk 909  wrote:
> I am trying to do classification with Randomforest() . the class variable
> is nominal.
>
> But I get this error
> model1 <-randomForest(Cath~.,data=trainrf)
> Error in y - ymean : non-numeric argument to binary operator
> In addition: There were 26 warnings (use warnings() to see them)
>>  model1
> Error: object 'model1' not found
>


Re: [R] Override/Insert (Change) a value (default value) inside a function

2017-03-11 Thread Peter Langfelder

On Sat, Mar 11, 2017 at 2:11 PM, Mohammad Tanvir Ahamed via R-help
 wrote:
> Thanks for reply.
> as I said , the function in the package is like
> myplot <- function(x,y) { plot(x,y) }
>
> not like
> myplot <- function(x,y) { plot(x,y,...) }
>
> And I cant change the function inside the package!!

The easiest solution __is__ to change the function inside the package.
If the license of the package allows it and your coding skills are up
to it, download the source of the package, make the necessary
modification in the code, and use that. If you make the modifications
so that they are useful for others, you could email the package
maintainer with your code and suggest that he/she incorporates it in
the package for wider release.

Peter


Re: [R] Error: long vectors (argument 1) are not supported in .Fortran

2017-02-03 Thread Peter Langfelder

Just to set the record straight, WGCNA is a CRAN package.

As to Ankush's question - the current WGCNA version does not support
analysis of more than about 46300 nodes (probes) in one block. You
have two options: 1. filter out some of the least-informative probes
(e.g., probes with lowest mean expression or lowest variance); 2. use
the "blockwise" approach as implemented in blockwiseModules. Set the
maxBlockSize argument to say 4, and the function will
automatically split your data into 2 blocks and run the analysis in
each block separately.

The third option is to wait a few weeks (possibly months), I do have a
WGCNA update in the works that __should__ work on blocks larger than
46300.
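
For the curious, the limit of roughly 46300 can be recovered with a little arithmetic, assuming it comes from the number of elements of an n x n matrix having to fit into a standard (non-long) R vector:

```r
# Largest length of a non-long vector in R: 2^31 - 1
maxElements <- .Machine$integer.max

# Largest n for which an n x n matrix still fits
maxNodes <- floor(sqrt(maxElements))

# The data set from the question
nProbes <- 54000
tooBig <- nProbes^2 > maxElements

# Minimum number of blocks so that each block stays under the limit
nBlocksNeeded <- ceiling(nProbes / maxNodes)

print(c(maxNodes = maxNodes, nBlocksNeeded = nBlocksNeeded))
```

This is consistent with the 54000 probes needing to be split into at least 2 blocks.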

Best,

Peter

On Fri, Feb 3, 2017 at 7:31 AM, Bert Gunter  wrote:
> Probably wrong list. Try the Bioconductor list instead.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Feb 3, 2017 at 3:19 AM, Ankush Sharma  wrote:
>> Hi all ,
>>
>> I'm working on WGCNA on  R-3.3.1 version to reconstruct gene -gene
>> coexpression networks of 54000 probes in 230 samples on Load Sharing
>> facility (Remote computing cluster). Despite memory at dispose, I'm
>> encountering a error of allocation of memory at soft thresholding step or
>>  at TOM Similarity step.  The problem of memory allocation at soft
>> thresholding step  was corrected by allocating the required memory using 
>> [bsub
>> -R "rusage[mem=4]".
>>
>> Error Message
>>  > # Turn adjacency into topological overlap
>>
>>> TOM = TOMsimilarity(adjacency);
>>
>> Error in TOMsimilarity(adjacency) :
>>
>>   long vectors (argument 1) are not supported in .Fortran
>>
>> Calls: TOMsimilarity -> .C
>>
>> Execution halted
>>
>> Warning message:
>>
>> system call failed: Cannot allocate memory
>>
>>
>> Is there a way to run build this TOMsimilarity matrix.
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>> Best Regards,
>> Ankush Sharma,PhD
>> Visiting CASyM Postdoctoral Research fellow (CASyM Consortium, EU-FP7)
>> LISM, Institute of Clinical Physiology, Siena (Italy)
>> Experimental Oncology Unit (UOS),
>> Institute of Clinical Physiology
>> - National Research Council,
>>  Siena (IT)
>>
>> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] R in raspberry Pi

2017-01-02 Thread Peter Langfelder

I can see the file under this link:

http://www.floppybunny.org/robin/web/rbook/online_chapters/r_and_the_raspberry_pi.pdf

Make sure the (English) words are not split - my first attempt
contained raspber_ry and thus it failed.

Peter

On Mon, Jan 2, 2017 at 5:10 PM, John Sorkin  wrote:
> Robin,
> Your chapter sounds very interesting. Unfortunately it appears that it is not 
> available, at least not to me.
> John
>
>> John David Sorkin M.D., Ph.D.
>> Professor of Medicine
>> Chief, Biostatistics and Informatics
>> University of Maryland School of Medicine Division of Gerontology and 
>> Geriatric Medicine
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>
>> On Jan 2, 2017, at 5:57 PM, robinbt2  wrote:
>>
>> One of the additional free online chapters to my book provides a step by
>> step guide to downloading and running R on the Raspberry Pi:
>> http://www.floppybunny.org/robin/web/rbook/online_chapters/r_and_the_raspber
>> ry_pi.pdf
>>
>> All the best robin beaumont
>>
>> For details of the entire book go to:
>> http://www.amazon.co.uk/dp/190790431X
>> Books website: http://www.floppybunny.org/robin/web/rbook/
>
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:12}}


Re: [R] Gobbling up a repeating, irregular list of data

2016-11-10 Thread Peter Langfelder

It's not clear whether your numbers are tab- or space-separated; I will
assume space-separated. My low-tech (and non-R) solution would be to
dump the output into a text file (call it data.in), then run a sed
command to first remove two initial spaces from each line, then
replace any remaining initial spaces with 4 (if I count correctly) tabs,
then replace all contiguous blocks of spaces by tabs, something like

sed 's/^  //' data.in | sed 's/^  */\t\t\t\t/' | sed 's/  */\t/g' > data.txt

This should produce a regular 6-column table that should be readable
using standard read.delim or read.table. You will then have to figure out
how to deal with the empty cells in R.
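
One possible way to deal with the empty cells is sketched below. It assumes the sed step really produced a tab-separated table whose continuation lines start with four empty fields. The inline sample text stands in for data.txt, and the column names are made up for illustration.

```r
# Stand-in for the contents of data.txt after the sed step (tab-separated);
# continuation lines have four empty leading fields.
txt = c("1\t1.00E+00\t1.24E+03\t7.79E+00\t1.925E-01\t1.88E-01",
        "\t\t\t\t3.850E-01\t1.88E-01",
        "\t\t\t\t5.775E-01\t1.88E-01",
        "1\t2.00E+00\t1.26E+03\t7.80E+00\t1.925E-01\t2.80E-01",
        "\t\t\t\t1.732E+00\t2.80E-01")

# Column names are assumptions made for this sketch.
dat = read.delim(text = txt, header = FALSE,
                 col.names = c("flag", "step", "depth0", "thickness",
                               "depth", "waterContent"))

# Rows that carry the time-step information start a new block.
isHeader = !is.na(dat$step)
block = cumsum(isHeader)

# Copy the header values down to the continuation rows of each block.
for (col in c("step", "depth0", "thickness"))
  dat[[col]] = dat[[col]][isHeader][block]

# One data frame of (depth, waterContent) pairs per time step.
byStep = split(dat[, c("depth", "waterContent")], dat$step)
```

With a real file, read.delim("data.txt", header = FALSE) would take the place of the text = argument.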

Peter

On Thu, Nov 10, 2016 at 8:26 PM, Morway, Eric  wrote:
> What would be the sophisticated R method for reading the data shown below
> into a list?  The data is output from a numerical model.  Pasting the
> second block of example R commands (at the end of the message) results in a
> failure ("Error in scan...line 2 did not have 6 elements").  I no doubt
> could cobble together some script for reading line-by-line using for loops,
> and then appending vectors with values from each line, but this strikes me
> as bad form.
>
> One final note, the lines with 6 values contain important values that
> should somehow remain associated with the data appearing in columns 5 & 6
> (the continuous data).  The first value, which is always 1, can be
> discarded, but the second value on these lines contains the time step number
> ("1.00E+00", "2.00E+00", etc.), the 3rd and 4th values contain a depth
> and thickness, respectively. Columns 5 & 6 are a depth and water content
> pairing and should be associated with the time steps.
>
> Thanks, Eric
>
> Start of example output data (Use of an R script to read in this data below)
>
>   11.00E+00  1.24E+03  7.79E+00  1.925E-01  1.88E-01
>  3.850E-01  1.88E-01
>  5.775E-01  1.88E-01
>  7.700E-01  1.88E-01
>  9.626E-01  1.88E-01
>  1.155E+00  1.88E-01
>  1.347E+00  1.88E-01
>   12.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
>  1.732E+00  2.80E-01
>  1.925E+00  2.80E-01
>  2.310E+00  2.93E-01
>  2.502E+00  2.22E-01
>  2.695E+00  1.88E-01
>  2.887E+00  1.88E-01
>   13.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
>  3.850E-01  1.30E-01
>  5.775E-01  1.48E-01
>  7.701E-01  1.61E-01
>  9.626E-01  1.72E-01
>  1.155E+00  1.86E-01
>  1.347E+00  1.93E-01
>   14.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
>  3.803E-01  1.80E-01
>  5.705E-01  1.38E-01
>  7.607E-01  1.32E-01
>  2.282E+00  1.86E-01
>  2.472E+00  1.98E-01
>  2.662E+00  2.00E-01
>
> Same data as above, but scan function fails.
>
> dat <- read.table(textConnection("  11.00E+00  1.24E+03  7.79E+00
>  1.925E-01  1.88E-01
>  3.850E-01  1.88E-01
>  5.775E-01  1.88E-01
>  7.700E-01  1.88E-01
>  9.626E-01  1.88E-01
>  1.155E+00  1.88E-01
>  1.347E+00  1.88E-01
>   12.00E+00  1.26E+03  7.80E+00  1.925E-01  2.80E-01
>  1.732E+00  2.80E-01
>  1.925E+00  2.80E-01
>  2.310E+00  2.93E-01
>  2.502E+00  2.22E-01
>  2.695E+00  1.88E-01
>  2.887E+00  1.88E-01
>   13.00E+00  1.28E+03  7.70E+00  1.925E-01  1.03E-01
>  3.850E-01  1.30E-01
>  5.775E-01  1.48E-01
>  7.701E-01  1.61E-01
>  9.626E-01  1.72E-01
>  1.155E+00  1.86E-01
>  1.347E+00  1.93E-01
>   14.00E+00  1.29E+03  7.60E+00  1.901E-01  1.80E-01
>  3.803E-01  1.80E-01
>  5.705E-01  1.38E-01
>

Re: [R] creating lists of random matrices

2016-11-09 Thread Peter Langfelder

Add a

simplify = FALSE

to the call to replicate, and you'll get a list.

replicate(5, matrix(rnorm(4), 2, 2), simplify = FALSE)
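
Since the number of matrices is supposed to be a variable, here is a small sketch that wraps the call in a function. The name makeMatrices and the seed are my own choices for illustration, not anything from the thread.

```r
# Hypothetical helper: returns a list of n random 2x2 matrices.
makeMatrices = function(n)
  replicate(n, matrix(rnorm(4), 2, 2), simplify = FALSE)

set.seed(42)            # only to make the sketch reproducible
hold = makeMatrices(5)

is.list(hold)           # a list, not a 2x2x5 array
length(hold)
dim(hold[[1]])

# lapply gives the same structure without replicate
hold2 = lapply(seq_len(5), function(i) matrix(rnorm(4), 2, 2))
```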

Peter

On Wed, Nov 9, 2016 at 10:45 AM, Rui Barradas  wrote:
> Hello,
>
> I also thought of replicate() but it creates a 2x2x5 array, not a list.
> Maybe it's all the same for the OP.
>
> Rui Barradas
>
> Em 09-11-2016 18:41, Marc Schwartz escreveu:
>>
>>
>>> On Nov 9, 2016, at 12:32 PM, Evan Cooch  wrote:
>>>
>>> So, it's easy enough to create a random matrix, using something like (say)
>>>
>>> matrix(rnorm(4),2,2)
>>>
>>> which generates a (2x2) matrix with random N(0,1) in each cell.
>>>
>>> But, what I need to be able to do is create a 'list' of such random
>>> matrices, where the length of the list (i.e., the number of said random
>>> matrices I store in the list) is some variable I can pass to the function
>>> (or loop).
>>>
>>> I tried the obvious like
>>>
>>> hold <- list()
>>> for (i in 1:5) {
>>> hold[[i]] <- matrix(rnorm(4),2,2)
>>> }
>>>
>>>
>>> While this works, it seems inelegant, and I'm wondering if there is a
>>> better (more efficient) way to accomplish the same thing -- perhaps avoiding
>>> the loop.
>>>
>>> Thanks in advance...
>>
>>
>>
>> Hi,
>>
>> See ?replicate
>>
>> Example:
>>
>> ## Create a list of 5 2x2 matrices
>>
>>> replicate(5, matrix(rnorm(4), 2, 2))
>>
>> , , 1
>>
>> [,1]  [,2]
>> [1,] -0.1695775 1.0306685
>> [2,]  0.1636667 0.1044762
>>
>> , , 2
>>
>> [,1]  [,2]
>> [1,] -0.3098566 2.1758363
>> [2,] -0.8029768 0.9697776
>>
>> , , 3
>>
>> [,1]  [,2]
>> [1,]  0.5702972 0.7165806
>> [2,] -0.9731331 0.8332827
>>
>> , , 4
>>
>> [,1]   [,2]
>> [1,] -0.8089588 0.09195256
>> [2,] -0.2026994 0.67545827
>>
>> , , 5
>>
>>[,1]   [,2]
>> [1,] 0.5093008 -0.3097362
>> [2,] 0.6467358  0.3536414
>>
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>

Re: [R] separate commands by semicolon

2016-09-17 Thread Peter Langfelder

On Sat, Sep 17, 2016 at 2:12 PM, David Winsemius  wrote:
>
>
> Not entirely clear. If you were intending to just get character output then 
> you could just use:
>
> strsplit(txt, ";")
>
> If you wanted parsing to an R expression to occur you could pass through 
> sapply and get a full accounting of the syntactic deficit using `try`:
>
> sapply(strsplit( "print(2); ls(" , ";")[[1]] , function(t)
> {try(parse(text=t))})
> Error in parse(text = t) : <text>:2:0: unexpected end of input
> 1:  ls(
>    ^
> expression(`print(2)` = print(2), ` ls(` = "Error in parse(text = t) :
> <text>:2:0: unexpected end of input\n1:  ls(\n   ^\n")
>
>

You would want to avoid splitting within character strings
(print(";")) and in comments (print(2); ls() # This prints 2; then
lists...) The comment char could also appear in a character string,
where it does not mean the start of a comment...

Not sure how to accomplish that using strsplit (or in general using
just regular expressions).
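
One way around this, sketched here as a suggestion, is to skip strsplit and let R's own parser do the splitting. parse() follows the R grammar, so semicolons inside strings and comments should not be treated as separators. The sample text is made up; note that incomplete input such as "ls(" still gives a parse error.

```r
txt = 'print(";"); ls() # This prints ";"; then lists'

# parse() applies R's own grammar, so semicolons inside strings and
# comments are not treated as statement separators.
exprs = parse(text = txt, keep.source = FALSE)

length(exprs)                               # number of top-level statements
commands = unname(sapply(exprs, deparse))   # back to character, one per statement
commands
```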

Peter

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Visualizing and clustering one half of a symmetric matrix

2016-09-15 Thread Peter Langfelder

Do not set the upper (or lower) triangle to NA. Simply supply the full
matrix to pheatmap. I am not an expert on pheatmap but looking at the
manual you should supply clustering_distance_rows = "none",
clustering_distance_cols = "none" or something like that to make
pheatmap interpret the matrix as a distance matrix. Read carefully
through the help on pheatmap to make sure the function plots what you
want it to plot.
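
A minimal base-R sketch of the same idea, using hclust and heatmap instead of pheatmap so that it runs without extra packages. The small matrix and the gene names are made up.

```r
# Small symmetric matrix standing in for the custom distance matrix.
ids = c("geneA", "geneB", "geneC", "geneD")
d = matrix(c(0.0, 0.3, 0.7, 0.9,
             0.3, 0.0, 0.4, 0.8,
             0.7, 0.4, 0.0, 0.2,
             0.9, 0.8, 0.2, 0.0),
           4, 4, dimnames = list(ids, ids))
stopifnot(isSymmetric(d))

# as.dist() keeps only one triangle internally, so nothing needs to be
# set to NA by hand.
tree = hclust(as.dist(d), method = "average")

# Plot the full matrix ordered by this clustering, without letting
# heatmap() recompute distances between rows.
dend = as.dendrogram(tree)
heatmap(d, Rowv = dend, Colv = dend, symm = TRUE, scale = "none")
```

If I read the pheatmap help page correctly, a distance object such as as.dist(d) can also be passed as clustering_distance_rows and clustering_distance_cols, but please check that against the documentation.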

HTH,

Peter

On Thu, Sep 15, 2016 at 7:38 PM, Khan, Saad M. (MU-Student)
 wrote:
> Hi all,
>
> I have a distance matrix (symmetric) which looks somewhat like this (only a 
> small portion shown)
>
>             ENSG0101413 ENSG0176884 ENSG0185532 ENSG0106829
> ENSG0101413       1.000       1.000       1.000       1.000
> ENSG0176884       0.328       0.258       0.260       0.390
> ENSG0185532       1.000       1.000       1.000       1.000
> ENSG0106829       0.684       0.443       0.531       0.701
>
> These distances are custom measures that I need to cluster. Since it's a 
> symmetric matrix I only need to consider one half triangle of the matrix. So 
> I do something like this :-
>
> newmat <- ensembl_copygosimmat
> newmat[upper.tri(ensembl_copygosimmat)] <- NA
>
> Then I wanted to visualize how the lower triangle looked using pheatmap which 
> does hierarchical clustering itself.
>
> library(pheatmap)
> pheatmap(newmat)
>
> But since there are NA values in the matrix (in the upper half) it always 
> throws an error. I was wondering what would be the ideal way to visualize as 
> well as cluster such a matrix.
>
> Regards
> Saad
>

Re: [R] Efficiently parallelize across columns of a data.table

2016-08-19 Thread Peter Langfelder

Last time I looked (admittedly a few years back), on unix-alikes
(which you seem to be using, based on your use of top),
foreach/doParallel used forking. This means each worker gets a copy of
the entire R session, __but__ modern operating systems do not actually
copy on spawn, they only copy on write (i.e., when the worker process
starts modifying the existing variables). I believe top shows memory
use as if the copy actually occurred (what the operating system
promises to each worker).

I would run the code and monitor usage of swap space - as long as the
system isn't swapping to disk, I would not worry about copying the
table to every slave node, since the copy doesn't really happen unless
the worker processes modify the table.
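
To illustrate the read-only pattern, here is a rough sketch that uses parallel::mclapply (which forks on unix-alikes) in place of foreach/doParallel. The small data frame stands in for the large table, and the choice of 2 cores is arbitrary.

```r
library(parallel)

set.seed(1)
# Small stand-in for the large table; a plain data frame is used here.
DT = as.data.frame(matrix(rnorm(21 * 1000), ncol = 21))
someVars = paste0("X", 1:20)
names(DT) = c("Y", someVars)

# Forking is only available on unix-alikes; fall back to one core elsewhere.
nCores = if (.Platform$OS.type == "unix") 2 else 1

# Each forked worker only reads DT, so copy-on-write should leave the
# parent's memory pages shared rather than duplicated.
corrWithY = unlist(mclapply(someVars,
                            function(v) cor(DT[[v]], DT[["Y"]]),
                            mc.cores = nCores))
names(corrWithY) = someVars
```

Whether top reports the shared pages separately depends on the system, so swap usage remains the more telling number to watch.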

HTH,

Peter

On Fri, Aug 19, 2016 at 11:22 AM, Rebecca Payne  wrote:
> I am trying to parallelize a task across columns of a data.table using
> foreach and doParallel. My data is large relative to my system memory
> (about 40%) so I'm avoiding making any copies of the full table. The
> function I am parallelizing is pretty simple, taking as input a fixed set
> of columns and a single unique column per task. I'd like to send only the
> small subset of columns actually needed to the worker nodes. I'd also like
> the option to only send a subset of rows to the worker nodes. My initial
> attempts to parallelize did not work as expected, and seemed to copy the
> entire data.table to every worker node.
>
>
>
>
>
> ### start code ###
>
> library(data.table)
> library(foreach)
> library(doParallel)
> registerDoParallel()
>
> anotherVar = "Y"
> someVars = paste0("X", seq(1:20))
> N = 1
> # I've chosen N such that my Rsession consumes ~15GB of memory according to
> # top right after DT is created
> DT = as.data.table(matrix(rnorm(21*N), ncol=21))
> setnames(DT, c(anotherVar, someVars))
>
> MyFun = function(inDT, inX, inY){
>   cor(inDT[[inX]], inDT[[inY]])
> }
>
> #Warning: Will throw an error on the mac GUI
> corrWithY_1 = foreach(i = 1:length(someVars), .combine = c) %dopar%
>   MyFun(DT[,c(anotherVar, someVars[i]), with=FALSE], someVars[i], anotherVar)
> # Watching top, all of the slave nodes also appear to consume the full
> # ~15Gb of system memory
>
> gc()
>
> # So I tried creating an entirely separate subset of DT to send to the
> # slave nodes, and then removing it by hand.
> # This task, too, appears to take ~15GB of memory per slave node according
> # to top.
>
> MyFun2 = function(DT, anotherVar, uniqueVar){
>   tmpData = DT[, c(anotherVar, uniqueVar), with=FALSE]
>   out = MyFun(tmpData, anotherVar, uniqueVar)
>   rm(tmpData)
>   return(out)
> }
>
> corrWithY_2 = foreach(i = 1:length(someVars), .combine = c) %dopar%
>   MyFun2(DT, anotherVar, someVars[i])
>
> ### end code ###
>
>
>
> Another thing I've tried is to send only the name of DT and its
> environment to the slave nodes, but `get` doesn't seem to be able to
> get only a subset of rows from DT, as I would need to do frequently.
>
>
>
> Questions:
>
> 1. Is top accurately reflecting my R session's memory usage?
>
> 2. If so, is there a way to parallelize over the columns of a data.table
> without copying the entire table to every slave node?
>

Re: [R] Reduce woes

2016-07-27 Thread Peter Langfelder

If you have a simple list of vectors (call it lst), use

lengths = sapply(lst, length)

In general, you may want to look at functions lapply and sapply which
apply a function over a list, in this case the function length().
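
A short sketch with the list from the question. It also shows that Reduce() can carry an accumulator of a different type than the elements, which I believe answers the original question. The object names are mine.

```r
lst = list(one = c(1, 1), three = 3, two = c(2, 2))

# Named vector of lengths
lens = sapply(lst, length)

# Same information as a list, closer to the Elixir/JavaScript result
lensList = lapply(lst, length)

# Reduce() also accepts an accumulator of a different type than the
# elements; here the accumulator is a list built up one name at a time.
viaReduce = Reduce(function(acc, nm) { acc[[nm]] = length(lst[[nm]]); acc },
                   names(lst), list())
```

Recent versions of R (3.2.0 and later) also provide lengths(lst), which returns the same named vector as the sapply call.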

Peter

On Wed, Jul 27, 2016 at 8:20 AM, Stefan Kruger  wrote:
> Hi -
>
> I'm new to R.
>
> In other functional languages I'm familiar with you can often seed a call
> to reduce() with a custom accumulator. Here's an example in Elixir:
>
> map = %{"one" => [1, 1], "three" => [3], "two" => [2, 2]}
> map |> Enum.reduce(%{}, fn ({k,v}, acc) -> Map.update(acc, k,
> Enum.count(v), nil) end)
> # %{"one" => 2, "three" => 1, "two" => 2}
>
> In R-terms that's reducing a list of vectors to become a new list mapping
> the names to the vector lengths.
>
> Even in JavaScript, you can do similar things:
>
> list = { one: [1, 1], three: [3], two: [2, 2] };
> var result = Object.keys(list).reduceRight(function (acc, item) {
>   acc[item] = list[item].length;
>   return acc;
> }, {});
> // result == { two: 2, three: 1, one: 2 }
>
> In R, from what I can gather, Reduce() is restricted such that any init
> value you feed it is required to be of the same type as the elements of the
> vector you're reducing -- so I can't build up. So whilst I can do, say
>
>> Reduce(function(acc, item) { acc + item }, c(1,2,3,4,5), 96)
> [1] 111
>
> I can't use Reduce to build up a list, vector or data frame?
>
> What am I missing?
>
> Many thanks for any pointers,
>
> Stefan
>
>
>
> --
> Stefan Kruger 
>

[R] Fwd: How to make the "apply" faster

2016-07-09 Thread Peter Langfelder

Forgot to cc the list...

-- Forwarded message --
From: Peter Langfelder <peter.langfel...@gmail.com>
Date: Sat, Jul 9, 2016 at 1:32 PM
Subject: Re: [R] How to make the "apply" faster
To: Debasish Pai Mazumder <pai1...@gmail.com>

You could try the following (I haven't tested it so check that the
results make sense):

indicator = x>=70;
new = apply(indicator, c(1,2,4), sum);

If you could find a way to make the dimension over which you sum the
last (or first), you could use rowSums (or colSums) on indicator which
would be orders of magnitude faster.
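
A sketch of the rowSums variant, assuming the array dimensions are ordered (lat, lon, time, var) as in the question. The small random array is only a stand-in for the real data.

```r
set.seed(1)
# Small stand-in for x(lat, lon, time, var)
x = array(runif(4 * 5 * 30 * 2, min = 0, max = 100), dim = c(4, 5, 30, 2))

indicator = x >= 70

# Reference result: count over the time dimension with apply
slow = apply(indicator, c(1, 2, 4), sum)

# Move time to the last position, then sum over it with rowSums;
# dims = 3 treats the first three dimensions as "rows".
fast = rowSums(aperm(indicator, c(1, 2, 4, 3)), dims = 3)

stopifnot(all(fast == slow))
```

The aperm call makes a permuted copy of the indicator array, so for very large arrays the memory cost of that copy needs to be weighed against the speed gain.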

Peter

On Sat, Jul 9, 2016 at 1:19 PM, Debasish Pai Mazumder <pai1...@gmail.com> wrote:
> I have 4-dimension array x(lat,lon,time,var)
>
> I am using "apply" to calculate over time
>  new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))})
>
> This is very slow. Is there any way to make it faster?
>
> -Debasish
>

Re: [R] Element-by-element operation (adding)

2016-05-22 Thread Peter Langfelder

Two solutions...

v + matrix(b, nrow(v), ncol(v), byrow = TRUE)

or

t(apply(v, 1, `+`, b))
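
For completeness, a small sketch that runs both solutions on the matrix from the question and compares them with a third option, sweep(), which I am adding here as an alternative.

```r
v = matrix(0, nrow = 5, ncol = 3)
b = c(0.1, 0.2, 0.3)

r1 = v + matrix(b, nrow(v), ncol(v), byrow = TRUE)
r2 = t(apply(v, 1, `+`, b))

# sweep() is a third option: apply `+` along the columns (margin 2)
r3 = sweep(v, 2, b, FUN = "+")

stopifnot(all.equal(r1, r2), all.equal(r1, r3))
r1
```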

Peter

On Sun, May 22, 2016 at 10:39 PM, Steven Yen  wrote:
> Hi all, need help below. Thank you.
>
>  > # Matrix v is 5 x 3
>  > # Vector b is of length 3
>  > # I like to add b[1] to all element in v[,1]
>  > # I like to add b[2] to all element in v[,2]
>  > # I like to add b[3] to all element in v[,3]
>  > # as follows
>  > v<-matrix(0,nrow=5,ncol=3); v
>      [,1] [,2] [,3]
> [1,]    0    0    0
> [2,]    0    0    0
> [3,]    0    0    0
> [4,]    0    0    0
> [5,]    0    0    0
>  > b<-c(0.1,0.2,0.3)
>  > cbind(
> + (b[1]+v[,1]),
> + (b[2]+v[,2]),
> + (b[3]+v[,3]))
>   [,1] [,2] [,3]
> [1,]  0.1  0.2  0.3
> [2,]  0.1  0.2  0.3
> [3,]  0.1  0.2  0.3
> [4,]  0.1  0.2  0.3
> [5,]  0.1  0.2  0.3
>  > # I am obviously not using sapply correctly:
>  > as.data.frame(sapply(b,"+",v))
>  V1  V2  V3
> 1  0.1 0.2 0.3
> 2  0.1 0.2 0.3
> 3  0.1 0.2 0.3
> 4  0.1 0.2 0.3
> 5  0.1 0.2 0.3
> 6  0.1 0.2 0.3
> 7  0.1 0.2 0.3
> 8  0.1 0.2 0.3
> 9  0.1 0.2 0.3
> 10 0.1 0.2 0.3
> 11 0.1 0.2 0.3
> 12 0.1 0.2 0.3
> 13 0.1 0.2 0.3
> 14 0.1 0.2 0.3
> 15 0.1 0.2 0.3
>
>

Re: [R] Grep command

2016-05-19 Thread Peter Langfelder

I use my own functions multiGrep and multiGrepl:

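multiGrep = function(patterns, x, ..., sort = TRUE, invert = FALSE)
{
  if (invert)
  {
    # The original called multiIntersect(), a helper not defined in this
    # message; Reduce(intersect, ...) computes the same intersection of all sets.
    out = Reduce(intersect, lapply(patterns, grep, x, ..., invert = TRUE))
  } else
    out = unique(unlist(lapply(patterns, grep, x, ..., invert = FALSE)));
  if (sort) out = sort(out);
  out;
}

multiGrepl = function(patterns, x, ...)
{
  # One 0/1 column per pattern; a row is TRUE if any pattern matched.
  mat = do.call(cbind, lapply(patterns, function(p)
                                as.numeric(grepl(p, x, ...))));
  rowSums(mat)>0;
}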

> multiGrep(some, all)
[1] 1 3 6

> multiGrepl(some, all)
[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE

> multiGrep(some, all, invert = TRUE)
[1] 2 4 5
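
For simple patterns like these, a lighter alternative (a sketch, and it assumes the strings contain no regex special characters) is to join the patterns into one regular expression:

```r
all = c("ants", "birds", "cats", "dogs", "elks", "fox")
some = c("ants", "cats", "fox")

# Join the patterns with the regex "or" operator
pattern = paste(some, collapse = "|")

j = grep(pattern, all)
all[j]

grepl(pattern, all)

# If whole-string matches (not substrings) are wanted, %in% avoids regex
which(all %in% some)
```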

Peter


On Thu, May 19, 2016 at 4:09 PM, Steven Yen  wrote:
> What is a good way to grep multiple strings (say in a vector)? In the
> following, I grep ants, cats, and fox separately and concatenate them,
> is there a way to grep the trio in one action? Thanks.
>
> all<-c("ants","birds","cats","dogs","elks","fox"); all
> [1] "ants"  "birds" "cats"  "dogs"  "elks"  "fox"
> some<-c("ants","cats","fox"); some
> [1] "ants" "cats" "fox"
> j<-c(
>grep(some[1],all,value=F),
>grep(some[2],all,value=F),
>grep(some[3],all,value=F)); j; all[j]
> [1] 1 3 6
> [1] "ants" "cats" "fox"
>
>

Re: [R] Same sum, different sets of integers

2016-04-27 Thread Peter Langfelder

I came up with this, using recursion. Short and should work for n
greater than 9 :)

Peter

sumsToN = function(n)
{
  if (n==1) return(1);
  out = lapply(1:(n-1), function(i) {
    s1 = sumsToN(n-i);
    lapply(s1, c, i)
  })
  c(n, unlist(out, recursive = FALSE));
}

> sumsToN(4)
[[1]]
[1] 4

[[2]]
[1] 3 1

[[3]]
[1] 2 1 1

[[4]]
[1] 1 1 1 1

[[5]]
[1] 1 2 1

[[6]]
[1] 2 2

[[7]]
[1] 1 1 2

[[8]]
[1] 1 3

> sumsToN(5)
[[1]]
[1] 5

[[2]]
[1] 4 1

[[3]]
[1] 3 1 1

[[4]]
[1] 2 1 1 1

[[5]]
[1] 1 1 1 1 1

[[6]]
[1] 1 2 1 1

[[7]]
[1] 2 2 1

[[8]]
[1] 1 1 2 1

[[9]]
[1] 1 3 1

[[10]]
[1] 3 2

[[11]]
[1] 2 1 2

[[12]]
[1] 1 1 1 2

[[13]]
[1] 1 2 2

[[14]]
[1] 2 3

[[15]]
[1] 1 1 3

[[16]]
[1] 1 4
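
A quick sanity check of the function, repeated here so the snippet runs on its own. It also sketches how to drop the ordering if sums such as c(1,2) and c(2,1) should count as one; that last step is my addition, and n = 6 is an arbitrary choice.

```r
sumsToN = function(n)
{
  if (n==1) return(1);
  out = lapply(1:(n-1), function(i) {
    s1 = sumsToN(n-i);
    lapply(s1, c, i)
  })
  c(n, unlist(out, recursive = FALSE));
}

n = 6
ordered = sumsToN(n)

# Every element sums to n, and there are 2^(n-1) ordered sums
stopifnot(all(sapply(ordered, sum) == n))
stopifnot(length(ordered) == 2^(n - 1))

# Ignore order: sort each sum and drop duplicates
keys = sapply(ordered, function(p) paste(sort(p), collapse = ","))
unordered = lapply(ordered[!duplicated(keys)], sort)
length(unordered)
```

Because the number of ordered sums doubles with every increment of n, memory will eventually become the limit here as well, just much later than with expand.grid.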


On Wed, Apr 27, 2016 at 6:10 PM, jim holtman  wrote:
> This is not the most efficient, but gets the idea across.  This is the
> largest sum I can compute on my laptop with 16GB of memory.  If I try to
> set N to 9, I run out of memory due to the size of the expand.grid.
>
>> N <- 8  # value to add up to
>> # create expand.grid for all combinations and convert to matrix
>> x <- as.matrix(expand.grid(rep(list(0:(N - 1)), N)))
>>
>> # generate rowSums and determine which rows add to N
>> z <- rowSums(x)
>>
>> # now extract those rows, sort and convert to strings to remove dups
>> add2N <- x[z == N, ]
>> strings <- apply(
> + t(apply(add2N, 1, sort))  # sort
> + , 1
> + , toString
> + )
>>
>> # remove dups
>> strings <- strings[!duplicated(strings)]
>>
>> # remove leading zeros
>> strings <- gsub("0, ", "", strings)
>>
>> # print out
>> cat(strings, sep = '\n')
> 1, 7
> 2, 6
> 3, 5
> 4, 4
> 1, 1, 6
> 1, 2, 5
> 1, 3, 4
> 2, 2, 4
> 2, 3, 3
> 1, 1, 1, 5
> 1, 1, 2, 4
> 1, 1, 3, 3
> 1, 2, 2, 3
> 2, 2, 2, 2
> 1, 1, 1, 1, 4
> 1, 1, 1, 2, 3
> 1, 1, 2, 2, 2
> 1, 1, 1, 1, 1, 3
> 1, 1, 1, 1, 2, 2
> 1, 1, 1, 1, 1, 1, 2
> 1, 1, 1, 1, 1, 1, 1, 1
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Wed, Apr 27, 2016 at 11:46 AM, Atte Tenkanen  wrote:
>
>> Hi,
>>
>> Do you have ideas, how to find all those different combinations of
>> integers (>0) that produce as a sum, a certain integer.
>>
>> i.e.: if that sum is
>>
>> 3, the possibilities are c(1,1,1), c(1,2), c(2,1)
>> 4, the possibilities are
>> c(1,1,1,1),c(1,1,2),c(1,2,1),c(2,1,1),c(2,2),c(1,3),c(3,1)
>>
>> etc.
>>
>> Best regards,
>>
>> Atte Tenkanen
>>

Re: [R] using apply to a data frame

2016-04-07 Thread Peter Langfelder

Use lapply or sapply. A data frame is also a list with each component
representing one column; lapply/sapply will apply the function to each
column.
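
A small sketch with a made-up data frame. The function fract below is only a stand-in, since the real one was not posted.

```r
# Made-up data frame with mixed column types
df = data.frame(id = c("a", "b", "c", "d"),
                score = c(1.5, NA, 3, 4),
                group = c("x", NA, "y", "x"),
                stringsAsFactors = FALSE)

# Hypothetical stand-in for fract: fraction of missing values in a column
fract = function(col) mean(is.na(col))

res = sapply(df, fract)   # named numeric vector, one entry per column
res

lapply(df, class)         # each column keeps its own type
```

apply(df, 2, fract) would first convert the whole data frame to a character matrix, which is why column-wise lapply/sapply is usually the safer route with mixed types.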

Peter

On Thu, Apr 7, 2016 at 1:25 PM, John Sorkin  wrote:
>
> I would like to apply a function, fract, to the columns of a
> dataframe. I tried the following
> apply(data5NonEventEpochs,2,fract)
> but, no surprise, it did not work as apply works on matrices, not data
> frames. How can I apply a function to the columns of a data frame? (I
> can't convert data5NonEventsEpochs to a matrix as it contains character
> data).
> Thank you,
> John
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:12}}


Re: [R] How to read ./configure messages

2016-02-01 Thread Peter Langfelder

I am not overly familiar with Mint, but you need the "development
version" of the readline library. If you have a GUI package manager
installed, open it and search for readline. You should see a version
that ends with -dev or -devel; you need to install that.

HTH,

Peter

On Mon, Feb 1, 2016 at 3:06 PM, p_connolly  wrote:
>
> I've installed R from the tgz file since about R-0.9.x following the
> INSTALL instructions and have always succeeded using rpm-based OSes.
> With each new OS, that involved installing various additional packages
> before the configure script would complete.  Figuring out which
> packages were required usually involved searching for rpms that
> supplied missing .so or .h files, dev packages or something else I
> could figure out.
>
> I'm now trying to do the same with LinuxMint 17.2 but I got stuck when
> this message came up:
>
>checking for main in -ltermlib... no
>checking for rl_callback_read_char in -lreadline... no
>checking for history_truncate_file... no
>configure: error: --with-readline=yes (default) and headers/libs are not
> available
>
> Near the bottom of the log file it shows this:
>
>configure:6747: gcc -E -I/usr/local/include conftest.c
>configure:6747: $? = 0
>configure:6761: gcc -E -I/usr/local/include conftest.c
>conftest.c:17:28: fatal error: ac_nonexistent.h: No such file or directory
> #include <ac_nonexistent.h>
>^
>compilation terminated.
>configure:6761: $? = 1
>configure: failed program was:
>| /* confdefs.h */
>| #define PACKAGE_NAME "R"
>
> So I'm assuming that's behind the failure.  Searching shows the same
> problem shows up in all sorts of places for decades, notably cygwin
> users.  But I didn't see anything that would help to work out what is
> missing.
>
> Ideas greatly appreciated.
>
>
> best
> Patrick
>

Re: [R] Thread parallelism and memory management on shared-memory supercomputers

2015-12-30 Thread Peter Langfelder

I'm not really an expert, but here are my 2 cents:

To the best of my limited knowledge, there is no direct way of ensuring
that the total memory being requested by N workers remains below a
certain threshold. You can control the number of child processes
forked by foreach/doParallel in the registerDoParallel call using argument
'cores'. The parallel computation implemented in parallel and
foreach/doParallel uses process forking (at least last time I checked it
did). When a process is forked, the entire memory of its parent is
"forked" as well (not sure what the right term is). This does not
mean a real copy (modern systems use copy-on-write), but for the OS
memory management purposes each child occupies as much memory as the
parent.

If you want to benchmark your memory usage, run a single (non-forked)
process and at the end, look at the output of gc() which gives you,
among other things, maximum memory usage. For more detailed
information on memory usage, you can run Rprof, tracemem, or Rprofmem;
see their help for details.
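
A rough sketch of such a benchmark. The matrix computation is only a stand-in for the real task, and limitMb is a hypothetical node limit taken from the 255G figure in the question. The result is an estimate for one R process as R sees it; the scheduler's accounting may differ.

```r
# Reset the "max used" statistics, then run a stand-in for the real task
invisible(gc(reset = TRUE))
x = matrix(rnorm(1e6), 1000, 1000)
y = crossprod(x)

mem = gc()
# The Mb figure sits in the column right after "max used"
maxCol = which(colnames(mem) == "max used") + 1
peakMb = sum(mem[, maxCol])

# Hypothetical node limit in Mb; replace with the real allocation
limitMb = 255 * 1024
nWorkers = max(1, floor(limitMb / peakMb))
```

The value of nWorkers could then be passed as the 'cores' argument mentioned above, ideally with some safety margin.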

To decrease memory usage, you will have to optimize your code and
perhaps sprinkle in garbage collection (gc()) calls after large object
manipulations. Just be aware that garbage collection is rather slow,
so you don't want to do it too often.

The difference between the cluster and your laptop may be that on the
laptop the system doesn't care so much about how much memory each
child uses, so you can fork a process with a large memory footprint as
long as you don't cause copying by modifying large chunks of memory.

HTH,

Peter

On Wed, Dec 30, 2015 at 9:36 AM, Andrew Crane-Droesch
 wrote:
> I've got allocations on a couple of shared memory supercomputers, which I
> use to run computationally-intensive scripts on multiple cores of the same
> node.  I've got 24 cores on the one, and 48 on the other.
>
> In both cases, there is a hard memory limit, which is shared among the cores
> in the node.  In the latter, the limit is 255G. If my job requests more than
> that, the job gets aborted.
>
> Now, I don't fully understand resource allocation in these sorts of systems.
> But I do get that the sort of "thread parallelism" done by e.g. the
> `parallel` package in R isn't identical to the sort of parallelism commonly
> done in lower-level languages.  For example, when I request a node, I only
> ask for one of its cores.  My R script then detects the number of cores on
> the node, and farms out tasks to the cores via the `foreach` package.  My
> understanding is that lower-level languages need the number of cores to be
> specified in the shell script, and a particular job script is given directly
> to each worker.
>
> My problem is that my parallel-calling R script is crashing the cluster,
> which terminates my script because the sum of the memory being requested by
> each thread is greater than what I'm allocated. I don't get this problem
> when running on my laptop's 4 cores, presumably because my laptop has a
> higher ratio of memory/core.
>
> My question:  how can I ensure that the total memory being requested by N
> workers remains below a certain threshold?  Is this even possible?  If not,
> is it possible to benchmark a process locally, collecting the maximum
> per-worker memory requested, and use this to back out the number of workers
> that I can request for a given node's memory limit?
>
> Thanks in advance!
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How do we do correlation for big matrices?

2015-12-26 Thread Peter Langfelder

My guess is that mapply would take forever to run. I would split the
calculation into blocks: not so large that a block no longer fits into
RAM, and not so small that the overhead makes the calculation run too long.
Say 500 columns per block; that way each correlation matrix takes up
500*500*8 bytes = 1.9 MB, so even the full 1000 blocks would fit
into a reasonably sized RAM (hopefully R will do a garbage collection
from time to time anyway). At the risk of tooting my own horn,

library(WGCNA) ## For allocateJobs
n = ncol(df1)
blocks = allocateJobs(n, 1000) # With 1000 blocks, roughly 500 columns per block...
results.lst = lapply(blocks, function(index) diag(cor(df1[, index],
df2[, index])));
result = unlist(results.lst)

I haven't tested this code, but it shouldn't be too far from correct.
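
For readers without WGCNA, the same blocking idea can be sketched in base R. The data and the block size below are made up for illustration, and split() stands in for allocateJobs; this is a sketch, not the code from the post.

```r
set.seed(1)
# Made-up stand-ins for the two big tables: 12 rows, 50 columns each
df1 <- as.data.frame(matrix(rnorm(12 * 50), 12, 50))
df2 <- as.data.frame(matrix(rnorm(12 * 50), 12, 50))
n <- ncol(df1)
block.size <- 10                       # hypothetical; about 500 for the real data
blocks <- split(seq_len(n), ceiling(seq_len(n) / block.size))
# Correlate matching blocks and keep only the diagonal (column i with column i)
result <- unlist(lapply(blocks, function(index)
  diag(cor(df1[, index, drop = FALSE], df2[, index, drop = FALSE]))),
  use.names = FALSE)
# Column-by-column reference computation for comparison
reference <- unname(mapply(cor, df1, df2))
```

On data this small, result and reference agree up to rounding; the blocked version only pays off when there are many columns.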

On Sat, Dec 26, 2015 at 11:14 AM, William Dunlap via R-help
 wrote:
> Since you only want the diagonal of the correlation matrix, the following
> will probably
> do the job using less memory.  The mapply versions works on the data.frames
> you supplied, but will not work on matrices - be careful not to conflate
> the two classes of data objects.
>
>   > vapply(colnames(df1), function(i)cor(df1[,i],df2[,i]), 0)
>          site1        site2        site3        site4        site5
>   -0.540644946  0.006898188 -0.035279748 -0.261648270  0.274059055
>          site6        site7        site8        site9       site10
>   -0.076396648 -0.147696334 -0.138916728  0.330632540  0.366095090
>   > mapply(FUN=cor, df1, df2)
>          site1        site2        site3        site4        site5
>   -0.540644946  0.006898188 -0.035279748 -0.261648270  0.274059055
>          site6        site7        site8        site9       site10
>   -0.076396648 -0.147696334 -0.138916728  0.330632540  0.366095090
> Compare to your:
>   > diag(cor(df1,df2))
>          site1        site2        site3        site4        site5
>   -0.540644946  0.006898188 -0.035279748 -0.261648270  0.274059055
>          site6        site7        site8        site9       site10
>   -0.076396648 -0.147696334 -0.138916728  0.330632540  0.366095090
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Sat, Dec 26, 2015 at 10:55 AM, Marna Wagley 
> wrote:
>
>> Hi R users,
>> I have two very big matrices of 12 rows and over 0.5 million columns
>> (504,710) and am trying to get correlation values between the two tables, but I
>> could not compute them because the files are so big.
>> Would you give me any suggestion on how I can do the correlations for the
>> big files?
>>
>> I used the following codes and the example data.
>>
>> df1<-structure(list(X = structure(c(1L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
>> 12L, 2L, 3L, 4L), .Label = c("env1", "env10", "env11", "env12",
>> "env2", "env3", "env4", "env5", "env6", "env7", "env8", "env9"
>> ), class = "factor"), site1 = c(0.38, 0.83, 0.53, 0.48, 0.66,
>> 0.09, 0.21, 0.02, 0.76, 0.62, 0.2, 0.47), site2 = c(0.19, 0.14,
>> 0.66, 0.35, 0.18, 0.24, 0.18, 0.2, 0.86, 0.06, 0.51, 0.29), site3 = c(0.95,
>> 0.51, 0.91, 0.48, 0.74, 0.67, 0.34, 0.72, 0.43, 0.49, 0.1, 0.48
>> ), site4 = c(0.89, 0.54, 0.93, 0.18, 0.99, 0.21, 0.69, 0.29,
>> 0.89, 0.84, 0.45, 0.2), site5 = c(0.38, 0.37, 0.01, 0.26, 0.97,
>> 0.49, 0.39, 0.31, 0.14, 0.83, 0.99, 0.2), site6 = c(0.68, 0.67,
>> 0.6, 0.92, 0.01, 0.04, 0.49, 0.38, 0.5, 0.37, 0.51, 0.17), site7 = c(0.08,
>> 0.54, 0.31, 0.3, 0.77, 0.39, 0.03, 0.51, 0.28, 0.32, 0.86, 0.95
>> ), site8 = c(0.54, 0.26, 0.87, 0.91, 0.12, 0.51, 0.31, 0.67,
>> 0.69, 0.79, 0.76, 0.08), site9 = c(0.1, 0.68, 0.17, 0.44, 0.78,
>> 0.9, 0.16, 0.31, 0.13, 0.34, 0.9, 0.16), site10 = c(0.53, 0.31,
>> 0.88, 0.61, 0.92, 0.44, 0.92, 0.94, 0.55, 0.8, 0.27, 0.07)), .Names =
>> c("X",
>> "site1", "site2", "site3", "site4", "site5", "site6", "site7",
>> "site8", "site9", "site10"), class = "data.frame", row.names = c(NA,
>> -12L))
>> df1<-df1[-1]
>>
>> df2<-structure(list(X = structure(c(1L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
>> 12L, 2L, 3L, 4L), .Label = c("env1", "env10", "env11", "env12",
>> "env2", "env3", "env4", "env5", "env6", "env7", "env8", "env9"
>> ), class = "factor"), site1 = c(0.36, 0.29, 0.09, 0.07, 0.82,
>> 0.88, 0.59, 0.57, 0.2, 0.29, 0.76, 0.2), site2 = c(0.91, 0.87,
>> 0.91, 0.54, 0.53, 0.2, 0.23, 0.16, 0.42, 0.44, 0.01, 0.29), site3 = c(0.96,
>> 0.56, 0.34, 0.34, 0.6, 0.63, 0.28, 0.25, 0.73, 0.45, 0.88, 0.39
>> ), site4 = c(0.73, 0.79, 0.39, 0.59, 0.63, 0.24, 0.69, 0.94,
>> 0.07, 0.23, 0.01, 0.99), site5 = c(0.88, 0.18, 0.37, 0.24, 0.61,
>> 0.61, 0.54, 0.71, 0.12, 0.82, 0.26, 0.5), site6 = c(0.43, 0.52,
>> 0.01, 0.76, 0.41, 0.57, 0.08, 0.75, 0.82, 0.98, 0.61, 0.74),
>> site7 = c(0.84, 0.14, 0.96, 0.04, 0.41, 0.84, 0.26, 0.59,
>> 0.29, 0.3, 0.76, 0.05), site8 = c(0.12, 0.18, 0.75, 0.23,
>> 0.96, 0.64, 0.33, 0.61, 0.25, 0.13, 0.99, 0.6), site9 = c(0.26,
>> 0.58, 0.32, 0.67, 0.11, 0.8, 0.87, 0.05, 0.03, 0.47, 0.95,
>> 0.81), site10 = c(0.94, 0.63, 0.64, 0.5, 0.94, 0.75, 0.44,
>>

Re: [R] F Distribution

2015-12-21 Thread Peter Langfelder

You want to use qf, which gives you the value of F at a given percentile. pf
is its inverse: it gives the cumulative probability for a given value of F.

> qf(0.95, 1, 1)
[1] 161.4476

> pf(161.4476, 1, 1)
[1] 0.95
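
The relationship between the two functions can be checked directly. This is a small sketch using the same degrees of freedom as above; the upper-tail call is an addition for illustration.

```r
q <- qf(0.95, df1 = 1, df2 = 1)               # quantile: the table value
p.lower <- pf(q, df1 = 1, df2 = 1)            # back to the cumulative probability
p.upper <- pf(q, df1 = 1, df2 = 1, lower.tail = FALSE)   # upper-tail area
```

Here p.lower recovers 0.95, and p.upper is the remaining tail area beyond the table value.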



Peter

On Mon, Dec 21, 2015 at 11:51 AM, Robert Sherry  wrote:
>
> When I use a table, from a Schaum book, I see that for the 95 percentile,
> with v_1 = 1 and v_2 = 1 the value is 161. In the modern era,
> looking values up in a table is less than ideal. Therefore, I would expect R
> to have a function to do this and based upon my
> reading of the documentation, I would expect the following call to get the
> value I expect:
>  pf( .95,1, 1)
> However, it produces
> 0.4918373
> Therefore, I conclude that I am using the wrong function. What function
> should I use?
>
> Thanks
> Bob
>

Re: [R] WGCNA cluster

2015-11-18 Thread Peter Langfelder

Hi Giovanni,

please follow Tutorial I, section 3 (particularly 3d, "Summary output
of network analysis results") at
http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html
. This will show you how to output module membership of each CpG into
a file. If you want to assign different colors, you will have to do it
yourself: first choose a set of colors (you will need at least 26),
then write some simple R code to change the colors.
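
A minimal base-R sketch of such a recoloring, assuming the modules are available as numeric labels with 0 meaning "unassigned". The labels, probe names and colors below are all made up; the real ones come from your own analysis.

```r
# Hypothetical module labels for six CpGs (0 = unassigned)
moduleLabels <- c(1, 3, 2, 1, 0, 3)
names(moduleLabels) <- paste0("cg", 1:6)       # made-up probe names
# One color per label 0..3; an analysis with 26 modules needs at least 26
myColors <- c("grey", "tomato", "steelblue", "gold")
moduleColors <- myColors[moduleLabels + 1]
membership <- data.frame(probe = names(moduleLabels),
                         label = moduleLabels,
                         color = moduleColors)
outFile <- tempfile(fileext = ".csv")          # replace with a real file name
write.csv(membership, outFile, row.names = FALSE)
split(membership$probe, membership$color)      # probes in each module color
```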

Best,

Peter

On Wed, Nov 18, 2015 at 3:31 AM, Calice Giovanni
 wrote:
> Hi all,
>
> I am using WGCNA for methylation-level network construction and module
> detection.
> In my network there are 26 modules with assigned colors and numeric labels, 
> id.
>
> Which is the best way to reassign a different color to each module?
>
> How to know the elements associated to each colored module in the cluster?
>
>
> Thanks in advance, Regards
>
> Giovanni
>
>
> Laboratory of Preclinical and Translational Research
> IRCCS - CROB Oncology Referral Center of Basilicata - Italy
>

Re: [R] c(1:n, 1:(n-1), 1:(n-2), ... , 1)

2015-09-17 Thread Peter Langfelder

Not sure if this is slicker or easier to follow than your solution,
but it is shorter :)

do.call(c, lapply(n:1, function(n1) 1:n1))
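
For comparison, base R's sequence() appears to build the same vector in one call; a quick sketch with n = 5:

```r
n <- 5
v1 <- do.call(c, lapply(n:1, function(n1) 1:n1))
v2 <- sequence(n:1)                    # concatenates 1:5, 1:4, ..., 1:1
v3 <- unlist(lapply(n:1, seq_len))     # same idea, written with seq_len
```

All three give a vector of length n*(n+1)/2.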

Peter

On Thu, Sep 17, 2015 at 11:19 AM, Dan D  wrote:
> Can anyone think of a slick way to create an array that looks like c(1:n,
> 1:(n-1), 1:(n-2), ... , 1)?
>
> The following works, but it's inefficient and a little hard to follow:
> n<-5
> junk<-array(1:n,dim=c(n,n))
> junk[((lower.tri(t(junk),diag=T)))[n:1,]]
>
> Any help would be greatly appreciated!
>
> -Dan
>
>
>

Re: [R] glm help

2015-08-21 Thread Peter Langfelder

Thanks for the correction, I learned something new.

Peter

On Fri, Aug 21, 2015 at 7:32 AM, Bert Gunter bgunter.4...@gmail.com wrote:
 Inline.

 -- Bert
 Bert Gunter

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
-- Clifford Stoll


 On Thu, Aug 20, 2015 at 10:47 PM, Peter Langfelder
 peter.langfel...@gmail.com wrote:
 On Thu, Aug 20, 2015 at 10:04 PM, Bert Gunter bgunter.4...@gmail.com wrote:

 I noticed you made two data-frames, 'my4s' and 'my4S'. The `my4S` was
 built with `cbind` which would create a matrix (probably a character
 matrix) rather than a data frame.

 False. There is a data.frame method for cbind that returns a data
 frame. Don't know the specifics here, though.


 True, but does not apply here, i.e., David is correct. cbind will
 return a data frame if the first argument is a data frame. In the OP
 case, the first argument was a vector and hence cbind gives a matrix,

 False again.

 > class(cbind(a=1:5,b=data.frame(a=letters[1:5],b=3:7)))

 [1] "data.frame"

 ##First argument a vector, but data frame is returned. Please consult
 ?cbind -- especially the data frame section -- for details.

 Again, I don't know the specifics here, and you and David may still
 well be right for what the OP did. I am only trying to correct what
 appear to me to be incorrect statements about the data.frame method of
 cbind (or rbind). Apologies if I have misinterpreted.

 Cheers,
 Bert



 of mode character if any of the inputs were character. Here's a
 short demo:

 > a = data.frame(a1 = 1:10)
 # First argument a data frame, so the result is also a data frame:
 > class(cbind(a, b = 11:20))
 [1] "data.frame"
 # First argument is a vector, so the result is a matrix:
 > class(cbind(a$a1, b = 11:20))
 [1] "matrix"
 > mode(cbind(a$a1, b = 11:20))
 [1] "numeric"
 > mode(cbind(a$a1, b = letters[11:20]))
 [1] "character"

 Peter


Re: [R] glm help

2015-08-20 Thread Peter Langfelder

On Thu, Aug 20, 2015 at 10:04 PM, Bert Gunter bgunter.4...@gmail.com wrote:

>> I noticed you made two data-frames, 'my4s' and 'my4S'. The `my4S` was built
>> with `cbind` which would create a matrix (probably a character matrix)
>> rather than a data frame.
>
> False. There is a data.frame method for cbind that returns a data
> frame. Don't know the specifics here, though.


True, but does not apply here, i.e., David is correct. cbind will
return a data frame if the first argument is a data frame. In the OP
case, the first argument was a vector and hence cbind gives a matrix,
of mode character if any of the inputs were character. Here's a
short demo:

> a = data.frame(a1 = 1:10)
# First argument a data frame, so the result is also a data frame:
> class(cbind(a, b = 11:20))
[1] "data.frame"
# First argument is a vector, so the result is a matrix:
> class(cbind(a$a1, b = 11:20))
[1] "matrix"
> mode(cbind(a$a1, b = 11:20))
[1] "numeric"
> mode(cbind(a$a1, b = letters[11:20]))
[1] "character"

Peter


Re: [R] Newbie question: error message with install.packages

2015-08-20 Thread Peter Langfelder

From an older post by Uwe Ligges:

Anyway: R tried to download the package but got an html page, obviously,
hence either the mirror you are using is corrupted or someone in between
(like some proxy?) delivers html pages rather than packages...


In other words, check your proxy/internet settings, or perhaps try a
different mirror.
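
A minimal sketch of switching mirrors for the current session. The mirror URL below is just one commonly used choice, not something from the original exchange, and the install call is left commented out because it needs network access.

```r
# Point R at a different CRAN mirror for this session only
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")                     # check what R will use from now on
# install.packages("RWeka")            # would now download from that mirror
```

If the error persists with a different mirror, a proxy between you and CRAN becomes the more likely culprit.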

Peter


On Thu, Aug 20, 2015 at 10:09 AM, Peter Wicher pjwic...@gmail.com wrote:
 Many thanks.

 Yes, I am using the R.app GUI:
 [R.app GUI 1.66 (6996) x86_64-apple-darwin13.4.0]


 At startup,

 > getOption("repos")
     CRAN
 "@CRAN@"

 Then when attempting again to install RWeka I'm able to select the mirror,
 and after that the result is the same as yours:

 > getOption("repos")
                             CRAN
 "https://cran.cnr.Berkeley.edu"

 Unfortunately the same error message happens with install.packages, for
 example:
 > install.packages("err")
 Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!

 I've confirmed that Java 8 update 60 is correctly loaded.

 Interestingly I've loaded R on my Windows machine and this error message
 doesn't happen, the packages load properly.

 Peter

 On Wed, Aug 19, 2015 at 11:17 PM, David Winsemius dwinsem...@comcast.net
 wrote:


  On Aug 19, 2015, at 9:13 PM, Peter Wicher pjwic...@gmail.com wrote:
 
  Hi,
 
  I’m starting to work my way through “Machine Learning With R” by Brett
 Lantz.
 
  Running on Mac OS X 10.10.4
 
  I’ve downloaded and installed R and the R Console comes up fine.
 
  Whenever I use the install.packages command, regardless of the package I
 get the same error message:
 
  > install.packages("RWeka")
  Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
 
  Any idea of what is wrong and how to solve it?

 Not sure. Have never gotten that error message, and I do use OS X 10.10.3
 as well as having very recently updated to R 3.2.2.

 Are you using the R.app GUI?

 If so … What do you see when you look at Preference/Startup for the
 default CRAN repository?

 If not … What do you see when you run:

 getOption("repos")

 I get:

                            CRAN
 "http://cran.cnr.Berkeley.edu"


 
  Thanks!
 
  Peter Wicher
  pjwic...@gmail.com
 

Re: [R] Help with Plot

2015-08-04 Thread Peter Langfelder

Try removing the line

x <- x[order(x[,1], decreasing=TRUE),]
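
If the sorting is still wanted for other reasons, another option is to tie colors to site names instead of row positions. This is a sketch with made-up site names and values, not part of the original function:

```r
# Hypothetical diversity values for three sites (rows) at three orders (columns)
x <- matrix(c(5, 3, 2,
              9, 4, 1,
              7, 6, 3), nrow = 3, byrow = TRUE,
            dimnames = list(c("siteA", "siteB", "siteC"), NULL))
# One fixed color per site name, reused for every plot
site.colors <- c(siteA = "black", siteB = "red", siteC = "blue")
x.sorted <- x[order(x[, 1], decreasing = TRUE), ]
cols <- site.colors[rownames(x.sorted)]   # colors follow the names, not the order
# Inside the plotting loop one would then use col = cols[i] in lines(),
# and col = cols in legend()
```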


Peter

On Tue, Aug 4, 2015 at 10:58 AM, April Smith aprilgracesm...@gmail.com wrote:
 Let me just preface that everything I know about writing code for R is self
 taught so this may be really basic but I can't figure it out!

 I am using someone else code to create plots.  I would like to change the
 automatically generated colors to the same colors for every plot.  The
 current code makes the highest line in the graph black, the second highest
 line red, 3rd blue, etc, regardless of what the line represents.  I need to
 create 10 of these plots and it gets confusing when the black line means a
 different thing in each plot!   Here is the line I need to adjust, I just
 don't know how.

 lines(1:orders, x[i,], col=i)

 Here is the code in entirety:
 plot.hill <- function(x, scales = c(0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64,
     Inf), ...) {
    require(vegan)
    nsites <- if(is.null(ncol(x))) 1 else ncol(x)
    x <- renyi(t(x), scales=scales, hill=TRUE)
    orders <- length(scales)
    if(nsites > 1) {
       x <- x[order(x[,1], decreasing=TRUE),]
       OP <- matrix(".", nsites, nsites)
       colnames(OP) <- rownames(OP) <- rownames(x)
       for(i in 1:(nsites-1))
          for(j in (i+1):nsites)
             if(all(x[i,] > x[j,])) {
                OP[i,j] <- "<"
                OP[j,i] <- "^"
             }
       diag(OP) <- " "
       OP <- as.data.frame(OP)
       cat("The arrow < or ^ points to the more diverse site:\n")
       print(OP, na.print="")
    } else
       OP <- NULL
    plot(1:4,1:4,type="n",xlim=c(0.9,orders+0.1),ylim=range(0,x),axes=FALSE,
       ylab="Hill Diversity Numbers",xlab="Order", ...)
    axis(2)
    axis(1, at=1:orders, labels=scales)
    if(nsites > 1) {
       for(i in 1:nsites)
          lines(1:orders, x[i,], col=i)
       legend("topright", legend=row.names(x), col=1:nsites, lty=1, cex=0.7)
    } else
       lines(1:orders, x)
    invisible(OP)
 }


Re: [R] Splitting lines in R script

2015-08-02 Thread Peter Langfelder

R does not need a semicolon or other character to terminate a command;
if a line can be interpreted as a complete command, it will (first
line in your second example).

Also note that the first example may not produce what you want (if
your second example is any indication) - the result of
pbivnorm(aa,dd,tau) is added to the sum of the first two terms,
because the two minuses give a plus:

> 1- -1
[1] 2
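
As a rule of thumb, leave the line syntactically incomplete at the break, either with a trailing operator or inside parentheses. A small sketch with made-up numbers:

```r
w <- 1; x <- 2; y <- 3; z <- 4
# Incomplete at the line break (trailing operator), so R keeps reading:
p1 <- w + x -
  y - z
# Complete at the line break, so the second line is a separate expression
# whose value is computed and then thrown away:
p2 <- w + x
  - y - z
# Wrapping everything in parentheses also forces continuation:
p3 <- (w + x
  - y - z)
```

Here p1 and p3 hold all four terms, while p2 holds only the first two.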

Peter

On Sun, Aug 2, 2015 at 9:05 PM, Steven Yen sye...@gmail.com wrote:
 I have a line containing summation of four components.

 # This works OK:
   p<-pbivnorm(bb,dd,tau)+pbivnorm(aa,cc,tau)-
 -pbivnorm(aa,dd,tau)-pbivnorm(bb,cc,tau)

 # This produces unpredicted results without warning:
   p<-pbivnorm(bb,dd,tau)+pbivnorm(aa,cc,tau)
 -pbivnorm(aa,dd,tau)-pbivnorm(bb,cc,tau)

 Is there a general rule of thumb for line breaks? Thank you.


Re: [R] matrix manipulation

2015-07-16 Thread Peter Langfelder

Hi Terry,

maybe I'm missing something, but why not define a matrix BB = V'B;
then t(B) %*% V = t(BB), then your problem reduces to finding A such
that t(BB) %*% A = 0?
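
A minimal sketch of that reduction, using the complete QR decomposition of BB to get the orthogonal complement. The dimensions and the random X and B are made up, and B is assumed to have full column rank k.

```r
set.seed(1)
p <- 5; k <- 2
X <- matrix(rnorm(20 * p), 20, p)
V <- solve(crossprod(X))               # full-rank p x p, as in V = (X'X)^{-1}
B <- matrix(rnorm(p * k), p, k)
BB <- t(V) %*% B                       # so that t(B) %*% V equals t(BB)
Q <- qr.Q(qr(BB), complete = TRUE)     # p x p orthogonal matrix
A <- Q[, (k + 1):p, drop = FALSE]      # last p-k columns are orthogonal to BB
max(abs(t(B) %*% V %*% A))             # numerically zero
```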

Peter

On Thu, Jul 16, 2015 at 10:28 AM, Therneau, Terry M., Ph.D.
thern...@mayo.edu wrote:
 This is as much a mathematics as an R question, in the "this should be easy
 but I don't see it" category.

 Assume I have a full rank p by p matrix V  (aside: V = (X'X)^{-1} for a
 particular setup), a p by k matrix B, and I want to complete an orthogonal
 basis for the space with distance function V.  That is, find A such that
 t(B) %*% V %*% A = 0, where A has p rows and p-k columns.

 With V=identity this is easy. I can do it in 1-2 lines using qr(), lm(), or
 several other tools.  A part of me is quite certain that the general problem
 isn't more than 3 lines of R, but after a day of beating my head on the
 issue I still don't see it.  Math wise it looks like a simple homework
 problem in a mid level class, but I'm not currently sure that I'd pass said
 class.

 If someone could show the way I would be grateful.  Either that or assurance
 that the problem actually IS hard and I'm not as dense as I think.

 Terry T.


Re: [R] data$variable=factor(....) NA NA NA

2015-07-11 Thread Peter Langfelder

There are two issues here. First, your original factor seems to have 4
levels: " F", " M", "F", "M". Note the extra space in front of the
first two F and M. You may want to fix that first:

gender.fixed = sub(" ", "", as.character(data$gender))

Check that everything is correct by typing

table(gender.fixed)

or

table(data$gender, gender.fixed)

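Second, the levels you give to factor() must be the values that actually
occur in the data ("F" and "M"), not 1:2; any value that matches no level
becomes NA. So convert the fixed gender back to a factor like this:

data$gender = factor(gender.fixed, levels = c("F", "M"))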
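
The whole sequence can be tried on a small made-up vector that mimics the stray leading spaces (a sketch, not the poster's data):

```r
gender <- factor(c(" F", " M", "F", "M", "F"))   # made-up data with stray spaces
levels(gender)                                   # four levels instead of two
gender.fixed <- sub(" ", "", as.character(gender))
table(gender, gender.fixed)                      # check the mapping
gender.clean <- factor(gender.fixed, levels = c("F", "M"))
# Why the original call gave only NA: none of the values matches level 1 or 2
all.na <- all(is.na(factor(gender, levels = 1:2, labels = c("F", "M"))))
```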

Hopefully this works,

Peter

On Sat, Jul 11, 2015 at 12:21 PM, Dagmar Juranková
dagmar.jura...@gmail.com wrote:
 Hello everybody, I have a problem with R.


 I uploaded a questionnaire saved as csv into R and I tried to test
 independence between two variables.



 > data <- read.csv("C:/Users/Me/Desktop/data.csv")
 > View(data)
 > df = read.csv("C:/Users/Me/Desktop/data.csv")
 > ls()
 [1] "df"   "data"
 > attributes(data$gender)
 $levels
 [1] " F" " M" "F"  "M"

 $class
 [1] "factor"


 I changed my variable gender into a factor using:


 data$gender=factor(data$gender, levels=c(1:2), labels= c("F", "M"),
 exclude= NA, nmax= NA).


 Then I wrote data$gender and the only thing i got was:


 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 NA NA NA NA NA NA

 [21] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 NA NA NA NA NA NA

 [41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 NA NA NA NA NA NA

 [61] NA NA NA NA NA NA NA NA

 Levels: F M


 Does anybody know why?


 -My csv doc in the column gender is filled out properly. (M=Male, F= Female)

 -My imported dataset in R is complete (all values)


 ! I have done this with a different excel document and it worked out
 without any problems. I am really clueless. I cant go further and compare
 the variables and do t-tests without this working.


 Could someone please help me out?

 Thank you.


Re: [R] Correlation matrix for pearson correlation (r,p,BH(FDR))

2015-06-18 Thread Peter Langfelder

You have multiple options. I will advertise my own solution - install
the package WGCNA, installation instructions at

http://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/#cranInstall

then you can use the function
cp = corAndPvalue(t(genes), t(features)).

You need to transpose both because the function expects variables in
columns and samples in rows.

This will give you a list whose components include 'cor' (matrix of
the correlation values) and 'p' (matrix of the Student p-values). To
get a matrix of the corresponding FDR, use

fdr = apply(cp$p, 2, p.adjust, method = "fdr")
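
For those who prefer base R, the same quantities can be sketched by hand from the Student t statistic. The data below are made up and complete; with missing values, as in the growth-rate row of the question, the number of observations differs per pair and this simple version would not apply.

```r
set.seed(42)
# Made-up data: 10 samples (rows), 4 genes and 2 features (columns)
genes <- matrix(rnorm(40), 10, 4, dimnames = list(NULL, paste0("gene", 1:4)))
features <- matrix(rnorm(20), 10, 2, dimnames = list(NULL, c("growth", "drug")))
r <- cor(genes, features)                       # Pearson correlations
n <- nrow(genes)
tstat <- r * sqrt((n - 2) / (1 - r^2))          # Student t statistic
p <- 2 * pt(abs(tstat), df = n - 2, lower.tail = FALSE)
fdr <- apply(p, 2, p.adjust, method = "BH")     # BH adjustment within each feature
```

Each entry of p should agree with the p-value that cor.test reports for the corresponding gene and feature.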

Hope this helps,

Peter


On Thu, Jun 18, 2015 at 1:19 AM, Sarah Bazzocco sarah.bazzo...@vhir.org wrote:
 This post was called help before, I changed the Subject.
 Thanks for the comments.
 Here the example: (I have the two lists saved as .csv and I can open them in 
 R)

 Sheet one- Genes (10 genes expression, not binary, meaured in 10 cell lines)
 genes
  Genes  Cell.line1 Cell.line2  Cell.line3  Cell.line4  Cell.line5
 1   KCNAB3 12.02005181 11.1400910 15.60381163 13.44151596 25.37161030
 2KCNB1  0.02457449  1.3028535  0.81538294  0.59318327  0.15332321
 3KCNB2  0.44791862  0.1060137  0.09864136  0.  0.
 4 KERA  0.06090217  0.000  0.03352993  0.03634781  0.04190912
 5   KGFLP1  0.02450101  0.000  0.  0.  0.
 6   KGFLP2  0.  0.000  0.  0.  0.
 7KHDC1  0.  0.000  0.  0.  0.
 8   KHDC1L  2.31894450  2.8252262  5.29099724  7.44183228  1.94629741
 9   KHDC3L  0.  0.000  0.  0.  0.
 10 KHDRBS1  0.  0.000  0.  0.  0.
Cell.line6 Cell.line7  Cell.line8  Cell.line9 Cell.line10
 1  8.12373424 7.67506261 24.43776341 18.332448189.224225
 2  4.18181234 1.65268403  5.98346320  1.514238070.00
 3  0.05857207 0.05945414  0.20733924  0.058309820.00
 4  0. 0.  0.07752608  0.01585643   16.664245
 5  0.02563099 0.03902548  0.  0.0.00
 6  0. 0.  0.  0.0.00
 7  0. 0.  0.  0.0.00
 8  8.56022436 7.50838343  7.17964645  3.286027290.00
 9  0. 0.  0.  0.3.598534
 10 0. 0.03081180  0.  0.2.600173

 Sheet two - features (2 features(Growth rate,drug sensitivity for 10 cell 
 lines)
 features
  Cell.line Cell.line1 Cell.line2 Cell.line3 Cell.line4 Cell.line5
 1  Growth rate NA NA NA  51.41 NA
 2 Drug sensitivity   5.03   6.57  8   1.26  3
   Cell.line6 Cell.line7 Cell.line8 Cell.line9 Cell.line10
 1  41.33  26.76  24.19 NA  NA
 2   1.40   1.88   1.33   5.059.12

 What I found:
 corr.test {psych}
 corr.test(x, y = NULL, use = "pairwise", method = "pearson", adjust = "BH", alpha = .01)
 -- I adjusted the original command to what I need ("BH" instead of "holm", and
 alpha = .01 instead of 0.05).

 I would be very happy, if someone could show me how to use this command, in 
 particular how to refer as x and y to the two sheets I have (Genes and 
 Features). I would take it from there.

 Thanks a lot in advance.

 Sarah






 - Original Message -
 From: Rainer Schuermann rainer.schuerm...@gmx.net
 To: Sarah Bazzocco sarah.bazzo...@vhir.org
 Sent: Thursday, 18 June, 2015 8:14:56 AM
 Subject: Re: [R] help



 Hi Sarah,

 Not an answer to your question but a piece of well-intended advice:

 1. Don't post HTML but plain text. Not only will people tell you this in
 a sometimes not very friendly manner; HTML actually does make posts
 illegible on this mailing list. Code, and R _is_ code, is always plain text.

 2. Don't pose an abstract problem; this looks too much like "Can you please
 do my work for me". Show us what you have tried already, and people will
 happily jump in and provide their thoughts and advice.

 3. Always make sure that you have a reproducible example in your mail, and a
 set of data of the same type and structure you are using, ideally produced
 with dput().

 See further advice here:

 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 and here:

 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

 For your problem, R has an immense wealth of ideas and solutions.

 Rgds,

 Rainer







 On Wed June 17 2015 16:57:24 Sarah Bazzocco wrote:



 Hello,

 I am an R beginner and I need some help. The question is very simple: I need
 to do Pearson correlations (r, p-value and FDR with BH) from an expression
 array (with several thousand genes for, let's say, 20 cell lines) with some
 features of those cell lines.







 My problem I

Re: [R] creating a distinct zip file

2015-02-21 Thread Peter Langfelder

On Fri, Feb 20, 2015 at 6:56 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:
 On 21/02/15 15:02, Jeff Newmiller wrote:

 R CMD INSTALL --build packagename


 That will create a *.tar.gz file, not a *.zip file.  The latter being
 what Erin wanted, if I understand correctly.


It depends on her system (I don't see it specified anywhere). On
Windows, R CMD INSTALL --build packagename produces a compiled .zip
file. On Mac it produces a .tgz. Haven't tried it on Linux.

Peter


Re: [R] Swirl course crashes

2015-02-04 Thread Peter Langfelder

It's hard to say from your description what the situation is. The error
simply means the plot area is too small for the figure margins to fit. Try
closing the graphics (plot) window before you run the section that causes
the error, or you can try maximizing the plotting window.
You can also contact the course developers and ask them to fix the bug.

HTH,

Peter


On Wed, Feb 4, 2015 at 11:15 AM, Glen Forister gforis...@gmail.com wrote:

 Is there a way for me to use the Swirl course Data_Visualization?  This
 is my main reason for learning R, for the plotting ability but can't use
 the course.  I've already been through the following (Programming,
 Programming Alt, Getting_and_Cleaning_Data).

 I get the following
 =
 Would you like to continue with one of these lessons?

 1: Data Analysis Data Visualization
 2: R Programming Basic Building Blocks
 3: No. Let me start something new.

 Selection:
 | Attemping to load lesson dependencies.
 | Package ‘openintro’ loaded correctly!

 | Here is a dot plot created using the variable 'price' from our 'cars'
 | data set. As you may notice, the price is reported along the x-axis in
 | $1000s, and each point above the axis represents the price of one of
 | the 54 cars in our data set.

 Error in plot.new() : figure margins too large

 | Leaving swirl now. Type swirl() to resume.

 =
 Hopefully, something can be pointed out that I have forgotten.  so far all
 the courses have been great.

 --
 Glen Forister
 2319 Vernon St.
 Roseville, CA  95678


Re: [R] Package corpcor: Putting symmetric matrix entries in vector

2015-01-30 Thread Peter Langfelder

If you have a symmetric matrix, you can work with the upper triangle
instead of the lower one, and you get what you want by simply using

as.vector(A[upper.tri(A)])

Example:

> a = matrix(rnorm(16), 4, 4)
> A = a + t(a)
> A
           [,1]      [,2]       [,3]        [,4]
[1,]  0.3341294 0.5460334 -0.4388050  1.09415343
[2,]  0.5460334 0.1595501  0.3907721  0.24021833
[3,] -0.4388050 0.3907721 -0.4024922 -1.62140865
[4,]  1.0941534 0.2402183 -1.6214086  0.03987924
> as.vector(A[upper.tri(A)])
[1]  0.5460334 -0.4388050  0.3907721  1.0941534  0.2402183 -1.6214086

No need to play with potentially error-prone index vectors; upper.tri
does that for you.
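
A small check of the claim, as a sketch: for a symmetric matrix, walking the upper triangle column by column visits the same values as walking the lower triangle row by row. The seed and the variable names are arbitrary choices for illustration.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
a <- matrix(rnorm(16), 4, 4)
A <- a + t(a)                      # symmetric by construction

# Column-wise walk over the upper triangle, as in the reply.
viaUpperTri <- A[upper.tri(A)]

# Explicit row-wise walk over the lower triangle, for comparison.
rowwiseLower <- unlist(lapply(2:nrow(A), function(i) A[i, 1:(i - 1)]))

all.equal(viaUpperTri, rowwiseLower)

# For a matrix that is NOT symmetric, transpose first to get the row-wise
# lower triangle.
M <- matrix(1:16, 4, 4)
rowwiseLowerM <- t(M)[upper.tri(M)]
rowwiseLowerM
```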

Hope this helps,

Peter

On Fri, Jan 30, 2015 at 3:03 PM, Steven Yen sye...@gmail.com wrote:
 Dear
 I use sm2vec from package corpcor to put the lower triangular entries of a
 symmetric matrix (matrix A) into a vector. However, sm2vec goes downward
 (columnwise, vector B), but I would like it to go across (rowwise). So I
 define a vector to re-map the vector (vector C). This works. But is there a
 short-cut (simpler way)? Thank you.

 > A<-cor(e); A
             [,1]       [,2]        [,3]        [,4]       [,5]        [,6]
 [1,]  1.00000000  0.5240809  0.47996616  0.11200672 -0.1751103 -0.09276455
 [2,]  0.52408090  1.0000000  0.54135982 -0.15985028 -0.2627738 -0.14184545
 [3,]  0.47996616  0.5413598  1.00000000 -0.06823105 -0.2046897 -0.23815967
 [4,]  0.11200672 -0.1598503 -0.06823105  1.00000000  0.2211311  0.08977677
 [5,] -0.17511026 -0.2627738 -0.20468966  0.22113112  1.0000000  0.23567235
 [6,] -0.09276455 -0.1418455 -0.23815967  0.08977677  0.2356724  1.00000000
 > B<-sm2vec(A); B
  [1]  0.52408090  0.47996616  0.11200672 -0.17511026 -0.09276455
  [6]  0.54135982 -0.15985028 -0.26277383 -0.14184545 -0.06823105
 [11] -0.20468966 -0.23815967  0.22113112  0.08977677  0.23567235
 > jj<-c(1,2,6,3,7,10,4,8,11,13,5,9,12,14,15)
 > C<-B[jj]; C
  [1]  0.52408090  0.47996616  0.54135982  0.11200672 -0.15985028
  [6] -0.06823105 -0.17511026 -0.26277383 -0.20468966  0.22113112
 [11] -0.09276455 -0.14184545 -0.23815967  0.08977677  0.23567235


Re: [R] Matrix element-by-element multiplication

2015-01-07 Thread Peter Langfelder

You can create a suitable matrix bb as below (note the byrow = TRUE argument):

aa<-matrix(1:30,nrow=10,ncol=3); aa
bb<-matrix(c(100,100,1),nrow=10,ncol=3, byrow = TRUE); bb
dim(aa)
dim(bb)
aa * bb


You can also use matrix multiplication, but that's slightly more involved:

aa<-matrix(1:30,nrow=10,ncol=3); aa
bb<-matrix(0,nrow=3,ncol=3);
diag(bb) = c(100,100,1);
bb
dim(aa)
dim(bb)
aa %*% bb
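
For comparison, a sketch of a few equivalent ways to scale columns without building the full 10 x 3 matrix by hand. The names v and via* are made up for this illustration; sweep() and rep() are base R.

```r
aa <- matrix(1:30, nrow = 10, ncol = 3)
v  <- c(100, 100, 1)                          # one multiplier per column

# The approach from the reply: a full matrix filled row by row.
viaFullMatrix <- aa * matrix(v, nrow = nrow(aa), ncol = ncol(aa), byrow = TRUE)

# sweep() applies v along margin 2 (columns).
viaSweep <- sweep(aa, 2, v, "*")

# Repeat each multiplier nrow(aa) times so it lines up with column-major storage.
viaRep <- aa * rep(v, each = nrow(aa))

# The matrix-multiplication route with a diagonal matrix.
viaDiag <- aa %*% diag(v)

viaSweep[1:3, ]
```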


HTH,

Peter



On Wed, Jan 7, 2015 at 3:05 PM, Steven Yen sye...@gmail.com wrote:
 I'd like to multiply the first and second columns of a 10 x 3 matrix by 100.
 The following did not work. I need this in an operation with a much larger
 scale. Any help?

 aa<-matrix(1:30,nrow=10,ncol=3); aa
 bb<-matrix(c(100,100,1),nrow=1,ncol=3); bb
 dim(aa)
 dim(bb)
 aa*bb

 Results:

 > aa<-matrix(1:30,nrow=10,ncol=3); aa
       [,1] [,2] [,3]
  [1,]    1   11   21
  [2,]    2   12   22
  [3,]    3   13   23
  [4,]    4   14   24
  [5,]    5   15   25
  [6,]    6   16   26
  [7,]    7   17   27
  [8,]    8   18   28
  [9,]    9   19   29
 [10,]   10   20   30
 > bb<-matrix(c(100,100,1),nrow=1,ncol=3); bb
      [,1] [,2] [,3]
 [1,]  100  100    1
 > dim(aa)
 [1] 10  3
 > dim(bb)
 [1] 1 3
 > aa*bb
 Error in aa * bb : non-conformable arrays




Re: [R] Matrix element-by-element multiplication

2015-01-07 Thread Peter Langfelder

On Wed, Jan 7, 2015 at 3:15 PM, Peter Langfelder
peter.langfel...@gmail.com wrote:
 You can create a suitable matrix bb as below (note the byrow = TRUE argument)

 aa<-matrix(1:30,nrow=10,ncol=3); aa
 bb<-matrix(c(100,100,1),nrow=10,ncol=3, byrow = TRUE); bb
 dim(aa)
 dim(bb)
 aa * bb


 You can also use matrix multiplication, but that's slightly more involved:

I should add that it will also be much slower if, as you say, you do
it on a much larger scale and the dimensions of bb are large.
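
A rough way to see the difference, as a sketch: time both routes on a larger matrix. The sizes n and p are arbitrary assumptions, and the timings depend on the machine and the BLAS in use, so no particular numbers are claimed here.

```r
set.seed(1)
n <- 2000; p <- 400                       # illustrative sizes only
aa <- matrix(rnorm(n * p), n, p)
v  <- runif(p)                            # one multiplier per column

# Element-wise: n * p multiplications.
tElementwise <- system.time(r1 <- aa * rep(v, each = n))

# Matrix product with a p x p diagonal matrix: on the order of n * p * p operations.
tMatmul <- system.time(r2 <- aa %*% diag(v))

all.equal(r1, r2)                         # same result either way
c(elementwise = tElementwise[["elapsed"]], matmul = tMatmul[["elapsed"]])
```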

Peter


Re: [R] Loading a rda file for predicton

2014-10-13 Thread Peter Langfelder

see help(load) and pay particular attention to what the function
returns: the names of the loaded objects, not the object(s)
themselves.

You have to use

predict(fit,Testsamp,type="response")

since the load() created a variable 'fit' (same name as the one saved).
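
A self-contained sketch of the same situation with simulated data (the data, the temporary file names and the saveRDS()/readRDS() alternative are illustrative additions, not part of the original question):

```r
set.seed(1)
# Simulated stand-in for the poster's data.
Test <- data.frame(Pressure = rnorm(200), MissingStep = rbinom(200, 1, 0.3))
Test$Pred <- rbinom(200, 1, plogis(0.5 * Test$Pressure - Test$MissingStep))
Testsamp <- Test[1:10, ]

fit <- glm(Pred ~ Pressure + MissingStep, data = Test, family = binomial)

rdaFile <- tempfile(fileext = ".rda")
save(fit, file = rdaFile)
rm(fit)

# load() restores the object under its saved name and returns that name.
loadedNames <- load(rdaFile)
loadedNames
p1 <- predict(fit, Testsamp, type = "response")

# saveRDS()/readRDS() store a single object and return it on reading,
# so the result can be assigned to any name.
rdsFile <- tempfile(fileext = ".rds")
saveRDS(fit, rdsFile)
pred <- readRDS(rdsFile)
p2 <- predict(pred, Testsamp, type = "response")

all.equal(p1, p2)
```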

HTH

Peter



On Mon, Oct 13, 2014 at 6:37 AM, TJUN KIAT TEO teotj...@hotmail.com wrote:
 I tried this

 
 fit<-glm(Pred~Pressure+MissingStep, data = Test, family=binomial)

 save(fit,file="pred.rda")

 pred<-load("pred.rda")


 predict(pred,Testsamp,type="response")

 

 But got this error message

  no applicable method for 'predict' applied to an object of class "character"

 What did I do wrong?


 Tjun Kiat



Re: [R] Help with PredicABEL

2014-10-03 Thread Peter Langfelder

You are getting a p-value, namely p=0. It's just that, when taken
literally, the p-values are wrong.

I'm not familiar with predictABEL, but my guess is that the p-value is
below 2e-16 or some such cutoff and gets printed as zero (the means
seem to be about 10 standard deviations away from zero, which would
give a p-value of 1e-24, plus or minus a few orders of magnitude).

You may want to ask the maintainer of predictABEL what the lowest
printed p-value is (say 1e-15), then change the p=0 values to less
than 1e-15.
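
A back-of-the-envelope version of that estimate, as a sketch. It assumes the printed intervals are symmetric normal-approximation (Wald) intervals, which I have not checked against the PredictABEL source; the helper name pFromCI is made up.

```r
# Recover an approximate z statistic and two-sided p-value from an estimate
# and its confidence interval, assuming a normal approximation.
pFromCI <- function(est, lower, upper, level = 0.95) {
  se <- (upper - lower) / (2 * qnorm(1 - (1 - level) / 2))
  z  <- est / se
  c(z = z, p = 2 * pnorm(-abs(z)))
}

nriCont <- pFromCI(0.1781, 0.1418, 0.2144)   # NRI(Continuous) from the output
idi     <- pFromCI(0.0090, 0.0074, 0.0107)   # IDI from the output

nriCont
idi

# Report tiny p-values as "less than a cutoff" instead of 0.
format.pval(c(nriCont[["p"]], idi[["p"]]), eps = 1e-15)
```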


Peter

On Fri, Oct 3, 2014 at 8:29 AM, Evan Kransdorf evan.kransd...@gmail.com wrote:
 I am using PredictABEL to do reclassification.  When I use it to compare
 two models (+/- a new marker), I get some output without a p-value.  Anyone
 know why this might be?

 #BEGIN R OUTPUT
  NRI(Categorical) [95% CI]: 0.0206 [ 0.0081 - 0.0332 ] ; p-value: 0.00129
  NRI(Continuous) [95% CI]: 0.1781 [ 0.1418 - 0.2144 ] ; p-value: 0
  IDI [95% CI]: 0.009 [ 0.0074 - 0.0107 ] ; p-value: 0


Re: [R] How to check to see if a variable is within a range of another variable

2014-10-01 Thread Peter Langfelder

On Wed, Oct 1, 2014 at 3:11 PM, Kate Ignatius kate.ignat...@gmail.com wrote:
 Is there an easy way to check whether a variable is within  +/- 10%
 range of another variable in R?

Yes,

checkRange = function(A, B, range = 0.1)
{
  A >= B*(1-range) & A <= B*(1+range);
}

Test:

A = c(67, 24, 40, 10, 70, 101, 9)
B = c(76, 23, 45, 12, 72, 90, 12)

outcome = checkRange(A, B)

You can create the desired data frame for example as

data.frame (A = A, B=B, C = c("no", "yes")[outcome+1])


 Say, if I have a variable 'A', whether its in +/- 10% range of
 variable 'B' and if so, create another variable 'C' to say whether it
 is or not?

What do you mean by range of variable B? In your example below, 40 is
not within 10% of 45, which is 4.5; 10 is not within 10% of 12 which
is 1.2.
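
Put together as a runnable sketch on the example data (this assumes positive B; for negative B the two bounds would swap):

```r
# TRUE where A lies within B +/- range * B.
checkRange <- function(A, B, range = 0.1) {
  A >= B * (1 - range) & A <= B * (1 + range)
}

A <- c(67, 24, 40, 10, 70, 101, 9)
B <- c(76, 23, 45, 12, 72, 90, 12)

result <- data.frame(A = A, B = B,
                     C = ifelse(checkRange(A, B), "yes", "no"))
result
```

With a strict 10% band only the pairs (24, 23) and (70, 72) come out as "yes", which is why the expected outcome listed in the question cannot be a 10% rule.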

 eventual outcome:
 A B C
 67 76 no
 24 23 yes
 40 45 yes
 10 12 yes
 70 72 yes
 101 90 no
 9 12 no


HTH,

Peter


[R] as.Date woes

2014-08-20 Thread Peter Langfelder

Hi all,

I have recently started working with Date objects and find the
experience unsettling, to put it mildly.

The help for as.Date says, in part:

 ## S3 method for class 'character'
 as.Date(x, format = "", ...)

   x: An object to be converted.

  format: A character string.  If not specified, it will try
  ‘%Y-%m-%d’ then ‘%Y/%m/%d’ on the first non-‘NA’ element,
  and give an error if neither works.


If I read this correctly,

as.Date("2012-04-30") and
as.Date("2012-04-30", format = "")

should give the same results, but they don't:

> as.Date("2012-04-30")
[1] "2012-04-30"
> as.Date("2012-04-30", format = "")
[1] "2014-08-20"

Note the latter gives today's date, without any warning or message.

What method is called in the latter case?

Another issue I am running into, that is probably connected to the
'format' argument above, is trying to convert a numeric or character
in the same call. Basically, I would like to call

as.Date(object, format = "", origin = "1970-1-1")

where object can be a Date, numeric or character, in the hope that the
appropriate method will be selected and will ignore unnecessary
arguments.

Here's what I get:

> as.Date( as.numeric(Sys.Date()), origin = "1970-1-1")
[1] "2014-08-20"    Correct
> as.Date( as.numeric(Sys.Date()), origin = "1970-1-1", format = "")
[1] "2059-04-08"    ???

Excuse the coarse language, but WTF??? The first call confirms that
the origin is specified correctly, and the second gives a date removed
from the origin by twice the number of days than the actual input??

> as.numeric(Sys.Date())
[1] 16302
> as.numeric(as.Date( as.numeric(Sys.Date()), origin = "1970-1-1"))
[1] 16302
> as.numeric(as.Date( as.numeric(Sys.Date()), origin = "1970-1-1", format = ""))
[1] 32604


Thanks in advance for any pointers!

Peter

PS: I know my R is not the most up to date, but I haven't found
anything about Date mentioned in the changelog for the 3.x series.


 sessionInfo()
R version 3.0.2 Patched (2013-10-08 r64039)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


Re: [R] as.Date woes

2014-08-20 Thread Peter Langfelder

Never mind... the solution was to read the source code of
as.Date.character. It turns out the default format="" is meaningless.
If 'format' is not given in the call to as.Date, it is NOT assumed to
be "", and the function gives very different results from a call where
the argument format="" is given. G...
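
One way to sidestep the problem, as a sketch: dispatch on the input type yourself and pass 'format' only to the character branch and 'origin' only to the numeric branch. The helper name toDate and its defaults are my own choices for illustration, not anything from base R.

```r
# Convert a Date, a number of days since 'origin', or a character string
# to Date, handing each as.Date method only the arguments it needs.
toDate <- function(x, format = "%Y-%m-%d", origin = "1970-01-01") {
  if (inherits(x, "Date")) return(x)
  if (is.numeric(x)) return(as.Date(x, origin = origin))
  as.Date(as.character(x), format = format)
}

toDate("2012-04-30")              # character input
toDate(16302)                     # numeric input: days since 1970-01-01
toDate(as.Date("2014-08-20"))     # Date input is returned unchanged
```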

Peter


Re: [R] How Can SVD Reconstruct a Matrix

2014-08-14 Thread Peter Langfelder

On Wed, Aug 13, 2014 at 11:57 PM, Peter Brady
subscripti...@simonplace.net wrote:
 Hi All,

 I've inherited some R code that I can't work out what they've done.  It
 appears to work and give sort of reasonable answers, I'm just trying to
 work out why they've done what they have.  I suspect that this is a
 simple vector identity that I've just been staring at too long and have
 forgotten...

 The code:

 GGt <- M0 - M1 %*% M0inv %*% t(M1)
 svdGG <- svd(GGt)
 Gmat <- svdGG$u %*% diag(sqrt(svdGG$d))

 It is supposed to solve:

 G*G^T = M0 - M1*M0^-1*M1^T

 for G, where G^T is the transpose of G.  It is designed to reproduce a
 numerical method described in two papers:

 Srikanthan and Pegram, Journal of Hydrology, 371 (2009) 142-153,
 Equation A13, who suggest the SVD method but don't describe the
 specifics, eg: "...G is found by singular value decomposition..."

 Alternatively, Matalas (1967) Water Resources Research 3 (4) 937-945,
 Equation 17, say that the above can be solved using Principal Component
 Analysis (PCA).

 I use PCA (specifically POD) and SVD to look at the components after
 decomposition, so I'm a bit lost as to how the original matrix G can be
 constructed in this case from only the singular values and the left
 singular vectors.

GG' is a symmetric, positive semi-definite matrix, so its left- and
right-singular vectors are the same. If I recall right, in general it is
impossible to find G from GG' (I denote the transpose by ') since, given an
orthogonal transformation U (that is, UU'=1), GUU'G' = GG', so you can only
find G up to multiplication with an orthogonal transformation matrix.

Since SVD decomposes a matrix X = UDV', the decomposition for GG' is

GG' = UDU'; setting S = sqrt(D) (i.e., diagonal matrix with elements
that are sqrt of those in D), GG' = USSU' = USS'U', so one solution is
G = US which is the solution used.
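
A numerical check of the argument, as a sketch with a made-up 3 x 3 example (G0, Q and the other names are illustrative; in the real problem only GG' is known):

```r
set.seed(1)
G0  <- matrix(rnorm(9), 3, 3)          # some "true" G, unknown in practice
GGt <- G0 %*% t(G0)                    # symmetric, positive semi-definite

# One solution: G = U S with S = sqrt(D).
s    <- svd(GGt)
Gmat <- s$u %*% diag(sqrt(s$d))
all.equal(Gmat %*% t(Gmat), GGt)

# G is only determined up to an orthogonal factor: G Q works equally well.
Q  <- qr.Q(qr(matrix(rnorm(9), 3, 3)))  # a random orthogonal matrix
G2 <- Gmat %*% Q
all.equal(G2 %*% t(G2), GGt)

# The eigen-decomposition of the symmetric matrix gives the same kind of solution.
e  <- eigen(GGt, symmetric = TRUE)
G3 <- e$vectors %*% diag(sqrt(pmax(e$values, 0)))
all.equal(G3 %*% t(G3), GGt)
```

None of Gmat, G2 or G3 needs to equal G0; all of them satisfy the equation.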

You could use PCA on G, which is roughly equivalent to doing SVD on
GG' (up to centering and scaling of the columns of G). I am not very
familiar with PCA in R since I always use SVD, but here's what the
help file for prcomp (PCA in R) says:

   The calculation is done by a singular value decomposition of the
 (centered and possibly scaled) data matrix, not by using ‘eigen’
 on the covariance matrix.  This is generally the preferred method
 for numerical accuracy.

HTH,

Peter


 Like I said earlier, I suspect that this is a simple
 array identity that I've forgotten.  My Google Fu is letting me down at
 this point.

 My questions:
 1) What is the proof, or where can I better find it to satisfy myself,
 that the above works?

 2) Alternatively, can anyone suggest how I could apply PCA in R to
 compute the same?

 Thanks in advance,
 -pete

 --
 Peter Brady
 Email: pdbr...@ans.com.au
 Skype: pbrady77


Re: [R] big data?

2014-08-05 Thread Peter Langfelder

Have you tried read.csv.sql from package sqldf?
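
If installing sqldf is not an option, a different technique (not what I suggest above, just a base-R fallback) is to read the file in chunks through a connection and keep only the rows you need. A sketch on a small generated file; the file, the chunk size and the filter on 'group' are all made up for illustration:

```r
# Small tab-delimited file standing in for the real 1.5 GB one.
tmp <- tempfile(fileext = ".txt")
df <- data.frame(id = 1:1000, group = rep(letters[1:4], 250),
                 value = (1:1000) / 10)
write.table(df, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

con <- file(tmp, open = "r")
header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]

chunkSize <- 300                       # lines per chunk; tune to available RAM
kept <- list()
repeat {
  lines <- readLines(con, n = chunkSize)
  if (length(lines) == 0) break        # end of file
  chunk <- read.delim(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
  kept[[length(kept) + 1]] <- chunk[chunk$group == "a", ]  # keep what is needed
}
close(con)

result <- do.call(rbind, kept)
dim(result)
```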

Peter

On Tue, Aug 5, 2014 at 10:20 AM, Spencer Graves
spencer.gra...@structuremonitoring.com wrote:
   What tools do you like for working with tab delimited text files up to
 1.5 GB (under Windows 7 with 8 GB RAM)?


   Standard tools for smaller data sometimes grab all the available RAM,
 after which CPU usage drops to 3% ;-)


   The bigmemory project won the 2010 John Chambers Award but is not
 available (for R version 3.1.0).


   findFn("big data", 999) downloaded 961 links in 437 packages. That
 contains tools for data PostgreSQL and other formats, but I couldn't find
 anything for large tab delimited text files.


   Absent a better idea, I plan to write a function getField to extract a
 specific field from the data, then use that to split the data into 4 smaller
 files, which I think should be small enough that I can do what I want.


   Thanks,
   Spencer


Re: [R] Cutting hierarchical cluster tree at specific height fails

2014-07-15 Thread Peter Langfelder

Hi Johannes,

you mentioned dynamicTreeCut - the dynamic hybrid method works fine on
your data. Just supply the dissimilarity matrix as well: I use the
function plotDendroAndColors from WGCNA to show the results; if you
don't want to use WGCNA, just leave out the last call.

library(WGCNA)

set.seed(42)
x <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,80,15))
y <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,150,25))
df <- data.frame(x,y)
hc <- hclust(dist(df,method = "euclidean"), method="centroid")
dm = as.matrix(dist(df,method = "euclidean"))
plot(hc)
labels = cutreeDynamic(hc, distM = dm, deepSplit = 2)
# ..cutHeight not given, setting it to 115  ===>  99% of the (truncated) height range in dendro.
# ..done.
plotDendroAndColors(hc, labels)

As you see, the algorithm found 3 clusters that seem right based on
the dendrogram.

Please look carefully at the help file for cutreeDynamic since the
defaults may not be what you want.

If you absolutely want to cut at a given height, it can be done as
well, but the arguments will need some massaging.
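
For a plain fixed-height cut without dynamicTreeCut, the simplest route I know of is the one already mentioned in this thread: use a linkage whose merge heights are monotone. A sketch with average linkage (my choice for illustration; it will not give the same tree as the centroid method):

```r
set.seed(42)
x <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 80, 15))
y <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 150, 25))
df <- data.frame(x, y)
d  <- dist(df, method = "euclidean")

hcCentroid <- hclust(d, method = "centroid")
hcAverage  <- hclust(d, method = "average")

# TRUE here would mean the merge heights contain inversions.
is.unsorted(hcCentroid$height)
is.unsorted(hcAverage$height)

# cutree() at a height refuses trees with inversions; capture the message
# instead of stopping.
centroidCut <- tryCatch(cutree(hcCentroid, h = 60),
                        error = function(e) conditionMessage(e))

# With a monotone linkage the height cut is well defined.
memb <- cutree(hcAverage, h = 60)
table(memb)
```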

Best,

Peter

On Mon, Jul 14, 2014 at 4:42 AM, Johannes Radinger
johannesradin...@gmail.com wrote:
 Of course,
 manually checking the number of clusters that are cut at a specific height
 (e.g. by abline())
 is one possibility. However, this only makes sense for single trees, but is
 not a feasible
 approach for multiple model runs when hundreds of trees are built with many
 cluster branches.

 Thus, it'd be nice if somebody knows a more programmatic approach or another
 package
 that allows cutting centroid-trees.

 /Johannes


 On Fri, Jul 11, 2014 at 4:19 PM, David L Carlson dcarl...@tamu.edu wrote:

  The easiest workaround is the one you included in your original posting.
 Specify k= and not h=. Examine the dendrogram and decide how many clusters
 are at the level you want. You could add guidelines to the dendrogram with
 abline() to make it easier to see the number of clusters at various heights.



 plot(hc)

 abline(h=c(20, 40, 60, 80, 100, 120), lty=3)



 David C



 *From:* Johannes Radinger [mailto:johannesradin...@gmail.com]
 *Sent:* Friday, July 11, 2014 3:24 AM
 *To:* David L Carlson; R help
 *Subject:* Re: [R] Cutting hierarchical cluster tree at specific height
 fails



 Hi,



 @David: Thanks for the explanation why this does not work. This of

 course makes theoretically sense.



 However in a recent discussion

 (
 http://stats.stackexchange.com/questions/107448/spatial-distance-between-cluster-means
 )

 it was stated that the 'reversals problem' of  centroid method is

 not a serious reason to deactivate the option of 'tree cut'. Instead

 a warning message should be provided rather than a deactivation.



 So does anyone know how a tree that was created with centroid can still

 be cut at a specific height? I tried the package dynamicTreeCut, but this

 also relies on cutree and consequently raises an error when used for
 cutting

 centroid trees.



 Does anyone know a work around and can provide a minimum working example?



 /Johannes



 On Wed, Jul 9, 2014 at 4:58 PM, David L Carlson dcarl...@tamu.edu wrote:

 To cut the tree, the clustering algorithm must produce consistently
 increasing height values with no reversals. You used one of the two options
 in hclust that does not do this. Note the following from the hclust manual
 page:

 Note however, that methods "median" and "centroid" are not leading to a
 monotone distance measure, or equivalently the resulting dendrograms can
 have so called inversions (which are hard to interpret).

 The cutree manual page:

 Cutting trees at a given height is only possible for ultrametric trees
 (with monotone clustering heights).

 Use a different method (but not median).

 -
 David L Carlson
 Department of Anthropology
 Texas A&M University
 College Station, TX 77840-4352


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Johannes Radinger
 Sent: Wednesday, July 9, 2014 7:07 AM
 To: R help
 Subject: [R] Cutting hierarchical cluster tree at specific height fails

 Hi,

 I'd like to cut a hierarchical cluster tree calculated with hclust at a
 specific height.
 However, I get the following error message:
 Error in cutree(hc, h = 60) :
   the 'height' component of 'tree' is not sorted (increasingly)


 Here is a working example to show that when specifying a height in cutree()
 the code fails. In contrast, specifying the number of clusters in cutree()
 works.
 What is the exact problem and how can I solve it?

 x <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,80,15))
 y <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,150,25))
 df <- data.frame(x,y)
 plot(df)

 hc <- hclust(dist(df,method = "euclidean"), method="centroid")
 plot(hc)

 df$memb <- cutree(hc, h = 60) # this does not work
 df$memb <- cutree(hc, k = 3) # this works!

 plot(df$x,df$y,col=df$memb)


 Thank you for your hints!

 Best regards,
 Johannes

Re: [R] odd behavior of seq()

2014-07-03 Thread Peter Langfelder

Precision, precision, precision...

> z[2]-0.15
[1] 2.775558e-17

My solution:

> z <- signif(seq(.05,.85,by=.1), 5)
> z[2] - 0.15
[1] 0
> z[2]==0.15
[1] TRUE
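
An alternative to rounding, as a sketch: leave the sequence alone and compare with a tolerance. The tolerance 1e-8 is an arbitrary choice for illustration.

```r
z <- seq(.05, .85, by = .1)

# Exact comparison depends on the last bit of the floating-point value.
z[2] == 0.15

# all.equal() compares up to a small relative tolerance.
eq <- isTRUE(all.equal(z[2], 0.15))
eq

# Vectorised version: which elements are within 1e-8 of the target?
idx <- which(abs(z - 0.15) < 1e-8)
idx
```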

Peter

On Thu, Jul 3, 2014 at 11:28 AM, Matthew Keller mckellerc...@gmail.com wrote:
 Hi all,

 A bit stumped here.

 z <- seq(.05,.85,by=.1)
 z==.05 #good
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 z==.15  #huh
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 More generally:
 sum(z==.25)
 [1] 1
 sum(z==.35)
 [1] 0
 sum(z==.45)
 [1] 1
 sum(z==.55)
 [1] 1
 sum(z==.65)
 [1] 0
 sum(z==.75)
 [1] 0
 sum(z==.85)
 [1] 1

 Does anyone have any ideas what is going on here?

 R.Version()
 $platform
 [1] "x86_64-apple-darwin9.8.0"

 $arch
 [1] "x86_64"

 $os
 [1] "darwin9.8.0"

 $system
 [1] "x86_64, darwin9.8.0"

 $status
 [1] ""

 $major
 [1] "2"

 $minor
 [1] "13.1"

 $year
 [1] "2011"

 $month
 [1] "07"

 $day
 [1] "08"

 $`svn rev`
 [1] "56322"

 $language
 [1] "R"

 $version.string
 [1] "R version 2.13.1 (2011-07-08)"

 --
 Matthew C Keller
 Asst. Professor of Psychology
 University of Colorado at Boulder
 www.matthewckeller.com

