from:"Martin Morgan"

Re: [R] Sorting based a custom sorting function

2023-12-14 Thread Martin Morgan

In the spirit of 'advent of code', maybe it is better to exploit the features 
of the particular language you've chosen? Then the use of factors seems very 
relevant.

value_levels <- c("Small", "Medium", "Large")
df <- data.frame(
person = c("Alice", "Bob", "Bob", "Charlie"),
value = factor(
c("Medium", "Large", "Small", "Large"),
levels = value_levels
)
)
df[with(df, order(person, value)),]

Likely this is more efficient than the hints of your existing solution, because 
it will act on vectors rather than iterating through individual elements of the 
'person' and 'value' vectors.

For a more general solution, I don't think I'd follow the low-level approach 
Duncan suggests (maybe see also ?Math for S3 generics), but rather define a 
class (e.g., that requires vectors person and value) and implement a 
corresponding `xtfrm()` method.

Have fun with the remainder of the advent!

Another Martin

From: R-help  on behalf of Martin Møller 
Skarbiniks Pedersen 
Date: Thursday, December 14, 2023 at 6:42 AM
To: R mailing list 
Subject: Re: [R] Sorting based a custom sorting function
On Thu, 14 Dec 2023 at 12:02, Duncan Murdoch  wrote:
>

> class(df$value) <- "sizeclass"
>
> `>.sizeclass` <- function(left, right) custom_sort(unclass(left),
> unclass(right)) == 1
>
> `==.sizeclass` <- function(left, right) custom_sort(unclass(left),
> unclass(right)) == 0
>
> `[.sizeclass` <- function(x, i) structure(unclass(x)[i], class="sizeclass")
>
> df[order(df$value),]
>
> All the "unclass()" calls are needed to avoid infinite recursion.  For a
> more complex kind of object where you are extracting attributes to
> compare, you probably wouldn't need so many of those.

Great! Just what I need. I will create a class and overwrite > and ==.
I didn't know that order() used these exact methods.

My best solution was something like this:

quicksort <- function(arr, compare_func) {
  if (length(arr) <= 1) {
return(arr)
  } else {
pivot <- arr[[1]]
less <- arr[-1][compare_func(arr[-1], pivot) <= 0]
greater <- arr[-1][compare_func(arr[-1], pivot) > 0]
return(c(quicksort(less, compare_func), pivot, quicksort(greater,
compare_func)))
  }
}

persons <- c("alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
 "hotel", "india", "juliett", "kilo", "lima", "mike", "november",
 "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
 "victor", "whiskey", "x-ray", "yankee", "zulu")

quicksort(persons, function(left, right) {
  nchar(left) - nchar(right)
})

Regards
Martin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R does not run under latest RStudio

2023-04-06 Thread Martin Morgan

Do you mean here

https://community.rstudio.com/

? there seems to be quite a bit of activity… here's a very similar post to 
yours in the last day

https://community.rstudio.com/t/latest-rstudio-version-did-not-launch-appropriately-in-my-computer/163585

with a response from an RStudio / Posit employee.

Martin Morgan

From: R-help  on behalf of Steven Yen 

Date: Thursday, April 6, 2023 at 3:20 PM
To: Uwe Ligges 
Cc: R-help Mailing List , Steven T. Yen 
Subject: Re: [R] R does not run under latest RStudio
The RStudio list generally does not respond to free version users. I was hoping 
someone one this (R) list would be kind enough to help me.

Steven from iPhone

> On Apr 6, 2023, at 6:22 PM, Uwe Ligges  
> wrote:
>
> No, but you need to ask on an RStudio mailing list.
> This one is about R.
>
> Best,
> Uwe Ligges
>
>
>
>
>> On 06.04.2023 11:28, Steven T. Yen wrote:
>> I updated to latest RStudio (RStudio-2023.03.0-386.exe) but
>> R would not run. Error message:
>> Error Starting R
>> The R session failed to start.
>> RSTUDIO VERSION
>> RStudio 2023.03.0+386 "Cherry Blossom " (3c53477a, 2023-03-09) for Windows
>> [No error available]
>> I also tried RStudio 2022.12.0+353 --- same problem.
>> I then tried another older version of RStudio (not sure version
>> as I changed file name by accident) and R ran.
>> Any clues? Please help. Thanks.
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to access source code

2022-12-08 Thread Martin Morgan

showMethods(LGD, includeDef = TRUE) shows the implementation of all methods on 
the LGD generic, and can be a useful fast track to getting an overview of what 
is going on.

Martin Morgan

From: R-help  on behalf of Ivan Krylov 

Date: Thursday, December 8, 2022 at 11:23 AM
To: Christofer Bogaso 
Cc: r-help 
Subject: Re: [R] How to access source code
� Thu, 8 Dec 2022 20:56:12 +0530
Christofer Bogaso  �:

> > showMethods(LGD)
>
> Function: LGD (package GCPM)
>
> this="GCPM"

Almost there! Try getMethod(LGD, signature = 'GCPM').

Not sure if this is going to work as written, but if you need to see an
S4 method definition, getMethod is the way.

--
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] interval between specific characters in a string...

2022-12-02 Thread Martin Morgan

You could split the string into letters and figure out which ones are �b�

which(strsplit(x, "")[[1]] == "b")

and then find the difference between each position, �anchoring� at position 0

> diff(c(0, which(strsplit(x, "")[[1]] == "b")))
[1] 2 4 1 6 4

From: R-help  on behalf of Evan Cooch 

Date: Friday, December 2, 2022 at 6:56 PM
To: r-help@r-project.org 
Subject: [R] interval between specific characters in a string...
Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string

abaaabbabaaab

Its easy enough to count the number of times the character 'b' shows up
in the string, but...what I'm looking for is outputing the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of the
string). So, for the preceding example, 'b' shows up in positions

2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something
like unlist(gregexpr('b', target_string))), and 'do the math' between
successive positions. Can anyone suggest a more elegant approach?

Thanks in advance...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Martin Morgan

> strwrap(text)
[1] "What is the best way to split/cut a vector of strings into lines of"
[2] "preferred width? I have come up with a simple solution, albeit naive,"
[3] "as it involves many arithmetic divisions. I have an alternative idea"
[4] "which avoids this problem. But I may miss some existing functionality!"

Maybe used as

> strwrap(text)  |> paste(collapse = "\n") |> cat("\n")
What is the best way to split/cut a vector of strings into lines of
preferred width? I have come up with a simple solution, albeit naive,
as it involves many arithmetic divisions. I have an alternative idea
which avoids this problem. But I may miss some existing functionality!
>

?

From: R-help  on behalf of Leonard Mada via 
R-help 
Date: Friday, October 28, 2022 at 5:42 PM
To: R-help Mailing List 
Subject: [R] Partition vector of strings into lines of preferred width
Dear R-Users,

text = "
What is the best way to split/cut a vector of strings into lines of
preferred width?
I have come up with a simple solution, albeit naive, as it involves many
arithmetic divisions.
I have an alternative idea which avoids this problem.
But I may miss some existing functionality!"

# Long vector of strings:
str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]];
lenWords = nchar(str);

# simple, but naive solution:
# - it involves many divisions;
cut.character.int = function(n, w) {
 ncm = cumsum(n);
 nwd = ncm %/% w;
 count = rle(nwd)$lengths;
 pos = cumsum(count);
 posS = pos[ - length(pos)] + 1;
 posS = c(1, posS);
 pos = rbind(posS, pos);
 return(pos);
}

npos = cut.character.int(lenWords, w=30);
# lets print the results;
for(id in seq(ncol(npos))) {
len = npos[2, id] - npos[1, id];
cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n"));
}


The first solution performs an arithmetic division on all string
lengths. It is possible to find out the total length and divide only the
last element of the cumsum. Something like this should work (although it
is not properly tested).


w = 30;
cumlen = cumsum(lenWords);
max = tail(cumlen, 1) %/% w + 1;
pos = cut(cumlen, seq(0, max) * w);
count = rle(as.numeric(pos))$lengths;
# everything else is the same;
pos = cumsum(count);
posS = pos[ - length(pos)] + 1;
posS = c(1, posS);
pos = rbind(posS, pos);

npos = pos; # then print


The cut() may be optimized as well, as the cumsum is sorted ascending. I
did not evaluate the efficiency of the code either.

But do I miss some existing functionality?


Note:

- technically, the cut() function should probably return a vector of
indices (something like: rep(seq_along(count), count)), but it was more
practical to have both the start and end positions.


Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Handling dependencies on Bioconductor packages for packages on CRAN

2021-12-07 Thread Martin Morgan

One possibility is to make graph a Suggests: dependency, and preface any code 
using it (or, e.g., in an .onLoad function) with

  if (!requireNamespace("graph", quietly = TRUE))
stop(
  "install the Bioconductor 'graph' package using these commands\n\n",
  ## standard Bioconductor package installation instructions
  "  if (!requireNamespace('BiocManager', quiety = TRUE))\n",
  "install.packages('BiocManager')\n",
  "  BiocManager::install('graph')\n\n"
)

Use graph:: for any function used in the graph package. The code could be 
simplified if BiocManager were an Imports: dependency of your package -- it 
would already be installed. The 'Suggests:' dependency would not cause problems 
with CRAN, because Suggest'ed packages are available when the package is built 
/ checked.

The user experience of package installation would be 'non-standard' (didn't I 
just install gRbase??), so this is not an ideal solution.

Martin

On 12/4/21, 10:55 AM, "R-help on behalf of Søren Højsgaard" 
 wrote:

Dear all

My gRbase package imports the packages from Bioconductor: graph, RBGL and 
Rgraphviz

If these packages are not installed, then gRbase can not be installed. The 
error message is:

   ERROR: dependency ‘graph’ is not available for package ‘gRbase’

If I, prior to installation, run setRepositories and highlight 'BioC 
software', then gRbase installs as it should, because the graph package from 
Bioconductor is installed along with it. However, this extra step is an 
obstacle to many users of the package which means that either people do not use 
the package or people ask questions about this issue on stack overflow, R-help, 
by email to me etc. It is not a problem to get the package on CRAN because, I 
guess, the CRAN check machines already have the three bioconductor packages 
installed.

Therefore, I wonder if there is a way of specifying, in the DESCRIPTION 
file or elsewhere, that these packages should be installed automatically from 
bioconductor.

An alternative would be if one could force the error message

   ERROR: dependency ‘graph’ is not available for package ‘gRbase’

to be accompanied by a message about what the user then should do.

Any good suggestions? Thanks in advance.

Best regards
Søren

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to create a proper S4 class?

2021-11-18 Thread Martin Morgan

From my example, 

  as(employees, "People")

more general coercion is possible; see the documentation ?setAs.

From your problem description I would have opted for the solution that you now 
have, with two slots rather than inheritance. Inheritance has a kind of weird 
contract when using another package, where you're agreeing to inherit whatever 
generics and methods are / will be defined on the object, by any package loaded 
by the user; a usual benefit of object-oriented programming is better control 
over object type, and this contract seems to total undermine that.

I actually don't know how to navigate the mysterious error; the secret is 
somewhere in the cbind / cbind2 documentation, but this seems quite opaque to 
me.

Martin



On 11/17/21, 8:00 PM, "Leonard Mada"  wrote:

Dear Martin,


thank you very much for the guidance.


Ultimately, I got it running. But, for mysterious reasons, it was 
challenging:

- I skipped for now the inheritance (and used 2 explicit non-inherited 
slots): this is still unresolved; [*]

- the code is definitely cleaner;


[*] Mysterious errors, like:

"Error in cbind(deparse.level, ...) :
   cbind for agentMatrix is only defined for 2 agentMatrices"


One last question pops up:

If B inherits from A, how can I down-cast back to A?

b = new("B", someA);

??? as.A(b) ???

Is there a direct method?

I could not explore this, as I am still struggling with the inheritance. 
The information may be useful, though: it helps in deciding the design 
of the data-structures. [Actually, all base-methods should work natively 
as well - but to have a solution in any case.]


Sincerely,


    Leonard


On 11/17/2021 5:48 PM, Martin Morgan wrote:
> Hi Leonard --
>
> Remember that a class can have 'has a' and 'is a' relationships. For 
instance, a People class might HAVE slots name and age
>
> .People <- setClass(
>  "People",
>  slots = c(name = "character", age = "numeric")
> )
>
> while an Employees class might be described as an 'is a' relationship -- 
all employeeds are people -- while also having slots like years_of_employment 
and job_title
>
> .Employees <- setClass(
>  "Employees",
>  contains = "People",
>  slots = c(years_of_employment = "numeric", job_title = "character")
> )
>
> I've used .People and .Employees to capture the return value of 
setClass(), and these can be used as constructors
>
> people <- .People(
> name = c("Simon", "Andre"),
> age = c(30, 60)
> )
>
> employees = .Employees(
>  people, # unnamed arguments are class(es) contained in 'Employees'
>  years_of_employment = c(3, 30),
>  job_title = c("hard worker", "manager")
> )
>
> I would not suggest using attributes in addition to slots. Rather, 
embrace the paradigm and represent attributes as additional slots. In practice 
it is often helpful to write a constructor function that might transform 
between formats useful for users to formats useful for programming, and that 
can be easily documented.
>
> Employees <-
>  function(name, age, years_of_employment, job_title)
> {
>  ## implement sanity checks here, or in validity methods
>  people <- .People(name = name, age = age)
>  .Employees(people, years_of_employment = years_of_employment, 
job_title = job_title)
> }
>
> plot() and lines() are both S3 generics, and the rules for S3 generics 
using S4 objects are described in the help page ?Methods_for_S3. Likely you 
will want to implement a show() method; show() is an S4 method, so see 
?Methods_Details. Typically this should use accessors rather than relying on 
direct slot access, e.g.,
>
> person_names <- function(x) x@name
> employee_names <- person_names
>
> The next method implemented is often the [ (single bracket subset) 
function; this is relatively complicated to get right, but worth exploring.
>
> I hope that gets you a little further along the road.
>
> Martin Morgan
>
> On 11/16/21, 11:34 PM, "R-help on behalf of Leonard Mada via R-help" 
 wrote:
>
>  Dear List-Members,
>
>
>  I want to create an S4 class with 2 data slots, as well as a plot 
and a
>  line method.
>
>
>  Unfortunately I lack any experience with S4 classes. I have put 
together
>  some working code - but I presume that it is not the best way to do 
it.
>

Re: [R] How to create a proper S4 class?

2021-11-17 Thread Martin Morgan

Hi Leonard --

Remember that a class can have 'has a' and 'is a' relationships. For instance, 
a People class might HAVE slots name and age

.People <- setClass(
"People",
slots = c(name = "character", age = "numeric")
)

while an Employees class might be described as an 'is a' relationship -- all 
employeeds are people -- while also having slots like years_of_employment and 
job_title

.Employees <- setClass(
"Employees",
contains = "People",
slots = c(years_of_employment = "numeric", job_title = "character")
)

I've used .People and .Employees to capture the return value of setClass(), and 
these can be used as constructors

people <- .People(
   name = c("Simon", "Andre"),
   age = c(30, 60)
)

employees = .Employees(
people, # unnamed arguments are class(es) contained in 'Employees'
years_of_employment = c(3, 30),
job_title = c("hard worker", "manager")
)

I would not suggest using attributes in addition to slots. Rather, embrace the 
paradigm and represent attributes as additional slots. In practice it is often 
helpful to write a constructor function that might transform between formats 
useful for users to formats useful for programming, and that can be easily 
documented.

Employees <-
function(name, age, years_of_employment, job_title)
{
## implement sanity checks here, or in validity methods
people <- .People(name = name, age = age)
.Employees(people, years_of_employment = years_of_employment, job_title = 
job_title)
}

plot() and lines() are both S3 generics, and the rules for S3 generics using S4 
objects are described in the help page ?Methods_for_S3. Likely you will want to 
implement a show() method; show() is an S4 method, so see ?Methods_Details. 
Typically this should use accessors rather than relying on direct slot access, 
e.g.,

person_names <- function(x) x@name
employee_names <- person_names

The next method implemented is often the [ (single bracket subset) function; 
this is relatively complicated to get right, but worth exploring.

I hope that gets you a little further along the road.

Martin Morgan

On 11/16/21, 11:34 PM, "R-help on behalf of Leonard Mada via R-help" 
 wrote:

Dear List-Members,


I want to create an S4 class with 2 data slots, as well as a plot and a 
line method.


Unfortunately I lack any experience with S4 classes. I have put together 
some working code - but I presume that it is not the best way to do it. 
The actual code is also available on Github (see below).


1.) S4 class
- should contain 2 data slots:
Slot 1: the agents:
  = agentMatrix class (defined externally, NetlogoR S4 class);
Slot 2: the path traveled by the agents:
   = a data frame: (x, y, id);
  - my current code: defines only the agents ("t");
setClass("agentsWithPath", contains = c(t="agentMatrix"));

1.b.) Attribute with colors specific for each agent
- should be probably an attribute attached to the agentMatrix and not a 
proper data slot;
Note:
- it is currently an attribute on the path data.frame, but I will 
probably change this once I get the S4 class properly implemented;
- the agentMatrix does NOT store the colors (which are stored in another 
class - but it is useful to have this information available with the 
agents);

2.) plot & line methods for this class
plot.agentsWithPath;
lines.agentsWithPath;


I somehow got stuck with the S4 class definition. Though it may be a 
good opportunity to learn about S4 classes (and it is probably better 
suited as an S4 class than polynomials).


The GitHub code draws the agents, but was somehow hacked together. For 
anyone interested:

https://github.com/discoleo/R/blob/master/Stat/ABM.Models.Particles.R


Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RSQLite slowness

2021-10-06 Thread Martin Morgan

https://support.bioconductor.org and the community slack (sign up at 
https://bioc-community.herokuapp.com/ ) as well as the general site 
https://bioconductor.org . Actually your question sounds like a SQLite question 
� JOIN a table, versus parameterized query. One could perhaps construct the 
relevant example at the sqlite command line?

Martin Morgan

On 10/6/21, 2:50 PM, "R-help"  wrote:
Thank you Bert, I set up a new thread on
BioStars [1].  So far, I'm a bit
unfamilliar with Bioconductor (but will
hopefully attend a course about it in
November, which I'm kinda hyped about),
other than installing and updating R
packages using BiocManager   Did you
think of something else than
BioStars.org when saying �Bioconductor?�

The question could be viewed as gene
related, but I think it is really about
how can one easier than with sqlite
handle large tsv files, and why is that
parser thing so slow ...  I think this
is more like a core R thing than gene
related question ...

[1] https://www.biostars.org/p/9492486/



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] inverse of the methods function

2021-05-03 Thread Martin Morgan

> methods(class = "lm")
 [1] add1   alias  anova  case.names coerce
 [6] confintcooks.distance deviance   dfbeta dfbetas
[11] drop1  dummy.coef effectsextractAIC family
[16] formulahatvalues  influence  initialize kappa
[21] labels logLik model.framemodel.matrix   nobs
[26] plot   predictprint  proj   qr
[31] residuals  rstandard  rstudent   show   simulate
[36] slotsFromS3summaryvariable.names vcov
see '?methods' for accessing help and source code

Martin Morgan

On 5/3/21, 6:34 PM, "R-help on behalf of Therneau, Terry M., Ph.D. via R-help" 
 wrote:

Is there a complement to the methods function, that will list all the 
defined methods for 
a class?One solution is to look directly at the NAMESPACE file, for the 
package that 
defines it, and parse out the entries.   I was looking for something 
built-in, i.e., easier.


-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] parallel: socket connection behind a NAT router

2021-01-18 Thread Martin Morgan

A different approach uses doRedis https://CRAN.R-project.org/package=doRedis 
(currently archived, but actively developed) for use with the foreach package, 
or RedisParam https://github.com/mtmorgan/RedisParam (not released) for use 
with Bioconductor's BiocParallel package.

These use a redis server https://redis.io/ to communicate -- the manager 
submits jobs / obtains results from the redis server, the workers retrieve jobs 
/ submit results to the redis server. Manager and worker need to know the 
(http) address of the server, etc, but there are no other ports involved.

Redis servers are easy to establish in a cloud environment, using e.g., 
existing AWS or docker images. The README for doRedis 
https://github.com/bwlewis/doRedis probably provides the easiest introduction.

The (not mature) k8sredis Kubernetes / helm chart 
https://github.com/Bioconductor/k8sredis illustrates a complete system using 
RedisParam, deploying manager and workers locally or in the google cloud; the 
app could be modified to only start the workers in the cloud, exposing the 
redis server for access by a local 'manager'; this would be cool.

Martin 

On 1/19/21, 1:50 AM, "R-help on behalf of Henrik Bengtsson" 
 wrote:

On Mon, Jan 18, 2021 at 9:42 PM Jiefei Wang  wrote:
>
> Thanks for introducing this interesting package to me! it is great to 
know a new powerful tool, but it seems like this method does not work in my 
environment. ` parallelly::makeClusterPSOCK` will hang until timeout.
>
> I checked the verbose output and it looks like the parallelly package 
also depends on `parallel:::.slaveRSOCK` on the remote instance to build the 
connection. This explains why it failed for the local machine does not have a 
public IP and the remote does not know how to build the connection.

It's correct that the worker does attempt to connect back to the
parent R process that runs on your local machine.  However, it does
*not* do so by your local machines public IP address but it does it by
connecting to a port on its own machine - a port that was set up by
SSH.  More specifically, when parallelly::makeClusterPSOCK() connects
to the remote machine over SSH it also sets up a so-called reverse SSH
tunnel with a certain port on your local machine and certain port of
your remote machine.  This is what happens:

> cl <- parallelly::makeClusterPSOCK("machine1.example.org", verbose=TRUE)
[local output] Workers: [n = 1] 'machine1.example.org'
[local output] Base port: 11019
...
[local output] Starting worker #1 on 'machine1.example.org':
'/usr/bin/ssh' -R 11068:localhost:11068 machine1.example.org
"'Rscript' 
--default-packages=datasets,utils,grDevices,graphics,stats,methods
-e 'workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e)
parallel:::.workRSOCK); workRSOCK()' MASTER=localhost PORT=11068
OUT=/dev/null TIMEOUT=2592000 XDR=FALSE"
[local output] - Exit code of system() call: 0
[local output] Waiting for worker #1 on 'machine1.example.org' to
connect back  '/usr/bin/ssh' -R 11019:localhost:11019
machine1.example.org "'Rscript'
--default-packages=datasets,utils,grDevices,graphics,stats,methods -e
'workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e)
parallel:::.workRSOCK); workRSOCK()' MASTER=localhost PORT=11019
OUT=/dev/null TIMEOUT=2592000 XDR=FALSE"

All the magic is in that SSH option '-R 11068:localhost:11068' SSH
options, which allow the parent R process on your local machine to
communicate with the remote worker R process on its own port 11068,
and vice versa, the worker R process will communicate with the parent
R process as if it was running on MASTER=localhost PORT=11068.
Basically, for all that the worker R process' knows, the parent R
process runs on the same machine as itself.

You haven't said what operating system you're running on your local
machine, but if it's MS Windows, know that the 'ssh' client that comes
with Windows 10 has some bugs in its reverse tunneling.  See
?parallelly::makeClusterPSOCK for lots of details.  You also haven't
said what OS the cloud workers run, but I assume it's Linux.

So, my guesses on your setup is, the above "should work" for you.  For
your troubleshooting, you can also set argument outfile=NULL.  Then
you'll also see output from the worker R process.  There are
additional troubleshooting suggestions in Section 'Failing to set up
remote workers' of ?parallelly::makeClusterPSOCK that will help you
figure out what the problem is.

>
> I see in README the package states it works with "remote clusters without 
knowing public IP". I think this might be where the confusion is, it may mean 
the remote machine does not have a public IP, but the server machine does. I'm 
in the opposite situation, the server does not have a public IP, but the remote 
does. I'm not sure if

Re: [R] error in installing limma

2020-12-22 Thread Martin Morgan

limma is a Bioconductor package so you should use 
https://support.bioconductor.org

I'd guess that you've trimmed your screen shot just after the informative 
information. Just copy and paste as plain text the entire output of your 
installation attempt. Presumably you are using standard practices documented 
on, e.g., https://bioconductor.org/packages/limma to install packages


  BiocManager::install("limma")

Martin Morgan

On 12/22/20, 1:11 PM, "R-help on behalf of Ayushi Dwivedi" 
 wrote:

Good afternoon Sir,
 With due respect I want to convey that while installing limma package in
R, I am getting the error message, not just limma If I am installing any
package in R like biomaRt the same error message is coming it is
terminating with "installation of package ‘limma’ had non-zero exit status".
Hereby, I am attaching the screenshot of the error. Kindly, go through it.
I shall be highly obliged.







*Ayushi Dwivedi*
*Ph.D. Scholar*
*Dept. of Biotechnology & Bioinformatics,*
School of Life Sciences, University of Hyderabad,
Hyderabad - 500046 ( India ).
Phone No. :- +91 - 8858037252
Email Id :- ayushi.crea...@gmail.com* **
*
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Inappropriate color name

2020-11-16 Thread Martin Morgan

Lainey wishes to report a bug, so should see ?bug.report. Mail sent to R-core 
will be held for moderator approval, and relevant input or ultimate resolution 
would not be visible to the wider community; it is not a good place to report 
bugs.

Martin Morgan

On 11/16/20, 4:48 PM, "R-help on behalf of Mitchell Maltenfort" 
 wrote:

r-c...@r-project.org. would be the first stop.



On Mon, Nov 16, 2020 at 4:37 PM Lainey Gallenberg <
laineygallenb...@gmail.com> wrote:

>  Whether or not you agree with my reason for doing so, my question was how
> to contact the creator of the "colors" function. If you do not have advice
> on this, please refrain from weighing in.
>
> On Mon, Nov 16, 2020 at 12:03 PM Bert Gunter 
> wrote:
>
> > WIth all due respect, can we end this thread NOW. This is not a forum to
> > discuss social or political viewpoints. I consider it a disservice to
> make
> > it one.
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Mon, Nov 16, 2020 at 12:54 PM Jim Lemon  wrote:
> >
> >> Hi Elaine,
> >> There seems to be a popular contest to discover offence everywhere. I
> >> don't
> >> think that it does anything against racism, sexism or
> >> antidisestablishmentarianism. Words are plucked from our vast lexicon 
to
> >> comfort or insult our fellows depending upon the intent of the user. It
> is
> >> the intent that matters, not the poor word. Chasing the words wastes
> your
> >> time, blames those who use the words harmlessly, and gives the real
> >> offender time to find another epithet.
> >>
> >> Jim
> >>
> >> On Tue, Nov 17, 2020 at 5:39 AM Lainey Gallenberg <
> >> laineygallenb...@gmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > I'm hoping someone on here knows the appropriate place/contact for me
> to
> >> > lodge a complaint about a color name in the "colors" function. I was
> >> > shocked to see there are four named color options that include the
> term
> >> > "indianred." Surely these colors can be changed to something less
> >> > offensive- my suggestion is "blush." How can I find out who to 
contact
> >> > about making this happen?
> >> >
> >> > Thank you in advance for any suggestions.
> >> >
> >> > Sincerely,
> >> > Elaine Gallenberg
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > __
> >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Installing bioconduction packages in connection with loading an R package

2020-10-12 Thread Martin Morgan

An alternative to setRepositories() is use of (the CRAN package) 
BiocManager::install("gRbase") instead of install.packages(). BiocManager 
installs CRAN packages as well as Bioconductor packages.

Another, more transparent, solution is to use

  install.packages("gRbase", repos = BiocManager::repositories())

where the key idea is to include Bioconductor repositories explicitly. These 
approaches are preferred to  setRepositories(), because of the details of the 
twice-yearly Bioconductor release cycle, compared to the annual R release and 
patch cycles.

The usual approach to your problem is to move the package to Suggests:. But 
then the namespace commands like Imports, and the direct use of imported 
package functions, is not possible; you'll need to litter your code with fully 
resolved functions (graph::foo() instead of foo()). Also Suggests: is usually 
home to packages that have a limited role to play, but that does not seem 
likely for RBGL etc in your package.

Also, in implementing this approach one would normally check that the package 
were installed, and fail with an error message telling the user how to fix the 
problem (e.g., by installing the package). This doesn't really sound like 
progress. If you instead try to automatically install the package (in 
.onAttach(), I guess was your plan) you'll shortly run into users who need to 
use arguments to install.packages() that you have not made available to them.

Your CRAN page took me quickly to your package web site and clear installation 
instructions; I do not think use of Bioc packages is a particular barrier to 
use.

Martin Morgan

  

On 10/11/20, 2:52 PM, "R-help on behalf of Søren Højsgaard" 
 wrote:

Dear all,

My gRbase package imports functionality from the bioconductor packages 
graph, Rgraphviz and RBGL.

To make installation of gRbase easy, I would like to have these 
bioconductor packages installed in connection with installation of gRbase, but 
to do so the user must use setRepositories() to make sure that R also installs 
packages from bioconductor.

Having to call setRepositories causes what can perhaps be called an 
(unnecessary?) obstacle. Therefore I have been experimenting with deferring 
installation of these bioc-packages until gRbase is loaded the first time using 
.onAttach; please see my attempt below.

However, if the bioc-packages are not installed I can not install gRbase so 
that does not seem to be a viable approach. (The bioc-packages appear as 
Imports: in DESCRIPTION).

Can anyone tell if it is a futile approach and / or perhaps suggest a 
solution. (I would guess that there are many CRAN packages that use 
bioc-packages, so other people must have faced this challenge before).

Thanks in advance.

Best regards
S�ren





.onAttach<-function(libname, pkgname) {

## package startup check
toinstall=c(
"graph",
"Rgraphviz",
"RBGL"
)

already_installed <- sapply(toinstall, function(pkg)
requireNamespace(pkg, quietly=TRUE))

if (any(!already_installed)){
packageStartupMessage("Need to install the following package(s): ",
  toString(toinstall[!already_installed]), "\n")
}

## install if needed
if(!base::all(already_installed)){
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install(toinstall[!already_installed], 
dependencies=TRUE)
}
}



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] combine filter() and select()

2020-08-20 Thread Martin Morgan

A kind of hybrid answer is to use base::subset(), which supports non-standard 
evaluation (it searches for unquoted symbols like 'files' in the code line 
below in the object that is its first argument; %>% puts 'mytbl' in that first 
position) and row (filter) and column (select) subsets

> mytbl %>% subset(files %in% "a", files)
# A tibble: 1 x 1
  files

1 a

Or subset(grepl("a", files), files) if that was what you meant.

One important idea that the tidyverse implements is, in my opinion, 
'endomorphism' -- you get back the same type of object as you put in -- so I 
wouldn't use a base R idiom that returned a vector unless that were somehow 
essential for the next step in the analysis. 

There is value in having separate functions for filter() and select(), and 
probably there are edge cases where filter(), select(), and subset() behave 
differently, but for what it's worth subset() can be used to perform these 
operations individually

> mytbl %>% subset(, files)
# A tibble: 6 x 1
  files

1 a
2 b
3 c
4 d
5 e
6 f
> mytbl %>% subset(grepl("a", files), )
# A tibble: 1 x 2
  files  prop

1 a 1

Martin Morgan

On 8/20/20, 2:48 AM, "R-help on behalf of Ivan Calandra" 
 wrote:

Hi Jeff,

The code you show is exactly what I usually do, in base R; but I wanted
to play with tidyverse to learn it (and also understand when it makes
sense and when it doesn't).

And yes, of course, in the example I gave, I end up with a 1-cell
tibble, which could be better extracted as a length-1 vector. But my
real goal is not to end up with a single value or even a single column.
I just thought that simplifying my example was the best approach to ask
for advice.

But thank you for letting me know that what I'm doing is pointless!

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 19/08/2020 19:27, Jeff Newmiller wrote:
> The whole point of dplyr primitives is to support data frames... that is, 
lists of columns. When you pare your data frame down to one column you are 
almost certainly using the wrong tool for the job.
>
> So, sure, your code works... and it even does what you wanted in the 
dplyr style, but what a pointless exercise.
>
> grep( "a", mytbl$file, value=TRUE )
>
> On August 19, 2020 7:56:32 AM PDT, Ivan Calandra  wrote:
>> Dear useRs,
>>
>> I'm new to the tidyverse world and I need some help on basic things.
>>
>> I have the following tibble:
>> mytbl <- structure(list(files = c("a", "b", "c", "d", "e", "f"), prop =
>> 1:6), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
>>
>> I want to subset the rows with "a" in the column "files", and keep only
>> that column.
>>
>> So I did:
>> myfile <- mytbl %>%
>>   filter(grepl("a", files)) %>%
>>   select(files)
>>
>> It works, but I believe there must be an easier way to combine filter()
>> and select(), right?
>>
>> Thank you!
>> Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Best settings for RStudio video recording?

2020-08-13 Thread Martin Morgan

Excellent question! I think most R courses use RStudio, so it is completely 
appropriate to ask about how to help people learn R using RStudio.

I don't have a lot experience with virtual teaching, and very limited 
experience with anything other than short-term workshops.

I think that there is tremendous value, during the 'in person' portion of a 
course, in doing interactive and even 'ad hoc' analysis, perhaps especially 
handling the off-the-wall questions that participants might raise (when I have 
to struggle to figure out what the R answer is, and then convey to the 
attendees my thinking process), and making all kinds of mistakes, including 
simple typos (requiring me to explain what the error message means, and how I 
diagnosed the problem and arrived at a solution that was other than a 
pull-it-out-of-the-hat miracle).

With this in mind, I try to increase the prominence of the console portion of 
the RStudio interface. I place it at the top left of the screen (this might be 
a remnant of in-person presentations, where the heads of people in front often 
block the view of the lines where code is being enter; this is obviously not 
relevant in a virtual context). Usually I keep the script portion of the 
display visible at the bottom left, with only a few lines showing, as a kind of 
cheat sheet for me, rather than for the students to 'follow along').

I use a large font, which I think helps in both virtual and physical sessions 
in part because it limits the amount of information on the screen, causing me 
to slow my presentation enough that the students can absorb what I am saying. 
Perhaps as a consequence of the limited screen real-estate, students often ask 
'to see the last command' so I now include in the right panel the 'History' 
tab. The division is asymmetric, so the console continues to take up the 
majority of screen real estate.

The end result of a sequence of operations is often a pretty picture, but since 
this is only the end result and not the meat of the learning experience I tend 
to keep the plot window (lower right) relatively small, and try to remember to 
expand things at the time when the end result is in sight (so to speak;)).

I hope others with more direct experience are not dissuaded by Bert's opinions, 
and offer up their own experiences or resource recommendations.

Martin Morgan

On 8/13/20, 6:05 PM, "R-help on behalf of Jonathan Greenberg" 
 wrote:

Folks:

I was wondering if you all would suggest some helpful RStudio
configurations that make recording a session via e.g. zoom the most useful
for students doing remote learning.  Thoughts?

--j

-- 
Jonathan A. Greenberg, PhD
Randall Endowed Professor and Associate Professor of Remote Sensing
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Natural Resources & Environmental Science
University of Nevada, Reno
1664 N Virginia St MS/0186
Reno, NV 89557
Phone: 415-763-5476
https://www.gearslab.org/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting DMRs (Differentially Methylated Regions) using Gviz package in R

2020-02-07 Thread Martin Morgan

Probably have more success asking on https://support.bioconductor.org.

Martin Morgan

On 2/7/20, 12:57 PM, "R-help on behalf of pooja sinha" 
 wrote:

Hi All,

I have a file list consisting of Chromosome, Start , End & Methylation
Difference in the following format in excel:

Chrom Start  End  Meth. Diff

chr1 38565900 38566000 -0.20276818

chr1 38870400 38870500 -0.342342342

chr1 39469400 39469500 -0.250260552

chr1 52013600 52013700 -0.37797619

chr1 52751700 52751800  0.257575758

chr1 75505100 75505200 -0.262847308

I need help in plotting the DMRs using Gviz package in R. I tried a code
below but it doesn't turn out correct.

library(GenomicRanges)
library(grid)
library(Gviz)
library(rtracklayer)
library(BSgenome)
library(readxl)
library(BSgenome.Rnorvegicus.UCSC.rn6)
genome <- getBSgenome("BSgenome.Rnorvegicus.UCSC.rn6")
genome
data1 <- read_excel("DMRs_plots.xlsx")
head(data1)
data1$Chrom = Chrom$chr1

track1 <- DataTrack(data = data1, from = "38565900" , to = "28225",
chromosome = Chrom$chr1, name = "DMRs")

itrack <- IdeogramTrack(genome = genome, chromosome = chr)

plotTracks(track1, itrack)


If anyone know how to plot and correct my code including how to add
methylation difference values, then that will be of great help.


Thanks,

Puja

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to find number of unique rows for combination of r columns

2019-11-08 Thread Martin Morgan

With this example

> df = data.frame(a = c(1, 1, 2, 2), b = c(1, 1, 2, 3), value = 1:4)
> df
  a b value
1 1 1 1
2 1 1 2
3 2 2 3
4 2 3 4

The approach to drop duplicates in the first and second columns has as a 
consequence the arbitrary choice of 'value' for the duplicate entries -- why 
chose a value of '1' rather than '2' (or the average of 1 and 2, or a list 
containing all possible values, or...) for the rows duplicated in columns a and 
b?

> df[!duplicated(df[,1:2]),]
  a b value
1 1 1 1
3 2 2 3
4 2 3 4

In base R one might

> aggregate(value ~ a + b, df, mean)
  a b value
1 1 1   1.5
2 2 2   3.0
3 2 3   4.0
> aggregate(value ~ a + b, df, list)
  a b value
1 1 1  1, 2
2 2 2 3
3 2 3 4

but handling several value-like columns would be hard(?)

Using library(dplyr), I have

> group_by(df, a, b) %>% summarize(mean_value = mean(value))
# A tibble: 3 x 3
# Groups:   a [2]
  a b mean_value
 
1 1 11.5
2 2 23
3 2 34

or

> group_by(df, a, b) %>% summarize(values = list(value))
# A tibble: 3 x 3
# Groups:   a [2]
  a b values

1 1 1 
2 2 2 
3 2 3 

summarizing multiple columns with dplyr

> df$v1 = 1:4
> df$v2 = 4:1   
>  group_by(df, a, b) %>% summarize(v1_mean = mean(v1), v2_median = median(v2))
# A tibble: 3 x 4
# Groups:   a [2]
  a b v1_mean v2_median
   
1 1 1 1.5   3.5
2 2 2 3 2
3 2 3 4 1

I do not know how performant this would be with data of your size.

Martin Morgan

On 11/8/19, 1:39 PM, "R-help on behalf of Ana Marija" 
 wrote:

Thank you so much!!!

On Fri, Nov 8, 2019 at 11:40 AM Bert Gunter  wrote:
>
> Correction:
> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
> df[!duplicated(df[,2:3]), ]  ## Note the ! sign
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along 
and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Nov 8, 2019 at 7:59 AM Bert Gunter  wrote:
>>
>> Sorry, but you ask basic questions.You really need to spend some more 
time with an R tutorial or two. This list is not meant to replace your own 
learning efforts.
>>
>> You also do not seem to be reading the docs carefully. Under ?unique, it 
links ?duplicated and tells you that it gives indices of duplicated rows of a 
data frame. These then can be used by subscripting to remove those rows from 
the data frame. Here is a reproducible example:
>>
>> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
>> df[-duplicated(df[,2:3]), ]  ## Note the - sign
>>
>> If you prefer, the "Tidyverse" world has what are purported to be more 
user-friendly versions of such data handling functionality that you can use 
instead.
>>
>>
>> Bert
>>
>> On Fri, Nov 8, 2019 at 7:38 AM Ana Marija  
wrote:
>>>
>>> would you know how would I extract from my original data frame, just
>>> these unique rows?
>>> because this gives me only those 3 columns, and I want all columns
>>> from the original data frame
>>>
>>> > head(udt)
>>>chr   pos gene_id
>>> 1 chr1 54490 ENSG0227232
>>> 2 chr1 58814 ENSG0227232
>>> 3 chr1 60351 ENSG0227232
>>> 4 chr1 61920 ENSG0227232
>>> 5 chr1 63671 ENSG0227232
>>> 6 chr1 64931 ENSG0227232
>>>
>>> > head(dt)
>>> chr   pos gene_id pval_nominal pval_ret   wl  wr
  META
>>> 1: chr1 54490 ENSG0227232 0.608495 0.783778 31.62278 21.2838 
0.7475480
>>> 2: chr1 58814 ENSG0227232 0.295211 0.897582 31.62278 21.2838 
0.6031214
>>> 3: chr1 60351 ENSG0227232 0.439788 0.867959 31.62278 21.2838 
0.6907182
>>> 4: chr1 61920 ENSG0227232 0.319528 0.601809 31.62278 21.2838 
0.4032200
>>> 5: chr1 63671 ENSG0227232 0.237739 0.988039 31.62278 21.2838 
0.7482519
>>> 6: chr1 64931 ENSG0227232 0.276679 0.907037 31.62278 21.2838 
0.5974800
>>>
>>> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija  
wrote:
>>> >
>>> > Thank you so much! Converting it to data frame resolved the issue!
>>> >
>>> > On Fri, No

Re: [R] how to use a matrix as an index to another matrix?

2019-10-11 Thread Martin Morgan

A matrix can be subset by another 2-column matrix, where the first column is 
the row index and the second column the column index. So

idx = matrix(c(B, col(B)), ncol = 2)
A[] <- A[idx]

Martin Morgan

On 10/11/19, 6:31 AM, "R-help on behalf of Eric Berger" 
 wrote:

Here is one way
A <- sapply(1:ncol(A), function(i) {A[,i][B[,i]]})

On Fri, Oct 11, 2019 at 12:44 PM Jinsong Zhao  wrote:

> Hi there,
>
> I have two matrices, A and B. The columns of B is the index of the
> corresponding columns of A. I hope to rearrange of A by B. A minimal
> example is following:
>
>  > set.seed(123)
>  > A <- matrix(sample(1:10), nrow = 5)
>  > B <- matrix(c(sample(1:5), sample(1:5)), nrow =5, byrow = FALSE)
>  > A
>   [,1] [,2]
> [1,]39
> [2,]   101
> [3,]27
> [4,]85
> [5,]64
>  > B
>   [,1] [,2]
> [1,]21
> [2,]34
> [3,]15
> [4,]43
> [5,]52
>  > A[,1] <- A[,1][B[,1]]
>  > A[,2] <- A[,2][B[,2]]
>  > A
>   [,1] [,2]
> [1,]   109
> [2,]25
> [3,]34
> [4,]87
> [5,]61
>
> My question is whether there is any elegant or generalized way to replace:
>
>  > A[,1] <- A[,1][B[,1]]
>  > A[,2] <- A[,2][B[,2]]
>
> Thanks in advance.
>
> PS., I know how to do the above thing by loop.
>
> Best,
> Jinsong
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] BiocManager problem.

2019-10-10 Thread Martin Morgan

Please follow the response to your question on the Bioconductor support site

https://support.bioconductor.org/p/125493/

Martin Morgan

On 10/10/19, 12:23 PM, "R-help on behalf of Ali Siavosh" 
 wrote:

Hi,
I have installation of R in a server running on redhat 7. I have upgraded R 
and now to upgrade BiocManager I get error messages as below:

> install.packages("BiocManager")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
trying URL 
'https://cran.revolutionanalytics.com/src/contrib/BiocManager_1.30.7.tar.gz'
Content type 'application/octet-stream' length 38020 bytes (37 KB)
==
downloaded 37 KB

* installing *source* package ‘BiocManager’ ...
** package ‘BiocManager’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package ‘BiocManager’
finding HTML links ... done
BiocManager-pkg html  
available   html  
install html  
repositorieshtml  
valid   html  
version html  
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation 
path
* DONE (BiocManager)
Making 'packages.html' ... done

The downloaded source packages are in
‘/tmp/RtmpgHhwMp/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
> BiocManager::version()
Error: .onLoad failed in loadNamespace() for 'BiocManager', details:
  call: NULL
  error: Bioconductor version '3.8' requires R version '3.5'; see
  https://bioconductor.org/install
> BiocManager::valid()
Error: .onLoad failed in loadNamespace() for 'BiocManager', details:
  call: NULL
  error: Bioconductor version '3.8' requires R version '3.5'; see
  https://bioconductor.org/install
> BiocManager::install(version="3.5")
Error: .onLoad failed in loadNamespace() for 'BiocManager', details:
  call: NULL
  error: Bioconductor version '3.8' requires R version '3.5'; see
  https://bioconductor.org/install
> BiocManager::install(version="3.7")
Error: .onLoad failed in loadNamespace() for 'BiocManager', details:
  call: NULL
  error: Bioconductor version '3.8' requires R version '3.5'; see
  https://bioconductor.org/install <https://bioconductor.org/install>

I appreciate any help with regard to this.
Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Trying to coerce an AnnotatedDataFrame in order to access Probeset Info

2019-07-17 Thread Martin Morgan

Are you remembering to attach the Biobase package to your R session?

> AnnotatedDataFrame()
Error in AnnotatedDataFrame() :
  could not find function "AnnotatedDataFrame"
> suppressPackageStartupMessages({ library(Biobase) })
> AnnotatedDataFrame()
An object of class 'AnnotatedDataFrame': none

Biobase is a Bioconductor package, so support questions should more 
appropriately go to https://support.bioconductor.org

Martin

On 7/17/19, 4:20 PM, "R-help on behalf of Spencer Brackett" 

wrote:

Good evening,

I downloaded the Biobase package in order to utilize the ExpressionSet and
other features hosted there to examine annotations for probeset data, which
I seek to visualize. I currently have pre-analyzed object located in my
environment containing said probeset info, along with gene id and location.
After experimenting with the following approaches, I'm am at a loss for as
to why the AnnotatedDataFrame function is not being recognized by R.

##Example of some of my attempts and their respective error messages##

>AnnotatedDataFrame()
Error in AnnotatedDataFrame() : could not find function
   "AnnotatedDataFrame"

 signature(object="assayData")
 object  "assayData"
> annotatedDataFrameFrom("assayData", byrow=FALSE)
Error in annotatedDataFrameFrom("assayData", byrow = FALSE) :
  could not find function "annotatedDataFrameFrom"

>as(data.frame, "AnnotatedDataFrame")
Error in as(data.frame, "AnnotatedDataFrame") :
  no method or default for coercing “function” to “AnnotatedDataFrame”

Best,

Spencer

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Was there a change to R ver. 3.5.2 so that it now treats warnings during installs as errors?

2019-01-20 Thread Martin Morgan

Looks like you're using remotes::install_github(), which in turn uses 
remotes::install().

The README

https://github.com/r-lib/remotes/blob/254c67ed6502e092a316553f2a44f04b0e595b64/README.md

says "Setting R_REMOTES_NO_ERRORS_FROM_WARNINGS=true avoids stopping the 
installation for warning messages. Warnings usually mean installation errors, 
so by default remotes stops for a warning. However, sometimes other warnings 
might happen, that could be ignored by setting this environment variable.

So I'd guess

  Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = TRUE)

before installing the package would address this problem.

Martin Morgan

On 1/20/19, 6:58 AM, "R-help on behalf of Duncan Murdoch" 
 wrote:

On 19/01/2019 8:22 p.m., Peter Waltman wrote:
> I'm trying to install a devel package called gGnome (
> https://github.com/mskilab/gGnome). One of its dependencies is another
> package from the same group, called gTrack, which causes several warning
> messages to be generated because it overloads a couple of functions that
> are part of other packages that gTrack is dependent upon.  The specific
> warnings are provided below.  During the lazy-loading step of gGnome's
> install, gTrack is loaded, and when these warnings come up, they are
> converted to errors, causing the install to fail. This behavior is new to
> version 3.5.2, as I've been able to successfully install these packages
> with R versions 3.5.0 and 3.5.1. Is there a workaround for this for 
version
> 3.5.2?
> 
> Thanks!
> 
> Error message during gGnome install:
> 
>> install_github('mskilab/gGnome')
> Downloading GitHub repo mskilab/gGnome@master
> Skipping 3 packages not available: GenomicRanges, rtracklayer,
> VariantAnnotation
> ✔  checking for file
> ‘/tmp/Rtmp4hnMMO/remotes7fb938cd0553/mskilab-gGnome-81f661e/DESCRIPTION’ 
...
> ─  preparing ‘gGnome’:
> ✔  checking DESCRIPTION meta-information ...
> ─  checking for LF line-endings in source and make files and shell scripts
> ─  checking for empty or unneeded directories
> Removed empty directory ‘gGnome/inst/extdata/gTrack.js’
> ─  building ‘gGnome_0.1.tar.gz’
> 
> * installing *source* package ‘gGnome’ ...
> ** R
> ** inst
> ** byte-compile and prepare package for lazy loading
> Error: package or namespace load failed for ‘gTrack’:
> * (converted from warning)* multiple methods tables found for ‘seqinfo<-’
> Error : package ‘gTrack’ could not be loaded
> ERROR: lazy loading failed for package ‘gGnome’
> * removing ‘/home/waltman/bin/R/3.5.2/lib/R/library/gGnome’
> Error in i.p(...) :
>(converted from warning) installation of package
> ‘/tmp/Rtmp4hnMMO/file7fb929638ed8/gGnome_0.1.tar.gz’ had non-zero exit
> status

That message indicates that options("warn") is 2 or higher when the 
warning occurs.  What is its setting before you start the install?

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Efficient way of loading files in R

2018-09-07 Thread Martin Morgan

Ask on the Bioconductor support site https://support.bioconductor.org

Provide (on the support site) the output of the R commands

  library(GEOquery)
  sessionInfo()

Also include (copy and paste) the output of the command that fails. I have

> gseEset2 <- getGEO('GSE76896')[[1]]
Found 1 file(s)
GSE76896_series_matrix.txt.gz
trying URL 
'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'

Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
==
downloaded 38.7 MB

Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
|=| 100% 
  84 MB

File stored at:
/tmp/Rtmpe4NWji/GPL570.soft
|=| 100% 
  75 MB

> sessionInfo()
R version 3.5.1 Patched (2018-08-22 r75177)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /home/mtmorgan/bin/R-3-5-branch/lib/libRblas.so
LAPACK: /home/mtmorgan/bin/R-3-5-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] bindrcpp_0.2.2  GEOquery_2.49.1 Biobase_2.41.2
[4] BiocGenerics_0.27.1 BiocManager_1.30.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18 tidyr_0.8.1  crayon_1.3.4 dplyr_0.7.6
 [5] assertthat_0.2.0 R6_2.2.2 magrittr_1.5 pillar_1.3.0
 [9] stringi_1.2.4rlang_0.2.2  curl_3.2 limma_3.37.4
[13] xml2_1.2.0   tools_3.5.1  readr_1.1.1  glue_1.3.0
[17] purrr_0.2.5  hms_0.4.2compiler_3.5.1   pkgconfig_2.0.2
[21] tidyselect_0.2.4 bindr_0.1.1  tibble_1.4.2

On 09/07/2018 06:08 AM, Deepa wrote:

Hello,

I am using a bioconductor package in R.
The command that I use reads the contents of a file downloaded from a
database and creates an expression object.

The syntax works perfectly fine when the input size is of 10 MB. Whereas,
when the file size is around 40MB the object isn't created.

Is there an efficient way of loading a large input file to create the
expression object?

This is my code,

library(gcrma)
library(limma)
library(biomaRt)
library(GEOquery)
library(Biobase)
require(GEOquery)
require(Biobase)
gseEset1 <- getGEO('GSE53454')[[1]] #filesize 10MB
gseEset2 <- getGEO('GSE76896')[[1]] #file size 40MB

##gseEset2 doesn't load and isn't created

Many thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] mzR fails to install/compile (linuxes)

2018-06-16 Thread Martin Morgan

mzR is a Bioconductor package so you might have more luck contacting the 
maintainer on the Bioconductor support site


  https://support.bioconductor.org

or on the 'bioc-devel' mailing list

  https://stat.ethz.ch/mailman/listinfo/bioc-devel

or most directly by opening an issue on the maintainer's github

  https://github.com/sneumann/mzR/issues/

this is linked to from the package 'landing page'

  https://bioconductor.org/packages/mzR

Martin Morgan

On 06/15/2018 10:49 AM, lejeczek via R-help wrote:

hi guys, just an admin here.

I wonder if anybody see what I see, or similar? I'm on Centos 7.x and 
this occurs with R 3.4.x 3.5.x and probably earlier versions too.


Every time I use something like -j>1 to pass to a compiler, eg.echo -ne

$ "Sys.setenv(MAKEFLAGS = \"-j2\")\\n 
source(\"https://bioconductor.org/biocLite.R\;)\\n biocLite(c(\"mzR\"), 
suppressUpdates=FALSE, suppressAutoUpdate=FALSE, ask=FALSE)" | 
/usr/bin/R --vanilla


mzR fails to compile:
...
g++ -m64 -std=gnu++11 -shared -L/usr/lib64/R/lib -Wl,-z,relro -o mzR.so 
cramp.o ramp_base64.o ramp.o RcppRamp.o RcppRampModule.o rnetCDF.o 
RcppPwiz.o RcppPwizModule.o RcppIdent.o RcppIdentModule.o 
./boost/libs/system/src/error_code.o ./boost/libs/regex/src/posix_api.o 
./boost/libs/regex/src/fileiter.o 
./boost/libs/regex/src/regex_raw_buffer.o 
./boost/libs/regex/src/cregex.o ./boost/libs/regex/src/regex_debug.o 
./boost/libs/regex/src/instances.o ./boost/libs/regex/src/icu.o 
./boost/libs/regex/src/usinstances.o ./boost/libs/regex/src/regex.o 
./boost/libs/regex/src/wide_posix_api.o 
./boost/libs/regex/src/regex_traits_defaults.o 
./boost/libs/regex/src/winstances.o 
./boost/libs/regex/src/wc_regex_traits.o 
./boost/libs/regex/src/c_regex_traits.o 
./boost/libs/regex/src/cpp_regex_traits.o 
./boost/libs/regex/src/static_mutex.o 
./boost/libs/regex/src/w32_regex_traits.o 
./boost/libs/iostreams/src/zlib.o 
./boost/libs/iostreams/src/file_descriptor.o 
./boost/libs/filesystem/src/operations.o 
./boost/libs/filesystem/src/path.o 
./boost/libs/filesystem/src/utf8_codecvt_facet.o 
./boost/libs/chrono/src/chrono.o 
./boost/libs/chrono/src/process_cpu_clocks.o 
./boost/libs/chrono/src/thread_clock.o ./pwiz/data/msdata/Version.o 
./pwiz/data/identdata/Version.o ./pwiz/data/common/MemoryIndex.o 
./pwiz/data/common/CVTranslator.o ./pwiz/data/common/cv.o 
./pwiz/data/common/ParamTypes.o ./pwiz/data/common/BinaryIndexStream.o 
./pwiz/data/common/diff_std.o ./pwiz/data/common/Unimod.o 
./pwiz/data/msdata/mz5/Configuration_mz5.o 
./pwiz/data/msdata/mz5/Connection_mz5.o 
./pwiz/data/msdata/mz5/Datastructures_mz5.o 
./pwiz/data/msdata/mz5/ReferenceRead_mz5.o 
./pwiz/data/msdata/mz5/ReferenceWrite_mz5.o 
./pwiz/data/msdata/mz5/Translator_mz5.o 
./pwiz/data/msdata/SpectrumList_MGF.o 
./pwiz/data/msdata/DefaultReaderList.o 
./pwiz/data/msdata/ChromatogramList_mzML.o 
./pwiz/data/msdata/ChromatogramList_mz5.o ./pwiz/data/msdata/examples.o 
./pwiz/data/msdata/Serializer_mzML.o ./pwiz/data/msdata/Serializer_MSn.o 
./pwiz/data/msdata/Reader.o ./pwiz/data/msdata/Serializer_mz5.o 
./pwiz/data/msdata/Serializer_MGF.o 
./pwiz/data/msdata/Serializer_mzXML.o 
./pwiz/data/msdata/SpectrumList_mzML.o 
./pwiz/data/msdata/SpectrumList_MSn.o 
./pwiz/data/msdata/SpectrumList_mz5.o 
./pwiz/data/msdata/BinaryDataEncoder.o ./pwiz/data/msdata/Diff.o 
./pwiz/data/msdata/MSData.o ./pwiz/data/msdata/References.o 
./pwiz/data/msdata/SpectrumList_mzXML.o ./pwiz/data/msdata/IO.o 
./pwiz/data/msdata/SpectrumList_BTDX.o ./pwiz/data/msdata/SpectrumInfo.o 
./pwiz/data/msdata/RAMPAdapter.o ./pwiz/data/msdata/LegacyAdapter.o 
./pwiz/data/msdata/SpectrumIterator.o ./pwiz/data/msdata/MSDataFile.o 
./pwiz/data/msdata/MSNumpress.o ./pwiz/data/msdata/SpectrumListCache.o 
./pwiz/data/msdata/Index_mzML.o 
./pwiz/data/msdata/SpectrumWorkerThreads.o 
./pwiz/data/identdata/IdentDataFile.o ./pwiz/data/identdata/IdentData.o 
./pwiz/data/identdata/DefaultReaderList.o ./pwiz/data/identdata/Reader.o 
./pwiz/data/identdata/Serializer_protXML.o 
./pwiz/data/identdata/Serializer_pepXML.o 
./pwiz/data/identdata/Serializer_mzid.o ./pwiz/data/identdata/IO.o 
./pwiz/data/identdata/References.o ./pwiz/data/identdata/MascotReader.o 
./pwiz/data/proteome/Modification.o ./pwiz/data/proteome/Digestion.o 
./pwiz/data/proteome/Peptide.o ./pwiz/data/proteome/AminoAcid.o 
./pwiz/utility/minimxml/XMLWriter.o ./pwiz/utility/minimxml/SAXParser.o 
./pwiz/utility/chemistry/Chemistry.o 
./pwiz/utility/chemistry/ChemistryData.o 
./pwiz/utility/chemistry/MZTolerance.o ./pwiz/utility/misc/IntegerSet.o 
./pwiz/utility/misc/Base64.o ./pwiz/utility/misc/IterationListener.o 
./pwiz/utility/misc/MSIHandler.o ./pwiz/utility/misc/Filesystem.o 
./pwiz/utility/misc/TabReader.o 
./pwiz/utility/misc/random_access_compressed_ifstream.o 
./pwiz/utility/misc/SHA1.o ./pwiz/utility/misc/SHA1Calculator.o 
./pwiz/utility/misc/sha1calc.o ./random_access_gzFile.o ./RcppExports.o 
./boost/libs/thread/src/pthread/o

Re: [R] S4 class slot type S4 class

2018-05-21 Thread Martin Morgan

On 05/21/2018 12:06 AM, Glenn Schultz wrote:

All,

I am considering creating an S4 class whose slots (2) are both S4 
classes.  Since an S4 slot can be an S3 class I figure this can be done. 
  However, the correct syntax of which I am unsure.  Reviewing the docs 
I have come to the following conclusion:

SetClass('myfoo',
                   slots = (foo1, foo2))

Without a type I believe each slot is .Data.  A get method on the above 
class slots would return say foo1 which will have all methods and 
generics belonging to foo1 class.  Is this the correct approach?

Suppose you have two classes

  .A = setClass("A", slots = c(x = "numeric"))
  .B = setClass("B", slots = c(y = "numeric", z = "numeric"))

A third class containing these would be

  .C = setClass("C", slots = c(a = "A", b = "B"))

where names of the slot argument are the slot names, and the character 
strings "A", "B" are the type of object the slot will store.

> .C()
An object of class "C"
Slot "a":
An object of class "A"
Slot "x":
numeric(0)

Slot "b":
An object of class "B"
Slot "y":
numeric(0)

Slot "z":
numeric(0)

> .C(a = .A(x = 1:2), b = .B(y = 2:1, z = 1:2))
An object of class "C"
Slot "a":
An object of class "A"
Slot "x":
[1] 1 2

Slot "b":
An object of class "B"
Slot "y":
[1] 2 1

Slot "z":
[1] 1 2

Martin Morgan

Best,
Glenn
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Possible Improvement to sapply

2018-03-13 Thread Martin Morgan




On 03/13/2018 09:23 AM, Doran, Harold wrote:

While working with sapply, the documentation states that the simplify argument will yield a vector, 
matrix etc "when possible". I was curious how the code actually defined "as 
possible" and see this within the function

if (!identical(simplify, FALSE) && length(answer))

This seems superfluous to me, in particular this part:

!identical(simplify, FALSE)

The preceding code could be reduced to

if (simplify && length(answer))

and it would not need to execute the call to identical in order to trigger the 
conditional execution, which is known from the user's simplify = TRUE or FALSE 
inputs. I *think* the extra call to identical is just unnecessary overhead in 
this instance.

Take for example, the following toy example code and benchmark results and a 
small modification to sapply:

myList <- list(a = rnorm(100), b = rnorm(100))

answer <- lapply(X = myList, FUN = length)
simplify = TRUE

library(microbenchmark)

mySapply <- function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE){
FUN <- match.fun(FUN)
 answer <- lapply(X = X, FUN = FUN, ...)
 if (USE.NAMES && is.character(X) && is.null(names(answer)))
 names(answer) <- X
 if (simplify && length(answer))
 simplify2array(answer, higher = (simplify == "array"))
 else answer
}



microbenchmark(sapply(myList, length), times = 1L)

Unit: microseconds
exprmin lq mean median uqmax neval
  sapply(myList, length) 14.156 15.572 16.67603 15.926 16.634 650.46 1

microbenchmark(mySapply(myList, length), times = 1L)

Unit: microseconds
  exprmin lq mean median uq  max neval
  mySapply(myList, length) 13.095 14.864 16.02964 15.218 15.573 1671.804 1

My benchmark timings show a timing improvement with only that small change made 
and it is seemingly nominal. In my actual work, the sapply function is called 
millions of times and this additional overhead propagates to some overall 
additional computing time.

I have done some limited testing on various real data to verify that the 
objects produced under both variants of the sapply (base R and my modified) 
yield identical objects when simply is both TRUE or FALSE.

Perhaps someone else sees a counterexample where my proposed fix does not cause 
for sapply to behave as expected.



Check out ?sapply for possible values of `simplify=` to see why your 
proposal is not adequate.


For your example, lengths() is an order of magnitude faster than 
sapply(., length). This is a example of the advantages of vectorization 
(single call to an R function implemented in C) versus iteration (`for` 
loops but also the *apply family calling an R function many times). 
vapply() might also be relevant.


Often performance improvements come from looking one layer up from where 
the problem occurs and re-thinking the algorithm. Why would one need to 
call sapply() millions of times, in a situation where this becomes 
rate-limiting? Can the algorithm be re-implemented to avoid this step?


Martin Morgan


Harold

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] UseDevel: version requires a more recent R

2018-01-09 Thread Martin Morgan


Ask questions about Bioconductor on the support site

  https://support.bioconductor.org

Bioconductor versions are tied to particular R versions. The current 
Bioc-devel requires use of R-devel. You're using R-3.4.2, so need to 
install the devel version of R.


Additional information is at

  http://bioconductor.org/developers/how-to/useDevel/

Martin Morgan

On 01/09/2018 01:32 PM, Sariya, Sanjeev wrote:

Hello R experts:

I need a developer version of a Bioconductor library.


sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

When I try to useDevel it fails.

I've removed packages and again loaded but I get the same error message.

remove.packages("BiocInstaller")
source("https://bioconductor.org/biocLite.R;)
library(BiocInstaller)

Bioconductor version 3.6 (BiocInstaller 1.28.0), ?biocLite for help



useDevel()


Error: 'devel' version requires a more recent R

I'm running into this error for few days now. I close R after removing 
biocInstaller and proceed with following steps.

Please guide me to fix this.

Thanks,
SS

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Facing problem in installing the package named "methyAnalysis"

2017-12-29 Thread Martin Morgan


On 12/29/2017 07:00 AM, Pijush Das wrote:

Thank you Michael Dewey.
Can you please send me the email id for Bioconductor.


https://support.bioconductor.org

Make sure you are using packages from a consistent version of Bioconductor

  source("https://bioconductor.org/biocLite.R;)
  BiocInstaller::biocValid()

Martin






regards
Pijush

On Fri, Dec 29, 2017 at 5:20 PM, Michael Dewey 
wrote:


Dear Pijush

You might do better to ask on the Bioconductor list as IRanges does not
seem to be on CRAN so I deduce it is a Bioconductor package too.

Michael


On 29/12/2017 07:29, Pijush Das wrote:


Dear Sir,




I have been using R for a long time. But recently I have faced a problem
when installing the Bioconductor package named "methyAnalysis". Firstly it
was require to update my older R (R version 3.4.3 (2017-11-30)) in to
newer
version. That time I have also updated the RStudio software.

After that when I have tried to install the package named "methyAnalysis".
It shows some error given below.

No methods found in package ‘IRanges’ for requests: ‘%in%’,
‘elementLengths’, ‘elementMetadata’, ‘ifelse’, ‘queryHits’, ‘Rle’,
‘subjectHits’, ‘t’ when loading ‘bumphunter’
Error: package or namespace load failed for ‘methyAnalysis’:
   objects ‘.__T__split:base’, ‘split’ are not exported by
'namespace:IRanges'
In addition: Warning message:
replacing previous import ‘BiocGenerics::image’ by ‘graphics::image’ when
loading ‘methylumi’

I also try to install the package after downloading the source package
from
Bioconductor but the method is useless.

Please help me to install the package named "methyAnalysis".

Thanking you



regards
Pijush

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posti
ng-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Michael
http://www.dewey.myzen.co.uk/home.html



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dplyr - add/expand rows

2017-11-29 Thread Martin Morgan


On 11/29/2017 05:47 PM, Tóth Dénes wrote:

Hi Martin,

On 11/29/2017 10:46 PM, Martin Morgan wrote:

On 11/29/2017 04:15 PM, Tóth Dénes wrote:

Hi,

A benchmarking study with an additional (data.table-based) solution. 


I don't think speed is the right benchmark (I do agree that 
correctness is!).


Well, agree, and sorry for the wording. It was really just an exercise 
and not a full evaluation of the approaches. When I read the avalanche 
of solutions neither of which mentioning data.table (my first choice for 
data.frame-manipulations), I became curious how a one-liner data.table 
code performs against the other solutions in terms of speed and 
readability.
Second, I quite often have the feeling that dplyr is extremely overused 
among novice (and sometimes even experienced) R users nowadays. This is 
unfortunate, as the present example also illustrates.


Another solution is Bill's approach and dplyr's implementation (adding 
the 1L to keep integers integers!)


fun_bill1 <- function(d) {
  i <- rep(seq_len(nrow(d)), d$to - d$from + 1L)
  j <- sequence(d$to - d$from + 1L)
  ## d[i,] %>% mutate(year = from + j - 1L, from = NULL, to = NULL)
  mutate(d[i,], year = from + j - 1L, from = NULL, to = NULL)
}

which is competitive with IRanges and data.table (the more dplyr-ish? 
solution


  d[i, ] %>% mutate(year = from + j - 1L) %>%
  select(station, record, year))

has intermediate performance) and might appeal to those introduced to R 
through dplyr but wanting more base R knowledge, and vice versa. I think 
if dplyr introduces new users to R, or exposes R users to new approaches 
for working with data, that's great!


Martin




Regards,
Denes



For the R-help list, maybe something about least specialized R 
knowledge required would be appropriate? I'd say there were some 
'hard' solutions -- Michael (deep understanding of Bioconductor and 
IRanges), Toth (deep understanding of data.table), Jim (at least for 
me moderate understanding of dplyr,especially the .$ notation; a 
simpler dplyr answer might have moved this response out of the 
'difficult' category, especially given the familiarity of the OP with 
dplyr). I'd vote for Bill's as requiring the least specialized 
knowledge of R (though the +/- 1 indexing is an easy thing to get wrong).


A different criteria might be reuse across analysis scenarios. Bill 
seems to win here again, since the principles are very general and at 
least moderately efficient (both Bert and Martin's solutions are 
essentially R-level iterations and have poor scalability, as 
demonstrated in the microbenchmarks; Bill's is mostly vectorized). 
Certainly data.table, dplyr, and IRanges are extremely useful within 
the confines of the problem domains they address.


Martin


Enjoy! ;)

Cheers,
Denes


--


## packages ##

library(dplyr)
library(data.table)
library(IRanges)
library(microbenchmark)

## prepare example dataset ###

## use Bert's example, with 2000 stations instead of 2
d_df <- data.frame( station = rep(rep(c("one","two"),c(5,4)), 1000L),
 from = as.integer(c(60,61,71,72,76,60,65,82,83)),
 to = as.integer(c(60,70,71,76,83,64, 81, 82,83)),
 record = c("A","B","C","B","D","B","B","D","E"),
 stringsAsFactors = FALSE)
stations <- rle(d_df$station)
stations$value <- gsub(
   " ", "0",
   paste0("station", format(1:length(stations$value), width = 6)))
d_df$station <- rep(stations$value, stations$lengths)

## prepare tibble and data.table versions
d_tbl <- as_tibble(d_df)
d_dt <- as.data.table(d_df)

## solutions ##

## Bert - by
fun_bert <- function(d) {
   out <- by(
 d, d$station, function(x) with(x, {
   i <- to - from +1
   data.frame(record =rep(record,i),
  year =sequence(i) -1 + rep(from,i),
  stringsAsFactors = FALSE)
 }))
   data.frame(station = rep(names(out), sapply(out,nrow)),
  do.call(rbind,out),
  row.names = NULL,
  stringsAsFactors = FALSE)
}

## Bill - transform
fun_bill <- function(d) {
   i <- rep(seq_len(nrow(d)), d$to-d$from+1)
   j <- sequence(d$to-d$from+1)
   transform(d[i,], year=from+j-1, from=NULL, to=NULL)
}

## Michael - IRanges
fun_michael <- function(d) {
   df <- with(d, DataFrame(station, record, year=IRanges(from, to)))
   expand(df, "year")
}

## Jim - dplyr
fun_jim <- function(d) {
   d %>%
 rowwise() %>%
 do(tibble(station = .$station,
   record = .$record,
   year = seq(.$from, .$to))
 )
}

## Martin - Map
fun_martin <- function(d) {
   d$year <- with(d, Map(seq, from, to))
   res0 <- with(d, Map(data.frame,

Re: [R] dplyr - add/expand rows

2017-11-29 Thread Martin Morgan

0 QMS
2   07EA001  1961 QMC
3   07EA001  1962 QMC
4   07EA001  1963 QMC
5   07EA001  1964 QMC
... ...   ... ...
20  07EA001  1979 QRC
21  07EA001  1980 QRC
22  07EA001  1981 QRC
23  07EA001  1982 QRC
24  07EA001  1983 QRC

If you tell the computer more about your data, it can do more things for
you.

Michael

On Tue, Nov 28, 2017 at 7:34 AM, Martin Morgan <
martin.mor...@roswellpark.org> wrote:


On 11/26/2017 08:42 PM, jim holtman wrote:


try this:

##

library(dplyr)

input <- tribble(
    ~station, ~from, ~to, ~record,
   "07EA001" ,    1960  ,  1960  , "QMS",
   "07EA001"  ,   1961 ,   1970  , "QMC",
   "07EA001" ,    1971  ,  1971  , "QMM",
   "07EA001" ,    1972  ,  1976  , "QMC",
   "07EA001" ,    1977  ,  1983  , "QRC"
)

result <- input %>%
    rowwise() %>%
    do(tibble(station = .$station,
  year = seq(.$from, .$to),
  record = .$record)
    )

###



In a bit more 'base R' mode I did

   input$year <- with(input, Map(seq, from, to))
   res0 <- with(input, Map(data.frame, station=station, year=year,
   record=record))
    as_tibble(do.call(rbind, unname(res0)))# A tibble: 24 x 3

resulting in


as_tibble(do.call(rbind, unname(res0)))# A tibble: 24 x 3

    station  year record
   
  1 07EA001  1960    QMS
  2 07EA001  1961    QMC
  3 07EA001  1962    QMC
  4 07EA001  1963    QMC
  5 07EA001  1964    QMC
  6 07EA001  1965    QMC
  7 07EA001  1966    QMC
  8 07EA001  1967    QMC
  9 07EA001  1968    QMC
10 07EA001  1969    QMC
# ... with 14 more rows

I though I should have been able to use `tibble` in the second step, but
that leads to a (cryptic) error


res0 <- with(input, Map(tibble, station=station, year=year,

record=record))Error in captureDots(strict = `__quosured`) :
   the argument has already been evaluated

The 'station' and 'record' columns are factors, so different from the
original input, but this seems the appropriate data type for theses 
columns.


It's interesting to compare the 'specialized' knowledge needed for each
approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(), maybe
rbind() and Map() for base R.

Martin






Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter <bgunter.4...@gmail.com>
wrote:

To David W.'s point about lack of a suitable reprex ("reproducible

example"), Bill's solution seems to be for only one station.

Here is a reprex and modification that I think does what was requested
for
multiple stations, again using base R and data frames, not dplyr and
tibbles.

First the reprex with **two** stations:

d <- data.frame( station = rep(c("one","two"),c(5,4)),



 from = c(60,61,71,72,76,60,65,82,83),
  to = c(60,70,71,76,83,64, 81, 82,83),
  record = c("A","B","C","B","D","B","B","D","E"))

d



    station from to record
1 one   60 60  A
2 one   61 70  B
3 one   71 71  C
4 one   72 76  B
5 one   76 83  D
6 two   60 64  B
7 two   65 81  B
8 two   82 82  D
9 two   83 83  E

## Now the conversion code using base R, especially by():

out <- by(d, d$station, function(x) with(x, {



+    i <- to - from +1
+    data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD 
=rep(record,i))

+ }))


out <- data.frame(station =



rep(names(out),sapply(out,nrow)),do.call(rbind,out), row.names = NULL)


out



 station YEAR RECORD
1  one   60  A
2  one   61  B
3  one   62  B
4  one   63  B
5  one   64  B
6  one   65  B
7  one   66  B
8  one   67  B
9  one   68  B
10 one   69  B
11 one   70  B
12 one   71  C
13 one   72  B
14 one   73  B
15 one   74  B
16 one   75  B
17 one   76  B
18 one   76  D
19 one   77  D
20 one   78  D
21 one   79  D
22 one   80  D
23 one   81  D
24 one   82  D
25 one   83  D
26 two   60  B
27 two   61  B
28 two   62  B
29 two   63  B
30 two   64  B
31 two   65  B
32 two   66  B
33 two   67  B
34 two   68  B
35 two   69  B
36 two   70  B
37 two   71  B
38 two   72  B
39 two   73  B
40 two   74  B
41 two   75  B
42 two   76  B
43 two   7

Re: [R] dplyr - add/expand rows

2017-11-28 Thread Martin Morgan


On 11/26/2017 08:42 PM, jim holtman wrote:

try this:

##

library(dplyr)

input <- tribble(
   ~station, ~from, ~to, ~record,
  "07EA001" ,1960  ,  1960  , "QMS",
  "07EA001"  ,   1961 ,   1970  , "QMC",
  "07EA001" ,1971  ,  1971  , "QMM",
  "07EA001" ,1972  ,  1976  , "QMC",
  "07EA001" ,1977  ,  1983  , "QRC"
)

result <- input %>%
   rowwise() %>%
   do(tibble(station = .$station,
 year = seq(.$from, .$to),
 record = .$record)
   )

###


In a bit more 'base R' mode I did

  input$year <- with(input, Map(seq, from, to))
  res0 <- with(input, Map(data.frame, station=station, year=year,
  record=record))
   as_tibble(do.call(rbind, unname(res0)))# A tibble: 24 x 3

resulting in

> as_tibble(do.call(rbind, unname(res0)))# A tibble: 24 x 3
   station  year record
  
 1 07EA001  1960QMS
 2 07EA001  1961QMC
 3 07EA001  1962QMC
 4 07EA001  1963QMC
 5 07EA001  1964QMC
 6 07EA001  1965QMC
 7 07EA001  1966QMC
 8 07EA001  1967QMC
 9 07EA001  1968QMC
10 07EA001  1969QMC
# ... with 14 more rows

I though I should have been able to use `tibble` in the second step, but 
that leads to a (cryptic) error


> res0 <- with(input, Map(tibble, station=station, year=year, 
record=record))Error in captureDots(strict = `__quosured`) :

  the argument has already been evaluated

The 'station' and 'record' columns are factors, so different from the 
original input, but this seems the appropriate data type for theses columns.


It's interesting to compare the 'specialized' knowledge needed for each 
approach -- rowwise(), do(), .$ for tidyverse, with(), do.call(), maybe 
rbind() and Map() for base R.


Martin





Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter  wrote:


To David W.'s point about lack of a suitable reprex ("reproducible
example"), Bill's solution seems to be for only one station.

Here is a reprex and modification that I think does what was requested for
multiple stations, again using base R and data frames, not dplyr and
tibbles.

First the reprex with **two** stations:


d <- data.frame( station = rep(c("one","two"),c(5,4)),

from = c(60,61,71,72,76,60,65,82,83),
 to = c(60,70,71,76,83,64, 81, 82,83),
 record = c("A","B","C","B","D","B","B","D","E"))


d

   station from to record
1 one   60 60  A
2 one   61 70  B
3 one   71 71  C
4 one   72 76  B
5 one   76 83  D
6 two   60 64  B
7 two   65 81  B
8 two   82 82  D
9 two   83 83  E

## Now the conversion code using base R, especially by():


out <- by(d, d$station, function(x) with(x, {

+i <- to - from +1
+data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
+ }))



out <- data.frame(station =

rep(names(out),sapply(out,nrow)),do.call(rbind,out), row.names = NULL)



out

station YEAR RECORD
1  one   60  A
2  one   61  B
3  one   62  B
4  one   63  B
5  one   64  B
6  one   65  B
7  one   66  B
8  one   67  B
9  one   68  B
10 one   69  B
11 one   70  B
12 one   71  C
13 one   72  B
14 one   73  B
15 one   74  B
16 one   75  B
17 one   76  B
18 one   76  D
19 one   77  D
20 one   78  D
21 one   79  D
22 one   80  D
23 one   81  D
24 one   82  D
25 one   83  D
26 two   60  B
27 two   61  B
28 two   62  B
29 two   63  B
30 two   64  B
31 two   65  B
32 two   66  B
33 two   67  B
34 two   68  B
35 two   69  B
36 two   70  B
37 two   71  B
38 two   72  B
39 two   73  B
40 two   74  B
41 two   75  B
42 two   76  B
43 two   77  B
44 two   78  B
45 two   79  B
46 two   80  B
47 two   81  B
48 two   82  D
49 two   83  E

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Nov 25, 2017 at 4:49 PM, William Dunlap via R-help <
r-help@r-project.org> wrote:


dplyr may have something for this, but in base R I think the following

does

what you want.  I've shortened the name of your data set to 'd'.

i <- rep(seq_len(nrow(d)), d$YEAR_TO-d$YEAR_FROM+1)
j <- sequence(d$YEAR_TO-d$YEAR_FROM+1)
transform(d[i,], YEAR=YEAR_FROM+j-1, YEAR_FROM=NULL, YEAR_TO=NULL)


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Nov 25, 2017 at 11:18 AM, Hutchinson, David (EC) <

Re: [R] R_LIBS_USER not in libPaths

2017-09-16 Thread Martin Morgan


On 09/16/2017 11:29 AM, Rene J Suarez-Soto wrote:

I have not intentionally set R_LIBS_USER. I looked for an Renviron.site
file but did not see it in R/etc or my home directory. The strange part is
that if I print Sud.getenv I see a value for R_LIBS_USER. However, this
directory is not showing under libPaths.

I though .libPaths should contain R_LIBS_USER.


If the directory pointed to by R_LIBS_USER does not exist, then 
.libPaths() will not contain it. This is documented on ?.libPaths or 
?R_LIBS_USER


 Only directories
 which exist at the time will be included.

The file in the user home directory is .Renviron, rather than 
Renviron.site. This documented at, e.g,. ?Renviron


 The name of the user file
 can be specified by the 'R_ENVIRON_USER' environment variable; if
 this is unset, the files searched for are '.Renviron' in the
 current or in the user's home directory (in that order).

R environment variables are set when R starts; I can discover these, on 
linux, by invoking the relevant command-line command after running R CMD


$ env|grep "^R_"
$

(i.e., no output) versus

$ R CMD env|grep "^R_"
R_UNZIPCMD=/usr/bin/unzip
...

Generally, ?Startup describes the startup process, and most variables 
are described in R via ?R_...


Martin



I also noticed that R related variables are not in the system or user
variables because I dont see them when I type SET from the Windows Command
line. So a related question is where does R get the system variables
(e.g., R_LIBS_USER,
R_HOME) if I dont see a Renviron.site file. Thanks

On Sep 16, 2017 10:45 AM, "Henrik Bengtsson" 
wrote:

I'm not sure I follow what.the problem is. Are you trying to
set R_LIBS_USER but R does not acknowledge it, or do you observe something
in R that you didn't expect to be there and you are trying to figure out
why that is / where that happens?

Henrik

On Sep 16, 2017 07:10, "Rene J Suarez-Soto"  wrote:


I have a computer where R_LIBS_USER is not found in libPaths. This is for
Windows (x64). I ran R from the command line, RGui and RStudio and I get
the same results. I also ran R --vanilla and I still get the discrepancy.

The only thing I found interesting was that I also ran SET from the command
line and the "R related variables" (e.g.,  R_HOME; R_LIBS_USER) are not
there. Therefore these variables are being set when I start R. I have not
been able to track where does R obtain the value for these.

Aside from looking at
http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html I am
not sure I have much more information that I have found useful.

Thanks

R

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posti
ng-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to add make option to package compilation?

2017-09-15 Thread Martin Morgan


On 09/15/2017 08:57 AM, Michael Dewey wrote:

In line

On 15/09/2017 13:30, Martin Møller Skarbiniks Pedersen wrote:

On 15 September 2017 at 14:13, Duncan Murdoch 
wrote:


On 15/09/2017 8:11 AM, Martin Møller Skarbiniks Pedersen wrote:


Hi,

I am installing a lot of packages to a new R installation and it 
takes

a
long time.
However the machine got 4 cpus and most of the packages are 
written in

C/C++.

So is it possible to add a -j4 flag to the make command when I 
use the

install.packages() function?
That will probably speed up the package installation process 390%.



See the Ncpus argument in ?install.packages.



Thanks.

However it looks like Ncpus=4 tries to compile four R packages at the 
same

time using one cpu for each packages.


The variable MAKE is defined in ${R_HOME}/etc/Renviron, and can be 
over-written with ~/.Renviron


MAKE=make -j

There is further discussion in


https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Configuration-variables

and ?Renviron.

One could configure a source installation to always compile with make 
-j, something like ./configure MAKE="make -j"


Martin





But you said you had lots to install so would that not speed things up too?


 From the documentation:
"
Ncpus: the number of parallel processes to use for a parallel
   install of more than one source package.  Values greater than
   one are supported if the ‘make’ command specified by
   ‘Sys.getenv("MAKE", "make")’ accepts argument ‘-k -j Ncpus’
"

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

---
This email has been checked for viruses by AVG.
http://www.avg.com






This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in readRDS(dest) (was Re: Error with installed.packages with R 3.4.0 on Windows)

2017-05-31 Thread Martin Morgan


On 05/31/2017 04:38 AM, Patrick Connolly wrote:

On Tue, 23-May-2017 at 12:20PM +0200, Martin Maechler wrote:

[...]

|>
|> Given the above stack trace.
|> It may be easier to just do
|>
|> debugonce(available.packages)
|> install.packages("withr")
|>
|> and then inside available.packages, (using 'n') step to the
|> point _before_ the tryCatch(...) call happens; there, e.g. use
|>
|>   ls.str()
|>
|> which gives an str() of all your local objects, notably 'dest'
|> and 'method'.
|> but you can also try other things once inside
|> available.packages().

I couldn't see any differences between R-3.3.3 (which works) and
R-3.4.0 (which doesn't) until I got to here, a few lines before the
download.file line:

Browse[2]>
debug: dest <- file.path(tempdir(), paste0("repos_", URLencode(repos,
 TRUE), ".rds"))
Browse[2]>

When I check out those directories in a terminal, there's a big diffrence:

With R-3.4.0
~ > ll /tmp/RtmpFUhtpY
total 4
drwxr-xr-x 2 hrapgc hrapgc 4096 May 31 10:45 downloaded_packages/
-rw-r--r-- 1 hrapgc hrapgc0 May 31 10:56 
repos_http%3A%2F%2Fcran.stat.auckland.ac.nz%2Fsrc%2Fcontrib.rds




The file repos_http%3A%2F%2Fcran.stat.auckland.ac.nz%2Fsrc%2Fcontrib.rds 
was likely created earlier in your R session. Likely the download a few 
lines down


download.file(url = paste0(repos, "/PACKAGES.rds"),
  destfile = dest, method = method,
  cacheOK = FALSE, quiet = TRUE, mode = 
"wb")


'succeeded' but created a zero-length file.

You could try to troubleshoot this with something like the following, 
downloading to a temporary location


  dest = tempfile()
  url = "http://cran.stat.auckland.ac.nz/src/contrib/PACKAGES.rds;
  download.file(url, dest)
  file.size(dest)

If this succeeds (it should download a file of several hundred KB), then 
try adding the options method, cacheOK, quiet, mode to the 
download.file() call. 'method' can be determined when you are in 
available.packages while debugging; if R says that it is missing, then 
it will be assigned, in download.file, to either 
getOption("download.file.method") or (if the option is NULL or "auto") 
"libcurl".


If the download 'succeeds' but the temporary file created is 0 bytes, 
then it would be good to share the problematic command with us.


Martin Morgan


With R-3.3.3
~ > ll /tmp/RtmpkPgL3A
total 380
drwxr-xr-x 2 hrapgc hrapgc   4096 May 31 11:01 downloaded_packages/
-rw-r--r-- 1 hrapgc hrapgc   8214 May 31 11:01 libloc_185_3165c7f52d5fdf96.rds
-rw-r--r-- 1 hrapgc hrapgc 372263 May 31 11:01 
repos_http%3A%2F%2Fcran.stat.auckland.ac.nz%2Fsrc%2Fcontrib.rds

So, if I could figure out what makes *that* difference I could get
somewhere.  I see there's considerably extra code in the newer of the
two versions of available.packages() but being a bear with a small
brain, I can't figure out what differences should be expected.  I have
no idea what populates those 'dest' directories.

TIA




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error with installed.packages with R 3.4.0 on Windows

2017-05-22 Thread Martin Morgan


On 05/22/2017 05:10 AM, Patrick Connolly wrote:

On Fri, 28-Apr-2017 at 07:04PM +0200, peter dalgaard wrote:

|>
|> > On 28 Apr 2017, at 12:08 , Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
|> >
|> > On 28/04/2017 4:45 AM, Thierry Onkelinx wrote:
|> >> Dear Peter,
|> >>
|> >> It actually breaks install.packages(). So it is not that innocent.
|> >
|> > I don't think he meant that it is harmless, he meant that the fix is easy, 
and is in place in R-patched and R-devel.  You should use R-patched and you won't 
have the problem.
|>
|> Read more carefully: I said that the _fix_ is harmless for this case, but 
might not be so in general.
|>
|> -pd


Apparently it isn't harmless.


install.packages("withr")

Error in readRDS(dest) : error reading from connection


that seems like a plain-old network connectivity issue, or perhaps an 
issue with the CRAN mirror you're using. Can you debug on your end, e.g,.


  options(error=recover)
  install.packages("withr")
  ...

then select the 'frame' where the error occurs, look around

  ls()

find the value of 'dest', and e.g., try to open dest in your  browser.

Martin Morgan







sessionInfo()

R version 3.4.0 Patched (2017-05-19 r72713)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/hrapgc/local/R-patched/lib/libRblas.so
LAPACK: /home/hrapgc/local/R-patched/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_NZ.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_NZ.UTF-8LC_COLLATE=en_NZ.UTF-8
 [5] LC_MONETARY=en_NZ.UTF-8LC_MESSAGES=en_NZ.UTF-8
 [7] LC_PAPER=en_NZ.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grDevices utils stats graphics  methods   base

other attached packages:
[1] lattice_0.20-35

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0grid_3.4.0




Has anyone a workaround?




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] renameSeqlevels

2017-02-14 Thread Martin Morgan

Rsamtools and GenomicAlignments are Bioconductor packages so ask on the 
Bioconductor support site


  https://support.bioconductor.org

You cannot rename the seqlevels in the bam file; you could rename the 
seqlevels in the object(s) you have created from the bam file.


Martin

On 02/14/2017 09:17 AM, Teresa Tavella wrote:

Dear all,

I would like to ask if it is possible to change the seqnames of a bam file
giving a vector of character to the function renameSeqlevels. This is
because in order to use the fuction summarizeOverlap or count/find, the
seqnames have to match.

From the bamfile below I have extracted the locus annotations form the

seqnames (i.e ERCC2, NC_001133.9...etc) and I have created a list (same
length as the seqlevels of the bam file).


*bamfile*
GAlignments object with 6 alignments and 0 metadata columns:

seqnames


  [1]
DQ459430_gene=ERCC2_loc:ERCC2|1-1061|+_exons:1-1061_segs:1-1061
  [2]
DQ459430_gene=ERCC2_loc:ERCC2|1-1061|+_exons:1-1061_segs:1-1061
  [3]
DQ459430_gene=ERCC2_loc:ERCC2|1-1061|+_exons:1-1061_segs:1-1061
  [4]
DQ459430_gene=ERCC2_loc:ERCC2|1-1061|+_exons:1-1061_segs:1-1061
  [5]
DQ459430_gene=ERCC2_loc:ERCC2|1-1061|+_exons:1-1061_segs:1-1061
  [6]
DQ459430_gene=ERCC2_loc:ERCC2|1-1061|+_exons:1-1061_segs:1-1061
  strand   cigarqwidth start   end width njunc
 
  [1]  + 8M2D27M35  1025  106137 0
  [2]  + 8M2D27M35  1025  106137 0
  [3]  - 36M36  1025  106036 0
  [4]  - 36M36  1026  106136 0
  [5]  + 35M35  1027  106135 0
  [6]  + 35M35  1027  106135 0
  ---
*gffile*
GRanges object with 6 ranges and 12 metadata columns:
 seqnames   ranges strand |   source type score
   |   
  [1] NC_001133.9 [ 24837,  25070]  + | s_cerevisiae exon  
  [2] NC_001133.9 [ 25048,  25394]  + | s_cerevisiae exon  
  [3] NC_001133.9 [ 27155,  27786]  + | s_cerevisiae exon  
  [4] NC_001133.9 [ 73431,  73792]  + | s_cerevisiae exon  
  [5] NC_001133.9 [165314, 165561]  + | s_cerevisiae exon  
  [6] NC_001133.9 [165388, 165781]  + | s_cerevisiae exon  
  phase gene_id  transcript_id exon_number   gene_name
 
  [1]   XLOC_40 TCONS_0191   1FLO9
  [2]   XLOC_40 TCONS_0192   1FLO9
  [3]   XLOC_41 TCONS_0193   1FLO9
  [4]   XLOC_55 TCONS_0200   1   YAL037C-A
  [5]   XLOC_75 TCONS_0100   1 YAR010C
  [6]   XLOC_75 TCONS_0219   1 YAR010C
 oId nearest_ref  class_code
   
  [1]   {TRINITY_GG_normal}16_c1_g1_i1.mrna1rna8   x
  [2]   {TRINITY_GG_normal}16_c0_g1_i1.mrna1rna8   x
  [3]   {TRINITY_GG_normal}12_c0_g1_i1.mrna1rna8   x
  [4]{TRINITY_GG_normal}3_c3_g1_i1.mrna1   rna31   x
  [5] {TRINITY_GG_normal}3479_c0_g1_i1.mrna1   rna77   x
  [6]   {TRINITY_GG_normal}24_c0_g1_i1.mrna1   rna77   x
   tss_id
  
  [1]   TSS42
  [2]   TSS43
  [3]   TSS44
  [4]   TSS71
  [5]  TSS118
  [6]  TSS118
  ---

It is possible to replace the seqlevels names with the list?
I have tried:

bamfile1 <- renameSeqlevels(seqlevels(bamfile), listx)

Thank you for any advice,

Kind regards,

Teresa



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Using a mock of an S4 class

2017-02-02 Thread Martin Morgan


On 02/01/2017 02:46 PM, Ramiro Barrantes wrote:

Hello,

I have a function that applies to an S4 object which contains a slot called 
@analysis:

function calculation(myObject) {
  tmp <- myObjects@analysis
  result <- ...operations on analysis...
  return result
}

I am writing a unit test for this function.  So I was hoping to create a mock 
object but I can't figure out how to do it:

test_that("test calculation function", {
  mockMyObject<- mock(?)  #I am not sure what to put here
  r<-calculation(mockMyObject)
  expect_true(r,0.83625)
})

How can I create a mock S4 object??


I don't know of a convenient way to create a mock with functionality 
like mocks in other languages. But here's a class


  .A = setClass("A", contains="integer")

This creates an instance that might be used as a mock

   mock = .A()  # same as new("A")

but maybe you have an initialize method (initialize methods are very 
tricky to get correct, and many people avoid them, using 
plain-old-functions to form an API around object creation; the 
plain-old-function finishes by calling the constructor .A() or new("A")) 
that has side effects that are inappropriate for your test, mimicked 
here with stop()


  setMethod("initialize", "A", function(.Object, ...) stop("oops"))

our initial attempts are thwarted

> .A()
Error in initialize(value, ...) : oops

but we could reach into our bag of hacks and try

  mock = .Call(methods:::C_new_object, getClassDef("A"))

You would still need to populate slots / data used in your test, e.g.,

  slot(mock, ".Data") = 1:4

This is robust to any validity method, since the validity method is not 
invoked on direct slot assignment


  setValidity("A", function(object) {
  if (all(object > 0)) TRUE else "oops2"
  })

  slot(mock, ".Data") = 0:4  # still works

So something like

  mockS4object = function(class, ..., where=topenv(parent.frame())) {
  obj <- .Call(
  methods:::C_new_object,
  getClassDef(class, where=where)
  )

  args = list(...)
  for (nm in names(args))
  slot(obj, nm) = args[[nm]]

  obj
  }
  mockS4object("A", .Data=1:4)

Mock objects typically have useful testing properties, like returning 
the number of times a slot (field) is accessed. Unfortunately, I don't 
have anything to offer for that.


Martin




Thanks in advance,
Ramiro

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error In DESeq installation

2016-10-24 Thread Martin Morgan


On 10/23/2016 10:13 PM, Yogesh Gupta wrote:

Dear All,

I am getting error in DESeq installation in R.

package ‘DESeq’ is not available (for R version 3.3.1)

source("http://www.Bioconductor.org/biocLite.R;)

Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help

biocLite("BiocUpgrade")

Error: Bioconductor version 3.4 cannot be upgraded with R version 3.3.1

Can you suggest me I How I can resolve it.


Ask questions about Bioconductor packages on the Bioconductor support forum

 https://support.bioconductor.org

DESeq was replaced by DESeq2, but is still available; provide (on the 
Bioconductor support site) the complete output of the installation 
attempt and sessionInfo().


'BiocUpgrade' is to update to a more recent version of Bioconductor. 
There is a 'devel' version that is m ore recent that 3.4, but it 
requires R-devel.


Martin



Thanks
Yogesh


*Yogesh Gupta*
*Postdoctoral Researcher*
*Department of Biological Science*
*Seoul National University*
*Seoul, South Korea*
web) http://biosci.snu.ac.kr/jiyounglee
*Cell No. +82-10-6453-0716*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Faster Subsetting

2016-09-28 Thread Martin Morgan

On 09/28/2016 02:53 PM, Hervé Pagès wrote:

Hi,

I'm surprised nobody suggested split(). Splitting the data.frame
upfront is faster than repeatedly subsetting it:

  tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
  idList <- unique(tmp$id)

  system.time(for (i in idList) tmp[which(tmp$id == i),])
  #   user  system elapsed
  # 16.286   0.000  16.305

  system.time(split(tmp, tmp$id))
  #   user  system elapsed
  #  5.637   0.004   5.647

an odd speed-up is to provide (non-sequential) row names, e.g.,

> system.time(split(tmp, tmp$id))
   user  system elapsed
  4.472   0.648   5.122
> row.names(tmp) = rev(seq_len(nrow(tmp)))
> system.time(split(tmp, tmp$id))
   user  system elapsed
  0.588   0.000   0.587

for reasons explained here

http://stackoverflow.com/questions/39545400/why-is-split-inefficient-on-large-data-frames-with-many-groups/39548316#39548316

Martin

Cheers,
H.

On 09/28/2016 09:09 AM, Doran, Harold wrote:

I have an extremely large data frame (~13 million rows) that resembles
the structure of the object tmp below in the reproducible code. In my
real data, the variable, 'id' may or may not be ordered, but I think
that is irrelevant.

I have a process that requires subsetting the data by id and then
running each smaller data frame through a set of functions. One
example below uses indexing and the other uses an explicit call to
subset(), both return the same result, but indexing is faster.

Problem is in my real data, indexing must parse through millions of
rows to evaluate the condition and this is expensive and a bottleneck
in my code.  I'm curious if anyone can recommend an improvement that
would somehow be less expensive and faster?

Thank you
Harold

tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))

idList <- unique(tmp$id)

### Fast, but not fast enough
system.time(replicate(500, tmp[which(tmp$id == idList[1]),]))

### Not fast at all, a big bottleneck
system.time(replicate(500, subset(tmp, id == idList[1])))

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] src/Makevars ignored ?

2016-09-27 Thread Martin Morgan


On 09/26/2016 07:46 AM, Eric Deveaud wrote:



Hello,

as far as I understood the R library generic compilation mechanism,
compilation of C//C++ sources is controlde

1) at system level by the ocntentos RHOME/etc/Makeconf
2) at user level by the content of ~/.R/Makevars
3) at package level by the content of src/Makevars

Problem I have is that src/Makevars is ignored


see following example:

R is compiled and use the following CC and CFLAGS definition

bigmess:epactsR/src > R CMD config CC
gcc -std=gnu99
bigmess:epactsR/src > R CMD config CFLAGS
-Wall -g

so building C sources lead to the following

bigmess:epactsR/src > R CMD SHLIB index.c
gcc -std=gnu99 -I/local/gensoft2/adm/lib64/R/include -DNDEBUG
-I/usr/local/include-fpic  -Wall -g  -c index.c -o index.o

normal, it uses defintion from RHOME/etc/Makeconf


when I set upp a ~/.R/Makevars that overwrite CC and CFLAGS definition.

bigmess:epactsR/src > cat ~/.R/Makevars
CC=gcc
CFLAGS=-O3
bigmess:epactsR/src > R CMD SHLIB index.c
gcc -I/local/gensoft2/adm/lib64/R/include -DNDEBUG  -I/usr/local/include
   -fpic  -O3 -c index.c -o index.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o index.so index.o


OK CC and CFLAGS are honored and set accordingly to ~/.R/Makevars


but when I try to use src/Makevars, it is ignored

bigmess:epactsR/src > cat ~/.R/Makevars
cat: /home/edeveaud/.R/Makevars: No such file or directory
bigmess:epactsR/src > cat ./Makevars
CC = gcc
CFLAGS=-O3
bigmess:epactsR/src > R CMD SHLIB index.c
gcc -std=gnu99 -I/local/gensoft2/adm/lib64/R/include -DNDEBUG
-I/usr/local/include-fpic  -Wall -g  -c index.c -o index.o


what I have missed or is there something wrong ?


Use PKG_CFLAGS instead of CFLAGS; CC cannot be changed in Makevars. See 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-Makevars


Martin Morgan




PS I tested the ssame behaviour with various version of R from R/2.15 to
R/3.3

best regards

Eric

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] makePSOCKcluster launches different version of R

2016-08-05 Thread Martin Morgan


On 08/05/2016 12:07 PM, Guido Kraemer wrote:

Hi everyone,

we are running R on a Linux Cluster with several R versions installed in
parallel.

If I run:

library(parallel)
cl <- makePSOCKcluster(
  rep('nodeX', 24),
  homogeneous = FALSE,
  rscript = '/usr/local/apps/R/R-3.2.2/bin/Rscript'
)


from ?makePSOCKcluster

 'homogeneous' Logical.  Are all the hosts running identical
  setups, so 'Rscript' can be launched using the same path on
  each?  Otherwise 'Rscript' has to be in the default path on
  the workers.

 'rscript' The path to 'Rscript' on the workers, used if
  'homogeneous' is true. Defaults to the full path on the
  master.

so homogeneous = FALSE and rscript = ... are incompatible. From your 
description it seems like you mean homogeneous = TRUE.


Martin



then still R-3.0.0 gets launched on nodeX. Version 3.0.0 is the default
R version, which is started when I just type R in the terminal without
any further configuration.

Cheers,
Guido

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Get the location of a numeric element in a list

2016-06-28 Thread Martin Morgan


On 06/28/2016 03:03 AM, Mohammad Tanvir Ahamed via R-help wrote:

Can any one please help me. I will apply this for a very large list, about 400k 
vector in a list and vector size is unequal and large

Example :
Input:
a <- c(1,3,6,9,25,100)
b<-c(10,7,20,2,25)
c<-c(1,7,5,15,25,300,1000)
d<-list(a,b,c)

Expected outcome :
# When looking for 1 in d
c(1,3)

# When looking for 7 in d

c(2,3)

# when looking for 25 in d
c(1,2,3)
# When looking for 50 in d
NULL or 0


Make a vector of queries

queries = c(1, 7, 25, 50)

Create a factor of unlist(d), using queries as levels. Create a vector 
rep(seq_along(d), lengths(d)), and split it into groups defined by f


f = factor(unlist(d, use.names=FALSE), levels=queries)
split(rep(seq_along(d), lengths(d)), f)

Martin Morgan




Thanks in advance !!





Tanvir Ahamed
Göteborg, Sweden  |  mashra...@yahoo.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Warning when running R - can't install packages either

2016-05-13 Thread Martin Morgan


Hi Jakub,

This is really a separate question. It is not really end-user related, 
and should be asked on the R-devel mailing list. Nonetheless, some 
answers below.


On 05/13/2016 03:55 PM, Jakub Jirutka wrote:

Hi,

I’m maintainer of the R package in Alpine Linux.

I read on multiple places that some packages needs R_HOME variable
set to the location where is R installed, so I’ve added it to the
system-wide profile. Is this correct, or a misinformation?


R_HOME is set when R starts

~$ env|grep R_HOME
~$ R --vanilla -e "Sys.getenv('R_HOME')"
> Sys.getenv('R_HOME')
[1] "/home/mtmorgan/bin/R-3-3-branch"

and (after reading the documentation in ?R_HOME it the R help system)

~$ R RHOME
/home/mtmorgan/bin/R-3-3-branch

so there is no need to set it in a system-wide profile. It is sometimes 
referenced inside an R package source tree that uses C or other compiled 
code in a Makevars file, as described in the 'Writing R Extensions' manual


https://cran.r-project.org/doc/manuals/r-release/R-exts.html
e.g., the section on configure and cleanup

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Configure-and-cleanup

In these circumstances it has been set by the R process that is 
compiling the source code.




What system dependencies does R need to compile modules from CRAN? On
Alpine the following dependencies are needed to build R: bzip2-dev
curl-dev gfortran lapack-dev pcre-dev perl readline-dev xz-dev
zlib-dev. Are all of these dependencies needed for compiling
modules?


As you say, those look like dependencies required to build R itself.

Individual packages may have dependencies on these or other system 
libraries, but many packages do not have system dependencies. It is up 
to the package maintainer to ensure that appropriate checks are made to 
discover the system resource; there are probably dozens or even hundreds 
of system dependencies amongst all of the CRAN packages. Typically the 
task of satisfying those dependencies is left to the user (or to those 
creating distributions of R packages, e.g., 
https://cran.r-project.org/bin/linux/debian/)


Martin Morgan



Jakub

On 13. May 2016, at 11:31, Martin Morgan
<martin.mor...@roswellpark.org> wrote:




On 05/12/2016 10:25 PM, Alba Pompeo wrote:

Martin Morgan, I tried an HTTP mirror and it worked. What could
be the problem and how to fix? Also, should I ignore the warning
about ignoring environment value of R_HOME?


It depends on why you set the value in your environment in the
first place; maybe you were trying to use a particular installation
of R, but setting R_HOME is not the way to do that (I use an alias,
e.g., R-3.3='~/bin/R-3-3-branch/bin/R --no-save --no-restore
--silent')

Martin


Thanks.

On Thu, May 12, 2016 at 5:59 PM, Tom Hopper <tomhop...@gmail.com>
wrote:

setInternet2() first thing after launching R might fix that.



On May 12, 2016, at 07:45, Alba Pompeo <albapom...@gmail.com>
wrote:

Hello.

I've tried to run R, but I receive many warnings and can't do
simple stuff such as installing packages.

Here's the full log when I run it.

http://pastebin.com/raw/2BkNpTte

Does anyone know what could be wrong here?

Thanks a lot.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide
http://www.R-project.org/posting-guide.html and provide
commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,
see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read
the posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or
confidential information.  If you are not the intended
recipient(s), or the employee or agent responsible for the delivery
of this message to the intended recipient(s), you are hereby
notified that any disclosure, copying, distribution, or use of this
email message is prohibited.  If you have received this message in
error, please notify the sender immediately by e-mail and delete
this email message from your computer. Thank you.





This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Warning when running R - can't install packages either

2016-05-13 Thread Martin Morgan




On 05/12/2016 10:25 PM, Alba Pompeo wrote:

Martin Morgan, I tried an HTTP mirror and it worked.
What could be the problem and how to fix?
Also, should I ignore the warning about ignoring environment value of R_HOME?


It depends on why you set the value in your environment in the first 
place; maybe you were trying to use a particular installation of R, but 
setting R_HOME is not the way to do that (I use an alias, e.g., 
R-3.3='~/bin/R-3-3-branch/bin/R --no-save --no-restore --silent')


Martin


Thanks.

On Thu, May 12, 2016 at 5:59 PM, Tom Hopper <tomhop...@gmail.com> wrote:

setInternet2() first thing after launching R might fix that.



On May 12, 2016, at 07:45, Alba Pompeo <albapom...@gmail.com> wrote:

Hello.

I've tried to run R, but I receive many warnings and can't do simple
stuff such as installing packages.

Here's the full log when I run it.

http://pastebin.com/raw/2BkNpTte

Does anyone know what could be wrong here?

Thanks a lot.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Warning when running R - can't install packages either

2016-05-13 Thread Martin Morgan

On 05/12/2016 10:25 PM, Alba Pompeo wrote:

Martin Morgan, I tried an HTTP mirror and it worked.
What could be the problem and how to fix?

The problem is in the warning message

1: In download.file(url, destfile = f, quiet = TRUE) :
URL 'https://cran.r-project.org/CRAN_mirrors.csv': status was 'Problem 
with the SSL CA cert (path? access rights?)'

and an easier way to reproduce / troubleshoot the problem is

download.file("https://cran.r-project.org/CRAN_mirrors.csv;, 
tempfile())

The details of this process are described in ?download.file. My guess 
would be that you have 'libcurl' available

> capabilities()["libcurl"]
libcurl
   TRUE

that it supports https (mine does, in the protocol attribute):

> libcurlVersion()
[1] "7.35.0"
attr(,"ssl_version")
[1] "OpenSSL/1.0.1f"
attr(,"libssh_version")
[1] ""
attr(,"protocols")
 [1] "dict"   "file"   "ftp""ftps"   "gopher" "http"   "https" 
"imap"
 [9] "imaps"  "ldap"   "ldaps"  "pop3"   "pop3s"  "rtmp"   "rtsp" 
"smtp"

[17] "smtps"  "telnet" "tftp"

and that you have outdated or other CA certificates problem, with some 
hints for troubleshooting in the first and subsequent paragraphs of the 
'Secure URL' section.

Martin Morgan

Also, should I ignore the warning about ignoring environment value of R_HOME?
Thanks.

On Thu, May 12, 2016 at 5:59 PM, Tom Hopper <tomhop...@gmail.com> wrote:

setInternet2() first thing after launching R might fix that.

On May 12, 2016, at 07:45, Alba Pompeo <albapom...@gmail.com> wrote:

Hello.

I've tried to run R, but I receive many warnings and can't do simple
stuff such as installing packages.

Here's the full log when I run it.

http://pastebin.com/raw/2BkNpTte

Does anyone know what could be wrong here?

Thanks a lot.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Warning when running R - can't install packages either

2016-05-12 Thread Martin Morgan




On 05/12/2016 07:45 AM, Alba Pompeo wrote:

Hello.

I've tried to run R, but I receive many warnings and can't do simple
stuff such as installing packages.

Here's the full log when I run it.

http://pastebin.com/raw/2BkNpTte

Does anyone know what could be wrong here?


do you have any success when choosing a non-https mirror, #28 in your 
screenshot?


Martin Morgan



Thanks a lot.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or...{{dropped:2}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] S4 non-virtual class with no slots?

2016-04-22 Thread Martin Morgan




On 04/22/2016 04:38 PM, Boylan, Ross wrote:

It seems that if an S4 class has no slots it can't be instantiated because it 
is assumed to be virtual.  Is there a way around this other than adding a 
do-nothing slot?  A singleton would be OK, though is not essential.

Problem:
EmptyFitResult <- setClass("EmptyFitResult", representation=representation())
# also tried it without the second argument.  same result.
  > e <- EmptyFitResult()
  Error in new("EmptyFitResult", ...) :
trying to generate an object from a virtual class ("EmptyFitResult")

This in R 3.1.1.

Context:
I fit simulated data; in some simulations none survive to the second stage of 
fitting.  So I just need a way to record that this happened, in a way that 
integrates with my other non-null results.



A not too artificial solution is to create a base class, with derived 
classes corresponding to stateless or stateful conditions


Base = setClass("Base"); A = setClass("A", contains="Base"); A()

Martin Morgan


Thanks.
Ross Boylan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] what is the faster way to search for a pattern in a few million entries data frame ?

2016-04-10 Thread Martin Morgan

On 04/10/2016 03:27 PM, Fabien Tarrade wrote:

Hi Duncan,

Didn't you post the same question yesterday?  Perhaps nobody answered
because your question is unanswerable.

sorry, I got a email that my message was waiting for approval and when I
look at the forum I didn't see my message and this is why  I sent it
again and this time I did check that the format of my message was text
only. Sorry for the noise.

You need to describe what the strings are like and what the patterns
are like if you want advice on speeding things up.

my strings are 1-gram up to 5-grams (sequence of 1 work up to 5 words)
and I am searching for the frequency in my DF of the strings starting
with a sequence of few words.

I guess these days it is standard to use DF with millions of entries so
I was wondering how people are doing that in the faster way.

I did this to generate and search 40 million unique strings

> grams <- as.character(1:4e7)## a long time passes...
> system.time(grep("^91", grams)) ## similar times to grepl
   user  system elapsed
 10.384   0.168  10.543

Is that the basic task you're trying to accomplish? grep(l) goes quickly 
to C, so I don't think data.table or other will be markedly faster if 
you're looking for an arbitrary regular expression (use fixed=TRUE if 
looking for an exact match).

If you're looking for strings that start with a pattern, then in R-3.3.0 
there is

> system.time(res0 <- startsWith(grams, "91"))
   user  system elapsed
  0.658   0.012   0.669

which returns the same result as grepl

> identical(res0, res1 <- grepl("^91", grams))
[1] TRUE

One can also parallelize the already vectorized grepl function with 
parallel::pvec, with some opportunity for gain (compared to grepl) on 
non-Windows

> system.time(res2 <- pvec(seq_along(grams), function(i) 
grepl("^91", grams[i]), mc.cores=8))

   user  system elapsed
 24.996   1.709   3.974
> identical(res0, res2)
[[1]] TRUE

I think anything else would require pre-processing of some kind, and 
then some more detail about what your data looks like is required.

Martin Morgan

Thanks
Cheers
Fabien

This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Ask if an object will respond to a function or method

2016-03-31 Thread Martin Morgan




On 03/31/2016 04:00 PM, Paul Johnson wrote:

In the rockchalk package, I want to provide functions for regression
objects that are "well behaved." If an object responds to the methods
that lm or glm objects can handle, like coef(), nobs(), and summary(),
I want to be able to handle the same thing.

It is more difficult than expected to ask a given fitted model object
"do you respond to these functions: coef(), nobs(), summary()." How
would you do it?

I tried this with the methods() function but learned that all methods
that a class can perform are not listed.  I'll demonstrate with a
regression "zz" that is created by the example in the plm package.
The coef() function succeeds on the zz object, but coef is not listed
in the list of methods that the function can carry out.


library(plm)
example(plm)



class(zz)

[1] "plm""panelmodel"

methods(class = "plm")

  [1] ercomp  fixef   has.intercept   model.matrix
  [5] pFtest  plmtest plotpmodel.response
  [9] pooltestpredict residuals   summary
[13] vcovBK  vcovDC  vcovG   vcovHC
[17] vcovNW  vcovSCC
see '?methods' for accessing help and source code

methods(class = "panelmodel")

  [1] deviance  df.residual   fittedhas.intercept index
  [6] nobs  pbgtest   pbsytest  pcdtest   pdim
[11] pdwtest   phtestprint pwartest  pwfdtest
[16] pwtestresiduals terms updatevcov
see '?methods' for accessing help and source code

coef(zz)

log(pcap)  log(pc) log(emp)unemp
-0.026149654  0.292006925  0.768159473 -0.005297741

I don't understand why coef(zz) succeeds but coef is not listed as a method.


coef(zz) finds stats:::coef.default, which happens to do the right thing 
for zz but also 'works' (returns without an error) for things that don't 
have coefficients, e.g., coef(data.frame()).


stats:::coef.default is

> stats:::coef.default
function (object, ...)
object$coefficients

Maybe fail on use, rather than trying to guess up-front that the object 
is fully appropriate?


Martin Morgan



Right now, I'm contemplating this:

zz1 < - try(coef(zz))
if (inherits(zz1, "try-error")) stop("Your model has no coef method")

This seems like a bad workaround because I have to actually run the
function in order to find out if the function exists. That might be
time consuming for some summary() methods.

pj




This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Persistent state in a function?

2016-03-23 Thread Martin Morgan

Use a local environment to as a place to store state. Update with <<- 
and resolve symbol references through lexical scope E.g.,

persist <- local({
last <- NULL# initialize
function(value) {
if (!missing(value))
last <<- value  # update with <<-
last# use
}
})

and in action

> persist("foo")
[1] "foo"
> persist()
[1] "foo"
> persist("bar")
[1] "bar"
> persist()
[1] "bar"

A variant is to use a 'factory' function

factory <- function(init) {
stopifnot(!missing(init))
last <- init
function(value) {
if (!missing(value))
last <<- value
last
}
}

and

> p1 = factory("foo")
> p2 = factory("bar")
> c(p1(), p2())
[1] "foo" "bar"
> c(p1(), p2("foo"))
[1] "foo" "foo"
> c(p1(), p2())
[1] "foo" "foo"

The 'bank account' exercise in section 10.7 of RShowDoc("R-intro") 
illustrates this.

Martin

On 03/19/2016 12:45 PM, Boris Steipe wrote:

Dear all -

I need to have a function maintain a persistent lookup table of results for an 
expensive calculation, a named vector or hash. I know that I can just keep the 
table in the global environment. One problem with this approach is that the 
function should be able to delete/recalculate the table and I don't like 
side-effects in the global environment. This table really should be private. 
What I don't know is:
  -A- how can I keep the table in an environment that is private to the 
function but persistent for the session?
  -B- how can I store and reload such table?
  -C- most importantly: is that the right strategy to initialize and maintain 
state in a function in the first place?

For illustration ...

---

myDist <- function(a, b) {
 # retrieve or calculate distances
 if (!exists("Vals")) {
 Vals <<- numeric() # the lookup table for distance values
# here, created in the global env.
 }
 key <- sprintf("X%d.%d", a, b)
 thisDist <- Vals[key]
 if (is.na(thisDist)) {  # Hasn't been calculated yet ...
 cat("Calculating ... ")
 thisDist <- sqrt(a^2 + b^2) # calculate with some expensive function 
...
 Vals[key] <<- thisDist  # store in global table
 }
 return(thisDist)
}

# run this
set.seed(112358)

for (i in 1:10) {
 x <- sample(1:3, 2)
 print(sprintf("d(%d, %d) = %f", x[1], x[2], myDist(x[1], x[2])))
}

Thanks!
Boris

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] regex - extracting src url

2016-03-22 Thread Martin Morgan




On 03/22/2016 12:44 AM, Omar André Gonzáles Díaz wrote:

Hi,I have a DF with a column with "html", like this:

https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?;
BORDER="0" HEIGHT="1" WIDTH="1" ALT="Advertisement">


I need to get this:


https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=
?


I've got this so far:


https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?\;
BORDER=\"0\" HEIGHT=\"1\" WIDTH=\"1\" ALT=\"Advertisement


With this is the code I've used:

carreras_normal$Impression.Tag..image. <-
gsub("","\\1",carreras_normal$Impression.Tag..image.,
   ignore.case = T)



*But I still need to use get rid of this part:*


https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=
?*\" BORDER=\"0\" HEIGHT=\"1\" WIDTH=\"1\" ALT=\"Advertisement*


Thank you for your help.


You're querying an xml string, so use xpath, e.g., via the XML library

> as.character(xmlParse(y)[["//IMG/@SRC"]])
[1] 
"https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?;


`xmlParse()` translates the character string into  an XML document. `[[` 
subsets the document to extract a single element. "//IMG/@SRC" follows 
the xpath specification (this section 
https://www.w3.org/TR/xpath-31/#abbrev of the specification provides a 
quick guide) to find, starting from the 'root' of the document, a node, 
at any depth, labeled IMG containing an attribute labeled SRC.


A variation, if there were several IMG tags to be extracted, would be

  xpathSApply(xmlParse(y), "//IMG/@SRC", as.character)



Omar Gonzáles.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R problem : Error: protect(): protection stack overflow

2016-03-15 Thread Martin Morgan




On 03/14/2016 06:39 PM, Mohammad Tanvir Ahamed via R-help wrote:

Hi, i got an error while i am running a big data. Error has explained
by the following sample sample


This is an error in the package, and should be reported to the 
maintainer. Discover the maintainer with the command


maintainer("impute")






## Load data mdata <-
as.matrix(read.table('https://gubox.box.com/shared/static/qh4spcxe2ba5ymzjs0ynh8n8s08af7m0.txt',
header = TRUE, check.names = FALSE, sep = '\t'))

## Install and load library
source("https://bioconductor.org/biocLite.R;) biocLite("impute")
library(impute)

## sets a limit on the number of nested expressions
options(expressions = 50) ## Apply k-nearest neighbors for
missing value imputation

res <-impute.knn(mdata)

Error: protect(): protection stack overflow


If anybody has solution or suggestion, please share. Thanks .



Tanvir Ahamed Göteborg, Sweden  |  mashra...@yahoo.com

__ R-help@r-project.org
mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
posting guide http://www.R-project.org/posting-guide.html and provide
commented, minimal, self-contained, reproducible code.




This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is there orphan code in seq.default?

2015-08-15 Thread Martin Morgan


On 08/15/2015 02:01 PM, David Winsemius wrote:

I was looking at the code in seq.default and saw code that I think would throw 
an error if it were ever executed, although it will not because there is first 
a test to see if one of its arguments is missing. Near the end of the function 
body is this code:

 else if (missing(by)) {
 if (missing(to))
 to - from + length.out - 1L
 if (missing(from))
 from - to - length.out + 1L
 if (length.out  2L)
 if (from == to)
 rep.int(from, length.out)
 else as.vector(c(from, from + seq_len(length.out -
 2L) * by, to))

Notice that the last call to `else` would be returning a value calculated with 
'by' which was already established as missing.



missing arguments can have default values

 f = function(by=sea) if (missing(by)) by
 f()
[1] sea

which is the case for seq.default

 args(seq.default)
function (from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)


Martin Morgan
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Release schedule (was (no subject) )

2015-08-05 Thread Martin Morgan


On 08/05/2015 10:08 AM, Jeff Newmiller wrote:

New versions are released when they are ready. This is volunteer-driven 
software.


From https://developer.r-project.org/ :

The overall release schedule is to have annual x.y.0 releases in Spring, with 
patch releases happening on an as-needed basis. It is intended to have a final 
patch release of the previous version shortly before the next major release.



---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.

On August 5, 2015 5:55:21 AM EDT, Djossè Parfait djosseparf...@gmail.com 
wrote:

Good morning,

I would like to know how often per year is a new full version release
of R.



Thanks

--
Djossè Parfait BODJRENOU
Chef de la Division Centralisation et Analyse des Données
Statistiques /DPP/MESFTPRIJ

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] writing binary data from RCurl and postForm

2015-08-05 Thread Martin Morgan


On 08/05/2015 11:52 AM, Greg Donald wrote:

I'm using RCurl with postForm() to post to a URL that responds with a
PDF.  I cannot figure out how to write the resulting PDF data to a
file without corruption.

result = postForm(url, binary=TRUE)

Both this:

capture.output(result, file='/tmp/export.pdf')

and this:

f = file('/tmp/export.pdf', 'wb')
write(result, f)
close(f)

result in a corrupted PDF.

I also tried postForm without binary=TRUE but that halts execution
with an embedded nul in string error.

I also tried writeBin() but that complains about my result not being a vector.


I think that is because the value returned from postForm has an attribute; 
remove it by casting the return to a vector


  fl - tempfile(fileext=.pdf)
  writeBin(as.vector(postForm(url, binary=TRUE)), fl)


The httr package might also be a good bet

  writeBin(content(POST(url)), fl)



I can use curl on the command line and this works fine, but I need to
get this working in R.  Any help would be greatly appreciated.

Thanks.

--
Greg Donald

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error when compiling R-2.5.1 / *** [d-p-q-r-tests.Rout] Fehler 1

2015-08-01 Thread Martin Morgan


On 07/31/2015 10:48 PM, Joerg Kirschner wrote:

Hi everyone,
I am new to Linux and R - but I managed to build R-2.5.1 from source to use
it in Genepattern. Genepattern does only support R-2.5.1 which I could not
find anywhere for installation via apt-get or in the Ubuntu Software-Centre
(I am using Ubuntu 14.04 (Trusty Tahr) 32-bit)


Are you sure you want to do this? R 2.5.1 is from 2007, which is a very long 
time ago. It seems like GenePattern is not restricted to R-2.5.1,



http://www.broadinstitute.org/cancer/software/genepattern/administrators-guide#using-different-versions-of-r

and if their default distribution uses it, then I'm not sure I'd recommend using 
GenePattern for new analysis! (Maybe you're trying to re-do a previous analysis?)


Since GenePattern modules that use R typically wrap individual CRAN or 
Bioconductor (http://bioconductor.org) packages, maybe you can take out the 
middleman ?


Martin Morgan



But after doing

make check


I get

comparing 'method-dispatch.Rout' to './method-dispatch.Rout.save' ... OK
running code in 'd-p-q-r-tests.R' ...make[3]: *** [d-p-q-r-tests.Rout]
Fehler 1
make[3]: Verzeichnis »/home/karin/Downloads/R-2.5.1/tests« wird verlassen
make[2]: *** [test-Specific] Fehler 2
make[2]: Verzeichnis »/home/karin/Downloads/R-2.5.1/tests« wird verlassen
make[1]: *** [test-all-basics] Fehler 1
make[1]: Verzeichnis »/home/karin/Downloads/R-2.5.1/tests« wird verlassen
make: *** [check] Fehler 2


but I can make install and use R for simple plots etc. afterwards - still I
am worried something is wrong, can you give some advice.

A closer look at the error gives


## PR#7099 : pf() with large df1 or df2:
nu - 2^seq(25,34, 0.5)
y - 1e9*(pf(1,1,nu) - 0.68268949)
stopifnot(All.eq(pf(1,1,Inf), 0.68268949213708596),

+   diff(y)  0, # i.e. pf(1,1, *) is monotone increasing
+   All.eq(y [1], -5.07420372386491),
+   All.eq(y[19],  2.12300110824515))
Error: All.eq(y[1], -5.07420372386491) is not TRUE
Execution halted


As I understand so far some errors are critical some are not - can you
please give some advice on the error above? Can I still use R installed
with that error? What do I need to solve the error?

Thanks, Joerg

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] S4 / operator [ : Compatibility issue between lme4 and kml

2015-06-06 Thread Martin Morgan


On 06/05/2015 10:52 AM, Martin Maechler wrote:

Christophe Genolini cgeno...@u-paris10.fr
 on Fri, 5 Jun 2015 00:36:42 -0700 writes:


  Hi all,
  There is a compatibility issue between the package 'lme4' and my package
  'kml'. I define the [ operator. It works just fine in my package (1). 
If I
  try to use the lme4 package, then it does no longer work (2). Moreover, 
it
  has some kind of strange behavior (3). Do you know what is wrong? Any 
idea
  of how I can correct that?

  Here is a reproductible example, and the same code with the result 
follows.

  Thanks for your help
  Christophe

   [ ... I'm providing slightly different code below  ]


--- 8 - Execution of the previous code ---



library(kml)

Le chargement a nécessité le package : clv
Le chargement a nécessité le package : cluster
Le chargement a nécessité le package : class
Le chargement a nécessité le package : longitudinalData
Le chargement a nécessité le package : rgl
Le chargement a nécessité le package : misc3d

dn - gald(1)



  ###
### (1) the [ operator works just fine



dn[traj]

   t0   t1t2t3t4   t5   t6t7t8t9   t10
i1 -3.11 4.32  2.17  1.82  4.90 7.34 0.83 -2.70  5.36  4.96  3.16
i2 -7.11 1.40 -2.40 -2.96  4.31 0.50 1.25  0.52 -0.04  7.55  5.50
i3  2.80 6.23  6.08  2.87  2.58 2.88 6.58 -2.38  2.30 -1.74 -3.23
i4  2.24 0.91  6.50 10.92 11.32 7.79 7.78 10.69  9.15  1.07 -0.51



  ###
### (2) using 'lme4', it does no longer work



library(lme4)

Le chargement a nécessité le package : Matrix
Le chargement a nécessité le package : Rcpp

dn[traj]

Error in x[i, j] :
   erreur d'évaluation de l'argument 'j' lors de la sélection d'une méthode
pour la fonction '[' : Erreur : l'argument j est manquant, avec aucune
valeur par défaut



  ###
### (3) If I define again the [, it does not work the first time I call
it, but it work the second time!

setMethod([,

+   signature=signature(x=ClusterLongData, i=character, j=ANY,drop=ANY),
+   definition=function (x, i, j=missing, ..., drop = TRUE){


Your file has two definitions of

  setMethod([, c(ClusterLongData, ...

I deleted the first one.

The second definition had

signature=signature(x=ClusterLongData, i=character, j=ANY,drop=ANY),

whereas probably you mean to say that you'll handle

signature=signature(x=ClusterLongData, i=character,
j=missing, drop=ANY)

The next line says

definition=function (x, i, j=missing, ..., drop = TRUE){

which provides a default value for 'j' when j is not provided by the user. Thus 
later when you say


   x[i, j]

you are performing dn[traj, missing] when probably you meant

  x[i, , drop=drop]

Making these changes, so the definition is

setMethod(
[,
signature=signature(x=ClusterLongData, i=character, j=missing,
  drop=ANY),
definition=function (x, i, j, ..., drop = TRUE){
if (is.numeric(i)) {
stop([ClusterLongData:getteur]: to get a clusters list, use 
['ci'])
}else{}
if (i %in% c(criterionValues, criterionValuesAsMatrix)){
j - x['criterionActif']
}else{}
if (i %in% c(CRITERION_NAMES, criterionActif, CLUSTER_NAMES,
 criterionValues, criterionValuesAsMatrix, sorted,
 initializationMethod)) {
x - as(x, ListPartition)
}else{
x - as(x, LongData)
}
x[i, , drop=drop]
})

Allows operations to work correctly.

 library(kml)
Loading required package: clv
Loading required package: cluster
Loading required package: class
Loading required package: longitudinalData
Loading required package: rgl
Loading required package: misc3d
 library(Matrix)
 x = gald(1)[traj]
 x
  t0t1t2t3t4t5t6t7t8t9   t10
i1 -3.18 -1.19 -1.17  1.56 -0.70  1.78 -0.95 -2.00 -5.05  1.05  2.84
i2  3.51  1.72  6.97  6.09  7.81  8.33  9.54 14.38 16.14 12.82 13.86
i3  9.60 11.59  9.09  6.31  9.24  7.69  4.26 -0.80  2.70  1.63  1.21
i4 -0.54  3.80  6.05 10.41 12.60 12.32 10.33 11.05  7.89  5.21  0.67

It's hard to tell whether is an issue with the methods package, or just that 
Matrix offered a better nearest 'method' than those provided by kml / 
longitudinalData.





+   x - as(x, LongData)
+   return(x[i, j])
+ }
+ )
[1] [



### No working the first time I use it

dn[traj]

Error in dn[traj] :
   l'argument j est manquant, avec aucune valeur par défaut



### But working the second time

dn[traj]

   t0   t1t2t3t4   t5   t6t7t8t9   t10
i1 -3.11 4.32  2.17  1.82  4.90 7.34 0.83 -2.70  5.36  4.96  3.16
i2 -7.11 1.40 -2.40 -2.96  4.31 0.50 1.25  0.52 -0.04  7.55  5.50
i3  2.80 6.23  6.08  2.87  2.58 2.88 6.58 -2.38  2.30 -1.74 -3.23
i4  2.24 0.91  6.50 10.92 11.32 7.79 7.78 10.69  9.15  1.07 -0.51


I have made some investigations, but have to stop for now, and
leave this hopefully to

Re: [R] is.na for S4 object

2015-06-04 Thread Martin Morgan


On 06/04/2015 10:08 AM, cgenolin wrote:

Hi the list,

I have a variable y that is either NA or some S4 object. I would like to
know in which case I am, but it seems taht is.na does not work with S4
object, I get a warnings:

--- 8 
setClass(myClass,slots=c(x=numeric))
if(runif(1)0.5){a - new(myClass)}else{a - NA}
is.na(a)
--- 8 

Any solution?


getGeneric(is.na)

shows that it's an S4 generic, so implement a method

setMethod(is.na, myClass, function(x) FALSE)

Martin


Thanks

Christophe




--
View this message in context: 
http://r.789695.n4.nabble.com/is-na-for-S4-object-tp4708201.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Can't seem to install packages

2015-05-28 Thread Martin Morgan


On 05/28/2015 08:21 AM, Duncan Murdoch wrote:

On 28/05/2015 6:10 AM, Claire Rioualen wrote:

Hello,

I can't seem to install R packages, since it seemed there were some
permission problems I chmoded /usr/share/R/ and /usr/lib/R/. However,
there are still errors in the process. Here's my config:

 sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] ggplot2_1.0.1BiocInstaller_1.16.5

loaded via a namespace (and not attached):
  [1] colorspace_1.2-6 digest_0.6.8 grid_3.1.1   gtable_0.1.2
  [5] magrittr_1.5 MASS_7.3-40  munsell_0.4.2plyr_1.8.2
  [9] proto_0.3-10 Rcpp_0.11.6  reshape2_1.4.1   scales_0.2.4
[13] stringi_0.4-1stringr_1.0.0tcltk_3.1.1  tools_3.1.1

And here are some packages I tried to install:

* install.packages(XML)*
Installing package into ���/packages/rsat/R-scripts/Rpackages���
(as ���lib��� is unspecified)
trying URL 'http://ftp.igh.cnrs.fr/pub/CRAN/src/contrib/XML_3.98-1.1.tar.gz'
Content type 'text/html' length 1582216 bytes (1.5 Mb)
opened URL
==
downloaded 1.5 Mb

* installing *source* package ���XML��� ...
** package ���XML��� successfully unpacked and MD5 sums checked
checking for gcc... gcc
checking for C compiler default output file name... rm: cannot remove
'a.out.dSYM': Is a directory
a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for sed... /bin/sed
checking for pkg-config... /usr/bin/pkg-config
checking for xml2-config... no
Cannot find xml2-config
ERROR: configuration failed for package ���XML���
* removing ���/packages/rsat/R-scripts/Rpackages/XML���


this is a missing system dependency, requiring the libxml2 'dev' headers. On my 
linux this is


  sudo apt-get installl libxml2-dev

likely you'll also end up needing curl via libcurl4-openssl-dev or similar



The downloaded source packages are in
 ���/tmp/RtmphODjkn/downloaded_packages���
Warning message:
In install.packages(XML) :
   installation of package ���XML��� had non-zero exit status


* install.packages(Biostrings)*
Installing package into ���/packages/rsat/R-scripts/Rpackages���
(as ���lib��� is unspecified)
Warning message:
package ���Biostrings��� is not available (for R version 3.1.1)




* biocLite(Biostrings)*



Yes,Bioconductor versions packages differently from CRAN (we have twice-yearly 
releases and stable 'release' and 'devel' branches). Following the instructions 
for package installation at


http://bioconductor.org/packages/Biostrings

but...



[...]
io_utils.c:16:18: fatal error: zlib.h: No such file or directory
  #include zlib.h
   ^


this seems like a relatively basic header to be missing, installable from 
zlib1g-dev, but I wonder if you're taking a mis-step earlier, e.g., trying to 
install on a cluster node that is configured for software use but not installation?


Also the instructions here to install R

  http://cran.r-project.org/bin/linux/

would likely include these basic dependencies 'out of the box'.

Martin


compilation terminated.
/usr/lib/R/etc/Makeconf:128: recipe for target 'io_utils.o' failed
make: *** [io_utils.o] Error 1
ERROR: compilation failed for package ���Biostrings���
* removing ���/packages/rsat/R-scripts/Rpackages/Biostrings���

The downloaded source packages are in
 ���/tmp/RtmphODjkn/downloaded_packages���
Warning message:
In install.packages(pkgs = pkgs, lib = lib, repos = repos, ...) :
   installation of package ���Biostrings��� had non-zero exit status


I've used R on several machines before and never had such problems.
Thanks for any clue!


It's hard to read your message (I think it was posted in HTML), but I think
those are all valid errors in building those packages.  You appear to be missing
some of their dependencies.  This is not likely related to permissions.

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center

Re: [R] Help manipulating 23andme data in R - reproducing relationship results

2015-05-17 Thread Martin Morgan


On 05/17/2015 01:52 PM, Lyle Warren wrote:

Thanks Jeff, Bert!

You are right - definitely out of my skill area.. I've no found some help
on the bioconductor mailing list.


I'm not sure that you've asked in the right place

  https://support.bioconductor.org

see also http://www.vincebuffalo.com/2012/03/12/23andme-gwascat.html which is a 
little dated and maybe not relevant to your question.


A little tangentially, see also https://support.bioconductor.org/p/67444/

Martin Morgan



On 18 May 2015 at 03:04, Bert Gunter gunter.ber...@gene.com wrote:


(No response necessary)

What struck me about this post was the apparent mismatch: the OP
seemed not to have a clue where to begin. Maybe he somehow has been
assigned or chose a task for which his skills and background are
inadequate. This is not really a criticism: if someone told me to make
a dining room set, my reply would be: Either find someone else or see
you in about a year after which I may have learned enough to attempt
the task. 

So maybe the OP should give up looking for internet advice altogether
and find someone local to work with?

And, of course, apologies if I have misinterpreted.

Cheers,
Bert



Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Sun, May 17, 2015 at 9:52 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:

This is a very domain-specific question (genetic data analysis), not so

much a question about how to use R, so does not seem on topic here. I also
suspect that the company 23andme may use some proprietary algorithms, so
replicating their results could be a tall order.


You might start with the CRAN Statistical Genetics task view, and a

textbook on the subject. The Bioconductor project may also be a useful
resource.



---

Jeff NewmillerThe .   .  Go

Live...

DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live

Go...

   Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.

rocks...1k



---

Sent from my phone. Please excuse my brevity.

On May 16, 2015 6:53:46 AM PDT, Lyle Warren lyl...@gmail.com wrote:

Hi,

I'm trying to replicate 23andMe's parentage test results within R,
using
the 23andme raw data. Does anyone know a simple way to do this? I have
read
the data with gwascat and it seems to be in there fine.

Thanks for any help you can give!

Cheers,

Lyle

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] some general advice sought on the use of gctorture()

2015-04-24 Thread Martin Morgan


On 04/24/2015 06:49 AM, Franckx Laurent wrote:

Dear all

I have bumped into the dreaded 'segfault' error type when running some C++
code using .Call().


segfaults often involve invalid memory access at the C level that are best 
discovered via valgrind or similar rather than gctorture. A good way to spot 
these is to


(a) come up with a _minimal_ reproducible script test.R that takes just a few 
seconds to run and that tickles, at least some times, the segfault


(b) make sure that your package is compiled without optimizations and with 
debugging symbols, e.g., in  ~/.R/Makevars add the lines


  CFLAGS=-ggdb -O0
  CXXFLAGS=-ggdb -O0

(c) run the code under 'valgrind'

  R -d valgrind -f test.r

Look especially for 'invalid read' or 'invalid write' messages, and isolate 
_your_ code in the callback that the message produces.


There is a 'worked example' at

  http://bioconductor.org/developers/how-to/c-debugging/#case-study

Of course this might lead to nothing, and then you'll be back to your original 
question about using gctorture or other strategies.


Martin Morgan



I have already undertaken several attempts to debug the C++ code with gdb(),
but until now I have been unable to pinpoint the origin of the problem. There
are two elements that I think are puzzling (a) this .Call() has worked fine
for about three years, for a variety of data (b)  the actual crash occurs at
random points during the execution of the function (well, random from a human
eye's point of view).


From what I understand in the R extensions manual, the actual problem may
have been around for a while before the actual call to the C++ code. As
recommended in the manual, I am now using  gctorture() to try to pinpoint
the origins of the problem. I can, alas, only confirm that gctorture() has
an enormous impact on execution time, even for operations that are normally
executed within the blink of an eye. From what I have seen until now,
executing all the R code before the crash with gctorture(TRUE) could take
months.


I suppose then that the best way to proceed would be to proceed backward from
the point where the crash occurs when gctorture(FALSE).

I have tried to find some concrete examples of good practices in the use of
gctorture() to identify memory problems in R, but most of what I have found
on the web is simply a copy of the help page. Does anybody know more concrete
and elaborated examples that could give an indication on how to best proceed
further?





Laurent Franckx, PhD Senior researcher sustainable mobility VITO NV |
Boeretang 200 | 2400 Mol Tel. ++ 32 14 33 58 22| mob. +32 479 25 59 07 |
Skype: laurent.franckx | laurent.fran...@vito.be | Twitter @LaurentFranckx




VITO Disclaimer: http://www.vito.be/e-maildisclaimer

__ R-help@r-project.org mailing
list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal,
self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Returning to parent function

2015-03-17 Thread Martin Morgan


On 03/16/2015 05:05 PM, Saptarshi Guha wrote:

Example was complicated, but here is a simpler form

continueIfTrue - function(mm=return()){
 eval(mm)
}
telemStats - function(){
 y - substitute(return())
 continueIfTrue(y)
 print(I would not like this message to be printed)
}
telemStats()


Ideally, calling telemStats() should return to the prompt and the
print in telemStats should not appear


here's one way to implement your original example -- signal and handle, via 
tryCatch(), a custom condition created (modelled after simpleCondition()) as an 
S3 class with linear inheritance.


X - function() {
print(I'm saying...)
signalCondition(structure(list(), class=c(my, condition)))
print(X)
}

Y - function(){
tryCatch(XParent(), my=function(...) NULL)
print(hello)
}

XParent - function(){
X()
print(H)
}

leading to

   Y()
  [1] I'm saying...
  [1] hello

callCC() is tricky for me to grasp, but I'll write Y to accept an argument X, 
which will be a function. It'll call XParent with that function, and XParent 
will use the function.


Y - function(X){
XParent(X)
print(hello)
}

XParent - function(X){
X(fun)
print(H)
}

then we've got

   Y(X)
  Error in XParent(X) (from tmp.R!4361C1Y#2) : object 'X' not found
   Y(function(x) print(X))
  [1] X
  [1] H
  [1] hello

but more interestingly the long jump to the top (where callCC was invoked)

   callCC(function(X) { Y(X) })
  [1] fun

or in a function

  y - function() {
  value - callCC(function(X) {
  Y(X)
  })
  print(value)
  print(done)
  }

Hope that helps and is not too misleading. Excellent question.

Martin




On Mon, Mar 16, 2015 at 4:02 PM, David Winsemius dwinsem...@comcast.net wrote:


On Mar 16, 2015, at 3:08 PM, Saptarshi Guha wrote:


Hello,

I would like a function X to return to the place that called the
function XParent that called this function X.

Y calls XParent
Y = function(){
  XParent()
  print(hello)
}

XParent calls X

XParent = function(){
   X()
   print(H)
}

X returns to the point just after the call to XParent. Hence
print(H) is not called, but instead hello is printed.


?sys.call # my second reading of your question makes me think this wasn't what 
was requested.

?return  # this would do what was asked for


XParent = function(){

+   return(sys.call())
+   print(H)
+ }

Y()

[1] hello

# Success
# now to show that a value could be returned if desired


Y = function(){

+  print(XParent())
+  print(hello)
+ }

XParent = function(){

+   return(sys.call())
+   print(H)
+ }

Y()

XParent()
[1] hello




X returns to the point just after the call to XParent. Hence
print(H) is not called, but instead hello is printed.

An example of what i'm going for is this

continueIfTrue - function(filterExp, grpname, subname,n=1){
y - substitute(filterExp)
res - isn(eval(y, envir=parent.frame()),FALSE)
## if res is FALSE, I would like to return from telemStats
}

telemStats - function(a,b){
b - c(10,12)
continueIfTrue( {length(b) =10 }, progStats,00)
print(Since the above condition failed, I would not like this
message to be printed)
}


I'm afraid there were too many undefined objects to make much sense of that 
example.



I looked into callCC and signals but dont think i understood correctly.
Any hints would be appreciated

Kind Regards
Saptarshi


--

David Winsemius
Alameda, CA, USA



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Checking whether specific packages (from bioconductor) are installed when loading a package

2015-03-11 Thread Martin Morgan


On 03/11/2015 01:36 AM, Søren Højsgaard wrote:

Dear all,

My package 'gRbase' uses three packages from Bioconductor and these are not 
automatically installed when gRbase is installed. My instructions (on the 
package webpage) to users are therefore to run:



Treat Bioconductor packages as any other, listing them in Depends: or Imports: 
or Suggests: as described in 'Writing R Extensions'. CRAN builds packages with 
access to the Bioconductor repository. Your CRAN users chooseBioCmirror() and 
setRepositories() before using install.packages(), and Bioc dependencies are 
installed like any other dependency.



source(http://bioconductor.org/biocLite.R;); 
biocLite(c(graph,RBGL,Rgraphviz))

When loading gRbase, it is checked whether these Bioconductor packages are 
available, but I would like to add a message about how to install the packages 
if they are not.



This functionality is provided by Depends: and Imports:, so is not relevant for 
packages listed in this part of your DESCRIPTION file. You're only asking for 
advice on packages that are in Suggests:. It does not matter that these are 
Bioconductor packages or CRAN packages or ... the packages in Suggests: are not, 
by default, installed when your package was installed (see the 'dependencies' 
argument to install.packages()).



Does this go into .onAttach or .onLoad or elsewhere?


Or not at all. If the package belongs in Suggests: and provides some special 
functionality not needed by the package most of the time (else it would be in 
Imports: [most likely] or Depends:) then there will be some few points in the 
code where the package is used and you need to alert the user to the special 
condition they've encountered. You'll want to fully specify the package and 
function to be used


  RBGL::transitive.closure(...)

(full specification provides similar advantage to Import:'ing a symbol into your 
package, avoiding symbol look-up along the search() path, potentially getting a 
function transitive.closure() defined by the user or a package different from 
RBGL). If RBGL is not available, the above code will fail, and the user will be 
told that there is no package called 'RBGL'.


One common strategy for nicer messages is to

if (!requireNamespace(RBGL))
stop(your more tailored message)

in the few code chunks before your use of RBGL::transitive.closure(). 
requireNamespace() loads but does not attach the RBGL package, so the symbols 
are available when fully qualified RBGL:: but the package does not interfere 
with the user search() path.


Which I guess brings us to your question, and the answer is probably that if 
after the above you were to still wish to add a message at package start-up, 
then the right place would be .onLoad(), so that users of your package, as well 
as users of packages that Import: (load) but do not Depend: (attach) on your 
package, will see the message.


Also, this belongs on the R-devel mailing list.

Hope that's helpful,

Martin



Thanks in advance
Søren

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Installing GO.db Package in R

2015-03-05 Thread Martin Morgan


On 03/05/2015 01:21 AM, Zaynab Mousavian wrote:

Hi all,

I have tried to install GO.db package in R, but the following error is


please ask questions about Bioconductor packages on the Bioconductor support 
forum https://support.bioconductor.org. Please also review answers to your 
question on Biostars first.


You are using an old version of R / Bioconductor but a new version of RSQLite. 
Use either a current version of R / Bioconductor or an old version of RSQLite, 
as explained here https://support.bioconductor.org/p/63555.


If you are having trouble installing a current version of R on linux, indicate 
your OS and how you are currently installing R. Be sure to follow the relevant 
directions from, e.g., http://cran.r-project.org/. Perhaps the R-SIG-Debian 
archives and mailing list have additional hints 
https://stat.ethz.ch/pipermail/r-sig-debian/.


Martin


given to me:

biocLite(c(GO.db))
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.13 (BiocInstaller 1.12.1), R version 3.0.2.
Installing package(s) 'GO.db'
trying URL
 
'http://bioconductor.org/packages/2.13/data/annotation/src/contrib/GO.db_2.10.1.tar.gz'
Content type 'application/x-gzip' length 26094175 bytes (24.9 Mb)
opened URL==
downloaded 24.9 Mb
* installing *source* package �GO.db� ...** R** inst** preparing
package for lazy loading** help*** installing help indices** building
package indices** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'GO.db', details:
call: match.arg(synchronous, c(off, normal, full))
error: 'arg' must be NULL or a character vector
Error: loading failed
Execution halted
ERROR: loading failed* removing
�/home/zmousavian/R/x86_64-pc-linux-gnu-library/3.0/GO.db�

The downloaded source packages are in
 �/tmp/RtmpBDs1Tq/downloaded_packages�
Warning messages:1: In install.packages(pkgs = pkgs, lib = lib, repos
= repos, ...) :
installation of package �GO.db� had non-zero exit status2: installed
directory not writable, cannot update packages 'colorspace','lattice',
'mgcv', 'survival'


Can anyone help me to install it?

Regards



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] character type and memory usage

2015-01-17 Thread Martin Morgan


On 01/16/2015 10:21 PM, Mike Miller wrote:

First, a very easy question:  What is the difference between using
what=character and what=character() in scan()?  What is the reason for the
character() syntax?

I am working with some character vectors that are up to about 27.5 million
elements long.  The elements are always unique.  Specifically, these are names
of genetic markers.  This is how much memory those names take up:


snps - scan(SNPs.txt, what=character())

Read 27446736 items

object.size(snps)

1756363648 bytes

object.size(snps)/length(snps)

63.9917128215173 bytes

As you can see, that's about 1.76 GB of memory for the vector at an average of
64 bytes per element.  The longest string is only 14 bytes, though.  The file
takes up 313 MB.

Using 64 bytes per element instead of 14 bytes per element is costing me a total
of 1,372,336,800 bytes.  In a different example where the longest string is 4
characters, the elements each use 8 bytes.  So it looks like I'm stuck with
either 8 bytes or 64 bytes.  Is that true?  There is no way to modify that?


Hi Mike --

R represents the atomic vector types as so-called S-expressions, which in 
addition to the actual data contain information about whether they have been 
referenced by one or more symbols etc.; you can get a sense of this with


 x - 1:5
 .Internal(inspect(x))
@4c732940 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 1,2,3,4,5

where the number after @ is the memory location, INTSXP indicates that the type 
of data is an integer, etc. So a vector requires memory for the S-expression, 
and for the actual data.


A character vector is represented by an S-expression for the vector itself, and 
an S-expression for each element of the vector, and of course the data itself


 .Internal(inspect(y))
@4ce72090 16 STRSXP g0c3 [NAM(1)] (len=3, tl=0)
  @137ccd8 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] a
  @137ccd8 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] a
  @15a6698 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] b

The large S-expression overhead is recouped by long (in the nchar() sense) or 
re-used strings, but that's not the case for your data.


There is no way around this in base R. There are general-purpose solutions like 
the data.table package, or retaining your large data in a data base (like 
SQLite) that you interface from within R using e.g., sqldf or dplyr to do as 
much data reduction in the data base (and out of R) as possible. In your 
particular case the Bioconductor Biostrings package BStringSet() might be relevant


  http://bioconductor.org/packages/release/bioc/html/Biostrings.html

This will consume memory more along the lines of 1 byte per character + 1 byte 
per string, and is of particular relevance because you are likely doing other 
genetic operations for which the Bioconductor project has relevant packages (see 
especially the GenomicRanges package).


If your work is not particularly domain-specific, data.table would be a good bet 
(it also has an implementation for working with overlapping ranges, which is a 
very common task with SNPs). A lot of SNP data management is really relational, 
for which the SQL representation (and dplyr, for me) is the obvious choice. 
Bioconductor would be the choice if there is to be extensive domain-specific 
work. I am involved in the Bioconductor project, so not exactly impartial.


Martin



By the way...

It turns out that 99.72% of those character strings are of the form paste(rs,
Int) where Int is an integer of no more than 9 digits.  So if I use only those
markers, drop the rs off, and load them as integers, I see a huge improvement:


snps - scan(SNPs_rs.txt, what=integer())

Read 27369706 items

object.size(snps)

109478864 bytes

object.size(snps)/length(snps)

4.0146146985 bytes

That saves 93.8% of the memory by dropping 0.28% of the markers and encoding as
integers instead of strings.  I might end up doing this by encoding the other
characters as negative integers.

Mike

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fwd: which is faster for or apply

2014-12-31 Thread Martin Morgan

, compact, and speedy.

Martin Morgan



   Ô__
  c/ /'_;kmezhoud
(*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
http://bioinformatics.tn/



On Wed, Dec 31, 2014 at 8:54 AM, Berend Hasselman b...@xs4all.nl wrote:




On 31-12-2014, at 08:40, Karim Mezhoud kmezh...@gmail.com wrote:

Hi All,
I would like to choice between these two data frame convert. which is
faster?

   for(i in 1:ncol(DataFrame)){

DataFrame[,i] - as.numeric(DataFrame[,i])
}


OR

DataFrame - as.data.frame(apply(DataFrame,2 ,function(x) as.numeric(x)))




Try it and use system.time.

Berend


Thanks
Karim
  Ô__
c/ /'_;kmezhoud
(*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
http://bioinformatics.tn/

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] RCurl much faster than base R

2014-12-09 Thread Martin Morgan


On 12/05/2014 08:12 AM, Alex Gutteridge wrote:

I'm trying to debug a curious network issue, I wonder if anyone can help me as I
(and my local sysadmin) am stumped:

This base R command takes ~1 minute to complete:

readLines(url(http://bioconductor.org/biocLite.R;))

(biocLite.R is a couple of KB in size)

Using RCurl (and so libcurl under the hood) is instantaneous (1s):

library(RCurl)
getURL(http://bioconductor.org/biocLite.R;)

I've not set it to use any proxies (which was my first thought) unless libcurl
autodetects them somehow... And the speed is similarly fast using wget or curl
on the command line. It just seems to be the base R commands which are slow
(including install.packages etc...).

Does anyone have hints on how to debug this (if not an answer directly)?



Hi Alex -- maybe not surprisingly, both approaches are approximately equally 
speedy for me, at least on average.


For what it's worth

- there is no need to use url(), just readLines(http://...;)

It would help to

- provide the output of sessionInfo()

- verify or otherwise that the problem is restricted to particular urls

- work through a simple example where the test say 'works' when accessing a 
local http server (e.g., on the same machine and in a directory mydir, python 
-m SimpleHTTPServer 1 in one terminal, the 
readLines(http://localhost:1/some file in 'mydir') but fails after some 
increasingly remote point, e.g., accessing a url outside your institution 
firewall hence indicating a firewall issue.


Maybe at the end of this exercise the only insight will be that the R and curl 
implementations differ (a known known!).


Also if this is really a problem with installing Bioconductor packages rather 
than a general R question, then https://support.bioconductor.org is a better 
place to post. If the problem is restricted to bioconductor.org, then: (a) for 
your sys.admin, the url is redirected (via DNS, not http:) to Amazon Cloud Front 
and from there to a regional Amazon data center; I'm not sure what the 
significance of this might be, e.g., the admin might have throttled download 
speeds from certain ip address ranges; and (b) if you're in Europe or elsewhere, 
you're trying to install Bioconductor packages, and the regional data center is 
not fast enough (it should be responsive, at least when the url has been seen 
'recently'), then configure R to use a local mirror from 
http://bioconductor.org/about/mirrors/, e.g.,


chooseBioCmirror()

Martin Morgan
Bioconductor


AlexG

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] need help with withRestarts ?

2014-12-06 Thread Martin Morgan


On 12/06/2014 02:53 PM, ce wrote:

Dear all,

Let's say I have this script , below. tryCatch  indeed catches the error but 
exists, I want function to continue and stay in the loop. I found very  
examples of withRestarts on internet to figure it out. Could you help me how to 
do it ?


myfunc - function()
{
   while(1)
   {
   x - runif(1)
   if ( x  0.3 ) a -  x/2 else a - x/b
   print(a)
   Sys.sleep(1)
   }
}


Hi --

Modify your function so that the code that you'd like to restart after is 
surrounded with withRestarts(), and with a handler that performs the action 
you'd like, so


myfunc - function()
{
while(TRUE)
{
x - runif(1)
withRestarts({
if ( x  0.3 ) a -  x/2 else a - x/b
print(a)
}, restartLoop = function() {
message(restarting)
NULL
})
Sys.sleep(1)
}
}

Instead of using tryCatch(), which returns to the top level context to evaluate 
the handlers, use withCallingHandlers(), which retains the calling context. 
Write a handler that invokes the restart


withCallingHandlers({
myfunc()
}, error = function(e) {
message(error)
invokeRestart(restartLoop)
})

It's interesting that tryCatch is usually used with errors (because errors are 
hard to recover from), and withCallingHandlers are usually used with warnings 
(because warnings can usually be recovered from), but tryCatch() and 
withCallingHandlers() can be used with any condition.


Martin



tryCatch({ myfunc() },
 warning = function(w) { print(warning) },
 error = function(e) { print(error) },
 finally = {  print(end) }
)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] recoding genetic information using gsub

2014-12-05 Thread Martin Morgan


On 12/5/2014 11:24 AM, Kate Ignatius wrote:

I have genetic information for several thousand individuals:

A/T
T/G
C/G  etc

For some individuals there are some genotypes that are like this:  A/,
C/, T/, G/ or even just / which represents missing and I want to
change these to the following:

A/ A/.
C/ C/.
G/ G/.
T/ T/.
/ ./.
/A ./A
/C ./C
/G ./G
/T ./T

I've tried to use gsub with a command like the following:

gsub(A/,[A/.], GT[,6])


Hi Kate -- a different approach is to create a 'map' (named character vector) 
describing what you want in terms of what you have; the number of possible 
genotypes is not large.


http://stackoverflow.com/questions/15912210/replace-a-list-of-values-by-another-in-r/15912309#15912309

Martin



but if genotypes arent like the above, the command will change it to
look something like:

A/.T
T/.G
C/.G

Is there anyway to be more specific in gsub?

Thanks!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Profiling a C/C++ library from R

2014-12-02 Thread Martin Morgan


On 12/02/2014 01:43 PM, Charles Novaes de Santana wrote:

Dear all,

I am running a c++ library (a .so file) from a R code. I am using the
function dyn.load(lib.so) to load the library. Do you know a way to
profile my C library from R? Or should I compile my C library as an
executable and profile it using the typical C-profilers?

Thanks in advance for any help!


Hi Charles

Section 3.4 of RShowDoc(R-exts) discusses some options; I've had luck with 
operf  friends. Remember to compile without optimizations and with debugging 
information -ggdb -O0.


(I think this is appropriate for the R-devel mailing list 
http://www.r-project.org/posting-guide.html#which_list)


Martin Morgan



Best,

Charles




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cyclic dependency when building a package

2014-11-30 Thread Martin Morgan


On 11/30/2014 07:15 AM, Glenn Schultz wrote:

Hi All,

I am working on a package BondLab for the analysis of fixed income securities.  
Building the package results in the following:

Error in loadNamespace(package, c(which.lib.loc, lib.loc)) :
cyclic namespace dependency detected when loading ‘BondLab’, already loading 
‘BondLab’


It occurs when I set the generic for the function mortgagecashflow.  Further if 
a function uses mortgagecashflow, similarly its generic, when set, causes the 
above error.  Other generics do not throw this error so I am quite sure it is 
mortgagecashflow.  The package and the code can be found on github.
https://github.com/glennmschultz/BondLab.git

I have been trying to figure this out for a couple of months to no avail. If 
anyone has familiarity with this issue I would certainly appreciate any help 
with the issue.



Hi Glenn --

The root of the problem is that you are defining both a generic and a 
plain-old-function named MortgageCashFlow -- one or the other and you're fine.


R CMD INSTALL pkgA, where pkgA contains a single R file R/test.R

setGeneric(foo, function(x, ...) standardGeneric(foo))
foo - function(x, ...) {}

also generates this; maybe you meant something like

.foo - function(x, ...) {}
setGeneric(foo, function(x, ...) standardGeneric(foo),
   useAsDefault=.foo)

or simply reversing the order of the declarations

foo - function(x, ...) {}
setGeneric(foo, function(x, ...) standardGeneric(foo))


?

Martin Morgan


Thanks,
Glenn

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Gender balance in R

2014-11-25 Thread Martin Morgan


On 11/25/2014 04:11 AM, Scott Kostyshak wrote:

On Mon, Nov 24, 2014 at 12:34 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

I took a look at apparent gender among list participants a few years ago:
https://stat.ethz.ch/pipermail/r-help/2011-June/280272.html

Same general thing: very few regular participants on the list were
women. I don't see any sign that that has changed in the last three
years. The bar to participation in the R-help list is much, much lower
than that to become a developer.


I plotted the gender of posters on r-help over time. The plot is here:
https://twitter.com/scottkosty/status/449933971644633088

The code to reproduce that plot is here:
https://github.com/scottkosty/genderAnalysis
The R file there will call devtools::install_github to install a
package from Github used for guessing the gender based on the first
name (https://github.com/scottkosty/gender).


It would be great to include in your package the script that scraped author 
names from R-help archives (I guess that's what you did?). Presumably it easily 
applies to other mailing lists hosted at the same location (R-devel, further 
along the ladder from user to developer, and Bioconductor / Bioc-devel, in a 
different domain and perhaps confounded with a different 'feel' to the list). 
Also the R community is definitely international, so finding more versatile 
gender-assignment approaches seems important.


it might be interesting to ask about participation in mailing list forums versus 
other, and in particular the recent Bioconductor transition from mailing list to 
'StackOverflow' style support forum (https://support.bioconductor.org) -- on the 
one hand the 'gamification' elements might seem to only entrench male 
participation, while on the other we have already seen increased (quantifiable) 
and broader (subjective) participation from the Bioconductor community. I'd be 
happy to make support site usage data available, and am interested in 
collaborating in an academically well-founded analysis of this data; any 
interested parties please feel free to contact me off-list.


Martin Morgan
Bioconductor



Note also on that tweet that Gabriela de Queiroz posted it, who is the
founder of R-ladies; and that David Smith showed interest in
discussing the topic. So there is definitely demand for some data
analysis and discussion on the topic.


It would be interesting to look at the stats for CRAN packages as well.

The very low percentage of regular female participants is one of the
things that keeps me active on this list: to demonstrate that it's not
only men who use R and participate in the community.


Thank you for that!

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University


(If you decide to do the stats for 2014, be aware that I've been out
on medical leave for the past two months, so the numbers are even
lower than usual.)

Sarah

On Mon, Nov 24, 2014 at 10:10 AM, Maarten Blaauw
maarten.bla...@qub.ac.uk wrote:

Hi there,

I can't help to notice that the gender balance among R developers and
ordinary members is extremely skewed (as it is with open source software in
general).

Have a look at http://www.r-project.org/foundation/memberlist.html - at most
a handful of women are listed among the 'supporting members', and none at
all among the 29 'ordinary members'.

On the other hand I personally know many happy R users of both genders.

My questions are thus: Should R developers (and users) be worried that the
'other half' is excluded? If so, how could female R users/developers be
persuaded to become more visible (e.g. added as supporting or ordinary
members)?

Thanks,

Maarten


--
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reading FCS files with flowCore package

2014-11-24 Thread Martin Morgan


On 11/24/2014 06:18 AM, Luigi wrote:

Dear all,
I would like to use the R's Bioconductor package flowCore to do flow cytometry


Please address questions about Bioconductor packages to the Bioconductor support 
site


  https://support.bioconductor.org

and...


analysis.
I generated a FCS file using the fileexport function of the FACSDiva Software
Version 8 from a BD LSRII machine. I then used the functions:
 file.name -system.file(extdata, cd cells_FMO 8_003.fcs,
package=flowCore)


system.file() is used to access files installed in R packages, but probably you 
want to access your own file. Try


  file.name = file.choose()

and selecting the file that you want to iniptu. Verify that the path is correct 
by displaying the result


  file.name

Martin


 x -read.FCS(file.name, transformation = FALSE)
as shown in the flowCore: data structure package... vignette (20 May 2014) as
available from the internet. However the result is an error:
 Error in read.FCS(file.name, transformation = FALSE) : ' ' is not a valid
file
I then used the function:
 isFCSfile(cd cells_FMO 8_003.fcs)
where cd cells_FMO 8_003.fcs is the name of the file. As expected I obtained the
following message:
 cd cells_FMO 8_003.fcs FALSE
meaning I reckon that the file is not a FCS. Since I am completely new to this
kind of analysis but I would not like to use flowJo, could anybody tell me how
to load the FCS files? In the rest of the file I am pasting the beginning of the
cd cells_FMO 8_003.fcs file for further reference (I can't attach the whole
thing or even attaching the file because it is too big). From its gibberish I
reckon that the encoding is probably wrong: I was expecting a flatfile after all
not ASCII. Would the problem be how the run was exported? FlowJo however
recognizes the files...
Best regards,
Luigi

==
FCS3.0 25619271933 1192532 0 0
$BEGINANALYSIS0$ENDANALYSIS0$BEGINSTEXT0$ENDSTEXT0$BEGINDATA1933
$ENDDATA1192532
$FIL180444.fcs$SYSWindows 7 6.1$TOT29765
$PAR10$MODEL$BYTEORD4,3,2,1$DATATYPEF$NEXTDATA0CREATORBD FACSDiva
Software Version 8.0TUBE NAMEFMO 8$SRCcd
cellsEXPERIMENT
NAMEExperiment_001GUID4171c2f1-427b-4cc5-bf86-39bb76803c48$DATE31-OCT-2014
$BTIM16:07:12$ETIM16:09:25SETTINGSCytometerWINDOW
EXTENSION0.00EXPORT USER NAMELuigiMarongiuEXPORT
TIME31-OCT-2014-16:07:11FSC ASF0.78AUTOBSTRUE$INST
$TIMESTEP0.01SPILL
3,405-450/50-A,405-655/8-A,405-525/50-A,1,0.0028442147740618787,0.0923076944711957,0,1,0,0.3425525014147933,0.08630456626553264,1
APPLY
COMPENSATIONTRUETHRESHOLDFSC,5000$P1NTime$P1R262144$P1B32$P1E0,0
$P1G0.01P1BS0P1MS0$P2NFSC-A$P2R262144$P2B32$P2E0,0$P2V450$P2G
1.0P2DISPLAYLINP2BS-1P2MS0$P3NFSC-H$P3R262144$P3B32$P3E0,0$P3V
450$P3G1.0P3DISPLAYLINP3BS-1P3MS0$P4NFSC-W$P4R262144$P4B32$P4E
0,0$P4V450$P4G1.0P4BS-1P4MS0$P5NSSC-A$P5R262144$P5B32$P5E0,0
$P5V319$P5G1.0P5DISPLAYLINP5BS-1P5MS0$P6NSSC-H$P6R262144$P6B32
$P6E0,0$P6V319$P6G1.0P6DISPLAYLINP6BS-1P6MS0$P7NSSC-W$P7R262144
$P7B32$P7E0,0$P7V319$P7G1.0P7BS-1P7MS0$P8N405-450/50-A$P8Scd8
- pac
blue$P8R262144$P8B32$P8E0,0$P8V450$P8G1.0P8DISPLAYLOGP8BS-1P8MS
0$P9N405-655/8-A$P9Scd45ra
-
q655$P9R262144$P9B32$P9E0,0$P9V450$P9G1.0P9DISPLAYLOGP9BS-1P9MS
0$P10N405-525/50-A$P10Sld
-
acqua$P10R262144$P10B32$P10E0,0$P10V450$P10G1.0P10DISPLAYLOGP10BS
-1P10MS0CST
BEADS EXPIREDFalse BHffEšùëGwI,E p F�ÑgG„F{¨
DËƒ×ÀG®CçË…BI33GAàõG¬‡GA G1ÊqGŒ
ƒG� Bôk…Ab=pBÜ.BI33EÝ-ÂG�ÊÀEÚ  Fe�×G�h±Fc DN
=ÀAë…C‰ÝqBK33FÀúG‚JF½– FVšG{ÚeF| Bp¤Cb=pAÊ  BM33GõÇG¡Ã’GÁö G³ôãGš;G•Œ
CÓ˜REY6�CiO\BO33EÑÞfGŠPlEÂ8 El G€4.E0 Cp¤ÃHýqC!™šBQ33FKòG�UùF6  FûG†¾vF
Â-¸RC0À ÂJ  BTffG^ùõG�m@G5L GH—îGŠÏüG8ø Ap¤Fœ�BÅõÂBVÌÍF¥Ý£G®ÑdFrä G8•ÐG‘÷âG!Ý
C¦fB—€ À�G®BZ F„Ž®G„)~F€b F±ÕŠG�´ôF ¢ Áâ=pB‡W
B\)B]33FøuáGŽ0ÏFßª G¸.G‡E¤G
Õ Bâ=pF0fB=áHB_™šEÇÙÂG…ÒdE¿( FRÙGˆ˜ÈFE” DgŠáÃ…C•záBa™šF×Õ£G¥�ÌF¦Þ
Fë¦HG�¬:FÐ~ BøuÂC#ž¸BAë…BbÌÍGœa€G”G‡: G$ÛOG‹Ü¡Gà
C#ž¸Fï“=B�ffBhffGŽß^G“í¾Gw@ G~G�SFõ( BêQëFÕ¥…CW
BhffFÜ
(GƒÂOFÕÆ F}v)G¡ù˜FHL BÃð¤Âb=pB9×
Bi™šG©MGœi0GŠŒ€GDžG“f·G*½ Bî\)C³Ç®CAë…Bj  G5[ðG™sûGG G]¾G‹ö!G3
CR{F$HB�G®Bj  FÜéG“/FÀ G(ñGŒGÛ B
ffGxRCŒÍBlÌÍE××GŽ‡�EÁˆ E±…Gƒ»E¬ Àâ=pCtk…@�G®BnÌÍGœ›œG“cGˆ
G/$;GŠë]G!` B¥£×FšŸBfG®BzffE³˜QG�:zE € F0y G€µ›F/€
D€EÂþ…Cš‡®B{™šG“¨\G—)JGz G,“G‘ª¬G¥ B�k…G(ÊÔBÜ.B|ffG*ðGŒÚ}G´
FéˆšG�Ø8FÎ` AúzáÂƒLÍB�\B|ÌÍG¬}\G˜G�œ€GY
G�‘ÕG@) C‰ÝqGp¶CnB~ÌÍFñ–3G“ïpFÑ Fª¤G‰ŽFžà C�G®F9

etc.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box

Re: [R] Reading FCS files with flowCore package

2014-11-24 Thread Martin Morgan


On 11/24/2014 11:38 AM, William Dunlap wrote:

If help files used the mustWork=TRUE argument to system.file() this sort of 
problem
would become more apparent to the user.  It would give a clear error message 
from


or to change the default to mustWork=TRUE, since there are not many use cases 
for querying a non-existent system file?


(one irony I've stumbled across in my own code is to misspell 'mustWork', e.g., 
system.file(foo, mustwork=TRUE), which happily returns ).


Martin


system.file() instead of a mysterious error about file  not being valid or,
worse, a hang
from an input command waiting for the user to type something into standard input
(because
scan() and others treat file= the same as scan=stdin()).

Bill Dunlap
TIBCO Software
wdunlap tibco.com http://tibco.com

On Mon, Nov 24, 2014 at 10:36 AM, Martin Morgan mtmor...@fredhutch.org
mailto:mtmor...@fredhutch.org wrote:

On 11/24/2014 06:18 AM, Luigi wrote:

Dear all,
I would like to use the R's Bioconductor package flowCore to do flow
cytometry


Please address questions about Bioconductor packages to the Bioconductor
support site

https://support.bioconductor.__org https://support.bioconductor.org

and...

analysis.
I generated a FCS file using the fileexport function of the FACSDiva
Software
Version 8 from a BD LSRII machine. I then used the functions:
file.name http://file.name -system.file(extdata, cd cells_FMO
8_003.fcs,
package=flowCore)


system.file() is used to access files installed in R packages, but probably
you want to access your own file. Try

file.name http://file.name = file.choose()

and selecting the file that you want to iniptu. Verify that the path is
correct by displaying the result

file.name http://file.name

Martin

  x -read.FCS(file.name http://file.name, transformation = FALSE)
as shown in the flowCore: data structure package... vignette (20 May
2014) as
available from the internet. However the result is an error:
  Error in read.FCS(file.name http://file.name, transformation =
FALSE) : ' ' is not a valid
file
I then used the function:
  isFCSfile(cd cells_FMO 8_003.fcs)
where cd cells_FMO 8_003.fcs is the name of the file. As expected I
obtained the
following message:
  cd cells_FMO 8_003.fcs FALSE
meaning I reckon that the file is not a FCS. Since I am completely new
to this
kind of analysis but I would not like to use flowJo, could anybody tell
me how
to load the FCS files? In the rest of the file I am pasting the
beginning of the
cd cells_FMO 8_003.fcs file for further reference (I can't attach the 
whole
thing or even attaching the file because it is too big). From its
gibberish I
reckon that the encoding is probably wrong: I was expecting a flatfile
after all
not ASCII. Would the problem be how the run was exported? FlowJo however
recognizes the files...
Best regards,
Luigi

==
FCS3.0 25619271933 1192532 0 0
$BEGINANALYSIS0$ENDANALYSIS0$BEGINSTEXT0$ENDSTEXT0$BEGINDATA
1933$ENDDATA1192532
$FIL180444.fcs$SYSWindows 7 6.1$TOT29765
$PAR10$MODEL$BYTEORD4,3,2,1$DATATYPEF$NEXTDATA0CREATORBD
FACSDiva
Software Version 8.0TUBE NAMEFMO 8$SRCcd
cellsEXPERIMENT
NAMEExperiment_001GUID4171c2f1-427b-4cc5-bf86-__39bb76803c48$DATE
31-OCT-2014$BTIM16:07:12$ETIM16:09:25SETTINGSCytometerWINDOW
EXTENSION0.00EXPORT USER NAMELuigiMarongiuEXPORT
TIME31-OCT-2014-16:07:11FSC ASF0.78AUTOBSTRUE$INST
$TIMESTEP0.01SPILL
3,405-450/50-A,405-655/8-A,__405-525/50-A,1,0.__0028442147740618787,0.__0923076944711957,0,1,0,0.__3425525014147933,0.__08630456626553264,1
APPLY
COMPENSATIONTRUETHRESHOLDFSC,5000$P1NTime$P1R262144$P1B32$P1E
0,0$P1G0.01P1BS0P1MS0$P2NFSC-A$P2R262144$P2B32$P2E0,0$P2V450
$P2G1.0P2DISPLAYLINP2BS-1P2MS0$P3NFSC-H$P3R262144$P3B32$P3E0,0
$P3V450$P3G1.0P3DISPLAYLINP3BS-1P3MS0$P4NFSC-W$P4R262144$P4B32
$P4E0,0$P4V450$P4G1.0P4BS-1P4MS0$P5NSSC-A$P5R262144$P5B32$P5E
0,0$P5V319$P5G1.0P5DISPLAYLINP5BS-1P5MS0$P6NSSC-H$P6R262144$P6B
32$P6E0,0$P6V319$P6G1.0P6DISPLAYLINP6BS-1P6MS0$P7NSSC-W$P7R
262144$P7B32$P7E0,0$P7V319$P7G1.0P7BS-1P7MS0$P8N405-450/50-A$P8S
cd8
- pac
blue$P8R262144$P8B32$P8E0,0$P8V450$P8G1.0P8DISPLAYLOGP8BS
-1P8MS0$P9N405-655/8-A$P9Scd45ra
-
q655$P9R262144$P9B32$P9E0,0$P9V450$P9G1.0P9DISPLAYLOGP9BS
-1P9MS0$P10N405-525/50-A$P10Sld
-
acqua$P10R

Re: [R] Problem on annotation of Deseq2 on reportingtools

2014-11-16 Thread Martin Morgan


On 11/16/2014 10:25 AM, jarod...@libero.it wrote:

Dear all!,

I use this code:


dds - DESeq(ddHTSeq)
res -results(dds)
#reporting
library(ReportingTools)
library(org.Hs.eg.db)
des2Report - HTMLReport(shortName ='RNAseq_analysis_DESeq2.html',title ='RNA-seq 
analysis of differential expression using DESeq2 ',reportDirectory = ./Reports)
#publish(dds,des2Report,pvalueCutoff=0.05,annotation.db=org,Hs.eg.db)
publish(dds,des2Report,pvalueCutoff=0.01,annotation.db=org.Hs.egENSEMBL2EG,factor=colData(dds)$condition,categorySize=5)
finish(des2Report)

and I have this error:
Error in results(object, resultName) :
   'contrast', as a character vector of length 3, should have the form:
contrast = c('factorName','numeratorLevel','denominatorLevel'),
see the manual page of ?results for more information


is.factor(colData(dds)$condition)
[1] TRUE



What can I do?



Please ask questions about Bioconductor packages on the Bioconductor support 
site

  https://support.bioconductor.org

Martin


  sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8
  [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C  LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods   base

other attached packages:
  [1] pvclust_1.2-2 gplots_2.13.0 genefilter_1.44.0
  [4] ReportingTools_2.2.0  knitr_1.6 org.Hs.eg.db_2.10.1
  [7] RSQLite_0.11.4DBI_0.2-7 annotate_1.40.1
[10] AnnotationDbi_1.24.0  Biobase_2.22.0biomaRt_2.18.0
[13] DESeq2_1.4.5  RcppArmadillo_0.4.300.8.0 Rcpp_0.11.2
[16] GenomicRanges_1.14.4  XVector_0.2.0 IRanges_1.20.7
[19] BiocGenerics_0.8.0

loaded via a namespace (and not attached):
  [1] AnnotationForge_1.4.4Biostrings_2.30.1biovizBase_1.10.8
  [4] bitops_1.0-6 BSgenome_1.30.0  Category_2.28.0
  [7] caTools_1.17 cluster_1.15.3   colorspace_1.2-4
[10] dichromat_2.0-0  digest_0.6.4 edgeR_3.4.2
[13] evaluate_0.5.5   formatR_0.10 Formula_1.1-1
[16] gdata_2.13.3 geneplotter_1.40.0   GenomicFeatures_1.14.5
[19] ggbio_1.10.16ggplot2_1.0.0GO.db_2.10.1
[22] GOstats_2.28.0   graph_1.40.1 grid_3.1.1
[25] gridExtra_0.9.1  GSEABase_1.24.0  gtable_0.1.2
[28] gtools_3.4.1 Hmisc_3.14-4 hwriter_1.3
[31] KernSmooth_2.23-13   lattice_0.20-29  latticeExtra_0.6-26
[34] limma_3.18.13locfit_1.5-9.1   MASS_7.3-34
[37] Matrix_1.1-4 munsell_0.4.2PFAM.db_2.10.1
[40] plyr_1.8.1   proto_0.3-10 RBGL_1.38.0
[43] RColorBrewer_1.0-5   RCurl_1.95-4.1   reshape2_1.4
[46] R.methodsS3_1.6.1R.oo_1.18.0  Rsamtools_1.14.3
[49] rtracklayer_1.22.7   R.utils_1.32.4   scales_0.2.4
[52] splines_3.1.1stats4_3.1.1 stringr_0.6.2
[55] survival_2.37-7  tools_3.1.1  VariantAnnotation_1.8.13
[58] XML_3.98-1.1 xtable_1.7-3 zlibbioc_1.8.0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] snow/Rmpi without MPI.spawn?

2014-09-04 Thread Martin Morgan


On 09/03/2014 10:24 PM, Leek, Jim wrote:

Thanks for the tips.  I'll take a look around for for loops in the morning.

I think the example you provided worked for OpenMPI.  (The default on our 
machine is MPICH2, but it gave the same error about calling spawn.)  Anyway, 
with OpenMPI I got this:


# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)
## Recent Rmpi bug -- should be mpi.universe.size() nWorkers - 
mpi.universe.size()


(the '## Recent Rmpi bug' comment should have been removed, it's a holdover from 
when the script was written several years ago)



nslaves = 4
mpi.spawn.Rslaves(nslaves)


The argument needs to be named

  mpi.spawn.Rslaves(nslaves=4)

otherwise R matches unnamed arguments by position, and '4' is associated with 
the 'Rscript' argument.


Martin


Reported: 2 (out of 2) daemons - 4 (out of 4) procs

Then it hung there.  So things spawned anyway, which is progress.  I'm just not 
sure is that expected behavior for parSupply or not.

Jim

-Original Message-
From: Martin Morgan [mailto:mtmor...@fhcrc.org]
Sent: Wednesday, September 03, 2014 5:08 PM
To: Leek, Jim; r-help@r-project.org
Subject: Re: [R] snow/Rmpi without MPI.spawn?

On 09/03/2014 03:25 PM, Jim Leek wrote:

I'm a programmer at a high-performance computing center.  I'm not very
familiar with R, but I have used MPI from C, C++, and Python.  I have
to run an R code provided by a guy who knows R, but not MPI.  So, this
fellow used the R snow library to parallelize his R code
(theoretically, I'm not actually sure what he did.)  I need to get
this code running on our machines.

However, Rmpi and snow seem to require mpi spawn, which our computing
center doesn't support.  I even tried building Rmpi with MPICH1
instead of 2, because Rmpi has that option, but it still tries to use spawn.

I can launch plenty of processes, but I have to launch them all at
once at the beginning. Is there any way to convince Rmpi to just use
those processes rather than trying to spawn its own?  I haven't found
any documentation on this issue, although I would've thought it would be quite 
common.


This script

spawn.R
===
# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)
## Recent Rmpi bug -- should be mpi.universe.size() nWorkers - 
mpi.universe.size()
mpi.spawn.Rslaves(nslaves=nWorkers)
mpiRank - function(i)
c(i=i, rank=mpi.comm.rank())
mpi.parSapply(seq_len(2*nWorkers), mpiRank)
mpi.close.Rslaves()
mpi.quit()

can be run like the comment suggests

 salloc -n 12 orterun -n 1 R -f spawn.R

uses slurm (or whatever job manager) to allocate resources for 12 tasks and 
spawn within that allocation. Maybe that's 'good enough' -- spawning within the 
assigned allocation? Likely this requires minimal modification of the current 
code.

More extensive is to revise the manager/worker-style code to something more 
like single instruction, multiple data


simd.R
==
## salloc -n 4 orterun R --slave -f simd.R
sink(/dev/null) # don't capture output -- more care needed here
library(Rmpi)

TAGS = list(FROM_WORKER=1L)
.comm = 0L

## shared `work', here just determine rank and host
work = c(rank=mpi.comm.rank(.comm),
   host=system(hostname, intern=TRUE))

if (mpi.comm.rank(.comm) == 0) {
  ## manager
  mpi.barrier(.comm)
  nWorkers = mpi.comm.size(.comm)
  res = list(nWorkers)
  for (i in seq_len(nWorkers - 1L)) {
  res[[i]] - mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
comm=.comm)
  }
  res[[nWorkers]] = work
  sink() # start capturing output
  print(do.call(rbind, res))
} else {
  ## worker
  mpi.barrier(.comm)
  mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
}
mpi.quit()

but this likely requires some serious code revision; if going this route then
http://r-pbd.org/ might be helpful (and from a similar HPC environment).

It's always worth asking whether the code is written to be efficient in R -- a
typical 'mistake' is to write R-level explicit 'for' loops that
copy-and-append results, along the lines of

 len - 10
 result - NULL
 for (i in seq_len(len))
 ## some complicated calculation, then...
 result - c(result, sqrt(i))

whereas it's much better to pre-allocate and fill

  result - integer(len)
  for (i in seq_len(len))
  result[[i]] = sqrt(i)

or

  lapply(seq_len(len), sqrt)

and very much better still to 'vectorize'

  result - sqrt(seq_len(len))

(timing for me are about 1 minute for copy-and-append, .2 s for pre-allocate
and fill, and .002s for vectorize).

Pushing back on the guy providing the code (grep for for loops, and look for
that copy-and-append pattern) might save you from having to use parallel
evaluation at all.

Martin



Thanks,
Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

Re: [R] snow/Rmpi without MPI.spawn?

2014-09-03 Thread Martin Morgan


On 09/03/2014 03:25 PM, Jim Leek wrote:

I'm a programmer at a high-performance computing center.  I'm not very
familiar with R, but I have used MPI from C, C++, and Python.  I have to run
an R code provided by a guy who knows R, but not MPI.  So, this fellow used
the R snow library to parallelize his R code (theoretically, I'm not
actually sure what he did.)  I need to get this code running on our
machines.

However, Rmpi and snow seem to require mpi spawn, which our computing center
doesn't support.  I even tried building Rmpi with MPICH1 instead of 2,
because Rmpi has that option, but it still tries to use spawn.

I can launch plenty of processes, but I have to launch them all at once at
the beginning. Is there any way to convince Rmpi to just use those processes
rather than trying to spawn its own?  I haven't found any documentation on
this issue, although I would've thought it would be quite common.


This script

spawn.R
===
# salloc -n 12 orterun -n 1 R -f spawn.R
library(Rmpi)
## Recent Rmpi bug -- should be mpi.universe.size()
nWorkers - mpi.universe.size()
mpi.spawn.Rslaves(nslaves=nWorkers)
mpiRank - function(i)
  c(i=i, rank=mpi.comm.rank())
mpi.parSapply(seq_len(2*nWorkers), mpiRank)
mpi.close.Rslaves()
mpi.quit()

can be run like the comment suggests

   salloc -n 12 orterun -n 1 R -f spawn.R

uses slurm (or whatever job manager) to allocate resources for 12 tasks and 
spawn within that allocation. Maybe that's 'good enough' -- spawning within the 
assigned allocation? Likely this requires minimal modification of the current code.


More extensive is to revise the manager/worker-style code to something more like 
single instruction, multiple data



simd.R
==
## salloc -n 4 orterun R --slave -f simd.R
sink(/dev/null) # don't capture output -- more care needed here
library(Rmpi)

TAGS = list(FROM_WORKER=1L)
.comm = 0L

## shared `work', here just determine rank and host
work = c(rank=mpi.comm.rank(.comm),
 host=system(hostname, intern=TRUE))

if (mpi.comm.rank(.comm) == 0) {
## manager
mpi.barrier(.comm)
nWorkers = mpi.comm.size(.comm)
res = list(nWorkers)
for (i in seq_len(nWorkers - 1L)) {
res[[i]] - mpi.recv.Robj(mpi.any.source(), TAGS$FROM_WORKER,
  comm=.comm)
}
res[[nWorkers]] = work
sink() # start capturing output
print(do.call(rbind, res))
} else {
## worker
mpi.barrier(.comm)
mpi.send.Robj(work, 0L, TAGS$FROM_WORKER, comm=.comm)
}
mpi.quit()

but this likely requires some serious code revision; if going this route then 
http://r-pbd.org/ might be helpful (and from a similar HPC environment).


It's always worth asking whether the code is written to be efficient in R -- a 
typical 'mistake' is to write R-level explicit 'for' loops that 
copy-and-append results, along the lines of


   len - 10
   result - NULL
   for (i in seq_len(len))
   ## some complicated calculation, then...
   result - c(result, sqrt(i))

whereas it's much better to pre-allocate and fill

result - integer(len)
for (i in seq_len(len))
result[[i]] = sqrt(i)

or

lapply(seq_len(len), sqrt)

and very much better still to 'vectorize'

result - sqrt(seq_len(len))

(timing for me are about 1 minute for copy-and-append, .2 s for pre-allocate 
and fill, and .002s for vectorize).


Pushing back on the guy providing the code (grep for for loops, and look for 
that copy-and-append pattern) might save you from having to use parallel 
evaluation at all.


Martin



Thanks,
Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How should I do GO enrichment of differential expressed miRNA?

2014-08-29 Thread Martin Morgan


On 08/28/2014 11:47 PM, my1stbox wrote:

Hi all,
First, I carried out GO enrichment to predicted/validated target genes of those 
miRNA using GOstats package. Then I find myself in a dead end. So what is the 
good practice? Is it possible to directly do GO enrichment to miRNAs? Are they 
included in GO database?


The Bioconductor mailing list

  http://bioconductor.org/help/mailing-list/mailform/

is a more appropriate forum for discussion of Bioconductor packages (like 
topGO). It's better to be more specific about what your question / problem is; 
'dead end' might mean that you had technical problems, or that you managed to 
get results but that they were unsatisfactory for some specific reason, or...


Martin


Regards,
Allen
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dev-lang/R-3.1.0: biocLite(vsn) removes all files in /

2014-05-19 Thread Martin Morgan


On 05/19/2014 01:09 AM, Henric Winell wrote:

On 2014-05-18 20:43, peter dalgaard wrote:


On 18 May 2014, at 07:38 , Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:


Then you had best not do it again if you don't like that result.

1) This is not the right mailing list for issues having to do with
bioconductor. Please go to the bioconductor mailing list for that.


Hmm, this is one case where I'd really rather not scare people off R-help.
Non-BioC users do use BioC packages from time to time and what Juergen did is
what the BioConductor web pages tells new users to do (probably minus the
as-root bit). A warn-off on R-help seems entirely warranted. Good to see that
Martin Morgan is taking the issue very seriously in his post below.


As a non-BioC user using BioC packages I've always wondered why the standard R
functionality isn't enough.  Can someone, please, tell me why 'biocLite()'
should be used?

I've always succeeded installing BioC packages using the standard R tools (as
indicated by Uwe in his reply).


Conversely, I've always succeeded in installing CRAN and Bioc packages via

  source(http://bioconductor.org/biocLite.R;)
  biocLite(...)

and am more-or-less flummoxed by the extra steps I'm asked to perform (to choose 
and set repositories) when I take that rare foray into install.packages()-land! 
One point is that http://bioconductor.org actually points to an Amazon 
CloudFront address, which means that the content comes from a geographically 
proximate and reliable location (this makes choice of repository mostly 
irrelevant for normal users, just point to bioconductor.org)




Bioconductor has a repository and release schedule that differs from R 
(Bioconductor has a 'devel' branch to which new packages and updates are 
introduced, and a stable 'release' branch emitted once every 6 months to which 
bug fixes but not new features are introduced).


A consequences of the mismatch between R and Bioconductor release schedules is 
that the Bioconductor version identified by Uwe's method is sometimes not the 
most recent 'release' available. For instance, R 3.1.1 will likely be introduced 
some months before the next Bioc release. After the Bioc release, 3.1.1 users 
will be pointed to an out-of-date version of Bioconductor.


A consequence of the distinct 'devel' branch is that Uwe's method sometimes 
points only to the 'release' repository, whereas Bioconductor developers and 
users wanting leading-edge features wish to access the Bioconductor 'devel' 
repository. For instance, the next Bioc release will be available for R.3.1.x, 
so Bioconductor developers and leading-edge users need to be able to install the 
devel version of Bioconductor packages into the same version (though perhaps 
different instance or at least library location) of R that currently supports 
the release version.


An indirect consequence of the structured release is that Bioconductor packages 
generally have more extensive dependencies with one another, both explicitly via 
the usual package mechanisms and implicitly because the repository, release 
structure, and Bioconductor community  interactions favor re-use of data 
representations and analysis concepts across packages. There is thus a higher 
premium on knowing that packages are from the same release, and that all 
packages are current within the release.




These days, the main purpose of source(http://bioconductor.org/biocLite.R;) is 
to install and attach the 'BiocInstaller' package.


In a new installation, the script installs the most recent version of the 
BiocInstaller package relevant to the version of R in use, regardless of the 
relative times of R and Bioconductor release cycles. The BiocInstaller package 
serves as the primary way to identify the version of Bioconductor in use


   library(BiocInstaller)
  Bioconductor version 2.14 (BiocInstaller 1.14.2), ?biocLite for help

Since new features are often appealing to users, but at the same time require an 
updated version of Bioconductor, the source() command evaluated in an 
out-of-date R will nudge users to upgrade, e.g., in R-2.15.3


   source(http://bioconductor.org/biocLite.R;)
  A new version of Bioconductor is available after installing the most
recent version of R; see http://bioconductor.org/install

The biocLite() function is provided by BiocInstaller. This is a wrapper around 
install.packages, but with the repository chosen according to the version of 
Bioconductor in use, rather than to the version relevant at the time of the 
release of R.


biocLite also nudges users to remain current within a release, by default 
checking for out-of-date packages and asking if the user would like to update


 biocLite()
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.14 (BiocInstaller 1.14.2), R version
  3.1.0.
Old packages: 'BBmisc', 'genefilter', 'GenomicAlignments',
  'GenomicRanges', 'IRanges', 'MASS', 'reshape2', 'Rgraphviz',
  'RJSONIO', 'rtracklayer'
Update all/some/none

Re: [R] dev-lang/R-3.1.0: biocLite(vsn) removes all files in /

2014-05-18 Thread Martin Morgan

This would be very bad and certainly unintended if it were the responsibility of 
biocLite. Can we communicate off-list about this? In particular can you report


  noquote(readLines(http://bioconductor.org/biocLite.R;))

?

Martin Morgan

On 05/17/2014 10:16 PM, Juergen Rose wrote:

I had the following files in /:

root@caiman:/root(8)# ll /
total 160301
drwxr-xr-x   2 root root  4096 May 16 12:23 bin/
drwxr-xr-x   6 root root  3072 May 14 13:58 boot/
-rw-r--r--   1 root root 38673 May 14 14:22 boot_local-d.log
lrwxrwxrwx   1 root root11 Jan 22  2011 data - data_caiman/
drwxr-xr-x   7 root root  4096 Mar  9 22:29 data_caiman/
lrwxrwxrwx   1 root root23 Dec 29 13:43 data_impala - 
/net/impala/data_impala/
lrwxrwxrwx   1 root root21 Jan 27 08:13 data_lynx2 - 
/net/lynx2/data_lynx2/
drwxr-xr-x  21 root root  4040 May 14 14:40 dev/
drwxr-xr-x 160 root root 12288 May 17 17:14 etc/
-rw---   1 root root 15687 Dec 26 13:42 grub.cfg_old
lrwxrwxrwx   1 root root11 Jan 23  2011 home - home_caiman/
drwxr-xr-x   5 root root  4096 Dec 26 11:31 home_caiman/
lrwxrwxrwx   1 root root23 Dec 29 13:43 home_impala - 
/net/impala/home_impala/
lrwxrwxrwx   1 root root21 Jan 27 08:13 home_lynx2 - 
/net/lynx2/home_lynx2/
lrwxrwxrwx   1 root root 5 Mar 30 04:25 lib - lib64/
drwxr-xr-x   3 root root  4096 May 14 04:31 lib32/
drwxr-xr-x  17 root root 12288 May 16 12:23 lib64/
-rw-r--r--   1 root root   1797418 May 14 14:22 login.log
drwx--   2 root root 16384 Jan 20  2011 lost+found/
drwxr-xr-x   2 root root 0 May 14 14:21 misc/
drwxr-xr-x  10 root root  4096 Nov  4  2013 mnt/
drwxr-xr-x   4 root root 0 May 17 17:38 net/
drwxr-xr-x  13 root root  4096 Feb 13 13:25 opt/
dr-xr-xr-x 270 root root 0 May 14 14:21 proc/
drwx--  36 root root  4096 May 17 15:00 root/
drwxr-xr-x  30 root root   840 May 16 18:21 run/
drwxr-xr-x   2 root root 12288 May 16 12:23 sbin/
-rw-r--r--   1 root root 162191459 Jan 13  2011 stage3-amd64-20110113.tar.bz2
dr-xr-xr-x  12 root root 0 May 14 14:21 sys/
drwxrwxrwt  16 root root  1648 May 17 17:14 tmp/
drwxr-xr-x  19 root root  4096 May  6 04:40 usr/
drwxr-xr-x  16 root root  4096 Dec 26 11:17 var/


Then I did as root:
R

source(http://bioconductor.org/biocLite.R;)
biocLite(vsn)


Save workspace image? [y/n/c]: n
root@caiman:/root(15)# ll /
total 93
drwxr-xr-x   2 root root  4096 May 16 12:23 bin/
drwxr-xr-x   6 root root  3072 May 14 13:58 boot/
drwxr-xr-x   7 root root  4096 Mar  9 22:29 data_caiman/
drwxr-xr-x  21 root root  4040 May 14 14:40 dev/
drwxr-xr-x 160 root root 12288 May 17 17:14 etc/
drwxr-xr-x   5 root root  4096 Dec 26 11:31 home_caiman/
drwxr-xr-x   3 root root  4096 May 14 04:31 lib32/
drwxr-xr-x  17 root root 12288 May 16 12:23 lib64/
drwx--   2 root root 16384 Jan 20  2011 lost+found/
drwxr-xr-x   2 root root 0 May 14 14:21 misc/
drwxr-xr-x  10 root root  4096 Nov  4  2013 mnt/
drwxr-xr-x   2 root root 0 May 17 17:38 net/
drwxr-xr-x  13 root root  4096 Feb 13 13:25 opt/
dr-xr-xr-x 272 root root 0 May 14 14:21 proc/
drwx--  36 root root  4096 May 17 15:00 root/
drwxr-xr-x  30 root root   840 May 16 18:21 run/
drwxr-xr-x   2 root root 12288 May 16 12:23 sbin/
dr-xr-xr-x  12 root root 0 May 17 17:38 sys/
drwxrwxrwt  19 root root  1752 May 17 18:33 tmp/
drwxr-xr-x  19 root root  4096 May  6 04:40 usr/
drwxr-xr-x  16 root root  4096 Dec 26 11:17 var/

I.e., all not directory files in / disappeared. This happens on two systems.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] flowDensity package

2014-04-09 Thread Martin Morgan


On 04/09/2014 07:23 AM, Raghu wrote:

I am unable to install flowDensity package from bioconductor in R version 3.0 or
3.1.
did anyone have the same problems with this.


Please ask questions about Bioconductor packages on the Bioconductor mailing 
list

  http://bioconductor.org/help/mailing-list/

but as far as I can tell flowDensity is not a Bioconductor package!

  http://bioconductor.org/packages/release/BiocViews.html#___Software

Don't forget to provide the output of the R command

  sessionInfo()

to let us know about your operating system and R version.

Martin



Thanks,
Raghu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] memory use of copies

2014-01-27 Thread Martin Morgan


Hi Ross --

On 01/23/2014 05:53 PM, Ross Boylan wrote:

[Apologies if a duplicate; we are having mail problems.]

I am trying to understand the circumstances under which R makes a copy
of an object, as opposed to simply referring to it.  I'm talking about
what goes on under the hood, not the user semantics.  I'm doing things
that take a lot of memory, and am trying to minimize my use.

I thought that R was clever so that copies were created lazily.  For
example, if a is matrix, then
b - a
b  a referred to to the same object underneath, so that a complete
duplicate (deep copy) wasn't made until it was necessary, e.g.,
b[3, 1] - 4
would duplicate the contents of a to b, and then overwrite them.


Compiling your R with --enable-memory-profiling gives access to the tracemem() 
function, showing that your understanding above is correct


 b = matrix(0, 3, 2)
 tracemem(b)
[1] 0x7054020
 a = b## no copy
 b[3, 1] = 2  ## copy
tracemem[0x7054020 - 0x7053fc8]:
 b = matrix(0, 3, 2)
 tracemem(b)
 tracemem(b)
[1] 0x680e258
 b[3, 1] = 2  ## no copy


The same is apparent using .Internal(inspect()), where the first information 
@7053ec0 is the address of the data. The other relevant part is the 'NAM()' 
field, which indicates whether there are 0, 1 or (have been) at least 2 symbols 
referring to the data. NAM() increments from 1 (no duplication on modify 
required) on original creation to 2 when a = b (duplicate on modify)


 b = matrix(0, 3, 2)
 .Internal(inspect(b))
@7053ec0 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,0,0,0,...
ATTRIB:
  @7057528 02 LISTSXP g0c0 []
TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] dim (has value)
@7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
 b[3, 1] = 2
 .Internal(inspect(b))
@7053ec0 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,2,0,0,...
ATTRIB:
  @7057528 02 LISTSXP g0c0 []
TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] dim (has value)
@7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
 a = b
 .Internal(inspect(b))  ## data address unchanced
@7053ec0 14 REALSXP g0c4 [NAM(2),ATT] (len=6, tl=0) 0,0,0,0,0,...
ATTRIB:
  @7057528 02 LISTSXP g0c0 []
TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] dim (has value)
@7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
 b[3, 1] = 2
 .Internal(inspect(b))  ## data address changed
@7232910 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,2,0,0,...
ATTRIB:
  @7239d28 02 LISTSXP g0c0 []
TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] dim (has value)
@7237b48 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2




The following log, from R 3.0.1, does not seem to act that way; I get
the same amount of memory used whether I copy the same object repeatedly
or create new objects of the same size.

Can anyone explain what is going on?  Am I just wrong that copies are
initially shallow?  Or perhaps that behavior only applies for function
arguments?  Or doesn't apply for class slots or reference class
variables?

   foo - setRefClass(foo, fields=list(x=ANY))
   bar - setClass(bar, slots=c(x))


using the approach above, we can see that creating an S4 or reference object in 
the way you've indicated (validity checks or other initialization might change 
this) does not copy the data although it is marked for duplication


 x = 1:2; .Internal(inspect(x))
@7553868 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
 .Internal(inspect(foo(x=x)$x))
@7553868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
 .Internal(inspect(bar(x=x)@x))
@7553868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2

On the other hand, lapply is creating copies

 x = 1:2; .Internal(inspect(x))
@757b5a8 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
 .Internal(inspect(lapply(1:2, function(i) x)))
@7551f88 19 VECSXP g0c2 [] (len=2, tl=0)
  @757b428 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @757b3f8 13 INTSXP g0c1 [] (len=2, tl=0) 1,2

One can construct a list without copies

 x = 1:2; .Internal(inspect(x))
@7677c18 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
 .Internal(inspect(list(x)[rep(1, 2)]))
@767b080 19 VECSXP g0c2 [NAM(2)] (len=2, tl=0)
  @7677c18 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
  @7677c18 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2

but that (creating a list of identical elements) doesn't seem to be a likely 
real-world scenario and the gain is transient


 x = 1:2; y = list(x)[rep(1, 4)]
 .Internal(inspect(y))
@507bef8 19 VECSXP g0c3 [NAM(2)] (len=4, tl=0)
  @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
  @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
  @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
  @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
 y[[1]][1] = 2L  ## everybody copied
 .Internal(inspect(y))
@507bf40 19 VECSXP g0c3 [NAM(1)] (len=4, tl=0)
  @51502c8 13 INTSXP g0c1 [] (len=2, tl=0) 2,2
  @51502f8 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @5150328 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @5150358 13 INTSXP g0c1 [] (len=2, tl=0) 1,2


Probably it is more helpful to think of reducing the number of times an object 
is _modified_, e.g.,

Re: [R] Error in dispersionPlot using cummeRbund

2014-01-05 Thread Martin Morgan


Hi Nancy --

cummeRbund is a Bioconductor package so please ask questions about it on the 
Bioconductor mailing list.


  http://bioconductor.org/help/mailing-list/mailform/

Be sure to include the maintainer packageDescription(cummeRbund)$Maintainer in 
the email.


You have the 'latest' version of cummeRbund for R-2.15.3; a more recent version 
is available when using R-3.0.2.


Martin

On 01/05/2014 08:12 AM, Yanxiang Shi wrote:

Hi all,

I'm new to RNA-seq analysis. And I'm now trying to use R to visualize the
Galaxy data. I'm using the cummeRbund to deal with the data from cuffdiff
in Galaxy.

Here is the codes I've run:


cuff= readCufflinks (dbFile = output_database, geneFPKM =

gene_FPKM_tracking, geneDiff = gene_differential_expression_testing,
isoformFPKM = transcript_FPKM_tracking,isoformDiff =
transcript_differential_expression_testing, TSSFPKM =
TSS_groups_FPKM_tracking, TSSDiff =
TSS_groups_differential_expression_testing, CDSFPKM =
CDS_FPKM_tracking, CDSExpDiff =
CDS_FPKM_differential_expression_testing, CDSDiff =
CDS_overloading_diffential_expression_testing, promoterFile =
promoters_differential_expression_testing, splicingFile =
splicing_differential_expression_testing, rebuild = T)


cuff

CuffSet instance with:
2 samples
26 genes
44 isoforms
36 TSS
0 CDS
26 promoters
36 splicing
0 relCDS


disp-dispersionPlot(genes(cuff))
disp




*Error in `$-.data.frame`(`*tmp*`, SCALE_X, value = 1L) : replacement
has 1 rows, data has 0 In addition: Warning message:In max(panels$ROW) : no
non-missing arguments to max; returning -Inf*

Does any one know why there's error? My cummeRbund is the latest version, R
is 2.15.3, and cuffdiff v1.3.0.

I've tried to search the internet for solutions but apparently it's not a
problem that people discussed much.

Thank you very much in advance!!!

Nancy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] S4; Setter function is not chaning slot value as expected

2013-11-10 Thread Martin Morgan


On 11/10/2013 03:54 AM, daniel schnaider wrote:

Thanks Martin. It worked well.
Two new questions related to the same subject.

1) Why create this semantic of a final argument name specifically names value?


I do not know. It is a requirement of replacement methods in R in general, not 
just S4 methods. See section 3.4.4 of RShowDoc(R-lang).



2) Regarding performance. When CustomerID(ac) - 54321 runs, does it only
change the slot from whatever it was to 54321, or it really create another
object and change all the value of all slots, keeping technically all the other
values equal and changing 54321?


Copying is tricky in R. It behaves as though a copy has been made of the entire 
object. Whether a copy is actually made, or just marked as necessary on 
subsequent modification, requires deep consideration of the code. This is the 
way R works, not just the way S4 classes work.


If instead of a single account you modelled 'Accounts', i.e., all accounts, then 
updating 1000 account id's would only make one copy, whereas if you model each 
account separately this would require 1000 copies.


Martin



thanks..


On Sat, Nov 9, 2013 at 4:20 PM, Martin Morgan mtmor...@fhcrc.org
mailto:mtmor...@fhcrc.org wrote:

On 11/09/2013 06:31 AM, daniel schnaider wrote:

It is my first time programming with S4 and I can't get the setter 
fuction
to actually change the value of the slot created by the constructor.

I guess it has to do with local copy, global copy, etc. of the variable 
-
but, I could't find anything relevant in documentation.

Tried to copy examples from the internet, but they had the same problem.

# The code
  setClass (Account ,
 representation (
 customer_id = character,
 transactions = matrix)
  )


  Account - function(id, t) {
  new(Account, customer_id = id, transactions = t)
  }


  setGeneric (CustomerID-, function(obj,
id){standardGeneric(__CustomerID-)})


Replacement methods (in R in general) require that the final argument (the
replacement value) be named 'value', so

 setGeneric(CustomerID-,
 function(x, ..., value) standardGeneric(CustomerID))

 setReplaceMethod(CustomerID, c(Account, character),
 function(x, , value)
 {
 x@customer_id - value
 x
 })

use this as

CustomerID(ac) - 54321



  setReplaceMethod(CustomerID, Account, function(obj, id){
  obj@customer_id - id
  obj
  })

  ac - Account(12345, matrix(c(1,2,3,4,5,6), ncol=2))
  ac
  CustomerID - 54321
  ac

#Output
   ac
  An object of class Account
  Slot customer_id:
  [1] 12345

  Slot transactions:
   [,1] [,2]
  [1,]14
  [2,]25
  [3,]36

# CustomerID is value has changed to 54321, but as you can see it does't
   CustomerID - 54321


   ac
  An object of class Account
  Slot customer_id:
  [1] 12345

  Slot transactions:
   [,1] [,2]
  [1,]14
  [2,]25
  [3,]36


Help!

 [[alternative HTML version deleted]]


R-help@r-project.org mailto:R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/__listinfo/r-help
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/__posting-guide.html
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793




--
Daniel Schnaider

SP Phone:  +55-11-9.7575.0822


d...@scaigroup.com mailto:d...@scaigroup.com
skype dschnaider
Linked In: http://www.linkedin.com/in/danielschnaider

w http://www.arkiagroup.com/ww.scaigroup.com http://ww.scaigroup.com/

Depoimentos de clientes http://www.scaigroup.com/Projetos/depoimentos

Casos de Sucesso  Referências http://www.scaigroup.com/Projetos

SCAI Group no Facebook http://facebook.scaigroup.com/

SCAI Group no Twitter http://twitter.scaigroup.com/

SCAI Group no Google Plus http://plus.scaigroup.com/




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024

Re: [R] S4; Setter function is not chaning slot value as expected

2013-11-10 Thread Martin Morgan


On 11/09/2013 11:31 PM, Hadley Wickham wrote:

Modelling a mutable entity, i.e. an account, is really a perfect
example of when to use reference classes.  You might find the examples
on http://adv-r.had.co.nz/OO-essentials.html give you a better feel
for the strengths and weaknesses of R's different OO systems.


Reference classes provide less memory copying and a more familiar programming 
paradigm but not necessarily fantastic performance, as illustrated here


http://stackoverflow.com/questions/18677696/stack-class-in-r-something-more-concise/18678440#18678440

and I think elsewhere on this or the R-devel list (sorry not to be able to 
provide a more precise recollection).


Martin




Hadley

On Sat, Nov 9, 2013 at 9:31 AM, daniel schnaider dschnai...@gmail.com wrote:

It is my first time programming with S4 and I can't get the setter fuction
to actually change the value of the slot created by the constructor.

I guess it has to do with local copy, global copy, etc. of the variable -
but, I could't find anything relevant in documentation.

Tried to copy examples from the internet, but they had the same problem.

# The code
 setClass (Account ,
representation (
customer_id = character,
transactions = matrix)
 )


 Account - function(id, t) {
 new(Account, customer_id = id, transactions = t)
 }


 setGeneric (CustomerID-, function(obj,
id){standardGeneric(CustomerID-)})
 setReplaceMethod(CustomerID, Account, function(obj, id){
 obj@customer_id - id
 obj
 })

 ac - Account(12345, matrix(c(1,2,3,4,5,6), ncol=2))
 ac
 CustomerID - 54321
 ac

#Output
  ac
 An object of class Account
 Slot customer_id:
 [1] 12345

 Slot transactions:
  [,1] [,2]
 [1,]14
 [2,]25
 [3,]36

# CustomerID is value has changed to 54321, but as you can see it does't
  CustomerID - 54321
  ac
 An object of class Account
 Slot customer_id:
 [1] 12345

 Slot transactions:
  [,1] [,2]
 [1,]14
 [2,]25
 [3,]36


Help!

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] S4; Setter function is not chaning slot value as expected

2013-11-09 Thread Martin Morgan


On 11/09/2013 06:31 AM, daniel schnaider wrote:

It is my first time programming with S4 and I can't get the setter fuction
to actually change the value of the slot created by the constructor.

I guess it has to do with local copy, global copy, etc. of the variable -
but, I could't find anything relevant in documentation.

Tried to copy examples from the internet, but they had the same problem.

# The code
 setClass (Account ,
representation (
customer_id = character,
transactions = matrix)
 )


 Account - function(id, t) {
 new(Account, customer_id = id, transactions = t)
 }


 setGeneric (CustomerID-, function(obj,
id){standardGeneric(CustomerID-)})


Replacement methods (in R in general) require that the final argument (the 
replacement value) be named 'value', so


setGeneric(CustomerID-,
function(x, ..., value) standardGeneric(CustomerID))

setReplaceMethod(CustomerID, c(Account, character),
function(x, , value)
{
x@customer_id - value
x
})

use this as

   CustomerID(ac) - 54321



 setReplaceMethod(CustomerID, Account, function(obj, id){
 obj@customer_id - id
 obj
 })

 ac - Account(12345, matrix(c(1,2,3,4,5,6), ncol=2))
 ac
 CustomerID - 54321
 ac

#Output
  ac
 An object of class Account
 Slot customer_id:
 [1] 12345

 Slot transactions:
  [,1] [,2]
 [1,]14
 [2,]25
 [3,]36

# CustomerID is value has changed to 54321, but as you can see it does't
  CustomerID - 54321



  ac
 An object of class Account
 Slot customer_id:
 [1] 12345

 Slot transactions:
  [,1] [,2]
 [1,]14
 [2,]25
 [3,]36


Help!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] S4 vs S3. New Package

2013-11-09 Thread Martin Morgan


On 11/09/2013 11:59 AM, Rolf Turner wrote:



For my take on the issue see fortune(strait jacket).

 cheers,

 Rolf Turner

P. S.  I said that quite some time ago and I have seen nothing
in the intervening years to change my views.


Mileage varies; the Bioconductor project attains a level of interoperability and 
re-use (http://www.nature.com/nbt/journal/v31/n10/full/nbt.2721.html) that would 
be difficult with a less formal class system.




 R. T.


On 11/10/13 04:22, daniel schnaider wrote:

Hi,

I am working on a new credit portfolio optimization package. My question is
if it is more recommended to develop in S4 object oriented or S3.

It would be more naturally to develop in object oriented paradigm, but
there is many concerns regarding S4.

1) Performance of S4 could be an issue as a setter function, actually
changes the whole object behind the scenes.


Depending on implementation, updating S3 objects could as easily trigger copies; 
this is a fact of life in R. Mitigate by modelling objects in a vector 
(column)-oriented approach rather than the row-oriented paradigm of Java / C++ / 
etc.


Martin Morgan


2) Documentation. It has been really hard to find examples in S4. Most
books and articles consider straightforward S3 examples.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Martin Morgan


On 11/01/2013 08:22 AM, Magnus Thor Torfason wrote:

Sure,

I was attempting to be concise and boiling it down to what I saw as the root
issue, but you are right, I could have taken it a step further. So here goes.

I have a set of around around 20M string pairs. A given string (say, A) can
either be equivalent to another string (B) or not. If A and B occur together in
the same pair, they are equivalent. But equivalence is transitive, so if A and B
occur together in one pair, and A and C occur together in another pair, then A
and C are also equivalent. I need a way to quickly determine if any two strings
from my data set are equivalent or not.


Do you mean that if A,B occur together and B,C occur together, then A,B and A,C 
are equivalent?


Here's a function that returns a unique identifier (not well tested!), allowing 
for transitive relations but not circularity.


 uid - function(x, y)
{
i - seq_along(x)   # global index
xy - paste0(x, y)  # make unique identifiers
idx - match(xy, xy)

repeat {
## transitive look-up
y_idx - match(y[idx], x)   # look up 'y' in 'x'
keep - !is.na(y_idx)
if (!any(keep)) # no transitive relations, done!
break
x[idx[keep]] - x[y_idx[keep]]
y[idx[keep]] - y[y_idx[keep]]

## create new index of values
xy - paste0(x, y)
idx - match(xy, xy)
}
idx
}

Values with the same index are identical. Some tests

 x - c(1, 2, 3, 4)
 y - c(2, 3, 5, 6)
 uid(x, y)
[1] 1 1 1 4
 i - sample(x); uid(x[i], y[i])
[1] 1 1 3 1
 uid(as.character(x), as.character(y))  ## character() ok
[1] 1 1 1 4
 uid(1:10, 1 + 1:10)
 [1] 1 1 1 1 1 1 1 1 1 1
 uid(integer(), integer())
integer(0)
 x - c(1, 2, 3)
 y - c(2, 3, 1)
 uid(x, y)  ## circular!
  C-c C-c

I think this will scale well enough, but the worst-case scenario can be made to 
be log(longest chain) and copying can be reduced by using an index i and 
subsetting the original vector on each iteration. I think you could test for 
circularity by checking that the updated x are not a permutation of the kept x, 
all(x[y_idx[keep]] %in% x[keep]))


Martin



The way I do this currently is to designate the smallest (alphabetically) string
in each known equivalence set as the main entry. For each pair, I therefore
insert two entries into the hash table, both pointing at the mail value. So
assuming the input data:

A,B
B,C
D,E

I would then have:

A-A
B-A
C-B
D-D
E-D

Except that I also follow each chain until I reach the end (key==value), and
insert pointers to the main value for every value I find along the way. After
doing that, I end up with:

A-A
B-A
C-A
D-D
E-D

And I can very quickly check equivalence, either by comparing the hash of two
strings, or simply by transforming each string into its hash, and then I can use
simple comparison from then on. The code for generating the final hash table is
as follows:

h : Empty hash table created with hash.new()
d : Input data
hash.deep.get : Function that iterates through the hash table until it finds a
key whose value is equal to itself (until hash.get(X)==X), then returns all the
values in a vector


h = hash.new()
for ( i in 1:nrow(d) )
{
 deep.a  = hash.deep.get(h, d$a[i])
 deep.b  = hash.deep.get(h, d$b[i])
 equivalents = sort(unique(c(deep.a,deep.b)))
 equiv.id= min(equivalents)
 for ( equivalent in equivalents )
 {
 hash.put(h, equivalent, equiv.id)
 }
}


I would so much appreciate if there was a simpler and faster way to do this.
Keeping my fingers crossed that one of the R-help geniuses who sees this is
sufficiently interested to crack the problem

Best,
Magnus

On 11/1/2013 1:49 PM, jim holtman wrote:

It would be nice if you followed the posting guidelines and at least
showed the script that was creating your entries now so that we
understand the problem you are trying to solve.  A bit more
explanation of why you want this would be useful.  This gets to the
second part of my tag line:  Tell me what you want to do, not how you
want to do it.  There may be other solutions to your problem.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Nov 1, 2013 at 9:32 AM, Magnus Thor Torfason
zulutime@gmail.com wrote:

Pretty much what the subject says:

I used an env as the basis for a Hashtable in R, based on information that
this is in fact the way environments are implemented under the hood.

I've been experimenting with doubling the number of entries, and so far it
has seemed to be scaling more or less linearly, as expected.

But as I went from 17 million entries to 34 million entries, the completion
time has gone from 18 hours, to 5 days

Re: [R] S4 base class

2013-10-17 Thread Martin Morgan


On 10/17/2013 08:54 AM, Michael Meyer wrote:


Suppose you have a base class Base which implements a function Base::F
which works in most contexts but not in the context of ComplicatedDerived 
class
where some preparation has to happen before this very same function can be 
called.

You would then define

void ComplicatedDerived::F(...){

 preparation();
 Base::F();
}

You can nealry duplicate this in R via

setMethod(F,
signature(this=ComplicatedDerived),
definition=function(this){

 preparation(this)
 F(as(this,Base))
})

but it will fail whenever F uses virtual functions (i.e. generics) which are 
only defined
for derived classes of Base


With

  .A - setClass(A, representation(a=numeric))
  .B - setClass(B, representation(b=numeric), contains=A)

  setGeneric(f, function(x, ...) standardGeneric(f))

  setMethod(f, A, function(x, ...) {
  message(f,A-method)
  g(x, ...)   # generic with methods only for derived classes
  })

  setMethod(f, B, function(x, ...) {
  message(f,B-method)
  callNextMethod(x, ...)  # earlier response from Duncan Murdoch
  })

  setGeneric(g, function(x, ...) standardGeneric(g))

  setMethod(g, B, function(x, ...) {
  message(g,B-method)
  x
  })

one has

 f(.B())
f,B-method
f,A-method
g,B-method

An object of class B
Slot b:
numeric(0)

Slot a:
numeric(0)

?


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to merge GRange object?

2013-10-16 Thread Martin Morgan


On 10/16/2013 06:32 AM, John linux-user wrote:

Hello everyone,

I am wondering how to simply merge two GRanges objects by range field and add 
the value by additional vector. For example, I have two objects below



Hi -- GRanges is from a Bioconductor package, so please ask on the Bioconductor 
mailing list


  http://bioconductor.org/help/mailing-list/

I think you might do hits = findOverlaps(obj1, obj2) to get indexes of 
overlapping ranges, then pmin(obj1[queryHits(obj1)], obj2[subjectHits(obj2)]) 
and pmax() to get start and end coordinates, and construct a new GRanges from 
those. If you provide an easily reproducile example (e.g., constructing some 
sample GRanges objects 'by hand' using GRanges()) and post to the Bioconductor 
mailing list you'll likely get a complete answer.


Martin


obj1

seqnames   ranges strand |   Val
 RleIRanges  Rle | integer
   [1] chr1_random [272531, 272571]  + |88
   [2] chr1_random [272871, 272911]  + |45

obj2
  seqnames   ranges strand |   Val
 RleIRanges  Rle | integer
   [1] chr1_random [272531, 272581]  + |800
   [2] chr1_random [272850, 272911]  + |450

after merged, it should be an object as the following mergedObject and it would 
concern the differences in IRANGE data (e.g. 581 and 850 in obj2 above were 
different from those of obj1, which were 571 and 871 respectively)

mergedObject

  seqnames   ranges strand | object2Val   
object1Val
 RleIRanges  Rle | integer 
integer
   [1] chr1_random [272531, 272581]  + |800   88
   [2] chr1_random [272850, 272911]  + |450   45





On Wednesday, October 16, 2013 8:31 AM, Terry Therneau thern...@mayo.edu 
wrote:



On 10/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I'm trying to use coxph() function to fit a very simple Cox proportional
hazards regression model (only one covariate) but the parameter space is
restricted to an open set (0, 1). Can I still obtain a valid estimate by
using coxph function in this scenario? If yes, how? Any suggestion would be
greatly appreciated. Thanks!!!


Easily:
 1.  Fit the unrestricted model.  If the solution is in 0-1 you are done.
 2.  If it is outside, fix the coefficient.  Say that the solution is 1.73, 
then the
optimal solution under contraint is 1.
 Redo the fit adding the paramters  init=1, iter=0.  This forces the 
program to
give the loglik and etc for the fixed coefficient of 1.0.

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bioconductor / AnnotationDbi: Why does a GOAllFrame contain more rows than its argument GoFrame?

2013-10-02 Thread Martin Morgan


On 10/02/2013 09:28 AM, Asis Hallab wrote:

Dear Bioconductor Experts,


This will be responded to on the Bioconductor mailing list; please address any 
follow-ups there.


http://bioconductor.org/help/mailing-list/

Martin




thank you for providing such a useful tool-set.

I have a question regarding the package AnnotationDbi, specifically
the classes GOFrame and GOALLFrame.

During a GO Enrichment Analysis I create a data frame with Arabidopsis
thaliana GO annotations and from that first a GOFrame and than from
this GOFrame a GOALLFrame. Checking the result with

nrow(  getGOFrameData(  athal.go.all.frame ) ) # The GOAllFrame

and comparing it with

nrow( athal.go.frame ) # The GoFrame

I realize that the GOALLFrame has more than 5 times more rows than my
original GO annotation table. If I provide
organism='Arabidopsis thaliana'
to the constructor of GOFrame this ratio increases even further.

Unfortunately I could not find any documentation on this, so I feel
forced to bother you with my questions:

1) Why does GOALLFrame so many more annotations?
2) Why and from where does it retrieve the organism specific ones that
are added when a model organism like 'Arabidopsis thaliana' is
provided?
3) I suspected that all ancestors of annotated terms are added, but
when I did so myself, I still got less GO term annotations? So do you
add ancestors of the is_a type and possibly other relationship types
like part_of etc. ?

Please let me know your answers soon. Your help will be much appreciated.

Kind regards!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] EdgeR annotation

2013-08-25 Thread Martin Morgan


On 08/24/2013 04:50 AM, Robin Mjelle wrote:

after updating R and edgeR I lost the annotations in the final
Diff.Expressed matrix (toptags) when running the edgeR pipeline. How do I
get the row.names from the data matrix into the topTag-matrix?

data - read.table(KO_and_WT_Summary_miRNA_Expression.csv, row.names=1,
sep=, header=T)


edgeR is a Bioconductor package, so please ask on their mailing list (no 
subscription required!)


  http://bioconductor.org/help/mailing-list/

Remember to provide a reproducible example (people on the list will not be able 
to created your 'data' object; perhaps working with the simulated data on the 
help page ?glmFit is a good place to start?) and to include the output of 
sessionInfo() so that there is no ambiguity about the software version you are 
using.


Martin




keep - rowSums(cpm(data)2) =2
data - data[keep, ]
table(keep)
y - DGEList(counts=data[,1:18], genes=data[,0:1])
y - calcNormFactors(y)
y$samples
plotMDS(y,main=)
Time=c(0.25h,0.5h,1h,2h,3h,6h,12h,24h,48h,0.25h,0.5h,1h,2h,3h,6h,12h,24h,48h)
Condition=c(KO,KO,KO,KO,KO,KO,KO,KO,KO,WT,WT,WT,WT,WT,WT,WT,WT,WT)
design - model.matrix(~0+Time+Condition)
rownames(design) - colnames(y)
y - estimateGLMCommonDisp(y, design, verbose=TRUE,
method=deviance,robust=TRUE, subset=NULL)
y - estimateGLMTrendedDisp(y, design)
y - estimateGLMTagwiseDisp(y, design)
fit - glmFit(y, design)
lrt - glmLRT(fit)
topTags(lrt)

Coefficient:  ConditionWT
  genes  logFClogCPMLR PValue FDR
189   5128 -11.028422  7.905804  4456.297  0   0
188  12271 -10.582267  9.061326  5232.075  0   0
167 121120  -9.831894 12.475576  5957.104  0   0
34  255235  -9.771266 13.592968  7355.592  0   0
168 311906  -9.597952 13.907951 10710.111  0   0
166 631262  -9.592550 14.932018 11719.222  0   0
79  79   9.517226 11.466696  7964.269  0   0
169   2512  -8.946429  6.758584  2502.548  0   0
448   3711  -7.650068  7.764682  2914.784  0   0
32  260769  -7.412197 13.633352  4906.198  0   0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Method dispatch in S4

2013-08-09 Thread Martin Morgan


On 08/09/2013 07:45 AM, Bert Gunter wrote:

Simon:

Have a look at the proto package for which there is a vignette. You
may find it suitable for your needs and less intimidating.


Won't help much with S4, though! Some answers here

http://stackoverflow.com/questions/5437238/which-packages-make-good-use-of-s4-objects

including from Bioconductor simple class in EBImage, the advanced IRanges 
package and the 'toy' StudentGWAS.


Martin



Cheers,
Bert

On Fri, Aug 9, 2013 at 7:40 AM, Simon Zehnder szehn...@uni-bonn.de wrote:

Hi Martin,

thank you very much for this profound answer! Your added design advice is very 
helpful, too!

For the 'simple example': Sometimes I am still a little overwhelmed from a 
certain setting in the code and my ideas how I want to handle a process. But I 
learn from session to session. In future I will also span the lines more than 
80 columns. I am used to the indent in my vim editor.

I have one further issue: I do know, that you are one of the leading developers 
of the bioconductor package which uses (as far as I have read) extensively OOP 
in R. Is there a package you could suggest to me to learn from by reading and 
understanding the code? Where can I find the source code?

Best

Simon


On Aug 8, 2013, at 10:00 PM, Martin Morgan mtmor...@fhcrc.org wrote:


On 08/04/2013 02:13 AM, Simon Zehnder wrote:

So, I found a solution: First in the initialize method of class C coerce
the C object into a B object. Then call the next method in the list with the
B class object. Now, in the initialize method of class B the object is a B
object and the respective generateSpec method is called. Then, in the
initialize method of C the returned object from callNextMethod has to be
written to the C class object in .Object. See the code below.

setMethod(initialize, C, function(.Object, value) {.Object@c - value;
object - as(.Object, B); object - callNextMethod(object, value);
as(.Object, B) - object; .Object - generateSpec(.Object);
return(.Object)})

This setting works. I do not know though, if this setting is the usual way
such things are done in R OOP. Maybe the whole class design is
disadvantageous. If anyone detects a mistaken design, I am very thankful to
learn.


Hi Simon -- your 'simple' example is pretty complicated, and I didn't really 
follow it in detail! The code is not formatted for easy reading (e.g., lines 
spanning no more than 80 columns) and some of it (e.g., generateSpec) might not 
be necessary to describe the problem you're having.

A good strategy is to ensure that 'new' called with no arguments works (there 
are other solutions, but following this rule has helped me to keep my classes 
and methods simple). This is not the case for

  new(A)
  new(C)

The reason for this strategy has to do with the way inheritance is implemented, 
in particular the coercion from derived to super class. Usually it is better to 
provide default values for arguments to initialize, and to specify arguments 
after a '...'. This means that your initialize methods will respects the 
contract set out in ?initialize, in particular the handling of unnamed 
arguments:

 ...: data to include in the new object.  Named arguments
  correspond to slots in the class definition. Unnamed
  arguments must be objects from classes that this class
  extends.

I might have written initialize,A-method as

  setMethod(initialize, A, function(.Object, ..., value=numeric()){
  .Object - callNextMethod(.Object, ..., a=value)
  generateSpec(.Object)
  })

Likely in a subsequent iteration I would have ended up with (using the 
convention that function names preceded by '.' are not exported)

  .A - setClass(A, representation(a = numeric, specA = numeric))

  .generateSpecA - function(a) {
  1 / a
   }

  A - function(a=numeric(), ...) {
  specA - .generateSpecA(a)
  .A(..., a=a, specA=specA)
  }

  setMethod(generateSpec, A, function(object) {
  .generateSpecA(object@a)
  })

ensuring that A() returns a valid object and avoiding the definition of an 
initialize method entirely.

Martin



Best

Simon


On Aug 3, 2013, at 9:43 PM, Simon Zehnder simon.zehn...@googlemail.com
wrote:


setMethod(initialize, C, function(.Object, value) {.Object@c - value;
.Object - callNextMethod(.Object, value); .Object -
generateSpec(.Object); return(.Object)})


__ R-help@r-project.org mailing
list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
guide http://www.R-project.org/posting-guide.html and provide commented,
minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R

Re: [R] Method dispatch in S4

2013-08-08 Thread Martin Morgan


On 08/04/2013 02:13 AM, Simon Zehnder wrote:

So, I found a solution: First in the initialize method of class C coerce
the C object into a B object. Then call the next method in the list with the
B class object. Now, in the initialize method of class B the object is a B
object and the respective generateSpec method is called. Then, in the
initialize method of C the returned object from callNextMethod has to be
written to the C class object in .Object. See the code below.

setMethod(initialize, C, function(.Object, value) {.Object@c - value;
object - as(.Object, B); object - callNextMethod(object, value);
as(.Object, B) - object; .Object - generateSpec(.Object);
return(.Object)})

This setting works. I do not know though, if this setting is the usual way
such things are done in R OOP. Maybe the whole class design is
disadvantageous. If anyone detects a mistaken design, I am very thankful to
learn.


Hi Simon -- your 'simple' example is pretty complicated, and I didn't really 
follow it in detail! The code is not formatted for easy reading (e.g., lines 
spanning no more than 80 columns) and some of it (e.g., generateSpec) might not 
be necessary to describe the problem you're having.


A good strategy is to ensure that 'new' called with no arguments works (there 
are other solutions, but following this rule has helped me to keep my classes 
and methods simple). This is not the case for


  new(A)
  new(C)

The reason for this strategy has to do with the way inheritance is implemented, 
in particular the coercion from derived to super class. Usually it is better to 
provide default values for arguments to initialize, and to specify arguments 
after a '...'. This means that your initialize methods will respects the 
contract set out in ?initialize, in particular the handling of unnamed arguments:


 ...: data to include in the new object.  Named arguments
  correspond to slots in the class definition. Unnamed
  arguments must be objects from classes that this class
  extends.

I might have written initialize,A-method as

  setMethod(initialize, A, function(.Object, ..., value=numeric()){
  .Object - callNextMethod(.Object, ..., a=value)
  generateSpec(.Object)
  })

Likely in a subsequent iteration I would have ended up with (using the 
convention that function names preceded by '.' are not exported)


  .A - setClass(A, representation(a = numeric, specA = numeric))

  .generateSpecA - function(a) {
  1 / a
   }

  A - function(a=numeric(), ...) {
  specA - .generateSpecA(a)
  .A(..., a=a, specA=specA)
  }

  setMethod(generateSpec, A, function(object) {
  .generateSpecA(object@a)
  })

ensuring that A() returns a valid object and avoiding the definition of an 
initialize method entirely.


Martin



Best

Simon


On Aug 3, 2013, at 9:43 PM, Simon Zehnder simon.zehn...@googlemail.com
wrote:


setMethod(initialize, C, function(.Object, value) {.Object@c - value;
.Object - callNextMethod(.Object, value); .Object -
generateSpec(.Object); return(.Object)})


__ R-help@r-project.org mailing
list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting
guide http://www.R-project.org/posting-guide.html and provide commented,
minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Check the class of an object

2013-07-23 Thread Martin Morgan


On 07/23/2013 09:59 AM, Simon Zehnder wrote:

Hi David,

thanks for the reply. You are right. Using the %in% is more stable and I gonna 
change my code.


you said you were you were using S4 classes. S4 classes do not report vectors of 
length != 1, from ?class


 For objects which have a formal class, its name is returned by 'class'
 as a character vector of length one

so a first unit test could be

  stopifnot(length(class(myObject)) != 1L)




When testing for a specific class using 'is' one has to start at the lowest 
heir and walk up the inheritance structure. Starting at the checks at the root 
will always give TRUE. Having a structure which is quite complicated let me 
move to the check I suggested in my first mail.

Best

Simon

On Jul 23, 2013, at 6:15 PM, David Winsemius dwinsem...@comcast.net wrote:



On Jul 23, 2013, at 5:36 AM, Simon Zehnder wrote:


Dear R-Users and R-Devels,

I have large project based on S4 classes. While writing my unit tests I found 
out, that 'is' cannot test for a specific class, as also inherited classes can 
be treated as their super classes. I need to do checks for specific classes. 
What I do right now is sth. like

if (class(myClass) == firstClass) {


I would think that you would need to use `%in%` instead.

if( firstClass %in% class(myObject) ){

Objects can have more than one class, so testing with == would fail in those 
instances.




} else if (class(myClass) == secondClass) {

}

Is this the usual way how classes are checked in R?


Well, `inherits` IS the usual way.


I was expecting some specific method (and 'inherits' or 'extends' is not what I 
look for)...


Best

Simon

[[alternative HTML version deleted]]


Plain-text format is the recommended format for Rhelp

--
David Winsemius
Alameda, CA, USA



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 'save' method for S4 class

2013-07-18 Thread Martin Morgan


On 07/18/2013 03:47 AM, Simon Zehnder wrote:

Hi Christopher,

I think, that save is no generic function like plot, show, etc. So at 
first you have to determine a generic.

setGeneric(save, function(x, file_Path) standardGeneric(save))


The implementation offered by Christofer shows write.table, and the end result 
is a text file rather than a binary file expected from base::save. This makes it 
seem inappropriate to use 'save' in this context.


Instead, it seems that what Cristofer wants to implement is functionality to 
support write.table. ?write.table says


 'write.table' prints its required argument 'x' (after converting
 it to a data frame if it is not one nor a matrix)

So implementing an S3 method

  as.data.frame.MyClass -
  function(x, row.names=NULL, optional=FALSE, ...)
  {
  x@x
  }

is all that is needed, gaining lots of flexibility by re-using the code of 
write.table.


  myClass = new(MyClass, x=data.frame(x=1:3, y=3:1))
  write.table(myClass, stdout())

In the case of a 'save' method producing binary output (but this is what save 
does already...), I think it's better practice to promote the non-generic 'save' 
to an S4 generic using it's existing arguments; in this case it makes sense to 
restrict dispatch to '...', so


  setGeneric(save, signature=...)

The resulting generic is

 getGeneric(save)
standardGeneric for save defined from package .GlobalEnv

function (..., list = character(), file = stop('file' must be specified),
ascii = FALSE, version = NULL, envir = parent.frame(), compress = !ascii,
compression_level, eval.promises = TRUE, precheck = TRUE)
standardGeneric(save)
environment: 0x4a7b860
Methods may be defined for arguments: ...
Use  showMethods(save)  for currently available ones.

This means that a method might be defined as

  setMethod(save, MyClass, function(..., list = character(),
  file = stop('file' must be specified), ascii = FALSE, version = NULL,
  envir = parent.frame(), compress = !ascii, compression_level,
  eval.promises = TRUE, precheck = TRUE)
  {
  ## check non-sensical or unsupported user input for 'MyClass'
  if (!is.null(version))
  stop(non-NULL 'version' not supported for 'MyClass')
  ## ...
  ## implement save on MyClass
  })

It might be that Christofer wants to implement a 'write.table-like' (text 
output) or a 'save-like' (binary output) function that really does not conform 
to the behavior of write.table (e.g., producing output that could not be input 
by read.table) or save. Then I think the better approach is to implement 
writeMyClass (for text output) or saveMyClass (for binary output).


Martin



Now your definition via setMethod.


Best

Simon



On Jul 18, 2013, at 12:09 PM, Christofer Bogaso bogaso.christo...@gmail.com 
wrote:


Hello again,

I am trying to define the 'save' method for my S4 class as below:

setClass(MyClass, representation(
Slot1 = data.frame
))  

setMethod(save, MyClass, definition = function(x, file_Path) {

write.table(x@Slot1, file = file_Path, append = FALSE, quote = 
TRUE,
sep = ,,
eol = \n, na = NA, dec = 
., row.names = FALSE,
col.names = TRUE, qmethod = c(escape, 
double),
fileEncoding = )
})

However while doing this I am getting following error:

Error in conformMethod(signature, mnames, fnames, f, fdef, definition) :
  in method for ‘save’ with signature ‘list=MyClass’: formal
arguments (list = MyClass, file = MyClass, ascii = MyClass,
version = MyClass, envir = MyClass, compress = MyClass,
compression_level = MyClass, eval.promises = MyClass, precheck =
MyClass) omitted in the method definition cannot be in the signature


Can somebody point me what will be the correct approach to define
'save' method for S4 class?

Thanks and regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Setting Derived Class Slots

2013-07-16 Thread Martin Morgan


On 07/16/2013 06:36 AM, Steve Creamer wrote:

Dear AllI am really struggling with oo in R. Trying to set attributes of
the base class in a derived class method but the slot is only populated in
the method itself, not when I try to print out the object from the console.

Code is

library(RODBC)
#
# -
# Define a medical event class. This is abstract (VIRTUAL)
# -
#
setClass(Medical_Event,
  representation(
 Event_Name=character,
 Capacity_Profile=numeric,
 Delay_Profile=numeric,
 VIRTUAL),
  prototype(Event_Name=An
Event,Capacity_Profile=c(.2,.2,.2,.2,.2,0,0)))

setGeneric(getDelayProfile,function(object){standardGeneric(getDelayProfile)},simpleInheritanceOnly=T)

# --
# Now define a derived class called GP_Event
# --

setClass(GP_Event,representation(Surgery_Name=character),contains=c(Medical_Event),prototype(Surgery_Name=Unknown))

# -
# Now define a derived class called OP_Appt
# -

setClass(OP_Appt,representation(Clinic_Name=character),contains=c(Medical_Event),prototype(Clinic_Name=Unknown))


setMethod(f=getDelayProfile,signature(OP_Appt),definition=function(object)
{
   OpTablesDB-odbcDriverConnect(DRIVER=Microsoft Access Driver (*.mdb,
*.accdb);
  DBQ=Z:\\srp\\Development
Code\\Projects\\CancerPathwaySimulation\\Database\\CancerPathway.accdb)
   strQuery-select * from op_profile
   odbcQuery(OpTablesDB,strQuery)
   dfQuery-odbcFetchRows(OpTablesDB)
   odbcClose(OpTablesDB)
   delay-dfQuery$data[[1]][1:70]
   prob-dfQuery$data[[2]][1:70]
#  as(object,Medical_Event)@Delay_Profile-prob
   object@Delay_Profile - prob
object
}
)

if I instantiate a new instance of the derived class

*aTest-new(OPP_Appt)*and then try and populate the attribute
Delay_Profile by

*getDelayProfile(aTest) *

the object slot seems to be populated in the method because I can print it
out, viz

An object of class OP_Appt
Slot Clinic_Name:
[1] Unknown

Slot Event_Name:
[1] An Event

Slot Capacity_Profile:
[1] 0.2 0.2 0.2 0.2 0.2 0.0 0.0

*Slot Delay_Profile:
  [1]  14  21  25  29  27  49  72  71  43  65 102 134 223 358  24  14  21  25
35  31  38  43  31  23  21  26  46  54  42  26
[31]  34  24  25  41  48  33  30  17  18  31  24  35  35  24  16  32  36  39
46  36  26  16  27  21  30  32  33  27   7   5
[61]   9  10   9  11   8   6   1  11  14  10*

but when the method returns and I type

*aTest*

I get

An object of class OP_Appt
Slot Clinic_Name:
[1] Unknown

Slot Event_Name:
[1] An Event

Slot Capacity_Profile:
[1] 0.2 0.2 0.2 0.2 0.2 0.0 0.0

*Slot Delay_Profile:
numeric(0)*

ie the Delay_Profile slot is empty

What haven't I done - can anybody help me please?


It helps to provide a more minimal example, preferably reproducible (no data 
base queries needed to illustrate your problem); I'm guessing that, just as with


f = funtion(l) { l$a = 1; l }
lst = list(a=0, b=1)

one would 'update' lst with

  lst = f(lst)

and not

  f(lst)

you need to assign the return value to the original object

  aTest - getDelayProfile(aTest)

Martin


Many Thanks

Steve Creamer




--
View this message in context: 
http://r.789695.n4.nabble.com/Setting-Derived-Class-Slots-tp4671683.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] suppress startup messages from default packages

2013-07-15 Thread Martin Morgan


On 07/15/2013 06:25 AM, Duncan Murdoch wrote:

On 15/07/2013 8:49 AM, Andreas Leha wrote:

Hi Helios,

Helios de Rosario helios.derosa...@ibv.upv.es writes:

 Hi all,

 several packages print messages during loading.  How do I avoid to
 see
 them when the packages are in the defaultPackages?

 Here is an example.

 With this in ~/.Rprofile
 ,[ ~/.Rprofile ]
 | old - getOption(defaultPackages)
 | options(defaultPackages = c(old, filehash))
 | rm(old)
 `

 I get as last line when starting R:
 ,
 | filehash: Simple key-value database (2.2-1 2012-03-12)
 `

 Another package with (even more) prints during startup is
 tikzDevice.

How can I avoid to get these messages?


 There are several options in ?library to control the messages that are
 displayed when loading packages. However, this does not seem be able to
 supress all the messages. Some messages are defined by the package
 authors, because they feel necessary that the user reads them.



Thanks for your answer.  When I actually call library() or require()
myself I can avoid all messages.  There are hacks to do that even for
the very persistent messages [fn:1].

My question is how to suppress these messages, when it is not me who
calls library() or require(), but when the package is loaded during R's
startup through the defaultPackages option.


You could try the --slave command line option on startup.  If that isn't
sufficient, try getting the maintainer to change the package behaviour, or do it
yourself.


In a hack-y way ?setHook and ?sink seem to work

 setHook(packageEvent(filehash, onLoad), function(...) 
sink(file(tempfile(), w), type=message))
 setHook(packageEvent(filehash, attach), function(...) sink(file=NULL, 
type=message), append)

 library(filehash)


Martin



Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Check a list of genes for a specific GO term

2013-07-08 Thread Martin Morgan

Please ask follow-up questions about Bioconductor packages on the Bioconductor 
mailing list.

  http://bioconductor.org/help/mailing-list/mailform/

If you are interested in organisms rather than chips, use the organism package, 
e.g., for Homo sapiens

  library(org.Hs.eg.db)
  df0 = select(org.Hs.eg.db, keys(org.Hs.eg.db), GO)

giving

   head(df)
ENTREZID GO EVIDENCE ONTOLOGY
  11 GO:0003674   ND   MF
  21 GO:0005576  IDA   CC
  31 GO:0008150   ND   BP
  4   10 GO:0004060  IEA   MF
  5   10 GO:0005829  TAS   CC
  6   10 GO:0006805  TAS   BP

from which you might

  df = unique(df0[df0$ONTOLOGY == BP, c(ENTREZID, GO)])
  len = tapply(df$ENTREZID, df$GO, length)
  keep = len[len  1000]

to get a vector of counts, with names being GO ids. Remember that the GO is a 
directed acyclic graph, so terms are nested; you'll likely want to give some 
thought to what you're actually wanting.

The vignettes in the AnnotationDbi and Category packages

  http://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html
  http://bioconductor.org/packages/release/bioc/html/Category.html

are two useful sources of information, as is the annotation work flow

  http://bioconductor.org/help/workflows/annotation/

Martin

- Chirag Gupta cxg...@email.uark.edu wrote:
 Hi
 I think I asked the wrong question. Apologies.
 
 Actually I want all the GO BP annotations for my organism and from them I
 want to retain only those annotations which annotate less than a specified
 number of genes. (say 1000 genes)
 
 I hope I have put it clearly.
 
 sorry again.
 
 Thanks!
 
 
 On Sun, Jul 7, 2013 at 6:55 AM, Martin Morgan mtmor...@fhcrc.org wrote:
 
  In Bioconductor, install the annotation package
 
 
  http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData
 
  corresponding to your chip, e.g.,
 
source(http://bioconductor.org/biocLite.R;)
biocLite(hgu95av2.db)
 
  then load it and select the GO terms corresponding to your probes
 
library(hgu95av2.db)
lkup - select(hgu95av2.db, rownames(dat), GO)
 
  then use standard R commands to find the probesets that have the GO id
  you're interested in
 
keep = lkup$GO %in% GO:0006355
unique(lkup$PROBEID[keep])
 
  Ask follow-up questions about Bioconductor packages on the Bioconductor
  mailing list
 
http://bioconductor.org/help/mailing-list/mailform/
 
  Martin
  - Rui Barradas ruipbarra...@sapo.pt wrote:
   Hello,
  
   Your question is not very clear, maybe if you post a data example.
   To do so, use ?dput. If your data frame is named 'dat', use the
  following.
  
   dput(head(dat, 50))  # paste the output of this in a post
  
  
   If you want to get the rownames matching a certain pattern, maybe
   something like the following.
  
  
   idx - grep(GO:0006355, rownames(dat))
   dat[idx, ]
  
  
   Hope this helps,
  
   Rui Barradas
  
  
   Em 07-07-2013 07:01, Chirag Gupta escreveu:
Hello everyone
   
I have a dataframe with rows as probeset ID and columns as samples
I want to check the rownames and find which are those probes are
transcription factors. (GO:0006355 )
   
Any suggestions?
   
Thanks!
   
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 -- 
 *Chirag Gupta*
 Department of Crop, Soil, and Environmental Sciences,
 115 Plant Sciences Building, Fayetteville, Arkansas 72701

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Check a list of genes for a specific GO term

2013-07-07 Thread Martin Morgan

In Bioconductor, install the annotation package

  http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData

corresponding to your chip, e.g.,

  source(http://bioconductor.org/biocLite.R;)
  biocLite(hgu95av2.db)

then load it and select the GO terms corresponding to your probes

  library(hgu95av2.db)
  lkup - select(hgu95av2.db, rownames(dat), GO)
  
then use standard R commands to find the probesets that have the GO id you're 
interested in

  keep = lkup$GO %in% GO:0006355
  unique(lkup$PROBEID[keep])

Ask follow-up questions about Bioconductor packages on the Bioconductor mailing 
list

  http://bioconductor.org/help/mailing-list/mailform/

Martin
- Rui Barradas ruipbarra...@sapo.pt wrote:
 Hello,
 
 Your question is not very clear, maybe if you post a data example.
 To do so, use ?dput. If your data frame is named 'dat', use the following.
 
 dput(head(dat, 50))  # paste the output of this in a post
 
 
 If you want to get the rownames matching a certain pattern, maybe 
 something like the following.
 
 
 idx - grep(GO:0006355, rownames(dat))
 dat[idx, ]
 
 
 Hope this helps,
 
 Rui Barradas
 
 
 Em 07-07-2013 07:01, Chirag Gupta escreveu:
  Hello everyone
 
  I have a dataframe with rows as probeset ID and columns as samples
  I want to check the rownames and find which are those probes are
  transcription factors. (GO:0006355 )
 
  Any suggestions?
 
  Thanks!
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 2 3 4 5 6 >

1 - 100 of 530 matches

Mail list logo