Re: [R] sub-setting rows based on dates in R

2017-01-31 Thread Jim Lemon
Hi Md,
This is kind of clunky, but it might do what you want.

df1<-read.table(text="Date Rainfall_Duration
 6/14/2016   10
 6/15/2016   20
 6/17/2016   10
 8/16/2016   30
 8/19/2016   40",
 header=TRUE,stringsAsFactors=FALSE)

df1$Date<-as.Date(df1$Date,"%m/%d/%Y")

df2<-read.table(text="Date Removal.Rate
 6/17/2016   64.7
 6/30/2016   22.63
 7/14/2016   18.18
 8/19/2016   27.87",
 header=TRUE,stringsAsFactors=FALSE)

df2$Date<-as.Date(df2$Date,"%m/%d/%Y")

df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)

df3row<-0

for(i in 1:dim(df2)[1]) {
 rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
 if(!length(rdrows)) rdrows<-lastrows
 lastrows<-rdrows
 nrows<-length(rdrows)
 for(row in 1:nrows) {
  df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
  df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
  df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
 }
 df3row<-df3row+nrows
}

names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
df3
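
A more compact variant of the same idea, offered only as a sketch: it assumes the window is the lookup date plus the 7 days before it, and that a date with no matches reuses the rows found for the previous date. The names window_days, last_rows and pieces are made up for illustration.

```r
df1 <- data.frame(
  Date = as.Date(c("6/14/2016", "6/15/2016", "6/17/2016",
                   "8/16/2016", "8/19/2016"), format = "%m/%d/%Y"),
  Rainfall_Duration = c(10, 20, 10, 30, 40)
)
df2 <- data.frame(
  Date = as.Date(c("6/17/2016", "6/30/2016", "7/14/2016", "8/19/2016"),
                 format = "%m/%d/%Y"),
  Removal.Rate = c(64.7, 22.63, 18.18, 27.87)
)

window_days <- 7          # assumption: lookup date and the 7 days before it
last_rows <- integer(0)   # rows matched for the previous df2 date
pieces <- vector("list", nrow(df2))

for (i in seq_len(nrow(df2))) {
  x <- df2$Date[i]
  # rows of df1 that fall inside [x - window_days, x]
  rows <- which(df1$Date >= x - window_days & df1$Date <= x)
  # no match: fall back to whatever the previous date produced
  if (length(rows) == 0) rows <- last_rows
  last_rows <- rows
  pieces[[i]] <- data.frame(
    Rate.Removal.Date = rep(x, length(rows)),
    Date = df1$Date[rows],
    Rainfall_Duration = df1$Rainfall_Duration[rows]
  )
}

# stack the per-date pieces into one data frame
df3 <- do.call(rbind, pieces)
df3
```

With the posted data this yields 11 rows in the layout of the requested df3, with the dates kept as Date objects rather than formatted strings.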

Jim

On Wed, Feb 1, 2017 at 3:48 AM, Md Sami Bin Shokrana  wrote:
> Hello guys, I am trying to solve a problem in R. I have 2 data frames which 
> look like this:
> df1 <-
>   Date      Rainfall_Duration
> 6/14/2016   10
> 6/15/2016   20
> 6/17/2016   10
> 8/16/2016   30
> 8/19/2016   40
>
> df2 <-
>   Date      Removal.Rate
> 6/17/2016   64.7
> 6/30/2016   22.63
> 7/14/2016   18.18
> 8/19/2016   27.87
>
> I want to look up the dates from df2 in df1 and their corresponding 
> Rainfall_Duration data. For example, I want to look for the 1st date of df2 
> in df1 and subset rows in df1 for that specific date and the 7 days prior to 
> it. Additionally, for 6/30/2016 (in df2) there are no dates 
> available in df1 within its 7-day range. So, in this case I just want to 
> extract the same results as for the previous date (6/17/2016) in df2. The same 
> logic applies to 7/14/2016 (df2).
> The output should look like this:
>
> df3<-
>
> Rate.Removal.Date  Date Rainfall_Duration
> 6/17/2016  6/14/2016  10
> 6/17/2016  6/15/2016  20
> 6/17/2016  6/17/2016  10
> 6/30/2016  6/14/2016  10
> 6/30/2016  6/15/2016  20
> 6/30/2016  6/17/2016  10
> 7/14/2016  6/14/2016  10
> 7/14/2016  6/15/2016  20
> 7/14/2016  6/17/2016  10
> 8/19/2016  8/16/2016  30
> 8/19/2016  8/19/2016  40
>
> I could subset data for the 7 days range. But could not do it when no dates 
> are available in that range. I have the following code:
> library(plyr)
> library (dplyr)
> df1$Date <- as.Date(df1$Date,format = "%m/%d/%Y")
> df2$Date <- as.Date(df2$Date,format = "%m/%d/%Y")
>
> df3 <- lapply(df2$Date, function(x){
>   filter(df1, between(Date, x-7, x))
> })
>
> names(df3) <- as.character(df2$Date)
> bind_rows(df3, .id = "Rate.Removal.Date")
> df3 <- ldply (df3, data.frame, .id = "Rate.Removal.Date")
>
> I hope I could explain my problem properly. I would highly appreciate if 
> someone can help me out with this code or a new one. Thanks in advance.
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] HELP with GLM

2017-01-31 Thread Bert Gunter
Dear Xavier:

It sounds like you have a right mess! Perhaps others cleverer and more
diligent than I can sort through it and diagnose the problem. However,
it really *does* sound like you need to step back, take a deep breath,
and spend some time with an R tutorial or two (there are many good
ones on the web) to learn how R works before proceeding further. You
seem to be groping around in the dark.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 31, 2017 at 6:35 PM, CHIRIBOGA Xavier
 wrote:
> Dear colleagues,
>
>
> I am trying to perform GLM ..but I got some objects masked: and an error 
> message below
>
>
> a <- read.table(file.choose(), h<-T)
>> head(a)
>   time treatment transinduc
> 1    1   CHA0+Db     1,0768
> 2    1   CHA0+Db     1,0706
> 3    1   CHA0+Db     1,0752
> 4    1   CHA0+Db     1,0689
> 5    1   CHA0+Db     1,1829
> 6    1    PCL+Db     1,1423
>> attach(a)
> The following objects are masked from a (pos = 12):
>
> time, transinduc, treatment
>
> The following objects are masked from a (pos = 13):
>
> time, treatment
>
> The following objects are masked from a (pos = 14):
>
> time, treatment
>
>> summary(a)
>       time         treatment    transinduc
>  Min.   :1.000   CHA0   :10   1,0488 : 6
>  1st Qu.:1.000   CHA0+Db: 9   1,0724 : 4
>  Median :1.000   Db     : 9   1,0752 : 3
>  Mean   :1.433   HEALTHY:15   1,0954 : 3
>  3rd Qu.:2.000   PCL    :10   1,0001 : 2
>  Max.   :2.000   PCL+Db :14   1,0005 : 2
>                               (Other):47
>
>
> m1<-glm(transinduc~time*treatment,data=a,family="poisson")
> Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
> family") :
>   missing value where TRUE/FALSE needed
> In addition: Warning message:
> In Ops.factor(y, 0) : '<' not meaningful for factors
>
>
> I DO NOT HAVE NEGATIVE VALUES IN MY DATASET, Do you know what is going wrong?
>
>
> Thank you for your help,
>
>
> Xavier
>


Re: [R] HELP with GLM

2017-01-31 Thread David Winsemius

> On Jan 31, 2017, at 6:35 PM, CHIRIBOGA Xavier  
> wrote:
> 
> Dear colleagues,
> 
> 
> I am trying to perform GLM ..but I got some objects masked: and an error 
> message below
> 
> 
> a <- read.table(file.choose(), h<-T)
>> head(a)
>   time treatment transinduc
> 1    1   CHA0+Db     1,0768

Do note that transinduc came in as a factor variable, not numeric.

> 2    1   CHA0+Db     1,0706
> 3    1   CHA0+Db     1,0752
> 4    1   CHA0+Db     1,0689
> 5    1   CHA0+Db     1,1829
> 6    1    PCL+Db     1,1423
>> attach(a)
> The following objects are masked from a (pos = 12):
> 
>time, transinduc, treatment
> 
> The following objects are masked from a (pos = 13):
> 
>time, treatment
> 
> The following objects are masked from a (pos = 14):
> 
>time, treatment
> 
>> summary(a)
>       time         treatment    transinduc
>  Min.   :1.000   CHA0   :10   1,0488 : 6
>  1st Qu.:1.000   CHA0+Db: 9   1,0724 : 4
>  Median :1.000   Db     : 9   1,0752 : 3
>  Mean   :1.433   HEALTHY:15   1,0954 : 3
>  3rd Qu.:2.000   PCL    :10   1,0001 : 2
>  Max.   :2.000   PCL+Db :14   1,0005 : 2
>                               (Other):47
> 
> 
> m1<-glm(transinduc~time*treatment,data=a,family="poisson")
> Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
> family") :
>   missing value where TRUE/FALSE needed
> In addition: Warning message:
> In Ops.factor(y, 0) : '<' not meaningful for factors
> 
> 
> I DO NOT HAVE NEGATIVE VALUES IN MY DATASET, Do you know what is going wrong?


DO NOT USE `attach`. THAT'S PROBABLY WHAT IS WRONG.
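
To illustrate the point about transinduc being read as a factor: a minimal sketch, assuming the file uses a comma as the decimal mark (as values such as 1,0768 suggest). The data below are made up to mimic the posted rows, and gaussian is only a placeholder family; which family suits the real data is a separate question.

```r
# Hypothetical data mimicking the posted file: decimal commas in 'transinduc'
txt <- "time treatment transinduc
1 CHA0+Db 1,0768
1 CHA0+Db 1,0706
1 PCL+Db 1,1423
2 CHA0+Db 1,0752
2 PCL+Db 1,0689
2 PCL+Db 1,1829"

# With the default dec = "." the comma values are not parsed as numbers
a_bad <- read.table(text = txt, header = TRUE)
class(a_bad$transinduc)

# dec = "," tells read.table() that the comma is the decimal mark
a <- read.table(text = txt, header = TRUE, dec = ",")
class(a$transinduc)

# Pass the data frame through 'data' rather than calling attach()
m1 <- glm(transinduc ~ time + treatment, data = a, family = gaussian)
coef(m1)
```

Once the response is numeric, the comparison y < 0 inside the Poisson family check no longer runs against a factor, which is what triggered the error and the warning in the original post.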
> 
> 
> Thank you for your help,
> 
> 
> Xavier
> 
>   [[alternative HTML version deleted]]

And do not post in HTML. Please read the Posting Guide.



David Winsemius
Alameda, CA, USA


Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Mark Sharp
Hadley,

It’s sometimes amazing the mistakes I can make. No, it did not do what I 
wanted, which was
read_xml(str_c(with_ns_xml, collapse = ""))

Reproducible example follows:
library(stringr)
library(xml2)
## Given the correct argument value for collapse, the next two lines work
no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
## The next line finds the node in the XML without a namespace
xml_find_all(no_ns, "//WorkSet//Description")
## With a namespace designated in the XML
## Neither of the next two work, though I thought the second should
xml_find_all(with_ns, "//WorkSet//Description")
xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
## Using xml_ns_strip() works as predicted
xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
## I was surprised to find the incorrect namespace value did not matter
xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
## This also seems to ignore the namespace argument value
xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = 
xml_ns(with_ns))
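
For comparison, a sketch of the prefixed form of the query. It assumes the document's root element is WorkSet in the default namespace http://labkey.org/etl/xml, which is a reconstruction from the XPath expressions and the namespace URL in this thread, not a copy of the real file. In XPath 1.0 an unprefixed name only matches elements in no namespace, so supplying ns is not enough on its own; the names in the path also need the prefix that xml_ns() assigns to the default namespace (d1).

```r
library(xml2)

# Hypothetical reconstruction of the namespaced document
doc <- read_xml(
  '<WorkSet xmlns="http://labkey.org/etl/xml">
     <Description>MFIA 9-Plex (CharlesRiver)</Description>
   </WorkSet>'
)

# xml_ns() lists the namespaces; the default one is given the prefix d1
ns <- xml_ns(doc)
ns

# Prefix every element name in the path with d1
hits <- xml_find_all(doc, "/d1:WorkSet//d1:Description", ns)
xml_text(hits)
```

This would also explain why the namespace argument appeared to be ignored after xml_ns_strip(): once the namespace is stripped, the unprefixed path matches regardless of what ns contains.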


Full output follows:
> ## Given the correct argument value for collapse, the next two lines work
> no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
> with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
> ## The next line finds the node in the XML without a namespace
> xml_find_all(no_ns, "//WorkSet//Description")
{xml_nodeset (1)}
[1] MFIA 9-Plex (CharlesRiver)
> ## With a namespace designated in the XML
> ## Neither of the next two work, though I thought the second should
> xml_find_all(with_ns, "//WorkSet//Description")
{xml_nodeset (0)}
> xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (0)}
> ## Using xml_ns_strip() works as predicted
> xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
{xml_nodeset (1)}
[1] MFIA 9-Plex (CharlesRiver)
> ## I was surprised to find the incorrect namespace value did not matter
> xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] MFIA 9-Plex (CharlesRiver)
> ## This also seems to ignore the namespace argument value
> xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = 
> xml_ns(with_ns))
{xml_nodeset (1)}
[1] MFIA 9-Plex (CharlesRiver)
R. Mark Sharp, Ph.D.
msh...@txbiomed.org





> On Jan 31, 2017, at 5:52 PM, Hadley Wickham  wrote:
>
> I think you want
>
> x <- read_xml('
>  <WorkSet xmlns="http://labkey.org/etl/xml">
>  <Description>MFIA 9-Plex (CharlesRiver)</Description>
>  </WorkSet>
> ')
>
> The collapse argument doesn't do what you think it does.
>
> Hadley
>
> On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp  wrote:
>> Hadley,
>>
>> Thank you. I am able to get the xml_ns_strip() function to work with my file 
>> directly so I will likely be able to reach my immediate goal.
>>
>> However, I still have had no success with understanding the namespace 
>> problem. I am not able to use read_xml() using the object I generated for 
>> the reproducible example, which is simply a character vector of length 4 
>> having the contents of the XML file as produce by readLines(). I then used 
>> dput() to define the structure. The resulting structure apparently is not to 
>> the liking of read_xml(). I have reproduced the necessary code here for your 
>> convenience. There error is below.
>>
>> ##
>> library(xml2)
>> library(stringr)
>> with_ns_xml <- c("",
>> "http://labkey.org/etl/xml\;>",
>> "MFIA 9-Plex (CharlesRiver)",
>> "")
>> ## without str_c() collapse it complain of a vector of length > 1 also.
>> read_xml(str_c(with_ns_xml, collapse = TRUE))
>> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html 
>> = as_html,  :
>>  Start tag expected, '<' not found [4]
>>
>> ## produces the following error message.
>> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html 
>> = as_html,  :
>>  Start tag expected, '<' not found [4]
>>
>> I have similar issues with xml2::xml_find_all
>> xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
>>
>> ## Produces the following error message.
>> Error in UseMethod("xml_find_all") :
>>  no applicable method for 'xml_find_all' applied to an object of class 
>> "character"
>>
>>
>>
>> R. Mark Sharp, Ph.D.
>> msh...@txbiomed.org
>>
>>
>>
>>
>>
>>> On Jan 31, 2017, at 4:27 PM, Hadley Wickham  wrote:
>>>
>>> See the last example in ?xml2::xml_find_all or use 
>>> xml2::xml_ns_strip()
>>>
>>> Hadley
>>>
>>> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
>>>> I am trying to read a series of XML files that use a namespace and I have 
>>>> failed, thus far, to discover the proper syntax. I have a reproducible 
>>>> example below. I have two XML character strings defined: one without a 
>>>> namespace and one with. I show that I can successfully extract the node 
>>>> using the XML string without the namespace and fail when 

[R] Error in mcp2matrix(model, linfct = linfct)

2017-01-31 Thread Karen Castillioni
Problem: Error in mcp2matrix(model, linfct = linfct)

Post hoc test is not working. Does anyone know what I did wrong?

modmisto<-lme(Cobertura~Tratamento, random=~1|Parcela, data=Cover_BraquiT3) 
summary(modmisto)

tukey<-glht(modmisto, mcp(Tratamento="Tukey"))
Error in mcp2matrix(model, linfct = linfct) : 
  Variable(s) ‘Tratamento’ of class ‘character’ is/are not contained as a 
  factor in ‘model’.

Some translations if necessary: cobertura=cover, tratamento=treatment, 
parcela=plot

I have 4 treatments being tested.

Any help with this will be very appreciated!

Thank you
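
The error message says Tratamento is stored as character, while mcp() expects a factor. A minimal sketch of one way around that, assuming it is acceptable to convert the column and refit; the data frame below is simulated purely for illustration and only reuses the column names from the post.

```r
library(nlme)      # lme()
library(multcomp)  # glht(), mcp()

# Hypothetical stand-in for Cover_BraquiT3: 6 plots x 4 treatments
set.seed(1)
Cover_BraquiT3 <- data.frame(
  Parcela    = rep(paste0("P", 1:6), each = 4),
  Tratamento = rep(c("T1", "T2", "T3", "T4"), times = 6),
  stringsAsFactors = FALSE
)
plot_effect <- rep(rnorm(6, sd = 2), each = 4)   # random plot-level shift
trt_effect  <- rep(c(0, 3, 6, 9), times = 6)     # made-up treatment means
Cover_BraquiT3$Cobertura <- 20 + trt_effect + plot_effect + rnorm(24)

# Convert the treatment column to a factor BEFORE fitting the model
Cover_BraquiT3$Tratamento <- factor(Cover_BraquiT3$Tratamento)

modmisto <- lme(Cobertura ~ Tratamento, random = ~ 1 | Parcela,
                data = Cover_BraquiT3)

# All pairwise (Tukey) comparisons between the four treatments
tukey <- glht(modmisto, linfct = mcp(Tratamento = "Tukey"))
summary(tukey)
```

The conversion has to happen before lme() is called, because glht() inspects the variables stored with the fitted model rather than the current state of the data frame.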

Re: [R] caculate correlation

2017-01-31 Thread Jim Lemon
Hi Elham,
It looks to me as though you are looking for matches in the name field
and then asking for all columns except the first (-1). If you only
have two columns in "coding.rpkm", and "name" is the first column, you
will get whatever is in the second column that has a match in the
"name" column. I suspect that whatever you are doing to whatever is in
the data frames is just returning names, and without knowing what the
data frames look like, nobody can tell you what is going wrong.
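
As an illustration of what that indexing expression returns, here is a small sketch on a made-up data frame; the gene names and the columns s1 and s2 are hypothetical and only stand in for the real "coding.rpkm".

```r
# Hypothetical data: one name column followed by numeric sample columns
coding.rpkm <- data.frame(
  name = c("gene23.C1", "gene23.C2", "gene7.A"),
  s1   = c(1.5, 2.0, 0.3),
  s2   = c(0.7, 1.1, 4.2),
  stringsAsFactors = FALSE
)

# grep() returns the row numbers whose name matches the pattern
rows <- grep("23.C", coding.rpkm$name)
rows

# Those rows, every column except the first
sub <- coding.rpkm[rows, -1]
sub

# With only two columns, dropping the first leaves a plain vector
two_col <- coding.rpkm[, c("name", "s1")]
vec <- two_col[rows, -1]
vec

# Check which columns are actually numeric before calling cor()
sapply(coding.rpkm, is.numeric)
```

If the last check reports FALSE for the columns that are supposed to hold values, the numbers were not read in as numbers, and cor() will not work on them.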

Jim


On Tue, Jan 31, 2017 at 10:31 PM, Elham -  wrote:
> Thank you for your reply, dear Jim.
> Actually I'm new to R, and I asked the person who taught me correlation,
> but I still have a problem with it.
> Please guide me: I cannot understand why, even though I transposed the data,
> I do not get numbers from coding.rpkm[grep("23.C",coding.rpkm$name),-1] and
> get gene names instead.
>
>
> On Tuesday, January 31, 2017 12:20 PM, Jim Lemon 
> wrote:
>
>
> Hi Elham,
>
> On Tue, Jan 31, 2017 at 7:28 PM, Elham -  wrote:
>> Hi Dear Jim,
>>
>> I did it, both return a vector of name of the genes with different
>> length,as
>> I said before I have list of coding and noncoding so the length are not
>> same.
>>
>> where is number?!
>>
> Not in the values you are extracting from the data frame. As you are
> aware, you can only perform the "cor" operation on numbers. As the
> value returned refers to the correlation of _pairs_ of values, the
> vectors of numbers should be the same length and there should be some
> meaningful relationship between those pairs. Are you just trying to
> correlate any old numbers because they are numbers?
>
>> and at the end of print there is this error :
>>
>> <0 rows> (or 0-length row.names)
>>
> This is probably not an error, just R telling you that something that
> was requested didn't have anything in it. Maybe one day we will find
> out what is in:
>
> coding.rpkm
> ncoding.rpkm
>
> and we can provide more informed advice.
>
>
> Jim
>
>



[R] need help in trying out sparklyr - spark_connect will not work

2017-01-31 Thread Taylor, Ronald C
Hello R-help list,

I am a new list member. My first question: I was trying out sparklyr (in R ver 
3.3.2) on my Red Hat Linux workstation, following the instructions at 
spark.rstudio.com as to how to download and use a local copy of Spark. The 
Spark download appears to work. However, when I try to do the 
spark_connect, to get started, I get the error msgs that you see below.

I cannot find any guidance as to how to fix this. Quite frustrating. Can 
somebody give me a bit of help? Does something need to be added to my PATH env 
var in my .mycshrc file, for example? Is there a closed port problem? Has 
anybody run into this type of error msg? Do I need to do something additional 
to start up the local copy of Spark that is not mentioned in the RStudio online 
documentation?


-  Ron




> spark_install(version = "1.6.2")
Installing Spark 1.6.2 for Hadoop 2.6 or later.
Downloading from:
- 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Installing to:
- '~/.cache/spark/spark-1.6.2-bin-hadoop2.6'
trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
==
downloaded 265.2 MB

Installation complete.
>
> sc <- spark_connect(master = "local")
Error in force(code) :
  Failed while connecting to sparklyr to port (8880) for sessionid (3689): 
Gateway in port (8880) did not respond.
Path: /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
Parameters: --class, sparklyr.Backend, --jars, 
'/usr/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/usr/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/usr/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar',
 '/usr/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 3689


 Output Log 
/home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-class: line 86: 
/usr/local/bin/bin/java: No such file or directory

 Error Log 
>
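
One reading of the output log, offered as a guess rather than a diagnosis: the launcher looked for java at /usr/local/bin/bin/java, which is the path that results when JAVA_HOME is set to /usr/local/bin and "bin/java" is appended to it. The sketch below only checks that idea from within R; java_path is a helper made up for illustration, and the directory in the commented-out fix is a placeholder.

```r
# Build the path a launcher would try if it appends bin/java to JAVA_HOME
java_path <- function(java_home) file.path(java_home, "bin", "java")

# The value that would reproduce the path seen in the log
java_path("/usr/local/bin")

# Inspect the current setting and whether it points at a real java binary
current <- Sys.getenv("JAVA_HOME")
if (nzchar(current) && !file.exists(java_path(current))) {
  message("JAVA_HOME does not contain bin/java: ", current)
}

# Hypothetical fix: point JAVA_HOME at the actual Java directory before
# connecting. "/usr/lib/jvm/jre" is a placeholder, not a known location.
# Sys.setenv(JAVA_HOME = "/usr/lib/jvm/jre")
# sc <- sparklyr::spark_connect(master = "local")
```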


%%

Full screen output of my R session, from the R invocation on:

sidney115% R

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
>
> library(sparklyr)
>
> ls(pos = "package:sparklyr")
  [1] "%>%"
  [2] "compile_package_jars"
  [3] "connection_config"
  [4] "connection_is_open"
  [5] "copy_to"
  [6] "ensure_scalar_boolean"
  [7] "ensure_scalar_character"
  [8] "ensure_scalar_double"
  [9] "ensure_scalar_integer"
 [10] "find_scalac"
 [11] "ft_binarizer"
 [12] "ft_bucketizer"
 [13] "ft_discrete_cosine_transform"
 [14] "ft_elementwise_product"
 [15] "ft_index_to_string"
 [16] "ft_one_hot_encoder"
 [17] "ft_quantile_discretizer"
 [18] "ft_regex_tokenizer"
 [19] "ft_sql_transformer"
 [20] "ft_string_indexer"
 [21] "ft_tokenizer"
 [22] "ft_vector_assembler"
 [23] "hive_context"
 [24] "invoke"
 [25] "invoke_method"
 [26] "invoke_new"
 [27] "invoke_static"
 [28] "java_context"
 [29] "livy_available_versions"
 [30] "livy_config"
 [31] "livy_home_dir"
 [32] "livy_install"
 [33] "livy_install_dir"
 [34] "livy_installed_versions"
 [35] "livy_service_start"
 [36] "livy_service_stop"
 [37] "ml_als_factorization"
 [38] "ml_binary_classification_eval"
 [39] "ml_classification_eval"
 [40] "ml_create_dummy_variables"
 [41] "ml_decision_tree"
 [42] "ml_generalized_linear_regression"
 [43] "ml_gradient_boosted_trees"
 [44] "ml_kmeans"
 [45] "ml_lda"
 [46] "ml_linear_regression"
 [47] "ml_load"
 [48] "ml_logistic_regression"
 [49] "ml_model"
 [50] "ml_multilayer_perceptron"
 [51] "ml_naive_bayes"
 [52] "ml_one_vs_rest"
 [53] "ml_options"
 [54] "ml_pca"
 [55] "ml_prepare_dataframe"
 [56] "ml_prepare_features"
 [57] "ml_prepare_response_features_intercept"
 [58] "ml_random_forest"
 [59] "ml_save"
 [60] "ml_survival_regression"
 [61] "ml_tree_feature_importance"
 [62] "na.replace"
 [63] "print_jobj"
 [64] "register_extension"
 [65] "registered_extensions"
 [66] "sdf_copy_to"
 [67] "sdf_import"
 [68] "sdf_load_parquet"
 [69] "sdf_load_table"
 [70] "sdf_mutate"
 [71] "sdf_mutate_"
 [72] "sdf_partition"
 [73] "sdf_persist"
 [74] "sdf_predict"
 [75] "sdf_quantile"
 [76] "sdf_read_column"
 [77] "sdf_register"
 [78] "sdf_sample"
 [79] "sdf_save_parquet"
 [80] "sdf_save_table"
 [81] "sdf_schema"
 [82] "sdf_sort"
 [83] "sdf_with_unique_id"
 [84] "spark_available_versions"
 [85] "spark_compilation_spec"
 [86] "spark_compile"
 [87] "spark_config"
 [88] "spark_connect"
 [89] "spark_connection"
 [90] "spark_connection_is_open"
 [91] "spark_context"
 [92] "spark_dataframe"
 [93] "spark_default_compilation_spec"
 [94] "spark_dependency"
 [95] "spark_disconnect"
 [96] "spark_disconnect_all"
 [97] "spark_home_dir"
 [98] "spark_install"
 [99] "spark_install_dir"
[100] "spark_install_tar"
[101] "spark_installed_versions"
[102] "spark_jobj"
[103] "spark_load_table"
[104] "spark_log"
[105] "spark_read_csv"

[R] HELP with GLM

2017-01-31 Thread CHIRIBOGA Xavier
Dear colleagues,


I am trying to perform GLM ..but I got some objects masked: and an error 
message below


a <- read.table(file.choose(), h<-T)
> head(a)
  time treatment transinduc
1    1   CHA0+Db     1,0768
2    1   CHA0+Db     1,0706
3    1   CHA0+Db     1,0752
4    1   CHA0+Db     1,0689
5    1   CHA0+Db     1,1829
6    1    PCL+Db     1,1423
> attach(a)
The following objects are masked from a (pos = 12):

time, transinduc, treatment

The following objects are masked from a (pos = 13):

time, treatment

The following objects are masked from a (pos = 14):

time, treatment

> summary(a)
      time         treatment    transinduc
 Min.   :1.000   CHA0   :10   1,0488 : 6
 1st Qu.:1.000   CHA0+Db: 9   1,0724 : 4
 Median :1.000   Db     : 9   1,0752 : 3
 Mean   :1.433   HEALTHY:15   1,0954 : 3
 3rd Qu.:2.000   PCL    :10   1,0001 : 2
 Max.   :2.000   PCL+Db :14   1,0005 : 2
                              (Other):47


m1<-glm(transinduc~time*treatment,data=a,family="poisson")
Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' 
family") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : '<' not meaningful for factors


I DO NOT HAVE NEGATIVE VALUES IN MY DATASET, Do you know what is going wrong?


Thank you for your help,


Xavier


Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Hadley Wickham
I think you want

x <- read_xml('
  <WorkSet xmlns="http://labkey.org/etl/xml">
  <Description>MFIA 9-Plex (CharlesRiver)</Description>
  </WorkSet>
')

The collapse argument doesn't do what you think it does.
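
To make the role of collapse concrete, here is a small base-R sketch using paste(), whose collapse argument is likewise the separator string placed between elements when a vector is joined into one string, not a TRUE/FALSE switch. The three-element vector is a made-up stand-in for lines returned by readLines().

```r
# Hypothetical stand-in for a file read line by line
x <- c("<a>", "<b/>", "</a>")

# collapse is the separator string; "" joins the lines with nothing between
one_string <- paste(x, collapse = "")
one_string
length(one_string)

# Any other string works the same way, e.g. a newline
with_newlines <- paste(x, collapse = "\n")
cat(with_newlines, "\n")
```

The equivalent stringr call is str_c(x, collapse = ""), which yields the single string that read_xml() expects.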

Hadley

On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp  wrote:
> Hadley,
>
> Thank you. I am able to get the xml_ns_strip() function to work with my file 
> directly so I will likely be able to reach my immediate goal.
>
> However, I still have had no success with understanding the namespace 
> problem. I am not able to use read_xml() using the object I generated for the 
> reproducible example, which is simply a character vector of length 4 having 
> the contents of the XML file as produce by readLines(). I then used dput() to 
> define the structure. The resulting structure apparently is not to the liking 
> of read_xml(). I have reproduced the necessary code here for your 
> convenience. There error is below.
>
> ##
> library(xml2)
> library(stringr)
> with_ns_xml <- c("",
>  "http://labkey.org/etl/xml\;>",
>  "MFIA 9-Plex (CharlesRiver)",
>  "")
> ## without str_c() collapse it complain of a vector of length > 1 also.
> read_xml(str_c(with_ns_xml, collapse = TRUE))
> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
> as_html,  :
>   Start tag expected, '<' not found [4]
>
> ## produces the following error message.
> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
> as_html,  :
>   Start tag expected, '<' not found [4]
>
> I have similar issues with xml2::xml_find_all
> xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
>
> ## Produces the following error message.
> Error in UseMethod("xml_find_all") :
>   no applicable method for 'xml_find_all' applied to an object of class 
> "character"
>
>
>
> R. Mark Sharp, Ph.D.
> msh...@txbiomed.org
>
>
>
>
>
>> On Jan 31, 2017, at 4:27 PM, Hadley Wickham  wrote:
>>
>> See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
>>
>> Hadley
>>
>> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
>>> I am trying to read a series of XML files that use a namespace and I have 
>>> failed, thus far, to discover the proper syntax. I have a reproducible 
>>> example below. I have two XML character strings defined: one without a 
>>> namespace and one with. I show that I can successfully extract the node 
>>> using the XML string without the namespace and fail when using the XML 
>>> string with the namespace.
>>>
>>> Mark
>>> PS I am having the same problem with the xml2 package and am hoping 
>>> understanding one with help with the other.
>>>
>>> ##
>>> library(XML)
>>> ## The first XML text (no_ns_xml) does not have a namespace defined
>>> no_ns_xml <- c("", "",
>>>   "MFIA 9-Plex (CharlesRiver)",
>>>   "")
>>> l_no_ns_xml <-xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>>>   useInternalNodes = TRUE)
>>> ## The node is found
>>> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>>>
>>> ## The second XML text (with_ns_xml) has a namespace defined
>>> with_ns_xml <- c("",
>>> "http://labkey.org/etl/xml\;>",
>>> "MFIA 9-Plex (CharlesRiver)",
>>> "")
>>>
>>> l_with_ns_xml <-xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>>>   useInternalNodes = TRUE)
>>> ## The node is not found
>>> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
>>> ## I attempt to provide the namespace, but fail.
>>> ns <- "http://labkey.org/etl/xml"
>>> names(ns)[1] <- "xmlns"
>>> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>>>
>>> R. Mark Sharp, Ph.D.
>>> Director of Data Science Core
>>> Southwest National Primate Research Center
>>> Texas Biomedical Research Institute
>>> P.O. Box 760549
>>> San Antonio, TX 78245-0549
>>> Telephone: (210)258-9476
>>> e-mail: msh...@txbiomed.org
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> http://hadley.nz
>

Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Mark Sharp
Hadley,

Thank you. I am able to get the xml_ns_strip() function to work with my file 
directly so I will likely be able to reach my immediate goal.

However, I still have had no success with understanding the namespace problem. 
I am not able to use read_xml() using the object I generated for the 
reproducible example, which is simply a character vector of length 4 having the 
contents of the XML file as produced by readLines(). I then used dput() to 
define the structure. The resulting structure apparently is not to the liking 
of read_xml(). I have reproduced the necessary code here for your convenience. 
The error is below.

##
library(xml2)
library(stringr)
with_ns_xml <- c("",
 "http://labkey.org/etl/xml\;>",
 "MFIA 9-Plex (CharlesRiver)",
 "")
## without str_c() collapse it also complains of a vector of length > 1.
read_xml(str_c(with_ns_xml, collapse = TRUE))
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
as_html,  :
  Start tag expected, '<' not found [4]

## produces the following error message.
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
as_html,  :
  Start tag expected, '<' not found [4]

I have similar issues with xml2::xml_find_all
xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")

## Produces the following error message.
Error in UseMethod("xml_find_all") :
  no applicable method for 'xml_find_all' applied to an object of class 
"character"



R. Mark Sharp, Ph.D.
msh...@txbiomed.org





> On Jan 31, 2017, at 4:27 PM, Hadley Wickham  wrote:
>
> See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
>
> Hadley
>
> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
>> I am trying to read a series of XML files that use a namespace and I have 
>> failed, thus far, to discover the proper syntax. I have a reproducible 
>> example below. I have two XML character strings defined: one without a 
>> namespace and one with. I show that I can successfully extract the node 
>> using the XML string without the namespace and fail when using the XML 
>> string with the namespace.
>>
>> Mark
>> PS I am having the same problem with the xml2 package and am hoping 
>> understanding one with help with the other.
>>
>> ##
>> library(XML)
>> ## The first XML text (no_ns_xml) does not have a namespace defined
>> no_ns_xml <- c("", "",
>>   "MFIA 9-Plex (CharlesRiver)",
>>   "")
>> l_no_ns_xml <-xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>>   useInternalNodes = TRUE)
>> ## The node is found
>> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>>
>> ## The second XML text (with_ns_xml) has a namespace defined
>> with_ns_xml <- c("",
>> "http://labkey.org/etl/xml\;>",
>> "MFIA 9-Plex (CharlesRiver)",
>> "")
>>
>> l_with_ns_xml <-xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>>   useInternalNodes = TRUE)
>> ## The node is not found
>> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
>> ## I attempt to provide the namespace, but fail.
>> ns <- "http://labkey.org/etl/xml"
>> names(ns)[1] <- "xmlns"
>> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>>
>> R. Mark Sharp, Ph.D.
>> Director of Data Science Core
>> Southwest National Primate Research Center
>> Texas Biomedical Research Institute
>> P.O. Box 760549
>> San Antonio, TX 78245-0549
>> Telephone: (210)258-9476
>> e-mail: msh...@txbiomed.org
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> http://hadley.nz



Re: [R] Challenge extracting months

2017-01-31 Thread Jim Lemon
Hi Kwesi,
I worked through your code below, and I think that when you have the
two variables "mon.t1" and "seas.t1" you can select a "rolling
quarter" like this:

# the file name in your example is different from the one you sent
library(zoo)
era<-read.table(file="SAfr_700hpa_7x5II.txt",header=FALSE,sep=" ",
 skip=1,dec = ".")
era.nodes<-paste(era[,1],era[,2],sep=".")
era.nodes<-as.numeric(era.nodes)
era.nodes.days<-zooreg(era.nodes,start=as.Date("1980-01-01"),
 end=as.Date("2016-12-31"))
era.nodes.days.t1<-window(era.nodes.days,start=as.Date("1980-01-01"),
 end=as.Date("2016-12-31"))
mon.t1<-as.numeric(format(index(era.nodes.days.t1),"%m"))
# seas.t1 (the year of each observation) is defined as in your script
seas.t1<-as.numeric(format(index(era.nodes.days.t1),"%Y"))
addyear<-0
# this loop transforms mon.t1 into an increasing sequence of months
for(i in 2:length(mon.t1)) {
 if(seas.t1[i] > seas.t1[i-1]) addyear<-addyear+12
 mon.t1[i]<-mon.t1[i] + addyear
}
for(i in 1:(max(mon.t1)-2)) {
 # this gives a logical index for the rolling quarter
 rq<-mon.t1 %in% i:(i+2)
}

Each successive "rq" produced by the last loop can be used to extract
whatever values you want from "era" or "era.nodes".
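
As a minimal base-R sketch of the same idea, the example below keeps every rolling quarter instead of overwriting "rq" on each pass. The names month_seq and rq_list and the two-year daily date range are assumptions made for illustration; they are not part of the script above.

```r
# Hypothetical daily dates covering two years (1980 is a leap year)
dates <- seq(as.Date("1980-01-01"), as.Date("1981-12-31"), by = "day")

# Number the months consecutively across years: Jan 1980 = 1, Jan 1981 = 13
yr <- as.numeric(format(dates, "%Y"))
mo <- as.numeric(format(dates, "%m"))
month_seq <- (yr - min(yr)) * 12 + mo

# One logical index per rolling quarter (JFM, FMA, MAM, ...)
starts <- min(month_seq):(max(month_seq) - 2)
rq_list <- lapply(starts, function(i) month_seq %in% i:(i + 2))

# Label each quarter by its first month, e.g. "1980-01" for JFM 1980
names(rq_list) <- format(dates[match(starts, month_seq)], "%Y-%m")

# Days in JFM 1980 and JFM 1981
sum(rq_list[["1980-01"]])
sum(rq_list[["1981-01"]])
```

If a data vector is aligned with "dates", something like values[rq_list[["1980-02"]]] would then pick out FMA 1980, with leap years handled by the dates themselves.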

Jim


On Tue, Jan 31, 2017 at 9:04 PM, Kwesi Quagraine  wrote:
> Hello Jim, thanks for the code. But I come to you once again, I am not
> looking to do a rolling mean, but to select JFM,FMA,MAM etc from the data
> attached. Below is my sample code which actually selects these months. I
> will rather be glad if I can have a function that does the selection for all
> these 3 months selected for each year as shown in my last two lines of code;
> Taking into accounts years with 29 days in February etc.
>
> rm(list = ls())
> library(zoo)
> library(PCICt)
> library(lattice)
> library(RColorBrewer)
>
> setwd('/home/kwesi/Documents/700hpa/soms/')
> # Reading the data
>
> era   <- read.table(file="SAfr_700hpa_5x4II.txt",header = FALSE, sep =
> "",skip=1,dec = ".")
> era.nodes  <- paste(era[,1],era[,2],sep=".")
>
> era.nodes  <-as.numeric(era.nodes)
> era.nodes.days<-zooreg(era.nodes,start=as.Date("1980-01-01"),end=as.Date("2016-12-31"))
>
> era.nodes.days.t1<-window(era.nodes.days,start=as.Date("1980-01-01"),end=as.Date("2016-12-31"))
>
> mon.t1<-as.numeric(format(index(era.nodes.days.t1),"%m"))
> seas.t1 <-as.numeric(format(index(era.nodes.days.t1),"%Y"))
> era.nodes.days.t1<-cbind(era.nodes.days.t1,mon.t1,seas.t1)
> era.nodes.days.t1
> jfm80<-era.nodes.days.t1[1:91,1:3[era.nodes.days.t1[1:91,2]==1|era.nodes.days.t1[1:91,2]==2|era.nodes.days.t1[1:91,2]==3]
> fma80<-era.nodes.days.t1[32:(91+30),1:3
> [era.nodes.days.t1[1:91,2]==2|era.nodes.days.t1[1:91,2]==3|era.nodes.days.t1[1:91,2]==4]
>
> On Tue, Jan 31, 2017 at 5:23 AM, Jim Lemon  wrote:
>>
>> Hi Kwesi,
>> A mistake in the last email. Don't try to replace the column in
>> era.sta as the result will be a different length. Try this:
>>
>> newera.sta2<-collapse_values(era.sta[,2],3)
>>
>> Jim
>>
>> On Tue, Jan 31, 2017 at 10:32 AM, Jim Lemon  wrote:
>> > Hi Kwesi,
>> > The function collapse_values will only work on a vector of numbers
>> > with FUN="mean". era.sta looks like a data frame with at least two
>> > elements. As the second of these elements seems to be numeric, perhaps
>> > this will work:
>> >
>> > era.sta[,2]<-collapse_values(era.sta[,2],3)
>> >
>> > Don't try to apply the names to era.sta, that was just something to
>> > make the example easier to understand. If you want to collapse more
>> > than one column of era.sta do each one at a time and assign them to a
>> > new data frame. In particular, if era[,1] is a vector of month names,
>> > you will have to create a new vector of quarter (three month) names.
>> > If there are very many of these, the collapse_values function can be
>> > modified to do it automatically.
>> >
>> > Jim
>> >
>> >
>> >
>> > On Tue, Jan 31, 2017 at 9:50 AM, Kwesi Quagraine
>> >  wrote:
>> >> Hello Jim,this is my script now; I am having this error when I called
>> >> the
>> >> function;" In mean.default(list(era...1. = 1:444, Node_freq =
>> >> c(-0.389855332400718,  :  argument is not numeric or logical: returning
>> >> NA"
>> >> Any help will be much appreciated.
>> >>
>> >> Kwesi
>> >>
>> >> rm(list = ls())
>> >> setwd('/home/kwesi/Documents/700hpa/soms/')
>> >> # Reading the data
>> >>
>> >> era   <- read.csv(file="som_freq.csv",header = TRUE, sep = ",",dec
>> >> =
>> >> ".")
>> >> era.scaled <- scale(era[,2:3], center = TRUE, scale = TRUE)
>> >> era.sta<-data.frame(era[,1],era.scaled)
>> >> era.sta
>> >>
>> >> collapse_values<-function(x,span,FUN="mean",na.rm=FALSE) {
>> >>   jump<-span-1
>> >>   newx<-rep(NA,length(x)-jump)
>> >>   for(i in 1:length(newx))
>> >> newx[i]<-do.call(FUN,list(x[i:(i+jump)],na.rm=na.rm))
>> >>   return(newx)
>> >> }
>> >>
>> >> #test<-1:12
>> >> names(era.sta)<-month.abb
>> >> collapse_values(era.sta,3)
>> >> era.sta
>> >>
>> >>
>> >> On Mon, Jan 30, 2017 at 11:53 PM, Jim Lemon 
>> >> wrote:
>> >>>
>> >>> Hi Kwesi,

Re: [R] [FORGED] filter correlation data

2017-01-31 Thread Rolf Turner


On 01/02/17 10:34, Elham - via R-help wrote:


hello everybody,I have a very very huge table in R from calculating
correlation,how can I filter it per spearman correlation and p-value
before export it,I mean what is the function that I use?I want to
select the pairs for value (r), , greater than 0.9 (directly
correlated) and less than -0.9 (inversely corerlated), and a p-value
< 0.001


I should say that I transformed the big matrix in a table by
library(reshape).


(a) In this instance it doesn't really matter, but *PLEASE* stop posting 
in HTML.


(b) It's not at all clear what the structure of your (table? matrix? 
data frame) is.  Please learn to be precise and explicit, otherwise it 
is difficult-to-impossible to provide useful advice.  (I.e. don't expect 
us to be mind-readers.)


Let us suppose (for the sake of saying *something* that might be 
helpful) that your correlations and p-values are stored in a data frame 
"X" as columns named "r" and "pval".


Then assign

ok <- with(X, (r < -0.9 | r > 0.9) & pval < 0.001)
Y  <- X[ok,]

Then export Y.
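
A small self-contained sketch of that filter, assuming a data frame with made-up pair labels (gene1, gene2) and made-up values; only the column names "r" and "pval" come from the supposition above:

```r
# Hypothetical long-format correlation table
X <- data.frame(
  gene1 = c("a", "b", "c", "d"),
  gene2 = c("w", "x", "y", "z"),
  r     = c(0.95, -0.97, 0.50, 0.92),
  pval  = c(1e-5, 2e-4, 0.2, 0.01)
)

# abs(r) > 0.9 is equivalent to (r < -0.9 | r > 0.9)
ok <- with(X, abs(r) > 0.9 & pval < 0.001)

# which() drops rows where the condition is NA
Y <- X[which(ok), ]

# Export the filtered table (written to a temporary directory here)
write.csv(Y, file.path(tempdir(), "filtered_correlations.csv"),
          row.names = FALSE)
```

Only the first two rows pass both conditions; the last row has a large enough r but fails the p-value cut.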

Really, if you are going to use R you should learn something about R.

cheers,

Rolf Turner

P.S.  Note that your p-value < 0.001 condition is redundant for any 
sample size greater than or equal to 7.


R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] filter correlation data

2017-01-31 Thread Elham - via R-help
Actually, I searched the net first and only sent my question after that, 
because I couldn't find the function. 

On Wednesday, February 1, 2017 1:34 AM, Bert Gunter 
 wrote:
 

 ... And why did you not do a web search on "correlation coefficient in
R", which would have led you almost immediately to ?cor and friends?

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 31, 2017 at 1:34 PM, Elham - via R-help
 wrote:
> hello everybody,I have a very very huge table in R from calculating 
> correlation,how can I filter it per spearman correlation and p-value before 
> export it,I mean what is the function that I use?I want to select the pairs 
> for value (r), , greater than 0.9 (directly correlated) and less than -0.9 
> (inversely corerlated), and a p-value < 0.001
>
>
> I should say that I transformed the big matrix in a table by library(reshape).

Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Hadley Wickham
See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
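
A minimal sketch of both options, assuming the namespaced document has a WorkSet root with a Description child (the markup in the archived message was stripped, so the XML string below is a reconstruction, not the original):

```r
library(xml2)

# Assumed reconstruction of the namespaced document
doc <- read_xml(paste0(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>'))

# Option 1: xml_ns() maps the default namespace to the prefix d1,
# which must then appear on every element in the XPath expression
ns <- xml_ns(doc)
with_prefix <- xml_text(xml_find_all(doc, "/d1:WorkSet//d1:Description", ns))

# Option 2: strip the default namespace (modifies doc in place),
# after which the unprefixed XPath matches
xml_ns_strip(doc)
after_strip <- xml_text(xml_find_all(doc, "/WorkSet//Description"))
```

Stripping is convenient when the namespace carries no information you need; keeping the prefix is safer when a file mixes several namespaces.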

Hadley

On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
> I am trying to read a series of XML files that use a namespace and I have 
> failed, thus far, to discover the proper syntax. I have a reproducible 
> example below. I have two XML character strings defined: one without a 
> namespace and one with. I show that I can successfully extract the node using 
> the XML string without the namespace and fail when using the XML string with 
> the namespace.
>
> Mark
> PS I am having the same problem with the xml2 package and am hoping 
> understanding one will help with the other.
>
> ##
> library(XML)
> ## The first XML text (no_ns_xml) does not have a namespace defined
> no_ns_xml <- c("", "<WorkSet>",
>    "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>    "</WorkSet>")
> l_no_ns_xml <-xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>useInternalNodes = TRUE)
> ## The node is found
> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>
> ## The second XML text (with_ns_xml) has a namespace defined
> with_ns_xml <- c("",
>  "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>  "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>  "</WorkSet>")
>
> l_with_ns_xml <-xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>useInternalNodes = TRUE)
> ## The node is not found
> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
> ## I attempt to provide the namespace, but fail.
> ns <-  "http://labkey.org/etl/xml"
> names(ns)[1] <- "xmlns"
> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>
> R. Mark Sharp, Ph.D.
> Director of Data Science Core
> Southwest National Primate Research Center
> Texas Biomedical Research Institute
> P.O. Box 760549
> San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: msh...@txbiomed.org
>



-- 
http://hadley.nz



Re: [R] filter correlation data

2017-01-31 Thread Bert Gunter
... And why did you not do a web search on "correlation coefficient in
R", which would have led you almost immediately to ?cor and friends?

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 31, 2017 at 1:34 PM, Elham - via R-help
 wrote:
> hello everybody,I have a very very huge table in R from calculating 
> correlation,how can I filter it per spearman correlation and p-value before 
> export it,I mean what is the function that I use?I want to select the pairs 
> for value (r), , greater than 0.9 (directly correlated) and less than -0.9 
> (inversely corerlated), and a p-value < 0.001
>
>
> I should say that I transformed the big matrix in a table by library(reshape).


[R] filter correlation data

2017-01-31 Thread Elham - via R-help
Hello everybody, I have a very large table in R from calculating 
correlations. How can I filter it by Spearman correlation and p-value before 
exporting it? I mean, which function should I use? I want to select the pairs 
with a correlation value (r) greater than 0.9 (directly correlated) or less 
than -0.9 (inversely correlated), and a p-value < 0.001.


I should say that I transformed the big matrix into a table with library(reshape).

[R] sub-setting rows based on dates in R

2017-01-31 Thread Md Sami Bin Shokrana
Hello guys, I am trying to solve a problem in R. I have 2 data frames which 
look like this:
df1 <-
  Date        Rainfall_Duration
  6/14/2016   10
  6/15/2016   20
  6/17/2016   10
  8/16/2016   30
  8/19/2016   40

df2 <-
  Date        Removal.Rate
  6/17/2016   64.7
  6/30/2016   22.63
  7/14/2016   18.18
  8/19/2016   27.87

I want to look up the dates from df2 in df1 and extract their corresponding 
Rainfall_Duration data. For example, I want to look for the 1st date of df2 in 
df1 and subset the rows of df1 for that specific date and the 7 days prior to 
it. Additionally, for 6/30/2016 (in df2), for example, there are no dates 
available in df1 within its 7-day range. In this case I just want to extract 
the same results as for the previous date in df2 (6/17/2016). The same logic 
goes for 7/14/2016 (df2).
The output should look like this:

df3<-

Rate.Removal.Date  Date Rainfall_Duration
6/17/2016  6/14/2016  10
6/17/2016  6/15/2016  20
6/17/2016  6/17/2016  10
6/30/2016  6/14/2016  10
6/30/2016  6/15/2016  20
6/30/2016  6/17/2016  10
7/14/2016  6/14/2016  10
7/14/2016  6/15/2016  20
7/14/2016  6/17/2016  10
8/19/2016  8/16/2016  30
8/19/2016  8/19/2016  40

I could subset data for the 7 days range. But could not do it when no dates are 
available in that range. I have the following code:
library(plyr)
library (dplyr)
df1$Date <- as.Date(df1$Date,format = "%m/%d/%Y")
df2$Date <- as.Date(df2$Date,format = "%m/%d/%Y")

df3 <- lapply(df2$Date, function(x){
  filter(df1, between(Date, x-7, x))
})

names(df3) <- as.character(df2$Date)
bind_rows(df3, .id = "Rate.Removal.Date")
df3 <- ldply (df3, data.frame, .id = "Rate.Removal.Date")

I hope I could explain my problem properly. I would highly appreciate if 
someone can help me out with this code or a new one. Thanks in advance.
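
One possible base-R sketch of the fallback rule described above, using the data from this message. The helper names (pieces, hits, last_hits) are assumptions for illustration; the point is that an empty 7-day window reuses the rows found for the previous df2 date.

```r
df1 <- data.frame(
  Date = as.Date(c("6/14/2016", "6/15/2016", "6/17/2016",
                   "8/16/2016", "8/19/2016"), format = "%m/%d/%Y"),
  Rainfall_Duration = c(10, 20, 10, 30, 40)
)
df2 <- data.frame(
  Date = as.Date(c("6/17/2016", "6/30/2016", "7/14/2016", "8/19/2016"),
                 format = "%m/%d/%Y"),
  Removal.Rate = c(64.7, 22.63, 18.18, 27.87)
)

pieces <- vector("list", nrow(df2))
last_hits <- integer(0)   # df1 rows matched by the previous df2 date

for (i in seq_len(nrow(df2))) {
  d <- df2$Date[i]
  # rows of df1 that fall on d or within the 7 days before it
  hits <- which(df1$Date >= d - 7 & df1$Date <= d)
  # no rows in the window: reuse the previous date's rows
  if (length(hits) == 0) hits <- last_hits
  last_hits <- hits
  pieces[[i]] <- data.frame(
    Rate.Removal.Date = rep(d, length(hits)),
    Date              = df1$Date[hits],
    Rainfall_Duration = df1$Rainfall_Duration[hits]
  )
}

df3 <- do.call(rbind, pieces)
df3
```

The dates print in ISO form; format(df3$Date, "%m/%d/%Y") converts them back to the layout used in the desired output.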






Re: [R] caculate correlation

2017-01-31 Thread Elham -
Hi Dear Jim,

I did it. Both return a vector of gene names, and the vectors have different 
lengths; as I said before, I have lists of coding and noncoding genes, so the 
lengths are not the same.
Where are the numbers?!

And at the end of the printout there is this error:

<0 rows> (or 0-length row.names)
 

On Tuesday, January 31, 2017 3:07 AM, Jim Lemon  
wrote:
 

 Hi Elham,
What I meant is to simply copy these two expressions into the R command line:

coding.rpkm[grep("23.C",coding.rpkm$name),-1]

ncoding.rpkm[grep("23.C",ncoding.rpkm$name),-1]

and see what comes out. If both return a vector of numbers of the same
length with no NA values, my guess was wrong. If there are NA values,
try adding the argument use="pairwise.complete.obs" to the "cor"
statement.

Jim


On Tue, Jan 31, 2017 at 9:17 AM, Elham -  wrote:
> this script automatically recognizes what is control among cod and lnc. Note
> that this script contains a piece of text that is "grep(".C",cod$name)".
> This text select - among all column names - those that contain ".C". in my
> files, I named C1, C2, C3, etc all columns that correspond to controls. In
> the same manner, I get controls among the lnc, with the text:
> "grep(".C",lnc$name)"
>
> I`m so sorry,maybe I do not understand you again.
>
>
> On Tuesday, January 31, 2017 1:27 AM, Jim Lemon 
> wrote:
>
>
> Hi Elham,
> This is about the same as your first message. What I meant was, what
> do these two expressions return? Is whatever is returned suitable
> input for the "cor" function?
>
> coding.rpkm[grep("23.C",coding.rpkm$name),-1]
>
> ncoding.rpkm[grep("23.C",ncoding.rpkm$name),-1]
>
> Jim
>
>
> On Tue, Jan 31, 2017 at 8:45 AM, Elham -  wrote:
>> I have 9 experiments control/treatment that I analysed coding and
>> lncoding,
>> after that I normalize expression value.as you know we have different
>> known
>> number of coding and non -coding genes,so for calculating correlation
>> first
>> I transposed data ,(rows become columns)so row is control and
>> columns are gene names.(so I have 2 matrix with same row and different
>> column).This information is enough?
>>
>>
>>
>>
>> On Tuesday, January 31, 2017 1:06 AM, Jim Lemon 
>> wrote:
>>
>>
>> Hi Elham,
>> Without knowing much about what coding.rpkm and ncoding.rkpm look
>> like, it is difficult to say. Have you tried to subset these matrices
>> as you do in the "cor" function and see what is returned?
>>
>> Jim
>>
>> On Tue, Jan 31, 2017 at 6:40 AM, Elham - via R-help
>>  wrote:
>>> for calculating correlation between coding and noncoding,first I
>>> transposed data ,(rows become columns) so row is control and
>>> columns are gene names.(so I have 2 matrix with same row and different
>>> column),I use these function for calculating correlation but all of
>>> spearman
>>> correlation are NA,why?
>>>
>>>
>>>
>>>
>>> control.corr=cor(coding.rpkm[grep("23.C",coding.rpkm$name),-1],ncoding.rpkm[grep("23.C",ncoding.rpkm$name),-1],method=
>>> "spearman")
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> tumor.corr=cor(coding.rpkm [grep("27.T", coding.rpkm $name),-1],
>>> ncoding.rpkm [grep("27.T", ncoding.rpkm $name),-1],method = "spearman")
>>>
>>
>>
>
>


   

[R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Mark Sharp
I am trying to read a series of XML files that use a namespace and I have 
failed, thus far, to discover the proper syntax. I have a reproducible example 
below. I have two XML character strings defined: one without a namespace and 
one with. I show that I can successfully extract the node using the XML string 
without the namespace and fail when using the XML string with the namespace.

Mark
PS I am having the same problem with the xml2 package and am hoping 
understanding one will help with the other.

##
library(XML)
## The first XML text (no_ns_xml) does not have a namespace defined
no_ns_xml <- c("", "<WorkSet>",
   "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
   "</WorkSet>")
l_no_ns_xml <-xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
   useInternalNodes = TRUE)
## The node is found
getNodeSet(l_no_ns_xml, "/WorkSet//Description")

## The second XML text (with_ns_xml) has a namespace defined
with_ns_xml <- c("",
 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
 "</WorkSet>")

l_with_ns_xml <-xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
   useInternalNodes = TRUE)
## The node is not found
getNodeSet(l_with_ns_xml, "/WorkSet//Description")
## I attempt to provide the namespace, but fail.
ns <-  "http://labkey.org/etl/xml"
names(ns)[1] <- "xmlns"
getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
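
For reference, a hedged sketch of one way the lookup can be made to succeed with the XML package. XPath 1.0 has no notion of a default namespace, so an unprefixed name such as WorkSet only matches elements in no namespace. The usual workaround is to bind the namespace URI to a prefix of your own choosing (the prefix "x" below is arbitrary) and use that prefix in the expression. The XML string is an assumed reconstruction of the document above.

```r
library(XML)

# Assumed reconstruction of the namespaced document
xml_txt <- paste0(
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>')
doc <- xmlParse(xml_txt, asText = TRUE)

# Bind the namespace URI to an arbitrary prefix and use it in the XPath
ns <- c(x = "http://labkey.org/etl/xml")
nodes <- getNodeSet(doc, "/x:WorkSet//x:Description", namespaces = ns)
sapply(nodes, xmlValue)
```

The prefix only has to agree between the namespaces argument and the XPath expression; it does not need to appear in the document itself.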

R. Mark Sharp, Ph.D.
Director of Data Science Core
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org











Re: [R] (no subject)

2017-01-31 Thread PIKAL Petr
Hi

Of course you have fewer values than in original data.

> x<-1:21

> length(rowSums(embed(x,3)))
[1] 19
> length(x)
[1] 21

> embed(x,3)
      [,1] [,2] [,3]
 [1,]    3    2    1
 [2,]    4    3    2
 [3,]    5    4    3
 [4,]    6    5    4
 [5,]    7    6    5
 [6,]    8    7    6
 [7,]    9    8    7
 [8,]   10    9    8
 [9,]   11   10    9
[10,]   12   11   10
[11,]   13   12   11
[12,]   14   13   12
[13,]   15   14   13
[14,]   16   15   14
[15,]   17   16   15
[16,]   18   17   16
[17,]   19   18   17
[18,]   20   19   18
[19,]   21   20   19

You need to prepend or append 2 NA values if you want to add newly computed 
values to old data.

or extend your original vector by 2 values

length(rowSums(embed(c(NA, NA, x),3), na.rm=T))
[1] 21
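
A short sketch of the other option, padding the result instead of the input, so that the first two positions stay NA rather than holding partial sums. The vector x and the column name roll_mean are made up for illustration.

```r
x <- c(2, 4, 6, 8, 10)

# Each row of d3 holds (current, previous, second previous) values
d3 <- embed(x, 3)

# Rolling 3-value means: one per complete window, so length(x) - 2 values
roll_mean <- rowMeans(d3)

# Prepend two NAs so the result lines up with the original data
padded <- c(NA, NA, roll_mean)
dta <- data.frame(x = x, roll_mean = padded)
dta
```

With this padding each mean sits on the row of the last value in its window; appending the NAs instead would align it with the first.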

Cheers
Petr


From: Kwesi Quagraine [mailto:starskykw...@gmail.com]
Sent: Monday, January 30, 2017 6:29 PM
To: PIKAL Petr 
Cc: Jeff Newmiller ; r-help@r-project.org
Subject: Re: [R] (no subject)

Upon trying this method, I get an error ;

Error in `$<-.data.frame`(`*tmp*`, "MEImeans", value = c(0.313162987462034,  :
  replacement has 442 rows, data has 444
Any thoughts?
Kwesi

On Mon, Jan 30, 2017 at 5:37 PM, PIKAL Petr wrote:
Hi

Probably just a small correction.

d3 <- embed( dta$MEI, 3)

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff
> Newmiller
> Sent: Monday, January 30, 2017 4:19 PM
> To: r-help@r-project.org; Kwesi Quagraine
> Subject: Re: [R] (no subject)
>
> How you proceed depends on how consistent the data are and on what you
> want to do with those sets of three months after you have identified them.
>
> One approach is to create a matrix where each row contains the values
> corresponding to the "second previous", "previous", and "current" months
> data, respectively using the embed() function. The first two rows would be
> incomplete because the earlier data are missing there:
>
> d3 <- embed( dta$MEI )
>
> with which you could compute whatever metric you wanted. For example
> you could compute rolling means:
>
> dta$MEImeans <- rowMeans( d3 )
>
> If your data have missing rows you might need to use the aggregate or
> merge functions instead.
>
> For specific layouts of data or metrics you can find specialized functions in
> various packages. You might want to search using the R "sos" package or
> Google for your analysis method of choice.
> --
> Sent from my phone. Please excuse my brevity.
>
> On January 30, 2017 6:11:48 AM PST, Kwesi Quagraine wrote:
> >Hello, I have a data with two variables nodes and index, I want to
> >extract
> >3 months seasons, with a shift of 1 month, that is, DJF, JFM, FMA etc
> >to OND. Was wondering how to go about it. Kindly find data sample
> >below, data is in csv format.
> >Any help will be appreciated.
> >
> >My data sample;
> >
> >     era...1.    Node_freq           MEI
> >1   1980-01-01 -0.389855332  0.3394196488
> >2   1980-02-01 -0.728019153  0.2483738232
> >3   1980-03-01 -1.992457784  0.3516954904
> >4   1980-04-01  0.222760284  0.5736836269
> >5   1980-05-01  0.972601798  0.6289249144
> >6   1980-06-01  0.570725954  0.5736836269
> >7   1980-07-01 -0.977966324  0.4120517119
> >8   1980-08-01  0.056128836 -0.0104418383
> >9   1980-09-01  0.987304573 -0.0687520861
> >10  1980-10-01  1.188242495 -0.1403611624
> >11  1980-11-01  1.693037763 -0.0963727298
> >12  1980-12-01  1.173539720 -0.2539126977
> >13  1981-01-01  0.423698206 -0.6140040528
> >14  1981-02-01 -2.208098481 -0.5209122536
> >15  1981-03-01 -0.786830252  0.1133395650
> >16  1981-04-01 -0.110502611  0.3302127675
> >17  1981-05-01 -1.272021820 -0.1894645290
> >18  1981-06-01  0.394292656 -0.3736021538
> >19  1981-07-01  1.452892441 -0.4032687711
> >20  1981-08-01  0.698150002 -0.4441882433
> >21  1981-09-01  0.997106423 -0.1720737534
> >22  1981-10-01  0.247264908 -0.2436828296
> >23  1981-11-01  0.771663876 -0.3909929295
> >24  1981-12-01 -0.316341458 -0.4943145967
> >
> >Regards,
> >Kwesi
> >
> >--
> >Try not to become a man of success but rather a man of value-Albert
> >Einstein
> >
> >University of Cape Coast|College of Agriculture and Natural
> >Sciences|Department
> >of Physics|
> >Team Leader|Recycle Up! Ghana|Technology Without Borders| Other
> emails:
> >kwesi.quagra...@ucc.edu.gh|kwesi.quagra...@teog.de|
> >Mobile: +233266173582
> >Skype: quagraine_cwasi
> >Twitter: @Pkdilly
> >

Re: [R] g parameter for deltaMethod() as a function

2017-01-31 Thread Marc Girondot via R-help

Dear John and list members,

I have found a solution using the package nlWaldTest. I am posting the 
solution in case someone else has this problem.


Here is a summary of the problem:
I would like to use the delta method for a function for which no derivative 
can be calculated using D(). I would rather use a numerical derivative.


Here is the solution. In the first two examples, the symbolic derivative is 
used.

library(car)
m1 <- lm(time ~ t1 + t2, data = Transact)
deltaMethod(coef(m1), "t1/t2", vcov.=vcov(m1))

library("nlWaldTest")
nlConfint(obj = NULL, texts="b[2]/b[3]", level = 0.95, coeff = coef(m1),
  Vcov = vcov(m1), df2 = TRUE, x = NULL)

# Now numerical derivative is used. The result is the same.

try_g <- function(...) {
  par <- list(...)
  return(par[[1]]/par[[2]])
}

nlConfint(obj = NULL, texts="try_g(b[2], b[3])", level = 0.95,
  coeff = coef(m1), Vcov = vcov(m1), df2 = TRUE, x = NULL)
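
To make the idea explicit, here is a base-R sketch of the delta method with a numerical gradient. It is not how nlWaldTest or car implement it. The helper num_grad, the use of the built-in mtcars data, and the step size are assumptions chosen so the example runs without extra packages. The standard error is sqrt(grad' V grad), and the central-difference gradient is checked against the analytic one for a ratio of coefficients.

```r
# Hypothetical model on a built-in dataset
m <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(m)
V <- vcov(m)

# Function of the coefficients whose standard error is wanted
g <- function(b) unname(b[2] / b[3])

# Central-difference gradient (helper defined here, not a library function)
num_grad <- function(f, b, eps = 1e-6) {
  sapply(seq_along(b), function(j) {
    h <- eps * max(abs(b[j]), 1)
    up <- b; dn <- b
    up[j] <- b[j] + h
    dn[j] <- b[j] - h
    (f(up) - f(dn)) / (2 * h)
  })
}

grad_num <- num_grad(g, b)
grad_ana <- unname(c(0, 1 / b[3], -b[2] / b[3]^2))

# Delta-method standard errors from each gradient
se_num <- sqrt(drop(t(grad_num) %*% V %*% grad_num))
se_ana <- sqrt(drop(t(grad_ana) %*% V %*% grad_ana))
c(estimate = g(b), se_numeric = se_num, se_analytic = se_ana)
```

Because g is only ever evaluated, never differentiated symbolically, any R function of the coefficients can be substituted for it.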

Marc



Re: [R] caculate correlation

2017-01-31 Thread Jim Lemon
Hi Elham,

On Tue, Jan 31, 2017 at 7:28 PM, Elham -  wrote:
> Hi Dear Jim,
>
> I did it, both return a vector of name of the genes with different length,as
> I said before I have list of coding and noncoding so the length are not
> same.
>
> where is number?!
>
Not in the values you are extracting from the data frame. As you are
aware, you can only perform the "cor" operation on numbers. As the
value returned refers to the correlation of _pairs_ of values, the
vectors of numbers should be the same length and there should be some
meaningful relationship between those pairs. Are you just trying to
correlate any old numbers because they are numbers?

> and at the end of print there is this error :
>
> <0 rows> (or 0-length row.names)
>
This is probably not an error, just R telling you that something that
was requested didn't have anything in it. Maybe one day we will find
out what is in:

coding.rpkm
ncoding.rpkm

and we can provide more informed advice.
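
As an illustration of what "cor" expects, here is a sketch with made-up numeric matrices (the names coding, noncoding, geneA and lnc1 are hypothetical). Both have the same number of rows (samples) and may have different numbers of columns (genes); a single NA turns a whole column of the result into NA unless the use argument is set.

```r
set.seed(1)

# 9 samples in rows; 3 coding and 2 noncoding genes in columns
coding <- matrix(rnorm(9 * 3), nrow = 9,
                 dimnames = list(NULL, c("geneA", "geneB", "geneC")))
noncoding <- matrix(rnorm(9 * 2), nrow = 9,
                    dimnames = list(NULL, c("lnc1", "lnc2")))

# One missing value in the noncoding data
noncoding[2, "lnc1"] <- NA

# Default handling: every correlation involving lnc1 is NA
r_default <- cor(coding, noncoding, method = "spearman")

# Pairwise deletion: each pair uses the samples where both values exist
r_pairwise <- cor(coding, noncoding, method = "spearman",
                  use = "pairwise.complete.obs")
r_pairwise
```

The result is a 3 x 2 matrix, one row per coding gene and one column per noncoding gene. Columns of gene names instead of numbers cannot be correlated in this way.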

Jim
