Re: [R] stringr::str_split_fixed query

2015-01-15 Thread Bernhard Pröll

Dear David,

str_split_fixed calls str_locate_all, which gives

str_locate_all("ab", "")
## [[1]]
##      start end
## [1,]     1   0
## [2,]     2   1
##

in your example, since "" is a character of length 1. substring() is
probably more intuitive to get your expected result:

substring("ab", 1:2, 1:2)
## [1] "a" "b"
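
For completeness, base R's strsplit() gives the same pieces as a plain
character vector (a small illustration, not part of the original reply):

strsplit("ab", "")[[1]]
## [1] "a" "b"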


David Barron dnbar...@gmail.com wrote on Wed, 14 Jan 18:47:

I'm puzzled as to why I get this behaviour with str_split_fixed in the
stringr package.


stringr::str_split_fixed('ab','',2)

     [,1] [,2]
[1,] ""   "ab"


stringr::str_split_fixed('ab','',3)

     [,1] [,2] [,3]
[1,] ""   "a"  "b" 

In the first example, I was expecting to get
     [,1] [,2]
[1,] "a"  "b"

Can someone explain?

Thanks,
David

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Approximation of a function from R^2 to R

2015-01-15 Thread rala
Thanks for the reply. I have data points.
What I actually want to do is to estimate a joint density function.
I used npudens() to estimate the kernel density of two vectors and got the
densities at the evaluation points. Now I want an approximation of this
estimate so that I can evaluate it at different points.
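
For what it's worth, a minimal sketch of evaluating an np density estimate at
new points (the variable names here are made up, and the newdata interface
should be checked against ?npudensbw and ?npudens for your version of the np
package):

library(np)

## toy stand-ins for the two observed vectors
x <- rnorm(200); y <- rnorm(200)
dat <- data.frame(x = x, y = y)

bw   <- npudensbw(~ x + y, data = dat)   # bandwidth selection
fhat <- npudens(bws = bw)                # joint density estimate on the data

## evaluate the same estimate at arbitrary new points
newpts <- data.frame(x = c(0, 1), y = c(0, -1))
fitted(npudens(bws = bw, newdata = newpts))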



--
View this message in context: 
http://r.789695.n4.nabble.com/Approximation-of-a-function-from-R-2-to-R-tp4701805p4701828.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R 3.1.2 mle2() function on Windows 7 Error and multiple solutions

2015-01-15 Thread Ravi Varadhan
Hi,



I tried your problem with optimx package.  I found a better solution than that 
found by mle2.



library(optimx)



# the objective function needs to be re-written

LL2 <- function(par, y) {
  lambda <- par[1]
  alpha  <- par[2]
  beta   <- par[3]
  R <- Nweibull(y, lambda, alpha, beta)
  -sum(log(R))
}



optimx(fn=LL2, par=c(.01,325,.8), y=y,
       lower=c(.1,.1,.1), upper=c(Inf,Inf,Inf),
       control=list(all.methods=TRUE))



# Look at the solution found by `nlminb' and `nmkb'. This is the optimal one;
# its log-likelihood is larger than that of mle2 and of the other optimizers in optimx.



If this solution is not what you are looking for, your problem may be poorly 
scaled.  First, make sure that the likelihood is coded correctly.  If it is 
correct, then you may need to improve the scaling of the problem.
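
One simple way to rescale is to re-parameterise by hand (a sketch only,
reusing the objective above; the scale vector here is just the rough
magnitudes of the starting values, which is a common heuristic -- optim-style
control parameters such as `parscale' are another route, see ?optim):

sc   <- c(.01, 325, .8)                  # rough magnitudes of the parameters
LL2s <- function(p, y) LL2(p * sc, y)    # optimizer now works on rescaled values

optimx(fn = LL2s, par = c(.01, 325, .8)/sc, y = y,
       lower = c(.1, .1, .1)/sc, upper = c(Inf, Inf, Inf),
       control = list(all.methods = TRUE))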





Hope this is helpful,

Ravi



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] stringr::str_split_fixed query

2015-01-15 Thread Hadley Wickham
FWIW this is fixed in the dev version of stringr which uses stringi
under the hood:

> stringr::str_split_fixed('ab','',2)
     [,1] [,2]
[1,] "a"  "b" 
> stringr::str_split_fixed('ab','',3)
     [,1] [,2] [,3]
[1,] "a"  "b"  ""  

Hadley

On Wed, Jan 14, 2015 at 12:47 PM, David Barron dnbar...@gmail.com wrote:
 I'm puzzled as to why I get this behaviour with str_split_fixed in the
 stringr package.

 stringr::str_split_fixed('ab','',2)
      [,1] [,2]
 [1,] ""   "ab"

 stringr::str_split_fixed('ab','',3)
      [,1] [,2] [,3]
 [1,] ""   "a"  "b" 

 In the first example, I was expecting to get
      [,1] [,2]
 [1,] "a"  "b"

 Can someone explain?

 Thanks,
 David

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
http://had.co.nz/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R 3.1.2 mle2() function on Windows 7 Error and multiple solutions

2015-01-15 Thread Ravi Varadhan
A more important point that I want to make is that I find few people taking 
advantage of the comparative evaluation or benchmarking ability of optimx.  
There is no uniformly best optimizer for all problems.  Different ones turn 
out to perform better for different problems, and it is quite difficult to know 
a priori which one will be best for a given problem.  Thus, the benchmarking 
capability provided by optimx is a powerful feature.  

Ravi  

From: Ben Bolker bbol...@gmail.com
Sent: Thursday, January 15, 2015 9:29 AM
To: Ravi Varadhan; R-Help
Cc: malqura...@ksu.edu.sa
Subject: Re: R 3.1.2 mle2() function on Windows 7 Error and multiple solutions


  For what it's worth, you can use either nlminb (directly) or optimx
within the mle2 wrapper by specifying the 'optimizer' parameter ...
this gives you flexibility in optimization along with the convenience
of mle2 (likelihood ratio tests via anova(), likelihood profiling, etc.)
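
A minimal sketch of what such a call might look like (untested; Nweibull and
the starting values come from the original post, and the negative
log-likelihood is rewritten with named arguments so mle2 can construct the
parameter list; summary(), profile() and anova() then work as usual):

library(bbmle)

nllW <- function(lambda, alpha, beta, y) -sum(log(Nweibull(y, lambda, alpha, beta)))

fit <- mle2(nllW,
            start     = list(lambda = .01, alpha = 325, beta = .8),
            data      = list(y = y),
            optimizer = "nlminb",      # or optimizer = "optimx"
            lower     = c(lambda = .1, alpha = .1, beta = .1))
summary(fit)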

On 15-01-15 09:26 AM, Ravi Varadhan wrote:
 Hi,



 I tried your problem with optimx package.  I found a better
 solution than that found by mle2.



 library(optimx)



 # the objective function needs to be re-written

 LL2 <- function(par,y) {
   lambda <- par[1]
   alpha  <- par[2]
   beta   <- par[3]
   R <- Nweibull(y,lambda,alpha,beta)
   -sum(log(R))
 }



 optimx(fn=LL2, par=c(.01,325,.8), y=y,
        lower=c(.1,.1,.1), upper=c(Inf,Inf,Inf),
        control=list(all.methods=TRUE))



 # Look at the solution found by `nlminb' and `nmkb'. This is the
 optimal one; its log-likelihood is larger than that of mle2 and of the
 other optimizers in optimx.



 If this solution is not what you are looking for, your problem may
 be poorly scaled.  First, make sure that the likelihood is coded
 correctly.  If it is correct, then you may need to improve the
 scaling of the problem.





 Hope this is helpful,

 Ravi





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing elements of named vector / named list to function

2015-01-15 Thread Jeff Newmiller
If x is a list...

do.call(fun,x)

You should keep external data input away from this code construct to avoid 
intentional or unintentional misuse of your fun.

If your toy example were your actual usage I would suggest the collapse 
argument of paste.
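
Applied to Rainer's toy example, a minimal sketch:

fun <- function(A, B, C, ...) paste(A, B, C, ...)
x <- 1:5
names(x) <- LETTERS[x]

do.call(fun, as.list(x))   # named vector -> named list -> named arguments
## [1] "1 2 3 4 5"
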
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On January 15, 2015 6:25:43 AM PST, Rainer M Krug rai...@krugs.de wrote:

Hi

Following scenario: I have a function fun

--8---cut here---start-8---
fun <- function(A, B, C, ...){paste(A, B, C, ...)}
--8---cut here---end---8---

and x defined as follow

--8---cut here---start-8---
x <- 1:5
names(x) <- LETTERS[x]
--8---cut here---end---8---

now I want to pass the *elements* of x to fun as named arguments, i.e.

,
| > fun(A=1, B=2, C=3, D=4, E=5)
| [1] "1 2 3 4 5"
`

The below examples obviously do not work:

,
| > fun(x)
| Error in paste(A, B, C, list(...)) :
|   argument "B" is missing, with no default
| > fun(unlist(x))
| Error in paste(A, B, C, list(...)) :
|   argument "B" is missing, with no default
`

How can I extract from x the elements and pass them on to fun()?

I could easily change x to a list() if this would be easier.

--8---cut here---start-8---
x <- list(A=1, B=2, C=3, D=4, E=5)
--8---cut here---end---8---

In my actual program, x can have different elements as well as fun -
this is decided programmatically.

Any suggestions how I can achieve this?

Thanks,

Rainer

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mlogit: Error in model.frame.default/ Error in solve.default

2015-01-15 Thread Lee van Cleef
Dear R users, 
1) My problem in short: 
Mlogit cannot calculate certain conditional models. 

2) My database: 
The target was a logistic regression analysis and a probability function
which should include generic coefficients and alternative-specific ones. The
database was a survey, the dependent variable consisted of six possible
choices. 
I worked with two databases: 
First, the true survey data (out of the six choices, one is quite dominant
with a share of around 40%, two have around 20% each, and the remaining are
marginal). 
Second, the survey with randomized values for both choice and independent
variables to avoid the possible problems with the less uniform distribution
of the choice variable in the true set. 
Both datasets include 725 individuals, with twelve variables intended to receive
a generic coefficient and 25 individual-specific ones which should have an
alternative-specific coefficient. I should note that the problems above already
emerged when estimating with only one independent variable. 
Before the estimate, both datasets were brought into the required long
format. The original data in the long format looked ok. 

3) The estimates and the issues. 
I started in both datasets with an estimate with only one variable with a
generic coefficient. I used the following call: mlogit(choice ~ variable,
data = survey_long). 
R’s response was “Error in solve.default(H, g[!fixed]) : system is
computationally singular: reciprocal condition number = ...”. 
I was able to get an estimate for such a comparably simple model when using
mlogit(choice ~ variable | 0, data = survey_long). However, the output did
not contain estimates of the alternative-specific coefficients for the
intercept. 
I continued by gradually adding variables with generic coefficients. If there
were more than five of them, R’s message was “Error in solve.default(H,
g[!fixed]) : system is computationally singular: reciprocal condition number
= ...”. 
One of the variables produced the error “Error in
model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data), :
variable lengths differ (found for 'C15_CC_Dist')”. 
I compared the lengths of all the variables, and they were all the same. 

Another thing: it was impossible to add variables with an alternative-specific
coefficient: “Error in solve.default(H, g[!fixed]) : system is
computationally singular: reciprocal condition number = ...”. 

These statements appeared both when using the true survey as well as when
using the randomized data. 

Does anyone have any comment or help on that? What can I do? I can send the
datasets if required. 

Any comment would be helpful. 

Best regards, 

LvC




--
View this message in context: 
http://r.789695.n4.nabble.com/Mlogit-Error-in-model-frame-default-Error-in-solve-default-tp4701843.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to get items for both LHS and RHS for only specific columns in arules?

2015-01-15 Thread Kim C.
Hi all,

I have a question about the arules package in R. I hope the example tables are
readable in your email; otherwise you can view them in the question.txt in the
attachment.

Within the apriori function in the arules package, I want the outcome to only
contain these two variables in the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1.
The RHS should only contain attributes from the column Product. For instance:

  lhs                   rhs                                       support confidence lift
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}            0.250   0.250      1.00
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26 720p}         0.250   0.250      1.00
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}  0.167   0.333      1.33
4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}   0.167   0.333      1.33

So now I use the following:

rules <- apriori(sales, parameter=list(support=0.01, confidence=0.8, minlen=2),
                 appearance=list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

Then I use this to ensure that only the Product column is on the RHS:

inspect( subset( rules, subset = rhs %pin% "Product=" ) )

The outcome is like this (for the sake of readability, I omitted the columns
for support, lift, confidence):

  lhs                                                                      rhs
1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works }    => {Product=SV 16xDVD M360 Black}
2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video }  => {Product=Adventure Works 26 720p}
3 {BrandName=Southridge Video, NumberChildrenAtHome=0 }                 => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 }       => {Product=Contoso Coffee Maker 5C E0900}

So apparently the LHS is able to contain every possible column, not just
HouseOwnerFlag like I specified. I see that I can put default="rhs" in the
apriori function to prevent this, like so:

rules <- apriori(sales, parameter=list(support=0.001, confidence=0.5, minlen=2),
                 appearance=list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                                 default="rhs"))

Then upon inspecting (without the subset part, just inspect(rules)), there are
far fewer rules (7) than before, but it does indeed only contain HouseOwnerFlag
in the LHS:

  lhs                   rhs                       support confidence lift
1 {HouseOwnerFlag=0} => {MaritalStatus=S}         0.250   0.250      1.00
2 {HouseOwnerFlag=1} => {Gender=M}                0.250   0.250      1.00
3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}  0.167   0.333      1.33
4 {HouseOwnerFlag=1} => {Gender=M}                0.167   0.333      1.33

However, on the RHS there is nothing from the column Product. So it has no use
to inspect it with subset, as of course it would return null. I tested it
several times with different support numbers to experiment and see if Product
would appear or not, but the same 7 rules remain.

So my question is, how can I specify both the LHS (HouseOwnerFlag) and the RHS
(Product)? What am I doing wrong?

You can reproduce this problem by downloading this test dataset from the
attachment (testdf.txt) or via this link:
https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0
Mind you, I only took the first 20 rows from a huge dataset (12 million rows),
so the output here won't have the same product names as the example I displayed
above. But the problem remains the same. (If you would like to have the entire
dataset, I can email it of course.)

I want to be able to get only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the
LHS and the column Product on the RHS. I asked this question on another forum
before, but got no response at all unfortunately. Since this mailing list is
dedicated to R only, I thought you might be able to help me. Thanks in advance!
I look forward to hearing from you.

Kim
sales <- structure(list(
  ProductCategoryName = structure(
    c(6L, 6L, 2L, 2L, 2L, 7L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("Audio", "Cameras and camcorders ", "Cell phones", "Computers",
               "Games and Toys", "Home Appliances", "Music, Movies and Audio Books",
               "TV and Video"), class = "factor"),
  ProductSubcategory = structure(
    c(26L, 26L, 11L, 12L, 12L, 21L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L,
      12L, 12L, 12L, 12L, 12L),
    .Label = c("Air Conditioners",

Re: [R] sparse matrix from vector outer product

2015-01-15 Thread Philipp A.
thanks, that sounds good!

Martin Maechler maech...@stat.math.ethz.ch wrote on Thu Jan 15 2015 at
09:07:04:

  Philipp A flying-sh...@web.de
  on Wed, 14 Jan 2015 14:02:40 + writes:

  Hi,
  creating a matrix from two vectors a, b by multiplying each
 combination can
  be done e.g. via

  a %*% t(b)

  or via

  outer(a, b)  # default for third argument is '*'

 really the best (most efficient) way would be

tcrossprod(a, b)

  But this yields a normal matrix.
 of course.

 Please always use small self-contained example code,
 here, e.g.,

 a <- numeric(17); a[3*(1:5)] <- 10*(5:1)
 b <- numeric(12); b[c(2,3,7,11)] <- 1:3


  Is there an efficient way to create sparse matrices (from the Matrix
  package) like that?

  Right now i’m doing

  a.sparse = as(a, 'sparseVector')
  b.sparse = as(t(b), 'sparseMatrix')
  a.sparse %*% b.sparse

  but this strikes me as wasteful.

 not really wasteful I think. But there is a nicer and more efficient way :

 require(Matrix)
 tcrossprod(as(a, "sparseVector"),
            as(b, "sparseVector"))

 now also gives

  17 x 12 sparse Matrix of class "dgCMatrix"

   [1,] .  .   . . . .   . . . .  . .
   [2,] .  .   . . . .   . . . .  . .
   [3,] . 50 100 . . . 150 . . . 50 .
   [4,] .  .   . . . .   . . . .  . .
   [5,] .  .   . . . .   . . . .  . .
   [6,] . 40  80 . . . 120 . . . 40 .
   [7,] .  .   . . . .   . . . .  . .
   [8,] .  .   . . . .   . . . .  . .
   [9,] . 30  60 . . .  90 . . . 30 .
  [10,] .  .   . . . .   . . . .  . .
  [11,] .  .   . . . .   . . . .  . .
  [12,] . 20  40 . . .  60 . . . 20 .
  [13,] .  .   . . . .   . . . .  . .
  [14,] .  .   . . . .   . . . .  . .
  [15,] . 10  20 . . .  30 . . . 10 .
  [16,] .  .   . . . .   . . . .  . .
  [17,] .  .   . . . .   . . . .  . .
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Iteratively subsetting data by factor level across multiple variables

2015-01-15 Thread Reid Bryant
Hi R experts!

I would like to have a scripted solution that will iteratively subset data
across many variables per factor level of each variable.

To illustrate, if I create a dataframe (df) by:

variation <- c("A", "B", "C", "D")
element1 <- as.factor(c(0, 1, 0, 1))
element2 <- as.factor(c(0, 0, 1, 1))
response <- c(4, 2, 6, 2)
df <- data.frame(variation, element1, element2, response)

I would like a function that would allow me to subset the data into four
groups and perform analysis across the groups.  One group for each of the
two factor levels across two variables.  In this example it's fairly easy
because I only have two variables with two levels each, but I would
like this to be extendable across situations where I am dealing with more
than 2 variables and/or more than two factor levels per variable.  I am
looking for a result that will mimic the output of the following:

element1_level0 <- subset(df, df$element1==0)
element1_level1 <- subset(df, df$element1==1)
element2_level0 <- subset(df, df$element2==0)
element2_level1 <- subset(df, df$element2==1)

The purpose would be to perform analysis on the df across each subset.
Simplistically this could be represented as follows:

mean(element1_level0$response)
mean(element1_level1$response)
mean(element2_level0$response)
mean(element2_level1$response)

Thanks,
Reid

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Latest version of Rtools is incompatible with latest version of R !!

2015-01-15 Thread PRAMEET LAHIRI
 I have installed R version 3.1.2. I tried to install RTools32.exe, which is
the latest version (using this link -
http://cran.r-project.org/bin/windows/Rtools/). However, on using the function
find_rtools() an error message was displayed which said:
"Rtools is required to build R packages, but no version of Rtools compatible
with R 3.1.2 was found. (Only the following incompatible version(s) of Rtools
were found: 3.2) Please download and install Rtools 3.1 from
http://cran.r-project.org/bin/windows/Rtools/ and then run find_rtools()."
 
I want to know why the latest version of R is not supporting the latest Rtools! 
Any suggestions?

Thanks for your time,
Prameet
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Iteratively subsetting data by factor level across multiple variables

2015-01-15 Thread William Dunlap
There are lots of ways to do this.  You have to decide on how you want to
organize the results.
Here are two ways that use only core R packages. Many people like the plyr
package for this
split-data/analyze-parts/combine-results sort of thing.

> df <- data.frame(x=1:27, response=log2(1:27),
+   g1=rep(letters[1:2],len=27), g2=rep(LETTERS[24:26],c(10,10,7)))
> s <- split(seq_len(nrow(df)), df[c("g1","g2")])
> mean(subset(df, df$g1=="a" & df$g2=="Z")$response)
[1] 4.578656
> vapply(s, function(si)mean(df$response[si]), FUN.VALUE=0) # a.Z part is previous result
     a.X      b.X      a.Y      b.Y      a.Z      b.Z 
1.976834 2.381378 3.880430 3.976834 4.578656 4.581611 
> coef(lm(response~x, data=subset(df, df$g1=="a" & df$g2=="Z"))) # regression example
(Intercept)           x 
 3.12905040  0.06040022 
> vapply(s, function(si)coef(lm(response ~ x, data=df[si,])), FUN.VALUE=rep(0,2))
                  a.X       b.X        a.Y        b.Y        a.Z        b.Z
(Intercept) 0.0862735 0.6882213 2.40741927 2.50763309 3.12905040 3.13556268
x           0.3781121 0.2821928 0.09820075 0.09182506 0.06040022 0.06025202


For the particular case of computing means of a partition of the data you
can use lm() once,
which gives the same numbers organized in a different way:
 coef(lm(response ~ x * (g1:g2) - x - 1, data=df))
   g1a:g2X    g1b:g2X    g1a:g2Y    g1b:g2Y    g1a:g2Z    g1b:g2Z 
0.08627350 0.68822126 2.40741927 2.50763309 3.12905040 3.13556268
 x:g1a:g2X  x:g1b:g2X  x:g1a:g2Y  x:g1b:g2Y  x:g1a:g2Z  x:g1b:g2Z
0.37811212 0.28219281 0.09820075 0.09182506 0.06040022 0.06025202
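
For the plain group means, base R's aggregate() (or plyr::ddply) does the same
split/apply/combine in one call; a small sketch using the df above:

aggregate(response ~ g1 + g2, data = df, FUN = mean)
##   g1 g2 response
## 1  a  X 1.976834
## ...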



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jan 15, 2015 at 11:42 AM, Reid Bryant reidbry...@gmail.com wrote:

 Hi R experts!

 I would like to have a scripted solution that will iteratively subset data
 across many variables per factor level of each variable.

 To illustrate, if I create a dataframe (df) by:

 variation <- c("A", "B", "C", "D")
 element1 <- as.factor(c(0, 1, 0, 1))
 element2 <- as.factor(c(0, 0, 1, 1))
 response <- c(4, 2, 6, 2)
 df <- data.frame(variation, element1, element2, response)

 I would like a function that would allow me to subset the data into four
 groups and perform analysis across the groups.  One group for each of the
 two factor levels across two variables.  In this example it's fairly easy
 because I only have two variables with two levels each, but I would
 like this to be extendable across situations where I am dealing with more
 than 2 variables and/or more than two factor levels per variable.  I am
 looking for a result that will mimic the output of the following:

 element1_level0 <- subset(df, df$element1==0)
 element1_level1 <- subset(df, df$element1==1)
 element2_level0 <- subset(df, df$element2==0)
 element2_level1 <- subset(df, df$element2==1)

 The purpose would be to perform analysis on the df across each subset.
 Simplistically this could be represented as follows:

 mean(element1_level0$response)
 mean(element1_level1$response)
 mean(element2_level0$response)
 mean(element2_level1$response)

 Thanks,
 Reid

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Psych package: why am I receiving NA for many of the factor scores?

2015-01-15 Thread William Revelle
Dear Elizabeth,
  A correction to my suggestion:
scaled <- scale(mydata)
wts <- f4$weights

scores <- t(apply(scaled, 1, function(x) colSums(x*wts, na.rm=TRUE)))  # you need colSums, not the sum function

Also, the NAs you were getting with missing data were due to a bug in the fa 
function: it simply ignored the missing argument.

Thanks for catching that.  It is now fixed and should be on CRAN real soon.

Bill





 On Jan 14, 2015, at 9:39 AM, William Revelle li...@revelle.net wrote:
 
 Dear Elizabeth,
 
 Factor scores in the fa function are found by multiplying the standardized 
 data by the factor weights using matrix multiplication.  This will give 
 scores only for subjects with complete data.
 
 However, if you want, you can create them yourself by standardizing your data 
 and then multiplying them by the weights:
 
 mydata <- rProjectSurveyDataJustVariables
 
 f4 <- fa(mydata, 4)  # modify this to match your call
 wts <- f4$wts
 scaleddata <- scale(mydata)
 scores <- apply(scaleddata, 1, function(x) sum(x * wts, na.rm=TRUE))
 
 #this will work with complete data, and impute factor scores for those cases 
 with incomplete data.  If the data are missing completely at random, this 
 should give a reasonable answer.  However, if the missingness has some 
 structure to it, the imputed scores will be biased.
 
 This is a reasonable option to add to the fa function and I will do so.
 
 A side note.  If you need help with a package, e.g., psych, you get faster 
 responses by writing to the package author.  I just happened to be browsing 
 R-help when your question came in.
 
 Let me know if this solution works for you.
 
 Bill
 
 
 
 On Jan 13, 2015, at 7:46 PM, Elizabeth Barrett-Cheetham 
 ebarrettcheet...@gmail.com wrote:
 
 
 Hello R Psych package users,
 
 Why am I receiving NA for many of the factor scores for individual
 observations? I'm assuming it is because there is quite a bit of missing
 data (denoted by NA). Are there any tricks in the psych package for getting
 a complete set of factor scores? 
 
 My input is: 
 rProjectSurveyDataJustVariables = read.csv("R Project Survey Data Just
 Variables.csv", header = TRUE)
 solution <- fa(r = rProjectSurveyDataJustVariables, nfactors = 4, rotate =
 "oblimin", fm = "ml", scores = "tenBerge", warnings = TRUE, oblique.scores =
 TRUE)
 solution
 
 Thank you.
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 William Revelle  
 http://personality-project.org/revelle.html
 Professorhttp://personality-project.org
 Department of Psychology   http://www.wcas.northwestern.edu/psych/
 Northwestern University  http://www.northwestern.edu/
 Use R for psychology http://personality-project.org/r
 It is 5 minutes to midnight  http://www.thebulletin.org
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

William Revellehttp://personality-project.org/revelle.html
Professor  http://personality-project.org
Department of Psychology   http://www.wcas.northwestern.edu/psych/
Northwestern Universityhttp://www.northwestern.edu/
Use R for psychology http://personality-project.org/r
It is 5 minutes to midnighthttp://www.thebulletin.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with latex when checking a package

2015-01-15 Thread Leandro Roser
Hello everyone. I'm checking a package on Windows 8.1, and when the
program starts to create the PDF manual, it exits with status 1, printing
the following message:


# End(Not run)
Sorry, but I'm not programmed to handle this case;
I'll just pretend that you didn't ask for it.
! You can't use `macro parameter character #' in vertical mode.
l.164 ##


I think  the program is having problems with the  \dontrun{} sections
when creating the PDF. Does anyone know how to solve this problem?

Thanks in advance,



-- 
Lic. Leandro Gabriel Roser
 Laboratorio de Genética
 Dto. de Ecología, Genética y Evolución,
 F.C.E.N., U.B.A.,
 Ciudad Universitaria, PB II, 4to piso,
 Nuñez, Cdad. Autónoma de Buenos Aires,
 Argentina.
 tel ++54 +11 4576-3300 (ext 219)
 fax ++54 +11 4576-3384

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem with latex when checking a package

2015-01-15 Thread Richard M. Heiberger
You didn't show your code, so this is a guess.

My guess is that the \dontrun{} is outside the \examples{} section.
It must be inside the \examples{} section.

This is right

\examples{
abc <- 123
\dontrun{
def <- 456
}
ghi <- 789
}

This is my guess as to what you did.
\examples{
abc <- 123
ghi <- 789
}
\dontrun{
def <- 456
}

Rich

On Thu, Jan 15, 2015 at 10:31 PM, Leandro Roser learo...@gmail.com wrote:
 Hello everyone. I'm checking a package in Windows 8.1,  and when the
 program starts to create the PDF manual, exits with status 1, printing
 the following message:


 # End(Not run)
 Sorry, but I'm not programmed to handle this case;
 I'll just pretend that you didn't ask for it.
 ! You can't use `macro parameter character #' in vertical mode.
 l.164 ##


 I think  the program is having problems with the  \dontrun{} sections
 when creating the PDF. Does anyone know how to solve this problem?

 Thanks in advance,



 --
 Lic. Leandro Gabriel Roser
  Laboratorio de Genética
  Dto. de Ecología, Genética y Evolución,
  F.C.E.N., U.B.A.,
  Ciudad Universitaria, PB II, 4to piso,
  Nuñez, Cdad. Autónoma de Buenos Aires,
  Argentina.
  tel ++54 +11 4576-3300 (ext 219)
  fax ++54 +11 4576-3384

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem with latex when checking a package

2015-01-15 Thread Leandro Roser
Hi, Richard.  I was putting a piece of code in the wrong section
(description). After correcting that, I have no more errors.

Many thanks for your help,

Leandro.

2015-01-16 1:59 GMT-03:00 Richard M. Heiberger r...@temple.edu:
 You didn't show your code, so this is a guess.

 My guess is that the \dontrun{} is outside the \examples{} section.
 It must be inside the \examples{} section.

 This is right

 \examples{
 abc <- 123
 \dontrun{
 def <- 456
 }
 ghi <- 789
 }

 This is my guess as to what you did.
 \examples{
 abc <- 123
 ghi <- 789
 }
 \dontrun{
 def <- 456
 }

 Rich

 On Thu, Jan 15, 2015 at 10:31 PM, Leandro Roser learo...@gmail.com wrote:
 Hello everyone. I'm checking a package in Windows 8.1,  and when the
 program starts to create the PDF manual, exits with status 1, printing
 the following message:


 # End(Not run)
 Sorry, but I'm not programmed to handle this case;
 I'll just pretend that you didn't ask for it.
 ! You can't use `macro parameter character #' in vertical mode.
 l.164 ##


 I think  the program is having problems with the  \dontrun{} sections
 when creating the PDF. Does anyone know how to solve this problem?

 Thanks in advance,



 --
 Lic. Leandro Gabriel Roser
  Laboratorio de Genética
  Dto. de Ecología, Genética y Evolución,
  F.C.E.N., U.B.A.,
  Ciudad Universitaria, PB II, 4to piso,
  Nuñez, Cdad. Autónoma de Buenos Aires,
  Argentina.
  tel ++54 +11 4576-3300 (ext 219)
  fax ++54 +11 4576-3384

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Lic. Leandro Gabriel Roser
 Laboratorio de Genética
 Dto. de Ecología, Genética y Evolución,
 F.C.E.N., U.B.A.,
 Ciudad Universitaria, PB II, 4to piso,
 Nuñez, Cdad. Autónoma de Buenos Aires,
 Argentina.
 tel ++54 +11 4576-3300 (ext 219)
 fax ++54 +11 4576-3384

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help installing packages with dependencies for R, behind

2015-01-15 Thread jose.nunez-zuleta
Hello all,

Sorry about the first post; I forgot to mention that I am using R on Linux. As 
Brian suggested, I looked closely into the R help and tried the verbose option 
(install.packages("ggplot2", verbose = TRUE)), but it wasn't very helpful. 
But after checking 'help(download.file)' I found that my format of the variable 
http_proxy was wrong and needed to be like this (protocol and port):

http_proxy=http://myproxy.com:80
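
For reference, the same thing can also be set from inside R for the current
session (the proxy host and port below are placeholders):

Sys.setenv(http_proxy = "http://myproxy.com:80")
install.packages("ggplot2")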

Downloads now work fine. I hope this saves some time for the next person dealing 
with this issue.

Thanks for the help!

--Jose

--

Message: 26
Date: Wed, 14 Jan 2015 16:37:17 -0500
From: jose.nunez-zul...@barclays.com
To: r-help@r-project.org
Subject: [R] Help installing packages with dependencies for R, behind
corporate firewall
Message-ID:

34922d8098cb7048a99568d009af262d0916cd1...@nykpcmmgmb05.intranet.barcapint.com

Content-Type: text/plain; charset=us-ascii

Hello R-users,

I have no practical experience with the R language itself but I've been tasked 
to install it behind a corporate firewall. Basic installation seems sane but 
when my user tries to install a custom library like this:

install.packages("ggplot2")
Installing package into '/home/myuser/rlibs'
(as 'lib' is unspecified)
Warning: unable to access index for repository 
http://cran.us.r-project.org/src/contrib
Warning message:
package 'ggplot2' is not available (for R version 3.1.2)

I see no progress and eventually nothing gets downloaded into my custom 
directory. My question is: is there a way to add verbosity to R to see if the 
network proxy settings are working correctly (I can get files using 'wget' 
without problems under the same account)?

More details about my installation below:

1) I'm behind a firewall with http proxy access. I have no root and made a 
local installation
2) Contents of my ~/.Renviron:

R_LIBS=/home/myuser/rlibs

3) Contents of ~/.Rprofile:

r <- getOption("repos") # hard code the US repo for CRAN
r["CRAN"] <- "http://cran.us.r-project.org"
options(repos = r)
rm(r)

4) The http_proxy environment variable is set (like export 
http_proxy=proxy..com. I can see it if I do 'Sys.getenv("http_proxy")' 
from inside the R prompt)

NOTE: I managed to install libraries 'by hand', but for modules that have 
dependencies this doesn't work: 
cd /home/$USER/rlibs/; wget 
http://cran.us.r-project.org/src/contrib/timeDate_3011.99.tar.gz; 
/mylocal/R-3.1.2/bin/R CMD INSTALL -l /localrdir timeDate_3011.99.tar.gz

I apologize if this is not the correct list (went through all of them, WIKI and 
other groups looking for an answer to this issue without much luck).

Thanks,

--Jose



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Latest version of Rtools is incompatible with latest version of R !!

2015-01-15 Thread Henrik Bengtsson
On Thu, Jan 15, 2015 at 9:27 AM, PRAMEET LAHIRI
prameet_lah...@yahoo.co.in wrote:
  I have installed R version 3.1.2. I tried to install RTools32.exe, which is
 the latest version (using this link -
 http://cran.r-project.org/bin/windows/Rtools/). However, on using the
 function find_rtools() an error message was displayed which said:
 "Rtools is required to build R packages, but no version of Rtools compatible
 with R 3.1.2 was found. (Only the following incompatible version(s) of Rtools
 were found: 3.2) Please download and install Rtools 3.1 from
 http://cran.r-project.org/bin/windows/Rtools/ and then run find_rtools()."

 I want to know why the latest version of R is not supporting the latest 
 Rtools! Any suggestions?

find_rtools() is a function of the 'devtools' package.  Maybe it's an
issue with that package and not R, and I'm pretty sure Duncan Murdoch
put great effort into ensuring that Rtools works well with R (that's
been my experience for the last 5-10 years).

I also know that devtools 1.7.0 has been submitted to CRAN, so
if/when that becomes available your problem might be solved.  You can
of course also grab it earlier from the devtools GitHub page.

My $.02

Henrik


 Thanks for your time,Prameet
 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to speed up the simulation when processing time and date

2015-01-15 Thread MacQueen, Don
I don't have time to look at your example in detail, but there are couple
of things that caught my eye.

Use as.POSIXct() instead of as.POSIXlt()
I don't see anything that requires POSIXlt, and POSIXct is simpler.

If everything in
  Total_Zone1
is numeric, then leave it as a matrix, do not convert to data frame.

If you use as.POSIXct() then the times are actually the number of seconds
since an origin, and thus can be treated as numeric, making it possible to
leave Total_Zone1 as a matrix.

If it is a matrix, you can refer to the times using
  Total_Zone1[,'time'] instead of Total_Zone1$time

Either of these might help speed things up, though I can't be sure without
trying it.
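
A small sketch of what the two suggestions look like together (names follow
the original post; illustrative only):

st   <- as.POSIXct("2012-01-01")                    # POSIXct rather than POSIXlt
time <- seq(from = st, by = 60, length.out = 8000)

Total_Zone1 <- cbind(airtemp  = rnorm(8000),
                     humidity = rnorm(8000),
                     Power    = rnorm(8000),
                     time     = as.numeric(time))   # seconds since the origin

## numeric comparisons work directly on the matrix column
idx <- which(Total_Zone1[, "time"] > as.numeric(st) + 5*60)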

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 1/15/15, 2:38 AM, Faranak Golestaneh faranak.golesta...@gmail.com
wrote:

Dear Friends,


I am trying to program a forecasting method in R. The predictors are
weather variables in addition to lag measured Power values. The accuracy
of
data is one minute and their corresponding time and date are available.

To add lag values of power to the predictors list, I am aiming to consider
last ten minutes values. If I was sure that the database is perfect and
the
values for all minutes throughout the year are available I could simply
shift the Power columns but as it may not be always the case, I have used
the following codes for each time t to check if all its corresponding ten
minutes lag values are available and extract them and store in a matrix.
The problem is that, the process is highly time consuming and it takes a
long time to be simulated. Here I've given a reproducible example. I was
wondering any of you can suggest a better approach. Thank you.



rm(list = ls())
cat("\014")

st = "2012/01/01"
et = "2012/02/27"

st <- as.POSIXlt(as.Date(st))
et <- as.POSIXlt(as.Date(et))

time = seq(from=st, to=et, by=60)
time = as.POSIXlt(time)

# Window is the number of lag values
# leadTime is look-ahead time (forecast horizon)
leadTime = 10;
Window = 15;

time = time[1:8000]

Total_Zone1 = abind(matrix(rnorm(4000*2),4000*2,1),
                    matrix(rnorm(4000*2),4000*2,1),
                    matrix(rnorm(4000*2),4000*2,1), time[1:8000])

N_Train = nrow(Total_Zone1);

lag_Power = matrix(0, N_Train, Window)

colnames(Total_Zone1) <- c("airtemp", "humidity", "Power", "time")

Total_Zone1 <- as.data.frame(Total_Zone1)

for (tt in 4000:N_Train){

  Statlag = Total_Zone1$time[tt] - (leadTime+Window)*60
  EndLag  = Total_Zone1$time[tt] - (leadTime)*60

  Index_lags = which((Total_Zone1$time > Statlag) & (Total_Zone1$time <= EndLag))

  if (size(Index_lags)[2] < Window) {

    Statlag2 = Total_Zone1$time[tt] - 24*60*60
    Index_lags2 = which(Total_Zone1$time == Statlag2)

    tem1 = rep(Total_Zone1[Index_lags2, c("Power")], Window - size(Index_lags)[2])
    lag_Power[tt,] = t(c(Total_Zone1[Index_lags, c("Power")], tem1))

  } else {

    lag_Power[tt,] = t(Total_Zone1[Index_lags, c("Power")])

  }

}

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Interactive Decion Trees in R

2015-01-15 Thread balakrishna
How can we build interactive decision trees in R? I want a variable which is
of business importance to be injected into the tree (rpart specifically).

The R tree is not picking up the variable. We have this functionality in SAS
and Statistica, where we can force a variable to be injected in a branch and
the further splitting of the tree is then done by the machine.


Thanks
Krishna

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two-sample KS test: data becomes significantly different after normalization

2015-01-15 Thread Monnand
Thank you, Chris and Martin!

On Wed Jan 14 2015 at 7:31:12 AM Andrews, Chris chri...@med.umich.edu
wrote:

 Your definition of p-value is not correct.  See, for example,
 http://en.wikipedia.org/wiki/P-value#Misunderstandings

 -Original Message-
 From: Monnand [mailto:monn...@gmail.com]
 Sent: Wednesday, January 14, 2015 2:17 AM
 To: Andrews, Chris
 Cc: r-help@r-project.org
 Subject: Re: [R] two-sample KS test: data becomes significantly different
 after normalization

 I know this must be a wrong method, but I cannot help asking: can I only
 use the p-value from the KS test, saying that if the p-value is greater than
 \beta, then the two samples are from the same distribution? If the definition
 of the p-value were the probability that the null hypothesis is true, then why
 do so few people use the p-value as a true probability? E.g., normally, people
 will not multiply or add p-values to get the probability that two independent
 null hypotheses are both true, or that one of them is true. I have had this
 question for a very long time.

 -Monnand

 On Tue Jan 13 2015 at 2:47:30 PM Andrews, Chris chri...@med.umich.edu
 wrote:

  This sounds more like quality control than hypothesis testing.  Rather
  than statistical significance, you want to determine what is an
 acceptable
  difference (an 'equivalence margin', if you will).  And that is a
 question
  about the application, not a statistical one.
  
  From: Monnand [monn...@gmail.com]
  Sent: Monday, January 12, 2015 10:14 PM
  To: Andrews, Chris
  Cc: r-help@r-project.org
  Subject: Re: [R] two-sample KS test: data becomes significantly different
  after normalization
 
  Thank you, Chris!
 
  I think it is exactly the problem you mentioned. I did consider
  1000-point data is a large one at first.
 
  I down-sampled the data from 1000 points to 100 points and ran KS test
  again. It worked as expected. Is there any typical method to compare
  two large samples? I also tried KL diverge, but it only gives me some
  number but does not tell me how large the distance is should be
  considered as significantly different.
 
  Regards,
  -Monnand
 
  On Mon, Jan 12, 2015 at 9:32 AM, Andrews, Chris chri...@med.umich.edu
  wrote:
  
   The main issue is that the original distributions are the same, you
  shift the two samples *by different amounts* (about 0.01 SD), and you
 have
  a large (n=1000) sample size.  Thus the new distributions are not the
 same.
  
   This is a problem with testing for equality of distributions.  With
  large samples, even a small deviation is significant.
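
   To see this concretely, a small illustration (made-up data; exact p-values
   vary from run to run): the same modest shift is highly significant with
   n = 1000 per sample but often not with n = 100.

   set.seed(1)
   x <- rnorm(1000)
   y <- rnorm(1000) + 0.3
   ks.test(x, y)$p.value                # very small
   ks.test(x[1:100], y[1:100])$p.value  # frequently above 0.05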
  
   Chris
  
   -Original Message-
   From: Monnand [mailto:monn...@gmail.com]
   Sent: Sunday, January 11, 2015 10:13 PM
   To: r-help@r-project.org
   Subject: [R] two-sample KS test: data becomes significantly different
  after normalization
  
   Hi all,
  
   This question is sort of related to R (I'm not sure if I used an R
  function
   correctly), but also related to stats in general. I'm sorry if this is
   considered as off-topic.
  
   I'm currently working on a data set with two sets of samples. The csv
  file
   of the data could be found here: http://pastebin.com/200v10py
  
   I would like to use KS test to see if these two sets of samples are
 from
   different distributions.
  
   I ran the following R script:
  
   # read data from the file
   data = read.csv('data.csv')
   ks.test(data[[1]], data[[2]])
   Two-sample Kolmogorov-Smirnov test
  
   data:  data[[1]] and data[[2]]
   D = 0.025, p-value = 0.9132
   alternative hypothesis: two-sided
   The KS test shows that these two samples are very similar. (In fact,
 they
   should come from same distribution.)
  
   However, due to some reasons, instead of the raw values, the actual
 data
   that I will get will be normalized (zero mean, unit variance). So I
 tried
   to normalize the raw data I have and run the KS test again:
  
   ks.test(scale(data[[1]]), scale(data[[2]]))
   Two-sample Kolmogorov-Smirnov test
  
   data:  scale(data[[1]]) and scale(data[[2]])
    D = 0.3273, p-value < 2.2e-16
   alternative hypothesis: two-sided
   The p-value becomes almost zero after normalization indicating these
 two
   samples are significantly different (from different distributions).
  
   My question is: How the normalization could make two similar samples
   becomes different from each other? I can see that if two samples are
   different, then normalization could make them similar. However, if two
  sets
   of data are similar, then intuitively, applying same operation onto
 them
   should make them still similar, at least not different from each other
  too
   much.
  
   I did some further analysis about the data. I also tried to normalize
 the
   data into [0,1] range (using the formula (x-min(x))/(max(x)-min(x))),
 but
   same thing happened. At first, I thought it might be outliers caused
 this
   problem (I can see that an outlier may cause this problem if I
 normalize
   the data 

Re: [R] sparse matrix from vector outer product

2015-01-15 Thread Martin Maechler
 Philipp A flying-sh...@web.de
 on Wed, 14 Jan 2015 14:02:40 + writes:

 Hi,
 creating a matrix from two vectors a, b by multiplying each combination 
can
 be done e.g. via

 a %*% t(b)

 or via

 outer(a, b)  # default for third argument is '*'

really the best (most efficient) way would be

   tcrossprod(a, b)

 But this yields a normal matrix.
of course.

Please always use small self-contained example code,
here, e.g.,

a <- numeric(17); a[3*(1:5)] <- 10*(5:1)
b <- numeric(12); b[c(2,3,7,11)] <- 1:3


 Is there an efficient way to create sparse matrices (from the Matrix
 package) like that?

 Right now i’m doing

 a.sparse = as(a, 'sparseVector')
 b.sparse = as(t(b), 'sparseMatrix')
 a.sparse %*% b.sparse

 but this strikes me as wasteful.

not really wasteful I think. But there is a nicer and more efficient way :

require(Matrix)
tcrossprod(as(a, "sparseVector"),
           as(b, "sparseVector"))

now also gives

 17 x 12 sparse Matrix of class "dgCMatrix"

  [1,] .  .   . . . .   . . . .  . .
  [2,] .  .   . . . .   . . . .  . .
  [3,] . 50 100 . . . 150 . . . 50 .
  [4,] .  .   . . . .   . . . .  . .
  [5,] .  .   . . . .   . . . .  . .
  [6,] . 40  80 . . . 120 . . . 40 .
  [7,] .  .   . . . .   . . . .  . .
  [8,] .  .   . . . .   . . . .  . .
  [9,] . 30  60 . . .  90 . . . 30 .
 [10,] .  .   . . . .   . . . .  . .
 [11,] .  .   . . . .   . . . .  . .
 [12,] . 20  40 . . .  60 . . . 20 .
 [13,] .  .   . . . .   . . . .  . .
 [14,] .  .   . . . .   . . . .  . .
 [15,] . 10  20 . . .  30 . . . 10 .
 [16,] .  .   . . . .   . . . .  . .
 [17,] .  .   . . . .   . . . .  . .


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] using poly() to predict

2015-01-15 Thread Stanislav Aggerwal
Thanks Prof Ripley.
For anybody else wondering about this, see:
http://stackoverflow.com/questions/26728289/extracting-orthogonal-polynomial-coefficients-from-rs-poly-function

=

The polynomials are defined recursively using the alpha and norm2
coefficients of the poly object you've created. Let's look at an example:

z <- poly(1:10, 3)
attributes(z)$coefs
# $alpha
# [1] 5.5 5.5 5.5
# $norm2
# [1]    1.0   10.0   82.5  528.0 3088.8

For notation, let's call a_d the element in index d of alpha and let's call
n_d the element in index d of norm2. F_d(x) will be the orthogonal
polynomial of degree d that is generated. For some base cases we have:

F_0(x) = 1 / sqrt(n_2)
F_1(x) = (x-a_1) / sqrt(n_3)

The rest of the polynomials are recursively defined:

F_d(x) = [(x-a_d) * sqrt(n_{d+1}) * F_{d-1}(x) - n_{d+1} / sqrt(n_d) *
F_{d-2}(x)] / sqrt(n_{d+2})

To confirm with x=2.1:

x <- 2.1
predict(z, newdata=x)
#               1         2         3
# [1,] -0.3743277 0.1440493 0.1890351
# ...

a <- attributes(z)$coefs$alpha
n <- attributes(z)$coefs$norm2
f0 <- 1 / sqrt(n[2])
(f1 <- (x-a[1]) / sqrt(n[3]))
# [1] -0.3743277
(f2 <- ((x-a[2]) * sqrt(n[3]) * f1 - n[3] / sqrt(n[2]) * f0) / sqrt(n[4]))
# [1] 0.1440493
(f3 <- ((x-a[3]) * sqrt(n[4]) * f2 - n[4] / sqrt(n[3]) * f1) / sqrt(n[5]))
# [1] 0.1890351

The most compact way to export your polynomials to your C++ code would
probably be to export attributes(z)$coefs$alpha and
attributes(z)$coefs$norm2 and then use the recursive formula in C++ to
evaluate your polynomials.


On Wed, Jan 14, 2015 at 2:38 PM, Prof Brian Ripley rip...@stats.ox.ac.uk
wrote:

 On 14/01/2015 14:20, Stanislav Aggerwal wrote:

 This method of finding yhat as x %*% b works when I use raw polynomials:

 x <- 1:8
 y <- 1 + 1*x + .5*x^2
 fit <- lm(y~poly(x,2,raw=T))
 b <- coef(fit)
 xfit <- seq(min(x), max(x), length=20)
 yfit <- b[1] + poly(xfit,2,raw=T) %*% b[-1]
 plot(x,y)
 lines(xfit,yfit)

 But it doesn't work when I use orthogonal polynomials:

 fit <- lm(y~poly(x,2))
 b <- coef(fit)
 yfit <- b[1] + poly(xfit,2) %*% b[-1]
 plot(x,y)
 lines(xfit,yfit,col='red')

 I have a feeling that the second version needs to incorporate poly() coefs
 (alpha and norm2) somehow. If so, please tell me how.

 I do know how to use predict() for this. I just want to understand how
 poly() works.


 What matters is how lm() and predict() use poly(): see ?makepredictcall
 and its code.

 str(fit) might also help.


 Thanks very much for any help
 Stan

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/
 posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

  Please do, and do not send HTML.

 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RStudio connection to server: Safari cannot open page because it could not connect to the server

2015-01-15 Thread Ista Zahn
Hi John,

This isn't a question about R, and so is off-topic here. Try a web search
for home server port forwarding or similar.

Best,
Ista
On Jan 15, 2015 7:30 AM, John Sorkin jsor...@grecc.umaryland.edu wrote:

 I set up Rstudio, and can access it from within my lan using
 http:/192.168.108:8787.
 I looked up my external IP address using one of the websites that returns
 an ip addresses and tried to connect from outside my LAN using
 http://73.213.144.65:8787
 and received a message:
 Safari cannot open the page because the page because it could not connect
 to the server.
 I can ping the 73.213.144.65

 My LAN connects to the WWW through a wireless router which connects to a
 cable modem.

 Can anyone help me connect?
 Thank you,
 John


 John David Sorkin M.D., Ph.D.
 Professor of Medicine
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology and
 Geriatric Medicine
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

 Confidentiality Statement:
 This email message, including any attachments, is for ...{{dropped:16}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RStudio connection to server: Safari cannot open page because it could not connect to the server

2015-01-15 Thread John Sorkin
I set up Rstudio, and can access it from within my lan using 
http:/192.168.108:8787.
I looked up my external IP address using one of the websites that returns an ip 
addresses and tried to connect from outside my LAN using
http://73.213.144.65:8787
and received a message:
Safari cannot open the page because the page because it could not connect to 
the server. 
I can ping the 73.213.144.65

My LAN connects to the WWW through a wireless router which connects to a cable 
modem.

Can anyone help me connect?
Thank you,
John


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

Confidentiality Statement:
This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not 
the intended recipient, please contact the sender by reply email and destroy 
all copies of the original message. 
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Probit regression with misclassified binary dependent variable

2015-01-15 Thread Jack
Is there a generalized linear mixed-effects model implementation for R that
can handle a misclassified binary dependent variable? Unless I have overlooked
something in the documentation, glmer with family = binomial(link = "probit")
does not handle misclassification. The closest thing I have found is
misclass(), which implements Hausman et al. (1998), in an old version (1.1.1)
of the McSpatial package, but that version is no longer available and the
function does not handle random effects.

Hausman, J. A., Abrevaya, J.,  Scott-Morton, F. M. (1998).
Misclassification of the dependent variable in a discrete-response setting.
Journal of Econometrics, 87(2), 239-269.
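
For what it is worth, the fixed-effects part of the Hausman et al. (1998)
model is not hard to code directly: with a0 = Pr(observe 1 | true 0) and
a1 = Pr(observe 0 | true 1), the observed-response probability is
a0 + (1 - a0 - a1) * pnorm(X %*% beta). A rough sketch (no random effects;
the variable names are placeholders, not any package's API):

misclass_probit_nll <- function(par, y, X) {
  a0   <- plogis(par[1])              # logit scale keeps the rates in (0, 1)
  a1   <- plogis(par[2])
  beta <- par[-(1:2)]
  p    <- a0 + (1 - a0 - a1) * pnorm(drop(X %*% beta))
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# hypothetical usage, with X a model matrix and y a 0/1 response:
# start <- c(qlogis(0.05), qlogis(0.05), rep(0, ncol(X)))
# fit   <- optim(start, misclass_probit_nll, y = y, X = X,
#                method = "BFGS", hessian = TRUE)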


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to speed up the simulation when processing time and date

2015-01-15 Thread Faranak Golestaneh
Dear Friends,


I am trying to program a forecasting method in R. The predictors are
weather variables together with lagged measured Power values. The data have
one-minute resolution, and the corresponding time and date are available.

To add lagged Power values to the predictor list, I want to use the values
from the last ten minutes. If I could be sure the database were complete,
with values for every minute of the year, I could simply shift the Power
column; since that may not always be the case, I use the code below, which,
for each time t, checks whether all of its corresponding lag values are
available, extracts them, and stores them in a matrix. The problem is that
this process is very time consuming. A reproducible example is given below.
Can anyone suggest a better approach? Thank you.



rm(list = ls())
cat("\014")

library(abind)     # for abind()
library(pracma)    # for size()

st <- "2012/01/01"
et <- "2012/02/27"

st <- as.POSIXlt(as.Date(st))
et <- as.POSIXlt(as.Date(et))
time <- seq(from = st, to = et, by = 60)
time <- as.POSIXlt(time)

# Window is the number of lag values
# leadTime is the look-ahead time (forecast horizon)
leadTime <- 10
Window <- 15

time <- time[1:8000]   # keep only the first 8000 time stamps

# toy data: two weather columns, a Power column, and the time stamps
# (time stored as numeric seconds so it binds cleanly into the matrix)
Total_Zone1 <- abind(matrix(rnorm(4000 * 2), 4000 * 2, 1),
                     matrix(rnorm(4000 * 2), 4000 * 2, 1),
                     matrix(rnorm(4000 * 2), 4000 * 2, 1),
                     as.numeric(time[1:8000]))

N_Train <- nrow(Total_Zone1)
lag_Power <- matrix(0, N_Train, Window)

colnames(Total_Zone1) <- c("airtemp", "humidity", "Power", "time")
Total_Zone1 <- as.data.frame(Total_Zone1)

for (tt in 4000:N_Train) {

  Statlag <- Total_Zone1$time[tt] - (leadTime + Window) * 60
  EndLag  <- Total_Zone1$time[tt] - leadTime * 60

  # rows whose time stamp falls inside the lag window
  Index_lags <- which((Total_Zone1$time > Statlag) & (Total_Zone1$time <= EndLag))

  if (size(Index_lags)[2] < Window) {
    # not enough lag values: pad with the value observed 24 hours earlier
    Statlag2 <- Total_Zone1$time[tt] - 24 * 60 * 60
    Index_lags2 <- which(Total_Zone1$time == Statlag2)
    tem1 <- rep(Total_Zone1[Index_lags2, "Power"], Window - size(Index_lags)[2])
    lag_Power[tt, ] <- t(c(Total_Zone1[Index_lags, "Power"], tem1))
  } else {
    lag_Power[tt, ] <- t(Total_Zone1[Index_lags, "Power"])
  }
}
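
One possible speed-up, as a rough sketch rather than a drop-in replacement
(it assumes the data sit on a one-minute grid and that the time column is
numeric seconds, as in the toy data above): look all the lag timestamps up
at once with match() instead of calling which() inside the loop.

tnum    <- as.numeric(Total_Zone1$time)
lags    <- leadTime:(leadTime + Window - 1)          # lags in minutes
lag_idx <- sapply(lags, function(k) match(tnum - k * 60, tnum))
lag_Power2 <- matrix(Total_Zone1$Power[lag_idx], nrow = length(tnum))
# column j holds the Power value lags[j] minutes before each row's time;
# NA marks a missing minute and still needs the day-back fallback above.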


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] HOW TO DOWNLOAD INTRADAY DATA AT ONE TIME

2015-01-15 Thread Amatoallah Ouchen
Is there any way, via R, to download all available intraday data for stocks
at once (for example, all the data available on the Indian stock exchange)?
I need to make a comparative analysis, and downloading the data ticker by
ticker is too time consuming. I would also like to know whether there is any
website that stores historical intraday data; other sites delete the data
gradually.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] passing elements of named vector / named list to function

2015-01-15 Thread Rainer M Krug

Hi

Following scenario: I have a function fun

--8---cut here---start-8---
fun <- function(A, B, C, ...){paste(A, B, C, ...)}
--8---cut here---end---8---

and x defined as follow

--8---cut here---start-8---
x <- 1:5
names(x) <- LETTERS[x]
--8---cut here---end---8---

now I want to pass the *elements* of x to fun as named arguments, i.e.

,
| > fun(A=1, B=2, C=3, D=4, E=5)
| [1] "1 2 3 4 5"
`

The below examples obviously do not work:

,
| > fun(x)
| Error in paste(A, B, C, list(...)) :
|   argument B is missing, with no default
| > fun(unlist(x))
| Error in paste(A, B, C, list(...)) :
|   argument B is missing, with no default
`

How can I extract from x the elements and pass them on to fun()?

I could easily change x to a list() if this would be easier.

--8---cut here---start-8---
x <- list(A=1, B=2, C=3, D=4, E=5)
--8---cut here---end---8---

In my actual program, x can have different elements, and so can fun --
this is decided programmatically.

Any suggestions how I can achieve this?
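
One standard way to do this is do.call(), which calls a function with the
elements of a (named) list as its arguments; a minimal sketch with fun and
x as defined above:

do.call(fun, as.list(x))        # named vector: coerce to a list first
## [1] "1 2 3 4 5"

xl <- list(A = 1, B = 2, C = 3, D = 4, E = 5)
do.call(fun, xl)                # a named list can be passed as-is
## [1] "1 2 3 4 5"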

Thanks,

Rainer

-- 
Rainer M. Krug
email: Raineratkrugsdotde
PGP: 0x0F52F982


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R 3.1.2 mle2() function on Windows 7 Error and multiple solutions

2015-01-15 Thread Ben Bolker

  For what it's worth, you can use either nlminb (directly) or optimx
within the mle2 wrapper by specifying the 'optimizer' parameter ...
this gives you flexibility in optimization along with the convenience
of mle2 (likelihood ratio tests via anova(), likelihood profiling, etc.)
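
A minimal sketch of that (the start values and bounds are placeholders, and
it assumes y and the Nweibull() density from earlier in this thread are
already defined):

library(bbmle)

nll <- function(lambda, alpha, beta) {   # named parameters, as mle2 expects
  -sum(log(Nweibull(y, lambda, alpha, beta)))
}

fit <- mle2(nll,
            start = list(lambda = 0.01, alpha = 325, beta = 0.8),
            optimizer = "nlminb",
            lower = c(lambda = 1e-4, alpha = 1e-4, beta = 1e-4))

summary(fit)     # profiling, confint(), anova() etc. still work as usual
# optimizer = "optimx" is used the same way and accepts optimx's 'method'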

On 15-01-15 09:26 AM, Ravi Varadhan wrote:
 Hi,
 
 
 
 I tried your problem with optimx package.  I found a better
 solution than that found by mle2.
 
 
 
 ?library(optimx)
 
 
 
 # the objective function needs to be re-written
 
 LL2 <- function(par,y) {
 
 lambda <- par[1]
 alpha <- par[2]
 beta <- par[3]
 R = Nweibull(y,lambda,alpha,beta)
 
 -sum(log(R))
 }
 
 
 
 optimx(fn=LL2, par=c(.01,325,.8), y=y,
        lower=c(.1,.1,.1), upper=c(Inf, Inf, Inf),
        control=list(all.methods=TRUE))
 
 
 
 # Look at the solution found by `nlminb' and `nmkb'. This is the 
 optimal one.  This log-likelihood is larger than that of mle2 and 
 other optimizers in optimx.
 
 
 
 If this solution is not what you are looking for, your problem may
 be poorly scaled.  First, make sure that the likelihood is coded 
 correctly.  If it is correct, then you may need to improve the 
 scaling of the problem.
 
 
 
 
 
 Hope this is helpful,
 
 Ravi
 
 
 


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.