[R] matrix manipulation
This is as much a mathematics as an R question, in the "this should be easy but I don't see it" category. Assume I have a full-rank p by p matrix V (aside: V = (X'X)^{-1} for a particular setup), a p by k matrix B, and I want to complete an orthogonal basis for the space with distance function V. That is, find A such that t(B) %*% V %*% A = 0, where A has p rows and p-k columns. With V = identity this is easy: I can do it in 1-2 lines using qr(), lm(), or several other tools. A part of me is quite certain that the general problem isn't more than 3 lines of R, but after a day of beating my head on the issue I still don't see it. Math-wise it looks like a simple homework problem in a mid-level class, but I'm not currently sure that I'd pass said class. If someone could show the way I would be grateful. Either that or assurance that the problem actually IS hard and I'm not as dense as I think.

Terry T.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (ordinal) logistic regression
Dear list, I've been looking at previous posts on the list, but I haven't found any close enough to my question/problem. My data can be seen as a matrix of multiple individuals (columns) with (rather independent) measures (lines). Based on supplemental information, the individuals are organized in (multiple) ordered classes. I would like to test for which type of measure (i.e. which line from my matrix of data) the groups are distinct (e.g. different by group mean). In other words, I would like to see in which line of my input matrix the measures for the groups of individuals associate with distinct group values. So I tried glm (or glm2) on each line of the matrix, but in my toy example (shown below) I'm surprised to get warnings about non-convergence in the nice toy cases (i.e. distinct groups, as I am looking for), even with glm2! I see that in such nice cases with glm() the Pr(>|z|) is close to 1, which at first sight is OK (since H0: coefficient = 0), but I suppose the test is not really set up right this way. When trying lrm (rms package) I even get an error message (Unable to fit model using "lrm.fit") with the nice cases. In my real data with 4000 lines of data (i.e. 4000 glm tests) multiple-testing correction would push everything from 1-p up to p = 1, so that's another problem with this approach. I suppose somehow I should transform my data (I don't see where I would change the H0?) to obtain low and NOT high p-values (example below) in the cases I'm looking for, i.e. when group means are distinct. Any suggestions? Thanks in advance, Wolfgang

Here my toy example:

datB1 <- c(12,14:16,18:21,20:22,20,22:24,19.5) # fits partially/overlapping to 3-grp model
datB2 <- c(11:15,21:25,31:36)       # too beautiful to be real ...
datB3 <- c(15:12,11:15,12:14,15:12) # no fit to 3-grp model
datB4 <- c(11:15,15:11,21:26)       # grpA == grpB but grpA != grpC
datB <- rbind(datB1,datB2,datB3,datB4)
set.seed(2015)
datB <- datB + round(runif(length(datB),-0.3,0.3),1) # add some noise
datB <- datB - rowMeans(datB) # centering
## here the definition of the groups
grpB <- gl(3,6,labels=LETTERS[1:3])[c(-6,-7)]
table(grpB)
## display
layout(1:4)
for(i in 1:4) plot(datB[i,],as.numeric(grpB))
## now the 'test'
glmLi <- function(dat,grp) { ## run glm : predict grp based on dat
  dat <- data.frame(dat=dat,grp=grp)
  glm(grp ~ dat, data=dat, family=binomial)}
logitB <- apply(datB,1,glmLi,grpB)
lapply(logitB,summary)
sapply(logitB,function(x) summary(x)$coefficients[,4])
# lines 1 and 2 are designed to be 'positive' but give high p-values with convergence problems

## for completeness
sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] glm2_1.1.2 MASS_7.3-40 TinnRcom_1.0.18 formatR_1.2 svSocket_0.9-57

loaded via a namespace (and not attached):
[1] tools_3.2.0 svMisc_0.9-70 tcltk_3.2.0
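A minimal, self-contained illustration (toy data, not Wolfgang's) of the convergence warning glm() raises when the groups are perfectly separated, which is what the "nice" cases above construct:

```r
# Complete separation: the predictor values for the two groups do not overlap.
x <- c(1:5, 21:25)
g <- factor(rep(c("A", "B"), each = 5))
m <- suppressWarnings(glm(g ~ x, family = binomial))
# glm() normally warns that fitted probabilities numerically 0 or 1 occurred;
# the slope estimate diverges, its Wald SE explodes, and the Wald p-value
# is pushed toward 1 even though the groups are as distinct as possible.
p <- summary(m)$coefficients["x", 4]
```
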
Re: [R] Speeding up code?
Here is one improvement. Avoid dataframes in some of these cases. This creates a character matrix and then converts to a dataframe after doing the transpose of the matrix. It takes less than 10 seconds on my system:

library(stringr)
# create character matrix; avoid dataframes in this case
print(proc.time())
   user  system elapsed
  15.52    5.24  587.70
xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to dataframe and do transpose on matrix and not dataframe
separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
print(proc.time())
   user  system elapsed
  20.90    5.36  596.57

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez ignaci...@gmail.com wrote:

Hi Collin, The objective of the gen.names function is to generate N *unique* random names, where N is a *large* number. On my computer `gen.names(n = 5)` takes under a second, so it is probably not the root problem in my code. That said, I would love to improve it. I'm not exactly sure how you propose to change it using sample. What is the object that I would be sampling? I would love to run a little benchmark to compare my version with yours. What really takes a long time to run is:

separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))

So that and the chunk of code before it is probably where I would get big gains in speed. Sadly, I have no clue how to do it differently. Thanks a lot for the help!! Ignacio

On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch cfly...@ncsu.edu wrote:

Hi Ignacio, If I am reading your code correctly then the top while loop is essentially seeking to select a random set of names from the original set, then using unique to reduce it down; you then iterate until you have built your quota. Ultimately this results in a very inefficient attempt at sampling without replacement. Why not just sample without replacement rather than loop iteratively and use unique? Or if the set of possible names is short enough, why not just randomize it and then pull the first n items off? Best, Collin.

On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez ignaci...@gmail.com wrote:

Hi R-Help! I'm hoping that some of you may give me some tips that could make my code more efficient. More precisely, I would like to make the answer to my stackoverflow question http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions more efficient. This is the code:

library(dplyr)
library(randomNames)
library(geosphere)

set.seed(7142015)
# Define Parameters
n.Schools <- 20
first.grade <- 3
last.grade <- 5
n.Grades <- last.grade - first.grade + 1
n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 # Two classrooms per teacher

# Define random names function:
gen.names <- function(n, which.names = "both", name.order = "last.first"){
  names <- unique(randomNames(n=n, which.names = which.names,
                              name.order = name.order))
  need <- n - length(names)
  while(need > 0){
    names <- unique(c(randomNames(n=need, which.names = which.names,
                                  name.order = name.order), names))
    need <- n - length(names)
  }
  return(names)}

# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <- paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <- data.frame(School.ID, School.lat, School.long, School.RE) %>%
    mutate(School.ID = as.character(School.ID)) %>%
    rowwise() %>%
    mutate(School.distance = distHaversine(
      p1 = c(School.long, School.lat),
      p2 = c(21.7672, 58.8471), r = 3961))
  return(Schools)}

Schools <- gen.schools(n.schools = n.Schools)

# Generate Grades
Grades <- c(first.grade:last.grade)
# Generate n.Classrooms
Classrooms <- LETTERS[1:n.Classrooms]

# Group schools and grades
SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'), FUN=paste) # head(SchGr)
# Group SchGr and Classrooms
SchGrClss <- outer(SchGr, paste0(Classrooms, '-'), FUN=paste) # head(SchGrClss)
# These are the combinations of School-Grade-Classroom
SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss))
SchGrClssEnd <- as.data.frame(SchGrClssTmp)
# Assign n.Teachers (2 classrooms in a given school-grade)
Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
library(stringr)
separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))
separoPairs <-
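Jim's matrix trick can be checked at small scale. The following self-contained sketch uses base strsplit() in place of stringr::str_split (same kind of list output, no extra dependency) and toy strings standing in for AllpairsTmp; binding the split list into a character matrix yields the same content as coercing the list directly, which is the slow step at scale:

```r
# Toy stand-ins for AllpairsTmp; both split to equal-length pieces.
pairs <- c("Alpha-3-A", "Beta-4-B")
parts <- strsplit(pairs, "-", fixed = TRUE)
slow  <- as.data.frame(parts)                 # list -> data.frame: slow at scale
xm    <- do.call(rbind, parts)                # list -> character matrix: fast
fast  <- as.data.frame(t(xm), stringsAsFactors = FALSE)
all(as.matrix(slow) == as.matrix(fast))       # identical content, cheaper path
```
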
[R] matching strings in a list
Say I have a list:

[[1]] "I like google"
[[2]] "Hi Google google"
[[3]] "what's up"

and they are tweets. And I want to find out how many tweets mention google (the answer should be 2). If I string-split and unlist them, then I would get the answer of 3. How do I make sure I get just 2?
Re: [R] Speeding up code?
Actually looking at the result, you don't need the transpose; that was an artifact of how you were doing it before.

xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to dataframe; the transpose is no longer needed
separoPairs <- as.data.frame(xm, stringsAsFactors = FALSE)

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 16, 2015 at 1:37 PM, jim holtman jholt...@gmail.com wrote: [earlier messages in this thread quoted verbatim; see above]
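Collin's suggestion, sketched concretely. This is only an illustration: `pool` is a made-up stand-in for a fixed universe of possible names, whereas randomNames() draws from a distribution, so the real code would need the pool enumerated first.

```r
# sample() without replacement cannot return duplicates, so the
# while/unique() loop in gen.names() becomes a single call.
set.seed(42)
pool   <- paste0("Name", 1:10000)   # hypothetical universe of names
n      <- 500
picked <- sample(pool, n, replace = FALSE)
length(picked)          # exactly n names
anyDuplicated(picked)   # 0: no duplicates by construction
```
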
Re: [R] matrix manipulation
Hi Terry, maybe I'm missing something, but why not define a matrix BB = V'B; then t(B) %*% V = t(BB), and your problem reduces to finding A such that t(BB) %*% A = 0? Peter

On Thu, Jul 16, 2015 at 10:28 AM, Therneau, Terry M., Ph.D. thern...@mayo.edu wrote: [original message quoted verbatim; see above]
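A minimal sketch of that reduction (my own illustration, assuming V is symmetric, as it is for (X'X)^{-1}): once BB = V %*% B is formed, the trailing columns of the complete QR decomposition of BB span the required complement, so the whole thing is indeed about 3 lines.

```r
set.seed(1)
p <- 5; k <- 2
X <- matrix(rnorm(20 * p), 20, p)
V <- solve(crossprod(X))          # V = (X'X)^{-1}, symmetric
B <- matrix(rnorm(p * k), p, k)

BB <- V %*% B                     # t(B) %*% V == t(BB) since V = V'
A  <- qr.Q(qr(BB), complete = TRUE)[, (k + 1):p, drop = FALSE]
max(abs(t(B) %*% V %*% A))        # numerically zero
```
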
Re: [R] matching strings in a list
On Jul 16, 2015, at 12:40 PM, tryingtolearn inshi...@ymail.com wrote: [question quoted above]

See ?grepl, presuming that you just want a count of the matches. If you want it to also be case-insensitive, set 'ignore.case = TRUE'. For example:

Tweets <- list("I like google", "Hi Google google", "what's up")
Tweets
[[1]]
[1] "I like google"

[[2]]
[1] "Hi Google google"

[[3]]
[1] "what's up"

sapply(Tweets, function(x) grepl("google", x, ignore.case = TRUE))
[1]  TRUE  TRUE FALSE

sum(sapply(Tweets, function(x) grepl("google", x, ignore.case = TRUE)))
[1] 2

Regards, Marc Schwartz
Re: [R] matching strings in a list
Why would you strsplit them? I would think

length(grep("google", unlist(x), ignore.case = TRUE))

should do it. Best, Ista

On Thu, Jul 16, 2015 at 1:40 PM, tryingtolearn inshi...@ymail.com wrote: [question quoted above]
Re: [R] matching strings in a list
On Thu, Jul 16, 2015 at 12:40 PM, tryingtolearn inshi...@ymail.com wrote: [question quoted above]

NROW(grep("google", list, ignore.case = TRUE))

--
Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! John McKown
Re: [R] User defined function within a formula
This might do what you want:

OPoly <- function(x, degree=1, weight=1, coefs=NULL, rangeX=NULL){
  weight <- round(weight, 0) # weight needs to be integer
  if(length(weight) != length(x)) {
    weight <- rep(1, length(x))
  }
  if (is.null(rangeX)) {
    rangeX <- range(x)
  }
  p <- poly(4*(rep(x, weight) - mean(rangeX))/diff(rangeX), degree=degree, coefs=coefs)
  # why t(t(...))? That strips the attributes.
  Z <- t( t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]) )[, degree, drop=FALSE]
  class(Z) <- "OPoly"
  attr(Z, "coefs") <- attr(p, "coefs")
  attr(Z, "rangeX") <- rangeX
  Z
}

makepredictcall.OPoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
      if (!is.null(tmp <- attr(var, "rangeX"))) {
        call$rangeX <- tmp
      }
      call$weight <- 1 # weight not relevant in predictions
    }
  }
  call
}

d <- data.frame(Y=1:8, X=log(1:8), Weight=1:8)
fit <- lm(data=d, Y ~ OPoly(X, degree=2, weight=Weight))
predict(fit)[c(3,8)]
predict(fit, newdata=data.frame(X=d$X[c(3,8)])) # same result

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 16, 2015 at 4:39 PM, William Dunlap wdun...@tibco.com wrote:

OPoly <- function(x, degree=1, weight=1){
  weight <- round(weight, 0) # weight needs to be integer
  if(length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4*(rep(x, weight) - mean(range(x)))/diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"; Z
}

You need to make OPoly have optional argument(s) that carry the original-regressor-dependent information to OPoly, and then have it return, as attributes, the values of those arguments. makepredictcall will take the attributes and attach them to the call in predvars, so predict uses values derived from the original regressors, not values derived from the data to be predicted from. Take a look at a pair like makepredictcall.scale() and scale() for an example: scale has optional arguments 'center' and 'scale' that it returns as attributes, and makepredictcall.scale adds those to the call to scale that it is given. Thus when you predict, the scale and center arguments come from the original data, not from the data you are predicting from.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin yinkuns...@gmail.com wrote:

Thanks Bill for your quick reply. I tried your solution and it did work for the simple user-defined function xpoly. But when I try it with another function, it gives me an error again:

OPoly <- function(x, degree=1, weight=1){
  weight <- round(weight, 0) # weight needs to be integer
  if(length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4*(rep(x, weight) - mean(range(x)))/diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"; Z
}
## This OPoly mimics a FORTRAN orthogonal polynomial routine: it first maps x to the range [-2, 2], then does QR, then returns the results scaled by sqrt(norm2). Compared with poly, this transformation keeps the model coefficients in a similar range to the other variables; the R poly routine will usually give you very large coefficients. I did not find such a routine in R, so I had to define it as a user-defined function.

### I also have the following function, as you suggested:

makepredictcall.OPoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
    }
  }
  call
}

But I still get an error for the following:

g3 <- glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
predict(g3, dc)
Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
  missing values are not allowed in 'poly'

I thought it might be due to the /diff(range(x)) in the function, but even if I remove that part it still gives me an error. Any idea? Many thanks in advance. Alex

On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap wdun...@tibco.com wrote:

Read about the 'makepredictcall' generic function. There is a method, makepredictcall.poly(), for poly() that attaches the polynomial coefficients used during the fitting procedure to the call to poly() that predict() makes. You ought to supply a similar method for your xpoly(), and xpoly() needs to return an object of a new class that will cause that method to be used. E.g.,

xpoly <- function(x, degree=1, ...){
  ret <- poly(x, degree=degree, ...)
  class(ret) <- "xpoly"
  ret
}
makepredictcall.xpoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(xpoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
    }
  }
  call
}
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
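Bill's xpoly sketch, made self-contained so it can be run end-to-end (the data frame `d` below is invented for illustration; the `clotting` data in the thread comes from the examples in ?glm):

```r
xpoly <- function(x, degree = 1, ...) {
  ret <- poly(x, degree = degree, ...)
  class(ret) <- "xpoly"
  ret
}
makepredictcall.xpoly <- function(var, call) {
  if (is.call(call) && identical(call[[1]], quote(xpoly))) {
    if (!is.null(tmp <- attr(var, "coefs"))) call$coefs <- tmp
  }
  call
}
d <- data.frame(y = log(1:10), x = 1:10)
fit <- lm(y ~ xpoly(x, 2), data = d)
# With the method in place, predict() on new data reuses the
# orthogonalization coefficients recorded at fit time instead of
# recomputing them from the new data:
all.equal(unname(predict(fit)[3]),
          unname(predict(fit, newdata = data.frame(x = 3))))
```

Without the makepredictcall method, poly() would re-orthogonalize on the single new point and the two predictions would disagree.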
Re: [R] powerTransform warning message?
I suggest you consult a local statistician. You are (way) over your head statistically here, and statistical matters are off topic on this list. The brief answer to your question is: you are almost certainly producing nonsense. Cheers, Bert

Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Thu, Jul 16, 2015 at 4:35 PM, Brittany Demmitt demmi...@gmail.com wrote: Hello, I have a series of 40 variables that I am trying to transform via the Box-Cox method using the powerTransform function in R. I have no zero values in any of my variables. When I run the powerTransform function on the full data set I get the following warning:

Warning message:
In sqrt(diag(solve(res$hessian))) : NaNs produced

However, when I analyze the variables in groups, rather than all 40 at a time, I do not get this warning message. Why would this be? And does this mean this warning is safe to ignore? I would like to add that all of my lambda values are in the -5 to 5 range. I also get different lambda values when I analyze the variables together versus in groups. Is this to be expected? Thank you so much! Brittany
Re: [R] powerTransform warning message?
Dear Brittany,

On Thu, 16 Jul 2015 17:35:38 -0600 Brittany Demmitt demmi...@gmail.com wrote: [question quoted above]

No, it is not safe to ignore the warning, and the problem has nothing to do with non-positive values in the data -- when you say that there are no 0s in the data, I assume that you mean that the data values are all positive. The square roots of the diagonal entries of the inverse Hessian at the (pseudo-)ML estimates are the SEs of the estimated transformation parameters. If the Hessian can't be inverted, that usually implies that the maximum of the (pseudo-)likelihood isn't well defined. This isn't surprising when you're trying to transform as many as 40 variables at a time to multivariate normality. It's my general experience that people often throw their data into the Box-Cox black box and hope for the best without first examining the data and, e.g., ensuring a reasonable ratio of maximum/minimum values for each variable, checking for extreme outliers, etc. Of course, I don't know that you did that, and it's perfectly possible that you were careful.

As for getting different lambda values when the variables are analyzed together versus in groups: yes, that is to be expected, and it's very unlikely that both are right. If, e.g., the variables are multivariate normal within groups, then their marginal distribution is a mixture of multivariate normals, which almost surely isn't itself normal.

I hope this helps, John

John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
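The mechanics Fox describes can be seen in a toy base-R example (my own illustration, unrelated to powerTransform itself): the SEs are sqrt(diag(solve(H))) where H is the Hessian of the negative log-likelihood at the optimum, which is exactly the expression in the warning.

```r
# MLE of a normal mean with known sd = 1: the SE should be 1/sqrt(n).
set.seed(1)
x   <- rnorm(100, mean = 2, sd = 1)
nll <- function(mu) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
fit <- optim(0, nll, method = "BFGS", hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))   # here approximately 1/sqrt(100) = 0.1
# If the Hessian were singular (ill-defined maximum), solve() would fail,
# and negative diagonal entries would trigger the NaN warning seen above.
```
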
Re: [R-es] Set operations
You are comparing the lists as wholes; try instead

a[, vld := mapply(setequal, y, z)]

Regards, Olivier

----- Original message -----
From: Patricio Fuenmayor Viteri patricio.fuenma...@outlook.com
To: r-help-es r-help-es@r-project.org
Sent: Thursday, 16 July 2015 2:48:35
Subject: [R-es] Set operations

Hello everyone. I am trying to compare sets and I don't understand what is happening with the results. Let me explain: I have a column with a person's name, ordered SURNAMES - FIRST NAMES; next I have the same person's name, but ordered FIRST NAMES - SURNAMES. The process should identify that the two columns are equal. I am using set operations and data.table structures. I don't understand why the comparison inside data.table comes out FALSE, i.e. not equal, while doing the comparison separately gives TRUE. The code is attached; thanks for your help.

require(data.table)
a <- data.table(
  x = 1:2,
  y = list(c("ANDRES","GERARDO","CABRERA","GUAMAN"),
           c("MONTALVAN","VERA","JORGE","LEONARDO")),
  z = list(c("CABRERA","GUAMAN","GERADO","ANDRES"),
           c("JORGE","MONTALVAN","VERA")))
a[, `:=`(vld = setequal(y, z)), by = x]
setequal(c("ANDRES","GERARDO","CABRERA","GUAMAN"),
         c("CABRERA","GUAMAN","GERARDO","ANDRES"))

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
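The distinction Olivier points out, shown in base R (no data.table needed for the illustration): setequal(y, z) compares the two list columns as single objects, while mapply(setequal, y, z) compares them element by element, which is what the row-wise validation wants.

```r
y <- list(c("ANDRES","GERARDO","CABRERA","GUAMAN"),
          c("MONTALVAN","VERA","JORGE","LEONARDO"))
z <- list(c("CABRERA","GUAMAN","GERARDO","ANDRES"),
          c("JORGE","MONTALVAN","VERA"))
# Element-wise: first pair has the same members in a different order (TRUE),
# second pair differs (z lacks "LEONARDO"), so FALSE.
mapply(setequal, y, z)
```
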
[R] problem with wilcox.test()
Dear useRs, I am running wilcox.test() on two subsets of a dataset and get exactly the same results although the raw data differ between the subsets.

mydata <- structure(list(
  cat1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                   .Label = c("high", "low"), class = "factor"),
  cat2 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
                   .Label = c("large", "small"), class = "factor"),
  var1 = c(2.012743, 1.51272, 1.328453, 1.2609935, 1.617757, 1.8175455,
           1.890035, 2.3652205, 1.295888, 1.5985145, 1.081813, 1.856733,
           2.366358, 2.27421, 1.727023, 2.230433, 5.272843, 3.7626355),
  var2 = c(0.00196, 0.0066545, 0.006188, 0.0058985, 0.004453, 0.005468,
           0.003773, 0.004742, 0.007525, 0.0081235, 0.004611, 0.0050475,
           0.006643, 0.0097335, 0.009213, 0.0049525, 0.006243, 0.006021)),
  .Names = c("cat1", "cat2", "var1", "var2"), row.names = c(NA, 18L),
  class = "data.frame")

# p-values are identical but W different for the first variable
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="small",])

# both p-values and W are identical for the second variable
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="small",])

Did I do something wrong or does it just have something to do with my dataset? Or is it just a coincidence? Thank you in advance for your help, Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra
Re: [R] cronbachs alpha and missing values
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html Currently we don't even know what version of Cronbach's alpha you are using. Also use Google as well as help(). Try searching for "R statistics Cronbach's alpha". John Kane Kingston ON Canada -Original Message- From: penv...@uni-hamburg.de Sent: Wed, 15 Jul 2015 01:50:46 -0700 (PDT) To: r-help@r-project.org Subject: [R] cronbachs alpha and missing values I want to calculate Cronbach's alpha for my df. My df has some missing values, so that there are only 23 out of 56 complete cases. If I run alpha on only the complete cases, I get a value of .79, and if I run it on the whole df, I get .82. My question is: what does alpha do with those missing values if I include the incomplete cases? Are they imputed in some way? -- View this message in context: http://r.789695.n4.nabble.com/cronbachs-alpha-and-missing-values-tp4709885.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
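As John Kane notes, the answer depends on which implementation of alpha the poster used, which the thread never establishes. The sketch below is one common definition of Cronbach's alpha computed from a covariance matrix; the `use` argument then controls how NAs enter, and switching between complete-case and pairwise-complete covariances is one plausible source of a .79 vs .82 difference (no imputation involved in either case).

```r
# Cronbach's alpha from the item covariance matrix:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the sum)
cronbach_alpha <- function(df, use = "pairwise.complete.obs") {
  C <- cov(df, use = use)              # how NAs are handled is decided here
  k <- ncol(C)
  (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))
}

# Complete cases only (drops every row with any NA):
# cronbach_alpha(mydf, use = "complete.obs")
# Pairwise-complete (each covariance uses all rows available for that pair):
# cronbach_alpha(mydf, use = "pairwise.complete.obs")
```

The two calls typically differ slightly when many cases are incomplete, because pairwise deletion retains more of the data for each covariance.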
Re: [R] Speeding up code?
Hi Collin, The objective of the gen.names function is to generate N *unique* random names, where N is a *large* number. On my computer `gen.names(n = 5)` takes under a second, so it is probably not the root problem in my code. That said, I would love to improve it. I'm not exactly sure how you propose to change it using sample. What is the object that I would be sampling? I would love to run a little benchmark to compare my version with yours. What really takes a long time to run is: separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-")) So that and the chunk of code before it is probably where I would get big gains in speed. Sadly, I have no clue how to do it differently. Thanks a lot for the help!! Ignacio On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch cfly...@ncsu.edu wrote: Hi Ignacio, If I am reading your code correctly, then the top while loop is essentially seeking to select a random set of names from the original set, then using unique to reduce it down; you then iterate until you have built your quota. Ultimately this results in a very inefficient attempt at sampling without replacement. Why not just sample without replacement rather than loop iteratively and use unique? Or if the set of possible names is short enough, why not just randomize it and then pull the first n items off? Best, Collin. On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez ignaci...@gmail.com wrote: Hi R-Help! I'm hoping that some of you may give me some tips that could make my code more efficient. More precisely, I would like to make the answer to my stackoverflow http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions question more efficient.
This is the code:

library(dplyr)
library(randomNames)
library(geosphere)

set.seed(7142015)
# Define Parameters
n.Schools <- 20
first.grade <- 3
last.grade <- 5
n.Grades <- last.grade - first.grade + 1
n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 # Two classrooms per teacher

# Define random names function:
gen.names <- function(n, which.names = "both", name.order = "last.first"){
  names <- unique(randomNames(n=n, which.names = which.names, name.order = name.order))
  need <- n - length(names)
  while(need > 0){
    names <- unique(c(randomNames(n=need, which.names = which.names,
                                  name.order = name.order), names))
    need <- n - length(names)
  }
  return(names)}

# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <- paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <- data.frame(School.ID, School.lat, School.long, School.RE) %>%
    mutate(School.ID = as.character(School.ID)) %>%
    rowwise() %>%
    mutate(School.distance = distHaversine(
      p1 = c(School.long, School.lat), p2 = c(21.7672, 58.8471), r = 3961))
  return(Schools)}

Schools <- gen.schools(n.schools = n.Schools)

# Generate Grades
Grades <- c(first.grade:last.grade)
# Generate n.Classrooms
Classrooms <- LETTERS[1:n.Classrooms]
# Group schools and grades
SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'), FUN=paste) # head(SchGr)
# Group SchGr and Classrooms
SchGrClss <- outer(SchGr, paste0(Classrooms, '-'), FUN=paste) # head(SchGrClss)
# These are the combinations of School-Grades-Classroom
SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss))
SchGrClssEnd <- as.data.frame(SchGrClssTmp)
# Assign n.Teachers (2 classrooms in a given school-grade)
Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
library(stringr)
separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))
separoPairs <- as.data.frame(t(separoPairs))
row.names(separoPairs) <- NULL
separoPairs <- separoPairs %>%
  select(-V7) %>% # Drops empty column
  mutate(V1=as.character(V1), V4=as.character(V4),
         V2=as.numeric(V2), V5=as.numeric(V5)) %>%
  mutate(V4 = trimws(V4, which = "both"))
separoPairs[120,]$V4
# Only the rows with V1==V4 and V2==V5 are valid
validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2, V3, V6)

# Generate n.Teachers
gen.teachers <- function(n.teachers){
  Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
  Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
  Teacher.Other <- sample(c(0,1), replace = TRUE, prob = c(0.5, 0.5), size = n.teachers)
  Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
  Teachers <- data.frame(Teacher.ID, Teacher.exp, Teacher.Other, Teacher.RE)
  return(Teachers)}
Teachers <- gen.teachers(n.teachers = n.Teachers) %>%
  mutate(Teacher.ID = as.character(Teacher.ID))
# Randomly assign n.Teachers teachers to the ValidPairs
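Collin's suggestion of replacing the grow-until-unique while loop can be sketched as: oversample once, deduplicate, and take the first n. The 20% margin below is an assumption, not something from the thread; the while loop remains only as a rare fallback.

```r
# Hedged sketch of the "sample once, then deduplicate" approach.
library(randomNames)  # same package as in the original code

gen.names2 <- function(n, which.names = "both", name.order = "last.first") {
  # Draw ~20% extra so that, after dropping duplicates, n names usually remain
  pool <- unique(randomNames(n = ceiling(n * 1.2),
                             which.names = which.names,
                             name.order = name.order))
  # Rare fallback: top up only if the margin was not enough
  while (length(pool) < n) {
    pool <- unique(c(pool, randomNames(n = n, which.names = which.names,
                                       name.order = name.order)))
  }
  pool[seq_len(n)]
}
```

This does one large draw instead of many small ones, so the cost of randomNames() and unique() is paid once in the common case.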
Re: [R] Problem Regarding R Packages installation
The first error message is "configuration failed for package 'RCurl'". Immediately before that it said "Cannot find curl-config". It would appear that you don't have curl installed on your computer. Unfortunately, I cannot help you with that. But you might look at the documentation for RCurl (available from CRAN) to start trying to find where to get curl. -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 7/15/15, 8:23 PM, R-help on behalf of Muhammad Sohail Raza r-help-boun...@r-project.org on behalf of muhammadsohailr...@live.com wrote: Hi Everyone! I am trying to install the R package VariantAnnotation. I entered the commands: source("http://bioconductor.org/biocLite.R") biocLite("VariantAnnotation") But I am getting the following errors: ERROR: configuration failed for package 'XML' * removing '/usr/lib64/R/library/XML' * installing *source* package 'RCurl' ... ** package 'RCurl' successfully unpacked and MD5 sums checked configure: loading site script /usr/share/site/x86_64-unknown-linux-gnu checking for curl-config... no Cannot find curl-config ERROR: configuration failed for package 'RCurl' * removing '/usr/lib64/R/library/RCurl' * installing *source* package 'Rsamtools' ...
** libs gcc -std=gnu99 -I/usr/lib64/R/include -DNDEBUG -I/usr/local/include -I/usr/lib64/R/library/S4Vectors/include -I/usr/lib64/R/library/IRanges/include -I/usr/lib64/R/library/XVector/include -I/usr/lib64/R/library/Biostrings/include -fopenmp -D_USE_KNETFILE -D_FILE_OFFSET_BITS=64 -U_FORTIFY_SOURCE -DBGZF_CACHE -Dfprintf=_samtools_fprintf -Dexit=_samtools_exit -Dabort=_samtools_abort -I./samtools -I./samtools/bcftools -I./tabix -fpic -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c Biostrings_stubs.c -o Biostrings_stubs.o gcc -std=gnu99 -I/usr/lib64/R/include -DNDEBUG -I/usr/local/include -I/usr/lib64/R/library/S4Vectors/include -I/usr/lib64/R/library/IRanges/include -I/usr/lib64/R/library/XVector/include -I/usr/lib64/R/library/Biostrings/include -fopenmp -D_USE_KNETFILE -D_FILE_OFFSET_BITS=64 -U_FORTIFY_SOURCE -DBGZF_CACHE -Dfprintf=_samtools_fprintf -Dexit=_samtools_exit -Dabort=_samtools_abort -I./samtools -I./samtools/bcftools -I./tabix -fpic -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c IRanges_stubs.c -o IRanges_stubs.o g++ -I/usr/lib64/R/include -DNDEBUG -I/usr/local/include -I/usr/lib64/R/library/S4Vectors/include -I/usr/lib64/R/library/IRanges/include -I/usr/lib64/R/library/XVector/include -I/usr/lib64/R/library/Biostrings/include -fpic -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c PileupBuffer.cpp -o PileupBuffer.o /bin/sh: g++: command not found /usr/lib64/R/etc/Makeconf:143: recipe for target 'PileupBuffer.o' failed make: *** [PileupBuffer.o] Error 127 ERROR: compilation failed for package 'Rsamtools' * removing '/usr/lib64/R/library/Rsamtools' * installing *source* package 'futile.logger' ...
** package 'futile.logger' successfully unpacked and MD5 sums checked ** R ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded * DONE (futile.logger) ERROR: dependencies 'XML', 'RCurl' are not available for package 'biomaRt' * removing '/usr/lib64/R/library/biomaRt' * installing *source* package 'BiocParallel' ... ** R ** inst ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded * DONE (BiocParallel) ERROR: dependency 'Rsamtools' is not available for package 'GenomicAlignments' * removing '/usr/lib64/R/library/GenomicAlignments' ERROR: dependencies 'XML', 'RCurl', 'Rsamtools', 'GenomicAlignments' are not available for package 'rtracklayer' * removing '/usr/lib64/R/library/rtracklayer' ERROR: dependencies 'rtracklayer', 'Rsamtools' are not available for package 'BSgenome' * removing '/usr/lib64/R/library/BSgenome' ERROR: dependencies 'rtracklayer', 'biomaRt', 'RCurl' are not available for package 'GenomicFeatures' * removing '/usr/lib64/R/library/GenomicFeatures' ERROR: dependencies 'Rsamtools', 'BSgenome', 'rtracklayer', 'GenomicFeatures' are not available for package 'VariantAnnotation' * removing '/usr/lib64/R/library/VariantAnnotation' The downloaded source packages are in '/tmp/RtmpvpoYjO/downloaded_packages' Updating HTML index of packages in '.Library' Making 'packages.html' ... done Warning messages: 1: In install.packages(pkgs = doing, lib = lib, ...) : installation of package 'XML' had non-zero exit status 2: In install.packages(pkgs = doing, lib = lib, ...) : installation of package 'RCurl' had non-zero exit status 3: In install.packages(pkgs = doing,
Re: [R] problem with wilcox.test()
On 16 Jul 2015, at 15:13 , Ivan Calandra ivan.calan...@univ-reims.fr wrote: Dear useRs, I am running a wilcox.test() on two subsets of a dataset and get exactly the same results although the raw data are different in the subsets.

mydata <- structure(list(cat1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"), cat2 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("large", "small"), class = "factor"), var1 = c(2.012743, 1.51272, 1.328453, 1.2609935, 1.617757, 1.8175455, 1.890035, 2.3652205, 1.295888, 1.5985145, 1.081813, 1.856733, 2.366358, 2.27421, 1.727023, 2.230433, 5.272843, 3.7626355), var2 = c(0.00196, 0.0066545, 0.006188, 0.0058985, 0.004453, 0.005468, 0.003773, 0.004742, 0.007525, 0.0081235, 0.004611, 0.0050475, 0.006643, 0.0097335, 0.009213, 0.0049525, 0.006243, 0.006021)), .Names = c("cat1", "cat2", "var1", "var2"), row.names = c(NA, 18L), class = "data.frame")

# p-values are identical but W different for the first variable
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="small",])
# both p-values and W are identical for the second variable
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="small",])

Did I do something wrong or does it just have something to do with my dataset? Or is it just a coincidence?

Coincidence, mostly, I think: You have

table(mydata[mydata$cat2=="small", "cat1"])
high  low
   4    5

table(mydata[mydata$cat2=="large", "cat1"])
high  low
   4    5

and all of your response variables' values are distinct. In both cases, the null distribution of the rank sum W is that of (sum(sample(1:9,4)) - sum(1:4)), which is a distribution on 0:20, symmetric around 10. Hence there are only 11 different p-values possible, so it is not particularly odd that you may get the same one twice.
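Peter's point can be checked by enumerating the exact null distribution of W for group sizes 4 and 5 with no ties:

```r
# All possible rank assignments of the "high" group (4 of ranks 1..9);
# W is the rank sum minus its minimum, so it lives on the integers 0..20.
W.all <- apply(combn(9, 4), 2, function(idx) sum(idx) - sum(1:4))
table(W.all)           # symmetric around 10, support 0:20
length(unique(W.all))  # 21 attainable W values
# Two-sided p-values depend only on min(W, 20 - W), giving just 11
# distinct p-values, so repeats across subsets are unremarkable.
```

With such a coarse discrete null distribution, two different datasets landing on the same p-value is exactly the "coincidence" described above.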
Thank you in advance for your help, Ivan -- Ivan Calandra, ATER University of Reims Champagne-Ardenne GEGENAA - EA 3795 CREA - 2 esplanade Roland Garros 51100 Reims, France +33(0)3 26 77 36 89 ivan.calan...@univ-reims.fr https://www.researchgate.net/profile/Ivan_Calandra __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (no subject)
I am trying to analyse time-series .netcdf (3D: lat, long and time domain) climate data. I want to apply the SPEI package (calculation of the standardized precipitation evapotranspiration index) to it, but I am unable to arrange my data in the required data frame. As I am a beginner in R, it would be very helpful if someone could provide details of the code to be written before executing the package. The signature of the spei function is as follows: spei(data, scale, kernel = list(type = 'rectangular', shift = 0), distribution = 'log-Logistic', fit = 'ub-pwm', na.rm = FALSE, ref.start=NULL, ref.end=NULL, x=FALSE, params=NULL, ...) Thanks in advance. -- Prabir Kumar Das Scientist/Engineer - SD RRSC - East NRSC, ISRO Dept. of Space, Govt. of India Kolkata [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
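The thread never answers this, but a rough sketch of one common workflow is: read the lon x lat x time array with ncdf4, form the climatic water balance, and call spei() on each grid cell's time series. Everything specific here is assumed, not from the post: the file name "clim.nc", the variable names "prec" and "pet", monthly data, and that both variables exist in the file.

```r
library(ncdf4)  # assumed installed
library(SPEI)   # assumed installed

nc   <- nc_open("clim.nc")       # hypothetical file name
prec <- ncvar_get(nc, "prec")    # hypothetical variable: lon x lat x time
pet  <- ncvar_get(nc, "pet")     # hypothetical variable, same dims
nc_close(nc)

wbal <- prec - pet               # climatic water balance per cell
dims <- dim(wbal)
spei6 <- array(NA_real_, dims)   # holder for the 6-month SPEI
for (i in seq_len(dims[1])) {
  for (j in seq_len(dims[2])) {
    s <- ts(wbal[i, j, ], frequency = 12)  # monthly series assumed
    spei6[i, j, ] <- as.numeric(fitted(spei(s, scale = 6)))
  }
}
```

The double loop is slow for large grids; apply() over the first two margins, or parallelisation, would be the natural next step, but the cell-by-cell structure is the key idea.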
[R] R 3.2.0 - Windows 8, 1 - SSL certificate problem installing course in swirl
Hello, I am trying to install a course for swirl and got an SSL problem: install_from_swirl("R Programming") Error in function (type, msg, asError = TRUE) : SSL certificate problem: self signed certificate in certificate chain Found this answer with Google but it does not work either: set_config(config(ssl.verifypeer = 0L)) Error: could not find function "set_config" What do I do to avoid or accept the SSL certificate? Kind regards, Martin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls singular gradient matrix - fit parameters in integral's upper limits
The list rejects almost all attachments. You could dput the data and put it in your posting. You may also want to try a Marquardt solver, in R from my nlmrt or compiled in Kate Mullen's minpack.lm. They are slightly different in flavour, and the call is a bit different from nls. JN On 15-07-16 08:21 AM, Laura Teresa Corredor Bohórquez wrote: ---The last post rejected two files I had attached, so I modified it.--- Hi. I am trying to make an nls fit for a somewhat complicated expression that includes two integrals with two of the fit parameters in their upper limits. I got the error "Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates". First of all, I have already searched the previous answers, but they didn't help. The parameter initialization seems to be OK; I have tried to change the parameters but none work. If my function has just one integral everything works very nicely, but when adding a second integral term I just get the error. I don't believe the function is over-parametrized, as I have performed other fits with many more parameters and they worked. I did try to enclose the data but the attachment was rejected.
The minimal example is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, T2, N, D, x){
  int1 = integrate(integrand, lower=0, upper = T1)$value
  int2 = integrate(integrand, lower=0, upper = T2)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (448.956*(x/T1)^3*int1) + (299.304*(x/T2)^3*int2))
}
fit = nls(y ~ fitting(T1, T2, N, D, x), start=list(T1=400,T2=200,N=0.01,D=2))

--For reference, the fit that worked is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, N, D, x){
  int = integrate(integrand, lower=0, upper = T1)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (748.26)*(x/T1)^3*int)
}
fit = nls(y ~ fitting(T1, N, D, x), start=list(T1=400,N=0.01,D=2))

I cannot figure out what happens. I need to perform this fit for three integral components, but even for two I have this problem. I appreciate your help very much. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
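JN's Marquardt suggestion can be sketched as a near drop-in replacement for the failing nls() call, using minpack.lm::nlsLM (assumes the same `fitting` function and x, y vectors as in the quoted code; the control values are illustrative, not from the thread).

```r
# Levenberg-Marquardt is more tolerant of the near-singular Jacobians
# that make plain Gauss-Newton nls() fail at the starting values.
library(minpack.lm)

fit <- nlsLM(y ~ fitting(T1, T2, N, D, x),
             start   = list(T1 = 400, T2 = 200, N = 0.01, D = 2),
             control = nls.lm.control(maxiter = 200))
summary(fit)
```

If nlsLM also stalls, that usually points to the two integral terms being nearly collinear at the starting values, so the (T1, T2) pair is poorly identified; that would support simplifying or re-parameterizing the model rather than just switching solvers.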
[R] nls singular gradient matrix - fit parameters in integral's upper limits
---The last post rejected two files I had attached, so I modified it.--- Hi. I am trying to make an nls fit for a somewhat complicated expression that includes two integrals with two of the fit parameters in their upper limits. I got the error "Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates". First of all, I have already searched the previous answers, but they didn't help. The parameter initialization seems to be OK; I have tried to change the parameters but none work. If my function has just one integral everything works very nicely, but when adding a second integral term I just get the error. I don't believe the function is over-parametrized, as I have performed other fits with many more parameters and they worked. I did try to enclose the data but the attachment was rejected. The minimal example is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, T2, N, D, x){
  int1 = integrate(integrand, lower=0, upper = T1)$value
  int2 = integrate(integrand, lower=0, upper = T2)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (448.956*(x/T1)^3*int1) + (299.304*(x/T2)^3*int2))
}
fit = nls(y ~ fitting(T1, T2, N, D, x), start=list(T1=400,T2=200,N=0.01,D=2))

--For reference, the fit that worked is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, N, D, x){
  int = integrate(integrand, lower=0, upper = T1)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (748.26)*(x/T1)^3*int)
}
fit = nls(y ~ fitting(T1, N, D, x), start=list(T1=400,N=0.01,D=2))

I cannot figure out what happens.
I need to perform this fit for three integral components, but even for two I have this problem. I appreciate so much your help. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Weighted skewness and kurtosis
Is there an R package that allows one to calculate skewness and kurtosis, but weighted with individual-level weights (one weight per observation)? Thank you! -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] removing the columns with 0 or NA or 1or NA or 2 or NA
I have a matrix whose elements are NA, 0, 1, 2! I got my answer about removing the columns with only 0 or NA or both values, but now I want to add a further condition for deleting columns: I have to delete every column that contains a single value, i.e. delete the columns with only NA or 0 or both, the columns with only NA or 1 or both, and the columns with only NA or 2 or both (I should keep the columns which have variation in their values)! I used this code but it didn't work properly:

mat_nonNA <- mat[, !apply((is.na(mat) | mat == 0) & (is.na(mat) | mat == 1) & (is.na(mat) | mat == 2), 2, all)]

mat:
       1:110590170 1:110888172 1:110906406 1:110993854 1:110996710 1:44756
A05363           1           1           1           2          NA       0
A05370           0           1           0          NA           0      NA
A05380           1          NA           2          NA          NA       0
A05397           0           1           0          NA           0       2
A05400           2           1           0           2           0       0
A05426           0          NA          NA          NA           0       1

my output should be like below:

       1:110590170 1:110906406 1:44756
A05363           1           1       0
A05370           0           0      NA
A05380           1           2       0
A05397           0           0       2
A05400           2           0       0
A05426           0          NA       1

Thanks for your help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
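The condition "delete columns with only NA plus a single value" is the same as "keep columns with at least two distinct non-NA values", which gives a shorter test than combining the three per-value masks. A sketch, not from the thread, rebuilding the poster's matrix to show it reproduces the desired output:

```r
# Rebuild the example matrix from the question (column-major fill)
mat <- matrix(c(1, 0, 1, 0, 2, 0,
                1, 1, NA, 1, 1, NA,
                1, 0, 2, 0, 0, NA,
                2, NA, NA, NA, 2, NA,
                NA, 0, NA, 0, 0, 0,
                0, NA, 0, 2, 0, 1),
              nrow = 6,
              dimnames = list(paste0("A0", c(5363, 5370, 5380, 5397, 5400, 5426)),
                              c("1:110590170", "1:110888172", "1:110906406",
                                "1:110993854", "1:110996710", "1:44756")))
# Keep a column only if it has two or more distinct non-NA values
keep <- apply(mat, 2, function(col) length(unique(na.omit(col))) > 1)
mat[, keep]  # drops the NA/1-only, NA/2-only and NA/0-only columns
```

The original attempt ANDs the three masks together, which only flags all-NA columns; the `length(unique(na.omit(col))) > 1` test captures the intended "variation" criterion directly.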
Re: [R] R 3.2.0 - Windows 8, 1 - SSL certificate problem installing course in swirl
On 16/07/2015 3:08 AM, MH wrote: Hello, I am trying to install a course for swirl and got an SSL problem: install_from_swirl("R Programming") Error in function (type, msg, asError = TRUE) : SSL certificate problem: self signed certificate in certificate chain Found this answer with Google but it does not work either: set_config(config(ssl.verifypeer = 0L)) Error: could not find function "set_config" What do I do to avoid or accept the SSL certificate? We don't know what install_from_swirl is doing, so it's hard to say. You should ask its author. If it is using the wininet method to download from an https site, then it is Windows that is issuing the error: you may be able to configure Internet Explorer to avoid it. Ask Microsoft how. Duncan Murdoch __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
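One detail worth noting beyond the reply above: set_config() comes from the httr package, which would explain the "could not find function" error if httr was never loaded. A hedged sketch of the workaround from the question (the exact argument name has varied across httr versions, so this is illustrative):

```r
# set_config() / config() are httr functions; load httr first.
library(httr)
set_config(config(ssl_verifypeer = 0L))  # older examples write ssl.verifypeer
```

Disabling peer verification accepts self-signed certificates globally for the session, so it is only a temporary diagnostic, not a fix for the underlying certificate problem.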
Re: [R] Weighted skewness and kurtosis
Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal point and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package... On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius Alameda, CA, USA -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and kurtosis
rep() **does** do it essentially in C!! See also the moments package, which I found instantly by googling "sample moments in R", though I don't know whether it does what you want (but probably shouldn't do). Of course, sample skewness and kurtosis are basically useless, but that's another, off-topic, issue. Cheers, Bert Bert Gunter Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. -- Clifford Stoll On Thu, Jul 16, 2015 at 8:37 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal point and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package... On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius Alameda, CA, USA -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and kurtosis
On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NLOPTR
Hello, I have been running a nonlinear GMM using the nloptr wrapper for the last 7 days. The maximum time I can run the code on the server I am using is 7 days, which expires in about an hour, when the server automatically terminates the job. I would like to know if there is a way to obtain the estimates at the point of termination so that I can use them as the new starting values for another round of estimation. Thank you. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and kurtosis
David: I think you missed my point. As I understand him, Dimitri mentioned rounding off to e.g. 2 decimal digits and multiplying by 100 to produce integer weights, which would then lead to unrolling the vector via rep(). Your weighted-moments reference is probably closer to what he sought, however. Cheers, Bert Bert Gunter Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. -- Clifford Stoll On Thu, Jul 16, 2015 at 9:47 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:37 AM, Dimitri Liakhovitski wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal point and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package... I saw Bert's comments but I don't think that `rep` handles fractional weights. Take a look at Hmisc::wtd.var (since I have that package loaded and see that the code is easy to grasp). It also appears, after using sos::findFn("weighted moments"), that the package lmomco deserves your attention. On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
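For completeness, the quantity the thread is after can be written directly from weighted central moments, with no integer-weight trick needed. This is a sketch of one common convention (definitions of weighted skewness and kurtosis vary, and no package in the thread is being reproduced here):

```r
# Weighted skewness and kurtosis from normalized weighted central moments;
# works with arbitrary fractional weights like 0.7654, 1.2345.
weighted.moments <- function(x, w) {
  w  <- w / sum(w)                 # normalize weights
  mu <- sum(w * x)                 # weighted mean
  m2 <- sum(w * (x - mu)^2)        # weighted variance (biased form)
  m3 <- sum(w * (x - mu)^3)
  m4 <- sum(w * (x - mu)^4)
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2)
}

x <- c(1.2, 0.4, 2.8, 5.1, 3.3)
w <- c(0.7654, 1.2345, 1.0, 0.5, 2.1)   # fractional weights, as in the thread
weighted.moments(x, w)
```

With all weights equal, this reduces to the usual (biased) sample skewness and kurtosis, which is a useful sanity check against an unweighted package function.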
Re: [R-es] Keeping the variable name across several functions: example results: defmacro solution
Hello: I have tried another solution: with defmacro from the gtools package:

=
DATOS <- data.frame(SE=c("M", "H", "M", "M", "H"), ED=c(50, 60, 20, 18, 30), GRP=c("B", "B", "A", "A", "B"))
library(gtools)
MRL <- defmacro(XDADES, XVD, XVI, expr = {
  RL <- glm(XVD ~ XVI, family=binomial, data=XDADES)
  print(summary(RL))
})
=

If it is run, the real variable names appear and not the argument names:

=
MRL(XDADES=DATOS, XVD=SE, XVI=ED)
Call: glm(formula = SE ~ ED, family = binomial, data = DATOS)
Deviance Residuals:
      1        2        3        4        5
 1,3511  -0,7869   0,6512   0,6159  -1,5435
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2,67170    2,60538   1,025    0,305
ED          -0,06142    0,06326  -0,971    0,332
[deleted]
=

In contrast, in function form the argument names appear in the results:

=
FRL <- function(XVD, XVI) {
  RL <- glm(XVD ~ XVI, family=binomial)
  print(summary(RL))
}
FRL(XVD=SE, XVI=ED)
Call: glm(formula = XVD ~ XVI, family = binomial)
Deviance Residuals:
      1        2        3        4        5
 1,3511  -0,7869   0,6512   0,6159  -1,5435
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2,67170    2,60538   1,025    0,305
XVI         -0,06142    0,06326  -0,971    0,332
[deleted]
=

Regards! On Wed, 15 Jul 2015 20:26:46 +0200 Griera gri...@yandex.com wrote: Hello: On Wed, 15 Jul 2015 00:18:32 +0200 Carlos Ortega c...@qualityexcellence.es wrote: [deleted] About the question of the names: if you pass it the data.frame as-is, it should keep the names. You are right. Now I pass it the name of the data.frame, and it already shows the name of the analysed variable. Many thanks for the suggestion. Regards! If not, pass the column/variable names to the functions as an additional argument... Regards, Carlos.
On 14 July 2015 at 22:49, Griera gri...@yandex.com wrote: Hello Carlos: I attach an application example: the functions (I have deleted the paths of the functions and the source() calls that load them) and an example of running them, for the options I have implemented, with the birthwt data table from the MASS package:
- Descriptive statistics of all the variables of a table.
- Univariate analysis of all the variables of a table crossed with a qualitative dependent variable.

=Start of functions

##--
## DESUNI
##--
DESUNI <- function(XDADES, XDROP=NULL, XVD=NULL,
                   XSPV=NULL  # if this is an SPV analysis; may be TRUE
                   ) {
  options(digits = 3, OutDec=",", scipen=999)
  ## No dependent variable: descriptives
  if (is.null(XVD)) {
    cat("\n*** Descriptives (no dependent variable)\n")
    DES(XDADES=XDADES, XDROP=XDROP, XCAMIF=XCAMIF)
  }
  ## Dependent variable present: univariate analysis
  else {
    UNI(XDADES=XDADES, XDROP=XDROP, XVD=XVD, XSPV=XSPV, XCAMIF=XCAMIF)
  }
}

##--
## DES: descriptives of all the variables
##--
DES <- function(XDADES, XDROP=NULL, XCAMIF) {
  # setdiff selects the variables of XDADES that are not in XDROP
  ifelse(is.null(XDROP),
         DADES_S <- XDADES,
         DADES_S <- XDADES[, setdiff(names(XDADES), XDROP)])
  attach(DADES_S, warn.conflicts = F)
  XVARLLI <- names(DADES_S)
  for (XVARNOM in names(DADES_S)) {
    if (is.numeric(get(XVARNOM))) {
      DES_QUANTI(XVARNOM)
    } else if (is.factor(get(XVARNOM))) {
      DES_QUALI(XVARNOM)
    } else {
      cat("La variable", XVARNOM, "no és de cap dels tipus coneguts\n")
    }
  }  # end of the function
  detach(DADES_S)
}

##--
## DES_QUANTI: descriptives of numeric variables
##--
DES_QUANTI <- function(X) {
  OP <- par(no.readonly = TRUE)  # save old parameters
  par(mfrow=c(1,3))
  hist(get(X), main=c("Histograma de", X), xlab=X); rug(get(X))
  boxplot(get(X), main=c("Diagrama de caixa de", X),
Re: [R] NLOPTR
1. I have no idea.

2. However, I doubt that your strategy would work anyway. If there is not an outright error, you are probably stuck in some endless loop or are wandering around at random on an essentially flat hypersurface. You would need to change the convergence criteria, or change the parameterization of and/or simplify your model, before proceeding if that is the case. Although I would have thought you would have run up against some sort of maximum-iterations limit...

Cheers, Bert

Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Thu, Jul 16, 2015 at 10:11 AM, Olu Ola via R-help r-help@r-project.org wrote: Hello, I have been running a nonlinear GMM using the nloptr wrapper for the last 7 days. The maximum time I can run the code on the server I am using is 7 days, which expires in about an hour, when the server automatically terminates it. I would like to know if there is a way to obtain the estimates at the point of termination so that I can use them as the new starting values for another round of estimation. Thank you.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rgl 3d surface
-Original Message- I compute its regression surface doing polynomial regression (fit) ... fit <- lm(z ~ poly(x,2) + poly(y,2)). So I want to represent the surface. How could I do it? Any idea?

You need to write a function f of x and y that produces the fitted values. I haven't checked, but I'd assume it needs to take vector inputs and produce a vector of responses.

Perhaps worth noting that since the fit was produced by lm(), predict() will generate the surface values without a separate function. For example, if we said something like [sorry, not yet checked]:

xvals <- seq(-35, -15, 0.5)
yvals <- seq(-35, -15, 0.5)
new.data <- data.frame(x = rep(xvals, each = length(yvals)),
                       y = rep(yvals, length(xvals)))
new.data$z <- predict(fit, newdata = new.data)  # the predicted surface
# Then:
with(new.data, persp3d(x, y, z))  # ought to do the job

Then persp3d(f) will draw the surface. See ?persp3d.function for details on setting the x and y ranges, etc.

Duncan Murdoch

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and curtosis
Thank you!

On Thu, Jul 16, 2015 at 11:45 AM, Bert Gunter bgunter.4...@gmail.com wrote: rep() **does** do it, essentially in C!! See also the moments package, which I found instantly by googling "sample moments in R", though I don't know whether it does what you want (and probably shouldn't, anyway). Of course, sample skewness and kurtosis are basically useless, but that's another, off-topic, issue. Cheers, Bert

Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Thu, Jul 16, 2015 at 8:37 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal places and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package...

On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius, Alameda, CA, USA

-- Dimitri Liakhovitski

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and curtosis
On Jul 16, 2015, at 8:37 AM, Dimitri Liakhovitski wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal places and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package...

I saw Bert's comments but I don't think that `rep` handles fractional weights. Take a look at Hmisc::wtd.var (since I have that package loaded and can see that its code is easy to grasp). It also appears, after using sos::findFn("weighted moments"), that the package lmomco deserves your attention.

On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights?

-- David Winsemius, Alameda, CA, USA

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
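[Editor's note] For reference, the weighted moment-ratio estimators are short enough to hand-roll for fractional weights directly, with no rounding or rep() unrolling. This is a sketch, not any package's definition (the function name wmoments is mine; normalization and small-sample bias-correction conventions vary):

```r
# Weighted sample skewness and excess kurtosis from weighted central
# moments; the weights w may be fractional. These are the plain
# moment-ratio estimators, with no small-sample bias correction.
wmoments <- function(x, w) {
  w  <- w / sum(w)                 # normalize the weights
  mu <- sum(w * x)                 # weighted mean
  m2 <- sum(w * (x - mu)^2)        # weighted central moments
  m3 <- sum(w * (x - mu)^3)
  m4 <- sum(w * (x - mu)^4)
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# Fractional weights like those in the question:
wmoments(c(1, 2, 3, 10), w = c(0.7654, 1.2345, 1, 1))
```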
Re: [R] matrix manipulation -solved
Yes, it is obvious --- once someone else points it out. Thanks for the hint. Terry T.

On 07/16/2015 12:52 PM, Peter Langfelder wrote: Hi Terry, maybe I'm missing something, but why not define a matrix BB = V'B; then t(B) %*% V = t(BB), and your problem reduces to finding A such that t(BB) %*% A = 0? Peter

On Thu, Jul 16, 2015 at 10:28 AM, Therneau, Terry M., Ph.D. thern...@mayo.edu wrote: This is as much a mathematics as an R question, in the "this should be easy but I don't see it" category. Assume I have a full-rank p by p matrix V (aside: V = (X'X)^{-1} for a particular setup) and a p by k matrix B, and I want to complete an orthogonal basis for the space with distance function V. That is, find A such that t(B) %*% V %*% A = 0, where A has p rows and p-k columns. With V = identity this is easy: I can do it in 1-2 lines using qr(), lm(), or several other tools. A part of me is quite certain that the general problem isn't more than 3 lines of R, but after a day of beating my head on the issue I still don't see it. Math-wise it looks like a simple homework problem in a mid-level class, but I'm not currently sure that I'd pass said class. If someone could show the way I would be grateful. Either that, or assurance that the problem actually IS hard and I'm not as dense as I think. Terry T.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
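[Editor's note] Peter's hint does reduce to the 3 lines Terry hoped for. A sketch (my own construction, not posted in the thread): form BB = t(V) %*% B and take the trailing columns of the complete Q from its QR decomposition, which are orthogonal to the column space of BB and hence satisfy t(B) %*% V %*% A = 0.

```r
# Complete a V-orthogonal basis: find A (p x (p-k)) with t(B) %*% V %*% A == 0.
p <- 5; k <- 2
set.seed(1)
X <- matrix(rnorm(20 * p), 20, p)
V <- solve(crossprod(X))              # V = (X'X)^{-1}, full-rank p x p
B <- matrix(rnorm(p * k), p, k)

BB <- crossprod(V, B)                 # t(V) %*% B, per Peter's hint
A  <- qr.Q(qr(BB), complete = TRUE)[, (k + 1):p, drop = FALSE]
max(abs(t(B) %*% V %*% A))            # effectively zero
```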
Re: [R] User defined function within a formula
Read about the 'makepredictcall' generic function. There is a method, makepredictcall.poly(), for poly() that attaches the polynomial coefficients used during the fitting procedure to the call to poly() that predict() makes. You ought to supply a similar method for your xpoly(), and xpoly() needs to return an object of a new class that will cause that method to be used. E.g.,

xpoly <- function(x, degree = 1, ...) {
    ret <- poly(x, degree = degree, ...)
    class(ret) <- "xpoly"
    ret
}
makepredictcall.xpoly <- function(var, call) {
    if (is.call(call)) {
        if (identical(call[[1]], quote(xpoly))) {
            if (!is.null(tmp <- attr(var, "coefs"))) {
                call$coefs <- tmp
            }
        }
    }
    call
}

g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
#             1              2              3              4              5
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
#             6              7              8              9
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608

You can see the effects of makepredictcall() in the 'terms' component of glm's output. The 'variables' attribute of it gives the original function calls and the 'predvars' attribute gives the calls to be used for prediction:

attr(g2$terms, "variables")
# list(lot1, log(u), xpoly(u, 1))
attr(g2$terms, "predvars")
# list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1, 9, 8850

Bill Dunlap TIBCO Software wdunlap tibco.com

On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin yinkuns...@gmail.com wrote: Hello, I have a question about formulas and user-defined functions. I can do the following:

###Case 1:
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
dc <- clotting
dc$u <- 1
predict(g1, dc)
#           1           2           3           4           5           6           7           8           9
# -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

However, if I simply wrap poly as a user-defined function (in reality I would have my own more complex function)
then I will get an error:

###Case 2:
xpoly <- function(x, degree = 1) poly(x, degree)
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
# Error in poly(x, degree) : 'degree' must be less than number of unique points

It seems that predict always treats a user-defined function in the formula like I(). My question is: how can I get the results for Case 2 to be the same as Case 1? Does anyone have any idea about this? Thank you very much. Alex

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] User defined function within a formula
Hello, I have a question about formulas and user-defined functions. I can do the following:

###Case 1:
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
dc <- clotting
dc$u <- 1
predict(g1, dc)
#           1           2           3           4           5           6           7           8           9
# -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

However, if I simply wrap poly as a user-defined function (in reality I would have my own more complex function), then I get an error:

###Case 2:
xpoly <- function(x, degree = 1) poly(x, degree)
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
# Error in poly(x, degree) : 'degree' must be less than number of unique points

It seems that predict always treats a user-defined function in the formula like I(). My question is: how can I get the results for Case 2 to be the same as Case 1? Does anyone have any idea about this? Thank you very much. Alex

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Adjusting Probability for Oversampling
I am developing a marketing (churn) model that has an event rate of 0.5%, so I thought to perform oversampling; that is, making the number of events equal to the number of non-events by reducing the non-events (50-50 after sampling). After oversampling we need to adjust the predicted probabilities, as oversampling inflates the intercept. I always do this in logistic regression. Is the same adjustment required for decision trees, random forests, or other ensemble techniques? I am following Applied Predictive Modeling with R (the caret package). The author used the same technique (down-sampling) but did not adjust the predicted probabilities when scoring the validation and test data. Any help would be highly appreciated! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
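[Editor's note] For the logistic-regression case the poster already handles, the usual prior correction can be sketched as below (a hand-rolled illustration, not caret code; the function name adjust_prob is mine, and tau/ybar come from the question's 0.5% base rate and 50-50 sample). Whether tree or ensemble scores warrant the same rescaling is exactly the open question.

```r
# Prior correction: map a probability p predicted on the artificially
# balanced sample back to the original base rate.
#   tau  = true event rate in the population (0.5% here)
#   ybar = event rate in the down-sampled training data (50% here)
adjust_prob <- function(p, tau = 0.005, ybar = 0.5) {
  num <- p * tau / ybar
  num / (num + (1 - p) * (1 - tau) / (1 - ybar))
}

adjust_prob(0.5)  # 0.005: a 50% score on balanced data is just the base rate
```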
Re: [R] matching strings in a list
On Thu, Jul 16, 2015 at 1:00 PM, John McKown john.archie.mck...@gmail.com wrote: On Thu, Jul 16, 2015 at 12:40 PM, tryingtolearn inshi...@ymail.com wrote: Say I have a list: [[1]] "I like google" [[2]] "Hi Google google" [[3]] "what's up" and they are tweets. And I want to find out how many tweets mention google (the answer should be 2). If I string-split and unlist them, then I would get the answer of 3. How do I make sure I get just 2?

NROW(grep("google", list, ignore.case=TRUE))

Or

sum(grepl("google", x, ignore.case=TRUE))

-- Schrodinger's backup: The condition of any backup is unknown until a restore is attempted. Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be. He's about as useful as a wax frying pan. 10 to the 12th power microphones = 1 Megaphone Maranatha! John McKown

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
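[Editor's note] A quick check of the two one-liners, using the three tweets from the question (the object name tweets is assumed). The key point is that grep()/grepl() return one result per tweet, so repeated matches within a tweet are counted once:

```r
tweets <- list("I like google", "Hi Google google", "what's up")

# grepl gives one TRUE/FALSE per tweet, so summing counts tweets,
# not occurrences of the word:
sum(grepl("google", tweets, ignore.case = TRUE))    # 2

# the same count via grep(), which returns the matching indices:
length(grep("google", tweets, ignore.case = TRUE))  # 2
```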
Re: [R] Speeding up code?
Thanks Jim! This makes a huge difference. Can you explain why data frames are slower than a matrix? Any other suggestions on how to improve the code would be greatly appreciated. Thanks again! Ignacio

On Thu, Jul 16, 2015 at 1:42 PM jim holtman jholt...@gmail.com wrote: Actually, looking at the result, you don't need the transpose; that was an artifact of how you were doing it before.

xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to a data frame; no transpose needed
separoPairs <- as.data.frame(xm, stringsAsFactors = FALSE)

Jim Holtman Data Munger Guru "What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it."

On Thu, Jul 16, 2015 at 1:37 PM, jim holtman jholt...@gmail.com wrote: Here is one improvement. Avoid data frames in some of these cases. This creates a character matrix and then converts to a data frame after doing the transpose of the matrix. The whole thing takes less than 10 seconds on my system:

library(stringr)
# create a character matrix; avoid data frames in this case
print(proc.time())
#    user  system elapsed
#   15.52    5.24  587.70
xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to a data frame; transpose the matrix, not the data frame
separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
print(proc.time())
#    user  system elapsed
#   20.90    5.36  596.57

Jim Holtman Data Munger Guru "What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it."

On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez ignaci...@gmail.com wrote: Hi Collin, The objective of the gen.names function is to generate N *unique* random names, where N is a *large* number. On my computer `gen.names(n = 5)` takes under a second, so it is probably not the root problem in my code. That said, I would love to improve it. I'm not exactly sure how you propose to change it using sample. What is the object that I would be sampling?
I would love to run a little benchmark to compare my version with yours. What really takes a long time to run is:

separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))

So that, and the chunk of code before it, is probably where I would get big gains in speed. Sadly, I have no clue how to do it differently. Thanks a lot for the help!! Ignacio

On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch cfly...@ncsu.edu wrote: Hi Ignacio, If I am reading your code correctly, the top while loop is essentially seeking to select a random set of names from the original set, then using unique to reduce it down; you then iterate until you have built your quota. Ultimately this results in a very inefficient attempt at sampling without replacement. Why not just sample without replacement rather than looping iteratively and using unique? Or, if the set of possible names is short enough, why not just randomize it and pull the first n items off? Best, Collin.

On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez ignaci...@gmail.com wrote: Hi R-Help! I'm hoping that some of you may give me some tips that could make my code more efficient. More precisely, I would like to make the answer to my stackoverflow http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions question more efficient.
This is the code:

library(dplyr)
library(randomNames)
library(geosphere)

set.seed(7142015)  # Define parameters
n.Schools <- 20
first.grade <- 3
last.grade <- 5
n.Grades <- last.grade - first.grade + 1
n.Classrooms <- 20  # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools * n.Grades * n.Classrooms) / 2  # two classrooms per teacher

# Define random-names function:
gen.names <- function(n, which.names = "both", name.order = "last.first") {
  names <- unique(randomNames(n = n, which.names = which.names,
                              name.order = name.order))
  need <- n - length(names)
  while (need > 0) {
    names <- unique(c(randomNames(n = need, which.names = which.names,
                                  name.order = name.order), names))
    need <- n - length(names)
  }
  return(names)
}

# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <- paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <- data.frame(School.ID, School.lat, School.long, School.RE) %>%
    mutate(School.ID = as.character(School.ID)) %>%
    rowwise() %>%
    mutate(School.distance = distHaversine(
      p1 = c(School.long, School.lat),
      p2 = c(21.7672, 58.8471),
      r = 3961
    ))
  return(Schools)
}

Schools <- gen.schools(n.schools = n.Schools)
#
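[Editor's note] On the data-frame-vs-matrix question raised earlier in this thread: a matrix is a single atomic vector with a dim attribute, while a data frame is a list of columns carrying row names and S3 method dispatch, so an operation like t() on a data frame must first coerce it to a matrix. A rough illustration (my own toy sizes, not the poster's data):

```r
# A character matrix vs. the same data as a data frame.
m  <- matrix("x", nrow = 1e4, ncol = 10)
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Transposing the matrix touches one contiguous vector; transposing the
# data frame has to rebuild a matrix from the list of columns first.
system.time(for (i in 1:100) t(m))   # cheap
system.time(for (i in 1:100) t(df))  # pays the coercion cost every time
```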
Re: [R] (ordinal) logistic regression
Hi, your example is not reproducible. With ordinal regression the type of the y values is important; sometimes an ordered factor is required. Which ordinal regression to use depends on your hypothesis; see Ananth and Kleinbaum 1997. Functions/packages to look at, apart from the ordinal package: VGAM, MASS::polr, arm::bayespolr, rms::lrm. If you want to do GEE, that is another matter. Regards, Duncan

Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au

-Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Wolfgang Raffelsberger Sent: Friday, 17 July 2015 03:41 To: r-help@r-project.org Subject: [R] (ordinal) logistic regression

Dear list, I've been looking at previous posts on the list, but I haven't found any close enough to my question/problem. My data can be seen as a matrix of multiple individuals (columns) with (rather independent) measures (lines). Based on supplemental information, the individuals are organized in (multiple) ordered classes. Now I would like to test for which type of measure (ie which line from my matrix of data) the groups are distinct (eg differ by group mean). In other words, I would like to see in which line of my input matrix the measures for the groups of individuals associate with distinct group values. So I tried glm (or glm2) on each line of the matrix, but in my toy example (shown below) I'm surprised to get warnings about non-convergence in the nice toy cases (ie distinct groups, which is what I am looking for), even with glm2! I see that in such nice cases with glm() the Pr(>|z|) is close to 1, which at first sight is OK (since H0: coefficient = 0), but I suppose the test is not really set up right this way. When trying lrm (rms package) I even get an error message ("Unable to fit model using lrm.fit") with the nice cases.
In my real data with 4000 lines of data (ie 4000 glm tests), multiple-testing correction would transform everything from 1-p to end up at p=1, so that's another problem with this approach. I suppose somehow I should transform my data (I don't see where I would change the H0?) to obtain low and NOT high p-values (example below) in the case I'm looking for, ie when group means are distinct. Any suggestions? Thanks in advance, Wolfgang

Here is my toy example:

datB1 <- c(12,14:16,18:21,20:22,20,22:24,19.5) # fits partially/overlapping the 3-group model
datB2 <- c(11:15,21:25,31:36)   # too beautiful to be real ...
datB3 <- c(15:12,11:15,12:14,15:12) # no fit to the 3-group model
datB4 <- c(11:15,15:11,21:26)   # grpA == grpB but grpA != grpC
datB <- rbind(datB1,datB2,datB3,datB4)
set.seed(2015)
datB <- datB + round(runif(length(datB),-0.3,0.3),1) # add some noise
datB <- datB - rowMeans(datB) # centering
## here the definition of the groups
grpB <- gl(3,6,labels=LETTERS[1:3])[c(-6,-7)]
table(grpB)
## display
layout(1:4)
for(i in 1:4) plot(datB[i,],as.numeric(grpB))
## now the 'test'
glmLi <- function(dat,grp) { ## run glm: predict grp based on dat
  dat <- data.frame(dat=dat,grp=grp)
  glm(grp ~ dat, data=dat, family=binomial)}
logitB <- apply(datB,1,glmLi,grpB)
lapply(logitB,summary)
sapply(logitB,function(x) summary(x)$coefficients[,4])
# lines 1 and 2 are designed to be 'positive' but give high p-values with convergence problems

## for completeness
sessionInfo()
R version 3.2.0 (2015-04-16) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 [4] LC_NUMERIC=C LC_TIME=French_France.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] glm2_1.1.2 MASS_7.3-40 TinnRcom_1.0.18 formatR_1.2 svSocket_0.9-57 loaded via a namespace (and not attached): [1] tools_3.2.0 svMisc_0.9-70 tcltk_3.2.0 [[alternative
HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
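[Editor's note] Picking up Duncan's pointer to MASS::polr, a minimal sketch of an ordered-factor fit on made-up overlapping group data (not Wolfgang's exact toy matrix). polr requires the response to be an ordered factor, and, just as with glm, perfectly separated groups (like datB2) can still push the MLE toward infinity, so the toy data here overlap at the group boundaries:

```r
library(MASS)  # polr(); part of the standard R distribution

# Made-up measures for 3 ordered groups, with overlap at the boundaries:
dat <- c(12, 14, 15, 16, 13,        # group A
         14, 16, 17, 18, 15,        # group B
         17, 19, 20, 21, 22, 18)    # group C
grp <- factor(rep(LETTERS[1:3], c(5, 5, 6)), ordered = TRUE)

fit  <- polr(grp ~ dat, data = data.frame(dat, grp), Hess = TRUE)
ctab <- coef(summary(fit))  # Value / Std. Error / t value per term
# Approximate two-sided p-value for the slope from the t value:
2 * pnorm(abs(ctab["dat", "t value"]), lower.tail = FALSE)
```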
Re: [R] [FORGED] Re: Weighted skewness and curtosis
On 17/07/15 03:45, Bert Gunter wrote: SNIP Of course, sample skewness and kurtosis are basically useless, but that's another, off topic, issue. Fortune nomination! cheers, Rolf -- Technical Editor ANZJS Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] powerTransform warning message?
Hello, I have a series of 40 variables that I am trying to transform via the Box-Cox method using the powerTransform function in R. I have no zero values in any of my variables. When I run powerTransform on the full data set I get the following warning:

Warning message: In sqrt(diag(solve(res$hessian))) : NaNs produced

However, when I analyze the variables in groups rather than all 40 at a time, I do not get this warning. Why would this be? And does it mean the warning is safe to ignore? I would add that all of my lambda values are in the -5 to 5 range. I also get different lambda values when I analyze the variables together versus in groups. Is this to be expected? Thank you so much! Brittany [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] User defined function within a formula
OPoly <- function(x, degree = 1, weight = 1) {
  weight <- round(weight, 0)  # weights need to be integers
  if (length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4 * (rep(x, weight) - mean(range(x))) / diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"
  Z
}

You need to give OPoly optional argument(s) that carry the original-regressor-dependent information into OPoly, and then have it return, as attributes, the values of those arguments. makepredictcall will take the attributes and attach them to the call in predvars, so predict uses values derived from the original regressors, not values derived from the data being predicted from. Take a look at a pair like makepredictcall.scale() and scale() for an example: scale has optional arguments 'center' and 'scale' that it returns as attributes, and makepredictcall.scale adds those to the call to scale that it is given. Thus when you predict, the scale and center arguments come from the original data, not from the data you are predicting from.

Bill Dunlap TIBCO Software wdunlap tibco.com

On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin yinkuns...@gmail.com wrote: Thanks Bill for your quick reply. I tried your solution and it did work for the simple user-defined function xpoly. But when I try another function it gives me an error again:

OPoly <- function(x, degree = 1, weight = 1) {
  weight <- round(weight, 0)  # weights need to be integers
  if (length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4 * (rep(x, weight) - mean(range(x))) / diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"
  Z
}

## This OPoly mimics a FORTRAN orthogonal-polynomial routine: it first maps x to the range [-2, 2], then does QR, then returns the results scaled by sqrt(norm2). Compared with poly, this transformation keeps the model coefficients in a range similar to the other variables, whereas the R poly routine will usually give you very large coefficients.
I did not find such a routine in R, so I had to define it as a user-defined function.

### I also have the following function, as you suggested:
makepredictcall.OPoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
    }
  }
  call
}

But I still get an error for the following:

g3 <- glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
predict(g3, dc)
# Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
#   missing values are not allowed in 'poly'

I thought it might be due to the /diff(range(x)) in the function, but even if I remove that part it still gives me an error. Any idea? Many thanks in advance. Alex

On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap wdun...@tibco.com wrote: Read about the 'makepredictcall' generic function. There is a method, makepredictcall.poly(), for poly() that attaches the polynomial coefficients used during the fitting procedure to the call to poly() that predict() makes. You ought to supply a similar method for your xpoly(), and xpoly() needs to return an object of a new class that will cause that method to be used. E.g.,

xpoly <- function(x, degree = 1, ...) {
    ret <- poly(x, degree = degree, ...)
    class(ret) <- "xpoly"
    ret
}
makepredictcall.xpoly <- function(var, call) {
    if (is.call(call)) {
        if (identical(call[[1]], quote(xpoly))) {
            if (!is.null(tmp <- attr(var, "coefs"))) {
                call$coefs <- tmp
            }
        }
    }
    call
}

g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
#             1              2              3              4              5
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
#             6              7              8              9
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608

You can see the effects of makepredictcall() in the 'terms' component of glm's output.
The 'variables' attribute of it gives the original function calls and the 'predvars' attribute gives the calls to be used for prediction:

attr(g2$terms, "variables")
# list(lot1, log(u), xpoly(u, 1))
attr(g2$terms, "predvars")
# list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1, 9, 8850

Bill Dunlap TIBCO Software wdunlap tibco.com

On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin yinkuns...@gmail.com wrote: Hello, I have a question about formulas and user-defined functions. I can do the following:

###Case 1:
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
dc <- clotting
dc$u <- 1
predict(g1, dc) 1 2 3 4 5 6 7