Re: [R] Error in eval(expr, envir, enclos) : could not find function

2015-09-22 Thread Mark Sharp
Please provide a context for your question. See the posting guide referenced 
below for instructions on providing commented, minimal, self-contained, 
reproducible code. If you can show how to produce the error, someone can almost 
certainly show you how to avoid it.

Mark
R. Mark Sharp, Ph.D.
msh...@txbiomed.org





> On Sep 22, 2015, at 2:07 PM, Alaa Sindi  wrote:
> 
> hi all
> 
> I am getting this error "Error in eval(expr, envir, enclos) : could not find
> function"
> 
> Do you have an idea what might cause this problem?
> 
> thanks

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-es] Excel vs. R

2015-09-22 Thread daniel
Grouping your data, and without needing any extra package, you can do something
like the following, which I think may also work for you.

http://stats.stackexchange.com/questions/14118/drawing-multiple-barplots-on-a-graph-in-r

mydata <- data.frame(Barplot1 = rbinom(5, 16, 0.6), Barplot2 = rbinom(5, 16, 0.25),
                     Barplot3 = rbinom(5, 5, 0.25), Barplot4 = rbinom(5, 16, 0.7))
barplot(as.matrix(mydata), main = "Interesting", ylab = "Total", beside = TRUE,
        col = terrain.colors(5))
legend(13, 12, c("Label1", "Label2", "Label3", "Label4", "Label5"), cex = 0.6,
       fill = terrain.colors(5))


Daniel Merino

On 22 September 2015 at 16:48, pepeceb  wrote:

> Take a look at this about bar plots:
>
> Quick-R: Bar Plots
> Bar Plots Create barplots with the barplot(height) function, where height
> is a vector or matrix. If height is a vector, the values determine the
> heights of the bar...
> See it at www.statmethods.net
>
> If what you want is to split into intervals, you can do something like
> this; see whether it works for you.
> Let z1 be your matrix:
>
> z1$rango <- (cut(z1$tamaño, breaks = 3, dig.lab = 2)) # We add a variable
> rango with 3 divisions of the range of sizes. It sets the intervals by
> default, but you can also set them yourself.
>
> table(z1$estaciones, z1$rango) # a small table with the number of stations
> per range and size
> # Bar chart
> barplot(table(z1$estaciones, z1$rango)) # but let's improve it a bit
>
> # we create a legend
> leyenda <- c("Gualeguycito", "(Itapebí]", "(Cañada]")
>
> barplot(table(z1$tamaño, z1$rango),
> main = "Promedio de tamaños...", ylab = "Tamaño",
>
> beside = T, legend.text = leyenda, args.legend = list(x = "topleft"))
>
> Regards
>
>
>
> On Tuesday, 22 September 2015 at 21:20, Susana deus alvarez <
> susanadeus.deusalva...@gmail.com> wrote:
>
>
> Hi, I am writing because I have a big question: how can charts that are so
> easy in Excel be so hard in R?
> I cannot manage to make a chart like this in R. That is, how can I split
> by site and by size? In a huge spreadsheet, the first column holds the
> stations (but I only want the last three), followed by many parameters,
> with the weighted averages in columns 46, 53 and 60. And I simply cannot
> do it in R. I have tried creating smaller Excel files with just that, but
> there is no way. If someone could help me a little...
>
> Thanks
>
> [image: Inline images 1]
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>
>
>


-- 
Daniel
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread peter dalgaard
Marc,

I don't think copyright/intellectual-property issues factor into this. Urkund
and similar tools are, to my knowledge, entirely about plagiarism. So the issue
would seem to be that the R output is considered identical, or nearly identical,
to R output in other published or otherwise submitted material.

What puzzles me (except for how a document can be deemed 32% plagiarized in 25% 
of the text) is whether this includes the numbers and variable names. If those 
are somehow factored out, then any R regression could be pretty much identical 
to any other R regression. However, two analyses with similar variable names 
could happen if they are based on the same cookbook recipe and analyses with 
similar numerical output come from analyzing the same standard data. Such 
situations would not necessarily be considered plagiarism (I mean: If you claim 
that you are analyzing data from experiments that you yourself have performed, 
and your numbers are exactly identical to something that has been previously 
published, then it would be suspect. If you analyze something from public 
sources, someone else might well have done the same thing.). 

Like John Kane, I think it is necessary to know exactly what sources the text
is claimed to be plagiarized from and/or what parts of the text are being
matched by Urkund. If it turns out that Urkund is generating false positives,
then this needs to be pointed out to them and to the people basing decisions
on it.

-pd

> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
> 
> Hi,
> 
> With the usual caveat that I Am Not A Lawyer and that I am not speaking on
> behalf of any organization...
> 
> My guess is that they are claiming that the output of R, having simply been
> copied and pasted verbatim into your thesis, constitutes the use of
> copyrighted output from the software.
> 
> It is not clear to me that R's output is copyrighted by the R Foundation (or
> by other parties for CRAN packages), albeit the source code underlying R is,
> along with other copyright owners' as appropriate. There is some case law to
> support the notion that the output alone is not protected in a similar
> manner, but that may be country specific.
> 
> Did you provide any credit to R (see the output of citation() ) in your 
> thesis and indicate that your analyses were performed using R?
> 
> If R is uncredited, I could see them raising the issue.
> 
> You might check with your institution's legal/policy folks to see if there is 
> any guidance provided for students regarding the crediting of software used 
> in this manner, especially if that guidance is at no cost to you.
> 
> Regards,
> 
> Marc Schwartz
> 
> 
>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>> 
>> 1. It is highly unlikely that we could be of help (unless someone else
>> has experienced this and knows what happened). You will have to
>> contact the Urkund people and ask them why their algorithms raised the
>> flags.
>> 
>> 2. But of course, the regression methodology is not "your own" -- it's
>> just a standard tool that you used in your work, which is entirely
>> legitimate of course.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>  -- Clifford Stoll
>> 
>> 
>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>  wrote:
>>> 
>>> Dear 'R' community support,
>>> 
>>> 
>>> I am a student at Skema business school and I have recently submitted my 
>>> MSc thesis/dissertation. This has been passed on to an external plagiarism 
>>> service provider, Urkund, who have scanned my document and returned a 
>>> plagiarism report to my professor having detected 32% plagiarism.
>>> 
>>> 
>>> I have contacted Urkund regarding this issue having committed no such 
>>> plagiarism and they have told me that all the plagiarism detected in my 
>>> document comes from the last 25% which consists only of 'R' regressions 
>>> like the one I have pasted below:
>>> 
>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>   Fed.t.4., data = OLS_CAR, x = TRUE)
>>> 
>>> Residuals:
>>>       Min        1Q    Median        3Q       Max
>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>> 
>>> Coefficients:
>>>Estimate Std. Error t value Pr(>|t|)
>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>> Fed -0.121595   0.165359  -0.735   0.4627
>>> Fed.t.1. 0.344014   0.140979   2.440   0.0153 *
>>> Fed.t.2. 0.026529   0.143648   0.185   0.8536
>>> Fed.t.3. 0.622357   0.142021   4.382 1.62e-05 ***
>>> Fed.t.4. 0.291985   0.158914   1.837   0.0671 .
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>> 
>>> Residual standard error: 0.0293 on 304 degrees of freedom
>>> (20 observations deleted due to missingness)
>>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>>> 

[R-es] Excel vs. R

2015-09-22 Thread Susana deus alvarez
Hi, I am writing because I have a big question: how can charts that are so
easy in Excel be so hard in R?
I cannot manage to make a chart like this in R. That is, how can I split by
site and by size? In a huge spreadsheet, the first column holds the stations
(but I only want the last three), followed by many parameters, with the
weighted averages in columns 46, 53 and 60. And I simply cannot do it in R. I
have tried creating smaller Excel files with just that, but there is no way.
If someone could help me a little...

Thanks

[image: Inline images 1]
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] How to coerce a parameter in nls?

2015-09-22 Thread Jianling Fan
Great, thanks a lot!

On 22 September 2015 at 12:07, Gabor Grothendieck
 wrote:
> You may have to do without masking and switch back to nls.  dproot2 and fo
> are from prior post.
>
> # to mask Rm6 omit it from start and set it explicitly
> st <- c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01, d50=20, c=-1)
> Rm6 <- 1
>
> fm.nls <- nls(fo, dproot2, start = st)
>
> AIC(fm.nls)
> summary(fm.nls)
>
>
> On Tue, Sep 22, 2015 at 12:46 PM, Jianling Fan 
> wrote:
>>
>> Hello Prof. Nash,
>>
>> My regression works well now. But I found another problem when using
>> nlxb. In the output, the SE, t-stat, and p-value are not available.
>> Furthermore, I can't extract the AIC from the output. The output looks
>> like this:
>>
>> Do you have any suggestion for this?
>>
>> Thanks a lot!
>>
>> Regards,
>>
>> nlmrt class object: x
>> residual sumsquares =  0.29371  on  33 observations
>> after 9 Jacobian and 10 function evaluations
>>   name      coeff   SE   tstat   pval     gradient   JSingval
>>   Rm1      1.1162   NA      NA     NA   -3.059e-13      2.745
>>   Rm2     1.56072   NA      NA     NA    1.417e-13       1.76
>>   Rm3     1.09775   NA      NA     NA   -3.179e-13      1.748
>>   Rm4     7.18377   NA      NA     NA   -2.941e-12      1.748
>>   Rm5     1.13562   NA      NA     NA   -3.305e-13      1.076
>>   Rm6       1 M     NA      NA     NA            0      0.603
>>   d50     22.4803   NA      NA     NA    4.975e-13      0.117
>>   c      -1.64075   NA      NA     NA     4.12e-12  1.908e-17
>>
>>
>>
>> On 21 September 2015 at 13:38, ProfJCNash  wrote:
>> > I've not used it for group data, and suspect that the code to generate
>> > derivatives cannot cope with the bracket syntax. If you can rewrite the
>> > equation without the brackets, you could get the derivatives and solve
>> > that
>> > way. This will probably mean having a "translation" routine to glue
>> > things
>> > together.
>> >
>> > JN
>> >
>> >
>> > On 15-09-21 12:22 PM, Jianling Fan wrote:
>> >>
>> >> Thanks Prof. Nash,
>> >>
>> >> Sorry for the late reply. I have been learning and trying to use your
>> >> nlmrt package since I got your email. It works well for masking a
>> >> parameter in a regression, but it seems not to work for my equation. I
>> >> think the problem is that the parameter I want to mask is a
>> >> group-specific parameter, and I have a "[]" syntax in my equation.
>> >> However, I don't have your 2014 book on hand and couldn't find it in
>> >> our library. So I am wondering: does nlxb work for grouped data?
>> >> Thanks a lot!
>> >>
>> >> Following is my code and the error I got from it.
>> >>
>> >>> fitdp1 <- nlxb(den ~ Rm[ref]/(1+(depth/d50)^c), data = dproot,
>> >>> +   start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
>> >>> +             Rm5=1.01, Rm6=1, d50=20, c=-1),
>> >>> +   masked = c("Rm6"))
>> >>
>> >> Error in deriv.default(parse(text = resexp), names(start)) :
>> >>Function '`[`' is not in the derivatives table
>> >>
>> >>
>> >> Best regards,
>> >>
>> >> Jianling
>> >>
>> >>
>> >> On 20 September 2015 at 12:56, ProfJCNash  wrote:
>> >>>
>> >>> I posted a suggestion to use nlmrt package (function nlxb to be
>> >>> precise),
>> >>> which has masked (fixed) parameters. Examples in my 2014 book on
>> >>> Nonlinear
>> >>> parameter optimization with R tools. However, I'm travelling just now,
>> >>> or
>> >>> would consider giving this a try.
>> >>>
>> >>> JN
>> >>>
>> >>>
>> >>> On 15-09-20 01:19 PM, Jianling Fan wrote:
>> 
>> 
>>  No, I am doing a regression on 6 groups of data with 2 shared parameters
>>  and 1 group-specific parameter. The parameter I want to coerce is for
>>  one group. I don't know how to do it. Any suggestion?
>> 
>>  Thanks!
>> 
>>  On 19 September 2015 at 13:33, Jeff Newmiller
>>  
>>  wrote:
>> >
>> >
>> > Why not rewrite the function so that value is not a parameter?
>> >
>> >
>> >
>> > ---
>> > Jeff Newmiller, Research Engineer (Solar/Batteries,
>> > Software/Embedded Controllers)
>> > ---
>> > Sent from my phone. Please excuse my

Re: [R] unixtime conversion

2015-09-22 Thread jim holtman
you can also do:

> structure(1183377301, class = c("POSIXct", "POSIXt"))
[1] "2007-07-02 07:55:01 EDT"
>
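The same conversion is more commonly written with as.POSIXct; a small sketch using the timestamp from the original question (the UTC time zone is chosen here just to make the result reproducible):

```r
# Unix time is seconds since 1970-01-01 00:00:00 UTC; as.POSIXct converts it
ts <- 1183377301
dt <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

format(dt)      # "2007-07-02 11:55:01" (the same instant as 07:55:01 EDT)
as.numeric(dt)  # back to the original number: 1183377301
```

Omitting `tz` gives the same instant rendered in the local time zone, which is how the EDT result above was produced.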




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Sep 22, 2015 at 8:01 AM, JonyGreen  wrote:

> You can try this free online timestamp converter  to convert a timestamp to
> a readable date.
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/unixtime-conversion-tp829898p4712599.html
> Sent from the R help mailing list archive at Nabble.com.
>
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R-es] Excel vs. R

2015-09-22 Thread Susana deus alvarez
Yes, the thing is that, since I already decided to make the charts in R, it
would not look tidy to make one of them in Excel.
The data are 3 values per site; the reduced spreadsheet would look like this:

Estacion  Mayor 150  Entre 150 y 50  Entre 50 y 23
GUA              49              60             12
ITA              37              41             19
CVA              37              83             11
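That reduced table can be drawn as grouped bars directly in base R; a minimal sketch, with the numbers from the table above typed in by hand:

```r
# One column per station, one row per size class
datos <- matrix(c(49, 60, 12,    # GUA
                  37, 41, 19,    # ITA
                  37, 83, 11),   # CVA
                nrow = 3,
                dimnames = list(c("Mayor 150", "Entre 150 y 50", "Entre 50 y 23"),
                                c("GUA", "ITA", "CVA")))

# beside = TRUE gives one group of three bars per station
barplot(datos, beside = TRUE, legend.text = rownames(datos),
        args.legend = list(x = "topleft"), ylab = "Promedio ponderado")
```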

On 22 September 2015 at 16:33, daniel  wrote:

> Before anything else, you can find documentation on the ggplot2 package,
> which is the one I use for charts like the one you want to make, although
> there are other packages that may be useful to you:
>
> http://docs.ggplot2.org/current/
>
> In addition to what Carlos said, here are some other examples (I am adding
> them because you did not give us data that would let us see better what
> you are looking for):
>
> http://stackoverflow.com/questions/18158461/grouped-bar-plot-in-ggplot
>
> http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
>
>
> https://martinsbioblogg.wordpress.com/2014/03/19/using-r-barplot-with-ggplot2/
>
> http://stackoverflow.com/questions/17303573/ggplot-multiple-grouping-bar
>
>
> Finally, if Excel works well for you, use it; if you are looking for
> something better, you can always find it in R.
>
> Daniel Merino
>
>
>
>
> On 22 September 2015 at 15:57, Carlos J. Gil Bellosta <
> c...@datanalytics.com> wrote:
>
>> Hello, how is it going?
>>
>> What you want is something similar to what is published (with code) at
>>
>>
>> http://stackoverflow.com/questions/18624394/ggplot-bar-plot-with-facet-dependent-order-of-categories
>>
>> That is: ggplot2 with facets (by station). You will probably have to
>> pivot your data so that you end up with a data set with three columns:
>>
>> 1) the station
>> 2) the variable label
>> 3) the value
>>
>> That is done, among other tools, with melt (from reshape2).
>>
>> Best regards,
>>
>> Carlos J. Gil Bellosta
>> http://www.datanalytics.com
>>
>> On 22 September 2015 at 20:40, Susana deus alvarez <
>> susanadeus.deusalva...@gmail.com> wrote:
>>
>>> Hi, I am writing because I have a big question: how can charts that are
>>> so easy in Excel be so hard in R?
>>> I cannot manage to make a chart like this in R. That is, how can I split
>>> by site and by size? In a huge spreadsheet, the first column holds the
>>> stations (but I only want the last three), followed by many parameters,
>>> with the weighted averages in columns 46, 53 and 60. And I simply cannot
>>> do it in R. I have tried creating smaller Excel files with just that,
>>> but there is no way. If someone could help me a little...
>>>
>>> Thanks
>>>
>>> [image: Inline images 1]
>>>
>>
>
>
> --
> Daniel
>
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R-es] Excel vs. R

2015-09-22 Thread pepeceb
Take a look at this about bar plots:

Quick-R: Bar Plots
Bar Plots Create barplots with the barplot(height) function, where height
is a vector or matrix. If height is a vector, the values determine the
heights of the bar...
See it at www.statmethods.net

If what you want is to split into intervals, you can do something like this;
see whether it works for you. Let z1 be your matrix:

z1$rango <- (cut(z1$tamaño, breaks = 3, dig.lab = 2)) # We add a variable rango
# with 3 divisions of the range of sizes. It sets the intervals by default,
# but you can also set them yourself.

table(z1$estaciones, z1$rango) # a small table with the number of stations
# per range and size
# Bar chart
barplot(table(z1$estaciones, z1$rango)) # but let's improve it a bit

# we create a legend
leyenda <- c("Gualeguycito", "(Itapebí]", "(Cañada]")

barplot(table(z1$tamaño, z1$rango),
        main = "Promedio de tamaños...", ylab = "Tamaño",
        beside = T, legend.text = leyenda, args.legend = list(x = "topleft"))

Regards
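The cut() call above can be tried on a small made-up vector to see what the three automatic intervals look like (the sizes below are invented for illustration, standing in for z1$tamaño):

```r
# Invented sizes, standing in for z1$tamaño
tamanos <- c(160, 152, 140, 75, 60, 48, 30, 23)

# Three equal-width intervals over the observed range, labels to 2 digits
rango <- cut(tamanos, breaks = 3, dig.lab = 2)

nlevels(rango)  # 3
table(rango)    # how many sizes fall in each interval
```

Because cut() slightly extends the range when `breaks` is a single number, every value lands in an interval and none comes back NA.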


On Tuesday, 22 September 2015 at 21:20, Susana deus alvarez
 wrote:

Hi, I am writing because I have a big question: how can charts that are so
easy in Excel be so hard in R?
I cannot manage to make a chart like this in R. That is, how can I split by
site and by size? In a huge spreadsheet, the first column holds the stations
(but I only want the last three), followed by many parameters, with the
weighted averages in columns 46, 53 and 60. And I simply cannot do it in R.
I have tried creating smaller Excel files with just that, but there is no
way. If someone could help me a little...

Thanks


___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es



Re: [R-es] Excel vs. R

2015-09-22 Thread daniel
From what I can see, I think the cookbook example should work for you (look
at the last charts). For next time you send data, keep in mind the dput()
function, or look at how the cookbook defines the data.frame datn. That way
you will have a better chance of getting good answers.

Daniel Merino

On 22 September 2015 at 16:36, Susana deus alvarez <
susanadeus.deusalva...@gmail.com> wrote:

> Yes, the thing is that, since I already decided to make the charts in R,
> it would not look tidy to make one of them in Excel.
> The data are 3 values per site; the reduced spreadsheet would look like this:
>
> Estacion  Mayor 150  Entre 150 y 50  Entre 50 y 23
> GUA              49              60             12
> ITA              37              41             19
> CVA              37              83             11
>
> On 22 September 2015 at 16:33, daniel  wrote:
>
>> Before anything else, you can find documentation on the ggplot2 package,
>> which is the one I use for charts like the one you want to make, although
>> there are other packages that may be useful to you:
>>
>> http://docs.ggplot2.org/current/
>>
>> In addition to what Carlos said, here are some other examples (I am adding
>> them because you did not give us data that would let us see better what
>> you are looking for):
>>
>> http://stackoverflow.com/questions/18158461/grouped-bar-plot-in-ggplot
>>
>> http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
>>
>>
>> https://martinsbioblogg.wordpress.com/2014/03/19/using-r-barplot-with-ggplot2/
>>
>> http://stackoverflow.com/questions/17303573/ggplot-multiple-grouping-bar
>>
>>
>> Finally, if Excel works well for you, use it; if you are looking for
>> something better, you can always find it in R.
>>
>> Daniel Merino
>>
>>
>>
>>
>> On 22 September 2015 at 15:57, Carlos J. Gil Bellosta <
>> c...@datanalytics.com> wrote:
>>
>>> Hello, how is it going?
>>>
>>> What you want is something similar to what is published (with code) at
>>>
>>>
>>> http://stackoverflow.com/questions/18624394/ggplot-bar-plot-with-facet-dependent-order-of-categories
>>>
>>> That is: ggplot2 with facets (by station). You will probably have to
>>> pivot your data so that you end up with a data set with three columns:
>>>
>>> 1) the station
>>> 2) the variable label
>>> 3) the value
>>>
>>> That is done, among other tools, with melt (from reshape2).
>>>
>>> Best regards,
>>>
>>> Carlos J. Gil Bellosta
>>> http://www.datanalytics.com
>>>
>>> On 22 September 2015 at 20:40, Susana deus alvarez <
>>> susanadeus.deusalva...@gmail.com> wrote:
>>>
 Hi, I am writing because I have a big question: how can charts that are
 so easy in Excel be so hard in R?
 I cannot manage to make a chart like this in R. That is, how can I split
 by site and by size? In a huge spreadsheet, the first column holds the
 stations (but I only want the last three), followed by many parameters,
 with the weighted averages in columns 46, 53 and 60. And I simply cannot
 do it in R. I have tried creating smaller Excel files with just that,
 but there is no way. If someone could help me a little...

 Thanks

 [image: Inline images 1]

 ___
 R-help-es mailing list
 R-help-es@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-help-es


>>>
>>> ___
>>> R-help-es mailing list
>>> R-help-es@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-help-es
>>>
>>>
>>
>>
>> --
>> Daniel
>>
>
>


-- 
Daniel
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R-es] Excel vs. R

2015-09-22 Thread daniel
Before anything else, you can find documentation on the ggplot2 package,
which is the one I use for charts like the one you want to make, although
there are other packages that may be useful to you:

http://docs.ggplot2.org/current/

In addition to what Carlos said, here are some other examples (I am adding
them because you did not give us data that would let us see better what you
are looking for):

http://stackoverflow.com/questions/18158461/grouped-bar-plot-in-ggplot

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/

https://martinsbioblogg.wordpress.com/2014/03/19/using-r-barplot-with-ggplot2/

http://stackoverflow.com/questions/17303573/ggplot-multiple-grouping-bar


Finally, if Excel works well for you, use it; if you are looking for
something better, you can always find it in R.

Daniel Merino
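Carlos's suggestion quoted below (pivot to long form with melt, then facet by station) can be sketched like this, assuming the reshape2 and ggplot2 packages are installed; the data frame is Susana's reduced table, typed in by hand with invented column names:

```r
library(reshape2)
library(ggplot2)

# Susana's reduced table (column names are illustrative)
z1 <- data.frame(Estacion = c("GUA", "ITA", "CVA"),
                 Mayor150 = c(49, 37, 37),
                 Entre150y50 = c(60, 41, 83),
                 Entre50y23 = c(12, 19, 11))

# Pivot: one row per (station, size-class label, value)
largo <- melt(z1, id.vars = "Estacion",
              variable.name = "Rango", value.name = "Valor")

# One facet per station, one bar per size class
p <- ggplot(largo, aes(x = Rango, y = Valor)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ Estacion)
```

Printing `p` draws the faceted chart; the same long-form `largo` also works for a single grouped chart with `aes(fill = Estacion)` and `position = "dodge"`.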




On 22 September 2015 at 15:57, Carlos J. Gil Bellosta <
c...@datanalytics.com> wrote:

> Hello, how is it going?
>
> What you want is something similar to what is published (with code) at
>
>
> http://stackoverflow.com/questions/18624394/ggplot-bar-plot-with-facet-dependent-order-of-categories
>
> That is: ggplot2 with facets (by station). You will probably have to
> pivot your data so that you end up with a data set with three columns:
>
> 1) the station
> 2) the variable label
> 3) the value
>
> That is done, among other tools, with melt (from reshape2).
>
> Best regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>
> On 22 September 2015 at 20:40, Susana deus alvarez <
> susanadeus.deusalva...@gmail.com> wrote:
>
>> Hi, I am writing because I have a big question: how can charts that are
>> so easy in Excel be so hard in R?
>> I cannot manage to make a chart like this in R. That is, how can I split
>> by site and by size? In a huge spreadsheet, the first column holds the
>> stations (but I only want the last three), followed by many parameters,
>> with the weighted averages in columns 46, 53 and 60. And I simply cannot
>> do it in R. I have tried creating smaller Excel files with just that,
>> but there is no way. If someone could help me a little...
>>
>> Thanks
>>
>> [image: Inline images 1]
>>
>


-- 
Daniel
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


[R] Error in eval(expr, envir, enclos) : could not find function

2015-09-22 Thread Alaa Sindi
hi all

I am getting this error: "Error in eval(expr, envir, enclos) : could not find
function"

Do you have an idea what might cause this problem?

thanks
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Marc Schwartz
Hi,

With the usual caveat that I Am Not A Lawyer and that I am not speaking on
behalf of any organization...

My guess is that they are claiming that the output of R, having simply been
copied and pasted verbatim into your thesis, constitutes the use of
copyrighted output from the software.

It is not clear to me that R's output is copyrighted by the R Foundation (or
by other parties for CRAN packages), albeit the source code underlying R is,
along with other copyright owners' as appropriate. There is some case law to
support the notion that the output alone is not protected in a similar
manner, but that may be country specific.

Did you provide any credit to R (see the output of citation() ) in your thesis 
and indicate that your analyses were performed using R?

If R is uncredited, I could see them raising the issue.

You might check with your institution's legal/policy folks to see if there is 
any guidance provided for students regarding the crediting of software used in 
this manner, especially if that guidance is at no cost to you.

Regards,

Marc Schwartz


> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
> 
> 1. It is highly unlikely that we could be of help (unless someone else
> has experienced this and knows what happened). You will have to
> contact the Urkund people and ask them why their algorithms raised the
> flags.
> 
> 2. But of course, the regression methodology is not "your own" -- it's
> just a standard tool that you used in your work, which is entirely
> legitimate of course.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>   -- Clifford Stoll
> 
> 
> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>  wrote:
>> 
>> Dear 'R' community support,
>> 
>> 
>> I am a student at Skema business school and I have recently submitted my MSc 
>> thesis/dissertation. This has been passed on to an external plagiarism 
>> service provider, Urkund, who have scanned my document and returned a 
>> plagiarism report to my professor having detected 32% plagiarism.
>> 
>> 
>> I have contacted Urkund regarding this issue having committed no such 
>> plagiarism and they have told me that all the plagiarism detected in my 
>> document comes from the last 25% which consists only of 'R' regressions like 
>> the one I have pasted below:
>> 
>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>Fed.t.4., data = OLS_CAR, x = TRUE)
>> 
>> Residuals:
>>       Min        1Q    Median        3Q       Max
>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>> 
>> Coefficients:
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>> Fed -0.121595   0.165359  -0.735   0.4627
>> Fed.t.1. 0.344014   0.140979   2.440   0.0153 *
>> Fed.t.2. 0.026529   0.143648   0.185   0.8536
>> Fed.t.3. 0.622357   0.142021   4.382 1.62e-05 ***
>> Fed.t.4. 0.291985   0.158914   1.837   0.0671 .
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> 
>> Residual standard error: 0.0293 on 304 degrees of freedom
>>  (20 observations deleted due to missingness)
>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>> 
>> I have produced all of these regressions myself and pasted them directly
>> from the 'R' software package. My regression methodology is entirely my
>> own, along with the sourcing and preparation of the data used to produce
>> these statistics.
>> 
>> I would be very grateful if you could provide me with some clarity as to
>> why this output from 'R' is reading as plagiarism.
>> 
>> I would like to thank you in advance,
>> 
>> Kind regards,
>> 
>> Oliver Barrett
>> (+44) 7341 834 217
>> 
>> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error from lme4: "Error: (p <- ncol(X)) == ncol(Y) is not TRUE"

2015-09-22 Thread Adams, Jean
Rory,

When I searched online, I found an issue with lme4 on GitHub that suggests
this error is "due to NA values in non-factor variables".
https://github.com/lme4/lme4/issues/246
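A quick way to check whether that is the problem is to count incomplete rows among the model variables before fitting; a base-R sketch with invented data (`y`, `x`, `g` stand in for Rory's actual variables):

```r
# Invented data frame standing in for the 713-row data set
d <- data.frame(y = c(1.2, 2.3, NA, 4.1),
                x = c(0.5, NA, 1.5, 2.0),
                g = factor(c("a", "a", "b", "b")))

sum(!complete.cases(d))  # rows with at least one NA: 2
d2 <- na.omit(d)         # drop them before calling lmer()
nrow(d2)                 # 2 complete rows remain
```

Restricting `d` to only the columns that actually appear in the model formula before running complete.cases() avoids dropping rows over NAs in unused variables.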

Hope this helps.

Jean

On Tue, Sep 22, 2015 at 8:18 AM, Rory Wilson  wrote:

> Hello all, I am trying to run a random intercept model using lme4. The
> random effect is a factor with 29 levels, making a model with one random
> effect (one level). It is just a linear model. There are 713 observations.
> However, when trying to run the model I receive the error
> "Error: (p <- ncol(X)) == ncol(Y) is not TRUE",
> a search for which reveals surprisingly little. Has anyone seen this
> before? Note that if I simply change the random effect into a fixed effect
> and use lm, the model works perfectly. Thank you! Rory
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to coerce a parameter in nls?

2015-09-22 Thread Gabor Grothendieck
You may have to do without masking and switch back to nls.  dproot2 and fo
are from the prior post.

# to mask Rm6 omit it from start and set it explicitly
st <- c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01, d50=20, c=-1)
Rm6 <- 1

fm.nls <- nls(fo, dproot2, start = st)

AIC(fm.nls)
summary(fm.nls)
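For readers without the earlier posts, here is a self-contained sketch of the same trick with made-up data (two groups instead of six; `fo`, the data, and all values are illustrative, not the originals). The masked parameter is simply bound as a constant in the calling environment and left out of `start`, so nls() treats it as data rather than estimating it:

```r
set.seed(1)
dp <- data.frame(depth = rep(c(10, 20, 40, 60, 80, 100), 2),
                 ref   = rep(1:2, each = 6))
# simulated response: asymptote c(0.85, 1) per group, d50 = 25, c = -1.5
dp$den <- c(0.85, 1)[dp$ref] / (1 + (dp$depth / 25)^(-1.5)) +
  rnorm(nrow(dp), 0, 0.01)

# group parameters indexed by ref; Rm2 is the "masked" one
fo <- den ~ c(Rm1, Rm2)[ref] / (1 + (depth / d50)^c)

Rm2 <- 1   # fixed constant, not listed in start, so not estimated
fit <- nls(fo, data = dp, start = c(Rm1 = 1, d50 = 20, c = -1))

summary(fit)   # SEs, t-stats, and p-values are available here
AIC(fit)       # and AIC works as usual
```

Note that naming a parameter `c` is safe here: when R resolves the call `c(Rm1, Rm2)`, it skips non-function bindings, so the numeric parameter does not shadow base::c().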


On Tue, Sep 22, 2015 at 12:46 PM, Jianling Fan 
wrote:

> Hello Prof. Nash,
>
> My regression works good now. But I found another problem when I using
> nlxb. In the output, the SE, t-stat, and p-value are not available.
> Furthermore, I can't extract AIC from the output. The output looks
> like below:
>
> Do you have any suggestion for this?
>
> Thanks a lot!
>
> Regards,
>
> nlmrt class object: x
> residual sumsquares =  0.29371  on  33 observations
> after 9 Jacobian and 10 function evaluations
>   name     coeff         SE   tstat   pval     gradient     JSingval
>   Rm1       1.1162       NA     NA     NA    -3.059e-13        2.745
>   Rm2       1.56072      NA     NA     NA     1.417e-13        1.76
>   Rm3       1.09775      NA     NA     NA    -3.179e-13        1.748
>   Rm4       7.18377      NA     NA     NA    -2.941e-12        1.748
>   Rm5       1.13562      NA     NA     NA    -3.305e-13        1.076
>   Rm6       1      M     NA     NA     NA     0                0.603
>   d50      22.4803       NA     NA     NA     4.975e-13        0.117
>   c        -1.64075      NA     NA     NA     4.12e-12     1.908e-17
>
>
>
> On 21 September 2015 at 13:38, ProfJCNash  wrote:
> > I've not used it for group data, and suspect that the code to generate
> > derivatives cannot cope with the bracket syntax. If you can rewrite the
> > equation without the brackets, you could get the derivatives and solve
> that
> > way. This will probably mean having a "translation" routine to glue
> things
> > together.
> >
> > JN
> >
> >
> > On 15-09-21 12:22 PM, Jianling Fan wrote:
> >>
> >> Thanks Prof. Nash,
> >>
> >> Sorry for the late reply. I have been learning and trying to use your nlmrt
> >> package since I got your email. It works well for masking a parameter in
> >> a regression, but it does not seem to work for my equation. I think the
> >> problem is that the parameter I want to mask is a group-specific parameter
> >> and I have "[]" syntax in my equation. However, I don't have your 2014
> >> book on hand and couldn't find it in our library. So I am wondering whether
> >> nlxb works for group data?
> >> Thanks a lot!
> >>
> >> Following is my code, and I got an error from it.
> >>
> >>> fitdp1<-nlxb(den~Rm[ref]/(1+(depth/d50)^c),data=dproot,
> >>
> >>  + start =c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
> >> Rm5=1.01, Rm6=1, d50=20, c=-1),
> >>  + masked=c("Rm6"))
> >>
> >> Error in deriv.default(parse(text = resexp), names(start)) :
> >>Function '`[`' is not in the derivatives table
> >>
> >>
> >> Best regards,
> >>
> >> Jianling
> >>
> >>
> >> On 20 September 2015 at 12:56, ProfJCNash  wrote:
> >>>
> >>> I posted a suggestion to use nlmrt package (function nlxb to be
> precise),
> >>> which has masked (fixed) parameters. Examples in my 2014 book on
> >>> Nonlinear
> >>> parameter optimization with R tools. However, I'm travelling just now,
> or
> >>> would consider giving this a try.
> >>>
> >>> JN
> >>>
> >>>
> >>> On 15-09-20 01:19 PM, Jianling Fan wrote:
> 
> 
>  no, I am doing a regression with 6 group data with 2 shared parameters
>  and 1 different parameter for each group data. the parameter I want to
>  coerce is for one group. I don't know how to do it. Any suggestion?
> 
>  Thanks!
> 
>  On 19 September 2015 at 13:33, Jeff Newmiller <
> jdnew...@dcn.davis.ca.us>
>  wrote:
> >
> >
> > Why not rewrite the function so that value is not a parameter?
> >
> >
> >
> ---
> > Jeff NewmillerThe .   .  Go
> > Live...
> > DCN:Basics: ##.#.   ##.#.
> Live
> > Go...
> > Live:   OO#.. Dead: OO#..
> > Playing
> > Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> > /Software/Embedded Controllers)   .OO#.   .OO#.
> > rocks...1k
> >
> >
> >
> ---
> > Sent from my phone. Please excuse my brevity.
> >
> > On September 18, 2015 9:54:54 PM PDT, Jianling Fan
> >  wrote:
> >>
> >>
> >> Hello, everyone,
> >>
> >> I am using a nls regression with 6 groups data. I am trying to
> coerce
> >> a parameter to 1 by using a upper and lower statement. but I 

Re: [R] Accounting for correlated random effects in coxme

2015-09-22 Thread Therneau, Terry M., Ph.D.

I've been away for a couple weeks and am now catching up on email.

The issue is that the coxme code does not have conversions built in for all of the 
possible types of sparse matrix.  Since it assumes that the variance matrix must be 
symmetric, the not-necessarily-symmetric dgCMatrix class is not one that I had considered. 
You should transform it to the dsCMatrix class first, which is symmetric.  Or, if it is 
small enough, convert it to a simple matrix.

Terry T.
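A rough sketch of that conversion with the Matrix package (a toy matrix stands in for the inverseA output; this has not been tested against coxme itself):

```r
library(Matrix)

# toy symmetric sparse matrix, stored in the *general* dgCMatrix class
m <- sparseMatrix(i = c(1, 2, 2), j = c(1, 1, 2), x = c(2, 1, 2),
                  dims = c(2, 2))
m <- m + t(m)            # numerically symmetric, but class is still "dgCMatrix"

ms <- forceSymmetric(m)  # "dsCMatrix": symmetric storage that coxme understands
class(ms)

md <- as.matrix(m)       # or, for small problems, a plain base matrix
```

forceSymmetric() takes one triangle of the matrix as authoritative, so check beforehand (e.g. with isSymmetric()) that the dgCMatrix really is numerically symmetric.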


On 09/22/2015 05:00 AM, r-help-requ...@r-project.org wrote:

I have a problem with running the mixed effects Cox regression model using
a distance matrix from a phylogeny rather than a pedigree. I searched
previous posts and didn't find any directly relevant previous posts.

I am interested in using a mixed effects Cox regression model to determine
the best predictors of time to recruitment in 80 different reintroduced
plant populations representing a total of 31 species. I would like to
account for correlated random effects that result from phylogenetic
relationships amongst species. Dr. Therneau's 2015 article on Mixed Effects
Cox Models provides a very helpful template for me to do this with the coxme
function in R. In this article, the correlation structure due to genetic
relationships amongst individuals was defined using a kinship matrix
derived from a pedigree. Instead of a pedigree, I have a phylogeny for
these 31 species. Hence, I used the inverseA function in the MCMCglmm
package to generate an inverse additive genetic relatedness matrix from the
phylogeny for these 31 species. And then fed it in as input to the varlist
argument in my mixed effects cox regression model (using function coxme). I
got an error message (please see below). Based on the error, one thought I
had was to convert the inverseA matrix from a 'dgCMatrix' to a 'bdsmatrix',
but this was not successful either. I have also unsuccessfully tried to use
a pairwise phylogenetic distance matrix.

Is there a better way to do this? I basically just want to account for the
correlated random effects due to phylogenetic relatedness amongst the 31
species represented in the dataset for the Cox regression model.  Please
see my code below and I welcome suggestions on how best to make this work.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to coerce a parameter in nls?

2015-09-22 Thread Jianling Fan
Hello Prof. Nash,

My regression works well now, but I found another problem when using
nlxb: in the output, the SE, t-stat, and p-value are not available.
Furthermore, I can't extract the AIC from the output. The output looks
like this:

Do you have any suggestion for this?

Thanks a lot!

Regards,

nlmrt class object: x
residual sumsquares =  0.29371  on  33 observations
after 9 Jacobian and 10 function evaluations
  name     coeff         SE   tstat   pval     gradient     JSingval
  Rm1       1.1162       NA     NA     NA    -3.059e-13        2.745
  Rm2       1.56072      NA     NA     NA     1.417e-13        1.76
  Rm3       1.09775      NA     NA     NA    -3.179e-13        1.748
  Rm4       7.18377      NA     NA     NA    -2.941e-12        1.748
  Rm5       1.13562      NA     NA     NA    -3.305e-13        1.076
  Rm6       1      M     NA     NA     NA     0                0.603
  d50      22.4803       NA     NA     NA     4.975e-13        0.117
  c        -1.64075      NA     NA     NA     4.12e-12     1.908e-17
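On the AIC question: nlmrt does not, to my knowledge, provide an AIC() method, but for a least-squares fit with Gaussian errors it can be computed from the residual sum of squares. A sketch, assuming the fitted object `x` exposes the printed quantities as `x$ssquares` and `x$coefficients` (check `str(x)` for the actual component names in your nlmrt version):

```r
rss <- x$ssquares               # residual sum of squares (0.29371 above)
n   <- 33                       # number of observations
k   <- length(x$coefficients)   # estimated parameters (exclude masked ones)

# Gaussian log-likelihood AIC, comparable with stats::AIC() on an nls() fit;
# the "+ 1" counts the estimated error variance as a parameter
aic <- n * log(2 * pi) + n * log(rss / n) + n + 2 * (k + 1)
aic
```

Only differences in AIC between models fitted to the same data are meaningful, so the constant terms drop out of any comparison.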



On 21 September 2015 at 13:38, ProfJCNash  wrote:
> I've not used it for group data, and suspect that the code to generate
> derivatives cannot cope with the bracket syntax. If you can rewrite the
> equation without the brackets, you could get the derivatives and solve that
> way. This will probably mean having a "translation" routine to glue things
> together.
>
> JN
>
>
> On 15-09-21 12:22 PM, Jianling Fan wrote:
>>
>> Thanks Prof. Nash,
>>
>> Sorry for the late reply. I have been learning and trying to use your nlmrt
>> package since I got your email. It works well for masking a parameter in
>> a regression, but it does not seem to work for my equation. I think the
>> problem is that the parameter I want to mask is a group-specific parameter
>> and I have "[]" syntax in my equation. However, I don't have your 2014
>> book on hand and couldn't find it in our library. So I am wondering whether
>> nlxb works for group data?
>> Thanks a lot!
>>
>> Following is my code, and I got an error from it.
>>
>>> fitdp1<-nlxb(den~Rm[ref]/(1+(depth/d50)^c),data=dproot,
>>
>>  + start =c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
>> Rm5=1.01, Rm6=1, d50=20, c=-1),
>>  + masked=c("Rm6"))
>>
>> Error in deriv.default(parse(text = resexp), names(start)) :
>>Function '`[`' is not in the derivatives table
>>
>>
>> Best regards,
>>
>> Jianling
>>
>>
>> On 20 September 2015 at 12:56, ProfJCNash  wrote:
>>>
>>> I posted a suggestion to use nlmrt package (function nlxb to be precise),
>>> which has masked (fixed) parameters. Examples in my 2014 book on
>>> Nonlinear
>>> parameter optimization with R tools. However, I'm travelling just now, or
>>> would consider giving this a try.
>>>
>>> JN
>>>
>>>
>>> On 15-09-20 01:19 PM, Jianling Fan wrote:


 no, I am doing a regression with 6 group data with 2 shared parameters
 and 1 different parameter for each group data. the parameter I want to
 coerce is for one group. I don't know how to do it. Any suggestion?

 Thanks!

 On 19 September 2015 at 13:33, Jeff Newmiller 
 wrote:
>
>
> Why not rewrite the function so that value is not a parameter?
>
>
> ---
> Jeff NewmillerThe .   .  Go
> Live...
> DCN:Basics: ##.#.   ##.#.  Live
> Go...
> Live:   OO#.. Dead: OO#..
> Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.
> rocks...1k
>
>
> ---
> Sent from my phone. Please excuse my brevity.
>
> On September 18, 2015 9:54:54 PM PDT, Jianling Fan
>  wrote:
>>
>>
>> Hello, everyone,
>>
>> I am using a nls regression with 6 groups data. I am trying to coerce
>> a parameter to 1 by using a upper and lower statement. but I always
>> get an error like below:
>>
>> Error in ifelse(internalPars < upper, 1, -1) :
>>(list) object cannot be coerced to type 'double'
>>
>> does anyone know how to fix it?
>>
>> thanks in advance!
>>
>> My code is below:
>>
>>
>>
>>> dproot
>>
>>
>> depth   den ref
>> 1 20 0.573   1
>> 2 40 0.780   1
>> 3 60 0.947   1
>> 4 80 0.990   1
>> 5100 1.000   1
>> 6 10 0.600   2
>> 7 20 0.820   2
>> 8 30 0.930   2
>> 9 40 1.000   2
>> 1020 0.480  

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread John Kane
Very good point about the referencing. 

I wonder if this is happening to users of Stata or SAS as well?

John Kane
Kingston ON Canada


> -Original Message-
> From: marc_schwa...@me.com
> Sent: Tue, 22 Sep 2015 11:24:13 -0500
> To: bgunter.4...@gmail.com
> Subject: Re: [R] 'R' Software Output Plagiarism
> 
> Hi,
> 
> With the usual caveat that I Am Not A Lawyer and that I am not
> speaking on behalf of any organization...
> 
> My guess is that they are claiming that the output of R, simply being
> copied and pasted verbatim into your thesis constitutes the use of
> copyrighted output from the software.
> 
> It is not clear to me that R's output is copyrighted by the R Foundation
> (or by other parties for CRAN packages), albeit, the source code
> underlying R is, along with other copyright owner's as apropos. There is
> some caselaw to support the notion that the output alone is not protected
> in a similar manner, but that may be country specific.
> 
> Did you provide any credit to R (see the output of citation() ) in your
> thesis and indicate that your analyses were performed using R?
> 
> If R is uncredited, I could see them raising the issue.
> 
> You might check with your institution's legal/policy folks to see if
> there is any guidance provided for students regarding the crediting of
> software used in this manner, especially if that guidance is at no cost
> to you.
> 
> Regards,
> 
> Marc Schwartz
> 
> 
>> On Sep 22, 2015, at 11:01 AM, Bert Gunter 
>> wrote:
>> 
>> 1. It is highly unlikely that we could be of help (unless someone else
>> has experienced this and knows what happened). You will have to
>> contact the Urkund people and ask them why their algorithms raised the
>> flags.
>> 
>> 2. But of course, the regression methodology is not "your own" -- it's
>> just a standard tool that you used in your work, which is entirely
>> legitimate of course.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>   -- Clifford Stoll
>> 
>> 
>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>  wrote:
>>> 
>>> Dear 'R' community support,
>>> 
>>> 
>>> I am a student at Skema business school and I have recently submitted
>>> my MSc thesis/dissertation. This has been passed on to an external
>>> plagiarism service provider, Urkund, who have scanned my document and
>>> returned a plagiarism report to my professor having detected 32%
>>> plagiarism.
>>> 
>>> 
>>> I have contacted Urkund regarding this issue having committed no such
>>> plagiarism and they have told me that all the plagiarism detected in my
>>> document comes from the last 25% which consists only of 'R' regressions
>>> like the one I have pasted below:
>>> 
>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>Fed.t.4., data = OLS_CAR, x = TRUE)
>>> 
>>> Residuals:
>>>  Min1QMedian3Q   Max
>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>> 
>>> Coefficients:
>>> Estimate Std. Error t value Pr(>|t|)
>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>> Fed -0.121595   0.165359  -0.735   0.4627
>>> Fed.t.1. 0.344014   0.140979   2.440   0.0153 *
>>> Fed.t.2. 0.026529   0.143648   0.185   0.8536
>>> Fed.t.3. 0.622357   0.142021   4.382 1.62e-05 ***
>>> Fed.t.4. 0.291985   0.158914   1.837   0.0671 .
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>> 
>>> Residual standard error: 0.0293 on 304 degrees of freedom
>>>  (20 observations deleted due to missingness)
>>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>>> 
>>> I have produced all of these regressions myself and pasted them
>>> directly from the 'R' software package. My regression methodology is
>>> entirely my own, along with the sourcing and preparation of the data
>>> used to produce these statistics.
>>>
>>> I would be very grateful if you could provide me with some clarity as
>>> to why this output from 'R' is reading as plagiarism.
>>> 
>>> I would like to thank you in advance,
>>> 
>>> Kind regards,
>>> 
>>> Oliver Barrett
>>> (+44) 7341 834 217
>>> 
>>>[[alternative HTML version deleted]]
>>> 

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread John Kane
This is just a guess, but the reason is probably that the regression output 
(not including the specific numbers and your variable names) is standard R 
output, as already noted.

It probably appears in many other theses and dissertations, in books on R, and 
possibly in appendices of published books and papers reporting research 
findings.

It, or parts of it, may occur thousands of times on R-help and in R-oriented 
blogs and other documents on the Web. It quite likely shows up on Stack 
Overflow.

Here is one example that took me about two minutes to find:
http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf
And here is another: http://www.princeton.edu/~otorres/Regression101R.pdf

Have a look at Julian Faraway's PDF book "Practical Regression and Anova using R" 
in the Contributed section of the R home site, at pp. 23-24. There it is 
again.

I think you should probably do a bit of online searching and a sweep of some 
of the Manuals and Contributed materials on the R site, and point out to the 
powers that be that it is not plagiarism; it is just standard R reporting of 
regression results.


John Kane
Kingston ON Canada


> -Original Message-
> From: oliver.barr...@skema.edu
> Sent: Tue, 22 Sep 2015 14:27:03 +
> To: r-help@r-project.org
> Subject: [R] 'R' Software Output Plagiarism
> 
> 
> Dear 'R' community support,
> 
> 
> I am a student at Skema business school and I have recently submitted my
> MSc thesis/dissertation. This has been passed on to an external
> plagiarism service provider, Urkund, who have scanned my document and
> returned a plagiarism report to my professor having detected 32%
> plagiarism.
> 
> 
> I have contacted Urkund regarding this issue having committed no such
> plagiarism and they have told me that all the plagiarism detected in my
> document comes from the last 25% which consists only of 'R' regressions
> like the one I have pasted below:
> 
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
> Fed.t.4., data = OLS_CAR, x = TRUE)
> 
> Residuals:
>   Min1QMedian3Q   Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
> 
> Coefficients:
>  Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed -0.121595   0.165359  -0.735   0.4627
> Fed.t.1. 0.344014   0.140979   2.440   0.0153 *
> Fed.t.2. 0.026529   0.143648   0.185   0.8536
> Fed.t.3. 0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4. 0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
> 
> I have produced all of these regressions myself and pasted them directly
> from the 'R' software package. My regression methodology is entirely my
> own, along with the sourcing and preparation of the data used to produce
> these statistics.
> 
> I would be very grateful if you could provide me with some clarity as to
> why this output from 'R' is reading as plagiarism.
> 
> I would like to thank you in advance,
> 
> Kind regards,
> 
> Oliver Barrett
> (+44) 7341 834 217
> 
>   [[alternative HTML version deleted]]
> 



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R-es] Knitr problema con xtable

2015-09-22 Thread Rodrigo López Correa
Many thanks, Javier; that may well be it.

Regards,

Rodrigo.
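As a side note for archive readers: xtable's print() emits raw LaTeX, so the chunk that produces it must run with results="asis" for the LaTeX to reach the output unescaped. A minimal standalone sketch (the chunk label and data are illustrative, not the originals from this thread):

```r
# In the LyX/knitr document, the chunk header would look like:
# <<q1, results="asis", echo=FALSE>>=
#   ...the code below...
# @
library(xtable)

tabla <- data.frame(Total_hijas = c(1, 3, 9),
                    row.names = c("Mínimo", "Mediana", "Máximo"))

# floating = FALSE drops the surrounding table environment, leaving just
# the tabular, which is convenient when LyX/LaTeX manages the float itself
print(xtable(tabla), floating = FALSE)
```

Without results="asis", knitr wraps the emitted LaTeX in a verbatim block, which matches the symptom of the table not rendering in the PDF.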

On 18 September 2015 at 9:26, Javier Rubén Marcuzzi <
javier.ruben.marcu...@gmail.com> wrote:

> It is possibly the compilation. I have not touched LyX in a while; at times
> it is great and at other times it takes some work. Consider the following: in
> LaTeX, for example, you have to compile the same file twice to resolve the
> bibliographic references (unless there is an update I am unaware of). LyX uses
> the LaTeX compiler, LaTeX uses several files to reach the result, and example
> 1 includes knitr. What if this stacking of layers is what makes the
> compilation fail?
>
> Javier Rubén Marcuzzi
> Dairy Industry Technician
> Veterinarian
>
> *From:* Rodrigo López Correa
> *Sent:* Monday, 10 August 2015 16:20
> *To:* R-help-es
> *Subject:* [R-es] Knitr problem with xtable
>
> Hi, I am fairly new to using the knitr package in R together with LyX,
> and I have the following problem:
>
> 1) I built an R script that, among other simple operations, builds a
> table with the xtable package so that it can later be printed from LyX.
>
> ## @knitr q1
>
> library(knitr)
> library(RMySQL)
> library(xtable)
>
> # open the MySQL database
> con <- dbConnect(MySQL(),
>  user="XXX", password="xxx",
>  dbname="hol", host="xxx")
>
> uno<-dbGetQuery(con, "SELECT nhijl FROM resumen where leche;")
>
> quantile(uno$nhijl,probs=(c(0.25,0.5)))
> quantile(uno$nhijl,probs=0.5)
> quantile(uno$nhijl,probs=0.75)
> min(uno$nhijl)
> max(uno$nhijl)
>
> # table
> tabla_uno<-data.frame(Total_hijas=round((c(min(uno$nhijl),quantile(uno$nhijl,probs=(0.25)),quantile(uno$nhijl,probs=0.5),
> mean(uno$nhijl),quantile(uno$nhijl,probs=0.75),max(uno$nhijl)
>
> rownames(tabla_uno)<-(c("Mínimo","1er.cuartil", "Mediana", "Media",
> "3er.cuartil", "Máximo"))
>
> print(xtable(tabla_uno),floating=FALSE)
>
> 2) From LyX I tried to read the R script:
>
> <<>>=
> read_chunk("descriptiva_resumen.R")
> @
>
> <<q1, results="asis">>=
> @
>
> *However, exporting to a PDF file fails, and I tried the following
> 2 options with different results:*
>
> 3) When I remove results="asis" from the chunk, I can export the results
> expected from the R script to a PDF, except for the table, which I cannot
> see.
>
> So the 2 options I followed were:
>
> OPTION 3.1:
>
> - I closed the PDF file
> - Then I put results="asis" back into the original chunk
> - Finally I tried to export it to a *new PDF file*.
>
> Result: *the export failed*
>
> OPTION 3.2:
>
> *- I minimized the PDF file* obtained in step 3),
> - Then I put results="asis" back into the original chunk
> - Finally I tried to *refresh the output of the PDF file I had
> minimized.*
>
> Result: *I obtained the correct and complete result of the whole script*
>
> *I do not understand what I am doing wrong, because I should be able to
> obtain the result more directly by including results="asis" in the
> original chunk, without having to go through so many steps.*
>
> Many thanks in advance for any help!
>
> Regards,
>
> Rodrigo.
>
> --
>
> *Dr. Rodrigo López Correa.*
>
> Miguel Barreiro 3186.
> Montevideo.
> Uruguay.
> Cel: 099 660 549.
>
> [[alternative HTML version deleted]]
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>



-- 
*Dr. Rodrigo López Correa.*

Miguel Barreiro 3186.
Montevideo.
Uruguay.
Cel: 099 660 549.

[[alternative HTML version deleted]]

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Mitchell Maltenfort
Isn't plagiarism detection based on overlaps with sentence structure?
That way, it would catch plagiarism if someone simply did a
find-and-replace. But that would also catch regressions with the same
output format.

How long was the original thesis?  If 25% of it was all regression
output, that sounds like a lot of regressions.



On Tue, Sep 22, 2015 at 4:06 PM, peter dalgaard  wrote:
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or what parts of the text that 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking 
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit, the source code underlying R 
>> is, along with other copyright owner's as apropos. There is some caselaw to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>  -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>>  wrote:

 Dear 'R' community support,


 I am a student at Skema business school and I have recently submitted my 
 MSc thesis/dissertation. This has been passed on to an external plagiarism 
 service provider, Urkund, who have scanned my document and returned a 
 plagiarism report to my professor having detected 32% plagiarism.


 I have contacted Urkund regarding this issue having committed no such 
 plagiarism and they have told me that all the plagiarism detected in my 
 document comes from the last 25% which consists only of 'R' regressions 
 like the one I have pasted below:

 lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
   Fed.t.4., data = OLS_CAR, x = TRUE)

 Residuals:
 Min1QMedian3Q   Max
 -0.154587 -0.015961  0.001429  0.017196  0.110907

 Coefficients:
Estimate Std. Error t value Pr(>|t|)
 (Intercept) -0.001630   0.001763  -0.925   0.3559
 Fed -0.121595   0.165359 

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Marc Schwartz
Peter,

Great distinction. 

I was leaning in the direction that the "look and feel" of the output (standard 
wording, table structure, column headings, significance stars, and so forth) is 
similar to whatever Urkund is using as the basis for the comparison, rather than 
an exact, or nearly exact, replication of the covariates, coefficients, etc. of 
prior work.

Thanks,

Marc


> On Sep 22, 2015, at 3:06 PM, peter dalgaard  wrote:
> 
> Marc,
> 
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
> 
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.). 
> 
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or what parts of the text that 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
> 
> -pd
> 
>> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
>> 
>> Hi,
>> 
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking 
>> on behalf of any organization...
>> 
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>> 
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit, the source code underlying R 
>> is, along with other copyright owner's as apropos. There is some caselaw to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>> 
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>> 
>> If R is uncredited, I could see them raising the issue.
>> 
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>> 
>> Regards,
>> 
>> Marc Schwartz
>> 
>> 
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>>> 
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>> 
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>> 
>>> Cheers,
>>> Bert
>>> 
>>> 
>>> Bert Gunter
>>> 
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>> -- Clifford Stoll
>>> 
>>> 
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>>  wrote:
 
 Dear 'R' community support,
 
 
 I am a student at Skema business school and I have recently submitted my 
 MSc thesis/dissertation. This has been passed on to an external plagiarism 
 service provider, Urkund, who have scanned my document and returned a 
 plagiarism report to my professor having detected 32% plagiarism.
 
 
 I have contacted Urkund regarding this issue having committed no such 
 plagiarism and they have told me that all the plagiarism detected in my 
 document comes from the last 25% which consists only of 'R' regressions 
 like the one I have pasted below:
 
 lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
  Fed.t.4., data = OLS_CAR, x = TRUE)
 
 Residuals:
       Min        1Q    Median        3Q       Max
 -0.154587 -0.015961  0.001429  0.017196  0.110907
 
 Coefficients:
   Estimate Std. Error t value Pr(>|t|)

Re: [R] retaining characters in a csv file

2015-09-22 Thread Duncan Murdoch
On 22/09/2015 7:19 PM, peter dalgaard wrote:
> 
>> On 23 Sep 2015, at 00:33 , Rolf Turner  wrote:
>>
> 
> [read.csv() doesn't distinguish "123.4" from 123.4]
> 
>> IMHO this is a bug in read.csv().
>>
> 
> Dunno about that:
> 
> pd$ cat ~/tmp/junk.csv 
> "1";1
> 2;"2"
> pd$ open !$
> open ~/tmp/junk.csv
> 
> And lo and behold, Excel opens with 
> 
> 1 1
> 2 2
> 
> and all cells numeric.
> 
> I don't think the CSV standard (if there is one...) specifies that quoted 
> strings are necessarily text.

It specifically does not.  Quotes allow commas and spaces to be ignored
as column separators.  That's all.  They say nothing about the type of data.

Duncan Murdoch


> 
> I think we have been here before, and found that even if we decide that it is 
> a bug (or misfeature), it would be hard to change, because the modus operandi 
> of read.* is to first read everything as character and _then_ see (in 
> type.convert()) which entries can be converted to numeric, logical, etc.
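Peter's point about the modus operandi can be seen directly (a small sketch):

```r
## What read.* effectively does under the hood: everything starts as
## character, then type.convert() decides the class -- the quotes are
## gone by that point, so "000202075214" collapses to a number.
type.convert(c("000202075214", "123.4"), as.is = TRUE)
## both entries become numeric; the leading zeros are lost
```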

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] retaining characters in a csv file

2015-09-22 Thread Duncan Murdoch
On 22/09/2015 6:00 PM, Therneau, Terry M., Ph.D. wrote:
> I have a csv file from an automatic process (so this will happen thousands of 
> times), for 
> which the first row is a vector of variable names and the second row often 
> starts 
> something like this:
> 
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
> 
> Notice the second variable which is
>a character string (note the quotation marks)
>a sequence of numeric digits
>leading zeros are significant
> 
> The read.csv function insists on turning this into a numeric. 

No it doesn't.  All you need to do is specify colClasses and it will
follow your instructions.
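For instance, a minimal self-contained sketch (the file contents mimic Terry's example; the column names are hypothetical):

```r
## Force column 2 to character via colClasses so leading zeros survive.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,code,d1,d2,flag",
             '5724550,"000202075214",2005.02.17,2005.02.17,"F"'), tmp)

## NA entries mean "use the default conversion" for that column.
x <- read.csv(tmp, colClasses = c("numeric", "character", NA, NA, NA))
x$code  # "000202075214" -- leading zeros retained
```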


> Is there any simple set of
> options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the 
> bloody quotes" 
> -- I still want the first, third, etc columns to become numeric.  There can 
> be more than 
> one variable like this, and not always in the second position.

No, because the bloody quotes are part of the "csv standard".  They
aren't meaningful.

If you don't know what the data is, that's your fault.  You shouldn't be
analyzing data when you are so ignorant.

Duncan Murdoch

> This happens deep inside the httr library; there is an easy way for me to add 
> more options 
> to the read.csv call but it is not so easy to replace it with something else.
> 
> Terry T
> 



Re: [R] retaining characters in a csv file

2015-09-22 Thread Duncan Murdoch
On 22/09/2015 6:33 PM, Rolf Turner wrote:
> On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
>> I have a csv file from an automatic process (so this will happen
>> thousands of times), for which the first row is a vector of variable
>> names and the second row often starts something like this:
>>
>> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
>>
>> Notice the second variable which is
>>a character string (note the quotation marks)
>>a sequence of numeric digits
>>leading zeros are significant
>>
>> The read.csv function insists on turning this into a numeric.  Is there
>> any simple set of options that
>> will turn this behavior off?  I'm looking for a way to tell it to "obey
>> the bloody quotes" -- I still want the first, third, etc columns to
>> become numeric.  There can be more than one variable like this, and not
>> always in the second position.
>>
>> This happens deep inside the httr library; there is an easy way for me
>> to add more options to the read.csv call but it is not so easy to
>> replace it with something else.
> 
> IMHO this is a bug in read.csv().

No, it's a bug in "Rolf Turner", who believes in fairies at the end of
his garden, rather than in documentation for file formats.

Duncan Murdoch

> 
> A possible workaround:
> 
> ccc <- c("integer","character",rep(NA,k))
> X   <- read.csv("melvin.csv",colClasses=ccc)
> 
> where "melvin.csv" is the file from which you are attempting to read and
> where k+2 = the number of columns in that file.
> 
> Kludgey, but it might work.
> 
> Another workaround is to specify quote="", but this has the side effect
> of making the 5th column character rather than logical.
> 
> cheers,
> 
> Rolf
>



Re: [R] [FORGED] Re: 'R' Software Output Plagiarism

2015-09-22 Thread Rolf Turner


RIGHT ON!!!  I concur most heartily with the sentiments expressed by Duncan.

cheers,

Rolf

On 23/09/15 12:33, Duncan Murdoch wrote:


On 22/09/2015 4:06 PM, peter dalgaard wrote:

Marc,

I don't think Copyright/Intellectual property issues factor into
this. Urkund and similar tools are to my knowledge entirely about
plagiarism. So the issue would seem to be that the R output is
considered identical or nearly identical to R output in other
published or otherwise submitted material.

What puzzles me (except for how a document can be deemed 32%
plagiarized in 25% of the text) is whether this includes the
numbers and variable names. If those are somehow factored out, then
any R regression could be pretty much identical to any other R
regression. However, two analyses with similar variable names could
happen if they are based on the same cookbook recipe and analyses
with similar numerical output come from analyzing the same standard
data. Such situations would not necessarily be considered
plagiarism (I mean: If you claim that you are analyzing data from
experiments that you yourself have performed, and your numbers are
exactly identical to something that has been previously published,
then it would be suspect. If you analyze something from public
sources, someone else might well have done the same thing.).


I don't see why this puzzles you.  A simple explanation is that Urkund
is incompetent.

Many companies that sell software to university administrations are
incompetent, because the buyers have been promoted so far beyond their
competence that they'll buy anything if it is expensive enough.

This isn't uncommon.






Re: [R] retaining characters in a csv file

2015-09-22 Thread Rolf Turner

On 23/09/15 11:19, peter dalgaard wrote:



On 23 Sep 2015, at 00:33 , Rolf Turner  wrote:



[read.csv() doesn't distinguish "123.4" from 123.4]


IMHO this is a bug in read.csv().



Dunno about that:

pd$ cat ~/tmp/junk.csv
"1";1
2;"2"
pd$ open !$
open ~/tmp/junk.csv

And lo and behold, Excel opens with

1 1
2 2

and all cells numeric.


I would say that this phenomenon ("Excel does it") is *overwhelming* 
evidence that it is bad practice!!! :-)



I don't think the CSV standard (if there is one...) specifies that
quoted strings are necessarily text.


Duncan Murdoch has pointed out that this is definitely *not* the case.


I think we have been here before, and found that even if we decide
that it is a bug (or misfeature), it would be hard to change, because
the modus operandi of read.* is to first read everything as character
and _then_ see (in type.convert()) which entries can be converted to
numeric, logical, etc.


As Arunkumar Srinivasan has pointed out, fread() from the data.table 
package can handle this, so it is *not impossible*.
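A hedged sketch of the fread() route (it assumes the data.table package is installed; whether fread's default type guessing keeps the leading zeros may depend on the version, so forcing the class is safest):

```r
## data.table::fread() lets you pin the class of a column by number.
library(data.table)
tmp <- tempfile(fileext = ".csv")
writeLines('5724550,"000202075214",2005.02.17,2005.02.17,"F"', tmp)
dt <- fread(tmp, header = FALSE, colClasses = list(character = 2))
dt$V2  # character, with leading zeros intact
```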


cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] retaining characters in a csv file

2015-09-22 Thread Rolf Turner


On 23/09/15 12:48, Duncan Murdoch wrote:


On 22/09/2015 6:33 PM, Rolf Turner wrote:





IMHO this is a bug in read.csv().


No, it's a bug in "Rolf Turner", who believes in fairies at the end of
his garden, rather than in documentation for file formats.


Naturally, I beg to differ.

The documentation for read.csv() refers to a quote character.  Nowhere 
does it mention that quotes only serve to keep commas and white space 
from being interpreted as delimiters.  The usual meaning of quotes in R 
is to enclose character strings and so it is a reasonable assumption 
that this would be their function in this instance.


Before you fly off into some idiotic rant about how one "should never 
make assumptions" consider the fact that if one made no assumptions at 
all one could not get out of bed in the morning.  One has to assume that 
the documentation is reasonably consistent and that any serious 
inconsistencies are drawn to the user's attention.  If one had to read 
the (entire) documentation for each system called upon by a given piece 
of software (apply recursively!) then one would spend one's entire life 
reading documentation and never get any work done.


Although I most definitely do not believe in fairies at the bottom of my 
garden, I am the first to admit that I am not all that bright and could 
have erringly missed something.  HOWEVER Terry Therneau was flummoxed by 
the quirky and counter-intuitive nature of quotes in read.csv(), and Dr. 
Therneau is very bright indeed.


So the fault is not in the user/reader but in the function and its 
documentation.


cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] Compare two normal to one normal

2015-09-22 Thread Rolf Turner


On 23/09/15 13:39, John Sorkin wrote:


Charles, I am not sure the answer to my question (given a dataset,
how can one compare the fit of a model that fits the data to a
mixture of two normal distributions with the fit of a model that uses
a single normal distribution?) can be based on the glm model you
suggest.

I have used normalmixEM to fit the data to a mixture of two normal
curves. The model estimates four (perhaps five) parameters: mu1,
sd1^2, mu2, sd2^2 (and perhaps lambda, the mixing proportion; the
mixing proportion may not need to be estimated, as it may be
determined once one specifies mu1, sd1^2, mu2, and sd2^2). Your model
fits the data to a model that contains only the mean, and estimates
two parameters, mu0 and sd0^2.  I am not sure that your model and
mine can be
considered to be nested. If I am correct I can't compare the log
likelihood values from the two models. I  may be wrong. If I am, I
should be able to perform a log likelihood test with 2 (or 3, I am
not sure which) DFs. Are you suggesting the models are nested? If so,
should I use 3 or 2 DFs?


You are quite correct; there are subtleties involved here.

The one-component model *is* nested in the two-component model, but is 
nested "ambiguously".


(1) The null (single component) model for a mixture distribution is 
ill-defined.  Note that a single component could be achieved either by 
setting the mixing probabilities equal to (1,0) or (0,1) or by setting

mu_1 = mu_2 and sigma_1 = sigma_2.


(2) However you slice it, the parameter values corresponding to the null 
model fall on the *boundary* of the parameter space.


(3) Consequently the asymptotics go to hell in a handcart and the 
likelihood ratio statistic, however you specify the null model, does not 
have an asymptotic chi-squared distribution.


(4) I have a vague idea that there are ways of obtaining a valid 
asymptotic null distribution for the LRT but I am not sufficiently 
knowledgeable to provide any guidance here.


(5) You might be able to gain some insight from delving into the 
literature --- a reasonable place to start would be with "Finite Mixture 
Models" by McLachlan and Peel:


@book{mclachlan2000finite,
  title={Finite Mixture Models, Wiley Series in
 Probability and Statistics},
  author={McLachlan, G and Peel, D},
  year={2000},
  publisher={John Wiley \& Sons, New York}
}

(6) My own approach would be to do "parametric bootstrapping":

* fit (to the real data) the null model and calculate
  the log-likelihood L1, any way you like
* fit the full model and determine the log-likelihood L2
* form the test statistic LRT = 2*(L2 - L1)
* simulate data sets from the fitted parameters for the null model
* for each such simulate data set calculate a test statistic in the
  foregoing manner, obtaining LRT^*_1, ..., LRT^*_N
* the p-value for your test is then

  p = (m+1)/(N+1)

  where m = the number of LRT^*_i values that are greater than LRT

The factor of 2 is of course completely unnecessary.  I just put it in 
"by analogy" with the "real", usual, likelihood ratio statistic.
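The recipe above can be sketched in a few lines. This is a hedged illustration only: it assumes the mixtools package, uses simulated stand-in data, and ignores EM convergence failures, which a real analysis must handle.

```r
## Parametric bootstrap LRT: one normal (null) vs. a two-component
## mixture (alternative).
library(mixtools)

lrt.stat <- function(y) {
  ## Null model log-likelihood at the MLE (note the n, not n - 1, divisor)
  L1 <- sum(dnorm(y, mean(y), sqrt(mean((y - mean(y))^2)), log = TRUE))
  ## Alternative: two-component normal mixture fitted by EM
  L2 <- normalmixEM(y, k = 2, maxit = 5000)$loglik
  2 * (L2 - L1)
}

set.seed(1)
y   <- rnorm(150)                     # stand-in for the real data
LRT <- lrt.stat(y)
N   <- 99
## Simulate from the *fitted null* and recompute the statistic each time
sims <- replicate(N, lrt.stat(rnorm(length(y), mean(y), sd(y))))
m <- sum(sims > LRT)
p <- (m + 1) / (N + 1)
```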


Note that this p-value is *exact* (not an approximation!) --- for any 
value of N --- when interpreted with respect to the "total observation
procedure" of observing both the real and simulated data.  (But see 
below.) That is, the probability, under the null hypothesis, of 
observing a test statistic "as extreme as" what you actually observed is 
*exactly* (m+1)/(N+1).  See e.g.:


@article{Barnard1963,
author = {G. A. Barnard},
title  = {Discussion of ``{T}he spectral analysis of point processes'' 
by {M}. {S}. {B}artlett},

journal = {J. Royal Statist. Soc.},
series  = {B},
volume  = {25},
year = {1963},
pages = {294}
}

or

@article{Hope1968,
author =  {A.C.A. Hope},
title =  {A simplified {M}onte {C}arlo significance test procedure},
journal =  {Journal of the Royal Statistical Society, series {B}},
year =  1968,
volume = 30,
pages = {582--598}
}

Taking N=99 (or 999) is arithmetically convenient.

However I exaggerate when I say that the p-value is exact.  It would be 
exact if you *knew* the parameters of the null model.  Since you have to 
estimate these parameters the test is (a bit?) conservative.  Note that 
the conservatism would be present even if you eschewed the "exact" test 
and an "approximate" test using a (very) large value of N.


Generally conservatism (in this context! :-) ) is deemed to be no bad thing.

cheers,

Rolf Turner

P. S.  I think that the mixing parameter must *always* be estimated. 
I.e. even if you knew mu_1, mu_2, sigma_1 and sigma_2 you would still 
have to estimate "lambda".  So you have 5 parameters in your full model. 
 Not that this is particularly relevant.


R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
Charles,
I am not sure the answer to my question (given a dataset, how can one compare 
the fit of a model that fits the data to a mixture of two normal 
distributions with the fit of a model that uses a single normal distribution?) 
can be based on the glm model you suggest. 


I have used normalmixEM to fit the data to a mixture of two normal curves. The 
model estimates four (perhaps five) parameters: mu1, sd1^2, mu2, sd2^2 (and 
perhaps lambda, the mixing proportion; the mixing proportion may not need to be 
estimated, as it may be determined once one specifies mu1, sd1^2, mu2, and 
sd2^2). Your model fits the data to a model that contains only the mean, and 
estimates two parameters, mu0 and sd0^2.  I am not sure that your model and 
mine can be considered to be nested. If I am correct, I can't compare the log 
likelihood values from the two models. I may be wrong. If I am, I should be 
able to perform a log likelihood ratio test with 2 (or 3, I am not sure which) 
DFs. Are you suggesting the models are nested? If so, should I use 3 or 2 DFs?


Many thanks,
John





John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

>>> "Charles C. Berry"  09/22/15 6:23 PM >>>
On Tue, 22 Sep 2015, John Sorkin wrote:

>
> In any event, I still don't know how to fit a single normal distribution 
> and get a measure of fit e.g. log likelihood.
>

Gotta love R:

> y <- rnorm(10)
> logLik(glm(y~1))
'log Lik.' -17.36071 (df=2)

HTH,

Chuck
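
A cross-check of Chuck's one-liner (a sketch; glm()'s df = 2 counts the mean and the variance): its logLik equals the closed-form single-normal log-likelihood at the MLEs, which is the null-model quantity John needs.

```r
## logLik(glm(y ~ 1)) matches the normal log-likelihood evaluated at
## the MLEs (the sample mean, and the n-divisor variance).
set.seed(42)
y <- rnorm(10)
L     <- as.numeric(logLik(glm(y ~ 1)))
s.mle <- sqrt(mean((y - mean(y))^2))   # MLE sd: divide by n, not n - 1
all.equal(L, sum(dnorm(y, mean(y), s.mle, log = TRUE)))  # TRUE
```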





Confidentiality Statement:
This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not 
the intended recipient, please contact the sender by reply email and destroy 
all copies of the original message. 


Re: [R] Compare two normal to one normal

2015-09-22 Thread Bert Gunter
Two normals will **always** be a better fit than one, as the latter
must be a subset of the former (with identical parameters for both
normals).

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
 wrote:
> I have data that may be the mixture of two normal distributions (one 
> contained within the other) vs. a single normal.
> I used normalmixEM to get estimates of parameters assuming two normals:
>
>
> GLUT <- scale(na.omit(data[,"FCW_glut"]))
> GLUT
> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> summary(mixmdl)
> plot(mixmdl,which=2)
> lines(density(data[,"GLUT"]), lty=2, lwd=2)
>
>
>
>
>
> summary of normalmixEM object:
>comp 1   comp 2
> lambda  0.7035179 0.296482
> mu -0.0592302 0.140545
> sigma   1.1271620 0.536076
> loglik at estimate:  -110.8037
>
>
>
> I would like to see if the two normal distributions are a better fit that one 
> normal. I have two problems
> (1) normalmixEM does not seem to want to fit a single normal (even if I 
> address the error message produced):
>
>
>> mixmdl = normalmixEM(GLUT,k=1)
> Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
>   arbmean and arbvar cannot both be FALSE
>> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
>   arbmean and arbvar cannot both be FALSE
>
>
>
> (2) Even if I had the loglik from a single normal, I am not sure how many DFs 
> to use when computing the -2LL ratio test.
>
>
> Any suggestions for comparing the two-normal vs. one normal distribution 
> would be appreciated.
>
>
> Thanks
> John
>
>
>
>
>
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and 
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:12}}



Re: [R] Compare two normal to one normal

2015-09-22 Thread Mark Leeds
That's true, but if he uses an AIC or BIC criterion that penalizes the
number of parameters,
then he might see something else? This (comparing mixtures to
non-mixtures) is not something I deal with, so I'm just throwing it out there.
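
A hedged sketch of that penalized comparison (it assumes a fitted normalmixEM object mixmdl with k = 2, i.e. 5 free parameters, and the data vector y already exist; lower BIC is better):

```r
## BIC for one normal vs. a two-component mixture.
n    <- length(y)
ll1  <- sum(dnorm(y, mean(y), sqrt(mean((y - mean(y))^2)), log = TRUE))
bic1 <- -2 * ll1 + 2 * log(n)            # parameters: mean + variance
bic2 <- -2 * mixmdl$loglik + 5 * log(n)  # 2 means, 2 sds, 1 mixing weight
c(one.normal = bic1, mixture = bic2)
```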




On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter  wrote:

> Two normals will **always** be a better fit than one, as the latter
> must be a subset of the former (with identical parameters for both
> normals).
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>-- Clifford Stoll
>
>
> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
>  wrote:
> > I have data that may be the mixture of two normal distributions (one
> contained within the other) vs. a single normal.
> > I used normalmixEM to get estimates of parameters assuming two normals:
> >
> >
> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
> > GLUT
> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> > summary(mixmdl)
> > plot(mixmdl,which=2)
> > lines(density(data[,"GLUT"]), lty=2, lwd=2)
> >
> >
> >
> >
> >
> > summary of normalmixEM object:
> >comp 1   comp 2
> > lambda  0.7035179 0.296482
> > mu -0.0592302 0.140545
> > sigma   1.1271620 0.536076
> > loglik at estimate:  -110.8037
> >
> >
> >
> > I would like to see if the two normal distributions are a better fit
> that one normal. I have two problems
> > (1) normalmixEM does not seem to want to fit a single normal (even if I
> address the error message produced):
> >
> >
> >> mixmdl = normalmixEM(GLUT,k=1)
> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
> k,  :
> >   arbmean and arbvar cannot both be FALSE
> >> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
> k,  :
> >   arbmean and arbvar cannot both be FALSE
> >
> >
> >
> > (2) Even if I had the loglik from a single normal, I am not sure how
> many DFs to use when computing the -2LL ratio test.
> >
> >
> > Any suggestions for comparing the two-normal vs. one normal distribution
> would be appreciated.
> >
> >
> > Thanks
> > John
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > John David Sorkin M.D., Ph.D.
> > Professor of Medicine
> > Chief, Biostatistics and Informatics
> > University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> > Baltimore VA Medical Center
> > 10 North Greene Street
> > GRECC (BT/18/GR)
> > Baltimore, MD 21201-1524
> > (Phone) 410-605-7119
> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> >
> >
> > Confidentiality Statement:
> > This email message, including any attachments, is for ...{{dropped:12}}
>



Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
Bert,
Better, perhaps, but will something like the LR test be significant? 
Adding an extra parameter to a linear regression almost always improves the R2, 
yet if one compares models, the model with the extra parameter is not always 
significantly better.
John
P.S. Please forgive the appeal to "significantly better" . . .


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

>>> Bert Gunter  09/22/15 4:30 PM >>>
Two normals will **always** be a better fit than one, as the latter
must be a subset of the former (with identical parameters for both
normals).

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
 wrote:
> I have data that may be the mixture of two normal distributions (one 
> contained within the other) vs. a single normal.
> I used normalmixEM to get estimates of parameters assuming two normals:
>
>
> GLUT <- scale(na.omit(data[,"FCW_glut"]))
> GLUT
> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> summary(mixmdl)
> plot(mixmdl,which=2)
> lines(density(data[,"GLUT"]), lty=2, lwd=2)
>
>
>
>
>
> summary of normalmixEM object:
>comp 1   comp 2
> lambda  0.7035179 0.296482
> mu -0.0592302 0.140545
> sigma   1.1271620 0.536076
> loglik at estimate:  -110.8037
>
>
>
> I would like to see if the two normal distributions are a better fit that one 
> normal. I have two problems
> (1) normalmixEM does not seem to want to fit a single normal (even if I 
> address the error message produced):
>
>
>> mixmdl = normalmixEM(GLUT,k=1)
> Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
>   arbmean and arbvar cannot both be FALSE
>> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
>   arbmean and arbvar cannot both be FALSE
>
>
>
> (2) Even if I had the loglik from a single normal, I am not sure how many DFs 
> to use when computing the -2LL ratio test.
>
>
> Any suggestions for comparing the two-normal vs. one normal distribution 
> would be appreciated.
>
>
> Thanks
> John
>
>
>
>
>
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and 
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>






Re: [R] [FORGED] Error from lme4: "Error: (p <- ncol(X)) == ncol(Y) is not TRUE"

2015-09-22 Thread Rolf Turner


On 23/09/15 01:18, Rory Wilson wrote:


Hello all, I am trying to run a random intercept model using lme4.
The random effect is a factor of 29 possibilities, making a model
with one random effect (one level). It is just a linear model. There
are 713 observations. However, when trying to run the model I receive
the error "Error: (p <- ncol(X)) == ncol(Y) is not TRUE", a search
for which reveals somewhat surprisingly little. Has anyone seen this
before? Note that if I simply change the random effect into a fixed
effect and use lm, the model works perfectly. Thank you!


[Caveat:  I really find the syntax of lmer() incomprehensible, so my 
example below could be a load of dingos' kidneys.]


I think a reproducible example (as specified by the posting guide) is 
needed here.  When I do:


set.seed(42)
f <- factor(sample(1:29,713,TRUE))
x <- seq(0,1,length=713)
y <- rnorm(713)
require(lme4)
fit <- lmer(y ~ x + (1|f))

I get a reasonable (???) looking result and no error messages.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] [FORGED] error in mlogit.optim

2015-09-22 Thread Rolf Turner

On 23/09/15 03:33, Alaa Sindi wrote:

Hi all

I hope you are doing well.

I am trying to install and use mlogit.optim and getting this error.

Error: could not find function “mlogit.optim"

Warning in install.packages : unable to access index for repository
https://cran.rstudio.com/src/contrib
 Warning in install.packages :
unable to access index for repository
https://cran.rstudio.com/src/contrib
 Warning in install.packages :
package ‘mlogit.optim’ is not available (for R version 3.2.2) Warning
in install.packages : unable to access index for repository
https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.2



I see that the out-patients are out in force tonight.

You need to install (and then load) the ***mlogit*** package.  The 
function mlogit.optim() is a function in this package.  Learn to use 
search tools.  Better still, learn to think.
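
Concretely, the usual sequence is (a minimal sketch; mlogit is the CRAN package name, and the exists() line is just a sanity check):

```r
# mlogit.optim() lives in the 'mlogit' package; install it once, then load it
# in every session before calling the function.
install.packages("mlogit")
library(mlogit)
exists("mlogit.optim")   # should be TRUE once the package is loaded
```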


cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] vector manipulations -- differences

2015-09-22 Thread Frank Schwidom

And if we want to use William Dunlap's approach for sequence() optimization,
then we can write:

rev( xr[ seq_len(sum(vec)) - rep.int(cumsum(c(0L, vec[-length(vec)])), vec)] - 
rep.int( xr[ -1], vec))

Regards.
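
As a quick sanity check (using the example vector from the original post), both one-liners in this thread reproduce the outer()-based reference result:

```r
x   <- c(0, 3, 7, 20)
xr  <- rev(x)
vec <- 1:(length(x) - 1)

# sequence()-based version from this thread
a <- rev(xr[sequence(vec)] - rep.int(xr[-1], vec))

# seq_len()/cumsum version above
b <- rev(xr[seq_len(sum(vec)) - rep.int(cumsum(c(0L, vec[-length(vec)])), vec)] -
         rep.int(xr[-1], vec))

# reference result via outer()
junk <- outer(x, x, '-')
r <- junk[junk > 0]

identical(a, r) && identical(b, r)   # TRUE
```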

On 2015-09-22 23:43:10, Frank Schwidom wrote:
> Hi,
> 
> xr <- rev( x)
> vec <- 1:(length( x) - 1)
> rev( xr[ sequence( vec)] - rep.int( xr[ -1], vec))
> 
> 
> On 2015-09-21 14:17:40, Dan D wrote:
> > I need an efficient way to build a new n x (n-1)/2 vector from an n-vector x
> > as:
> > 
> > c(x[-1]-x[1], x[-(1:2)]-x[2], ... , x[-(1:(n-1)] - x[n-1])
> > 
> > x is increasing with x[1] = 0. 
> > 
> > The following works but is not the greatest:
> > junk<-outer(x, x, '-')
> > junk[junk>0]
> > 
> > e.g., 
> > given
> > x<-c(0, 3, 7, 20)
> > junk<-outer(x, x, '-')
> > junk[junk>0] # yields: c(3, 7, 20, 4, 17, 13) as needed, but it has to go
> > through 
> > junk
> > #      [,1] [,2] [,3] [,4]
> > # [1,]    0   -3   -7  -20
> > # [2,]    3    0   -4  -17
> > # [3,]    7    4    0  -13
> > # [4,]   20   17   13    0
> > 
> > Anyone have a better idea?
> > 
> > -Dan
> > 
> > 
> > 
> > --
> > View this message in context: 
> > http://r.789695.n4.nabble.com/vector-manipulations-differences-tp4712575.html
> > Sent from the R help mailing list archive at Nabble.com.
> > 
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] retaining characters in a csv file

2015-09-22 Thread Hadley Wickham
The problem is that quotes in csv files are commonly held to be
meaningless (i.e. they don't automatically force components to be
strings).

Earlier this morning I committed a fix to readr so that numbers
starting with a sequence of zeros are read as character strings. You
may want to try out the dev version: https://github.com/hadley/readr.

Hadley
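
For anyone who cannot try the dev version yet, readr also lets you pin the type per column, which sidesteps the guessing entirely (a minimal sketch; the column names are made up):

```r
library(readr)

# Two columns: 'id' should stay numeric, 'acct' must stay character so the
# leading zeros survive.
txt <- 'id,acct\n5724550,"000202075214"'
d <- read_csv(txt, col_types = cols(acct = col_character()))
d$acct   # "000202075214" -- leading zeros kept, 'id' still numeric
```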

On Tue, Sep 22, 2015 at 5:00 PM, Therneau, Terry M., Ph.D.
 wrote:
> I have a csv file from an automatic process (so this will happen thousands
> of times), for which the first row is a vector of variable names and the
> second row often starts something like this:
>
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
>
> Notice the second variable which is
>   a character string (note the quotation marks)
>   a sequence of numeric digits
>   leading zeros are significant
>
> The read.csv function insists on turning this into a numeric.  Is there any
> simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the
> bloody quotes" -- I still want the first, third, etc columns to become
> numeric.  There can be more than one variable like this, and not always in
> the second position.
>
> This happens deep inside the httr library; there is an easy way for me to
> add more options to the read.csv call but it is not so easy to replace it
> with something else.
>
> Terry T



-- 
http://had.co.nz/



[R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
I have data that may be the mixture of two normal distributions (one contained 
within the other) vs. a single normal. 
I used normalmixEM to get estimates of parameters assuming two normals:


GLUT <- scale(na.omit(data[,"FCW_glut"]))
GLUT
mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
summary(mixmdl)
plot(mixmdl,which=2)
lines(density(data[,"GLUT"]), lty=2, lwd=2)





summary of normalmixEM object:
   comp 1   comp 2
lambda  0.7035179 0.296482
mu -0.0592302 0.140545
sigma   1.1271620 0.536076
loglik at estimate:  -110.8037 



I would like to see if the two normal distributions are a better fit than one 
normal. I have two problems:
(1) normalmixEM does not seem to want to fit a single normal (even if I address 
the error message produced):


> mixmdl = normalmixEM(GLUT,k=1)
Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  : 
  arbmean and arbvar cannot both be FALSE
> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  : 
  arbmean and arbvar cannot both be FALSE



(2) Even if I had the loglik from a single normal, I am not sure how many DFs 
to use when computing the -2LL ratio test. 


Any suggestions for comparing the two-normal vs. one normal distribution would 
be appreciated.


Thanks
John









John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 


Confidentiality Statement:
This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not 
the intended recipient, please contact the sender by reply email and destroy 
all copies of the original message. 


[R] Randomness tests

2015-09-22 Thread Giorgio Garziano
Hi,

to test randomness of a time series whose values can only be +1 and -1, are all
of the following randomness tests applicable, or only some of them?

cox.stuart.test
difference.sign.test
bartels.rank.test
rank.test
runs.test

Tests provided by the randtests R package.

Thanks.

Giorgio Garziano
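
[Editorial note, not from the thread: the runs test is the natural fit for a two-valued series, since it only uses the above/below pattern; the rank- and sign-based tests assume an essentially tie-free continuous series, which a +1/-1 series is not. A hedged sketch, assuming the randtests package is installed:]

```r
# The runs test only looks at the pattern of values above/below a threshold,
# so it applies directly to a +1/-1 series.
library(randtests)
set.seed(1)
x <- sample(c(-1, 1), 200, replace = TRUE)
runs.test(x, threshold = 0)   # values > 0 vs < 0 define the runs
```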






Re: [R] c(1:n, 1:(n-1), 1:(n-2), ... , 1)

2015-09-22 Thread Frank Schwidom
Hi

I have to correct myself, this last solution is not universally valid

here is a better one:

tmp1 <- ( 1 - outer( max( x):1, x, FUN='-'))
tmp1[ tmp1 > 0]
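
For n = 5 the candidates quoted below in this thread all agree; a quick cross-check:

```r
n <- 5
v1 <- sequence(n:1)                            # David Winsemius' bet
v2 <- unlist(lapply(n:1, seq))                 # Achim Zeileis
v3 <- Reduce(function(x, y) c(1:y, x), 1:n)    # Reduce() variant
identical(v1, v2) && identical(v1, v3)         # TRUE
```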


On 2015-09-17 21:06:30, Frank Schwidom wrote:
> 
> how abount a more complicated one?
> 
> outer( 1:5, 1:5, '-')[ outer( 1:5, 1:5, '>')]
>  [1] 1 2 3 4 1 2 3 1 2 1
> 
> 
> On Thu, Sep 17, 2015 at 11:52:27AM -0700, David Winsemius wrote:
> > You can add this to the list of options to be tested, although my bet would 
> > be placed on `sequence(5:1)`:
> > 
> > > Reduce( function(x,y){c( 1:y, x)}, 1:5)
> >  [1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
> > 
> > 
> > On Sep 17, 2015, at 11:40 AM, Achim Zeileis wrote:
> > 
> > > On Thu, 17 Sep 2015, Peter Langfelder wrote:
> > > 
> > >> Not sure if this is slicker or easier to follow than your solution,
> > >> but it is shorter :)
> > >> 
> > >> do.call(c, lapply(n:1, function(n1) 1:n1))
> > > 
> > > Also not sure about efficiency but somewhat shorter...
> > > unlist(lapply(5:1, seq))
> > > 
> > >> Peter
> > >> 
> > >> On Thu, Sep 17, 2015 at 11:19 AM, Dan D  wrote:
> > >>> Can anyone think of a slick way to create an array that looks like 
> > >>> c(1:n,
> > >>> 1:(n-1), 1:(n-2), ... , 1)?
> > >>> 
> > >>> The following works, but it's inefficient and a little hard to follow:
> > >>> n<-5
> > >>> junk<-array(1:n,dim=c(n,n))
> > >>> junk[((lower.tri(t(junk),diag=T)))[n:1,]]
> > >>> 
> > >>> Any help would be greatly appreciated!
> > >>> 
> > >>> -Dan
> > >>> 
> > >>> 
> > 
> > David Winsemius
> > Alameda, CA, USA
> > 
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
I am not sure AIC or BIC would be needed, as the two-normal mixture has at 
least two additional parameters to estimate (mean1, var1, mean2, var2, plus the 
mixing proportion), whereas the one normal has to estimate only a single mean 
and variance. In any event, I don't know how to fit the single normal and get 
values for the loglik, let alone AIC or BIC.
John
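
[Editorial note: for the single-normal fit no mixture machinery is needed; the ML estimates are closed-form, so the log-likelihood can be computed directly. A sketch with simulated stand-in data, since the GLUT vector is not available here:]

```r
set.seed(42)
x <- rnorm(150)                           # stand-in for the scaled GLUT data
mu.hat    <- mean(x)
sigma.hat <- sqrt(mean((x - mu.hat)^2))   # ML estimate: divide by n, not n - 1
loglik1 <- sum(dnorm(x, mean = mu.hat, sd = sigma.hat, log = TRUE))
loglik1   # comparable to the 'loglik at estimate' reported by normalmixEM
```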



John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

>>> Mark Leeds  09/22/15 4:36 PM >>>
That's true, but if he uses some AIC or BIC criterion that penalizes the number 
of parameters, then he might see something else? This (comparing mixtures to 
non-mixtures) is not something I deal with, so I'm just throwing it out there.






On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter  wrote:
Two normals will **always** be a better fit than one, as the latter
 must be a subset of the former (with identical parameters for both
 normals).
 
 Cheers,
 Bert
 
 
 Bert Gunter
 
 "Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom."
-- Clifford Stoll
 
 
 On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
  wrote:
 > I have data that may be the mixture of two normal distributions (one 
 > contained within the other) vs. a single normal.
 > I used normalmixEM to get estimates of parameters assuming two normals:
 >
 >
 > GLUT <- scale(na.omit(data[,"FCW_glut"]))
 > GLUT
 > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
 > summary(mixmdl)
 > plot(mixmdl,which=2)
 > lines(density(data[,"GLUT"]), lty=2, lwd=2)
 >
 >
 >
 >
 >
 > summary of normalmixEM object:
 >comp 1   comp 2
 > lambda  0.7035179 0.296482
 > mu -0.0592302 0.140545
 > sigma   1.1271620 0.536076
 > loglik at estimate:  -110.8037
 >
 >
 >
 > I would like to see if the two normal distributions are a better fit than 
 > one normal. I have two problems:
 > (1) normalmixEM does not seem to want to fit a single normal (even if I 
 > address the error message produced):
 >
 >
 >> mixmdl = normalmixEM(GLUT,k=1)
 > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
 >   arbmean and arbvar cannot both be FALSE
 >> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
 > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k = k,  :
 >   arbmean and arbvar cannot both be FALSE
 >
 >
 >
 > (2) Even if I had the loglik from a single normal, I am not sure how many 
 > DFs to use when computing the -2LL ratio test.
 >
 >
 > Any suggestions for comparing the two-normal vs. one normal distribution 
 > would be appreciated.
 >
 >
 > Thanks
 > John
 >
 >
 >
 >
 >
 >
 >
 >
 >
 > John David Sorkin M.D., Ph.D.
 > Professor of Medicine
 > Chief, Biostatistics and Informatics
 > University of Maryland School of Medicine Division of Gerontology and 
 > Geriatric Medicine
 > Baltimore VA Medical Center
 > 10 North Greene Street
 > GRECC (BT/18/GR)
 > Baltimore, MD 21201-1524
 > (Phone) 410-605-7119
 > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
 >
 >
 > Confidentiality Statement:
 

> This email message, including any attachments, is for ...{{dropped:12}}
 









Confidentiality Statement:
This email message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized use, disclosure or distribution is prohibited. If you are not 
the intended recipient, please contact the sender by reply email and destroy 
all copies of the original message. 


Re: [R] Compare two normal to one normal

2015-09-22 Thread Bert Gunter
I'll be brief in my reply to you both, as this is off topic.

So what?  All this statistical stuff is irrelevant baloney (and of
questionable accuracy, since it is based on asymptotics and strong
assumptions, anyway).  The question of interest is whether a mixture
fit better suits the context, which only the OP knows and which none
of us can answer.

I know that many will disagree with this -- maybe a few might agree --
but please send all replies, insults, praise, and learned discourse to
me privately,  as I have already occupied more space on the list than
I should.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds  wrote:
> That's true but if he uses some AIC or BIC criterion that penalizes the
> number of parameters,
> then he might see something else ? This ( comparing mixtures to not mixtures
> ) is not something I deal with so I'm just throwing it out there.
>
>
>
>
> On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter  wrote:
>>
>> Two normals will **always** be a better fit than one, as the latter
>> must be a subset of the former (with identical parameters for both
>> normals).
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>-- Clifford Stoll
>>
>>
>> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
>>  wrote:
>> > I have data that may be the mixture of two normal distributions (one
>> > contained within the other) vs. a single normal.
>> > I used normalmixEM to get estimates of parameters assuming two normals:
>> >
>> >
>> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
>> > GLUT
>> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > summary(mixmdl)
>> > plot(mixmdl,which=2)
>> > lines(density(data[,"GLUT"]), lty=2, lwd=2)
>> >
>> >
>> >
>> >
>> >
>> > summary of normalmixEM object:
>> >comp 1   comp 2
>> > lambda  0.7035179 0.296482
>> > mu -0.0592302 0.140545
>> > sigma   1.1271620 0.536076
>> > loglik at estimate:  -110.8037
>> >
>> >
>> >
>> > I would like to see if the two normal distributions are a better fit
>> > than one normal. I have two problems:
>> > (1) normalmixEM does not seem to want to fit a single normal (even if I
>> > address the error message produced):
>> >
>> >
>> >> mixmdl = normalmixEM(GLUT,k=1)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
>> > k,  :
>> >   arbmean and arbvar cannot both be FALSE
>> >> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
>> > k,  :
>> >   arbmean and arbvar cannot both be FALSE
>> >
>> >
>> >
>> > (2) Even if I had the loglik from a single normal, I am not sure how
>> > many DFs to use when computing the -2LL ratio test.
>> >
>> >
>> > Any suggestions for comparing the two-normal vs. one normal distribution
>> > would be appreciated.
>> >
>> >
>> > Thanks
>> > John
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > John David Sorkin M.D., Ph.D.
>> > Professor of Medicine
>> > Chief, Biostatistics and Informatics
>> > University of Maryland School of Medicine Division of Gerontology and
>> > Geriatric Medicine
>> > Baltimore VA Medical Center
>> > 10 North Greene Street
>> > GRECC (BT/18/GR)
>> > Baltimore, MD 21201-1524
>> > (Phone) 410-605-7119
>> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> >
>> >
>> > Confidentiality Statement:
>> > This email message, including any attachments, is for ...{{dropped:12}}
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



Re: [R] vector manipulations -- differences

2015-09-22 Thread Frank Schwidom
Hi,

xr <- rev( x)
vec <- 1:(length( x) - 1)
rev( xr[ sequence( vec)] - rep.int( xr[ -1], vec))


On 2015-09-21 14:17:40, Dan D wrote:
> I need an efficient way to build a new n x (n-1)/2 vector from an n-vector x
> as:
> 
> c(x[-1]-x[1], x[-(1:2)]-x[2], ... , x[-(1:(n-1)] - x[n-1])
> 
> x is increasing with x[1] = 0. 
> 
> The following works but is not the greatest:
> junk<-outer(x, x, '-')
> junk[junk>0]
> 
> e.g., 
> given
> x<-c(0, 3, 7, 20)
> junk<-outer(x, x, '-')
> junk[junk>0] # yields: c(3, 7, 20, 4, 17, 13) as needed, but it has to go
> through 
> junk
> #      [,1] [,2] [,3] [,4]
> # [1,]    0   -3   -7  -20
> # [2,]    3    0   -4  -17
> # [3,]    7    4    0  -13
> # [4,]   20   17   13    0
> 
> Anyone have a better idea?
> 
> -Dan
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/vector-manipulations-differences-tp4712575.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Error in eval(expr, envir, enclos) : could not find function

2015-09-22 Thread William Dunlap
You left out the rest of the error message (the name of the function
it is looking for is key):
> lm(Y ~ nosuchfuncion(X), data=data.frame(Y=1:10,X=log(1:10)))
Error in eval(expr, envir, enclos) :
   could not find function "nosuchfuncion"
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Sep 22, 2015 at 12:07 PM, Alaa Sindi  wrote:
> hi all
>
> I am getting this error "Error in eval(expr, envir, enclos) : could not find 
> function “
>
> do you have an idea what might cause this problem.
>
> thanks
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[R] Weighted Ridge Regression with GCV Optimization

2015-09-22 Thread Preetam Pal
Hi R-users,

I am having problems while implementing the following model:

   1. I have numerical regressors (GDP, HPA and FX observed quarterly) and
   need to predict the numerical variable Y.
   2. I have to run *weighted Ridge Regression* where the weights of the
   squared residuals are decreasing at 5% with every quarter into the past.
   3. Before estimating beta, I need to select the *optimal Ridge parameter*
   (lambda) wrt the GCV criterion:
   a> For any lambda, divide the data into, say, blocks B1, B2, B3, B4 and B5
   of size k = 20% of the data size. For each i, remove B_i, estimate the
   beta vector over the remaining data set and find the unweighted SSE (or
   any other deviation metric) using this beta vector on the block B_i.
   Iterate over all five B_i's (i = 1, ..., 5) and get the average of the 5
   SSE values.
   b> Allow lambda to vary between 0 and 1 in steps of size 0.01 and choose
   the lambda that minimizes the average SSE computed in step a>.
   4. With this choice of lambda, my final beta estimate would be [X'W'WX +
   lambda * Identity Matrix]^(-1)  * X'W'WY.
   5. Here W'W is a diagonal matrix whose diagonals are decreasing from the
   last entry upwards at 5% decay rate and trace(W'W) = 1 (i.e. sum of weights
   = 1)

I know lm.ridge() can do Ridge Regression, but I dont know how to write the
code with these weights, GCV criterion etc.

Can you please help me with this? I have attached the exact data in .txt
format (should be readable with read.table()). Please let me know in case I
need to provide any more clarifications.

Thanks,
Preetam
T   GDP Rate    HPA FX  Y
1   0.806660537 2.177803167 1.14980573  2.733594304
2   0.997724655 1.585686087 0.814496976 3.193948056
3   0.99032353  0.569843997 0.46442 3.065751781
4   0.606121306 3.037648988 0.565322084 4.537399052
5   0.858131141 4.816423605 1.924534222 7.871730873
6   0.052909178 2.048591352 1.470221953 2.580646078
7   0.081400487 1.152495559 1.128828557 7.200336313
8   0.840972911 3.848225962 1.004272646 1.211124673
9   0.965868218 1.039679934 0.231408747 7.566968
10  0.952626722 4.455565591 0.483541015 9.412639513
11  0.067691757 0.038417569 0.69744243  8.055369029
12  0.985658841 1.143481763 1.65850909  6.962599601
13  0.177186946 3.762691635 0.44379572  9.904367023
14  0.490066697 0.655629739 1.281478696 1.796422139
15  0.223740666 1.393201062 1.235291827 5.237943945
16  0.782873809 1.485727273 0.224511215 6.399036418
17  0.947492758 0.318485005 1.158911495 8.183470692
18  0.49692711  2.169601457 1.777618832 8.830805294
19  0.956704273 1.546827505 0.241838792 7.554654431
20  0.404624372 3.041530693 1.66039172  6.709330773
21  0.98557461  2.45656369  1.695179666 8.638707974
22  0.494102398 4.527230971 0.993352283 7.958872374
23  0.893182943 3.429112971 0.675541115 5.665249801
24  0.669680459 0.459919029 1.011872328 8.883120607
25  0.017296599 2.184045646 1.575891106 2.585709635
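
[Editorial note: the estimator in steps 4-5 can be sketched directly in closed form. Illustrative only; lambda would come from the block-CV loop in step 3, and the helper name ridge_beta is made up:]

```r
# Weighted ridge estimate with 5%-per-quarter decaying weights and
# trace(W'W) = 1, as described in steps 4-5.
ridge_beta <- function(X, y, lambda, decay = 0.95) {
  n  <- nrow(X)
  w  <- decay^((n - 1):0)          # most recent quarter gets the largest weight
  w  <- w / sum(w)                 # normalise so the weights sum to 1
  W2 <- diag(w)                    # this diagonal matrix is W'W
  solve(t(X) %*% W2 %*% X + lambda * diag(ncol(X)),
        t(X) %*% W2 %*% y)         # [X'W'WX + lambda I]^(-1) X'W'W y
}

# Hypothetical usage with the attached data read into a data frame 'd'
# (the column names below are guesses at the header):
# X <- cbind(1, as.matrix(d[, c("GDP.Rate", "HPA", "FX")]))
# beta <- ridge_beta(X, d$Y, lambda = 0.25)
```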

[R] retaining characters in a csv file

2015-09-22 Thread Therneau, Terry M., Ph.D.
I have a csv file from an automatic process (so this will happen thousands of times), for 
which the first row is a vector of variable names and the second row often starts 
something like this:


5724550,"000202075214",2005.02.17,2005.02.17,"F", .

Notice the second variable which is
  a character string (note the quotation marks)
  a sequence of numeric digits
  leading zeros are significant

The read.csv function insists on turning this into a numeric.  Is there any simple set of 
options that
will turn this behavior off?  I'm looking for a way to tell it to "obey the bloody quotes" 
-- I still want the first, third, etc columns to become numeric.  There can be more than 
one variable like this, and not always in the second position.


This happens deep inside the httr library; there is an easy way for me to add more options 
to the read.csv call but it is not so easy to replace it with something else.


Terry T
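
[Editorial note: when the positions of the character columns are known, base read.csv can be told directly via colClasses; with varying positions this would have to be built per file. A minimal sketch with made-up column names:]

```r
txt <- 'a,b,c,d,e\n5724550,"000202075214",2005.02.17,2005.02.17,"F"'

# Force columns 2 and 5 to stay character (NA = let R guess as usual);
# the leading zeros in column 2 then survive.
d <- read.csv(text = txt,
              colClasses = c(NA, "character", NA, NA, "character"))
d$b   # "000202075214"
```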



Re: [R-es] Excel vs. R

2015-09-22 Thread Susana deus alvarez
Great, thank you all, I'll give it a try

On 22 September 2015 at 17:01, daniel  wrote:

> Grouping your data, and without needing any library, you can do something
> like this, which I think may also work for you.
>
>
> http://stats.stackexchange.com/questions/14118/drawing-multiple-barplots-on-a-graph-in-r
>
> mydata <- data.frame(Barplot1=rbinom(5,16,0.6), Barplot2=rbinom(5,16,0.25),
>  Barplot3=rbinom(5,5,0.25), Barplot4=rbinom(5,16,0.7))
> barplot(as.matrix(mydata), main="Interesting", ylab="Total", beside=TRUE,
> col=terrain.colors(5))
> legend(13, 12, c("Label1","Label2","Label3","Label4","Label5"), cex=0.6,
>fill=terrain.colors(5))
>
>
> Daniel Merino
>
> On 22 September 2015 at 16:48, pepeceb  wrote:
>
>> Take a look at this about Bar Plots
>>
>> Quick-R: Bar Plots 
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> If what you want is to split by intervals, you can do something like this;
>> see if it works for you.
>> Let z1 be your matrix:
>>
>> z1$rango <- cut(z1$tamaño, breaks = 3, dig.lab = 2) # add a 'rango' variable
>> # with 3 divisions of the size range. It sets the intervals by default, but
>> # you can also set them yourself.
>>
>> table(z1$estaciones, z1$rango) # a small table with the number of stations
>> # per range and size
>> # bar chart
>> barplot(table(z1$estaciones, z1$rango)) # but let's improve it a bit
>>
>> # create a legend
>> leyenda <- c("Gualeguycito", "(Itapebí]", "(Cañada]")
>>
>> barplot(table(z1$tamaño, z1$rango),
>>         main = "Promedio de tamaños...", ylab = "Tamaño",
>>         beside = T, legend.text = leyenda,
>>         args.legend = list(x = "topleft"))
>>
>> Regards
>>
>>
>>
>> On Tuesday, 22 September 2015 at 21:20, Susana deus alvarez <
>> susanadeus.deusalva...@gmail.com> wrote:
>>
>>
>> Hello, I am writing because I have a big question: how can charts that are
>> so easy in Excel be so difficult in R?
>> I cannot manage to make a chart like this in R. That is, how can I split
>> by sites and by sizes? In a huge spreadsheet the first column holds the
>> stations (but I only want the last three) and then many parameters; in
>> columns 46, 53 and 60 are the weighted averages. And I just cannot do it
>> in R. I have tried creating smaller Excel files with only that, but there
>> is no way. If someone could help me a little...
>>
>> Thanks
>>
>> [image: Inline image 1]
>>
>> ___
>> R-help-es mailing list
>> R-help-es@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-help-es
>>
>>
>>
>>
>
>
> --
> Daniel
>
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] Compare two normal to one normal

2015-09-22 Thread John Sorkin
Bert
I am surprised by your response. Statistics serves two purposes: estimation and 
hypothesis testing. Sometimes we are fortunate and theory, physiology, physics, 
or something else tell us what is the correct, or perhaps I should same most 
adequate model. Sometimes theory fails us and we wish to choose between two 
competing models. This is my case.  The cell sizes may come from one normal 
distribution (theory 1) or two (theory 2). Choosing between the models will 
help us postulate about physiology. I want to use statistics to help me decide 
between the two competing models, and thus inform my understanding of 
physiology. It is true that statistics can't tell me which model is the 
"correct" or "true" model, but it should be able to help me select the more 
"adequate" or "appropriate" or "closer to he truth" model.


In any event, I still don't know how to fit a single normal distribution and 
get a measure of fit e.g. log likelihood.


John


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric 
Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 

>>> Bert Gunter  09/22/15 4:48 PM >>>
I'll be brief in my reply to you both, as this is off topic.

So what?  All this statistical stuff is irrelevant baloney (and of
questionable accuracy, since it is based on asymptotics and strong
assumptions, anyway).  The question of interest is whether a mixture
fit better suits the context, which only the OP knows and which none
of us can answer.

I know that many will disagree with this -- maybe a few might agree --
but please send all replies, insults, praise, and learned discourse to
me privately,  as I have already occupied more space on the list than
I should.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds  wrote:
> That's true but if he uses some AIC or BIC criterion that penalizes the
> number of parameters,
> then he might see something else ? This ( comparing mixtures to not mixtures
> ) is not something I deal with so I'm just throwing it out there.
>
>
>
>
> On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter  wrote:
>>
>> Two normals will **always** be a better fit than one, as the latter
>> must be a subset of the former (with identical parameters for both
>> normals).
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>-- Clifford Stoll
>>
>>
>> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
>>  wrote:
>> > I have data that may be the mixture of two normal distributions (one
>> > contained within the other) vs. a single normal.
>> > I used normalmixEM to get estimates of parameters assuming two normals:
>> >
>> >
>> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
>> > GLUT
>> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > summary(mixmdl)
>> > plot(mixmdl,which=2)
>> > lines(density(data[,"GLUT"]), lty=2, lwd=2)
>> >
>> >
>> >
>> >
>> >
>> > summary of normalmixEM object:
>> >comp 1   comp 2
>> > lambda  0.7035179 0.296482
>> > mu -0.0592302 0.140545
>> > sigma   1.1271620 0.536076
>> > loglik at estimate:  -110.8037
>> >
>> >
>> >
>> > I would like to see if the two normal distributions are a better fit
>> > that one normal. I have two problems
>> > (1) normalmixEM does not seem to what to fit a single normal (even if I
>> > address the error message produced):
>> >
>> >
>> >> mixmdl = normalmixEM(GLUT,k=1)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
>> > k,  :
>> >   arbmean and arbvar cannot both be FALSE
>> >> mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
>> > Error in normalmix.init(x = x, lambda = lambda, mu = mu, s = sigma, k =
>> > k,  :
>> >   arbmean and arbvar cannot both be FALSE
>> >
>> >
>> >
>> > (2) Even if I had the loglik from a single normal, I am not sure how
>> > many DFs to use when computing the -2LL ratio test.
>> >
>> >
>> > Any suggestions for comparing the two-normal vs. one normal distribution
>> > would be appreciated.
>> >
>> >
>> > Thanks
>> > John
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > John David Sorkin M.D., Ph.D.
>> > Professor of Medicine
>> > Chief, Biostatistics and Informatics
>> > University of Maryland School of Medicine Division of Gerontology and
>> > Geriatric Medicine
>> > Baltimore VA Medical Center
>> > 10 North Greene Street
>> > GRECC (BT/18/GR)
>> > Baltimore, MD 21201-1524
>> > (Phone) 410-605-7119410-605-7119
>> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> >
>> >
>> > Confidentiality 

Re: [R] Compare two normal to one normal

2015-09-22 Thread Mark Leeds
Hi John: For the log likelihood in the single-normal case, you can calculate
it directly using the normal density: the sum from i = 1 to n of
log f(x_i, muhat, sigmahat), where f(x_i, muhat, sigmahat) is the density of
the normal with that mean and variance, so you can use dnorm with log = TRUE.
Of course you need to estimate the parameters muhat and sigmahat first, but
for the single normal case they are of course just the sample mean and sample
variance.
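
In R, that calculation is a few lines (a sketch; x below is only a stand-in for the OP's GLUT vector, and note the maximum-likelihood estimate of sigma uses the 1/n rather than 1/(n-1) denominator):

```r
# log likelihood of a single normal fitted by maximum likelihood
x <- rnorm(200)                        # stand-in for the OP's GLUT data
muhat    <- mean(x)
sigmahat <- sqrt(mean((x - muhat)^2))  # MLE standard deviation (1/n)
loglik1  <- sum(dnorm(x, muhat, sigmahat, log = TRUE))
loglik1
```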

Note though: if you're going to calculate a log likelihood ratio, make sure
you compare apples to apples and not apples to oranges, in the sense that the
log likelihood that comes out of the mixture code may include constants such
as 1/sqrt(2*pi), etc. So you need to know EXACTLY how the mixture algorithm
is calculating its log likelihood.

In fact, it may be better and safer to calculate the log likelihood for the
mixture yourself as well: the sum from i = 1 to n of log[ lambda*f(x_i,
mu1hat, sigma1hat) + (1-lambda)*f(x_i, mu2hat, sigma2hat) ]. By calculating it
yourself and being consistent, you then know that you will be comparing
apples to apples.

As I said earlier, another way is by comparing AICs: in that case, you
calculate the AIC for both models and see which is lower. Lower wins, and it
penalizes for the number of parameters. There are asymptotics required in both
the LRT approach and the AIC approach, so you can pick your poison !!! :).
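
A sketch of that AIC comparison, assuming GLUT and mixmdl (the normalmixEM fit) exist as earlier in the thread; the single normal has 2 free parameters, the two-component mixture 5 (two means, two sds, one mixing weight):

```r
# AIC = -2*logLik + 2*k; lower is better
x <- as.vector(GLUT)
loglik1 <- sum(dnorm(x, mean(x), sqrt(mean((x - mean(x))^2)), log = TRUE))
aic1 <- -2 * loglik1 + 2 * 2        # single normal: mu, sigma
aic2 <- -2 * mixmdl$loglik + 2 * 5  # mixture: mu1, mu2, sigma1, sigma2, lambda
c(single = aic1, mixture = aic2)
```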

On Tue, Sep 22, 2015 at 6:01 PM, John Sorkin 
wrote:

> Bert
> I am surprised by your response. Statistics serves two purposes:
> estimation and hypothesis testing. Sometimes we are fortunate and theory,
> physiology, physics, or something else tell us what is the correct, or
> perhaps I should say the most adequate model. Sometimes theory fails us and we
> wish to choose between two competing models. This is my case.  The cell
> sizes may come from one normal distribution (theory 1) or two (theory 2).
> Choosing between the models will help us postulate about physiology. I want
> to use statistics to help me decide between the two competing models, and
> thus inform my understanding of physiology. It is true that statistics
> can't tell me which model is the "correct" or "true" model, but it should
> be able to help me select the more "adequate" or "appropriate" or "closer
> to he truth" model.
>
> In any event, I still don't know how to fit a single normal distribution
> and get a measure of fit e.g. log likelihood.
>
> John
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> >>> Bert Gunter  09/22/15 4:48 PM >>>
> I'll be brief in my reply to you both, as this is off topic.
>
> So what? All this statistical stuff is irrelevant baloney (and of
> questionable accuracy, since based on asymptotics and strong
> assumptions, anyway) . The question of interest is whether a mixture
> fit better suits the context, which only the OP knows and which none
> of us can answer.
>
> I know that many will disagree with this -- maybe a few might agree --
> but please send all replies, insults, praise, and learned discourse to
> me privately, as I have already occupied more space on the list than
> I should.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> -- Clifford Stoll
>
>
> On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds  wrote:
> > That's true but if he uses some AIC or BIC criterion that penalizes the
> > number of parameters,
> > then he might see something else ? This ( comparing mixtures to not
> mixtures
> > ) is not something I deal with so I'm just throwing it out there.
> >
> >
> >
> >
> > On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter 
> wrote:
> >>
> >> Two normals will **always** be a better fit than one, as the latter
> >> must be a subset of the former (with identical parameters for both
> >> normals).
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >> Bert Gunter
> >>
> >> "Data is not information. Information is not knowledge. And knowledge
> >> is certainly not wisdom."
> >> -- Clifford Stoll
> >>
> >>
> >> On Tue, Sep 22, 2015 at 1:21 PM, John Sorkin
> >>  wrote:
> >> > I have data that may be the mixture of two normal distributions (one
> >> > contained within the other) vs. a single normal.
> >> > I used normalmixEM to get estimates of parameters assuming two
> normals:
> >> >
> >> >
> >> > GLUT <- scale(na.omit(data[,"FCW_glut"]))
> >> > GLUT
> >> > mixmdl = normalmixEM(GLUT,k=1,arbmean=TRUE)
> >> > summary(mixmdl)
> >> > plot(mixmdl,which=2)
> >> > 

Re: [R] Compare two normal to one normal

2015-09-22 Thread Mark Leeds
John: After I sent what I wrote, I read Rolf's intelligent response. I didn't
realize that there are boundary issues, so yes, he's correct and my approach is
EL WRONGO. I feel very not good that I just sent that email, it being totally
wrong. My apologies for the noise, and thanks Rolf for the correct response.

One thing that does still hold in my response is the AIC approach, unless Rolf
tells us that it's not valid also. I don't see why it wouldn't be, though,
because you're not doing a hypothesis test when you go the AIC route.

On Wed, Sep 23, 2015 at 12:33 AM, Mark Leeds  wrote:

> Hi John:  For the log likelihood in the single case, you can just
> calculate it directly
> using the normal density, so the sum from i = 1 to n of f(x_i, uhat,
> sigmahat)
> where f(x_i, uhat, sigma hat)  is the density of the normal with that mean
> and variance.
> so you can use dnorm with log = TRUE.  Of course you need to estimate the
> parameters uhat and sigma hat first but for the single normal case, they
> are of course just the sample mean and sample variance
>
> Note though: If you're going to calculate a log likelihood ratio, make sure
> you compare
> apples and apples and not apples and oranges in the sense that the
> loglikelihood
> that comes out of the mixture case may include constants such
> 1/radical(2pi) etc.
> So you need to know EXACTLY how the mixture algorithm is calculating its
> log likelihood.
>
> In fact, it may be better and safer to just calculate the loglikelihood
> for the mixture yourself also so sum  from i = 1 to n of [ lambda*f(x_i,
> mu1hat, sigma1hat) + (1-lambda)*f(x_i, mu2hat, sigma2hat) By calculating it
> yourself and being consistent, you then know that you will be calculating
> apples and apples.
>
> As I said earlier, another way is by comparing AICs. in that case, you
> calculate it
> in both cases and see which AIC is lower. Lower wins and it penalizes for
> number of parameters. There are asymptotics required in both the LRT
> approach and the AIC
> approach so you can pick your poison !!! :).
>
>
> On Tue, Sep 22, 2015 at 6:01 PM, John Sorkin 
> wrote:
>
>> Bert
>> I am surprised by your response. Statistics serves two purposes:
>> estimation and hypothesis testing. Sometimes we are fortunate and theory,
>> physiology, physics, or something else tell us what is the correct, or
>> perhaps I should say the most adequate model. Sometimes theory fails us and we
>> wish to choose between two competing models. This is my case.  The cell
>> sizes may come from one normal distribution (theory 1) or two (theory 2).
>> Choosing between the models will help us postulate about physiology. I want
>> to use statistics to help me decide between the two competing models, and
>> thus inform my understanding of physiology. It is true that statistics
>> can't tell me which model is the "correct" or "true" model, but it should
>> be able to help me select the more "adequate" or "appropriate" or "closer
>> to he truth" model.
>>
>> In any event, I still don't know how to fit a single normal distribution
>> and get a measure of fit e.g. log likelihood.
>>
>> John
>>
>>
>> John David Sorkin M.D., Ph.D.
>> Professor of Medicine
>> Chief, Biostatistics and Informatics
>> University of Maryland School of Medicine Division of Gerontology and
>> Geriatric Medicine
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>
>> >>> Bert Gunter  09/22/15 4:48 PM >>>
>> I'll be brief in my reply to you both, as this is off topic.
>>
>> So what? All this statistical stuff is irrelevant baloney (and of
>> questionable accuracy, since based on asymptotics and strong
>> assumptions, anyway) . The question of interest is whether a mixture
>> fit better suits the context, which only the OP knows and which none
>> of us can answer.
>>
>> I know that many will disagree with this -- maybe a few might agree --
>> but please send all replies, insults, praise, and learned discourse to
>> me privately, as I have already occupied more space on the list than
>> I should.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> -- Clifford Stoll
>>
>>
>> On Tue, Sep 22, 2015 at 1:35 PM, Mark Leeds  wrote:
>> > That's true but if he uses some AIC or BIC criterion that penalizes the
>> > number of parameters,
>> > then he might see something else ? This ( comparing mixtures to not
>> mixtures
>> > ) is not something I deal with so I'm just throwing it out there.
>> >
>> >
>> >
>> >
>> > On Tue, Sep 22, 2015 at 4:30 PM, Bert Gunter 
>> wrote:
>> >>
>> >> Two normals will **always** be a better fit than one, as the latter
>> 

Re: [R] Fixing Gale-Shapley Algorithm for R

2015-09-22 Thread VictorDelgado
Hello R developers, I have made new code for this algorithm in R. At the end I
present a very small example with system.time timings.

Gale-Shapley Many-to-One (Note that many are always in Rows):

#
#

gsa.many <- function(m, n, preference.row, preference.col, expand)
{

# m = row number 
# n = col number 
# Remember, rows propose first in this code
# expand = seats per 'school' or column classes
# Note that m > n is needed for the algorithm to run

loop <- 1 # marks the first iteration
result <- matrix(0, nrow=m, ncol=n) # matrix of zeros
pos <- NULL # position of each row's most-preferred choice
surplus <- 1 # only to provide an initial condition

# Core of the function:

while(any(surplus > 0)){ # test whether the number of students exceeds the number of seats

# Collect the proposals:

for(i in 1:m){
pos[i] <- which.min(preference.row[i,])
result[i,pos[i]] <- 1}

# How many students request the seats:

demand <- apply(result, 2, sum)
surplus <- demand - expand # how many students are in excess

# Which school(s) will have to remove students:

escolas <- which(surplus > 0) 

rejected <- list(NULL) # used to find the students that must be removed
surplus <- surplus[surplus > 0] # how many students are left over

# Auxiliary list for the FOR loop below:

if(length(surplus) > 0){
aux <- list(NULL)

for(i in 1:length(escolas)){
aux[[i]] <- escolas[i]} # THIS LIST puts the schools in order

for(i in 1:length(escolas)){
proponents <- which(result[,aux[[i]]] == 1) 
decreasing <- sort(preference.col[proponents,aux[[i]]], decreasing = TRUE)
rejected <- decreasing[1:surplus[i]]

retirar <- NULL

for(k in 1:length(rejected)){
retirar[k] <- which(preference.col[,aux[[i]]]==rejected[k])
retirar <- sort(retirar)}

preference.row[retirar,aux[[i]]] <- 2*m
result[retirar,aux[[i]]] <- 0} # END OF THE TWO SCHOOL FOR LOOPS!!
} # END OF THE IF

cat("iterations =",loop,'\n')
flush.console()
loop <- loop+1} # END OF THE WHILE!

# Return RESULT

result

} # END OF FUNCTION!

#

Comparing Time of previous function with new one:

#

# Setting the Example:

set.seed(51)

m <- 1
n <- 20
S <- NULL

while(m <= 100){
S <- append(S,sample(1:n,n))
m <- m + 1}

m <- m - 1
Pi <- matrix(S, nrow = m, byrow = TRUE)

R <- NULL
n <- 1

while(n <= 20){
R <- append(R,sample(1:m,m))
n <- n + 1}

n <- n - 1
Ps <- matrix(R, nrow=m)

vac <- c(rep(10,5),rep(5,5),rep(4,5),rep(1,5))

##


# PREVIOUS CODE

system.time(gsa.many2(m = m, n = n, preference.row = Pi, preference.col =
Ps, first = 1, expand = vac)) # In fact this function has small changes to
apply a school vector; please e-mail me for details.

   user  system elapsed 
   0.09    0.05    0.15 

# NEW CODE

system.time(gsa.many(m = m, n = n, preference.row = Pi, preference.col = Ps,
expand = vac))

   user  system elapsed 
   0.03    0.02    0.04 

R Version:

R x64 3.0.1

My Machine:

i7 3770 CPU @ 3.40 GHz 16GB RAM




-
Victor Delgado
cedeplar.ufmg.br P.H.D. student
UFOP assistant professor
--
View this message in context: 
http://r.789695.n4.nabble.com/Gale-Shapley-Algorithm-for-R-tp4240809p4712636.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread BARRETT, Oliver
Hi all,

Thank you so much for your input.

Just to clarify, of the 32% plagiarism detected, only 27-28% has come from the 
regressions, but this is expected, as the appendix where the regressions are 
contained is much more dense with text and numbers than the rest of the 
document.

The other 5% will be my quotations and references, but that's normal.

Thanks again, I will be sharing your thoughts with my thesis supervisor.

Cheers,

Oliver


From: Marc Schwartz 
Sent: 22 September 2015 22:27
To: peter dalgaard
Cc: Bert Gunter; BARRETT, Oliver; R-help
Subject: Re: [R] 'R' Software Output Plagiarism

Peter,

Great distinction.

I was leaning in the direction that the "look and feel" of the output (standard 
wording, table structure, column headings, significance stars and so forth in 
the output) is similar to whatever Urkund is using as the basis for the 
comparison and less so on an exact replication (covariates, coefficients, 
etc.), or nearly so, of prior work.

Thanks,

Marc


> On Sep 22, 2015, at 3:06 PM, peter dalgaard  wrote:
>
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or what parts of the text that 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking 
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit, the source code underlying R 
>> is, along with other copyright owners' as apropos. There is some caselaw to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>> -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>>  wrote:

 Dear 'R' community support,


 I am a student at Skema business school and I have recently submitted my 
 MSc thesis/dissertation. This has been passed on to an external plagiarism 
 service provider, Urkund, who have scanned my document and returned a 
 plagiarism report to my professor 

[R] Top Trading Cycles (TTC) Algorithm in R

2015-09-22 Thread VictorDelgado
Hello R users, I'm posting here my recent implementation of the Top Trading
Cycles algorithm in R. For more details, see Shapley and Scarf (1974), "On
Cores and Indivisibility," Journal of Mathematical Economics, 1, 23-37.

ttc.many <- function(m, n, preference.row, preference.col, expand)
{

# m = row number 
# n = col number 
# Remember, rows propose first in this code 
# expand = counter of seats per 'school' or column classes 
# Note that m > n is needed for the algorithm to run 

##

students <- 1:m

# Student condition:
# Are there students left on the list?

loop <- 1
result <- matrix(0, nrow=m, ncol=2) # and create a result

repeat{
ciclo <- NULL
pos <- NULL
s.point <- students[1]

# Store the cycle in an object:

ciclo <- c(ciclo, s.point)

while(all(duplicated(ciclo)==FALSE)){
i.point <- which.min(preference.row[s.point,]) # where the first student on the list points
s.point <- which.min(preference.col[,i.point]) # to whom does that school point?
ciclo <- c(ciclo, s.point) # whom that school points to, forming the cycle
} # END OF THE INNER WHILE!

# Who is the duplicate?

dup <- ciclo[which(duplicated(ciclo)==TRUE)]
start <- min(which(ciclo==dup))

# Cycle with only the participants and without the repeated element at the end:

ciclo <- ciclo[start:(length(ciclo)-1)]

for(i in ciclo){
escola <- which.min(preference.row[i,])
result[i,] <- c(i,escola)
preference.col[i,1:n] <- 2*m

if(expand[escola]>1){
expand[escola] <- expand[escola] - 1}else{
expand[escola] <- expand[escola] - 1
preference.row[,escola] <- 2*m}}

for(k in 1:length(ciclo)){
pos[k] <- which(students==ciclo[k])}
students <- students[-pos]

cat("iterations =",loop,'\n')
flush.console()
loop <- loop+1
if(length(students) == 0){
break
}
} # END OF THE REPEAT!

result.matrix <- matrix(0, nrow=m, ncol=n)
for(j in result[,1]){
result.matrix[j,result[j,2]] <- 1}
result.matrix

} # END OF FUNCTION!

#

Simple test:

m1 <- c(2,1,3,4)
m2 <- c(1,2,3,4)
m3 <- c(3,2,1,4)
m4 <- c(3,4,1,2)
m5 <- c(1,4,2,3)
m6 <- c(2,3,4,1)
m7 <- c(1,2,3,4)
m8 <- c(1,2,4,3)

n1 <- c(1,2,3,4,5,6,7,8)
n2 <- c(7,6,1,3,2,8,5,4)
n3 <- c(3,5,2,8,1,7,4,6) 
n4 <- c(8,5,6,4,7,1,3,2)

preference.row <- matrix(c(m1,m2,m3,m4,m5,m6,m7,m8), nrow=8, byrow=TRUE)
preference.col <- matrix(c(n1, n2, n3, n4), ncol=4)
exp <- c(2,2,3,3) # Vector of Seats

ttc.many(m=8, n=4, preference.row=preference.row,
preference.col=preference.col, expand=exp)

### SOME REFERENCES:

A. Abdulkadiroglu, T. Sonmez (2003). "School Choice: A Mechanism Design
Approach." American Economic Review, 93(3), 729-743.

L. S. Shapley, H. Scarf (1974). "On Cores and Indivisibility." Journal of
Mathematical Economics, 1, 23-37.

Klein, T. (2015). matchingMarkets: Structural Estimator and Algorithms for the
Analysis of Stable Matchings. R package version 0.1-5.

https://cran.r-project.org/web/packages/matchingMarkets/index.html





-
Victor Delgado
Professor in department of Economics,
UFOP - Univ. Federal de Ouro Preto, Brazil
--
View this message in context: 
http://r.789695.n4.nabble.com/Top-Trading-Cycles-TTC-Algorithm-in-R-tp4712649.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Running R on a hosting server

2015-09-22 Thread bgnumis bgnum
Hi all,

Hope I can explain:

I want to "run" some function online (code written by me) so that this code
"saves" an output file on my server and my html webpage reads the file that R
plots and saves (at the moment, manually, I would run the R function, open
FileZilla, and transfer the output png or jpg file).

Is it possible to do this automatically, telling R something like: every
15 minutes run this "pru.txt" file, save this plot.png, and drive FileZilla
with these inputs to put the plot in this folder?

Hope you can understand me.

In rmarkdown it is true that the output is an html file, but my intention is
to run my own function in R, and then open FileZilla and deposit the file in
the right place.
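
One way to automate this entirely from R, as a sketch rather than a definitive recipe: the host, credentials, and file names below are placeholders, it assumes the RCurl package is installed, and on a Linux host a cron entry running `Rscript pru.R` every 15 minutes would avoid keeping an R session alive.

```r
library(RCurl)  # for ftpUpload()

repeat {
  # regenerate the plot file
  png("plot.png")
  plot(rnorm(100), type = "l")  # replace with your own function's plot
  dev.off()

  # push the file to the web server over FTP (placeholder credentials)
  ftpUpload("plot.png",
            "ftp://user:password@example.com/public_html/plot.png")

  Sys.sleep(15 * 60)  # wait 15 minutes
}
```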

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Loading data chartseries

2015-09-22 Thread bgnumis bgnum
Hi all,

I want to plot this data on file.txt that has this format

01/01/2000;970,1877
02/01/2000;970,2224
03/01/2000;969,0336
04/01/2000;958,3023
05/01/2000;952,8527

I'm trying to plot it with quantmod with this code, but it is not working:


X <- read.table("file.txt", col.names=c("Date","LAST"), sep=";", dec=",")

chartSeries(
  X, theme="white",
  TA = c(addBBands(200,2))
)

But it gives an error:

Error in try.xts(x, error = "chartSeries requires an xtsible object") :
  chartSeries requires an xtsible object


How can I run chartSeries with my own data?
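
A common fix, as a sketch (it assumes the dates in file.txt are day/month/year, as the sample rows suggest): convert the data.frame to an xts object before calling chartSeries.

```r
library(quantmod)  # loads xts as well

# read the semicolon-separated, comma-decimal file as before
X <- read.table("file.txt", col.names = c("Date", "LAST"), sep = ";", dec = ",")

# build an xts series indexed by the parsed dates
X.xts <- xts(X$LAST, order.by = as.Date(X$Date, format = "%d/%m/%Y"))

chartSeries(X.xts, theme = "white", TA = c(addBBands(200, 2)))
```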

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] doubt with Odds ratio - URGENT HELP NEEDED

2015-09-22 Thread Rosa Oliveira
Dear all,


I’m trying to compute Odds ratio and OR confidence interval.

I’m really naive, sorry for that.


I attach my data and my code.

I’m having lots of errors:

1. Error in data.frame(tas1 = tas.data$tas_d2, tas2 = tas.data$tas_d3, tas3 = 
tas.data$tas_d4,  : 
  arguments imply differing number of rows: 90, 0

2. Error in data.frame(tas = c(unlist(tas.data[, -8:-6])), time = rep(c(0:4),  
: 
  arguments imply differing number of rows: 630, 450, 0

3. Error: object 'tas.data.long' not found

4. Error in data.frame(media = c(mean.dead, mean.alive), standarderror = 
c(se.dead,  : 
  arguments imply differing number of rows: 14, 10

5. Error in ggplot(summarytas, aes(x = c(c(1:5), c(1:5)), y = mean, colour = 
discharge)) : 
  object 'summarytas' not found

6. Error in summary(glm(tas.data[, 6] ~ tas.data[, 4], family = binomial(link = 
probit))) : 
  error in evaluating the argument 'object' in selecting a method for function 
'summary': Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

7. Error in wilcox.test.default(pred[obs == 1], pred[obs == 0], alternative = 
"great") : 
  not enough (finite) 'x' observations
In addition: Warning message:
In is.finite(x) & apply(pred, 1, f) :
  longer object length is not a multiple of shorter object length


and of course I’m not getting the OR.

Despite all these errors, I think I have not been able to write the code to get 
the OR and its confidence interval.
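
For reference, the usual way to get an OR and its confidence interval from a logistic regression, as a sketch with hypothetical variable names; it assumes tas.data has been built correctly with a 0/1 discharge outcome:

```r
# odds ratios and 95% Wald confidence intervals from a logistic fit
fit <- glm(discharge ~ age, data = tas.data, family = binomial(link = "logit"))
exp(cbind(OR = coef(fit), confint.default(fit)))  # exponentiate log-odds
```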


Can anyone help me please. It’s really urgent.

PLEASE

THE CODE:

the hospital outcome is discharge.

require(gdata)
library(foreign)
library(nlme)
library(lme4)
library(boot) 
library(MASS)
library(Hmisc)
library(plotrix)
library(verification)
library(mvtnorm)
library(statmod) 
library(epiR)

#
# Data preparation  
#
#

setwd("/Users/RO/Desktop")

casedata <-read.spss("tas_05112008.sav")
tas.data<-data.frame(casedata)

#Delete patients that were not discharged
tas.data <- tas.data[ tas.data$hosp!="si ",]
tas.data$resultado.hosp  <- ifelse(tas.data$hosp=="l", 0, 1)

tas.data$tas_d2 <- log(ifelse(tas.data$tas_d2==8|tas.data$tas_d2==9, NA, tas.data$tas_d2))
tas.data$tas_d3 <- log(ifelse(tas.data$tas_d3==8|tas.data$tas_d3==9, NA, tas.data$tas_d3))
tas.data$tas_d4 <- log(ifelse(tas.data$tas_d4==8|tas.data$tas_d4==9, NA, tas.data$tas_d4))
tas.data$tas_d5 <- log(ifelse(tas.data$tas_d5==8|tas.data$tas_d5==9, NA, tas.data$tas_d5))
tas.data$tas_d6 <- log(ifelse(tas.data$tas_d6==8|tas.data$tas_d6==9, NA, tas.data$tas_d6))

tas.data$age <- ifelse(tas.data$age==8|tas.data$age==9, NA, tas.data$age)

tas.data <- data.frame(tas1 = tas.data$tas_d2, tas2 = tas.data$tas_d3,
                       tas3 = tas.data$tas_d4, tas4 = tas.data$tas_d5,
                       tas5 = tas.data$tas_d6, age = tas.data$age,
                       discharge = tas.data$resultado.hosp, id.pat = tas.data$ID)

#tas.data$discharge <- factor(tas.data$discharge, levels=c(0,1), labels=c("dead", "alive"))

# select only cases that have more than 2 tas measurements
tas.data <- tas.data[apply(tas.data[,-8:-6], 1, function(x) sum(!is.na(x))) > 2,]

nsample <- n.obs <- dim(tas.data)[1]  # nr of patients with more than 2 tas measurements

tas.data.long <- data.frame(tas = c(unlist(tas.data[,-8:-6])), time = rep(c(0:4), each = n.obs),
                            age = rep(tas.data$age, 5), discharge = rep(tas.data$discharge, 5),
                            id = rep(c(1:n.obs), 5))
tas.data.long <- tas.data.long[order(tas.data.long$id),]

age <- tas.data$age

##
# PLOT EMPIRICAL MEANS OF CRP FOR ALIVE & DEAD
##
mean.alive <- apply(tas.data[tas.data$discharge==0, -8:-6], 2, mean, na.rm=T)
mean.dead  <- apply(tas.data[tas.data$discharge==1, -8:-6], 2, mean, na.rm=T)
stderr     <- function(x) sqrt(var(x, na.rm=TRUE)/length(na.omit(x)))
se.alive   <- apply(tas.data[tas.data$discharge==0, -8:-6], 2, stderr)
se.dead    <- apply(tas.data[tas.data$discharge==1, -8:-6], 2, stderr)
summarytas <- data.frame(media = c(mean.dead, mean.alive),
  

Re: [R] retaining characters in a csv file

2015-09-22 Thread David Winsemius

On Sep 22, 2015, at 3:00 PM, Therneau, Terry M., Ph.D. wrote:

> I have a csv file from an automatic process (so this will happen thousands of 
> times), for which the first row is a vector of variable names and the second 
> row often starts something like this:
> 
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
> 
> Notice the second variable which is
>  a character string (note the quotation marks)
>  a sequence of numeric digits
>  leading zeros are significant
> 
> The read.csv function insists on turning this into a numeric.  Is there any 
> simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the 
> bloody quotes" -- I still want the first, third, etc columns to become 
> numeric.  There can be more than one variable like this, and not always in 
> the second position.

The last part about not knowing which col might be an issue might require 
inputting everything with character class, but if there is a way to pass in a 
colClasses argument this might help:

> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', 
> stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", 
> rep("character", 4)))
   V1   V2 V3 V4 V5
1 5724550 000202075214 2005.02.17 2005.02.17  F

Or you can create a class with an As method:

> setClass('myChar')
> setAs('character', 'myChar', def=function(from, to ) to <- I(from))
> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', 
> stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", 
> rep('myChar',4)) )
   V1   V2 V3 V4 V5
1 5724550 000202075214 2005.02.17 2005.02.17  F

(Neither of the third or fourth columns makes sense as a numeric, so now 
illustrating coercion to Date.)

> setClass('dotDate')
> setAs('character', 'dotDate', def=function(from, to ) to <- as.Date(from, 
> "%Y.%m.%d")  )

> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', 
> stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", "character", 
> rep('dotDate',2), "character") )
   V1   V2 V3 V4 V5
1 5724550 000202075214 2005-02-17 2005-02-17  F


> 
> This happens deep inside the httr library; there is an easy way for me to add 
> more options to the read.csv call but it is not so easy to replace it with 
> something else.
> 
> Terry T
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] retaining characters in a csv file

2015-09-22 Thread Arunkumar Srinivasan
data.table's fread reads this as expected. Quoted strings aren't coerced.

sapply(fread('5724550,"000202075214",2005.02.17,2005.02.17,"F"\n'), class)
#  V1  V2  V3  V4  V5
#   "integer" "character" "character" "character" "character"

Best,
Arun.

On Wed, Sep 23, 2015 at 12:00 AM, Therneau, Terry M., Ph.D.
 wrote:
> I have a csv file from an automatic process (so this will happen thousands
> of times), for which the first row is a vector of variable names and the
> second row often starts something like this:
>
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
>
> Notice the second variable which is
>   a character string (note the quotation marks)
>   a sequence of numeric digits
>   leading zeros are significant
>
> The read.csv function insists on turning this into a numeric.  Is there any
> simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the
> bloody quotes" -- I still want the first, third, etc columns to become
> numeric.  There can be more than one variable like this, and not always in
> the second position.
>
> This happens deep inside the httr library; there is an easy way for me to
> add more options to the read.csv call but it is not so easy to replace it
> with something else.
>
> Terry T
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] retaining characters in a csv file

2015-09-22 Thread peter dalgaard

> On 23 Sep 2015, at 00:33 , Rolf Turner  wrote:
> 

[read.csv() doesn't distinguish "123.4" from 123.4]

> IMHO this is a bug in read.csv().
> 

Dunno about that:

pd$ cat ~/tmp/junk.csv 
"1";1
2;"2"
pd$ open !$
open ~/tmp/junk.csv

And lo and behold, Excel opens with 

1 1
2 2

and all cells numeric.

I don't think the CSV standard (if there is one...) specifies that quoted 
strings are necessarily text.

I think we have been here before, and found that even if we decide that it is a 
bug (or misfeature), it would be hard to change, because the modus operandi of 
read.* is to first read everything as character and _then_ see (in 
type.convert()) which entries can be converted to numeric, logical, etc.
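
That two-pass mechanism can be seen directly: by the time type.convert() gets a 
column, the quote characters are already stripped, so a quoted number is 
indistinguishable from an unquoted one (a small sketch):

```r
# By the time type.convert() sees a column, quotes are already gone,
# so "123.4" and 123.4 look the same and both convert to numeric:
type.convert(c("123.4", "567.8"), as.is = TRUE)   # numeric vector
type.convert(c("123.4", "abc"),  as.is = TRUE)    # stays character
```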

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Compare two normal to one normal

2015-09-22 Thread Charles C. Berry

On Tue, 22 Sep 2015, John Sorkin wrote:



In any event, I still don't know how to fit a single normal distribution 
and get a measure of fit e.g. log likelihood.




Gotta love R:


y <- rnorm(10)
logLik(glm(y~1))

'log Lik.' -17.36071 (df=2)

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
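
As a cross-check (not part of Chuck's reply): for a single normal, 
logLik(glm(y ~ 1)) is just the Gaussian log-likelihood evaluated at the MLEs, 
where the MLE of sigma uses the n rather than n - 1 denominator:

```r
set.seed(1)
y <- rnorm(10)
ll_glm <- as.numeric(logLik(glm(y ~ 1)))

# Same value by hand: plug the MLEs into the normal density
mu <- mean(y)
s  <- sqrt(mean((y - mu)^2))   # MLE of sigma: divide by n, not n - 1
ll_hand <- sum(dnorm(y, mu, s, log = TRUE))
all.equal(ll_glm, ll_hand)     # TRUE
```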


Re: [R] retaining characters in a csv file

2015-09-22 Thread Rolf Turner

On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:

I have a csv file from an automatic process (so this will happen
thousands of times), for which the first row is a vector of variable
names and the second row often starts something like this:

5724550,"000202075214",2005.02.17,2005.02.17,"F", .

Notice the second variable which is
   a character string (note the quotation marks)
   a sequence of numeric digits
   leading zeros are significant

The read.csv function insists on turning this into a numeric.  Is there
any simple set of options that
will turn this behavior off?  I'm looking for a way to tell it to "obey
the bloody quotes" -- I still want the first, third, etc columns to
become numeric.  There can be more than one variable like this, and not
always in the second position.

This happens deep inside the httr library; there is an easy way for me
to add more options to the read.csv call but it is not so easy to
replace it with something else.


IMHO this is a bug in read.csv().

A possible workaround:

ccc <- c("integer","character",rep(NA,k))
X   <- read.csv("melvin.csv",colClasses=ccc)

where "melvin.csv" is the file from which you are attempting to read and
where k+2 = the number of columns in that file.

Kludgey, but it might work.

Another workaround is to specify quote="", but this has the side effect
of making the 5th column character rather than logical.

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Duncan Murdoch
On 22/09/2015 4:06 PM, peter dalgaard wrote:
> Marc,
> 
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
> 
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).

I don't see why this puzzles you.  A simple explanation is that Urkund
is incompetent.

Many companies that sell software to university administrations are
incompetent, because the buyers have been promoted so far beyond their
competence that they'll buy anything if it is expensive enough.

This isn't uncommon.

Duncan Murdoch

> 
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or what parts of the text that 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
> 
> -pd
> 
>> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking 
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit the source code underlying R 
>> is, along with that of other copyright owners as apropos. There is some case law to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>  -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>>  wrote:

 Dear 'R' community support,


 I am a student at Skema business school and I have recently submitted my 
 MSc thesis/dissertation. This has been passed on to an external plagiarism 
 service provider, Urkund, who have scanned my document and returned a 
 plagiarism report to my professor having detected 32% plagiarism.


 I have contacted Urkund regarding this issue having committed no such 
 plagiarism and they have told me that all the plagiarism detected in my 
 document comes from the last 25% which consists only of 'R' regressions 
 like the one I have pasted below:

 lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
   Fed.t.4., data = OLS_CAR, x = TRUE)

 Residuals:
 Min1QMedian3Q   Max
 -0.154587 -0.015961  0.001429  0.017196  0.110907

 Coefficients:
Estimate Std. Error t value Pr(>|t|)
 (Intercept) -0.001630   0.001763  -0.925   0.3559
 Fed -0.121595   0.165359  -0.735   0.4627

[R] 'R' Software Output Plagiarism

2015-09-22 Thread BARRETT, Oliver

Dear 'R' community support,


I am a student at Skema business school and I have recently submitted my MSc 
thesis/dissertation. This has been passed on to an external plagiarism service 
provider, Urkund, who have scanned my document and returned a plagiarism report 
to my professor having detected 32% plagiarism.


I have contacted Urkund regarding this issue having committed no such 
plagiarism and they have told me that all the plagiarism detected in my 
document comes from the last 25% which consists only of 'R' regressions like 
the one I have pasted below:

lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
Fed.t.4., data = OLS_CAR, x = TRUE)

Residuals:
  Min1QMedian3Q   Max
-0.154587 -0.015961  0.001429  0.017196  0.110907

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001630   0.001763  -0.925   0.3559
Fed -0.121595   0.165359  -0.735   0.4627
Fed.t.1. 0.344014   0.140979   2.440   0.0153 *
Fed.t.2. 0.026529   0.143648   0.185   0.8536
Fed.t.3. 0.622357   0.142021   4.382 1.62e-05 ***
Fed.t.4. 0.291985   0.158914   1.837   0.0671 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0293 on 304 degrees of freedom
  (20 observations deleted due to missingness)
Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05

I have produced all of these regressions myself and pasted them directly from 
the 'R' software package. My regression methodology is entirely my own along 
with the sourcing and preparation of the data used to produce these statistics.

I would be very grateful if you could provide my with some clarity as to why 
this output from 'R' is reading as plagiarism.

I would like to thank you in advance,

Kind regards,

Oliver Barrett
(+44) 7341 834 217

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Re: Weighted skewness and curtosis

2015-09-22 Thread SEMson
I'm also looking for an answer to this question right now. 

You can't use weights in the moments package, but I found a
weighted.moments() function in the acid package. If your
data has NAs, you can do the following:

#-
skew <- function(x, weight){
  weight <- weight[!is.na(x)]  # delete weights for cases with NA
  x <- x[!is.na(x)]            # delete NAs
  acid::weighted.moments(x, w8 = weight)  # calculate moments
}
skew(mydata$var,weight)
#-


I also tried to write a weighted-skew function by myself:
The result is different from the acid package: I get a skew of 0.7692313,
perhaps because x and length(x) aren't weighted here. The unweighted skew
was 0.58, by the way.
#-
skew.wtd <- function(x,weight){
  weight<-weight[!is.na(x)]
  x<-x[!is.na(x)]
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum(weight)
  x.sd.w<-sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  ((sum(((x - mean.w)/ x.sd.w)^3))/(length(x) - 1))
}
skew.wtd(mydata$var,weight)
#-


Because the acid package doesn't give a weighted kurtosis, I tried the
following:
#-
kurt <- function(x,weight){
  weight<-weight[!is.na(x)]
  x<-x[!is.na(x)]
  mean.w <- sum(x * weight) / sum(weight)
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  x.sd.w<-sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
 #((sum(((x - mean.w)/(sd(x)))^4))/(length(x) - 1)) #formula A
 (((sum(((x - mean(x))/(sd(x)))^4))/(length(x) - 1)) - 3)   #formula B
}
kurt(mydata$var,weight)
# weighted Kurtosis is -0.7127631

#-
kurtosis<-function(x,weight) {
  weight<-weight[!is.na(x)]
  x<-x[!is.na(x)]
  mean.w <- sum(x * weight) / sum(weight)
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  x.sd.w<-sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  m4 <- mean((x - mean.w)^4)  # formula C
  kurt<-m4/(x.sd.w^4)-3 
  kurt}
kurtosis(mydata$var,weight)

# weighted Kurtosis is -0.5076363
# unweighted Kurtosis was -0.72
#-




--
View this message in context: 
http://r.789695.n4.nabble.com/Weighted-skewness-and-curtosis-tp4709956p4712612.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
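
The separate functions in this thread can be collapsed into one. A minimal 
sketch (my own consolidation, not from the acid package) using population 
moments throughout, so with equal weights the skewness matches the convention 
of moments::skewness() and the kurtosis is the excess form:

```r
wtd_moments <- function(x, w) {
  w <- w[!is.na(x)]; x <- x[!is.na(x)]
  m <- sum(w * x) / sum(w)               # weighted mean
  v <- sum(w * (x - m)^2) / sum(w)       # population (biased) variance
  s <- sqrt(v)
  c(mean     = m,
    sd       = s,
    skewness = sum(w * ((x - m) / s)^3) / sum(w),
    kurtosis = sum(w * ((x - m) / s)^4) / sum(w) - 3)  # excess kurtosis
}
res <- wtd_moments(c(-2, -1, 1, 2), rep(1, 4))
res["skewness"]   # symmetric data, so skewness is 0
```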

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Bert Gunter
1. It is highly unlikely that we could be of help (unless someone else
has experienced this and knows what happened). You will have to
contact the Urkund people and ask them why their algorithms raised the
flags.

2. But of course, the regression methodology is not "your own" -- it's
just a standard tool that you used in your work, which is entirely
legitimate of course.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
 wrote:
>
> Dear 'R' community support,
>
>
> I am a student at Skema business school and I have recently submitted my MSc 
> thesis/dissertation. This has been passed on to an external plagiarism 
> service provider, Urkund, who have scanned my document and returned a 
> plagiarism report to my professor having detected 32% plagiarism.
>
>
> I have contacted Urkund regarding this issue having committed no such 
> plagiarism and they have told me that all the plagiarism detected in my 
> document comes from the last 25% which consists only of 'R' regressions like 
> the one I have pasted below:
>
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
> Fed.t.4., data = OLS_CAR, x = TRUE)
>
> Residuals:
>   Min1QMedian3Q   Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
>
> Coefficients:
>  Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed -0.121595   0.165359  -0.735   0.4627
> Fed.t.1. 0.344014   0.140979   2.440   0.0153 *
> Fed.t.2. 0.026529   0.143648   0.185   0.8536
> Fed.t.3. 0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4. 0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>
> I have produced all of these regressions myself and pasted them directly from 
> the 'R' software package. My regression methodology is entirely my own along 
> with the sourcing and preparation of the data used to produce these 
> statistics.
>
> I would be very grateful if you could provide my with some clarity as to why 
> this output from 'R' is reading as plagiarism.
>
> I would like to thank you in advance,
>
> Kind regards,
>
> Oliver Barrett
> (+44) 7341 834 217
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extract from data.frame

2015-09-22 Thread Nico Gutierrez
Thank you all!
n

On Mon, Sep 21, 2015 at 10:00 PM, Bert Gunter 
wrote:

> No.
>
>
> On Mon, Sep 21, 2015 at 10:58 AM, John McKown
>  wrote:
> > On Mon, Sep 21, 2015 at 9:52 AM, Nico Gutierrez <
> nico.gutierr...@gmail.com>
> > wrote:
> >
> >> Hi All,
> >>
> >> I need to do the following operation from data.frame:
> >>
> >> df <- data.frame(Year = c("2001", "2002", "2003", "2004", "2005",
> "2006",
> >> "2007"), Amount = c(150, 120, 175, 160, 120, 105, 135))
> >> df[which.max(df$Amount),]  #to extract row with max Amount.
> >>
> >> Now I need to do 3 years average around the max Amount value (ie:
> >> mean(120,175,160))
> >>
> >> Thanks!
> >> N
> >>
> >>
> > The simplistic answer is something like:
> >
> > df <- structure(list(Year = structure(1:7, .Label = c("2001", "2002",
> > "2003", "2004", "2005", "2006", "2007"), class = "factor"), Amount =
> c(150,
> > 120, 175, 160, 120, 105, 135)), .Names = c("Year", "Amount"), row.names =
> > c(NA,
> > -7L), class = "data.frame");
> > wdf <- which.max(df$Amount);
> > adf3 <- mean(df$Amount[adf-1:adr+1]);
>
> Typos?!
> But it won't work anyway. See ?Syntax for operator precedence and
>
> Example:
>
> > a <- 1:5
> > mid <- 3
> > a[mid-1:mid+1]
> [1] 3 2 1
> > a[(mid-1):(mid+1)]
> [1] 2 3 4
>
> Cheers,
> Bert
>
>
> >
> > But that ignores the boundry condition where the maximum is at either
> end.
> > What do you want to do in that case?
> >
> >
> > --
> >
> > Schrodinger's backup: The condition of any backup is unknown until a
> > restore is attempted.
> >
> > Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will
> be.
> >
> > He's about as useful as a wax frying pan.
> >
> > 10 to the 12th power microphones = 1 Megaphone
> >
> > Maranatha! <><
> > John McKown
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
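
For the archive, a compact version of the accepted approach with the boundary 
case handled by clamping the window (a sketch; what to do at the ends is still 
the poster's call):

```r
df <- data.frame(Year = 2001:2007,
                 Amount = c(150, 120, 175, 160, 120, 105, 135))
i   <- which.max(df$Amount)
idx <- max(1, i - 1):min(nrow(df), i + 1)  # clamp the window at either end
mean(df$Amount[idx])                       # mean of 120, 175, 160 = 151.67
```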


Re: [R] (no subject)

2015-09-22 Thread John Kane

You seem to have sent a blank message.

John Kane
Kingston ON Canada


> -Original Message-
> From: fathi.s...@gmail.com
> Sent: Tue, 22 Sep 2015 16:29:07 +0330
> To: r-help@r-project.org
> Subject: [R] (no subject)
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] store results in loop

2015-09-22 Thread Michael Dewey

Dear Nico

Comment inline

On 22/09/2015 10:36, Nico Gutierrez wrote:

Hi All,

very rusty in R.. my results get overwritten when try to store within the
loop. This my code:
ListS=unique(data$Spec)
Stat<- numeric(0)

for(i in 5){



Is that what you meant? I would have expected something like 1:5


SS=subset(data,data$Spec==ListS[i])
maxC<- which.max(SS$Cc)
smoothC=mean(SS$Cc[maxC + c(-2:2)])
currC=tail(SS,1)$Cc
Index=currC/smoothC

Stat[i]=c(Stat, Index[i])
}
Stat

This is what I get:

[1] NA NA NA NA NA


I am obviously not indexing well here.


Thanks!!!

N

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] store results in loop

2015-09-22 Thread Nico Gutierrez
Hi All,

very rusty in R.. my results get overwritten when try to store within the
loop. This my code:
ListS=unique(data$Spec)
Stat<- numeric(0)

for(i in 5){

SS=subset(data,data$Spec==ListS[i])
maxC<- which.max(SS$Cc)
smoothC=mean(SS$Cc[maxC + c(-2:2)])
currC=tail(SS,1)$Cc
Index=currC/smoothC

Stat[i]=c(Stat, Index[i])
}
Stat

This is what I get:

[1] NA NA NA NA NA


I am obviously not indexing well here.


Thanks!!!

N

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] store results in loop

2015-09-22 Thread Jim Lemon
Hi Nico,
A bit difficult to see what is happening without the data, but two
suggestions:

smoothC=mean(SS$Cc[maxC + c(-2:2)],na.rm=TRUE)
...
Stat[i]<-Index

Jim


On Tue, Sep 22, 2015 at 7:36 PM, Nico Gutierrez 
wrote:

> Hi All,
>
> very rusty in R.. my results get overwritten when try to store within the
> loop. This my code:
> ListS=unique(data$Spec)
> Stat<- numeric(0)
>
> for(i in 5){
>
> SS=subset(data,data$Spec==ListS[i])
> maxC<- which.max(SS$Cc)
> smoothC=mean(SS$Cc[maxC + c(-2:2)])
> currC=tail(SS,1)$Cc
> Index=currC/smoothC
>
> Stat[i]=c(Stat, Index[i])
> }
> Stat
>
> This is what I get:
>
> [1] NA NA NA NA NA
>
>
> I am obviously not indexing well here.
>
>
> Thanks!!!
>
> N
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
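
Combining Jim's two fixes with a seq_along() index (the original `for(i in 5)` 
runs the body exactly once, with i = 5, leaving the earlier elements NA), a 
self-contained sketch on made-up data:

```r
set.seed(42)
data <- data.frame(Spec = rep(c("a", "b", "c"), each = 10),
                   Cc   = runif(30))

ListS <- unique(data$Spec)
Stat  <- numeric(length(ListS))
for (i in seq_along(ListS)) {                 # 1:3, not just the value 3
  SS      <- data[data$Spec == ListS[i], ]
  maxC    <- which.max(SS$Cc)
  idx     <- maxC + (-2:2)
  idx     <- idx[idx >= 1 & idx <= nrow(SS)]  # keep the window in range
  smoothC <- mean(SS$Cc[idx])
  Stat[i] <- SS$Cc[nrow(SS)] / smoothC        # assign the scalar directly
}
Stat   # one finite ratio per species, no NAs
```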


Re: [R] How to coerce a parameter in nls?

2015-09-22 Thread Gabor Grothendieck
Just write out the 20 terms.

On Mon, Sep 21, 2015 at 10:26 PM, Jianling Fan 
wrote:

> Hello, Gabor,
>
> Thanks again for your suggestion. And now I am trying to improve the
> code by adding a function to replace the express "Rm1 * ref.1 + Rm2 *
> ref.2 + Rm3 * ref.3 + Rm4 * ref.4 + Rm5 * ref.5 + Rm6 * ref.6" because
> I have some other dataset need to fitted to the same model but with
> more groups (>20).
>
> I tried to add the function as:
>
> denfun<-function(i){
>for(i in 1:6){
>  Rm<-sum(Rm[i]*ref.i)
>  return(Rm)}
> }
>
> but I got another error when I incorporate this function into my
> regression:
>
> >fitdp1<-nlxb(den ~ denfun(6)/(1+(depth/d50)^c),
>data = dproot2,
>  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
> Rm5=1.01, Rm6=1, d50=20, c=-1),
> masked = "Rm6")
>
> Error in deriv.default(parse(text = resexp), names(start)) :
>   Function 'denfun' is not in the derivatives table
>
> I think there must be something wrong with my function. I tried some
> times but am not sure how to improve it because I am quite new to R.
>
> Could anyone please give me some suggestion.
>
> Thanks a lot!
>
>
> Jianling
>
>
> On 22 September 2015 at 00:43, Gabor Grothendieck
>  wrote:
> > Express the formula in terms of simple operations like this:
> >
> > # add 0/1 columns ref.1, ref.2, ..., ref.6
> > dproot2 <- do.call(data.frame, transform(dproot, ref = outer(dproot$ref,
> > seq(6), "==") + 0))
> >
> > # now express the formula in terms of the new columns
> > library(nlmrt)
> > fitdp1<-nlxb(den ~ (Rm1 * ref.1 + Rm2 * ref.2 + Rm3 * ref.3 + Rm4 *
> ref.4 +
> > Rm5 * ref.5 + Rm6 * ref.6)/(1+(depth/d50)^c),
> >  data = dproot2,
> >  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01,
> Rm6=1,
> > d50=20, c=-1),
> >  masked = "Rm6")
> >
> > where we used this input:
> >
> > Lines <- "   depth   den ref
> > 1 20 0.573   1
> > 2 40 0.780   1
> > 3 60 0.947   1
> > 4 80 0.990   1
> > 5100 1.000   1
> > 6 10 0.600   2
> > 7 20 0.820   2
> > 8 30 0.930   2
> > 9 40 1.000   2
> > 1020 0.480   3
> > 1140 0.734   3
> > 1260 0.961   3
> > 1380 0.998   3
> > 14   100 1.000   3
> > 1520 3.2083491   4
> > 1640 4.9683383   4
> > 1760 6.2381133   4
> > 1880 6.5322348   4
> > 19   100 6.5780660   4
> > 20   120 6.6032064   4
> > 2120 0.614   5
> > 2240 0.827   5
> > 2360 0.950   5
> > 2480 0.995   5
> > 25   100 1.000   5
> > 2620 0.4345774   6
> > 2740 0.6654726   6
> > 2860 0.8480684   6
> > 2980 0.9268951   6
> > 30   100 0.9723207   6
> > 31   120 0.9939966   6
> > 32   140 0.9992400   6"
> >
> > dproot <- read.table(text = Lines, header = TRUE)
> >
> >
> >
> > On Mon, Sep 21, 2015 at 12:22 PM, Jianling Fan 
> > wrote:
> >>
> >> Thanks Prof. Nash,
> >>
> >> Sorry for late reply. I am learning and trying to use your nlmrt
> >> package since I got your email. It works good to mask a parameter in
> >> regression but seems does work for my equation. I think the problem is
> >> that the parameter I want to mask is a group-specific parameter and I
> >> have a "[]" syntax in my equation. However, I don't have your 2014
> >> book on hand and couldn't find it in our library. So I am wondering if
> >> nlxb works for group data?
> >> Thanks a lot!
> >>
> >> following is my code and I got a error form it.
> >>
> >> > fitdp1<-nlxb(den~Rm[ref]/(1+(depth/d50)^c),data=dproot,
> >> + start =c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
> >> Rm5=1.01, Rm6=1, d50=20, c=-1),
> >> + masked=c("Rm6"))
> >>
> >> Error in deriv.default(parse(text = resexp), names(start)) :
> >>   Function '`[`' is not in the derivatives table
> >>
> >>
> >> Best regards,
> >>
> >> Jianling
> >>
> >>
> >> On 20 September 2015 at 12:56, ProfJCNash  wrote:
> >> > I posted a suggestion to use nlmrt package (function nlxb to be
> >> > precise),
> >> > which has masked (fixed) parameters. Examples in my 2014 book on
> >> > Nonlinear
> >> > parameter optimization with R tools. However, I'm travelling just now,
> >> > or
> >> > would consider giving this a try.
> >> >
> >> > JN
> >> >
> >> >
> >> > On 15-09-20 01:19 PM, Jianling Fan wrote:
> >> >>
> >> >> No, I am doing a regression with 6 groups of data, with 2 shared
> >> >> parameters and 1 different parameter for each group. The parameter I
> >> >> want to coerce is for one group. I don't know how to do it. Any
> >> >> suggestion?
> >> >>
> >> >> Thanks!
> >> >>
> >> >> On 19 September 2015 at 13:33, Jeff Newmiller
> >> >> 
> >> >> wrote:
> >> >>>
> >> >>> Why not rewrite the function so that value is not a parameter?
> >> >>>
> >> >>>
> >> >>>
> 
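
The errors reported in this thread come from deriv(): nlxb() builds analytic 
derivatives with R's deriv() (note the `deriv.default` in the traceback), which 
only knows a fixed table of functions, so neither `[` nor a user-defined 
function can appear in the model formula. A quick illustration:

```r
deriv(~ sin(x), "x")            # fine: sin is in the derivatives table

f <- function(x) x^2            # any user-defined function, even a trivial one
res <- try(deriv(~ f(x), "x"), silent = TRUE)
inherits(res, "try-error")      # TRUE: 'f' is not in the derivatives table
```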

Re: [R] How to coerce a parameter in nls?

2015-09-22 Thread Jianling Fan
 Hello Gabor,

It is very kind of you to reply and give suggestion so rapid. I will
try to learn and use it.

Thanks very much for your help!

Best regards,

Jianling

On 22 September 2015 at 06:45, Gabor Grothendieck
 wrote:
> Or if you really can't bear to write out 20 terms have R do it for you:
>
> # number of terms is the number of unique values in ref column
> nterms <- length(unique(dproot$ref))
>
> dproot2 <- do.call(data.frame, transform(dproot, ref = outer(dproot$ref,
> seq(nterms), "==") + 0))
>
> # construct the formula as a string
> terms <- paste( sprintf("Rm%d*ref.%d", 1:nterms, 1:nterms), collapse = "+")
> fo <- sprintf("den ~ (%s)/(1+(depth/d50)^c)", terms)
>
> library(nlmrt)
> fm <- nlxb(fo, data = dproot2, masked = "Rm6",
>  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01, Rm6=1,
> d50=20, c=-1))
>
>
> On Tue, Sep 22, 2015 at 7:04 AM, Gabor Grothendieck
>  wrote:
>>
>> Just write out the 20 terms.
>>
>> On Mon, Sep 21, 2015 at 10:26 PM, Jianling Fan 
>> wrote:
>>>
>>> Hello, Gabor,
>>>
>>> Thanks again for your suggestion. And now I am trying to improve the
>>> code by adding a function to replace the expression "Rm1 * ref.1 + Rm2 *
>>> ref.2 + Rm3 * ref.3 + Rm4 * ref.4 + Rm5 * ref.5 + Rm6 * ref.6" because
>>> I have other datasets that need to be fitted to the same model but with
>>> more groups (>20).
>>>
>>> I tried to add the function as:
>>>
>>> denfun<-function(i){
>>>for(i in 1:6){
>>>  Rm<-sum(Rm[i]*ref.i)
>>>  return(Rm)}
>>> }
>>>
>>> but I got another error when I incorporate this function into my
>>> regression:
>>>
>>> >fitdp1<-nlxb(den ~ denfun(6)/(1+(depth/d50)^c),
>>>data = dproot2,
>>>  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
>>> Rm5=1.01, Rm6=1, d50=20, c=-1),
>>> masked = "Rm6")
>>>
>>> Error in deriv.default(parse(text = resexp), names(start)) :
>>>   Function 'denfun' is not in the derivatives table
>>>
>>> I think there must be something wrong with my function. I tried several
>>> times but am not sure how to improve it because I am quite new to R.
>>>
>>> Could anyone please give me some suggestions?
>>>
>>> Thanks a lot!
>>>
>>>
>>> Jianling
>>>
>>>
>>> On 22 September 2015 at 00:43, Gabor Grothendieck
>>>  wrote:
>>> > Express the formula in terms of simple operations like this:
>>> >
>>> > # add 0/1 columns ref.1, ref.2, ..., ref.6
>>> > dproot2 <- do.call(data.frame, transform(dproot, ref =
>>> > outer(dproot$ref,
>>> > seq(6), "==") + 0))
>>> >
>>> > # now express the formula in terms of the new columns
>>> > library(nlmrt)
>>> > fitdp1<-nlxb(den ~ (Rm1 * ref.1 + Rm2 * ref.2 + Rm3 * ref.3 + Rm4 *
>>> > ref.4 +
>>> > Rm5 * ref.5 + Rm6 * ref.6)/(1+(depth/d50)^c),
>>> >  data = dproot2,
>>> >  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01,
>>> > Rm6=1,
>>> > d50=20, c=-1),
>>> >  masked = "Rm6")
>>> >
>>> > where we used this input:
>>> >
>>> > Lines <- "   depth   den ref
>>> > 1 20 0.573   1
>>> > 2 40 0.780   1
>>> > 3 60 0.947   1
>>> > 4 80 0.990   1
>>> > 5100 1.000   1
>>> > 6 10 0.600   2
>>> > 7 20 0.820   2
>>> > 8 30 0.930   2
>>> > 9 40 1.000   2
>>> > 1020 0.480   3
>>> > 1140 0.734   3
>>> > 1260 0.961   3
>>> > 1380 0.998   3
>>> > 14   100 1.000   3
>>> > 1520 3.2083491   4
>>> > 1640 4.9683383   4
>>> > 1760 6.2381133   4
>>> > 1880 6.5322348   4
>>> > 19   100 6.5780660   4
>>> > 20   120 6.6032064   4
>>> > 2120 0.614   5
>>> > 2240 0.827   5
>>> > 2360 0.950   5
>>> > 2480 0.995   5
>>> > 25   100 1.000   5
>>> > 2620 0.4345774   6
>>> > 2740 0.6654726   6
>>> > 2860 0.8480684   6
>>> > 2980 0.9268951   6
>>> > 30   100 0.9723207   6
>>> > 31   120 0.9939966   6
>>> > 32   140 0.9992400   6"
>>> >
>>> > dproot <- read.table(text = Lines, header = TRUE)
>>> >
>>> >
>>> >
>>> > On Mon, Sep 21, 2015 at 12:22 PM, Jianling Fan 
>>> > wrote:
>>> >>
>>> >> Thanks Prof. Nash,
>>> >>
>>> >> Sorry for the late reply. I am learning and trying to use your nlmrt
>>> >> package since I got your email. It works well for masking a parameter
>>> >> in a regression but does not seem to work for my equation. I think the problem is
>>> >> that the parameter I want to mask is a group-specific parameter and I
>>> >> have a "[]" syntax in my equation. However, I don't have your 2014
>>> >> book on hand and couldn't find it in our library. So I am wondering if
>>> >> nlxb works for group data?
>>> >> Thanks a lot!
>>> >>
>>> >> Following is my code and the error I got from it:
>>> >>
>>> >> > fitdp1<-nlxb(den~Rm[ref]/(1+(depth/d50)^c),data=dproot,
>>> >> + start =c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
>>> >> Rm5=1.01, 

Re: [R] How to coerce a parameter in nls?

2015-09-22 Thread Gabor Grothendieck
Or if you really can't bear to write out 20 terms have R do it for you:

# number of terms is the number of unique values in ref column
nterms <- length(unique(dproot$ref))

dproot2 <- do.call(data.frame, transform(dproot, ref =
outer(dproot$ref, seq(nterms),
"==") + 0))

# construct the formula as a string
terms <- paste( sprintf("Rm%d*ref.%d", 1:nterms, 1:nterms), collapse = "+")
fo <- sprintf("den ~ (%s)/(1+(depth/d50)^c)", terms)

library(nlmrt)
fm <- nlxb(fo, data = dproot2, masked = "Rm6",
 start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01, Rm6=1,
d50=20, c=-1))
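
The same 0/1 indicator columns can also be built with base R's model.matrix(), which some may find cleaner than outer() when there are many groups. A minimal sketch, assuming the dproot data frame shown elsewhere in this thread:

```r
# build one 0/1 indicator column per level of ref
M <- model.matrix(~ factor(ref) - 1, data = dproot)
colnames(M) <- paste0("ref.", seq_len(ncol(M)))
dproot2 <- cbind(dproot[c("depth", "den")], M)

# then assemble the formula string exactly as above
nterms <- ncol(M)
terms <- paste(sprintf("Rm%d*ref.%d", 1:nterms, 1:nterms), collapse = "+")
fo <- sprintf("den ~ (%s)/(1+(depth/d50)^c)", terms)
```

Either construction feeds the same formula string to nlxb().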


On Tue, Sep 22, 2015 at 7:04 AM, Gabor Grothendieck  wrote:

> Just write out the 20 terms.
>
> On Mon, Sep 21, 2015 at 10:26 PM, Jianling Fan 
> wrote:
>
>> Hello, Gabor,
>>
>> Thanks again for your suggestion. And now I am trying to improve the
>> code by adding a function to replace the expression "Rm1 * ref.1 + Rm2 *
>> ref.2 + Rm3 * ref.3 + Rm4 * ref.4 + Rm5 * ref.5 + Rm6 * ref.6", because
>> I have some other datasets that need to be fitted to the same model but
>> with more groups (>20).
>>
>> I tried to add the function as:
>>
>> denfun<-function(i){
>>for(i in 1:6){
>>  Rm<-sum(Rm[i]*ref.i)
>>  return(Rm)}
>> }
>>
>> but I got another error when I incorporate this function into my
>> regression:
>>
>> >fitdp1<-nlxb(den ~ denfun(6)/(1+(depth/d50)^c),
>>data = dproot2,
>>  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
>> Rm5=1.01, Rm6=1, d50=20, c=-1),
>> masked = "Rm6")
>>
>> Error in deriv.default(parse(text = resexp), names(start)) :
>>   Function 'denfun' is not in the derivatives table
>>
>> I think there must be something wrong with my function. I tried several
>> times but am not sure how to improve it because I am quite new to R.
>>
>> Could anyone please give me some suggestions?
>>
>> Thanks a lot!
>>
>>
>> Jianling
>>
>>
>> On 22 September 2015 at 00:43, Gabor Grothendieck
>>  wrote:
>> > Express the formula in terms of simple operations like this:
>> >
>> > # add 0/1 columns ref.1, ref.2, ..., ref.6
>> > dproot2 <- do.call(data.frame, transform(dproot, ref = outer(dproot$ref,
>> > seq(6), "==") + 0))
>> >
>> > # now express the formula in terms of the new columns
>> > library(nlmrt)
>> > fitdp1<-nlxb(den ~ (Rm1 * ref.1 + Rm2 * ref.2 + Rm3 * ref.3 + Rm4 *
>> ref.4 +
>> > Rm5 * ref.5 + Rm6 * ref.6)/(1+(depth/d50)^c),
>> >  data = dproot2,
>> >  start = c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65, Rm5=1.01,
>> Rm6=1,
>> > d50=20, c=-1),
>> >  masked = "Rm6")
>> >
>> > where we used this input:
>> >
>> > Lines <- "   depth   den ref
>> > 1 20 0.573   1
>> > 2 40 0.780   1
>> > 3 60 0.947   1
>> > 4 80 0.990   1
>> > 5100 1.000   1
>> > 6 10 0.600   2
>> > 7 20 0.820   2
>> > 8 30 0.930   2
>> > 9 40 1.000   2
>> > 1020 0.480   3
>> > 1140 0.734   3
>> > 1260 0.961   3
>> > 1380 0.998   3
>> > 14   100 1.000   3
>> > 1520 3.2083491   4
>> > 1640 4.9683383   4
>> > 1760 6.2381133   4
>> > 1880 6.5322348   4
>> > 19   100 6.5780660   4
>> > 20   120 6.6032064   4
>> > 2120 0.614   5
>> > 2240 0.827   5
>> > 2360 0.950   5
>> > 2480 0.995   5
>> > 25   100 1.000   5
>> > 2620 0.4345774   6
>> > 2740 0.6654726   6
>> > 2860 0.8480684   6
>> > 2980 0.9268951   6
>> > 30   100 0.9723207   6
>> > 31   120 0.9939966   6
>> > 32   140 0.9992400   6"
>> >
>> > dproot <- read.table(text = Lines, header = TRUE)
>> >
>> >
>> >
>> > On Mon, Sep 21, 2015 at 12:22 PM, Jianling Fan 
>> > wrote:
>> >>
>> >> Thanks Prof. Nash,
>> >>
>> >> Sorry for the late reply. I am learning and trying to use your nlmrt
>> >> package since I got your email. It works well for masking a parameter
>> >> in a regression but does not seem to work for my equation. I think the problem is
>> >> that the parameter I want to mask is a group-specific parameter and I
>> >> have a "[]" syntax in my equation. However, I don't have your 2014
>> >> book on hand and couldn't find it in our library. So I am wondering if
>> >> nlxb works for group data?
>> >> Thanks a lot!
>> >>
>> >> Following is my code and the error I got from it:
>> >>
>> >> > fitdp1<-nlxb(den~Rm[ref]/(1+(depth/d50)^c),data=dproot,
>> >> + start =c(Rm1=1.01, Rm2=1.01, Rm3=1.01, Rm4=6.65,
>> >> Rm5=1.01, Rm6=1, d50=20, c=-1),
>> >> + masked=c("Rm6"))
>> >>
>> >> Error in deriv.default(parse(text = resexp), names(start)) :
>> >>   Function '`[`' is not in the derivatives table
>> >>
>> >>
>> >> Best regards,
>> >>
>> >> Jianling
>> >>
>> >>
>> >> On 20 September 2015 at 12:56, ProfJCNash 
>> wrote:
>> >> > I posted a suggestion to use nlmrt package (function nlxb to be
>> >> > 

[R] (no subject)

2015-09-22 Thread arsalan fathi


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need data labels to jitter with datapoints in boxplot

2015-09-22 Thread smheas
Thank you both for your responses! I ended up going with PIKAL Petr's
suggestion.



--
View this message in context: 
http://r.789695.n4.nabble.com/Need-data-labels-to-jitter-with-datapoints-in-boxplot-tp4712380p4712605.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Millisecond TimeStamps

2015-09-22 Thread JonyGreen
you can try a free online unix timestamp creator to get the current
timestamp in milliseconds.
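
For what it's worth, base R can produce a millisecond timestamp directly, with no external tool; a minimal sketch:

```r
op <- options(digits.secs = 3)        # print fractional seconds
now <- Sys.time()                     # current time, sub-second resolution
ms <- round(as.numeric(now) * 1000)   # milliseconds since the Unix epoch
options(op)                           # restore previous options
```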



--
View this message in context: 
http://r.789695.n4.nabble.com/Millisecond-TimeStamps-tp3403594p4712598.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] unixtime conversion

2015-09-22 Thread JonyGreen
you can try a free online timestamp converter to convert a timestamp to a
readable date.
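
base R's as.POSIXct() handles this conversion as well; a minimal sketch (the timestamp value is an arbitrary example):

```r
ts <- 1442930860  # seconds since 1970-01-01 UTC (arbitrary example)
as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

# for a millisecond timestamp, divide by 1000 first
as.POSIXct(1442930860123 / 1000, origin = "1970-01-01", tz = "UTC")
```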




--
View this message in context: 
http://r.789695.n4.nabble.com/unixtime-conversion-tp829898p4712599.html
Sent from the R help mailing list archive at Nabble.com.



[R] error in mlogit.optim

2015-09-22 Thread Alaa Sindi
Hi all

I hope you are doing well.

I am trying to install and use mlogit.optim and getting this error. 

Error: could not find function "mlogit.optim"

Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/src/contrib 

Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/src/contrib 

Warning in install.packages :
  package ‘mlogit.optim’ is not available (for R version 3.2.2)
Warning in install.packages :
  unable to access index for repository 
https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.2 



Thanks
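
A likely explanation (an assumption, since no code is shown): mlogit.optim is a function provided by the mlogit package, not a package of its own, so install.packages() cannot find a package by that name. Installing and loading mlogit itself should make the function available; a sketch:

```r
install.packages("mlogit")  # the package that provides mlogit.optim
library(mlogit)
exists("mlogit.optim")      # should now be TRUE
```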




[R] Error from lme4: "Error: (p <- ncol(X)) == ncol(Y) is not TRUE"

2015-09-22 Thread Rory Wilson
Hello all,

I am trying to run a random intercept model using lme4. The random effect
is a factor with 29 levels, giving a model with one random effect (one
level). It is just a linear model. There are 713 observations. However,
when trying to run the model I receive the error "Error: (p <- ncol(X)) ==
ncol(Y) is not TRUE", a search for which reveals surprisingly little. Has
anyone seen this before? Note that if I simply change the random effect
into a fixed effect and use lm, the model works perfectly.

Thank you!
Rory
