Re: [R] RWeka Error

2016-04-12 Thread Rini John via R-help
Hi,
When I use any function of the RWeka package in RStudio, I get the error
"Error in .jnew (name): java.lang.ClassFormatError." Can anyone guide me on
this?
Operating system used: Linux 64-bit (CentOS)
Commands used:
> data("crude")
> tdm <- TermDocumentMatrix(crude, control = list(tokenize = NGramTokenizer))
Packages loaded: tm and RWeka

Regards,
Rini John
From: Jeff Newmiller
To: Rini John; Rini John via R-help; "r-help@r-project.org"
Sent: Tuesday, 5 April 2016, 18:30:26
Subject: Re: [R] RWeka Error
   
Read the Posting Guide mentioned at the bottom of this email. Highlights you 
should be sure to address:

* HTML formatted email gets messed up on the R mailing lists, so post in plain 
text. Yes, you can and need to do this. 

* Make sure the problem occurs in R by trying it without RStudio. Sometimes 
RStudio interferes with R, and you have to ask elsewhere about such problems. 

* Give us details about your setup and the exact commands you used. The 
sessionInfo function is helpful here, as is a sample of what you entered into a 
clean R session to get that error (for completeness). Make sure you are clear 
in your post about what operating system you are using, and what Java runtime 
(version and 32/64 bitness) is installed. 
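A minimal sketch of collecting those details from a clean R session (for example, one started with `R --vanilla` outside RStudio). It assumes RWeka reaches Java through the rJava package; the Java part is skipped if rJava is missing or the JVM cannot start. As a hedged aside, `java.lang.ClassFormatError` is the parent class of `UnsupportedClassVersionError`, so a Java runtime older than the one the Weka classes were compiled for is one plausible cause worth ruling out.

```r
# Details worth pasting into a post about an RWeka/Java problem
info <- sessionInfo()
print(info)

# R bitness: 8-byte pointers mean a 64-bit build of R
r_bits <- 8 * .Machine$sizeof.pointer
cat("R is", r_bits, "bit\n")

# Java runtime version as seen from R (assumes rJava; skipped if absent)
if (requireNamespace("rJava", quietly = TRUE)) {
  java_version <- tryCatch({
    rJava::.jinit()
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) paste("could not start Java:", conditionMessage(e)))
  cat("Java runtime:", java_version, "\n")
} else {
  cat("rJava is not installed\n")
}
```

Comparing the Java version printed here with the output of `java -version` in a shell shows whether R is picking up the runtime you expect.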
-- 
Sent from my phone. Please excuse my brevity.

On April 5, 2016 5:14:55 AM PDT, "Rini John via R-help" wrote:

When I use any function of RWeka Package in Rstudio I get an error, "Error in 
.jnew (name): java.lang.ClassFormatError." can anyone guide me in this?



R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



  

Re: [R] Adding Two-Headed Arrow in map legend

2016-04-12 Thread Jim Lemon
Hi Milu,
My fault here. As I don't have the data to make the map and try out my
suggestions I mixed up the x and y coordinates. Try this:

par(xpd=TRUE)
arrows(-19.75966,53,33.6,53,code=3)
par(xpd=FALSE)

Jim

On Tue, Apr 12, 2016 at 10:11 PM, Miluji Sb  wrote:
> Hello Jim,
>
> Thanks again. I am getting the two-headed arrow, but I cannot seem to get the
> coordinates right for the arrow to appear beneath the map. These coordinates
> put the arrow on the left-hand side. Thanks again!
>
> Sincerely,
>
> Milu
>
> On Tue, Apr 12, 2016 at 1:15 PM, Jim Lemon  wrote:
>>
>> Hi Milu,
>> There is a two-headed arrow on the image you sent, and it seems to be
>> where you specified. Did you want it beneath the map, as:
>>
>> par(xpd=TRUE)
>> arrows(-22,54.75,-22,74,code=3)
>> par(xpd=FALSE)
>>
>> Jim
>>
>> On Tue, Apr 12, 2016 at 7:58 PM, Miluji Sb  wrote:
>> > Dear Jim,
>> >
>> > Thanks again! I do want the arrows at the bottom (beneath the map).
>> > This is what I am doing:
>> >
>> > # Draw the map
>> > eps_europe <- mapCountryData(n, nameColumnToPlot = "eps_score",
>> >   mapTitle = "EPS Score - Europe", colourPalette = colourPalette,
>> >   catMethod = "fixedWidth", missingCountryCol = "white",
>> >   mapRegion = "Europe", addLegend = FALSE)
>> >
>> > # ISO3 codes on the map
>> > text(n, labels="ISO3", cex=0.30)
>> >
>> > # Obtain coordinates for the arrow
>> > par('usr')
>> >
>> > # -19.75966  54.75966  33.6  71.4
>> >
>> > # Arrows
>> > par(xpd=TRUE)
>> > arrows(-19.75966,  54.75966,  33.6,  71.4,code=3)
>> > par(xpd=FALSE)
>> >
>> > As the output shows I cannot seem to get the correct coordinates for the
>> > arrows. Thanks again.
>> >
>> > Sincerely,
>> >
>> > Milu
>
>
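For reference, `par("usr")` returns `c(x1, x2, y1, y2)` (left, right, bottom, top of the plotting region), whereas `arrows()` takes `x0, y0, x1, y1`, so the four numbers cannot be passed straight through. A minimal sketch, using an empty plot as a stand-in for the rworldmap map and an arbitrary offset of 8% of the y range below the bottom edge (both are assumptions for illustration):

```r
# Null device so the sketch runs without a screen; drop this line interactively
pdf(NULL)
# Stand-in for the map: an empty plot with roughly the same extent
plot(c(-20, 55), c(33, 72), type = "n", xlab = "Longitude", ylab = "Latitude")

usr <- par("usr")                             # c(left, right, bottom, top)
y_arrow <- usr[3] - 0.08 * (usr[4] - usr[3])  # a little below the bottom edge

op <- par(xpd = TRUE)                         # allow drawing outside the plot region
arrows(x0 = usr[1], y0 = y_arrow, x1 = usr[2], y1 = y_arrow,
       code = 3, length = 0.1)                # code = 3: arrowheads at both ends
par(op)                                       # restore the previous xpd setting
invisible(dev.off())
```

Deriving the arrow position from `par("usr")` keeps it attached to the map edge even if the map region or device size changes.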



Re: [R] error: contextstack overflow

2016-04-12 Thread John Kane
I've never seen this error before, but see Brian Ripley's post
https://stat.ethz.ch/pipermail/r-help/2008-March/157341.html. It looks like you
are exceeding a limit.

We probably should see some sample code and data. 

Please have a look at 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 and/or http://adv-r.had.co.nz/Reproducibility.html  for some suggestions on 
what to include in a question.

John Kane
Kingston ON Canada


> -Original Message-
> From: r-help@r-project.org
> Sent: Tue, 12 Apr 2016 23:42:56 + (UTC)
> To: r-help@r-project.org
> Subject: [R] error: contextstack overflow
> 
> Dear R users,
> I have a loop that has 20 if-else statements, but it gave me a
> "contextstack overflow" error at statement 14.
> Is there any way to overcome this problem? I'm forced to use that
> loop. Any help or recommendations, please.
> 




[R] error: contextstack overflow

2016-04-12 Thread Mahmoud via R-help
Dear R users,
I have a loop that has 20 if-else statements, but it gave me a "contextstack
overflow" error at statement 14.
Is there any way to overcome this problem? I'm forced to use that loop.
Any help or recommendations, please.
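One plausible cause, offered as an assumption since no code was posted: each `else if` nests one level deeper in R's parser, so a long chain can hit a fixed nesting limit. If the 20 branches pick a value according to numeric ranges, a flat lookup avoids the nesting entirely. A minimal sketch with hypothetical bands (the breaks and labels are invented for illustration):

```r
# Hypothetical: map values in [0, 100] to 20 equal-width bands without if/else
breaks <- seq(0, 100, by = 5)       # 21 break points define 20 bands
labels <- paste0("band_", 1:20)

classify <- function(x) {
  # findInterval() returns the index of the band each value falls in;
  # all.inside = TRUE maps 100 (the top break) into the last band
  idx <- findInterval(x, breaks, all.inside = TRUE)
  labels[idx]
}

classify(c(0, 4.9, 5, 99, 100))
# "band_1" "band_1" "band_2" "band_20" "band_20"
```

For branches keyed on exact values rather than ranges, `switch()` or a named lookup vector plays the same role.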



Re: [R] Dissimilarity matrix and number clusters determination

2016-04-12 Thread Luisfo Chiroque via R-help
Dear Michael,

Yes, AFAIK you are reading the results correctly.
You can print
elbow.obj$k
to obtain the optimal number of clusters, and you can check it visually by
plotting the explained variance against the number of clusters:
plot(css.obj$k, css.obj$ev)
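Two hedged notes on the code quoted below. First, `daisy()` already returns a dissimilarity object, which `hclust()` accepts directly; passing it through `dist()` instead computes Euclidean distances between the rows of the dissimilarity matrix, which is a different quantity. Second, if GMD's `css.hclust()` is not at hand, the explained-variance curve can be approximated by hand. The sketch below uses made-up data in place of `sampleset`, and the EV formula (1 minus within-cluster over total sum of squared dissimilarities) is an assumption that mirrors the idea, not GMD's own code.

```r
library(cluster)   # daisy(); a recommended package shipped with R

# Hypothetical stand-in for 'sampleset': mixed numeric and factor columns
set.seed(1)
df <- data.frame(a = rnorm(30), b = runif(30),
                 g = factor(sample(c("x", "y", "z"), 30, replace = TRUE)))

d  <- daisy(df, metric = "gower")   # already a dissimilarity object
hc <- hclust(d)                     # no extra dist() step

# Explained "variance" for k clusters from squared dissimilarities
ev_for_k <- function(d, hc, k) {
  D2 <- as.matrix(d)^2
  cl <- cutree(hc, k)
  total  <- sum(D2) / (2 * nrow(D2))
  within <- sum(sapply(unique(cl), function(g) {
    idx <- which(cl == g)
    sum(D2[idx, idx]) / (2 * length(idx))
  }))
  1 - within / total
}

ks <- 1:10
ev <- sapply(ks, function(k) ev_for_k(d, hc, k))
round(ev, 3)
```

`plot(ks, ev, type = "b")` then gives the elbow plot to inspect by eye.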

HTH

Best,
Luisfo Chiroque
PhD Student
IMDEA Networks Institute
http://fourier.networks.imdea.org/people/~luis_nunez/ 

> On 12 Apr 2016, at 4:30, Michael Artz wrote:
> 
> Hi,
>  I already have a dissimilarity matrix and I am submitting the results to
> the elbow.obj method to get an optimal number of clusters.  Am I reading
> the below output correctly that I should have 17 clusters?
> 
> code:
> top150 <- sampleset[1:150,]
> {cluster1 <- daisy(top150
>   , metric = c("gower")
>   , stand = TRUE
>   , type = list(symm = 1))
> }
> 
> dist.obj <- dist(cluster1)
> hclust.obj <- hclust(dist.obj)
> css.obj <- css.hclust(dist.obj,hclust.obj)
> elbow.obj <- elbow.batch(css.obj)
> 
> [1] "A \"good\" k=17 (EV=0.80) is detected when the EV is no less than
> 0.8\nand the increment of EV is no more than 0.01 for a bigger k.\n"
> attr(,"class")
> 



Re: [R-es] Red neuronal (Neural network)

2016-04-12 Thread Carlos Ortega
Hi Javier,

Optimization methods based on linear programming (in their many variants)
are the ones that companies have used most often, precisely to find optimal
manufacturing points (minimum costs, maximum profits, etc.). You can do this
kind of optimization in R without any problem, and by searching for "Linear
Programming" you will easily find examples of its use. Here is one:

http://spartan.ac.brocku.ca/~pscarbrough/scarb-alp-burch/Chapters%201-24-16.htm
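As a minimal sketch of the kind of linear programme meant here: the boot package, a recommended package that ships with R, includes a `simplex()` function. The two-product, three-plant figures below are invented for illustration (a textbook-style product-mix problem), not taken from any real company.

```r
library(boot)   # provides simplex(); shipped with R as a recommended package

# Hypothetical product mix: maximise 3*x1 + 5*x2 (profit per unit)
profit <- c(3, 5)
# Hours each unit needs in plants 1-3, and hours available (A1 %*% x <= b1)
A1 <- rbind(c(1, 0),
            c(0, 2),
            c(3, 2))
b1 <- c(4, 12, 18)

res <- simplex(a = profit, A1 = A1, b1 = b1, maxi = TRUE)
res$soln    # units of each product to make
res$value   # maximum profit
```

For these numbers the optimum is x1 = 2, x2 = 6 with a profit of 36, which can be checked by hand at the corners of the feasible region.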

Nowadays these algorithms, although still useful, are used less. Many ERPs
already have this logic built in and already give guidance on minimum
production lot sizes, when to place orders, etc. The newer optimization
algorithms (neural networks and genetic algorithms, among others) are left for
analyses that are more about classification and modelling (to predict
outcomes). Among other reasons, this is because neural networks have problems
with convergence, parameter tuning, etc. None of that happens with SIMPLEX:
the results are immediate.

One case of using networks for a rather unusual topic is this one, which
appeared in the Spanish press some time ago:
http://www.elconfidencial.com/tecnologia/2015-05-05/quiebra-banco-rescate-banco-madrid_787740/
I don't know whether the data are available to reproduce the case...

Regards,
Carlos Ortega
www.qualityexcellence.es

On 12 April 2016 at 22:45, Javier Marcuzzi <
javier.ruben.marcu...@gmail.com> wrote:

> Dear all,
>
> I'm thinking about something with neural networks, but the documentation
> tends to confuse me.
>
> There are some examples where the network is trained on something very
> simple, such as a multiplication table or a regression line.
>
> I could solve that in R very simply with lm()..., or, if it is something
> more complex, with an optimization package.
>
> With optimization packages I can set values between maxima and minima and
> compute the highest revenue or the lowest production cost. A cost problem.
>
> Now, for example, monmlp.cost (from the monmlp package, which uses neural
> networks) says (MONMLP mean squared error cost function with analytical
> calculation of its gradient via backpropagation).
>
> I have trouble with the documentation because of the word cost: I think of
> increasing profit or reducing expenses, not of computational costs from
> numerical processes.
>
> The examples I find are not suitable for studying a case with R,
> although I did find some papers where they optimize with neural networks.
>
> Has anyone used a package with a function to optimize costs or profits in
> a company using neural networks? My question is open-ended; it is not about
> specific data, it is part of my personal exercises with R.
>
> Javier Rubén Marcuzzi
>
>
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es





Re: [R] ggplot2

2016-04-12 Thread James Henson
Thanks, the stat="identity" worked.

On Tue, Apr 12, 2016 at 3:34 PM, Huzefa Khalil 
wrote:

> Hi James,
>
> If you want to specify the y-values, you need to use stat="identity" as
> below:
>
> ggplot(probability, aes(x=Fertilizer, y=prob)) +
> geom_bar(stat="identity", aes(fill=Treatment))
>
>
> best,
> huzefa
>
> On Tue, Apr 12, 2016 at 1:02 PM, James Henson  wrote:
> > Dear R Community,
> >
> > Below is a problem with a simple ggplot2 graph. The code returns the error
> > message below.
> >
> > Error: stat_count() must not be used with a y aesthetic.
> >
> > My code is below and the data is attached as a ‘text’ file.
> >
> >
> >
> > # Graph of the probabilities
> >
> > library(digest)
> >
> > library(DT)
> >
> > datatable(probability)
> >
> > str(probability)
> >
> > probability$Fertilizer <- as.factor(probability$Fertilizer)
> >
> > str(probability)
> >
> > library(ggplot2)
> >
> > plot1 <- ggplot(probability, aes(x=Fertilizer, y=prob)) +
> > geom_bar(aes(fill=Treatment))
> >
> > plot1
> >
> >
> >
> > Thanks.
> >
> > Best regards,
> >
> > James F. Henson
> >
>


[R-es] Red neuronal (Neural network)

2016-04-12 Thread Javier Marcuzzi
Dear all,

I'm thinking about something with neural networks, but the documentation tends
to confuse me.

There are some examples where the network is trained on something very simple,
such as a multiplication table or a regression line.

I could solve that in R very simply with lm()..., or, if it is something more
complex, with an optimization package.

With optimization packages I can set values between maxima and minima and
compute the highest revenue or the lowest production cost. A cost problem.

Now, for example, monmlp.cost (from the monmlp package, which uses neural
networks) says (MONMLP mean squared error cost function with analytical
calculation of its gradient via backpropagation).

I have trouble with the documentation because of the word cost: I think of
increasing profit or reducing expenses, not of computational costs from
numerical processes.
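The two meanings of "cost" can be contrasted in a few lines. In neural-network documentation the cost is a loss function that measures prediction error; the monmlp help text quoted above describes a mean squared error. The business cost function below is a hypothetical example with invented parameters, included only for contrast.

```r
# "Cost" in the neural-network sense: mean squared error between
# the network's predictions and the observed targets (lower = better fit)
mse_cost <- function(predicted, observed) {
  mean((predicted - observed)^2)
}

# "Cost" in the business sense (hypothetical): fixed cost plus a unit cost
production_cost <- function(quantity, fixed = 1000, unit = 2.5) {
  fixed + unit * quantity
}

mse_cost(c(1, 2, 3), c(1, 2, 5))   # (0 + 0 + 4) / 3
production_cost(400)               # 1000 + 2.5 * 400
```

Training a network minimises the first kind of function over the network's weights; it does not, by itself, minimise the second.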

The examples I find are not suitable for studying a case with R, although I did
find some papers where they optimize with neural networks.

Has anyone used a package with a function to optimize costs or profits in a
company using neural networks? My question is open-ended; it is not about
specific data, it is part of my personal exercises with R.

Javier Rubén Marcuzzi



Re: [R] ggplot2

2016-04-12 Thread Huzefa Khalil
Hi James,

If you want to specify the y-values, you need to use stat="identity" as below:

ggplot(probability, aes(x=Fertilizer, y=prob)) +
geom_bar(stat="identity", aes(fill=Treatment))


best,
huzefa

On Tue, Apr 12, 2016 at 1:02 PM, James Henson  wrote:
> Dear R Community,
>
> Below is a problem with a simple ggplot2 graph. The code returns the error
> message below.
>
> Error: stat_count() must not be used with a y aesthetic.
>
> My code is below and the data is attached as a ‘text’ file.
>
>
>
> # Graph of the probabilities
>
> library(digest)
>
> library(DT)
>
> datatable(probability)
>
> str(probability)
>
> probability$Fertilizer <- as.factor(probability$Fertilizer)
>
> str(probability)
>
> library(ggplot2)
>
> plot1 <- ggplot(probability, aes(x=Fertilizer, y=prob)) +
> geom_bar(aes(fill=Treatment))
>
> plot1
>
>
>
> Thanks.
>
> Best regards,
>
> James F. Henson
>


Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread ProfJCNash
If you generate the list of pages you're comfortable editing, the posse
of folk who have already come forward can select one that we think can
be improved and see how we get along with it.

Sarah has already noted that GitHub offers wiki documentation. It is
likely imperfect, but we can (and should!) get a bit of experience to
learn where the important issues lie.

Thanks, JN

On 16-04-12 01:53 PM, Duncan Murdoch wrote:
> On 12/04/2016 11:30 AM, ProfJCNash wrote:
>> Thanks Duncan, for the offer to experiment.
>>
>> Can you suggest a couple of your pages that you think might need
>> improvement? We might as well start with something you'd like looked at.
> 
> I don't think I can.  I don't intentionally write obscure documentation,
> so I think they're all clearly written.
> 
> Which leaves the problem of choosing one.  I could probably generate a
> list of help pages where I've contributed enough
> to be comfortable editing them, but you'll need to choose which one to fix.
> 
> Duncan
>>
>> Then I'll ask if there are interested people and see what can be done
>> about getting a framework set up to work on one of those documents.
>>
>> JN
>>
>>
>> On 16-04-12 10:52 AM, Duncan Murdoch wrote:
>> > On 12/04/2016 9:21 AM, ProfJCNash wrote:
>> >>  "The documentation aims to be accurate, not necessarily clear."
>> >> > I notice that none of the critics
>> >> > in this thread have offered improvements on what is there.
>> >>
>> >>
>> >> This issue is as old as documented things. With software it is
>> >> particularly nasty, especially when we want the software to function
>> >> across many platforms.
>> >>
>> >> Duncan has pointed out that critics need to step up to do something.
>> >> I would put documentation failures at the top of my list of
>> >> time-wasters, and have been bitten by some particularly weak offerings
>> >> (not in R) in the last 2 weeks. So 
>> >>
>> >> Proposal: That the R community consider establishing a "test and
>> >> document" group to parallel R-core to focus on the documentation.
>> >> An experiment to test the waters is suggested below.
>> >>
>> >> The needs:
>> >> - tools that let the difficulties with documentation be visualized along
>> >> with proposed changes and the discussion accessed by the wider
>> >> community, while keeping a well-defined process for committing accepted
>> >> changes.
>> >> - a process for the above. Right now a lot happens by discussion in the
>> >> lists and someone in R-core committing the result. If it is
>> >> well-organized, it is not well-understood by the wider R user community.
>> >> - tools for managing and providing access to tests
>> >>
>> >> At the risk of opening another can of worms, documentation is an area
>> >> where such an effort could benefit from paid help. It's an area where
>> >> there's low reward for high effort, particularly for volunteers.
>> >> Moreover, like many volunteers, I'm happy to do some work, but I need
>> >> ways to contribute in small bites (bytes?), and it is difficult to find
>> >> suitable tasks to take on.
>> >>
>> >> Is it worth an experiment to customize something like Dokuwiki (which I
>> >> believe was the platform for the apparently defunct R wiki) to allow a
>> >> segment of R documentation to be reviewed, discussed and changes
>> >> proposed? It could show how we might get to a better process for
>> >> managing R documentation.
>> >
>> > The idea of having non-core people write and test documentation appeals
>> > to me.   The mechanism (Dokuwiki or whatever) makes no difference to me;
>> > it should be up to the participants to decide on what works.
>> >
>> > The difficulty will be "calibration":  those people need to make changes
>> > that core members agree are improvements, or they won't be incorporated.
>> >
>> > I'd suggest that you start very slowly.  First choose *one* help page
>> > that you think needs improvement, and explain why to one of the authors
>> > of that page, and what sort of improvements you propose to make.  Then
>> > get  the author to agree with the proposal, do it, and get the same
>> > author to agree to the final version and commit it.
>> >
>> > I'll volunteer to participate in the approval and committing stage, but
>> > at first only for pages that I authored.  If it turns out to be an
>> > efficient way to improve docs, then I'd consider other pages too.
>> >
>> > Duncan Murdoch
>



Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread William Michels via R-help
On Tue, Apr 12, 2016 at 9:44 AM, David Winsemius  wrote:

>
>  There need to be more worked examples, but those could easily be mined from 
> problems submitted as recorded in the R-help Archives and StackOverFlow.
>


This sounds like a great opportunity for R-users to contribute to the
community (and I certainly would love to participate).

One question for R-Core gurus: R-GUIs have the ability to open a
script window and use a shortcut to execute code in the R-Console. Can
each "Example" on the help pages be configured to do the same? Or at
least assist in block-copying to the Console?

We'd get a lot more people working through examples that way, and
contributors might come up with their own examples to illustrate a
particular function. A Dokuwiki site might be a place where people
could post and vote on new examples to be included in pre-existing
documentation.
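Part of this already exists in base R: `utils::example()` runs the Examples section of a help page in the console, and its `give.lines = TRUE` argument returns the example code as text instead of running it, which could then be pasted into a script window. A minimal sketch using the `mean` help page as an arbitrary choice:

```r
# Run the Examples section of ?mean in the current session
example("mean", package = "base")

# Return the same example code as a character vector instead of running it
ex_lines <- example("mean", package = "base", give.lines = TRUE)
writeLines(ex_lines)
```

Whether GUIs should expose this as a one-click action is a separate question, but the underlying hook is already there.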

--Bill
William Michels, Ph.D.



Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread Duncan Murdoch

On 12/04/2016 11:30 AM, ProfJCNash wrote:
> Thanks Duncan, for the offer to experiment.
>
> Can you suggest a couple of your pages that you think might need
> improvement? We might as well start with something you'd like looked at.

I don't think I can.  I don't intentionally write obscure documentation,
so I think they're all clearly written.

Which leaves the problem of choosing one.  I could probably generate a
list of help pages where I've contributed enough to be comfortable
editing them, but you'll need to choose which one to fix.

Duncan


Then I'll ask if there are interested people and see what can be done
about getting a framework set up to work on one of those documents.

JN


On 16-04-12 10:52 AM, Duncan Murdoch wrote:
> On 12/04/2016 9:21 AM, ProfJCNash wrote:
>>  "The documentation aims to be accurate, not necessarily clear."
>> > I notice that none of the critics
>> > in this thread have offered improvements on what is there.
>>
>>
>> This issue is as old as documented things. With software it is
>> particularly nasty, especially when we want the software to function
>> across many platforms.
>>
>> Duncan has pointed out that critics need to step up to do something.
>> I would put documentation failures at the top of my list of
>> time-wasters, and have been bitten by some particularly weak offerings
>> (not in R) in the last 2 weeks. So 
>>
>> Proposal: That the R community consider establishing a "test and
>> document" group to parallel R-core to focus on the documentation.
>> An experiment to test the waters is suggested below.
>>
>> The needs:
>> - tools that let the difficulties with documentation be visualized along
>> with proposed changes and the discussion accessed by the wider
>> community, while keeping a well-defined process for committing accepted
>> changes.
>> - a process for the above. Right now a lot happens by discussion in the
>> lists and someone in R-core committing the result. If it is
>> well-organized, it is not well-understood by the wider R user community.
>> - tools for managing and providing access to tests
>>
>> At the risk of opening another can of worms, documentation is an area
>> where such an effort could benefit from paid help. It's an area where
>> there's low reward for high effort, particularly for volunteers.
>> Moreover, like many volunteers, I'm happy to do some work, but I need
>> ways to contribute in small bites (bytes?), and it is difficult to find
>> suitable tasks to take on.
>>
>> Is it worth an experiment to customize something like Dokuwiki (which I
>> believe was the platform for the apparently defunct R wiki) to allow a
>> segment of R documentation to be reviewed, discussed and changes
>> proposed? It could show how we might get to a better process for
>> managing R documentation.
>
> The idea of having non-core people write and test documentation appeals
> to me.   The mechanism (Dokuwiki or whatever) makes no difference to me;
> it should be up to the participants to decide on what works.
>
> The difficulty will be "calibration":  those people need to make changes
> that core members agree are improvements, or they won't be incorporated.
>
> I'd suggest that you start very slowly.  First choose *one* help page
> that you think needs improvement, and explain why to one of the authors
> of that page, and what sort of improvements you propose to make.  Then
> get  the author to agree with the proposal, do it, and get the same
> author to agree to the final version and commit it.
>
> I'll volunteer to participate in the approval and committing stage, but
> at first only for pages that I authored.  If it turns out to be an
> efficient way to improve docs, then I'd consider other pages too.
>
> Duncan Murdoch




[R] ggplot2

2016-04-12 Thread James Henson
Dear R Community,

Below is a problem with a simple ggplot2 graph. The code returns the error
message below.

Error: stat_count() must not be used with a y aesthetic.

My code is below and the data is attached as a ‘text’ file.



# Graph of the probabilities

library(digest)

library(DT)

datatable(probability)

str(probability)

probability$Fertilizer <- as.factor(probability$Fertilizer)

str(probability)

library(ggplot2)

plot1 <- ggplot(probability, aes(x=Fertilizer, y=prob)) +
geom_bar(aes(fill=Treatment))

plot1



Thanks.

Best regards,

James F. Henson
Trt  prob    LL      UL      Fertilizer  Treatment
S0   0.      0.0154  0.4998  0           S
S2   0.      0.0154  0.4998  2           S
S4   0.      0.0154  0.4998  4           S
S6   0.      0.0154  0.4998  6           S
P0   0.      0.056   0.579   0           P
P2   0.7778  0.4208  0.9439  2           P
P3   0.333   0.      0.6665  4           P
P4   0.6667  0.3334  0.      6           P
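A self-contained version of the fix, for anyone finding this thread later. The attached values above are partly truncated, so the `prob` numbers below are illustrative placeholders, not James's data. `stat = "identity"` tells `geom_bar()` to use `prob` as the bar height instead of counting rows (the default `stat_count()`, which is what triggers the error); `position = "dodge"` is an optional choice that puts the S and P bars side by side instead of stacking them.

```r
library(ggplot2)

# Illustrative stand-in for the 'probability' data frame (placeholder values)
probability <- data.frame(
  Fertilizer = factor(rep(c(0, 2, 4, 6), times = 2)),
  Treatment  = rep(c("S", "P"), each = 4),
  prob       = c(0.10, 0.15, 0.20, 0.25, 0.30, 0.78, 0.33, 0.67)
)

# Bar heights come from 'prob' itself, so use stat = "identity"
plot1 <- ggplot(probability, aes(x = Fertilizer, y = prob, fill = Treatment)) +
  geom_bar(stat = "identity", position = "dodge")
plot1
```

In ggplot2 2.2.0 and later, `geom_col()` is a shorthand for `geom_bar(stat = "identity")`.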

Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread Duncan Murdoch

On 12/04/2016 12:44 PM, David Winsemius wrote:

>> On Apr 12, 2016, at 8:31 AM, Sarah Goslee wrote:
>>
>> I am very interested in such a distributed documentation editing
>> project, and have some thoughts on how to make it workable for both
>> volunteers and core members who would need to review.
>>
>> I'm willing to lead or colead such a project, if someone stepping up
>> would be a useful first step, and I'm also willing to host a wiki,
>> although I think something like GitHub is probably the best place.
>> I've been contemplating for a while how I can get more involved in the
>> main R efforts, and have contributed to the documentation before, in
>> tiny ways. I think those of us who have participated in R-help for a
>> while have an idea of the main stumbling blocks in the documentation
>> (besides, of course, getting people to read it in the first place).
>>
>> I don't think R-help is the right place to continue discussion; should
>> this be moved to R-devel, or somewhere else entirely?
>
> I'm in. My personal experience with R's documentation has been mostly
> satisfactory, once I learned to pay careful attention to the words 'list',
> 'name', and 'expression'.  I'm not an experienced C programmer or package
> author, so the requirement that I submit a "diff" file to an existing
> document is a hurdle that I cannot yet clear while running, but I can
> probably muscle my way over. I remember taking a big step up in learning R
> when I built a PowerPoint deck to teach basic R, so I would probably learn
> quite a bit from such a process.
>
> My nomination for an improvement 'target' is the `?reshape` page. I've
> never been able to understand it, despite years of trying, and I've seen
> many others report a similar experience. Opinion: Its Details section needs
> to be expanded into two distinct subsections: a 'wide'-direction subsection
> and a 'long'-direction subsection. Each subsection would outline the minimum
> number of supplied arguments for an error-free execution. There need to be
> more worked examples, but those could easily be mined from problems
> submitted as recorded in the R-help Archives and Stack Overflow.


I'd suggest something different -- that one sounds hard.  The revision 
history for the last 13 years is shown below.  Though my name appears, 
it's for trivial changes, and I wouldn't consider myself to be an 
author, and I do not offer to participate in the revision.


Duncan


r68948 | ripley | 2015-08-09 10:51:17 -0400 (Sun, 09 Aug 2015) | 1 line

use https

r68070 | hornik | 2015-03-24 03:32:16 -0400 (Tue, 24 Mar 2015) | 1 line

Spelling.

r68059 | ripley | 2015-03-21 18:14:23 -0400 (Sat, 21 Mar 2015) | 1 line

faulty svn merge, one more partial match

r68055 | ripley | 2015-03-21 16:08:08 -0400 (Sat, 21 Mar 2015) | 1 line

document where use of match.arg allows partial matching

r61521 | ripley | 2013-01-02 10:09:27 -0500 (Wed, 02 Jan 2013) | 1 line

correction

r61433 | ripley | 2012-12-25 07:19:50 -0500 (Tue, 25 Dec 2012) | 1 line

remove trailing spaces

r59039 | ripley | 2012-04-15 06:32:41 -0400 (Sun, 15 Apr 2012) | 1 line

use preferred form of 'R Core Team'

r57816 | ripley | 2011-12-05 03:08:17 -0500 (Mon, 05 Dec 2011) | 1 line

more tidying up

r57814 | ripley | 2011-12-05 02:42:30 -0500 (Mon, 05 Dec 2011) | 1 line

add note about duplicate records

r57094 | maechler | 2011-09-27 11:32:03 -0400 (Tue, 27 Sep 2011) | 1 line

white space only

r56186 | murdoch | 2011-06-19 21:51:45 -0400 (Sun, 19 Jun 2011) | 1 line

Revert r56184 and r56185

r56184 | murdoch | 2011-06-19 19:58:46 -0400 (Sun, 19 Jun 2011) | 1 line

Remove redundant \alias entries from man pages

r53637 | ripley | 2010-11-19 18:13:25 -0500 (Fri, 19 Nov 2010) | 1 line

revise documentation for PR#14435

r50263 | ripley | 2009-10-30 13:52:33 -0400 (Fri, 30 Oct 2009) | 3 lines

spell 'indices' consistently
add an example of CJK fonts for quartz devices (borrowed from Ei-Ji Nakama)


Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread David Winsemius

> On Apr 12, 2016, at 8:31 AM, Sarah Goslee  wrote:
> 
> I am very interested in such a distributed documentation editing
> project, and have some thoughts on how to make it workable for both
> volunteers and core members who would need to review.
> 
> I'm willing to lead or colead such a project, if someone stepping up
> would be a useful first step, and I'm also willing to host a wiki,
> although I think something like GitHub is probably the best place.
> I've been contemplating for a while how I can get more involved in the
> main R efforts, and have contributed to the documentation before, in
> tiny ways. I think those of us who have participated in R-help for a
> while have an idea of the main stumbling blocks in the documentation
> (besides, of course, getting people to read it in the first place).
> 
> I don't think R-help is the right place to continue discussion; should
> this be moved to R-devel, or somewhere else entirely?

I'm in. My personal experience with R's documentation has been mostly
satisfactory, once I learned to pay careful attention to the words 'list',
'name', and 'expression'.  I'm not an experienced C programmer or package
author, so the requirement that I submit a "diff" file to an existing document
is a hurdle that I cannot yet clear while running, but I can probably
muscle my way over. I remember taking a big step up in learning R when I built
a PowerPoint deck to teach basic R, so I would probably learn quite a bit from
such a process.

My nomination for an improvement 'target' is the `?reshape` page. I've never
been able to understand it, despite years of trying, and I've seen many others
report a similar experience. Opinion: its Details section needs to be expanded
into two distinct subsections: a 'wide'-direction subsection and a
'long'-direction subsection. Each subsection would outline the minimum set
of arguments needed for an error-free call. There need to be more worked
examples, but those could easily be mined from problems recorded in the
R-help archives and on Stack Overflow.
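As a rough illustration of the kind of minimal, per-direction example proposed here, the sketch below uses base R's stats::reshape() on a small made-up data set (the data frames and column names are illustrative assumptions, not taken from the help page). For long to wide it supplies only idvar and timevar; for wide to long it supplies varying and idvar and relies on reshape() guessing v.names and times from names of the form x.1, x.2.

```r
# Made-up long-format data: two subjects (id) measured at two times.
long <- data.frame(id = rep(1:2, each = 2),
                   time = rep(1:2, times = 2),
                   x = c(5, 6, 7, 8))

# long -> wide: idvar and timevar are the essential arguments;
# v.names defaults to the remaining column(s), here x.
wide <- reshape(long, direction = "wide", idvar = "id", timevar = "time")
print(wide)   # one row per id

# wide -> long for a data frame that was NOT produced by reshape():
# 'varying' lists the wide columns; names such as "x.1" let reshape()
# guess the stem ("x") and the times (1, 2) using the default sep = ".".
wide2 <- data.frame(id = 1:2, x.1 = c(5, 7), x.2 = c(6, 8))
long2 <- reshape(wide2, direction = "long",
                 varying = c("x.1", "x.2"), idvar = "id")
print(long2)  # one row per id/time pair
```

A data frame that reshape() itself produced carries attributes that let a bare reshape(wide) reverse it; the wide2 case shows what is needed when those attributes are absent.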

-- 
David.


Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread Sarah Goslee
I am very interested in such a distributed documentation editing
project, and have some thoughts on how to make it workable for both
volunteers and core members who would need to review.

I'm willing to lead or colead such a project, if someone stepping up
would be a useful first step, and I'm also willing to host a wiki,
although I think something like GitHub is probably the best place.
I've been contemplating for a while how I can get more involved in the
main R efforts, and have contributed to the documentation before, in
tiny ways. I think those of us who have participated in R-help for a
while have an idea of the main stumbling blocks in the documentation
(besides, of course, getting people to read it in the first place).

I don't think R-help is the right place to continue discussion; should
this be moved to R-devel, or somewhere else entirely?

Sarah

On Tue, Apr 12, 2016 at 11:06 AM, Bert Gunter  wrote:
> FWIW:
>
> 1. I agree that this is an idea worth considering. Especially now that
> R has become so widely used among practitioners who are neither
> especially software literate nor interested in poring over R manuals
> (as I did when I first learned R). They have explicit tasks to do and
> just want to get to them as directly as possible.
>
> 2. A partial reply to the (fair) criticism of those who criticize docs
> without offering improvements is that one may not know what
> improvement to offer precisely because the docs do not make it clear.
> This proposal or something similar addresses this issue. The experts
> could adjudicate.
>
> 3. I agree: writing good docs is hard. Having a mechanism like this
> would also help non-native English writers of software (or challenged
> native writers like me!).
>
> 4. I also think John is right, that if the right mechanism were found
> so that small efforts could be accumulated, a lot of us would
> participate. A wiki sounds about right, but I bow to those with
> greater wisdom and experience here.
>
> 5. The danger here is that this would suck a lot of time from R core.
> That's unacceptable. Presumably a wiki (self-correcting?) would help
> avoid this.
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread ProfJCNash
Thanks, Duncan, for the offer to experiment.

Can you suggest a couple of your pages that you think might need
improvement? We might as well start with something you'd like looked at.

Then I'll ask if there are interested people and see what can be done
about getting a framework set up to work on one of those documents.

JN


On 16-04-12 10:52 AM, Duncan Murdoch wrote:
> On 12/04/2016 9:21 AM, ProfJCNash wrote:
>>  "The documentation aims to be accurate, not necessarily clear."
>> > I notice that none of the critics
>> > in this thread have offered improvements on what is there.
>>
>>
>> This issue is as old as documented things. With software it is
>> particularly nasty, especially when we want the software to function
>> across many platforms.
>>
>> Duncan has pointed out that critics need to step up to do something.
>> I would put documentation failures at the top of my list of
>> time-wasters, and have been bitten by some particularly weak offerings
>> (not in R) in the last 2 weeks. So 
>>
>> Proposal: That the R community consider establishing a "test and
>> document" group to parallel R-core to focus on the documentation.
>> An experiment to test the waters is suggested below.
>>
>> The needs:
>> - tools that let the difficulties with documentation be visualized along
>> with proposed changes and the discussion accessed by the wider
>> community, while keeping a well-defined process for committing accepted
>> changes.
>> - a process for the above. Right now a lot happens by discussion in the
>> lists and someone in R-core committing the result. If it is
>> well-organized, it is not well-understood by the wider R user community.
>> - tools for managing and providing access to tests
>>
>> At the risk of opening another can of worms, documentation is an area
>> where such an effort could benefit from paid help. It's an area where
>> there's low reward for high effort, particularly for volunteers.
>> Moreover, like many volunteers, I'm happy to do some work, but I need
>> ways to contribute in small bites (bytes?), and it is difficult to find
>> suitable tasks to take on.
>>
>> Is it worth an experiment to customize something like Dokuwiki (which I
>> believe was the platform for the apparently defunct R wiki) to allow a
>> segment of R documentation to be reviewed, discussed and changes
>> proposed? It could show how we might get to a better process for
>> managing R documentation.
> 
> The idea of having non-core people write and test documentation appeals
> to me.   The mechanism (Dokuwiki or whatever) makes no difference to me;
> it should be up to the participants to decide on what works.
> 
> The difficulty will be "calibration":  those people need to make changes
> that core members agree are improvements, or they won't be incorporated.
> 
> I'd suggest that you start very slowly.  First choose *one* help page
> that you think needs improvement, and explain why to one of the authors
> of that page, and what sort of improvements you propose to make.  Then
> get the author to agree with the proposal, do it, and get the same
> author to agree to the final version and commit it.
> 
> I'll volunteer to participate in the approval and committing stage, but
> at first only for pages that I authored.  If it turns out to be an
> efficient way to improve docs, then I'd consider other pages too.
> 
> Duncan Murdoch



Re: [R] Correlation between package output

2016-04-12 Thread Sarah Goslee
Hi Fabio,

Using the first example from ?dbFD:

ex1 <- dbFD(dummy$trait, dummy$abun)

If you look at that help page, or at str(ex1), you'll see that the
returned object is a list with named components. So, you can access
the different indices just as you would access any other list. If
that's confusing to you, a good basic intro to R might be just the
thing.

Here are two ways to do so:
with(ex1, plot(nbsp, FRic))
cor(ex1$nbsp, ex1$FRic, use = "pairwise.complete.obs") # the toy example has one NA value
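To look at all of the indices at once rather than one pair at a time, a small helper along these lines could be used. It is a sketch, not part of the FD package: index_cor() is a hypothetical name, and it assumes only that the components you select from the result are numeric vectors with one value per community, as the indices listed in the question are.

```r
# Hypothetical helper (not in FD): correlation matrix of selected indices.
# `res` is any list, e.g. the value of FD::dbFD(), whose components named
# in `idx` are numeric vectors of equal length (one value per community).
index_cor <- function(res, idx, method = "pearson") {
  absent <- setdiff(idx, names(res))
  if (length(absent) > 0)
    stop("components not found: ", paste(absent, collapse = ", "))
  df <- as.data.frame(res[idx])
  # Pairwise deletion: each correlation uses only the communities where
  # both indices are non-missing (the toy example has one NA value).
  cor(df, use = "pairwise.complete.obs", method = method)
}
```

With FD installed, index_cor(ex1, c("nbsp", "sing.sp", "FRic", "FEve", "FDiv", "FDis", "RaoQ")) would give the full matrix for the ex1 object above, and pairs(as.data.frame(ex1[c("FRic", "FEve", "FDiv")])) the matching scatterplot matrix. An index that is constant across communities gives NA correlations (with a warning).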

You might in the future find R-sig-ecology to be a better place to ask
this sort of question.

Sarah


On Mon, Apr 11, 2016 at 9:20 AM, Fabio Monteiro
 wrote:
> Hello
>
> I'm currently using the dbFD function of the FD package, and there are some
> things I can't manage to do.
>
> Is there any way to check the relationships between the dbFD indices?
>
> With the cor function, for example? I can't manage to supply the data
> correctly.
>
> The dbFD function gives a lot of output (the indices nbsp, sing.sp, FRic,
> FEve, FDiv, FDis and RaoQ). I want to see the relationships between these
> dbFD outputs (nbsp, sing.sp, FRic, FEve, FDiv, FDis and RaoQ).
>
> How should I type it?
>
> Thank you
>
> Fábio
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] formula argument evaluation

2016-04-12 Thread Richard M. Heiberger
Would making it a regular infix function, %=>%, using "%" instead of quotes,
work for you?
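A minimal sketch of why this can work, assuming the goal is only to get the expression back as text: %=>% is syntactically legal, so the call parses, and substitute() lets the function read the unevaluated expression without A, B or C having to exist. The name foo follows Adrian's example; mapping %=>% back to "=>" with gsub() is an illustrative choice, not an existing API.

```r
# Return the argument as text without ever evaluating it.
# Accepts either a quoted string or an expression using the %=>% marker.
foo <- function(x) {
  expr <- substitute(x)                  # unevaluated parse tree
  if (is.character(expr)) return(expr)   # already quoted, e.g. "A + B => C"
  txt <- paste(deparse(expr), collapse = " ")
  # %=>% is only a parser-friendly stand-in for the "=>" notation
  gsub("%=>%", "=>", txt, fixed = TRUE)
}

foo(A + B %=>% C)   # "A + B => C"
foo("A + B => C")   # "A + B => C"
foo(A + B <= C)     # "A + B <= C" ("<=" is already legal R syntax)
```

One caveat: %...% operators bind more tightly than +, so A + B %=>% C parses as A + (B %=>% C). That is harmless here because nothing is evaluated, but it would matter if %=>% were given a definition and called directly.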

On Tue, Apr 12, 2016 at 11:09 AM, Adrian Dușa  wrote:
> On Tue, Apr 12, 2016 at 2:08 PM, Duncan Murdoch 
> wrote:
>> [...]
>>
>> It never gets to evaluating it.  It is not a legal R statement, so the
>> parser signals an error.
>> If you want to pass arbitrary strings to a function, you need to put them
>> in quotes.
>
> I see. I thought it was parsed inside the function, but if it's parsed
> before then quoting is the only option.
>
>
> To Keith: no, I mean it like this: "A + B => C", which translates as
> "the union of A and B is sufficient for C" in set-theoretic language.
>
> The "=>" operator means sufficiency, while "<=" means necessity. Quoting
> the expression is good enough; I was just curious whether the quotes could
> be made redundant somehow.
>
> Thank you both,
> Adrian
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania

Re: [R] formula argument evaluation

2016-04-12 Thread Adrian Dușa
On Tue, Apr 12, 2016 at 2:08 PM, Duncan Murdoch 
wrote:
> [...]
>
> It never gets to evaluating it.  It is not a legal R statement, so the
> parser signals an error.
> If you want to pass arbitrary strings to a function, you need to put them
> in quotes.

I see. I thought it was parsed inside the function, but if it's parsed
before then quoting is the only option.


To Keith: no, I mean it like this: "A + B => C", which translates as
"the union of A and B is sufficient for C" in set-theoretic language.

The "=>" operator means sufficiency, while "<=" means necessity. Quoting
the expression is good enough; I was just curious whether the quotes could
be made redundant somehow.

Thank you both,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread Bert Gunter
FWIW:

1. I agree that this is an idea worth considering. Especially now that
R has become so widely used among practitioners who are neither
especially software literate nor interested in poring over R manuals
(as I did when I first learned R). They have explicit tasks to do and
just want to get to them as directly as possible.

2. A partial reply to the (fair) criticism of those who criticize docs
without offering improvements is that one may not know what
improvement to offer precisely because the docs do not make it clear.
This proposal or something similar addresses this issue. The experts
could adjudicate.

3. I agree: writing good docs is hard. Having a mechanism like this
would also help non-native English writers of software (or challenged
native writers like me!).

4. I also think John is right, that if the right mechanism were found
so that small efforts could be accumulated, a lot of us would
participate. A wiki sounds about right, but I bow to those with
greater wisdom and experience here.

5. The danger here is that this would suck a lot of time from R core.
That's unacceptable. Presumably a wiki (self-correcting?) would help
avoid this.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Tue, Apr 12, 2016 at 6:21 AM, ProfJCNash  wrote:
>
> "The documentation aims to be accurate, not necessarily clear."
>> I notice that none of the critics
>> in this thread have offered improvements on what is there.
>
>
> This issue is as old as documented things. With software it is
> particularly nasty, especially when we want the software to function
> across many platforms.
>
> Duncan has pointed out that critics need to step up to do something.
> I would put documentation failures at the top of my list of
> time-wasters, and have been bitten by some particularly weak offerings
> (not in R) in the last 2 weeks. So 
>
> Proposal: That the R community consider establishing a "test and
> document" group to parallel R-core to focus on the documentation.
> An experiment to test the waters is suggested below.
>
> The needs:
> - tools that let the difficulties with documentation be visualized along
> with proposed changes and the discussion accessed by the wider
> community, while keeping a well-defined process for committing accepted
> changes.
> - a process for the above. Right now a lot happens by discussion in the
> lists and someone in R-core committing the result. If it is
> well-organized, it is not well-understood by the wider R user community.
> - tools for managing and providing access to tests
>
> At the risk of opening another can of worms, documentation is an area
> where such an effort could benefit from paid help. It's an area where
> there's low reward for high effort, particularly for volunteers.
> Moreover, like many volunteers, I'm happy to do some work, but I need
> ways to contribute in small bites (bytes?), and it is difficult to find
> suitable tasks to take on.
>
> Is it worth an experiment to customize something like Dokuwiki (which I
> believe was the platform for the apparently defunct R wiki) to allow a
> segment of R documentation to be reviewed, discussed and changes
> proposed? It could show how we might get to a better process for
> managing R documentation.
>
> Cheers, JN


Re: [R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread Duncan Murdoch

On 12/04/2016 9:21 AM, ProfJCNash wrote:
>> "The documentation aims to be accurate, not necessarily clear."
>> I notice that none of the critics
>> in this thread have offered improvements on what is there.
>
> This issue is as old as documented things. With software it is
> particularly nasty, especially when we want the software to function
> across many platforms.
>
> Duncan has pointed out that critics need to step up to do something.
> I would put documentation failures at the top of my list of
> time-wasters, and have been bitten by some particularly weak offerings
> (not in R) in the last 2 weeks. So
>
> Proposal: That the R community consider establishing a "test and
> document" group to parallel R-core to focus on the documentation.
> An experiment to test the waters is suggested below.
>
> The needs:
> - tools that let the difficulties with documentation be visualized along
> with proposed changes and the discussion accessed by the wider
> community, while keeping a well-defined process for committing accepted
> changes.
> - a process for the above. Right now a lot happens by discussion in the
> lists and someone in R-core committing the result. If it is
> well-organized, it is not well-understood by the wider R user community.
> - tools for managing and providing access to tests
>
> At the risk of opening another can of worms, documentation is an area
> where such an effort could benefit from paid help. It's an area where
> there's low reward for high effort, particularly for volunteers.
> Moreover, like many volunteers, I'm happy to do some work, but I need
> ways to contribute in small bites (bytes?), and it is difficult to find
> suitable tasks to take on.
>
> Is it worth an experiment to customize something like Dokuwiki (which I
> believe was the platform for the apparently defunct R wiki) to allow a
> segment of R documentation to be reviewed, discussed and changes
> proposed? It could show how we might get to a better process for
> managing R documentation.


The idea of having non-core people write and test documentation appeals
to me.   The mechanism (Dokuwiki or whatever) makes no difference to me;
it should be up to the participants to decide on what works.

The difficulty will be "calibration":  those people need to make changes
that core members agree are improvements, or they won't be incorporated.

I'd suggest that you start very slowly.  First choose *one* help page
that you think needs improvement, and explain why to one of the authors
of that page, and what sort of improvements you propose to make.  Then
get the author to agree with the proposal, do it, and get the same
author to agree to the final version and commit it.

I'll volunteer to participate in the approval and committing stage, but
at first only for pages that I authored.  If it turns out to be an
efficient way to improve docs, then I'd consider other pages too.

Duncan Murdoch



Re: [R] formula argument evaluation

2016-04-12 Thread Keith Jewell

On 12/04/2016 11:24, Adrian Dușa wrote:
> I have a simple function such as:
>
> foo <- function(x) {
>     call <- lapply(match.call(), deparse)
>     testit <- capture.output(tryCatch(eval(x), error = function(e) e))
>     if (any(grepl("Error", testit))) {
>         return(call$x)
>     }
> }
>
> and I would like to detect a formula when x is not an object:
>
> # this works
> foo(A + B)
> [1] "A + B"
>
> # but this doesn't
> foo(A + B => C)
> Error: unexpected '=' in "foo(A + B ="
>
> Can I prevent it from evaluating the "=" sign?
> The addition sign "+" hasn't been evaluated, and I was hoping the "=" would
> not get evaluated either. The "=>" sign is important for other purposes,
> not related to this example.
>
> Thank you in advance,
> Adrian
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania


Did you mean
> foo (A + B >= C)
??


Re: [R] [FORGED] Re: [FORGED] Re: identical() versus sapply()

2016-04-12 Thread Michael Dewey

Short comment inline

On 12/04/2016 12:45, John Kane wrote:
> Thank you Rolf.  fortune(350) was the link I was trying to remember.
>
> I believe! I believe in the documentation.
>
> It can be incredibly difficult to document something, and unless one has an
> editor to read and 'try' to interpret the results, the original writer may not
> realise just how opaque the explanation is.

I do not think anyone who has written documentation would disagree.
Would one way forward here be for the OP to suggest, with the benefit of all
the comments, how things might be enhanced so that he would not have been
baffled?



> John Kane
> Kingston ON Canada
>
>> -----Original Message-----
>> From: r.tur...@auckland.ac.nz
>> Sent: Tue, 12 Apr 2016 15:34:54 +1200
>> To: murdoch.dun...@gmail.com
>> Subject: Re: [R] [FORGED] Re: [FORGED] Re: identical() versus sapply()
>>
>> On 12/04/16 14:45, Duncan Murdoch wrote:
>>> On 11/04/2016 10:18 PM, Bert Gunter wrote:
>>>>> "The documentation aims to be accurate, not necessarily clear."
>>>>
>>>> !!!
>>>>
>>>> I hope that is not the case! Accurate documentation that is confusing
>>>> is not very useful.
>>>
>>> I don't think it is ever intentionally confusing, but it is often
>>> concise to the point of obscurity.  Words are chosen carefully, and
>>> explanations are not repeated.  It takes an effort to read it.  It will
>>> be clear to careful readers, but not to all readers.
>>>
>>> I was thinking of the statement quoted earlier, 'as(x, "numeric") uses
>>> the existing as.numeric function'.  That is different than saying 'as(x,
>>> "numeric") is the same as as.numeric(x)'.
>>
>> IMHO this is so *obviously* confusing and misleading --- even though it
>> is technically correct --- that whoever wrote it was either
>> intentionally trying to be confusing or is unbelievably obtuse and/or
>> out of touch with reality.
>>
>> It is not (again IMHO) clear even to *very* careful readers.
>>
>> To my mind this documentation fails even the fortune(350) test.
>>
>> cheers,
>>
>> Rolf
>>
>> --
>> Technical Editor ANZJS
>> Department of Statistics
>> University of Auckland
>> Phone: +64-9-373-7599 ext. 88276








--
Michael
http://www.dewey.myzen.co.uk/home.html
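The distinction quoted in this thread, between 'as(x, "numeric") uses the existing as.numeric function' and 'as(x, "numeric") is the same as as.numeric(x)', can be checked directly. The sketch below assumes only base R and the methods package. It prints the storage mode each route produces for an integer input, so readers can see for themselves, in their own R version, whether the two agree; no particular result is claimed here.

```r
# Compare methods::as(x, "numeric") with as.numeric(x) for an integer input.
x <- 1L
via_as <- methods::as(x, "numeric")
via_as_numeric <- as.numeric(x)

cat("as(x, \"numeric\") storage mode:", storage.mode(via_as), "\n")
cat("as.numeric(x) storage mode:    ", storage.mode(via_as_numeric), "\n")
cat("identical():                   ", identical(via_as, via_as_numeric), "\n")
```

Whatever identical() reports, the two values compare equal with ==, which is why the difference is easy to miss until something like identical() or sapply() exposes it.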



Re: [R-es] Parallel processes

2016-04-12 Thread Gilsanz, Jose Luis
Miguel:

Thanks a million for your suggestion to use the tcltk progress bar. It works
perfectly, and the progress bar is prettier too ☺

Now I have an unhealthy curiosity about why the bar does not show up with the
Windows progress bar but does appear with tcltk.

Carlos:

I hadn't heard of that package, but as soon as I finish the ETL jobs I have
pending I'm going to study the web page of the package you sent me, because I'm
sure it will speed up all these processes enormously.

Many thanks to you both for the variety and quality of the solutions.

José Luis Gilsanz Gómez
Statistics
Technical Department, Financial Institutions
JLL Valoraciones S.A. (Jones Lang LaSalle España S.A.)
Paseo de la Castellana 130 - 1ª; 28046 Madrid
Tel: +34 91 454 96 94
Fax +34 91 541 42 64
jll.es

> -----Original Message-----
> From: gilbello...@gmail.com [mailto:gilbello...@gmail.com] On behalf of
> Carlos J. Gil Bellosta
> Sent: Tuesday, 12 April 2016 14:37
> To: Miguel Angel Rodriguez Muiños
> CC: Gilsanz, Jose Luis; r-help-es
> Subject: Re: [R-es] Parallel processes
>
> Hi, how are you?
>
> If the target database is SQL Server, why not try the dbBulkCopy
> function from the package https://github.com/agstudy/rsqlserver?
> It should be able to load millions of records in seconds. At the very
> least, it does so in a single transaction instead of multiple ones as
> with sqlSave.
>
> Regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>
> On 12 April 2016 at 11:55, [...] wrote:

> > Hi José Luis.
> >
> > Do you get an error, or does the progress bar simply not appear?
> >
> > ... and what if, instead of using winProgressBar(), you try the
> > tkProgressBar() function from the tcltk package?
> >
> > Regards,
> > Miguel.
> >
> > On 12/04/2016 at 11:04, Gilsanz, Jose Luis wrote:

> > Hello,
> >
> > I'm back at it with something I solved years ago that has now stopped
> > working and that I can't manage to fix. Perhaps someone can suggest an
> > approach, or the solution outright.
> >
> > I use R in many ETL processes, and the issue is that I have to insert
> > several thousand records (sometimes millions) into a SQL Server
> > database, and while R is doing the inserts it looks as if it isn't
> > doing anything.
> >
> > The solution I managed to build at the time was to parallelize the
> > insertion into two separate processes using the snowfall package:
> > - One process handled the actual insertion of the data.
> > - The other displayed a progress bar, built by querying the table (tab)
> >   into which the records (datos) were being inserted, to monitor progress.

> >

> > La subida al servidor es esta funci n:

> >

> > subida <- function( datos, tab)

> >{

> >flush.console()

> >canal2 <- odbcDriverConnect( 
> > "case=nochange;

> Driver=xxx; Server=xxx; Database=xxx; uid=xxx; pwd=xxx; wsid=xxx;")

> >
> > sqlSave(canal2,datos,tablename= tab, rownames =

> FALSE, append=TRUE, fast=TRUE )

> >close(canal2)

> >rm(canal2)

> >}

> >

> > La barra de progreso se toma de esta funci n:

> > pb <-function( datos,tab){

> > ##Creamos canal de conexion a BBDD

> > canal1 <- odbcDriverConnect( "case=nochange;

> > Driver=SQL Server; Server=xxx; Database=xxx; uid=xxx; pwd=xx;

> > wsid=ESMADN1003;;")

> >

> > ##Obtenemos conteos de registros##

> > #Numero de registro que se van a cargar

> > asubir <- as.numeric(nrow(datos))

> >

> > 

[R] Documentation: Was -- identical() versus sapply()

2016-04-12 Thread ProfJCNash

 "The documentation aims to be accurate, not necessarily clear."
> I notice that none of the critics
> in this thread have offered improvements on what is there.


This issue is as old as documented things. With software it is
particularly nasty, especially when we want the software to function
across many platforms.

Duncan has pointed out that critics need to step up to do something.
I would put documentation failures at the top of my list of
time-wasters, and have been bitten by some particularly weak offerings
(not in R) in the last 2 weeks. So 

Proposal: That the R community consider establishing a "test and
document" group to parallel R-core to focus on the documentation.
An experiment to test the waters is suggested below.

The needs:
- tools that let the difficulties with documentation be visualized along
with proposed changes and the discussion accessed by the wider
community, while keeping a well-defined process for committing accepted
changes.
- a process for the above. Right now a lot happens by discussion in the
lists and someone in R-core committing the result. If it is
well-organized, it is not well-understood by the wider R user community.
- tools for managing and providing access to tests

At the risk of opening another can of worms, documentation is an area
where such an effort could benefit from paid help. It's an area where
there's low reward for high effort, particularly for volunteers.
Moreover, like many volunteers, I'm happy to do some work, but I need
ways to contribute in small bites (bytes?), and it is difficult to find
suitable tasks to take on.

Is it worth an experiment to customize something like Dokuwiki (which I
believe was the platform for the apparently defunct R wiki) to allow a
segment of R documentation to be reviewed, discussed and changes
proposed? It could show how we might get to a better process for
managing R documentation.

Cheers, JN

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Re: [FORGED] Re: identical() versus sapply()

2016-04-12 Thread Duncan Murdoch

On 11/04/2016 11:34 PM, Rolf Turner wrote:

> On 12/04/16 14:45, Duncan Murdoch wrote:
>> On 11/04/2016 10:18 PM, Bert Gunter wrote:
>>> "The documentation aims to be accurate, not necessarily clear."
>>>
>>> !!!
>>>
>>> I hope that is not the case! Accurate documentation that is confusing
>>> is not very useful.
>>
>> I don't think it is ever intentionally confusing, but it is often
>> concise to the point of obscurity.  Words are chosen carefully, and
>> explanations are not repeated.  It takes an effort to read it.  It will
>> be clear to careful readers, but not to all readers.
>>
>> I was thinking of the statement quoted earlier, 'as(x, "numeric") uses
>> the existing as.numeric function'.  That is different than saying 'as(x,
>> "numeric") is the same as as.numeric(x)'.
>
> IMHO this is so *obviously* confusing and misleading --- even though it
> is technically correct --- that whoever wrote it was either
> intentionally trying to be confusing or is unbelievably obtuse and/or
> out of touch with reality.
>
> It is not (again IMHO) clear even to *very* careful readers.
>
> To my mind this documentation fails even the fortune(350) test.



I generally agree that that particular sentence falls pretty far out on 
the obscurity end of the spectrum, but it's much easier to criticize the 
documentation than it is to write it.  I notice that none of the critics 
in this thread have offered improvements on what is there.


I haven't looked up who wrote it (it wasn't me, though I'm sure I've 
written equally obscure sentences), but I do not believe it was 
intentionally confusing, nor is the author obtuse or out of touch with 
reality.  I think that insulting authors is not a way to encourage them 
to change.  That's reality.


Duncan Murdoch



[R] Dispatch issue in package check?

2016-04-12 Thread Szumiloski, John
Dear useRs,

I am developing a package using RStudio and roxygen markup files.  I have run 
into a problem while checking.

The relevant function is a generic S3 statistical function modeled on t.test(), 
with methods.  It returns an object of class "htest" etc.  Here is the 
(anonymized) relevant code:

<...>
#' @examples
#' foo(c(5,4,6,5,7,9,8,11,12,10), )
<...>
#' @export
foo <- function(x, ...) UseMethod("foo")
#'
foo.default <- function(x, )  {< code adapted from t.test.default() >}
#'
foo.formula <- function(formula, data, subset, na.action, ...) {< code adapted 
from t.test.formula() >}

The issue comes when checking.  In RStudio I select the Check button in the 
Build tab.  It executes the command


devtools::check(cleanup = FALSE)

I attach the console output to demonstrate the error, as well as to perhaps see 
any clues:


Updating  documentation
Loading 
Writing NAMESPACE
Writing 
Writing foo.Rd
Writing 
Setting env vars ---
CFLAGS  : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building  ---
"C:/PROGRA~1/R/R-32~1.4/bin/i386/R" --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD build "C:\Users\szumiloj\R\"  \
  --no-resave-data --no-manual

* checking for file 'C:\Users\szumiloj\R\/DESCRIPTION' ... OK
* preparing '':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building '_0.0.0.9001.tar.gz'

Setting env vars ---
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_: FALSE
Checking ) 
---
"C:/PROGRA~1/R/R-32~1.4/bin/i386/R" --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD check  \
  "C:\Users\szumiloj\AppData\Local\Temp\RtmpWKHu33/_0.0.0.9001.tar.gz"  \
  --as-cran --timings --no-manual

* using log directory 'C:/Users/szumiloj/R/.Rcheck'
* using R version 3.2.4 (2016-03-10)
* using platform: i386-w64-mingw32 (32-bit)
* using session charset: ISO8859-1
* using options '--no-manual --as-cran'
* checking for file '/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package '' version '0.0.0.9001'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking whether package '' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... ERROR
Running examples in '-Ex.R' failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: foo
> ### Title: 
> ### Aliases: foo foo.default foo.formula <"Foo" etc>
>
> ### ** Examples
>
> foo(c(5,4,6,5,7,9,8,11,12,10), )
Error in UseMethod("foo") :
  no applicable method for 'foo' applied to an object of class "c('double', 'numeric')"
Calls: foo
Execution halted
* DONE

Status: 1 ERROR



See

  'C:/Users/szumiloj/R/.Rcheck/00check.log'

for details.
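One possible cause, offered only as a hypothesis since the thread does not confirm it: in the posted code only the generic foo carries #' @export, so foo.default may be neither exported nor registered as an S3 method. ?UseMethod describes two places where methods are looked up: the environment the generic is called from, and the registration database of the environment where the generic is defined. Example code runs outside the package namespace, so an unregistered foo.default would not be found there. A minimal sketch of that mechanism, with a plain environment ns standing in for the package namespace (all names and the method body are hypothetical placeholders):

```r
# 'ns' stands in for the package namespace (hypothetical); the method body
# is a placeholder, not the poster's t.test()-based code.
ns <- new.env()
local({
  foo <- function(x, ...) UseMethod("foo")
  foo.default <- function(x, ...) mean(x)
}, envir = ns)

# Like an exported generic: 'foo' is visible to the caller,
# but the unregistered method 'foo.default' is not.
foo <- ns$foo
before <- tryCatch(foo(c(5, 4, 6)), error = function(e) conditionMessage(e))
print(before)  # "no applicable method for 'foo' applied to an object of class ..."

# Register the method for the environment where the generic is defined.
registerS3method("foo", "default", ns$foo.default, envir = ns)
after <- foo(c(5, 4, 6))
print(after)   # 5
```

In a real package the registration normally comes from an S3method(foo, default) line in NAMESPACE; roxygen2 typically writes one when the method itself is tagged #' @export, but that is worth confirming against the roxygen2 version in use.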




Re: [R] [FORGED] Re: [FORGED] Re: identical() versus sapply()

2016-04-12 Thread John Kane

Thank you Rolf.  fortune(350) was the link I was trying to remember.

I believe! I believe in the documentation. 

It can be incredibly difficult to document something, and unless one has an 
editor to read and 'try' to interpret the results, the original writer may not 
realise just how opaque the explanation is.

John Kane
Kingston ON Canada


> -Original Message-
> From: r.tur...@auckland.ac.nz
> Sent: Tue, 12 Apr 2016 15:34:54 +1200
> To: murdoch.dun...@gmail.com
> Subject: Re: [R] [FORGED] Re: [FORGED] Re: identical() versus sapply()
> 
> On 12/04/16 14:45, Duncan Murdoch wrote:
>> On 11/04/2016 10:18 PM, Bert Gunter wrote:
>>> "The documentation aims to be accurate, not necessarily clear."
>>> 
>>> !!!
>>> 
>>> I hope that is not the case! Accurate documentation that is confusing
>>> is not very useful.
>> 
>> I don't think it is ever intentionally confusing, but it is often
>> concise to the point of obscurity.  Words are chosen carefully, and
>> explanations are not repeated.  It takes an effort to read it.  It will
>> be clear to careful readers, but not to all readers.
>> 
>> I was thinking of the statement quoted earlier, 'as(x, "numeric") uses
>> the existing as.numeric function'.  That is different than saying 'as(x,
>> "numeric") is the same as as.numeric(x)'.
> 
> 
> IMHO this is so *obviously* confusing and misleading --- even though it
> is technically correct --- that whoever wrote it was either
> intentionally trying to be confusing or is unbelievably obtuse and/or
> out of touch with reality.
> 
> It is not (again IMHO) clear even to *very* careful readers.
> 
> To my mind this documentation fails even the fortune(350) test.
> 
> cheers,
> 
> Rolf
> 
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 




Re: [R] [FORGED] Re: identical() versus sapply()

2016-04-12 Thread John Kane



> -Original Message-
> From: bgunter.4...@gmail.com
> Sent: Mon, 11 Apr 2016 19:18:39 -0700
> To: murdoch.dun...@gmail.com
> Subject: Re: [R] [FORGED] Re: identical() versus sapply()
> 
> "The documentation aims to be accurate, not necessarily clear."
> 
> !!!
> 
> I hope that is not the case! Accurate documentation that is confusing
> is not very useful. I understand that it is challenging to write docs
> that are both clear and accurate; but I hope that is always the goal.

I have lost the link but someone here had a lovely essay on R documentation 
which pointed out that one had to  have "faith" that everything was in the 
documentation.


> 
> Cheers,
> Bert
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Mon, Apr 11, 2016 at 6:09 PM, Duncan Murdoch
>  wrote:
>> On 11/04/2016 8:25 PM, Paulson, Ariel wrote:
>>> 
>>> Hi Jeff,
>>> 
>>> 
>>> We are splitting hairs because R is splitting hairs, and causing us
>>> problems.  Integer and numeric are different R classes with different
>>> properties, mathematical relationships notwithstanding.  For instance,
>>> the counterintuitive result:
>>
>>
>> The issue here is that R has grown.  The as() function is newer than the
>> as.numeric() function, it's part of the methods package.  It is a much
>> more complicated thing, and there are cases where they differ.
>>
>> In this case, the problem is that is(1L, "numeric") evaluates to TRUE,
>> and nobody has written a coerce method that specifically converts
>> "integer" to "numeric".  So the as() function defaults to doing nothing.
>> It takes a while to do nothing, approximately 360 times longer than
>> as.numeric() takes to actually do the conversion:
>>
>>> microbenchmark(as.numeric(1L), as(1L, "numeric"))
>> Unit: nanoseconds
>>               expr   min    lq      mean  median       uq     max neval
>>     as.numeric(1L)   133   210    516.92   273.5    409.5    9444   100
>>  as(1L, "numeric") 51464 64501 119294.31 99768.5 138321.0 1313669   100
>>
>> R performance is not always simple and easy to predict, but I think
>> anyone who had experience with R would never use as(x, "numeric").  So
>> this just isn't a problem worth fixing.
>>
>> Now, you might object that the documentation claims they are equivalent,
>> but it certainly doesn't.  The documentation aims to be accurate, not
>> necessarily clear.
>> 
>> Duncan Murdoch
>> 
>> 
>>> 
>>> > identical(as.integer(1), as.numeric(1))
>>>
>>> [1] FALSE
>>>
>>>
>>> Unfortunately the reply-to chain doesn't extend far enough -- here is
>>> the original problem:
>>>
>>>
>>> > sapply(1, identical, 1)
>>>
>>> [1] TRUE
>>>
>>> > sapply(1:2, identical, 1)
>>>
>>> [1] FALSE FALSE
>>>
>>> > sapply(1:2, function(i) identical(as.numeric(i),1) )
>>>
>>> [1]  TRUE FALSE
>>>
>>> > sapply(1:2, function(i) identical(as(i,"numeric"),1) )
>>>
>>> [1] FALSE FALSE
>>> 
>>> These are the results of R's hair-splitting!
>> 
>> 
>> 
>>> 
>>> Ariel
>>> 
>>> 
>>> From: Jeff Newmiller 
>>> Sent: Monday, April 11, 2016 6:49 PM
>>> To: Bert Gunter; Paulson, Ariel
>>> Cc: Rolf Turner; r-help@r-project.org
>>> Subject: Re: [R] [FORGED] Re: identical() versus sapply()
>>> 
>>> Hypothesis regarding the thought process: integer is a perfect subset
>>> of numeric, so why split hairs?
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>> 
>>> On April 11, 2016 12:36:56 PM PDT, Bert Gunter 
>>> wrote:
>>> 
>>> Indeed!
>>> 
>>> Slightly simplified to emphasize your point:
>>> 
>>>   class(as(1:2,"numeric"))
>>> [1] "integer"
>>> 
>>>   class(as.numeric(1:2))
>>> [1] "numeric"
>>> 
>>> whereas in ?as it says:
>>> 
>>> "Methods are pre-defined for coercing any object to one of the basic
>>> datatypes. For example, as(x, "numeric") uses the existing as.numeric
>>> function. "
>>> 
>>> I suspect this is related to my ignorance of S4 classes (i.e. as() )
>>> and how they relate to S3 classes, but I certainly don't get it
>>> either.
>>> 
>>> Cheers,
>>> Bert
>>> 
>>> 
>>> 
>>> Bert Gunter
>>> 
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things
>>> into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> 
>>> 
>>> On Mon, Apr 11, 2016 at 9:30 AM, Paulson, Ariel 
>>> wrote:
>>> Ok, I see the difference between 1 and 1:2, I'll just leave it as one
>>> of those "only in R" things.
>>>
>>> But it seems then, that as.numeric() should guarantee a FALSE
>>> outcome, yet it does not.
>>>
>>> To build on what Rolf pointed out, I would really love for someone to
>>> explain this one:
>>> 
>>>   str(1)
>>>num 1
>>> 
>>>   str(1:2)
>>>int [1:2] 1 2
>>> 
>>>   str(as.numeric(1:2))
>>>   
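The identical()/sapply() results in this thread follow from storage types: 1:2 is an integer vector while the literal 1 is a double, and identical() compares type as well as value. A minimal base-R sketch (x is just an illustrative name; as() is deliberately left out, since its behaviour here is the surprising part discussed in the thread and may depend on the R version):

```r
x <- 1:2                      # an integer vector
typeof(x)                     # "integer"
typeof(1)                     # "double": the literal 1 is not an integer

# identical() compares storage type as well as value, so 1L is not identical to 1.
sapply(x, identical, 1)                              # FALSE FALSE
sapply(x, function(i) identical(as.numeric(i), 1))   # TRUE FALSE

# To compare numeric values regardless of integer/double storage:
sapply(x, function(i) i == 1)                        # TRUE FALSE
sapply(x, function(i) isTRUE(all.equal(i, 1)))       # TRUE FALSE
```

== is an exact value comparison, while all.equal() allows a small numeric tolerance; either avoids the type sensitivity of identical().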

Re: [R] Adding Two-Headed Arrow in map legend

2016-04-12 Thread Jim Lemon
Hi Milu,
There is a two-headed arrow on the image you sent, and it seems to be
where you specified. Did you want it beneath the map, as:

par(xpd=TRUE)
arrows(-22,54.75,-22,74,code=3)
par(xpd=FALSE)

Jim

On Tue, Apr 12, 2016 at 7:58 PM, Miluji Sb  wrote:
> Dear Jim,
>
> Thanks again! I do want the arrows at the bottom (beneath the map). This is
> what I am doing:
>
> # Draw the map
> eps_europe <- mapCountryData(n, nameColumnToPlot="eps_score", mapTitle="EPS
> Score - Europe",colourPalette=colourPalette,
> catMethod="fixedWidth", missingCountryCol = "white", mapRegion="Europe",
> addLegend=FALSE)
>
> # ISO3 codes on the map
> text(n, labels="ISO3", cex=0.30)
>
> # Obtain coordinates for the arrow
> par('usr')
>
> # -19.75966  54.75966  33.6  71.4
>
> # Arrows
> par(xpd=TRUE)
> arrows(-19.75966,  54.75966,  33.6,  71.4,code=3)
> par(xpd=FALSE)
>
> As the output shows I cannot seem to get the correct coordinates for the
> arrows. Thanks again.
>
> Sincerely,
>
> Milu
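In the quoted code, the four values from par('usr') are passed straight to arrows(), but par("usr") returns c(x1, x2, y1, y2) (left, right, bottom, top) while arrows() takes x0, y0, x1, y1, so the x and y limits end up mixed. A minimal sketch of a horizontal two-headed arrow just beneath the plot region, assuming that is the goal; a blank base plot stands in for the mapCountryData() map, and the 5% offset is an arbitrary choice:

```r
# Stand-in plot covering roughly the same longitude/latitude range as the map.
plot(c(-20, 55), c(33.6, 71.4), type = "n", xlab = "", ylab = "")

usr <- par("usr")                       # c(x1, x2, y1, y2): left, right, bottom, top
x_left  <- usr[1]
x_right <- usr[2]
y_below <- usr[3] - 0.05 * (usr[4] - usr[3])   # slightly below the plot region

# arrows(x0, y0, x1, y1); code = 3 puts a head at both ends.
op <- par(xpd = TRUE)                   # allow drawing outside the plot region
arrows(x_left, y_below, x_right, y_below, code = 3)
par(op)                                 # restore the previous xpd setting
```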



Re: [R] formula argument evaluation

2016-04-12 Thread Duncan Murdoch

On 12/04/2016 6:24 AM, Adrian Dușa wrote:

> I have a simple function such as:
>
> foo <- function(x) {
>     call <- lapply(match.call(), deparse)
>     testit <- capture.output(tryCatch(eval(x), error = function(e) e))
>     if (grepl("Error", testit)) {
>         return(call$x)
>     }
> }
>
> and I would like to detect a formula when x is not an object:
>
> # this works
> > foo(A + B)
> [1] "A + B"
>
> # but this doesn't
> > foo(A + B => C)
> Error: unexpected '=' in "foo(A + B ="
>
> Can I prevent it from evaluating the "=" sign?


It never gets to evaluating it.  It is not a legal R statement, so the 
parser signals an error.


If you want to pass arbitrary strings to a function, you need to put 
them in quotes.
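A minimal sketch of the two points above, using hypothetical helper names (expr_text(), split_rule()): substitute() captures an argument unevaluated, which is why foo(A + B) can return "A + B" even though A and B do not exist; but "A + B => C" is rejected by the parser before any function runs, so a rule like that has to arrive as a string. The parse check assumes default R settings.

```r
# Hypothetical helper: capture an unevaluated argument as text.
expr_text <- function(x) deparse(substitute(x))
expr_text(A + B)            # "A + B", although A and B are never evaluated

# The parser rejects the whole call, so no function body ever runs:
err <- tryCatch(parse(text = "foo(A + B => C)"), error = function(e) e)
inherits(err, "error")      # TRUE

# Passing the rule as a string sidesteps the parser; split it by hand.
split_rule <- function(rule) {
  parts <- trimws(strsplit(rule, "=>", fixed = TRUE)[[1]])
  list(lhs = parts[1], rhs = parts[2])
}
split_rule("A + B => C")    # lhs "A + B", rhs "C"
```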


Duncan Murdoch


> The addition sign "+" hasn't been evaluated, and I was hoping the "=" would
> not get evaluated either. The "=>" sign is important for other purposes,
> not related to this example.
>
> Thank you in advance,
> Adrian
>
> --
> Adrian Dusa
> University of Bucharest
> Romanian Social Data Archive
> Soseaua Panduri nr.90
> 050663 Bucharest sector 5
> Romania


Re: [R-es] Random Forest para clasificación

2016-04-12 Thread Jesús Para Fernández
I'm going to look into forestFloor further; I'll probably be back in a couple of 
days with questions. Thanks, guys :)

By the way, Carlos, how do you always manage to find the information you need 
so quickly?

Date: Tue, 12 Apr 2016 12:00:42 +0200
Subject: Re: [R-es] Random Forest para clasificación
From: c...@qualityexcellence.es
To: j.para.fernan...@hotmail.com
CC: r-help-es@r-project.org

Hi,
If you read these references you can better understand the difficulties and 
limitations of "partialPlot" for determining a variable's contribution to the 
model you have built:
http://stats.stackexchange.com/questions/21152/obtaining-knowledge-from-a-random-forest#172839
http://stats.stackexchange.com/questions/92150/r-what-do-i-see-in-partial-dependence-plots-of-gbm-and-randomforest
They mention this other package, "forestFloor":
https://cloud.r-project.org/web/packages/forestFloor/index.html
which can indeed help you understand, visually, the relationships/interactions 
between the variable you want to analyze and the rest of your model.
On another note, in my experience SMOTE (it is in the DMwR package) and its 
"caret" equivalents (upSample() / downSample()) tend to make the model overfit 
at the level of imbalance you have. It is with values below 1% in your variable 
that applying them really pays off. An alternative is to use cost functions with 
values inversely proportional to the frequency of each class (of the predicted 
variable).
Regards,
Carlos.
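The cost-based alternative Carlos mentions, weights inversely proportional to class frequency, can be computed in base R. A minimal sketch with a hypothetical target vector at the 5% minority rate mentioned in this thread (not the poster's data):

```r
# Hypothetical target with a 5% minority class.
y <- factor(c(rep("OK", 95), rep("NOK", 5)))   # levels: "NOK", "OK"

freq <- table(y)                  # counts per class: NOK 5, OK 95
w <- 1 / as.numeric(freq)         # weight inversely proportional to frequency
w <- w / sum(w)                   # rescale to sum to 1 (optional)
names(w) <- names(freq)
w                                 # NOK 0.95, OK 0.05
```

With randomForest, one option is to pass such a vector (in the order of levels(y)) as the classwt argument; that argument is documented as class priors, so how much it shifts the fit is worth checking on held-out data rather than assumed.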

On 12 April 2016 at 11:33, Jesús Para Fernández 
 wrote:



My initial data matrix was very imbalanced (5% minority class), so I used the 
SMOTE algorithm to create a balanced dataset, built the model on it, and then 
computed the confusion matrix on the original data.

Regarding what you say, Carlos, I think that besides everything you mention, 
which is fine, in my case I also need to know not only which variable matters 
but how it matters, and that is complex with a randomForest, since it is a black 
box.

That's why I was looking for the best approach: one was partialPlot and the 
other was classification trees, but the second one makes the model considerably 
worse.

Regards
Jesús

Date: Tue, 12 Apr 2016 11:01:16 +0200
Subject: Re: [R-es] Random Forest para clasificación
From: c...@qualityexcellence.es
To: j.para.fernan...@hotmail.com
CC: r-help-es@r-project.org

Hi,
So you have:
   - The variable importance (you get this directly with "importance").
   - The confusion matrix.
With this you have quite a lot of information about how good your model is and 
which variables most influence your target variable. The only thing I think you 
still need is to determine:
   - The accuracy, using any of the available error measures: Accuracy, Kappa, 
   LogLoss, RSE, RMSE
   - And perhaps evaluate your model a bit more carefully with CV, to see more 
   broadly whether you are overfitting.
Did you build your model directly? I mean, without using one part of the data 
for training and another for "test".
Regards,
Carlos.

On 12 April 2016 at 10:39, Jesús Para Fernández 
 wrote:




No, no, I've got that: I have the confusion matrix for OK/NOK. What I don't 
understand is how to draw conclusions from the model about how the variables 
affect the outcome. I have followed two strategies:

1- Build classification trees with the most important variables from the random 
Forest, but the model gets considerably worse.
2- Produce the partialPlots, to see the influence of each variable, but I don't 
fully understand what the Y axis means in these plots. From what I've seen, 
with your first contribution, it is the percentage of OK/NOK votes, but I'm 
still unsure whether 1 is OK and -1 is NOK or the other way round.

Thanks, Carlos!
Jesús
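On the Y-axis question: as we read the randomForest documentation for partialPlot(), for classification the plotted value is log(p_k) minus the mean of log(p_j) over the K classes, where p_j is the proportion of tree votes for class j and k is the class chosen with which.class (the first class by default). With two classes this is half the log-odds of class k against the other class, so positive values favour which.class rather than meaning "1 = OK, -1 = NOK". A small sketch of that formula; pd_value() is a hypothetical helper, not part of randomForest:

```r
# Hypothetical helper (not part of randomForest): plotted value for vote
# proportions p (one per class, all > 0) and focus class k.
pd_value <- function(p, k) log(p[k]) - mean(log(p))

pd_value(c(0.5, 0.5), k = 1)   # 0: votes split evenly
pd_value(c(0.8, 0.2), k = 1)   # about  0.693 (= log(4)/2): favours class k
pd_value(c(0.2, 0.8), k = 1)   # about -0.693: favours the other class
```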
Date: Tue, 12 Apr 2016 10:28:44 +0200
Subject: Re: [R-es] Random Forest para clasificación
From: c...@qualityexcellence.es
To: j.para.fernan...@hotmail.com
CC: r-help-es@r-project.org

Hi,
So, judging by your last question, your doubt is not really about the meaning of 
"partialPlot" but about whether, when you build your model, "randomForest" is 
classifying well or badly. Is that right? Because then what needs clarifying is 
something else.
If what you want to determine precisely is whether "randomForest" (or any other 
model) is right when it tells you that an individual (a row) does or does not 
belong to a given class (in your case "OK" or "KO"), then you need to consider 
other things. Before discussing them, I'd rather confirm with you whether this 
is what you are looking for.
Regards,
Carlos Ortega
www.qualityexcellence.es

On 12 April 2016 at 10:17, Jesús Para Fernández 
 wrote:




Thanks for the quick reply, but 

[R] R integration with SAP-HANA and SQLScripting

2016-04-12 Thread Griffiths, Michael
Dear R forum,

I am seeking relevant material that discusses processes and methods of
incorporating R code into SAP-HANA. I would greatly appreciate links to any
relevant literature.

Background research on my part has only found the SAP-HANA R Integration
Guide, and several short examples. If the forum knows of any other sources
of information, this would be greatly appreciated.

Regards

Mike Griffiths



[R] formula argument evaluation

2016-04-12 Thread Adrian Dușa
I have a simple function such as:

foo <- function(x) {
    call <- lapply(match.call(), deparse)
    testit <- capture.output(tryCatch(eval(x), error = function(e) e))
    if (grepl("Error", testit)) {
        return(call$x)
    }
}

and I would like to detect a formula when x is not an object:

# this works
> foo(A + B)
[1] "A + B"

# but this doesn't
> foo(A + B => C)
Error: unexpected '=' in "foo(A + B ="

Can I prevent it from evaluating the "=" sign?
The addition sign "+" hasn't been evaluated, and I was hoping the "=" would
not get evaluated either. The "=>" sign is important for other purposes,
not related to this example.

Thank you in advance,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



[R-es] Procesos paralelos

2016-04-12 Thread Gilsanz, Jose Luis
Hi:

I'm back with something I solved years ago that has now stopped working and 
that I can't manage to fix. Maybe someone can suggest an approach, or the 
solution itself.

I use R in many ETL processes, and the issue is that I have to insert several 
thousand (sometimes millions of) records into a SQL Server database, and while 
R is doing the inserts it looks as if it is doing nothing.

The solution I came up with at the time was to parallelize the insertion into 
two separate processes using the snowfall package:
- One process handled the actual insertion of the data.
- The other process displayed a progress bar, built by querying the table (tab) 
into which the records (datos) were being inserted, to monitor progress.

La subida al servidor es esta funci�n:

subida <- function( datos, tab)
   {
   flush.console()
   canal2 <- odbcDriverConnect( 
"case=nochange; Driver=xxx; Server=xxx; Database=xxx; uid=xxx; pwd=xxx; 
wsid=xxx;")
   sqlSave(canal2,datos,tablename= 
tab, rownames = FALSE, append=TRUE, fast=TRUE )
   close(canal2)
   rm(canal2)
   }

La barra de progreso se toma de esta funci�n:
pb <-function( datos,tab){
##Creamos canal de conexion a BBDD
canal1 <- odbcDriverConnect( "case=nochange; Driver=SQL Server; 
Server=xxx; Database=xxx; uid=xxx; pwd=xx; wsid=ESMADN1003;;")

##Obtenemos conteos de registros##
#Numero de registro que se van a cargar
asubir <- as.numeric(nrow(datos))

#Numero de registro que ya hay en la tabla
entabla <- as.numeric(sqlQuery(canal1,paste("SELECT Count(*) ", 
" FROM ",tab, sep="")))

#Numero de registros cargados en el momento n
total <- as.numeric(0)

#Frecuenca de actualizacion de la barra
  frec <- 0.1

  ##Creamos barra de progreso
  barra <- winProgressBar(title="Subiendo datos a SQL ", label = "Subido el:  
", min= 0, max= 1,initial= 0, width = 800)

##Mientras los registros que quedan por subir sean inferiores a 
los que actualmente hay en la tabla se muestra la barra
while ( entabla + asubir > total  )
 {
   #Reconectamos
   canal1 <- odbcReConnect(canal1)

   #Obtenemos registros actuales en 
la tabla (los que habia + los que han subido hasta el momento)
   total <- 
as.numeric(sqlQuery(canal1,paste("SELECT Count(*) FROM ",tab, sep="")))

   #Calculamos porcentaje de 
registros subidos en el momento
   porcen <- as.numeric((total - 
entabla) / asubir)

   #Actualizamos barra de progreso
   setWinProgressBar(barra, 
porcen,title="SUBIENDO DATOS A SQL", label =paste("Subido el:  ", round(porcen 
*100,0), "% de los datos. Quedan por subir ",(entabla + asubir)-total, " 
registros de ", asubir, "." , sep=""))

   #Actualizamos consola
   flush.console()
   Sys.sleep(frec)
 }
close(barra)
}

I am now trying to use the parallel package (instead of snowfall, which no longer works for me) as follows:
library(parallel)
library(RODBC)

## Create a cluster with two nodes
cl <- makeCluster(2)

## Export data and functions to both nodes
clusterExport(cl, varlist = c("pb", "subida", "datos", "tab"))

## Run the insert on the first node and the progress bar on the second
clusterApply(cl, subida(datos, tab), pb(datos, tab))


The insert completes correctly, but the progress bar never shows up anywhere :( and the task manager shows two Rscript.exe processes running (I'm on Windows 7).
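A likely explanation, worth checking, starts with how `clusterApply(cl, x, fun, ...)` works: it hands the elements of `x` to the workers and calls the function `fun` on each element there. In the call above, `subida(datos, tab)` is an ordinary call, so it runs in the master session when `x` is evaluated, which is why the insert still happens. Its last expression is `rm(canal2)`, so it presumably returns `NULL`. With zero elements to distribute, nothing is sent to the workers and `pb()` is never called, yet the two idle workers still show up as `Rscript.exe`. The sketch below passes the functions themselves as the elements of `x`. It uses hypothetical stand-ins `fake_upload()` and `fake_progress()` so that it runs without a database.

```r
library(parallel)

# Hypothetical stand-ins for subida() and pb(); both take (datos, tab)
fake_upload   <- function(datos, tab) paste("uploaded", nrow(datos), "rows to", tab)
fake_progress <- function(datos, tab) paste("monitored", tab)

datos <- data.frame(x = 1:10)
tab <- "mi_tabla"   # hypothetical table name

cl <- makeCluster(2)
# Each worker is a separate R process, so packages must be loaded there too,
# e.g. clusterEvalQ(cl, library(RODBC)) for the functions in the post.

# x is the list of *functions* (not their results); with two elements and two
# workers, element i goes to worker i, which calls it with datos and tab.
res <- clusterApply(cl, list(fake_upload, fake_progress),
                    function(f, datos, tab) f(datos, tab),
                    datos = datos, tab = tab)
stopCluster(cl)
unlist(res)
```

With the functions from the post, this would become `clusterApply(cl, list(subida, pb), function(f, datos, tab) f(datos, tab), datos = datos, tab = tab)`, after `clusterEvalQ(cl, library(RODBC))`. That call is untested against a real database. Whether a `winProgressBar()` opened by a worker process is visible on the desktop is a separate question that the sketch does not settle.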

If anyone would like the function I built with snowfall (which now doesn't show the bar either), I can send it so you can take it apart.

Many thanks,

Best regards,


José Luis Gilsanz Gómez
Statistics
Technical Department, Financial Institutions
JLL Valoraciones S.A. (Jones Lang LaSalle España S.A.)
Paseo de la Castellana 130 - 1�; 28046 Madrid
Tel: +34 91 454 96 94
Fax +34 91 541 42 64
jll.es


Re: [R-es] Random Forest for classification

2016-04-12 Thread Carlos Ortega
Hi,

So you already have:

   - The variable importance (you get this directly with "importance").
   - The confusion matrix.

That gives you a fair amount of information about how good your model is and
which variables most influence your target variable.
The only things I see you are missing are:

   - The model's accuracy, using any of the usual error measures:
   Accuracy, Kappa, LogLoss, RSE, RMSE.
   - And perhaps a finer evaluation of your model with cross-validation (CV),
   to check more thoroughly that you are not overfitting.

Did you fit your model directly? I mean, without setting aside one part of
the data for training and another part for "test".
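Accuracy and Kappa can both be computed directly from a confusion matrix. The sketch below uses made-up counts. Accuracy is p_o = (correct predictions) / n. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the row and column totals. Ideally, the matrix should come from data not used for fitting: a test split, cross-validation, or the forest's out-of-bag predictions. If the matrix is taken from a fitted randomForest object, its extra `class.error` column would need to be dropped first.

```r
# Accuracy and Cohen's kappa from a square confusion matrix
# (rows = observed class, columns = predicted class, same class order)
accuracy_kappa <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                      # observed agreement = accuracy
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  c(accuracy = po, kappa = (po - pe) / (1 - pe))
}

# Made-up counts for an OK/NOK problem
cm <- matrix(c(40, 10, 5, 45), nrow = 2,
             dimnames = list(observed = c("NOK", "OK"),
                             predicted = c("NOK", "OK")))
accuracy_kappa(cm)
# accuracy 0.85, kappa 0.70
```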

Regards,
Carlos.


On 12 April 2016 at 10:39, Jesús Para Fernández <j.para.fernan...@hotmail.com> wrote:




-- 
Regards,
Carlos Ortega
www.qualityexcellence.es


___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R-es] Random Forest for classification

2016-04-12 Thread Isidro Hidalgo Arellano
The reasonable assumption is that the trees' "YES" corresponds to the positives.
It is worrying that your model gets considerably worse when you remove variables the random forest rates as "unimportant". What percentage of the variables did you remove?
Regards,

Isidro Hidalgo Arellano
Observatorio del Mercado de Trabajo
Consejería de Economía, Empresas y Empleo
http://www.castillalamancha.es/




-Original Message-
From: R-help-es [mailto:r-help-es-boun...@r-project.org] On behalf of Jesús Para Fernández
Sent: Tuesday, 12 April 2016 10:40
To: Carlos Ortega 
CC: r-help-es@r-project.org
Subject: Re: [R-es] Random Forest for classification




Re: [R-es] Random Forest for classification

2016-04-12 Thread Jesús Para Fernández

No, no, I already have that; I mean, I have the confusion matrix for OK/NOK. What I don't understand is how to draw conclusions about the model in terms of how the variables affect it. I have tried two strategies:

1. Building classification trees with the most important variables from the random forest, but the model gets considerably worse.
2. Producing partialPlot charts to see the influence of each variable, but I don't fully understand what the Y axis means in these plots. From your first reply, I gather it is the percentage of OK/NOK votes, but I am still unsure whether 1 is OK and -1 is NOK, or the other way round.

Thanks, Carlos!
Jesús
Date: Tue, 12 Apr 2016 10:28:44 +0200
Subject: Re: [R-es] Random Forest for classification
From: c...@qualityexcellence.es
To: j.para.fernan...@hotmail.com
CC: r-help-es@r-project.org


Re: [R-es] Random Forest for classification

2016-04-12 Thread Isidro Hidalgo Arellano
The reasonable assumption is that the positives are the "OK" class, i.e. the trees' "YES" branch.
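For reference, the randomForest documentation for `partialPlot()` describes the plotted quantity for classification as a centred log of the vote proportions, not a raw vote percentage: f = log(p_k) - (1/K) * sum_j log(p_j), averaged over the observations. Here K is the number of classes and p_j is the proportion of votes for class j. The class k is the one chosen with `which.class`, which defaults to the first class (the first factor level). With two classes, f is half the log-odds of class k, so positive values favour class k and negative values the other class. If the response has the default alphabetical levels c("NOK", "OK"), the default focus would be NOK, so passing `which.class = "OK"` explicitly removes the ambiguity. The sketch below evaluates the formula on made-up vote proportions; it is worth checking against `?partialPlot` for the installed version.

```r
# Per-observation quantity that randomForest::partialPlot() plots for
# classification, per its documentation: log(p_k) - mean_j(log(p_j)),
# where k is the class given by `which.class`.
centred_log_vote <- function(p, k) log(p[[k]]) - mean(log(p))

votes <- c(NOK = 0.3, OK = 0.7)   # made-up vote proportions
centred_log_vote(votes, "OK")     # positive: the votes favour OK
centred_log_vote(votes, "NOK")    # same magnitude, negative sign
0.5 * log(0.7 / 0.3)              # two classes: half the log-odds, equals the OK value

# Hypothetical call; `rf`, `train` and "x1" stand in for the poster's objects:
# partialPlot(rf, pred.data = train, x.var = "x1", which.class = "OK")
```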
Regards,

Isidro Hidalgo Arellano
Observatorio del Mercado de Trabajo
Consejería de Economía, Empresas y Empleo
http://www.castillalamancha.es/



-Original Message-
From: R-help-es [mailto:r-help-es-boun...@r-project.org] On behalf of Jesús Para Fernández
Sent: Tuesday, 12 April 2016 10:18
To: Carlos Ortega 
CC: r-help-es@r-project.org
Subject: Re: [R-es] Random Forest for classification




Re: [R-es] Random Forest for classification

2016-04-12 Thread Carlos Ortega
Hi,

So, judging by your last question, your doubt is not really about what
"partialPlot" means, but rather whether "randomForest" is classifying well
or badly when you build your model. Is that right? Because then what needs
clarifying is something else.

If what you want to determine precisely is whether "randomForest" (or any
other model) is right when it tells you that an individual (a row) does or
does not belong to a given class (in your case "OK" or "KO"), then you need
to consider other things. Before going into them, I'd rather confirm with
you whether this is what you are after.

Regards,
Carlos Ortega
www.qualityexcellence.es


On 12 April 2016 at 10:17, Jesús Para Fernández <j.para.fernan...@hotmail.com> wrote:




-- 
Regards,
Carlos Ortega
www.qualityexcellence.es



Re: [R-es] Random Forest for classification

2016-04-12 Thread Jesús Para Fernández

Thanks for the quick reply, but after reading the answer people gave there, I still don't quite understand the explanation.

This is what they answer:
"Each point on the partial dependence plot is the average vote 
percentage in favor of the "Yes trees" class across all observations, 
given a fixed level of TRI.

It's not a probability of correct classification. It has absolutely 
nothing to do with accuracy, true negatives, and true positives.

When you see the phrase

  Values greater than TRI 30 begin to have a positive influence for 
classification in your model

is a puffed-up way of saying

  Values greater than TRI 30 begin to predict "Yes trees" more strongly than 
values lower than TRI 30"

In other words, the Y axis is the total votes for one class against the other, 
but since it is coded as -1 and +1, how do I know which class is OK and which is NOK?

Thanks,
Jesús
Date: Tue, 12 Apr 2016 10:04:15 +0200
Subject: Re: [R-es] Random Forest for classification
From: c...@qualityexcellence.es
To: j.para.fernan...@hotmail.com
CC: r-help-es@r-project.org


Re: [R-es] Package for Feature Selection in Time Series

2016-04-12 Thread Carlos Ortega
Hi,

There is no single solution to what you are asking. Even the algorithms that
help you with selection are just that, aids; in the end you are the one who
has to make the decisions. In fact, what you want to do is almost a field of
study in its own right within "Machine Learning". Search for the concept of
"Feature Engineering".

That said, to check basic things such as collinearity among your predictors,
or little or no variability in some of them (issues that hurt models):

   - The "caret" package has functions that can help:
   "findCorrelation()", "nearZeroVar()".
   - Another related package, "fscaret" (Automated Feature Selection from
   'caret'), will help you determine the "importance" of your variables
   using several models combined. For an example of how to apply it,
   see:
   http://amunategui.github.io/fscaret-Walkthrough/index.html
   - The "Boruta" package can also help here (it uses random forests
   for this variable selection).
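As a rough illustration of the two checks the caret helpers target, the sketch below drops constant columns. It then greedily drops the later column of every pair whose absolute correlation exceeds a cutoff. This is a simplified base-R sketch, not caret's algorithms: `nearZeroVar()` uses frequency-ratio and percent-unique thresholds, and `findCorrelation()` removes the variable with the larger mean absolute correlation. The helper name `simple_filter()`, the data and the 0.9 cutoff are made up. With caret installed, `caret::nearZeroVar(X)` and `caret::findCorrelation(cor(X), cutoff = 0.9)` return the indices of columns to drop.

```r
# Simplified base-R filter (NOT caret's implementation):
# 1) drop columns with zero variance, 2) greedily drop the later column
#    of any pair whose absolute correlation exceeds `cutoff`.
simple_filter <- function(X, cutoff = 0.9) {
  X <- X[, apply(X, 2, var) > 0, drop = FALSE]
  r <- abs(cor(X))
  keep <- rep(TRUE, ncol(X))
  for (j in seq_len(ncol(X))) {
    if (!keep[j]) next
    # compare a kept column with every column that comes after it
    later <- seq_len(ncol(X)) > j
    keep[later & r[j, ] > cutoff] <- FALSE
  }
  colnames(X)[keep]
}

set.seed(1)
a <- rnorm(100)
X <- cbind(a = a,
           a_copy = a + rnorm(100, sd = 0.01),  # almost collinear with a
           b = rnorm(100),
           const = 5)                           # no variability
simple_filter(X)
# "a" "b"
```

Because the greedy pass keeps whichever column comes first, the result depends on column order.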

Beyond these approaches there are more sophisticated ones, but try these to
start with.
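For the ARMAX part of the original question, one simple option that needs only base R is to fit `stats::arima()` with each subset of candidate regressors passed through `xreg`, then keep the subset with the lowest AIC. This is an illustration, not a recommendation from the thread. It is an exhaustive search, so it is only practical with a handful of candidates. The ARMA order is fixed here for simplicity. The helper `best_armax_subset()` is hypothetical, and the data are simulated so that only `x1` matters.

```r
# Hypothetical helper: fit an ARMAX model for every non-empty subset of
# candidate regressors and return the subset with the lowest AIC.
# 2^k - 1 fits for k candidates, so only feasible for a few variables.
best_armax_subset <- function(y, X, order = c(1, 0, 0)) {
  vars <- colnames(X)
  subsets <- unlist(lapply(seq_along(vars),
                           function(m) combn(vars, m, simplify = FALSE)),
                    recursive = FALSE)
  aics <- vapply(subsets, function(s)
    AIC(arima(y, order = order, xreg = X[, s, drop = FALSE])), numeric(1))
  subsets[[which.min(aics)]]
}

# Simulated data: y depends on x1 plus AR(1) noise; x2 and x3 are irrelevant
set.seed(42)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 2 * X[, "x1"] + arima.sim(list(ar = 0.5), n = n)
best_armax_subset(y, X)
```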

Regards,
Carlos Ortega
www.qualityexcellence.es

On 12 April 2016 at 3:12, Elkin Tabares wrote:

> Good evening, R community,
>
> I would like to ask whether R has a package similar to bestglm,
> lasso and leaps, but for time series. The idea is to estimate an
> ARMAX model and a neural network and to select the best
> explanatory variables. Many thanks for your help.
>
> Regards,
>
> Sincerely,
>
> --
> Elkin Tabares Orozco
> Economist
> Universidad de Antioquia
> Mobile: 3017226361



-- 
Regards,
Carlos Ortega
www.qualityexcellence.es



Re: [R-es] Random Forest for classification

2016-04-12 Thread Carlos Ortega
Hi,

Here is an explanation:

http://stats.stackexchange.com/questions/121383/interpreting-y-axis-of-a-partial-dependence-plots

Regards,
Carlos Ortega
www.qualityexcellence.es

On 12 April 2016 at 7:13, Jesús Para Fernández <j.para.fernan...@hotmail.com> wrote:

> Hello,
>
> When I fit a random forest for classification and draw the
> partialPlot chart, with OK/NOK as my response, the X axis shows the
> value of the variable, but the Y axis shows values between -1 and 1.
> What do they mean?
>
> Here is an example:
>
> https://www.dropbox.com/s/4b92lqxi3592r0d/Captura.JPG?dl=0
>
>
> Thanks!!!



-- 
Regards,
Carlos Ortega
www.qualityexcellence.es
