[R-es] Connecting R and LimeSurvey

2020-08-25 Thread David Contreras
Kind regards to all,

I hope you are all doing well.

I am trying to automate the cleaning of a database, so I need it to connect
automatically to the data generated by a survey in LimeSurvey. To do this, I
followed a couple of simple code snippets that I found on LimeSurvey and
GitHub:

# first install limer (check version: must be recent)
if (!require("devtools")) {
  install.packages("devtools")
  library("devtools")
}
install_github("cloudyr/limer")
library(limer)

# change the next options (website, user, password)
options(lime_api = 'https://www.XXX.nl/index.php/admin/remotecontrol')
options(lime_username = 'user')
options(lime_password = 'password')

# first get a session access key
get_session_key()


But when I run get_session_key(), I get this error:

Error: Argument 'txt' must be a JSON string, URL or file.

After reading about several similar cases, I still don't really understand how
to fix this, so if anyone has run into the same problem and solved it, I would
appreciate some guidance.

Thank you for any help you can give me with this.

Regards,

*David Contreras*

Statistician

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] Matching backslash in a table's column using R language

2020-08-25 Thread Peter Bishop
To be honest, I've only used the hex values because that was the format in
which the patterns were passed to me.

However, from your explanation, I now understand what's going on. I didn't
appreciate that the characters were passed through another layer, and that the
hex code was not being seen as a raw backslash.

Many thanks for the explanation.


Re: [R] Matching backslash in a table's column using R language

2020-08-25 Thread Jeff Newmiller
In my opinion, using hexadecimal ASCII codes is much more obscure than simply
using the escape character properly; that is, you are doing no one any favors
by using them. But to attain clarity here, you need to envision what the
various software layers are doing.

In your case, SQL Server may not use an escape character, but it is passing
your R code to the R interpreter, which does use the escape character to
convert source code into strings in memory. Those strings are then passed to
the regex parser, which is the final layer and also handles the same escape
character.

What may be confusing you is the distinction between what is in memory, which
the regex parser sees:

["',?\\`]

and what the R string literal looks like that you should type to get this
string into memory:

"[\"',?\\\\`]"

When you pass the latter literal to the cat() function, it will show you the
former version. When you have the string stored in memory, you can use the
print() function to see what you have to type as a literal to get the
in-memory version. I use this trick (cat) to help me zero in on what is
actually getting passed to the regex engine when I have difficulty envisioning
what is going on.

The regex engine needs that doubled backslash to recognize that _it_ should not
give special treatment to the \ there, and should look for it in the input data.
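A minimal sketch of the cat()/print() trick described above, using the in-memory class from this message and `perl = TRUE` as in the original script; `x` is a hypothetical stand-in for a value read from the SQL column:

```r
# R string literal as typed: every backslash meant for the regex engine
# must itself be escaped once more for the R parser.
pattern <- "[\"',?\\\\`]"

cat(pattern, "\n")  # in memory (what the regex engine sees): ["',?\\`]
print(pattern)      # literal form to type: [1] "[\"',?\\\\`]"

# A value holding ONE backslash in memory, as it would arrive from SQL
x <- "A\\BCDEFG"
nchar(x)                        # 8: the backslash is a single character
grepl(pattern, x, perl = TRUE)  # TRUE: this row would now be excluded
```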


Re: [R] Matching backslash in a table's column using R language

2020-08-25 Thread Peter Bishop
The feed is coming from a SQL table and this is using the embedded support for 
R which comes with SQL 2016. The source is therefore a SELECT statement.


As an aside, I found a workaround by changing the pattern from:


"[\x22\x27\x2c\x3f\x5c\x60]"


to:


"[\x22\x27\x2c\x3f\x5c\x5c\x60]"


This seems to be escaping the backslash in the R script rather than in the
data, which confuses me.
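A sketch comparing the two hex patterns (with `perl = TRUE`, as in the original script) may show why the workaround helps. The R parser turns `\x5c` into a single backslash before the regex engine runs, so in the first pattern that backslash escapes the following backtick instead of standing for itself. The names `p_single`, `p_double` and `x` are illustrative only:

```r
x <- "A\\BCDEFG"  # one literal backslash, as stored in the SQL column

# Original: in memory this is ["',?\`]; the lone backslash escapes the
# backtick, so the class does not contain a backslash at all.
p_single <- "[\x22\x27\x2c\x3f\x5c\x60]"
# Workaround: in memory this is ["',?\\`]; the doubled backslash is a
# literal backslash for the regex engine, followed by a plain backtick.
p_double <- "[\x22\x27\x2c\x3f\x5c\x5c\x60]"

cat(p_single, "\n")
cat(p_double, "\n")
grepl(p_single, x, perl = TRUE)  # FALSE: the row slips through the filter
grepl(p_double, x, perl = TRUE)  # TRUE: the row is filtered out
```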


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matching backslash in a table's column using R language

2020-08-25 Thread Bert Gunter
1. I am far from an expert on such matters
2. It is unclear to me what your input is -- I assume a file.

The problem, as you indicate, is that R's parser sees "\B" as an incorrect
escape character, so, for example:
> cat("\B")
Error: '\B' is an unrecognized escape in character string starting ""\B"

In any case, I think you should look at ?scan. Here is an example where I
scan from the keyboard first and then remove the "\". You may have to scan
from a file to do this.

> z <-scan(file = "", what = "character")
1: A\BCDEFG
2: #CR terminates input
Read 1 item

> cat(z)
A\BCDEFG

> nchar(z)
[1] 8  ## scan read in the "\" as a single character from the console.

> sub("\\\\","",z)  ## Yes, 4 backslashes
[1] "ABCDEFG"

There may be better ways to do this, but as I said, I'm no expert.
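As a possible alternative to counting backslashes, `fixed = TRUE` makes `grepl()` and `sub()` treat the pattern as a plain string, so only the R parser's level of escaping remains. A small sketch, with `z` typed as a literal instead of read with scan():

```r
z <- "A\\BCDEFG"   # typed literal; in memory it is A\BCDEFG (8 characters)

# Regex route: R literal "\\\\" -> in-memory \\ -> regex for one backslash
sub("\\\\", "", z)                # "ABCDEFG"

# Fixed-string route: R literal "\\" -> in-memory \ -> matched literally
sub("\\", "", z, fixed = TRUE)    # "ABCDEFG"
grepl("\\", z, fixed = TRUE)      # TRUE
```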

BTW, in posting here, please post in *plain text,* as the server can mangle
html.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] How to obtain individual log-likelihood value from glm?

2020-08-25 Thread peter dalgaard
If you don't worry too much about an additive constant, then half the negative 
squared deviance residuals should do. (Not quite sure how weights factor in. 
Looks like they are accounted for.)
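A minimal check of this shortcut on the Poisson example from the question. All prior weights are 1 here, so the weights caveat is not exercised. For each observation, the additive constant is the saturated-model term `dpois(y, y, log = TRUE)`, which depends only on the data:

```r
# Example data and model from the original question
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

# Shortcut: minus one half of the squared deviance residuals
ll_dev <- unname(-0.5 * residuals(glm.D93, type = "deviance")^2)

# Exact per-observation Poisson log-likelihoods, for comparison
ll_exact <- dpois(counts, lambda = unname(fitted(glm.D93)), log = TRUE)

# The additive constant: the saturated-model log-likelihood of each
# observation, which depends on the data but not on the fitted parameters
ll_sat <- dpois(counts, lambda = counts, log = TRUE)

all.equal(ll_dev, ll_exact - ll_sat)                   # TRUE
all.equal(sum(ll_exact), as.numeric(logLik(glm.D93)))  # TRUE
```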

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] How to obtain individual log-likelihood value from glm?

2020-08-25 Thread Bert Gunter
If you look at

stats:::logLik.glm  # three colons because it's unexported, as is true of most methods

it should be obvious.
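Reading `stats:::logLik.glm` shows that it rebuilds the log-likelihood from the model's AIC, which comes from `family(object)$aic()`. The sketch below reuses that function one observation at a time. It assumes that the family's `aic()` is a plain sum over observations. That holds for `poisson()`, but not for `gaussian()`, whose `aic()` depends on the total deviance. `ll_one` is a hypothetical helper name:

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

fam <- family(glm.D93)
y   <- glm.D93$y
mu  <- fitted(glm.D93)
wt  <- glm.D93$prior.weights

# Hypothetical helper: -aic/2 for observation i alone. poisson()$aic
# ignores its n and dev arguments, so 1 and 0 are placeholders here.
ll_one <- function(i) -fam$aic(y[i], 1, mu[i], wt[i], 0) / 2
ll_i <- vapply(seq_along(y), ll_one, numeric(1))

ll_i                                                # 9 per-observation values
all.equal(sum(ll_i), as.numeric(logLik(glm.D93)))  # TRUE for this family
```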

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



[R] Matching backslash in a table's column using R language

2020-08-25 Thread Peter Bishop
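In SQL, I'm using R as a way to filter data based on:
- 1 to 20 characters in the range 0x20 (space) to 0x7E (~)
- excluding double quote (0x22), single quote (0x27), comma (0x2C), question mark (0x3F), backslash (0x5C) and backtick (0x60)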


Given a SQL column containing the data:

code

A\BCDEFG

and the T-SQL script:

EXEC [sys].[sp_execute_external_script]
@language=N'R',
@script=N'
pattern1 = "^[\x20-\x7e]{1,20}$"
pattern2 = "[\x22\x27\x2c\x3f\x5c\x60]"

outData <- subset(inData, grepl(pattern1, code, perl=TRUE) & 
!grepl(pattern2, code, perl=TRUE))',
@input_data_1 = N'SELECT [code] FROM [dbo].[products]',
@input_data_1_name = N'inData',
@output_data_1_name = N'outData'
WITH
RESULT SETS (AS OBJECT [dbo].[products]);
GO

Why does the row detailed above get returned? I know that backslash is a 
special character, but not in the SQL table. Indeed, the T-SQL code:

SELECT ASCII(SUBSTRING([value], 2, 1)) FROM [table]

returns 92 (the ASCII code for a backslash), which shows that this is being 
recognised as a backslash character and not as an escape indicator for the 
following "B".

Can anyone advise how I can filter out the backslash in the way that the 
other identified characters are being successfully filtered? As the data is 
being retrieved from a table, I can't ask the data provider to use "\\" instead 
of "\" as that will be invalid for other uses.
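A sketch of the R part of the script, run on a hypothetical stand-in for `inData` (the first row is the one from the question; the others are made up to exercise each rule). The hex escapes are kept, with the backslash doubled as in the later workaround, so that no single quote appears in the R code. A literal `'` inside the T-SQL `N'...'` string would otherwise have to be doubled:

```r
# Hypothetical stand-in for inData
inData <- data.frame(
  code = c("A\\BCDEFG", "ABCDEFG", "AB,CD", "ABC`D", strrep("A", 21)),
  stringsAsFactors = FALSE
)

# 1 to 20 printable ASCII characters (space to ~)
pattern1 <- "^[\x20-\x7e]{1,20}$"
# Excluded characters; \x5c\x5c puts an escaped backslash in the class
pattern2 <- "[\x22\x27\x2c\x3f\x5c\x5c\x60]"

outData <- subset(inData, grepl(pattern1, code, perl = TRUE) &
                    !grepl(pattern2, code, perl = TRUE))
outData$code  # "ABCDEFG"
```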

Thanks.


[R] How to obtain individual log-likelihood value from glm?

2020-08-25 Thread John Smith
Dear R-help,

The function logLik can be used to obtain the maximum log-likelihood value
from a glm object. This is an aggregated value, a summation of individual
log-likelihood values. How do I obtain individual values? In the following
example, I would expect 9 numbers since the response has length 9. I could
write a function to compute the values, but there are many families available
in glm, and I am trying not to reinvent the wheel. Thanks!

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
data.frame(treatment, outcome, counts) # showing data
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
(ll <- logLik(glm.D93))


Re: [R] Classification Tree Prediction Error

2020-08-25 Thread John Smith
As Bert correctly advised, this is not an R programming question. There is
some misunderstanding of how training and test data work together in
prediction. Suppose your test data contain only one class. Then betting on the
majority class every time, again using data from the test set, gives a
misclassification rate of exactly 0! Of course, no classification algorithm can
beat that prediction, because it already uses the truth in the test data. In
conclusion, the tree model you provided has an accuracy of 0.837, which is very
close to 0.85. I would not complain.
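A small sketch of the comparison being discussed: a classifier's test-set accuracy next to the no-information baseline of always predicting the most frequent true class. `accuracy_vs_baseline` is a hypothetical helper, and the toy labels are made up for illustration:

```r
# Hypothetical helper. Assumes pred and truth are factors with the same
# levels in the same order, so the diagonal of the table counts hits.
accuracy_vs_baseline <- function(pred, truth) {
  stopifnot(identical(levels(pred), levels(truth)))
  tab <- table(pred, truth)
  c(accuracy = sum(diag(tab)) / sum(tab),
    baseline = max(table(truth)) / length(truth))
}

# Toy example (made-up labels, for illustration only)
truth <- factor(c("good", "good", "good", "poor"), levels = c("good", "poor"))
pred  <- factor(c("good", "good", "poor", "poor"), levels = c("good", "poor"))
accuracy_vs_baseline(pred, truth)  # accuracy 0.75, baseline 0.75
```

With the objects from the original script, and assuming `h2` is a factor, the call would be `accuracy_vs_baseline(tree.h2.pred, h2.test)`. Using `max()` instead of a fixed index such as `[2]` picks the majority class whichever level it is.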

Re: [R] Classification Tree Prediction Error

2020-08-25 Thread Xu Jun
Thank you for your comment! This tree function is from the tree package.
Although it might be a purely statistical question, it could be related to
how the tree function is used. I will explore the site that you suggested.
But if there is anyone who can figure it out off the top of their head, I'd
very much appreciate it.

Jun

On Mon, Aug 24, 2020 at 1:01 PM Bert Gunter  wrote:

> Purely statistical questions -- as opposed to R programming queries -- are
> generally off topic here.
> Here is where they are on topic:  https://stats.stackexchange.com/
>
> Suggestion: when you post, do include the name of the package you get
> tree() from, as there might be more than one package with this function.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Aug 24, 2020 at 8:58 AM Xu Jun  wrote:
>
>> Dear all R experts,
>>
>> I have a question about using cross-validation to assess results estimated
>> from a classification tree model. I annotated what each line does in the R
>> code chunk below. Basically, I split the data, named usedta, into 70% vs.
>> 30%, with the training set having 70% and the test set 30% of the original
>> cases. After splitting the data, I first run a classification tree off the
>> training set, and then use the results for cross-validation using the test
>> set. It turns out that if I don't have any predictors and make predictions
>> by simply betting on the majority class of the zero-one coding of the
>> binary response variable, I can do better than what the results from the
>> classification tree would deliver in the test set. What would this imply
>> and what would cause this problem? Does it mean that a classification tree
>> is not an appropriate method for my data, or is it because I have too few
>> variables? Thanks a lot!
>>
>> Jun Xu, PhD
>> Professor
>> Department of Sociology
>> Ball State University
>> Muncie, IN 47306
>> USA
>>
>> Using the estimates, I get the following prediction rate (correct
>> prediction) using the test set. Or we can say the misclassification error
>> rate is 1-0.837 = 0.163
>>
>> > (tab[1,1] + tab[2,2]) / sum(tab)
>> [1] 0.837
>>
>>
>> Without any predictors, I can get the following rate by betting on the
>> majority class every time, again using data from the test set. In this
>> case, the misclassification error rate is 1-0.85 = 0.15
>>
>> > table(h2.test)
>> h2.test
>> 1poorHlth 0goodHlth
>>       101       575
>> > 571/(571+101)
>> [1] 0.85
>>
>>
>>
>> R Code Chunk
>>
>> # tree() and tree.control() come from the 'tree' package
>> library(tree)
>> # set the seed for random number generator for replication
>> set.seed(47306)
>> # have the 7/3 split with 70% of the cases allotted to the training set
>> # AND create the training set identifier
>> class.train = sample(1:nrow(usedta), nrow(usedta)*0.7)
>> # create the test set indicator (negative indices exclude the training rows)
>> class.test = (-class.train)
>> # create a vector for the binary response variable from the test set
>> # for future cross-tabulation.
>> h2.test <- usedta$h2[class.test]
>> # count the train set cases
>> Ntrain = length(usedta$h2[class.train])
>> # run the classification tree model using the training set
>> # h2 is the binary response (a factor) and other variables are predictors
>> tree.h2 <- tree(h2 ~ age + educ + female + white + married + happy,
>>                 data = usedta, subset = class.train,
>>                 control = tree.control(nobs=Ntrain, mindev=0.003))
>> # summary results
>> summary(tree.h2)
>> # make predictions of h2 using the test set
>> tree.h2.pred <- predict(tree.h2, usedta[class.test,], type="class")
>> # cross tab the predictions using the test set
>> tab = table(tree.h2.pred, h2.test)
>> tab
>> # calculate the ratio for the correctly predicted in the test set
>> (tab[1,1] + tab[2,2]) / sum(tab)
>> # calculate the ratio for the correctly predicted using the naive approach
>> # by betting on the majority category (whichever level is most frequent).
>> max(table(h2.test))/length(h2.test)
>>
>
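A majority-class ("no information") baseline is the usual yardstick for this situation: the test-set table above shows 575 good-health and 101 poor-health cases, so any classifier has to beat 575/676 (about 0.851) in raw accuracy before it adds anything. Below is a minimal sketch in R, assuming only those quoted counts. The helpers `majority_baseline()` and `class_recall()` are hypothetical names for illustration, not functions from any package. The sketch also reports recall for each class, because overall accuracy can hide a model that never predicts the minority class.

```r
# Hypothetical helper: accuracy obtained by always predicting the most
# frequent class (the "no information rate").
majority_baseline <- function(obs) {
  counts <- table(obs)
  max(counts) / sum(counts)
}

# Hypothetical helper: share of each observed class that was predicted
# correctly (recall / sensitivity for each class).
class_recall <- function(pred, obs) {
  pred <- as.character(pred)
  obs <- as.character(obs)
  sapply(split(pred == obs, obs), mean)
}

# Test-set class counts as printed in the question above.
h2_test <- factor(rep(c("1poorHlth", "0goodHlth"), times = c(101, 575)))
majority_baseline(h2_test)

# The majority-class rule has recall 1 for the majority class and 0 for
# the minority class, which is why accuracy alone can be misleading.
always_good <- factor(rep("0goodHlth", length(h2_test)))
class_recall(always_good, h2_test)
```

With real predictions, a call such as `class_recall(tree.h2.pred, h2.test)` would show whether the tree gives up some majority-class accuracy in exchange for detecting poor-health cases. Overall accuracy does not reveal that trade-off.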


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] which.min, equal values and fractions

2020-08-25 Thread Ivan Krylov
On Tue, 25 Aug 2020 14:26:43 +0200
Mike  wrote:

> But which.min only does so if the values don't contain fractions.
> And I get
> 
> > identical (data3ba, c(2.9,2.9))  
> [1] FALSE
> 
> Why is which.min not always returning 1 but which.max does?

It's the unfortunate consequence of the way floating point numbers
work:

data3ba - 2.9
# [1]  0.00e+00 -4.440892e-16

See R FAQ 7.31:
https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
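Here is a short sketch of how this plays out with Mike's `data3ba` values, along with one way to get a tolerance-aware "first minimum". The helper `which_min_tol()` is hypothetical, not a base R function. Its default tolerance, `sqrt(.Machine$double.eps)`, is assumed here because it matches the default used by `all.equal()`.

```r
x <- c(3.1, 4.1) - c(0.2, 1.2)   # Mike's data3ba

# Printing with 17 significant digits reveals that the two values differ.
print(x, digits = 17)
x[1] == x[2]                     # exact comparison
isTRUE(all.equal(x[1], x[2]))    # comparison within a tolerance

# which.min() compares exactly, so the slightly smaller second value wins;
# which.max() returns 1 for the same reason (the first value is larger).
which.min(x)
which.max(x)

# Hypothetical helper: index of the first element within a relative
# tolerance of the minimum.
which_min_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  m <- min(x)
  which(x - m <= tol * max(1, abs(m)))[1]
}
which_min_tol(x)
```

A relative tolerance is used so that the helper behaves sensibly for both large and small magnitudes. Rounding first, as in `which.min(round(x, 10))`, is a simpler alternative when a fixed number of decimals is meaningful for the data.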

-- 
Best regards,
Ivan



[R] which.min, equal values and fractions

2020-08-25 Thread Mike
Hi,

According to ?which.min it returns the "index of the (first)
minimum". So I would expect it to also return the first minimum when
providing two identical extrema. But my minimal reproducible example
doesn't do so:

data1a <- c(3.2,4.2)
data1b <- c(3.1,4.1)

data2a <- c(0.2,1.2)
data2b <- c(4.2,5.2)

data3aa <- data1a - data2a
data3ba <- data1b - data2a
data3ab <- data1a - data2b
data3bb <- data1b - data2b

print (data3aa)
print (which.min (data3aa))
print (which.max (data3aa))

print (data3ba)
print (which.min (data3ba))
print (which.max (data3ba))

print (data3ab)
print (which.min (data3ab))
print (which.max (data3ab))

print (data3bb)
print (which.min (data3bb))
print (which.max (data3bb))

results in:

[1] 3 3
[1] 1
[1] 1
[1] 2.9 2.9
[1] 2
[1] 1
[1] -1 -1
[1] 1
[1] 1
[1] -1.1 -1.1
[1] 2
[1] 1

First of all which.max works as expected by always returning 1.

But which.min only does so if the values don't contain fractions.
And I get

> identical (data3ba, c(2.9,2.9))
[1] FALSE

Why is which.min not always returning 1 but which.max does?

Mike



Re: [R] ggplot 3-color gradient scales

2020-08-25 Thread Rui Barradas

Hello,

If you want a predetermined number of colors, discretise the data and 
use scale_color_manual. In the code below I first compute another vector 
z, with a different range, 0 to 2. (In my first mail it was 0 to 1.)


g <- function(x, a = 0, b = 1){
  (b - a)*(x - min(x))/(max(x) - min(x)) + a
}

library(ggplot2)

df1 <- iris[3:5]
names(df1)[1:2] <- c("x", "y")
df1$z <- ave(df1$y, df1$Species, FUN = function(x) g(x, a = 0, b = 2))


Now comes the step that solves the problem: binning the vector with cut()
(findInterval() is another option). After that, the two plot instructions
below are equivalent.


df1$z <- cut(df1$z,
             breaks = c(-Inf, 0.8, 1.2, Inf),
             labels = c("Small", "Medium", "Large"))


# colours follow the question: low values blue, middle green, high red
ggplot(df1) +
  geom_point( aes(x, y, color = z) ) +
  scale_color_manual(values = c("blue", "green", "red"))

ggplot(df1) +
  geom_point( aes(x, y, color = z) ) +
  scale_color_manual(breaks = c("Small", "Medium", "Large"),
                     values = c("Small" = "blue", "Medium" = "green",
                                "Large" = "red"))
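The choice between `cut()` and `findInterval()` only matters at the boundaries. By default, `cut()` builds right-closed intervals, so a value of exactly 0.8 falls into the lowest bin. April's request, however, uses strict inequalities (<0.8 blue, >1.2 red), which would put 0.8 and 1.2 in the middle bin. Below is a minimal sketch comparing the options on a made-up vector `z`. The values are illustrative only and do not come from the thread.

```r
z <- c(0.5, 0.8, 1.0, 1.2, 1.5)   # illustrative values only
labs <- c("Small", "Medium", "Large")

# cut(): intervals (-Inf, 0.8], (0.8, 1.2], (1.2, Inf]
bin_cut <- cut(z, breaks = c(-Inf, 0.8, 1.2, Inf), labels = labs)

# findInterval() with left.open = TRUE gives the same right-closed bins.
bin_fi <- factor(labs[findInterval(z, c(0.8, 1.2), left.open = TRUE) + 1],
                 levels = labs)

# Strict inequalities as in the question: 0.8 and 1.2 both count as "Medium".
bin_strict <- factor(ifelse(z < 0.8, "Small",
                            ifelse(z > 1.2, "Large", "Medium")),
                     levels = labs)

# Named colour vector following the question: <0.8 blue, middle green, >1.2 red.
cols <- c(Small = "blue", Medium = "green", Large = "red")
data.frame(z, bin_cut, bin_strict, colour = unname(cols[as.character(bin_strict)]))
```

Any of these factors can be mapped to `color` and combined with `scale_color_manual(values = cols)`. When `values` is a named vector, ggplot2 matches colours to factor levels by name, so the order of the levels does not matter.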



Hope this helps,

Rui Barradas


On 25/08/20 at 10:38, April Ettington wrote:
Is there a way to set it to 3 color categories instead of a gradient?  
Like if the color is based on the numbers in a dataframe column, can I 
make it so anything >1.2 is red, <0.8 is blue, and anything in the 
middle is green?



On Mon, Aug 24, 2020 at 6:28 PM April Ettington <apriletting...@gmail.com> wrote:


Thank you so much!


On Mon, Aug 24, 2020 at 5:33 PM Rui Barradas <ruipbarra...@sapo.pt> wrote:

Hello,

Note that the midpoint argument can make a big difference. In the code
below, try commenting out the line that changes it from the default
(midpoint = 0) to 0.5.


f <- function(x){
    (x - min(x))/(max(x) - min(x))
}

library(ggplot2)

df1 <- iris[3:5]
names(df1)[1:2] <- c("x", "y")
df1$z <- ave(df1$y, df1$Species, FUN = f)

ggplot(df1) +
    geom_point( aes(x, y, color = z) ) +
    scale_color_gradient2(low = "red",
                          mid = "yellow",
                          high = "blue",
                          midpoint = 0.5
                          )
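April's original question involves a ratio whose neutral value is 1. For that case, the same function with `midpoint = 1` pins the middle colour at 1, even when 1 is not halfway between the extremes. Below is a minimal sketch. The ratio `r` (each flower's sepal width divided by the overall mean sepal width) is a made-up variable, chosen only because it is centred near 1.

```r
library(ggplot2)

df2 <- iris[c("Petal.Length", "Petal.Width")]
names(df2) <- c("x", "y")
# Hypothetical ratio centred near 1, for illustration only.
df2$r <- iris$Sepal.Width / mean(iris$Sepal.Width)

# Values below 1 shade towards blue, values above 1 towards red,
# and values at 1 are drawn in the mid colour.
p <- ggplot(df2) +
  geom_point(aes(x, y, color = r)) +
  scale_color_gradient2(low = "blue", mid = "green", high = "red",
                        midpoint = 1)
p
```

Each side of the midpoint is rescaled separately. As a result, a long tail on one side does not pull the mid colour away from 1.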

Hope this helps,

Rui Barradas


On 24/08/20 at 04:43, Jeff Newmiller wrote:
 > Check out scale_colour_gradient2()
 >
 > On August 23, 2020 8:12:06 PM PDT, April Ettington
 > <apriletting...@gmail.com> wrote:
 >> Currently I am using these settings in ggplot to make a gradient from
 >> red to blue.
 >>
 >> geom_point( aes(x, y, color=z) ) +
 >> scale_colour_gradient(low = "red",high = "blue") +
 >>
 >> z is a ratio, and currently I am able to identify which have high and
 >> low values, but I'd really like to be able to distinguish which are >1,
 >> <1, or close to 1 by color.  It would be great if I could set a middle
 >> color in this gradient (e.g. green) that is set to the value of 1, even
 >> if that is not the exact midpoint between my highest and lowest values.
 >> Is there a way to do this in R?
 >>
 >> Thank you,
 >> April
 >





Re: [R] ggplot 3-color gradient scales

2020-08-25 Thread PIKAL Petr
Hi

Maybe scale_colour_manual?

Cheers
Petr
> -----Original Message-----
> From: R-help  On Behalf Of April Ettington
> Sent: Tuesday, August 25, 2020 11:39 AM
> To: Rui Barradas 
> Cc: r-help@r-project.org
> Subject: Re: [R] ggplot 3-color gradient scales
> 
> Is there a way to set it to 3 color categories instead of a gradient?  Like
> if the color is based on the numbers in a dataframe column, can I make it so
> anything >1.2 is red, <0.8 is blue, and anything in the middle is green?
> 


Re: [R] ggplot 3-color gradient scales

2020-08-25 Thread April Ettington
Is there a way to set it to 3 color categories instead of a gradient?  Like
if the color is based on the numbers in a dataframe column, can I make it
so anything >1.2 is red, <0.8 is blue, and anything in the middle is green?






Re: [R] incompatible dimensions error

2020-08-25 Thread Jeff King
Hi,

It seems that the package "mvpart" is quite outdated and is not available
for the current R release. Since PCA is a very common need, I'd suggest
finding a replacement for it, so that either the error goes away or it is
easier for us to reproduce it.

Best,
Jiefei

On Tue, Aug 25, 2020 at 1:06 PM Andrew Halford 
wrote:

> Hi Listers
>
> I am using mvpart to run a multivariate regression tree with pca = TRUE to
> get a PCA plot with sites coloured according to the tree output.
>
> Unfortunately it won't produce the PCA, and instead gives the error message:
>
> Error in cor(xall, xx[order(tree$where), ]) : incompatible dimensions.
>
> However, when I run a PCA on the data using the rda command I have no
> problems producing a PCA.
>
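> The data are attached as a text file.
>
> My code is:
> library(vegan)   # decostand()
> library(mvpart)  # mvpart(); the package is no longer on CRAN
> fish05.hel <- decostand(fish05, "hellinger")
> fish05.mrt <- mvpart(data.matrix(fish05.hel) ~ ., env, margin = 0.08, cp = 0,
>                      rsq = TRUE, xv = "pick", xval = nrow(fish05),
>                      xvmult = 100, which = 4, pca = TRUE)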
>
> The tree is produced without problems, but it won't produce a PCA.
>
> I am just keen to understand what this error means, as I don't see anything
> unusual about the dataset used, although the data are rather messy.
>
> Andy
>
>
>
>
>
> --
> Andrew Halford Ph.D
> Senior Coastal Fisheries Scientist
> Pacific Community | Communauté du Pacifique CPS – B.P. D5 | 98848 Noumea,
> New Caledonia | Nouméa, Nouvelle-Calédonie
>
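The error message comes from `cor()`. When `cor()` is given two matrices, it correlates their columns, and this requires both matrices to have the same number of rows. So somewhere inside mvpart's PCA step, `xall` and the reordered `xx` end up with different row counts. Below is a minimal reproduction using toy matrices (not Andrew's data), followed by the kind of row-count check one might run on the inputs. The helper `check_rows()` is hypothetical. The idea that rows are dropped, for example because of missing values, is a guess that this thread does not confirm.

```r
set.seed(1)
a <- matrix(rnorm(20), nrow = 10)   # 10 rows
b <- matrix(rnorm(18), nrow = 9)    # 9 rows

# cor(a, b) correlates columns of a with columns of b, so nrow must match.
msg <- tryCatch(cor(a, b), error = conditionMessage)
msg

# Dropping the extra row makes the call valid: a 2 x 2 correlation matrix.
cm <- cor(a[1:9, ], b)
dim(cm)

# Hypothetical pre-flight check: do the response and explanatory tables
# have the same number of rows, and how many rows are complete in each?
check_rows <- function(resp, env) {
  c(resp_rows = nrow(resp), env_rows = nrow(env),
    complete_resp = sum(complete.cases(resp)),
    complete_env = sum(complete.cases(env)))
}
chk <- check_rows(as.data.frame(a), data.frame(v = c(1:9, NA)))
chk
```

On the objects from the question, the analogous call would be `check_rows(fish05.hel, env)`. A mismatch in the complete-row counts would be one place to start looking.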

