Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Boris Steipe
Inline



> On 2019-05-19, at 18:11, Michael Boulineau  
> wrote:
> 
> For context:
> 
>> In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and 
>> \\2. The expression says:
>> Substitute ALL of the match with the first captured expression, then " <", 
>> then the second captured expression, then "> ". The rest of the line is not 
>> substituted and appears as-is.
> 
> Back to me: I guess what's giving me trouble is where to draw the line
> in terms of the end or edge of the expression. Given the code, then,
> 
>> a <- readLines ("hangouts-conversation-6.txt", encoding = "UTF-8")
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
>> c <- gsub(b, "\\1<\\2> ", a)
> 
> to me, it would seem as though this is the first captured expression,
> that is, as though \\1 refers back to ^([0-9-]{10} [0-9:]{8} ), since
> there are parentheses around it, or since [0-9-]{10} [0-9:]{8} is
> enclosed in parentheses.

That's correct: parentheses in regular expressions delimit captured substrings.



> Then it would seem as though [*]{3} is the
> second expression, and (\\w+ \\w+) is the third.

Note that "[*]{3}" has no parentheses, is not captured and is not accounted for 
in the back-references.

\\1 and \\2 refer only to the captured substrings - everything else 
contributes to whether the regex matches at all, but is no longer considered 
after the match.
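
To make the numbering concrete, here is a minimal sketch on one made-up line (the sample string is my assumption, modelled on the lines you posted). regexec() together with regmatches() lists the whole match first, followed by one entry per pair of parentheses:

```r
# One made-up line in the format discussed in this thread
x <- "2016-03-20 19:29:37 *** Jane Doe started a video chat"
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

# regexec() reports the whole match, then each captured group in order
m <- regmatches(x, regexec(b, x))[[1]]

whole  <- m[1]  # everything the regex matched, including the "*** "
group1 <- m[2]  # what \\1 refers to: date, time and the trailing space
group2 <- m[3]  # what \\2 refers to: the two words of the name

print(m)
```

There are only two captured groups, because "[*]{3} " is matched but never enclosed in parentheses.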

> According to this
> (admittedly wrong) logic, it would seem as though the <> would go
> around the date--like

No:  it goes around \\2, which is (\\w+ \\w+)

> 
>> 2016-03-20 <19:29:37> *** Jane Doe started a video chat
> 
> The back references here recall Davis's code earlier:
> 
>> sub("^(.{10}) (.{8}) (<.+>) (.+$)", "\\1,\\2,\\3,\\4", chrvec)
> 
> There, commas were put around everything, and there you can see the
> edge of the expression very well. ^(.{10}) = first. (.{8}) = second.
> (<.+>) = third. (.+$) = fourth. So, by the same logic, it would seem
> as though in
> 
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> 
> that ^([0-9-]{10} [0-9:]{8} ) is first, that [*]{3} is second, and
> that  (\\w+ \\w+) is third.
> 
> But, if Boris is to be right, and he is, obviously, then it would have
> to be the case that this entire thing, namely, ^([0-9-]{10} [0-9:]{8}
> )[*]{3}, is the first expression,

Actually "[*]{3}" is not part of the first expression - it is discarded because 
it is not in parentheses.

> since only if that were true would
> the <> be able to go around the names, as in
> 
> [3] "2016-01-27 09:15:20  Hey "
> 
> Again, so 2016-01-27 09:15:20 would have to be an entire unit, an
> expression.

The word "expression" has a different technical meaning, but colloquially you 
are right.


> So I guess what I don't understand is how ^([0-9-]{10}
> [0-9:]{8} )[*]{3} can be an entire expression, although my hunch would
> be that it has something to do with the ^ or with the space after the
> } and before the (, as in
> 
>> {3} (\\w+
> 

No. Just the parentheses.


> Back to earlier:
> 
>> The rest of the line is not substituted and appears as-is.
> 
> Is that due to the space after the \\2? in
> 
>> "\\1<\\2> "

No, that is because the substitution in gsub() targets only the match of the 
regex - and the string to the end is not part of the regex.
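
A small sketch of this, again on a made-up line (my assumption, not taken from your file): replacing the match with an empty string shows exactly which part of the line the regex never touched.

```r
x <- "2016-03-20 19:29:37 *** Jane Doe started a video chat"
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

# Deleting the match leaves only the part of the line outside the regex
untouched <- sub(b, "", x)

# The real substitution rewrites the match and keeps that remainder as-is
out <- gsub(b, "\\1<\\2> ", x)

print(untouched)
print(out)
```

The trailing space in "\\1<\\2> " is simply part of the replacement text; it has no influence on where the match ends.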


Cheers,
Boris

> Notice space after > and before "
> 
> Michael
> 
> On Sun, May 19, 2019 at 2:31 PM Boris Steipe  wrote:
>> 
>> Inline ...
>> 
>>> On 2019-05-19, at 13:56, Michael Boulineau  
>>> wrote:
>>> 
>>>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
>>> 
>>> so the ^ signals that the regex BEGINS with a number (that could be
>>> any number, 0-9) that is only 10 characters long (then there's the
>>> dash in there, too, with the 0-9-, which I assume enabled the regex to
>>> grab the - that's between the numbers in the date)
>> 
>> That's right. Note that within a "character class" the hyphen can have two 
>> meanings: normally it defines a range of characters, but if it appears as 
>> the last character before "]" it is a literal hyphen.
>> 
>>> , followed by a
>>> single space, followed by a unit that could be any number, again, but
>>> that is only 8 characters long this time. For that one, it will
>>> include the colon, hence the 9:, although for that one ([0-9:]{8} ),
>> 
>> Right.
>> 
>> 
>>> I
>>> don't get why the space is on the inside in that one, after the {8},
>> 
>> The space needs to be preserved between the time and the name. I wrote
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"  # space in the first captured expression
>> c <- gsub(b, "\\1<\\2> ", a)
>> ... but I could have written
>> b <- "^([0-9-]{10} [0-9:]{8}) [*]{3} (\\w+ \\w+)"  # space outside the parentheses
>> c <- gsub(b, "\\1 <\\2> ", a)  # space in the substituted string
>> ... same result
>> 
>> 
>>> whereas the space is on the outside with the other one ^([0-9-]{10} ,
>>> directly after the {10}. Why is that?
>> 
>> In the second case, I capture without a space, because I don't want the 
>> space in the results, after the 

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Michael Boulineau
For context:

> In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and 
> \\2. The expression says:
> Substitute ALL of the match with the first captured expression, then " <", 
> then the second captured expression, then "> ". The rest of the line is not 
> substituted and appears as-is.

Back to me: I guess what's giving me trouble is where to draw the line
in terms of the end or edge of the expression. Given the code, then,

> a <- readLines ("hangouts-conversation-6.txt", encoding = "UTF-8")
> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> c <- gsub(b, "\\1<\\2> ", a)

to me, it would seem as though this is the first captured expression,
that is, as though \\1 refers back to ^([0-9-]{10} [0-9:]{8} ), since
there are parentheses around it, or since [0-9-]{10} [0-9:]{8} is
enclosed in parentheses. Then it would seem as though [*]{3} is the
second expression, and (\\w+ \\w+) is the third. According to this
(admittedly wrong) logic, it would seem as though the <> would go
around the date--like

> 2016-03-20 <19:29:37> *** Jane Doe started a video chat

The back references here recall Davis's code earlier:

> sub("^(.{10}) (.{8}) (<.+>) (.+$)", "\\1,\\2,\\3,\\4", chrvec)

There, commas were put around everything, and there you can see the
edge of the expression very well. ^(.{10}) = first. (.{8}) = second.
(<.+>) = third. (.+$) = fourth. So, by the same logic, it would seem
as though in

> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

that ^([0-9-]{10} [0-9:]{8} ) is first, that [*]{3} is second, and
that  (\\w+ \\w+) is third.

But, if Boris is to be right, and he is, obviously, then it would have
to be the case that this entire thing, namely, ^([0-9-]{10} [0-9:]{8}
)[*]{3}, is the first expression, since only if that were true would
the <> be able to go around the names, as in

[3] "2016-01-27 09:15:20  Hey "

Again, so 2016-01-27 09:15:20 would have to be an entire unit, an
expression. So I guess what I don't understand is how ^([0-9-]{10}
[0-9:]{8} )[*]{3} can be an entire expression, although my hunch would
be that it has something to do with the ^ or with the space after the
} and before the (, as in

> {3} (\\w+

Back to earlier:

> The rest of the line is not substituted and appears as-is.

Is that due to the space after the \\2? in

> "\\1<\\2> "

Notice space after > and before "

Michael

On Sun, May 19, 2019 at 2:31 PM Boris Steipe  wrote:
>
> Inline ...
>
> > On 2019-05-19, at 13:56, Michael Boulineau  
> > wrote:
> >
> >> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> >
> > so the ^ signals that the regex BEGINS with a number (that could be
> > any number, 0-9) that is only 10 characters long (then there's the
> > dash in there, too, with the 0-9-, which I assume enabled the regex to
> > grab the - that's between the numbers in the date)
>
> That's right. Note that within a "character class" the hyphen can have two 
> meanings: normally it defines a range of characters, but if it appears as the 
> last character before "]" it is a literal hyphen.
>
> > , followed by a
> > single space, followed by a unit that could be any number, again, but
> > that is only 8 characters long this time. For that one, it will
> > include the colon, hence the 9:, although for that one ([0-9:]{8} ),
>
> Right.
>
>
> > I
> > don't get why the space is on the inside in that one, after the {8},
>
> The space needs to be preserved between the time and the name. I wrote
> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"  # space in the first captured expression
> c <- gsub(b, "\\1<\\2> ", a)
> ... but I could have written
> b <- "^([0-9-]{10} [0-9:]{8}) [*]{3} (\\w+ \\w+)"  # space outside the parentheses
> c <- gsub(b, "\\1 <\\2> ", a)  # space in the substituted string
> ... same result
>
>
> > whereas the space is on the outside with the other one ^([0-9-]{10} ,
> > directly after the {10}. Why is that?
>
> In the second case, I capture without a space, because I don't want the space 
> in the results, after the time.
>
>
> >
> > Then three *** [*]{3}, then the (\\w+ \\w+)", which Boris explained so
> > well above. I guess I still don't get why this one seemed to have
> > deleted the *** out of the mix, plus I still don't why it didn't
> > remove the *** from the first one.
>
> Because the entire first line was not matched since it had a malformed 
> character preceding the date.
>
> >
> > 2016-03-20 19:29:37 *** Jane Doe started a video chat
> > 2016-03-20 19:30:35 *** John Doe ended a video chat
> > 2016-04-02 12:59:36 *** Jane Doe started a video chat
> > 2016-04-02 13:00:43 *** John Doe ended a video chat
> > 2016-04-02 13:01:08 *** Jane Doe started a video chat
> > 2016-04-02 13:01:41 *** John Doe ended a video chat
> > 2016-04-02 13:03:51 *** John Doe started a video chat
> > 2016-04-02 13:06:35 *** John Doe ended a video chat
> >
> > This is a random sample from the beginning of the txt file with no
> > edits. The ***s were deleted, all but the first one, the one that had
> > the  but that was taken 

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Boris Steipe
Inline ...

> On 2019-05-19, at 13:56, Michael Boulineau  
> wrote:
> 
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> 
> so the ^ signals that the regex BEGINS with a number (that could be
> any number, 0-9) that is only 10 characters long (then there's the
> dash in there, too, with the 0-9-, which I assume enabled the regex to
> grab the - that's between the numbers in the date)

That's right. Note that within a "character class" the hyphen can have two 
meanings: normally it defines a range of characters, but if it appears as the 
last character before "]" it is a literal hyphen.
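
A minimal sketch of the two meanings, using a few made-up test strings (the vector x is only for illustration):

```r
x <- c("2016-03-20", "b", "-", "a", "c")

# "[a-c]": a hyphen between two characters defines the range a, b, c
in_range <- grepl("^[a-c]$", x)

# "[ac-]": a hyphen directly before "]" is a literal hyphen,
# so this class matches "a", "c" or "-" but not "b"
literal <- grepl("^[ac-]$", x)

# "[0-9-]": the digits plus a literal hyphen, enough for a whole date
is_date <- grepl("^[0-9-]{10}$", x)

print(data.frame(x, in_range, literal, is_date))
```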

> , followed by a
> single space, followed by a unit that could be any number, again, but
> that is only 8 characters long this time. For that one, it will
> include the colon, hence the 9:, although for that one ([0-9:]{8} ),

Right.


> I
> don't get why the space is on the inside in that one, after the {8},

The space needs to be preserved between the time and the name. I wrote
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"  # space in the first captured expression
c <- gsub(b, "\\1<\\2> ", a)
... but I could have written
b <- "^([0-9-]{10} [0-9:]{8}) [*]{3} (\\w+ \\w+)"  # space outside the parentheses
c <- gsub(b, "\\1 <\\2> ", a)  # space in the substituted string
... same result
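
A quick check of that claim on two made-up lines (the vector a below is an assumption standing in for your file). Note that in the second variant the regex still has to consume the space between the time and the asterisks, so it sits outside the parentheses:

```r
a <- c("2016-03-20 19:29:37 *** Jane Doe started a video chat",
       "2016-03-20 19:30:35 *** John Doe ended a video chat")

# Variant 1: the space is captured and comes back via \\1
b1 <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
c1 <- gsub(b1, "\\1<\\2> ", a)

# Variant 2: the space is matched but not captured, so the
# replacement string has to supply it
b2 <- "^([0-9-]{10} [0-9:]{8}) [*]{3} (\\w+ \\w+)"
c2 <- gsub(b2, "\\1 <\\2> ", a)

print(identical(c1, c2))
```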


> whereas the space is on the outside with the other one ^([0-9-]{10} ,
> directly after the {10}. Why is that?

In the second case, I capture without a space, because I don't want the space 
in the results, after the time.


> 
> Then three *** [*]{3}, then the (\\w+ \\w+)", which Boris explained so
> well above. I guess I still don't get why this one seemed to have
> deleted the *** out of the mix, plus I still don't why it didn't
> remove the *** from the first one.

Because the entire first line was not matched since it had a malformed 
character preceding the date.
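
Here is a hedged sketch of that effect. I am assuming the stray character behaves like a byte-order mark ("\ufeff"); I can't tell from the post which character your file really contains, and the sample line is made up.

```r
good <- "2016-01-27 09:14:40 *** Jane Doe started a video chat"
bad  <- paste0("\ufeff", good)  # assumed stand-in for the stray character

b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

# "^" anchors the match at the very first character, so one unexpected
# character in front of the date makes the whole regex fail
matched <- grepl(b, c(good, bad))

# One possible repair: drop everything before the first digit
cleaned <- sub("^[^0-9]+", "", bad)

print(matched)
print(grepl(b, cleaned))
```

Whether stripping leading non-digits is safe depends on what else can start a line in your file, so test it on the whole file first.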

> 
> 2016-03-20 19:29:37 *** Jane Doe started a video chat
> 2016-03-20 19:30:35 *** John Doe ended a video chat
> 2016-04-02 12:59:36 *** Jane Doe started a video chat
> 2016-04-02 13:00:43 *** John Doe ended a video chat
> 2016-04-02 13:01:08 *** Jane Doe started a video chat
> 2016-04-02 13:01:41 *** John Doe ended a video chat
> 2016-04-02 13:03:51 *** John Doe started a video chat
> 2016-04-02 13:06:35 *** John Doe ended a video chat
> 
> This is a random sample from the beginning of the txt file with no
> edits. The ***s were deleted, all but the first one, the one that had
> the  but that was taken out by the encoding = "UTF-8". I know that
> the function was c <- gsub(b, "\\1<\\2> ", a), so it had a gsub () on
> there, the point of which is to do substitution work.
> 
> Oh, I get it, I think. The \\1<\\2> in the gsub () puts the <> around
> the names, so that it's consistent with the rest of the data, so that
> the names in the text about that aren't enclosed in the <> are
> enclosed like the rest of them. But I still don't get why or how the
> gsub () replaced the *** with the <>...

In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and \\2. 
The expression says:
Substitute ALL of the match with the first captured expression, then " <", then 
the second captured expression, then "> ". The rest of the line is not 
substituted and appears as-is.


> 
> This one is more straightforward.
> 
>> d <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"
> 
> any number with - for 10 characters, followed by a space. Oh, there's
> no space in this one ([0-9:]{8}), after the {8}. Hu. So, then, any
> number with : for 8 characters, followed by any two words separated by
> a space and enclosed in <>. And then the \\s* is followed by a single
> space? Or maybe it puts space on both sides (on the side of the #s to
> the left, and then the comment to the right). The (.+)$ is anything
> whatsoever until the end.

\s is the metacharacter for "whitespace". \s* means zero or more whitespace 
characters. I'm matching that OUTSIDE of the captured expression, to remove any 
leading spaces from the data that goes into the data frame.
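
A minimal sketch with strcapture(), assuming two made-up input lines and column names of my choosing (date, time, name, text):

```r
d <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"

x <- c("2016-01-27 09:15:20 <Jane Doe>    Hey there",
       "this line does not fit the pattern")

# The prototype defines the names and types of the resulting columns
proto <- data.frame(date = character(), time = character(),
                    name = character(), text = character(),
                    stringsAsFactors = FALSE)

e <- utils::strcapture(d, x, proto)

print(e)
print(sum(is.na(e$date)))  # number of lines that failed to match
```

Because \\s* sits outside the parentheses, the four spaces after the name are consumed by the match but do not end up in e$text.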


Cheers,
Boris




> 
> Michael
> 
> 
> On Sun, May 19, 2019 at 4:37 AM Boris Steipe  wrote:
>> 
>> Inline
>> 
>> 
>> 
>>> On 2019-05-18, at 20:34, Michael Boulineau  
>>> wrote:
>>> 
>>> It appears to have worked, although there were three little quirks.
>>> The ; close(con); rm(con) didn't work for me; the first row of the
>>> data.frame was all NAs, when all was said and done;
>> 
>> You will get NAs for lines that can't be matched to the regular expression. 
>> That's a good thing, it allows you to test whether your assumptions were 
>> valid for the entire file:
>> 
>> # number of failed strcapture()
>> sum(is.na(e$date))
>> 
>> 
>>> and then there
>>> were still three *** on the same line where the  was apparently
>>> deleted.
>> 
>> This is a sign that something else happened with the line that prevented the 
>> regex from matching. In that case you need to investigate more. I see an 
>> invalid multibyte character at the beginning of the line you posted 

Re: [R] Nested structure data simulation

2019-05-19 Thread Boris Steipe
My mental model for such a simulation is that you create data from a known 
distribution, then use your model to check that you can recover the known 
parameters from the data. Thus how the marks are created depends on what 
influences them. Here is a toy model to illustrate this - expanding on my code 
sample:


# a function to generate marks from parameters
rMarks <- function(n, m, s) {
  # a normal distribution limited to between 1 and 6, in 0.5 intervals, with
  # mean m and standard deviation s
  marks <- rnorm(n, m, s)
  marks <- round(marks * 2) / 2
  marks[marks < 1] <- 1
  marks[marks > 6] <- 6
  return(marks)
}

# Teachers in two categories: 70% of teachers (tNormal) grade everyone
# according to a marks distribution with m = 3.5 and sd = 1; the others grade
# girls with m = 4.5 and sd = 0.7 and boys with m = 3.0 and sd = 1.2

# define who are the "normal teachers"
x <- paste0("t", 1:(nS * nTpS))
tNormal <- sample(x, round(nS * nTpS * 0.7), replace = FALSE)

# this is rather pedestrian code, but as explicit as I can make it ...
for (i in 1:nrow(mySim)) {
  if (mySim$Teacher[i] %in% tNormal) {
    m <- 3.5
    s <- 1.0
  } else {
    if (mySim$Gender[i] == "girl") {
      m <- 4.5
      s <- 0.7
    } else {
      m <- 3.0
      s <- 1.2
    }
  }
  mySim$Mark[i] <- rMarks(1, m, s)
}

# Validate
table(mySim$Mark)
hist(mySim$Mark[mySim$Teacher %in% tNormal],
     col = "#0000BB44")   # semi-transparent blue: the "normal" teachers
hist(mySim$Mark[ ! mySim$Teacher %in% tNormal],
     add = TRUE,
     col = "#BB000044")   # semi-transparent red: the other teachers

Then the challenge is to recover the parameters from your analysis. 
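
As a rough, self-contained sketch of what "recovering" could look like with nothing but base R: simulate marks directly per pupil (skipping the school and class structure, which is a simplification on my part) and compare group means with the values that went in. The seed, the sample size and the column name Normal are arbitrary assumptions.

```r
set.seed(112358)  # arbitrary, only to make the sketch reproducible

# marks limited to 1..6 in steps of 0.5, same idea as rMarks() above
rMarks <- function(n, m, s) {
  marks <- round(rnorm(n, m, s) * 2) / 2
  pmin(pmax(marks, 1), 6)
}

n <- 5000
sim <- data.frame(Normal = sample(c(TRUE, FALSE), n,
                                  prob = c(0.7, 0.3), replace = TRUE),
                  Gender = sample(c("boy", "girl"), n,
                                  prob = c(0.4, 0.6), replace = TRUE),
                  stringsAsFactors = FALSE)

# known parameters per pupil; rnorm() is vectorised over mean and sd
m <- ifelse(sim$Normal, 3.5, ifelse(sim$Gender == "girl", 4.5, 3.0))
s <- ifelse(sim$Normal, 1.0, ifelse(sim$Gender == "girl", 0.7, 1.2))
sim$Mark <- rMarks(n, m, s)

# group means should land close to 3.5, 3.5, 3.0 and 4.5
recovered <- aggregate(Mark ~ Normal + Gender, data = sim, FUN = mean)
print(recovered)
```

Clipping to the 1..6 range pulls the group means slightly towards the middle, so don't expect to get the input values back exactly. A mixed model fitted to the nested data would be the real test.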


Cheers,
Boris



> On 2019-05-19, at 11:14, varin sacha  wrote:
> 
> Dear Boris,
> 
> Great! But what about Mark in your R code? Don't we have to specify in 
> the R code that the mark ranges from 1 to 6 (1 ; 1.5 ; 2 ; 2.5 ; 3 ; 3.5 ; 4 ; 
> 4.5 ; 5 ; 5.5 ; 6)?
> 
> By the way, to fit a linear mixed model, I use lme4 package and then the lmer 
> function works with the variables like in this example here below :
> 
> library(lme4)
> mm=lmer(Mark ~Gender + (1 | School / Class), data=Dataset) 
> 
> With your R code, how can I write the lmer function to make it work ?
> 
> Best,
> S.
> 
> 
> 
> 
> 
> 
> 
> Le dimanche 19 mai 2019 à 15:26:39 UTC+2, Boris Steipe 
>  a écrit : 
> 
> 
> 
> 
> 
> Fair enough - there are additional assumptions needed, which I make as 
> follows:
>   - each class has the same size
>   - each teacher teaches the same number of classes
>   - the number of boys and girls is random within a class
>   - there are 60% girls  (just for illustration that it does not have to be 
> equal)
>   
> 
> To make the dependencies explicit, I define them so, and in a way that they 
> can't be inconsistent.
> 
> nS <- 10    # Schools
> nTpS <- 5   # Teachers per School
> nCpT <- 2   # Classes per teacher
> nPpC <- 20  # Pupils per class
> nS * nTpS * nCpT * nPpC == 2000  # Validate
> 
> 
> mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
>                     Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
>                     Class   = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
>                     Gender  = sample(c("boy", "girl"),
>                                      (nS*nTpS*nCpT*nPpC),
>                                      prob = c(0.4, 0.6),
>                                      replace = TRUE),
>                     Mark    = numeric(nS*nTpS*nCpT*nPpC),
>                     stringsAsFactors = FALSE)
> 
> 
> Then you fill mySim$Mark with values from your linear mixed model ...
> 
> mySim$Mark[i] <- simMarks(mySim[i, ])  # ... or something equivalent.
> 
> 
> All good?
> 
> Cheers,
> Boris
> 
> 
> 
>> On 2019-05-19, at 08:05, varin sacha  wrote:
>> 
>> Many thanks to all of you for your responses.
>> 
>> So, I will try to be clearer with a larger example. The end of my mail is the 
>> most important for understanding what I am trying to do. I am trying to 
>> simulate data to fit a linear mixed model (nested, not crossed). More 
>> precisely, I would love to get, at the end of the process, a table (.txt) 
>> with columns and rows. Column 1 and the rows will be the 2000 pupils, and the 
>> columns the different variables: Column 2 = classes; Column 3 = teachers; 
>> Column 4 = schools; Column 5 = gender (boy or girl); Column 6 = mark in French
>> 
>> Pupils are nested in classes, classes are nested in schools. The teachers 
>> are part of the process.
>> 
>> I want to simulate a dataset with n=2000 pupils, 100 classes, 50 teachers 
>> and 10 schools.
>> - Pupils n°1 to pupils n°2000 (p1, p2, p3, p4, ..., p2000)
>> - Classes n°1 to classes n°100 (c1, c2, c3, c4, ..., c100)
>> - Teachers n°1 to teacher n°50 (t1, t2, t3, t4, ..., t50)
>> - Schools n°1 to school n°10 (s1, s2, s3, s4, ..., s10)
>> 
>> The nested structure is as follows: 
>> 
>> -- School 1 with teacher 1 to teacher 5 (t1, t2, t3, t4 and t5) with classes 
>> 1 to classes 10 (c1, c2, c3, c4, c5, c6, c7, 

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Michael Boulineau
> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

so the ^ signals that the regex BEGINS with a number (that could be
any number, 0-9) that is only 10 characters long (then there's the
dash in there, too, with the 0-9-, which I assume enabled the regex to
grab the - that's between the numbers in the date), followed by a
single space, followed by a unit that could be any number, again, but
that is only 8 characters long this time. For that one, it will
include the colon, hence the 9:, although for that one ([0-9:]{8} ), I
don't get why the space is on the inside in that one, after the {8},
whereas the space is on the outside with the other one ^([0-9-]{10} ,
directly after the {10}. Why is that?

Then three *** [*]{3}, then the (\\w+ \\w+)", which Boris explained so
well above. I guess I still don't get why this one seemed to have
deleted the *** out of the mix, plus I still don't why it didn't
remove the *** from the first one.

2016-03-20 19:29:37 *** Jane Doe started a video chat
2016-03-20 19:30:35 *** John Doe ended a video chat
2016-04-02 12:59:36 *** Jane Doe started a video chat
2016-04-02 13:00:43 *** John Doe ended a video chat
2016-04-02 13:01:08 *** Jane Doe started a video chat
2016-04-02 13:01:41 *** John Doe ended a video chat
2016-04-02 13:03:51 *** John Doe started a video chat
2016-04-02 13:06:35 *** John Doe ended a video chat

This is a random sample from the beginning of the txt file with no
edits. The ***s were deleted, all but the first one, the one that had
the  but that was taken out by the encoding = "UTF-8". I know that
the function was c <- gsub(b, "\\1<\\2> ", a), so it had a gsub () on
there, the point of which is to do substitution work.

Oh, I get it, I think. The \\1<\\2> in the gsub () puts the <> around
the names, so that it's consistent with the rest of the data, so that
the names in the text about that aren't enclosed in the <> are
enclosed like the rest of them. But I still don't get why or how the
gsub () replaced the *** with the <>...

This one is more straightforward.

> d <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"

any number with - for 10 characters, followed by a space. Oh, there's
no space in this one ([0-9:]{8}), after the {8}. Hu. So, then, any
number with : for 8 characters, followed by any two words separated by
a space and enclosed in <>. And then the \\s* is followed by a single
space? Or maybe it puts space on both sides (on the side of the #s to
the left, and then the comment to the right). The (.+)$ is anything
whatsoever until the end.

Michael


On Sun, May 19, 2019 at 4:37 AM Boris Steipe  wrote:
>
> Inline
>
>
>
> > On 2019-05-18, at 20:34, Michael Boulineau  
> > wrote:
> >
> > It appears to have worked, although there were three little quirks.
> > The ; close(con); rm(con) didn't work for me; the first row of the
> > data.frame was all NAs, when all was said and done;
>
> You will get NAs for lines that can't be matched to the regular expression. 
> That's a good thing, it allows you to test whether your assumptions were 
> valid for the entire file:
>
> # number of failed strcapture()
> sum(is.na(e$date))
>
>
> > and then there
> > were still three *** on the same line where the  was apparently
> > deleted.
>
> This is a sign that something else happened with the line that prevented the 
> regex from matching. In that case you need to investigate more. I see an 
> invalid multibyte character at the beginning of the line you posted below.
>
> >
> >> a <- readLines ("hangouts-conversation-6.txt", encoding = "UTF-8")
> >> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> >> c <- gsub(b, "\\1<\\2> ", a)
> >> head (c)
> > [1] "2016-01-27 09:14:40 *** Jane Doe started a video chat"
> > [2] "2016-01-27 09:15:20 
> > https://lh3.googleusercontent.com/-_WQF5kRcnpk/Vqj7J4aK1jI/AVA/GVqutPqbSuo/s0/be8ded30-87a6-4e80-bdfa-83ed51591dbf;
>
> [...]
>
> > But, before I do anything else, I'm going to study the regex in this
> > particular code. For example, I'm still not sure why there has to be the
> > second \\w+ in the (\\w+ \\w+). Little things like that.
>
> \w is the metacharacter for word characters (letters, digits and the 
> underscore); \w+ designates something 
> we could call a word. Thus \w+ \w+ are two words separated by a single blank. 
> This corresponds to your example, but, as I wrote previously, you need to 
> think very carefully whether this covers all possible cases (Could there be 
> only one word? More than one blank? Could letters be separated by hyphens or 
> periods?) In most cases we could have more robustly matched everything 
> between "<" and ">" (taking care to test what happens if the message contains 
> those characters). But for the video chat lines we need to make an assumption 
> about what is name and what is not. If "started a video chat" is the only 
> possibility in such lines, you can use this information instead. If there are 
> other possibilities, you need a different strategy. In NLP there is no 
> one-approach-fits-all.

Re: [R] Nested structure data simulation

2019-05-19 Thread varin sacha via R-help
Dear Boris,

Great! But what about Mark in your R code? Don't we have to specify in the 
R code that the mark ranges from 1 to 6 (1 ; 1.5 ; 2 ; 2.5 ; 3 ; 3.5 ; 4 ; 4.5 ; 
5 ; 5.5 ; 6)?

By the way, to fit a linear mixed model, I use lme4 package and then the lmer 
function works with the variables like in this example here below :

library(lme4)
mm=lmer(Mark ~Gender + (1 | School / Class), data=Dataset) 

With your R code, how can I write the lmer function to make it work ?

Best,
S.







Le dimanche 19 mai 2019 à 15:26:39 UTC+2, Boris Steipe 
 a écrit : 





Fair enough - there are additional assumptions needed, which I make as follows:
  - each class has the same size
  - each teacher teaches the same number of classes
  - the number of boys and girls is random within a class
  - there are 60% girls  (just for illustration that it does not have to be 
equal)
  

To make the dependencies explicit, I define them so, and in a way that they 
can't be inconsistent.

nS <- 10        # Schools
nTpS <- 5      # Teachers per School
nCpT <- 2      # Classes per teacher
nPpC <- 20      # Pupils per class
nS * nTpS * nCpT * nPpC == 2000  # Validate


mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
                    Class  = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"),
                                    (nS*nTpS*nCpT*nPpC),
                                    prob = c(0.4, 0.6),
                                    replace = TRUE),
                    Mark    = numeric(nS*nTpS*nCpT*nPpC),
                    stringsAsFactors = FALSE)
                    

Then you fill mySim$Mark with values from your linear mixed model ...

mySim$Mark[i] <- simMarks(mySim[i, ])  # ... or something equivalent.


All good?

Cheers,
Boris



> On 2019-05-19, at 08:05, varin sacha  wrote:
> 
> Many thanks to all of you for your responses.
> 
> So, I will try to be clearer with a larger example. The end of my mail is the 
> most important for understanding what I am trying to do. I am trying to 
> simulate data to fit a linear mixed model (nested, not crossed). More 
> precisely, I would love to get, at the end of the process, a table (.txt) 
> with columns and rows. Column 1 and the rows will be the 2000 pupils, and the 
> columns the different variables: Column 2 = classes; Column 3 = teachers; 
> Column 4 = schools; Column 5 = gender (boy or girl); Column 6 = mark in French
> 
> Pupils are nested in classes, classes are nested in schools. The teachers are 
> part of the process.
> 
> I want to simulate a dataset with n=2000 pupils, 100 classes, 50 teachers and 
> 10 schools.
> - Pupils n°1 to pupils n°2000 (p1, p2, p3, p4, ..., p2000)
> - Classes n°1 to classes n°100 (c1, c2, c3, c4, ..., c100)
> - Teachers n°1 to teacher n°50 (t1, t2, t3, t4, ..., t50)
> - Schools n°1 to school n°10 (s1, s2, s3, s4, ..., s10)
> 
> The nested structure is as follows: 
> 
> -- School 1 with teacher 1 to teacher 5 (t1, t2, t3, t4 and t5) with classes 
> 1 to classes 10 (c1, c2, c3, c4, c5, c6, c7, c8,c9,c10), pupils n°1 to pupils 
> n°200 (p1, p2, p3, p4,..., p200).
> 
> -- School 2 with teacher 6 to teacher 10, with classes 11 to classes 20, 
> pupils n°201 to pupils n°400
> 
> -- and so on
> 
> The table (.txt) I would love to get at the end is the following :
> 
>      Class  Teacher  School  gender  Mark
> 1    c1     t1       s1      boy     5
> 2    c1     t1       s1      boy     5.5
> 3    c1     t1       s1      girl    4.5
> 4    c1     t1       s1      girl    6
> 5    c1     t1       s1      boy     3.5
> 6    ...
>         
> 
> The first 20 rows with c1, with t1, with s1, gender (randomly selected) and 
> mark (randomly selected) from 1 to 6
> The rows 21 to 40 with c2 with t1 with s1
> The rows 41 to 60 with c3 with t2 with s1
> The rows 61 to 80 with c4 with t2 with s1
> The rows 81 to 100 with c5 with t3 with s1
> The rows 101 to 120 with c6 with t3 with s1
> The rows 121 to 140 with c7 with t4 with s1
> The rows 141 to 160 with c8 with t4 with s1
> The rows 161 to 180 with c9 with t5 with s1
> The rows 181 to 200 with c10 with t5 with s1
> 
> The rows 201 to 220 with c11 with t6 with s2
> The rows 221 to 240 with c12 with t6 with s2
> 
> And so on...
> 
> Is it possible to do that ? Or am I dreaming ?
> 
> 
> Le dimanche 19 mai 2019 à 10:45:43 UTC+2, Linus Chen  
> a écrit : 
> 
> 
> 
> 
> 
> Dear varin sacha,
> 
> I think it will help us help you, if you give a clearer description of
> what exactly you want.
> 
> I assume the situation is that you know what a data structure you
> want, but do not know
> how to conveniently create such structure.
> And that is where others can 

Re: [R] Nested structure data simulation

2019-05-19 Thread Boris Steipe
Fair enough - there are additional assumptions needed, which I make as follows:
  - each class has the same size
  - each teacher teaches the same number of classes
  - the number of boys and girls is random within a class
  - there are 60% girls   (just for illustration that it does not have to be 
equal)
  

To make the dependencies explicit, I define them so, and in a way that they 
can't be inconsistent.

nS <- 10    # Schools
nTpS <- 5   # Teachers per School
nCpT <- 2   # Classes per teacher
nPpC <- 20  # Pupils per class
nS * nTpS * nCpT * nPpC == 2000   # Validate


mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
                    Class   = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"),
                                     (nS*nTpS*nCpT*nPpC),
                                     prob = c(0.4, 0.6),
                                     replace = TRUE),
                    Mark    = numeric(nS*nTpS*nCpT*nPpC),
                    stringsAsFactors = FALSE)
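
If you want to convince yourself that rep(..., each = ...) really produces a nested layout, here is a small sketch with deliberately tiny, hypothetical sizes (2 schools, 2 teachers per school, 2 classes per teacher, 3 pupils per class) so the result can be inspected by eye:

```r
# Toy sizes, chosen only for illustration
nS <- 2; nTpS <- 2; nCpT <- 2; nPpC <- 3

toy <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
                  Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
                  Class   = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
                  stringsAsFactors = FALSE)

# Nested means: every class belongs to exactly one teacher,
# and every teacher to exactly one school
teachersPerClass  <- tapply(toy$Teacher, toy$Class,
                            function(v) length(unique(v)))
schoolsPerTeacher <- tapply(toy$School, toy$Teacher,
                            function(v) length(unique(v)))
classSizes <- table(toy$Class)

print(toy)
print(all(teachersPerClass == 1) && all(schoolsPerTeacher == 1))
```

The same three checks can be run unchanged on the full mySim data frame.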


Then you fill mySim$Mark with values from your linear mixed model ...

mySim$Mark[i] <- simMarks(mySim[i, ])  # ... or something equivalent.


All good?

Cheers,
Boris



> On 2019-05-19, at 08:05, varin sacha  wrote:
> 
> Many thanks to all of you for your responses.
> 
> So, I will try to be clearer with a larger example. The end of my mail is the 
> most important for understanding what I am trying to do. I am trying to 
> simulate data to fit a linear mixed model (nested, not crossed). More 
> precisely, I would love to get, at the end of the process, a table (.txt) 
> with columns and rows. Column 1 and the rows will be the 2000 pupils, and the 
> columns the different variables: Column 2 = classes; Column 3 = teachers; 
> Column 4 = schools; Column 5 = gender (boy or girl); Column 6 = mark in French
> 
> Pupils are nested in classes, classes are nested in schools. The teachers are 
> part of the process.
> 
> I want to simulate a dataset with n=2000 pupils, 100 classes, 50 teachers and 
> 10 schools.
> - Pupils n°1 to pupils n°2000 (p1, p2, p3, p4, ..., p2000)
> - Classes n°1 to classes n°100 (c1, c2, c3, c4, ..., c100)
> - Teachers n°1 to teacher n°50 (t1, t2, t3, t4, ..., t50)
> - Schools n°1 to school n°10 (s1, s2, s3, s4, ..., s10)
> 
> The nested structure is as follows: 
> 
> -- School 1 with teacher 1 to teacher 5 (t1, t2, t3, t4 and t5) with classes 
> 1 to classes 10 (c1, c2, c3, c4, c5, c6, c7, c8,c9,c10), pupils n°1 to pupils 
> n°200 (p1, p2, p3, p4,..., p200).
> 
> -- School 2 with teacher 6 to teacher 10, with classes 11 to classes 20, 
> pupils n°201 to pupils n°400
> 
> -- and so on
> 
> The table (.txt) I would love to get at the end is the following :
> 
> ClassTeacherSchoolgenderMark
> 1   c1t1s1boy5
> 2   c1t1s1boy5.5
> 3   c1t1s1girl4.5
> 4   c1t1s1girl6
> 5   c1t1s1boy   3.5
> 6   ..... 
>   
> 
> The first 20 rows with c1, with t1, with s1, gender (randomly slected) and 
> mark (andomly selected) from 1 to 6
> The rows 21 to 40 with c2 with t1 with s1
> The rows 41 to 60 with c3 with t2 with s1
> The rows 61 to 80 with c4 with t2 with s1
> The rows 81 to 100 with c5 with t3 with s1
> The rows 101 to 120 with c6 with t3 with s1
> The rows 121 to 140 with c7 with t4 with s1
> The rows 141 to 160 with c8 with t4 with s1
> The rows 161 to 180 with c9 with t5 with s1
> The rows 181 to 200 with c10 with t5 with s1
> 
> The rows 201 to 220 with c11 with t6 with s2
> The rows 221 to 240 with c12 with t6 with s2
> 
> And so on...
> 
> Is it possible to do that ? Or am I dreaming ?
> 
> 
> Le dimanche 19 mai 2019 à 10:45:43 UTC+2, Linus Chen  
> a écrit : 
> 
> 
> 
> 
> 
> Dear varin sacha,
> 
> I think it will help us help you, if you give a clearer description of
> what exactly you want.
> 
> I assume the situation is that you know what a data structure you
> want, but do not know
> how to conveniently create such structure.
> And that is where others can help you.
> So, please, describe the wanted data structure more thoroughly,
> ideally with example.
> 
> Thanks,
> Lei
> 
> On Sat, May 18, 2019 at 10:04 PM varin sacha via R-help
>  wrote:
>> 
>> Dear Boris,
>> 
>> Yes, top-down, no problem. Many thanks, but in your code did you not forget 
>> "teacher" ? As a reminder teacher has to be nested with classes. I mean the 
>> 50 pupils belonging to C1 must be with (teacher 1) T1, the 50 pupils 
>> belonging to C2 with T2, the 50 pupils belonging to C3 with T3 and so on.
>> 
>> Best,
>> 
>> 
>> Le samedi 18 

[R-es] Parameterization of a mixed (multilevel) model

2019-05-19 Thread Manuel Spínola
Dear list members,

Apologies for cross-posting.

I am fitting a model with lmer (lme4).

The response variable is an index (ADI) that was measured in 3 different areas
in 4 different climatic seasons, so my fixed effects are area and climatic
season (estacion).

Each area has 12 sampling points (Punto).

At each sampling point the index was measured repeatedly at different hours,
on different days and in different months within each area and season, but I
am not interested in how the index evolves over time.

My ideas for the parameterization are these:

mod_adi_01 <- lmer(ADI ~ area + estacion + (1 | Punto/Mes/Dia/Hora),
                   data = df_02, REML = FALSE)

mod_adi_02 <- lmer(ADI ~ area + estacion + (1 | Punto) + (1 | Mes/Dia/Hora),
                   data = df_02, REML = FALSE)

Is either of these 2 alternatives correct, or is neither of them?
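
As background to the question, and without deciding which model is appropriate: in lme4 formula syntax a/b is shorthand for a + a:b, so (1 | Punto/Mes/Dia/Hora) asks for random intercepts for Punto, Punto:Mes, Punto:Mes:Dia and Punto:Mes:Dia:Hora, while (1 | Punto) + (1 | Mes/Dia/Hora) treats Punto as crossed with Mes, Mes:Dia and Mes:Dia:Hora. The base-R sketch below only counts how many groups each term would have. The toy design (3 points, 2 months, 2 days, 2 hours, one row per combination) is an assumption for illustration, not the poster's data.

```r
# Toy, fully balanced design (assumed): one row per Punto x Mes x Dia x Hora
d <- expand.grid(Punto = paste0("P", 1:3),
                 Mes   = paste0("M", 1:2),
                 Dia   = paste0("D", 1:2),
                 Hora  = paste0("H", 1:2))

# number of distinct combinations of the given columns
nGroups <- function(cols) nrow(unique(d[, cols, drop = FALSE]))

# grouping terms implied by (1 | Punto/Mes/Dia/Hora)
nested <- c("Punto"              = nGroups("Punto"),
            "Punto:Mes"          = nGroups(c("Punto", "Mes")),
            "Punto:Mes:Dia"      = nGroups(c("Punto", "Mes", "Dia")),
            "Punto:Mes:Dia:Hora" = nGroups(c("Punto", "Mes", "Dia", "Hora")))

# grouping terms implied by (1 | Punto) + (1 | Mes/Dia/Hora)
crossed <- c("Punto"        = nGroups("Punto"),
             "Mes"          = nGroups("Mes"),
             "Mes:Dia"      = nGroups(c("Mes", "Dia")),
             "Mes:Dia:Hora" = nGroups(c("Mes", "Dia", "Hora")))

print(nested)
print(crossed)
print(nrow(d))   # compare with the deepest nested term
```

In this toy design the deepest nested term has as many groups as there are rows, in which case its variance could not be separated from the residual. Whether that also happens in the real data depends on how many measurements share the same point, month, day and hour.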

Many thanks,

Manuel

-- 
*Manuel Spínola, Ph.D.*
Instituto Internacional en Conservación y Manejo de Vida Silvestre
Universidad Nacional
Apartado 1350-3000
Heredia
COSTA RICA
mspin...@una.cr 
mspinol...@gmail.com
Telephone: (506) 8706 - 4662
Personal website: Lobito de río 
Institutional website: ICOMVIS 

[[alternative HTML version deleted]]

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] Nested structure data simulation

2019-05-19 Thread varin sacha via R-help
Many thanks to all of you for your responses.

So, I will try to be clearer with a larger example. The end of my mail is the
most important part for understanding what I am trying to do. I am trying to
simulate data to fit a linear mixed model (nested, not crossed). More
precisely, I would love to get, at the end of the process, a table (.txt) with
columns and rows. The rows (column 1) will be the 2000 pupils and the other
columns the different variables: Column 2 = classes; Column 3 = teachers;
Column 4 = schools; Column 5 = gender (boy or girl); Column 6 = mark in French.

Pupils are nested in classes, classes are nested in schools. The teachers are
part of the process.

I want to simulate a dataset with n=2000 pupils, 100 classes, 50 teachers and 
10 schools.
- Pupils n°1 to n°2000 (p1, p2, p3, p4, ..., p2000)
- Classes n°1 to n°100 (c1, c2, c3, c4, ..., c100)
- Teachers n°1 to n°50 (t1, t2, t3, t4, ..., t50)
- Schools n°1 to n°10 (s1, s2, s3, s4, ..., s10)

The nested structure is as follows:

-- School 1 with teachers 1 to 5 (t1, t2, t3, t4 and t5), with classes 1 to 10
(c1, c2, c3, c4, c5, c6, c7, c8, c9, c10), pupils n°1 to n°200
(p1, p2, p3, p4, ..., p200).

-- School 2 with teachers 6 to 10, with classes 11 to 20, pupils n°201 to
n°400

-- and so on

The table (.txt) I would love to get at the end is the following :

     Class  Teacher  School  gender  Mark
1    c1     t1       s1      boy     5
2    c1     t1       s1      boy     5.5
3    c1     t1       s1      girl    4.5
4    c1     t1       s1      girl    6
5    c1     t1       s1      boy     3.5
6    ...

The first 20 rows with c1, with t1, with s1, gender (randomly selected) and
mark (randomly selected) from 1 to 6
The rows 21 to 40 with c2 with t1 with s1
The rows 41 to 60 with c3 with t2 with s1
The rows 61 to 80 with c4 with t2 with s1
The rows 81 to 100 with c5 with t3 with s1
The rows 101 to 120 with c6 with t3 with s1
The rows 121 to 140 with c7 with t4 with s1
The rows 141 to 160 with c8 with t4 with s1
The rows 161 to 180 with c9 with t5 with s1
The rows 181 to 200 with c10 with t5 with s1

The rows 201 to 220 with c11 with t6 with s2
The rows 221 to 240 with c12 with t6 with s2

And so on...

Is it possible to do that ? Or am I dreaming ?
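
The row layout listed above follows from integer division of the pupil number, which can be sketched as below. This is only one possible construction, assuming exactly the sizes stated in this mail (20 pupils per class, 2 classes per teacher, 5 teachers per school). The object name layout and the output file name are made up for illustration.

```r
pupil   <- 1:2000
class   <- (pupil   - 1) %/% 20 + 1   # 20 pupils per class     -> 1..100
teacher <- (class   - 1) %/% 2  + 1   # 2 classes per teacher   -> 1..50
school  <- (teacher - 1) %/% 5  + 1   # 5 teachers per school   -> 1..10

layout <- data.frame(Class   = paste0("c", class),
                     Teacher = paste0("t", teacher),
                     School  = paste0("s", school),
                     stringsAsFactors = FALSE)

# spot-check against the rows listed in the mail
layout[c(1, 21, 41, 181, 201, 221), ]

# to write the table as tab-separated text (hypothetical file name):
# write.table(layout, "pupils.txt", sep = "\t", quote = FALSE)
```

Gender and mark columns can then be added to the same data frame once it is decided how they should be drawn.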


On Sunday, 19 May 2019 at 10:45:43 UTC+2, Linus Chen wrote:

Dear varin sacha,

I think it will help us help you, if you give a clearer description of
what exactly you want.

I assume the situation is that you know what data structure you
want, but do not know
how to conveniently create such a structure.
And that is where others can help you.
So, please, describe the wanted data structure more thoroughly,
ideally with an example.

Thanks,
Lei

On Sat, May 18, 2019 at 10:04 PM varin sacha via R-help
 wrote:
>
> Dear Boris,
>
> Yes, top-down, no problem. Many thanks, but in your code did you not forget 
> "teacher" ? As a reminder teacher has to be nested with classes. I mean the 
> 50 pupils belonging to C1 must be with (teacher 1) T1, the 50 pupils 
> belonging to C2 with T2, the 50 pupils belonging to C3 with T3 and so on.
>
> Best,
>
>
> On Saturday, 18 May 2019 at 16:52:48 UTC+2, Boris Steipe wrote:
>
> Can you build your data top-down?
>
>
>
> schools <- paste("s", 1:6, sep="")
>
> classes <- character()
> for (school in schools) {
>  classes <- c(classes, paste(school, paste("c", 1:5, sep=""), sep = "."))
> }
>
> pupils <- character()
> for (class in classes) {
>  pupils <- c(pupils, paste(class, paste("p", 1:10, sep=""), sep = "."))
> }
>
>
>
> B.
>
>
>
> > On 2019-05-18, at 09:57, varin sacha via R-help  
> > wrote:
> >
> > Dear R-Experts,
> >
> > In a data simulation, I would like a balanced distribution with a nested 
> > structure for classroom and teacher (not for school). I mean 50 pupils 
> > belonging to C1, 50 other pupils belonging to C2, 50 other pupils belonging 
> > to C3 and so on. Then I want the 50 pupils belonging to C1 with T1, the 50 
> > pupils belonging to C2 with T2, the 50 pupils belonging to C3 with T3 and 
> > so on. The school don’t have to be nested, I just want a balanced 
> > distribution, I mean 60 pupils in S1, 60 other pupils in S2 and so on.
> > Here below the reproducible example.
> > Many thanks for your help.
> >
> > ##
> > set.seed(123)
> > # Random generation of the columns
> > pupils <- 1:300
> > classroom <- sample(c("C1","C2","C3","C4","C5","C6"), 300, replace = TRUE)
> > teacher <- sample(c("T1","T2","T3","T4","T5","T6"), 300, replace = TRUE)
> > school <- sample(c("S1","S2","S3","S4","S5"), 300, replace = TRUE)
>
> > ##
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > 

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Boris Steipe
Inline



> On 2019-05-18, at 20:34, Michael Boulineau  
> wrote:
> 
> It appears to have worked, although there were three little quirks.
> The ; close(con); rm(con) didn't work for me; the first row of the
> data.frame was all NAs, when all was said and done;

You will get NAs for lines that can't be matched to the regular expression. 
That's a good thing, it allows you to test whether your assumptions were valid 
for the entire file:

# number of failed strcapture()
sum(is.na(e$date))
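
Beyond counting the failures, it can help to look at the offending lines themselves. A small self-contained sketch follows. The four input lines are made up for illustration (two well-formed, one with a leftover "***" and one empty), and the pattern is the one used earlier in this thread.

```r
# hypothetical input: lines 2 and 4 do not fit the expected structure
x <- c("2016-01-27 09:15:20 <Jane Doe> Hey",
       "2016-01-27 09:14:40 *** Jane Doe started a video chat",
       "2016-01-27 09:15:22 <John Doe> ended a video chat",
       "")

patt  <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"
proto <- data.frame(date = character(),
                    time = character(),
                    name = character(),
                    text = character(),
                    stringsAsFactors = FALSE)
e <- strcapture(patt, x, proto)

failed <- which(is.na(e$date))   # line numbers that did not match
length(failed)                   # same count as sum(is.na(e$date))

# show line number and raw content of each failure for inspection
data.frame(line = failed, content = x[failed], stringsAsFactors = FALSE)
```
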


> and then there
> were still three *** on the same line where the  was apparently
> deleted.

This is a sign that something else happened with the line that prevented the 
regex from matching. In that case you need to investigate more. I see an 
invalid multibyte character at the beginning of the line you posted below.

> 
>> a <- readLines ("hangouts-conversation-6.txt", encoding = "UTF-8")
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
>> c <- gsub(b, "\\1<\\2> ", a)
>> head (c)
> [1] "2016-01-27 09:14:40 *** Jane Doe started a video chat"
> [2] "2016-01-27 09:15:20 <Jane Doe> 
> https://lh3.googleusercontent.com/-_WQF5kRcnpk/Vqj7J4aK1jI/AVA/GVqutPqbSuo/s0/be8ded30-87a6-4e80-bdfa-83ed51591dbf"

[...]

> But, before I do anything else, I'm going to study the regex in this
> particular code. For example, I'm still not sure why there has to the
> second \\w+ in the (\\w+ \\w+). Little things like that.

\w is the metacharacter for word characters (letters, digits and the
underscore), \w+ designates something we could call a word. Thus \w+ \w+ are
two words separated by a single blank. 
This corresponds to your example, but, as I wrote previously, you need to think 
very carefully whether this covers all possible cases (Could there be only one 
word? More than one blank? Could letters be separated by hyphens or periods?) 
In most cases we could have more robustly matched everything between "<" and 
">" (taking care to test what happens if the message contains those 
characters). But for the video chat lines we need to make an assumption about 
what is name and what is not. If "started a video chat" is the only possibility 
in such lines, you can use this information instead. If there are other 
possibilities, you need a different strategy. In NLP there is no 
one-approach-fits-all.
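
To illustrate the "use this information instead" idea: if it really were the case that such lines always end in "started a video chat" or "ended a video chat" (an assumption that would need checking against the whole file), the name could be taken as everything between the "***" and that phrase, however many words it has. A sketch with made-up sample lines; the second name is hypothetical.

```r
x <- c("2016-01-27 09:14:40 *** Jane Doe started a video chat",
       "2016-01-27 09:15:22 *** Mary-Ann J. Smith ended a video chat",
       "2016-01-27 09:15:20 <Jane Doe> Hey")

# \\1 = date and time, \\2 = name (lazy match), \\3 = the fixed phrase
patt <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (.+?) ((started|ended) a video chat)$"

# perl = TRUE so that the lazy quantifier .+? is handled by PCRE
y <- sub(patt, "\\1<\\2> \\3", x, perl = TRUE)
print(y)
```

Lines that do not match, such as the third one, are returned unchanged, so the result can be compared with the input to see which lines were rewritten.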

To validate the structure of the names in your transcripts, you can look at

patt <- " <.+?> "   # a name in angle brackets, e.g. " <Jane Doe> "
m <- regexpr(patt, c)
unique(regmatches(c, m))



B.



> 
> Michael
> 
> 
> On Sat, May 18, 2019 at 4:30 PM Boris Steipe  wrote:
>> 
>> This works for me:
>> 
>> # sample data
>> c <- character()
>> c[1] <- "2016-01-27 09:14:40 <Jane Doe> started a video chat"
>> c[2] <- "2016-01-27 09:15:20 <Jane Doe> https://lh3.googleusercontent.com/"
>> c[3] <- "2016-01-27 09:15:20 <Jane Doe> Hey "
>> c[4] <- "2016-01-27 09:15:22 <John Doe>  ended a video chat"
>> c[5] <- "2016-01-27 21:07:11 <Jane Doe>  started a video chat"
>> c[6] <- "2016-01-27 21:26:57 <John Doe>  ended a video chat"
>> 
>> 
>> # regex  ^(date)        (time)      <(word word)>\\s*(string)$
>> patt <- "^([0-9-]{10}) ([0-9:]{8}) <(\\w+ \\w+)>\\s*(.+)$"
>> proto <- data.frame(date = character(),
>>                     time = character(),
>>                     name = character(),
>>                     text = character(),
>>                     stringsAsFactors = TRUE)
>> d <- strcapture(patt, c, proto)
>> 
>> 
>> 
>>         date     time     name                               text
>> 1 2016-01-27 09:14:40 Jane Doe               started a video chat
>> 2 2016-01-27 09:15:20 Jane Doe https://lh3.googleusercontent.com/
>> 3 2016-01-27 09:15:20 Jane Doe                                Hey
>> 4 2016-01-27 09:15:22 John Doe                 ended a video chat
>> 5 2016-01-27 21:07:11 Jane Doe               started a video chat
>> 6 2016-01-27 21:26:57 John Doe                 ended a video chat
>> 
>> 
>> 
>> B.
>> 
>> 
>>> On 2019-05-18, at 18:32, Michael Boulineau  
>>> wrote:
>>> 
>>> Going back and thinking through what Boris and William were saying
>>> (also Ivan), I tried this:
>>> 
>>> a <- readLines ("hangouts-conversation-6.csv.txt")
>>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
>>> c <- gsub(b, "\\1<\\2> ", a)
>>> head (c)
>>> [1] "2016-01-27 09:14:40 *** Jane Doe started a video chat"
>>> [2] "2016-01-27 09:15:20 <Jane Doe> 
>>> https://lh3.googleusercontent.com/-_WQF5kRcnpk/Vqj7J4aK1jI/AVA/GVqutPqbSuo/s0/be8ded30-87a6-4e80-bdfa-83ed51591dbf"
>>> [3] "2016-01-27 09:15:20 <Jane Doe> Hey "
>>> [4] "2016-01-27 09:15:22 <John Doe>  ended a video chat"
>>> [5] "2016-01-27 21:07:11 <Jane Doe>  started a video chat"
>>> [6] "2016-01-27 21:26:57 <John Doe>  ended a video chat"
>>> 
>>> The  is still there, since I forgot to do what Ivan had suggested, 
>>> namely,
>>> 
>>> a <- readLines(con <- file("hangouts-conversation-6.csv.txt", encoding
>>> = "UTF-8")); close(con); rm(con)
>>> 
>>> But then the new code is still turning out only NAs when I apply
>>> strcapture (). This was what happened next:
>>> 
>>> d <- strcapture("^([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}
>>> + 

Re: [R] Help understanding the relationship between R-3.6.0 and RStudio

2019-05-19 Thread Bill Poling
Good morning, I will heed your advice, good to know, thank you Peter.

WHP


From: peter dalgaard 
Sent: Saturday, May 18, 2019 4:44 AM
To: Bill Poling 
Cc: Marc Schwartz ; R-help 
Subject: Re: [R] Help understanding the relationship between R-3.6.0 and RStudio

Actually, you might go for 3.6.0-patched. There was a somewhat annoying bug 
affecting the package installation menu in 3.6.0.

-pd
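
A quick way to see which build is actually running, in the R console as well as inside RStudio, is to print the version string, the release status and the installation directory. This is a small base-R sketch; the "(release)" label printed for an empty status is just wording chosen here.

```r
v <- R.version

# full version string, e.g. as shown in the startup banner
cat(v$version.string, "\n")

# status is empty for a final release; pre-releases and patched
# builds carry a label such as "RC" or "Patched"
cat("status:", if (nzchar(v$status)) v$status else "(release)", "\n")

# directory of the R installation that this session is using
cat("R home:", R.home(), "\n")
```

Running the same lines in both places shows at once whether RStudio has picked up a different installation than the standalone console.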

> On 17 May 2019, at 20:32 , Bill Poling  wrote:
>
> I fixed it by removing previous versions as suggested.
>
>> sessionInfo()
> R version 3.6.0 RC (2019-04-24 r76423)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 17134)
>
> I will have to go out and get the non RC version now.
>
> Thank you.
>
> WHP
>
> From: Marc Schwartz 
> Sent: Friday, May 17, 2019 2:14 PM
> To: Bill Poling 
> Cc: R-help 
> Subject: Re: [R] Help understanding the relationship between R-3.6.0 and 
> RStudio
>
>
>
>> On May 17, 2019, at 2:02 PM, Bill Poling  
>> wrote:
>>
>> Hello.
>>
>> I do not think I have had this problem (assuming it is a problem) in the 
>> past.
>>
>> I downloaded and installed R 3.6.0, which is indicated in the console when I 
>> open R itself.
>>
>> R version 3.6.0 RC (2019-04-24 r76423) -- "Planting of a Tree"
>> Copyright (C) 2019 The R Foundation for Statistical Computing
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> However, in RStudio the sessionInfo() remains
>>
>> R version 3.5.3 (2019-03-11)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 10 x64 (build 17134)
>>
>> I also installed the latest version of RStudio 1.2.1335 as well "after" 
>> installing R 3.6.0.
>>
>> I also rebooted my computer.
>>
>> I am not sure why this time the two do not seem to be (for lack of a better 
>> word) in sync?
>>
>> Thank you for any insight
>>
>> WHP
>
>
> Hi,
>
> I don't use RStudio, which is a GUI/IDE on top of R, it is not R.
>
> That being said, a quick Google search supports my intuition, which is that 
> RStudio appears to be able to support multiple R version installations:
>
> https://support.rstudio.com/hc/en-us/articles/200486138-Changing-R-versions-for-RStudio-desktop
>
> RStudio also has their own support venue:
>
> https://support.rstudio.com/hc/en-us
>
> If I read correctly, it looks like you actually installed a "Release 
> Candidate" (RC) version of 3.6.0 for Windows. So you probably want to visit a 
> CRAN mirror and download the release version of 3.6.0:
>
> R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
>
> If you do not want to have multiple R versions on your computer, you can use 
> the normal Windows application uninstall process to remove the older 
> version(s).
>
> Regards,
>
> Marc Schwartz
>
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk Priv: pda...@gmail.com







