Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Boris Steipe
Inline
> On 2019-05-19, at 18:11, Michael Boulineau wrote:
> For context:
>> In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and \\2. The expression says: Substitute ALL of the match with the first captured expression, then " <", then the second

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Michael Boulineau
For context:
> In gsub(b, "\\1<\\2> ", a) the work is done by the backreferences \\1 and \\2. The expression says: Substitute ALL of the match with the first captured expression, then " <", then the second captured expression, then "> ". The rest of the line is not substituted and
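A minimal sketch of the backreference substitution described above. The two sample lines are made up for illustration and only mimic the format discussed in the thread; the pattern and replacement string are the ones quoted.

```r
# Two made-up sample lines: one with "***", one already in <Name> form.
a <- c("2016-01-27 09:14:40 *** Jane Doe started a video chat",
       "2016-01-27 09:15:20 <Jane Doe> Hey")

# Group 1: date, time and the trailing space.
# Group 2: the two words that follow "*** ".
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"

# The matched part is replaced by group 1, "<", group 2, "> ".
# Everything after the match is kept as it was, so the space that
# followed the name stays and ends up right after the added "> ".
out <- gsub(b, "\\1<\\2> ", a)
print(out)
```

Lines that do not match the pattern, such as the second one here, pass through unchanged.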

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Boris Steipe
Inline ...
> On 2019-05-19, at 13:56, Michael Boulineau wrote:
>> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
> so the ^ signals that the regex BEGINS with a number (that could be any number, 0-9) that is only 10 characters long (then there's the dash in there, too, with the
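One way to check a reading of the pattern is to pull out what each group captures. This is a sketch on a made-up sample line, not output from the original file. Note that ^ anchors the match to the start of the line; it is the bracket expression [0-9-] that allows digits and hyphens.

```r
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
line <- "2016-01-27 09:14:40 *** Jane Doe started a video chat"  # made-up sample

# regexec() reports the full match followed by each captured group.
m <- regmatches(line, regexec(b, line))[[1]]
print(m)

# [0-9-]{10} means "exactly ten characters, each a digit or a hyphen".
# It does not check that the text is a valid date.
date_like <- grepl("^[0-9-]{10}$", c("2016-01-27", "----------", "2016-1-27"))
print(date_like)
```

The second element of `m` is the date-and-time prefix (with its trailing space) and the third is the two-word name.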

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Michael Boulineau
> b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)" so the ^ signals that the regex BEGINS with a number (that could be any number, 0-9) that is only 10 characters long (then there's the dash in there, too, with the 0-9-, which I assume enabled the regex to grab the - that's between the numbers

Re: [R] how to separate string from numbers in a large txt file

2019-05-19 Thread Boris Steipe
Inline
> On 2019-05-18, at 20:34, Michael Boulineau wrote:
> It appears to have worked, although there were three little quirks. The ; close(con); rm(con) didn't work for me; the first row of the data.frame was all NAs, when all was said and done;
You will get NAs for lines that

Re: [R] how to separate string from numbers in a large txt file

2019-05-18 Thread Michael Boulineau
It appears to have worked, although there were three little quirks. The ; close(con); rm(con) didn't work for me; the first row of the data.frame was all NAs, when all was said and done; and then there were still three *** on the same line where the  was apparently deleted. > a <- readLines

Re: [R] how to separate string from numbers in a large txt file

2019-05-18 Thread Boris Steipe
This works for me: # sample data c <- character() c[1] <- "2016-01-27 09:14:40 started a video chat" c[2] <- "2016-01-27 09:15:20 https://lh3.googleusercontent.com/; c[3] <- "2016-01-27 09:15:20 Hey " c[4] <- "2016-01-27 09:15:22 ended a video chat" c[5] <- "2016-01-27 21:07:11 started a

Re: [R] how to separate string from numbers in a large txt file

2019-05-18 Thread Michael Boulineau
Going back and thinking through what Boris and William were saying (also Ivan), I tried this:
a <- readLines ("hangouts-conversation-6.csv.txt")
b <- "^([0-9-]{10} [0-9:]{8} )[*]{3} (\\w+ \\w+)"
c <- gsub(b, "\\1<\\2> ", a)
> head (c)
[1] "2016-01-27 09:14:40 *** Jane Doe started a video chat"

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread Boris Steipe
Don't start putting in extra commas and then reading this as csv. That approach is broken. The correct approach is what Bill outlined: read everything with readLines(), and then use a proper regular expression with strcapture(). You need to pre-process the object that readLines() gives you:

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread Michael Boulineau
Very interesting. I'm sure I'll be trying to get rid of the byte order mark eventually. But right now, I'm more worried about getting the character vector into either a csv file or data.frame; that way, I can work with the data neatly tabulated into four columns: date, time, person,

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread Jeff Newmiller
If byte order mark is the issue then you can specify the file encoding as "UTF-8-BOM" and it won't show up in your data any more.
On May 17, 2019 12:12:17 PM PDT, William Dunlap via R-help wrote:
> The pattern I gave worked for the lines that you originally showed from the data file ('a'),
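A small sketch of the "UTF-8-BOM" suggestion. It writes a temporary file that starts with the three BOM bytes (the file name and its single line are made up for the demonstration) and reads it back twice, once by path and once through a connection opened with that encoding.

```r
# Build a throwaway file that begins with the UTF-8 byte order mark.
tmp <- tempfile(fileext = ".txt")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("2016-01-27 09:14:40 hello\n")), tmp)

# Reading by path keeps the BOM bytes at the front of the first line.
with_bom <- readLines(tmp)

# Reading through a connection with encoding "UTF-8-BOM" drops the mark.
con <- file(tmp, encoding = "UTF-8-BOM")
clean <- readLines(con)
close(con)

print(clean)
unlink(tmp)
```

The same encoding name can be passed as `fileEncoding` to read.table() and friends, should the data later be read that way.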

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread Ivan Krylov
On Fri, 17 May 2019 11:36:22 -0700 Michael Boulineau wrote:
> So, who knows what happened with the  at the beginning of [1] directly above.
perl -Mutf8 -MEncode=encode,decode -Mcharnames=:full \
  -E'say charnames::viacode ord decode utf8 => encode latin1 => ""' # ZERO WIDTH NO-BREAK

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread William Dunlap via R-help
The pattern I gave worked for the lines that you originally showed from the data file ('a'), before you put commas into them. If the name is either of the form "<name>" or "***" then the "(<[^>]*>)" needs to be changed to something like "(<[^>]*>|[*]{3})". The " " at the start of the imported data
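To see what the alternation buys, one can compare which lines each version of the name group accepts. This is a sketch only: the date and time prefix wrapped around the group and the two sample lines are assumptions made for the demonstration, not the full pattern from the earlier message.

```r
# Made-up sample lines: one "***" system line, one "<Name>" chat line.
x <- c("2016-01-27 09:14:40 *** Jane Doe started a video chat",
       "2016-01-27 09:15:20 <Jane Doe> Hey")

# Name group as first given, and with the "***" alternative added.
old <- "^[0-9-]{10} [0-9:]{8} (<[^>]*>) "
new <- "^[0-9-]{10} [0-9:]{8} (<[^>]*>|[*]{3}) "

old_hits <- grepl(old, x)  # only the "<Name>" line matches
new_hits <- grepl(new, x)  # both lines match
print(rbind(old = old_hits, new = new_hits))
```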

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread Michael Boulineau
This seemed to work:
> a <- readLines ("hangouts-conversation-6.csv.txt")
> b <- sub("^(.{10}) (.{8}) (<.+>) (.+$)", "\\1,\\2,\\3,\\4", a)
> b [1:84]
And the first 85 lines look like this:
[83] "2016-06-28 21:02:28 *** Jane Doe started a video chat"
[84] "2016-06-28 21:12:43 *** John Doe ended

Re: [R] how to separate string from numbers in a large txt file

2019-05-17 Thread William Dunlap via R-help
Consider using readLines() and strcapture() for reading such a file. E.g., suppose readLines(files) produced a character vector like x <- c("2016-10-21 10:35:36 What's your login", "2016-10-21 10:56:29 John_Doe", "2016-10-21 10:56:37 Admit#8242", "October 23,
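A minimal sketch of the readLines() plus strcapture() approach, assuming lines of the form "date time <name> comment". The sample vector, the pattern, and the column names are illustrative assumptions rather than the ones from the message above; strcapture() is in the utils package (R 3.4.0 or later).

```r
# Made-up lines standing in for the result of readLines(); the second
# one deliberately does not follow the "date time <name> comment" layout.
x <- c("2016-10-21 10:35:36 <Jane Doe> What's your login",
       "October 23, 2016",
       "2016-10-21 10:56:29 <John Doe> John_Doe")

# One capture group per output column.
pattern <- "^([0-9-]{10}) ([0-9:]{8}) (<[^>]*>) (.*)$"

# proto supplies the column names and types of the result.
proto <- data.frame(date = character(), time = character(),
                    who = character(), comment = character(),
                    stringsAsFactors = FALSE)

d <- strcapture(pattern, x, proto)
print(d)

# Lines that do not match come back as rows of NA and can be dropped.
kept <- d[!is.na(d$date), ]
print(kept)
```

Inspecting the NA rows before dropping them is a cheap way to find lines the pattern does not yet cover.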

Re: [R] how to separate string from numbers in a large txt file

2019-05-16 Thread David Winsemius
On 5/16/19 3:53 PM, Michael Boulineau wrote:
OK. So, I named the object test and then checked the 6347th item
test <- readLines ("hangouts-conversation.txt")
test [6347]
[1] "2016-10-21 10:56:37 Admit#8242"
Perhaps where it was getting screwed up is, since the end of this is a number

Re: [R] how to separate string from numbers in a large txt file

2019-05-16 Thread Michael Boulineau
OK. So, I named the object test and then checked the 6347th item
> test <- readLines ("hangouts-conversation.txt")
> test [6347]
[1] "2016-10-21 10:56:37 Admit#8242"
Perhaps where it was getting screwed up is, since the end of this is a number (8242), then, given that there's no space between

Re: [R] how to separate string from numbers in a large txt file

2019-05-16 Thread David Winsemius
On 5/16/19 12:30 PM, Michael Boulineau wrote: Thanks for this tip on etiquette, David. I will be sure and not do that again. I tried the read.fwf from the foreign package, with a code like this: d <- read.fwf("hangouts-conversation.txt", widths= c(10,10,20,40),

Re: [R] how to separate string from numbers in a large txt file

2019-05-16 Thread Michael Boulineau
Thanks for this tip on etiquette, David. I will be sure and not do that again. I tried the read.fwf from the foreign package, with a code like this: d <- read.fwf("hangouts-conversation.txt", widths= c(10,10,20,40), col.names=c("date","time","person","comment"),

Re: [R] how to separate string from numbers in a large txt file

2019-05-15 Thread David Winsemius
On 5/15/19 4:07 PM, Michael Boulineau wrote:
I have a wild and crazy text file, the head of which looks like this:
2016-07-01 02:50:35 hey
2016-07-01 02:51:26 waiting for plane to Edinburgh
2016-07-01 02:51:45 thinking about my boo
2016-07-01 02:52:07 nothing crappy has happened, not