Re: [R] read txt file - date - no space

PIKAL Petr Thu, 02 Aug 2018 00:33:19 -0700

Hi

See inline (and please do not post HTML-formatted messages; they can get 
scrambled).


From: Diego Avesani <[email protected]>
Sent: Thursday, August 2, 2018 8:56 AM
To: jim holtman <[email protected]>; PIKAL Petr <[email protected]>
Cc: R mailing list <[email protected]>
Subject: Re: [R] read txt file - date - no space

Dear

I have checked one of the lines that gives me problems, I mean one that gives 
NA after R processing. I think it is similar to the others:

You should stop **thinking** and instead do a real inspection of the „offending“ 
values.

10/12/1998 10:00,0,0,0
10/12/1998 11:00,0,0,0
10/12/1998 12:00,0,0,0
10/12/1998 13:00,0,0,0
10/12/1998 14:00,0,0,0
10/12/1998 15:00,0,0,0
10/12/1998 16:00,0,0,0
10/12/1998 17:00,0,0,0

These lines do not pose any problem with formatting.

>  test<-read.table("clipboard", sep=",")
> str(test)
'data.frame':   8 obs. of  4 variables:
$ V1: Factor w/ 8 levels "10/12/1998 10:00",..: 1 2 3 4 5 6 7 8
$ V2: int  0 0 0 0 0 0 0 0
$ V3: int  0 0 0 0 0 0 0 0
$ V4: int  0 0 0 0 0 0 0 0
> as.POSIXct(test$V1, format="%d/%m/%Y %H:%M")
[1] "1998-12-10 10:00:00 CET" "1998-12-10 11:00:00 CET"
[3] "1998-12-10 12:00:00 CET" "1998-12-10 13:00:00 CET"
[5] "1998-12-10 14:00:00 CET" "1998-12-10 15:00:00 CET"
[7] "1998-12-10 16:00:00 CET" "1998-12-10 17:00:00 CET"
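
The format string decides how an ambiguous value such as 10/12/1998 is read. A small sketch (the two sample strings and tz = "UTC" are assumptions chosen for illustration, not taken from the original file) shows how the same input converts under day/month and month/day formats, and how a mismatched format yields NA:

```r
# Two illustrative strings; the second has a day value greater than 12.
x <- c("10/12/1998 10:00", "13/12/1998 10:00")

# Read as day/month/year: both values convert.
dmy <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Read as month/day/year: "13" is not a valid month, so the second value is NA.
mdy <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

format(dmy, "%Y-%m-%d")
format(mdy, "%Y-%m-%d")
```

If the real file mixes values like these, part of the date column converts and the rest ends up as NA.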


@jim: It seems that your suggestion is focused on reading data from the terminal. 
Is it possible to apply it to a *.csv file?

@Pikal: Could it be that there is some date conversion error?

Well, your str(MyData) result suggests that the conversion from character to POSIX 
was done correctly (at least partly).

However, the NAs in the date column you posted in your second mail suggest that 
some values in the input are probably formatted differently and are changed to 
NA during POSIX conversion.

You could check which values are problematic if, instead of converting the date 
column to POSIX directly, you add a new column with the converted POSIX values 
to your data.

So read your data from the csv file and convert the date to POSIX, but store it 
in a different column of the data frame.

MyData$date2 <- as.POSIXct(MyData$date, format="%d/%m/%Y %H:%M")

and check which values in your original file are formatted differently.

Something like:
MyData$date[is.na(MyData$date2)]
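
A minimal, self-contained sketch of this check. The inline rows are made up and stand in for obs_prec.csv, the third row is deliberately written in a different format, and tz = "UTC" is an assumption used only to keep the example independent of the local timezone:

```r
# Hypothetical sample data standing in for obs_prec.csv;
# the third row uses a different date format on purpose.
MyData <- read.csv(text = "date,str1,str2,str3
10/12/1998 10:00,0,0,0
10/12/1998 11:00,0.2,0,0
1998-12-10 12:00,0.4,0,0
10/12/1998 13:00,0,0,0", stringsAsFactors = FALSE)

# Keep the original column and store the converted values separately.
MyData$date2 <- as.POSIXct(MyData$date, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Row numbers and raw strings of the values that failed to convert.
bad <- which(is.na(MyData$date2))
bad
MyData$date[bad]
```

Because the original strings are still in MyData$date, the rows that failed can be looked up in the source file and corrected there.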

However, your (very basic) questions suggest that you have only a limited 
understanding of what R objects are and how to check, inspect and manipulate 
them. You would do yourself a big favour by going through the basic 
documentation, as I suggested before.

Cheers
Petr

Thanks again,
Diego


Diego

On 1 August 2018 at 17:01, jim holtman <[email protected]> wrote:

Try this:

> library(lubridate)
> library(tidyverse)
> input <- read.csv(text = "date,str1,str2,str3
+ 10/1/1998 0:00,0.6,0,0
+ 10/1/1998 1:00,0.2,0.2,0.2
+ 10/1/1998 2:00,0.6,0.2,0.4
+ 10/1/1998 3:00,0,0,0.6
+ 10/1/1998 4:00,0,0,0
+ 10/1/1998 5:00,0,0,0
+ 10/1/1998 6:00,0,0,0
+ 10/1/1998 7:00,0.2,0,0", as.is = TRUE)
> # convert the date and add a "day" column to summarize by
> input <- input %>%
+   mutate(date = mdy_hm(date),
+          day = floor_date(date, unit = 'day')
+   )
>
> by_day <- input %>%
+   group_by(day) %>%
+   summarise(m_s1 = mean(str1),
+             m_s2 = mean(str2),
+             m_s3 = mean(str3)
+   )
>
> by_day
# A tibble: 1 x 4
  day                  m_s1   m_s2  m_s3
  <dttm>              <dbl>  <dbl> <dbl>
1 1998-10-01 00:00:00 0.200 0.0500 0.150

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
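
The steps in Jim's example are not tied to inline text; the same approach should work on a file by passing its path to read.csv. A base R sketch follows. It assumes the dates are month/day/year, as Jim's mdy_hm() call does. The temporary file stands in for obs_prec.csv, and tz = "UTC" is an assumption to avoid local timezone effects:

```r
# Write the sample rows to a temporary file that stands in for obs_prec.csv.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("date,str1,str2,str3",
             "10/1/1998 0:00,0.6,0,0",
             "10/1/1998 1:00,0.2,0.2,0.2",
             "10/1/1998 2:00,0.6,0.2,0.4",
             "10/1/1998 3:00,0,0,0.6",
             "10/1/1998 4:00,0,0,0",
             "10/1/1998 5:00,0,0,0",
             "10/1/1998 6:00,0,0,0",
             "10/1/1998 7:00,0.2,0,0"), csv_path)

# Read from the file instead of from text.
input <- read.csv(csv_path, stringsAsFactors = FALSE)

# Convert to POSIXct in a new column, then derive the calendar day.
input$datetime <- as.POSIXct(input$date, format = "%m/%d/%Y %H:%M", tz = "UTC")
input$day <- as.Date(input$datetime)

# Daily mean of each station column.
by_day <- aggregate(cbind(str1, str2, str3) ~ day, data = input, FUN = mean)
by_day
```

With these eight rows the result is a single row for 1998-10-01 with means 0.2, 0.05 and 0.15, matching the tibble above.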


On Tue, Jul 31, 2018 at 11:54 PM Diego Avesani <[email protected]> wrote:
Dear all,
I am sorry, I caused a lot of confusion. I have to relax and start
all over again in order to understand.
If I may, I would like to start again, without mixing strategies, and wait
for your advice.

I really appreciate your help, really.
Here is my new file, a *.csv file (by the way, is it possible to attach it to
the mailing list?)

date,str1,str2,str3
10/1/1998 0:00,0.6,0,0
10/1/1998 1:00,0.2,0.2,0.2
10/1/1998 2:00,0.6,0.2,0.4
10/1/1998 3:00,0,0,0.6
10/1/1998 4:00,0,0,0
10/1/1998 5:00,0,0,0
10/1/1998 6:00,0,0,0
10/1/1998 7:00,0.2,0,0


I read it as:
MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")

At this point I would like to compute the daily mean.
What would you suggest?

Really Really thanks,
You are my lifesaver

Thanks



Diego


On 1 August 2018 at 01:01, Jeff Newmiller <[email protected]> wrote:

> ... and the most common source of NA values in time data is wrong
> timezones. You really need to make sure the timezone that is assumed when
> the character data are converted to POSIXt agrees with the data. In most
> cases the easiest way to ensure this is to use
>
> Sys.setenv(TZ="US/Pacific")
>
> or whatever timezone from
>
> OlsonNames()
>
> corresponds with your data. Execute this setenv function before the
> strptime or as.POSIXct() function call.
>
> You can use
>
> MyData[ is.na(MyData$datetime), ]
>
> to see which records are failing to convert time.
>
> [1] https://github.com/jdnewmil/eci298sp2016/blob/master/QuickHowto1
>
> On July 31, 2018 3:04:05 PM PDT, Jim Lemon <[email protected]> wrote:
> >Hi Diego,
> >I think the error is due to NA values in your data file. If I extend
> >your example and run it, I get no errors:
> >
> >MyData<-read.table(text="103001930 103001580 103001530
> >1998-10-01 00:00:00 0.6 0 0
> >1998-10-01 01:00:00 0.2 0.2 0.2
> >1998-10-01 02:00:00 0.6 0.2 0.4
> >1998-10-01 03:00:00 0 0 0.6
> >1998-10-01 04:00:00 0 0 0
> >1998-10-01 05:00:00 0 0 0
> >1998-10-01 06:00:00 0 0 0
> >1998-10-01 07:00:00 0.2 0 0
> >1998-10-01 08:00:00 0.6 0 0
> >1998-10-01 09:00:00 0.2 0.2 0.2
> >1998-10-01 10:00:00 0.6 0.2 0.4
> >1998-10-01 11:00:00 0 0 0.6
> >1998-10-01 12:00:00 0 0 0
> >1998-10-01 13:00:00 0 0 0
> >1998-10-01 14:00:00 0 0 0
> >1998-10-01 15:00:00 0.2 0 0
> >1998-10-01 16:00:00 0.6 0 0
> >1998-10-01 17:00:00 0.2 0.2 0.2
> >1998-10-01 18:00:00 0.6 0.2 0.4
> >1998-10-01 19:00:00 0 0 0.6
> >1998-10-01 20:00:00 0 0 0
> >1998-10-01 21:00:00 0 0 0
> >1998-10-01 22:00:00 0 0 0
> >1998-10-01 23:00:00 0.2 0 0
> >1998-10-02 00:00:00 0.6 0 0
> >1998-10-02 01:00:00 0.2 0.2 0.2
> >1998-10-02 02:00:00 0.6 0.2 0.4
> >1998-10-02 03:00:00 0 0 0.6
> >1998-10-02 04:00:00 0 0 0
> >1998-10-02 05:00:00 0 0 0
> >1998-10-02 06:00:00 0 0 0
> >1998-10-02 07:00:00 0.2 0 0
> >1998-10-02 08:00:00 0.6 0 0
> >1998-10-02 09:00:00 0.2 0.2 0.2
> >1998-10-02 10:00:00 0.6 0.2 0.4
> >1998-10-02 11:00:00 0 0 0.6
> >1998-10-02 12:00:00 0 0 0
> >1998-10-02 13:00:00 0 0 0
> >1998-10-02 14:00:00 0 0 0
> >1998-10-02 15:00:00 0.2 0 0
> >1998-10-02 16:00:00 0.6 0 0
> >1998-10-02 17:00:00 0.2 0.2 0.2
> >1998-10-02 18:00:00 0.6 0.2 0.4
> >1998-10-02 19:00:00 0 0 0.6
> >1998-10-02 20:00:00 0 0 0
> >1998-10-02 21:00:00 0 0 0
> >1998-10-02 22:00:00 0 0 0
> >1998-10-02 23:00:00 0.2 0 0",
> >skip=1,stringsAsFactors=FALSE)
> >names(MyData)<-c("date","time","st1","st2","st3")
> >MyData$datetime<-strptime(paste(MyData$date,MyData$time),
> > format="%Y-%m-%d %H:%M:%S")
> >MyData$datetime
> >st1_daily<-by(MyData$st1,MyData$date,mean)
> >st2_daily<-by(MyData$st2,MyData$date,mean)
> >st3_daily<-by(MyData$st3,MyData$date,mean)
> >st1_daily
> >st2_daily
> >st3_daily
> >
> >Try adding na.rm=TRUE to the "by" calls:
> >
> >st1_daily<-by(MyData$st1,MyData$date,mean,na.rm=TRUE)
> >st2_daily<-by(MyData$st2,MyData$date,mean,na.rm=TRUE)
> >st3_daily<-by(MyData$st3,MyData$date,mean,na.rm=TRUE)
> >
> >Jim
> >
> >On Tue, Jul 31, 2018 at 11:11 PM, Diego Avesani <[email protected]> wrote:
> >> Dear all,
> >>
> >> I still have a problem with dates.
> >> Could you please tell me how to use POSIXct?
> >> I have found this command:
> >> timeAverage, but I am not able to convert MyDate to a proper date.
> >>
> >> Thanks a lot.
> >> I hope I am not bothering you too much.
> >>
> >>
> >> Diego
> >>
> >>
> >> On 31 July 2018 at 11:12, Diego Avesani <[email protected]> wrote:
> >>>
> >>> Dear Jim, Dear all,
> >>>
> >>> thanks a lot.
> >>>
> >>> Unfortunately, I get the following error:
> >>>
> >>>
> >>>  st1_daily<-by(MyData$st1,MyData$date,mean)
> >>> Error in tapply(seq_len(0L), list(`MyData$date` = c(913L, 914L, 925L,  :
> >>>   arguments must have same length
> >>>
> >>>
> >>> This is particularly strange. Indeed, if I apply
> >>>
> >>>
> >>> mean(MyData$str1,na.rm=TRUE)
> >>>
> >>>
> >>> it works
> >>>
> >>>
> >>> Sorry, I have to learn a lot.
> >>> You are really boosting me
> >>>
> >>> Diego
> >>>
> >>>
> >>> On 31 July 2018 at 11:02, Jim Lemon <[email protected]> wrote:
> >>>>
> >>>> Hi Diego,
> >>>> One way you can get daily means is:
> >>>>
> >>>> st1_daily<-by(MyData$st1,MyData$date,mean)
> >>>> st2_daily<-by(MyData$st2,MyData$date,mean)
> >>>> st3_daily<-by(MyData$st3,MyData$date,mean)
> >>>>
> >>>> Jim
> >>>>
> >>>> On Tue, Jul 31, 2018 at 6:51 PM, Diego Avesani <[email protected]> wrote:
> >>>> > Dear all,
> >>>> > I have found the error, my fault. Sorry.
> >>>> > There was an extra comma in the header line.
> >>>> > Thanks again.
> >>>> >
> >>>> > If I may, I would like to ask another question about the imported
> >>>> > data.
> >>>> > I would like to compute the daily average of the data. Basically I
> >>>> > have hourly data, and I would like to have their daily mean.
> >>>> >
> >>>> > Is there a special command for this?
> >>>> >
> >>>> > Thanks a lot.
> >>>> >
> >>>> >
> >>>> > Diego
> >>>> >
> >>>> >
> >>>> > On 31 July 2018 at 10:40, Diego Avesani <[email protected]> wrote:
> >>>> >>
> >>>> >> Dear all,
> >>>> >> I moved to a csv file because the data were originally in a csv file.
> >>>> >> In addition, since, as you told me, read.csv is a special case of
> >>>> >> read.table, I prefer to start learning from the simplest one.
> >>>> >> After that, I will also try the *.txt format.
> >>>> >>
> >>>> >> With read.csv, something strange happened.
> >>>> >>
> >>>> >> This is now the file:
> >>>> >>
> >>>> >> date,st1,st2,st3,
> >>>> >> 10/1/1998 0:00,0.6,0,0
> >>>> >> 10/1/1998 1:00,0.2,0.2,0.2
> >>>> >> 10/1/1998 2:00,0.6,0.2,0.4
> >>>> >> 10/1/1998 3:00,0,0,0.6
> >>>> >> 10/1/1998 4:00,0,0,0
> >>>> >> 10/1/1998 5:00,0,0,0
> >>>> >> 10/1/1998 6:00,0,0,0
> >>>> >> 10/1/1998 7:00,0.2,0,0
> >>>> >> 10/1/1998 8:00,0.6,0.2,0
> >>>> >> 10/1/1998 9:00,0.2,0.4,0.4
> >>>> >> 10/1/1998 10:00,0,0.4,0.2
> >>>> >>
> >>>> >> When I apply:
> >>>> >> MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
> >>>> >>
> >>>> >> this is the result:
> >>>> >>
> >>>> >> 10/1/1998 0:00    0.6    0.00    0.0 NA
> >>>> >> 2        10/1/1998 1:00    0.2    0.20    0.2 NA
> >>>> >> 3        10/1/1998 2:00    0.6    0.20    0.4 NA
> >>>> >> 4        10/1/1998 3:00    0.0    0.00    0.6 NA
> >>>> >> 5        10/1/1998 4:00    0.0    0.00    0.0 NA
> >>>> >> 6        10/1/1998 5:00    0.0    0.00    0.0 NA
> >>>> >> 7        10/1/1998 6:00    0.0    0.00    0.0 NA
> >>>> >> 8        10/1/1998 7:00    0.2    0.00    0.0 NA
> >>>> >>
> >>>> >> I do not understand why.
> >>>> >> Something wrong with date?
> >>>> >>
> >>>> >> really really thanks,
> >>>> >> I appreciate a lot all your helps.
> >>>> >>
> >>>> >> Diego
> >>>> >>
> >>>> >>
> >>>> >> Diego
> >>>> >>
> >>>> >>
> >>>> >> On 31 July 2018 at 01:25, MacQueen, Don <[email protected]> wrote:
> >>>> >>>
> >>>> >>> Or, without removing the first line
> >>>> >>>   dadf <- read.table("xxx.txt", stringsAsFactors=FALSE, skip=1)
> >>>> >>>
> >>>> >>> Another alternative,
> >>>> >>>    dadf$datetime <- as.POSIXct(paste(dadf$V1,dadf$V2))
> >>>> >>> since the dates appear to be in the default format.
> >>>> >>> (I generally prefer to work with datetimes in POSIXct class rather
> >>>> >>> than POSIXlt class)
> >>>> >>>
> >>>> >>> -Don
> >>>> >>>
> >>>> >>> --
> >>>> >>> Don MacQueen
> >>>> >>> Lawrence Livermore National Laboratory
> >>>> >>> 7000 East Ave., L-627
> >>>> >>> Livermore, CA 94550
> >>>> >>> 925-423-1062
> >>>> >>> Lab cell 925-724-7509
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>> On 7/30/18, 4:03 PM, "R-help on behalf of Jim Lemon"
> >>>> >>> <[email protected] on behalf of [email protected]> wrote:
> >>>> >>>
> >>>> >>>     Hi Diego,
> >>>> >>>     You may have to do some conversion as you have three fields in the
> >>>> >>>     first line using the default space separator and five fields in
> >>>> >>>     subsequent lines. If the first line doesn't contain any important data
> >>>> >>>     you can just delete it or replace it with a meaningful header line
> >>>> >>>     with five fields and save the file under another name.
> >>>> >>>
> >>>> >>>     It looks as though you have date-time as two fields. If so, you can
> >>>> >>>     just read the first field if you only want the date:
> >>>> >>>
> >>>> >>>     # assume you have removed the first line
> >>>> >>>     dadf<-read.table("xxx.txt",stringsAsFactors=FALSE)
> >>>> >>>     dadf$date<-as.Date(dadf$V1,format="%Y-%m-%d")
> >>>> >>>
> >>>> >>>     If you want the date/time:
> >>>> >>>
> >>>> >>>
> >>>> >>>     dadf$datetime<-strptime(paste(dadf$V1,dadf$V2),format="%Y-%m-%d %H:%M:%S")
> >>>> >>>
> >>>> >>>     Jim
> >>>> >>>
> >>>> >>>     On Tue, Jul 31, 2018 at 12:29 AM, Diego Avesani <[email protected]> wrote:
> >>>> >>>     > Dear all,
> >>>> >>>     >
> >>>> >>>     > I am dealing with the reading of a *.txt file.
> >>>> >>>     > The txt file has the following shape:
> >>>> >>>     >
> >>>> >>>     > 103001930 103001580 103001530
> >>>> >>>     > 1998-10-01 00:00:00 0.6 0 0
> >>>> >>>     > 1998-10-01 01:00:00 0.2 0.2 0.2
> >>>> >>>     > 1998-10-01 02:00:00 0.6 0.2 0.4
> >>>> >>>     > 1998-10-01 03:00:00 0 0 0.6
> >>>> >>>     > 1998-10-01 04:00:00 0 0 0
> >>>> >>>     > 1998-10-01 05:00:00 0 0 0
> >>>> >>>     > 1998-10-01 06:00:00 0 0 0
> >>>> >>>     > 1998-10-01 07:00:00 0.2 0 0
> >>>> >>>     >
> >>>> >>>     > If possible, I have a couple of questions, which may sound stupid, but
> >>>> >>>     > they are important to me in order to understand how R deals with
> >>>> >>>     > files and dates.
> >>>> >>>     >
> >>>> >>>     > 1) Do I have to convert it to a *.csv file?
> >>>> >>>     > 2) Can I deal with spaces rather than ","?
> >>>> >>>     > 3) How can I read dates?
> >>>> >>>     >
> >>>> >>>     > thanks a lot to all of you,
> >>>> >>>     > Thanks
> >>>> >>>     >
> >>>> >>>     >
> >>>> >>>     > Diego
> >>>> >>>     >
> >>>> >>>     >         [[alternative HTML version deleted]]
> >>>> >>>     >
> >>>> >>>     > ______________________________________________
> >>>> >>>     > [email protected] mailing list -- To UNSUBSCRIBE and more, see
> >>>> >>>     > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> >>>     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> >>>     > and provide commented, minimal, self-contained, reproducible code.
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>
> --
> Sent from my phone. Please excuse my brevity.
>



