Silent data corruption in pandas, was Re: Python read text file columnwise

Peter Otten Sat, 12 Jan 2019 02:16:41 -0800

Peter Otten wrote:

> [email protected] wrote:
> 
>> Hello
>>
>> I'm very new in python. I have a file in the format:
>>
>> 2018-05-31  16:00:00        28.90   81.77   4.3
>> 2018-05-31  20:32:00        28.17   84.89   4.1
>> 2018-06-20  04:09:00        27.36   88.01   4.8
>> 2018-06-20  04:15:00        27.31   87.09   4.7
>> 2018-06-28  04.07:00        27.87   84.91   5.0
>> 2018-06-29  00.42:00        32.20   104.61  4.8
>> 
>> I would like to read this file in python column-wise.
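
A minimal plain-Python sketch of reading the file column-wise follows. It assumes every row has the five whitespace-separated fields shown above. `read_columns` and `parse_timestamp` are hypothetical helper names, and the inline `sample` string stands in for the real file. The timestamp is parsed with an explicit format, so the malformed "04.07:00"-style entries raise ValueError instead of being guessed at:

```python
import datetime


def read_columns(lines):
    """Split whitespace-separated rows into one list per column.

    Assumes every non-blank row has the same five fields as the sample:
    date, time, and three numeric values.
    """
    rows = [line.split() for line in lines if line.strip()]
    dates, times, foo, bar, baz = zip(*rows)  # transpose rows -> columns
    return (
        list(dates),
        list(times),
        [float(x) for x in foo],
        [float(x) for x in bar],
        [float(x) for x in baz],
    )


def parse_timestamp(date, time):
    """Parse with an explicit format; strptime raises ValueError on any
    value that does not match instead of silently truncating it."""
    return datetime.datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")


sample = """\
2018-05-31  16:00:00        28.90   81.77   4.3
2018-06-28  04.07:00        27.87   84.91   5.0
"""

dates, times, foo, bar, baz = read_columns(sample.splitlines())
print(foo)
for date, time in zip(dates, times):
    try:
        print(parse_timestamp(date, time))
    except ValueError as exc:
        print("bad row:", date, time, "->", exc)
```

For the real file you would pass the open file object instead, e.g. `with open("seismicity_R023E.txt") as f: columns = read_columns(f)`.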


> However, in the long term you may be better off with a tool like pandas:
> 
>>>> import pandas
>>>> pandas.read_table(
> ... "seismicity_R023E.txt", sep=r"\s+",
> ... names=["date", "time", "foo", "bar", "baz"],
> ... parse_dates=[["date", "time"]]
> ... )
>             date_time    foo     bar  baz
> 0 2018-05-31 16:00:00  28.90   81.77  4.3
> 1 2018-05-31 20:32:00  28.17   84.89  4.1
> 2 2018-06-20 04:09:00  27.36   88.01  4.8
> 3 2018-06-20 04:15:00  27.31   87.09  4.7
> 4 2018-06-28 04:00:00  27.87   84.91  5.0
> 5 2018-06-29 00:00:00  32.20  104.61  4.8
> 
> [6 rows x 4 columns]
>>>>
> 
> It will be harder in the beginning, but if you work with tabular data
> regularly it will pay off.

After posting the above I noticed that the malformed times in the last two rows
("04.07:00" and "00.42:00") were silently botched: they came out as 04:00:00
and 00:00:00 without any error. So I just spent an insane amount of time trying
to fix this from within pandas:

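import datetime

import numpy
import pandas


def parse_datetime(dt):
    # Repair the "04.07:00"-style typos, then parse with an explicit
    # format; strptime raises ValueError on anything that still doesn't match.
    return datetime.datetime.strptime(
        dt.replace(".", ":"), "%Y-%m-%d %H:%M:%S"
    )


def date_parser(dates, times):
    # pandas passes the "date" and "time" columns as separate arrays;
    # combine them row by row.
    return numpy.array([
        parse_datetime(date + " " + time)
        for date, time in zip(dates, times)
    ])


df = pandas.read_table(
    "seismicity_R023E.txt", sep=r"\s+",
    names=["date", "time", "foo", "bar", "baz"],
    parse_dates=[["date", "time"]], date_parser=date_parser
)


print(df)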

There's probably a better way, as I am only a determined amateur...
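
One possibly simpler route, sketched here under the assumption of pandas 1.x or later, is to read date and time as plain strings, repair the known typo, and convert with `pandas.to_datetime` and an explicit `format`. That conversion raises ValueError on a mismatch rather than guessing. In pandas 2.x it also sidesteps `date_parser` (deprecated in 2.0) and combining columns through nested `parse_dates` lists (deprecated in 2.2). The inline `SAMPLE` string just stands in for seismicity_R023E.txt:

```python
import io

import pandas

# Two rows copied from the sample; the second has the malformed "04.07:00".
SAMPLE = """\
2018-05-31  16:00:00        28.90   81.77   4.3
2018-06-28  04.07:00        27.87   84.91   5.0
"""

df = pandas.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    names=["date", "time", "foo", "bar", "baz"],
    dtype={"date": str, "time": str},  # keep the raw text, no date guessing
)

# Normalize the known typo ("." instead of ":"), then parse with an explicit
# format. With the default errors="raise", any other malformed value raises
# ValueError instead of being silently truncated.
stamp = df["date"] + " " + df["time"].str.replace(".", ":", regex=False)
df.insert(0, "date_time", pandas.to_datetime(stamp, format="%Y-%m-%d %H:%M:%S"))
df = df.drop(columns=["date", "time"])
print(df)
```

Doing the repair as an explicit string step keeps the "what counts as a typo" rule visible, and anything the rule doesn't cover still fails loudly.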

-- 
https://mail.python.org/mailman/listinfo/python-list
