Re: [Tutor] Collating date data from a csv file
Something else to have a look at, but it might have to wait for the longer winter nights :-) On 14/05/2019 06:20, Ben Hancock via Tutor wrote: On Sun, May 12, 2019 at 07:17:53PM +0100, Alan Gauld via Tutor wrote: On 12/05/2019 11:27, Dave Hill wrote: I found out by accident that the Megger PAT 420 data backup is actually an SQLite database, so that is my route for access. Having played with Python on Raspberry Pi's, I thought I would explore Python for data processing, and now, I have a set of programs which extract SQLite, to 'csv' then collate/process this data and produce a multi-page ODS spreadsheet document, which lists tests by location. I also have an application which extracts appliances requiring testing within a +/- 30 day window, so I have a target for testing. You could do it all in native SQLite SQL of course. You can tell sqlite to output its results in Excel CSV format and to a file rather than (or in addition to) stdout. So you can write a standard query and have it generate your Excel readable file directly. You can then automate that as a batch job in the OS... Assuming you run the same reports regularly. Much as I love Python sometimes the native tools are even better... I wonder if `pandas` would also be a helpful library for what you're describing? It has a `pandas.read_sql` function to read data from a SQL database into a pandas dataframe, which you then can manipulate: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html Good luck! -- Ben Hancock https://elwha1.github.io ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
On Sun, May 12, 2019 at 07:17:53PM +0100, Alan Gauld via Tutor wrote: On 12/05/2019 11:27, Dave Hill wrote: I found out by accident that the Megger PAT 420 data backup is actually an SQLite database, so that is my route for access. Having played with Python on Raspberry Pi's, I thought I would explore Python for data processing, and now, I have a set of programs which extract SQLite, to 'csv' then collate/process this data and produce a multi-page ODS spreadsheet document, which lists tests by location. I also have an application which extracts appliances requiring testing within a +/- 30 day window, so I have a target for testing. You could do it all in native SQLite SQL of course. You can tell sqlite to output its results in Excel CSV format and to a file rather than (or in addition to) stdout. So you can write a standard query and have it generate your Excel readable file directly. You can then automate that as a batch job in the OS... Assuming you run the same reports regularly. Much as I love Python sometimes the native tools are even better... I wonder if `pandas` would also be a helpful library for what you're describing? It has a `pandas.read_sql` function to read data from a SQL database into a pandas dataframe, which you then can manipulate: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html Good luck! -- Ben Hancock https://elwha1.github.io ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
On 12/05/2019 11:27, Dave Hill wrote: > I found out by accident that the Megger PAT 420 data backup is actually > an SQLite database, so that is my route for access. Having played with > Python on Raspberry Pi's, I thought I would explore Python for data > processing, and now, I have a set of programs which extract SQLite, to > 'csv' then collate/process this data and produce a multi-page ODS > spreadsheet document, which lists tests by location. I also have an > application which extracts appliances requiring testing within a +/- 30 > day window, so I have a target for testing. You could do it all in native SQLite SQL of course. You can tell sqlite to output its results in Excel CSV format and to a file rather than (or in addition to) stdout. So you can write a standard query and have it generate your Excel readable file directly. You can then automate that as a batch job in the OS... Assuming you run the same reports regularly. Much as I love Python sometimes the native tools are even better... -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
Hi David, Firstly, thank you for your reply. One condition of my agreeing to undertake PAT was that I got a PAT machine that stored the data, as I am better at programming than paperwork! I have a Megger PAT 420 which provides a data backup to a USB stick, and thence transfer to my laptop, which is network connected a dual HDD system (Buffalo), which automatically copies the main HDD to the slave HDD. I found out by accident that the Megger PAT 420 data backup is actually an SQLite database, so that is my route for access. Having played with Python on Raspberry Pi's, I thought I would explore Python for data processing, and now, I have a set of programs which extract SQLite, to 'csv' then collate/process this data and produce a multi-page ODS spreadsheet document, which lists tests by location. I also have an application which extracts appliances requiring testing within a +/- 30 day window, so I have a target for testing. My biggest problem is locating kit, and keeping up with removals/disposals and new acquisitions, but the guys are getting a bit better at communication! I thought it would be useful to the 'management' to have a handle on progress, and stats on tested/new/disposed, etc, hence the latest question. I convert the datetimestamp to Gregorian ordinal date, as this is easier to use in accessing the resulting 'defaultdict', do the counting and then convert the date back to something comprehensible for writing to an ODS spreadsheet. Having seen todays posts I am going to look at wxPython, as a front-end (and possibly display?) Thank you for your consideration Dave On 12/05/2019 04:20, David L Neil wrote: Hi Dave, I also volunteer to do PAT safety testing during my "20% time". Clambering around Snowdonia as a boy, I eschewed* the Rheilffordd yr Wyddfa/SMR in favor of shanks' pony... * OK, I was made to...! For the good of my soul??? On 9/05/19 8:04 AM, Dave Hill wrote: I have a csv file which details the results of equipment tests, I carry out PAT testing as a volunteer at a heriatge railway in N. Wales. I want to extract how many items were tested on each test day. So far I have generated a List of test dates, but I am now stalled at how to efficiently count numbers tested on each date. Can I have a list of tuples, where one item is the date and the second the count? or is there a better construct? Thanks in advance, Dave For completeness, I have listed below an extract from a target file, where the 10 digit number is the UNIX timestamp 182 1515001232 Toaster 13 2000 1 183 1515001259 Contact Grill 13 2000 1 245 1515001367 3G Cube Adaptor 13 0 1 246 1515001396 13A IEC Lead 5 0 1 248 1515001415 Worktop Light 3 30 1 420 1515001440 Fly killer 0 0 1 424 1515001461 Dairy fridge 13 0 1 427 1513277293 Fire 13 0 1 429 1515001489 Toaster Avanti 13 0 1 When you say "target file", is this coming off the tester via a link cable to your PC, or are you capturing by hand to a spreadsheet? A tactic which many people 'miss' is that a workbook may contain multiple spreadsheets, and that the data on one spreadsheet may be auto-magically 'copied' onto another. Thus if the above is data coming off the PAT into one spreadsheet, I would immediately create a more meaningful sheet, 'for human consumption', which has column headings and converts (re-formats) the timestamp into a readable date (as suggested elsewhere), but is otherwise pretty-much a direct copy. We now have a sheet used for data capture/computer processing and something separate (and prettier) as a report/presentation for people. From the spec, above, we are only interested in the date. Remember that considering the whole timestamp only makes life confusing. So convert them (only) to dates. These can be strings because Python compares strings as easily as dates! The time component could be retained if sequence (of testing) might be important. The sad reality is that a daily count could be accomplished in either LO-Writer or MS-Excel. No reason why you shouldn't use Python though. (Assuming that the data appears in (forward or reverse) date sequence) Read-in the data sheet/CSV file, row-by-row, taking note of the date of the first data-entry, and starting to count from one. Then increment for each row where the date matches. When the dates don't match, report, reset the counter, and note the new date. How will you lay-out and present this report? Another spreadsheet? Screen? Paper? When you say "count numbers tested on each date", the above method will let you know a (single) daily total of tests-performed. Did you (also) mean that you want to track how many of tests were performed within categories of devices, eg how many toasters on the one day? In which case, furth
Re: [Tutor] Collating date data from a csv file
Hi Dave, I also volunteer to do PAT safety testing during my "20% time". Clambering around Snowdonia as a boy, I eschewed* the Rheilffordd yr Wyddfa/SMR in favor of shanks' pony... * OK, I was made to...! For the good of my soul??? On 9/05/19 8:04 AM, Dave Hill wrote: I have a csv file which details the results of equipment tests, I carry out PAT testing as a volunteer at a heriatge railway in N. Wales. I want to extract how many items were tested on each test day. So far I have generated a List of test dates, but I am now stalled at how to efficiently count numbers tested on each date. Can I have a list of tuples, where one item is the date and the second the count? or is there a better construct? Thanks in advance, Dave For completeness, I have listed below an extract from a target file, where the 10 digit number is the UNIX timestamp 182 1515001232 Toaster 13 2000 1 183 1515001259 Contact Grill 13 2000 1 245 1515001367 3G Cube Adaptor 13 0 1 246 1515001396 13A IEC Lead 5 0 1 248 1515001415 Worktop Light 3 30 1 420 1515001440 Fly killer 0 0 1 424 1515001461 Dairy fridge 13 0 1 427 1513277293 Fire 13 0 1 429 1515001489 Toaster Avanti 13 0 1 When you say "target file", is this coming off the tester via a link cable to your PC, or are you capturing by hand to a spreadsheet? A tactic which many people 'miss' is that a workbook may contain multiple spreadsheets, and that the data on one spreadsheet may be auto-magically 'copied' onto another. Thus if the above is data coming off the PAT into one spreadsheet, I would immediately create a more meaningful sheet, 'for human consumption', which has column headings and converts (re-formats) the timestamp into a readable date (as suggested elsewhere), but is otherwise pretty-much a direct copy. We now have a sheet used for data capture/computer processing and something separate (and prettier) as a report/presentation for people. From the spec, above, we are only interested in the date. Remember that considering the whole timestamp only makes life confusing. So convert them (only) to dates. These can be strings because Python compares strings as easily as dates! The time component could be retained if sequence (of testing) might be important. The sad reality is that a daily count could be accomplished in either LO-Writer or MS-Excel. No reason why you shouldn't use Python though. (Assuming that the data appears in (forward or reverse) date sequence) Read-in the data sheet/CSV file, row-by-row, taking note of the date of the first data-entry, and starting to count from one. Then increment for each row where the date matches. When the dates don't match, report, reset the counter, and note the new date. How will you lay-out and present this report? Another spreadsheet? Screen? Paper? When you say "count numbers tested on each date", the above method will let you know a (single) daily total of tests-performed. Did you (also) mean that you want to track how many of tests were performed within categories of devices, eg how many toasters on the one day? In which case, further design consideration is required, eg which devices fit into which category and how to match "Toaster" with "Toaster Avanti"... -- Regards =dn ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
On 09/05/2019 14:06, Dave Hill wrote: > Thank you, I now have > > defaultdict(, {736697: 10, 736677: 14, 736980: 9, 737109: > 50, 736919: 15, 736652: 19, 736502: 14, 736710: 2, 736674: 6, 736672: 5, > 736933: 2, 736932: 6, 736658: 7, 736671: 5, 736499: 6, 736707: 4, > 737181: 4, 736686: 2, ... > > where the first number is the proleptic Gregorian ordinal of the date > 8-) ( I had to look up what proleptic meant) > > This means I can access the elements by the ordinal of the date, for > later processing, and extraction to a spreadsheet It might be easier to use a DateTime object instead of the ordinal. It might make printing the results easier later on. You'll find it in the datetime module... -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
Thank you, I now have defaultdict(, {736697: 10, 736677: 14, 736980: 9, 737109: 50, 736919: 15, 736652: 19, 736502: 14, 736710: 2, 736674: 6, 736672: 5, 736933: 2, 736932: 6, 736658: 7, 736671: 5, 736499: 6, 736707: 4, 737181: 4, 736686: 2, ... where the first number is the proleptic Gregorian ordinal of the date 8-) ( I had to look up what proleptic meant) This means I can access the elements by the ordinal of the date, for later processing, and extraction to a spreadsheet Dave On 09/05/2019 04:08, Cameron Simpson wrote: On 08May2019 21:04, Dave Hill wrote: I have a csv file which details the results of equipment tests, I carry out PAT testing as a volunteer at a heriatge railway in N. Wales. I want to extract how many items were tested on each test day. So far I have generated a List of test dates, but I am now stalled at how to efficiently count numbers tested on each date. Can I have a list of tuples, where one item is the date and the second the count? Not as such, because you can't modify a tuple (so you can't update the count part). But you could use a 2 element list. or is there a better construct? Oh definitely. The easiest thing would be a defaultdict(int). Example: from collections import defaultdict ... by_date = defaultdict(int) for row in csvdata: timestamp = row[1] # based on your example data # get the date from the timestamp date = ... by_date[date] += 1 A defaultdict is a dict which magicly makes missing elements when they get access, using a factory function you supply. Here we're using "int" as that factory, as int() returns zero. I presume you've got the timestamp => date conversion sorted? Cheers, Cameron Simpson ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
Cameron Simpson wrote: > On 08May2019 21:04, Dave Hill wrote: >>I have a csv file which details the results of equipment tests, I >>carry out PAT testing as a volunteer at a heriatge railway in N. >>Wales. I want to extract how many items were tested on each test day. >>So far I have generated a List of test dates, but I am now stalled at >>how to efficiently count numbers tested on each date. >> >>Can I have a list of tuples, where one item is the date and the second >>the count? > > Not as such, because you can't modify a tuple (so you can't update the > count part). But you could use a 2 element list. > >>or is there a better construct? > > Oh definitely. The easiest thing would be a defaultdict(int). Example: > > from collections import defaultdict > ... > by_date = defaultdict(int) > for row in csvdata: > timestamp = row[1] # based on your example data > # get the date from the timestamp > date = ... > by_date[date] += 1 > > A defaultdict is a dict which magicly makes missing elements when they > get access, using a factory function you supply. Here we're using "int" > as that factory, as int() returns zero. While this is easily adaptable if you want to keep more data... by_date = defaultdict(list) # rows grouped by date for row in csvdata: date = ... by_date[date].append(row) ... for the simple case there is also collections.Counter: def get_date(row): return datetime.datetime.fromtimestamp(int(row[1])).date() by_date = collections.Counter(map(get_date, csvdata)) # (date, freq) pairs ordered by frequency: print(by_date.most_common()) > > I presume you've got the timestamp => date conversion sorted? > > Cheers, > Cameron Simpson > ___ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > https://mail.python.org/mailman/listinfo/tutor ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
On Wed, May 8, 2019 at 10:29 PM boB Stepp wrote: > > On Wed, May 8, 2019 at 10:09 PM Cameron Simpson wrote: > > > > A defaultdict is a dict which magicly makes missing elements when they > > get access, using a factory function you supply. Here we're using "int" > > as that factory, as int() returns zero. > > Is int() guaranteed to always return zero as Python versions progress? > More importantly, perhaps, where would I go to look to find the > answer to this question myself? It must be time to go to bed! Somehow my eyes missed the int() return behavior at https://docs.python.org/3/library/functions.html#int my first time through. Sorry for the noise! > boB -- boB ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
On Wed, May 8, 2019 at 10:09 PM Cameron Simpson wrote: > > A defaultdict is a dict which magicly makes missing elements when they > get access, using a factory function you supply. Here we're using "int" > as that factory, as int() returns zero. Is int() guaranteed to always return zero as Python versions progress? More importantly, perhaps, where would I go to look to find the answer to this question myself? -- boB ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Collating date data from a csv file
On 08May2019 21:04, Dave Hill wrote: I have a csv file which details the results of equipment tests, I carry out PAT testing as a volunteer at a heriatge railway in N. Wales. I want to extract how many items were tested on each test day. So far I have generated a List of test dates, but I am now stalled at how to efficiently count numbers tested on each date. Can I have a list of tuples, where one item is the date and the second the count? Not as such, because you can't modify a tuple (so you can't update the count part). But you could use a 2 element list. or is there a better construct? Oh definitely. The easiest thing would be a defaultdict(int). Example: from collections import defaultdict ... by_date = defaultdict(int) for row in csvdata: timestamp = row[1] # based on your example data # get the date from the timestamp date = ... by_date[date] += 1 A defaultdict is a dict which magicly makes missing elements when they get access, using a factory function you supply. Here we're using "int" as that factory, as int() returns zero. I presume you've got the timestamp => date conversion sorted? Cheers, Cameron Simpson ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor