Re: [PLUG] Correcting duplicate strings in files

2018-06-20 Thread Rich Shepard
On Tue, 19 Jun 2018, david wrote:

> cat $file | uniq -u > $outfile

David,

The above prints only the unique lines. I, too, have used this after grep to remove duplicates. With the data referenced in this thread uniq will not do the job because either each line is unique as a whole (same
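A minimal sketch of why `uniq` cannot fix this data: it filters whole duplicate adjacent lines rather than editing one of them (the sample lines below are invented to match the thread's format):

```shell
# Two identical adjacent lines, like the duplicated 16:00 hour in the data.
printf '2012-10-01,16:00,297.94\n2012-10-01,16:00,297.94\n2012-10-01,18:00,297.10\n' > dup.txt

# uniq collapses the pair to a single line...
uniq dup.txt
# ...while uniq -u drops BOTH copies, losing an hour entirely.
uniq -u dup.txt
```

Either way, one record is lost; what the thread needs is for the second copy to be rewritten, not removed.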

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread david
On 06/19/2018 06:02 PM, Rich Shepard wrote:
> On Tue, 19 Jun 2018, david wrote:
>> While I believe the answer has already been found, would the 'uniq' command have been useful as an alternative?
>
> david,
>
> Good question. Can it find a difference in a specific field and change only one of them?

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, david wrote:

> While I believe the answer has already been found, would the 'uniq' command have been useful as an alternative?

david,

Good question. Can it find a difference in a specific field and change only one of them? Perhaps, but I've no idea.

Thanks,
Rich
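To answer the question asked here: uniq only filters whole lines, it cannot change one field of one duplicate. A hedged sketch of how standard tools could at least *find* the duplicated hour field (the file name and values are invented):

```shell
printf '2012-10-01,15:00,297.50\n2012-10-01,16:00,297.94\n2012-10-01,16:00,297.94\n' > test.dat

# Pull out field 2 (the hour) and ask uniq for values repeated on adjacent lines.
cut -d, -f2 test.dat | uniq -d
```

This reports the repeated `16:00`, but rewriting one of the two records still needs a tool that can edit, such as awk.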

Re: [PLUG] Correcting duplicate strings in files [RESOLVED]

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, Robert Citek wrote:

> Awk is a very nice "little" language. Glad to hear it worked. And thanks for letting us know.

Robert,

I do a lot of environmental data munging/wrangling/ETL. These come to me as .xml spreadsheets or the equivalent of line printer output sent as PDF

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, Carl Karsten wrote:

> It could be done with transistors if you spend enough time ;)

Carl,

Microprocessors.

> I would add some code that verifies assumptions, like are the dates always the same, is it just the 1700 are 1600?

Those are hours on the 24-hour clock: 16:00

Re: [PLUG] Correcting duplicate strings in files [RESOLVED]

2018-06-19 Thread Robert Citek
Awk is a very nice "little" language. Glad to hear it worked. And thanks for letting us know.

- Robert

On Tue, Jun 19, 2018 at 4:58 PM, Rich Shepard wrote:
> On Tue, 19 Jun 2018, Robert Citek wrote:
>> $2 != "16.00" { print ; next } <= the decimal should be a colon, 16:00 vs 16.00

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Carl Karsten
It could be done with transistors if you spend enough time ;)

I would add some code that verifies assumptions, like are the dates always the same, is it just the 1700 are 1600?

Anyway, assuming all our descriptions and assumptions are correct, and the file starts at 2012-10-01,14:00:

import csv
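Carl's Python script is truncated in the archive. As a hedged sketch of the assumption-checking idea in awk (the language the rest of the thread uses), one could verify that every date,hour pair occurs exactly once and flag the duplicated hour; the file name and values below are invented:

```shell
printf '2012-10-01,15:00,297.50\n2012-10-01,16:00,297.94\n2012-10-01,16:00,297.94\n2012-10-01,18:00,297.10\n' > test.dat

# Count each date,hour key and report any that occur more than once.
awk -F, '{ count[$1 "," $2]++ }
         END { for (k in count) if (count[k] > 1) print k, "occurs", count[k], "times" }' test.dat
```

Running such a check first confirms the "exactly one duplicated 16:00" assumption before any correction is applied.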

Re: [PLUG] Correcting duplicate strings in files [RESOLVED]

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, Robert Citek wrote:

> $2 != "16.00" { print ; next } <= the decimal should be a colon, 16:00 vs 16.00

Robert,

Oy! Too often we see what we expect to see, not what's actually there. I had that in a FORTRAN IV program in the early 1970s.

flag == 1 && $2 == "16:00" {
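The other typo Robert flagged, `==` vs `=`, is easy to demonstrate: `$2=="17:00"` merely compares and discards the result, while `$2="17:00"` actually rewrites the field (and rebuilds the record using OFS). A minimal illustration with an invented record:

```shell
echo '2012-10-01,16:00,297.94' > one.dat

# Comparison: the expression is evaluated and thrown away; field 2 is unchanged.
awk 'BEGIN{OFS=FS=","} { $2=="17:00"; print }' one.dat
# Assignment: field 2 is replaced and the line is reassembled with OFS.
awk 'BEGIN{OFS=FS=","} { $2="17:00"; print }' one.dat
```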

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Carl Karsten
On Tue, Jun 19, 2018 at 2:29 PM, Rich Shepard wrote:
> On Tue, 19 Jun 2018, Robert Citek wrote:
>> I don't fully understand your question, but here are some examples that may be a step in the right direction:
>
> Robert,
>
> I did not provide as complete an explanation as I should have.

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Robert Citek
$2 != "16.00" { print ; next } <= the decimal should be a colon, 16:00 vs 16.00
flag == 1 && $2 == "16:00" { $2=="17:00"; print; flag=0 ; next } <= equality should be assignment, $2= vs $2==

Here's a refactored version that you can put in a file:

BEGIN {OFS=FS=","} ; flag == 1 && $2 == "16:00"
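Robert's refactored version is truncated in the archive. A hedged reconstruction, assembled from the rules quoted elsewhere in the thread with both typos fixed (the script name matches the thread; the test data is invented), runnable with gawk or any POSIX awk:

```shell
cat > correct-double-hour.awk <<'EOF'
# Rewrite the second of two consecutive 16:00 records as 17:00.
BEGIN { OFS = FS = "," }
$2 != "16:00"              { print; next }
flag == 0 && $2 == "16:00" { print; flag = 1; next }
flag == 1 && $2 == "16:00" { $2 = "17:00"; print; flag = 0; next }
EOF

printf '2012-10-01,15:00,297.50\n2012-10-01,16:00,297.94\n2012-10-01,16:00,297.94\n' > test.dat
awk -f correct-double-hour.awk test.dat
```

The first 16:00 line passes through and sets the flag; the second is rewritten to 17:00 and clears it, so later days with their own 16:00 pairs are handled the same way.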

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, Robert Citek wrote:

> Couple of typos and an addition (-F,) :

I'm not seeing the typos.

> { cat <

I have the code in a file and run it with the '-f' option:

gawk -f correct-double-hour.awk test.dat > out.dat

correct-double-hour.awk:

#!/usr/bin/gawk
#
# This script

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Robert Citek
Couple of typos and an addition (-F,) :

{ cat <

On Tue, Jun 19, 2018, Rich Shepard wrote:
> On Tue, 19 Jun 2018, Robert Citek wrote:
>> A quick pass. Needs testing and refactoring.
>>
>> $2 != "16.00" { print ; next }
>> flag == 0 && $2 == "16:00" { print ; flag=1 ; next }
>> flag == 1 && $2 == "16:00" { $2=="17:00"; print;

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Robert Citek
A quick pass. Needs testing and refactoring.

$2 != "16.00" { print ; next }
flag == 0 && $2 == "16:00" { print ; flag=1 ; next }
flag == 1 && $2 == "16:00" { $2=="17:00"; print; flag=0 ; next }

On Tue, Jun 19, 2018 at 2:04 PM, Rich Shepard wrote:
> On Tue, 19 Jun 2018, Robert Citek wrote:

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, Robert Citek wrote:

> Good luck and let us know how things go.

This can be done using awk and flags. I've not before used flags in awk so I don't know the proper sequence of commands. What I have now is:

$2!="16.00" { print }
$2=="16:00" { print; flag=1 }
$2=="16:00" {
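The sequencing question Rich raises is exactly what `next` answers: without it, every rule whose pattern matches fires on the same record, so both `$2=="16:00"` rules would run on the first 16:00 line. A small demonstration with an invented record:

```shell
printf 'x,16:00,1\n' > one.dat

# Without 'next', every matching rule fires on the same record:
awk -F, '$2=="16:00" { print "first rule" }
         $2=="16:00" { print "second rule" }' one.dat

# With 'next' ending the first action, the second rule never sees the record:
awk -F, '$2=="16:00" { print "first rule"; next }
         $2=="16:00" { print "second rule" }' one.dat
```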

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Rich Shepard
On Tue, 19 Jun 2018, Robert Citek wrote:

> I don't fully understand your question, but here are some examples that may be a step in the right direction:

Robert,

I did not provide as complete an explanation as I should have. Each file has 8761 lines, one for each hour of each day during

Re: [PLUG] Correcting duplicate strings in files

2018-06-19 Thread Robert Citek
I don't fully understand your question, but here are some examples that may be a step in the right direction:

$ seq 1 5 | sed -e '1~2s/$/ --/'
1 --
2
3 --
4
5 --

$ seq 1 5 | sed -e '0~2s/$/ --/'
1
2 --
3
4 --
5

$ echo -e "2012-10-01,16:00,297.94\n2012-10-01,16:00,297.94" | sed -e
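Robert's third example is truncated in the archive. A hedged sketch of how the same GNU-sed `first~step` addressing could edit the second line of the duplicated pair (GNU sed only; the substitution shown is an assumption, not Robert's actual command):

```shell
printf '2012-10-01,16:00,297.94\n2012-10-01,16:00,297.94\n' |
  sed -e '0~2s/16:00/17:00/'
```

Note the limitation: `0~2` keys on line *position*, not content, so it only works if the duplicate happens to fall on an even line, which in an 8761-line file it generally will not. That is presumably why the thread moved on to awk's content-based flag approach.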