Re: [gentoo-user] OT: Extracting year from data, but honour empty lines
On Sat, May 12, 2018 at 2:16 AM, Daniel Frey wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>

Here are awk and sed scripts for you to try:

$ cat dates
December 2, 1994
March 27, 1992
June 4, 1994
1993
January 11, 1992
January 3, 1995


March 12, 1993
July 12, 1991
May 17, 1991
August 7, 1992
December 23, 1994
March 27, 1992
March 1995
2018-05-12
05-12-2018

$ awk 'match($0,/[0-9][0-9][0-9][0-9]/){ print substr($0, RSTART, RLENGTH) }
/^$/ ' dates
1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995
2018
2018

$ sed 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/p
/^$/p
d' dates
1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995
2018
2018
Re: [gentoo-user] OT: Extracting year from data, but honour empty lines
On Fri, May 11, 2018 at 6:16 PM, Daniel Frey wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>

Use awk or perl and when the line matches the pattern ^\s*$ print a
blank line. Otherwise, apply the normal pattern.

Cheers,
R0b0t1
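A minimal sketch of that suggestion in awk, for illustration only. The function name extract_years is hypothetical and not from the thread, and the blank-line test uses [ \t] rather than \s because \s is not available in every awk. As an added assumption, a non-blank line without a four-digit run also produces an empty line, so the output always has as many lines as the input.

```shell
#!/bin/sh
# extract_years: hypothetical helper name, not from the thread.
# Reads dates on stdin and prints exactly one output line per input line.
extract_years() {
    awk '
        # Blank or whitespace-only line: print an empty line to keep alignment.
        /^[ \t]*$/ { print ""; next }
        # Otherwise print the first run of four digits, or an empty line if none.
        {
            if (match($0, /[0-9][0-9][0-9][0-9]/))
                print substr($0, RSTART, RLENGTH)
            else
                print ""
        }
    '
}

# Small demonstration on inline sample data.
printf 'December 2, 1994\n\n1993\nMarch 1995\n' | extract_years
```

With a real file this would be called as extract_years < dates, where dates is whatever file holds the list.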
Re: [gentoo-user] OT: Extracting year from data, but honour empty lines
Hi Daniel,

On Fri, 11 May 2018 16:16:52 -0700 Daniel Frey wrote:
[…]
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!

If you can ensure there is only one four-digit year per line, try
stripping all other characters from each line with:

$ sed -e 's/.*\([0-9]\{4\}\).*/\1/' /path/to/your-date-file

Non-matching lines are kept as they are. Note that the leading .* is
greedy, so the pattern picks up the last four-digit run it finds on
the line.

--
Regards,
floyd
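The two properties mentioned in this reply (the last year wins, and non-matching lines pass through) can be checked on inline sample data. This is only a sketch: last_year is a hypothetical wrapper name, and the sample lines are made up for illustration.

```shell
#!/bin/sh
# last_year: hypothetical wrapper around the sed command from this message.
last_year() {
    sed -e 's/.*\([0-9]\{4\}\).*/\1/'
}

# Two years on one line: the greedy leading .* leaves only the last one.
printf 'From 1991 to 1994\n' | last_year

# Lines without four consecutive digits, including empty ones,
# pass through unchanged.
printf 'May 17\n\n' | last_year
```

Because lines without a year are passed through rather than blanked, this approach only preserves alignment cleanly when every non-empty line really contains a year.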
Re: [gentoo-user] OT: Extracting year from data, but honour empty lines
On Saturday, 12 May 2018 9:16:52 AM AEST Daniel Frey wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan

You can add an alternate regular expression that matches the blank
lines, but the '-o' switch will still stop that match from being
printed as it is an 'empty' match.

The trick is to modify the data on the fly to add a space to the empty
lines. I have also added the '-E' switch to make the regular expression
easier.

sed -e 's/^$/ /' YOUR_DATA_FILE | grep -o -E '([0-9]{4}|^[[:space:]]*$)'

--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro
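One side effect of this pipeline is that the formerly empty lines come out containing a single space. If truly empty lines are wanted, a third stage could strip that space again. The sketch below assumes a grep that supports -o and -E (GNU grep does); years_keep_blanks is a hypothetical name and the final sed stage is an addition, not part of the command above.

```shell
#!/bin/sh
# years_keep_blanks: hypothetical name; reads dates on stdin.
years_keep_blanks() {
    # 1. Turn empty lines into a single space so grep -o has something to print.
    # 2. Print either a four-digit run or a whitespace-only line.
    # 3. Added step: turn whitespace-only lines back into empty lines.
    sed -e 's/^$/ /' |
        grep -o -E '([0-9]{4}|^[[:space:]]*$)' |
        sed -e 's/^[[:space:]]*$//'
}

printf 'March 27, 1992\n\nJuly 12, 1991\n' | years_keep_blanks
```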
Re: [gentoo-user] OT: Extracting year from data, but honour empty lines
> On May 11, 2018, at 7:16 PM, Daniel Frey wrote:
>
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---

While loop in Bash? This is slower but it will do it:

while IFS=$'\n' read -r line; do
    # Empty input line: emit an empty output line to keep alignment.
    if [ -z "$line" ]; then
        echo
    fi
    # Print any four-digit run found on this line.
    grep -o '\([0-9]\{4\}\)' <<< "$line"
done < input_file

I would consider using a human date string parsing library in another
language, such as Python's datetime, where you can specify the formats,
loop over them to check whether any of them matches, and output a blank
line if nothing does.

Andrew
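A possible variation on the loop, sketched under two assumptions that are not in the message above: every input line should yield exactly one output line (empty when no year is found), and only the first four-digit run on a line is wanted. The name years_per_line is hypothetical. It avoids the Bash-only here-string, so it should also run under plain sh, though it still starts a grep per line and is just as slow.

```shell
#!/bin/sh
# years_per_line: hypothetical name; reads dates on stdin.
years_per_line() {
    while IFS= read -r line; do
        # First four-digit run on the line, or an empty string if none.
        # The "|| year=''" guard only matters in shells running with
        # pipefail, where grep's non-zero exit on no match would propagate.
        year=$(printf '%s\n' "$line" | grep -o '[0-9]\{4\}' | head -n 1) || year=''
        # Always print one line, so output stays aligned with input.
        printf '%s\n' "$year"
    done
}

printf 'June 4, 1994\n\n05-12-2018\n' | years_per_line
```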
[gentoo-user] OT: Extracting year from data, but honour empty lines
Hi all,

I am trying to do something relatively simple and I've had something
working in the past, but my brain just doesn't want to work today.

I have a text file with the following (this is just a subset of about
2500 dates, and I don't want to edit these all by hand if I can avoid it):

--- START ---
December 2, 1994
March 27, 1992
June 4, 1994
1993
January 11, 1992
January 3, 1995


March 12, 1993
July 12, 1991
May 17, 1991
August 7, 1992
December 23, 1994
March 27, 1992
March 1995
--- END ---

As you can see, there's no standard in the way the date is formatted.
Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.

I have a basic grep that I tossed together:

grep -o '\([0-9]\{4\}\)'

This does extract the year but yields the following:

1994
1992
1994
1993
1992
1995
1993
1991
1991
1992
1994
1992
1995

As you can see, the two empty lines are removed but this will cause
problems with data not lining up later on.

Does anyone have a quick tip for my tired brain to make this work and
just output a blank line if there's no match? I swear I did this months
ago and had something working but I apparently didn't bother saving the
script I made. Argh!

Dan
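The misalignment described here can be reproduced by comparing line counts before and after the grep. A small sketch on a four-line excerpt of the data (the variable name sample is only for illustration), assuming a grep that supports -o:

```shell
#!/bin/sh
# Four input lines: two dates with two empty lines between them.
sample='January 3, 1995


March 12, 1993'

# Number of input lines (prints 4).
printf '%s\n' "$sample" | wc -l

# Number of lines left after grep -o (prints 2): the empty lines are gone,
# so later rows no longer line up with the original data.
printf '%s\n' "$sample" | grep -o '[0-9]\{4\}' | wc -l
```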