Hey, Randall,

To be pedantic, the tracking tags and such all appear after the question mark that starts the URL's query string (the part your browser sends along in the HTTP GET request), right? For example: `https://foo/bar/baz?evil_tag=evil`
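To make the split concrete, here is a small POSIX shell sketch that cuts the example URL above at its first `?` into the plain address and the query string. It uses only parameter expansion; the variable names `url`, `base` and `query` are just for illustration.

```shell
#!/bin/sh
# Split a URL at its first '?' into the address and the query string.
url='https://foo/bar/baz?evil_tag=evil'

# %%\?* removes the longest suffix that starts with a literal '?'
# (the backslash stops '?' from acting as a one-character wildcard).
base=${url%%\?*}

# #*\? removes the shortest prefix ending in the first '?'.
case $url in
  *\?*) query=${url#*\?} ;;
  *)    query='' ;;   # no '?', so there is no query string
esac

printf 'base:  %s\n' "$base"
printf 'query: %s\n' "$query"
```

Run under any POSIX shell (sh, dash, bash), this should print `https://foo/bar/baz` as the base and `evil_tag=evil` as the query.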
The trick, then, is to select only the lines containing question marks and delete from the question mark to the end of the line. Try this:

```
sed -e '/?/ s/?.*$//' <file>
```

Pedantry again: that's "select lines containing a question mark," followed by "substitute all characters from and including that question mark to the end of the line ($) with nothing." No backslash is needed: in sed's basic regular expressions `?` is an ordinary character, while GNU sed reads `\?` as its "zero or one of the preceding item" operator, so escaping it is at best unnecessary. Two caveats: this also deletes any text that follows the URL on the same line, and it strips query parameters some sites genuinely need (a search term or article ID, say), not just tracking tags.

I haven't tested this on a file, so I deserve whatever mockery I get if I missed something.

Cheers!
David

On 2/5/19 2:48 PM, logical american wrote:
> Hi:
>
> Is there a linux tool which cleans up the URLs in a text file (I
> believe Western unicode encoding) so that all the tracking tags,
> fbclid, etc are removed and the pure URL is left in the text?
>
> In one recent email I received, there were 28 govdelivery.com tags and
> others embedded inside the URLs, and I don't wish the posted material
> to provide an easy access for the website to be tracked.
>
> Thanks
>
> Randall
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
