On Tue, 5 Feb 2019, logical american wrote:
Is there a Linux tool which cleans up the URLs in a text file (I believe Western Unicode encoding) so that all the tracking tags, fbclid, etc. are removed and the pure URL is left in the text?
In one recent email I received, there were 28 govdelivery.com tags and others embedded inside the URLs, and I don't want the material I post to give those websites an easy way to track readers.
Randall,

I have no idea what your files look like, so I can offer only a generic overview. You have grep, sed, awk, and the scripting languages Perl and Python. Each will do the job, but the choice depends on the structure of the text file. You might need to pre-process the file(s) in an editor (emacs, I know, will work; vim probably does too) so that the lines in the files are uniform and the URLs can easily be identified.

HTH,

Rich
_______________________________________________
PLUG mailing list
http://lists.pdxlinux.org/mailman/listinfo/plug
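
As one possible take on the Python route mentioned above, here is a minimal sketch that finds http(s) URLs in a block of text and drops query parameters that look like tracking tags. It is only an illustration under assumptions: the parameter list (fbclid, gclid, anything starting with utm_) is a guess at what the messages contain, the URL pattern is deliberately crude, and the names clean_url and clean_text are made up for this example. It uses only the standard library.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameters; edit these to match what your mail contains.
TRACKING_NAMES = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)

# Crude URL matcher: "http://" or "https://" up to whitespace, a quote,
# or an angle bracket. Real-world text may need something stricter.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def is_tracking(name):
    """Return True if a query parameter name looks like a tracking tag."""
    name = name.lower()
    return name in TRACKING_NAMES or name.startswith(TRACKING_PREFIXES)


def clean_url(url):
    """Return url with tracking parameters removed from its query string."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Leave anything urlsplit cannot parse untouched.
        return url
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking(key)
    ]
    # urlencode re-encodes the remaining parameters, so their escaping
    # may be normalized compared with the original text.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_text(text):
    """Replace every URL found in text with its cleaned form."""
    return URL_RE.sub(lambda match: clean_url(match.group(0)), text)
```

To try it on a file, read the file into a string, pass it to clean_text, and write the result back out. Two caveats: punctuation that directly follows a URL (such as a sentence-ending period) is matched as part of the URL, and tags embedded in the path rather than in the query string are not touched, so check the output against a sample of your own messages before relying on it.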
