I am sure this is buildable with a one-line Perl script. Probably with sed as well. It depends on the level of cleaning you want.
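To make "level of cleaning" concrete, here is a minimal Python sketch of the more selective end of the range. It uses the standard urllib.parse module to drop only the query parameters that look like tracking tags and keep the rest. The prefix list, the URL regex, and the function names are my own assumptions for illustration, not an existing tool; I don't know the parameter names govdelivery uses, so they would need to be added to the list.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameter prefixes; extend this tuple for your own mail.
TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")

# Rough URL matcher (an assumption): runs up to whitespace, quotes, or
# angle brackets, so trailing punctuation may be swept into the match.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def clean_url(url):
    """Return url with tracking-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # urlencode re-encodes the surviving values, so their percent-encoding
    # may differ slightly from the original text.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_text(text):
    """Clean every URL found in a block of text."""
    return URL_RE.sub(lambda match: clean_url(match.group(0)), text)


if __name__ == "__main__":
    import sys

    # Filter stdin to stdout, e.g.: python clean_urls.py < mail.txt
    sys.stdout.write(clean_text(sys.stdin.read()))
```

Parameters that don't match a prefix are left alone, so a link like `page?id=7&fbclid=abc` keeps `id=7` and still works.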
Likely, you get 90% of the way just by cutting off everything after the ? in the URL, including the ? itself.

On Wed, Feb 6, 2019, 4:52 PM Ben Koenig <[email protected]> wrote:

> I don't know of a tool that does this, but URL formatting is common for a
> lot of programming tasks. If you know python, setting up a small script
> that returns specific pieces of a URL is trivial.
>
> https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse
>
> Qt5 (and probably GTK too) has similar URL parsing mechanisms, and you
> could probably find similar functionality in most high-level scripting
> languages through the appropriate module or library. Now whether or not a
> tool already exists that does this in a production friendly way... probably
> not, just example apps and code. The 'QUrl' object within Qt5 does a nice
> job of abstracting the components of a network location in C++ so there
> might be someone who threw up a quick little demo app on github.
>
> On Tue, Feb 5, 2019 at 8:50 PM David Barr <[email protected]> wrote:
>
> > Hey, Randall,
> >
> > To be pedantic, the tracking tags and such are all stuff that appear
> > after the question mark delimiting character in the HTTP PUT request,
> > right? `https://foo/bar/baz?evil_tag=evil`
> >
> > The trick then, is to select only the lines containing question marks,
> > and then delete from the question mark to the end of the line. Try this:
> >
> > ```
> > sed -e '/\?/ s/\?.*$//' <file>
> > ```
> >
> > Pedantry again: That's "select lines containing a (backslash escaped)
> > question mark," followed by "substitute all characters from and
> > including that (backslash escaped) question mark to the end of the line
> > ($) with nothing."
> >
> > I haven't tested this on a file, so I deserve whatever mockery I get if
> > I missed something.
> >
> > Cheers!
> > David
> >
> > On 2/5/19 2:48 PM, logical american wrote:
> > > Hi:
> > >
> > > Is there a linux tool which cleans up the URLs in a text file (I
> > > believe Western unicode encoding) so that all the tracking tags,
> > > fbclid, etc are removed and the pure URL is left in the text?
> > >
> > > In one recent email I received, there were 28 govdelivery.com tags and
> > > others embedded inside the URLs, and I don't wish the posted material
> > > to provide an easy access for the website to be tracked.
> > >
> > > Thanks
> > >
> > > Randall
> >
> > _______________________________________________
> > PLUG mailing list
> > [email protected]
> > http://lists.pdxlinux.org/mailman/listinfo/plug
