I don't know of a tool that does this, but URL parsing comes up in a
lot of programming tasks. If you know Python, writing a small script
that returns specific pieces of a URL is trivial.

https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse
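
Here is a minimal sketch of that idea. It assumes the goal is to drop only known tracking parameters and keep the rest of the query. The standard urllib.parse functions split the URL, filter its query parameters, and put it back together. The set of parameter names below is an assumption: fbclid, gclid, and anything starting with utm_. The govdelivery.com parameter names are not known here and would need to be added by hand.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend these for the sites you care about.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return url with the assumed tracking parameters removed from its query."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    # urlunsplit() leaves out the '?' entirely when the new query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/page?id=7&utm_source=mail&fbclid=abc"))
```

This prints `https://example.com/page?id=7`. Because urlencode() re-encodes the surviving parameters, their encoding may not match the original byte for byte. For example, an encoded space comes back as `+`.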

Qt5 (and probably GTK too) has similar URL parsing mechanisms, and you
could probably find similar functionality in most high-level scripting
languages through the appropriate module or library. As for whether a
tool already exists that does this in a production-friendly way: probably
not, just example apps and code. The QUrl class in Qt5 does a nice job of
abstracting the components of a network location in C++, so someone may
have put a quick little demo app up on GitHub.



On Tue, Feb 5, 2019 at 8:50 PM David Barr <[email protected]> wrote:

> Hey, Randall,
>
> To be pedantic, the tracking tags and such all appear after the
> question mark that starts the query string of the URL, right?
> `https://foo/bar/baz?evil_tag=evil`
>
> The trick, then, is to select only the lines containing question marks
> and then delete from the question mark to the end of the line. Try this:
>
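> ```
> sed -e '/?/ s/?.*$//' <file>
> ```
>
> Pedantry again: That's "select lines containing a question mark,"
> followed by "substitute all characters from and including that question
> mark to the end of the line ($) with nothing." In sed's basic regular
> expressions a bare `?` is already a literal question mark. GNU sed reads
> `\?` as the "zero or one of the preceding" operator, so the backslash
> should be left out.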
>
> I haven't tested this on a file, so I deserve whatever mockery I get if
> I missed something.
>
> Cheers!
> David
>
> On 2/5/19 2:48 PM, logical american wrote:
> > Hi:
> >
> > Is there a Linux tool that cleans up the URLs in a text file (I
> > believe Western Unicode encoding) so that all the tracking tags,
> > fbclid, etc., are removed and the pure URL is left in the text?
> >
> > In one recent email I received, there were 28 govdelivery.com tags and
> > others embedded inside the URLs, and I don't want the posted material
> > to give the website an easy way to track people who follow the links.
> >
> > Thanks
> >
> > Randall
> >
>
> _______________________________________________
> PLUG mailing list
> [email protected]
> http://lists.pdxlinux.org/mailman/listinfo/plug
>
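
The sed one-liner quoted above deletes everything from the first question mark to the end of the line. That also removes any text following a URL on the same line. Below is a minimal Python sketch of the same idea, assuming the whole query string of every http(s) URL should go. It stops each deletion at the next whitespace or `#` instead. The script name strip_queries.py and the UTF-8 input encoding are assumptions.

```python
import re
import sys

# Group 1 captures an http(s) URL up to its '?'; the rest of the match is
# the query string, which ends at whitespace or at a '#' fragment. Text
# after the URL on the same line is therefore left alone.
QUERY_RE = re.compile(r"(https?://[^\s?#]+)\?[^\s#]*")


def strip_queries(text: str) -> str:
    """Drop the entire query string from every http(s) URL in text."""
    return QUERY_RE.sub(r"\1", text)


if __name__ == "__main__":
    # Usage (hypothetical): python3 strip_queries.py FILE... > cleaned.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            sys.stdout.write(strip_queries(f.read()))
```

Dropping the entire query also breaks links whose query carries real content (such as `?id=7`). A sentence-ending period attached to a query string is removed along with it. Filtering individual parameters with urllib.parse avoids the first problem.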