Re: [Qgis-user] Automagically remove html from attribute?

Bernd Vogelgesang Sun, 07 Oct 2018 04:22:06 -0700

On Sat, 2018-10-06 at 17:35 +0200, Bernd Vogelgesang wrote:

On Sat, 2018-10-06 at 12:45 +0200, Bernd Vogelgesang wrote:

Hi,

==================8<-------------- clip here --------------

    A REGEXP like  "<[^>]+>" should match all contents between a
consecutive pair of angle brackets.   It may be necessary to escape
some of the symbols in REGEXP to avoid misinterpretation.

    Avoid a REGEXP like "<.*>", because it will match everything from
the first "<" to the last ">", which may include other "<" and ">"
characters and the text between them.
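
    The difference can be sketched with Python's re module. This is only an illustration: the thread uses QGIS expressions, whose regex engine may differ in details, and the sample string is made up.

```python
import re

# Hypothetical sample value, not taken from the thread's data.
sample = "<b>Oak</b> tree, <i>old</i>"

# Greedy ".*" runs from the first "<" to the last ">".
greedy = re.findall(r"<.*>", sample)

# "[^>]+" cannot cross a ">", so every tag is matched on its own.
per_tag = re.findall(r"<[^>]+>", sample)

print(greedy)
print(per_tag)
```

    With the greedy pattern, a replace would also delete the text between the tags, which is usually not wanted.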

    HTH

Hi Fernando,
many thanks for your hint. REGEX is definitely the way to go, if only it
were a little more intuitive.

   I hope this is not considered too offtopic.

   Trying to make it a little more understandable:

   The square bracket pattern ( [...] ) matches any _one_ character among
the listed ones.   For example, if the pattern is "<[a2jZ]" it will match
one character after the "<" if it is one of those listed.   In the given
example it will match an angle bracket (<) followed by one of the listed
chars:  <a or <2 or <j or <Z.   If the input string is (e.g.)
"<jja2ZZ" the example pattern will match "<j" only.  Just to make it clear
the content of the "[...]" pattern will match only one character.
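
    As a small sketch of the example above, here is the same pattern tried in Python's re module (chosen only for illustration; the second input string is a made-up counterexample):

```python
import re

# The class [a2jZ] consumes exactly one character after "<".
match = re.search(r"<[a2jZ]", "<jja2ZZ")
matched_text = match.group(0) if match else None
print(matched_text)

# If the character after "<" is not in the list, nothing matches.
no_match = re.search(r"<[a2jZ]", "<xja2ZZ")
print(no_match)
```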

   Ok, in order to make it a little more flexible, if the first character
in the square brackets is a caret (^) it will invert the meaning, that is
it will match any character, except those listed.

   So the pattern "<[^>]" means one open angle bracket (<) followed by any
character except the close angle bracket (>).  As above, this matches just
a pair of characters.
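
    A minimal sketch of the negated class, again using Python's re module as a stand-in (the input strings are invented for the example):

```python
import re

pattern = re.compile(r"<[^>]")

# "<" followed by "p": the pair "<p" is matched.
first = pattern.search("<p>text")
first_text = first.group(0) if first else None

# "<" followed directly by ">" is rejected by the negated class.
empty = pattern.search("<>text")

print(first_text)
print(empty)
```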

   After a pattern you may use a repeat mark to make it apply for as long
as it keeps matching.   For example, the plus sign (+) makes the prior
pattern ([^>]) repeat as long as the character at the current position is
not a close angle bracket, provided that at least the first match is
achieved.   This pattern will not match the sequence "<>", because the "+"
demands at least one match.   If zero matches is an option, it is necessary
to use a different repeater, the asterisk (*), making it "<[^>]*", which
should match a sequence "<>".
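
    The two repeaters can be compared side by side. The sketch below uses Python's re module and a made-up string, with the closing ">" already appended to both patterns:

```python
import re

# Hypothetical input containing an empty pair "<>" and a normal tag.
sample = "a<>b<br>c"

# "+" needs at least one character between the brackets, so "<>" is skipped.
plus_matches = re.findall(r"<[^>]+>", sample)

# "*" also accepts zero characters, so "<>" is matched as well.
star_matches = re.findall(r"<[^>]*>", sample)

print(plus_matches)
print(star_matches)
```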

   And we close the pattern sequence with the closing angle bracket (>).

   In plain English, the complete pattern reads as:

   Matches one string starting with one open angle bracket, followed by
one or more characters different from the close bracket, and ending with
one close bracket.

Great explanations! Thanks a lot. And I think it is not off topic. Off topic would be answers like "ah, but that's so easy, just use REGEX..."

   regexp_replace( "desc",'<[^>]+>','')

in the field calculator did the trick for me for all entries with
correct html. So only few entries with crippled html left to process
manually.
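
    For anyone doing the same clean-up outside the field calculator, a rough Python equivalent might look like the sketch below. The helper name strip_tags and the sample values are hypothetical, and it assumes the QGIS function replaces every match, as the result reported above suggests.

```python
import re

# Same pattern as in the field calculator expression.
TAG = re.compile(r"<[^>]+>")

def strip_tags(value: str) -> str:
    """Remove every <...> run from value; text outside the brackets is kept."""
    return TAG.sub("", value)

# Well-formed markup is removed completely.
cleaned = strip_tags("<p>Old <b>oak</b> tree</p>")

# A tag that is never closed does not match and is left untouched.
broken = strip_tags("Old <b oak tree")

print(cleaned)
print(broken)
```

    The second call shows why entries with broken html survive the replace: without a closing ">" the pattern never completes.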

   If the crippled ones are like "<>", replacing "+" with "*" should do
the trick.

   HTH

Thanx a lot,
Bernd

Is there e.g. a way to search for < and > and then delete them and all
text within programmatically?


Cheers,

Bernd

_______________________________________________
Qgis-user mailing list
[email protected]
List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user

    Roxo



   Roxo

