On Sat, 2018-10-06 at 17:35 +0200, Bernd Vogelgesang wrote:
> > On Sat, 2018-10-06 at 12:45 +0200, Bernd Vogelgesang wrote:
> > > Hi,
> > >
==================8<-------------- clip here --------------
> >   A REGEXP like "<[^>]+>" should match all contents between a
> > consecutive pair of angle brackets.  It may be necessary to escape
> > some of the symbols in REGEXP to avoid misinterpretation.
> >
> >   It is necessary to avoid a REGEXP like "<.*>" because it will
> > match everything from the first "<" to the last ">", which may
> > include other "<" and ">" characters.
> >
> >   HTH
>
> Hi Fernando,
> many thanks for your hint. REGEX is definitely the way to go, if only
> it were a little more intuitive.
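
To make the difference between the two quoted patterns visible, here is
a minimal sketch. It uses Python's re module rather than the QGIS field
calculator (that swap is my assumption; the two engines may differ in
details, although these basic constructs are common to most regex
flavours). The sample string is made up for illustration.

```python
import re

# Hypothetical sample text, not taken from the thread.
text = 'before <b>bold</b> after'

# Greedy ".*": runs from the first "<" to the last ">" in the string.
greedy = re.findall(r'<.*>', text)

# Negated class "[^>]+": each match stops at the nearest ">".
per_tag = re.findall(r'<[^>]+>', text)

print(greedy)   # ['<b>bold</b>']
print(per_tag)  # ['<b>', '</b>']
```

The greedy pattern swallows the text between the two tags, which is
exactly what has to be avoided when only the tags should go.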

  I hope this is not considered too off-topic.  Trying to make it a
little more understandable:

  The square bracket pattern ( [...] ) matches any _one_ character
among the listed ones.  For example, the pattern "<[a2jZ]" matches an
open angle bracket (<) followed by exactly one of the listed
characters: <a or <2 or <j or <Z.  If the input string is (e.g.)
"<jja2ZZ", the example pattern will match "<j" only.  Just to make it
clear: the content of a "[...]" pattern matches only one character.

  To make it a little more flexible, if the first character inside the
square brackets is a caret (^), the meaning is inverted: it matches any
character except those listed.  So the pattern "<[^>]" means one open
angle bracket (<) followed by any single character except the close
angle bracket (>).  As above, this matches just a pair of characters.

  After a pattern you may add a repeat mark to make it repeat for as
long as it keeps matching.  For example, the plus sign (+) makes the
preceding pattern ([^>]) repeat as long as the character at the current
position is not a close angle bracket, provided that at least the first
match is achieved.  This pattern will not match the sequence "<>",
because the "+" demands at least one match.  If zero matches is an
option, a different repeater is needed, the asterisk (*), giving
"<[^>]*".  Once the closing bracket is added, this should match the
sequence "<>" as well.

  Finally, we close the pattern with the closing angle bracket (>).

  In plain English, the complete pattern reads as:

  Match one string starting with one open angle bracket, followed by
any number of characters other than the close angle bracket (at least
one with "+", possibly none with "*"), and ending with one close angle
bracket.

>
> regexp_replace( "desc",'<[^>]+>','')
>
> in the field calculator did the trick for me for all entries with
> correct html. So only few entries with crippled html left to process
> manually.
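
For anyone who wants to try the individual pieces outside QGIS, the
following is a small sketch in Python's re module. Using Python here is
my assumption, not something from the thread, and strip_tags is a
hypothetical helper that only imitates what regexp_replace does in the
field calculator. The sample value is invented.

```python
import re

# One character from a listed set: only "<j" matches in "<jja2ZZ".
one_of_set = re.search(r'<[a2jZ]', '<jja2ZZ').group()

# Negated set: "<" followed by any single character except ">".
not_close = re.search(r'<[^>]', '<p>').group()

# "+" needs at least one character between the brackets, "*" allows none.
plus_on_empty = re.search(r'<[^>]+>', '<>')
star_on_empty = re.search(r'<[^>]*>', '<>').group()


def strip_tags(value, pattern=r'<[^>]+>'):
    """Hypothetical stand-in for regexp_replace(value, pattern, '')."""
    return re.sub(pattern, '', value)


# Invented sample value containing an empty "<>" pair.
sample = '<p>Hello <b>world</b></p><>'

with_plus = strip_tags(sample)              # "<>" is left behind
with_star = strip_tags(sample, r'<[^>]*>')  # "<>" is removed as well

print(one_of_set, not_close, plus_on_empty, star_on_empty)
print(with_plus)
print(with_star)
```

With the "+" variant the empty "<>" pair survives, while the "*"
variant removes it along with the regular tags.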

  If the crippled ones are like "<>", replacing "+" with "*" should do
the trick.

  HTH

>
> Thanks a lot,
> Bernd
>
> > > Is there e.g. a way to search for < and > and then delete them
> > > and all text within programmatically?
> > >
> > > Cheers,
> > >
> > > Bernd
> >
> >   Roxo

  Roxo
-- 
---------------- Non luctari, ludare -------------------+ WYSIWYG
Fernando M. Roxo da Motta <[email protected]>           | Editor?
Except where explicitly stated I speak on my own behalf.|  VI !!
            PU5RXO                                      | I see text,
------------ Quis custodiet ipsos custodes?-------------+ I get text!
_______________________________________________
Qgis-user mailing list
[email protected]
List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
