D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-02 Thread Ahmad Samir
ahmadsamir added inline comments.

INLINE COMMENTS

> dfaure wrote in kuitmarkup.cpp:1597
> I thought it was the resolved version, without entities (if they were 
> successfully resolved).

If the entity is unknown it'll be left as is, I don't know how often that 
actually happens though.

REPOSITORY
  R249 KI18n

REVISION DETAIL
  https://phabricator.kde.org/D26285

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread Ahmad Samir
ahmadsamir added a task: T12279: Port frameworks away from QRegExp.

REPOSITORY
  R249 KI18n

REVISION DETAIL
  https://phabricator.kde.org/D26285

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread Ahmad Samir
This revision was automatically updated to reflect the committed changes.
Closed by commit R249:f9b932d254c4: [KuitFormatterPrivate] Start porting 
QRegExp to QRegularExpression (authored by ahmadsamir).

REPOSITORY
  R249 KI18n

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D26285?vs=72567=72569

REVISION DETAIL
  https://phabricator.kde.org/D26285

AFFECTED FILES
  src/kuitmarkup.cpp

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread David Faure
dfaure accepted this revision.

REPOSITORY
  R249 KI18n

BRANCH
  l-finalizeVisualText (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D26285

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread Ahmad Samir
ahmadsamir updated this revision to Diff 72567.
ahmadsamir added a comment.


  Ternary that doesn't return a value is weird, better avoid.
  
  "Grammar fixes" in comments add noise to the diff, be kind to reviewers.

REPOSITORY
  R249 KI18n

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D26285?vs=72539=72567

BRANCH
  l-finalizeVisualText (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D26285

AFFECTED FILES
  src/kuitmarkup.cpp

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread David Faure
dfaure accepted this revision.
dfaure added a comment.
This revision is now accepted and ready to land.


  I wonder if others agree that "a ? b() : c()" is bad form for void b() and 
void c(), compared to an if(), or if it's just me.

INLINE COMMENTS

> kuitmarkup.cpp:1582
> +// unknown Unicode point, leave it as is
> +ok ? plain.append(c) : plain.append(match.captured(0));
>  } else if (s->xmlEntities.contains(ent)) { // known entity

I don't really like ternary operators that don't return a value, this is really 
an if().

(I guess the QChar/QString type difference is the reason why append(ok?c:...) 
wouldn't work).

> ahmadsamir wrote in kuitmarkup.cpp:1572
> I wouldn't know about C, having never written any code with it.
> 
> > This is in no way more efficient, construction isn't very different from 
> > assignment.
> 
> That is the key I was missing (which means I should do some research, before 
> "assuming" stuff).

My statement doesn't apply to all classes obviously, the ctor/dtor could be 
very expensive and then my statement is incorrect.

But QString is really just a wrapper around a QStringPrivate "d", and both 
assignment or constructor (from another QString) just do some refcounting and 
grab the "d" of the other QString, so it's the same.

> ahmadsamir wrote in kuitmarkup.cpp:1597
> I concede that changing var names makes reading the diff harder, and I 
> should, if needed, have done it in a separate patch.
> 
> My, weird, logic is that "original" is the original string that is being 
> modified, v.s. text which is a temporary place to store the parts of original 
> after processing them to prevent the match looping on one singular place in 
> the string. Simply original didn't translate to constant in my head. (And 
> "plain" didn't make sense, it's not plain when it may have "" appended 
> to it).
> 
> Anyway, I'd rather change them back than argue. :)

I thought it was the resolved version, without entities (if they were 
successfully resolved).

REPOSITORY
  R249 KI18n

BRANCH
  l-finalizeVisualText (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D26285

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread Ahmad Samir
ahmadsamir added inline comments.

INLINE COMMENTS

> dfaure wrote in kuitmarkup.cpp:1342
> Duplication :-)
> 
> Dangerous, if the actual char* is modified one day.

Personally, I needed to see the actual regex spelled out like that to 
understand the rest of the code in this function, this:

  QLatin1String("^(") + QLatin1Strig(s_entitySubRx) + QLatin1String(");")

doesn't work as well for me, and especially so with finalizeVisualText(), where 
ENTITY_SUBRX is many lines away, and there are two capture groups used.

I thought of doing away with ENTITY_SUBRX and just construct the regex's in 
toVisualText() and  finalizeVisualText(), but the two must use the same 
pattern, except for the beginning, ^ and & respectively.

I can try lessening the danger by tweaking the comment.

> dfaure wrote in kuitmarkup.cpp:1572
> Please try to always declare the variables where they are used. The old code 
> said "QString ent = ...", why does the new code look more like C, declaring 
> variables outside the loop?
> This is in no way more efficient, construction isn't very different from 
> assignment.

I wouldn't know about C, having never written any code with it.

> This is in no way more efficient, construction isn't very different from 
> assignment.

That is the key I was missing (which means I should do some research, before 
"assuming" stuff).

> dfaure wrote in kuitmarkup.cpp:1597
> So it's not the "original" value anymore. Why the renaming of plain to text, 
> and of text to original? It makes the diff harder to review (but if you 
> convince me that it makes the code easier to read, that's fine). I'm just not 
> sure that "original" for something that is modified in many places (line 1592 
> and here) makes sense as a name.

I concede that changing var names makes reading the diff harder, and I should, 
if needed, have done it in a separate patch.

My, weird, logic is that "original" is the original string that is being 
modified, v.s. text which is a temporary place to store the parts of original 
after processing them to prevent the match looping on one singular place in the 
string. Simply original didn't translate to constant in my head. (And "plain" 
didn't make sense, it's not plain when it may have "" appended to it).

Anyway, I'd rather change them back than argue. :)

REPOSITORY
  R249 KI18n

REVISION DETAIL
  https://phabricator.kde.org/D26285

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread Ahmad Samir
ahmadsamir updated this revision to Diff 72539.
ahmadsamir edited the summary of this revision.
ahmadsamir added a comment.


  Address comments

REPOSITORY
  R249 KI18n

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D26285?vs=72363=72539

BRANCH
  l-finalizeVisualText (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D26285

AFFECTED FILES
  src/kuitmarkup.cpp

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2020-01-01 Thread David Faure
dfaure requested changes to this revision.
dfaure added inline comments.
This revision now requires changes to proceed.

INLINE COMMENTS

> kuitmarkup.cpp:1332
>  
> -#define ENTITY_SUBRX "[a-z]+|#[0-9]+|#x[0-9a-fA-F]+"
> +const QString ENTITY_SUBRX = QStringLiteral("[a-z]+|#[0-9]+|#x[0-9a-fA-F]+");
>  

This creates a QString at program startup. Please change it to

  static const char s_entitySubRx[] = "...";

> kuitmarkup.cpp:1342
>  // but do not touch & which forms an XML entity as it is.
> +// Regex is: ^([a-z]+|#[0-9]+|#x[0-9a-fA-F]+);
> +static const QRegularExpression restRx(QLatin1String("^(") + 
> ENTITY_SUBRX + QLatin1String(");"));

Duplication :-)

Dangerous, if the actual char* is modified one day.

> kuitmarkup.cpp:1567
>  if (format != Kuit::RichText) {
> -static QRegExp staticEntRx(QLatin1String("&(" ENTITY_SUBRX ");"));
> -QRegExp entRx = staticEntRx; // QRegExp not thread-safe
> -int p = entRx.indexIn(text);
> -QString plain;
> -while (p >= 0) {
> -QString ent = entRx.capturedTexts().at(1);
> -plain.append(text.midRef(0, p));
> -text.remove(0, p + ent.length() + 2);
> +// regex is: (&([a-z]+|#[0-9]+|#x[0-9a-fA-F]+);)
> +static const QRegularExpression entRx(QLatin1String("(&(") +  
> ENTITY_SUBRX + QLatin1String(");)"));

(same)

> kuitmarkup.cpp:1572
> +QString text;
> +QString ent;
> +while (match.hasMatch()) {

Please try to always declare the variables where they are used. The old code 
said "QString ent = ...", why does the new code look more like C, declaring 
variables outside the loop?
This is in no way more efficient, construction isn't very different from 
assignment.

> kuitmarkup.cpp:1597
>  if (format == Kuit::RichText) {
> -text = QStringLiteral("") + text + QStringLiteral("");
> +original = QLatin1String("") + original + 
> QLatin1String("");
>  }

So it's not the "original" value anymore. Why the renaming of plain to text, 
and of text to original? It makes the diff harder to review (but if you 
convince me that it makes the code easier to read, that's fine). I'm just not 
sure that "original" for something that is modified in many places (line 1592 
and here) makes sense as a name.

REPOSITORY
  R249 KI18n

REVISION DETAIL
  https://phabricator.kde.org/D26285

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns


D26285: [KuitFormatterPrivate] Start porting QRegExp to QRegularExpression

2019-12-30 Thread Ahmad Samir
ahmadsamir created this revision.
ahmadsamir added reviewers: Frameworks, ilic, dfaure, mlaurent, aacid.
Herald added a project: Frameworks.
ahmadsamir requested review of this revision.

REVISION SUMMARY
  Port toVisualText() and finalizeVisualText() to use QRegularExpression.
  Replace ENTITY_SUBRX macro with a QString.

TEST PLAN
  make && ctest

REPOSITORY
  R249 KI18n

BRANCH
  l-finalizeVisualText (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D26285

AFFECTED FILES
  src/kuitmarkup.cpp

To: ahmadsamir, #frameworks, ilic, dfaure, mlaurent, aacid
Cc: kde-frameworks-devel, ltoscano, LeGast00n, GB_2, michaelh, ngraham, bruns