[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-12-01 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#942128, @Typz wrote:

> Indeed, looks good, thanks!
>
> Though that exacerbates the alignment issue, I now get things like this:
>
>   enum {
> SomeLongerEnum, // comment
> SomeThing,  // comment
> Foo, // something
>   } 
>   ^ (column limit)
>   
>
> The comment on 'Foo' would overflow a bit, but it gets unindented during 
> "alingment" stage, which kind of 'breaks' the decision that was made earlier 
> on *not* to break the comment...


Ok, that seems like a different bug that we can fix in alignment, though :)


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-12-01 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

Indeed, looks good, thanks!

Though that exacerbates the alignment issue, I now get things like this:

  enum {
SomeLongerEnum, // comment
SomeThing,  // comment
Foo, // something
  } 
  ^ (column limit)

The comment on 'Foo' would overflow a bit, but it gets unindented during 
"alingment" stage, which kind of 'breaks' the decision that was made earlier on 
*not* to break the comment...


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-12-01 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#941979, @Typz wrote:

> I think the difference between code and comments is that code "words" are 
> easily 10 characters or more, whereas actual words (in comments) are very 
> often less than 10 characters: so code overflowing by 10 characters is not 
> very frequent. whereas small words in comment will often get closer to the 
> "extra" limit.
>
> That said, I tried with your latest change ("Restructure how we break 
> tokens", sha1:64d42a2fb85ece5987111ffb908c6bc7f7431dd4). and it's working 
> about fine now. For the most part it seems to wrap when I would expect it, 
> great work!
>  I have seen 2 "issues" though:
>
> - Often I see that the last word before reflowing is not wrapped (eventhough 
> it overlaps the line length); I did not count penalties so I cannot confirm 
> this is really an issue or just a borderline scenario.
> - Alignment seems better than before, but since there is no penalty for 
> breaking alignment it will always try to unindent to compensate for 
> overflowing characters...
>
>   Seeing this, I guess this patch does not make much sense anymore, I'll see 
> if I make some improvements for these two issues, in separate patches.


Note that I just 10 mins ago landed another change (r319541) that should fix 
the main issue you raised, and which looks a lot like I wanted this change to 
look back when I first saw it (but of course all the underlying code made that 
impossible). Please give it a try and let me know what you think.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-12-01 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

I think the difference between code and comments is that code "words" are 
easily 10 characters or more, whereas actual words (in comments) are very often 
less than 10 characters: so code overflowing by 10 characters is not very 
frequent. whereas small words in comment will often get closer to the "extra" 
limit.

That said, I tried with your latest change ("Restructure how we break tokens", 
sha1:64d42a2fb85ece5987111ffb908c6bc7f7431dd4). and it's working about fine 
now. For the most part it seems to wrap when I would expect it, great work!
I have seen 2 "issues" though:

- Often I see that the last word before reflowing is not wrapped (eventhough it 
overlaps the line length); I did not count penalties so I cannot confirm this 
is really an issue or just a borderline scenario.
- Alignment seems better than before, but since there is no penalty for 
breaking alignment it will always try to unindent to compensate for overflowing 
characters...

Seeing this, I guess this patch does not make much sense anymore, I'll see if I 
make some improvements for these two issues, in separate patches.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-23 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#933746, @Typz wrote:

> with this setting, a "non wrapped" comment will actually be reflown to 
> ColumnLimit+10...


Isn't the same true for code then, though? Generally, code will protrude by 10 
columns before being broken?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-23 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#933746, @Typz wrote:

> > @klimek wrote:
> >  In the above example, we add 3 line breaks, and we'd add 1 (or more) 
> > additional line breaks when reflowing below the column limit.
> >  I agree that that can lead to different overall outcomes, but I don't see 
> > how the approach of this patch really fixes it - it will only ever reflow 
> > below the column limit, so it'll also lead to states for long lines where 
> > reflowing and leaving chars over the line limit might be the overall best 
> > choice (unless I'm missing something)?
>
> Definitely, this patch may not find the 'optimal' solution. What I mean is 
> that we reduce the `PenaltyExcessCharacter` value to allow "occasionally" 
> breaking the limit: instead of a hard limit, we want to allow lines to 
> sometimes break the limit, but definitely not *all* the time. Both patch work 
> fine when the code is "correct", i.e. there is indeed only a few lines which 
> break the limit.
>
> But the local decision approach behaves really wrong IMHO when the code is 
> formatted beyond the column: it can very easily reformat in such a way that 
> the comment is reflown to what looks like a longer column limit. I currently 
> have a ratio of 10 between  PenaltyBreakComment and PenaltyExcessCharacter 
> (which empirically seemed to give a decent compromise, and match how our code 
> is formatted; I need to try more with your patch, to see if we can get better 
> values...): with this setting, a "non wrapped" comment will actually be 
> reflown to ColumnLimit+10...


To me the reverse intuitively makes sense:
When I see that
// first line
// second line
// third line
stays the same, but
// first line second line third line
gets reflown into something different, I get confused :)

The only way I see to consistently solve this is to optimize the reflow breaks 
like we do the main algorithm.

> When we do indeed reflow, I think we may be stricter than this, to get 
> something that really looks like it obeys the column limit. If this is 
> 'optimal' in the sense that we may have some overflow still, that is fine, 
> but really not the primary requirement IMHO.




https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-23 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

> @klimek wrote:
>  In the above example, we add 3 line breaks, and we'd add 1 (or more) 
> additional line breaks when reflowing below the column limit.
>  I agree that that can lead to different overall outcomes, but I don't see 
> how the approach of this patch really fixes it - it will only ever reflow 
> below the column limit, so it'll also lead to states for long lines where 
> reflowing and leaving chars over the line limit might be the overall best 
> choice (unless I'm missing something)?

Definitely, this patch may not find the 'optimal' solution. What I mean is that 
we reduce the `PenaltyExcessCharacter` value to allow "occasionally" breaking 
the limit: instead of a hard limit, we want to allow lines to sometimes break 
the limit, but definitely not *all* the time. Both patch work fine when the 
code is "correct", i.e. there is indeed only a few lines which break the limit.

But the local decision approach behaves really wrong IMHO when the code is 
formatted beyond the column: it can very easily reformat in such a way that the 
comment is reflown to what looks like a longer column limit. I currently have a 
ratio of 10 between  PenaltyBreakComment and PenaltyExcessCharacter (which 
empirically seemed to give a decent compromise, and match how our code is 
formatted; I need to try more with your patch, to see if we can get better 
values...): with this setting, a "non wrapped" comment will actually be reflown 
to ColumnLimit+10...

When we do indeed reflow, I think we may be stricter than this, to get 
something that really looks like it obeys the column limit. If this is 
'optimal' in the sense that we may have some overflow still, that is fine, but 
really not the primary requirement IMHO.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-23 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#933568, @klimek wrote:

> In https://reviews.llvm.org/D33589#931723, @Typz wrote:
>
> > In https://reviews.llvm.org/D33589#925903, @klimek wrote:
> >
> > > I think this patch doesn't handle a couple of cases that I'd like to see 
> > > handled. A counter-proposal with different trade-offs is in 
> > > https://reviews.llvm.org/D40068.
> >
> >
> > Can you provide more info on these cases? Are they all added to the tests 
> > of https://reviews.llvm.org/D40068?
> >
> > It may be simpler (though not to my eyes, I am not knowledgeable enough to 
> > really understand how you go this fixed...), and works fine for "almost 
> > correct" comments: e.g. when there are indeed just a few extra characters 
> > overall. But it still procudes strange result when each line of the (long) 
> > comment is too long, but not enough to trigger a line-wrap by itself.
>




>> Since that version has landed already, not sure how to improve on this. I 
>> could probably rewrite my patch on master, but it seems a bit redundant. As 
>> a simpler fix, I could imagine adding a "total" overflow counter, to allow 
>> detecting the situation; but when this is detected (e.g. on subsequent 
>> lines) we would need to "backtrack" and revisit the initial decision...
> 
> Yes, I added all tests I was thinking of to the patch. Note that we can 
> always go back on submitted patches / do things differently. As my patch 
> fulfilled all your tests, I (perhaps incorrectly?) assumed it solved your use 
> cases - can you give me a test case of what you would want to happen that 
> doesn't happen with my patch?
> 
> Are you, for example, saying that in the case
> 
>   Limit; 13
>   // foo bar baz foo bar baz foo bar baz
>   you'd not want
>   // foo bar baz
>   // foo bar baz
>   // foo bar baz
>   If the overall penalty of the protruding tokens is - what? More than 1 
> break penalty? If you care about more than 3x break penalty (which I would 
> think is correct), the local algorithm will work, as local decisions will 
> make sure the overall penalty cannot be exceeded.

Ok, sorry, after reading the comment on the other patch I get it :)
In the above example, we add 3 line breaks, and we'd add 1 (or more) additional 
line breaks when reflowing below the column limit.
I agree that that can lead to different overall outcomes, but I don't see how 
the approach of this patch really fixes it - it will only ever reflow below the 
column limit, so it'll also lead to states for long lines where reflowing and 
leaving chars over the line limit might be the overall best choice (unless I'm 
missing something)?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-23 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#931802, @Typz wrote:

> Btw, another issue I am having is that reflowing does not respect the 
> alignment. For exemple:
>
>   enum {
>  Foo,///< This is a very long comment
>  Bar,///< This is shorter  
>  BarBar, ///< This is shorter
>   } Stuff;
>   
>
> will be reflown to :
>
>   enum {
>  Foo, ///< This is a very long
>   ///< comment
>  Bar, ///< This is shorter  
>  BarBar, ///< This is shorter
>   } Stuff;
>   
>
> when I would expect:
>
>   enum {
>  Foo,///< This is a very
>  ///< long comment
>  Bar,///< This is shorter  
>  BarBar, ///< This is shorter
>   } Stuff;
>   
>
> I see there is no penalty for breaking alignment, which may be accounted for 
> when compressing the whitespace 'before' the comment... Or maybe simply the 
> breaking should be smarter, to try to keep the alignment; but I am not sure 
> it can be done without another kind of 'global' optimization of the comment 
> reflow... Any idea/hint?


I think that's just a bug in comment alignment, which is done at a different 
stage.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-23 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#931723, @Typz wrote:

> In https://reviews.llvm.org/D33589#925903, @klimek wrote:
>
> > I think this patch doesn't handle a couple of cases that I'd like to see 
> > handled. A counter-proposal with different trade-offs is in 
> > https://reviews.llvm.org/D40068.
>
>
> Can you provide more info on these cases? Are they all added to the tests of 
> https://reviews.llvm.org/D40068?
>
> It may be simpler (though not to my eyes, I am not knowledgeable enough to 
> really understand how you go this fixed...), and works fine for "almost 
> correct" comments: e.g. when there are indeed just a few extra characters 
> overall. But it still procudes strange result when each line of the (long) 
> comment is too long, but not enough to trigger a line-wrap by itself.
>
> Since that version has landed already, not sure how to improve on this. I 
> could probably rewrite my patch on master, but it seems a bit redundant. As a 
> simpler fix, I could imagine adding a "total" overflow counter, to allow 
> detecting the situation; but when this is detected (e.g. on subsequent lines) 
> we would need to "backtrack" and revisit the initial decision...


Yes, I added all tests I was thinking of to the patch. Note that we can always 
go back on submitted patches / do things differently. As my patch fulfilled all 
your tests, I (perhaps incorrectly?) assumed it solved your use cases - can you 
give me a test case of what you would want to happen that doesn't happen with 
my patch?

Are you, for example, saying that in the case

  Limit; 13
  // foo bar baz foo bar baz foo bar baz
  you'd not want
  // foo bar baz
  // foo bar baz
  // foo bar baz
  If the overall penalty of the protruding tokens is - what? More than 1 break 
penalty? If you care about more than 3x break penalty (which I would think is 
correct), the local algorithm will work, as local decisions will make sure the 
overall penalty cannot be exceeded.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-21 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

Btw, another issue I am having is that reflowing does not respect the 
alignment. For exemple:

  enum {
 Foo,///< This is a very long comment
 Bar,///< This is shorter  
 BarBar, ///< This is shorter
  } Stuff;

will be reflown to :

  enum {
 Foo, ///< This is a very long
  ///< comment
 Bar, ///< This is shorter  
 BarBar, ///< This is shorter
  } Stuff;

when I would expect:

  enum {
 Foo,///< This is a very
 ///< long comment
 Bar,///< This is shorter  
 BarBar, ///< This is shorter
  } Stuff;

I see there is no penalty for breaking alignment, which may be accounted for 
when compressing the whitespace 'before' the comment... Or maybe simply the 
breaking should be smarter, to try to keep the alignment; but I am not sure it 
can be done without another kind of 'global' optimization of the comment 
reflow... Any idea/hint?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-21 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D33589#925903, @klimek wrote:

> I think this patch doesn't handle a couple of cases that I'd like to see 
> handled. A counter-proposal with different trade-offs is in 
> https://reviews.llvm.org/D40068.


It may be simpler (though not to my eyes, I am not knowledgeable enough to 
really understand how you go this fixed...), and works fine for "almost 
correct" comments: e.g. when there are indeed just a few extra characters 
overall. But it still procudes strange result when each line of the (long) 
comment is too long, but not enough to trigger a line-wrap by itself.

Since that version has landed already, not sure how to improve on this. I could 
probably rewrite my patch on master, but it seems a bit redundant. As a simpler 
fix, I could imagine adding a "total" overflow counter, to allow detecting the 
situation; but when this is detected (e.g. on subsequent lines) we would need 
to "backtrack" and revisit the initial decision...


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-21 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D33589#924716, @klimek wrote:

> One interesting trade-off I'm running into:
>  My gut feeling is that we really want to make local decisions about whether 
> we want to break/reflow - this makes the code significantly simpler (IMO), 
> and handles all tests in this patch correctly, but is fundamentally limiting 
> the global optimizations we can do. Specifically, we would not correctly 
> reflow this:
>
>   //   |< limit
>   // foo bar
>   // baz
>   // x
>
> to
>
>   // foo
>   // bar
>   // baz x
>
> when the excess character limit is low.


As I can see with your patch, local decision does not account for accumulated 
penalty on multi-line comment, and will thus give unexpected (e.g. no change) 
result when each line overlaps by a few characters, but not enough to trigger a 
break at this line.

> That would be a point for global optimization, but I find it really hard to 
> understand exactly why it's ok to do it. Won't we get things like this wrong:
> 
>   Limit: 13
>   // foo  bar baz
>   // bab  bob
> 
> as we'll not compress whitespace?

Indeed, this patch would not trigger whitespace compression when not reflowing; 
it would compare "not doing anything" (no reflow, no whitespace compression) 
with the complete reflowing (including whitespace compression). I don't think 
that would break anything, but indeed we could possibly get even better result 
by trying to apply whitespace compression in the no-reflow case [which should 
be simple, just a bit more code at line 1376 in the version of the patch].


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-15 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

I think this patch doesn't handle a couple of cases that I'd like to see 
handled. A counter-proposal with different trade-offs is in 
https://reviews.llvm.org/D40068.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-14 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

One interesting trade-off I'm running into:
My gut feeling is that we really want to make local decisions about whether we 
want to break/reflow - this makes the code significantly simpler (IMO), and 
handles all tests in this patch correctly, but is fundamentally limiting the 
global optimizations we can do. Specifically, we would not correctly reflow 
this:

  //   |< limit
  // foo bar
  // baz
  // x

to

  // foo
  // bar
  // baz x

when the excess character limit is low.

That would be a point for global optimization, but I find it really hard to 
understand exactly why it's ok to do it. Won't we get things like this wrong:

  Limit: 13
  // foo  bar baz
  // bab  bob

as we'll not compress whitespace?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-14 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#920160, @Typz wrote:

> ping ?


I'm working on understanding this better :) I've refactored the code a bit so I 
could fully understand the problem, which I now do (sorry for this taking a 
while, but it took me multiple hours to work through this). I'm now convinced 
that you're right that we need the reflow/breaking logic to make the decision 
about the trade-offs, as we need the penalty for the potentially reflown code, 
as opposed to the penalty for the code if we do nothing.
Given that, my current gut feeling is that we'll want to go a bit further in 
that direction and make the change more local in breakProtrudingToken. I'll 
give that a spin and report back with what I find.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-11-09 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

ping ?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-10-24 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

> My question is: if CanBreak is false, we currently don't call 
> breakProtrudingToken. So either we do something very wrong in that case 
> (which might be true, but I'd like to understand why) or we should be able  
> to just calculate the penalty by not breaking anything and go on.

CanBreak is currently very short, it only verifies some very broad conditions. 
I initiallly tried patching at this level, but it really does not work.

Most conditions are actually tested at the beginning of breakProtrudingToken. 
That part of the code actually takes different branches for different kind of 
tokens (strings, block commands, line comments...), and handles many different 
cases for each. This goal is to create the actual BreakableToken object, but it 
has a few other side effects: it returns immediately in various corner cases 
(javascript, preprocessor, formatting macros), and it tweaks some 
variables, e.g. ColumnLimit.

In addition, replacement is non trivial even in case we choose not to reflow: 
it is required in order to properly manage whitespaces. For this reason we 
really must pass through the loop in most cases.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-20 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

In https://reviews.llvm.org/D33589#876196, @Typz wrote:

> This cannot be implemented where we currently call breakProtrudingToken(), 
> since this function starts by 'creating' the BreakableToken and dealing with 
> meany corner cases: so this should be done before in any case.
>  But the code at the end of breakProtrudingToken() is actually very similar 
> to what you propose.
>
> I can refactor the code to have two functions reflowProtrudingToken() and 
> getExcessPenalty(), though that will add some significant redundancy: the 
> problem is that even if we don't actually reflow, we still need (I think?) to 
> handle whitespace replacement and update the state (to set Column, 
> Stack.back().LastSpace, and Stack[i].BreakBeforeParameter, and call 
> updateNextToken() ...). And therefore getExcessPenalty() should probably not 
> just compute the penalty, it should also update the state and handle 
> whitespace replacement...


My question is: if CanBreak is false, we currently don't call 
breakProtrudingToken. So either we do something very wrong in that case (which 
might be true, but I'd like to understand why) or we should be able  to just 
calculate the penalty by not breaking anything and go on.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-20 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

This cannot be implemented where we currently call breakProtrudingToken(), 
since this function starts by 'creating' the BreakableToken and dealing with 
meany corner cases: so this should be done before in any case.
But the code at the end of breakProtrudingToken() is actually very similar to 
what you propose.

I can refactor the code to have two functions reflowProtrudingToken() and 
getExcessPenalty(), though that will add some significant redundancy: the 
problem is that even if we don't actually reflow, we still need (I think?) to 
handle whitespace replacement and update the state (to set Column, 
Stack.back().LastSpace, and Stack[i].BreakBeforeParameter, and call 
updateNextToken() ...). And therefore getExcessPenalty() should probably not 
just compute the penalty, it should also update the state and handle whitespace 
replacement...


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-19 Thread Manuel Klimek via Phabricator via cfe-commits
klimek added a comment.

I find the current semantics of the functions a bit surprising, specifically:
... reflowProtrudingToken(..., bool Reflow)
is really confusing me :)

I'd have expected something like this where we currently call 
breakProtrudingToken():

  if (CanBreak) {
ReflowPenalty = breakProtrudingToken(Current, State, /*DryRun=*/true);
FixedPenalty = getExcessPenalty(Current, State);
Penalty = ReflowPenalty < FixedPenalty ? ReflowPenalty : FixedPenalty;
if (!DryRun && ReflowPenalty < FixedPenalty
breakProtrudingToken(Current, State, /*DryRun=*/false);
  }

I haven't looked how we calculate the penalty currently if we can't break, as 
if we don't, that also seems ... wrong.
getExcessPenalty would be a new function that just focuses on getting the 
excess penalty for a breakable token.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-19 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 115830.
Typz added a comment.

Remove `Reflow` from LineState, and perform the decision again during 
reconstruction phase.


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -9809,6 +9809,60 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+   "   // comment", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+			format("int a; /* first line\n"
+   "* second\n"
+   "* line third\n"
+   "* line\n"
+   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second\n"
+"   // line third\n"
+"   // line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+
+  Style.PenaltyExcessCharacter = 90;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second line\n"
+"* third line\n"
+"*/",
+			format("int a; /* first line\n"
+	   "* second line\n"
+			   "* third line\n"
+	   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second line\n"
+"   // third line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line"
+   "\n*/", Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/ContinuationIndenter.h
===
--- lib/Format/ContinuationIndenter.h
+++ lib/Format/ContinuationIndenter.h
@@ -27,6 +27,7 @@
 namespace format {
 
 class AnnotatedLine;
+class BreakableToken;
 struct FormatToken;
 struct LineState;
 struct ParenState;
@@ -100,6 +101,16 @@
   unsigned breakProtrudingToken(const FormatToken , LineState ,
 bool DryRun);
 
+  /// \brief Perform the reflowing of a BreakableToken.
+  /// Reflow=true, then the function will reflow the token as needed. Otherwise, it simply computes
+  /// the penalty caused by this tokens characters.
+  ///
+  /// \returns The penalty of reflowing the token if State.Reflow=true; otherwise
+  /// the penalty of characters going beyond the column limit.
+  unsigned reflowProtrudingToken(const FormatToken & Current, LineState & State,
+ std::unique_ptr & Token,
+ unsigned ColumnLimit, bool DryRun, bool Reflow);
+
   /// \brief Appends the next token to \p State and updates information
   /// necessary for indentation.
   ///
Index: lib/Format/ContinuationIndenter.cpp
===
--- lib/Format/ContinuationIndenter.cpp
+++ lib/Format/ContinuationIndenter.cpp
@@ -1313,6 +1313,40 @@
   if (Current.UnbreakableTailLength >= ColumnLimit)
 return 0;
 
+  // Verify if the comment should be reflown
+  LineState PrevState = State;
+  unsigned Penalty = reflowProtrudingToken(Current, State, Token, ColumnLimit, true, true);
+  bool Reflow = true;
+  if (Penalty > 0) {
+LineState NoReflowState = PrevState;
+unsigned NoReflowPenalty = reflowProtrudingToken(Current, NoReflowState, Token, ColumnLimit,
+ true, false);
+if (NoReflowPenalty <= Penalty) {
+  Reflow = false;
+  State = NoReflowState;
+  Penalty = NoReflowPenalty;
+}
+  }
+
+  // Actually do the reflow, if DryRun=false
+  if (!DryRun)
+reflowProtrudingToken(Current, PrevState, Token, ColumnLimit, false, Reflow);
+
+  // Do not count the penalty twice, it will be added 

[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-19 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

> I am still trying to get to the bottom of this assertion, any hint where to 
> look for?

OK, got it. The issue is that we will actually need to run the wrapping 3 times 
when DryRun = false : call reflowProtrudingToken() twice with DryRun=true to 
find out the better option, then call it once more with DryRun=false.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-19 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D33589#875039, @djasper wrote:

> I think doing the computation twice is fine. Or at least, I'd need a test 
> case where it actually shows substantial overhead before doing what you are 
> doing here. Understand that creating more States and making the State object 
> itself larger also has cost and that cost occurs in the combinatorial 
> exploration of the solution space. Doing an additional computation at the end 
> should be comparatively cheap. Look at it this way: During the exploration of 
> the solution space, we might enter breakProtrudingToken many times for the 
> same comment. One more time during reconstruction of the solution is not that 
> harmful.


I 've just tried to implement this (e.g. make the 2 calls also in DryRun), and 
I am running into an assertion in WhitespaceManager :

  [ RUN  ] FormatTest.ConfigurableUseOfTab
  FormatTests: 
/workspace/llvm/tools/clang/lib/Format/WhitespaceManager.cpp:112: void 
clang::format::WhitespaceManager::calculateLineBreakInformation(): Assertion 
`PreviousOriginalWhitespaceEndOffset <= OriginalWhitespaceStartOffset' failed.

This remind there was another "reason" for limiting this to DryRun (not sure it 
is a good or bad reason): it happens only in the optimizer, all other cases 
where the indenter is used are not affected.

I am still trying to get to the bottom of this assertion, any hint where to 
look for?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-19 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

I think doing the computation twice is fine. Or at least, I'd need a test case 
where it actually shows substantial overhead before doing what you are doing 
here. Understand that creating more States and making the State object itself 
larger also has cost and that cost occurs in the combinatorial exploration of 
the solution space. Doing an additional computation at the end should be 
comparatively cheap. Look at it this way: During the exploration of the 
solution space, we might enter breakProtrudingToken many times for the same 
comment. One more time during reconstruction of the solution is not that 
harmful.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-19 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

For one thing, we need to update the state to store the "decision" of the 
reflowing mode, which is performed only in DryRun=true mode, to avoid doing the 
computation multiple times.

Apart from this, the decision is conceptually internal to 
breakProtrudingToken(). But the 'computation' of either mode is very similar, 
and updates the state variable anyway (State.Column, Stack.back().LastSpace, 
State.Stack[i].BreakBeforeParameter, Token->updateNextToken(State)) : so it is 
simpler (code-wise) to split the function and call it twice, then decide which 
'State' variable to use.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-14 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

I still don't understand yet. breakProtrudingToken has basically two options:

1. Don't wrap/reflow: In this case the penalty is determined by the number of 
excess characters.
2. Wrap/reflow: I this case the penalty is determined by PenaltySplitComments 
plus the remaining excess characters.

My question is, why do you need to put anything into the state and do this 
outside of breakProtrudingToken. It seems to me that breakProtrudingToken can 
do this locally without putting anything into the State.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-13 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 115051.
Typz added a comment.

Reorder the functions to minimize diff.


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  lib/Format/UnwrappedLineFormatter.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -9809,6 +9809,60 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+   "   // comment", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+			format("int a; /* first line\n"
+   "* second\n"
+   "* line third\n"
+   "* line\n"
+   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second\n"
+"   // line third\n"
+"   // line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+
+  Style.PenaltyExcessCharacter = 90;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second line\n"
+"* third line\n"
+"*/",
+			format("int a; /* first line\n"
+	   "* second line\n"
+			   "* third line\n"
+	   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second line\n"
+"   // third line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line"
+   "\n*/", Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/UnwrappedLineFormatter.cpp
===
--- lib/Format/UnwrappedLineFormatter.cpp
+++ lib/Format/UnwrappedLineFormatter.cpp
@@ -951,6 +951,7 @@
 for (std::deque::iterator I = Path.begin(), E = Path.end();
  I != E; ++I) {
   unsigned Penalty = 0;
+  State.Reflow = (*I)->State.Reflow;
   formatChildren(State, (*I)->NewLine, /*DryRun=*/false, Penalty);
   Penalty += Indenter->addTokenToState(State, (*I)->NewLine, false);
 
Index: lib/Format/ContinuationIndenter.h
===
--- lib/Format/ContinuationIndenter.h
+++ lib/Format/ContinuationIndenter.h
@@ -27,6 +27,7 @@
 namespace format {
 
 class AnnotatedLine;
+class BreakableToken;
 struct FormatToken;
 struct LineState;
 struct ParenState;
@@ -100,6 +101,16 @@
   unsigned breakProtrudingToken(const FormatToken , LineState ,
 bool DryRun);
 
+  /// \brief Perform the reflowing of a BreakableToken.
+  /// If State.Reflow=true, then the function will reflow the token as needed.
+  /// Otherwise, it simply computes the penalty caused by this tokens characters.
+  ///
+  /// \returns The penalty of reflowing the token if State.Reflow=true; otherwise
+  /// the penalty of characters going beyond the column limit.
+  unsigned reflowProtrudingToken(const FormatToken & Current, LineState & State,
+ std::unique_ptr & Token,
+ unsigned ColumnLimit, bool DryRun);
+
   /// \brief Appends the next token to \p State and updates information
   /// necessary for indentation.
   ///
@@ -350,6 +361,11 @@
   /// \brief The indent of the first token.
   unsigned FirstIndent;
 
+  /// \brief Indicates if comments are reflown.
+  /// This value is set when breakProtrudingToken() is called with DryRun=true,
+  /// and simply used otherwise.
+  bool Reflow = true;
+
   /// \brief The line that is being formatted.
   ///
   /// Does not need to be considered for memoization because it doesn't change.
Index: lib/Format/ContinuationIndenter.cpp
===
--- 

[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-13 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added inline comments.



Comment at: lib/Format/ContinuationIndenter.cpp:1339
 
+unsigned ContinuationIndenter::breakProtrudingToken(const FormatToken ,
+LineState ,

djasper wrote:
> Can you create a patch that doesn't move the code around so much? Seems 
> unnecessary and hard to review.
Moving the code around is an unfortunate requirement for this patch: we must 
actually do the "reflowing" twice, to find out which solution (actually 
reflowing or not) gives the least penalty.

Therefore the function that reflowing must be moved out of 
`breakProtrudingToken()`...

All I can do is try moving the new function after `breakProtrudingToken()`, 
maybe Phabricator will better show the change.



Comment at: lib/Format/ContinuationIndenter.cpp:1446
+  // Do not count the penalty twice, it will be added afterwards
+  if (State.Column > getColumnLimit(State)) {
+unsigned ExcessCharacters = State.Column - getColumnLimit(State);

djasper wrote:
> I believe that this is incorrect. reflowProtrudingToken counts the length of 
> the unbreakable tail and here you just remove the penalty of the token 
> itself. E.g. in:
> 
>   string s = f("aaa");
> 
> the ");" is the unbreakable tail of the stringl
This behavior has not changed : before the commit, the last token was not 
included in the penalty [c.f. `if` at line 1338 in original code].

To make the comparison significative, the last token's penalty is included in 
the penalty returned by `reflowProtrudingToken()` (hence the removal of the 
test at line 1260); and it is removed before returning from this function, to 
keep the same behavior as before.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-12 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

I have a slightly hard time grasping what this patch now actually does? Doesn't 
it simply try to decide whether or not to make a split locally be comparing the 
PenaltyBreakComment against the penalty for the access characters? If so, 
couldn't we simply do that as an implementation detail of 
breakProtrudingToken() without needing to let anything outside of it now and 
without introducing State.Reflow?




Comment at: lib/Format/ContinuationIndenter.cpp:1339
 
+unsigned ContinuationIndenter::breakProtrudingToken(const FormatToken ,
+LineState ,

Can you create a patch that doesn't move the code around so much? Seems 
unnecessary and hard to review.



Comment at: lib/Format/ContinuationIndenter.cpp:1446
+  // Do not count the penalty twice, it will be added afterwards
+  if (State.Column > getColumnLimit(State)) {
+unsigned ExcessCharacters = State.Column - getColumnLimit(State);

I believe that this is incorrect. reflowProtrudingToken counts the length of 
the unbreakable tail and here you just remove the penalty of the token itself. 
E.g. in:

  string s = f("aaa");

the ");" is the unbreakable tail of the stringl


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-09-11 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

ping?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-07-28 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 108651.
Typz added a comment.

Rebase


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  lib/Format/UnwrappedLineFormatter.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -9475,6 +9475,60 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+   "   // comment", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+			format("int a; /* first line\n"
+   "* second\n"
+   "* line third\n"
+   "* line\n"
+   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second\n"
+"   // line third\n"
+"   // line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+
+  Style.PenaltyExcessCharacter = 90;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second line\n"
+"* third line\n"
+"*/",
+			format("int a; /* first line\n"
+	   "* second line\n"
+			   "* third line\n"
+	   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second line\n"
+"   // third line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line"
+   "\n*/", Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/UnwrappedLineFormatter.cpp
===
--- lib/Format/UnwrappedLineFormatter.cpp
+++ lib/Format/UnwrappedLineFormatter.cpp
@@ -885,6 +885,7 @@
 for (std::deque::iterator I = Path.begin(), E = Path.end();
  I != E; ++I) {
   unsigned Penalty = 0;
+  State.Reflow = (*I)->State.Reflow;
   formatChildren(State, (*I)->NewLine, /*DryRun=*/false, Penalty);
   Penalty += Indenter->addTokenToState(State, (*I)->NewLine, false);
 
Index: lib/Format/ContinuationIndenter.h
===
--- lib/Format/ContinuationIndenter.h
+++ lib/Format/ContinuationIndenter.h
@@ -27,6 +27,7 @@
 namespace format {
 
 class AnnotatedLine;
+class BreakableToken;
 struct FormatToken;
 struct LineState;
 struct ParenState;
@@ -100,6 +101,11 @@
   unsigned breakProtrudingToken(const FormatToken , LineState ,
 bool DryRun);
 
+  unsigned reflowProtrudingToken(const FormatToken & Current, LineState & State,
+ std::unique_ptr & Token,
+ unsigned ColumnLimit, bool DryRun);
+
+
   /// \brief Appends the next token to \p State and updates information
   /// necessary for indentation.
   ///
@@ -350,6 +356,8 @@
   /// \brief The indent of the first token.
   unsigned FirstIndent;
 
+  bool Reflow = true;
+
   /// \brief The line that is being formatted.
   ///
   /// Does not need to be considered for memoization because it doesn't change.
Index: lib/Format/ContinuationIndenter.cpp
===
--- lib/Format/ContinuationIndenter.cpp
+++ lib/Format/ContinuationIndenter.cpp
@@ -1219,92 +1219,10 @@
   return 0;
 }
 
-unsigned ContinuationIndenter::breakProtrudingToken(const FormatToken ,
-LineState ,
-bool DryRun) {
-  // Don't break multi-line tokens other than block comments. Instead, just
-  // update the state.
-  if (Current.isNot(TT_BlockComment) && Current.IsMultiline)
-return addMultilineToken(Current, State);
-
-  // Don't break implicit string literals or 

[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-07-13 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 106442.
Typz added a comment.

Move code out of optimizer, directly into 
ContinuationIndenter::breakProtrudingToken(), to minimize impact on performance.
Comment reflowing of breakable items which break the limit will be slightly 
slower, since we now consider the two options; however this change has no 
impact on the performance of processing non-breakable items, and may actually 
increase the operformance of processing breakable items which do not need to be 
reflowed.


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  lib/Format/UnwrappedLineFormatter.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -9393,6 +9393,60 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+   "   // comment", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+			format("int a; /* first line\n"
+   "* second\n"
+   "* line third\n"
+   "* line\n"
+   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second\n"
+"   // line third\n"
+"   // line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+
+  Style.PenaltyExcessCharacter = 90;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second line\n"
+"* third line\n"
+"*/",
+			format("int a; /* first line\n"
+	   "* second line\n"
+			   "* third line\n"
+	   "*/", Style));
+  EXPECT_EQ("int a; // first line\n"
+"   // second line\n"
+"   // third line",
+format("int a; // first line\n"
+   "   // second line\n"
+   "   // third line",
+   Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line"
+   "\n*/", Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/UnwrappedLineFormatter.cpp
===
--- lib/Format/UnwrappedLineFormatter.cpp
+++ lib/Format/UnwrappedLineFormatter.cpp
@@ -866,6 +866,7 @@
 for (std::deque::iterator I = Path.begin(), E = Path.end();
  I != E; ++I) {
   unsigned Penalty = 0;
+  State.Reflow = (*I)->State.Reflow;
   formatChildren(State, (*I)->NewLine, /*DryRun=*/false, Penalty);
   Penalty += Indenter->addTokenToState(State, (*I)->NewLine, false);
 
Index: lib/Format/ContinuationIndenter.h
===
--- lib/Format/ContinuationIndenter.h
+++ lib/Format/ContinuationIndenter.h
@@ -27,6 +27,7 @@
 namespace format {
 
 class AnnotatedLine;
+class BreakableToken;
 struct FormatToken;
 struct LineState;
 struct ParenState;
@@ -100,6 +101,11 @@
   unsigned breakProtrudingToken(const FormatToken , LineState ,
 bool DryRun);
 
+  unsigned reflowProtrudingToken(const FormatToken & Current, LineState & State,
+ std::unique_ptr & Token,
+ unsigned ColumnLimit, bool DryRun);
+
+
   /// \brief Appends the next token to \p State and updates information
   /// necessary for indentation.
   ///
@@ -350,6 +356,8 @@
   /// \brief The indent of the first token.
   unsigned FirstIndent;
 
+  bool Reflow = true;
+
   /// \brief The line that is being formatted.
   ///
   /// Does not need to be considered for memoization because it doesn't change.
Index: lib/Format/ContinuationIndenter.cpp
===
--- lib/Format/ContinuationIndenter.cpp
+++ lib/Format/ContinuationIndenter.cpp
@@ -1219,6 +1219,112 @@
   return 0;
 }
 
+unsigned 

[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-07-13 Thread Francois Ferrand via Phabricator via cfe-commits
Typz marked 2 inline comments as done.
Typz added inline comments.



Comment at: lib/Format/UnwrappedLineFormatter.cpp:723
   FormatDecision LastFormat = Node->State.NextToken->Decision;
   if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,

Typz wrote:
> djasper wrote:
> > This is almost certainly a no-go. It turns the algorithm from exploring a 
> > state space with a branching factor of 2 into something with a branching 
> > factor of 4.
> > 
> > Even assuming we can perfectly reuse results from computations (or in other 
> > words hit the same state through different path using Dijkstra's 
> > algorithm), the additional bool in StateNode doubles the number of states 
> > we are going to visit. You don't even seem to make a distinction of whether 
> > the token under investigation can possibly be split. You do look at 
> > NoLineBreak(InOperand), but even if those are false, the vast majority of 
> > tokens will never be split.
> > 
> > However, even if you narrow that down, I am not convinced that fixing this 
> > inconsistency around the formatting penalties is worth the runtime 
> > performance hit. I am generally fine with changing the behavior in the way 
> > you are proposing, but only if it has zero (negative) effect on performance.
> Making the distinction to skip some path is done at the beginning of 
> addNextStateToQueue(), though indeed not very precisely at the moment.
> 
> I can improve the check (i.e. by copying all the early return conditions from 
> the beginning of `ContinuationIndenter::breakProtrudingToken()`, which will 
> greatly reduce the number of possible state, but stilll have a small impact.
> 
> The alternative would be modify ContinuationIndenter::breakProtrudingToken(), 
> so that it computes the penalty for reflowing as well as not-reflowing, and 
> updates the LineState with the best solution. It would certainly have an 
> impact on performance, but would not increase the size of the state space.
> 
> Another issue with that approach is that the information is not passed from 
> the DRYRUN phase to the actual rewriting phase. I could store this 
> information in the LineState, to re-use it in the reconstruction phase, but 
> this seems really wrong (and would work only in the exact case of the 
> optimizing line formatter).
> 
> What do you think? Should I keep the same structure but reduce the number of 
> explored state; or move the decision into ContinuationIndenter, possibly 
> storing the result in LineState?
> 
Actually, it seems it cannot be done inside 
ContinuationIndenter::breakProtrudingToken(), since this causes the whitespace 
manager to be called twice for the same token.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-07-13 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added inline comments.



Comment at: lib/Format/UnwrappedLineFormatter.cpp:723
   FormatDecision LastFormat = Node->State.NextToken->Decision;
   if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,

djasper wrote:
> This is almost certainly a no-go. It turns the algorithm from exploring a 
> state space with a branching factor of 2 into something with a branching 
> factor of 4.
> 
> Even assuming we can perfectly reuse results from computations (or in other 
> words hit the same state through different path using Dijkstra's algorithm), 
> the additional bool in StateNode doubles the number of states we are going to 
> visit. You don't even seem to make a distinction of whether the token under 
> investigation can possibly be split. You do look at NoLineBreak(InOperand), 
> but even if those are false, the vast majority of tokens will never be split.
> 
> However, even if you narrow that down, I am not convinced that fixing this 
> inconsistency around the formatting penalties is worth the runtime 
> performance hit. I am generally fine with changing the behavior in the way 
> you are proposing, but only if it has zero (negative) effect on performance.
Making the distinction to skip some path is done at the beginning of 
addNextStateToQueue(), though indeed not very precisely at the moment.

I can improve the check (i.e. by copying all the early return conditions from 
the beginning of `ContinuationIndenter::breakProtrudingToken()`, which will 
greatly reduce the number of possible state, but stilll have a small impact.

The alternative would be modify ContinuationIndenter::breakProtrudingToken(), 
so that it computes the penalty for reflowing as well as not-reflowing, and 
updates the LineState with the best solution. It would certainly have an impact 
on performance, but would not increase the size of the state space.

Another issue with that approach is that the information is not passed from the 
DRYRUN phase to the actual rewriting phase. I could store this information in 
the LineState, to re-use it in the reconstruction phase, but this seems really 
wrong (and would work only in the exact case of the optimizing line formatter).

What do you think? Should I keep the same structure but reduce the number of 
explored state; or move the decision into ContinuationIndenter, possibly 
storing the result in LineState?




Comment at: lib/Format/UnwrappedLineFormatter.cpp:764
   return;
+if (!BreakToken && !Indenter->canSplit(PreviousNode->State))
+  return;

djasper wrote:
> It's not clear to me why you'd return here.
This is needed to avoid trying to break when this is not supported: no point 
doing twice the same thing.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-07-05 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added inline comments.



Comment at: lib/Format/UnwrappedLineFormatter.cpp:723
   FormatDecision LastFormat = Node->State.NextToken->Decision;
   if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,

This is almost certainly a no-go. It turns the algorithm from exploring a state 
space with a branching factor of 2 into something with a branching factor of 4.

Even assuming we can perfectly reuse results from computations (or in other 
words hit the same state through different path using Dijkstra's algorithm), 
the additional bool in StateNode doubles the number of states we are going to 
visit. You don't even seem to make a distinction of whether the token under 
investigation can possibly be split. You do look at NoLineBreak(InOperand), but 
even if those are false, the vast majority of tokens will never be split.

However, even if you narrow that down, I am not convinced that fixing this 
inconsistency around the formatting penalties is worth the runtime performance 
hit. I am generally fine with changing the behavior in the way you are 
proposing, but only if it has zero (negative) effect on performance.



Comment at: lib/Format/UnwrappedLineFormatter.cpp:764
   return;
+if (!BreakToken && !Indenter->canSplit(PreviousNode->State))
+  return;

It's not clear to me why you'd return here.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-07-05 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

ping?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-23 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D33589#789002, @alexfh wrote:

> why do we want to make an exception for comments and not for regular code?


This is not an exception for comments: the `PenaltyExcessCharacter` is used 
whenever the code is longer than the `ColumnLimit`, and used to compute the 
best solution.
The default value is extremely high, so there is no practical difference: code 
does not go beyond the column, except for specific exceptions.

However, the difference becomes apparent when reducing this penalty: in that 
case, code can break the column limit instead of wrapping, but it does not 
happen with comments. This patch tries to restore the parity, and let comments 
break the comment limit as well.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-23 Thread Alexander Kornienko via Phabricator via cfe-commits
alexfh added a comment.

Daniel is a better reviewer here than myself. A few cents from me though: why 
do we want to make an exception for comments and not for regular code?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-19 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

@krasimir, @alexfh : can I get some feedback?

This patch solves a practical problem, i.e. allowing the comment to overflow a 
bit without triggering a reflow [according to the priorities which are in 
config, obviously]. It may not always provide the "best" wrapping, but as you 
pointed out comments are unstructured, so processing must indeed stay quite 
conservative: which is how this patch tries to behave.

Is there a better way to do this? or do you see something missing?

> There are at least three separate parts in clang-format that make up the 
> formatting of comments: the normal parsing and optimization pipeline, the 
> BreakableLineCommentSection / BreakableBlockComment classes that are 
> responsible for breaking and reflowing inside comment sections, and the 
> consecutive trailing comment alignment in the WhitespaceManager. This patch 
> takes into account the first aspect but not the consequences for the other 
> aspects

You suggested there is an issue, but I could not isolate it. Can you provide an 
exemple?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-14 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

@krasimir : ping


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-12 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

ping?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-06 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D33589#769893, @krasimir wrote:

> I think that what you're trying to solve is not practically that important, 
> is unlikely to improve the handling of comments, and will add a lot of 
> complexity.


Not sure the 'approach' I have in this patch is nice, but it does solve my 
problem: it is quite simple, and avoids reflowing (and actually, sometimes 
breaking, since comments are not so structured...) the comments.
e.g. if I have a line followed by a relatively long comment, with my patch it 
will really consider the penaltys properly, and not split a line which is 
slightly longer [with ColumnLimit=30] :

  int a; //< a very long comment
   

vs

  int a; //< a very long
 //< comment

> From a usability perspective, I think that people are happy enough when their 
> comments don't exceed the line limit. I personally wouldn't want the opposite 
> to happen. I've even seen style guides that have 80 columns limit for 
> comments and 100 for code.

That is a matter of taste and coding style: I prefer overflowing by a few 
characters instead of introducing an extra line... I see no reason to allow 
PenaltyExcessCharacter

> Comments are treated in a somewhat ad-hoc style in clang-format, which is 
> because they are inherently free text and don't have a specific format. 
> People just expect comments to be handled quite differently than source code. 
> There are at least three separate parts in clang-format that make up the 
> formatting of comments: the normal parsing and optimization pipeline, the 
> BreakableLineCommentSection / BreakableBlockComment classes that are 
> responsible for breaking and reflowing inside comment sections, and the 
> consecutive trailing comment alignment in the WhitespaceManager. This patch 
> takes into account the first aspect but not the consequences for the other 
> aspects: for example it allows for the first line comment token in a 
> multiline line comment section to get out of the column limit, but will 
> reflow the rest of the lines. A WhitespaceManager-related issue is that 
> because a trailing line comment for some declaration might not get split, it 
> might not be aligned with the surrounding trailing line comments, which I 
> think is a less desirable effect.

Not sure I understand the case you are speaking about, can you please provide 
an exemple?

(on a separate topic, I could indeed not find a penalty for breaking alignment; 
I think the optimizer should account for that as well... any hint on where to 
look for adding this?)

> Optimizing the layout in comment sections by using the optimizing formatter 
> sounds good, but because comments mostly contain free text that can be 
> structured in unexpected ways, I think that the ad-hoc approach works better. 
> Think of not destroying ASCII art and supporting bulleted and numbered lists 
> for example. We don't really want to leak lots of comment-related formatting 
> tweaks into the general pipeline itself.

Good, one less choice :-)

> I see you point that PenaltyBreakComment and PenaltyExcessCharacter are not 
> taken into account while comment placement, but they are taken into account 
> at a higher level by treating the whole comment section as a unit. For 
> example, if you have a long declaration that has multiple potential split 
> points followed by a long trailing comment section, the penalty induced by 
> the number of lines that are needed and the number of unbroken characters for 
> the formatting of the comment section is taken into account while optimizing 
> the layout of the whole code fragment.

If you reduce PenaltyExcessCharacter you see that comments can never overflow: 
the optimizer will consider splitting the comments or splitting the remaining 
of the code, but will (currently) not allow the comment to overflow just a bit.

> The formatted currently supports somewhat limited version of allowing comment 
> lines exceeding the column limit, like long hyperlinks for example. I think 
> that if there are some other examples which are both widespread and super 
> annoying, we may consider them separately.

This patch does not try to address these special cases, and should not change 
the behavior in this case.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-06-01 Thread Krasimir Georgiev via Phabricator via cfe-commits
krasimir added a comment.

I think that what you're trying to solve is not practically that important, is 
unlikely to improve the handling of comments, and will add a lot of complexity.

From a usability perspective, I think that people are happy enough when their 
comments don't exceed the line limit. I personally wouldn't want the opposite 
to happen. I've even seen style guides that have 80 columns limit for comments 
and 100 for code.

Comments are treated in a somewhat ad-hoc style in clang-format, which is 
because they are inherently free text and don't have a specific format. People 
just expect comments to be handled quite differently than source code. There 
are at least three separate parts in clang-format that make up the formatting 
of comments: the normal parsing and optimization pipeline, the 
BreakableLineCommentSection / BreakableBlockComment classes that are 
responsible for breaking and reflowing inside comment sections, and the 
consecutive trailing comment alignment in the WhitespaceManager. This patch 
takes into account the first aspect but not the consequences for the other 
aspects: for example it allows for the first line comment token in a multiline 
line comment section to get out of the column limit, but will reflow the rest 
of the lines. A WhitespaceManager-related issue is that because a trailing line 
comment for some declaration might not get split, it might not be aligned with 
the surrounding trailing line comments, which I think is a less desirable 
effect.

Optimizing the layout in comment sections by using the optimizing formatter 
sounds good, but because comments mostly contain free text that can be 
structured in unexpected ways, I think that the ad-hoc approach works better. 
Think of not destroying ASCII art and supporting bulleted and numbered lists 
for example. We don't really want to leak lots of comment-related formatting 
tweaks into the general pipeline itself.

I see you point that PenaltyBreakComment and PenaltyExcessCharacter are not 
taken into account while comment placement, but they are taken into account at 
a higher level by treating the whole comment section as a unit. For example, if 
you have a long declaration that has multiple potential split points followed 
by a long trailing comment section, the penalty induced by the number of lines 
that are needed and the number of unbroken characters for the formatting of the 
comment section is taken into account while optimizing the layout of the whole 
code fragment.

The formatted currently supports somewhat limited version of allowing comment 
lines exceeding the column limit, like long hyperlinks for example. I think 
that if there are some other examples which are both widespread and super 
annoying, we may consider them separately.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-05-30 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 100698.
Typz marked an inline comment as done.
Typz added a comment.

fix code & tests


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  lib/Format/FormatToken.cpp
  lib/Format/FormatToken.h
  lib/Format/UnwrappedLineFormatter.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -8542,6 +8542,45 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+   "   // comment", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+			format("int a; /* first line\n"
+   "* second\n"
+   "* line third\n"
+   "* line\n"
+   "*/", Style));
+
+  Style.PenaltyExcessCharacter = 90;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second line\n"
+"* third line\n"
+"*/",
+			format("int a; /* first line\n"
+	   "* second line\n"
+			   "* third line\n"
+	   "*/", Style));
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line"
+   "\n*/", Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/UnwrappedLineFormatter.cpp
===
--- lib/Format/UnwrappedLineFormatter.cpp
+++ lib/Format/UnwrappedLineFormatter.cpp
@@ -590,7 +590,8 @@
   (Indenter->canBreak(State) && State.NextToken->NewlinesBefore > 0);
   unsigned Penalty = 0;
   formatChildren(State, Newline, /*DryRun=*/false, Penalty);
-  Indenter->addTokenToState(State, Newline, /*DryRun=*/false);
+  Indenter->addTokenToState(State, Newline, /*BreakToken=*/true,
+/*DryRun=*/false);
 }
 return 0;
   }
@@ -611,7 +612,8 @@
 LineState State = Indenter->getInitialState(FirstIndent, , DryRun);
 while (State.NextToken) {
   formatChildren(State, /*Newline=*/false, DryRun, Penalty);
-  Indenter->addTokenToState(State, /*Newline=*/false, DryRun);
+  Indenter->addTokenToState(State, /*Newline=*/false, /*BreakToken=*/true,
+DryRun);
 }
 return Penalty;
   }
@@ -658,10 +660,11 @@
   /// \brief An edge in the solution space from \c Previous->State to \c State,
   /// inserting a newline dependent on the \c NewLine.
   struct StateNode {
-StateNode(const LineState , bool NewLine, StateNode *Previous)
-: State(State), NewLine(NewLine), Previous(Previous) {}
+StateNode(const LineState , bool NewLine, bool BreakToken, StateNode *Previous)
+: State(State), NewLine(NewLine), BreakToken(BreakToken), Previous(Previous) {}
 LineState State;
 bool NewLine;
+bool BreakToken;
 StateNode *Previous;
   };
 
@@ -691,7 +694,7 @@
 
 // Insert start element into queue.
 StateNode *Node =
-new (Allocator.Allocate()) StateNode(InitialState, false, nullptr);
+new (Allocator.Allocate()) StateNode(InitialState, false, true, nullptr);
 Queue.push(QueueItem(OrderedPenalty(0, Count), Node));
 ++Count;
 
@@ -718,9 +721,17 @@
 
   FormatDecision LastFormat = Node->State.NextToken->Decision;
   if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
-addNextStateToQueue(Penalty, Node, /*NewLine=*/false, , );
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,
+/*BreakToken=*/true, , );
   if (LastFormat == FD_Unformatted || LastFormat == FD_Break)
-addNextStateToQueue(Penalty, Node, /*NewLine=*/true, , );
+addNextStateToQueue(Penalty, Node, /*NewLine=*/true,
+/*BreakToken=*/true, , );
+  if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,
+/*BreakToken=*/false, , );
+  if (LastFormat == 

[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-05-30 Thread Francois Ferrand via Phabricator via cfe-commits
Typz marked an inline comment as done.
Typz added inline comments.



Comment at: unittests/Format/FormatTest.cpp:8571
+"*/",
+format("int a; /* first line second line third line */", Style));
+}

Typz wrote:
> This is not working as expected, format return:
> 
>   int a; /* first line
> * second *
> line third
> * line */
> 
> Any clue how things could go so wrong?
This was a bug in tests: should not use test::messUp() with multi-line comments.


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-05-26 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

Still some issues with the patch, I would need some feedback first:

- Is this approach desirable, as a relatively easy fix?
- Or should this be fixed with a complete refactoring of the way the 
strings/comments are split, making multiple tokens out of them to let them be 
reflown 'directly' by the optimizing line formatter? (keep the optimizer, but 
changes the whole way comments are handled, and need to move all the 
information about the comment block and its relfowing into LineState)
- Or should the breakProtrudingToken() actually perform a "local" optimization 
of the splits? (keeps the same overall structure, but requires writing another 
optimizer)




Comment at: unittests/Format/FormatTest.cpp:8571
+"*/",
+format("int a; /* first line second line third line */", Style));
+}

This is not working as expected, format return:

  int a; /* first line
* second *
line third
* line */

Any clue how things could go so wrong?


https://reviews.llvm.org/D33589



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-05-26 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 100379.
Typz added a comment.

fix indentation issues


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  lib/Format/FormatToken.cpp
  lib/Format/FormatToken.h
  lib/Format/UnwrappedLineFormatter.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -8542,6 +8542,35 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+   "   // comment", Style);
+  verifyFormat("int a; /* first line\n"
+   "* second\n"
+   "* line third\n"
+   "* line\n"
+   "*/", Style);
+
+  Style.PenaltyExcessCharacter = 30;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  verifyFormat("int a; /* first line\n"
+   "* second line\n"
+   "* third line\n"
+   "*/", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line */", Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/UnwrappedLineFormatter.cpp
===
--- lib/Format/UnwrappedLineFormatter.cpp
+++ lib/Format/UnwrappedLineFormatter.cpp
@@ -590,7 +590,8 @@
   (Indenter->canBreak(State) && State.NextToken->NewlinesBefore > 0);
   unsigned Penalty = 0;
   formatChildren(State, Newline, /*DryRun=*/false, Penalty);
-  Indenter->addTokenToState(State, Newline, /*DryRun=*/false);
+  Indenter->addTokenToState(State, Newline, /*BreakToken=*/true,
+/*DryRun=*/false);
 }
 return 0;
   }
@@ -611,7 +612,8 @@
 LineState State = Indenter->getInitialState(FirstIndent, , DryRun);
 while (State.NextToken) {
   formatChildren(State, /*Newline=*/false, DryRun, Penalty);
-  Indenter->addTokenToState(State, /*Newline=*/false, DryRun);
+  Indenter->addTokenToState(State, /*Newline=*/false, /*BreakToken=*/true,
+DryRun);
 }
 return Penalty;
   }
@@ -658,10 +660,11 @@
   /// \brief An edge in the solution space from \c Previous->State to \c State,
   /// inserting a newline dependent on the \c NewLine.
   struct StateNode {
-StateNode(const LineState , bool NewLine, StateNode *Previous)
-: State(State), NewLine(NewLine), Previous(Previous) {}
+StateNode(const LineState , bool NewLine, bool BreakToken, StateNode *Previous)
+: State(State), NewLine(NewLine), BreakToken(BreakToken), Previous(Previous) {}
 LineState State;
 bool NewLine;
+bool BreakToken;
 StateNode *Previous;
   };
 
@@ -691,7 +694,7 @@
 
 // Insert start element into queue.
 StateNode *Node =
-new (Allocator.Allocate()) StateNode(InitialState, false, nullptr);
+new (Allocator.Allocate()) StateNode(InitialState, false, true, nullptr);
 Queue.push(QueueItem(OrderedPenalty(0, Count), Node));
 ++Count;
 
@@ -718,9 +721,17 @@
 
   FormatDecision LastFormat = Node->State.NextToken->Decision;
   if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
-addNextStateToQueue(Penalty, Node, /*NewLine=*/false, , );
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,
+/*BreakToken=*/true, , );
   if (LastFormat == FD_Unformatted || LastFormat == FD_Break)
-addNextStateToQueue(Penalty, Node, /*NewLine=*/true, , );
+addNextStateToQueue(Penalty, Node, /*NewLine=*/true,
+/*BreakToken=*/true, , );
+  if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,
+/*BreakToken=*/false, , );
+  if (LastFormat == FD_Unformatted || LastFormat == FD_Break)
+addNextStateToQueue(Penalty, Node, /*NewLine=*/true,
+/*BreakToken=*/false, , );
 }
 
 if (Queue.empty()) {
@@ -745,18 +756,20 @@
   /// Assume the current state is \p PreviousNode and has been reached with a
   /// penalty of \p Penalty. Insert a line break if \p NewLine is \c true.
 

[PATCH] D33589: clang-format: consider not splitting tokens in optimization

2017-05-26 Thread Francois Ferrand via Phabricator via cfe-commits
Typz created this revision.
Herald added a subscriber: klimek.

This patch tries to improve the optimizer a bit, to avoid splitting
tokens (e.g. comments/strings) if only there are only few characters
beyond the ColumnLimit.

Previously, comments/strings would be split whenever they went beyond
the ColumnLimit, without consideration for the PenaltyBreakComment/
String and PenaltyExcessCharacter. With this patch, the
OptimizingLineFormatter will also consider not splitting each token,
so that these 'small' offenders get a chance:

  // With ColumnLimit=20
  int a; // the
 // comment
  
  // With ColumnLimit=20 and PenaltyExcessCharacter = 10
  int a; // the comment

This patch does not fully optimize the reflowing, as it will only try
reflowing the whole comment or not reflowing it at all (instead of
trying each split, to allow some lines of overflow ColumnLimit even
when reflowing).


https://reviews.llvm.org/D33589

Files:
  lib/Format/ContinuationIndenter.cpp
  lib/Format/ContinuationIndenter.h
  lib/Format/FormatToken.cpp
  lib/Format/FormatToken.h
  lib/Format/UnwrappedLineFormatter.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -8542,6 +8542,36 @@
   EXPECT_EQ("#pragma option -C -A", format("#pragmaoption   -C   -A"));
 }
 
+TEST_F(FormatTest, OptimizeBreakPenaltyVsExcess) {
+  FormatStyle Style = getLLVMStyle();
+  Style.ColumnLimit = 20;
+
+  verifyFormat("int a; // the\n"
+			   "   // comment", Style);
+  verifyFormat("int a; /* first line\n"
+   "* second\n"
+			   "* line third\n"
+   "* line\n"
+   "*/", Style);
+
+  Style.PenaltyExcessCharacter = 30;
+  verifyFormat("int a; // the comment", Style);
+  EXPECT_EQ("int a; // the\n"
+"   // comment aa",
+format("int a; // the comment aa", Style));
+  verifyFormat("int a; /* first line\n"
+   "* second line\n"
+			   "* third line\n"
+   "*/", Style);
+  EXPECT_EQ("int a; /* first line\n"
+"* second\n"
+			"* line third\n"
+"* line\n"
+"*/",
+format("int a; /* first line second line third line */",
+   Style));
+}
+
 #define EXPECT_ALL_STYLES_EQUAL(Styles)\
   for (size_t i = 1; i < Styles.size(); ++i)   \
   EXPECT_EQ(Styles[0], Styles[i]) << "Style #" << i << " of " << Styles.size() \
Index: lib/Format/UnwrappedLineFormatter.cpp
===
--- lib/Format/UnwrappedLineFormatter.cpp
+++ lib/Format/UnwrappedLineFormatter.cpp
@@ -590,7 +590,8 @@
   (Indenter->canBreak(State) && State.NextToken->NewlinesBefore > 0);
   unsigned Penalty = 0;
   formatChildren(State, Newline, /*DryRun=*/false, Penalty);
-  Indenter->addTokenToState(State, Newline, /*DryRun=*/false);
+  Indenter->addTokenToState(State, Newline, /*BreakToken=*/true,
+/*DryRun=*/false);
 }
 return 0;
   }
@@ -611,7 +612,8 @@
 LineState State = Indenter->getInitialState(FirstIndent, , DryRun);
 while (State.NextToken) {
   formatChildren(State, /*Newline=*/false, DryRun, Penalty);
-  Indenter->addTokenToState(State, /*Newline=*/false, DryRun);
+  Indenter->addTokenToState(State, /*Newline=*/false, /*BreakToken=*/true,
+DryRun);
 }
 return Penalty;
   }
@@ -658,10 +660,11 @@
   /// \brief An edge in the solution space from \c Previous->State to \c State,
   /// inserting a newline dependent on the \c NewLine.
   struct StateNode {
-StateNode(const LineState , bool NewLine, StateNode *Previous)
-: State(State), NewLine(NewLine), Previous(Previous) {}
+StateNode(const LineState , bool NewLine, bool BreakToken, StateNode *Previous)
+: State(State), NewLine(NewLine), BreakToken(BreakToken), Previous(Previous) {}
 LineState State;
 bool NewLine;
+bool BreakToken;
 StateNode *Previous;
   };
 
@@ -691,7 +694,7 @@
 
 // Insert start element into queue.
 StateNode *Node =
-new (Allocator.Allocate()) StateNode(InitialState, false, nullptr);
+new (Allocator.Allocate()) StateNode(InitialState, false, true, nullptr);
 Queue.push(QueueItem(OrderedPenalty(0, Count), Node));
 ++Count;
 
@@ -718,9 +721,17 @@
 
   FormatDecision LastFormat = Node->State.NextToken->Decision;
   if (LastFormat == FD_Unformatted || LastFormat == FD_Continue)
-addNextStateToQueue(Penalty, Node, /*NewLine=*/false, , );
+addNextStateToQueue(Penalty, Node, /*NewLine=*/false,
+/*BreakToken=*/true, , );
   if (LastFormat == FD_Unformatted ||