John Naylor <john.nay...@2ndquadrant.com> writes:
> The pre-existing ecpg var "state_before" was a bit confusing when
> combined with the new var "state_before_quote_stop", and the former is
> also used with C-comments, so I decided to go with
> "state_before_lit_start" and "state_before_lit_stop". Even though
> comments aren't literals, it's less of a stretch than referring to
> quotes. To keep things consistent, I went with the latter var in psql
> and core.

Hm, what do you think of "state_before_str_stop" instead?  It seems
to me that both "quote" and "lit" are pretty specific terms, so
maybe we need something a bit vaguer.

> To get the regression tests to pass, I had to add this:
>  psql_scan_in_quote(PsqlScanState state)
>  {
> -	return state->start_state != INITIAL;
> +	return state->start_state != INITIAL &&
> +			state->start_state != xqs;
>  }
> ...otherwise with parens we sometimes don't get the right prompt and
> we get empty lines echoed. Adding xuend and xuchar here didn't seem to
> make a difference. There might be something subtle I'm missing, so I
> thought I'd mention it.

I think you would see a difference if the regression tests had any cases
with blank lines between a Unicode string/ident and the associated
UESCAPE and escape-character literal.

While poking at that, I also came across this unhappiness:

regression=# select u&'foo' uescape 'bogus';
regression'#

that is, psql thinks we're still in a literal at this point.  That's
because the uesccharfail rule eats "'b" and then we go to INITIAL
state, so that consuming the last "'" puts us back in a string state.
The backend would have thrown an error before parsing as far as the
incomplete literal, so it doesn't care (or probably not, anyway),
but that's not an option for psql.

My first reaction as to how to fix this was to rip the xuend and
xuchar states out of psql, and let it just lex UESCAPE as an
identifier and the escape-character literal like any other literal.
psql doesn't need to account for the escape character's effect on
the meaning of the Unicode literal, so it doesn't have any need to
lex the sequence as one big token.  I think the same is true of ecpg
though I've not looked really closely.

However, my second reaction was that maybe you were on to something
upthread when you speculated about postponing de-escaping of
Unicode literals into the grammar.
If we did it like that then we would not need to have this difference
between the backend and frontend lexers, and we'd not have to worry
about what psql_scan_in_quote should do about the whitespace before
and after UESCAPE, either.

So I'm feeling like maybe we should experiment to see what that
solution looks like, before we commit to going in this direction.
What do you think?

> With the unicode escape rules brought over, the diff to the ecpg
> scanner is much cleaner now. The diff for C-comment rules were still
> pretty messy in comparison, so I made an attempt to clean that up in
> 0002. A bit off-topic, but I thought I should offer that while it was
> fresh in my head.

I didn't really review this, but it looked like a fairly plausible
change of the same ilk, ie combine rules by adding memory of the
previous start state.

			regards, tom lane
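
PS: for anyone who wants to poke at the "uescape 'bogus'" case without
building psql, here is a toy model in C. It is only a sketch and not the
real flex scanner: the state names (TOY_INITIAL, TOY_IN_QUOTE,
TOY_AFTER_USTR, TOY_WANT_ESC_CHAR) and both function names are made up
for illustration, it only recognizes lowercase "u&" and "uescape" with
single spaces, and it ignores doubled quotes, comments, dollar quoting
and everything else. toy_in_quote_combined() imitates lexing the string,
UESCAPE and escape character as one unit, including a failure rule that
eats the quote plus one character; toy_in_quote_simple() imitates
treating the escape-character literal like any other literal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical states, loosely modeled on the discussion above. */
typedef enum
{
	TOY_INITIAL,
	TOY_IN_QUOTE,
	TOY_AFTER_USTR,			/* just closed a u&'...' string */
	TOY_WANT_ESC_CHAR		/* saw "uescape", expecting 'c' */
} ToyState;

/* True if the scan ends inside an unterminated quoted string. */
bool
toy_in_quote_combined(const char *s)
{
	ToyState	st = TOY_INITIAL;
	bool		ustr = false;
	size_t		i = 0;
	size_t		n;

	assert(s != NULL);
	n = strlen(s);

	while (i < n)
	{
		switch (st)
		{
			case TOY_INITIAL:
				if (s[i] == 'u' && s[i + 1] == '&' && s[i + 2] == '\'')
				{
					ustr = true;
					st = TOY_IN_QUOTE;
					i += 3;
				}
				else
				{
					if (s[i] == '\'')
					{
						ustr = false;
						st = TOY_IN_QUOTE;
					}
					i++;
				}
				break;
			case TOY_IN_QUOTE:
				if (s[i] == '\'')
					st = ustr ? TOY_AFTER_USTR : TOY_INITIAL;
				i++;
				break;
			case TOY_AFTER_USTR:
				if (s[i] == ' ')
					i++;
				else if (strncmp(s + i, "uescape", 7) == 0)
				{
					st = TOY_WANT_ESC_CHAR;
					i += 7;
				}
				else
					st = TOY_INITIAL;	/* no UESCAPE; rescan this char */
				break;
			case TOY_WANT_ESC_CHAR:
				if (s[i] == ' ')
					i++;
				else if (s[i] == '\'' && s[i + 1] != '\0' &&
						 s[i + 1] != '\'' && s[i + 2] == '\'')
				{
					i += 3;				/* well-formed 'c' */
					st = TOY_INITIAL;
				}
				else if (s[i] == '\'')
				{
					/* failure rule: eat the quote plus one char, if any */
					i += (s[i + 1] != '\0' && s[i + 1] != '\'') ? 2 : 1;
					st = TOY_INITIAL;
				}
				else
					st = TOY_INITIAL;	/* rescan this char */
				break;
		}
	}
	return st == TOY_IN_QUOTE;
}

/* Alternative: every quote just toggles "in a literal". */
bool
toy_in_quote_simple(const char *s)
{
	bool		in_quote = false;

	assert(s != NULL);
	for (; *s; s++)
		if (*s == '\'')
			in_quote = !in_quote;
	return in_quote;
}
```

Feeding both functions the line from the transcript above should show
the difference: the combined version ends up inside a quote after the
failure rule swallows "'b", while the simple version sees four quote
marks and ends outside any literal. Both agree on a well-formed escape
such as uescape '!' and on a genuinely unterminated string.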