[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
joerg added a comment. We already have a couple of case that expect the encoding to be compatible. I'm not very attached to the additional special cases from YAML, but having either a common escape function OR a JSON escape in LLVM/Support is quite important. https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
malaperle-ericsson added a comment. In https://reviews.llvm.org/D31992#728036, @joerg wrote: > Just to avoid any confusion: this should be the generic YAML escape routine > in llvm/lib/Support, i.e. IMO we don't want to have separate YAML and JSON > escape routines. > Effective, change YAMLParser.cpp line 697 to use \u and drop the whole UTF-8 > handling part. I'm not sure it's possible or desirable to reuse yaml::escape as they are two formats and it's expected that a yaml output might not be able to be read by a json parser. If I look at the yaml::escape routine, there are several escape that are incompatible with JSON like \0, \v, \e, and 32-bit handling via \U. I think ideally this patch would be a stop-gap solution until jsoncpp is introduced so that one can build json output in a structured way (as opposed to manually appending strings now) and the output will also be escape via the library. https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
joerg added a comment. Just to avoid any confusion: this should be the generic YAML escape routine in llvm/lib/Support, i.e. IMO we don't want to have separate YAML and JSON escape routines. Effective, change YAMLParser.cpp line 697 to use \u and drop the whole UTF-8 handling part. https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
malaperle-ericsson added a comment. Once the use of "two characters representation" is clarified, I will update the patch again. https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
joerg added a comment. The short version is perfectly fine as long as works for both JSON and YAML. Less output is always good :) https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
malaperle-ericsson updated this revision to Diff 95414. malaperle-ericsson added a comment. Handle other control characters and add test https://reviews.llvm.org/D31992 Files: clangd/ASTManager.cpp clangd/Protocol.cpp clangd/Protocol.h clangd/ProtocolHandlers.cpp test/clangd/encoding.test Index: test/clangd/encoding.test === --- /dev/null +++ test/clangd/encoding.test @@ -0,0 +1,17 @@ +# RUN: clangd -run-synchronously < %s | FileCheck %s +# It is absolutely vital that this file has CRLF line endings. +# +Content-Length: 125 + +{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":123,"rootPath":"clangd","capabilities":{},"trace":"off"}} +# +Content-Length: 154 + +{"jsonrpc":"2.0","method":"textDocument/didOpen","params":{"textDocument":{"uri":"file:///foo.c","languageId":"c","version":1,"text":"void main() {é;}"}}} +# +# CHECK: {"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///foo.c","diagnostics":[{"range":{"start": {"line": 0, "character": 1}, "end": {"line": 0, "character": 1}},"severity":2,"message":"return type of 'main' is not 'int'"},{"range":{"start": {"line": 0, "character": 1}, "end": {"line": 0, "character": 1}},"severity":3,"message":"change return type to 'int'"},{"range":{"start": {"line": 0, "character": 14}, "end": {"line": 0, "character": 14}},"severity":1,"message":"use of undeclared identifier 'é'"}]}} +# +# +Content-Length: 44 + +{"jsonrpc":"2.0","id":5,"method":"shutdown"} Index: clangd/ProtocolHandlers.cpp === --- clangd/ProtocolHandlers.cpp +++ clangd/ProtocolHandlers.cpp @@ -172,9 +172,9 @@ if (!Edits.empty()) Commands += - R"({"title":"Apply FixIt ')" + llvm::yaml::escape(D.message) + + R"({"title":"Apply FixIt ')" + jsonEscape(D.message) + R"('", "command": "clangd.applyFix", "arguments": [")" + - llvm::yaml::escape(CAP->textDocument.uri.uri) + + jsonEscape(CAP->textDocument.uri.uri) + R"(", [)" + Edits + R"(]]},)"; } Index: clangd/Protocol.h === --- clangd/Protocol.h +++ clangd/Protocol.h @@ -29,6 +29,8 @@ namespace clang { namespace clangd { +std::string jsonEscape(llvm::StringRef Input); + struct URI { std::string uri; std::string file; Index: clangd/Protocol.cpp === --- clangd/Protocol.cpp +++ clangd/Protocol.cpp @@ -15,12 +15,48 @@ #include "Protocol.h" #include "clang/Basic/LLVM.h" #include "llvm/ADT/SmallString.h" +#include #include "llvm/Support/Format.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Support/Path.h" using namespace clang::clangd; +std::string clang::clangd::jsonEscape(llvm::StringRef Input) { + std::string EscapedInput; + for (llvm::StringRef::iterator i = Input.begin(), e = Input.end(); i != e; ++i) { +if (*i == '\\') + EscapedInput += ""; +else if (*i == '"') + EscapedInput += "\\\""; +// bell +else if (*i == 0x07) + EscapedInput += "\\a"; +// backspace +else if (*i == 0x08) + EscapedInput += "\\b"; +// hoz tab +else if (*i == 0x09) + EscapedInput += "\\t"; +// new line +else if (*i == 0x0A) + EscapedInput += "\\n"; +// form feed +else if (*i == 0x0C) + EscapedInput += "\\f"; +// carr return +else if (*i == 0x0D) + EscapedInput += "\\r"; +else if ((unsigned char)*i < 0x20) { // Control characters not handled above. + std::string HexStr = llvm::utohexstr(*i); + EscapedInput += "\\u" + std::string(4 - HexStr.size(), '0') + HexStr; +} +else + EscapedInput.push_back(*i); + } + return EscapedInput; +} + URI URI::fromUri(llvm::StringRef uri) { URI Result; Result.uri = uri; @@ -230,7 +266,7 @@ std::string Result; llvm::raw_string_ostream(Result) << llvm::format( R"({"range": %s, "newText": "%s"})", Range::unparse(P.range).c_str(), - llvm::yaml::escape(P.newText).c_str()); + jsonEscape(P.newText).c_str()); return Result; } @@ -670,20 +706,20 @@ std::string Result = "{"; llvm::raw_string_ostream Os(Result); assert(!CI.label.empty() && "completion item label is required"); - Os << R"("label":")" << llvm::yaml::escape(CI.label) << R"(",)"; + Os << R"("label":")" << jsonEscape(CI.label) << R"(",)"; if (CI.kind != CompletionItemKind::Missing) Os << R"("kind":)" << static_cast(CI.kind) << R"(,)"; if (!CI.detail.empty()) -Os << R"("detail":")" << llvm::yaml::escape(CI.detail) << R"(",)"; +Os << R"("detail":")" << jsonEscape(CI.detail) << R"(",)"; if (!CI.documentation.empty()) -Os << R"("documentation":")" << llvm::yaml::escape(CI.documentation) +Os << R"("documentation":")" << jsonEscape(CI.documentation) << R"(",)"; if
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
malaperle-ericsson added a comment. In https://reviews.llvm.org/D31992#726447, @joerg wrote: > let's escape ... all the known ASCII control characters, Do you mean encode all of them with \u or keep the two characters representation for those that exist? I think \n is nicer than \u000A and is probably common enough to keep the short version. Same for \r, \t. Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
joerg added a comment. In https://reviews.llvm.org/D31992#725963, @klimek wrote: > If I understand correctly, that's " and \ for JSON and ", \ and all > non-printable characters (which unfortunately requires understanding of > unicode to solve this fully correctly) in YAML. I'd modify that slightly: let's escape those two and all the known ASCII control characters, but just pass the rest through. I.e. reduce the current \x etc handling to known problematic input and not second guess the rest. Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
klimek added a comment. In https://reviews.llvm.org/D31992#725912, @malaperle-ericsson wrote: > In https://reviews.llvm.org/D31992#725866, @krasimir wrote: > > > Seems that we're starting to hit some YAML/JSON mismatches, or is it that > > your YAML string support is lacking? > > > I don't think so. It seems like JSON and YAML are not completely aligned on > escaped characters. See http://yaml.org/spec/1.2/spec.html#id2776092 which > specifies unicode 8, 16, and 32 bits are escaped with \x, \u, and \U whereas > http://www.json.org/string.gif specifies that unicode 16 bit characters can > be encoded with \u but \x and \U are not supported. This leads me to believe > that a YAML parser can read JSON but a JSON parser will not necessarily be > able to read YAML. I thought about using json cpp but that's a much bigger > change Here we only talk about what we escape, which should be the minimum required. If I understand correctly, that's " and \ for JSON and ", \ and all non-printable characters (which unfortunately requires understanding of unicode to solve this fully correctly) in YAML. Am I missing something? Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
klimek added inline comments. Comment at: clangd/Protocol.cpp:26-50 + for (llvm::StringRef::iterator i = Input.begin(), e = Input.end(); i != e; ++i) { +if (*i == '\\') + EscapedInput += ""; +else if (*i == '"') + EscapedInput += "\\\""; +// bell +else if (*i == 0x07) For json we only need the first 2 though, right? Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
malaperle-ericsson added a comment. In https://reviews.llvm.org/D31992#725913, @joerg wrote: > I'm strongly against this patch. Can you give an actual test case for the > problematic behavior? Sure I can add a test. If you meant more real work scenario, you can juste type "é" in VS Code and it will throw an exception trying to parse \x Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
joerg added a comment. I'm strongly against this patch. Can you give an actual test case for the problematic behavior? Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
malaperle-ericsson added a comment. In https://reviews.llvm.org/D31992#725866, @krasimir wrote: > Seems that we're starting to hit some YAML/JSON mismatches, or is it that > your YAML string support is lacking? I don't think so. It seems like JSON and YAML are not completely aligned on escaped characters. See http://yaml.org/spec/1.2/spec.html#id2776092 which specifies unicode 8, 16, and 32 bits are escaped with \x, \u, and \U whereas http://www.json.org/string.gif specifies that unicode 16 bit characters can be encoded with \u but \x and \U are not supported. This leads me to believe that a YAML parser can read JSON but a JSON parser will not necessarily be able to read YAML. I thought about using json cpp but that's a much bigger change Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
Re: [PATCH] D31992: [clangd] Escape only necessary characters in JSON output
sorry, is it that *our* YAML string support is lacking? On Thu, Apr 13, 2017 at 2:38 PM, Krasimir Georgiev via Phabricator < revi...@reviews.llvm.org> wrote: > krasimir added a comment. > > Seems that we're starting to hit some YAML/JSON mismatches, or is it that > your YAML string support is lacking? > > > Repository: > rL LLVM > > https://reviews.llvm.org/D31992 > > > > ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D31992: [clangd] Escape only necessary characters in JSON output
krasimir added a comment. Seems that we're starting to hit some YAML/JSON mismatches, or is it that your YAML string support is lacking? Repository: rL LLVM https://reviews.llvm.org/D31992 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits