[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-23 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

In D137885#3947532 , @dyung wrote:

> @MaskRay your change seems to be causing 1509 failures on the linux PS4 bot. 
> Most of the errors look similar to this:
>
>   error: 'error' diagnostics seen but not expected: 
> (frontend): malformed or corrupted AST file: 'LLVM was not built with 
> LLVM_ENABLE_ZSTD or did not find zstd at build time'
>   1 error generated.
>
> Can you take a look?

Back from lunch. Sorry for the breakage. I picked the wrong zlib header... 
Fixed now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-23 Thread Douglas Yung via Phabricator via cfe-commits
dyung added a comment.

In D137885#3947532 , @dyung wrote:

> @MaskRay your change seems to be causing 1509 failures on the linux PS4 bot. 
> Most of the errors look similar to this:
>
>   error: 'error' diagnostics seen but not expected: 
> (frontend): malformed or corrupted AST file: 'LLVM was not built with 
> LLVM_ENABLE_ZSTD or did not find zstd at build time'
>   1 error generated.
>
> Can you take a look?

Sorry, forgot to include a link to the failing buildbot: 
https://lab.llvm.org/buildbot/#/builders/139/builds/31513


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-23 Thread Douglas Yung via Phabricator via cfe-commits
dyung added a comment.

@MaskRay your change seems to be causing 1509 failures on the linux PS4 bot. 
Most of the errors look similar to this:

  error: 'error' diagnostics seen but not expected: 
(frontend): malformed or corrupted AST file: 'LLVM was not built with 
LLVM_ENABLE_ZSTD or did not find zstd at build time'
  1 error generated.

Can you take a look?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-23 Thread Fangrui Song via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGfa7bc386ec75: [modules] Support zstd in .pcm file (authored 
by MaskRay).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/Modules/embed-files-compressed.cpp


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1979,6 +1979,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer, 9);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 
1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,24 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  if (!llvm::compression::zlib::isAvailable()) {
-Error("zlib is not available");
+  // Inspect the first two bytes to differentiate zlib (\x1f\x8b) and zstd.
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib
+  : llvm::compression::Format::Zstd;
+  if (const char *Reason = llvm::compression::getReasonIfUnsupported(F)) {
+Error(Reason);
 return nullptr;
   }
-  SmallVector Uncompressed;
-  if (llvm::Error E = llvm::compression::zlib::decompress(
-  llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
+  SmallVector Decompressed;
+  if (llvm::Error E = llvm::compression::decompress(
+  F, llvm::arrayRefFromStringRef(Blob), Decompressed, Record[0])) {
 Error("could not decompress embedded file contents: " +
   llvm::toString(std::move(E)));
 return nullptr;
   }
   return llvm::MemoryBuffer::getMemBufferCopy(
-  llvm::toStringRef(Uncompressed), Name);
+  llvm::toStringRef(Decompressed), Name);
 } else if (RecCode == SM_SLOC_BUFFER_BLOB) {
   return llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name, true);
 } else {


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1979,6 +1979,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer, 9);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,24 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  

[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-23 Thread David Blaikie via Phabricator via cfe-commits
dblaikie accepted this revision.
dblaikie added a comment.
This revision is now accepted and ready to land.

Sounds good


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-22 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay marked an inline comment as done.
MaskRay added a comment.

Ping:)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-18 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added inline comments.



Comment at: clang/lib/Serialization/ASTReader.cpp:1457
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib

MaskRay wrote:
> dblaikie wrote:
> > tschuett wrote:
> > > Could you put the magic number into a named constant?
> > & perhaps we should have/test a magic number for zstd too, so it's clear 
> > it's one or the other and not something else added in the future (since 
> > this isn't reving the bitcode version or anything - reading zstd compressed 
> > data with the library before this version will I guess/hopefully appear 
> > corrupted, but we could avoid that being the failure mode in the future 
> > when another compression scheme is added by checking explicitly for 
> > zstd/zlib now and reporting unknown compression scheme otherwise?)?
> > Could you put the magic number into a named constant?
> 
> I think a constant doesn't improve readability of this used-once value with a 
> comment.
> 
> > & perhaps we should have/test a magic number for zstd too, so it's clear 
> > it's one or the other and not something else added in the future
> 
> Since the version number keeps increasing, when a new algorithm is added, 
> they will see a version mismatch error...
> > & perhaps we should have/test a magic number for zstd too, so it's clear 
> > it's one or the other and not something else added in the future
> 
> Since the version number keeps increasing, when a new algorithm is added, 
> they will see a version mismatch error...

Oh, right, sorry, got muddled - thinking bitcode & then thinking LLVM IR 
bitcode guarantees, super stable, but the AST's super unstable/closely version 
locked anyway.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-18 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay marked 2 inline comments as done.
MaskRay added inline comments.



Comment at: clang/lib/Serialization/ASTReader.cpp:1457
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib

dblaikie wrote:
> tschuett wrote:
> > Could you put the magic number into a named constant?
> & perhaps we should have/test a magic number for zstd too, so it's clear it's 
> one or the other and not something else added in the future (since this isn't 
> reving the bitcode version or anything - reading zstd compressed data with 
> the library before this version will I guess/hopefully appear corrupted, but 
> we could avoid that being the failure mode in the future when another 
> compression scheme is added by checking explicitly for zstd/zlib now and 
> reporting unknown compression scheme otherwise?)?
> Could you put the magic number into a named constant?

I think a constant doesn't improve readability of this used-once value with a 
comment.

> & perhaps we should have/test a magic number for zstd too, so it's clear it's 
> one or the other and not something else added in the future

Since the version number keeps increasing, when a new algorithm is added, they 
will see a version mismatch error...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-18 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added inline comments.



Comment at: clang/lib/Serialization/ASTReader.cpp:1457
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib

tschuett wrote:
> Could you put the magic number into a named constant?
& perhaps we should have/test a magic number for zstd too, so it's clear it's 
one or the other and not something else added in the future (since this isn't 
reving the bitcode version or anything - reading zstd compressed data with the 
library before this version will I guess/hopefully appear corrupted, but we 
could avoid that being the failure mode in the future when another compression 
scheme is added by checking explicitly for zstd/zlib now and reporting unknown 
compression scheme otherwise?)?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-18 Thread Thorsten via Phabricator via cfe-commits
tschuett added a comment.

Release note?




Comment at: clang/lib/Serialization/ASTReader.cpp:1457
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib

Could you put the magic number into a named constant?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-17 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay updated this revision to Diff 476360.
MaskRay marked an inline comment as done.
MaskRay added a comment.

add a comment about the zlib header


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/Modules/embed-files-compressed.cpp


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1979,6 +1979,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer, 9);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 
1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,24 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  if (!llvm::compression::zlib::isAvailable()) {
-Error("zlib is not available");
+  // Inspect the first two bytes to differentiate zlib (\x1f\x8b) and zstd.
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib
+  : llvm::compression::Format::Zstd;
+  if (const char *Reason = llvm::compression::getReasonIfUnsupported(F)) {
+Error(Reason);
 return nullptr;
   }
-  SmallVector Uncompressed;
-  if (llvm::Error E = llvm::compression::zlib::decompress(
-  llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
+  SmallVector Decompressed;
+  if (llvm::Error E = llvm::compression::decompress(
+  F, llvm::arrayRefFromStringRef(Blob), Decompressed, Record[0])) {
 Error("could not decompress embedded file contents: " +
   llvm::toString(std::move(E)));
 return nullptr;
   }
   return llvm::MemoryBuffer::getMemBufferCopy(
-  llvm::toStringRef(Uncompressed), Name);
+  llvm::toStringRef(Decompressed), Name);
 } else if (RecCode == SM_SLOC_BUFFER_BLOB) {
   return llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name, true);
 } else {


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1979,6 +1979,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer, 9);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,24 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  if (!llvm::compression::zlib::isAvailable()) {
-Error("zlib is 

[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-14 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added inline comments.



Comment at: clang/lib/Serialization/ASTReader.cpp:1456
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib

Worth a comment about this magic number?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-12 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay updated this revision to Diff 474953.
MaskRay edited the summary of this revision.
MaskRay added a comment.

use level 9


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137885/new/

https://reviews.llvm.org/D137885

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/Modules/embed-files-compressed.cpp


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1980,6 +1980,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer, 9);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 
1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,23 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  if (!llvm::compression::zlib::isAvailable()) {
-Error("zlib is not available");
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib
+  : llvm::compression::Format::Zstd;
+  if (const char *Reason = llvm::compression::getReasonIfUnsupported(F)) {
+Error(Reason);
 return nullptr;
   }
-  SmallVector Uncompressed;
-  if (llvm::Error E = llvm::compression::zlib::decompress(
-  llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
+  SmallVector Decompressed;
+  if (llvm::Error E = llvm::compression::decompress(
+  F, llvm::arrayRefFromStringRef(Blob), Decompressed, Record[0])) {
 Error("could not decompress embedded file contents: " +
   llvm::toString(std::move(E)));
 return nullptr;
   }
   return llvm::MemoryBuffer::getMemBufferCopy(
-  llvm::toStringRef(Uncompressed), Name);
+  llvm::toStringRef(Decompressed), Name);
 } else if (RecCode == SM_SLOC_BUFFER_BLOB) {
   return llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name, true);
 } else {


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1980,6 +1980,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer, 9);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,23 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  if (!llvm::compression::zlib::isAvailable()) {
-Error("zlib is not available");
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && 

[PATCH] D137885: [modules] Support zstd in .pcm file

2022-11-12 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay created this revision.
MaskRay added reviewers: aaron.ballman, jansvoboda11, rsmith.
Herald added a subscriber: StephenFan.
Herald added a project: All.
MaskRay requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Extend SM_SLOC_BUFFER_BLOB_COMPRESSED to allow zstd, which is much faster
(compression/decompression) than zlib and has better compression ratio.

An alternative is to add a value beside SM_SLOC_BUFFER_BLOB_COMPRESSED, but
reusing SM_SLOC_BUFFER_BLOB_COMPRESSED slightly simplifies the implementation
and leads to better diagnostics when a slightly older Clang consumes zstd
compressed blob.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D137885

Files:
  clang/lib/Serialization/ASTReader.cpp
  clang/lib/Serialization/ASTWriter.cpp
  clang/test/Modules/embed-files-compressed.cpp


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1980,6 +1980,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 
1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: clang/lib/Serialization/ASTReader.cpp
===
--- clang/lib/Serialization/ASTReader.cpp
+++ clang/lib/Serialization/ASTReader.cpp
@@ -1452,19 +1452,23 @@
 unsigned RecCode = MaybeRecCode.get();
 
 if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
-  if (!llvm::compression::zlib::isAvailable()) {
-Error("zlib is not available");
+  const llvm::compression::Format F =
+  Blob.size() >= 2 && memcmp(Blob.data(), "\x1f\x8b", 2) == 0
+  ? llvm::compression::Format::Zlib
+  : llvm::compression::Format::Zstd;
+  if (const char *Reason = llvm::compression::getReasonIfUnsupported(F)) {
+Error(Reason);
 return nullptr;
   }
-  SmallVector Uncompressed;
-  if (llvm::Error E = llvm::compression::zlib::decompress(
-  llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
+  SmallVector Decompressed;
+  if (llvm::Error E = llvm::compression::decompress(
+  F, llvm::arrayRefFromStringRef(Blob), Decompressed, Record[0])) {
 Error("could not decompress embedded file contents: " +
   llvm::toString(std::move(E)));
 return nullptr;
   }
   return llvm::MemoryBuffer::getMemBufferCopy(
-  llvm::toStringRef(Uncompressed), Name);
+  llvm::toStringRef(Decompressed), Name);
 } else if (RecCode == SM_SLOC_BUFFER_BLOB) {
   return llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name, true);
 } else {


Index: clang/test/Modules/embed-files-compressed.cpp
===
--- clang/test/Modules/embed-files-compressed.cpp
+++ clang/test/Modules/embed-files-compressed.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: zlib
+// REQUIRES: zlib || zstd
 // REQUIRES: shell
 //
 // RUN: rm -rf %t
Index: clang/lib/Serialization/ASTWriter.cpp
===
--- clang/lib/Serialization/ASTWriter.cpp
+++ clang/lib/Serialization/ASTWriter.cpp
@@ -1980,6 +1980,14 @@
   // Compress the buffer if possible. We expect that almost all PCM
   // consumers will not want its contents.
   SmallVector CompressedBuffer;
+  if (llvm::compression::zstd::isAvailable()) {
+llvm::compression::zstd::compress(
+llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
+RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 1};
+Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,
+  llvm::toStringRef(CompressedBuffer));
+return;
+  }
   if (llvm::compression::zlib::isAvailable()) {
 llvm::compression::zlib::compress(
 llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer);
Index: