[PATCH] D109531: [CSSPGO] Enable pseudo probe instrumentation in O0 mode.

2021-09-10 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109531/new/

https://reviews.llvm.org/D109531

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D109531: [CSSPGO] Enable pseudo probe instrumentation in O0 mode.

2021-09-10 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D109531#2993484 , @hoy wrote:

> In D109531#2993394 , @wmi wrote:
>
>> In D109531#2992721 , @hoy wrote:
>>
>>> In D109531#2992702 , @wenlei 
>>> wrote:
>>>
 The change makes sense given instr PGO also happens for O0. But 
 practically, if a file is being built with O0, do we care about its 
 profile given we're not really optimizing it anyways? Functions from O0 
 modules are not supposed to be inlined into O1+ modules either.
>>>
>>> We probably don't care about performance for an O0 build. The change is for 
>>> consistency, and it also keeps the compiler happy, which would otherwise 
>>> complain about "Pseudo-probe-based profile requires SampleProfileProbePass" 
>>> for O0 modules that don't have probes.
>>
>> The complaint message is emitted in SampleProfileLoader::doInitialization. 
>> LLVM will not run the SampleProfileLoader pass for an O0 module, so why is 
>> there a complaint?
>
> Good question. It could happen in LTO postlink, which by default optimizes in 
> -O2 mode. More specifically, with the following command, both `cc1` and `lld` 
> will run in their default modes, which are -O0 for cc1 and -O2 for lld.
>
>   clang -flto 1.cpp -v -fuse-ld=lld

I see. It seems to be a problem only exposed in monolithic LTO. Could you add a 
comment before the change in PassBuilder.cpp?
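
For context, a minimal sketch of the mismatch being described (the helper name is an
assumption for illustration; the named-metadata key is, to my understanding, what
pseudo-probe instrumentation emits): a probe-based sample profile expects each module
seen in the postlink backend to carry a pseudo-probe descriptor, and an O0 module
compiled without probes has none, hence the complaint.

#include "llvm/IR/Module.h"

// Illustrative helper, not code from the patch: an O0-compiled module without
// pseudo-probe instrumentation carries no probe descriptor metadata, which is
// what the probe-based profile loader complains about in LTO postlink.
static bool moduleHasPseudoProbes(const llvm::Module &M) {
  return M.getNamedMetadata("llvm.pseudo_probe_desc") != nullptr;
}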


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109531/new/

https://reviews.llvm.org/D109531

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D109531: [CSSPGO] Enable pseudo probe instrumentation in O0 mode.

2021-09-09 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D109531#2992721 , @hoy wrote:

> In D109531#2992702 , @wenlei wrote:
>
>> The change makes sense given instr PGO also happens for O0. But practically, 
>> if a file is being built with O0, do we care about its profile given we're 
>> not really optimizing it anyways? Functions from O0 modules are not supposed 
>> to be inlined into O1+ modules either.
>
> We probably don't care about performance for an O0 build. The change is for 
> consistency, and it also keeps the compiler happy, which would otherwise 
> complain about "Pseudo-probe-based profile requires SampleProfileProbePass" 
> for O0 modules that don't have probes.

The complaint message is emitted in SampleProfileLoader::doInitialization. LLVM 
will not run the SampleProfileLoader pass for an O0 module, so why is there a 
complaint?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109531/new/

https://reviews.llvm.org/D109531

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107876: [CSSPGO] Allow the use of debug-info-for-profiling and pseudo-probe-for-profiling together

2021-08-11 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.

In D107876#2939691 , @hoy wrote:

> In D107876#2939624 , @wmi wrote:
>
>> Could you remind me of the discriminator difference between debug-info-based 
>> AFDO and pseudo-probe-based AFDO? I forgot it. I am sure it is described in 
>> some patch but I haven't found it. A pointer would be good enough.
>
> In short, unlike a block pseudo probe, a callsite probe relies on the callsite 
> discriminator to be tracked. This has a couple of usages: 1. to track direct 
> and indirect callsites; 2. to model inline contexts with a list of callsite 
> probes.
>
> A probe discriminator is treated as empty or null by AutoFDO since its lowest 
> three bits are "111".

Thanks for the explanation.
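
As a minimal illustration of the encoding described above (the helper name is made up;
only the low-bit convention comes from the thread), a consumer can recognize a
pseudo-probe callsite discriminator purely from its lowest three bits:

#include <cstdint>

// Per the description above, a pseudo-probe callsite discriminator keeps its
// lowest three bits set to 0b111, so a plain AutoFDO consumer can treat such
// a value as an empty/null discriminator.
static bool isProbeDiscriminator(uint32_t Discriminator) {
  return (Discriminator & 0x7) == 0x7;
}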


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107876/new/

https://reviews.llvm.org/D107876

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107876: [CSSPGO] Allow the use of debug-info-for-profiling and pseudo-probe-for-profiling together

2021-08-11 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

Could you remind me of the discriminator difference between debug-info-based 
AFDO and pseudo-probe-based AFDO? I forgot it. I am sure it is described in 
some patch but I haven't found it. A pointer would be good enough.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107876/new/

https://reviews.llvm.org/D107876

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99554: [ThinLTO] During module importing, close one source module before open another one for distributed mode.

2021-03-30 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd535a05ca1a6: [ThinLTO] During module importing, close one 
source module before open (authored by wmi).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99554/new/

https://reviews.llvm.org/D99554

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/test/CodeGen/thinlto_backend.ll
  llvm/include/llvm/LTO/LTOBackend.h
  llvm/lib/LTO/LTO.cpp
  llvm/lib/LTO/LTOBackend.cpp

Index: llvm/lib/LTO/LTOBackend.cpp
===
--- llvm/lib/LTO/LTOBackend.cpp
+++ llvm/lib/LTO/LTOBackend.cpp
@@ -539,7 +539,7 @@
Module &Mod, const ModuleSummaryIndex &CombinedIndex,
const FunctionImporter::ImportMapTy &ImportList,
const GVSummaryMapTy &DefinedGlobals,
-   MapVector &ModuleMap,
+   MapVector *ModuleMap,
const std::vector &CmdArgs) {
   Expected TOrErr = initAndLookupTarget(Conf, Mod);
   if (!TOrErr)
@@ -608,11 +608,35 @@
   auto ModuleLoader = [&](StringRef Identifier) {
 assert(Mod.getContext().isODRUniquingDebugTypes() &&
"ODR Type uniquing should be enabled on the context");
-auto I = ModuleMap.find(Identifier);
-assert(I != ModuleMap.end());
-return I->second.getLazyModule(Mod.getContext(),
-   /*ShouldLazyLoadMetadata=*/true,
-   /*IsImporting*/ true);
+if (ModuleMap) {
+  auto I = ModuleMap->find(Identifier);
+  assert(I != ModuleMap->end());
+  return I->second.getLazyModule(Mod.getContext(),
+ /*ShouldLazyLoadMetadata=*/true,
+ /*IsImporting*/ true);
+}
+
+ErrorOr> MBOrErr =
+llvm::MemoryBuffer::getFile(Identifier);
+if (!MBOrErr)
+  return Expected>(make_error(
+  Twine("Error loading imported file ") + Identifier + " : ",
+  MBOrErr.getError()));
+
+Expected BMOrErr = findThinLTOModule(**MBOrErr);
+if (!BMOrErr)
+  return Expected>(make_error(
+  Twine("Error loading imported file ") + Identifier + " : " +
+  toString(BMOrErr.takeError()),
+  inconvertibleErrorCode()));
+
+Expected> MOrErr =
+BMOrErr->getLazyModule(Mod.getContext(),
+   /*ShouldLazyLoadMetadata=*/true,
+   /*IsImporting*/ true);
+if (MOrErr)
+  (*MOrErr)->setOwnedMemoryBuffer(std::move(*MBOrErr));
+return MOrErr;
   };
 
   FunctionImporter Importer(CombinedIndex, ModuleLoader,
@@ -652,12 +676,9 @@
  inconvertibleErrorCode());
 }
 
-bool lto::loadReferencedModules(
-const Module &M, const ModuleSummaryIndex &CombinedIndex,
-FunctionImporter::ImportMapTy &ImportList,
-MapVector &ModuleMap,
-std::vector>
-&OwnedImportsLifetimeManager) {
+bool lto::initImportList(const Module &M,
+ const ModuleSummaryIndex &CombinedIndex,
+ FunctionImporter::ImportMapTy &ImportList) {
   if (ThinLTOAssumeMerged)
 return true;
   // We can simply import the values mentioned in the combined index, since
@@ -678,26 +699,5 @@
   ImportList[Summary->modulePath()].insert(GUID);
 }
   }
-
-  for (auto &I : ImportList) {
-ErrorOr> MBOrErr =
-llvm::MemoryBuffer::getFile(I.first());
-if (!MBOrErr) {
-  errs() << "Error loading imported file '" << I.first()
- << "': " << MBOrErr.getError().message() << "\n";
-  return false;
-}
-
-Expected BMOrErr = findThinLTOModule(**MBOrErr);
-if (!BMOrErr) {
-  handleAllErrors(BMOrErr.takeError(), [&](ErrorInfoBase &EIB) {
-errs() << "Error loading imported file '" << I.first()
-   << "': " << EIB.message() << '\n';
-  });
-  return false;
-}
-ModuleMap.insert({I.first(), *BMOrErr});
-OwnedImportsLifetimeManager.push_back(std::move(*MBOrErr));
-  }
   return true;
 }
Index: llvm/lib/LTO/LTO.cpp
===
--- llvm/lib/LTO/LTO.cpp
+++ llvm/lib/LTO/LTO.cpp
@@ -1215,7 +1215,7 @@
 return MOrErr.takeError();
 
   return thinBackend(Conf, Task, AddStream, **MOrErr, CombinedIndex,
- ImportList, DefinedGlobals, ModuleMap);
+ ImportList, DefinedGlobals, &ModuleMap);
 };
 
 auto ModuleID = BM.getModuleIdentifier();
Index: llvm/include/llvm/LTO/LTOBackend.h
===
--- llvm/include/llvm/LTO/LTOBackend.h
+++ llvm/include/llvm/LTO/LTOBackend.h
@@ -46,11 +46,16 @@
   ModuleSummaryIndex &CombinedIndex);
 
 /// Runs a ThinLTO backend.
+/// If \p ModuleMap is not nullptr, all the module files to be i

[PATCH] D99554: [ThinLTO] During module importing, close one source module before open another one for distributed mode.

2021-03-30 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: clang/lib/CodeGen/BackendUtil.cpp:1583
   ModuleToDefinedGVSummaries[M->getModuleIdentifier()],
-  ModuleMap, CGOpts.CmdArgs)) {
+  nullptr, CGOpts.CmdArgs)) {
 handleAllErrors(std::move(E), [&](ErrorInfoBase &EIB) {

tejohnson wrote:
> Document constant parameter
Done.



Comment at: llvm/include/llvm/LTO/LTOBackend.h:48
 
 /// Runs a ThinLTO backend.
 Error thinBackend(const Config &C, unsigned Task, AddStreamFn AddStream,

tejohnson wrote:
> Can you add a note here about the behavior with regards to ModuleMap 
> depending on whether it is null or not?
Add the comment. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99554/new/

https://reviews.llvm.org/D99554

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99554: [ThinLTO] During module importing, close one source module before open another one for distributed mode.

2021-03-30 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 334240.
wmi added a comment.

Address Teresa's comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99554/new/

https://reviews.llvm.org/D99554

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/test/CodeGen/thinlto_backend.ll
  llvm/include/llvm/LTO/LTOBackend.h
  llvm/lib/LTO/LTO.cpp
  llvm/lib/LTO/LTOBackend.cpp

Index: llvm/lib/LTO/LTOBackend.cpp
===
--- llvm/lib/LTO/LTOBackend.cpp
+++ llvm/lib/LTO/LTOBackend.cpp
@@ -539,7 +539,7 @@
Module &Mod, const ModuleSummaryIndex &CombinedIndex,
const FunctionImporter::ImportMapTy &ImportList,
const GVSummaryMapTy &DefinedGlobals,
-   MapVector &ModuleMap,
+   MapVector *ModuleMap,
const std::vector &CmdArgs) {
   Expected TOrErr = initAndLookupTarget(Conf, Mod);
   if (!TOrErr)
@@ -608,11 +608,35 @@
   auto ModuleLoader = [&](StringRef Identifier) {
 assert(Mod.getContext().isODRUniquingDebugTypes() &&
"ODR Type uniquing should be enabled on the context");
-auto I = ModuleMap.find(Identifier);
-assert(I != ModuleMap.end());
-return I->second.getLazyModule(Mod.getContext(),
-   /*ShouldLazyLoadMetadata=*/true,
-   /*IsImporting*/ true);
+if (ModuleMap) {
+  auto I = ModuleMap->find(Identifier);
+  assert(I != ModuleMap->end());
+  return I->second.getLazyModule(Mod.getContext(),
+ /*ShouldLazyLoadMetadata=*/true,
+ /*IsImporting*/ true);
+}
+
+ErrorOr> MBOrErr =
+llvm::MemoryBuffer::getFile(Identifier);
+if (!MBOrErr)
+  return Expected>(make_error(
+  Twine("Error loading imported file ") + Identifier + " : ",
+  MBOrErr.getError()));
+
+Expected BMOrErr = findThinLTOModule(**MBOrErr);
+if (!BMOrErr)
+  return Expected>(make_error(
+  Twine("Error loading imported file ") + Identifier + " : " +
+  toString(BMOrErr.takeError()),
+  inconvertibleErrorCode()));
+
+Expected> MOrErr =
+BMOrErr->getLazyModule(Mod.getContext(),
+   /*ShouldLazyLoadMetadata=*/true,
+   /*IsImporting*/ true);
+if (MOrErr)
+  (*MOrErr)->setOwnedMemoryBuffer(std::move(*MBOrErr));
+return MOrErr;
   };
 
   FunctionImporter Importer(CombinedIndex, ModuleLoader,
@@ -652,12 +676,9 @@
  inconvertibleErrorCode());
 }
 
-bool lto::loadReferencedModules(
-const Module &M, const ModuleSummaryIndex &CombinedIndex,
-FunctionImporter::ImportMapTy &ImportList,
-MapVector &ModuleMap,
-std::vector>
-&OwnedImportsLifetimeManager) {
+bool lto::initImportList(const Module &M,
+ const ModuleSummaryIndex &CombinedIndex,
+ FunctionImporter::ImportMapTy &ImportList) {
   if (ThinLTOAssumeMerged)
 return true;
   // We can simply import the values mentioned in the combined index, since
@@ -678,26 +699,5 @@
   ImportList[Summary->modulePath()].insert(GUID);
 }
   }
-
-  for (auto &I : ImportList) {
-ErrorOr> MBOrErr =
-llvm::MemoryBuffer::getFile(I.first());
-if (!MBOrErr) {
-  errs() << "Error loading imported file '" << I.first()
- << "': " << MBOrErr.getError().message() << "\n";
-  return false;
-}
-
-Expected BMOrErr = findThinLTOModule(**MBOrErr);
-if (!BMOrErr) {
-  handleAllErrors(BMOrErr.takeError(), [&](ErrorInfoBase &EIB) {
-errs() << "Error loading imported file '" << I.first()
-   << "': " << EIB.message() << '\n';
-  });
-  return false;
-}
-ModuleMap.insert({I.first(), *BMOrErr});
-OwnedImportsLifetimeManager.push_back(std::move(*MBOrErr));
-  }
   return true;
 }
Index: llvm/lib/LTO/LTO.cpp
===
--- llvm/lib/LTO/LTO.cpp
+++ llvm/lib/LTO/LTO.cpp
@@ -1215,7 +1215,7 @@
 return MOrErr.takeError();
 
   return thinBackend(Conf, Task, AddStream, **MOrErr, CombinedIndex,
- ImportList, DefinedGlobals, ModuleMap);
+ ImportList, DefinedGlobals, &ModuleMap);
 };
 
 auto ModuleID = BM.getModuleIdentifier();
Index: llvm/include/llvm/LTO/LTOBackend.h
===
--- llvm/include/llvm/LTO/LTOBackend.h
+++ llvm/include/llvm/LTO/LTOBackend.h
@@ -46,11 +46,16 @@
   ModuleSummaryIndex &CombinedIndex);
 
 /// Runs a ThinLTO backend.
+/// If \p ModuleMap is not nullptr, all the module files to be imported have
+/// already been mapped to memory and the corresponding BitcodeModule objects
+/// are saved 

[PATCH] D99554: [ThinLTO] During module importing, close one source module before open another one for distributed mode.

2021-03-29 Thread Wei Mi via Phabricator via cfe-commits
wmi created this revision.
wmi added a reviewer: tejohnson.
Herald added subscribers: steven_wu, hiraditya, inglorion.
wmi requested review of this revision.
Herald added projects: clang, LLVM.
Herald added a subscriber: cfe-commits.

Currently, during module importing, ThinLTO opens all the source modules, 
collects the functions to be imported, appends them to the destination module, 
and then leaves all the modules open throughout the LTO backend pipeline. This 
patch refactors it so that one source module is closed before another source 
module is opened, and all the source modules are closed after the importing 
phase is done. It saves some amount of memory when there are many source 
modules to be imported. A sketch of the idea is shown after this description.

Note that this patch only changes the distributed ThinLTO mode. For in-process 
ThinLTO mode, one source module is shared across different ThinLTO backend 
threads, so it is not changed in this patch.
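
A self-contained sketch of the on-demand loading pattern described above (toy types
and names, not the LLVM APIs touched by the diff below):

#include <functional>
#include <memory>
#include <string>
#include <vector>

// Toy illustration: the importer asks a loader callback for one source module
// at a time; the returned unique_ptr is destroyed at the end of each loop
// iteration, so only one imported source module stays resident at any moment.
struct ToyModule { std::string Name; };

using ToyLoader = std::function<std::unique_ptr<ToyModule>(const std::string &)>;

static void importAll(const std::vector<std::string> &ImportList, ToyLoader Load) {
  for (const std::string &Id : ImportList) {
    std::unique_ptr<ToyModule> Src = Load(Id); // open one source module
    // ... copy the needed functions into the destination module ...
  } // Src goes out of scope here: the module is closed before the next one opens
}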


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99554

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/test/CodeGen/thinlto_backend.ll
  llvm/include/llvm/LTO/LTOBackend.h
  llvm/lib/LTO/LTO.cpp
  llvm/lib/LTO/LTOBackend.cpp

Index: llvm/lib/LTO/LTOBackend.cpp
===
--- llvm/lib/LTO/LTOBackend.cpp
+++ llvm/lib/LTO/LTOBackend.cpp
@@ -539,7 +539,7 @@
Module &Mod, const ModuleSummaryIndex &CombinedIndex,
const FunctionImporter::ImportMapTy &ImportList,
const GVSummaryMapTy &DefinedGlobals,
-   MapVector &ModuleMap,
+   MapVector *ModuleMap,
const std::vector &CmdArgs) {
   Expected TOrErr = initAndLookupTarget(Conf, Mod);
   if (!TOrErr)
@@ -608,11 +608,35 @@
   auto ModuleLoader = [&](StringRef Identifier) {
 assert(Mod.getContext().isODRUniquingDebugTypes() &&
"ODR Type uniquing should be enabled on the context");
-auto I = ModuleMap.find(Identifier);
-assert(I != ModuleMap.end());
-return I->second.getLazyModule(Mod.getContext(),
-   /*ShouldLazyLoadMetadata=*/true,
-   /*IsImporting*/ true);
+if (ModuleMap) {
+  auto I = ModuleMap->find(Identifier);
+  assert(I != ModuleMap->end());
+  return I->second.getLazyModule(Mod.getContext(),
+ /*ShouldLazyLoadMetadata=*/true,
+ /*IsImporting*/ true);
+}
+
+ErrorOr> MBOrErr =
+llvm::MemoryBuffer::getFile(Identifier);
+if (!MBOrErr)
+  return Expected>(make_error(
+  Twine("Error loading imported file ") + Identifier + " : ",
+  MBOrErr.getError()));
+
+Expected BMOrErr = findThinLTOModule(**MBOrErr);
+if (!BMOrErr)
+  return Expected>(make_error(
+  Twine("Error loading imported file ") + Identifier + " : " +
+  toString(BMOrErr.takeError()),
+  inconvertibleErrorCode()));
+
+Expected> MOrErr =
+BMOrErr->getLazyModule(Mod.getContext(),
+   /*ShouldLazyLoadMetadata=*/true,
+   /*IsImporting*/ true);
+if (MOrErr)
+  (*MOrErr)->setOwnedMemoryBuffer(std::move(*MBOrErr));
+return MOrErr;
   };
 
   FunctionImporter Importer(CombinedIndex, ModuleLoader,
@@ -652,12 +676,9 @@
  inconvertibleErrorCode());
 }
 
-bool lto::loadReferencedModules(
-const Module &M, const ModuleSummaryIndex &CombinedIndex,
-FunctionImporter::ImportMapTy &ImportList,
-MapVector &ModuleMap,
-std::vector>
-&OwnedImportsLifetimeManager) {
+bool lto::initImportList(const Module &M,
+ const ModuleSummaryIndex &CombinedIndex,
+ FunctionImporter::ImportMapTy &ImportList) {
   if (ThinLTOAssumeMerged)
 return true;
   // We can simply import the values mentioned in the combined index, since
@@ -678,26 +699,5 @@
   ImportList[Summary->modulePath()].insert(GUID);
 }
   }
-
-  for (auto &I : ImportList) {
-ErrorOr> MBOrErr =
-llvm::MemoryBuffer::getFile(I.first());
-if (!MBOrErr) {
-  errs() << "Error loading imported file '" << I.first()
- << "': " << MBOrErr.getError().message() << "\n";
-  return false;
-}
-
-Expected BMOrErr = findThinLTOModule(**MBOrErr);
-if (!BMOrErr) {
-  handleAllErrors(BMOrErr.takeError(), [&](ErrorInfoBase &EIB) {
-errs() << "Error loading imported file '" << I.first()
-   << "': " << EIB.message() << '\n';
-  });
-  return false;
-}
-ModuleMap.insert({I.first(), *BMOrErr});
-OwnedImportsLifetimeManager.push_back(std::move(*MBOrErr));
-  }
   return true;
 }
Index: llvm/lib/LTO/LTO.cpp
===
--- llvm/lib/LTO/LTO.cpp
+++ llvm/lib/LTO/LTO.cpp
@@ -1215,7 +1

[PATCH] D95271: [CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker.

2021-02-02 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95271/new/

https://reviews.llvm.org/D95271

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95271: [CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker.

2021-02-01 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: clang/lib/Driver/ToolChains/CommonArgs.cpp:609-610
+  // Pass an option to enable pseudo probe emission.
+  if (Args.hasFlag(options::OPT_fpseudo_probe_for_profiling,
+   options::OPT_fno_pseudo_probe_for_profiling))
+CmdArgs.push_back("-plugin-opt=pseudo-probe-for-profiling");

The third arg of Args.hasFlag defaults to true, which means 
-plugin-opt=pseudo-probe-for-profiling will be inserted even if neither 
-fpseudo-probe-for-profiling nor -fno-pseudo-probe-for-profiling is provided. 
Is that expected? A small model of this behavior is sketched below.
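
A self-contained toy model of that default (not the clang driver API itself, whose
hasFlag takes two OptSpecifiers plus a bool default):

#include <iostream>
#include <optional>

// Toy model: `Flag` holds the last of -f.../-fno-... seen on the command line,
// or is empty when neither was passed. With Default == true the option turns
// on even though the user never asked for it; passing false keeps it off.
static bool hasFlag(std::optional<bool> Flag, bool Default) {
  return Flag.value_or(Default);
}

int main() {
  std::cout << hasFlag(std::nullopt, /*Default=*/true) << '\n';  // 1: on by accident
  std::cout << hasFlag(std::nullopt, /*Default=*/false) << '\n'; // 0: off unless requested
}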



Comment at: clang/test/Driver/pseudo-probe-lto.c:6
+// RUN: %clang -### %t.o -target x86_64-unknown-linux -flto 
-fno-pseudo-probe-for-profiling 2>&1 | FileCheck %s --check-prefix=NOPROBE
+// RUN: %clang -### %t.o -target x86_64-unknown-linux -flto 
-fpseudo-probe-for-profiling -fno-pseudo-probe-for-profiling 2>&1 | FileCheck 
%s --check-prefix=NOPROBE
+

Better to add another NOPROBE check for when neither -fpseudo-probe-for-profiling 
nor -fno-pseudo-probe-for-profiling is provided.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95271/new/

https://reviews.llvm.org/D95271

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

2021-01-15 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks.




Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:363
   const FunctionSamples *CalleeSamples;
   uint64_t CallsiteCount;
+  // Call site distribution factor to prorate the profile samples for a

hoy wrote:
> wmi wrote:
> > Will CallsiteCount be the count before or after prorating when 
> > CallsiteDistribution is not 1.0?
> It is the count after being prorated. The prorated count will be used to guide 
> inlining. For example, if a callsite is duplicated in LTO prelink, then in 
> LTO postlink the two copies will get their own distribution factors and their 
> prorated counts are used to decide whether they should be inlined independently.
OK, better to document that in a comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

2021-01-14 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: llvm/include/llvm/IR/PseudoProbe.h:41
   //  [18:3] - probe id
-  //  [25:19] - reserved
+  //  [25:19] - probe distribution factor
   //  [28:26] - probe type, see PseudoProbeType

hoy wrote:
> wmi wrote:
> > hoy wrote:
> > > wmi wrote:
> > > > hoy wrote:
> > > > > hoy wrote:
> > > > > > wmi wrote:
> > > > > > > The bits in the discriminator are a scarce resource. Have you 
> > > > > > > considered using fewer bits to represent the probe distribution 
> > > > > > > factor? I guess it is possible that using a somewhat coarser-grained 
> > > > > > > distribution factor won't affect performance.
> > > > > > That's a good point. We are using seven bits to represent [0, 100] 
> > > > > > so that integral numbers can be distinguished. Yes, we could use 
> > > > > > fewer bits to represent, say 4 bits to represent only even numbers. 
> > > > > > We could also not use any bits here but instead use the 
> > > > > > distribution factor of the outer block probes when the competition 
> > > > > > for those bits is high. I can do an experiment to see how well that 
> > > > > > works.
> > > > > On second thought, using the distribution factor of block probes 
> > > > > for a call probe may not work well since a callsite may be surrounded 
> > > > > by more than one block probe. 
> > > > > 
> > > > > We could also use fewer bits, like 6 bits to encode even numbers in 
> > > > > the range [0, 100], or 5 bits to encode multiples of 3 in [0, 100]. 
> > > > > I did a profile quality measurement with the even-number encoding. 
> > > > > It's OK overall except for two SPEC benchmarks. I guess it's a 
> > > > > trade-off we'll have to take when there's competition for those 
> > > > > bits. 
> > > > Could you elaborate a little bit on the case where a callsite is 
> > > > surrounded by more than one block probe? Is it because of BB merging, as 
> > > > in CFG simplification?
> > > Yes, block merge in cfg simplification is a good example. Inlining can 
> > > also end up with callee code and caller code in one block. Jump threading 
> > > or other cfg optimizations that convert a conditional jump into an 
> > > unconditional jump can result in block merge too.
> > > 
> > > So far our way to track block weight for blocks with multiple probes is 
> > > to take the maximum count out of those probes. When it comes to tracking 
> > > callsite count, it is handy and accurate to attach a dedicated 
> > > distribution factor for each individual call. For example, when a call is 
> > > inlined, the inlinee's probes will be cloned into the caller, and they 
> > > will be prorated based on the callsite's dedicated distribution factor.
> > Actually, I think we may be able to extend Discriminator and 
> > PseudoProbeDwarfDiscriminator. To emit the Discriminator into Dwarf, we need 
> > to follow the Dwarf standard about how many bits the Discriminator is going 
> > to occupy. But inside the compiler, the Discriminator is represented as 
> > metadata, so it doesn't have to be 32 bits. For example, we can extend the 
> > Discriminator metadata to be 64 bits or even larger and specify that only 
> > the lower 32 bits will actually be emitted into the Dwarf section. For 
> > intermediate information like distribution factors, we can put it into the 
> > higher bits. 
> That's a good idea, I like that. Actually, we thought about that in the past, 
> and our concern was about memory cost since the discriminator field in 
> `DILexicalBlockFile` metadata is not optional. It is probably OK for pseudo 
> probes since discriminators are only used for callsites. It might be a problem 
> with -fdebug-info-for-profiling, where discriminators can be used more often. 
> 
> It sounds to me like extending the size of the discriminator is desirable for 
> pseudo probes and potentially FS-AFDO. It might be worth evaluating the cost 
> at some point. What do you think?
Yes, it is worth evaluating the cost. It is only about intermediate data in the 
compiler and it won't affect the binary or the profile output, therefore it 
won't introduce a backward compatibility issue. I think it is up to you to 
choose whether to evaluate it now or later.
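
For illustration, a minimal sketch of packing a distribution factor into the
discriminator bits [25:19] described in this thread (helper names are made up;
this is not the patch's code):

#include <cassert>
#include <cstdint>

// Bits [25:19] hold an integer percentage in [0, 100]; seven bits are enough
// to distinguish every integral value, which is the trade-off discussed above.
static uint32_t packDistributionFactor(uint32_t Discriminator, uint32_t FactorPercent) {
  assert(FactorPercent <= 100 && "distribution factor is a percentage");
  return (Discriminator & ~(0x7Fu << 19)) | (FactorPercent << 19);
}

static uint32_t unpackDistributionFactor(uint32_t Discriminator) {
  return (Discriminator >> 19) & 0x7Fu;
}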



Comment at: llvm/lib/IR/PseudoProbe.cpp:65
+void setProbeDistributionFactor(Instruction &Inst, float Factor) {
+  assert(Factor <= 1);
+  if (auto *II = dyn_cast<PseudoProbeInst>(&Inst)) {

Add assertion message.



Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:363
   const FunctionSamples *CalleeSamples;
   uint64_t CallsiteCount;
+  // Call site distribution factor to prorate the profile samples for a

Will CallsiteCount be the count before or after prorating when 
CallsiteDistribution is not 1.0?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

2021-01-14 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: llvm/include/llvm/IR/PseudoProbe.h:41
   //  [18:3] - probe id
-  //  [25:19] - reserved
+  //  [25:19] - probe distribution factor
   //  [28:26] - probe type, see PseudoProbeType

hoy wrote:
> wmi wrote:
> > hoy wrote:
> > > hoy wrote:
> > > > wmi wrote:
> > > > > The bits in the discriminator are a scarce resource. Have you considered 
> > > > > using fewer bits to represent the probe distribution factor? I guess it 
> > > > > is possible that using a somewhat coarser-grained distribution factor 
> > > > > won't affect performance.
> > > > That's a good point. We are using seven bits to represent [0, 100] so 
> > > > that integral numbers can be distinguished. Yes, we could use fewer 
> > > > bits to represent, say 4 bits to represent only even numbers. We could 
> > > > also not use any bits here but instead use the distribution factor of 
> > > > the outer block probes when the competition for those bits is high. I 
> > > > can do an experiment to see how well that works.
> > > On second thought, using the distribution factor of block probes for a 
> > > call probe may not work well since a callsite may be surrounded by more 
> > > than one block probe. 
> > > 
> > > We could also use fewer bits, like 6 bits to encode even numbers in the 
> > > range [0, 100], or 5 bits to encode multiples of 3 in [0, 100]. I did a 
> > > profile quality measurement with the even-number encoding. It's OK 
> > > overall except for two SPEC benchmarks. I guess it's a trade-off we'll 
> > > have to take when there's competition for those bits. 
> > Could you elaborate a little bit on the case where a callsite is 
> > surrounded by more than one block probe? Is it because of BB merging, as in 
> > CFG simplification?
> Yes, block merge in cfg simplification is a good example. Inlining can also 
> end up with callee code and caller code in one block. Jump threading or other 
> cfg optimizations that convert a conditional jump into an unconditional jump 
> can result in block merge too.
> 
> So far our way to track block weight for blocks with multiple probes is to 
> take the maximum count out of those probes. When it comes to tracking 
> callsite count, it is handy and accurate to attach a dedicated distribution 
> factor for each individual call. For example, when a call is inlined, the 
> inlinee's probes will be cloned into the caller, and they will be prorated 
> based on the callsite's dedicated distribution factor.
Actually, I think we may be able to extend Discriminator and 
PseudoProbeDwarfDiscriminator. To emit the Discriminator into Dwarf, we need to 
follow the Dwarf standard about how many bits the Discriminator is going to 
occupy. But inside the compiler, the Discriminator is represented as metadata, 
so it doesn't have to be 32 bits. For example, we can extend the Discriminator 
metadata to be 64 bits or even larger and specify that only the lower 32 bits 
will actually be emitted into the Dwarf section. For intermediate information 
like distribution factors, we can put it into the higher bits. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

2021-01-13 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: llvm/include/llvm/IR/PseudoProbe.h:41
   //  [18:3] - probe id
-  //  [25:19] - reserved
+  //  [25:19] - probe distribution factor
   //  [28:26] - probe type, see PseudoProbeType

hoy wrote:
> hoy wrote:
> > wmi wrote:
> > > The bits in the discriminator are a scarce resource. Have you considered 
> > > using fewer bits to represent the probe distribution factor? I guess it is 
> > > possible that using a somewhat coarser-grained distribution factor won't 
> > > affect performance.
> > That's a good point. We are using seven bits to represent [0, 100] so that 
> > integral numbers can be distinguished. Yes, we could use fewer bits to 
> > represent, say 4 bits to represent only even numbers. We could also not use 
> > any bits here but instead use the distribution factor of the outer block 
> > probes when the competition for those bits is high. I can do an experiment 
> > to see how well that works.
> On second thought, using the distribution factor of block probes for a call 
> probe may not work well since a callsite may be surrounded by more than one 
> block probe. 
> 
> We could also use fewer bits, like 6 bits to encode even numbers in the range 
> [0, 100], or 5 bits to encode multiples of 3 in [0, 100]. I did a profile 
> quality measurement with the even-number encoding. It's OK overall except for 
> two SPEC benchmarks. I guess it's a trade-off we'll have to take when there's 
> competition for those bits. 
Could you elaborate a little bit on the case where a callsite is surrounded 
by more than one block probe? Is it because of BB merging, as in CFG 
simplification?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93264: [CSSPGO] Introducing distribution factor for pseudo probe.

2021-01-12 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: llvm/include/llvm/IR/PseudoProbe.h:41
   //  [18:3] - probe id
-  //  [25:19] - reserved
+  //  [25:19] - probe distribution factor
   //  [28:26] - probe type, see PseudoProbeType

The bits in the discriminator are a scarce resource. Have you considered using 
fewer bits to represent the probe distribution factor? I guess it is possible 
that using a somewhat coarser-grained distribution factor won't affect 
performance.



Comment at: llvm/include/llvm/Passes/StandardInstrumentations.h:273
   IRChangedPrinter PrintChangedIR;
+  PseudoProbeVerifier PseudoProbeVerification;
   VerifyInstrumentation Verify;

Before the PseudoProbeUpdate pass, there is no need to verify because 
PseudoProbeUpdate will make the distribution factors consistent. PseudoProbeUpdate 
runs at a late stage in the LTO/ThinLTO prelink pipeline, and not many passes 
need the verification, so what is the major use of PseudoProbeVerifier?



Comment at: llvm/lib/Transforms/IPO/SampleProfileProbe.cpp:133-134
+  float PrevProbeFactor = PrevProbeFactors[I.first];
+  if (std::abs(CurProbeFactor - PrevProbeFactor) >
+  DistributionFactorVariance) {
+if (!BannerPrinted) {

Why not issue a warning/error message when verification fails? That would make 
it possible to enable the verification in the release compiler.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93264/new/

https://reviews.llvm.org/D93264

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91756: [CSSPGO] Pseudo probes for function calls.

2020-12-02 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

In D91756#2427795 , @hoy wrote:

> In D91756#2427759 , @wmi wrote:
>
>> Another question. Sorry for not bringing it up earlier. When a call with 
>> probe metadata attached is inlined, will the probe be gone, or will it be 
>> kept somehow? I think you want to keep the probe, especially for inline 
>> instances, to reconstruct the context, but I couldn't figure out from the 
>> description how you did that.
>
> No problem. Sorry for not clarifying it in the description. When a callee is 
> inlined, the probe metadata will go with the inlined instructions. The `!dbg` 
> metadata of an inlined instruction is in the form of a scope stack. The top of 
> the stack is the instruction's original `!dbg` metadata and the bottom of the 
> stack is for the original callsite of the top-level inliner. Except for 
> the top of the stack, all other elements of the stack actually refer to the 
> nested inlined callsites, whose discriminator fields (each of which actually 
> represents a callsite probe) can be used to represent the inline context of 
> an inlined `PseudoProbeInst` or a `CallInst`. I'll update the description.

I see. Thanks for the explanation.
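
For illustration, a small sketch of walking that scope stack via the DILocation
inlined-at chain (a generic LLVM pattern, not code from this patch):

#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Instruction.h"
#include <vector>

// Walks from the instruction's own location (top of the stack) down to the
// original top-level callsite (bottom of the stack); each element's
// discriminator can encode the callsite probe for that inline frame.
static std::vector<const llvm::DILocation *>
collectInlineStack(const llvm::Instruction &I) {
  std::vector<const llvm::DILocation *> Stack;
  for (const llvm::DILocation *Loc = I.getDebugLoc(); Loc;
       Loc = Loc->getInlinedAt())
    Stack.push_back(Loc);
  return Stack;
}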


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91756/new/

https://reviews.llvm.org/D91756

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91756: [CSSPGO] Pseudo probes for function calls.

2020-12-01 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

Another question. Sorry for not bringing it up earlier. When a call with probe 
metadata attached is inlined, will the probe be gone, or will it be kept 
somehow? I think you want to keep the probe, especially for inline instances, 
to reconstruct the context, but I couldn't figure out from the description how 
you did that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91756/new/

https://reviews.llvm.org/D91756

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91756: [CSSPGO] Pseudo probes for function calls.

2020-12-01 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: llvm/include/llvm/IR/PseudoProbe.h:33-34
+  static uint32_t packProbeData(uint32_t Index, uint32_t Type) {
+    assert(Index <= 0xFFFF);
+    assert(Type <= 0x7);
+    return (Index << 3) | (Type << 26) | 0x7;

Add assertion messages here and the other places.
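
For example, the assertion messages could look like the following sketch (the
wording is illustrative; the bounds follow the [18:3] index and [28:26] type
layout quoted above):

#include <cassert>
#include <cstdint>

static uint32_t packProbeData(uint32_t Index, uint32_t Type) {
  assert(Index <= 0xFFFF && "Probe index too big to fit in 16 bits");
  assert(Type <= 0x7 && "Probe type too big to fit in 3 bits");
  return (Index << 3) | (Type << 26) | 0x7;
}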



Comment at: llvm/lib/Transforms/IPO/SampleProfileProbe.cpp:136-138
+  for (auto &I : CallProbeIds) {
+auto Call = I.first;
+uint32_t Index = I.second;

hoy wrote:
> wmi wrote:
> > for (auto &[Call, Index] : CallProbeIds) {
> Thanks for the suggestion. This is a C++17 usage and may cause a warning with 
> the current build setup which defaults to C++14  (due to -Wc++17)
Ah, right. I mistakenly remembered seeing that somewhere in the LLVM codebase. 
The LLVM coding standard says we should use C++14.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91756/new/

https://reviews.llvm.org/D91756

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91756: [CSSPGO] Pseudo probes for function calls.

2020-12-01 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: llvm/lib/CodeGen/PseudoProbeInserter.cpp:50
+for (MachineBasicBlock &MBB : MF) {
+  MachineInstr *FirstInstr = nullptr;
+  for (MachineInstr &MI : MBB) {

What is the usage of FirstInstr?



Comment at: llvm/lib/CodeGen/PseudoProbeInserter.cpp:54
+  FirstInstr = &MI;
+if (MI.isCall()) {
+  if (DILocation *DL = MI.getDebugLoc()) {

Will tail-call or other optimizations convert a call into something else before 
the PseudoProbeInserter pass?



Comment at: llvm/lib/Transforms/IPO/SampleProfileProbe.cpp:136-138
+  for (auto &I : CallProbeIds) {
+auto Call = I.first;
+uint32_t Index = I.second;

for (auto &[Call, Index] : CallProbeIds) {


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91756/new/

https://reviews.llvm.org/D91756

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91676: Avoid redundant work when computing vtable vcall visibility

2020-11-24 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added inline comments.



Comment at: clang/lib/CodeGen/CGVTables.cpp:1301-1302
+  // has no effect on the min visibility computed below by the recursive caller.
+  if (!Visited.insert(RD).second)
+    return llvm::GlobalObject::VCallVisibilityTranslationUnit;
+

tejohnson wrote:
> wmi wrote:
> > tejohnson wrote:
> > > wmi wrote:
> > > > If a CXXRecordDecl is visited twice, the visibility returned in the 
> > > > second visit could be larger than necessary. Will it change the final 
> > > > result? If it will, can we cache the visibility result obtained in the 
> > > > first visit instead of returning the max value? 
> > > The recursive callsites compute the std::min of the current TypeVis and 
> > > the one returned by the recursive call. So returning the max guarantees 
> > > that there is no effect on the current TypeVis. Let me know if the 
> > > comment can be clarified (that's what I meant by "so that it has no 
> > > effect on the min visibility computed below ..."). Note that the initial 
> > > non-recursive invocation always has an empty Visited set.
> > I see. That makes sense! I didn't understand what location was meant by 
> > "computed below by the recursive caller." Your explanation "initial 
> > non-recursive invocation always has an empty Visited set" helps a lot. It 
> > means an intermediate result of GetVCallVisibilityLevel may change, but the 
> > result for the initial invocation of the recursion won't change. 
> I've tried to clarify the comment accordingly
Thanks for making the comment clearer. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91676/new/

https://reviews.llvm.org/D91676

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91676: Avoid redundant work when computing vtable vcall visibility

2020-11-24 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

LGTM.




Comment at: clang/lib/CodeGen/CGVTables.cpp:1301-1302
+  // has no effect on the min visibility computed below by the recursive caller.
+  if (!Visited.insert(RD).second)
+    return llvm::GlobalObject::VCallVisibilityTranslationUnit;
+

tejohnson wrote:
> wmi wrote:
> > If a CXXRecordDecl is visited twice, the visibility returned in the second 
> > visit could be larger than necessary. Will it change the final result? If 
> > it will, can we cache the visibility result obtained in the first visit 
> > instead of returning the max value? 
> The recursive callsites compute the std::min of the current TypeVis and the 
> one returned by the recursive call. So returning the max guarantees that 
> there is no effect on the current TypeVis. Let me know if the comment can be 
> clarified (that's what I meant by "so that it has no effect on the min 
> visibility computed below ..."). Note that the initial non-recursive 
> invocation always has an empty Visited set.
I see. That makes sense! I didn't understand what location was meant by "computed 
below by the recursive caller." Your explanation "initial non-recursive 
invocation always has an empty Visited set" helps a lot. It means an intermediate 
result of GetVCallVisibilityLevel may change, but the result for the initial 
invocation of the recursion won't change. 
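
To illustrate the pattern being discussed, a self-contained toy sketch (types and
names simplified; the real GetVCallVisibilityLevel lives in CGVTables.cpp):

#include <algorithm>
#include <set>
#include <vector>

// Toy model: visibility levels are ordered, the recursion takes the minimum
// over bases, and a record seen before returns the largest level so a revisit
// can never lower the minimum computed for the initial, non-recursive call.
enum ToyVCallVisibility { Public = 0, LinkageUnit = 1, TranslationUnit = 2 };

struct ToyRecord {
  ToyVCallVisibility OwnVisibility;
  std::vector<const ToyRecord *> Bases;
};

static ToyVCallVisibility
minVCallVisibility(const ToyRecord *RD, std::set<const ToyRecord *> &Visited) {
  if (!Visited.insert(RD).second)
    return TranslationUnit; // max value: no effect on the std::min below
  ToyVCallVisibility Vis = RD->OwnVisibility;
  for (const ToyRecord *Base : RD->Bases)
    Vis = std::min(Vis, minVCallVisibility(Base, Visited));
  return Vis;
}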


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91676/new/

https://reviews.llvm.org/D91676

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91676: Avoid redundant work when computing vtable vcall visibility

2020-11-24 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: clang/lib/CodeGen/CGVTables.cpp:1301-1302
+  // has no effect on the min visibility computed below by the recursive caller.
+  if (!Visited.insert(RD).second)
+    return llvm::GlobalObject::VCallVisibilityTranslationUnit;
+

If a CXXRecordDecl is visited twice, the visibility returned in the second 
visit could be larger than necessary. Will it change the final result? If it 
will, can we cache the visibility result obtained in the first visit instead of 
returning the max value? 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91676/new/

https://reviews.llvm.org/D91676

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

2020-11-20 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

LGTM.




Comment at: clang/lib/Frontend/CompilerInvocation.cpp:978-982
   Opts.DebugInfoForProfiling = Args.hasFlag(
   OPT_fdebug_info_for_profiling, OPT_fno_debug_info_for_profiling, false);
+  Opts.PseudoProbeForProfiling =
+  Args.hasFlag(OPT_fpseudo_probe_for_profiling,
+   OPT_fno_pseudo_probe_for_profiling, false);

hoy wrote:
> wmi wrote:
> > Should it emit an error if DebugInfoForProfiling and 
> > PseudoProbeForProfiling are enabled at the same time? I see from the 
> > other patch related to the pseudo-probe discriminator that these two flags 
> > are incompatible.
> Yes, we should. It's done in D91756 passbuilder.h : 
> https://reviews.llvm.org/D91756#change-Bsibk2p32T1c for both `clang` and 
> `opt`.
> 
Ok, thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86502/new/

https://reviews.llvm.org/D86502

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

2020-11-20 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: clang/lib/Frontend/CompilerInvocation.cpp:978-982
   Opts.DebugInfoForProfiling = Args.hasFlag(
   OPT_fdebug_info_for_profiling, OPT_fno_debug_info_for_profiling, false);
+  Opts.PseudoProbeForProfiling =
+  Args.hasFlag(OPT_fpseudo_probe_for_profiling,
+   OPT_fno_pseudo_probe_for_profiling, false);

Should it emit an error if DebugInfoForProfiling and PseudoProbeForProfiling 
are enabled at the same time? I see from the other patch related to the 
pseudo-probe discriminator that these two flags are incompatible.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86502/new/

https://reviews.llvm.org/D86502

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86193: [CSSPGO] Pseudo probe instrumentation for basic blocks.

2020-09-21 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.
Herald added a subscriber: ecnelises.

The patches split from the main one look good to me. Please see if David has 
further comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86193/new/

https://reviews.llvm.org/D86193

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

2020-08-28 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D86502#2245578 , @hoy wrote:

> In D86502#2245460 , @wmi wrote:
>
>>> The early instrumentation also allows the inliner to duplicate probes for 
>>> inlined instances. When a probe along with the other instructions of a 
>>> callee function are inlined into its caller function, the GUID of the 
>>> callee function goes with the probe. This allows samples collected on 
>>> inlined probes to be reported for the original callee function.
>>
>> I just got a question from reading the above. Suppose `A` only has one BB and 
>> the BB has one PseudoProbe in it. If function `A` is inlined into `B1` and 
>> `B2`, and both `B1` and `B2` are inlined into `C`, the PseudoProbe from `A` 
>> will have two copies in `C`, both carrying the GUID of `A`. How are the 
>> samples collected from `A` inlined into `B1` inlined into `C` categorized 
>> differently from `A` inlined into `B2` inlined into `C`, especially when 
>> debug information is not enabled (so there is no inline stack information in 
>> the binary)?
>
> This is a very good question. Inlined functions are differentiated by their 
> original callsites. A pseudo probe is allocated for each callsite in the 
> `SampleProfileProbe` pass. Nested inlining will produce a stack of pseudo 
> probes, similar to the Dwarf inline stack. That work is not included in the 
> first set of patches.

Thanks. Then how does the pseudo probe for a callsite, after inlining, represent 
the inline scope it covers?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86502/new/

https://reviews.llvm.org/D86502

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

2020-08-28 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.
Herald added a subscriber: danielkiss.

> The early instrumentation also allows the inliner to duplicate probes for 
> inlined instances. When a probe along with the other instructions of a callee 
> function are inlined into its caller function, the GUID of the callee 
> function goes with the probe. This allows samples collected on inlined probes 
> to be reported for the original callee function.

I just got a question from reading the above. Suppose `A` only has one BB and 
the BB has one PseudoProbe in it. If function `A` is inlined into `B1` and `B2`, 
and both `B1` and `B2` are inlined into `C`, the PseudoProbe from `A` will have 
two copies in `C`, both carrying the GUID of `A`. How are the samples collected 
from `A` inlined into `B1` inlined into `C` categorized differently from `A` 
inlined into `B2` inlined into `C`, especially when debug information is not 
enabled (so there is no inline stack information in the binary)?




Comment at: llvm/include/llvm/Passes/PassBuilder.h:67-69
+// Pseudo probe instrumentation should only work with autoFDO or no FDO.
+assert(!this->PseudoProbeForProfiling || this->Action == NoAction ||
+   this->Action == SampleUse);

This needs to work with more types of actions, for example instrumentation FDO 
or CS instrumentation FDO. For an instrumentation-FDO-optimized binary, we may 
want to collect an AutoFDO profile for it for performance comparison, or enhance 
the instrumentation profile with an AutoFDO profile to make the profile more 
representative of production, ...

Currently, debug-information-based AutoFDO supports this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86502/new/

https://reviews.llvm.org/D86502

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86193: [CSSPGO] Pseudo probe instrumentation for basic blocks.

2020-08-26 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D86193#2240502 , @hoy wrote:

> In D86193#2240353 , @wmi wrote:
>
>>> There are some optimizations, such as if-conversion and tail call elimination, 
>>> that were initially blocked by the pseudo probe intrinsic but are now 
>>> unblocked by fixes included in this change. With the current change we do 
>>> not see perf degradation out of SPEC and one of our internal large services.
>>> The main optimizations left blocked intentionally are those that merge 
>>> blocks for smaller code size, such as tail merge which is the opposite of 
>>> jump threading. We believe that those optimizations are not very beneficial 
>>> for performance and AutoFDO.
>>
>> If the optimizations are not very beneficial for performance and AutoFDO and 
>> should be blocked, it may be better to block them in a more general way and 
>> not depend on pseudo probe, because blocking them may also be beneficial for 
>> debug info based AutoFDO.
>
> In theory, yes, we should have a blacklist of transforms (mainly related to 
> block merging) that are not needed by AutoFDO and block them. In reality it 
> might take quite some effort to figure them out. Pseudo probes, on the other 
> hand, start by blocking those transforms in the first place and relaxing the 
> ones that might actually help AutoFDO.
>
>> Another reason is that pseudo probes look pretty much like debug information 
>> to me. They are used to annotate the IR but shouldn't affect the 
>> transformations. Binaries built with and without debug information are 
>> required to be identical in LLVM. I think that requirement could be applied 
>> to pseudo probes as well. It would be even better to have a test to enforce 
>> it so that no future change could break the requirement.
>
> Good point! Yes, pseudo probes are implemented in a similar way to the debug 
> intrinsics. However, they are not guaranteed not to affect codegen since 
> their main purpose is to achieve accurate profile correlation at low cost. 
> Regarding the cost, it sits somewhere between the debug intrinsics and the 
> PGO instrumentation, and is close to zero cost in practice.

I see. It makes sense to fix up some important transformations to achieve the 
goal of low cost. Achieving the goal of not affecting codegen needs a lot more 
effort to test and fix up all over the pipeline. I don't mean that it has to be 
ready in this patch, but I think it may be something worth striving for later on.

> Agreed that it would be better to have tests protecting the pseudo probe cost 
> from going too high, but I'm not sure which optimizations we should start with. 
> Maybe start with some critical optimizations like inlining and vectorization?

The test I have in mind comes from debug info: bootstrap LLVM with and without 
debug information, then check whether the binaries are identical after stripping 
the debug information. I am thinking pseudo probes could have such a test setup 
somewhere sometime in the future. Same as above, it doesn't have to be ready 
right now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86193/new/

https://reviews.llvm.org/D86193

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86193: [CSSPGO] Pseudo probe instrumentation for basic blocks.

2020-08-26 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

> There are some optimizations, such as if-conversion and tail call elimination, 
> that were initially blocked by the pseudo probe intrinsic but are now unblocked 
> by fixes included in this change. With the current change we do not see perf 
> degradation out of SPEC and one of our internal large services.
> The main optimizations left blocked intentionally are those that merge blocks 
> for smaller code size, such as tail merge which is the opposite of jump 
> threading. We believe that those optimizations are not very beneficial for 
> performance and AutoFDO.

If the optimizations are not very beneficial for performance and AutoFDO and 
should be blocked, it may be better to block them in a more general way that 
does not depend on pseudo probes, because blocking them may also be beneficial 
for debug-info-based AutoFDO.

Another reason is that pseudo probes look pretty much like debug information to 
me. They are used to annotate the IR but shouldn't affect the transformations. 
Binaries built with and without debug information are required to be identical 
in LLVM. I think that requirement could be applied to pseudo probes as well. It 
would be even better to have a test to enforce it so that no future change could 
break the requirement.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86193/new/

https://reviews.llvm.org/D86193

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86193: [CSSPGO] Pseudo probe instrumentation for basic blocks.

2020-08-23 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

Thanks for the patch! A few questions:

> probe blocks some CFG transformations that can mess up profile correlation.

Can you enumerate some CFG transformations that will be blocked? Is it possible 
that some of the CFG transformations being blocked are actually beneficial for 
later optimizations?

Are the intrinsic probes counted when computing BB size and function size?

And could you split the patch into smaller parts for easier review? For 
example: the intrinsic support in IR and MIR, the SampleProfileProbe pass, 
-fpseudo-probe-for-profiling support, and the changes in various passes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86193/new/

https://reviews.llvm.org/D86193

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79959: [SampleFDO] Add use-sample-profile function attribute

2020-06-02 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG7a6c89427c9b: [SampleFDO] Add use-sample-profile function 
attribute. (authored by wmi).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D79959?vs=267957&id=268030#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79959/new/

https://reviews.llvm.org/D79959

Files:
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/test/CodeGen/use-sample-profile-attr.c
  llvm/include/llvm/IR/Attributes.td
  llvm/lib/Transforms/IPO/SampleProfile.cpp
  llvm/test/LTO/Resolution/X86/load-sample-prof-icp.ll
  llvm/test/LTO/Resolution/X86/load-sample-prof-lto.ll
  llvm/test/LTO/Resolution/X86/load-sample-prof.ll
  llvm/test/Transforms/Inline/inline-incompat-attrs.ll
  llvm/test/Transforms/Inline/partial-inline-incompat-attrs.ll
  llvm/test/Transforms/SampleProfile/Inputs/profile-symbol-list.ll
  llvm/test/Transforms/SampleProfile/Inputs/use-sample-profile-attr.prof
  llvm/test/Transforms/SampleProfile/branch.ll
  llvm/test/Transforms/SampleProfile/calls.ll
  llvm/test/Transforms/SampleProfile/cold-indirect-call.ll
  llvm/test/Transforms/SampleProfile/cov-zero-samples.ll
  llvm/test/Transforms/SampleProfile/coverage-warning.ll
  llvm/test/Transforms/SampleProfile/discriminator.ll
  llvm/test/Transforms/SampleProfile/early-inline.ll
  llvm/test/Transforms/SampleProfile/entry_counts.ll
  llvm/test/Transforms/SampleProfile/entry_counts_cold.ll
  llvm/test/Transforms/SampleProfile/entry_counts_missing_dbginfo.ll
  llvm/test/Transforms/SampleProfile/fnptr.ll
  llvm/test/Transforms/SampleProfile/function_metadata.ll
  llvm/test/Transforms/SampleProfile/gcc-simple.ll
  llvm/test/Transforms/SampleProfile/indirect-call-gcc.ll
  llvm/test/Transforms/SampleProfile/indirect-call.ll
  llvm/test/Transforms/SampleProfile/inline-callee-update.ll
  llvm/test/Transforms/SampleProfile/inline-cold-callsite-samplepgo.ll
  llvm/test/Transforms/SampleProfile/inline-cold.ll
  llvm/test/Transforms/SampleProfile/inline-combine.ll
  llvm/test/Transforms/SampleProfile/inline-coverage.ll
  llvm/test/Transforms/SampleProfile/inline-mergeprof.ll
  llvm/test/Transforms/SampleProfile/inline-stats.ll
  llvm/test/Transforms/SampleProfile/inline-topdown.ll
  llvm/test/Transforms/SampleProfile/inline.ll
  llvm/test/Transforms/SampleProfile/nolocinfo.ll
  llvm/test/Transforms/SampleProfile/offset.ll
  llvm/test/Transforms/SampleProfile/profile-format-compress.ll
  llvm/test/Transforms/SampleProfile/profile-format.ll
  llvm/test/Transforms/SampleProfile/profile-sample-accurate.ll
  llvm/test/Transforms/SampleProfile/propagate.ll
  llvm/test/Transforms/SampleProfile/remap.ll
  llvm/test/Transforms/SampleProfile/remarks.ll
  llvm/test/Transforms/SampleProfile/section-accurate-samplepgo.ll
  llvm/test/Transforms/SampleProfile/syntax.ll
  llvm/test/Transforms/SampleProfile/use-sample-profile-attr.ll
  llvm/test/Transforms/SampleProfile/warm-inline-instance.ll

Index: llvm/test/Transforms/SampleProfile/warm-inline-instance.ll
===
--- llvm/test/Transforms/SampleProfile/warm-inline-instance.ll
+++ llvm/test/Transforms/SampleProfile/warm-inline-instance.ll
@@ -4,7 +4,7 @@
 @.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1
 
 ; Function Attrs: nounwind uwtable
-define i32 @foo(i32 %x, i32 %y) !dbg !4 {
+define i32 @foo(i32 %x, i32 %y) #0 !dbg !4 {
 entry:
   %x.addr = alloca i32, align 4
   %y.addr = alloca i32, align 4
@@ -16,7 +16,7 @@
   ret i32 %add, !dbg !11
 }
 
-define i32 @goo(i32 %x, i32 %y) {
+define i32 @goo(i32 %x, i32 %y) #0 {
 entry:
   %x.addr = alloca i32, align 4
   %y.addr = alloca i32, align 4
@@ -29,7 +29,7 @@
 }
 
 ; Function Attrs: uwtable
-define i32 @main() !dbg !7 {
+define i32 @main() #0 !dbg !7 {
 entry:
   %retval = alloca i32, align 4
   %s = alloca i32, align 4
@@ -83,6 +83,8 @@
 
 declare i32 @printf(i8*, ...) #2
 
+attributes #0 = { "use-sample-profile" }
+
 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!8, !9}
 !llvm.ident = !{!10}
Index: llvm/test/Transforms/SampleProfile/use-sample-profile-attr.ll
===
--- llvm/test/Transforms/SampleProfile/use-sample-profile-attr.ll
+++ llvm/test/Transforms/SampleProfile/use-sample-profile-attr.ll
@@ -1,26 +1,18 @@
-; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline.prof -S | FileCheck %s
-; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline.prof -S | FileCheck %s
-
-; Original C++ test case
-;
-; #include 
-;
-; int sum(int x, int y) {
-;   return x + y;
-; }
-;
-; int main() {
-;   int s, i = 0;
-;   while (i++ < 2 * 2)
-; if (i != 100) s = sum(i, s); else s = 30;
-;   printf("sum is %d\n", s);
-;   return 0;
-; }
-;
+; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/use-sample-p

[PATCH] D77989: Allow disabling of vectorization using internal options

2020-04-14 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

There is a comment about the bitwise operator; otherwise LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77989/new/

https://reviews.llvm.org/D77989



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77989: Allow disabling of vectorization using internal options

2020-04-13 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D77989#1979725 , @wmi wrote:

> The patch makes the reasoning about when the internal flags will take effect 
> much easier.
>
> One question is, with the change, the internal flags can only be used to 
> disable loop vectorization/slp vectorization/loop interleaving, but not to 
> enable them. Can we treat them in the way that the flags in 
> PipelineTuningOptions contain the default values set by Opt Level, and the 
> internal flags can override them, which can either enable or disable the 
> vectorizations/interleaving.


I realize it is not easy to do since the internal flags are of bool type, and 
there is no easy way to leave them unspecified so that the default values set 
by the opt level are used. Ideally, we want -slp-vectorize=true to enable SLP, 
-slp-vectorize=false to disable SLP, and no -slp-vectorize flag to keep the 
default value, but that is not easy to achieve with the current command-line 
library.
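
As a sketch of the tri-state idea (purely illustrative, not what the patch 
does; the option name and wiring below are made up), something like 
cl::boolOrDefault can distinguish an unset flag from an explicit true/false:

```cpp
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"
using namespace llvm;

// Hypothetical flag: when it is left unset, the opt-level default applies.
static cl::opt<cl::boolOrDefault>
    SLPVectorizeOverride("slp-vectorize-override", cl::Hidden,
                         cl::desc("Force SLP vectorization on or off"));

static bool shouldRunSLP(bool DefaultFromOptLevel) {
  switch (SLPVectorizeOverride.getValue()) {
  case cl::BOU_UNSET:
    return DefaultFromOptLevel; // flag not given: keep the opt-level default
  case cl::BOU_TRUE:
    return true;                // -slp-vectorize-override=true
  case cl::BOU_FALSE:
    return false;               // -slp-vectorize-override=false
  }
  llvm_unreachable("covered switch");
}
```

That still would not help the existing bool flags without changing their type, 
which is probably what makes it not easy in practice.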


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77989/new/

https://reviews.llvm.org/D77989



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77989: Allow disabling of vectorization using internal options

2020-04-13 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

The patch makes the reasoning about when the internal flags will take effect 
much easier.

One question is, with the change, the internal flags can only be used to 
disable loop vectorization/slp vectorization/loop interleaving, but not to 
enable them. Can we treat them in the way that the flags in 
PipelineTuningOptions contain the default values set by Opt Level, and the 
internal flags can override them, which can either enable or disable the 
vectorizations/interleaving.

To do that, we need to let SLPVectorizerPass be added to the pass manager 
unconditionally and pass PTO.SLPVectorization to it as a parameter (currently, 
adding SLPVectorizerPass is guarded by PTO.SLPVectorization). Whether the pass 
actually runs can then be decided inside SLPVectorizerPass itself.
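
A rough sketch of that direction (illustrative only; the real 
SLPVectorizerPass does not take such a constructor argument, and the class 
name below is invented):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Added to the pipeline unconditionally; whether it does any work is decided
// here from the value handed down from PipelineTuningOptions.
class MaybeSLPVectorizerPass : public PassInfoMixin<MaybeSLPVectorizerPass> {
  bool Enabled;

public:
  explicit MaybeSLPVectorizerPass(bool Enabled) : Enabled(Enabled) {}

  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    if (!Enabled)
      return PreservedAnalyses::all(); // behave as a no-op when disabled
    // ... perform SLP vectorization on F here ...
    return PreservedAnalyses::none();
  }
};
```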




Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7618-7621
+: InterleaveOnlyWhenForced(Opts.InterleaveOnlyWhenForced |
+   !EnableLoopInterleaving),
+  VectorizeOnlyWhenForced(Opts.VectorizeOnlyWhenForced |
+  !EnableLoopVectorization) {}

Use boolean operator || instead of bitwise operator?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77989/new/

https://reviews.llvm.org/D77989



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D73307: Unique Names for Functions with Internal Linkage

2020-03-30 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: clang/test/CodeGen/unique-internal-linkage-names.cpp:5
+// RUN: %clang_cc1 -triple x86_64 -x c++ -S -emit-llvm 
-funique-internal-linkage-names -o - < %s | FileCheck %s --check-prefix=UNIQUE
+
+static int glob;

MaskRay wrote:
> Might be worth adding a few other internal linkage names.
> 
> * an object in an unnamed namespace
> * an extern "C" static function
> * a function-local static variable
> * `label: &&label`
> 
> Hope @mtrofin and @davidxl can clarify what internal names may benefit AFDO 
> and we can add such internal names specifically.
Only internal functions matter for AFDO. Other types of internal names are 
irrelevant.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73307/new/

https://reviews.llvm.org/D73307



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72538: [ThinLTO] Add additional ThinLTO pipeline testing with new PM

2020-01-10 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72538/new/

https://reviews.llvm.org/D72538



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72538: [ThinLTO] Add additional ThinLTO pipeline testing with new PM

2020-01-10 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

The additional pipeline testing will catch any future pass change to the 
pipeline. A related but separate question: do we have a way to check whether 
any other pass is missing in the ThinLTO new-PM pipeline, similar to what 
D72386 did?




Comment at: clang/test/CodeGen/thinlto-distributed-newpm.ll:11
+
+; RUN: %clang_cc1 -triple x86_64-grtev4-linux-gnu \
+; RUN:   -O2 -fexperimental-new-pass-manager -fdebug-pass-manager \

Can we test %clang instead of %clang_cc1, so we don't have to add flags like 
-vectorize-slp and -vectorize-loops? The clang `O2/O3` pipeline is what we 
care about, and the flags clang adds to the cc1 options may be subject to 
change.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72538/new/

https://reviews.llvm.org/D72538



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72386: [ThinLTO] pass UnrollLoops/VectorizeLoop/VectorizeSLP in CGOpts down to pass builder in ltobackend

2020-01-09 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG21a4710c67a9: [ThinLTO] Pass CodeGenOpts like 
UnrollLoops/VectorizeLoop/VectorizeSLP down to… (authored by wmi).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Changed prior to commit:
  https://reviews.llvm.org/D72386?vs=237212&id=237240#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72386/new/

https://reviews.llvm.org/D72386

Files:
  clang/lib/CodeGen/BackendUtil.cpp
  clang/test/CodeGen/thinlto-slp-vectorize-pm.c
  lld/COFF/CMakeLists.txt
  lld/ELF/CMakeLists.txt
  lld/ELF/LTO.cpp
  lld/test/ELF/lto/slp-vectorize-pm.ll
  lld/wasm/CMakeLists.txt
  llvm/include/llvm/LTO/Config.h
  llvm/lib/LTO/LTOBackend.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/tools/gold/X86/slp-vectorize-pm.ll
  llvm/test/tools/llvm-lto2/X86/slp-vectorize-pm.ll
  llvm/tools/gold/gold-plugin.cpp
  llvm/tools/llvm-lto2/CMakeLists.txt
  llvm/tools/llvm-lto2/llvm-lto2.cpp

Index: llvm/tools/llvm-lto2/llvm-lto2.cpp
===
--- llvm/tools/llvm-lto2/llvm-lto2.cpp
+++ llvm/tools/llvm-lto2/llvm-lto2.cpp
@@ -270,6 +270,8 @@
   Conf.OverrideTriple = OverrideTriple;
   Conf.DefaultTriple = DefaultTriple;
   Conf.StatsFile = StatsFile;
+  Conf.PTO.LoopVectorization = Conf.OptLevel > 1;
+  Conf.PTO.SLPVectorization = Conf.OptLevel > 1;
 
   ThinBackend Backend;
   if (ThinLTODistributedIndexes)
Index: llvm/tools/llvm-lto2/CMakeLists.txt
===
--- llvm/tools/llvm-lto2/CMakeLists.txt
+++ llvm/tools/llvm-lto2/CMakeLists.txt
@@ -9,6 +9,7 @@
   LTO
   MC
   Object
+  Passes
   Support
   Target
   )
Index: llvm/tools/gold/gold-plugin.cpp
===
--- llvm/tools/gold/gold-plugin.cpp
+++ llvm/tools/gold/gold-plugin.cpp
@@ -860,6 +860,9 @@
   Conf.CGOptLevel = getCGOptLevel();
   Conf.DisableVerify = options::DisableVerify;
   Conf.OptLevel = options::OptLevel;
+  Conf.PTO.LoopVectorization = options::OptLevel > 1;
+  Conf.PTO.SLPVectorization = options::OptLevel > 1;
+
   if (options::Parallelism)
 Backend = createInProcessThinBackend(options::Parallelism);
   if (options::thinlto_index_only) {
Index: llvm/test/tools/llvm-lto2/X86/slp-vectorize-pm.ll
===
--- /dev/null
+++ llvm/test/tools/llvm-lto2/X86/slp-vectorize-pm.ll
@@ -0,0 +1,51 @@
+; RUN: opt -module-summary %s -o %t1.bc
+
+; Test SLP and Loop Vectorization are enabled by default at O2 and O3.
+; RUN: llvm-lto2 run %t1.bc -o %t2.o -O0 -r %t1.bc,foo,plx -debug-pass-manager \
+; RUN:  -use-new-pm -save-temps 2>&1 | FileCheck %s --check-prefix=CHECK-O0-SLP
+; RUN: llvm-dis %t2.o.1.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-O0-LPV
+
+; RUN: llvm-lto2 run %t1.bc -o %t3.o -O1 -r %t1.bc,foo,plx -debug-pass-manager \
+; RUN:  -use-new-pm -save-temps 2>&1 | FileCheck %s --check-prefix=CHECK-O1-SLP
+; RUN: llvm-dis %t3.o.1.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-O1-LPV
+
+; RUN: llvm-lto2 run %t1.bc -o %t4.o -O2 -r %t1.bc,foo,plx -debug-pass-manager \
+; RUN:  -use-new-pm -save-temps 2>&1 | FileCheck %s --check-prefix=CHECK-O2-SLP
+; RUN: llvm-dis %t4.o.1.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-O2-LPV
+
+; RUN: llvm-lto2 run %t1.bc -o %t5.o -O3 -r %t1.bc,foo,plx -debug-pass-manager \
+; RUN:  -use-new-pm -save-temps 2>&1 | FileCheck %s --check-prefix=CHECK-O3-SLP
+; RUN: llvm-dis %t5.o.1.4.opt.bc -o - | FileCheck %s --check-prefix=CHECK-O3-LPV
+
+; CHECK-O0-SLP-NOT: Running pass: SLPVectorizerPass
+; CHECK-O1-SLP-NOT: Running pass: SLPVectorizerPass
+; CHECK-O2-SLP: Running pass: SLPVectorizerPass
+; CHECK-O3-SLP: Running pass: SLPVectorizerPass
+; CHECK-O0-LPV-NOT: = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-O1-LPV-NOT: = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-O2-LPV: = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-O3-LPV: = !{!"llvm.loop.isvectorized", i32 1}
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @foo(i32* %a) {
+entry:
+  br label %for.body
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %red.05 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
+  %0 = load i32, i32* %arrayidx, align 4
+  %add = add nsw i32 %0, %red.05
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp eq i64 %indvars.iv.next, 255
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
+
+for.end:
+  ret i32 %add
+}
+
+!0 = distinct !{!0, !1}
+!1 = !{!"llvm.loop.unroll.disable", i1 true}
Index: llvm/test/tools/gold/X86/slp-vectorize-pm.ll
===

[PATCH] D71485: [profile] Fix a crash when -fprofile-remapping-file= triggers an error

2019-12-13 Thread Wei Mi via Phabricator via cfe-commits
wmi accepted this revision.
wmi added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71485/new/

https://reviews.llvm.org/D71485



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2019-08-25 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D36562#1642403 , @chill wrote:

> In D36562#1641930 , @wmi wrote:
>
> > In D36562#1639441 , @chill wrote:
> >
> > > Shouldn't we disable `OPT_ffine_grained_bitfield_accesses` only if TSAN 
> > > is active?
> >
> >
> > I don't remember why it is disabled for all sanitizer modes. Seems you are 
> > right that the disabling the option is only necessary for TSAN. Do you have 
> > actual needs for the option to be functioning on other sanitizer modes?
>
>
> Well, yes and no. We have the option enabled by default and it causes a 
> warning when we use it together with `-fsanitize=memtag` (we aren't really 
> concerned with other sanitizers). That warning broke a few builds (e.g. CMake 
> doing tests and not wanting to see *any* diagnostics. We can work around that 
> in a number of ways, e.g. we can leave the default off for AArch64.
>
> I'd prefer though to have an upstream solution, if that's considered 
> beneficial for all LLVM users and this one seems like such a case: let anyone 
> use the option with sanitizers, unless it's known that some 
> sanitizers'utility is affected negatively (as with TSAN).


Thanks for providing the background in detail. I sent out a patch for it: 
https://reviews.llvm.org/D66726


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D36562/new/

https://reviews.llvm.org/D36562



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2019-08-22 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In D36562#1639441 , @chill wrote:

> Shouldn't we disable `OPT_ffine_grained_bitfield_accesses` only if TSAN is 
> active?


I don't remember why it is disabled for all sanitizer modes. It seems you are 
right that disabling the option is only necessary for TSAN. Do you actually 
need the option to work in the other sanitizer modes?


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D36562/new/

https://reviews.llvm.org/D36562



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42154: Don't generate inline atomics for i386/i486

2018-01-23 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL323281: Adjust MaxAtomicInlineWidth for i386/i486 targets. 
(authored by wmi, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D42154?vs=130076&id=131161#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D42154

Files:
  cfe/trunk/lib/Basic/Targets/X86.h
  cfe/trunk/test/CodeGenCXX/atomic-inline.cpp

Index: cfe/trunk/lib/Basic/Targets/X86.h
===
--- cfe/trunk/lib/Basic/Targets/X86.h
+++ cfe/trunk/lib/Basic/Targets/X86.h
@@ -100,6 +100,7 @@
   bool HasRetpoline = false;
   bool HasRetpolineExternalThunk = false;
 
+protected:
   /// \brief Enumeration of all of the X86 CPUs supported by Clang.
   ///
   /// Each enumeration represents a particular CPU supported by Clang. These
@@ -325,9 +326,11 @@
  (1 << TargetInfo::LongDouble));
 
 // x86-32 has atomics up to 8 bytes
-// FIXME: Check that we actually have cmpxchg8b before setting
-// MaxAtomicInlineWidth. (cmpxchg8b is an i586 instruction.)
-MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+CPUKind Kind = getCPUKind(Opts.CPU);
+if (Kind >= CK_i586 || Kind == CK_Generic)
+  MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+else if (Kind >= CK_i486)
+  MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 32;
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
Index: cfe/trunk/test/CodeGenCXX/atomic-inline.cpp
===
--- cfe/trunk/test/CodeGenCXX/atomic-inline.cpp
+++ cfe/trunk/test/CodeGenCXX/atomic-inline.cpp
@@ -1,6 +1,52 @@
 // RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | FileCheck %s
 // RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu -target-cpu core2 | FileCheck %s --check-prefix=CORE2
-// Check the atomic code generation for cpu targets w/wo cx16 support.
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=i386-linux-gnu -target-cpu i386 | FileCheck %s --check-prefix=I386
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=i386-linux-gnu -target-cpu i486 | FileCheck %s --check-prefix=I486
+// Check the atomic code generation for cpu targets w/wo cx, cx8 and cx16 support.
+
+struct alignas(4) AM4 {
+  short f1, f2;
+};
+AM4 m4;
+AM4 load4() {
+  AM4 am;
+  // CHECK-LABEL: @_Z5load4v
+  // CHECK: load atomic i32, {{.*}} monotonic
+  // CORE2-LABEL: @_Z5load4v
+  // CORE2: load atomic i32, {{.*}} monotonic
+  // I386-LABEL: @_Z5load4v
+  // I386: call i32 @__atomic_load_4
+  // I486-LABEL: @_Z5load4v
+  // I486: load atomic i32, {{.*}} monotonic
+  __atomic_load(&m4, &am, 0);
+  return am;
+}
+
+AM4 s4;
+void store4() {
+  // CHECK-LABEL: @_Z6store4v
+  // CHECK: store atomic i32 {{.*}} monotonic
+  // CORE2-LABEL: @_Z6store4v
+  // CORE2: store atomic i32 {{.*}} monotonic
+  // I386-LABEL: @_Z6store4v
+  // I386: call void @__atomic_store_4
+  // I486-LABEL: @_Z6store4v
+  // I486: store atomic i32 {{.*}} monotonic
+  __atomic_store(&m4, &s4, 0);
+}
+
+bool cmpxchg4() {
+  AM4 am;
+  // CHECK-LABEL: @_Z8cmpxchg4v
+  // CHECK: cmpxchg i32* {{.*}} monotonic
+  // CORE2-LABEL: @_Z8cmpxchg4v
+  // CORE2: cmpxchg i32* {{.*}} monotonic
+  // I386-LABEL: @_Z8cmpxchg4v
+  // I386: call zeroext i1 @__atomic_compare_exchange_4
+  // I486-LABEL: @_Z8cmpxchg4v
+  // I486: cmpxchg i32* {{.*}} monotonic
+  return __atomic_compare_exchange(&m4, &s4, &am, 0, 0, 0);
+}
 
 struct alignas(8) AM8 {
   int f1, f2;
@@ -12,6 +58,10 @@
   // CHECK: load atomic i64, {{.*}} monotonic
   // CORE2-LABEL: @_Z5load8v
   // CORE2: load atomic i64, {{.*}} monotonic
+  // I386-LABEL: @_Z5load8v
+  // I386: call i64 @__atomic_load_8
+  // I486-LABEL: @_Z5load8v
+  // I486: call i64 @__atomic_load_8
   __atomic_load(&m8, &am, 0);
   return am;
 }
@@ -22,6 +72,10 @@
   // CHECK: store atomic i64 {{.*}} monotonic
   // CORE2-LABEL: @_Z6store8v
   // CORE2: store atomic i64 {{.*}} monotonic
+  // I386-LABEL: @_Z6store8v
+  // I386: call void @__atomic_store_8
+  // I486-LABEL: @_Z6store8v
+  // I486: call void @__atomic_store_8
   __atomic_store(&m8, &s8, 0);
 }
 
@@ -31,6 +85,10 @@
   // CHECK: cmpxchg i64* {{.*}} monotonic
   // CORE2-LABEL: @_Z8cmpxchg8v
   // CORE2: cmpxchg i64* {{.*}} monotonic
+  // I386-LABEL: @_Z8cmpxchg8v
+  // I386: call zeroext i1 @__atomic_compare_exchange_8
+  // I486-LABEL: @_Z8cmpxchg8v
+  // I486: call zeroext i1 @__atomic_compare_exchange_8
   return __atomic_compare_exchange(&m8, &s8, &am, 0, 0, 0);
 }
 
@@ -66,4 +124,3 @@
   // CORE2: cmpxchg i128* {{.*}} monotonic
   return __atomic_compare_exchange(&m16, &s16, &am, 0, 0, 0);
 }
-
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D39053: [Bitfield] Add more cases to making the bitfield a separate location

2018-01-22 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

Thanks for the size evaluation. I regard the change as a natural and limited 
extension of the current fine-grained bitfield access mode, so I feel OK with 
it. Hal, what do you think?


https://reviews.llvm.org/D39053



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42154: Don't generate inline atomics for i386/i486

2018-01-22 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In https://reviews.llvm.org/D42154#983840, @rjmccall wrote:

> In https://reviews.llvm.org/D42154#977991, @wmi wrote:
>
> > In https://reviews.llvm.org/D42154#977975, @efriedma wrote:
> >
> > > The LLVM backend currently assumes every CPU is Pentium-compatible.  If 
> > > we're going to change that in clang, we should probably fix the backend 
> > > as well.
> >
> >
> > With the patch, for i386/i486 targets, clang will generate more atomic 
> > libcalls than before, for which llvm backend will not do anything extra, so 
> > no fix is necessary in llvm backend for the patch to work.
>
>
> I think Eli's point is that we do not currently support generating code for 
> the 386 and 486 because there are other things in the x86 backend that assume 
> that the target is at minimum a Pentium.  If you're looking to support 
> targeting those chips, you should look into that.


I am not looking to support those chips. It is just that 
https://reviews.llvm.org/rL314145 changed the status of FreeBSD ports on 
32-bit x86, as reported in https://bugs.llvm.org/show_bug.cgi?id=34347#c6, so 
I want to provide a workaround and give them some relief.
This is also the intention behind keeping MaxAtomicInlineWidth at 64 for 
CK_Generic. It provides the minimum churn relative to the current status; 
otherwise at least a bunch of tests would need to be fixed.


Repository:
  rC Clang

https://reviews.llvm.org/D42154



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42154: Don't generate inline atomics for i386/i486

2018-01-16 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: lib/Basic/Targets/X86.h:472
+CPUKind Kind = getCPUKind(Opts.CPU);
+if (Kind >= CK_i586 || Kind == CK_Generic)
+  MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;

wmi wrote:
> craig.topper wrote:
> > craig.topper wrote:
> > > efriedma wrote:
> > > > What exactly does "CK_Generic" mean here?  Under what circumstances 
> > > > does it show up? We should have a specific default for every supported 
> > > > target, if user doesn't specify a CPU with -march.
> > > CK_Generic is the default for 32-bit mode with no -march. CK_x86_64 is 
> > > the default for 64-bit mode with no march.
> > > 
> > > I'm not sure we can assume CK_Generic is 586 or better. We don't assume 
> > > that in the preprocessor defines.
> > > 
> > > ```
> > >   if (CPU >= CK_i486) {
> > > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
> > > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
> > > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
> > >   }
> > >   if (CPU >= CK_i586)
> > > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
> > >   if (HasCX16)
> > > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16");
> > > ```
> > Maybe we don't default to "generic" anywhere.  Its very platform specific. 
> > getX86TargetCPU in lib/Driver/ToolChains/Arch/X86.cpp contains the magic I 
> > think.
> Like the command "clang -cc1 -triple=i686-linux-gnu", the cpu will be set to 
> CK_Generic. I want to set the MaxAtomicInlineWidth of the default target to 
> 64 (same as before) to minimize the churn. In the patch, MaxAtomicInlineWidth 
> will only be changed if i386/i486 are explicitly specified.
In X86TargetInfo::getCPUKind (tools/clang/lib/Basic/Targets/X86.cpp), we 
default to "generic" for both 32-bit and 64-bit modes.


Repository:
  rC Clang

https://reviews.llvm.org/D42154



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42154: Don't generate inline atomics for i386/i486

2018-01-16 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In https://reviews.llvm.org/D42154#977975, @efriedma wrote:

> The LLVM backend currently assumes every CPU is Pentium-compatible.  If we're 
> going to change that in clang, we should probably fix the backend as well.


With the patch, for i386/i486 targets, clang will generate more atomic libcalls 
than before, for which llvm backend will not do anything extra, so no fix is 
necessary in llvm backend for the patch to work.




Comment at: lib/Basic/Targets/X86.h:472
+CPUKind Kind = getCPUKind(Opts.CPU);
+if (Kind >= CK_i586 || Kind == CK_Generic)
+  MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;

craig.topper wrote:
> craig.topper wrote:
> > efriedma wrote:
> > > What exactly does "CK_Generic" mean here?  Under what circumstances does 
> > > it show up? We should have a specific default for every supported target, 
> > > if user doesn't specify a CPU with -march.
> > CK_Generic is the default for 32-bit mode with no -march. CK_x86_64 is the 
> > default for 64-bit mode with no march.
> > 
> > I'm not sure we can assume CK_Generic is 586 or better. We don't assume 
> > that in the preprocessor defines.
> > 
> > ```
> >   if (CPU >= CK_i486) {
> > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
> > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
> > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
> >   }
> >   if (CPU >= CK_i586)
> > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
> >   if (HasCX16)
> > Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16");
> > ```
> Maybe we don't default to "generic" anywhere.  Its very platform specific. 
> getX86TargetCPU in lib/Driver/ToolChains/Arch/X86.cpp contains the magic I 
> think.
Like the command "clang -cc1 -triple=i686-linux-gnu", the cpu will be set to 
CK_Generic. I want to set the MaxAtomicInlineWidth of the default target to 64 
(same as before) to minimize the churn. In the patch, MaxAtomicInlineWidth will 
only be changed if i386/i486 are explicitly specified.


Repository:
  rC Clang

https://reviews.llvm.org/D42154



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42154: Don't generate inline atomics for i386/i486

2018-01-16 Thread Wei Mi via Phabricator via cfe-commits
wmi created this revision.
wmi added reviewers: rjmccall, eli.friedman, rsmith.
Herald added subscribers: cfe-commits, eraman, sanjoy.

This is to fix the bug reported in 
https://bugs.llvm.org/show_bug.cgi?id=34347#c6

Currently, MaxAtomicInlineWidth is set to 64 for all x86-32 targets. However, 
i386 doesn't support any cmpxchg-related instructions, and i486 only supports 
cmpxchg. So in this patch MaxAtomicInlineWidth is reset as follows.
For i486, MaxAtomicInlineWidth should be 32 because it supports cmpxchg.
For i386, MaxAtomicInlineWidth should be 0.
For other x86-32 CPUs, MaxAtomicInlineWidth should be 64 because of 
cmpxchg8b.


Repository:
  rC Clang

https://reviews.llvm.org/D42154

Files:
  lib/Basic/Targets/X86.h
  test/CodeGenCXX/atomic-inline.cpp

Index: test/CodeGenCXX/atomic-inline.cpp
===
--- test/CodeGenCXX/atomic-inline.cpp
+++ test/CodeGenCXX/atomic-inline.cpp
@@ -1,6 +1,52 @@
 // RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | FileCheck %s
 // RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu -target-cpu core2 | FileCheck %s --check-prefix=CORE2
-// Check the atomic code generation for cpu targets w/wo cx16 support.
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=i386-linux-gnu -target-cpu i386 | FileCheck %s --check-prefix=I386
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=i386-linux-gnu -target-cpu i486 | FileCheck %s --check-prefix=I486
+// Check the atomic code generation for cpu targets w/wo cx, cx8 and cx16 support.
+
+struct alignas(4) AM4 {
+  short f1, f2;
+};
+AM4 m4;
+AM4 load4() {
+  AM4 am;
+  // CHECK-LABEL: @_Z5load4v
+  // CHECK: load atomic i32, {{.*}} monotonic
+  // CORE2-LABEL: @_Z5load4v
+  // CORE2: load atomic i32, {{.*}} monotonic
+  // I386-LABEL: @_Z5load4v
+  // I386: call i32 @__atomic_load_4
+  // I486-LABEL: @_Z5load4v
+  // I486: load atomic i32, {{.*}} monotonic
+  __atomic_load(&m4, &am, 0);
+  return am;
+}
+
+AM4 s4;
+void store4() {
+  // CHECK-LABEL: @_Z6store4v
+  // CHECK: store atomic i32 {{.*}} monotonic
+  // CORE2-LABEL: @_Z6store4v
+  // CORE2: store atomic i32 {{.*}} monotonic
+  // I386-LABEL: @_Z6store4v
+  // I386: call void @__atomic_store_4
+  // I486-LABEL: @_Z6store4v
+  // I486: store atomic i32 {{.*}} monotonic
+  __atomic_store(&m4, &s4, 0);
+}
+
+bool cmpxchg4() {
+  AM4 am;
+  // CHECK-LABEL: @_Z8cmpxchg4v
+  // CHECK: cmpxchg i32* {{.*}} monotonic
+  // CORE2-LABEL: @_Z8cmpxchg4v
+  // CORE2: cmpxchg i32* {{.*}} monotonic
+  // I386-LABEL: @_Z8cmpxchg4v
+  // I386: call zeroext i1 @__atomic_compare_exchange_4
+  // I486-LABEL: @_Z8cmpxchg4v
+  // I486: cmpxchg i32* {{.*}} monotonic
+  return __atomic_compare_exchange(&m4, &s4, &am, 0, 0, 0);
+}
 
 struct alignas(8) AM8 {
   int f1, f2;
@@ -12,6 +58,10 @@
   // CHECK: load atomic i64, {{.*}} monotonic
   // CORE2-LABEL: @_Z5load8v
   // CORE2: load atomic i64, {{.*}} monotonic
+  // I386-LABEL: @_Z5load8v
+  // I386: call i64 @__atomic_load_8
+  // I486-LABEL: @_Z5load8v
+  // I486: call i64 @__atomic_load_8
   __atomic_load(&m8, &am, 0);
   return am;
 }
@@ -22,6 +72,10 @@
   // CHECK: store atomic i64 {{.*}} monotonic
   // CORE2-LABEL: @_Z6store8v
   // CORE2: store atomic i64 {{.*}} monotonic
+  // I386-LABEL: @_Z6store8v
+  // I386: call void @__atomic_store_8
+  // I486-LABEL: @_Z6store8v
+  // I486: call void @__atomic_store_8
   __atomic_store(&m8, &s8, 0);
 }
 
@@ -31,6 +85,10 @@
   // CHECK: cmpxchg i64* {{.*}} monotonic
   // CORE2-LABEL: @_Z8cmpxchg8v
   // CORE2: cmpxchg i64* {{.*}} monotonic
+  // I386-LABEL: @_Z8cmpxchg8v
+  // I386: call zeroext i1 @__atomic_compare_exchange_8
+  // I486-LABEL: @_Z8cmpxchg8v
+  // I486: call zeroext i1 @__atomic_compare_exchange_8
   return __atomic_compare_exchange(&m8, &s8, &am, 0, 0, 0);
 }
 
@@ -66,4 +124,3 @@
   // CORE2: cmpxchg i128* {{.*}} monotonic
   return __atomic_compare_exchange(&m16, &s16, &am, 0, 0, 0);
 }
-
Index: lib/Basic/Targets/X86.h
===
--- lib/Basic/Targets/X86.h
+++ lib/Basic/Targets/X86.h
@@ -89,6 +89,7 @@
   bool HasMOVBE = false;
   bool HasPREFETCHWT1 = false;
 
+protected:
   /// \brief Enumeration of all of the X86 CPUs supported by Clang.
   ///
   /// Each enumeration represents a particular CPU supported by Clang. These
@@ -467,9 +468,11 @@
  (1 << TargetInfo::LongDouble));
 
 // x86-32 has atomics up to 8 bytes
-// FIXME: Check that we actually have cmpxchg8b before setting
-// MaxAtomicInlineWidth. (cmpxchg8b is an i586 instruction.)
-MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+CPUKind Kind = getCPUKind(Opts.CPU);
+if (Kind >= CK_i586 || Kind == CK_Generic)
+  MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 64;
+else if (Kind >= CK_i486)
+  MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 32;
   }
 
   BuiltinVaListKind getBuiltinVaListKind() co

[PATCH] D39053: [Bitfield] Add more cases to making the bitfield a separate location

2017-11-13 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

I think it may be hard to fix the problem in the backend. It would face the 
same store-to-load forwarding issue if the transformation happens in some 
places but somehow not in others.
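
To illustrate the hazard being referred to (a hand-drawn sketch, not code from 
either patch; the exact access widths depend on the layout mode in use): a 
narrow store followed by a wider overlapping load is what defeats 
store-to-load forwarding on many cores.

```cpp
struct S {
  unsigned f1 : 8;
  unsigned f2 : 24;
};

unsigned touch(S &s) {
  s.f1 = 1;           // may be emitted as a narrow (e.g. one-byte) store
  return s.f2 + s.f1; // reading f2 may need a wider load that overlaps the
                      // pending narrow store, which blocks store forwarding
}
```

If some accesses are widened and others are not, this pattern shows up 
intermittently, which is part of what makes it hard to triage.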

But I am not sure whether it is acceptable to add more fine-grained bitfield 
access transformations in the frontend. It is a tradeoff. From my experience 
trying your patch on x86-64, I saw some instructions saved because with the 
transformation we now use shorter immediate constants that can be folded into 
other instructions instead of being loaded in separate moves. But the bit 
operations are not saved (I may be wrong), so I have no idea whether it is 
critical for performance. (To provide some background, we introduced 
fine-grained field access because the original problem caused a serious 
performance issue in some critical libraries. But doing this kind of bitfield 
transformation in the frontend can potentially harm some other applications. 
That is why it is off by default, and Hal had concerns about adding more such 
transformations.) Do you have performance numbers on some benchmarks to 
justify its importance? That may help folks make a better trade-off here.


https://reviews.llvm.org/D39053



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-10-16 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL315915: [Bitfield] Add an option to access bitfield in a 
fine-grained manner. (authored by wmi).

Changed prior to commit:
  https://reviews.llvm.org/D36562?vs=118181&id=119170#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
  cfe/trunk/include/clang/Driver/Options.td
  cfe/trunk/include/clang/Frontend/CodeGenOptions.def
  cfe/trunk/lib/CodeGen/CGRecordLayoutBuilder.cpp
  cfe/trunk/lib/Driver/ToolChains/Clang.cpp
  cfe/trunk/lib/Frontend/CompilerInvocation.cpp
  cfe/trunk/test/CodeGenCXX/finegrain-bitfield-access.cpp

Index: cfe/trunk/include/clang/Driver/Options.td
===
--- cfe/trunk/include/clang/Driver/Options.td
+++ cfe/trunk/include/clang/Driver/Options.td
@@ -1045,6 +1045,13 @@
   Group, Flags<[CC1Option]>,
   HelpText<"Filename defining the whitelist for imbuing the 'never instrument' XRay attribute.">;
 
+def ffine_grained_bitfield_accesses : Flag<["-"],
+  "ffine-grained-bitfield-accesses">, Group, Flags<[CC1Option]>,
+  HelpText<"Use separate accesses for bitfields with legal widths and alignments.">;
+def fno_fine_grained_bitfield_accesses : Flag<["-"],
+  "fno-fine-grained-bitfield-accesses">, Group, Flags<[CC1Option]>,
+  HelpText<"Use large-integer access for consecutive bitfield runs.">;
+
 def flat__namespace : Flag<["-"], "flat_namespace">;
 def flax_vector_conversions : Flag<["-"], "flax-vector-conversions">, Group;
 def flimited_precision_EQ : Joined<["-"], "flimited-precision=">, Group;
Index: cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
===
--- cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
+++ cfe/trunk/include/clang/Basic/DiagnosticDriverKinds.td
@@ -330,4 +330,8 @@
   "unable to find a Visual Studio installation; "
   "try running Clang from a developer command prompt">,
   InGroup>;
+
+def warn_drv_fine_grained_bitfield_accesses_ignored : Warning<
+  "option '-ffine-grained-bitfield-accesses' cannot be enabled together with a sanitizer; flag ignored">,
+  InGroup;
 }
Index: cfe/trunk/include/clang/Frontend/CodeGenOptions.def
===
--- cfe/trunk/include/clang/Frontend/CodeGenOptions.def
+++ cfe/trunk/include/clang/Frontend/CodeGenOptions.def
@@ -179,6 +179,7 @@
 CODEGENOPT(SanitizeStats , 1, 0) ///< Collect statistics for sanitizers.
 CODEGENOPT(SimplifyLibCalls  , 1, 1) ///< Set when -fbuiltin is enabled.
 CODEGENOPT(SoftFloat , 1, 0) ///< -soft-float.
+CODEGENOPT(FineGrainedBitfieldAccesses, 1, 0) ///< Enable fine-grained bitfield accesses.
 CODEGENOPT(StrictEnums   , 1, 0) ///< Optimize based on strict enum definition.
 CODEGENOPT(StrictVTablePointers, 1, 0) ///< Optimize based on the strict vtable pointers
 CODEGENOPT(TimePasses, 1, 0) ///< Set when -ftime-report is enabled.
Index: cfe/trunk/test/CodeGenCXX/finegrain-bitfield-access.cpp
===
--- cfe/trunk/test/CodeGenCXX/finegrain-bitfield-access.cpp
+++ cfe/trunk/test/CodeGenCXX/finegrain-bitfield-access.cpp
@@ -0,0 +1,162 @@
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -fsanitize=address -o - %s | FileCheck %s --check-prefix=SANITIZE
+// Check -fsplit-bitfields will be ignored since sanitizer is enabled.
+
+struct S1 {
+  unsigned f1:2;
+  unsigned f2:6;
+  unsigned f3:8;
+  unsigned f4:4;
+  unsigned f5:8;
+};
+
+S1 a1;
+unsigned read8_1() {
+  // CHECK-LABEL: @_Z7read8_1v
+  // CHECK: %bf.load = load i8, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: %bf.cast = zext i8 %bf.load to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE: %bf.lshr = lshr i32 %bf.load, 8
+  // SANITIZE: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE: ret i32 %bf.clear
+  return a1.f3;
+}
+void write8_1() {
+  // CHECK-LABEL: @_Z8write8_1v
+  // CHECK: store i8 3, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -65281
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 768
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f3 = 3;
+}
+
+unsigned read8_2() {
+  // CHECK-LABEL: @_Z7read8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-10-08 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 118181.
wmi added a comment.

Address Hal's comments.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Frontend/CodeGenOptions.def
  lib/CodeGen/CGRecordLayoutBuilder.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/CodeGenCXX/finegrain-bitfield-access.cpp

Index: test/CodeGenCXX/finegrain-bitfield-access.cpp
===
--- test/CodeGenCXX/finegrain-bitfield-access.cpp
+++ test/CodeGenCXX/finegrain-bitfield-access.cpp
@@ -0,0 +1,162 @@
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -fsanitize=address -o - %s | FileCheck %s --check-prefix=SANITIZE
+// Check -fsplit-bitfields will be ignored since sanitizer is enabled.
+
+struct S1 {
+  unsigned f1:2;
+  unsigned f2:6;
+  unsigned f3:8;
+  unsigned f4:4;
+  unsigned f5:8;
+};
+
+S1 a1;
+unsigned read8_1() {
+  // CHECK-LABEL: @_Z7read8_1v
+  // CHECK: %bf.load = load i8, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: %bf.cast = zext i8 %bf.load to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE: %bf.lshr = lshr i32 %bf.load, 8
+  // SANITIZE: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE: ret i32 %bf.clear
+  return a1.f3;
+}
+void write8_1() {
+  // CHECK-LABEL: @_Z8write8_1v
+  // CHECK: store i8 3, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -65281
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 768
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f3 = 3;
+}
+
+unsigned read8_2() {
+  // CHECK-LABEL: @_Z7read8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: %bf.lshr = lshr i16 %bf.load, 4
+  // CHECK-NEXT: %bf.clear = and i16 %bf.lshr, 255
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.clear to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.lshr = lshr i32 %bf.load, 20
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE-NEXT: ret i32 %bf.clear
+  return a1.f5;
+}
+void write8_2() {
+  // CHECK-LABEL: @_Z8write8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: %bf.clear = and i16 %bf.load, -4081
+  // CHECK-NEXT: %bf.set = or i16 %bf.clear, 48
+  // CHECK-NEXT: store i16 %bf.set, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -267386881
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 3145728
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f5 = 3;
+}
+
+struct S2 {
+  unsigned long f1:16;
+  unsigned long f2:16;
+  unsigned long f3:6;
+};
+
+S2 a2;
+unsigned read16_1() {
+  // CHECK-LABEL: @_Z8read16_1v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S2, %struct.S2* @a2, i32 0, i32 0), align 8
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_1v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.load, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2.f1;
+}
+unsigned read16_2() {
+  // CHECK-LABEL: @_Z8read16_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S2, %struct.S2* @a2, i32 0, i32 1), align 2
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_2v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.lshr = lshr i64 %bf.load, 16
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.lshr, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2.f2;
+}
+
+void write16_1() {
+  // CH

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-10-05 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 117947.
wmi added a comment.

Address Hal's comment.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Frontend/CodeGenOptions.def
  lib/CodeGen/CGRecordLayoutBuilder.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/CodeGenCXX/finegrain-bitfield-access.cpp

Index: test/CodeGenCXX/finegrain-bitfield-access.cpp
===
--- test/CodeGenCXX/finegrain-bitfield-access.cpp
+++ test/CodeGenCXX/finegrain-bitfield-access.cpp
@@ -0,0 +1,162 @@
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -fsanitize=address -o - %s | FileCheck %s --check-prefix=SANITIZE
+// Check -fsplit-bitfields will be ignored since sanitizer is enabled.
+
+struct S1 {
+  unsigned f1:2;
+  unsigned f2:6;
+  unsigned f3:8;
+  unsigned f4:4;
+  unsigned f5:8;
+};
+
+S1 a1;
+unsigned read8_1() {
+  // CHECK-LABEL: @_Z7read8_1v
+  // CHECK: %bf.load = load i8, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: %bf.cast = zext i8 %bf.load to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE: %bf.lshr = lshr i32 %bf.load, 8
+  // SANITIZE: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE: ret i32 %bf.clear
+  return a1.f3;
+}
+void write8_1() {
+  // CHECK-LABEL: @_Z8write8_1v
+  // CHECK: store i8 3, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -65281
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 768
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f3 = 3;
+}
+
+unsigned read8_2() {
+  // CHECK-LABEL: @_Z7read8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: %bf.lshr = lshr i16 %bf.load, 4
+  // CHECK-NEXT: %bf.clear = and i16 %bf.lshr, 255
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.clear to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.lshr = lshr i32 %bf.load, 20
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE-NEXT: ret i32 %bf.clear
+  return a1.f5;
+}
+void write8_2() {
+  // CHECK-LABEL: @_Z8write8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: %bf.clear = and i16 %bf.load, -4081
+  // CHECK-NEXT: %bf.set = or i16 %bf.clear, 48
+  // CHECK-NEXT: store i16 %bf.set, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -267386881
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 3145728
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f5 = 3;
+}
+
+struct S2 {
+  unsigned long f1:16;
+  unsigned long f2:16;
+  unsigned long f3:6;
+};
+
+S2 a2;
+unsigned read16_1() {
+  // CHECK-LABEL: @_Z8read16_1v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S2, %struct.S2* @a2, i32 0, i32 0), align 8
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_1v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.load, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2.f1;
+}
+unsigned read16_2() {
+  // CHECK-LABEL: @_Z8read16_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S2, %struct.S2* @a2, i32 0, i32 1), align 2
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_2v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.lshr = lshr i64 %bf.load, 16
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.lshr, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2.f2;
+}
+
+void write16_1() {
+  // CHE

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-10-05 Thread Wei Mi via Phabricator via cfe-commits
wmi marked an inline comment as done.
wmi added inline comments.



Comment at: lib/CodeGen/CGRecordLayoutBuilder.cpp:444
 // Add bitfields to the run as long as they qualify.
-if (Field != FieldEnd && Field->getBitWidthValue(Context) != 0 &&
-Tail == getFieldBitOffset(*Field)) {
-  Tail += Field->getBitWidthValue(Context);
-  ++Field;
-  continue;
+if (Field != FieldEnd && !SingleFieldRun) {
+  SingleFieldRun = betterBeSingleFieldRun(Field);

hfinkel wrote:
> The logic here is not obvious. Can you please add a comment. SingleFieldRun 
> here is only not equal to `betterBeSingleFieldRun(Field)` if we've skipped 
> 0-length bitfields, right? Please explain what's going on and also please 
> make sure there's a test case.
I restructured the code a little bit and hope the logic is clearer. I have 
already added a testcase for it.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-09-27 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 116896.
wmi added a comment.

Address Hal's comment. Separate bitfields into shards delimited by the 
naturally-sized-and-aligned fields.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Frontend/CodeGenOptions.def
  lib/CodeGen/CGRecordLayoutBuilder.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/CodeGenCXX/bitfield-split.cpp

Index: test/CodeGenCXX/bitfield-split.cpp
===
--- test/CodeGenCXX/bitfield-split.cpp
+++ test/CodeGenCXX/bitfield-split.cpp
@@ -0,0 +1,162 @@
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -ffine-grained-bitfield-accesses \
+// RUN:   -emit-llvm -fsanitize=address -o - %s | FileCheck %s --check-prefix=SANITIZE
+// Check -fsplit-bitfields will be ignored since sanitizer is enabled.
+
+struct S1 {
+  unsigned f1:2;
+  unsigned f2:6;
+  unsigned f3:8;
+  unsigned f4:4;
+  unsigned f5:8;
+};
+
+S1 a1;
+unsigned read8_1() {
+  // CHECK-LABEL: @_Z7read8_1v
+  // CHECK: %bf.load = load i8, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: %bf.cast = zext i8 %bf.load to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE: %bf.lshr = lshr i32 %bf.load, 8
+  // SANITIZE: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE: ret i32 %bf.clear
+  return a1.f3;
+}
+void write8_1() {
+  // CHECK-LABEL: @_Z8write8_1v
+  // CHECK: store i8 3, i8* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 1), align 1
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -65281
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 768
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f3 = 3;
+}
+
+unsigned read8_2() {
+  // CHECK-LABEL: @_Z7read8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: %bf.lshr = lshr i16 %bf.load, 4
+  // CHECK-NEXT: %bf.clear = and i16 %bf.lshr, 255
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.clear to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.lshr = lshr i32 %bf.load, 20
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE-NEXT: ret i32 %bf.clear
+  return a1.f5;
+}
+void write8_2() {
+  // CHECK-LABEL: @_Z8write8_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: %bf.clear = and i16 %bf.load, -4081
+  // CHECK-NEXT: %bf.set = or i16 %bf.clear, 48
+  // CHECK-NEXT: store i16 %bf.set, i16* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 2), align 2
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -267386881
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 3145728
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f5 = 3;
+}
+
+struct S2 {
+  unsigned long f1:16;
+  unsigned long f2:16;
+  unsigned long f3:6;
+};
+
+S2 a2;
+unsigned read16_1() {
+  // CHECK-LABEL: @_Z8read16_1v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S2, %struct.S2* @a2, i32 0, i32 0), align 8
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_1v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.load, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2.f1;
+}
+unsigned read16_2() {
+  // CHECK-LABEL: @_Z8read16_2v
+  // CHECK: %bf.load = load i16, i16* getelementptr inbounds (%struct.S2, %struct.S2* @a2, i32 0, i32 1), align 2
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_2v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.lshr = lshr i64 %bf.load, 16
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.lshr, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-09-26 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In https://reviews.llvm.org/D36562#880808, @hfinkel wrote:

> You seem to be only changing the behavior for the "separatable" fields, but I 
> suspect you want to change the behavior for the others too. The bitfield 
> would be decomposed into shards, separated by the naturally-sized-and-aligned 
> fields. Each access only loads its shard. For example, in your test case you 
> have:
>
>   struct S3 {
> unsigned long f1:14;
> unsigned long f2:18;
> unsigned long f3:32;
>   };
>   
>
> and you test that, with this option, loading/storing to a3.f3 only access the 
> specific 4 bytes composing f3. But if you load f1 or f2, we're still loading 
> all 8 bytes, right? I think we should only load/store the lower 4 bytes when 
> we access a3.f1 and/or a3.f2.


This is intentional. If the struct S3 is instead like the following:

  struct S3 {
    unsigned long f1:14;
    unsigned long f2:32;
    unsigned long f3:18;
  };

and there is no write of a.f2 between the accesses of a.f1 and a.f3, the loads
of a.f1 and a.f3 can still be shared. The intent is to keep the combining
opportunity as large as possible while still reducing the cost of accessing the
naturally-sized-and-aligned fields.
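
To make that concrete, here is a minimal sketch (hypothetical accessor, reusing
the S3 layout above) of the kind of load sharing this preserves:

  unsigned long sum(const S3 &s) {
    // Both reads go through the same 64-bit storage unit, so a single wide
    // load can feed both extracts as long as nothing writes s.f2 in between.
    return s.f1 + s.f3;
  }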

> Otherwise, you can again end up with the narrow-store/wide-load problem for 
> nearby fields under a different set of circumstances.

Good catch. It is indeed possible to hit that problem. Considering the big
performance impact and the triaging difficulty of store-forwarding stalls, I
will sacrifice the combining opportunity above and take the suggestion just as
you describe.
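
For readers following along, a minimal sketch (hypothetical layout, not taken
from the patch) of the narrow-store/wide-load hazard being referred to:

  struct S4 {
    unsigned long f1 : 16;
    unsigned long f2 : 16;
    unsigned long f3 : 32;  // legal integer width, naturally aligned
  };
  S4 s;

  unsigned long hazard(unsigned long n) {
    // If only the f3 access is split, this becomes a plain 4-byte store ...
    s.f3 = n;
    // ... while this read of f1 is still an 8-byte load of the whole storage
    // unit. The pending narrow store cannot be forwarded into the wider load,
    // so the load stalls until the store leaves the store buffer.
    return s.f1;
  }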

Thanks,
Wei.




Comment at: include/clang/Driver/Options.td:1032
 
+def fsplit_bitfields : Flag<["-"], "fsplit-bitfields">,
+  Group, Flags<[CC1Option]>,

hfinkel wrote:
> I'm not opposed to -fsplit-bitfields, but I'd prefer if we find something 
> more self-explanatory. It's not really clear what "splitting a bitfield" 
> means. Maybe?
> 
>   -fsplit-bitfield-accesses
>   -fdecomposed-bitfield-accesses
>   -fsharded-bitfield-accesses
>   -ffine-grained-bitfield-accesses
> 
> (I think that I prefer -ffine-grained-bitfield-accesses, although it's the 
> longest)
Ok. 



Comment at: include/clang/Driver/Options.td:1034
+  Group, Flags<[CC1Option]>,
+  HelpText<"Enable splitting bitfield with legal width and alignment in 
LLVM.">;
+def fno_split_bitfields : Flag<["-"], "fno-split-bitfields">,

hfinkel wrote:
> How about?
> 
>   Use separate access for bitfields with legal widths and alignments.
> 
> I don't think that "in LLVM" is needed here (or we could put "in LLVM" on an 
> awful lot of these options).
Sure.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38046: [Atomic][X8664] set max atomic inline/promote width according to the target

2017-09-22 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL313992: [Atomic][X8664] set max atomic inline width 
according to the target (authored by wmi).

Changed prior to commit:
  https://reviews.llvm.org/D38046?vs=116107&id=116362#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38046

Files:
  cfe/trunk/include/clang/Basic/TargetInfo.h
  cfe/trunk/lib/Basic/Targets.cpp
  cfe/trunk/lib/Basic/Targets/X86.h
  cfe/trunk/test/CodeGenCXX/atomic-inline.cpp
  cfe/trunk/test/OpenMP/atomic_capture_codegen.cpp
  cfe/trunk/test/OpenMP/atomic_read_codegen.c
  cfe/trunk/test/OpenMP/atomic_update_codegen.cpp
  cfe/trunk/test/OpenMP/atomic_write_codegen.c

Index: cfe/trunk/lib/Basic/Targets.cpp
===
--- cfe/trunk/lib/Basic/Targets.cpp
+++ cfe/trunk/lib/Basic/Targets.cpp
@@ -620,6 +620,7 @@
 
   Target->setSupportedOpenCLOpts();
   Target->setOpenCLExtensionOpts();
+  Target->setMaxAtomicWidth();
 
   if (!Target->validateTarget(Diags))
 return nullptr;
Index: cfe/trunk/lib/Basic/Targets/X86.h
===
--- cfe/trunk/lib/Basic/Targets/X86.h
+++ cfe/trunk/lib/Basic/Targets/X86.h
@@ -814,7 +814,7 @@
 
 // x86-64 has atomics up to 16 bytes.
 MaxAtomicPromoteWidth = 128;
-MaxAtomicInlineWidth = 128;
+MaxAtomicInlineWidth = 64;
   }
 
   BuiltinVaListKind getBuiltinVaListKind() const override {
@@ -872,6 +872,12 @@
  HasSizeMismatch);
   }
 
+  void setMaxAtomicWidth() override {
+if (hasFeature("cx16"))
+  MaxAtomicInlineWidth = 128;
+return;
+  }
+
   ArrayRef getTargetBuiltins() const override;
 };
 
Index: cfe/trunk/include/clang/Basic/TargetInfo.h
===
--- cfe/trunk/include/clang/Basic/TargetInfo.h
+++ cfe/trunk/include/clang/Basic/TargetInfo.h
@@ -448,6 +448,9 @@
   /// \brief Return the maximum width lock-free atomic operation which can be
   /// inlined given the supported features of the given target.
   unsigned getMaxAtomicInlineWidth() const { return MaxAtomicInlineWidth; }
+  /// \brief Set the maximum inline or promote width lock-free atomic operation
+  /// for the given target.
+  virtual void setMaxAtomicWidth() {}
   /// \brief Returns true if the given target supports lock-free atomic
   /// operations at the specified width and alignment.
   virtual bool hasBuiltinAtomic(uint64_t AtomicSizeInBits,
Index: cfe/trunk/test/OpenMP/atomic_read_codegen.c
===
--- cfe/trunk/test/OpenMP/atomic_read_codegen.c
+++ cfe/trunk/test/OpenMP/atomic_read_codegen.c
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 // REQUIRES: x86-registered-target
 #ifndef HEADER
Index: cfe/trunk/test/OpenMP/atomic_update_codegen.cpp
===
--- cfe/trunk/test/OpenMP/atomic_update_codegen.cpp
+++ cfe/trunk/test/OpenMP/atomic_update_codegen.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
Index: cfe/trunk/test/OpenMP/atomic_capture_codegen.cpp
===
--- cfe/trunk/test/OpenMP/atomic_capture_codegen.cpp
+++ cfe/trunk/test/OpenMP/atomic_capture_codegen.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_6

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-09-21 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 116232.
wmi added a comment.

Changes following the discussion:

- Put the bitfield split logic under an option that is off by default.
- When the sanitizer is enabled, the option for bitfield splitting will be
ignored and a warning message will be emitted.

In addition, a test is added.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  include/clang/Basic/DiagnosticDriverKinds.td
  include/clang/Driver/Options.td
  include/clang/Frontend/CodeGenOptions.def
  lib/CodeGen/CGExpr.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/CodeGenCXX/bitfield-split.cpp

Index: test/CodeGenCXX/bitfield-split.cpp
===
--- test/CodeGenCXX/bitfield-split.cpp
+++ test/CodeGenCXX/bitfield-split.cpp
@@ -0,0 +1,161 @@
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsplit-bitfields -emit-llvm \
+// RUN:  -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsplit-bitfields -emit-llvm \
+// RUN:  -fsanitize=address -o - %s | FileCheck %s --check-prefix=SANITIZE
+// Check -fsplit-bitfields will be ignored since sanitizer is enabled.
+
+struct S1 {
+  unsigned f1:2;
+  unsigned f2:6;
+  unsigned f3:8;
+  unsigned f4:4;
+  unsigned f5:8;
+};
+
+S1 a1;
+unsigned read8_1() {
+  // CHECK-LABEL: @_Z7read8_1v
+  // CHECK: %bf.load = load i8, i8* getelementptr (i8, i8* bitcast (%struct.S1* @a1 to i8*), i32 1), align 1
+  // CHECK-NEXT: %bf.cast = zext i8 %bf.load to i32
+  // CHECK-NEXT: ret i32 %bf.cast
+  // SANITIZE-LABEL: @_Z7read8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE: %bf.lshr = lshr i32 %bf.load, 8
+  // SANITIZE: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE: ret i32 %bf.clear
+  return a1.f3;
+}
+void write8_1() {
+  // CHECK-LABEL: @_Z8write8_1v
+  // CHECK: store i8 3, i8* getelementptr (i8, i8* bitcast (%struct.S1* @a1 to i8*), i32 1), align 1
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_1v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -65281
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 768
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f3 = 3;
+}
+
+unsigned read8_2() {
+  // CHECK-LABEL: @_Z7read8_2v
+  // CHECK: %bf.load = load i32, i32* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 0), align 4
+  // CHECK-NEXT: %bf.lshr = lshr i32 %bf.load, 20
+  // CHECK-NEXT: %bf.clear = and i32 %bf.lshr, 255
+  // CHECK-NEXT: ret i32 %bf.clear
+  // SANITIZE-LABEL: @_Z7read8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.lshr = lshr i32 %bf.load, 20
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.lshr, 255
+  // SANITIZE-NEXT: ret i32 %bf.clear
+  return a1.f5;
+}
+void write8_2() {
+  // CHECK-LABEL: @_Z8write8_2v
+  // CHECK: %bf.load = load i32, i32* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 0), align 4
+  // CHECK-NEXT: %bf.clear = and i32 %bf.load, -267386881
+  // CHECK-NEXT: %bf.set = or i32 %bf.clear, 3145728
+  // CHECK-NEXT: store i32 %bf.set, i32* getelementptr inbounds (%struct.S1, %struct.S1* @a1, i32 0, i32 0), align 4
+  // CHECK-NEXT: ret void
+  // SANITIZE-LABEL: @_Z8write8_2v
+  // SANITIZE: %bf.load = load i32, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: %bf.clear = and i32 %bf.load, -267386881
+  // SANITIZE-NEXT: %bf.set = or i32 %bf.clear, 3145728
+  // SANITIZE-NEXT: store i32 %bf.set, i32* getelementptr inbounds {{.*}}, align 4
+  // SANITIZE-NEXT: ret void
+  a1.f5 = 3;
+}
+
+struct S2 {
+  unsigned long f1:16;
+  unsigned long f2:16;
+  unsigned long f3:6;
+};
+
+S2 a2;
+unsigned read16_1() {
+  // CHECK-LABEL: @_Z8read16_1v
+  // CHECK: %bf.load = load i16, i16* bitcast (%struct.S2* @a2 to i16*), align 2
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_1v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.load, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-NEXT: ret i32 %conv
+  return a2.f1;
+}
+unsigned read16_2() {
+  // CHECK-LABEL: @_Z8read16_2v
+  // CHECK: %bf.load = load i16, i16* bitcast (i8* getelementptr (i8, i8* bitcast (%struct.S2* @a2 to i8*), i32 2) to i16*), align 2
+  // CHECK-NEXT: %bf.cast = zext i16 %bf.load to i64
+  // CHECK-NEXT: %conv = trunc i64 %bf.cast to i32
+  // CHECK-NEXT: ret i32 %conv
+  // SANITIZE-LABEL: @_Z8read16_2v
+  // SANITIZE: %bf.load = load i64, i64* bitcast {{.*}}, align 8
+  // SANITIZE-NEXT: %bf.lshr = lshr i64 %bf.load, 16
+  // SANITIZE-NEXT: %bf.clear = and i64 %bf.lshr, 65535
+  // SANITIZE-NEXT: %conv = trunc i64 %bf.clear to i32
+  // SANITIZE-

[PATCH] D38046: [Atomic][X8664] set max atomic inline/promote width according to the target

2017-09-20 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 116107.
wmi added a comment.
Herald added a subscriber: eraman.

Address Eli's comments.


Repository:
  rL LLVM

https://reviews.llvm.org/D38046

Files:
  include/clang/Basic/TargetInfo.h
  lib/Basic/Targets.cpp
  lib/Basic/Targets/X86.h
  test/CodeGenCXX/atomic-inline.cpp
  test/OpenMP/atomic_capture_codegen.cpp
  test/OpenMP/atomic_read_codegen.c
  test/OpenMP/atomic_update_codegen.cpp
  test/OpenMP/atomic_write_codegen.c

Index: test/OpenMP/atomic_write_codegen.c
===
--- test/OpenMP/atomic_write_codegen.c
+++ test/OpenMP/atomic_write_codegen.c
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 // REQUIRES: x86-registered-target
 #ifndef HEADER
Index: test/OpenMP/atomic_update_codegen.cpp
===
--- test/OpenMP/atomic_update_codegen.cpp
+++ test/OpenMP/atomic_update_codegen.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
Index: test/OpenMP/atomic_read_codegen.c
===
--- test/OpenMP/atomic_read_codegen.c
+++ test/OpenMP/atomic_read_codegen.c
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 // REQUIRES: x86-registered-target
 #ifndef HEADER
Index: test/OpenMP/atomic_capture_codegen.cpp
===
--- test/OpenMP/atomic_capture_codegen.cpp
+++ test/OpenMP/atomic_capture_codegen.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
Index: test/CodeGenCXX/atomic-inline.cpp
===
--- test/CodeGenCXX/atomic-inline.cpp
+++ test/CodeGenCXX/atomic-inline.cpp
@@ -0,0 +1,69 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | FileCheck %s
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu -target-cpu core2 | FileCheck %s --check-prefix=CORE2
+// Check the atomic code generation for cpu targets w/wo cx16 support.
+
+struct alignas(8) AM8 {
+  int f1, f2;
+};
+AM8 m8;
+AM8 load8() {
+  AM8 am;
+  // CHECK-LABEL: @_Z5load8v
+  // CHECK: load atomic i64, {{.*}} monotonic
+  // CORE2-LABEL: @_Z5load8v
+  // CORE2: load atomic i64, {{.*}}

[PATCH] D38046: [Atomic][X8664] set max atomic inline/promote width according to the target

2017-09-20 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: lib/Basic/Targets/X86.h:898
+  MaxAtomicPromoteWidth = 64;
+  MaxAtomicInlineWidth = 64;
+}

efriedma wrote:
> wmi wrote:
> > efriedma wrote:
> > > I don't think we need to mess with MaxAtomicPromoteWidth?
> > > 
> > > Probably more intuitive to check "if (hasFeature" rather than "if 
> > > (!hasFeature".
> > > 
> > > Adding a dedicated hook for this seems a bit overkill, but I don't have a 
> > > better suggestion.
> > If 128-bit inline atomics are not supported, what is the point of promoting
> > the atomic type to 128 bits?
> MaxAtomicPromoteWidth affects the ABI, so it can't vary based on the target 
> CPU.
That makes sense. Thanks for the explanation.


Repository:
  rL LLVM

https://reviews.llvm.org/D38046



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38046: [Atomic][X8664] set max atomic inline/promote width according to the target

2017-09-20 Thread Wei Mi via Phabricator via cfe-commits
wmi added inline comments.



Comment at: lib/Basic/Targets/X86.h:898
+  MaxAtomicPromoteWidth = 64;
+  MaxAtomicInlineWidth = 64;
+}

efriedma wrote:
> I don't think we need to mess with MaxAtomicPromoteWidth?
> 
> Probably more intuitive to check "if (hasFeature" rather than "if 
> (!hasFeature".
> 
> Adding a dedicated hook for this seems a bit overkill, but I don't have a 
> better suggestion.
If 128-bit inline atomics are not supported, what is the point of promoting the
atomic type to 128 bits?


Repository:
  rL LLVM

https://reviews.llvm.org/D38046



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38046: [Atomic][X8664] set max atomic inline/promote width according to the target

2017-09-19 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 115945.
wmi added a comment.

Address Eli's comment.


Repository:
  rL LLVM

https://reviews.llvm.org/D38046

Files:
  include/clang/Basic/TargetInfo.h
  lib/Basic/Targets.cpp
  lib/Basic/Targets/X86.h
  test/OpenMP/atomic_capture_codegen.cpp
  test/OpenMP/atomic_read_codegen.c
  test/OpenMP/atomic_update_codegen.cpp
  test/OpenMP/atomic_write_codegen.c

Index: test/OpenMP/atomic_write_codegen.c
===
--- test/OpenMP/atomic_write_codegen.c
+++ test/OpenMP/atomic_write_codegen.c
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 // REQUIRES: x86-registered-target
 #ifndef HEADER
Index: test/OpenMP/atomic_update_codegen.cpp
===
--- test/OpenMP/atomic_update_codegen.cpp
+++ test/OpenMP/atomic_update_codegen.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
Index: test/OpenMP/atomic_read_codegen.c
===
--- test/OpenMP/atomic_read_codegen.c
+++ test/OpenMP/atomic_read_codegen.c
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 // REQUIRES: x86-registered-target
 #ifndef HEADER
Index: test/OpenMP/atomic_capture_codegen.cpp
===
--- test/OpenMP/atomic_capture_codegen.cpp
+++ test/OpenMP/atomic_capture_codegen.cpp
@@ -1,6 +1,6 @@
-// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -emit-pch -o %t %s
-// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -verify -triple x86_64-apple-darwin10 -target-cpu core2 -fopenmp -x c -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c -triple x86_64-apple-darwin10 -target-cpu core2 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
 // expected-no-diagnostics
 #ifndef HEADER
 #define HEADER
Index: lib/Basic/Targets.cpp
===
--- lib/Basic/Targets.cpp
+++ lib/Basic/Targets.cpp
@@ -620,6 +620,7 @@
 
   Target->setSupportedOpenCLOpts();
   Target->setOpenCLExtensionOpts();
+  Target->setMaxAtomicWidth();
 
   if (!Target->validateTarget(Diags))
 return nullptr;
Index: lib/Basic/Targets/X86.h
===
--- lib/Basic/Targets/X86.h
+++ lib/Basic/Targets/X86.h
@@ -892,6 +892,14 @@
  HasSizeMismatch);
   }
 
+  void setMaxAtomicWidth() override {
+if (!hasFeature("cx16")) {
+  MaxAtomicPromoteWidth = 64;
+  MaxAtomicInlineWidth = 64;
+}
+return;
+  }
+
   ArrayRef getTargetBui

[PATCH] D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.

2017-09-08 Thread Wei Mi via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL312801: Use EmitPointerWithAlignment to get alignment 
information of the pointer used… (authored by wmi).

Changed prior to commit:
  https://reviews.llvm.org/D37310?vs=114300&id=114384#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D37310

Files:
  cfe/trunk/lib/CodeGen/CGAtomic.cpp
  cfe/trunk/test/CodeGenCXX/atomic-align.cpp


Index: cfe/trunk/lib/CodeGen/CGAtomic.cpp
===
--- cfe/trunk/lib/CodeGen/CGAtomic.cpp
+++ cfe/trunk/lib/CodeGen/CGAtomic.cpp
@@ -745,19 +745,19 @@
   QualType MemTy = AtomicTy;
   if (const AtomicType *AT = AtomicTy->getAs())
 MemTy = AT->getValueType();
-  CharUnits sizeChars, alignChars;
-  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
-  uint64_t Size = sizeChars.getQuantity();
-  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
-  bool UseLibcall = (sizeChars != alignChars ||
- getContext().toBits(sizeChars) > MaxInlineWidthInBits);
-
   llvm::Value *IsWeak = nullptr, *OrderFail = nullptr;
 
   Address Val1 = Address::invalid();
   Address Val2 = Address::invalid();
   Address Dest = Address::invalid();
-  Address Ptr(EmitScalarExpr(E->getPtr()), alignChars);
+  Address Ptr = EmitPointerWithAlignment(E->getPtr());
+
+  CharUnits sizeChars, alignChars;
+  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
+  uint64_t Size = sizeChars.getQuantity();
+  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
+  bool UseLibcall = (sizeChars != Ptr.getAlignment() ||
+ getContext().toBits(sizeChars) > MaxInlineWidthInBits);
 
   if (E->getOp() == AtomicExpr::AO__c11_atomic_init ||
   E->getOp() == AtomicExpr::AO__opencl_atomic_init) {
Index: cfe/trunk/test/CodeGenCXX/atomic-align.cpp
===
--- cfe/trunk/test/CodeGenCXX/atomic-align.cpp
+++ cfe/trunk/test/CodeGenCXX/atomic-align.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | 
FileCheck %s
+
+struct AM {
+  int f1, f2;
+};
+alignas(8) AM m;
+AM load1() {
+  AM am;
+  // m is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load1v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&m, &am, 0);
+  return am;
+}
+
+struct BM {
+  int f1;
+  alignas(8) AM f2;
+};
+BM bm;
+AM load2() {
+  AM am;
+  // BM::f2 is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load2v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&bm.f2, &am, 0);
+  return am;
+}


Index: cfe/trunk/lib/CodeGen/CGAtomic.cpp
===
--- cfe/trunk/lib/CodeGen/CGAtomic.cpp
+++ cfe/trunk/lib/CodeGen/CGAtomic.cpp
@@ -745,19 +745,19 @@
   QualType MemTy = AtomicTy;
   if (const AtomicType *AT = AtomicTy->getAs())
 MemTy = AT->getValueType();
-  CharUnits sizeChars, alignChars;
-  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
-  uint64_t Size = sizeChars.getQuantity();
-  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
-  bool UseLibcall = (sizeChars != alignChars ||
- getContext().toBits(sizeChars) > MaxInlineWidthInBits);
-
   llvm::Value *IsWeak = nullptr, *OrderFail = nullptr;
 
   Address Val1 = Address::invalid();
   Address Val2 = Address::invalid();
   Address Dest = Address::invalid();
-  Address Ptr(EmitScalarExpr(E->getPtr()), alignChars);
+  Address Ptr = EmitPointerWithAlignment(E->getPtr());
+
+  CharUnits sizeChars, alignChars;
+  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
+  uint64_t Size = sizeChars.getQuantity();
+  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
+  bool UseLibcall = (sizeChars != Ptr.getAlignment() ||
+ getContext().toBits(sizeChars) > MaxInlineWidthInBits);
 
   if (E->getOp() == AtomicExpr::AO__c11_atomic_init ||
   E->getOp() == AtomicExpr::AO__opencl_atomic_init) {
Index: cfe/trunk/test/CodeGenCXX/atomic-align.cpp
===
--- cfe/trunk/test/CodeGenCXX/atomic-align.cpp
+++ cfe/trunk/test/CodeGenCXX/atomic-align.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | FileCheck %s
+
+struct AM {
+  int f1, f2;
+};
+alignas(8) AM m;
+AM load1() {
+  AM am;
+  // m is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load1v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&m, &am, 0);
+  return am;
+}
+
+struct BM {
+  int f1;
+  alignas(8) AM f2;
+};
+BM bm;
+AM load2() {
+  AM am;
+  // BM::f2 is declared to align to 8bytes, so generate

[PATCH] D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.

2017-09-07 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 114300.
wmi added a comment.

Address John's comment.


Repository:
  rL LLVM

https://reviews.llvm.org/D37310

Files:
  lib/CodeGen/CGAtomic.cpp
  test/CodeGenCXX/atomic-align.cpp


Index: test/CodeGenCXX/atomic-align.cpp
===
--- test/CodeGenCXX/atomic-align.cpp
+++ test/CodeGenCXX/atomic-align.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | 
FileCheck %s
+
+struct AM {
+  int f1, f2;
+};
+alignas(8) AM m;
+AM load1() {
+  AM am;
+  // m is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load1v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&m, &am, 0);
+  return am;
+}
+
+struct BM {
+  int f1;
+  alignas(8) AM f2;
+};
+BM bm;
+AM load2() {
+  AM am;
+  // BM::f2 is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load2v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&bm.f2, &am, 0);
+  return am;
+}
Index: lib/CodeGen/CGAtomic.cpp
===
--- lib/CodeGen/CGAtomic.cpp
+++ lib/CodeGen/CGAtomic.cpp
@@ -745,19 +745,19 @@
   QualType MemTy = AtomicTy;
   if (const AtomicType *AT = AtomicTy->getAs())
 MemTy = AT->getValueType();
-  CharUnits sizeChars, alignChars;
-  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
-  uint64_t Size = sizeChars.getQuantity();
-  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
-  bool UseLibcall = (sizeChars != alignChars ||
- getContext().toBits(sizeChars) > MaxInlineWidthInBits);
-
   llvm::Value *IsWeak = nullptr, *OrderFail = nullptr;
 
   Address Val1 = Address::invalid();
   Address Val2 = Address::invalid();
   Address Dest = Address::invalid();
-  Address Ptr(EmitScalarExpr(E->getPtr()), alignChars);
+  Address Ptr = EmitPointerWithAlignment(E->getPtr());
+
+  CharUnits sizeChars, alignChars;
+  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
+  uint64_t Size = sizeChars.getQuantity();
+  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
+  bool UseLibcall = (sizeChars != Ptr.getAlignment() ||
+ getContext().toBits(sizeChars) > MaxInlineWidthInBits);
 
   if (E->getOp() == AtomicExpr::AO__c11_atomic_init ||
   E->getOp() == AtomicExpr::AO__opencl_atomic_init) {


Index: test/CodeGenCXX/atomic-align.cpp
===
--- test/CodeGenCXX/atomic-align.cpp
+++ test/CodeGenCXX/atomic-align.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | FileCheck %s
+
+struct AM {
+  int f1, f2;
+};
+alignas(8) AM m;
+AM load1() {
+  AM am;
+  // m is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load1v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&m, &am, 0);
+  return am;
+}
+
+struct BM {
+  int f1;
+  alignas(8) AM f2;
+};
+BM bm;
+AM load2() {
+  AM am;
+  // BM::f2 is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load2v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&bm.f2, &am, 0);
+  return am;
+}
Index: lib/CodeGen/CGAtomic.cpp
===
--- lib/CodeGen/CGAtomic.cpp
+++ lib/CodeGen/CGAtomic.cpp
@@ -745,19 +745,19 @@
   QualType MemTy = AtomicTy;
   if (const AtomicType *AT = AtomicTy->getAs())
 MemTy = AT->getValueType();
-  CharUnits sizeChars, alignChars;
-  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
-  uint64_t Size = sizeChars.getQuantity();
-  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
-  bool UseLibcall = (sizeChars != alignChars ||
- getContext().toBits(sizeChars) > MaxInlineWidthInBits);
-
   llvm::Value *IsWeak = nullptr, *OrderFail = nullptr;
 
   Address Val1 = Address::invalid();
   Address Val2 = Address::invalid();
   Address Dest = Address::invalid();
-  Address Ptr(EmitScalarExpr(E->getPtr()), alignChars);
+  Address Ptr = EmitPointerWithAlignment(E->getPtr());
+
+  CharUnits sizeChars, alignChars;
+  std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
+  uint64_t Size = sizeChars.getQuantity();
+  unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
+  bool UseLibcall = (sizeChars != Ptr.getAlignment() ||
+ getContext().toBits(sizeChars) > MaxInlineWidthInBits);
 
   if (E->getOp() == AtomicExpr::AO__c11_atomic_init ||
   E->getOp() == AtomicExpr::AO__opencl_atomic_init) {
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.

2017-09-06 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

Ping


Repository:
  rL LLVM

https://reviews.llvm.org/D37310



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D37310: [Atomic] Merge alignment information from Decl and from Type when emit atomic expression.

2017-08-30 Thread Wei Mi via Phabricator via cfe-commits
wmi created this revision.
Herald added a subscriber: sanjoy.

This is to fix PR34347.

EmitAtomicExpr only uses the alignment information from the Type, not from the
Decl, so when the declaration of an atomic variable is marked with an alignment
equal to its size, EmitAtomicExpr doesn't know about it and generates a libcall
instead of an atomic op.

The patch merges the alignment information from the Decl and from the Type.


Repository:
  rL LLVM

https://reviews.llvm.org/D37310

Files:
  lib/CodeGen/CGAtomic.cpp
  test/CodeGenCXX/atomic-align.cpp


Index: test/CodeGenCXX/atomic-align.cpp
===
--- test/CodeGenCXX/atomic-align.cpp
+++ test/CodeGenCXX/atomic-align.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | 
FileCheck %s
+
+struct AM {
+  int f1, f2;
+};
+alignas(8) AM m;
+AM load1() {
+  AM am;
+  // m is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load1v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&m, &am, 0);
+  return am;
+}
+
+struct BM {
+  int f1;
+  alignas(8) AM f2;
+};
+BM bm;
+AM load2() {
+  AM am;
+  // BM::f2 is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load2v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&bm.f2, &am, 0);
+  return am;
+}
Index: lib/CodeGen/CGAtomic.cpp
===
--- lib/CodeGen/CGAtomic.cpp
+++ lib/CodeGen/CGAtomic.cpp
@@ -665,6 +665,20 @@
 MemTy = AT->getValueType();
   CharUnits sizeChars, alignChars;
   std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
+
+  // Get alignment from declaration.
+  if (const auto *UO = dyn_cast(E->getPtr()))
+if (UO->getOpcode() == UO_AddrOf) {
+  CharUnits alignDecl;
+  if (const auto *DRE = dyn_cast(UO->getSubExpr()))
+alignDecl =
+
getContext().toCharUnitsFromBits(DRE->getDecl()->getMaxAlignment());
+  else if (const auto *ME = dyn_cast(UO->getSubExpr()))
+alignDecl = getContext().toCharUnitsFromBits(
+ME->getMemberDecl()->getMaxAlignment());
+  alignChars = std::max(alignChars, alignDecl);
+}
+
   uint64_t Size = sizeChars.getQuantity();
   unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
   bool UseLibcall = (sizeChars != alignChars ||


Index: test/CodeGenCXX/atomic-align.cpp
===
--- test/CodeGenCXX/atomic-align.cpp
+++ test/CodeGenCXX/atomic-align.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 %s -std=c++11 -emit-llvm -o - -triple=x86_64-linux-gnu | FileCheck %s
+
+struct AM {
+  int f1, f2;
+};
+alignas(8) AM m;
+AM load1() {
+  AM am;
+  // m is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load1v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&m, &am, 0);
+  return am;
+}
+
+struct BM {
+  int f1;
+  alignas(8) AM f2;
+};
+BM bm;
+AM load2() {
+  AM am;
+  // BM::f2 is declared to align to 8bytes, so generate load atomic instead
+  // of libcall.
+  // CHECK-LABEL: @_Z5load2v
+  // CHECK: load atomic {{.*}} monotonic
+  __atomic_load(&bm.f2, &am, 0);
+  return am;
+}
Index: lib/CodeGen/CGAtomic.cpp
===
--- lib/CodeGen/CGAtomic.cpp
+++ lib/CodeGen/CGAtomic.cpp
@@ -665,6 +665,20 @@
 MemTy = AT->getValueType();
   CharUnits sizeChars, alignChars;
   std::tie(sizeChars, alignChars) = getContext().getTypeInfoInChars(AtomicTy);
+
+  // Get alignment from declaration.
+  if (const auto *UO = dyn_cast(E->getPtr()))
+if (UO->getOpcode() == UO_AddrOf) {
+  CharUnits alignDecl;
+  if (const auto *DRE = dyn_cast(UO->getSubExpr()))
+alignDecl =
+getContext().toCharUnitsFromBits(DRE->getDecl()->getMaxAlignment());
+  else if (const auto *ME = dyn_cast(UO->getSubExpr()))
+alignDecl = getContext().toCharUnitsFromBits(
+ME->getMemberDecl()->getMaxAlignment());
+  alignChars = std::max(alignChars, alignDecl);
+}
+
   uint64_t Size = sizeChars.getQuantity();
   unsigned MaxInlineWidthInBits = getTarget().getMaxAtomicInlineWidth();
   bool UseLibcall = (sizeChars != alignChars ||
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-08-22 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 112271.
wmi added a comment.

Try another idea suggested by David.

All the bitfields in a single run are still wrapped inside a large integer
according to CGBitFieldInfo. For bitfields that have a legal integer width and
are naturally aligned, change how the load/store is generated in LLVM IR (in
EmitLoadOfBitfieldLValue and EmitStoreThroughBitfieldLValue). All the other
bitfields are still accessed using the widened load/store plus masking
operations. Here is an example:

class A {
  unsigned long f1:2;
  unsigned long f2:6;
  unsigned long f3:8;
  unsigned long f4:4;
};
A a;

f1, f2, f3 and f4 will still be wrapped as a large integer. f1, f2, and f4 will
have the same access code as before; f3 will be accessed as if it were a
separate unsigned char variable.

In this way, we can reduce the chance of blocking bitfield access combining.
The a.f1 access and the a.f4 access can be combined only if no a.f3 access
stands in between them. We will generate two fewer instructions for foo, and
one more instruction for goo. So it is better, but not perfect.

void foo (unsigned long n1, unsigned long n3, unsigned long n4) {
  a.f1 = n1;
  a.f4 = n4;
  a.f3 = n3;
}

void goo (unsigned long n1, unsigned long n3, unsigned long n4) {
  a.f1 = n1;
  a.f3 = n3; // a.f3 will still block the combining of a.f1 and a.f4 because
             // a.f3 is accessed independently.
  a.f4 = n4;
}


Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  lib/CodeGen/CGExpr.cpp
  test/CodeGen/2009-12-07-BitFieldAlignment.c

Index: test/CodeGen/2009-12-07-BitFieldAlignment.c
===
--- test/CodeGen/2009-12-07-BitFieldAlignment.c
+++ test/CodeGen/2009-12-07-BitFieldAlignment.c
@@ -4,7 +4,7 @@
 struct S {
   int a, b;
   void *c;
-  unsigned d : 8;
+  unsigned d : 7;
   unsigned e : 8;
 };
 
Index: lib/CodeGen/CGExpr.cpp
===
--- lib/CodeGen/CGExpr.cpp
+++ lib/CodeGen/CGExpr.cpp
@@ -1653,30 +1653,72 @@
   return EmitLoadOfBitfieldLValue(LV, Loc);
 }
 
+/// Check if current Field is better to be accessed separately. When current
+/// field has legal integer width, and its bitfield offset is naturally
+/// aligned, it is better to access the bitfield like a separate integer var.
+static bool IsSeparatableBitfield(const CGBitFieldInfo &Info,
+  const llvm::DataLayout &DL,
+  ASTContext &Context) {
+  if (!DL.isLegalInteger(Info.Size))
+return false;
+  // Make sure Field is natually aligned.
+  if (Info.Offset %
+  (DL.getABIIntegerTypeAlignment(Info.Size) * Context.getCharWidth()) !=
+  0)
+return false;
+  return true;
+}
+
 RValue CodeGenFunction::EmitLoadOfBitfieldLValue(LValue LV,
  SourceLocation Loc) {
   const CGBitFieldInfo &Info = LV.getBitFieldInfo();
 
   // Get the output type.
   llvm::Type *ResLTy = ConvertType(LV.getType());
 
   Address Ptr = LV.getBitFieldAddress();
-  llvm::Value *Val = Builder.CreateLoad(Ptr, LV.isVolatileQualified(), "bf.load");
-
-  if (Info.IsSigned) {
-assert(static_cast(Info.Offset + Info.Size) <= Info.StorageSize);
-unsigned HighBits = Info.StorageSize - Info.Offset - Info.Size;
-if (HighBits)
-  Val = Builder.CreateShl(Val, HighBits, "bf.shl");
-if (Info.Offset + HighBits)
-  Val = Builder.CreateAShr(Val, Info.Offset + HighBits, "bf.ashr");
+  llvm::Value *Val = nullptr;
+  if (IsSeparatableBitfield(Info, CGM.getDataLayout(), getContext())) {
+// Ptr is the pointer to the Bitfield Storage. Add Offset to the Ptr
+// if the Offset is not zero.
+if (Info.Offset != 0) {
+  Address BC = Builder.CreateBitCast(Ptr, Int8PtrTy);
+  llvm::Value *GEP = Builder.CreateGEP(
+  BC.getPointer(),
+  llvm::ConstantInt::get(Int32Ty,
+ Info.Offset / getContext().getCharWidth()));
+  Ptr = Address(GEP, Ptr.getAlignment());
+}
+// Adjust the element type of Ptr if it has different size with Info.Size.
+if (Ptr.getElementType()->getScalarSizeInBits() != Info.Size) {
+  llvm::Type *BFType = llvm::Type::getIntNTy(getLLVMContext(), Info.Size);
+  llvm::Type *BFTypePtr =
+  llvm::Type::getIntNPtrTy(getLLVMContext(), Info.Size);
+  unsigned Align = CGM.getDataLayout().getABITypeAlignment(BFType) *
+   getContext().getCharWidth();
+  Ptr = Builder.CreateBitCast(
+  Address(Ptr.getPointer(), getContext().toCharUnitsFromBits(Align)),
+  BFTypePtr);
+}
+Val = Builder.CreateLoad(Ptr, LV.isVolatileQualified(), "bf.load");
   } else {
-if (Info.Offset)
-  Val = Builder.CreateLShr(Val, Info.Offset, "bf.lshr");
-if (static_cast(Info.Offset) + Info.Size < Info.StorageSize)
-  Val = Builder.CreateAnd(Val, llvm::APInt::getLowBitsSet(Info.Stor

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-08-10 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In the last update I limit the bitfield separation to the beginning of a run,
so no bitfield combining will be blocked.

I think I need to explain more about why I changed direction and started to
work on the problem in the frontend. Keeping the information by generating a
widened type and letting an LLVM pass do the narrowing originally looked like a
good solution to me. However, I saw in real cases that the narrowing approach
at a late LLVM stage has more difficulties than I originally thought. Some of
them were solved, but at the cost of code complexity; others are more
difficult.

- Store forwarding issue: to extract legal-integer-width bitfields from the
large integer generated by the frontend, we need to split both the stores and
the loads related to those bitfields. If a store is narrowed and a load is not,
the load may be wider than the store, and the target may have difficulty doing
store forwarding, which hurts performance. Note that we found cases where the
related load and store are in different functions, so when deciding whether to
narrow a store, we may have no idea whether the related loads are narrowed or
not. If we cannot know that all the related loads will be narrowed, it is
better not to narrow the store.

- After instcombine, some bitfield access information is lost. The case we saw
  is:

    unsigned long bf1 : 16;
    unsigned long bf2 : 16;
    unsigned long bf3 : 16;
    unsigned long bf4 : 8;

    bool cond = (bf3 == 0 && bf4 == -1);

  Before instcombine, bf3 and bf4 are extracted from the i64 separately. We can
  tell bf3 is a 16-bit access and bf4 is an 8-bit access from the extracting
  code pattern. After instcombine, bf3 and bf4 are merged into a single 24-bit
  access, and the comparison above is changed to: extract the 24 bits of data
  from the i64 (%bf.load = wide load i64, %extract = and %bf.load, 0xff)
  and compare %extract with 0x. The information that there were two
  legal-integer-width bitfield accesses is lost, and we won't do narrowing for
  the load here.

  Because we cannot split the load here, we trigger the store forwarding issue.
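
  A minimal C++ sketch of that case (hypothetical wrapper struct; the widths
  and the condition are as described above, with -1 written as 0xff so the
  comparison is well defined for an 8-bit field):

    struct S {
      unsigned long bf1 : 16;
      unsigned long bf2 : 16;
      unsigned long bf3 : 16;
      unsigned long bf4 : 8;
    };

    bool check(const S &s) {
      // Written as two separate bitfield reads, but instcombine folds the two
      // extracts into one 24-bit extract-and-compare on the underlying i64
      // load, so the per-field access widths are no longer visible to a late
      // narrowing pass.
      return s.bf3 == 0 && s.bf4 == 0xff;
    }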

That is why I am exploring the bitfield access issue in multiple directions.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-08-10 Thread Wei Mi via Phabricator via cfe-commits
wmi updated this revision to Diff 110641.
wmi added a comment.

Don't separate a bitfield in the middle of a run because that can hinder
combining of bitfield accesses. Only separate a bitfield at the beginning of a
run.


Repository:
  rL LLVM

https://reviews.llvm.org/D36562

Files:
  lib/CodeGen/CGRecordLayoutBuilder.cpp
  test/CodeGen/2009-12-07-BitFieldAlignment.c
  test/CodeGenCXX/2009-12-23-MissingSext.cpp
  test/CodeGenCXX/bitfield.cpp
  test/CodeGenCXX/pod-member-memcpys.cpp
  test/OpenMP/atomic_capture_codegen.cpp
  test/OpenMP/atomic_read_codegen.c
  test/OpenMP/atomic_update_codegen.cpp
  test/OpenMP/atomic_write_codegen.c

Index: test/OpenMP/atomic_write_codegen.c
===
--- test/OpenMP/atomic_write_codegen.c
+++ test/OpenMP/atomic_write_codegen.c
@@ -28,12 +28,14 @@
 int4 int4x;
 
 struct BitFields {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } bfx;
 
 struct BitFields_packed {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } __attribute__ ((__packed__)) bfx_packed;
 
@@ -58,13 +60,15 @@
 } __attribute__ ((__packed__)) bfx3_packed;
 
 struct BitFields4 {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } bfx4;
 
 struct BitFields4_packed {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } __attribute__ ((__packed__)) bfx4_packed;
Index: test/OpenMP/atomic_update_codegen.cpp
===
--- test/OpenMP/atomic_update_codegen.cpp
+++ test/OpenMP/atomic_update_codegen.cpp
@@ -27,12 +27,14 @@
 int4 int4x;
 
 struct BitFields {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } bfx;
 
 struct BitFields_packed {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } __attribute__ ((__packed__)) bfx_packed;
 
@@ -57,13 +59,15 @@
 } __attribute__ ((__packed__)) bfx3_packed;
 
 struct BitFields4 {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } bfx4;
 
 struct BitFields4_packed {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } __attribute__ ((__packed__)) bfx4_packed;
Index: test/OpenMP/atomic_read_codegen.c
===
--- test/OpenMP/atomic_read_codegen.c
+++ test/OpenMP/atomic_read_codegen.c
@@ -28,12 +28,14 @@
 int4 int4x;
 
 struct BitFields {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } bfx;
 
 struct BitFields_packed {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } __attribute__ ((__packed__)) bfx_packed;
 
@@ -58,13 +60,15 @@
 } __attribute__ ((__packed__)) bfx3_packed;
 
 struct BitFields4 {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } bfx4;
 
 struct BitFields4_packed {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } __attribute__ ((__packed__)) bfx4_packed;
Index: test/OpenMP/atomic_capture_codegen.cpp
===
--- test/OpenMP/atomic_capture_codegen.cpp
+++ test/OpenMP/atomic_capture_codegen.cpp
@@ -27,12 +27,14 @@
 int4 int4x;
 
 struct BitFields {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } bfx;
 
 struct BitFields_packed {
-  int : 32;
+  int : 2;
+  int : 30;
   int a : 31;
 } __attribute__ ((__packed__)) bfx_packed;
 
@@ -57,13 +59,15 @@
 } __attribute__ ((__packed__)) bfx3_packed;
 
 struct BitFields4 {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } bfx4;
 
 struct BitFields4_packed {
-  short : 16;
+  short : 6;
+  short : 10;
   int a: 1;
   long b : 7;
 } __attribute__ ((__packed__)) bfx4_packed;
Index: test/CodeGenCXX/pod-member-memcpys.cpp
===
--- test/CodeGenCXX/pod-member-memcpys.cpp
+++ test/CodeGenCXX/pod-member-memcpys.cpp
@@ -69,7 +69,7 @@
 
 struct BitfieldMember3 {
   virtual void f();
-  int   : 8;
+  int z : 8;
   int x : 1;
   int y;
 };
Index: test/CodeGenCXX/bitfield.cpp
===
--- test/CodeGenCXX/bitfield.cpp
+++ test/CodeGenCXX/bitfield.cpp
@@ -478,3 +478,42 @@
 s->b = x;
   }
 }
+
+namespace N8 {
+  // If a bitfield is at the beginning of a group of bitfields, has the width of legal integer type
+  // and it is natually aligned, the bitfield will be accessed like a separate memory location.
+  struct S {
+unsigned b1 : 16;
+unsigned b2 : 6;
+unsigned b3 : 2;
+unsigned b4 : 3;
+  };
+  unsigned read00(S* s) {
+// CHECK-X86-64-LABEL: define i32 @_ZN2N86read00EPNS_1SE
+// CHECK-X86-64:   %[[cast:.*]] = bitcast %{{.*}} to i16*
+// CHECK-X86-64:   %[[val:.*]]  = load i16, i16* %[[cast]]
+// CHECK-X86-64:   %[[ext:.*]]  = zext i16 %[[val]] to i32
+// CHECK-X86-64:  ret i32 %[[ext]]
+// CHECK-PPC64-LABEL: define zeroext i32 @_ZN2N86read00EPNS_1SE
+// CHECK-PPC64:   %[[val:.*]]   = load i32, i32* {{.*}}
+// CHECK-P

[PATCH] D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type

2017-08-09 Thread Wei Mi via Phabricator via cfe-commits
wmi added a comment.

In https://reviews.llvm.org/D36562#837594, @chandlerc wrote:

> This has been discussed before and I still pretty strongly disagree with it.
>
> This cripples the ability of TSan to find race conditions between accesses to 
> consecutive bitfields -- and these bugs have actually come up.


I guess you mean that accessing different bitfields in a consecutive run
simultaneously can cause a data race, because the order of bitfields in a
consecutive run is implementation defined. With the change, TSan may miss
reporting that, but such a data race can still be exposed by a different
compiler.

This can be solved by detecting TSan mode in the code: if TSan is enabled, we
can stop splitting the bitfields.
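
A minimal sketch of the scenario (hypothetical layout; assumes an x86-64 data
layout where both 8-bit fields qualify for separate accesses):

  #include <thread>

  struct Flags {
    unsigned a : 8;
    unsigned b : 8;
    unsigned c : 16;
  };
  Flags f;

  int main() {
    // a and b belong to the same sequence of adjacent bitfields, i.e. the
    // same memory location in C++ terms, so this is a data race. Without
    // splitting, both stores are read-modify-writes of the same 32-bit unit
    // and TSan reports the race; with splitting, each store becomes its own
    // one-byte store, the shadow accesses no longer overlap, and TSan stays
    // silent.
    std::thread t1([] { f.a = 1; });
    std::thread t2([] { f.b = 2; });
    t1.join();
    t2.join();
  }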

> We also have had cases in the past where LLVM missed significant bitfield 
> combines and optimizations due to loading them as separate integers. Those 
> would become problems again, and I think they would be even harder to solve 
> than narrowing the access is going to be because we will have strictly less 
> information to work with.

How about only separating legal-integer-width bitfields at the beginning of a
consecutive run? Then it won't hinder bitfield combines. This way, it can still
help in some cases, including the important case we saw.
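
A sketch of what that restriction would mean (hypothetical layout):

  struct S {
    unsigned b1 : 16; // starts the run, legal width, naturally aligned:
                      // may be accessed as a plain i16
    unsigned b2 : 6;
    unsigned b3 : 2;
    unsigned b4 : 8;  // legal width and aligned, but mid-run: keeps using
                      // the wide load/store, so combining is not hindered
  };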

> Ultimately, while I understand the appeal of this approach, I don't think it 
> is correct and I think we should instead figure out how to optimize these 
> memory accesses well in LLVM. That approach will have the added benefit of 
> optimizing cases where the user has manually used a large integer to simulate 
> bitfields, and making combining and canonicalization substantially easier.




Repository:
  rL LLVM

https://reviews.llvm.org/D36562



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits