[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-06 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy abandoned this revision.
vedgy added a comment.

D143418  implements my latest idea described 
in the previous comment. Let us continue the discussion there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-02 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4098695 , @vedgy wrote:

> Another disadvantage of using `CINDEX_VERSION_MINOR` instead of the `sizeof` 
> is that an older libclang version won't know about compatibility of newer 
> versions with itself. But this reasoning brought me to a different thought: 
> when a program is compiled against a libclang version X, it should not be 
> expected to be runtime-compatible with a libclang version older than X. For 
> example, KDevelop uses `CINDEX_VERSION_MINOR` to decide whether or not 
> certain libclang API is available.
>
> I suspect modifying the exposed struct's size will violate ODR in C++ 
> programs that use libclang. Quote from the cppreference ODR page 
> :
>
>> Note: in C, there is no program-wide ODR for types, and even extern 
>> declarations of the same variable in different translation units may have 
>> different types as long as they are compatible. In C++, the source-code 
>> tokens used in declarations of the same type must be the same as described 
>> above: if one .cpp file defines `struct S { int x; };` and the other .cpp 
>> file defines `struct S { int y; };`, the behavior of the program that links 
>> them together is undefined. This is usually resolved with unnamed namespaces.
>
> I think the answer to this StackOverflow question 
> 
>  confirms my suspicion.

Oh wow, I know the person who asked that question 12 years ago, that's always a 
fun thing to run into randomly. :-D

Given that this idiom (using size of a structure to version it) is used in 
practice all over the place, I'm not really worried about violating the ODR. 
Because the library is using the size information provided to determine what's 
safe to access, the only actionable UB comes from setting the size field 
incorrectly because then the library may access out of bounds memory. But an 
optimizer has no real opportunity for optimization here because we're talking 
about a structure being passed across a DSO boundary (so not even the linker 
sees both the library and the application in this scenario, so LTO also doesn't 
come into play).

(If this idiom wasn't so entrenched in industry, we could use a tagged union 
instead, but conceptually that's an identical solution to this one because the 
`Size` field is acting as the tag to determine which union members are 
accessible. I don't think we need to go to that much trouble though.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-02 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

Another disadvantage of using `CINDEX_VERSION_MINOR` instead of the `sizeof` is 
that an older libclang version won't know about compatibility of newer versions 
with itself. But this reasoning brought me to a different thought: when a 
program is compiled against a libclang version X, it should not be expected to 
be runtime-compatible with a libclang version older than X. For example, 
KDevelop uses `CINDEX_VERSION_MINOR` to decide whether or not certain libclang 
API is available.

I suspect modifying the exposed struct's size will violate ODR in C++ programs 
that use libclang. Quote from the cppreference ODR page 
:

> Note: in C, there is no program-wide ODR for types, and even extern 
> declarations of the same variable in different translation units may have 
> different types as long as they are compatible. In C++, the source-code 
> tokens used in declarations of the same type must be the same as described 
> above: if one .cpp file defines `struct S { int x; };` and the other .cpp 
> file defines `struct S { int y; };`, the behavior of the program that links 
> them together is undefined. This is usually resolved with unnamed namespaces.

I think the answer to this StackOverflow question 

 confirms my suspicion.

In order to avoid the undefined behavior, libclang can use the same approach as 
for other structs/classes: make the struct opaque and provide functions 
`clang_createIndexOptions()`,  `clang_disposeIndexOptions()`, 
`clang_setTempDirPath(CXIndexOptions options, const char *path)`.

`int excludeDeclarationsFromPCH` and `int displayDiagnostics` currently passed 
to `clang_createIndex()` can also be included in the struct and acquire their 
own setter functions. Then only a single argument will be passed to 
`clang_createIndexWithOptions()`: `CXIndexOptions`.

--

In D139774#4094619 , @aaron.ballman 
wrote:

> In D139774#4092553 , @vedgy wrote:
>
>> In D139774#4091631 , 
>> @aaron.ballman wrote:
>>
>>> My preference is still for specific API names. If we want to do something 
>>> general to all temp files, I think `FileSystemOptions` is the way to go.
>>>
>>> My concern with not using a constructor is that, because this is the C API, 
>>> we will be supporting the functionality for a long time even if we do 
>>> switch to using a global override in `FileSystemOptions` where you would 
>>> need to set the information up before executing the compiler. To me, these 
>>> paths are something that should not be possible to change once the compiler 
>>> has started executing. (That also solves the multithreading problem where 
>>> multiple threads are fighting over what the temp directory should be and 
>>> stepping on each other's toes.)
>>
>> As I understand, this leaves two choices:
>>
>> 1. Specific preamble API names, two separate non-constructor setters; the 
>> option values are stored in a separate struct (or even as two separate data 
>> members), not inside `FileSystemOptions`.
>> 2. General temporary-storage arguments (possibly combined in a struct) to a 
>> new overloaded constructor function; the option values are stored inside the 
>> `FileSystemOptions` struct.
>>
>> The second alternative is likely more difficult to implement, more risky and 
>> less convenient to use (the store-in-memory bool option cannot be modified 
>> at any time). Perhaps it should be delayed until (and if) we learn of other 
>> temporary files libclang creates? A downside of implementing the first 
>> option now is that the specific API would have to be supported for a long 
>> time, even after the general temporary-file API is implemented.
>
> I still think the general solution is where we ultimately want to get to and 
> that makes me hesitant to go with specific preamble API names despite that 
> being the direction you prefer. If this wasn't the C APIs, I'd be less 
> concerned because we make no stability guarantees about our C++ interfaces. 
> But we try really hard not to break the C APIs, so adding the specific names 
> is basically committing to not only the APIs but their semantics. I think 
> that makes implementing the general solution slightly more awkward because 
> these are weird special cases that barely get tested in-tree, so they'd be 
> very easy to overlook and accidentally break.
>
> Is there a middle ground where, instead of #2 for general temporary storage, 
> we went with #2 but with compiler-specific directories instead of system 
> directories. e.g., don't let the caller set the temp directory, but do let 
> the caller set the preamble directory (which defaults to the temp directory) 
> so long as it's set before invoking the 

[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-01 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4096733 , @vedgy wrote:

> In D139774#4096695 , @aaron.ballman 
> wrote:
>
>> There's three scenarios when a field is added to the structure: 1) library 
>> and app are matching versions, 2) library is newer than app, 3) app is newer 
>> than library. #1 is the happy path most folks will hopefully be on. For #2 
>> and #3, the app will either be sending more or less data than the library 
>> expects, but the library will know how much of the structure is valid based 
>> on the size field. If the size is too small and the library can't get data 
>> it needs, it can catch that and report an error instead of reading past a 
>> buffer. And if the size is too large, the library can ignore anything it 
>> doesn't know about or it can give an error due to not knowing how to support 
>> the semantics of the newer interface.
>
> In the second case, the library ideally should assume that the missing struct 
> members are not specified and behave as the corresponding older version.

I think agree.

> In the third case, the app can support older libclang versions by passing 
> successively smaller structs until libclang returns a valid `CIndex`.

That's how it tends to work when I've used APIs like this before (this is a 
pretty common pattern in the Win32 C APIs).

In D139774#4096776 , @vedgy wrote:

> Perhaps the struct should contain the value of `CINDEX_VERSION_MINOR` instead 
> of the `sizeof`? This way, libclang can change the meaning of a struct member 
> without changing the size of the struct. For example, replace 
> `PreambleStoragePath` with `TemporaryDirectoryPath`. Such a change could even 
> be quiet, because setting the more general temporary directory path would 
> likely be welcome or at least acceptable to the API users.

We try to keep libclang APIs source and ABI stable; changing the name or 
meaning of the structure member would defeat source stability. We could still 
use the minor version regardless, but then the minor version must continue to 
be monotonic even if we bump the major version (which we said we expect not to 
do, but "expect not to" != "will never").


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-01 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

Perhaps the struct should contain the value of `CINDEX_VERSION_MINOR` instead 
of the `sizeof`? This way, libclang can change the meaning of a struct member 
without changing the size of the struct. For example, replace 
`PreambleStoragePath` with `TemporaryDirectoryPath`. Such a change could even 
be quiet, because setting the more general temporary directory path would 
likely be welcome or at least acceptable to the API users.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-01 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4096695 , @aaron.ballman 
wrote:

> There's three scenarios when a field is added to the structure: 1) library 
> and app are matching versions, 2) library is newer than app, 3) app is newer 
> than library. #1 is the happy path most folks will hopefully be on. For #2 
> and #3, the app will either be sending more or less data than the library 
> expects, but the library will know how much of the structure is valid based 
> on the size field. If the size is too small and the library can't get data it 
> needs, it can catch that and report an error instead of reading past a 
> buffer. And if the size is too large, the library can ignore anything it 
> doesn't know about or it can give an error due to not knowing how to support 
> the semantics of the newer interface.

In the second case, the library ideally should assume that the missing struct 
members are not specified and behave as the corresponding older version. In the 
third case, the app can support older libclang versions by passing successively 
smaller structs until libclang returns a valid `CIndex`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-01 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4096638 , @vedgy wrote:

> In D139774#4096527 , @aaron.ballman 
> wrote:
>
>> That sounds like a good plan to me. I wonder if we want to name it something 
>> like `clang_createIndexWithOptions` (or something generic like that), and 
>> give it a versioned structure of options along the lines of:
>>
>>   struct CIndexOptions {
>> uint32_t Size; // sizeof(struct CIndexOptions), used for option 
>> versioning
>> const char *PreambleStoragePath; // This pointer can be freed after 
>> creating the index
>>   };
>>
>> and define the function to return an error if `Size < sizeof(struct 
>> CIndexOptions)`. This should allow us to add additional options to the 
>> structure without having to introduce a new constructor API each time.
>
> Would this be the recommended usage of such an API?
>
> 1. Call `clang_createIndexWithOptions`.
> 2. If it returns `nullptr` index, report an error in the application (e.g. 
> print a warning or show an error in the UI) and fall back to code paths that 
> support older Clang versions, beginning with calling the older constructor 
> `clang_createIndex`.

Yeah, I would imagine robust code to look like:

  CIndexOptions Opts;
  Opts.Size = sizeof(Opts);
  Opts.PreambleStoragePath = somePathPtr;
  CIndex Idx = clang_createIndexWithOptions(/*excludeDeclsFromPCH=*/1, 
/*displayDiagnostics=*/1, );
  if (!Idx) {
Idx = clang_createIndex(/*excludeDeclsFromPCH=*/1, 
/*displayDiagnostics=*/1);
if (!Idx)
  handleError();
  }

> Is assigning `sizeof(CIndexOptions)` to `Size` the API user's responsibility 
> or should libclang define an inline function `clang_initializeCIndexOptions`?

The user's responsibility. There's three scenarios when a field is added to the 
structure: 1) library and app are matching versions, 2) library is newer than 
app, 3) app is newer than library. #1 is the happy path most folks will 
hopefully be on. For #2 and #3, the app will either be sending more or less 
data than the library expects, but the library will know how much of the 
structure is valid based on the size field. If the size is too small and the 
library can't get data it needs, it can catch that and report an error instead 
of reading past a buffer. And if the size is too large, the library can ignore 
anything it doesn't know about or it can give an error due to not knowing how 
to support the semantics of the newer interface.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-01 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4096527 , @aaron.ballman 
wrote:

> That sounds like a good plan to me. I wonder if we want to name it something 
> like `clang_createIndexWithOptions` (or something generic like that), and 
> give it a versioned structure of options along the lines of:
>
>   struct CIndexOptions {
> uint32_t Size; // sizeof(struct CIndexOptions), used for option versioning
> const char *PreambleStoragePath; // This pointer can be freed after 
> creating the index
>   };
>
> and define the function to return an error if `Size < sizeof(struct 
> CIndexOptions)`. This should allow us to add additional options to the 
> structure without having to introduce a new constructor API each time.

Would this be the recommended usage of such an API?

1. Call `clang_createIndexWithOptions`.
2. If it returns `nullptr` index, report an error in the application (e.g. 
print a warning or show an error in the UI) and fall back to code paths that 
support older Clang versions, beginning with calling the older constructor 
`clang_createIndex`.

Is assigning `sizeof(CIndexOptions)` to `Size` the API user's responsibility or 
should libclang define an inline function `clang_initializeCIndexOptions`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-02-01 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4095522 , @vedgy wrote:

> In D139774#4094619 , @aaron.ballman 
> wrote:
>
>> Is there a middle ground where, instead of #2 for general temporary storage, 
>> we went with #2 but with compiler-specific directories instead of system 
>> directories. e.g., don't let the caller set the temp directory, but do let 
>> the caller set the preamble directory (which defaults to the temp directory) 
>> so long as it's set before invoking the compiler? This still won't let you 
>> change options mid-run, but it also seems like it should have less risk of 
>> affecting other components while still solving the thread safety issues. I'm 
>> not certain if it's any easier to implement, but I think it does save you 
>> from modifying `FileSystemOptions`. As a separate item, we could then 
>> consider adding a new C API to let you toggle the in-memory vs on-disk 
>> functionality after exploring that it won't cause other problems because 
>> nobody considered the possibility that it's not a stable value for the 
>> compiler invocation.
>
> OK, so I'm going to implement overriding the preamble directory in 
> `clang_createIndexWithPreambleStoragePath` (or `clang_createIndex2` or 
> `clang_createIndexExt` or?); try to keep it simple and not modify 
> `FileSystemOptions`; deal with the in-memory option separately later. Abandon 
> this revision now and create another one once ready?

That sounds like a good plan to me. I wonder if we want to name it something 
like `clang_createIndexWithOptions` (or something generic like that), and give 
it a versioned structure of options along the lines of:

  struct CIndexOptions {
uint32_t Size; // sizeof(struct CIndexOptions), used for option versioning
const char *PreambleStoragePath; // This pointer can be freed after 
creating the index
  };

and define the function to return an error if `Size < sizeof(struct 
CIndexOptions)`. This should allow us to add additional options to the 
structure without having to introduce a new constructor API each time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-31 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4094619 , @aaron.ballman 
wrote:

> Is there a middle ground where, instead of #2 for general temporary storage, 
> we went with #2 but with compiler-specific directories instead of system 
> directories. e.g., don't let the caller set the temp directory, but do let 
> the caller set the preamble directory (which defaults to the temp directory) 
> so long as it's set before invoking the compiler? This still won't let you 
> change options mid-run, but it also seems like it should have less risk of 
> affecting other components while still solving the thread safety issues. I'm 
> not certain if it's any easier to implement, but I think it does save you 
> from modifying `FileSystemOptions`. As a separate item, we could then 
> consider adding a new C API to let you toggle the in-memory vs on-disk 
> functionality after exploring that it won't cause other problems because 
> nobody considered the possibility that it's not a stable value for the 
> compiler invocation.

OK, so I'm going to implement overriding the preamble directory in 
`clang_createIndexWithPreambleStoragePath` (or `clang_createIndex2` or 
`clang_createIndexExt` or?); try to keep it simple and not modify 
`FileSystemOptions`; deal with the in-memory option separately later. Abandon 
this revision now and create another one once ready?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-31 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4092553 , @vedgy wrote:

> In D139774#4091631 , @aaron.ballman 
> wrote:
>
>> My preference is still for specific API names. If we want to do something 
>> general to all temp files, I think `FileSystemOptions` is the way to go.
>>
>> My concern with not using a constructor is that, because this is the C API, 
>> we will be supporting the functionality for a long time even if we do switch 
>> to using a global override in `FileSystemOptions` where you would need to 
>> set the information up before executing the compiler. To me, these paths are 
>> something that should not be possible to change once the compiler has 
>> started executing. (That also solves the multithreading problem where 
>> multiple threads are fighting over what the temp directory should be and 
>> stepping on each other's toes.)
>
> As I understand, this leaves two choices:
>
> 1. Specific preamble API names, two separate non-constructor setters; the 
> option values are stored in a separate struct (or even as two separate data 
> members), not inside `FileSystemOptions`.
> 2. General temporary-storage arguments (possibly combined in a struct) to a 
> new overloaded constructor function; the option values are stored inside the 
> `FileSystemOptions` struct.
>
> The second alternative is likely more difficult to implement, more risky and 
> less convenient to use (the store-in-memory bool option cannot be modified at 
> any time). Perhaps it should be delayed until (and if) we learn of other 
> temporary files libclang creates? A downside of implementing the first option 
> now is that the specific API would have to be supported for a long time, even 
> after the general temporary-file API is implemented.

I still think the general solution is where we ultimately want to get to and 
that makes me hesitant to go with specific preamble API names despite that 
being the direction you prefer. If this wasn't the C APIs, I'd be less 
concerned because we make no stability guarantees about our C++ interfaces. But 
we try really hard not to break the C APIs, so adding the specific names is 
basically committing to not only the APIs but their semantics. I think that 
makes implementing the general solution slightly more awkward because these are 
weird special cases that barely get tested in-tree, so they'd be very easy to 
overlook and accidentally break.

Is there a middle ground where, instead of #2 for general temporary storage, we 
went with #2 but with compiler-specific directories instead of system 
directories. e.g., don't let the caller set the temp directory, but do let the 
caller set the preamble directory (which defaults to the temp directory) so 
long as it's set before invoking the compiler? This still won't let you change 
options mid-run, but it also seems like it should have less risk of affecting 
other components while still solving the thread safety issues. I'm not certain 
if it's any easier to implement, but I think it does save you from modifying 
`FileSystemOptions`. As a separate item, we could then consider adding a new C 
API to let you toggle the in-memory vs on-disk functionality after exploring 
that it won't cause other problems because nobody considered the possibility 
that it's not a stable value for the compiler invocation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-30 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4091631 , @aaron.ballman 
wrote:

> My preference is still for specific API names. If we want to do something 
> general to all temp files, I think `FileSystemOptions` is the way to go.
>
> My concern with not using a constructor is that, because this is the C API, 
> we will be supporting the functionality for a long time even if we do switch 
> to using a global override in `FileSystemOptions` where you would need to set 
> the information up before executing the compiler. To me, these paths are 
> something that should not be possible to change once the compiler has started 
> executing. (That also solves the multithreading problem where multiple 
> threads are fighting over what the temp directory should be and stepping on 
> each other's toes.)

As I understand, this leaves two choices:

1. Specific preamble API names, two separate non-constructor setters; the 
option values are stored in a separate struct (or even as two separate data 
members), not inside `FileSystemOptions`.
2. General temporary-storage arguments (possibly combined in a struct) to a new 
overloaded constructor function; the option values are stored inside the 
`FileSystemOptions` struct.

The second alternative is likely more difficult to implement, more risky and 
less convenient to use (the store-in-memory bool option cannot be modified at 
any time). Perhaps it should be delayed until (and if) we learn of other 
temporary files libclang creates? A downside of implementing the first option 
now is that the specific API would have to be supported for a long time, even 
after the general temporary-file API is implemented.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-30 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4081888 , @vedgy wrote:

> 3 out of 4 alternatives remain:
>
>> 1. Add an option to store the preamble-*.pch files in RAM instead of /tmp 
>> and add a corresponding option in KDevelop configuration UI. This would work 
>> perfectly for me, provided I don't change my mind and decide to turn this 
>> option off, in which case I'll be back to square one.
>> 2. Add an option to store the preamble-*.pch files in a custom directory 
>> instead of a general temporary directory. The option could be named 
>> generally (2a: overrideTemporaryDirectory) or specially (2b: 
>> setPreambleStoragePath). If the option is named generally, other temporary 
>> files created by libclang could be identified in the future and placed in 
>> the same directory without changing the API.
>> 3. 1 and 2 - the options can be propagated between identical end points 
>> together, so this combination is natural.
>
> I think **#3** is the way to go. @aaron.ballman has agreed to this.

I can live with it.

> **#3a** vs **#3b** hasn't been decided upon yet:
>
>> The bool (1) and the path (2) options can be passed through API layers 
>> together in a struct. They can both be named generally 
>> (preferStoringTempFilesInMemory, setTemporaryDirectory) or specifically 
>> (storePreamblesInMemory, setPreambleStoragePath).
>
> @aaron.ballman has put forth arguments in favor of specific names and 
> separate API for different temporary directory uses. I have replied with 
> counterarguments in favor of general temporary storage API.

My preference is still for specific API names. If we want to do something 
general to all temp files, I think `FileSystemOptions` is the way to go.

> In D139774#4081055 , @aaron.ballman 
> wrote:
>
>> modify `FileSystemOptions` to store an override for the temp directory
>
> I have mentioned that `FileSystemOptions` is widely used and only class 
> `ASTUnit` would need the temp directory (and possibly a flag whether to 
> prefer RAM storage). The wide usage in itself is not considered a reason not 
> to add data members.
>
> `ASTUnit::FileSystemOpts` is used in several places in //ASTUnit.cpp//:
>
>   FileSystemOpts = Clang->getFileSystemOpts();
>   AST->FileSystemOpts = CI->getFileSystemOpts();
>   AST->FileMgr = new FileManager(AST->FileSystemOpts, VFS);
>   AST->FileSystemOpts = FileMgr->getFileSystemOpts();
>
> I don't know whether propagating the new options to and from 
> `CompilerInstance`, `CompilerInvocation` and `FileManager` is a good idea. As 
> of now, only `ASTUnit::getMainBufferWithPrecompiledPreamble()` is going to 
> use the new options.
>
>> and updating APIs so that the temp directory can be specified by a new 
>> CIndex constructor function.
>
> Now that the temporary directory options are going to be propagated 
> explicitly, I am no longer sure another constructor function is preferable to 
> separate option setter(s). I don't see any use for temporary directory 
> location in the definition of `clang_createIndex()`. And modifying the 
> temporary directory location multiple times during program execution is no 
> longer a problem: an updated location will be used for new preambles. The 
> store-in-memory bool option in particular should be possible to modify at any 
> time, because it can be an option in an IDE's UI.

My concern with not using a constructor is that, because this is the C API, we 
will be supporting the functionality for a long time even if we do switch to 
using a global override in `FileSystemOptions` where you would need to set the 
information up before executing the compiler. To me, these paths are something 
that should not be possible to change once the compiler has started executing. 
(That also solves the multithreading problem where multiple threads are 
fighting over what the temp directory should be and stepping on each other's 
toes.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-25 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

3 out of 4 alternatives remain:

> 1. Add an option to store the preamble-*.pch files in RAM instead of /tmp and 
> add a corresponding option in KDevelop configuration UI. This would work 
> perfectly for me, provided I don't change my mind and decide to turn this 
> option off, in which case I'll be back to square one.
> 2. Add an option to store the preamble-*.pch files in a custom directory 
> instead of a general temporary directory. The option could be named generally 
> (2a: overrideTemporaryDirectory) or specially (2b: setPreambleStoragePath). 
> If the option is named generally, other temporary files created by libclang 
> could be identified in the future and placed in the same directory without 
> changing the API.
> 3. 1 and 2 - the options can be propagated between identical end points 
> together, so this combination is natural.

I think **#3** is the way to go. @aaron.ballman has agreed to this.

**#3a** vs **#3b** hasn't been decided upon yet:

> The bool (1) and the path (2) options can be passed through API layers 
> together in a struct. They can both be named generally 
> (preferStoringTempFilesInMemory, setTemporaryDirectory) or specifically 
> (storePreamblesInMemory, setPreambleStoragePath).

@aaron.ballman has put forth arguments in favor of specific names and separate 
API for different temporary directory uses. I have replied with 
counterarguments in favor of general temporary storage API.

In D139774#4081055 , @aaron.ballman 
wrote:

> modify `FileSystemOptions` to store an override for the temp directory

I have mentioned that `FileSystemOptions` is widely used and only class 
`ASTUnit` would need the temp directory (and possibly a flag whether to prefer 
RAM storage). The wide usage in itself is not considered a reason not to add 
data members.

`ASTUnit::FileSystemOpts` is used in several places in //ASTUnit.cpp//:

  FileSystemOpts = Clang->getFileSystemOpts();
  AST->FileSystemOpts = CI->getFileSystemOpts();
  AST->FileMgr = new FileManager(AST->FileSystemOpts, VFS);
  AST->FileSystemOpts = FileMgr->getFileSystemOpts();

I don't know whether propagating the new options to and from 
`CompilerInstance`, `CompilerInvocation` and `FileManager` is a good idea. As 
of now, only `ASTUnit::getMainBufferWithPrecompiledPreamble()` is going to use 
the new options.

> and updating APIs so that the temp directory can be specified by a new CIndex 
> constructor function.

Now that the temporary directory options are going to be propagated explicitly, 
I am no longer sure another constructor function is preferable to separate 
option setter(s). I don't see any use for temporary directory location in the 
definition of `clang_createIndex()`. And modifying the temporary directory 
location multiple times during program execution is no longer a problem: an 
updated location will be used for new preambles.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-25 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4079249 , @vedgy wrote:

> Now that a consensus has been reached that this revision is to be discarded, 
> can we please resume the discussion of which alternative should be 
> implemented in the replacement revision?

Based on the discussion here, I think the preference is to modify 
`FileSystemOptions` to store an override for the temp directory, and updating 
APIs so that the temp directory can be specified by a new CIndex constructor 
function. (If anyone disagrees with that as the approach we'd like to take, 
please speak up!)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-25 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

Now that a consensus has been reached that this revision is to be discarded, 
can we please resume the discussion of which alternative should be implemented 
in the replacement revision?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-20 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D139774#4069014 , @aaron.ballman 
wrote:

> In D139774#4066920 , @dblaikie 
> wrote:
>
>> Don't let me hold this up - I think it all feels a bit too ad-hoc for my own 
>> preferences (feels like there should be fairly general solutions to this - 
>> rather than playing whack-a-mole on only the biggest temporary files - both 
>> in terms of the options KDevelop developers have 
>> considered/accepted/declined, and in terms of what's being proposed to be 
>> added to Clang), but I don't feel strongly enough about any of that to 
>> veto/push very hard here.
>
> I think the general solution would be what's proposed here -- allow 
> overriding the system temp directory at the LLVM filesystem API level. But 
> that continues to feel wrong to me -- asking for the system temp directory 
> should get you the system temp directory. For example, nothing says that 
> libclang's need for the temp directory is the same as someone else using the 
> LLVM filesystem APIs within the same library but for totally unrelated 
> purposes. It seems like the wrong layer at which to add this logic -- the 
> base API should get you the system directory, and the caller should be 
> responsible for deciding if they wanted something different than that or not. 
> I realize that turns into a bit of a game of whack-a-mole to find all of the 
> places where we make such decisions, but that seems like the more correct 
> approach to me. If we want this to be controlled by the programmer (and it 
> sounds like KDevelop has very good reasons to want this to be controlled), I 
> think it should become a part of the API interface rather than a hidden 
> one-time setter that impacts the entire process.

I think the only problem I have with changing the lower layer is the race 
condition/making decisions for totally unrelated clients. If there was some 
"LLVMSystemContext" that was passed into every LLVM API & one client could 
create/configure that with one temp directory and use it for LLVM, Clang, etc, 
and another client in the same process could use another, that'd be fine. I 
guess that would allow the independent configuration, but also the singular 
configuration when that's what the client desires. But that's impractical/would 
be too much engineering for this issue.

*shrug* Ah well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-20 Thread Erich Keane via Phabricator via cfe-commits
erichkeane added inline comments.



Comment at: llvm/lib/Support/Unix/Path.inc:1450
 
 void system_temp_directory(bool ErasedOnReboot, SmallVectorImpl ) 
{
   Result.clear();

So I was asked to take a look at this, and I believe that changing this 
function is absolutely the wrong approach.  The purpose of this API is to be 
used to get what the system/terminal owner has set as the temporary directory.  
Allowing a user of the API to change the meaning of this is improper.

If we want to change where certain files are stored, the logic for that needs 
to happen at a higher level than this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-20 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4066920 , @dblaikie wrote:

> Don't let me hold this up - I think it all feels a bit too ad-hoc for my own 
> preferences (feels like there should be fairly general solutions to this - 
> rather than playing whack-a-mole on only the biggest temporary files - both 
> in terms of the options KDevelop developers have 
> considered/accepted/declined, and in terms of what's being proposed to be 
> added to Clang), but I don't feel strongly enough about any of that to 
> veto/push very hard here.

I think the general solution would be what's proposed here -- allow overriding 
the system temp directory at the LLVM filesystem API level. But that continues 
to feel wrong to me -- asking for the system temp directory should get you the 
system temp directory. For example, nothing says that libclang's need for the 
temp directory is the same as someone else using the LLVM filesystem APIs 
within the same library but for totally unrelated purposes. It seems like the 
wrong layer at which to add this logic -- the base API should get you the 
system directory, and the caller should be responsible for deciding if they 
wanted something different than that or not. I realize that turns into a bit of 
a game of whack-a-mole to find all of the places where we make such decisions, 
but that seems like the more correct approach to me. If we want this to be 
controlled by the programmer (and it sounds like KDevelop has very good reasons 
to want this to be controlled), I think it should become a part of the API 
interface rather than a hidden one-time setter that impacts the entire process.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-20 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

One idea discussed in comments to the KDevelop merge request, which I haven't 
mentioned here is this: remove the preamble files immediately after creating 
and opening them. This is safe on Unix-like OSes, because every file that is 
still open will not actually be deleted but its directory entry cleared. It's 
an old Unix trick to mark (open) files as volatile. You could do this with any 
opened file (that you don't want to be able to close and then reopen) 
immediately after creating it in which case it'll get cleaned up even in case 
of a hard crash. However, my analysis of 
//clang/lib/Frontend/PrecompiledPreamble.cpp// shows that such unlinking isn't 
straightforward for the preamble files, e.g. because of 
`PrecompiledPreamble::getSize()`, which checks the file size at the preamble 
file path.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

Don't let me hold this up - I think it all feels a bit too ad-hoc for my own 
preferences (feels like there should be fairly general solutions to this - 
rather than playing whack-a-mole on only the biggest temporary files - both in 
terms of the options KDevelop developers have considered/accepted/declined, and 
in terms of what's being proposed to be added to Clang), but I don't feel 
strongly enough about any of that to veto/push very hard here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4066593 , @dblaikie wrote:

>>> (1) seems OK-ish, I guess there's some question as to what the tradeoff is 
>>> for that option - does that blow out memory usage of the client/kdevelop? 
>>> But I guess it's probably fine.
>>
>> Do you think we should do one of the options for (2) or do you think (1) 
>> should be sufficient?
>
> If (1) is sufficient for KDevelop's needs and already implemented in some 
> form for clangd, if I understand correctly, that seems the cheapest/least 
> involved?

Not sufficient for all KDevelop users. Namely it doesn't work for low-RAM 
systems where /tmp is on disk to save RAM.

The bool (1) and the path (2) options can be passed through API layers together 
in a struct. They can both be named generally (preferStoringTempFilesInMemory, 
setTemporaryDirectory) or specifically (storePreamblesInMemory, 
setPreambleStoragePath).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D139774#4066574 , @aaron.ballman 
wrote:

> In D139774#4066519 , @dblaikie 
> wrote:
>
>> I've mixed feelings about this, but yeah, I guess the issues with global 
>> state are pretty undesirable. I don't worry too much about (2) - doesn't 
>> feel too problematic to me (for any individual file, presumably the 
>> implementation recorded the name of the file if they were going to access it 
>> again - so still be using the old path if necessary).
>
> FWIW, #2 relates to #3 for me -- it's not that I worry the compiler will 
> guess the wrong path, it's more that I worry users will get confused because 
> some temp files go in one place while other temp files go in another place, 
> and they file issues we track down to timing issues.

I'd be willing to roll the dice on that - my guess is it wouldn't be too bad - 
but just guessing. (as all good/difficult design discussions are - debating 
predictions of the future based on our respective experience)

>> But, yeah, lack of any "system" abstraction that could be passed into all 
>> the various LLVM/MC/AST contexts to swap out the implementation limits the 
>> options a bit to more ad-hoc/case-by-case solutions. I feel like that's not 
>> a great solution to KDevelop's general problem, though - they're dealing 
>> with one particularly big file, but it doesn't feel very principled to me to 
>> fix that one compared to all the other (albeit smaller) temp files & maybe 
>> at some point in the future some bigger temp files are created elsewhere...
>>
>>> So my preference is for components to have an option to pick a different 
>>> location as part of their API layer, rather than at the lower level. Based 
>>> on that, I'm definitely in support of #1, and I'm in support of one of the 
>>> options for #2. I lean towards #2b over #2a due to wanting individual 
>>> overrides rather than a blanket override (e.g., we should be able to put 
>>> preamble files in a different location than we put, say, crash logs).
>>>
>>> @dblaikie does that sound reasonable to you?
>>
>> (1) seems OK-ish, I guess there's some question as to what the tradeoff is 
>> for that option - does that blow out memory usage of the client/kdevelop? 
>> But I guess it's probably fine.
>
> Do you think we should do one of the options for (2) or do you think (1) 
> should be sufficient?

If (1) is sufficient for KDevelop's needs and already implemented in some form 
for clangd, if I understand correctly, that seems the cheapest/least involved?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4066519 , @dblaikie wrote:

> I've mixed feelings about this, but yeah, I guess the issues with global 
> state are pretty undesirable. I don't worry too much about (2) - doesn't feel 
> too problematic to me (for any individual file, presumably the implementation 
> recorded the name of the file if they were going to access it again - so 
> still be using the old path if necessary).

FWIW, #2 relates to #3 for me -- it's not that I worry the compiler will guess 
the wrong path, it's more that I worry users will get confused because some 
temp files go in one place while other temp files go in another place, and they 
file issues we track down to timing issues.

> But, yeah, lack of any "system" abstraction that could be passed into all the 
> various LLVM/MC/AST contexts to swap out the implementation limits the 
> options a bit to more ad-hoc/case-by-case solutions. I feel like that's not a 
> great solution to KDevelop's general problem, though - they're dealing with 
> one particularly big file, but it doesn't feel very principled to me to fix 
> that one compared to all the other (albeit smaller) temp files & maybe at 
> some point in the future some bigger temp files are created elsewhere...
>
>> So my preference is for components to have an option to pick a different 
>> location as part of their API layer, rather than at the lower level. Based 
>> on that, I'm definitely in support of #1, and I'm in support of one of the 
>> options for #2. I lean towards #2b over #2a due to wanting individual 
>> overrides rather than a blanket override (e.g., we should be able to put 
>> preamble files in a different location than we put, say, crash logs).
>>
>> @dblaikie does that sound reasonable to you?
>
> (1) seems OK-ish, I guess there's some question as to what the tradeoff is 
> for that option - does that blow out memory usage of the client/kdevelop? But 
> I guess it's probably fine.

Do you think we should do one of the options for (2) or do you think (1) should 
be sufficient?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4066519 , @dblaikie wrote:

> (1) seems OK-ish, I guess there's some question as to what the tradeoff is 
> for that option - does that blow out memory usage of the client/kdevelop? But 
> I guess it's probably fine.

At least one KDevelop user who has limited RAM and keeps /tmp on disk (not 
tmpfs) protested vehemently against unconditional storing of the temporary 
files in RAM. Several gigabytes of RAM can be valuable on some systems. Storing 
huge files in a temporary directory is definitely more flexible. But to users 
who keep /tmp on tmpfs, storing the preamble files in RAM is preferable, 
because memory is always freed right after a KDevelop crash, not on next start 
of the same KDevelop session (which can occur much later).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D139774#4065753 , @aaron.ballman 
wrote:

> In D139774#4063131 , @vedgy wrote:
>
>> After a discussion under the corresponding KDevelop merge request, I can see 
>> 4-6 alternative ways to address the temporary directory issue:
>>
>> 1. Add an option to store the //preamble-*.pch// files in RAM instead of 
>> /tmp and add a corresponding option in KDevelop configuration UI. This would 
>> work perfectly for me, provided I don't change my mind and decide to turn 
>> this option off, in which case I'll be back to square one.
>> 2. Add an option to store the //preamble-*.pch// files in a custom directory 
>> instead of a general temporary directory. The option could be named 
>> generally (2a: overrideTemporaryDirectory) or specially (2b: 
>> setPreambleStoragePath). If the option is named generally, other temporary 
>> files created by libclang could be identified in the future and placed in 
>> the same directory without changing the API.
>> 3. 1 and 2 - the options can be propagated between identical end points 
>> together, so this combination is natural.
>> 4. The current patch. This is the easiest (already implemented) and most 
>> reliable (the temporary directory is definitely and completely overridden) 
>> approach. But there are thread safety concerns due to the introduction of 
>> global state.
>>
>> If the 4th option is unacceptable, I lean towards the option 3a (1 and 2a), 
>> because the amount of implementation work should not be much greater than 1 
>> alone or 2a alone. If the 4th option is unacceptable, I suppose a separate 
>> review request based on the current LLVM main branch should be created and 
>> this one closed, right?
>
> From a design perspective, my preference is to not add new APIs at the file 
> system support layer in LLVM to override the temporary directory but instead 
> allow individual components to override the decision where to put files. 
> Overriding a system directory at the LLVM support level feels unclean to me 
> because 1) threading concerns (mostly related to lock performance), 2) 
> consistency concerns (put files in temp directory, override temp directory, 
> put additional files in the new directory), 3) principle of least surprise. 
> To me, the LLVM layer is responsible for interfacing with the system and 
> asking for the system temp directory should get you the same answer as 
> querying for that from the OS directly.

I've mixed feelings about this, but yeah, I guess the issues with global state 
are pretty undesirable. I don't worry too much about (2) - doesn't feel too 
problematic to me (for any individual file, presumably the implementation 
recorded the name of the file if they were going to access it again - so still 
be using the old path if necessary).

But, yeah, lack of any "system" abstraction that could be passed into all the 
various LLVM/MC/AST contexts to swap out the implementation limits the options 
a bit to more ad-hoc/case-by-case solutions. I feel like that's not a great 
solution to KDevelop's general problem, though - they're dealing with one 
particularly big file, but it doesn't feel very principled to me to fix that 
one compared to all the other (albeit smaller) temp files & maybe at some point 
in the future some bigger temp files are created elsewhere...

> So my preference is for components to have an option to pick a different 
> location as part of their API layer, rather than at the lower level. Based on 
> that, I'm definitely in support of #1, and I'm in support of one of the 
> options for #2. I lean towards #2b over #2a due to wanting individual 
> overrides rather than a blanket override (e.g., we should be able to put 
> preamble files in a different location than we put, say, crash logs).
>
> @dblaikie does that sound reasonable to you?

(1) seems OK-ish, I guess there's some question as to what the tradeoff is for 
that option - does that blow out memory usage of the client/kdevelop? But I 
guess it's probably fine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4065753 , @aaron.ballman 
wrote:

> I lean towards #2b over #2a due to wanting individual overrides rather than a 
> blanket override (e.g., we should be able to put preamble files in a 
> different location than we put, say, crash logs).

Crash logs don't really belong to a common temporary directory. Are they 
actually placed into a temporary directory erased on reboot? I'd prefer coarser 
categories, like temp dir erased-on-reboot and not-erased-on-reboot, similar to 
`void system_temp_directory(bool erasedOnReboot` in //Path.h//. This definitely 
makes more sense to the use case of removing these temporary files on next 
application start. Why would a libclang-using program care about categories of 
its internally-used temporary files?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-19 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4063131 , @vedgy wrote:

> After a discussion under the corresponding KDevelop merge request, I can see 
> 4-6 alternative ways to address the temporary directory issue:
>
> 1. Add an option to store the //preamble-*.pch// files in RAM instead of /tmp 
> and add a corresponding option in KDevelop configuration UI. This would work 
> perfectly for me, provided I don't change my mind and decide to turn this 
> option off, in which case I'll be back to square one.
> 2. Add an option to store the //preamble-*.pch// files in a custom directory 
> instead of a general temporary directory. The option could be named generally 
> (2a: overrideTemporaryDirectory) or specially (2b: setPreambleStoragePath). 
> If the option is named generally, other temporary files created by libclang 
> could be identified in the future and placed in the same directory without 
> changing the API.
> 3. 1 and 2 - the options can be propagated between identical end points 
> together, so this combination is natural.
> 4. The current patch. This is the easiest (already implemented) and most 
> reliable (the temporary directory is definitely and completely overridden) 
> approach. But there are thread safety concerns due to the introduction of 
> global state.
>
> If the 4th option is unacceptable, I lean towards the option 3a (1 and 2a), 
> because the amount of implementation work should not be much greater than 1 
> alone or 2a alone. If the 4th option is unacceptable, I suppose a separate 
> review request based on the current LLVM main branch should be created and 
> this one closed, right?

From a design perspective, my preference is to not add new APIs at the file 
system support layer in LLVM to override the temporary directory but instead 
allow individual components to override the decision where to put files. 
Overriding a system directory at the LLVM support level feels unclean to me 
because 1) threading concerns (mostly related to lock performance), 2) 
consistency concerns (put files in temp directory, override temp directory, put 
additional files in the new directory), 3) principle of least surprise. To me, 
the LLVM layer is responsible for interfacing with the system and asking for 
the system temp directory should get you the same answer as querying for that 
from the OS directly.

So my preference is for components to have an option to pick a different 
location as part of their API layer, rather than at the lower level. Based on 
that, I'm definitely in support of #1, and I'm in support of one of the options 
for #2. I lean towards #2b over #2a due to wanting individual overrides rather 
than a blanket override (e.g., we should be able to put preamble files in a 
different location than we put, say, crash logs).

@dblaikie does that sound reasonable to you?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-18 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

After a discussion under the corresponding KDevelop merge request, I can see 
4-6 alternative ways to address the temporary directory issue:

1. Add an option to store the //preamble-*.pch// files in RAM instead of /tmp 
and add a corresponding option in KDevelop configuration UI. This would work 
perfectly for me, provided I don't change my mind and decide to turn this 
option off, in which case I'll be back to square one.
2. Add an option to store the //preamble-*.pch// files in a custom directory 
instead of a general temporary directory. The option could be named generally 
(2a: overrideTemporaryDirectory) or specially (2b: setPreambleStoragePath). If 
the option is named generally, other temporary files created by libclang could 
be identified in the future and placed in the same directory without changing 
the API.
3. 1 and 2 - the options can be propagated between identical end points 
together, so this combination is natural.
4. The current patch. This is the easiest (already implemented) and most 
reliable (the temporary directory is definitely and completely overridden) 
approach. But there are thread safety concerns due to the introduction of 
global state.

If the 4th option is unacceptable, I lean towards the option 3a (1 and 2a), 
because the amount of implementation work should not be much greater than 1 
alone or 2a alone. If the 4th option is unacceptable, I suppose a separate 
review request based on the current LLVM main branch should be created and this 
one closed, right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-12 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4047976 , @dblaikie wrote:

> It seemed like the places where kdevelop would want to suppress the temp dir 
> cleanup would be enumerable/more within kdevelop's control than instances of 
> libraries kdevelop is using wanting to create their own unexposed temp files. 
> But, yeah, would be a bunch of work to go and touch all those places. Ah well 
> *shrug*

That may be true. But a few small-size generated files slipping into the system 
temporary directory is not much of a problem. /tmp is cleared on shutdown after 
all. Causing bugs in a user executable run from KDevelop is much worse.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-12 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D139774#4046308 , @vedgy wrote:

> In D139774#4045361 , @dblaikie 
> wrote:
>
>> 1. Should clang be doing something better with these temp files anyway, no 
>> matter the directory they go in? (setting them for cleanup at process exit 
>> or the like - I think we have such a filesystem abstraction, maybe that only 
>> works on Windows? I'm not sure)
>
> That'd be great, but appears unfeasible on GNU/Linux in case of a crash: 
> https://stackoverflow.com/questions/471344/guaranteed-file-deletion-upon-program-termination-c-c
>
>> 2. If there isn't a better general way to avoid creating temp trash that's a 
>> problem, I'd have thought that KDevelop/other tools would have issues with 
>> other temp files too, and want a more general solution (I'd have thought 
>> changing the process's temp directory, and changing it back for user process 
>> launches, would be sufficient - so it can cleanup after itself for all 
>> files, not just these particular clang-created ones)
>
> I do plan to put other temporary files KDevelop generates in the same 
> session-specific temporary directory and clean it on start. But modifying 
> KDevelop's temporary directory environment variable has been rejected during 
> code review for good reason. As the bug this review aims to fix 
>  puts it:
>
>> setting the environment variables is problematic, because they are inherited 
>> by the IDE's code and all child processes it spawns (compiler, build system 
>> and user-provided executables). The IDE must then remove the temporary 
>> directory environment variable from each child process where it can cause 
>> undesirable behavior.

Yeah, sorry, I didn't entirely follow/understand the discussion on the linked 
bug - part of the reason I brought it up here.

It seemed like the places where kdevelop would want to suppress the temp dir 
cleanup would be enumerable/more within kdevelop's control than instances of 
libraries kdevelop is using wanting to create their own unexposed temp files. 
But, yeah, would be a bunch of work to go and touch all those places. Ah well 
*shrug*


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-12 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a subscriber: milianw.
vedgy added a comment.

In D139774#4045361 , @dblaikie wrote:

> 1. Should clang be doing something better with these temp files anyway, no 
> matter the directory they go in? (setting them for cleanup at process exit or 
> the like - I think we have such a filesystem abstraction, maybe that only 
> works on Windows? I'm not sure)

That'd be great, but appears unfeasible on GNU/Linux in case of a crash: 
https://stackoverflow.com/questions/471344/guaranteed-file-deletion-upon-program-termination-c-c

> 2. If there isn't a better general way to avoid creating temp trash that's a 
> problem, I'd have thought that KDevelop/other tools would have issues with 
> other temp files too, and want a more general solution (I'd have thought 
> changing the process's temp directory, and changing it back for user process 
> launches, would be sufficient - so it can cleanup after itself for all files, 
> not just these particular clang-created ones)

I do plan to put other temporary files KDevelop generates in the same 
session-specific temporary directory and clean it on start. But modifying 
KDevelop's temporary directory environment variable has been rejected during 
code review for good reason. As the bug this review aims to fix 
 puts it:

> setting the environment variables is problematic, because they are inherited 
> by the IDE's code and all child processes it spawns (compiler, build system 
> and user-provided executables). The IDE must then remove the temporary 
> directory environment variable from each child process where it can cause 
> undesirable behavior.



In D139774#4045460 , @MaskRay wrote:

> libclang functions like `clang_reparseTranslationUnit_Impl` call 
> `clang/lib/Frontend/ASTUnit.cpp:1397` and build the preamble with 
> `/*StoreInMemory=*/false`.
> If `StoreInMemory` is configurable (true), then you can avoid the temporary 
> file `preamble-*.pch`.
>
> `clang/lib/Frontend/ASTUnit.cpp` is used by `clang/lib/Tooling/Tooling.cpp` 
> and libclang. Personally I think an unconditional `/*StoreInMemory=*/true` is 
> fine. (The concern is slightly memory usage increase).

This is a simple and promising solution. But maybe it should be configurable. 
The `StoreInMemory` option was introduced for the benefit of //clangd//, and 
//clangd// still stores the preambles on disk by default. See D39843 
. I'm pinging another KDevelop developer who 
may know better how this should work in KDevelop: @milianw


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-11 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

libclang functions like `clang_reparseTranslationUnit_Impl` call 
`clang/lib/Frontend/ASTUnit.cpp:1397` and build the preamble with 
`/*StoreInMemory=*/false`.
If `StoreInMemory` is configurable (true), then you can avoid the temporary 
file `preamble-*.pch`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-11 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D139774#4044258 , @aaron.ballman 
wrote:

> In D139774#4043886 , @vedgy wrote:
>
>> In D139774#4041308 , 
>> @aaron.ballman wrote:
>>
>>> Is your concern essentially that someone will add a new use to put files in 
>>> a temp directory and not have access to this information from ASTUnit? Or 
>>> do you have other concerns in mind?
>>>
>>> We should certainly investigate whether there are other uses of temp files 
>>> from libclang as part of these changes. It's possible we'll find a 
>>> situation that makes my suggestion untenable because we don't have easy 
>>> access to the information we need.
>>
>> The temporary directory could be used, now or in future, by some support 
>> code, which is used indirectly by libclang. I found the following uses 
>> potentially relevant to libclang:
>>
>> - `Driver::GetTemporaryDirectory(StringRef Prefix)` calls 
>> `llvm::sys::fs::createUniqueDirectory(Prefix, Path);`
>> - `StandardInstrumentations` => `DotCfgChangeReporter` calls 
>> `sys::fs::createUniquePath("cfgdot-%%.dot", SV, true);`
>> - There are plenty of `createTemporaryFile()` uses in llvm and clang. Some 
>> of them are likely used by libclang.
>>
>> I don't know how to efficiently check whether or not each of these indirect 
>> `system_temp_directory()` uses is in turn used by libclang. Even if they 
>> aren't know, they might be later, when libclang API grows.
>
> Oof, that is more exposure than I was thinking we had...
>
>> On a different note, do you think overriding temporary directory path is 
>> useful only to libclang and not likely to be useful to any other LLVM API 
>> users?
>
> I don't think there are any in-tree needs for that functionality, and the 
> solution is fragile enough that I'd like to avoid it if possible (for 
> example, it also makes the API much harder to use across threads because now 
> it's accessing global state). That said, I think the libclang use case you 
> have is compelling and worth solving in-tree if we can (so others don't have 
> to find similar workarounds).
>
 Another issue is with the `FileSystemOptions` part: this class is widely 
 used in Clang. So adding a member to it could adversely affect performance 
 of unrelated code.
>>>
>>> It's good to keep an eye on that, but I'm not too worried about the 
>>> overhead from it (I don't see uses in AST nodes). We can keep an eye on 
>>> https://llvm-compile-time-tracker.com/ to see if there is a surprising 
>>> compile time increase though.
>>
>> [in case we pursue this design approach, which I currently don't feel is 
>> right] Why not add a temporary path data member into `class ASTUnit` under 
>> the `FileSystemOptions FileSystemOpts` member, not inside it? So that other 
>> code is not burdened with unused struct member, and to be able to use a more 
>> suitable type, like `SmallString` for the temporary path buffer.
>
> That's certainly an option, but the design of that would be a bit strange in 
> that we have some file system options in one place and other file system 
> options in another place. I think ultimately, we're going to want them all to 
> live on `FileSystemOptions`. That said, I'm going to rope in some more folks 
> for a wider selection of opinions on how/if to proceed. CC @dblaikie @MaskRay 
> @d0k

Yeah, I think, at a quick glance, I'd lean towards FilesystemOptions too - it's 
mostly used as a member of FileManager, which already has a lot of 
members/fairly large footprint, so I wouldn't worry too much about this having 
a huge impact on the total memory usage/perf/etc.

I'm not sure how I feel about the general premise (though I don't have enough 
context/ownership to want to veto any direction/progress here, just thoughts in 
case they resonate):

1. Should clang be doing something better with these temp files anyway, no 
matter the directory they go in? (setting them for cleanup at process exit or 
the like - I think we have such a filesystem abstraction, maybe that only works 
on Windows? I'm not sure)
2. If there isn't a better general way to avoid creating temp trash that's a 
problem, I'd have thought that KDevelop/other tools would have issues with 
other temp files too, and want a more general solution (I'd have thought 
changing the process's temp directory, and changing it back for user process 
launches, would be sufficient - so it can cleanup after itself for all files, 
not just these particular clang-created ones)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-11 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added subscribers: MaskRay, dblaikie.
aaron.ballman added a comment.

In D139774#4043886 , @vedgy wrote:

> In D139774#4041308 , @aaron.ballman 
> wrote:
>
>> Is your concern essentially that someone will add a new use to put files in 
>> a temp directory and not have access to this information from ASTUnit? Or do 
>> you have other concerns in mind?
>>
>> We should certainly investigate whether there are other uses of temp files 
>> from libclang as part of these changes. It's possible we'll find a situation 
>> that makes my suggestion untenable because we don't have easy access to the 
>> information we need.
>
> The temporary directory could be used, now or in future, by some support 
> code, which is used indirectly by libclang. I found the following uses 
> potentially relevant to libclang:
>
> - `Driver::GetTemporaryDirectory(StringRef Prefix)` calls 
> `llvm::sys::fs::createUniqueDirectory(Prefix, Path);`
> - `StandardInstrumentations` => `DotCfgChangeReporter` calls 
> `sys::fs::createUniquePath("cfgdot-%%.dot", SV, true);`
> - There are plenty of `createTemporaryFile()` uses in llvm and clang. Some of 
> them are likely used by libclang.
>
> I don't know how to efficiently check whether or not each of these indirect 
> `system_temp_directory()` uses is in turn used by libclang. Even if they 
> aren't know, they might be later, when libclang API grows.

Oof, that is more exposure than I was thinking we had...

> On a different note, do you think overriding temporary directory path is 
> useful only to libclang and not likely to be useful to any other LLVM API 
> users?

I don't think there are any in-tree needs for that functionality, and the 
solution is fragile enough that I'd like to avoid it if possible (for example, 
it also makes the API much harder to use across threads because now it's 
accessing global state). That said, I think the libclang use case you have is 
compelling and worth solving in-tree if we can (so others don't have to find 
similar workarounds).

>>> Another issue is with the `FileSystemOptions` part: this class is widely 
>>> used in Clang. So adding a member to it could adversely affect performance 
>>> of unrelated code.
>>
>> It's good to keep an eye on that, but I'm not too worried about the overhead 
>> from it (I don't see uses in AST nodes). We can keep an eye on 
>> https://llvm-compile-time-tracker.com/ to see if there is a surprising 
>> compile time increase though.
>
> [in case we pursue this design approach, which I currently don't feel is 
> right] Why not add a temporary path data member into `class ASTUnit` under 
> the `FileSystemOptions FileSystemOpts` member, not inside it? So that other 
> code is not burdened with unused struct member, and to be able to use a more 
> suitable type, like `SmallString` for the temporary path buffer.

That's certainly an option, but the design of that would be a bit strange in 
that we have some file system options in one place and other file system 
options in another place. I think ultimately, we're going to want them all to 
live on `FileSystemOptions`. That said, I'm going to rope in some more folks 
for a wider selection of opinions on how/if to proceed. CC @dblaikie @MaskRay 
@d0k


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-11 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4041308 , @aaron.ballman 
wrote:

> Is your concern essentially that someone will add a new use to put files in a 
> temp directory and not have access to this information from ASTUnit? Or do 
> you have other concerns in mind?
>
> We should certainly investigate whether there are other uses of temp files 
> from libclang as part of these changes. It's possible we'll find a situation 
> that makes my suggestion untenable because we don't have easy access to the 
> information we need.

The temporary directory could be used, now or in future, by some support code, 
which is used indirectly by libclang. I found the following uses potentially 
relevant to libclang:

- `Driver::GetTemporaryDirectory(StringRef Prefix)` calls 
`llvm::sys::fs::createUniqueDirectory(Prefix, Path);`
- `StandardInstrumentations` => `DotCfgChangeReporter` calls 
`sys::fs::createUniquePath("cfgdot-%%.dot", SV, true);`
- There are plenty of `createTemporaryFile()` uses in llvm and clang. Some of 
them are likely used by libclang.

I don't know how to efficiently check whether or not each of these indirect 
`system_temp_directory()` uses is in turn used by libclang. Even if they aren't 
know, they might be later, when libclang API grows.

On a different note, do you think overriding temporary directory path is useful 
only to libclang and not likely to be useful to any other LLVM API users?

>> Another issue is with the `FileSystemOptions` part: this class is widely 
>> used in Clang. So adding a member to it could adversely affect performance 
>> of unrelated code.
>
> It's good to keep an eye on that, but I'm not too worried about the overhead 
> from it (I don't see uses in AST nodes). We can keep an eye on 
> https://llvm-compile-time-tracker.com/ to see if there is a surprising 
> compile time increase though.

[in case we pursue this design approach, which I currently don't feel is right] 
Why not add a temporary path data member into `class ASTUnit` under the 
`FileSystemOptions FileSystemOpts` member, not inside it? So that other code is 
not burdened with unused struct member, and to be able to use a more suitable 
type, like `SmallString` for the temporary path buffer.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-10 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4041169 , @vedgy wrote:

> In D139774#4040705 , @aaron.ballman 
> wrote:
>
>> At a point where we have a `CIndexer` object, we eventually call 
>> `ASTUnit::LoadFromCommandLine()` and `ASTUnit` has a member 
>> `FileSystemOptions FileSystemOpts`, and `FileSystemOptions` has a 
>> `std::string` for the working directory to use. I think we should store the 
>> temp directory in here as well, and when 
>> `ASTUnit::getMainBufferWithPrecompiledPreamble()` build the preamble, it can 
>> pass that information along to `TempPCHFile` to avoid needing to ask the 
>> system for the temp directory.
>
> The path would have to be passed into several functions.

Understood, and hopefully that doesn't make this a viral slog.

> `TempPCHFile::create()` would have to use `createUniqueFile()` instead of 
> `createTemporaryFile()`. My greatest concern with this improved design is 
> that libclang may use the temporary directory for other purposes - if not 
> yet, then in the future.

Is your concern essentially that someone will add a new use to put files in a 
temp directory and not have access to this information from ASTUnit? Or do you 
have other concerns in mind?

We should certainly investigate whether there are other uses of temp files from 
libclang as part of these changes. It's possible we'll find a situation that 
makes my suggestion untenable because we don't have easy access to the 
information we need.

> Another issue is with the `FileSystemOptions` part: this class is widely used 
> in Clang. So adding a member to it could adversely affect performance of 
> unrelated code.

It's good to keep an eye on that, but I'm not too worried about the overhead 
from it (I don't see uses in AST nodes). We can keep an eye on 
https://llvm-compile-time-tracker.com/ to see if there is a surprising compile 
time increase though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-10 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4040705 , @aaron.ballman 
wrote:

> At a point where we have a `CIndexer` object, we eventually call 
> `ASTUnit::LoadFromCommandLine()` and `ASTUnit` has a member 
> `FileSystemOptions FileSystemOpts`, and `FileSystemOptions` has a 
> `std::string` for the working directory to use. I think we should store the 
> temp directory in here as well, and when 
> `ASTUnit::getMainBufferWithPrecompiledPreamble()` build the preamble, it can 
> pass that information along to `TempPCHFile` to avoid needing to ask the 
> system for the temp directory.

The path would have to be passed into several functions. 
`TempPCHFile::create()` would have to use `createUniqueFile()` instead of 
`createTemporaryFile()`. My greatest concern with this improved design is that 
libclang may use the temporary directory for other purposes - if not yet, then 
in the future. Another issue is with the `FileSystemOptions` part: this class 
is widely used in Clang. So adding a member to it could adversely affect 
performance of unrelated code.

> This does mean we store two copies of the string (one in `CIndexer` and one 
> in `FileSystemOptions`, but I think the improved ownership semantics for the 
> C API makes that duplication somewhat reasonable.

If LLVM does not have its own shared buffer type, a `std::shared_ptr` could be used instead of `std::string`. Or `SmallString` (not shared, 
but avoids allocations given a not too long overriding path).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-10 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4040057 , @vedgy wrote:

> In D139774#4039975 , @aaron.ballman 
> wrote:
>
>> Oh that is a good point! Apologies for not noticing that earlier -- that 
>> makes me wonder if a better approach here is to add a `std::string` to the 
>> `CIndexer` class to represent the temp path to use.
>
> I have suggested the possibility in this review:
>
>> A copy of user-provided temporary directory path buffer can be stored in 
>> class `CIndexer`. But this API in //llvm/Support/Path.h// would stay fragile.
>
> So the question is whether the LLVM API or `CIndexer` should store the buffer.

My goal is to not modify the LLVM path APIs at all.

> The only possible caller of `system_temp_directory()` used by libclang is 
> `llvm::sys::fs::createUniquePath()`. This helper function is called by 
> `createUniqueEntity()`, which in turn is called by several other helper 
> functions, all in //llvm/lib/Support/Path.cpp//. Here is a backtrace from 
> KDevelop built against LLVM at this review's branch:
>
>   
>   #6 0x7fb3717cc1b7 in (anonymous namespace)::TempPCHFile::create () at 
> llvm-project/llvm/include/llvm/ADT/Twine.h:233
>   #7 clang::PrecompiledPreamble::Build(clang::CompilerInvocation const&, 
> llvm::MemoryBuffer const*, clang::PreambleBounds, clang::DiagnosticsEngine&, 
> llvm::IntrusiveRefCntPtr, 
> std::shared_ptr, bool, 
> clang::PreambleCallbacks&) (Invocation=..., MainFileBuffer=0x7fb35000a360, 
> Bounds=Bounds@entry=..., Diagnostics=..., VFS=..., 
> PCHContainerOps=std::shared_ptr (use count 5, 
> weak count 0) = {...}, StoreInMemory=false, Callbacks=...) at 
> llvm-project/clang/lib/Frontend/PrecompiledPreamble.cpp:421
>   #8 0x7fb3717234a8 in 
> clang::ASTUnit::getMainBufferWithPrecompiledPreamble(std::shared_ptr,
>  clang::CompilerInvocation&, llvm::IntrusiveRefCntPtr, 
> bool, unsigned int) (this=0x7fb3505c44d0, 
> PCHContainerOps=std::shared_ptr (use count 5, 
> weak count 0) = {...}, PreambleInvocationIn=..., VFS=..., 
> AllowRebuild=, MaxLines=0) at 
> /usr/include/c++/12.2.0/bits/unique_ptr.h:191
>   #9 0x7fb371729bd6 in 
> clang::ASTUnit::LoadFromCompilerInvocation(std::shared_ptr,
>  unsigned int, llvm::IntrusiveRefCntPtr) 
> (this=0x7fb3505c44d0, 
> PCHContainerOps=std::shared_ptr (use count 5, 
> weak count 0) = {...}, PrecompilePreambleAfterNParses=, 
> VFS=...) at llvm-project/clang/lib/Frontend/ASTUnit.cpp:1685
>   #10 0x7fb37172e359 in clang::ASTUnit::LoadFromCommandLine(char const**, 
> char const**, std::shared_ptr, 
> llvm::IntrusiveRefCntPtr, llvm::StringRef, bool, 
> clang::CaptureDiagsKind, 
> llvm::ArrayRef std::char_traits, std::allocator >, llvm::MemoryBuffer*> >, bool, 
> unsigned int, clang::TranslationUnitKind, bool, bool, bool, 
> clang::SkipFunctionBodiesScope, bool, bool, bool, bool, 
> llvm::Optional, std::unique_ptr std::default_delete >*, 
> llvm::IntrusiveRefCntPtr) (ArgBegin=, 
> ArgEnd=, 
> PCHContainerOps=std::shared_ptr (empty) = 
> {...}, Diags=..., ResourceFilesPath=..., OnlyLocalDecls=false, 
> CaptureDiagnostics=clang::CaptureDiagsKind::All, RemappedFiles=..., 
> RemappedFilesKeepOriginalName=true, PrecompilePreambleAfterNParses=1, 
> TUKind=clang::TU_Complete, CacheCodeCompletionResults=true, 
> IncludeBriefCommentsInCodeCompletion=false, AllowPCHWithCompilerErrors=true, 
> SkipFunctionBodies=clang::SkipFunctionBodiesScope::None, 
> SingleFileParse=false, UserFilesAreVolatile=true, ForSerialization=false, 
> RetainExcludedConditionalBlocks=false, ModuleFormat=..., 
> ErrAST=0x7fb36effd2d8, VFS=...) at 
> llvm-project/clang/lib/Frontend/ASTUnit.cpp:1822
>   #11 0x7fb37104e462 in clang_parseTranslationUnit_Impl(CXIndex, char 
> const*, char const* const*, int, llvm::ArrayRef, unsigned int, 
> CXTranslationUnit*) (CIdx=0x55dbe16c9360, source_filename=, 
> command_line_args=, num_command_line_args=, 
> unsaved_files=..., options=781, out_TU=0x7fb35ca7f630) at 
> /usr/include/c++/12.2.0/bits/stl_vector.h:987
>   #12 0x7fb37104f2b4 in operator() (__closure=0x7fb37a7b6f40) at 
> llvm-project/clang/tools/libclang/CIndex.cpp:3976
>   #13 
> llvm::function_ref::callback_fn  char const*, char const* const*, int, CXUnsavedFile*, unsigned int, unsigned 
> int, CXTranslationUnitImpl**):: >(intptr_t) 
> (callable=callable@entry=140408830783296) at 
> llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45
>   #14 0x7fb3729cc6e0 in llvm::function_ref::operator()() const 
> (this=) at 
> llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
>   #15 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref) 
> (this=, Fn=...) at 
> llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:433
>   #16 0x7fb3729cc724 in RunSafelyOnThread_Dispatch(void*) 
> (UserData=0x7fb37a7b6e80) at 
> llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:514
>   #17 0x7fb3729cc19a in llvm::thread::Apply 

[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-10 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4039975 , @aaron.ballman 
wrote:

> Oh that is a good point! Apologies for not noticing that earlier -- that 
> makes me wonder if a better approach here is to add a `std::string` to the 
> `CIndexer` class to represent the temp path to use.

I have suggested the possibility in this review:

> A copy of user-provided temporary directory path buffer can be stored in 
> class `CIndexer`. But this API in //llvm/Support/Path.h// would stay fragile.

So the question is whether the LLVM API or `CIndexer` should store the buffer.

The only possible caller of `system_temp_directory()` used by libclang is 
`llvm::sys::fs::createUniquePath()`. This helper function is called by 
`createUniqueEntity()`, which in turn is called by several other helper 
functions, all in //llvm/lib/Support/Path.cpp//. Here is a backtrace from 
KDevelop built against LLVM at this review's branch:

  #0 llvm::sys::path::system_temp_directory(bool, llvm::SmallVectorImpl&) 
(ErasedOnReboot=true, Result=...) at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/lib/Support/Unix/Path.inc:1451
  #1 0x7fb372a55d60 in llvm::sys::fs::createUniquePath(llvm::Twine const&, 
llvm::SmallVectorImpl&, bool) (Model=, ResultPath=..., 
MakeAbsolute=) at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/lib/Support/Path.cpp:811
  #2 0x7fb372a55eb7 in createUniqueEntity(llvm::Twine const&, int&, 
llvm::SmallVectorImpl&, bool, FSEntity, llvm::sys::fs::OpenFlags, 
unsigned int) (Model=..., ResultFD=@0x7fb36effc2d0: 0, ResultPath=..., 
MakeAbsolute=, Type=FS_File, Flags=llvm::sys::fs::OF_None, 
Mode=384) at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/lib/Support/Path.cpp:181
  #3 0x7fb372a5623b in llvm::sys::fs::createTemporaryFile 
(Flags=llvm::sys::fs::OF_None, Type=FS_File, ResultPath=..., 
ResultFD=@0x7fb36effc2d0: 0, Model=...) at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/include/llvm/ADT/Twine.h:233
  #4 llvm::sys::fs::createTemporaryFile(llvm::Twine const&, llvm::StringRef, 
int&, llvm::SmallVectorImpl&, FSEntity, llvm::sys::fs::OpenFlags) 
(Prefix=, Suffix=..., ResultFD=@0x7fb36effc2d0: 0, 
ResultPath=..., Type=Type@entry=FS_File, Flags=llvm::sys::fs::OF_None) at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/lib/Support/Path.cpp:865
  #5 0x7fb372a56451 in llvm::sys::fs::createTemporaryFile(llvm::Twine 
const&, llvm::StringRef, int&, llvm::SmallVectorImpl&, 
llvm::sys::fs::OpenFlags) (Prefix=, Suffix=..., 
ResultFD=, ResultPath=, Flags=) at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/lib/Support/Path.cpp:873
  #6 0x7fb3717cc1b7 in (anonymous namespace)::TempPCHFile::create () at 
/home/Mint14_home/igor/Documents/C/LinuxProjects/external/llvm-project/llvm/include/llvm/ADT/Twine.h:233
  #7 clang::PrecompiledPreamble::Build(clang::CompilerInvocation const&, 
llvm::MemoryBuffer const*, clang::PreambleBounds, clang::DiagnosticsEngine&, 
llvm::IntrusiveRefCntPtr, 
std::shared_ptr, bool, 
clang::PreambleCallbacks&) (Invocation=..., MainFileBuffer=0x7fb35000a360, 
Bounds=Bounds@entry=..., Diagnostics=..., VFS=..., 
PCHContainerOps=std::shared_ptr (use count 5, 
weak count 0) = {...}, StoreInMemory=false, Callbacks=...) at 
/home/igor/Documents/C/LinuxProjects/external/llvm-project/clang/lib/Frontend/PrecompiledPreamble.cpp:421
  #8 0x7fb3717234a8 in 
clang::ASTUnit::getMainBufferWithPrecompiledPreamble(std::shared_ptr,
 clang::CompilerInvocation&, llvm::IntrusiveRefCntPtr, 
bool, unsigned int) (this=0x7fb3505c44d0, 
PCHContainerOps=std::shared_ptr (use count 5, 
weak count 0) = {...}, PreambleInvocationIn=..., VFS=..., 
AllowRebuild=, MaxLines=0) at 
/usr/include/c++/12.2.0/bits/unique_ptr.h:191
  #9 0x7fb371729bd6 in 
clang::ASTUnit::LoadFromCompilerInvocation(std::shared_ptr,
 unsigned int, llvm::IntrusiveRefCntPtr) 
(this=0x7fb3505c44d0, 
PCHContainerOps=std::shared_ptr (use count 5, 
weak count 0) = {...}, PrecompilePreambleAfterNParses=, VFS=...) 
at 
/home/igor/Documents/C/LinuxProjects/external/llvm-project/clang/lib/Frontend/ASTUnit.cpp:1685
  #10 0x7fb37172e359 in clang::ASTUnit::LoadFromCommandLine(char const**, 
char const**, std::shared_ptr, 
llvm::IntrusiveRefCntPtr, llvm::StringRef, bool, 
clang::CaptureDiagsKind, 
llvm::ArrayRef, std::allocator >, llvm::MemoryBuffer*> >, bool, 
unsigned int, clang::TranslationUnitKind, bool, bool, bool, 
clang::SkipFunctionBodiesScope, bool, bool, bool, bool, 
llvm::Optional, std::unique_ptr >*, 
llvm::IntrusiveRefCntPtr) (ArgBegin=, 
ArgEnd=, 
PCHContainerOps=std::shared_ptr (empty) = {...}, 
Diags=..., ResourceFilesPath=..., OnlyLocalDecls=false, 
CaptureDiagnostics=clang::CaptureDiagsKind::All, RemappedFiles=..., 
RemappedFilesKeepOriginalName=true, PrecompilePreambleAfterNParses=1, 
TUKind=clang::TU_Complete, 

[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-10 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

In D139774#4039018 , @vedgy wrote:

> In D139774#4036496 , @aaron.ballman 
> wrote:
>
>> Given that we already have other setters to set global state, I'm less 
>> concerned about adding another one. I hadn't realized we already introduced 
>> the dangers here. But we should document the expectation that the call be 
>> made before creating the index.
>
> There is a difference: `clang_CXIndex_setGlobalOptions()` and 
> `clang_CXIndex_setInvocationEmissionPathOption()` take a `CXIndex` argument 
> and thus can only be called **after** creating an index. So requiring to call 
> `clang_overrideTemporaryDirectory()` before creating an index, plus one more 
> time with `nullptr` argument after disposing of the index, wouldn't be 
> particularly consistent with other setters.

Oh that is a good point! Apologies for not noticing that earlier -- that makes 
me wonder if a better approach here is to add a `std::string` to the `CIndexer` 
class to represent the temp path to use. I started investigating that idea and 
then I realized I have no idea what is calling `system_temp_directory()` in the 
first place. It doesn't seem to be anything from the C indexing API but it's 
possible this is coming from one of the other library components. Can you help 
me track down the call stack where we start creating the temporary files so I 
can better understand? My hope is that we can thread the information through 
rather than using a global setter, if possible.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-10 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4036496 , @aaron.ballman 
wrote:

> Given that we already have other setters to set global state, I'm less 
> concerned about adding another one. I hadn't realized we already introduced 
> the dangers here. But we should document the expectation that the call be 
> made before creating the index.

There is a difference: `clang_CXIndex_setGlobalOptions()` and 
`clang_CXIndex_setInvocationEmissionPathOption()` take a `CXIndex` argument and 
thus can only be called **after** creating an index. So requiring to call 
`clang_overrideTemporaryDirectory()` before creating an index, plus one more 
time with `nullptr` argument after disposing of the index, wouldn't be 
particularly consistent with other setters.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-09 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy marked an inline comment as not done.
vedgy added a comment.

In D139774#4036496 , @aaron.ballman 
wrote:

> In terms of the C API, I think it'd make more sense to name in terms of 
> "override" rather than "set", but I don't feel as strongly about it given the 
> other setters. In terms of the C++ file system API, I think "override" makes 
> the most sense though (we don't have setters to follow the naming convention 
> for) because some systems do allow you to set the system directory.

Let's keep the naming in C and C++ APIs consistent: 
`clang_overrideTemporaryDirectory()` and 
`override_system_temp_directory_erased_on_reboot()`.

> In terms of memory ownership, WDYT of requiring the caller to handle this? 
> e.g., calling `set_system_temp_directory_erased_on_reboot` will `strdup` a 
> nonnull pointer and `free` the stored pointer when given nullptr.

I like this idea. libclang-user code would become easier to use than it is now 
(though less easy compared to libclang managing memory itself). The libclang 
API documentation can require overriding the temp directory before creating an 
index and un-overriding it with `nullptr` after calling `clang_disposeIndex()`.
Now in order to make this libclang API harder to misuse, I lean towards passing 
the temporary directory in `clang_createIndexWithTempDir()` and letting 
`clang_disposeIndex()` handle the un-overriding (call 
`override_system_temp_directory_erased_on_reboot(nullptr)`) automatically. 
Makes sense? I feel that the `clang_createIndexWithTempDir()` name could be 
improved, but don't know how...

What memory [de]allocation method should 
`override_system_temp_directory_erased_on_reboot()` use? `new[]` and 
`delete[]`? Or should `strdup()` from POSIX be used because it is defined as 
`_strdup` on Windows? (along with standard `free()`)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-09 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

Given that we already have other setters to set global state, I'm less 
concerned about adding another one. I hadn't realized we already introduced the 
dangers here. But we should document the expectation that the call be made 
before creating the index.

In terms of the C API, I think it'd make more sense to name in terms of 
"override" rather than "set", but I don't feel as strongly about it given the 
other setters. In terms of the C++ file system API, I think "override" makes 
the most sense though (we don't have setters to follow the naming convention 
for) because some systems do allow you to set the system directory.

In terms of memory ownership, WDYT of requiring the caller to handle this? 
e.g., calling `set_system_temp_directory_erased_on_reboot` will `strdup` a 
nonnull pointer and `free` the stored pointer when given nullptr.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-06 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy marked an inline comment as done and 2 inline comments as not done.
vedgy added inline comments.



Comment at: llvm/include/llvm/Support/Path.h:423
+/// tempDirUtf8 pointer previously passed to this function.
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8);
+

vedgy wrote:
> aaron.ballman wrote:
> > Err, I'm not super excited about this new API. For starters, it's not 
> > setting the system temp directory at all (it doesn't make any modifications 
> > to the host system); instead it overrides the system temp directory. But 
> > also, this is pretty fragile due to allowing the user to override the temp 
> > directory after the compiler has already queried for the system temp 
> > directory, so now you get files in two different places. Further, it's 
> > fragile because the caller is responsible for keeping that pointer valid 
> > for the lifetime of the program. Finally, we don't allow you to override 
> > any other system directory that you can query (like the home directory).
> >  it's not setting the system temp directory at all (it doesn't make any 
> > modifications to the host system); instead it overrides the system temp 
> > directory.
> Rename to `override_system_temp_directory_erased_on_reboot()`?
> 
> > this is pretty fragile due to allowing the user to override the temp 
> > directory after the compiler has already queried for the system temp 
> > directory, so now you get files in two different places.
> Unfortunately, I don't know how to prevent this. In practice calling 
> `clang_setTemporaryDirectory()` before `clang_createIndex()` works perfectly 
> well in KDevelop: the //preamble-*.pch// files are created later and never 
> end up in the default temporary directory ///tmp//. The same issue applies to 
> the current querying of the environment variable values repeatedly: the user 
> can set environment variables at any time.
> 
> >  it's fragile because the caller is responsible for keeping that pointer 
> > valid for the lifetime of the program.
> I'd love to duplicate the temporary directory path buffer in Path.cpp. The 
> API would become much easier to use. But then the buffer would have to be 
> destroyed when libclang is unloaded or on program shutdown, which is 
> forbidden by LLVM Coding Standards: //Do not use Static Constructors//. That 
> is, I think the following code in Path.cpp is not acceptable:
> ```
> static SmallVectorImpl ()
> {
>   static SmallVector value;
>   return value;
> }
> ```
> 
> > we don't allow you to override any other system directory that you can 
> > query (like the home directory).
> Well, one has to start somewhere :) Seriously, overriding directories should 
> be allowed only if it has a practical use case. Not just because overriding 
> others is allowed.
A copy of user-provided temporary directory path buffer can be stored in `class 
CIndexer`. But this API in //llvm/Support/Path.h// would stay fragile.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-04 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4025356 , @vedgy wrote:

> `clang_disposeIndex()` would have to un-override the temporary directory 
> then. I'll have to check whether this un-overriding happens too early (before 
> all //preamble-*.pch// files are removed).

Checked by calling `clang_setTemporaryDirectory(nullptr);` right after 
`clang_disposeIndex(m_index);`. Works correctly in KDevelop: all 
//preamble-*.pch// files are removed from the overriding temporary directory on 
exit. So this alternative API is viable.




Comment at: clang/include/clang-c/Index.h:78-79
+ * this function in order to reset the temporary directory to the default value
+ * from the environment. Such a resetting should be done before deleting a
+ * tempDirUtf8 pointer previously passed to this function.
+ */

vedgy wrote:
> aaron.ballman wrote:
> > Should we mention anything about relative vs absolute path requirements? 
> > Separators? 
> On Windows separators are converted automatically. I suppose we don't need to 
> warn users not to pass Windows-style separators on Unix.
> 
> On Windows this function handles both relative and absolute paths. On Unix 
> the existing implementation doesn't check whether the path is absolute or 
> relative. Perhaps it assumes that the path in the environment variable is 
> always absolute. Or absolute vs relative doesn't matter. I'll check what 
> happens if a relative path is passed.
Tested passing a relative path on GNU/Linux: works as expected. libclang stores 
and removes the //preamble-*.pch// in the same absolute path as 
[QDir::absolutePath()](https://doc.qt.io/qt-5/qdir.html#absolutePath) returns 
in KDevelop. Since passing any reasonably formatted path to this function 
works, I suppose mentioning the path requirements in the documentation is 
unnecessary.

While testing, I noticed another requirement that **should** be mentioned in 
this documentation, as well as `set_system_temp_directory_erased_on_reboot`'s 
documentation: the temporary directory must exist, because Clang doesn't create 
it. I couldn't find where Clang writes the preamble files when the temporary 
directory does not exist. Perhaps it doesn't write them to disk at all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-04 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy added a comment.

In D139774#4023940 , @aaron.ballman 
wrote:

> This feels like a configuration property of the libclang execution -- so it'd 
> be nice to require it to be set up from `clang_createIndex()` (IIRC that's 
> the entrypoint for CIndex functionality, but I'm not 100% sure), rather than 
> an API that the user can call repeatedly.

In order to keep backward compatibility, this would require introducing another 
`createIndex` function, e.g. `clang_createIndexWithTempDir()`. 
`clang_disposeIndex()` would have to un-override the temporary directory then. 
I'll have to check whether this un-overriding happens too early (before all 
//preamble-*.pch// files are removed). But notice that separate setup functions 
already exist: `clang_CXIndex_setGlobalOptions()`, 
`clang_CXIndex_setInvocationEmissionPathOption()`. The documentation for 
`clang_setTemporaryDirectory()` can recommend calling it before 
`clang_createIndex()`.
Either alternative works for KDevelop, because it calls `clang_createIndex()` 
once.




Comment at: clang/include/clang-c/Index.h:78-79
+ * this function in order to reset the temporary directory to the default value
+ * from the environment. Such a resetting should be done before deleting a
+ * tempDirUtf8 pointer previously passed to this function.
+ */

aaron.ballman wrote:
> Should we mention anything about relative vs absolute path requirements? 
> Separators? 
On Windows separators are converted automatically. I suppose we don't need to 
warn users not to pass Windows-style separators on Unix.

On Windows this function handles both relative and absolute paths. On Unix the 
existing implementation doesn't check whether the path is absolute or relative. 
Perhaps it assumes that the path in the environment variable is always 
absolute. Or absolute vs relative doesn't matter. I'll check what happens if a 
relative path is passed.



Comment at: llvm/include/llvm/Support/Path.h:423
+/// tempDirUtf8 pointer previously passed to this function.
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8);
+

aaron.ballman wrote:
> Err, I'm not super excited about this new API. For starters, it's not setting 
> the system temp directory at all (it doesn't make any modifications to the 
> host system); instead it overrides the system temp directory. But also, this 
> is pretty fragile due to allowing the user to override the temp directory 
> after the compiler has already queried for the system temp directory, so now 
> you get files in two different places. Further, it's fragile because the 
> caller is responsible for keeping that pointer valid for the lifetime of the 
> program. Finally, we don't allow you to override any other system directory 
> that you can query (like the home directory).
>  it's not setting the system temp directory at all (it doesn't make any 
> modifications to the host system); instead it overrides the system temp 
> directory.
Rename to `override_system_temp_directory_erased_on_reboot()`?

> this is pretty fragile due to allowing the user to override the temp 
> directory after the compiler has already queried for the system temp 
> directory, so now you get files in two different places.
Unfortunately, I don't know how to prevent this. In practice calling 
`clang_setTemporaryDirectory()` before `clang_createIndex()` works perfectly 
well in KDevelop: the //preamble-*.pch// files are created later and never end 
up in the default temporary directory ///tmp//. The same issue applies to the 
current querying of the environment variable values repeatedly: the user can 
set environment variables at any time.

>  it's fragile because the caller is responsible for keeping that pointer 
> valid for the lifetime of the program.
I'd love to duplicate the temporary directory path buffer in Path.cpp. The API 
would become much easier to use. But then the buffer would have to be destroyed 
when libclang is unloaded or on program shutdown, which is forbidden by LLVM 
Coding Standards: //Do not use Static Constructors//. That is, I think the 
following code in Path.cpp is not acceptable:
```
static SmallVectorImpl ()
{
  static SmallVector value;
  return value;
}
```

> we don't allow you to override any other system directory that you can query 
> (like the home directory).
Well, one has to start somewhere :) Seriously, overriding directories should be 
allowed only if it has a practical use case. Not just because overriding others 
is allowed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2023-01-03 Thread Aaron Ballman via Phabricator via cfe-commits
aaron.ballman added a comment.

I'm sympathetic to the problem you're trying to solve (not having to set 
environment variable for the temp directory), but I'm also not convinced this 
is the correct way to approach it. This feels like a configuration property of 
the libclang execution -- so it'd be nice to require it to be set up from 
`clang_createIndex()` (IIRC that's the entrypoint for CIndex functionality, but 
I'm not 100% sure), rather than an API that the user can call repeatedly. Did 
you consider a design like that?




Comment at: clang/docs/ReleaseNotes.rst:863
+- Introduced the new function ``clang_setTemporaryDirectory``,
+  which allows to override the temporary directory path used by Clang.
 - ``clang_Cursor_getNumTemplateArguments``, 
``clang_Cursor_getTemplateArgumentKind``,





Comment at: clang/include/clang-c/Index.h:78-79
+ * this function in order to reset the temporary directory to the default value
+ * from the environment. Such a resetting should be done before deleting a
+ * tempDirUtf8 pointer previously passed to this function.
+ */

Should we mention anything about relative vs absolute path requirements? 
Separators? 



Comment at: llvm/include/llvm/Support/Path.h:423
+/// tempDirUtf8 pointer previously passed to this function.
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8);
+

Err, I'm not super excited about this new API. For starters, it's not setting 
the system temp directory at all (it doesn't make any modifications to the host 
system); instead it overrides the system temp directory. But also, this is 
pretty fragile due to allowing the user to override the temp directory after 
the compiler has already queried for the system temp directory, so now you get 
files in two different places. Further, it's fragile because the caller is 
responsible for keeping that pointer valid for the lifetime of the program. 
Finally, we don't allow you to override any other system directory that you can 
query (like the home directory).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D139774: [libclang] Add API to set temporary directory location

2022-12-11 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy updated this revision to Diff 481913.
vedgy edited the summary of this revision.
vedgy added a comment.

Extract identical code from the two Path.inc files into Path.cpp

One of the Path.inc files is #include-d into this Path.cpp file and nowhere 
else.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang-c/Index.h
  clang/tools/libclang/CIndex.cpp
  clang/tools/libclang/libclang.map
  llvm/include/llvm/Support/Path.h
  llvm/lib/Support/Path.cpp
  llvm/lib/Support/Unix/Path.inc
  llvm/lib/Support/Windows/Path.inc
  llvm/unittests/Support/Path.cpp

Index: llvm/unittests/Support/Path.cpp
===
--- llvm/unittests/Support/Path.cpp
+++ llvm/unittests/Support/Path.cpp
@@ -591,6 +591,26 @@
   EXPECT_TRUE(!TempDir.empty());
 }
 
+TEST(Support, SetTempDirectory) {
+  SmallString<64> DefaultTempDir;
+  path::system_temp_directory(true, DefaultTempDir);
+  EXPECT_TRUE(!DefaultTempDir.empty());
+
+  auto CustomTempDir = DefaultTempDir;
+  path::append(CustomTempDir, "/llvm/test_temp_dir");
+  path::native(CustomTempDir);
+  path::set_system_temp_directory_erased_on_reboot(CustomTempDir.c_str());
+
+  SmallString<64> TempDir;
+  path::system_temp_directory(true, TempDir);
+  EXPECT_EQ(CustomTempDir, TempDir);
+
+  path::set_system_temp_directory_erased_on_reboot(nullptr);
+  TempDir.clear();
+  path::system_temp_directory(true, TempDir);
+  EXPECT_EQ(DefaultTempDir, TempDir);
+}
+
 #ifdef _WIN32
 static std::string path2regex(std::string Path) {
   size_t Pos = 0;
Index: llvm/lib/Support/Windows/Path.inc
===
--- llvm/lib/Support/Windows/Path.inc
+++ llvm/lib/Support/Windows/Path.inc
@@ -1472,11 +1472,16 @@
   (void)ErasedOnReboot;
   Result.clear();
 
+  if (tempDirErasedOnRebootUtf8) {
+const auto len = strlen(tempDirErasedOnRebootUtf8);
+Result.append(tempDirErasedOnRebootUtf8, tempDirErasedOnRebootUtf8 + len);
+  }
+
   // Check whether the temporary directory is specified by an environment var.
   // This matches GetTempPath logic to some degree. GetTempPath is not used
   // directly as it cannot handle evn var longer than 130 chars on Windows 7
   // (fixed on Windows 8).
-  if (getTempDirEnvVar(Result)) {
+  if (!Result.empty() || getTempDirEnvVar(Result)) {
 assert(!Result.empty() && "Unexpected empty path");
 native(Result); // Some Unix-like shells use Unix path separator in $TMP.
 fs::make_absolute(Result); // Make it absolute if not already.
Index: llvm/lib/Support/Unix/Path.inc
===
--- llvm/lib/Support/Unix/Path.inc
+++ llvm/lib/Support/Unix/Path.inc
@@ -1451,8 +1451,11 @@
   Result.clear();
 
   if (ErasedOnReboot) {
+const char *RequestedDir = tempDirErasedOnRebootUtf8;
 // There is no env variable for the cache directory.
-if (const char *RequestedDir = getEnvTempDir()) {
+if (!RequestedDir)
+  RequestedDir = getEnvTempDir();
+if (RequestedDir) {
   Result.append(RequestedDir, RequestedDir + strlen(RequestedDir));
   return;
 }
Index: llvm/lib/Support/Path.cpp
===
--- llvm/lib/Support/Path.cpp
+++ llvm/lib/Support/Path.cpp
@@ -780,6 +780,12 @@
   return true;
 }
 
+static const char *tempDirErasedOnRebootUtf8 = nullptr;
+
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8) {
+  tempDirErasedOnRebootUtf8 = tempDirUtf8;
+}
+
 } // end namespace path
 
 namespace fs {
Index: llvm/include/llvm/Support/Path.h
===
--- llvm/include/llvm/Support/Path.h
+++ llvm/include/llvm/Support/Path.h
@@ -412,6 +412,16 @@
 /// @param result Holds the resulting path name.
 void system_temp_directory(bool erasedOnReboot, SmallVectorImpl );
 
+/// Override the temporary directory path returned by
+/// system_temp_directory(true, result).
+///
+/// @param tempDirUtf8 UTF-8-encoded path to the desired temporary directory.
+/// The pointer is owned by the caller and must be always valid. Pass nullptr to
+/// this function in order to reset the temporary directory to the default value
+/// from the environment. Such a resetting should be done before deleting a
+/// tempDirUtf8 pointer previously passed to this function.
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8);
+
 /// Get the user's home directory.
 ///
 /// @param result Holds the resulting path name.
Index: clang/tools/libclang/libclang.map
===
--- clang/tools/libclang/libclang.map
+++ clang/tools/libclang/libclang.map
@@ -407,6 +407,7 @@
 
 LLVM_16 {
   global:
+clang_setTemporaryDirectory;
 clang_getUnqualifiedType;
 

[PATCH] D139774: [libclang] Add API to set temporary directory location

2022-12-10 Thread Igor Kushnir via Phabricator via cfe-commits
vedgy created this revision.
vedgy added a reviewer: aaron.ballman.
Herald added subscribers: arphaman, hiraditya.
Herald added a project: All.
vedgy requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

Fixes: https://github.com/llvm/llvm-project/issues/51847


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D139774

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang-c/Index.h
  clang/tools/libclang/CIndex.cpp
  clang/tools/libclang/libclang.map
  llvm/include/llvm/Support/Path.h
  llvm/lib/Support/Unix/Path.inc
  llvm/lib/Support/Windows/Path.inc
  llvm/unittests/Support/Path.cpp

Index: llvm/unittests/Support/Path.cpp
===
--- llvm/unittests/Support/Path.cpp
+++ llvm/unittests/Support/Path.cpp
@@ -591,6 +591,26 @@
   EXPECT_TRUE(!TempDir.empty());
 }
 
+TEST(Support, SetTempDirectory) {
+  SmallString<64> DefaultTempDir;
+  path::system_temp_directory(true, DefaultTempDir);
+  EXPECT_TRUE(!DefaultTempDir.empty());
+
+  auto CustomTempDir = DefaultTempDir;
+  path::append(CustomTempDir, "/llvm/test_temp_dir");
+  path::native(CustomTempDir);
+  path::set_system_temp_directory_erased_on_reboot(CustomTempDir.c_str());
+
+  SmallString<64> TempDir;
+  path::system_temp_directory(true, TempDir);
+  EXPECT_EQ(CustomTempDir, TempDir);
+
+  path::set_system_temp_directory_erased_on_reboot(nullptr);
+  TempDir.clear();
+  path::system_temp_directory(true, TempDir);
+  EXPECT_EQ(DefaultTempDir, TempDir);
+}
+
 #ifdef _WIN32
 static std::string path2regex(std::string Path) {
   size_t Pos = 0;
Index: llvm/lib/Support/Windows/Path.inc
===
--- llvm/lib/Support/Windows/Path.inc
+++ llvm/lib/Support/Windows/Path.inc
@@ -1468,15 +1468,22 @@
   return false;
 }
 
+static const char *tempDirErasedOnRebootUtf8 = nullptr;
+
 void system_temp_directory(bool ErasedOnReboot, SmallVectorImpl ) {
   (void)ErasedOnReboot;
   Result.clear();
 
+  if (tempDirErasedOnRebootUtf8) {
+const auto len = strlen(tempDirErasedOnRebootUtf8);
+Result.append(tempDirErasedOnRebootUtf8, tempDirErasedOnRebootUtf8 + len);
+  }
+
   // Check whether the temporary directory is specified by an environment var.
   // This matches GetTempPath logic to some degree. GetTempPath is not used
   // directly as it cannot handle evn var longer than 130 chars on Windows 7
   // (fixed on Windows 8).
-  if (getTempDirEnvVar(Result)) {
+  if (!Result.empty() || getTempDirEnvVar(Result)) {
 assert(!Result.empty() && "Unexpected empty path");
 native(Result); // Some Unix-like shells use Unix path separator in $TMP.
 fs::make_absolute(Result); // Make it absolute if not already.
@@ -1488,6 +1495,10 @@
   Result.append(DefaultResult, DefaultResult + strlen(DefaultResult));
   llvm::sys::path::make_preferred(Result);
 }
+
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8) {
+  tempDirErasedOnRebootUtf8 = tempDirUtf8;
+}
 } // end namespace path
 
 namespace windows {
Index: llvm/lib/Support/Unix/Path.inc
===
--- llvm/lib/Support/Unix/Path.inc
+++ llvm/lib/Support/Unix/Path.inc
@@ -1447,12 +1447,17 @@
   return "/var/tmp";
 }
 
+static const char *tempDirErasedOnRebootUtf8 = nullptr;
+
 void system_temp_directory(bool ErasedOnReboot, SmallVectorImpl ) {
   Result.clear();
 
   if (ErasedOnReboot) {
+const char *RequestedDir = tempDirErasedOnRebootUtf8;
 // There is no env variable for the cache directory.
-if (const char *RequestedDir = getEnvTempDir()) {
+if (!RequestedDir)
+  RequestedDir = getEnvTempDir();
+if (RequestedDir) {
   Result.append(RequestedDir, RequestedDir + strlen(RequestedDir));
   return;
 }
@@ -1465,6 +1470,10 @@
   Result.append(RequestedDir, RequestedDir + strlen(RequestedDir));
 }
 
+void set_system_temp_directory_erased_on_reboot(const char *tempDirUtf8) {
+  tempDirErasedOnRebootUtf8 = tempDirUtf8;
+}
+
 } // end namespace path
 
 namespace fs {
Index: llvm/include/llvm/Support/Path.h
===
--- llvm/include/llvm/Support/Path.h
+++ llvm/include/llvm/Support/Path.h
@@ -412,6 +412,16 @@
 /// @param result Holds the resulting path name.
 void system_temp_directory(bool erasedOnReboot, SmallVectorImpl );
 
+/// Override the temporary directory path returned by
+/// system_temp_directory(true, result).
+///
+/// @param tempDirUtf8 UTF-8-encoded path to the desired temporary directory.
+/// The pointer is owned by the caller and must be always valid. Pass nullptr to
+/// this function in order to reset the temporary directory to the default value
+/// from the environment. Such a resetting should be done before deleting a
+/// tempDirUtf8 pointer previously passed to this function.
+void