Re: Handling slashes in cache names
>> How about using both escaping and a text file with the name? One can think of the escaped name as of a kind of ID, which happens to be human-readable when the name is in ASCII, and as unreadable as an UUID when the name is in UTF. This way we have all the readability in the common case (when name is all English letters and digits), and some limited readability (via looking into text files) when other alphabets are used.

Sounds good to me.

--Yakov
RE: Handling slashes in cache names
How about using both escaping and a text file with the name? One can think of the escaped name as a kind of ID, which happens to be human-readable when the name is in ASCII, and is as unreadable as a UUID when the name contains non-ASCII characters. This way we have all the readability in the common case (when the name is all English letters and digits), and some limited readability (via looking into text files) when other alphabets are used.

Thanks,
Stan

From: Pavel Tupitsyn
Sent: January 16, 2018, 14:01
To: dev@ignite.apache.org
Subject: Re: Handling slashes in cache names

> folder named by ID and txt file inside should do the trick

Agree

On Tue, Jan 16, 2018 at 1:02 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> On Mon, Jan 15, 2018 at 7:31 AM, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > You will never ever relate smth like "fdee0456adcc" to "мои_данные".
> >
> > As a user, why do I need to understand file names in Ignite work directory?
>
> Because it is better to have an understandable and human readable directory structure than not. Let's do it right.
Re: Handling slashes in cache names
> folder named by ID and txt file inside should do the trick

Agree

On Tue, Jan 16, 2018 at 1:02 PM, Dmitriy Setrakyan wrote:
> On Mon, Jan 15, 2018 at 7:31 AM, Pavel Tupitsyn wrote:
> > > You will never ever relate smth like "fdee0456adcc" to "мои_данные".
> >
> > As a user, why do I need to understand file names in Ignite work directory?
>
> Because it is better to have an understandable and human readable directory structure than not. Let's do it right.
Re: Handling slashes in cache names
On Mon, Jan 15, 2018 at 7:31 AM, Pavel Tupitsyn wrote:
> > You will never ever relate smth like "fdee0456adcc" to "мои_данные".
>
> As a user, why do I need to understand file names in Ignite work directory?

Because it is better to have an understandable and human readable directory structure than not. Let's do it right.
Re: Handling slashes in cache names
On Mon, Jan 15, 2018 at 7:11 AM, Pavel Tupitsyn wrote:
> > try creating a directory on all nodes
>
> And then a new node appears with a different kind of file system..

If a new node cannot create an existing cache, it should not be allowed to start.
Re: Handling slashes in cache names
To understand how much storage you need for cache group "X" and watch the trends. Anyway, folder named by ID and txt file inside should do the trick =) --Yakov
Re: Handling slashes in cache names
> You will never ever relate smth like "fdee0456adcc" to "мои_данные".

As a user, why do I need to understand file names in Ignite work directory?

On Mon, Jan 15, 2018 at 6:22 PM, Yakov Zhdanov wrote:
> >> And then a new node appears with a different kind of file system..
>
> This is hardly possible. And I suggest not to
>
> >> Escaping removes all limitations and does not affect usability.
>
> Disagree. You will never ever relate smth like "fdee0456adcc" to "мои_данные".
>
> Guys, I just realized that we create folder for cache group. How about we choose group ID for folder name and put text file cachegroup.info containing group name to it?
>
> --Yakov
Re: Handling slashes in cache names
>> And then a new node appears with a different kind of file system..

This is hardly possible. And I suggest not to

>> Escaping removes all limitations and does not affect usability.

Disagree. You will never ever relate something like "fdee0456adcc" to "мои_данные".

Guys, I just realized that we create a folder for each cache group. How about we use the group ID as the folder name and put a text file, cachegroup.info, containing the group name into it?

--Yakov
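A minimal sketch of the "folder named by group ID plus cachegroup.info" idea, assuming the group ID is an int and is rendered as hex. The class and method names are hypothetical and are not part of Ignite; only the file name cachegroup.info comes from the proposal above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not Ignite code: directory named by group ID plus an info file. */
final class CacheGroupDirs {
    /** File name proposed in the thread. */
    static final String INFO_FILE = "cachegroup.info";

    /** Creates workDir/<hex group ID> and stores the human-readable group name inside it. */
    static Path createGroupDir(Path workDir, int groupId, String groupName) throws IOException {
        // Unsigned hex keeps the directory name ASCII-only, even for negative IDs.
        Path dir = workDir.resolve(Integer.toHexString(groupId));
        Files.createDirectories(dir);
        Files.write(dir.resolve(INFO_FILE), groupName.getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    /** Reads the group name back from the info file. */
    static String readGroupName(Path groupDir) throws IOException {
        return new String(Files.readAllBytes(groupDir.resolve(INFO_FILE)), StandardCharsets.UTF_8);
    }
}
```

With this layout the directory name never depends on the characters in the group name, and a user who wants to know which folder belongs to which group reads the info file.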
Re: Handling slashes in cache names
> try creating a directory on all nodes

And then a new node appears with a different kind of file system..
Escaping removes all limitations and does not affect usability.

Pavel

On Mon, Jan 15, 2018 at 5:47 PM, Yakov Zhdanov wrote:
> Agree that cache names should be case insensitive - currently it seems that we have issues on Windows OS.
>
> As far as allowed characters - why don't we try creating a directory on all nodes (but calling toLower() prior to creation)? If creation succeeds everywhere then cache name is acceptable. New nodes should throw exception if folder creation is impossible.
>
> I don't like escaping since it will not add any usability for, let's say, Chinese or Russian names. For example, MySQL supports ASCII: [0-9,a-z,A-Z$_] (basic Latin letters, digits 0-9, dollar, underscore) and Extended: U+0080 .. U+ [1]
>
> I also would think over some intersection of allowed file name characters in different file systems [2]
>
> [1] https://dev.mysql.com/doc/refman/5.7/en/identifiers.html
> [2] https://en.wikipedia.org/wiki/Filename
>
> Yakov Zhdanov
Re: Handling slashes in cache names
Agree that cache names should be case insensitive - currently it seems that we have issues on Windows OS.

As for allowed characters - why don't we try creating a directory on all nodes (but calling toLower() prior to creation)? If creation succeeds everywhere then the cache name is acceptable. New nodes should throw an exception if folder creation is impossible.

I don't like escaping since it will not add any usability for, let's say, Chinese or Russian names. For example, MySQL supports ASCII: [0-9,a-z,A-Z$_] (basic Latin letters, digits 0-9, dollar, underscore) and Extended: U+0080 .. U+ [1]

I also would think over some intersection of allowed file name characters in different file systems [2]

[1] https://dev.mysql.com/doc/refman/5.7/en/identifiers.html
[2] https://en.wikipedia.org/wiki/Filename

Yakov Zhdanov
RE: Handling slashes in cache names
Let me return to this issue.

> Well, having to support multiple cache name formats going forward will be difficult.

I don’t think there is a question of multiple name formats. Let’s just say that there are issues that can be solved at the base cache level (e.g. making cache names always case-insensitive) and there are issues that have to be solved by the PDS (e.g. special and non-ASCII symbols that we don’t want to always ban from names). I’m not suggesting introducing anything to PDS that will afterwards be handled by the base cache code. We’ll just handle some issues first, in PDS, and other issues will be handled separately.

> My preference would be to limit to 255 characters right now

That would be good, but it doesn’t really solve the issue with the length. Since non-ASCII characters (and non-alphanumeric ASCII) are encoded, the actual length of a cache’s directory name may be greater than the length of the cache name (and don’t forget the “cache-” prefix). We could come up with a “really safe” limit, but it might be too small (around 80?), and that would be limiting the API based on a rather arbitrary implementation detail.

Another reason why I like to have a hash in the file name is that we might run into problems with two names, one of which is an escaped version of the other, like “my/cache” and “my_2f_cache”. And I guess there can be more similar collisions that we just don’t think of right now. Having a hash in the name just works as a (probabilistic) failsafe for that.

Thanks,
Stan

From: Dmitriy Setrakyan
Sent: January 2, 2018, 16:40
To: dev@ignite.apache.org
Subject: Re: Handling slashes in cache names

On Fri, Dec 29, 2017 at 2:28 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> > I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".
>
> Looks nice, will do.
>
> > Here I am confused. I think the cache names should be case insensitive at all times. I seriously doubt enforcing this rule would cause problems. If we enforce this rule at cache creation time, then we would not have to add a hashcode at the end.
>
> I think I would still keep the hashcode. E.g. I’m now also truncating names longer than 255 chars, and the truncated names could be equal. There could be more edge cases, and adding an imprint of the identity might help to avoid them. The names are readable enough with the hashes, but scary enough for users not to mess with them manually – I guess that’s a good thing :)
>
> Making cache names always case-insensitive sounds good, but I’d separate it to another JIRA issue (it has larger compatibility impact, it affects a different part of the code base, etc). Is it OK?

Well, having to support multiple cache name formats going forward will be difficult. I would rather we finalize on it right now. My preference would be to limit to 255 characters right now and make cache names case insensitive. I doubt such change would affect many users, but it would definitely make things cleaner.

Would be nice to hear what others in the community think. Vladimir O., Alexey G.?

D.
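The “my/cache” versus “my_2f_cache” collision mentioned above can be reproduced with a small sketch. It assumes the "_hh_" escaping style suggested in the thread, applied per UTF-8 byte, with "_" itself left unescaped; the class name and the use of String.hashCode() as the hash are illustrative assumptions, not the actual implementation.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo, not Ignite code: "_hh_" escaping that leaves '_' untouched. */
final class EscapeCollision {
    /** Escapes every UTF-8 byte outside [a-zA-Z0-9_-] as _hh_ (two hex digits). */
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                || (v >= '0' && v <= '9') || v == '_' || v == '-';
            if (safe)
                sb.append((char) v);
            else
                sb.append('_').append(String.format("%02x", v)).append('_');
        }
        return sb.toString();
    }

    /** Appends a hash of the original name; String.hashCode() is only a stand-in. */
    static String escapeWithHash(String name) {
        return escape(name) + "-" + String.format("%08x", name.hashCode());
    }
}
```

Here escape("my/cache") and escape("my_2f_cache") both produce "my_2f_cache", because the already-escaped form passes through unchanged. The hash suffix is computed from the original name, so the two directory names differ unless the hashes happen to collide as well.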
Re: Handling slashes in cache names
On Fri, Dec 29, 2017 at 2:28 AM, Stanislav Lukyanov wrote:
> > I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".
>
> Looks nice, will do.
>
> > Here I am confused. I think the cache names should be case insensitive at all times. I seriously doubt enforcing this rule would cause problems. If we enforce this rule at cache creation time, then we would not have to add a hashcode at the end.
>
> I think I would still keep the hashcode. E.g. I’m now also truncating names longer than 255 chars, and the truncated names could be equal. There could be more edge cases, and adding an imprint of the identity might help to avoid them. The names are readable enough with the hashes, but scary enough for users not to mess with them manually – I guess that’s a good thing :)
>
> Making cache names always case-insensitive sounds good, but I’d separate it to another JIRA issue (it has larger compatibility impact, it affects a different part of the code base, etc). Is it OK?

Well, having to support multiple cache name formats going forward will be difficult. I would rather we finalize on it right now. My preference would be to limit to 255 characters right now and make cache names case insensitive. I doubt such a change would affect many users, but it would definitely make things cleaner.

Would be nice to hear what others in the community think. Vladimir O., Alexey G.?

D.
RE: Handling slashes in cache names
> I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".

Looks nice, will do.

> Here I am confused. I think the cache names should be case insensitive at all times. I seriously doubt enforcing this rule would cause problems. If we enforce this rule at cache creation time, then we would not have to add a hashcode at the end.

I think I would still keep the hashcode. E.g. I’m now also truncating names longer than 255 chars, and the truncated names could be equal. There could be more edge cases, and adding an imprint of the identity might help to avoid them. The names are readable enough with the hashes, but scary enough for users not to mess with them manually – I guess that’s a good thing :)

Making cache names always case-insensitive sounds good, but I’d separate it to another JIRA issue (it has larger compatibility impact, it affects a different part of the code base, etc). Is it OK?

Thanks,
Stan

From: Dmitriy Setrakyan
Sent: December 28, 2017, 22:33
To: dev@ignite.apache.org
Subject: Re: Handling slashes in cache names

On Thu, Dec 28, 2017 at 9:22 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> Hi all,
>
> I’ve implemented an approach of encoding unsafe characters in the cache names for persistent storage directories. You can find it at https://github.com/gridgain/apache-ignite/tree/ignite-7264.
> How it works now is:
> 1) all characters outside of the [a-zA-Z0-9_-] class are replaced with their hex value (seems to be the easiest way);

I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".

> 2) a hash of the cache name is added at the end of the name to avoid case-insensitive collisions.
> There is still a tiny chance of hitting two cache names that are equal ignoring case which also have the same hash, but that’s really unlikely.

Here I am confused. I think the cache names should be case insensitive at all times. I seriously doubt enforcing this rule would cause problems. If we enforce this rule at cache creation time, then we would not have to add a hashcode at the end.

> It seems that there are no complications with this approach.
> The cache name to directory mapping is like
> mycache -> cache-mycache-f19fd83d
> my/cool/cache -> cache-my2fcool2fcache

As mentioned above, I would prefer "cache-my_2f_cool_2f_cache"

> my!@#$%^&()cache -> cache-my21402324255e262829cache-84ba3e99
>
> Turns out the persistence is not the only place that doesn’t like special symbols in cache names – I also got an exception from MBean registration when creating a cache with ‘*’ or ‘?’. Filed https://issues.apache.org/jira/browse/IGNITE-7334 for that.
>
> Please let me know if you have any comments.
>
> Thanks,
> Stan
>
> From: Stanislav Lukyanov
> Sent: December 25, 2017, 18:09
> To: dev@ignite.apache.org
> Subject: Handling slashes in cache names
>
> Hi all,
>
> I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I need some guidance on what’s the best way to approach it.
>
> The problem is that cache names are not restricted, but if persistence is enabled the cache needs to have a corresponding directory on the file system (“cache-…”) which can’t be created if the cache name contains certain characters (or a reserved system name).
>
> A straightforward approach would be to check if a cache name is allowed on the local system (e.g. via `Paths.get(name)`) and fail to create cache if it isn’t, but I’m a bit concerned with the consistency of the behavior (the same cache name may be allowed on one system and not on another).
> I think a better way would be to replace special characters (say, all non-alphanumeric characters) with underscores in file names (not changing the cache configuration). Would this be OK? Are there any risks I’m not considering?
>
> WDYT?
>
> Thanks,
> Stan
Re: Handling slashes in cache names
On Thu, Dec 28, 2017 at 9:22 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> Hi all,
>
> I’ve implemented an approach of encoding unsafe characters in the cache names for persistent storage directories. You can find it at https://github.com/gridgain/apache-ignite/tree/ignite-7264.
> How it works now is:
> 1) all characters outside of the [a-zA-Z0-9_-] class are replaced with their hex value (seems to be the easiest way);

I would surround such replacements with "_", e.g. "myCacheName_somesymbol_".

> 2) a hash of the cache name is added at the end of the name to avoid case-insensitive collisions.
> There is still a tiny chance of hitting two cache names that are equal ignoring case which also have the same hash, but that’s really unlikely.

Here I am confused. I think the cache names should be case insensitive at all times. I seriously doubt enforcing this rule would cause problems. If we enforce this rule at cache creation time, then we would not have to add a hashcode at the end.

> It seems that there are no complications with this approach.
> The cache name to directory mapping is like
> mycache -> cache-mycache-f19fd83d
> my/cool/cache -> cache-my2fcool2fcache

As mentioned above, I would prefer "cache-my_2f_cool_2f_cache"

> my!@#$%^&()cache -> cache-my21402324255e262829cache-84ba3e99
>
> Turns out the persistence is not the only place that doesn’t like special symbols in cache names – I also got an exception from MBean registration when creating a cache with ‘*’ or ‘?’. Filed https://issues.apache.org/jira/browse/IGNITE-7334 for that.
>
> Please let me know if you have any comments.
>
> Thanks,
> Stan
>
> From: Stanislav Lukyanov
> Sent: December 25, 2017, 18:09
> To: dev@ignite.apache.org
> Subject: Handling slashes in cache names
>
> Hi all,
>
> I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I need some guidance on what’s the best way to approach it.
>
> The problem is that cache names are not restricted, but if persistence is enabled the cache needs to have a corresponding directory on the file system (“cache-…”) which can’t be created if the cache name contains certain characters (or a reserved system name).
>
> A straightforward approach would be to check if a cache name is allowed on the local system (e.g. via `Paths.get(name)`) and fail to create cache if it isn’t, but I’m a bit concerned with the consistency of the behavior (the same cache name may be allowed on one system and not on another).
> I think a better way would be to replace special characters (say, all non-alphanumeric characters) with underscores in file names (not changing the cache configuration). Would this be OK? Are there any risks I’m not considering?
>
> WDYT?
>
> Thanks,
> Stan
RE: Handling slashes in cache names
Hi all,

I’ve implemented an approach of encoding unsafe characters in the cache names for persistent storage directories. You can find it at https://github.com/gridgain/apache-ignite/tree/ignite-7264.

How it works now is:
1) all characters outside of the [a-zA-Z0-9_-] class are replaced with their hex value (seems to be the easiest way);
2) a hash of the cache name is added at the end of the name to avoid case-insensitive collisions.
There is still a tiny chance of hitting two cache names that are equal ignoring case which also have the same hash, but that’s really unlikely.

It seems that there are no complications with this approach.
The cache name to directory mapping is like
mycache -> cache-mycache-f19fd83d
my/cool/cache -> cache-my2fcool2fcache
my!@#$%^&()cache -> cache-my21402324255e262829cache-84ba3e99

Turns out the persistence is not the only place that doesn’t like special symbols in cache names – I also got an exception from MBean registration when creating a cache with ‘*’ or ‘?’. Filed https://issues.apache.org/jira/browse/IGNITE-7334 for that.

Please let me know if you have any comments.

Thanks,
Stan

From: Stanislav Lukyanov
Sent: December 25, 2017, 18:09
To: dev@ignite.apache.org
Subject: Handling slashes in cache names

Hi all,

I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I need some guidance on what’s the best way to approach it.

The problem is that cache names are not restricted, but if persistence is enabled the cache needs to have a corresponding directory on the file system (“cache-…”) which can’t be created if the cache name contains certain characters (or a reserved system name).

A straightforward approach would be to check if a cache name is allowed on the local system (e.g. via `Paths.get(name)`) and fail to create the cache if it isn’t, but I’m a bit concerned with the consistency of the behavior (the same cache name may be allowed on one system and not on another).
I think a better way would be to replace special characters (say, all non-alphanumeric characters) with underscores in file names (not changing the cache configuration). Would this be OK? Are there any risks I’m not considering?

WDYT?

Thanks,
Stan
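The two-step mapping described above (hex-encode unsafe characters, then append a hash) might look roughly like the sketch below. This is not the code from the ignite-7264 branch: the class name is hypothetical, encoding per UTF-8 byte is an assumption, and String.hashCode() stands in for whatever hash the branch uses, so the suffixes will not match the ones in the examples.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating the described mapping; not the ignite-7264 code. */
final class CacheDirNames {
    /** Prefix of persistence directories mentioned in the thread. */
    static final String PREFIX = "cache-";

    /** Hex-encodes every UTF-8 byte outside [a-zA-Z0-9_-]; safe characters pass through. */
    static String hexEscape(String cacheName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : cacheName.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                || (v >= '0' && v <= '9') || v == '_' || v == '-';
            if (safe)
                sb.append((char) v);
            else
                sb.append(String.format("%02x", v));
        }
        return sb.toString();
    }

    /** Builds the directory name; String.hashCode() is an assumed stand-in for the real hash. */
    static String dirName(String cacheName) {
        return PREFIX + hexEscape(cacheName) + "-" + String.format("%08x", cacheName.hashCode());
    }
}
```

For the names in the examples, hexEscape("my/cool/cache") gives "my2fcool2fcache" and hexEscape("my!@#$%^&()cache") gives "my21402324255e262829cache", which matches the escaped part of the listed directory names.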
Re: Handling slashes in cache names
On Wed, Dec 27, 2017 at 8:05 AM, Pavel Tupitsyn wrote:
> Yep, base64 is just an example.
> We need some kind of urlencode, but tailored for file names, so that names remain readable.
>
> To avoid uppercase/lowercase collisions on Windows, we can restrict allowed characters to lowercase English letters and numbers, - and _, and escape everything else in some way.

I think that we should allow users to specify any case they like, but internally we should always convert to upper or lower case, whichever one we choose.
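A minimal sketch of "keep the user's spelling, compare in one case", assuming lower case is the chosen internal form. The registry class is hypothetical; Locale.ROOT is used so that the result does not depend on the JVM's default locale. Case folding on a real file system may differ in corner cases from String.toLowerCase.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry, not Ignite code: names are unique ignoring case. */
final class CaseInsensitiveNames {
    /** Lower-cased key -> name exactly as the user first specified it. */
    private final Map<String, String> byKey = new HashMap<>();

    /** Registers a name; returns false if a name equal ignoring case already exists. */
    boolean register(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        return byKey.putIfAbsent(key, name) == null;
    }

    /** Returns the originally registered spelling for any casing of the name, or null. */
    String original(String name) {
        return byKey.get(name.toLowerCase(Locale.ROOT));
    }
}
```

Registering "MyCache" and then "mycache" accepts the first and rejects the second, while original("MYCACHE") still returns "MyCache".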
Re: Handling slashes in cache names
Igniters,

Using the cache name for file and directory names on a file system is a bad idea. In that case we have to keep in mind the many limitations of various file systems. Why not map the cache name to an identifier that is safe on any file system?

On Wed, Dec 27, 2017 at 7:05 PM, Pavel Tupitsyn wrote:
> Yep, base64 is just an example.
> We need some kind of urlencode, but tailored for file names, so that names remain readable.
>
> To avoid uppercase/lowercase collisions on Windows, we can restrict allowed characters to lowercase English letters and numbers, - and _, and escape everything else in some way.
>
> On Wed, Dec 27, 2017 at 5:36 PM, Dmitriy Setrakyan wrote:
>
> > On Wed, Dec 27, 2017 at 6:25 AM, Vladimir Ozerov wrote:
> >
> > > Having different policies for persistent and non-persistent caches sounds like a bad idea for me, because there could be troubles should user try to switch to persistent mode. It would require code changes.
> > >
> > > Can we just escape all non-latin symbols (e.g. using base64), while leaving the rest as is? With this approach in most cases cache name will remain the same, and only multibyte characters would be affected.
> >
> > Agree, if we can keep cache names in human readable form. Would be nice to see some examples.

--
Sergey Kozlov
GridGain Systems
www.gridgain.com
Re: Handling slashes in cache names
Yep, base64 is just an example.
We need some kind of urlencode, but tailored for file names, so that names remain readable.

To avoid uppercase/lowercase collisions on Windows, we can restrict allowed characters to lowercase English letters and numbers, - and _, and escape everything else in some way.

On Wed, Dec 27, 2017 at 5:36 PM, Dmitriy Setrakyan wrote:
> On Wed, Dec 27, 2017 at 6:25 AM, Vladimir Ozerov wrote:
> > Having different policies for persistent and non-persistent caches sounds like a bad idea for me, because there could be troubles should user try to switch to persistent mode. It would require code changes.
> >
> > Can we just escape all non-latin symbols (e.g. using base64), while leaving the rest as is? With this approach in most cases cache name will remain the same, and only multibyte characters would be affected.
>
> Agree, if we can keep cache names in human readable form. Would be nice to see some examples.
Re: Handling slashes in cache names
On Wed, Dec 27, 2017 at 6:25 AM, Vladimir Ozerov wrote:
> Having different policies for persistent and non-persistent caches sounds like a bad idea for me, because there could be troubles should user try to switch to persistent mode. It would require code changes.
>
> Can we just escape all non-latin symbols (e.g. using base64), while leaving the rest as is? With this approach in most cases cache name will remain the same, and only multibyte characters would be affected.

Agree, if we can keep cache names in human readable form. Would be nice to see some examples.
Re: Handling slashes in cache names
Having different policies for persistent and non-persistent caches sounds like a bad idea to me, because there could be trouble should a user try to switch to persistent mode. It would require code changes.

Can we just escape all non-Latin symbols (e.g. using base64), while leaving the rest as is? With this approach, in most cases the cache name will remain the same, and only multibyte characters would be affected.

On Wed, Dec 27, 2017 at 5:15 PM, Dmitriy Setrakyan wrote:
> On Wed, Dec 27, 2017 at 3:42 AM, Pavel Tupitsyn wrote:
>
> > Agree with Stan and Vladimir.
> > We should not impose any restrictions on cache names, some users may have issues with that.
> >
> > Using cache names as file names is internal implementation detail.
> > We can use cache id or some kind of encoding (base64, etc) to avoid file system issues.
>
> Pavel, I disagree. I want to look at the file system and be able to clearly tell which folder belongs to which cache. If you use encryption or some other encoding, this would be impossible.
>
> I doubt that introducing cache name validation for *persistent* caches would affect any existing users. It sounds like for non-persistent caches the validation is not needed, right?
>
> D.
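For reference, a minimal sketch of what Base64 does to a cache name, using the JDK's URL-safe alphabet so that the output contains no "/" or "+". It encodes the whole name, which is the simplest variant and not the partial escaping suggested above; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper, not Ignite code: whole-name URL-safe Base64. */
final class Base64Names {
    /** Encodes the UTF-8 bytes of the name with the URL-safe alphabet, without '=' padding. */
    static String encode(String cacheName) {
        return Base64.getUrlEncoder().withoutPadding()
            .encodeToString(cacheName.getBytes(StandardCharsets.UTF_8));
    }

    /** Restores the cache name from an encoded directory name. */
    static String decode(String dirName) {
        return new String(Base64.getUrlDecoder().decode(dirName), StandardCharsets.UTF_8);
    }
}
```

Two properties are worth noting. Even a plain ASCII name loses readability: encode("mycache") is "bXljYWNoZQ". The Base64 alphabet also mixes upper and lower case, so the output is only unambiguous on a case-sensitive file system.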
Re: Handling slashes in cache names
On Wed, Dec 27, 2017 at 3:42 AM, Pavel Tupitsyn wrote:
> Agree with Stan and Vladimir.
> We should not impose any restrictions on cache names, some users may have issues with that.
>
> Using cache names as file names is internal implementation detail.
> We can use cache id or some kind of encoding (base64, etc) to avoid file system issues.

Pavel, I disagree. I want to look at the file system and be able to clearly tell which folder belongs to which cache. If you use encryption or some other encoding, this would be impossible.

I doubt that introducing cache name validation for *persistent* caches would affect any existing users. It sounds like for non-persistent caches the validation is not needed, right?

D.
Re: Handling slashes in cache names
Also, considering the case-insensitivity issue, we need to choose an encoding whose output uses only upper-case or only lower-case letters. By the way, such an encoding will also resolve cache name clashes caused by case-insensitivity.

Best Regards,
Igor

On Wed, Dec 27, 2017 at 4:18 PM, Igor Sapego <isap...@apache.org> wrote:
> I personally like a Pavel's suggestion - base64 encoding seems like a good solution, while string hashes will arise a collision issue.
>
> Best Regards,
> Igor
>
> On Wed, Dec 27, 2017 at 3:29 PM, Petr Ivanov <mr.wei...@gmail.com> wrote:
>
>> Special characters banning seems to be exclusive way and cannot be controlled in future if new symbols arise.
>> Maybe better solution will be choosing the array of permitted symbols for caches names (i.e. [a-zA-Z0-9_-])?
>>
>> Also +1 for using abstract hash string for directories names.
>>
>> > On 27 Dec 2017, at 15:14, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
>> >
>> > We can – by mapping a cache name to some (safe) string to be used as a directory name, say via Base64 as Pavel has suggested.
>> >
>> > However, I think that banning certain characters might be reasonable.
>> > Some characters might be considered reserved (e.g. slashes, colon, asterisk, etc) to be used later, in case some future feature requires cache names to have an actual meaning.
>> > Some characters might be banned just as a precaution (e.g. control characters or whitespaces) because they might cause problems with logging or elsewhere (you might have a bad time processing a cache name with \0 in it :) ).
>> >
>> > The question is whether or not these considerations worth adding code and/or changing existing behavior.
>> >
>> > BTW Java folks had similar discussion on Java module names resulting in http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-December/000515.html.
>> >
>> > Thanks,
>> > Stan
>> >
>> > From: Vladimir Ozerov
>> > Sent: December 27, 2017, 14:37
>> > To: dev@ignite.apache.org
>> > Subject: Re: Handling slashes in cache names
>> >
>> > Cache name appears to me purely logical entity. Can we simply store cache ID in file system paths without adding any restrictions to cache names?
>> >
>> > On Wed, Dec 27, 2017 at 2:26 PM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
>> >
>> >> Well, that’s my question too :)
>> >> Do we have any compatibility guidelines or other documents on what can or cannot be in a minor/major release?
>> >>
>> >> Also, it might be helpful to add an environment variable (like IGNITE_DISABLE_CACHE_NAME_RESTRICTIONS) to restore the old behavior, just in case.
>> >>
>> >> Thanks,
>> >> Stan
>> >>
>> >> From: Dmitriy Setrakyan
>> >> Sent: December 26, 2017, 17:02
>> >> To: dev@ignite.apache.org
>> >> Subject: Re: Handling slashes in cache names
>> >>
>> >> Looks good to me. Is this going to be an exception on startup? If yes, is it safe to release it, or should we wait till 3.0?
>> >>
>> >> On Tue, Dec 26, 2017 at 2:08 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
>> >>
>> >>> Thanks for the feedback.
>> >>>
>> >>> It seems that another thing to handle is case-insensitive FS – “mycache” and “MyCache” is the same on Windows, so it might be reasonable to disallow having two caches with names that are equal ignoring case.
>> >>> And one more thing is control characters – forbidding at least range of ASCII 0x00-0x20 seems reasonable.
>> >>>
>> >>> To summarize, a possible set of restrictions would be
>> >>> - Whitespace characters (via Character.isWhitespace)
>> >>> - Control characters (via Character.isISOControl)
>> >>> - Slashes
>> >>> - Characters reserved in Windows (<>:"/\|?*)
>> >>> - Length (say, up to 255)
>> >>> - Distinct names of caches when ignoring case
>> >>> It seems reasonable to enforce that even regardless of persistence directories naming (AFAIU tha
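The per-name restrictions in the quoted summary can be sketched as a single validation method. This is an illustration under assumptions, not Ignite code: the class and method names are hypothetical, the checks use the JDK methods Character.isWhitespace and Character.isISOControl, and the limit of 255 is the "say, up to 255" figure from the list. The rule about distinct names ignoring case is left out because it needs the set of existing cache names.

```java
import java.util.Optional;

/** Hypothetical validator, not Ignite code: per-name restrictions from the summary. */
final class CacheNameRules {
    /** Maximum name length suggested in the thread. */
    static final int MAX_LEN = 255;

    /** Characters reserved in Windows file names, including both slashes. */
    static final String WINDOWS_RESERVED = "<>:\"/\\|?*";

    /** Returns a description of the first violated rule, or empty if the name passes. */
    static Optional<String> violation(String name) {
        if (name.length() > MAX_LEN)
            return Optional.of("longer than " + MAX_LEN + " characters");
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (Character.isWhitespace(cp))
                return Optional.of("whitespace at index " + i);
            if (Character.isISOControl(cp))
                return Optional.of("control character at index " + i);
            if (WINDOWS_RESERVED.indexOf(cp) >= 0)
                return Optional.of("reserved character at index " + i);
            i += Character.charCount(cp);
        }
        return Optional.empty();
    }
}
```

A name such as "my/cache" is rejected for the slash and a name containing \0 is rejected as a control character, while "мои_данные" passes, since none of the listed rules restricts non-ASCII letters.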
Re: Handling slashes in cache names
I personally like Pavel's suggestion - base64 encoding seems like a good solution, while string hashes will raise a collision issue.

Best Regards,
Igor

On Wed, Dec 27, 2017 at 3:29 PM, Petr Ivanov <mr.wei...@gmail.com> wrote:
> Special characters banning seems to be exclusive way and cannot be controlled in future if new symbols arise.
> Maybe better solution will be choosing the array of permitted symbols for caches names (i.e. [a-zA-Z0-9_-])?
>
> Also +1 for using abstract hash string for directories names.
>
> > On 27 Dec 2017, at 15:14, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> >
> > We can – by mapping a cache name to some (safe) string to be used as a directory name, say via Base64 as Pavel has suggested.
> >
> > However, I think that banning certain characters might be reasonable.
> > Some characters might be considered reserved (e.g. slashes, colon, asterisk, etc) to be used later, in case some future feature requires cache names to have an actual meaning.
> > Some characters might be banned just as a precaution (e.g. control characters or whitespaces) because they might cause problems with logging or elsewhere (you might have a bad time processing a cache name with \0 in it :) ).
> >
> > The question is whether or not these considerations worth adding code and/or changing existing behavior.
> >
> > BTW Java folks had similar discussion on Java module names resulting in http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-December/000515.html.
> >
> > Thanks,
> > Stan
> >
> > From: Vladimir Ozerov
> > Sent: December 27, 2017, 14:37
> > To: dev@ignite.apache.org
> > Subject: Re: Handling slashes in cache names
> >
> > Cache name appears to me purely logical entity. Can we simply store cache ID in file system paths without adding any restrictions to cache names?
> >
> > On Wed, Dec 27, 2017 at 2:26 PM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> >
> >> Well, that’s my question too :)
> >> Do we have any compatibility guidelines or other documents on what can or cannot be in a minor/major release?
> >>
> >> Also, it might be helpful to add an environment variable (like IGNITE_DISABLE_CACHE_NAME_RESTRICTIONS) to restore the old behavior, just in case.
> >>
> >> Thanks,
> >> Stan
> >>
> >> From: Dmitriy Setrakyan
> >> Sent: December 26, 2017, 17:02
> >> To: dev@ignite.apache.org
> >> Subject: Re: Handling slashes in cache names
> >>
> >> Looks good to me. Is this going to be an exception on startup? If yes, is it safe to release it, or should we wait till 3.0?
> >>
> >> On Tue, Dec 26, 2017 at 2:08 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> >>
> >>> Thanks for the feedback.
> >>>
> >>> It seems that another thing to handle is case-insensitive FS – “mycache” and “MyCache” is the same on Windows, so it might be reasonable to disallow having two caches with names that are equal ignoring case.
> >>> And one more thing is control characters – forbidding at least range of ASCII 0x00-0x20 seems reasonable.
> >>>
> >>> To summarize, a possible set of restrictions would be
> >>> - Whitespace characters (via Character.isWhitespace)
> >>> - Control characters (via Character.isISOControl)
> >>> - Slashes
> >>> - Characters reserved in Windows (<>:"/\|?*)
> >>> - Length (say, up to 255)
> >>> - Distinct names of caches when ignoring case
> >>> It seems reasonable to enforce that even regardless of persistence directories naming (AFAIU that’s what Dmitry meant by forbidding things altogether), so that’s what I’m going to do.
> >>> Any concerns?
> >>> Specifically, would it be OK from backward compatibility point of view to forbid all these characters now for all caches?
> >>>
> >>> Thanks,
> >>> Stan
> >>>
> >>>
> >>> From: Alexey Kuznetsov
> >>> Sent: December 26, 2017, 7:51
> >>> To: dev@ignite.apache.org
> >>> Subject: Re: Handling slashes in cache names
> >>>
> >>> It also make sense to limit cache name length to reasonable length.
> >>> Because some File systems coul
Re: Handling slashes in cache names
Banning special characters seems to be an exclusion-based approach, and it cannot be kept under control if new problematic symbols turn up in the future. Maybe a better solution would be to choose a set of permitted symbols for cache names (e.g. [a-zA-Z0-9_-])?

Also +1 for using an abstract hash string for directory names.
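A minimal sketch of the whitelist idea above, assuming the permitted set is exactly [a-zA-Z0-9_-] and that an empty name is rejected too. The class and method names are hypothetical and are not part of Ignite.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Ignite: whitelist check for cache names. */
final class CacheNameWhitelist {
    // Permitted symbols as suggested in the thread: ASCII letters, digits, underscore, hyphen.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9_-]+");

    /** Returns true only if the name is non-empty and every character is whitelisted. */
    static boolean isAllowed(String cacheName) {
        return cacheName != null && ALLOWED.matcher(cacheName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("my_cache-1")); // true
        System.out.println(isAllowed("my/cache"));   // false
    }
}
```

Note that such a whitelist also rejects names in non-Latin alphabets, which is the trade-off against the "no restrictions" position raised elsewhere in the thread.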
RE: Handling slashes in cache names
We can, by mapping a cache name to some (safe) string to be used as a directory name, say via Base64 as Pavel has suggested.

However, I think that banning certain characters might be reasonable. Some characters might be considered reserved (e.g. slashes, colon, asterisk, etc.) to be used later, in case some future feature requires cache names to have an actual meaning. Some characters might be banned just as a precaution (e.g. control characters or whitespace) because they might cause problems with logging or elsewhere (you might have a bad time processing a cache name with \0 in it :) ).

The question is whether or not these considerations are worth adding code and/or changing existing behavior.

BTW, the Java folks had a similar discussion on Java module names, resulting in http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-December/000515.html.

Thanks,
Stan

From: Vladimir Ozerov
Sent: 27 December 2017, 14:37
To: dev@ignite.apache.org
Subject: Re: Handling slashes in cache names

Cache name appears to me purely logical entity. Can we simply store cache ID in file system paths without adding any restrictions to cache names?
Re: Handling slashes in cache names
Agree with Stan and Vladimir. We should not impose any restrictions on cache names; some users may have issues with that. Using cache names as file names is an internal implementation detail. We can use the cache ID or some kind of encoding (base64, etc.) to avoid file system issues.

Thanks,
Pavel
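A rough sketch of the encoding option mentioned above, assuming the URL-safe Base64 alphabet (which contains neither slash) and the “cache-” prefix used for persistence directories. The class and method names are hypothetical, not Ignite API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper, not part of Ignite: reversible cache name to directory name mapping. */
final class CacheDirNames {
    private static final String PREFIX = "cache-";

    /** Encodes the UTF-8 bytes of the name with URL-safe Base64 (alphabet A-Z, a-z, 0-9, '-', '_'). */
    static String toDirName(String cacheName) {
        byte[] utf8 = cacheName.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getUrlEncoder().withoutPadding().encodeToString(utf8);
    }

    /** Restores the original cache name from a directory name produced by toDirName. */
    static String toCacheName(String dirName) {
        String encoded = dirName.substring(PREFIX.length());
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String dir = toDirName("my/cache");
        System.out.println(dir);
        System.out.println(toCacheName(dir));
    }
}
```

Two caveats with this sketch: Base64 output is case-sensitive, so on a case-insensitive file system two different cache names could in principle map to directory names that differ only by case, and the encoded form is about a third longer than the UTF-8 name, which matters for the length limits discussed in this thread.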
RE: Handling slashes in cache names
That’s interesting, thanks. So, do you think the locale-specific file separators should be banned as well? Handling all kinds of cases like this might be complicated.

I’d rather use something else if the cache name is not a valid file name, say a hash of the cache name. This way all corner cases can be handled at once. The algorithm would be
1) Check that the cache name doesn’t contain banned characters
2) Try to create a Path for “cache-<name>”
3) If that fails, create a Path for “cache-<hash of name>”

Stan
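The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the banned set (forward and backward slashes) and the choice of SHA-256 as the hash are not specified in the thread, and `CacheDirResolver` is a hypothetical name, not Ignite code.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not Ignite code: picks a directory for a cache name. */
final class CacheDirResolver {
    /** Assumed banned set for step 1; the thread does not fix the exact list. */
    private static final String BANNED = "/\\";

    static Path resolve(Path workDir, String cacheName) {
        // Step 1: reject names containing banned characters.
        for (int i = 0; i < cacheName.length(); i++) {
            if (BANNED.indexOf(cacheName.charAt(i)) >= 0)
                throw new IllegalArgumentException("Banned character in cache name: " + cacheName);
        }

        // Step 2: try the readable form "cache-" + name.
        try {
            Path candidate = Paths.get("cache-" + cacheName);
            if (candidate.getNameCount() == 1)
                return workDir.resolve(candidate);
        }
        catch (InvalidPathException ignored) {
            // The local file system rejects this name; fall through to the hash.
        }

        // Step 3: fall back to "cache-" + hash of the name.
        return workDir.resolve("cache-" + sha256Hex(cacheName));
    }

    /** Hex-encoded SHA-256 of the UTF-8 bytes of the string (assumed hash choice). */
    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

One consequence worth keeping in mind: because step 2 depends on the local file system, a name such as "a:b" would get a readable directory on Linux and a hashed one on Windows, so the same cache may end up in differently named directories on different nodes.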
Re: Handling slashes in cache names
A cache name appears to me to be a purely logical entity. Can we simply store the cache ID in file system paths without adding any restrictions to cache names?

On Wed, Dec 27, 2017 at 2:26 PM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> Well, that’s my question too :)
> Do we have any compatibility guidelines or other documents on what can or cannot be in a minor/major release?
>
> Also, it might be helpful to add an environment variable (like IGNITE_DISABLE_CACHE_NAME_RESTRICTIONS) to restore the old behavior, just in case.
>
> Thanks,
> Stan
Re: Handling slashes in cache names
There are also some international features that you might want to address. For example, on Windows other characters may be used instead of the backslash: ¥ on the Japanese version, ₩ on the Korean version. See [1] for more info.

Here is the citation:

Security Considerations for Character Sets in File Names
Windows code page and OEM character sets used on Japanese-language systems contain the Yen symbol (¥) instead of a backslash (\). Thus, the Yen character is a prohibited character for NTFS and FAT file systems. When mapping Unicode to a Japanese-language code page, conversion functions map both backslash (U+005C) and the normal Unicode Yen symbol (U+00A5) to this same character. For security reasons, your applications should not typically allow the character U+00A5 in a Unicode string that might be converted for use as a FAT file name.

[1] - https://msdn.microsoft.com/en-us/library/dd374047(v=vs.85).aspx

Best Regards,
Igor
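If one wanted to act on the citation above, a check could look roughly like this. It is only a sketch: the set of separator-like characters (backslash, the yen sign U+00A5, and the won sign U+20A9) is taken from the message, and the class and method names are hypothetical.

```java
/** Hypothetical helper, not Ignite code: detects characters that may act as a backslash. */
final class LocaleSeparatorCheck {
    /** Returns true if the name contains a backslash, a yen sign or a won sign. */
    static boolean containsSeparatorLike(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // U+005C backslash, U+00A5 yen sign, U+20A9 won sign.
            if (c == '\\' || c == '\u00A5' || c == '\u20A9')
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsSeparatorLike("prices\u00A5jp")); // true
        System.out.println(containsSeparatorLike("prices_jp"));      // false
    }
}
```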
Re: Handling slashes in cache names
Looks good to me. Is this going to be an exception on startup? If yes, is it safe to release it, or should we wait till 3.0?
RE: Handling slashes in cache names
Thanks for the feedback.

It seems that another thing to handle is case-insensitive file systems: “mycache” and “MyCache” are the same on Windows, so it might be reasonable to disallow having two caches with names that are equal ignoring case. And one more thing is control characters: forbidding at least the ASCII range 0x00-0x20 seems reasonable.

To summarize, a possible set of restrictions would be
- Whitespace characters (via Character.isWhitespace)
- Control characters (via Character.isISOControl)
- Slashes
- Characters reserved in Windows (<>:"/\|?*)
- Length (say, up to 255)
- Distinct names of caches when ignoring case

It seems reasonable to enforce that regardless of persistence directory naming (AFAIU that’s what Dmitriy meant by forbidding things altogether), so that’s what I’m going to do. Any concerns? Specifically, would it be OK from a backward compatibility point of view to forbid all these characters now for all caches?

Thanks,
Stan
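The restriction list above could be checked along these lines. This is a sketch, not the eventual Ignite change: `CacheNameRestrictions` and its methods are hypothetical, the length limit is counted in Java chars, and case-insensitive comparison uses `Locale.ROOT` lower-casing as an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not Ignite code: applies the restrictions listed in the thread. */
final class CacheNameRestrictions {
    /** Characters reserved in Windows file names (this set already includes both slashes). */
    private static final String WINDOWS_RESERVED = "<>:\"/\\|?*";

    /** Length limit suggested in the thread ("say, up to 255"). */
    private static final int MAX_LENGTH = 255;

    /** Returns null if the name passes all checks, otherwise a description of the first violation. */
    static String violation(String name) {
        if (name.length() > MAX_LENGTH)
            return "name is longer than " + MAX_LENGTH + " characters";

        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);

            if (Character.isWhitespace(cp))
                return "whitespace character at index " + i;
            if (Character.isISOControl(cp))
                return "control character at index " + i;
            if (WINDOWS_RESERVED.indexOf(cp) >= 0)
                return "reserved character at index " + i;

            i += Character.charCount(cp);
        }
        return null;
    }

    /** Returns true if two names in the collection are equal when ignoring case. */
    static boolean hasCaseInsensitiveDuplicates(Collection<String> names) {
        Set<String> seen = new HashSet<>();
        for (String n : names) {
            if (!seen.add(n.toLowerCase(Locale.ROOT)))
                return true;
        }
        return false;
    }
}
```

Returning a description rather than throwing leaves the caller free to decide whether a violation is a startup exception or only a warning, which is the compatibility question raised in the replies.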
Re: Handling slashes in cache names
It also makes sense to limit the cache name length to a reasonable value, because some file systems have limitations on path length.
See: https://en.wikipedia.org/wiki/Filename#Length_restrictions

On Tue, Dec 26, 2017 at 1:41 AM, Dmitriy Setrakyan wrote:
> My preference would be to prohibit forward and backward slashes in cache
> names altogether, as they may create a false feeling of some directory
> structure, which does not exist. We should prohibit spaces as well.
>
> D.

--
Alexey Kuznetsov
Re: Handling slashes in cache names
My preference would be to prohibit forward and backward slashes in cache names altogether, as they may create a false feeling of some directory structure, which does not exist. We should prohibit spaces as well.

D.

On Mon, Dec 25, 2017 at 7:09 AM, Stanislav Lukyanov wrote:
> Hi all,
>
> I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I
> need some guidance on what’s the best way to approach it.
>
> The problem is that cache names are not restricted, but if persistence is
> enabled the cache needs to have a corresponding directory on the file
> system (“cache-…”) which can’t be created if the cache name contains
> certain characters (or a reserved system name).
>
> A straightforward approach would be to check if a cache name is allowed on
> the local system (e.g. via `Paths.get(name)`) and fail to create the cache if
> it isn’t, but I’m a bit concerned with the consistency of the behavior (the
> same cache name may be allowed on one system and not on another).
> I think a better way would be to replace special characters (say, all
> non-alphanumeric characters) with underscores in file names (not changing
> the cache configuration). Would this be OK? Are there any risks I’m not
> considering?
>
> WDYT?
>
> Thanks,
> Stan
Handling slashes in cache names
Hi all,

I’m looking into https://issues.apache.org/jira/browse/IGNITE-7264, and I need some guidance on what’s the best way to approach it.

The problem is that cache names are not restricted, but if persistence is enabled the cache needs to have a corresponding directory on the file system (“cache-…”) which can’t be created if the cache name contains certain characters (or a reserved system name).

A straightforward approach would be to check if a cache name is allowed on the local system (e.g. via `Paths.get(name)`) and fail to create the cache if it isn’t, but I’m a bit concerned with the consistency of the behavior (the same cache name may be allowed on one system and not on another).
I think a better way would be to replace special characters (say, all non-alphanumeric characters) with underscores in file names (not changing the cache configuration). Would this be OK? Are there any risks I’m not considering?

WDYT?

Thanks,
Stan
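To make the underscore idea above concrete, here is a small Java sketch of such a replacement. The class `CacheDirNames` and the method `sanitize` are hypothetical names, and treating "alphanumeric" as ASCII letters and digits only is an assumption. One risk shows up right away: distinct cache names such as "a/b" and "a\b" map to the same file name, so the mapping is not unique on its own.

```java
/** Hypothetical helper (not an Ignite API): derives a file-system-safe name from a cache name. */
final class CacheDirNames {
    private CacheDirNames() { }

    /**
     * Replaces every character that is not an ASCII letter or digit with an underscore.
     * The cache configuration itself is not changed; only the derived file name is.
     */
    static String sanitize(String cacheName) {
        return cacheName.replaceAll("[^A-Za-z0-9]", "_");
    }
}
```

With this sketch, `sanitize("my/cache")` and `sanitize("my_cache")` both give `my_cache`, so a real implementation would need some way to tell such caches apart.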