Re: Fwd: LMDB and text encoding

2017-06-07 Thread Howard Chu
Timur Kristóf wrote: Hi Everyone, I've just come across this old thread and am wondering, is this still an issue? No, it was resolved long ago. Does LMDB have a way to use non-ASCII path names with mdb_env_open in a cross-platform way? If not, would you guys accept patches to LMDB with

Re: Fwd: LMDB and text encoding

2017-06-07 Thread Timur Kristóf
Hi Everyone, I've just come across this old thread and am wondering, is this still an issue? Does LMDB have a way to use non-ASCII path names with mdb_env_open in a cross-platform way? If not, would you guys accept patches to LMDB in this regard? Thanks, Timur

Re: Fwd: LMDB and text encoding

2015-02-15 Thread Florian Weimer
* Timur Kristóf: A path is always a Unicode string, while a DB name can be an arbitrary binary blob. On many POSIX platforms, a path is a blob which does not contain '\000'. These systems do not enforce Unicode encoding at all.

Re: Fwd: LMDB and text encoding

2015-02-15 Thread Timur Kristóf
A path is always a Unicode string, while a DB name can be an arbitrary binary blob. On many POSIX platforms, a path is a blob which does not contain '\000'. These systems do not enforce Unicode encoding at all. My mistake. I was unaware. On those platforms, how do you type a path name

Re: Fwd: LMDB and text encoding

2015-02-15 Thread Florian Weimer
* Timur Kristóf: A path is always a Unicode string, while a DB name can be an arbitrary binary blob. On many POSIX platforms, a path is a blob which does not contain '\000'. These systems do not enforce Unicode encoding at all. My mistake. I was unaware. On those platforms, how do you

Fwd: LMDB and text encoding

2015-02-02 Thread Timur Kristóf
On Mon, Feb 2, 2015 at 3:37 AM, Howard Chu h...@symas.com wrote: Hallvard Breien Furuseth wrote: On 02/02/15 00:40, Howard Chu wrote: It looks OK to me. If no one raises any concerns I'll commit it in a few hours. Some sudden last thoughts: mdb_dump.c also has a check (memchr(key.mv_data,

Fwd: LMDB and text encoding

2015-02-02 Thread Timur Kristóf
I just had a look at how BDB handled this. As you can see they used a TO_TSTRING macro to convert incoming pathnames from UTF8 to UTF16. https://gitorious.org/berkeleydb/berkeleydb/source/347d239a1e44ed4f773ae9274c2a32cf2b8999c0:src/os_windows/os_open.c
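
For readers unfamiliar with what a UTF-8 to UTF-16 path conversion involves, here is a minimal portable sketch in C. It is not BDB's TO_TSTRING macro and not LMDB code; the function name utf8_to_utf16 and its signature are hypothetical, chosen only for illustration. On Windows the result would typically be passed to a wide-character API, and a real implementation would more likely call the platform's own conversion routine than hand-roll the decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of LMDB or BDB): converts a NUL-terminated
 * UTF-8 string into UTF-16 code units and appends a terminating 0.
 * Returns the number of code units written, excluding the terminator,
 * or -1 if the input is malformed or the output buffer is too small. */
static long utf8_to_utf16(const char *in, uint16_t *out, size_t out_cap)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra, i;

        /* Decode the lead byte and find how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;  /* stray continuation byte or invalid lead byte */
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                return -1;  /* also catches a string that ends too early */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= out_cap) return -1;  /* keep room for terminator */
            out[n++] = (uint16_t)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (n + 2 >= out_cap) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= out_cap) return -1;
    out[n] = 0;
    return (long)n;
}
```

A caller would supply a uint16_t buffer and check for -1 before handing the result to the OS. Rejecting malformed input outright, rather than substituting a replacement character, is a design choice of this sketch so that two distinct invalid byte strings cannot silently map to the same path.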

Re: Fwd: LMDB and text encoding

2015-02-02 Thread Howard Chu
Timur Kristóf wrote: I just had a look at how BDB handled this. As you can see they used a TO_TSTRING macro to convert incoming pathnames from UTF8 to UTF16. https://gitorious.org/berkeleydb/berkeleydb/source/347d239a1e44ed4f773ae9274c2a32cf2b8999c0:src/os_windows/os_open.c

Re: Fwd: LMDB and text encoding

2015-02-02 Thread Hallvard Breien Furuseth
On 02. feb. 2015 14:24, Howard Chu wrote: Hallvard Breien Furuseth wrote: I suggest we wait to deal with DB names until we also have a way to deal with filenames. And this time test that it works in practice :-) Hopefully users and programmers will only need one method of handling non-ASCII

Re: Fwd: LMDB and text encoding

2015-02-02 Thread Timur Kristóf
DB names are purely internal to LMDB, so they bear no relation to OS filenames and none of this discussion matters to them. They're exposed to the programmer and the program's users. Either may want them on command-line arguments, in config files, etc. It will be inconvenient if LMDB

Re: Fwd: LMDB and text encoding

2015-02-02 Thread Hallvard Breien Furuseth
On 02. feb. 2015 16:03, Timur Kristóf wrote: A path is always a Unicode string, while a DB name can be an arbitrary binary blob. So I don't think that we can treat them the same way. Not the point. A program which uses LMDB can choose to treat its own DB names in its own LMDB environments as

Re: Fwd: LMDB and text encoding

2015-02-02 Thread Hallvard Breien Furuseth
On 02. feb. 2015 16:25, Timur Kristóf wrote: Okay. What do you suggest? I suggest we wait to deal with DB names until we also have a way to deal with filenames. And this time test that it works in practice. And then I also suggest trying to make this mess simple to deal with for programmers

Re: Fwd: LMDB and text encoding

2015-02-02 Thread Timur Kristóf
I suggest we wait to deal with DB names until we also have a way to deal with filenames. And this time test that it works in practice. And then I also suggest trying to make this mess simple to deal with for programmers and/or users. I guess I should have separated that from the rest more