Re: LMDB and text encoding

2015-02-01 Thread Howard Chu
Timur Kristóf wrote: Hi Everyone, I've been talking to Howard about this and he suggested posting it to this mailing list. There are two things that I recently noticed about how LMDB works with various encodings and I think they're worth discussing. 2. Path names Functions like mdb_env_open, m

Re: LMDB and text encoding

2015-02-01 Thread Howard Chu
Hallvard Breien Furuseth wrote: On 02/02/15 00:40, Howard Chu wrote: It looks OK to me. If no one raises any concerns, I'll commit it in a few hours. Some sudden last thoughts: mdb_dump.c also has a check (memchr(key.mv_data, '\0', key.mv_size)) to exclude non-databases, which is no longer valid.

Re: LMDB and text encoding

2015-02-01 Thread Hallvard Breien Furuseth
On 02/02/15 02:00, Hallvard Breien Furuseth wrote: Come to think of it, I have no idea if the dump format supports DB names with \0 in them. ...and there will now be database names which cannot be spelled on the command line, like for . I don't think that was quite the point.
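To illustrate the command-line problem: argv[] entries, like any char* parameter, are NUL-terminated C strings, so a name containing an embedded NUL byte is cut off at that byte. A minimal C sketch, where the name bytes are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative database name with an embedded NUL byte. With a
 * (pointer, length) name such as an MDB_val, all 4 bytes are kept. */
static const char name_bytes[] = {'a', 'b', '\0', 'c'};
static const size_t name_len = sizeof name_bytes;   /* 4 bytes */

/* What a C-string consumer (argv[] or a char* API) sees: only the bytes
 * before the first NUL, so "ab\0c" collapses to "ab". */
static size_t c_string_view_len(const char *s)
{
    return strlen(s);
}
```

Here name_len is 4 while c_string_view_len(name_bytes) is 2: the trailing "c" is unreachable through any NUL-terminated interface.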

Re: LMDB and text encoding

2015-02-01 Thread Hallvard Breien Furuseth
On 02/02/15 00:40, Howard Chu wrote: It looks OK to me. If no one raises any concerns, I'll commit it in a few hours. Some sudden last thoughts: mdb_dump.c also has a check (memchr(key.mv_data, '\0', key.mv_size)) to exclude non-databases, which is no longer valid. Database names with \0 in them c
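A minimal sketch of why that filter breaks once names are length-counted. The struct is a hypothetical stand-in with the same fields as MDB_val (so it compiles without lmdb.h), and the function only mimics the memchr test quoted above; it is not the actual mdb_dump.c code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LMDB's MDB_val (same field names). */
typedef struct {
    size_t mv_size;
    void  *mv_data;
} val_t;

/* Mimics the old filter: a main-DB key containing a NUL byte could not
 * have come from a C-string name, so it is treated as "not a database".
 * Returns 1 if the key passes the filter, 0 if it is skipped. */
static int passes_old_name_filter(const val_t *key)
{
    return memchr(key->mv_data, '\0', key->mv_size) == NULL;
}
```

A key such as {3, "a\0b"} is now a legal database name, yet this filter returns 0 for it, so a dump tool relying on it would silently skip that database.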

Re: LMDB and text encoding

2015-02-01 Thread Howard Chu
Timur Kristóf wrote: Hi, I forgot to add an ENOMEM check. I added it now. I think this patch is ready for Howard and Hallvard to review. :) It looks OK to me. If no one raises any concerns, I'll commit it in a few hours. Timur On Thu, Jan 29, 2015 at 2:42 PM, Timur Kristóf wrote: Here is a fi

Re: LMDB and text encoding

2015-02-01 Thread Timur Kristóf
Hi, I forgot to add an ENOMEM check. I added it now. I think this patch is ready for Howard and Hallvard to review. :) Timur On Thu, Jan 29, 2015 at 2:42 PM, Timur Kristóf wrote: > Here is a fixed version of the patch. > > On Thu, Jan 29, 2015 at 10:29 AM, Timur Kristóf > wrote: >>> mdb_dbi_o

Re: LMDB and text encoding

2015-01-29 Thread Timur Kristóf
Here is a fixed version of the patch. On Thu, Jan 29, 2015 at 10:29 AM, Timur Kristóf wrote: >> mdb_dbi_open treats its name parameter as a C string. This means UTF-8 on >> unixes and ANSI on Windows, which is problematic for cross-platform >> applications. [...] > > Here is a patch that addresse

Re: LMDB and text encoding

2015-01-29 Thread Timur Kristóf
I've had a brief chat with Hallvard on IRC. We came up with several possible solutions, although each of them has its drawbacks. Writing cross-platform code that supports Unicode is always a messy business. I vote for option 4, but would like to hear everyone's opinions before starting to work on a

Re: LMDB and text encoding

2015-01-29 Thread Hallvard Breien Furuseth
I wrote: Though I notice Windows #defines CreateFile() & co. as CreateFileA or CreateFileW depending on whether or not UNICODE is #defined (and some other macros), without even mentioning this in the CreateFile() doc. I suppose lmdb.h could do something similar, but with docs :-) Whoops, it does
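A sketch of the Windows A/W dispatch pattern applied to a library header. The functions my_env_openA and my_env_openW are hypothetical names invented for illustration, not part of LMDB's API; the stubs only report which variant the macro selected.

```c
#include <wchar.h>

/* Hypothetical narrow and wide entry points (NOT LMDB API).
 * Each stub just returns a tag identifying itself. */
static int my_env_openA(const char *path)    { (void)path; return 'A'; }
static int my_env_openW(const wchar_t *path) { (void)path; return 'W'; }

/* Windows-style dispatch: the public name resolves to the A or W variant
 * depending on whether UNICODE is defined before the header is included.
 * MY_TEXT makes string literals match the selected character type. */
#ifdef UNICODE
#  define my_env_open my_env_openW
#  define MY_TEXT(s)  L##s
#else
#  define my_env_open my_env_openA
#  define MY_TEXT(s)  s
#endif
```

Callers write my_env_open(MY_TEXT("data.mdb")) once and get the narrow or wide call depending on the build, which is the convenience, and the documentation burden, that CreateFile() carries.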

Re: LMDB and text encoding

2015-01-29 Thread Hallvard Breien Furuseth
My take: On 27. jan. 2015 22:39, Timur Kristóf wrote: > 1. Database names MDB_val here sounds nice... 2. Path names Functions like mdb_env_open, mdb_env_get_path, mdb_env_copy and the like accept a char* for path names. This is fine on most unixes where char* is a UTF-8 string, but unfortun
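One way to handle path names on Windows is to treat the char* argument as UTF-8 and convert it to UTF-16 for the wide-character file APIs. The helper below is a hypothetical, portable sketch of that conversion, not LMDB code, including the kind of ENOMEM check on allocation mentioned elsewhere in this thread. A real Windows build could use MultiByteToWideChar instead.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not LMDB API): decode a NUL-terminated UTF-8 path
 * into a newly allocated, NUL-terminated UTF-16 buffer, as a wrapper would
 * need before calling a wide API such as CreateFileW.
 * Returns 0 on success, ENOMEM if allocation fails, EINVAL on bad UTF-8. */
static int utf8_to_utf16(const char *src, uint16_t **out)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = strlen(src);
    /* Every UTF-16 unit consumes at least one input byte: n + 1 units suffice. */
    uint16_t *dst = malloc((n + 1) * sizeof *dst);
    size_t i = 0, j = 0;

    if (dst == NULL)
        return ENOMEM;

    while (i < n) {
        unsigned char c = s[i];
        uint32_t cp;
        int extra;

        if (c < 0x80)                   { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c < 0xF5) { cp = c & 0x07; extra = 3; }
        else goto bad;                  /* invalid lead byte */

        if (n - i <= (size_t)extra)     /* truncated sequence */
            goto bad;
        for (int k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)    /* not a continuation byte */
                goto bad;
            cp = (cp << 6) | (cc & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates, and values > U+10FFFF. */
        if ((extra == 2 && cp < 0x800) ||
            (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            goto bad;
        i += (size_t)extra + 1;

        if (cp >= 0x10000) {            /* encode as a surrogate pair */
            cp -= 0x10000;
            dst[j++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[j++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            dst[j++] = (uint16_t)cp;
        }
    }
    dst[j] = 0;
    *out = dst;
    return 0;

bad:
    free(dst);
    return EINVAL;
}
```

The caller owns the returned buffer and must free() it. Keeping char* in the public API and converting internally avoids adding a second, wide-character set of functions.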

Re: LMDB and text encoding

2015-01-29 Thread Timur Kristóf
> mdb_dbi_open treats its name parameter as a C string. This means UTF-8 on > unixes and ANSI on Windows, which is problematic for cross-platform > applications. [...] Here is a patch that addresses this concern. If you like it, I'll move on to the other issue. From 8c32675cbc4d32fe151b76ef28268af
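To make the cross-platform problem concrete: a database name is matched by its bytes, so the same visible name encoded two different ways never matches. The sketch uses a hypothetical name_t mirroring MDB_val's fields and assumes a bytewise comparison (length, then memcmp); the "café" byte strings are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-counted name, mirroring MDB_val's two fields. */
typedef struct {
    size_t      mv_size;
    const void *mv_data;
} name_t;

/* Bytewise name equality: same length and same bytes. No notion of
 * text encoding is involved. */
static int same_name(name_t a, name_t b)
{
    return a.mv_size == b.mv_size &&
           memcmp(a.mv_data, b.mv_data, a.mv_size) == 0;
}

/* "café" as a UTF-8 locale encodes it vs. a Windows-1252 ("ANSI") code
 * page: identical on screen, different bytes. */
static const char cafe_utf8[]   = "caf\xC3\xA9";   /* 5 bytes */
static const char cafe_cp1252[] = "caf\xE9";       /* 4 bytes */
```

A database created as cafe_utf8 on a Unix system would not be found by a Windows program that passes the ANSI-encoded cafe_cp1252, even though both display as "café".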