Re: Want to read a whole file as utf-8
Since I'm now almost finished, I'm glad to show you my work: https://github.com/Dgame/m3
You're free to use it or to contribute to it.
Want to read a whole file as utf-8
How can I do that without any GC allocation? Nothing in std.file seems to be marked with @nogc.
I'm asking because it seems very complicated to do that with C++. Maybe D is a better choice, in which case we would probably move our whole project from C++ to D.
Re: Want to read a whole file as utf-8
On 2015-02-03 at 19:53, Foo wrote:
> How can I do that without any GC allocation? Nothing in std.file seems to be marked with @nogc.
> I'm asking since it seems very complicated to do that with C++, maybe D is a better choice, then we would probably move our whole project from C++ to D.

Looks like std.stdio isn't marked with @nogc all the way either. So for now the temporary solution would be to use std.c.stdio. Get the file size, malloc a buffer large enough for it[1], use std.c.stdio.fread to fill it, assign it to a char[] slice and use std.utf.decode to consume the text...
Oh wait, decode isn't @nogc either. FFS, what now?

[1] I assume the file is small, otherwise there would be an extra step involved where, after nearing the end of the buffer, you move the rest of the data to the front, read new data after it, and continue decoding.
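The steps listed above (get the size, malloc a buffer, fill it with fread, slice it as char[]) can be sketched as follows. This is only a sketch under stated assumptions: `readAll` is a hypothetical name, the file is assumed small enough to fit in memory, and the C functions are imported from druntime's core.stdc modules rather than std.c.stdio.

```d
import core.stdc.stdio : FILE, SEEK_END, SEEK_SET, fopen, fclose, fseek, ftell, fread;
import core.stdc.stdlib : malloc, free;

/// Hypothetical helper: reads the whole file into a malloc'ed buffer.
/// `path` must be null-terminated. On failure the returned slice has a null .ptr.
/// The caller owns the memory and releases it with free(result.ptr).
char[] readAll(const(char)* path) @nogc nothrow
{
    FILE* f = fopen(path, "rb");
    if (f is null)
        return null;
    scope (exit) fclose(f);

    // Determine the size by seeking to the end and asking for the position.
    if (fseek(f, 0, SEEK_END) != 0)
        return null;
    immutable pos = ftell(f);
    if (pos < 0 || fseek(f, 0, SEEK_SET) != 0)
        return null;
    immutable size_t size = cast(size_t) pos;

    // Allocate at least one byte so an empty file is distinguishable from failure.
    auto p = cast(char*) malloc(size ? size : 1);
    if (p is null)
        return null;

    immutable got = fread(p, 1, size, f);
    if (got != size)
    {
        free(p);
        return null;
    }
    return p[0 .. got]; // raw UTF-8 code units, not validated
}
```

Nothing here decodes or validates the bytes; the slice is just the file contents viewed as char[].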
Re: Want to read a whole file as utf-8
On Tuesday, 3 February 2015 at 19:44:49 UTC, FG wrote:
> On 2015-02-03 at 19:53, Foo wrote:
>> How can I do that without any GC allocation? Nothing in std.file seems to be marked with @nogc.
>> I'm asking since it seems very complicated to do that with C++, maybe D is a better choice, then we would probably move our whole project from C++ to D.
>
> Looks like std.stdio isn't marked with @nogc all the way either. So for now the temporary solution would be to use std.c.stdio. Get the file size, malloc a buffer large enough for it[1], use std.c.stdio.fread to fill it, assign it to a char[] slice and use std.utf.decode to consume the text...
> Oh wait, decode isn't @nogc either. FFS, what now?
>
> [1] I assume the file is small, otherwise there would be an extra step involved where, after nearing the end of the buffer, you move the rest of the data to the front, read new data after it, and continue decoding.

Use std.utf.validate instead of decode. It will only allocate one exception if necessary.
Re: Want to read a whole file as utf-8
On 2015-02-03 at 20:50, Tobias Pankrath wrote:
> Use std.utf.validate instead of decode. It will only allocate one exception if necessary.

Looks to me like it uses decode internally...

But Foo, do you have to use @nogc? It still looks like a work in progress, and the lack of the attribute doesn't mean that the GC is actually involved in the function. It will probably take several months for the obvious @nogc parts of the standard library to get annotated, and much longer to get rid of unnecessary use of the GC. So maybe the solution for now is to verify the source code of the function in question with one's own set of eyeballs and decide if it's good enough for use, i.e. doesn't leak too much?
Re: Want to read a whole file as utf-8
On Tuesday, 3 February 2015 at 19:56:37 UTC, FG wrote:
> On 2015-02-03 at 20:50, Tobias Pankrath wrote:
>> Use std.utf.validate instead of decode. It will only allocate one exception if necessary.
>
> Looks to me like it uses decode internally...
>
> But Foo, do you have to use @nogc? It still looks like a work in progress, and the lack of the attribute doesn't mean that the GC is actually involved in the function. It will probably take several months for the obvious @nogc parts of the standard library to get annotated, and much longer to get rid of unnecessary use of the GC. So maybe the solution for now is to verify the source code of the function in question with one's own set of eyeballs and decide if it's good enough for use, i.e. doesn't leak too much?

Yes, we don't want to use a GC. We want deterministic lifetimes. I'm not the boss, but I support the idea.

@Nordlöw Neither of them can be marked with @nogc. :/
Re: Want to read a whole file as utf-8
On Tuesday, 3 February 2015 at 19:44:49 UTC, FG wrote:
> On 2015-02-03 at 19:53, Foo wrote:
>> How can I do that without any GC allocation? Nothing in std.file seems to be marked with @nogc.
>> I'm asking since it seems very complicated to do that with C++, maybe D is a better choice, then we would probably move our whole project from C++ to D.
>
> Looks like std.stdio isn't marked with @nogc all the way either. So for now the temporary solution would be to use std.c.stdio. Get the file size, malloc a buffer large enough for it[1], use std.c.stdio.fread to fill it, assign it to a char[] slice and use std.utf.decode to consume the text...
> Oh wait, decode isn't @nogc either. FFS, what now?
>
> [1] I assume the file is small, otherwise there would be an extra step involved where, after nearing the end of the buffer, you move the rest of the data to the front, read new data after it, and continue decoding.

How would I use decoding for that? Isn't there a way to read the file as UTF-8 or, even better, as Unicode?
Re: Want to read a whole file as utf-8
On Tuesday, 3 February 2015 at 18:53:28 UTC, Foo wrote:
> How can I do that without any GC allocation? Nothing in std.file seems to be marked with @nogc.
> I'm asking since it seems very complicated to do that with C++, maybe D is a better choice, then we would probably move our whole project from C++ to D.

My module
https://github.com/nordlow/justd/blob/master/mmfile_ex.d
together with
https://github.com/nordlow/justd/blob/master/bylines.d
is about as low-level as you can get in D.
Re: Want to read a whole file as utf-8
On 2015-02-04 at 00:07, Foo wrote:
> How would I use decoding for that? Isn't there a way to read the file as UTF-8 or, even better, as Unicode?

Well, apparently the UTF-8-aware foreach loop still works just fine.
This program shows the file size and the number of Unicode glyphs, or whatever they are called:

    import core.stdc.stdio;

    int main() @nogc
    {
        const int bufSize = 64000;
        char[bufSize] buffer;
        size_t bytesRead, count;
        FILE* f = core.stdc.stdio.fopen("test.d", "r");
        if (!f)
            return 1;
        scope(exit) fclose(f); // also closes the file on the early returns below
        bytesRead = fread(buffer.ptr, 1, bufSize, f);
        if (bytesRead > bufSize - 1) {
            printf("File is too big");
            return 1;
        }
        if (!bytesRead)
            return 2;
        // iterating a char[] with a dchar loop variable decodes UTF-8 on the fly
        foreach (dchar d; buffer[0..bytesRead])
            count++;
        printf("read %zu bytes, %zu unicode characters\n", bytesRead, count);
        return 0;
    }

Outputs for example this:

    read 838 bytes, 829 unicode characters

(It would be more complicated if it had to process bigger files.)
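For files bigger than the buffer, the extra step from the earlier footnote (move the unfinished tail to the front, read new data after it, continue decoding) might look roughly like the sketch below. The names `incompleteTail` and `countCodePoints` are hypothetical, the buffer is assumed to hold at least 4 bytes, and decoding is done with std.utf.byDchar rather than the decoding foreach from the program above; byDchar substitutes U+FFFD for malformed input instead of throwing.

```d
import core.stdc.stdio : FILE, fread;
import core.stdc.string : memmove;
import std.utf : byDchar;

/// Hypothetical helper: number of trailing bytes in `buf` that start a UTF-8
/// sequence which is not complete yet (0 if the buffer ends on a boundary).
size_t incompleteTail(const(char)[] buf) @nogc nothrow pure @safe
{
    size_t i = buf.length;
    size_t back = 0;
    // Walk back over at most 3 continuation bytes (10xxxxxx) to find a lead byte.
    while (i > 0 && back < 4)
    {
        --i;
        ++back;
        immutable ubyte b = cast(ubyte) buf[i];
        if ((b & 0xC0) == 0x80)
            continue;
        // Lead byte found: how many bytes does its sequence need in total?
        immutable size_t need = (b & 0xE0) == 0xC0 ? 2
                              : (b & 0xF0) == 0xE0 ? 3
                              : (b & 0xF8) == 0xF0 ? 4 : 1;
        return need > back ? back : 0;
    }
    return 0; // no lead byte nearby: let the decoder deal with the bytes
}

/// Hypothetical helper: counts code points in `f`, reading it in chunks that
/// fit into `buf` (assumed to be at least 4 bytes long).
size_t countCodePoints(FILE* f, char[] buf) @nogc nothrow
{
    size_t total = 0;
    size_t carry = 0; // bytes of an unfinished sequence kept from the last chunk
    for (;;)
    {
        immutable got = fread(buf.ptr + carry, 1, buf.length - carry, f);
        immutable filled = carry + got;
        if (filled == 0)
            break;
        // At end of input nothing more will arrive, so decode whatever is left.
        immutable tail = got == 0 ? 0 : incompleteTail(buf[0 .. filled]);
        foreach (dchar c; buf[0 .. filled - tail].byDchar)
            ++total;
        // Move the unfinished sequence to the front; the next read lands after it.
        memmove(buf.ptr, buf.ptr + filled - tail, tail);
        carry = tail;
        if (got == 0)
            break;
    }
    return total;
}
```

With a deliberately tiny buffer such as `char[5]`, multi-byte sequences get split across reads, which is exactly the case the carry-over handles.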
Re: Want to read a whole file as utf-8
On Tuesday, 3 February 2015 at 23:07:03 UTC, Foo wrote:
> On Tuesday, 3 February 2015 at 19:44:49 UTC, FG wrote:
>> On 2015-02-03 at 19:53, Foo wrote:
>>> How can I do that without any GC allocation? Nothing in std.file seems to be marked with @nogc.
>>> I'm asking since it seems very complicated to do that with C++, maybe D is a better choice, then we would probably move our whole project from C++ to D.
>>
>> Looks like std.stdio isn't marked with @nogc all the way either. So for now the temporary solution would be to use std.c.stdio. Get the file size, malloc a buffer large enough for it[1], use std.c.stdio.fread to fill it, assign it to a char[] slice and use std.utf.decode to consume the text...
>> Oh wait, decode isn't @nogc either. FFS, what now?
>>
>> [1] I assume the file is small, otherwise there would be an extra step involved where, after nearing the end of the buffer, you move the rest of the data to the front, read new data after it, and continue decoding.
>
> How would I use decoding for that? Isn't there a way to read the file as UTF-8 or, even better, as Unicode?

Arrays of char, wchar and dchar are supposed to be UTF strings (UTF-8, UTF-16 and UTF-32 respectively), and of course you can just read them from a file using a C function. You'd just need to make sure they are valid UTF before passing them on to other parts of Phobos.

What do you mean by "as unicode"?
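Checking that the bytes are valid UTF-8 before handing them to Phobos does not have to go through std.utf.validate and its exception. Below is a hand-written sketch of such a check; `isValidUtf8` is a hypothetical name, not a Phobos function, and the rules it applies are the standard UTF-8 ones (no overlong forms, no surrogates, nothing above U+10FFFF).

```d
/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s) @nogc nothrow pure @safe
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable uint b0 = s[i];
        size_t len; // total number of bytes in this sequence
        uint cp;    // code point accumulated so far
        uint min;   // smallest code point that may use this length
        if (b0 < 0x80)
        {
            ++i; // plain ASCII
            continue;
        }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else
            return false; // stray continuation byte or invalid lead byte

        if (s.length - i < len)
            return false; // sequence is cut off at the end of the input
        foreach (k; 1 .. len)
        {
            immutable uint b = s[i + k];
            if ((b & 0xC0) != 0x80)
                return false; // expected a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min)
            return false; // overlong encoding
        if (cp > 0x10FFFF)
            return false; // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false; // UTF-16 surrogate, not a valid scalar value
        i += len;
    }
    return true;
}
```

A caller would run this once on the buffer read from the file and only then treat the char[] as a proper D string.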
Re: Want to read a whole file as utf-8
On Tuesday, 3 February 2015 at 23:55:19 UTC, FG wrote:
> On 2015-02-04 at 00:07, Foo wrote:
>> How would I use decoding for that? Isn't there a way to read the file as UTF-8 or, even better, as Unicode?
>
> Well, apparently the UTF-8-aware foreach loop still works just fine.
> This program shows the file size and the number of Unicode glyphs, or whatever they are called:
>
>     import core.stdc.stdio;
>
>     int main() @nogc
>     {
>         const int bufSize = 64000;
>         char[bufSize] buffer;
>         size_t bytesRead, count;
>         FILE* f = core.stdc.stdio.fopen("test.d", "r");
>         if (!f)
>             return 1;
>         scope(exit) fclose(f);
>         bytesRead = fread(buffer.ptr, 1, bufSize, f);
>         if (bytesRead > bufSize - 1) {
>             printf("File is too big");
>             return 1;
>         }
>         if (!bytesRead)
>             return 2;
>         foreach (dchar d; buffer[0..bytesRead])
>             count++;
>         printf("read %zu bytes, %zu unicode characters\n", bytesRead, count);
>         return 0;
>     }
>
> Outputs for example this:
>
>     read 838 bytes, 829 unicode characters
>
> (It would be more complicated if it had to process bigger files.)

Using a foreach loop is such a nice idea! Thank you very much. :)

That's my code now:

    private:

    static import m3.m3;
    static import core.stdc.stdio;

    alias printf = core.stdc.stdio.printf;

    public:

    @trusted
    @nogc
    auto readFile(in string filename) nothrow {
        import core.stdc.stdio : FILE, SEEK_END, SEEK_SET, fopen, fclose, fseek, ftell, fread;

        // note: filename.ptr is only guaranteed to be null-terminated for string literals
        FILE* f = fopen(filename.ptr, "rb");
        fseek(f, 0, SEEK_END);
        immutable size_t fsize = ftell(f);
        fseek(f, 0, SEEK_SET);

        char[] str = m3.m3.make!(char[])(fsize);
        fread(str.ptr, fsize, 1, f);
        fclose(f);

        return str;
    }

    @trusted
    @nogc
    @property
    dstring toUTF32(in char[] s) {
        dchar[] r = m3.m3.make!(dchar[])(s.length); // r will never be longer than s
        // the foreach index would count code units (bytes), so count code points separately
        size_t n = 0;
        foreach (dchar c; s) {
            r[n++] = c;
        }

        return cast(dstring) r[0 .. n];
    }

    @nogc
    void main() {
        auto str = readFile("test_file.txt");
        scope(exit) m3.m3.destruct(str);

        auto str2 = str.toUTF32;
        printf("%d : %d\n", cast(int) str[0], cast(int) str2[0]);
    }

m3 is my own module and means "manual memory management": three m's, so m3.
If we end up using D (which is now much more likely), that will be our core module for memory management.
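For readers without m3, the UTF-8 to UTF-32 conversion can be sketched with plain malloc. This is not the m3 API: `toUtf32Malloc` is a hypothetical name, and std.utf.byDchar is used for decoding, which replaces malformed sequences with U+FFFD instead of throwing. The point it illustrates is that the output position has to be counted in code points, not taken from the byte offset in the input.

```d
import core.stdc.stdlib : malloc, free;
import std.utf : byDchar;

/// Hypothetical helper: decodes UTF-8 into a malloc'ed UTF-32 buffer.
/// The returned slice has a null .ptr if the allocation fails.
/// The caller releases the memory with free(result.ptr).
dchar[] toUtf32Malloc(const(char)[] s) @nogc nothrow
{
    // Every code point takes at least one UTF-8 byte, so s.length is an upper bound.
    auto p = cast(dchar*) malloc((s.length ? s.length : 1) * dchar.sizeof);
    if (p is null)
        return null;

    size_t n = 0; // number of code points written so far
    foreach (dchar c; s.byDchar)
        p[n++] = c;

    return p[0 .. n]; // only the part that was actually filled
}
```

The buffer is over-allocated whenever the input contains multi-byte sequences; the returned slice covers only the decoded part.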
Re: Want to read a whole file as utf-8
On 2015-02-04 at 01:56, Namespace wrote:
>     FILE* f = fopen(filename.ptr, "rb");
>     fseek(f, 0, SEEK_END);
>     immutable size_t fsize = ftell(f);
>     fseek(f, 0, SEEK_SET);

That's quite a smart way to get the size of the file.

I started with std.file.getSize (which obviously isn't marked as @nogc) and ended up with the monstrosity below (which I have only compiled on Windows), so I decided not to mention it in my previous post. It wouldn't have been the point anyway, since I only showed an example with a single-fill fixed buffer. But here it is, rendered useless by your code:

    long getFileSize(const char* cName) @nogc
    {
        version(Windows)
        {
            import core.sys.windows.windows;
            WIN32_FILE_ATTRIBUTE_DATA fad;
            if (!GetFileAttributesExA(cName, GET_FILEEX_INFO_LEVELS.GetFileExInfoStandard, &fad))
                return -1;
            // combine the two 32-bit halves into one 64-bit size
            ULARGE_INTEGER li;
            li.LowPart = fad.nFileSizeLow;
            li.HighPart = fad.nFileSizeHigh;
            return li.QuadPart;
        }
        else version(Posix)
        {
            import core.sys.posix.sys.stat;
            stat_t statbuf = void;
            if (stat(cName, &statbuf))
                return -1;
            return statbuf.st_size;
        }
    }