Re: Want to read a whole file as utf-8

2015-02-04 Thread Foo via Digitalmars-d-learn

Since I'm now almost finished, I'm glad to show you my work: 
https://github.com/Dgame/m3

You're free to use it or to contribute to it.

Want to read a whole file as utf-8

2015-02-03 Thread Foo via Digitalmars-d-learn

How can I do that without any GC allocation? Nothing in std.file 
seems to be marked with @nogc.


I'm asking because this seems very complicated to do in C++. 
Maybe D is a better choice; if so, we would probably move our whole 
project from C++ to D.

Re: Want to read a whole file as utf-8

2015-02-03 Thread FG via Digitalmars-d-learn


On 2015-02-03 at 19:53, Foo wrote:

How can I do that without any GC allocation? Nothing in std.file seems to be 
marked with @nogc

I'm asking since it seems very complicated to do that with C++, maybe D is a 
better choice, then we would probably move our whole project from C++ to D.


Looks like std.stdio isn't marked with @nogc all the way either.

So for now the temporary solution would be to use std.c.stdio.
Get the file size, malloc a buffer large enough for it[1],
use std.c.stdio.fread to fill it, assign it to a char[] slice
and std.utf.decode to consume the text...

Oh wait, decode isn't @nogc either. FFS, what now?


[1] I assume the file is small, otherwise there would be an extra step
involved where after nearing the end of the buffer you move the rest
of the data to the front, read new data after it, and continue decoding.
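
The steps described above (get the size, malloc a buffer, fread into it, slice it as char[]) can be sketched roughly as follows. This is only a sketch under assumptions: readAll is a hypothetical helper name, the path must be zero-terminated, and the file is assumed small enough to fit in memory. It uses core.stdc.stdio, the druntime module for the C stdio bindings.

```d
import core.stdc.stdio : FILE, SEEK_END, SEEK_SET, fclose, fopen, fread, fseek, ftell;
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: reads a whole file into a malloc'ed buffer.
/// Returns null on failure. The caller owns the result and must call
/// free(result.ptr). `path` must point to a zero-terminated string.
char[] readAll(const(char)* path) @nogc nothrow
{
    FILE* f = fopen(path, "rb");
    if (f is null)
        return null;
    scope (exit) fclose(f); // runs on every return path below

    // Determine the size by seeking to the end and asking for the position.
    if (fseek(f, 0, SEEK_END) != 0)
        return null;
    immutable size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0)
        return null;

    // Allocate one extra byte so an empty file still gets a valid pointer.
    auto p = cast(char*) malloc(size + 1);
    if (p is null)
        return null;

    immutable got = fread(p, 1, size, f);
    if (got != size)
    {
        free(p);
        return null;
    }
    return p[0 .. got]; // slice over the C buffer, no GC involved
}
```

The returned slice is raw bytes; nothing here checks that they are valid UTF-8.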

Re: Want to read a whole file as utf-8

2015-02-03 Thread Tobias Pankrath via Digitalmars-d-learn


On Tuesday, 3 February 2015 at 19:44:49 UTC, FG wrote:

On 2015-02-03 at 19:53, Foo wrote:
How can I do that without any GC allocation? Nothing in 
std.file seems to be marked with @nogc


I'm asking since it seems very complicated to do that with 
C++, maybe D is a better choice, then we would probably move 
our whole project from C++ to D.


Looks like std.stdio isn't marked with @nogc all the way either.

So for now the temporary solution would be to use std.c.stdio.
Get the file size, malloc a buffer large enough for it[1],
use std.c.stdio.fread to fill it, assign it to a char[] slice
and std.utf.decode to consume the text...

Oh wait, decode isn't @nogc either. FFS, what now?


[1] I assume the file is small, otherwise there would be an 
extra step
involved where after nearing the end of the buffer you move the 
rest
of the data to the front, read new data after it, and continue 
decoding.


Use std.utf.validate instead of decode. It will only allocate one 
exception if necessary.
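
A minimal sketch of that suggestion, assuming a hypothetical wrapper named isValidUtf8. std.utf.validate throws a UTFException on malformed input, and that exception is allocated by the GC, so the wrapper is deliberately not marked @nogc.

```d
import std.utf : UTFException, validate;

/// Hypothetical wrapper: returns true if `text` is well-formed UTF-8.
bool isValidUtf8(const(char)[] text)
{
    try
    {
        validate(text); // throws UTFException on the first malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

On valid input no exception is created, so the only allocation happens on the error path.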

Re: Want to read a whole file as utf-8

2015-02-03 Thread FG via Digitalmars-d-learn


On 2015-02-03 at 20:50, Tobias Pankrath wrote:

Use std.utf.validate instead of decode. It will only allocate one exception if 
necessary.


Looks to me like it uses decode internally...

But Foo, do you have to use @nogc? It still looks like it's work in progress,
and lack of it doesn't mean that the GC is actually involved in the function.
It will probably take several months for the obvious nogc parts of the std lib
to get annotated, and much longer to get rid of unnecessary use of the GC.
So maybe the solution for now is to verify the source code of the function in
question with one's own set of eyeballs and decide if it's good enough for use,
i.e. doesn't leak too much?

Re: Want to read a whole file as utf-8

2015-02-03 Thread Foo via Digitalmars-d-learn


On Tuesday, 3 February 2015 at 19:56:37 UTC, FG wrote:

On 2015-02-03 at 20:50, Tobias Pankrath wrote:
Use std.utf.validate instead of decode. It will only allocate 
one exception if necessary.


Looks to me like it uses decode internally...

But Foo, do you have to use @nogc? It still looks like it's 
work in progress,
and lack of it doesn't mean that the GC is actually involved in 
the function.
It will probably take several months for the obvious nogc parts 
of the std lib
to get annotated, and much longer to get rid of unnecessary use 
of the GC.
So maybe the solution for now is to verify the source code of 
the function in
question with one's own set of eyeballs and decide if it's good 
enough for use,
i.e. doesn't leak too much?


Yes, we don't want to use a GC. We want deterministic lifetimes. 
I'm not the boss, but I support the idea.


@Nordlöw Neither of them can be marked with @nogc. :/

Re: Want to read a whole file as utf-8

2015-02-03 Thread Foo via Digitalmars-d-learn


On Tuesday, 3 February 2015 at 19:44:49 UTC, FG wrote:

On 2015-02-03 at 19:53, Foo wrote:
How can I do that without any GC allocation? Nothing in 
std.file seems to be marked with @nogc


I'm asking since it seems very complicated to do that with 
C++, maybe D is a better choice, then we would probably move 
our whole project from C++ to D.


Looks like std.stdio isn't marked with @nogc all the way either.

So for now the temporary solution would be to use std.c.stdio.
Get the file size, malloc a buffer large enough for it[1],
use std.c.stdio.fread to fill it, assign it to a char[] slice
and std.utf.decode to consume the text...

Oh wait, decode isn't @nogc either. FFS, what now?


[1] I assume the file is small, otherwise there would be an 
extra step
involved where after nearing the end of the buffer you move the 
rest
of the data to the front, read new data after it, and continue 
decoding.


How would I use decoding for that? Isn't there a way to read the 
file as UTF-8 or, even better, as Unicode?

Re: Want to read a whole file as utf-8

2015-02-03 Thread Nordlöw


On Tuesday, 3 February 2015 at 18:53:28 UTC, Foo wrote:
How can I do that without any GC allocation? Nothing in 
std.file seems to be marked with @nogc


I'm asking since it seems very complicated to do that with C++, 
maybe D is a better choice, then we would probably move our 
whole project from C++ to D.


My module

https://github.com/nordlow/justd/blob/master/mmfile_ex.d

together with

https://github.com/nordlow/justd/blob/master/bylines.d

is about as low-level as you can get in D.

Re: Want to read a whole file as utf-8

2015-02-03 Thread FG via Digitalmars-d-learn


On 2015-02-04 at 00:07, Foo wrote:

How would I use decoding for that? Isn't there a way to read the file as UTF-8 
or, even better, as Unicode?


Well, apparently the UTF-8-aware foreach loop still works just fine.
This program shows the file size and the number of Unicode code points:

import core.stdc.stdio;

int main() @nogc
{
    const int bufSize = 64000;
    char[bufSize] buffer;
    size_t bytesRead, count;
    FILE* f = core.stdc.stdio.fopen("test.d", "r");
    if (!f)
        return 1;
    scope (exit) fclose(f); // close the file on every return path
    bytesRead = fread(buffer.ptr, 1, bufSize, f);
    if (bytesRead > bufSize - 1) { // buffer completely filled: the file may not fit
        printf("File is too big");
        return 1;
    }
    if (!bytesRead)
        return 2;
    foreach (dchar d; buffer[0..bytesRead]) // decodes one code point per iteration
        count++;
    printf("read %zu bytes, %zu unicode characters\n", bytesRead, count);
    return 0;
}

Outputs for example this: read 838 bytes, 829 unicode characters

(It would be more complicated if it had to process bigger files.)
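
For bigger files, one way to sidestep the carry-over problem when only a count is needed is a different technique from the decoding foreach: count every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). A multi-byte sequence split across two reads is then still counted exactly once. This is a sketch that assumes the input is already valid UTF-8; countCodePoints and countFile are hypothetical names.

```d
import core.stdc.stdio : FILE, fclose, fopen, fread;

/// Counts code points in one chunk by counting the bytes that start a
/// sequence, i.e. every byte that is not a continuation byte (10xxxxxx).
/// Assumes valid UTF-8; performs no validation.
size_t countCodePoints(const(char)[] chunk) @nogc nothrow pure @safe
{
    size_t n;
    foreach (char c; chunk) // iterating as char does no decoding
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

/// Streams a file through a fixed stack buffer and accumulates both counts.
/// Returns false if the file cannot be opened.
bool countFile(const(char)* path, out size_t bytes, out size_t codePoints) @nogc nothrow
{
    FILE* f = fopen(path, "rb");
    if (f is null)
        return false;
    scope (exit) fclose(f);

    char[4096] buffer = void;
    for (;;)
    {
        immutable got = fread(buffer.ptr, 1, buffer.length, f);
        if (got == 0)
            break; // end of file or read error
        bytes += got;
        codePoints += countCodePoints(buffer[0 .. got]);
    }
    return true;
}
```

If the text has to be decoded rather than just counted, the move-the-tail-to-the-front step described earlier in the thread is still needed.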

Re: Want to read a whole file as utf-8

2015-02-03 Thread Tobias Pankrath via Digitalmars-d-learn


On Tuesday, 3 February 2015 at 23:07:03 UTC, Foo wrote:

On Tuesday, 3 February 2015 at 19:44:49 UTC, FG wrote:

On 2015-02-03 at 19:53, Foo wrote:
How can I do that without any GC allocation? Nothing in 
std.file seems to be marked with @nogc


I'm asking since it seems very complicated to do that with 
C++, maybe D is a better choice, then we would probably move 
our whole project from C++ to D.


Looks like std.stdio isn't marked with @nogc all the way 
either.


So for now the temporary solution would be to use std.c.stdio.
Get the file size, malloc a buffer large enough for it[1],
use std.c.stdio.fread to fill it, assign it to a char[] slice
and std.utf.decode to consume the text...

Oh wait, decode isn't @nogc either. FFS, what now?


[1] I assume the file is small, otherwise there would be an 
extra step
involved where after nearing the end of the buffer you move 
the rest
of the data to the front, read new data after it, and continue 
decoding.


How would I use decoding for that? Isn't there a way to read 
the file as UTF-8 or, even better, as Unicode?


Arrays of char, wchar and dchar are supposed to be UTF strings, 
and of course you can just read them from a file using a C 
function. You'd just need to make sure they are valid UTF before 
passing them on to other parts of Phobos.


What do you mean by "as unicode"?
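
To make the distinction concrete: Unicode is the character set, while char, wchar and dchar arrays hold its UTF-8, UTF-16 and UTF-32 encodings. The sketch below counts how many code units the same text needs in each encoding; UnitCounts and unitCounts are hypothetical names used only for illustration.

```d
/// Code units needed to store the same text in each encoding.
struct UnitCounts
{
    size_t utf8;  // char units
    size_t utf16; // wchar units
    size_t utf32; // dchar units
}

/// Hypothetical helper: walks the UTF-8 input one code point at a time.
/// The decoding foreach throws a UTFException on malformed input.
UnitCounts unitCounts(const(char)[] text)
{
    UnitCounts r;
    r.utf8 = text.length;
    foreach (dchar c; text)
    {
        r.utf16 += c > 0xFFFF ? 2 : 1; // outside the BMP needs a surrogate pair
        r.utf32 += 1;                  // always one unit per code point
    }
    return r;
}
```

For example, the three-letter text "h\u00E4\u00DF" takes 5 char units but only 3 wchar or dchar units.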

Re: Want to read a whole file as utf-8

2015-02-03 Thread Namespace via Digitalmars-d-learn


On Tuesday, 3 February 2015 at 23:55:19 UTC, FG wrote:

On 2015-02-04 at 00:07, Foo wrote:
How would I use decoding for that? Isn't there a way to read 
the file as UTF-8 or, even better, as Unicode?


Well, apparently the UTF-8-aware foreach loop still works just 
fine.
This program shows the file size and the number of Unicode 
code points:


import core.stdc.stdio;

int main() @nogc
{
    const int bufSize = 64000;
    char[bufSize] buffer;
    size_t bytesRead, count;
    FILE* f = core.stdc.stdio.fopen("test.d", "r");
    if (!f)
        return 1;
    scope (exit) fclose(f); // close the file on every return path
    bytesRead = fread(buffer.ptr, 1, bufSize, f);
    if (bytesRead > bufSize - 1) { // buffer completely filled: the file may not fit
        printf("File is too big");
        return 1;
    }
    if (!bytesRead)
        return 2;
    foreach (dchar d; buffer[0..bytesRead]) // decodes one code point per iteration
        count++;
    printf("read %zu bytes, %zu unicode characters\n", bytesRead, count);
    return 0;
}

Outputs for example this: read 838 bytes, 829 unicode characters

(It would be more complicated if it had to process bigger 
files.)


Using a foreach loop is such a nice idea! Thank you very much. :)

That's my code now:

private:

static import m3.m3;
static import core.stdc.stdio;
alias printf = core.stdc.stdio.printf;

public:

@trusted
@nogc
char[] readFile(in string filename) nothrow {
    import core.stdc.stdio : FILE, SEEK_END, SEEK_SET, fopen, fclose,
        fseek, ftell, fread;

    // Note: filename.ptr is only guaranteed to be zero-terminated for string literals.
    FILE* f = fopen(filename.ptr, "rb");
    if (f is null)
        return null;
    fseek(f, 0, SEEK_END);
    immutable size_t fsize = ftell(f);
    fseek(f, 0, SEEK_SET);

    char[] str = m3.m3.make!(char[])(fsize);
    fread(str.ptr, fsize, 1, f);
    fclose(f);

    return str;
}

@trusted
@nogc
@property
dstring toUTF32(in char[] s) {
    dchar[] r = m3.m3.make!(dchar[])(s.length); // r will never be longer than s

    // The foreach index would be the byte offset into s, so count code points separately.
    size_t n = 0;
    foreach (dchar c; s) {
        r[n++] = c;
    }

    return cast(dstring) r[0 .. n];
}

@nogc
void main() {
    auto str = readFile("test_file.txt");
    if (str.length == 0)
        return;
    scope(exit) m3.m3.destruct(str);

    auto str2 = str.toUTF32;
    printf("%d : %d\n", cast(int) str[0], cast(int) str2[0]);
}


m3 is my own module; the name stands for manual memory management 
(three m's, so m3). If we use D (which is now much more likely), it 
will be our core module for memory management.
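
One detail worth checking in a conversion like toUTF32: in foreach (i, dchar c; s) the index i is the byte offset at which the code point starts, not its ordinal position. For "h\u00E4\u00DF" the indices are 0, 1 and 3, so writing to r[i] leaves gaps. A sketch that tracks the output position separately, using a caller-supplied buffer instead of m3 (decodeInto is a hypothetical name):

```d
/// Hypothetical helper: decodes UTF-8 `src` into `dst` and returns the
/// filled prefix of `dst`. A destination with dst.length >= src.length is
/// always large enough, since every code point takes at least one byte.
dchar[] decodeInto(const(char)[] src, dchar[] dst)
{
    size_t n; // number of code points written so far
    foreach (dchar c; src) // throws a UTFException on malformed input
        dst[n++] = c;
    return dst[0 .. n];
}
```

The function carries no @nogc attribute here, to keep the sketch compiling regardless of how a given compiler version treats the decoding foreach.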

Re: Want to read a whole file as utf-8

2015-02-03 Thread FG via Digitalmars-d-learn


On 2015-02-04 at 01:56, Namespace wrote:


 FILE* f = fopen(filename.ptr, "rb");
 fseek(f, 0, SEEK_END);
 immutable size_t fsize = ftell(f);
 fseek(f, 0, SEEK_SET);



That's quite a smart way to get the size of the file.

I started with std.file.getSize (which obviously isn't marked as @nogc) and 
ended up with the monstrosity below (which I have only compiled on Windows), so 
I decided not to mention it in my previous post. Wouldn't be the point anyway, 
since I have only shown an example with a single-fill fixed buffer. But here it 
is, rendered useless by your code:

long getFileSize(const char* cName) @nogc
{
    version(Windows)
    {
        import core.sys.windows.windows;
        WIN32_FILE_ATTRIBUTE_DATA fad;
        if (!GetFileAttributesExA(cName,
                GET_FILEEX_INFO_LEVELS.GetFileExInfoStandard, &fad))
            return -1;
        // Combine the two 32-bit halves into one 64-bit size.
        ULARGE_INTEGER li;
        li.LowPart = fad.nFileSizeLow;
        li.HighPart = fad.nFileSizeHigh;
        return li.QuadPart;
    }
    else version(Posix)
    {
        import core.sys.posix.sys.stat;
        stat_t statbuf = void;
        if (stat(cName, &statbuf)) // stat returns non-zero on failure
            return -1;
        return statbuf.st_size;
    }
}
