How to know whether a file's encoding is ansi or utf8?
Greetings!
As the subject says, how can I tell whether a file is encoded in UTF-8 or ANSI?
Thanks in advance for your help.
Regards,
Sam
Re: How to know whether a file's encoding is ansi or utf8?
On Tuesday, 22 July 2014 at 09:50:00 UTC, Sam Hu wrote:
> Greetings! As the subject says, how can I tell whether a file is encoded in UTF-8 or ANSI?
> [...]

Sorry, I mean in code. For example, when I read a file's content and print it to a text control in a GUI or to the console, the program has to proceed differently depending on the file's encoding.
Re: fork/waitpid and std.concurrency.spawn
On Tuesday, 22 July 2014 at 07:58:50 UTC, Puming wrote:
> Is there a fork()/wait() API similar to std.concurrency's spawn()? The best thing I've found so far is core.sys.posix.unistd.fork(), but it seems to work only on POSIX. Is there a unified API for process-level concurrency, ideally with actor and message-passing support too?

You need std.process.
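A minimal sketch of what std.process provides, assuming a POSIX system where the `true` and `false` programs are on the PATH (the helper name `runAndWait` is hypothetical): spawnProcess starts a program as a child process and wait blocks until it exits, roughly like fork/exec followed by waitpid. Note that it always runs a program, not an arbitrary D function.

```d
import std.process : spawnProcess, wait;

/// Hypothetical helper: starts an external program as a child process
/// and blocks until it exits, returning its exit status.
int runAndWait(const(char[])[] args)
{
    auto pid = spawnProcess(args); // child inherits stdin/stdout/stderr
    return wait(pid);              // like waitpid(): returns the exit code
}
```

For example, `runAndWait(["ls", "-l"])` lists the current directory and returns 0 on success.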
Re: How to know whether a file's encoding is ansi or utf8?
Read the BOM?

    module main;

    import std.stdio;

    enum Encoding { UTF7, UTF8, UTF32, Unicode, BigEndianUnicode, ASCII }

    Encoding GetFileEncoding(string fileName)
    {
        import std.file;

        // Read at most the first 4 bytes; pad short files with zeros
        // so the indexing below cannot go out of range.
        auto bom = cast(ubyte[]) read(fileName, 4);
        if (bom.length < 4)
            bom.length = 4;

        if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
        if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
        if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode;          // UTF-16LE (UTF-32LE FF FE 00 00 also lands here)
        if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
        if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return Encoding.UTF32; // UTF-32BE
        return Encoding.ASCII; // no BOM: ASCII, BOM-less UTF-8, or an ANSI code page
    }

    void main(string[] args)
    {
        if (GetFileEncoding("test.txt") == Encoding.UTF8)
            writeln("The file is UTF8");
        else
            writeln("File is not UTF8 :(");
    }

On Tuesday, 22 July 2014 at 09:50:00 UTC, Sam Hu wrote:
> Greetings! As the subject says, how can I tell whether a file is encoded in UTF-8 or ANSI?
> Thanks in advance for your help.
> Regards,
> Sam
Re: Calling dynamically bound functions from weakly pure function
On Saturday, 19 July 2014 at 11:12:00 UTC, Marc Schütz wrote:
> Casting to pure would break purity if the called function is not actually pure. AFAIU, the problem is that the mutable function pointers are not accessible from inside the pure function at all, in which case the solution is to cast them to immutable, not to pure.

Indeed, that is the problem. I didn't think of casting to immutable; that should work.

> But to cast something, you'd need to have access to it in the first place... This seems to work:
>
>     int function(int) pure my_func_ptr;
>
>     struct CallImmutable {
>         static opDispatch(string fn, Args...)(Args args) {
>             return mixin(fn)(args);
>         }
>     }
>
>     int test() pure {
>         return CallImmutable.my_func_ptr(1);
>     }
>
> But I suspect it's because of a bug. `CallImmutable.opDispatch` should not be deduced to be pure, and thus not be callable from `test`.

Yeah, that looks like a bug. I should be able to conjure something up, perhaps using assumeUnique, that won't break in newer versions. Thanks for the answers!
Re: fork/waitpid and std.concurrency.spawn
I've only found spawnProcess/spawnShell and the like, which execute a new command rather than a function pointer the way fork() and std.concurrency.spawn do. Which function does what I describe?

On Tuesday, 22 July 2014 at 10:43:58 UTC, FreeSlave wrote:
> On Tuesday, 22 July 2014 at 07:58:50 UTC, Puming wrote:
>> Is there a fork()/wait() API similar to std.concurrency spawn()? [...]
> You need std.process.
Re: How to know whether a file's encoding is ansi or utf8?
On Tuesday, 22 July 2014 at 11:59:34 UTC, Alexandre wrote:
> Read the BOM?
> [...]

Thanks. This is exactly what I want at this moment.
Re: How to know whether a file's encoding is ansi or utf8?
Note that BOMs are optional and may not be present in a Unicode file. Also, the presence of leading bytes that look like a BOM does not necessarily mean that the file is encoded in some form of Unicode.
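Because a missing BOM proves nothing, one common heuristic (not taken from this thread) is to check whether the bytes form well-formed UTF-8 using std.utf.validate. This is a sketch that assumes the whole file fits in memory; `looksLikeUtf8` is a hypothetical helper name. Plain ASCII also passes, and some ANSI text can happen to be valid UTF-8, so a `true` result is only a hint.

```d
import std.utf : UTFException, validate;

/// Hypothetical heuristic: returns true if `data` is well-formed UTF-8.
/// It cannot tell UTF-8 apart from plain ASCII, which is valid UTF-8.
bool looksLikeUtf8(const(ubyte)[] data)
{
    try
    {
        validate(cast(const(char)[]) data); // throws on malformed sequences
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

For a file on disk, one might call `looksLikeUtf8(cast(const(ubyte)[]) std.file.read("test.txt"))`.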
Re: Map one tuple to another Tuple of different type
I'm just confused about how static while is supposed to work. To my understanding, static foreach would have to work by making a new type for each iteration. I say this because (1) runtime foreach works like that (with type = range), and (2) without CTFE foreach, the only way I know of to iterate over a typelist is to make a new type with one less element, so I imagine static foreach lowers to that.

I suppose it's possible to make a struct with static immutable start and end iterators, and make new types by advancing the start iterator until it equals the end. That seems like a step backward, though.

Anyway, my actual question is: if all values are constant at compile time, how would a static while loop terminate?
Re: How to know whether a file's encoding is ansi or utf8?
http://www.architectshack.com/TextFileEncodingDetector.ashx

On Tuesday, 22 July 2014 at 15:53:23 UTC, FreeSlave wrote:
> Note that BOMs are optional and may not be present in a Unicode file. Also, the presence of leading bytes that look like a BOM does not necessarily mean that the file is encoded in some form of Unicode.

There are several difficulties in this case ...
Re: fork/waitpid and std.concurrency.spawn
On Tuesday, 22 July 2014 at 14:26:05 UTC, Puming wrote:
> I've only found spawnProcess/spawnShell and the like, which execute a new command rather than a function pointer the way fork() and std.concurrency.spawn do. Which function does what I describe?
> [...]

I'm not sure what you're trying to do. POSIX fork does not just spawn a function: it creates a new process as a copy of its parent, and both continue execution from the point where fork returns. Windows creates processes in a different way, and there seems to be no WinAPI function with the same functionality as POSIX fork (though you can look for implementations on the Internet, use Cygwin, or try the Microsoft POSIX subsystem). I think the reason Phobos does not have the functionality you want is that the standard library should be platform-agnostic. So instead of emulating things that some platforms don't support, it simply leaves them out.
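To make the fork semantics concrete, here is a POSIX-only sketch using the druntime bindings in core.sys.posix (the helper `runInChild` is hypothetical). After fork returns, parent and child run the same code; the child calls the function and exits with its result, and the parent collects that result with waitpid. It will not compile on Windows, which is the portability gap described above.

```d
version (Posix):

import core.sys.posix.sys.types : pid_t;
import core.sys.posix.sys.wait : WEXITSTATUS, WIFEXITED, waitpid;
import core.sys.posix.unistd : _exit, fork;

/// Hypothetical helper: runs `fn` in a forked copy of the current process
/// and returns the child's exit code, or -1 if something went wrong.
int runInChild(int function() fn)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1; // fork failed

    if (pid == 0)
    {
        // Child: a copy of the parent that continues from here.
        // _exit skips the parent's cleanup (atexit handlers, stdio flushing).
        _exit(fn());
    }

    // Parent: wait for the child and decode its exit status.
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only the low 8 bits of the returned value survive as an exit code, so this suits small status codes, not arbitrary results; passing real data back would need a pipe or similar.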
Re: Map one tuple to another Tuple of different type
On Tue, Jul 22, 2014 at 03:52:14PM +0000, Vlad Levenfeld via Digitalmars-d-learn wrote:
> [...]
> Anyway my actual question is: if all values are constant at compile time, how would a static while loop terminate?

Basically, think of it as custom loop unrolling:

    Tuple!(int, "x", float, "y", uint, "z") t;

    // This loop:
    foreach (i; staticIota(0, 3)) {
        t[i]++;
    }

    // Is equivalent to:
    t[0]++;
    t[1]++;
    t[2]++;

    // Which is equivalent to:
    t.x++;
    t.y++;
    t.z++;

The loop body is basically expanded for each iteration, with the loop variable suitably substituted with each element of the typelist.


T

-- 
Microsoft is to operating systems security ... what McDonalds is to gourmet cooking.
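A small compilable variant of the unrolling idea, assuming std.typecons.Tuple with named fields (`incrementAll` is a hypothetical name). Iterating over the compile-time list `T.Types` makes the compiler emit the loop body once per field, with `i` as a constant each time. This illustrates the static foreach case only; it does not answer how a static while would terminate.

```d
import std.typecons : Tuple, isTuple;

/// Hypothetical helper: increments every field of a Tuple.
/// `T.Types` is a compile-time list, so this foreach is unrolled:
/// the body is emitted once per field with `i` = 0, 1, 2, ...
void incrementAll(T)(ref T t) if (isTuple!T)
{
    foreach (i, U; T.Types)
        t[i]++;
}
```

Usage: with `Tuple!(int, "x", float, "y", uint, "z") t;`, calling `incrementAll(t)` compiles to the same three increments as `t.x++; t.y++; t.z++;`.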
Need help with basic functional programming
I have been writing several lexers and parsers. The grammars I need to parse are really complex, and consequently I didn't feel confident about the code quality, especially in the lexers. So I decided to jump on the functional programming bandwagon to see if that would help. It definitely does help: there are fewer lines of code, and I feel better about the code quality.

I started at the high level, and had the input buffer return a range of characters and the lexer return a range of tokens. But when I got down to the lower levels of building up tokens, I ran into a problem.

First I started with this, which worked:

    private void getNumber(MCInputStreamRange buf)
    {
        while (!buf.empty())
        {
            p++;
            buf.popFront();
            if (buf.front() <= '0' || buf.front() >= '9')
                break;
            *p = buf.front();
        }
        curTok.kind = Token_t.NUMBER;
        curTok.image = cast(string) cbuffer[0 .. (p - cbuffer.ptr)].dup;
    }

I thought I could improve this like so:

    private void getNumber(MCInputStreamRange buf)
    {
        auto s = buf.until("a <= '0' || a >= '9'");
        curTok.kind = Token_t.NUMBER;
        curTok.image = to!string(s);
    }

The problem is that until seems not to stop at the end of the number, and instead continues until the end of the buffer. Am I doing something wrong here?

Also, what is the fastest way to convert a range to a string?

Thanks,
Eric
Re: Need help with basic functional programming
Eric:

>     while (!buf.empty()) {
>         p++;
>         buf.popFront();

Those () can be omitted, if you mind the noise (but you can also keep them).

>     if (buf.front() <= '0' || buf.front() >= '9') break;

std.ascii.isDigit helps.

>     curTok.image = cast(string) cbuffer[0 .. (p - cbuffer.ptr)].dup;

If you want a string, then idup is better. Try to minimize the number of casts in your code.

>     auto s = buf.until("a <= '0' || a >= '9'");

Perhaps you need a ! after the until, or a !q{a <= '0' || a >= '9'}.

> Also, what is the fastest way to convert a range to a string?

The text function is the simplest.

Bye,
bearophile
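Applying bearophile's suggestions to the hand-written loop might look like the sketch below. `scanDigits` is a hypothetical helper that accepts any input range of characters (a plain string here) instead of Eric's MCInputStreamRange, and it collects digits into a local array instead of the cbuffer/p pointer pair.

```d
import std.ascii : isDigit;
import std.range.primitives : empty, front, popFront;

/// Hypothetical helper: consumes the leading ASCII digits of `buf`
/// (advancing it in place) and returns them as a string.
string scanDigits(R)(ref R buf)
{
    char[] digits;
    while (!buf.empty && isDigit(buf.front)) // no () on the range primitives
    {
        digits ~= cast(char) buf.front; // safe: isDigit accepts only '0'..'9'
        buf.popFront;
    }
    return digits.idup; // immutable copy, so no cast(string) is needed
}
```

Because `buf` is passed by `ref`, the caller's range is left positioned at the first non-digit, ready for the next token.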
Re: Need help with basic functional programming
On Tuesday, 22 July 2014 at 16:50:47 UTC, Eric wrote:
>     private void getNumber(MCInputStreamRange buf)
>     {
>         auto s = buf.until("a <= '0' || a >= '9'");
>         curTok.kind = Token_t.NUMBER;
>         curTok.image = to!string(s);
>     }
>
> The problem is that until seems not to stop at the end of the number, and instead continues until the end of the buffer. Am I doing something wrong here?

You've forgotten the exclamation mark: buf.until!(...)

Without it, the string is not the predicate but the sentinel value, i.e. the range stops when it sees the characters a <= '0' || a >= '9'.

By the way, do you really mean to stop on '0' and '9'? Do you perhaps mean a < '0' || a > '9'?

> Also, what is the fastest way to convert a range to a string?

The fastest to type is probably text(r) (or r.text). The fastest for me to come up with is r.to!string, which does exactly the same. I don't know about run time, but text/to!string is hopefully fine.
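With the `!` added, the predicate version might look like this. It is a sketch: `leadingNumber` is a hypothetical name, and it works on a string rather than Eric's MCInputStreamRange. The predicate uses std.ascii.isDigit, so `until` stops just before the first character outside '0' to '9', which sidesteps the boundary-comparison question, and `text` turns the resulting character range into a string.

```d
import std.algorithm.searching : until;
import std.ascii : isDigit;
import std.conv : text;

/// Hypothetical helper: returns the leading run of ASCII digits.
/// The lambda is a template argument (note the `!`), so `until` uses it
/// as a predicate; without `!` the argument would be a sentinel value.
string leadingNumber(R)(R input)
{
    return input.until!(c => !isDigit(c)).text;
}
```

By default `until` excludes the element that satisfied the predicate, so `leadingNumber("1234+5")` yields `"1234"` without the `+`.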
Re: Need help with basic functional programming
> By the way, do you really mean to stop on '0' and '9'? Do you perhaps mean a < '0' || a > '9'?

Yes, my bad...
Re: Need help with basic functional programming
On Tuesday, 22 July 2014 at 17:09:29 UTC, bearophile wrote:
> Eric:
> [...]
> Bye,
> bearophile

Thanks! All very good suggestions...
-Eric
Re: Need help with basic functional programming
On Tuesday, 22 July 2014 at 17:09:29 UTC, bearophile wrote:
> Eric:
>>     while (!buf.empty()) {
>>         p++;
>>         buf.popFront();
>
> Those () can be omitted, if you mind the noise (but you can also keep them).

Actually, the ones after `empty` and `front` are wrong, because these are defined to be properties. They just happen to work currently.
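For reference, here is a hypothetical minimal input range in which `empty` and `front` are declared as properties and `popFront` is an ordinary method; std.range.primitives.isInputRange confirms at compile time that it satisfies the range interface.

```d
import std.range.primitives : isInputRange;

/// Hypothetical minimal input range over the chars of a string.
/// Callers write `r.empty` and `r.front`, and `r.popFront()`.
struct CharRange
{
    string s;

    @property bool empty() const { return s.length == 0; }
    @property char front() const { return s[0]; }
    void popFront() { s = s[1 .. $]; }
}

static assert(isInputRange!CharRange);
```

Since it is an input range, `foreach (c; CharRange("ab"))` and the std.algorithm functions work with it directly.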
Re: Map one tuple to another Tuple of different type
On Tuesday, 22 July 2014 at 16:42:14 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> On Tue, Jul 22, 2014 at 03:52:14PM +0000, Vlad Levenfeld via Digitalmars-d-learn wrote:
>> Anyway my actual question is: if all values are constant at compile time, how would a static while loop terminate?
>
> Basically, think of it as custom loop unrolling:
> [...]
> The loop body is basically expanded for each iteration, with the loop variable suitably substituted with each element of the typelist.

You're misunderstanding him. Your example is a static foreach, but Vlad asked about static while. I too don't see how a static while is supposed to work.
Re: Map one tuple to another Tuple of different type
Yes, though the loop unrolling is news to me. I'll have to keep that in mind next time I'm trying to squeeze some extra performance out of a loop.

BTW, I found a static switch enhancement request here: https://issues.dlang.org/show_bug.cgi?id=6921