Re: Dealing with Autodecode

2016-05-31 Thread Kirill Kryukov via Digitalmars-d

On Wednesday, 1 June 2016 at 01:36:43 UTC, Adam D. Ruppe wrote:
D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH


This.

I only recently started full-scale use of D, but I lurked here 
for years. D has a few quirks here and there, but overall it's a 
fantastic language. However, the biggest off-putting factor for 
me is the attitude of the leadership towards fixing the issues 
and completing the language.


The idea of autodecoding comes naturally to someone who has only 
recently discovered Unicode. Whoa, instead of code pages we now 
have "Unicode code points". Great. Only much later does that 
person realize that working with code points isn't always 
correct. So I don't blame anyone for designing/implementing 
autodecoding years ago. But not acknowledging that autodecoding 
is seriously wrong *now* looks like complete brain damage.


The entire community seems united in the view that autodecoding 
is both slow and usually wrong. The users are begging for this 
breaking change. There are a number of approaches to handling 
the deprecation. Even code that for some reason really needs to 
work with code points would benefit from explicitly stating that 
it needs code points. But no, we must endure this madness 
forever.


I realize that the priorities of a language user might differ 
from those of the language leadership. With autodecoding fixed 
(removed), the user gets a cleaner language. Their programs run 
faster and are easier to reason about. The user's brain cycles 
are not wasted on useless crap like working around autodecoding.


On the other hand, the language/stdlib designers now have to 
admit their initial design was sub-optimal. Their books and 
articles are now obsolete. And they will be the ones who receive 
complaints from the inevitable few upset by the change.


However, keeping the current situation means, for me personally: 
1. Not switching to D wholesale, but just toying with it. 2. Not 
wanting to talk about D to others, even when using it for work. 
I was seriously thinking about starting a D-learning seminar at 
work, and I still might, but the thought that autodecoding is 
going to stay is cooling my enthusiasm.


I just did a numerical app in D, where it shines, I think. 
However, much of my work code deals with huge texts. I don't 
want to fight with autodecode at every step. I'd like arrays of 
chars to be arrays of chars, without any magic crap auto-inserted 
behind my back. I don't want to become an expert in avoiding 
language pitfalls (the reason I abandoned C++ years ago). I also 
don't want to re-implement the staple string processing routines 
(though I might, if at least the language constructs worked 
without autodecode, which doesn't seem to be the case here).


Think about it: 99% of code working with code points is _broken_ 
anyway (in the sense that the usual assumption is that a code 
point represents a character, while in fact it does not). When 
working with code units, the developer notices the problem right 
away. When working with code points, the problem is not apparent 
until years later (essentially what happened to D itself).
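
For illustration, a minimal D sketch of the three levels (code 
units, code points, graphemes) for a string containing a 
combining character:

import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "e\u0301"; // 'e' + combining acute accent, rendered "é"
    assert(s.length == 3);                // UTF-8 code units
    assert(s.walkLength == 2);            // code points (what autodecoding sees)
    assert(s.byGrapheme.walkLength == 1); // graphemes: one visible character
}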


Feel free to ignore my non-D-core-dev comment, even though I 
suspect many D users agree with me. An even larger number of 
potential D users don't want autodecoding either.


Thanks,
Kirill


Re: D Embedded Database v0.1 Released

2016-05-31 Thread Piotrek via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 22:08:00 UTC, Stefan Koch wrote:
Nice effort. How would you like collaboration with the SQLite-D 
project?


Thanks. Correct me if I'm wrong, but SQLite-D is a compile-time 
SQLite3 file reader. If so, I predict there aren't many common 
parts. Maybe the one would be a data deserialization component; 
however, I didn't check how that's done in SQLite-D.



It has similar goals, albeit with a file format compatible with 
SQLite.


When I was selecting a possible file format I thought about the 
SQLite one. I am actually a fan of the SQLite project. However, 
there are some shortcomings in the current SQLite3 format:


- SQLite3 is not really one-file storage (i.e. the journal file)
- it gets fragmented very quickly (check out the design goals for 
SQLite4)
- it's overcomplicated and non-deterministic with respect to 
real-time software
- it has unnecessary overhead, because every column is actually a 
variant type


Add to this the main goal of replacing SQL with D 
ranges+algorithms. As a result, it turned out it would be great 
to have an alternate format.


BTW, would someone be so kind as to post the above paragraph on 
Reddit under a comment about the SQLite db? I'm not registered 
there.


Piotrek


[Issue 15885] float serialized to JSON loses precision

2016-05-31 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15885

--- Comment #4 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/7a486d9d038448595c74aa4ef4bd7d9e952a4b64
Fix issue 15885 - numeric values serialized to JSON lose precision.

https://github.com/dlang/phobos/commit/f4ad734aad6e3b2dd4881508d2b15eebb732a26c
Merge pull request #4345 from tsbockman/issue-15885-tsb

Fix issue 15885 - float serialized to JSON loses precision

--



[Issue 15885] float serialized to JSON loses precision

2016-05-31 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15885

github-bugzi...@puremagic.com changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--


Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d

On 5/31/2016 4:00 PM, ag0aep6g wrote:

Wikipedia says [1] that UCS-2 is essentially UTF-16 without surrogate pairs. I
suppose you mean UTF-32/UCS-4.
[1] https://en.wikipedia.org/wiki/UTF-16


Thanks for the correction.


Re: The Case Against Autodecode

2016-05-31 Thread Jack Stouffer via Digitalmars-d

On Wednesday, 1 June 2016 at 02:17:21 UTC, Jonathan M Davis wrote:

...


This thread is going in circles; the against crowd has stated 
each of their arguments very clearly at least five times in 
different ways.


The cost/benefit problems with auto decoding are as clear as day. 
If the evidence already presented in this thread (and in the many 
others) isn't enough to convince people of that, then I don't 
think anything else said will have an impact.


I don't want to sound like someone telling people not to discuss 
this anymore, but honestly, what is continuing this thread going 
to accomplish?


Re: Dealing with Autodecode

2016-05-31 Thread tsbockman via Digitalmars-d

On Wednesday, 1 June 2016 at 02:58:36 UTC, Brad Roberts wrote:

...the rate of bug fixing which exceeds the rate of fix pulling.


Speaking of which:
https://github.com/dlang/phobos/pull/4345
https://github.com/dlang/phobos/pull/3973



Re: Reddit announcements

2016-05-31 Thread Jason White via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 18:57:29 UTC, o-genki-desu-ka wrote:

Many nice announcements here last week. I put some on reddit.


Thank you for doing this! I agree with previous posts, though, 
that this is too many at once. Also, I think posting a link 
directly to the project instead of the forum post would have 
been better.


[Issue 16107] [ICE] - Internal error: backend/cgcod.c 2297

2016-05-31 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=16107

--- Comment #1 from b2.t...@gmx.com ---
The definition of Foo can be reduced to:

class Foo
{
alias TreeItemType = typeof(this);

TreeItemSiblings!TreeItemType _siblings;// remove this decl
TreeItemChildren!TreeItemType _children;// or this one  : OK
}
The content was initially a mixin template, which explains why it was
incoherent... anyway, the ICE still occurs.

--


Re: Button: A fast, correct, and elegantly simple build system.

2016-05-31 Thread Jason White via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 14:28:02 UTC, Dicebot wrote:
Can it be built from just a plain dmd/phobos install? One of 
the major concerns behind the discussion that resulted in 
Atila's reggae effort is that propagating additional third-party 
dependencies is very damaging for build systems. Right now 
Button seems to fail rather hard on this front (i.e. Lua for 
the build description + an uncertain number of build 
dependencies for Button itself).


Building it only requires dmd+phobos+dub.

Why is having dependencies so damaging for build systems? Does it 
really matter with a package manager like Dub? If there is 
another thread that answers these questions, please point me to 
it.


The two dependencies Button itself has could easily be moved into 
the same project. I kept them separate because they can be useful 
for others. These are the command-line parser and IO stream 
libraries.


As for the dependency on Lua, it is statically linked into a 
separate executable (called "button-lua"), and building it is 
dead simple (just run make). Using the Lua build description 
generator is actually optional; it's just that writing build 
descriptions in JSON would be horribly tedious.


[Issue 16107] New: [ICE] - Internal error: backend/cgcod.c 2297

2016-05-31 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=16107

  Issue ID: 16107
   Summary: [ICE] - Internal error: backend/cgcod.c 2297
   Product: D
   Version: D2
  Hardware: x86_64
OS: Linux
Status: NEW
  Severity: critical
  Priority: P1
 Component: dmd
  Assignee: nob...@puremagic.com
  Reporter: b2.t...@gmx.com

The following code, compiled with DMD 2.071.1-b2, crashes the compiler:

===
import std.stdio, std.traits;

struct TreeItemChildren(T){}

struct TreeItemSiblings(T){}

class Foo
{
enum isStruct = is(typeof(this) == struct);
static if (isStruct)
alias TreeItemType = typeof(this)*;
else
alias TreeItemType = typeof(this);

TreeItemSiblings!TreeItemType _siblings;// remove this decl
TreeItemChildren!TreeItemType _children;// or this one  : OK
}

template Bug(T)
{
bool check()
{
bool result;
import std.meta: aliasSeqOf;
import std.range: iota;

foreach(i;  aliasSeqOf!(iota(0, T.tupleof.length)))
{
alias MT = typeof(T.tupleof[i]);
static if (is(MT == struct))
result |= Bug!MT;   // result = result | ... : OK
if (result) break; // remove this line   : OK

}
return result;
}
enum Bug = check();
}

void main()
{
assert(!Bug!Foo);
}

produces:

> Internal error: backend/cgcod.c 2297

The comments in the code indicate that the bug doesn't happen when
the relevant declaration or line is commented out.

--


Re: Button: A fast, correct, and elegantly simple build system.

2016-05-31 Thread Jason White via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 10:15:14 UTC, Atila Neves wrote:

On Monday, 30 May 2016 at 19:16:50 UTC, Jason White wrote:
I am pleased to finally announce the build system I've been 
slowly working on for over a year in my spare time:


snip
In fact, there is some experimental support for automatic 
conversion of Makefiles to Button's build description format 
using a fork of GNU Make itself: 
https://github.com/jasonwhite/button-make


I'm going to take a look at that!


I think the Makefile converter is probably the coolest thing 
about this build system. I don't know of any other build system 
that has done this. The only problem is that it doesn't do well 
with Makefiles that invoke make recursively. I tried compiling 
Git using it, but Git does some funky stuff with recursive make 
like grepping the output of the sub-make.


- Can automatically build when an input file is modified 
(using inotify).


Nope, I never found that interesting. Possibly because I keep 
saving after every edit in OCD style and I really don't want 
things running automatically.


I constantly save like a madman too. If an incremental build is 
sufficiently fast, it doesn't really matter. You can also specify 
a delay so that it accumulates changes and then runs a build 
after X milliseconds.


- Recursive: It can build the build description as part of the 
build.


I'm not sure what that means. reggae copies CMake here and runs 
itself when the build description changes, if that's what you 
mean.


It means that Button can run Button as a build task (and it does 
it correctly). A child Button process reports its dependencies to 
the parent Button process via a pipe. This is the same mechanism 
that detects dependencies for ordinary tasks. Thus, there is no 
danger of doing incorrect incremental builds when recursively 
running Button like there is with Make.



- Lua is the primary build description language.


In reggae you can pick from D, Python, Ruby, Javascript and Lua.


That's pretty cool. It is possible for Button to do the same, but 
I don't really want to support that many languages. In fact, the 
Make and Lua build descriptions both work the same exact way - 
they output a JSON build description for Button to use. So long 
as someone can write a program to do this, they can write their 
build description in it.


Re: Button: A fast, correct, and elegantly simple build system.

2016-05-31 Thread Jason White via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 03:40:32 UTC, rikki cattermole wrote:

Are you on Freenode (no nick to name right now)?
I would like to talk to you about a few ideas relating to Lua 
and D.


No, I'm not on IRC. I'll see if I can find the time to hop on 
this weekend.


Re: Dealing with Autodecode

2016-05-31 Thread Brad Roberts via Digitalmars-d

On 5/31/2016 7:40 PM, Walter Bright via Digitalmars-d wrote:

On 5/31/2016 7:28 PM, Jonathan M Davis via Digitalmars-d wrote:

The other critical thing is to make sure that Phobos in general works
with
byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
trying to use byCodeUnit instead of naked strings, I ran into this:

https://issues.dlang.org/show_bug.cgi?id=15800


That was posted 3 months ago. No PR to fix it (though it likely is an
easy fix). If we can't get these things fixed in Phobos, how can we tell
everyone else to fix their code?


I hope that wasn't a serious question.  The answer is trivial.  The rate 
of incoming bug reports exceeds the rate of bug fixing which exceeds the 
rate of fix pulling.  Has since about the dawn of time.


Re: Dealing with Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d

On 5/31/2016 7:28 PM, Jonathan M Davis via Digitalmars-d wrote:

The other critical thing is to make sure that Phobos in general works with
byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
trying to use byCodeUnit instead of naked strings, I ran into this:

https://issues.dlang.org/show_bug.cgi?id=15800


That was posted 3 months ago. No PR to fix it (though it likely is an easy fix). 
If we can't get these things fixed in Phobos, how can we tell everyone else to 
fix their code?




Re: Dealing with Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d

On 5/31/2016 6:36 PM, Adam D. Ruppe wrote:

Our preliminary investigation found about 130 places in Phobos that need to be
changed. That's not hard to fix!


PRs please!



Re: Dealing with Autodecode

2016-05-31 Thread Nick Sabalausky via Digitalmars-d

On 05/31/2016 09:36 PM, Adam D. Ruppe wrote:


version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
   static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -versionstring_migration to fix your buggy code, would
you like to know more?")
/* existing popFront here */



I vote we use Adam's exact verbiage, too! :)



D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE QUALITY
WITH A SIMPLE MIGRATION PATH



Yes. This. If I wanted an endless bucket of baggage, I'd have stuck with 
C++.



3) A wee bit longer, we exterminate all this autodecoding crap and enjoy
Phobos being a smaller, more efficient library.



Yay! Profit!



Re: Dealing with Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 17:46:04 Walter Bright via Digitalmars-d wrote:
> It is not practical to just delete or deprecate autodecode - it is too
> embedded into things. What we can do, however, is stop using it ourselves
> and stop relying on it in the documentation, much like [] is eschewed in
> favor of std::vector in C++.
>
> The way to deal with it is to replace reliance on autodecode with .byDchar
> (.byDchar has a bonus of not throwing an exception on invalid UTF, but using
> the replacement dchar instead.)
>
> To that end, and this will be an incremental process:
>
> 1. Temporarily break autodecode such that using it will cause a compile
> error. Then, see what breaks in Phobos and fix those to use .byDchar
>
> 2. Change examples in the documentation and the Phobos examples to use
> .byDchar
>
> 3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit when
> dealing with ranges/arrays of characters to make it clear what is happening.

The other critical thing is to make sure that Phobos in general works with
byDChar, byCodeUnit, etc. For instance, pretty much as soon as I started
trying to use byCodeUnit instead of naked strings, I ran into this:

https://issues.dlang.org/show_bug.cgi?id=15800

But once Phobos no longer relies on autodecoding except maybe in places
where we can't actually excise it completely without breaking code (and
hopefully there are none of those), then we can look at how feasible the
full removal of auto-decoding really is. IMHO, leaving it in is a _huge_
piece of technical debt that we don't want and probably can't afford, so I
really don't think that we should just assume that we can't remove it due to
the breakage that it would cause. But we definitely have work to do before
we can have Phobos in a state where it's reasonable to even make an attempt.
byCodeUnit and friends were a good start, but we need to make it so that
they're treated as first-class citizens, and they're not right now.

- Jonathan M Davis



Re: Reddit announcements

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 5/31/16 2:57 PM, o-genki-desu-ka wrote:

Many nice announcements here last week. I put some on reddit.

https://www.reddit.com/r/programming/comments/4lwufi/d_embedded_database_v01_released/


https://www.reddit.com/r/programming/comments/4lwubv/c_to_d_converter_based_on_clang/


https://www.reddit.com/r/programming/comments/4lwu5p/coedit_2_ide_update_6_released/


https://www.reddit.com/r/programming/comments/4lwtxw/compiletime_sqlite_for_d_beta_release/


https://www.reddit.com/r/programming/comments/4lwtr0/button_a_fast_correct_and_elegantly_simple_build/


https://www.reddit.com/r/programming/comments/4lwtn9/first_release_of_powernex_an_os_kernel_written_in/


Very nice. Response has been positive. Thank you very much! -- Andrei


Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 23:36:20 Marco Leise via Digitalmars-d wrote:
> Am Tue, 31 May 2016 16:56:43 -0400
>
> schrieb Andrei Alexandrescu :
> > On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote:
> > > In the vast majority of cases what folks care about is full character
> >
> > How are you so sure? -- Andrei
>
> Because a full character is the typical unit of a written
> language. It's what we visualize in our heads when we think
> about finding a substring or counting characters. A special
> case of this is the reduction to ASCII where we can use code
> units in place of grapheme clusters.

Exactly. How many folks here have written code where the correct thing to do
is to search on code points? Under what circumstances is that even useful?
Code points are a mid-level abstraction between UTF-8/16 and graphemes that
are not particularly useful on their own. Yes, by using code points, we
eliminate the differences between the encodings, but how much code even
operates on multiple string types? Having all of your strings have the same
encoding fixes the consistency problem just as well as autodecoding to dchar
everywhere does - and without the efficiency hit. Typically, folks operate
on string or char[] unless they're talking to the Windows API, in which
case, they need wchar[]. Our general recommendation is that D code operate
on UTF-8 except when it needs to operate on a different encoding because of
other stuff it has to interact with (like the Win32 API), in which case,
ideally it converts those strings to UTF-8 once they get into the D code and
operates on them as UTF-8, and anything that has to be output in a different
encoding is operated on as UTF-8 until it needs to be output, in which
case, it's converted to UTF-16 or whatever the target encoding is. Not
much of anyone is recommending that you use dchar[] everywhere, but that's
essentially what the range API is trying to force.

I think that it's very safe to say that the vast majority of string
processing either is looking to operate on strings as a whole or on
individual, full characters within a string. Code points are neither. While
code may play tricks with Unicode to be efficient (e.g. operating at the
code unit level where it can rather than decoding to either code points or
graphemes), or it might make assumptions about its data being ASCII-only,
aside from explicit Unicode processing code, I have _never_ seen code that
was actually looking to logically operate on only pieces of characters.
While it may operate on code units for efficiency, it's always looking to be
logically operating on string as a unit or on whole characters.

Anyone looking to operate on code points is going to need to take into
account the fact that they're not full characters, just like anyone who
operates on code units needs to take into account the fact that they're not
whole characters. Operating on code points as if they were characters -
which is exactly what D currently does with ranges - is just plain wrong.
We need to support operating at the code point level for those rare cases
where it's actually useful, but autodecoding makes no sense. It incurs a
performance penalty without actually giving correct results except in those
rare cases where you want code points instead of full characters. And only
Unicode experts are ever going to want that. The average programmer who is
not super Unicode savvy doesn't even know what code points are. They're
clearly going to be looking to operate on strings as sequences of
characters, not sequences of code points. I don't see how anyone could
expect otherwise. Code points are a mid-level, Unicode abstraction that only
those who are Unicode savvy are going to know or care about, let alone want
to operate on.

- Jonathan M Davis



Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 20:38:14 Nick Sabalausky via Digitalmars-d wrote:
> On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
> > On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:
> >> On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:
> >>> Let's put the question this way. Given the following string, what do
> >>> *you*  think walkLength should return?
> >>>
> >>> şŭt̥ḛ́k̠
> >>
> >> The number of code units in the string. That's the contract promised and
> >> honored by Phobos. -- Andrei
> >
> > Code points I mean. -- Andrei
>
> Yes, we know it's the contract. ***That's the problem.*** As everybody
> is saying, it *SHOULDN'T* be the contract.
>
> Why shouldn't it be the contract? Because it's proven itself, both
> logically (as presented by pretty much everybody other than you in both
> this and other threads) and empirically (in phobos, warp, and other user
> code) to be both the least useful and most PITA option.

Exactly. Operating at the code point level rarely makes sense. What sorts of
algorithms purposefully do that in a typical program? Unless you're doing
very specific Unicode stuff or somehow know that your strings don't contain
any graphemes that are made up of multiple code points, operating at the
code point level is just bug-prone, and unless you're using dchar[]
everywhere, it's slow to boot, because your strings have to be decoded
whether the algorithm needs to or not.

I think that it's very safe to say that the vast majority of string
algorithms are either able to operate at the code unit level without
decoding (though possibly encoding another string to match - e.g. with a
string comparison or search), or they have to operate at the grapheme level
in order to deal with full characters. A code point is borderline useless on
its own. It's just a step above the different UTF encodings without actually
getting to proper characters.
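
A sketch of the code-unit-level operation described above (the 
"encoding another string to match" case), assuming both sides 
are UTF-8:

import std.algorithm.searching : canFind;
import std.utf : byCodeUnit;

// Substring search without decoding: because both sides are UTF-8,
// matching at the code-unit level gives the same answer as matching
// at the code-point level.
bool containsSub(string haystack, string needle)
{
    return haystack.byCodeUnit.canFind(needle.byCodeUnit);
}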

- Jonathan M Davis




Re: Variables should have the ability to be @nogc

2016-05-31 Thread Basile B. via Digitalmars-d

On Tuesday, 31 May 2016 at 23:46:59 UTC, Marco Leise wrote:

Am Tue, 31 May 2016 20:41:09 +
schrieb Basile B. :

The only thing I'm not sure about is the tri-state and the 
recursion. I cannot find a case where it would be justified.


The recursion is simply there to find pointers in nested 
structs and their GcScan annotations:


- the "auto" is like if there's no annotation.
- the "yes" seems useless because there is no case where the 
scanner should fail to detect members that are managed by the GC. 
It's for this case that things are a bit vague.


Otherwise only the "no" remains.

So far I'll go for this: https://dpaste.dzfl.pl/e3023ba6a7e2
with another annotation type name, for example 'AddGcRange' or 
'GcScan'.
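
A minimal sketch of what such an annotation could look like - 
the names are taken from this discussion and the dpaste link, 
not from any existing Phobos API:

// hypothetical annotation, per the discussion above
enum GcScan { automatic, yes, no }

struct B
{
    @(GcScan.no) void* p; // a pointer the GC should not scan
    int* q;               // treated as GcScan.automatic: scanned
}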


Re: Dealing with Autodecode

2016-05-31 Thread Adam D. Ruppe via Digitalmars-d

On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:

It is not practical to just delete or deprecate autodecode


Yes, it is.

We need to stop holding on to the mistakes of the past. 9 of 10 
dentists agree that autodecoding is a mistake. Not just WAS a 
mistake, IS a mistake. It has ongoing cost. If we don't fix our 
attitude about these problems, we are going to turn into that 
very demon we despise, yea, even the next C++!


And that's not a good thing.


To that end, and this will be an incremental process:


I have a better one, that we discussed on IRC last night:

1) put the string overloads for front and popFront on a version 
switch:


version(string_migration)
deprecated void popFront(T)(ref T t) if(isSomeString!T) {
  static assert(0, "this is crap, fix your code.");
}
else
deprecated("use -versionstring_migration to fix your buggy code, 
would you like to know more?")

/* existing popFront here */


At the same time, make sure the various byWhatever functions and 
structs are easily available.


Our preliminary investigation found about 130 places in Phobos 
that need to be changed. That's not hard to fix! The static 
assert(0) version tells you the top-level call that triggered it. 
You go there, you add .byDchar or whatever, recompile, and it 
just works - migration achieved. Or better yet, you think about 
your code and fix it properly - boom, code quality improved.
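
As a sketch of what that one-line migration looks like, on a 
hypothetical flagged line:

import std.algorithm.searching : countUntil;
import std.utf : byDchar;

void main()
{
    string s = "hello!";
    // before: auto i = s.countUntil('!');  // relied on autodecoding
    auto i = s.byDchar.countUntil('!');     // decoding now explicit
    assert(i == 5);
}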


D USERS **WANT** BREAKING CHANGES THAT INCREASE OVERALL CODE 
QUALITY WITH A SIMPLE MIGRATION PATH



2) After a while, we swap the version conditions, so opting into 
it preserves the old behavior for a while.


3) A wee bit longer, we exterminate all this autodecoding crap 
and enjoy Phobos being a smaller, more efficient library.




Re: Free the DMD backend

2016-05-31 Thread Alex Parrill via Digitalmars-d

On Tuesday, 31 May 2016 at 20:18:34 UTC, default0 wrote:
I have no idea how licensing would work in that regard but 
considering that DMDs backend is actively maintained and may 
eventually even be ported to D, wouldn't it at some point 
differ enough from Symantecs "original" backend to simply call 
the DMD backend its own thing?


The way I understand it is that no matter how different a 
derivative work (such as any modification to DMD) gets, it's 
still a derivative work, and is subject to the terms of the 
license of the original work.


Re: Dealing with Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d

On 5/31/2016 5:56 PM, Stefan Koch wrote:

It is only going to get harder to remove it.


Removing it from Phobos and adjusting the documentation as I suggested is the 
way forward regardless. If we can't get that done, how can we tell our users 
they have to do the same to their code?


Re: Transient ranges

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d

On 5/31/16 4:59 PM, Dicebot wrote:

On Tuesday, 31 May 2016 at 18:11:34 UTC, Steven Schveighoffer wrote:

1) The current definition of an input range (most importantly, the fact
that `front` has to be @property-like) implies that `front` always returns
the same result until `popFront` is called.


Regardless of property-like or not, this should be the case.
Otherwise, popFront makes no sense.


Except it isn't, in many cases you call "bugs" :(


If you want to use such "ranges", the compiler will not stop you. Just 
don't expect any help from Phobos.



2) For ranges that call predicates on elements to evaluate the next
element, this can only be achieved by caching - predicates are never
required to be pure.


Or, by not returning different things from your predicate.


It is perfectly legal for a predicate to be non-pure, and it would be
hugely annoying if anyone decided to prohibit it. Also, even pure
predicates may simply be very expensive to evaluate, which can make
`front` a silent pessimization.


There's no requirement or need to prevent it. Just a) don't do it, or b) 
deal with the consequences.





This is like saying RedBlackTree is broken when I give it a predicate
of "a == b".


RedBlackTree at least makes certain demands about what a valid predicate
can be. This is not the case for ranges in general.


RedBlackTree with "a == b" will compile and operate. It just won't do 
much red-black-tree-like things.



3) But caching is sub-optimal performance-wise, and thus a bunch of Phobos
algorithms violate the `front` consistency / cheapness expectation by
evaluating predicates each time it is called (like map).


I don't think anything defensively caches front in case the next call
to front is different, unless that's specifically the reason for the
range.


And that makes input ranges violate implication #1 (front stability)
casually, to the point where it can't be relied on at all, and one always
has to make sure it is only evaluated once (make a stack-local copy or
something like that).


That's a little much. If you expect such things, you can construct them. 
There's no way for the functions to ascertain what your lambda is going 
to do (halting problem) and decide whether to cache based on that.



I think we should be aware that the range API doesn't prevent bugs of
all kinds. There's only so much analysis the compiler can do.


This is totally valid code that I want to actually work, and not be
discarded as a "bug".


Then it's not a bug? It's going to work just fine how you specified it. 
I just don't consider it a valid "range" for general purposes.


You can do this if you want caching:

only(0).map!(x => uniform(0, 10)).cache
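
For reference, a self-contained version of that snippet (cache 
lives in std.algorithm.iteration):

import std.algorithm.iteration : cache, map;
import std.random : uniform;
import std.range : only;

void main()
{
    auto r = only(0).map!(x => uniform(0, 10)).cache;
    // front is now stable: the lambda runs once per element
    assert(r.front == r.front);
}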

-Steve


Re: Free the DMD backend

2016-05-31 Thread Eugene Wissner via Digitalmars-d

On Tuesday, 31 May 2016 at 20:12:33 UTC, Russel Winder wrote:
On Tue, 2016-05-31 at 10:09 +, Atila Neves via 
Digitalmars-d wrote:

 […]

No, no, no, no. We had LDC as the default already on Arch 
Linux for a while and it was a royal pain. I want to choose to 
use LDC when and if I need performance. Otherwise, I want my 
projects to compile as fast as possible and be able to use all 
the shiny new features.


So write a new backend for DMD whose licence allows DMD to be 
in Debian and Fedora.


LDC shouldn't be the default compiler included in Debian or 
Fedora. The reference compiler and the default D compiler in a 
particular distribution are two independent things.


Re: Dealing with Autodecode

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d

On 5/31/16 8:46 PM, Walter Bright wrote:

It is not practical to just delete or deprecate autodecode - it is too
embedded into things. What we can do, however, is stop using it
ourselves and stop relying on it in the documentation, much like [] is
eschewed in favor of std::vector in C++.

The way to deal with it is to replace reliance on autodecode with
.byDchar (.byDchar has a bonus of not throwing an exception on invalid
UTF, but using the replacement dchar instead.)

To that end, and this will be an incremental process:

1. Temporarily break autodecode such that using it will cause a compile
error. Then, see what breaks in Phobos and fix those to use .byDchar

2. Change examples in the documentation and the Phobos examples to use
.byDchar

3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit
when dealing with ranges/arrays of characters to make it clear what is
happening.


I gotta be honest, if the end of this tunnel doesn't have a char[] array 
which acts like an array in all circumstances, I see little point in 
changing anything.


-Steve


Re: The Case Against Autodecode

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d

On 5/31/16 4:38 PM, Timon Gehr wrote:

On 31.05.2016 21:51, Steven Schveighoffer wrote:

On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:

On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
[...]

Does walkLength yield the same number for all representations?


Let's put the question this way. Given the following string, what do
*you* think walkLength should return?


Compiler error.


What about e.g. joiner?


Compiler error. Better than what it does now.

-Steve


Re: Dealing with Autodecode

2016-05-31 Thread Stefan Koch via Digitalmars-d

On Wednesday, 1 June 2016 at 00:46:04 UTC, Walter Bright wrote:
It is not practical to just delete or deprecate autodecode - it 
is too embedded into things.

Which things?
The way to deal with it is to replace reliance on autodecode 
with .byDchar (.byDchar has a bonus of not throwing an 
exception on invalid UTF, but using the replacement dchar 
instead.)



To that end, and this will be an incremental process:



So does this mean we intend to carry the auto-decoding wart with 
us into the future, telling everyone:
"The obvious way is broken; we just have it for backwards 
compatibility"?


To come back to C++'s [] vs. std::vector: they actually have 
valid reasons, mainly C compatibility - keeping [] as a pointer.

I believe that as of now D is still flexible enough to make a 
radical change.

We cannot keep putting this off!

It is only going to get harder to remove it.



Dealing with Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d
It is not practical to just delete or deprecate autodecode - it is too embedded 
into things. What we can do, however, is stop using it ourselves and stop 
relying on it in the documentation, much like [] is eschewed in favor of 
std::vector in C++.


The way to deal with it is to replace reliance on autodecode with .byDchar 
(.byDchar has a bonus of not throwing an exception on invalid UTF, but using the 
replacement dchar instead.)


To that end, and this will be an incremental process:

1. Temporarily break autodecode such that using it will cause a compile error. 
Then, see what breaks in Phobos and fix those to use .byDchar


2. Change examples in the documentation and the Phobos examples to use .byDchar

3. Best practices should use .byDchar, .byWchar, .byChar, .byCodeUnit when 
dealing with ranges/arrays of characters to make it clear what is happening.
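
A minimal sketch of the explicit style that steps 2 and 3 
describe (byDchar and byCodeUnit are in std.utf; the example 
string is arbitrary):

import std.algorithm.searching : count;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "höhle";
    assert(s.byCodeUnit.count == 6); // no decoding: UTF-8 code units
    assert(s.byDchar.count == 5);    // explicit decoding; invalid UTF
                                     // becomes U+FFFD instead of throwing
}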


Re: The Case Against Autodecode

2016-05-31 Thread Nick Sabalausky via Digitalmars-d

On 05/31/2016 01:23 PM, Andrei Alexandrescu wrote:

On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:

The standard library has to fight against itself because of autodecoding!
The vast majority of the algorithms in Phobos are special-cased on
strings
in an attempt to get around autodecoding. That alone should highlight the
fact that autodecoding is problematic.


The way I see it is it's specialization to speed things up without
giving up the higher level abstraction. -- Andrei


Problem is, that "higher"[1] level abstraction you don't want to give up 
(ie working on code points) is rarely useful, and yet the default is to 
pay the price for something which is rarely useful.


[1] It's really the mid-level abstraction - grapheme is the high-level 
one (and more likely useful).




Re: The Case Against Autodecode

2016-05-31 Thread Nick Sabalausky via Digitalmars-d

On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:

On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:

On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:

Let's put the question this way. Given the following string, what do
*you*  think walkLength should return?

şŭt̥ḛ́k̠


The number of code units in the string. That's the contract promised and
honored by Phobos. -- Andrei


Code points I mean. -- Andrei


Yes, we know it's the contract. ***That's the problem.*** As everybody 
is saying, it *SHOULDN'T* be the contract.


Why shouldn't it be the contract? Because it's proven itself, both 
logically (as presented by pretty much everybody other than you in both 
this and other threads) and empirically (in phobos, warp, and other user 
code) to be both the least useful and most PITA option.




Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d

On 5/31/2016 1:57 AM, Chris wrote:

1. Given you experience with Warp, how hard would it be to clean Phobos up?


It's not hard, it's just a bit tedious.


2. After recoding a number of Phobos functions, how much code actually broke
(yours or someone else's)?


It's been a while, so I don't remember exactly, but as I recall, if the API 
had to change, I created a new overload or a new name, and left the old one 
as it is. For the std.path functions, I just changed them. While that 
technically changed the API, I'm not aware of any actual problems it caused.


(Decoding file strings is a latent bug anyway, as pointed out elsewhere in this 
thread. It's a change that had to be made sooner or later.)




Re: Reddit announcements

2016-05-31 Thread Seb via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 20:47:39 UTC, cym13 wrote:

On Tuesday, 31 May 2016 at 19:33:46 UTC, John Colvin wrote:

On Tuesday, 31 May 2016 at 18:57:29 UTC, o-genki-desu-ka wrote:

Many nice announcements here last week. I put some on reddit.

https://www.reddit.com/r/programming/comments/4lwufi/d_embedded_database_v01_released/

https://www.reddit.com/r/programming/comments/4lwubv/c_to_d_converter_based_on_clang/

https://www.reddit.com/r/programming/comments/4lwu5p/coedit_2_ide_update_6_released/

https://www.reddit.com/r/programming/comments/4lwtxw/compiletime_sqlite_for_d_beta_release/

https://www.reddit.com/r/programming/comments/4lwtr0/button_a_fast_correct_and_elegantly_simple_build/

https://www.reddit.com/r/programming/comments/4lwtn9/first_release_of_powernex_an_os_kernel_written_in/


I'm a bit concerned that people will react negatively to them 
all being dumped at once.


Same here. Moreover, while some announcements are about "ready 
to show" projects (Button or PowerNex, for example), others like 
"D embedded database" clearly are too young and risk annoying 
/programming/ people, IMHO.


Currently there's a bot that posts everything to reddit, but it 
also somehow kills every discussion there.


https://www.reddit.com/r/d_language/

Btw if you have better ideas how to solve this problem, you might 
get involved in this discussion:


https://github.com/CyberShadow/DFeed/issues/63


Re: Split general into multiple threads

2016-05-31 Thread Seb via Digitalmars-d

On Sunday, 29 May 2016 at 11:44:25 UTC, ZombineDev wrote:

On Sunday, 29 May 2016 at 11:35:12 UTC, Seb wrote:

[...]


I like this list better than the current one, but with one 
change: taking LDC out of core and renaming it to LDC and LLVM, 
so other D projects that leverage LLVM can be hosted there (e.g. 
SDC, Calypso, CPP2D, etc.) and to be on par with GDC.


Having an additional LLVM category sounds reasonable.
So we go with this new structure? Any major objections?

It would be nice to be able to move conversations. Instead of 
"please use D.learn instead", you would see "moved to the more 
appropriate D.learn".


See also: https://github.com/CyberShadow/DFeed/issues/67


Re: year to date pull statistics (week ending 2016-05-28)

2016-05-31 Thread Seb via Digitalmars-d

On Tuesday, 31 May 2016 at 23:48:00 UTC, Brad Roberts wrote:

total open: 252

created since 2016-01-01 and still open: 106

...
total open: 284
created since 2016-01-01 and still open: 142


Ouch - that's a huge spike!
What happened to the idea from DConf to automatically assign PR 
managers based on hard-coded maintainers for modules, and 
randomly otherwise?


Other ideas?


Re: Variables should have the ability to be @nogc

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Tue, 31 May 2016 20:41:09 +
schrieb Basile B. :

> The only thing I'm not sure about is the tri-state and
> the recursion. I cannot find a case where it would be justified.

The recursion is simply there to find pointers in nested
structs and their GcScan annotations:

// A does not need scanning
struct A
{
B b;
}

struct B
{
@noScan void* p;
}

The tri-state may not be necessary; I don't remember my
rationale there. I do use GcScan.automatic as the default in
memory allocation, for example, with the option to force it to
yes or no. It gives you more control, just in case.

-- 
Marco



Re: year to date pull statistics (week ending 2016-05-28)

2016-05-31 Thread Brad Roberts via Digitalmars-d

total open: 284
created since 2016-01-01 and still open: 142

                         created  closed  delta
2016-05-29 - today            25      25      0
2016-05-22 - 2016-05-28       46      34    -12
2016-05-15 - 2016-05-21       40      36     -4
2016-05-08 - 2016-05-14       82      55    -27
2016-05-01 - 2016-05-07       37      59    +22
2016-04-24 - 2016-04-30       74      85    +11
2016-04-17 - 2016-04-23       51      58     +7
2016-04-10 - 2016-04-16       52      58     +6
2016-04-03 - 2016-04-09       64      44    -20
2016-03-27 - 2016-04-02       65      60     -5
2016-03-20 - 2016-03-26       65      62     -3
2016-03-13 - 2016-03-19       44      51     +7
2016-03-06 - 2016-03-12       41      46     +5
2016-02-28 - 2016-03-05       54      47     -7
2016-02-21 - 2016-02-27       29      20     -9
2016-02-14 - 2016-02-20       32      36     +4
2016-02-07 - 2016-02-13       52      52      0
2016-01-31 - 2016-02-06       54      61     +7
2016-01-24 - 2016-01-30       40      37     -3
2016-01-17 - 2016-01-23       31      21    -10
2016-01-10 - 2016-01-16       39      42     +3
2016-01-03 - 2016-01-09       26      33     +7
2016-01-01 - 2016-01-02        2       5     +3
                         -------  ------  -----
                            1045    1027    -18

https://auto-tester.puremagic.com/chart.ghtml?projectid=1



[OT] UTF-16

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Tue, 31 May 2016 15:47:02 -0700
schrieb Walter Bright :

> But I didn't know which encoding would win - UTF-8, UTF-16, or UCS-2, so D 
> bet 
> on all three. If I had a do-over, I'd just support UTF-8. UTF-16 is useful 
> pretty much only as a transitional encoding to talk with Windows APIs.

I think so too, although more APIs than just Windows use
UTF-16. Think of Java or ICU. Aside from their Java heritage,
they found that it is the fastest encoding for transcoding
from and to Unicode, as UTF-16 code points cover most 8-bit
codepages. Also, Qt defined its char as a UTF-16 code point,
but they probably regret it, as the 'charmap' program
KCharSelect is now unable to show Unicode characters >= 0x10000.
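
(For illustration, a tiny D check of the surrogate-pair issue 
behind that limitation:)

void main()
{
    wstring s = "\U0001F600"; // a code point outside the BMP (>= 0x10000)
    assert(s.length == 2);    // two UTF-16 code units: a surrogate pair
}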

-- 
Marco



Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d

On 06/01/2016 12:47 AM, Walter Bright wrote:

But I didn't know which encoding would win - UTF-8, UTF-16, or UCS-2, so
D bet on all three. If I had a do-over, I'd just support UTF-8. UTF-16
is useful pretty much only as a transitional encoding to talk with
Windows APIs. Nobody uses UCS-2 (it consumes far too much memory).


Wikipedia says [1] that UCS-2 is essentially UTF-16 without surrogate 
pairs. I suppose you mean UTF-32/UCS-4.



[1] https://en.wikipedia.org/wiki/UTF-16


Re: faster splitter

2016-05-31 Thread David Nadlinger via Digitalmars-d
On Tuesday, 31 May 2016 at 21:29:34 UTC, Andrei Alexandrescu 
wrote:
You may want to then try https://dpaste.dzfl.pl/392710b765a9, 
which generates inline code on all compilers. -- Andrei


In general, it might be beneficial to use 
ldc.intrinsics.llvm_expect (cf. __builtin_expect) for things like 
that in order to optimise basic block placement. (We should 
probably have a compiler-independent API for that in core.*, by 
the way.) In this case, the skip computation path is probably 
small enough for that not to matter much, though.
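
A sketch of what such a hint can look like with LDC today - the 
helper name is made up, but ldc.intrinsics.llvm_expect is real:

version (LDC)
{
    import ldc.intrinsics : llvm_expect;

    // Hint to the optimiser that `cond` is expected to be false,
    // so the branch it guards is laid out off the hot path.
    bool unlikely(bool cond) { return llvm_expect(cond, false); }
}
else
{
    bool unlikely(bool cond) { return cond; } // no-op fallback elsewhere
}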


Another thing that might be interesting to do (now that you have 
a "clever" baseline) is to start counting cycles and make some 
comparisons against manual asm/intrinsics implementations. For 
short(-ish) needles, PCMPESTRI is probably the most promising 
candidate, although I suspect that for \r\n scanning in long 
strings in particular, an optimised AVX2 solution might have 
higher throughput.


Of course these observations are not very valuable without 
backing them up with measurements, but it seems like before 
optimising a generic search algorithm for short-needle test 
cases, having one's eyes on a solid SIMD baseline would be a 
prudent thing to do.


 — David


Re: The Case Against Autodecode

2016-05-31 Thread Walter Bright via Digitalmars-d

On 5/31/2016 1:20 PM, Marco Leise wrote:

[...]


I agree. I dealt with the madness of code pages, Shift-JIS, EBCDIC, locales, 
etc., in the pre-Unicode days. Despite its problems, Unicode (and UTF-8) is a 
major improvement, and I mean major.


16 years ago, I bet that Unicode was the future, and events have shown that to 
be correct.


But I didn't know which encoding would win - UTF-8, UTF-16, or UCS-2, so D bet 
on all three. If I had a do-over, I'd just support UTF-8. UTF-16 is useful 
pretty much only as a transitional encoding to talk with Windows APIs. Nobody 
uses UCS-2 (it consumes far too much memory).
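
(For reference, that three-way bet is still visible in the 
built-in string types:)

string  s = "x";  // immutable(char)[]  - UTF-8
wstring w = "x"w; // immutable(wchar)[] - UTF-16
dstring d = "x"d; // immutable(dchar)[] - UTF-32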


Re: D Embedded Database v0.1 Released

2016-05-31 Thread Stefan Koch via Digitalmars-d-announce

On Saturday, 28 May 2016 at 14:08:18 UTC, Piotrek wrote:

Short description

A database engine for quick and easy integration into any D 
program. Full compatibility with D types and ranges.


Design Goals (none is accomplished yet)

- ACID
- No external dependencies
- Single file storage
- Multithread support
- Suitable for microcontrollers


More info for interested at:

Docs:

https://gitlab.com/PiotrekDlang/DraftLib/blob/master/docs/database/index.md


Code:
https://gitlab.com/PiotrekDlang/DraftLib/tree/master/src

The project is at its early stage of development.

Piotrek


Nice effort. How would you like collaboration with the SQLite-D 
project?

It has similar goals, albeit with a file format compatible with 
SQLite.



Re: Our Sister

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Wed, 1 Jun 2016 01:06:36 +1000
schrieb Manu via Digitalmars-d :

> D loves templates, but templates aren't a given. Closed-source
> projects often can't have templates in the public API (ie, source
> should not be available), and this is my world.

Same effect for GPL code. Funny. (Template instantiations are
like statically linking in the open source code.)

-- 
Marco



Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Tue, 31 May 2016 16:56:43 -0400
schrieb Andrei Alexandrescu :

> On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote:
> > In the vast majority of cases what folks care about is full character  
> 
> How are you so sure? -- Andrei

Because a full character is the typical unit of a written
language. It's what we visualize in our heads when we think
about finding a substring or counting characters. A special
case of this is the reduction to ASCII where we can use code
units in place of grapheme clusters.

-- 
Marco



Re: Getting the parameters and other attributes belonging to the function overload with the greatest number of arguments

2016-05-31 Thread pineapple via Digitalmars-d-learn

On Tuesday, 31 May 2016 at 20:46:37 UTC, Basile B. wrote:

Yes, this can be done; you must use the getOverloads trait:

https://dlang.org/spec/traits.html#getOverloads

The result of this trait is the function itself, so it's not 
hard to use; e.g. the result can be passed directly to 
'Parameters', 'ReturnType' and such library traits.
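
A small sketch of that (the struct and its overloads are made 
up for illustration):

import std.meta : AliasSeq;
import std.traits : Parameters;

struct S
{
    void fun(int) { }
    void fun(int, string) { }
}

// all overloads of S.fun, in declaration order:
alias overloads = AliasSeq!(__traits(getOverloads, S, "fun"));

// the second overload takes the greatest number of arguments:
static assert(Parameters!(overloads[1]).length == 2);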


Awesome, thank you!


Re: Transient ranges

2016-05-31 Thread Dicebot via Digitalmars-d

On Tuesday, 31 May 2016 at 21:25:12 UTC, Timon Gehr wrote:

On 31.05.2016 22:59, Dicebot wrote:




I think we should be aware that the range API doesn't prevent 
bugs of all kinds. There's only so much analysis the compiler 
can do.


This is totally valid code that I want to actually work, and not
be discarded as a "bug".


map often allows random access. Do you suggest it should cache 
opIndex too?


A random-access map must store all already-evaluated items in 
memory, in my opinion.


Re: faster splitter

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 04:18 PM, Chris wrote:

I actually thought that dmd didn't place
`computeSkip` inside the loop. This raises the question of whether it should
be moved into the loop, in case we use it in Phobos, to make sure that it is
as fast as possible even with dmd. However, I like it the way it is now.


You may want to then try https://dpaste.dzfl.pl/392710b765a9, which 
generates inline code on all compilers. -- Andrei


Re: Transient ranges

2016-05-31 Thread Timon Gehr via Digitalmars-d

On 31.05.2016 22:59, Dicebot wrote:





I think we should be aware that the range API doesn't prevent bugs of
all kinds. There's only so much analysis the compiler can do.


This is totally valid code that I want to actually work, and not be
discarded as a "bug".


map often allows random access. Do you suggest it should cache opIndex too?


Re: The Case Against Autodecode

2016-05-31 Thread Max Samukha via Digitalmars-d
On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu 
wrote:


If user code needs to go upper at the grapheme level, they can 
do so. If anything this thread strengthens my opinion that 
autodecoding is a sweet spot. -- Andrei


Unicode FAQ disagrees (http://unicode.org/faq/utf_bom.html):

"Q: How about using UTF-32 interfaces in my APIs?

A: Except in some environments that store text as UTF-32 in 
memory, most Unicode APIs are using UTF-16. With UTF-16 APIs  the 
low level indexing is at the storage or code unit level, with 
higher-level mechanisms for graphemes or words specifying their 
boundaries in terms of the code units. This provides efficiency 
at the low levels, and the required functionality at the high 
levels."





Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 05:01:17PM -0400, Andrei Alexandrescu via Digitalmars-d 
wrote:
> On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Wasn't the whole point of operating at the code point level by
> > default to make it so that code would be operating on full
> > characters by default instead of chopping them up as is so easy to
> > do when operating at the code unit level?
> 
> The point is to operate on representation-independent entities
> (Unicode code points) instead of low-level representation-specific
> artifacts (code units).

This is basically saying that we operate on dchar[] by default, except
that we disguise its detrimental memory usage consequences by
compressing to UTF-8/UTF-16 and incurring the cost of decompression
every time we access its elements.  Perhaps you love the idea of running
an OS that stores all files in compressed form and always decompresses
upon every syscall to read(), but I prefer a higher-performance system.


> That's the contract, and it seems meaningful
> seeing how Unicode is defined in terms of code points as its abstract
> building block.

Where's this contract stated, and when did we sign up for this?


> If user code needs to go lower at the code unit level, they can do so.
> If user code needs to go upper at the grapheme level, they can do so.

Only with much pain by using workarounds to bypass meticulously-crafted
autodecoding algorithms in Phobos.


> If anything this thread strengthens my opinion that autodecoding is a
> sweet spot. -- Andrei

No, autodecoding is a stalemate that's neither fast nor correct.


T

-- 
"Real programmers can write assembly code in any language. :-)" -- Larry Wall


Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Tue, 31 May 2016 13:06:16 -0400
schrieb Andrei Alexandrescu :

> On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Equality does not require decoding. Similarly, functions like find don't
> > either. Something like filter generally would, but it's also not
> > particularly normal to filter a string on a by-character basis. You'd
> > probably want to get to at least the word level in that case.  
> 
> It's nice that the stdlib takes care of that.

Both "equality" and "find" require byGrapheme.

 ⇰ The equivalence algorithm first brings both strings to a
   common normalization form (NFD or NFC), which works on one
   grapheme cluster at a time and afterwards does the binary
   comparison.
   http://www.unicode.org/reports/tr15/#Canon_Compat_Equivalence

 ⇰ Find would yield false positives for the start of grapheme clusters,
   i.e. it will match 'o' in an NFD "ö" (simplified example).
   http://www.unicode.org/reports/tr10/#Searching
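
A minimal sketch of the normalization half of that, using 
std.uni.normalize (NFC chosen arbitrarily):

import std.uni : normalize, NFC;

// Unicode-correct equality: bring both sides to a common
// normalization form, then compare the binary representations.
bool uniEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

unittest
{
    assert(uniEqual("\u00F6", "o\u0308")); // precomposed "ö" vs 'o' + umlaut
}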

-- 
Marco



Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d

On 31.05.2016 22:20, Marco Leise wrote:

Am Tue, 31 May 2016 16:29:33 +
schrieb Joakim:


>Part of it is the complexity of written language, part of it is
>bad technical decisions.  Building the default string type in D
>around the horrible UTF-8 encoding was a fundamental mistake,
>both in terms of efficiency and complexity.  I noted this in one
>of my first threads in this forum, and as Andrei said at the
>time, nobody agreed with me, with a lot of hand-waving about how
>efficiency wasn't an issue or that UTF-8 arrays were fine.
>Fast-forward years later and exactly the issues I raised are now
>causing pain.

Maybe you can dig up your old post and we can look at each of
your complaints in detail.



It is probably this one. Not sure what "exactly the issues" are though.

http://forum.dlang.org/thread/bwbuowkblpdxcpyse...@forum.dlang.org


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:

Wasn't the whole point of operating at the code point level by default to
make it so that code would be operating on full characters by default
instead of chopping them up as is so easy to do when operating at the code
unit level?


The point is to operate on representation-independent entities (Unicode 
code points) instead of low-level representation-specific artifacts 
(code units). That's the contract, and it seems meaningful seeing how 
Unicode is defined in terms of code points as its abstract building 
block. If user code needs to go lower at the code unit level, they can 
do so. If user code needs to go upper at the grapheme level, they can do 
so. If anything this thread strengthens my opinion that autodecoding is 
a sweet spot. -- Andrei


Re: Transient ranges

2016-05-31 Thread Dicebot via Digitalmars-d
On Tuesday, 31 May 2016 at 18:11:34 UTC, Steven Schveighoffer 
wrote:
1) The current definition of an input range (most importantly, 
the fact that `front` has to be @property-like) implies that 
`front` always returns the same result until `popFront` is 
called.


Regardless of property-like or not, this should be the case. 
Otherwise, popFront makes no sense.


Except it isn't, in many cases you call "bugs" :(

2) For ranges that call predicates on elements to evaluate the 
next element, this can only be achieved by caching - predicates 
are never required to be pure.


Or, by not returning different things from your predicate.


It is perfectly legal for a predicate to be non-pure, and it 
would be hugely annoying if anyone decided to prohibit it. Also, 
even pure predicates may simply be very expensive to evaluate, 
which can make `front` a silent pessimization.


This is like saying RedBlackTree is broken when I give it a 
predicate of "a == b".


RedBlackTree at least makes certain demands about what a valid 
predicate can be. This is not the case for ranges in general.


3) But caching is sub-optimal performance-wise, and thus a bunch 
of Phobos algorithms violate the `front` consistency / cheapness 
expectation by evaluating predicates each time it is called 
(like map).


I don't think anything defensively caches front in case the 
next call to front is different, unless that's specifically the 
reason for the range.


And that makes input ranges violate implication #1 (front 
stability) casually, to the point where it can't be relied on at 
all, and one always has to make sure it is only evaluated once 
(make a stack-local copy or something like that).



I think we should be aware that the range API doesn't prevent 
bugs of all kinds. There's only so much analysis the compiler 
can do.


This is totally valid code that I want to actually work and not be 
discarded as a "bug".
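
The complaint is easy to reproduce with map, whose `front` re-evaluates 
its callable on each call (a minimal sketch; the counter is illustrative):

import std.algorithm.iteration : map;
import std.stdio : writeln;

void main()
{
    int calls;
    auto r = [1, 2, 3].map!((x) { ++calls; return x * 2; });
    auto a = r.front; // predicate evaluated
    auto b = r.front; // evaluated again - nothing caches front for you
    writeln(calls);   // 2: an expensive or impure predicate runs twice
}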


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 03:44 PM, Jonathan M Davis via Digitalmars-d wrote:

In the vast majority of cases what folks care about is full characters


How are you so sure? -- Andrei


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 03:34 PM, ag0aep6g wrote:

On 05/31/2016 07:21 PM, Andrei Alexandrescu wrote:

Could you please substantiate that? My understanding is that code unit
is a higher-level Unicode notion independent of encoding, whereas code
point is an encoding-dependent representation detail. -- Andrei


You got the terms mixed up. Code unit is lower level. Code point is
higher level.


Apologies and thank you. -- Andrei



Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 04:55 PM, Andrei Alexandrescu wrote:

On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:

Let's put the question this way. Given the following string, what do
*you*  think walkLength should return?

şŭt̥ḛ́k̠


The number of code units in the string. That's the contract promised and
honored by Phobos. -- Andrei


Code points I mean. -- Andrei


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 03:32 PM, H. S. Teoh via Digitalmars-d wrote:

Let's put the question this way. Given the following string, what do
*you*  think walkLength should return?

şŭt̥ḛ́k̠


The number of code units in the string. That's the contract promised and 
honored by Phobos. -- Andrei


Re: Is there any overhead iterating over a pointer using a slice?

2016-05-31 Thread Johan Engelen via Digitalmars-d-learn

On Tuesday, 31 May 2016 at 18:55:18 UTC, Gary Willoughby wrote:


If I have a pointer and iterate over it using a slice, like 
this:


T* foo = 

foreach (element; foo[0 .. length])
{
...
}

Is there any overhead compared with pointer arithmetic in a for 
loop?


Use the assembly output of your compiler to check! :-)  It's fun 
to look at.

For example, with GDC:
http://goo.gl/Ur9Srv

No difference.

cheers,
  Johan
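
For reference, the two forms being compared (a sketch; per the GDC 
output linked above, they compile to the same loop):

import std.stdio : writeln;

void main()
{
    int[4] data = [1, 2, 3, 4];
    int* p = data.ptr;
    size_t length = data.length;
    int sum1, sum2;

    foreach (element; p[0 .. length]) // slice over the pointer
        sum1 += element;

    for (size_t i = 0; i < length; ++i) // manual pointer arithmetic
        sum2 += *(p + i);

    writeln(sum1, " ", sum2); // 10 10
}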



[Issue 15371] __traits(getMember) should bypass the protection

2016-05-31 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=15371

--- Comment #4 from b2.t...@gmx.com ---
In the meantime, when the traits code targets a struct or a class, it's
possible to use its '.tupleof' property, which is not affected by
visibility.

Instead of iterating allMembers:

import std.meta : aliasSeqOf;
import std.range : iota;

foreach (i; aliasSeqOf!(iota(0, T.tupleof.length)))
{
    alias MT = typeof(T.tupleof[i]);
    ...
}

This is not exactly the same, but when the traits code only inspects the
field types or UDAs it works fine.
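
A minimal sketch of the workaround (the struct S is hypothetical, for
illustration only):

import std.meta : aliasSeqOf;
import std.range : iota;
import std.stdio : writeln;

struct S
{
    private int hidden;
    string visible;
}

void main()
{
    // .tupleof enumerates every field, private ones included
    foreach (i; aliasSeqOf!(iota(0, S.tupleof.length)))
        writeln(__traits(identifier, S.tupleof[i]), ": ",
                typeof(S.tupleof[i]).stringof);
    // prints "hidden: int" and "visible: string"
}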

--


Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d

On Tuesday, 31 May 2016 at 20:28:32 UTC, ag0aep6g wrote:

On 05/31/2016 06:29 PM, Joakim wrote:
D devs should lead the way in getting rid of the UTF-8 encoding, not
bickering about how to make it more palatable.  I suggested a
single-byte encoding for most languages, with double-byte for the ones
which wouldn't fit in a byte.  Use some kind of header or other metadata
to combine strings of different languages, _rather than encoding the
language into every character!_


Guys, may I ask you to move this discussion to a new thread? 
I'd like to follow the (already crowded) autodecode thing, and 
this is really a separate topic.


No, this is the root of the problem, but I'm not interested in 
debating it, so you can go back to discussing how to avoid the 
elephant in the room.


Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 10:38:03PM +0200, Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 21:51, Steven Schveighoffer wrote:
> > On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:
> > > On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via
> > > Digitalmars-d wrote:
> > > [...]
> > > > Does walkLength yield the same number for all representations?
> > > 
> > > Let's put the question this way. Given the following string, what
> > > do *you* think walkLength should return?
> > 
> > Compiler error.
> > 
> > -Steve
> 
> What about e.g. joiner?

joiner is one of those algorithms that can work perfectly fine *without*
autodecoding anything at all. The only time it'd actually need to decode
would be if you're joining a set of UTF-8 strings with a UTF-16
delimiter, or some other such combination, which should be pretty rare.
After all, within the same application you'd usually only be dealing
with a single encoding rather than mixing UTF-8, UTF-16, and UTF-32
willy-nilly.

(Unless the code is specifically written for transcoding, in which case
decoding is part of the job description, so it should be expected that
the programmer ought to know how to do it properly without needing
Phobos to do it for him.)

Even in the case of s.joiner('Ш'), joiner could easily convert that
dchar into a short UTF-8 string and then operate directly on UTF-8.


T

-- 
Just because you survived after you did it, doesn't mean it wasn't stupid!


Re: Getting the parameters and other attributes belonging to the function overload with the greatest number of arguments

2016-05-31 Thread Basile B. via Digitalmars-d-learn

On Tuesday, 31 May 2016 at 20:06:47 UTC, pineapple wrote:
I'd like to find the overload of some function with the most 
parameters and (in this specific case) to get their identifiers 
using e.g. ParameterIdentifierTuple. There have also been cases 
where I'd have liked to iterate over the result of 
Parameters!func for each overload of that function. Can this be 
done, and if so how?


Yes, this can be done; you must use the getOverloads trait:

https://dlang.org/spec/traits.html#getOverloads

The result of this trait is the set of overloads themselves, so it's 
not hard to use; e.g. each one can be passed directly to 'Parameters', 
'ReturnType' and similar library traits.
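
A sketch of the whole recipe (the overload set S.f is hypothetical):

import std.meta : AliasSeq;
import std.stdio : writeln;
import std.traits : Parameters, ParameterIdentifierTuple;

struct S
{
    static void f(int a) {}
    static void f(int a, string b) {}
}

void main()
{
    alias overloads = AliasSeq!(__traits(getOverloads, S, "f"));

    // Find the largest parameter count across the overload set.
    size_t maxParams;
    foreach (o; overloads)
        if (Parameters!o.length > maxParams)
            maxParams = Parameters!o.length;

    writeln(maxParams);                                 // 2
    // The second overload is the widest one here.
    writeln([ParameterIdentifierTuple!(overloads[1])]); // ["a", "b"]
}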


Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d

On Tuesday, 31 May 2016 at 20:20:46 UTC, Marco Leise wrote:

Am Tue, 31 May 2016 16:29:33 +
schrieb Joakim :

Part of it is the complexity of written language, part of it 
is bad technical decisions.  Building the default string type 
in D around the horrible UTF-8 encoding was a fundamental 
mistake, both in terms of efficiency and complexity.  I noted 
this in one of my first threads in this forum, and as Andrei 
said at the time, nobody agreed with me, with a lot of 
hand-waving about how efficiency wasn't an issue or that UTF-8 
arrays were fine. Fast-forward years later and exactly the 
issues I raised are now causing pain.


Maybe you can dig up your old post and we can look at each of 
your complaints in detail.


Not interested.  I believe you were part of that thread then.  
Google it if you want to read it again.


UTF-8 is an antiquated hack that needs to be eradicated.  It 
forces all other languages than English to be twice as long, 
for no good reason, have fun with that when you're downloading 
text on a 2G connection in the developing world.  It is 
unnecessarily inefficient, which is precisely why 
auto-decoding is a problem. It is only a matter of time till 
UTF-8 is ditched.


You don't download twice the data. First of all, some
languages had two-byte encodings before UTF-8, and second web
content is full of HTML syntax and gzip compressed afterwards.


The vast majority can be encoded in a single byte, and are 
unnecessarily forced to two bytes by the inefficient UTF-8/16 
encodings.  HTML syntax is a non sequitur; compression helps but 
isn't as efficient as a proper encoding.



Take this Thai Wikipedia entry for example:
https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2
The download of the gzipped html is 11% larger in UTF-8 than
in Thai TIS-620 single-byte encoding. And that is dwarfed by
the size of JS + images. (I don't have the numbers, but I
expect the effective overhead to be ~2%).


Nobody on a 2G connection is waiting minutes to download such 
massive web pages.  They are mostly sending text to each other on 
their favorite chat app, and waiting longer and using up more of 
their mobile data quota if they're forced to use bad encodings.



Ironically a lot of symbols we take for granted would then
have to be implemented as HTML entities using their Unicode
code points(sic!). Amongst them basic stuff like dashes, degree
(°) and minute (′), accents in names, non-breaking space or
footnotes (↑).


No, they just don't use HTML, opting for much superior mobile 
apps instead. :)


D devs should lead the way in getting rid of the UTF-8 
encoding, not bickering about how to make it more palatable.  
I suggested a single-byte encoding for most languages, with 
double-byte for the ones which wouldn't fit in a byte.  Use 
some kind of header or other metadata to combine strings of 
different languages, _rather than encoding the language into 
every character!_


That would have put D on an island. "Some kind of header" would 
be a horrible mess to have in strings, because you have to 
account for it when concatenating strings and scan for them all 
the time to see if there is some interspersed 2 byte encoding 
in the stream. That's hardly better than UTF-8. And yes, a huge 
amount of websites mix scripts and a lot of other text uses the 
available extra symbols like ° or α,β,γ.


Let's see: a constant-time addition to a header or constantly 
decoding every character every time I want to manipulate the 
string... I wonder which is a better choice?!  You would not 
"intersperse" any other encodings, unless you kept track of those 
substrings in the header.  My whole point is that such mixing of 
languages or "extra symbols" is an extreme minority use case: the 
vast majority of strings are a single language.


The common string-handling use case, by far, is strings with 
only one language, with a distant second some substrings in a 
second language, yet here we are putting the overhead into 
every character to allow inserting characters from an 
arbitrary language!  This is madness.


No thx, madness was when we couldn't reliably open text files, 
because nowhere was the encoding stored and when you had to 
compile programs for each of a dozen codepages, so localized 
text would be rendered correctly. And your retro codepage 
system wont convince the world to drop Unicode either.


Unicode _is_ a retro codepage system, they merely standardized a 
bunch of the most popular codepages.  So that's not going away no 
matter what system you use. :)


Yes, the complexity of diacritics and combining characters 
will remain, but that is complexity that is inherent to the 
variety of written language.  UTF-8 is not: it is just a bad 
technical decision, likely chosen for ASCII compatibility and 
some misguided notion that being able to combine arbitrary 
language strings with no other metadata was worthwhile.  It is not.

Re: Reddit announcements

2016-05-31 Thread cym13 via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 19:33:46 UTC, John Colvin wrote:

On Tuesday, 31 May 2016 at 18:57:29 UTC, o-genki-desu-ka wrote:

Many nice announcements here last week. I put some on reddit.

https://www.reddit.com/r/programming/comments/4lwufi/d_embedded_database_v01_released/

https://www.reddit.com/r/programming/comments/4lwubv/c_to_d_converter_based_on_clang/

https://www.reddit.com/r/programming/comments/4lwu5p/coedit_2_ide_update_6_released/

https://www.reddit.com/r/programming/comments/4lwtxw/compiletime_sqlite_for_d_beta_release/

https://www.reddit.com/r/programming/comments/4lwtr0/button_a_fast_correct_and_elegantly_simple_build/

https://www.reddit.com/r/programming/comments/4lwtn9/first_release_of_powernex_an_os_kernel_written_in/


I'm a bit concerned that people will react negatively to them 
all being dumped at once.


 Same here. Moreover, while some announcements are about "ready to 
show" projects (Button or PowerNex, for example), others like "D 
Embedded Database" are clearly too young, and risk annoying 
/programming/ people IMHO.


Re: Variables should have the ability to be @nogc

2016-05-31 Thread Basile B. via Digitalmars-d

On Tuesday, 31 May 2016 at 19:04:39 UTC, Marco Leise wrote:

Am Tue, 31 May 2016 15:53:44 +
schrieb Basile B. :

This solution seems smarter than using the existing '@nogc' 
attribute. Plus one also for the fact that nothing has to be 
done in DMD.


I just constrained myself to what can be done in user code from 
the start. :)


Did you encounter the issue with protected and private members 
?


When I tested the template I immediately got some warnings: DMD 
interprets my 'getMember' calls as a deprecated abuse of bug 314, and 
in dmd 2.069 I would get true errors.


Actually it is in a large half-ported code base from C++ and I 
haven't ever had a running executable, nor did I test it with 
recent dmd versions. My idea was to mostly have @nogc code, but 
allow it for a transition time or places where GC use does not 
have an impact. Here is the code, free to use for all purposes.


Thanks for sharing the template. When using '.tupleof' instead of the 
'allMembers'/'getMember' traits there's no issue with visibility, which 
is awesome. It means that the template could be proposed for Phobos 
very quickly.


The only thing I'm not sure about is the tri-state and the recursion. I 
cannot find a case where they would be justified.





Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 10:47:56PM +0300, Dmitry Olshansky via Digitalmars-d 
wrote:
> On 31-May-2016 01:00, Walter Bright wrote:
> > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> > > I don't agree on changing those. Indexing and slicing a char[] is
> > > really useful and actually not hard to do correctly (at least with
> > > regard to handling code units).
> > 
> > Yup. It isn't hard at all to use arrays of codeunits correctly.
> 
> Ehm as long as all you care for is operating on substrings I'd say.
> Working with individual characters requires either decoding or clever
> tricks like operating on encoded UTF directly.
[...]

Working on individual characters needs byGrapheme, unless you know
beforehand that the character(s) you're working with are ASCII, or fits
in a single code unit.

About "clever tricks", it's not really that hard.  I was thinking that
things like s.canFind('Ш') should translate the 'Ш' into a UTF-8 byte
sequence, and then do a substring search directly on the encoded string.
This way, a large number of single-character algorithms don't even need
to decode.  The way UTF-8 is designed guarantees that there will not be
any false positives.  This will eliminate a lot of the current overhead
of autodecoding.


T

-- 
Klein bottle for rent ... inquire within. -- Stephen Mulraney
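
A sketch of that needle-encoding trick (the helper name is made up):

import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.string : representation;
import std.utf : encode;

bool canFindChar(string haystack, dchar needle)
{
    char[4] buf;
    size_t len = encode(buf, needle); // UTF-8-encode the needle once
    // Substring search on raw code units; UTF-8's design rules out
    // matches that start inside another code point's encoding.
    return haystack.representation.canFind(cast(const(ubyte)[]) buf[0 .. len]);
}

void main()
{
    writeln(canFindChar("миШка", 'Ш')); // true, without decoding the haystack
}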


Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d

On 31.05.2016 21:51, Steven Schveighoffer wrote:

On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:

On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via
Digitalmars-d wrote:
[...]

Does walkLength yield the same number for all representations?


Let's put the question this way. Given the following string, what do
*you* think walkLength should return?


Compiler error.

-Steve


What about e.g. joiner?


Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d

On 05/31/2016 06:29 PM, Joakim wrote:

D devs should lead the way in getting rid of the UTF-8 encoding, not
bickering about how to make it more palatable.  I suggested a
single-byte encoding for most languages, with double-byte for the ones
which wouldn't fit in a byte.  Use some kind of header or other metadata
to combine strings of different languages, _rather than encoding the
language into every character!_


Guys, may I ask you to move this discussion to a new thread? I'd like to 
follow the (already crowded) autodecode thing, and this is really a 
separate topic.


Re: D Embedded Database v0.1 Released

2016-05-31 Thread Dmitri via Digitalmars-d-announce

On Saturday, 28 May 2016 at 14:08:18 UTC, Piotrek wrote:

Short description

A database engine for quick and easy integration into any D 
program. Full compatibility with D types and ranges.


Design Goals (none is accomplished yet)

- ACID
- No external dependencies
- Single file storage
- Multithread support
- Suitable for microcontrollers


Example code:

import draft.database;
import std.stdio;

void main(string[] args)
{
    static struct Test
    {
        int a;
        string s;
    }

    auto db = DataBase("testme.db");
    auto collection = db.collection!Test("collection_name", true);

    collection.put(Test(1, "Hello DB"));

    writeln(db.collection!Test("collection_name"));
}


More info for interested at:

Docs:

https://gitlab.com/PiotrekDlang/DraftLib/blob/master/docs/database/index.md


Code:
https://gitlab.com/PiotrekDlang/DraftLib/tree/master/src

The project is at its early stage of development.

Piotrek


This might provide useful information if you're aiming for 
something like sqlite (hopefully not offtopic):


https://github.com/cznic/ql

It's an embeddable database engine in Go with goals similar to 
yours and at an advanced stage.


regards,
dmitri.


Re: The Case Against Autodecode

2016-05-31 Thread Joakim via Digitalmars-d

On Tuesday, 31 May 2016 at 18:34:54 UTC, Jonathan M Davis wrote:
On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d 
wrote:
UTF-8 is an antiquated hack that needs to be eradicated.  It 
forces all other languages than English to be twice as long, 
for no good reason, have fun with that when you're downloading 
text on a 2G connection in the developing world.  It is 
unnecessarily inefficient, which is precisely why 
auto-decoding is a problem. It is only a matter of time till 
UTF-8 is ditched.


Considering that *nix land uses UTF-8 almost exclusively, and 
many C libraries do even on Windows, I very much doubt that 
UTF-8 is going anywhere anytime soon - if ever. The Win32 API 
does use UTF-16, and Java and C# do, but vast sea of code that 
is C or C++ generally uses UTF-8 as do plenty of other 
programming languages.


I agree that both UTF encodings are somewhat popular now.

And even aside from English, most European languages are going 
to be more efficient with UTF-8, because they're still 
primarily ASCII even if they contain characters that are not. 
Stuff like Chinese is definitely worse in UTF-8 than it would 
be in UTF-16, but there are a lot of languages other than 
English which are going to encode better with UTF-8 than UTF-16 
- let alone UTF-32.


And there are a lot more languages that will be twice as long as 
English, i.e. ASCII.


Regardless, UTF-8 isn't going anywhere anytime soon. _Way_ too 
much uses it for it to be going anywhere, and most folks have 
no problem with that. Any attempt to get rid of it would be a 
huge, uphill battle.


I disagree, it is inevitable.  Any tech so complex and 
inefficient cannot last long.


But D supports UTF-8, UTF-16, _and_ UTF-32 natively - even 
without involving the standard library - so anyone who wants to 
avoid UTF-8 is free to do so.


Yes, but not by using UTF-16/32, which use too much memory.  I've 
suggested a single-byte encoding for most languages instead, both 
in my last post and the earlier thread.


D could use this new encoding internally, while keeping its 
current UTF-8/16 strings around for any outside UTF-8/16 data 
passed in.  Any of that data run through algorithms that don't 
require decoding could be kept in UTF-8, but the moment any 
decoding is required, D would translate UTF-8 to the new 
encoding, which would be much easier for programmers to 
understand and manipulate. If UTF-8 output is needed, you'd have 
to encode back again.


Yes, this translation layer would be a bit of a pain, but the new 
encoding would be so much more efficient and understandable that 
it would be worth it, and you're already decoding and encoding 
back to UTF-8 for those algorithms now.  All that's changing is 
that you're using a new and different encoding than dchar as the 
default.  If it succeeds for D, it could then be sold more widely 
as a replacement for UTF-8/16.


I think this would be the right path forward, not navigating this 
UTF-8/16 mess further.


Re: The Case Against Autodecode

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Tue, 31 May 2016 16:29:33 +
schrieb Joakim :

> Part of it is the complexity of written language, part of it is 
> bad technical decisions.  Building the default string type in D 
> around the horrible UTF-8 encoding was a fundamental mistake, 
> both in terms of efficiency and complexity.  I noted this in one 
> of my first threads in this forum, and as Andrei said at the 
> time, nobody agreed with me, with a lot of hand-waving about how 
> efficiency wasn't an issue or that UTF-8 arrays were fine.  
> Fast-forward years later and exactly the issues I raised are now 
> causing pain.

Maybe you can dig up your old post and we can look at each of
your complaints in detail.

> UTF-8 is an antiquated hack that needs to be eradicated.  It 
> forces all other languages than English to be twice as long, for 
> no good reason, have fun with that when you're downloading text 
> on a 2G connection in the developing world.  It is unnecessarily 
> inefficient, which is precisely why auto-decoding is a problem.  
> It is only a matter of time till UTF-8 is ditched.

You don't download twice the data. First of all, some
languages had two-byte encodings before UTF-8, and second web
content is full of HTML syntax and gzip compressed afterwards.
Take this Thai Wikipedia entry for example:
https://th.wikipedia.org/wiki/%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B9%80%E0%B8%97%E0%B8%A8%E0%B9%84%E0%B8%97%E0%B8%A2
The download of the gzipped html is 11% larger in UTF-8 than
in Thai TIS-620 single-byte encoding. And that is dwarfed by
the size of JS + images. (I don't have the numbers, but I
expect the effective overhead to be ~2%).
Ironically a lot of symbols we take for granted would then
have to be implemented as HTML entities using their Unicode
code points(sic!). Amongst them basic stuff like dashes, degree
(°) and minute (′), accents in names, non-breaking space or
footnotes (↑).

> D devs should lead the way in getting rid of the UTF-8 encoding, 
> not bickering about how to make it more palatable.  I suggested a 
> single-byte encoding for most languages, with double-byte for the 
> ones which wouldn't fit in a byte.  Use some kind of header or 
> other metadata to combine strings of different languages, _rather 
> than encoding the language into every character!_

That would have put D on an island. "Some kind of header"
would be a horrible mess to have in strings, because you have
to account for it when concatenating strings and scan for them
all the time to see if there is some interspersed 2 byte
encoding in the stream. That's hardly better than UTF-8. And
yes, a huge amount of websites mix scripts and a lot of other
text uses the available extra symbols like ° or α,β,γ.

> The common string-handling use case, by far, is strings with only 
> one language, with a distant second some substrings in a second 
> language, yet here we are putting the overhead into every 
> character to allow inserting characters from an arbitrary 
> language!  This is madness.

No thx, madness was when we couldn't reliably open text files,
because nowhere was the encoding stored and when you had to
compile programs for each of a dozen codepages, so localized
text would be rendered correctly. And your retro codepage
system wont convince the world to drop Unicode either.

> Yes, the complexity of diacritics and combining characters will 
> remain, but that is complexity that is inherent to the variety of 
> written language.  UTF-8 is not: it is just a bad technical 
> decision, likely chosen for ASCII compatibility and some 
> misguided notion that being able to combine arbitrary language 
> strings with no other metadata was worthwhile.  It is not.

The web proves you wrong. Scripts do get mixed often. Be it
Wikipedia, a foreign language learning site or mathematical
symbols.

-- 
Marco



Re: asm woes...

2016-05-31 Thread Era Scarecrow via Digitalmars-d-learn

On Tuesday, 31 May 2016 at 18:52:16 UTC, Marco Leise wrote:
The 'this' pointer is usually in some register already. On 
Linux 32-bit, for example, it is in EAX; on Linux 64-bit it is in 
RDI.


 The AX register seems like a bad choice, since you require the 
AX/DX registers when you do multiplication and division (although all 
registers are nominally general purpose, some instructions are still 
tied to specific ones). SI/DI are a much better choice.


By the way, you are right that 32-bit does not have access to 
64-bit machine words (actually kind of obvious), but your idea 
wasn't far fetched, since there is the X32 architecture at 
least for Linux. It uses 64-bit machine words, but 32-bit 
pointers and allows for compact and fast programs.


 As I recall, the switch to the larger registers is a simple 
per-instruction prefix, something like 60h, 66h or 67h; I forget which 
one exactly. I remember writing assembly programs for 16-bit DOS that 
used 32-bit registers via that trick (built into the assembler). 
Although using the lower registers by themselves required the same 
prefix, so...


Re: faster splitter

2016-05-31 Thread Chris via Digitalmars-d

On Tuesday, 31 May 2016 at 19:59:50 UTC, qznc wrote:

On Tuesday, 31 May 2016 at 19:29:25 UTC, Chris wrote:
Would it speed things up even more, if we put the function 
`computeSkip` into the loop or is this done automatically by 
the compiler?


LDC inlines it. DMD does not.

More numbers:

./benchmark.ldc
Search in Alice in Wonderland
     std: 147 ±1
  manual: 100 ±0
    qznc: 121 ±1
   Chris: 103 ±1
  Andrei: 144 ±1
 Andrei2: 105 ±1
Search in random short strings
     std: 125 ±15
  manual: 117 ±10
    qznc: 104 ±6
   Chris: 123 ±14
  Andrei: 104 ±5
 Andrei2: 103 ±4
Mismatch in random long strings
     std: 140 ±22
  manual: 164 ±64
    qznc: 115 ±13
   Chris: 167 ±63
  Andrei: 161 ±68
 Andrei2: 106 ±9
Search random haystack with random needle
     std: 138 ±27
  manual: 135 ±33
    qznc: 116 ±16
   Chris: 141 ±36
  Andrei: 131 ±33
 Andrei2: 109 ±12
(avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Random short strings has haystacks of 10 to 300 characters and 
needles of 2 to 10. Basically, no time for initialisation.


Random long strings has haystacks of size 1000, 10_000, 
100_000, or 1_000_000 and needles 50 to 500. It inserts a 
character into a random index of the needle to force a mismatch.


The last one is the configuration as before.

Overall, Andrei2 (the lazy compute skip) is really impressive. 
:)


Yep. It's really impressive. I actually thought that dmd didn't place 
`computeSkip` inside of the loop. This raises the question of whether 
it should be moved into the loop, in case we use it in Phobos, to make 
sure that it is as fast as possible even with dmd. However, I like it 
the way it is now.

The nice thing about `Andrei2` is that it performs consistently well.


Re: Free the DMD backend

2016-05-31 Thread default0 via Digitalmars-d
I have no idea how licensing would work in that regard but 
considering that DMDs backend is actively maintained and may 
eventually even be ported to D, wouldn't it at some point differ 
enough from Symantecs "original" backend to simply call the DMD 
backend its own thing?


Or are all the changes to the DMD backend simply changes to 
Symantec's backend, period?


Then again even if that'd legally be fine after some point, 
someone would have to make the judgement call and that seems like 
a potentially large legal risk, so I guess even if it'd work that 
way it would be an unrealistic step.


Re: Free the DMD backend

2016-05-31 Thread Russel Winder via Digitalmars-d
On Tue, 2016-05-31 at 10:09 +, Atila Neves via Digitalmars-d wrote:
> […]
> 
> No, no, no, no. We had LDC be the default already on Arch Linux 
> for a while and it was a royal pain. I want to choose to use LDC 
> when and if I need performance. Otherwise, I want my projects to 
> compile as fast as possible and be able to use all the shiny new 
> features.

So write a new backend for DMD the licence of which allows DMD to be in
Debian and Fedora.

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder



Getting the parameters and other attributes belonging to the function overload with the greatest number of arguments

2016-05-31 Thread pineapple via Digitalmars-d-learn
I'd like to find the overload of some function with the most 
parameters and (in this specific case) to get their identifiers 
using e.g. ParameterIdentifierTuple. There have also been cases 
where I'd have liked to iterate over the result of 
Parameters!func for each overload of that function. Can this be 
done, and if so how?




Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 22:47:56 Dmitry Olshansky via Digitalmars-d wrote:
> On 31-May-2016 01:00, Walter Bright wrote:
> > On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:
> >> I don't agree on changing those. Indexing and slicing a char[] is
> >> really useful
> >> and actually not hard to do correctly (at least with regard to
> >> handling code
> >> units).
> >
> > Yup. It isn't hard at all to use arrays of codeunits correctly.
>
> Ehm as long as all you care for is operating on substrings I'd say.
> Working with individual characters requires either decoding or clever
> tricks like operating on encoded UTF directly.

Yeah, but Phobos provides the tools to do that reasonably easily even when
autodecoding isn't involved. Sure, it's slightly more tedious to call
std.utf.decode or std.utf.encode yourself rather than letting autodecoding
take care of it, but it's easy enough to do and allows you to control when
it's done. And we have stuff like byDchar or byGrapheme for the cases
where you don't want to actually operate on arrays of code units.

- Jonathan M Davis
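
A minimal sketch of driving std.utf.decode by hand (the sample string 
is an assumption):

import std.stdio : writeln;
import std.utf : decode;

void main()
{
    string s = "aЯc";
    size_t i;
    while (i < s.length)
    {
        dchar c = decode(s, i); // advances i past the code units consumed
        writeln(c);             // a, Я, c
    }
}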



Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 21:48:36 Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 21:40, Wyatt wrote:
> > On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
> >> The 'length' of a character is not one in all contexts.
> >> The following text takes six columns in my terminal:
> >>
> >> 日本語
> >> 123456
> >
> > That's a property of your font and font rendering engine, not Unicode.
>
> Sure. Hence "context". If you are e.g. trying to manually underline some
> text in console output, for example in a compiler error message,
> counting the number of characters will not actually be what you want,
> even though it works reliably for ASCII text.
>
> > (Also, it's probably not quite six columns; most fonts I've tested, 漢字
> > are rendered as something like 1.5 characters wide, assuming your
> > terminal doesn't overlap them.)
> >
> > -Wyatt
>
> It's precisely six columns in my terminal (also in emacs and in gedit).
>
> My point was, how can std.algorithm ever guess correctly what you
> /actually/ intended to do?

It can't, which is precisely why having it select for you was a bad design
decision. The programmer needs to be making that decision. And the fact that
Phobos currently makes that decision for you means that it's often doing the
wrong thing, and the fact that it chose to decode code points by default
means that it's often eating up unnecessary cycles to boot.

- Jonathan M Davis




Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 15:33:38 Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/31/2016 02:53 PM, Jonathan M Davis via Digitalmars-d wrote:
> > walkLength treats a code point like it's a character.
>
> No, it treats a code point like it's a code point. -- Andrei

Wasn't the whole point of operating at the code point level by default to
make it so that code would be operating on full characters by default
instead of chopping them up as is so easy to do when operating at the code
unit level? Thanks to how Phobos treats strings as ranges of dchar, most D
code treats code points as if they were characters. So, whether it's correct
or not, a _lot_ of D code is treating walkLength like it returns the number
of characters in a string. And if walkLength doesn't provide the number of
characters in a string, why would I want to use it under normal
circumstances? Why would I want to be operating at the code point level in
my code? It's not necessarily a full character, since it's not necessarily a
grapheme. So, by using walkLength and front and popFront and whatnot with
strings, I'm not getting full characters. I'm still only getting pieces of
characters - just like would happen if strings were treated as ranges of
code units. I'm just getting bigger pieces of the characters out of the
deal. But if they're not full characters, what's the point?

I am sure that there is code that is going to want to operate at the code
point level, but your average program is either operating on strings as a
whole or individual characters. As long as strings are being operated on as
a whole, code units are generally plenty, and careful encoding of characters
into code units for comparisons means that much of the time that you want to
operate on individual characters, you can still operate at the code unit
level. But if you can't, then you need the grapheme level, because a code
point is not necessarily a full character.

So, what is the point of operating on code points in your average D program?
walkLength will not always tell me the number of characters in a string.
front risks giving me a partial character rather than a whole one. Slicing
dchar[] risks chopping up characters just like slicing char[] does.
Operating on code points by default does not result in correct string
processing.

I honestly don't see how autodecoding is defensible. We may not be able to
get rid of it due to the breakage that doing that would cause, but I fail to
see how it is at all desirable that we have autodecoded strings. I can
understand how we got it if it's based on a misunderstanding on your part
about how Unicode works. We all make mistakes. But I fail to see how
autodecoding wasn't a mistake. It's the worst of both worlds - inefficient
while still incorrect. At least operating at the code unit level would be
fast while being incorrect, and it would be obviously incorrect once you did
anything with non-ASCII values, whereas it's easy to miss that ranges of
dchar are doing the wrong thing too.

- Jonathan M Davis



Re: faster splitter

2016-05-31 Thread qznc via Digitalmars-d

On Tuesday, 31 May 2016 at 19:29:25 UTC, Chris wrote:
Would it speed things up even more, if we put the function 
`computeSkip` into the loop or is this done automatically by 
the compiler?


LDC inlines it. DMD does not.

More numbers:

./benchmark.ldc
Search in Alice in Wonderland
     std: 147 ±1
  manual: 100 ±0
    qznc: 121 ±1
   Chris: 103 ±1
  Andrei: 144 ±1
 Andrei2: 105 ±1
Search in random short strings
     std: 125 ±15
  manual: 117 ±10
    qznc: 104 ±6
   Chris: 123 ±14
  Andrei: 104 ±5
 Andrei2: 103 ±4
Mismatch in random long strings
     std: 140 ±22
  manual: 164 ±64
    qznc: 115 ±13
   Chris: 167 ±63
  Andrei: 161 ±68
 Andrei2: 106 ±9
Search random haystack with random needle
     std: 138 ±27
  manual: 135 ±33
    qznc: 116 ±16
   Chris: 141 ±36
  Andrei: 131 ±33
 Andrei2: 109 ±12
(avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Random short strings has haystacks of 10 to 300 characters and 
needles of 2 to 10. Basically, no time for initialisation.


Random long strings has haystacks of size 1000, 10_000, 100_000, 
or 1_000_000 and needles 50 to 500. It inserts a character into a 
random index of the needle to force a mismatch.


The last one is the configuration as before.

Overall, Andrei2 (the lazy compute skip) is really impressive. :)


Re: The Case Against Autodecode

2016-05-31 Thread Steven Schveighoffer via Digitalmars-d

On 5/31/16 3:32 PM, H. S. Teoh via Digitalmars-d wrote:

On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d 
wrote:
[...]

Does walkLength yield the same number for all representations?


Let's put the question this way. Given the following string, what do
*you* think walkLength should return?


Compiler error.

-Steve


Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 07:40:13PM +, Wyatt via Digitalmars-d wrote:
> On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
> > 
> > The 'length' of a character is not one in all contexts.
> > The following text takes six columns in my terminal:
> > 
> > 日本語
> > 123456
> 
> That's a property of your font and font rendering engine, not Unicode.
> (Also, it's probably not quite six columns; most fonts I've tested,
> 漢字 are rendered as something like 1.5 characters wide, assuming your
> terminal doesn't overlap them.)
[...]

I believe he was talking about a console terminal that uses 2 columns to
render the so-called "double width" characters. The CJK block does
contain "double-width" versions of selected blocks (e.g., the ASCII
block), to be used with said characters.

Of course, using string length to measure string width is a risky
venture fraught with pitfalls, because your terminal may not actually
render them the way you think it should. Nevertheless, it does serve to
highlight why a construct like s.walkLength is essentially buggy,
because there is not enough information to determine which length it
should return -- length of the buffer in bytes, or the number of code
points, or the number of graphemes, or the width of the string. No
matter which choice you make, it only works for a subset of cases and is
wrong for the other cases.  This is a prime illustration of why forcing
autodecoding on every string in D is a wrong design.


T

-- 
Не дорог подарок, дорога любовь. (It's not the gift that's dear, but the love.)


Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d

On 31.05.2016 21:40, Wyatt wrote:

On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:


The 'length' of a character is not one in all contexts.
The following text takes six columns in my terminal:

日本語
123456


That's a property of your font and font rendering engine, not Unicode.


Sure. Hence "context". If you are e.g. trying to manually underline some 
text in console output, for example in a compiler error message, 
counting the number of characters will not actually be what you want, 
even though it works reliably for ASCII text.



(Also, it's probably not quite six columns; most fonts I've tested, 漢字
are rendered as something like 1.5 characters wide, assuming your
terminal doesn't overlap them.)

-Wyatt


It's precisely six columns in my terminal (also in emacs and in gedit).

My point was, how can std.algorithm ever guess correctly what you 
/actually/ intended to do?


Re: The Case Against Autodecode

2016-05-31 Thread Dmitry Olshansky via Digitalmars-d

On 31-May-2016 01:00, Walter Bright wrote:

On 5/30/2016 11:25 AM, Adam D. Ruppe wrote:

I don't agree on changing those. Indexing and slicing a char[] is
really useful
and actually not hard to do correctly (at least with regard to
handling code
units).


Yup. It isn't hard at all to use arrays of codeunits correctly.


Ehm as long as all you care for is operating on substrings I'd say.
Working with individual characters requires either decoding or clever 
tricks like operating on encoded UTF directly.


--
Dmitry Olshansky


Re: The Case Against Autodecode

2016-05-31 Thread Jonathan M Davis via Digitalmars-d
On Tuesday, May 31, 2016 21:20:19 Timon Gehr via Digitalmars-d wrote:
> On 31.05.2016 20:53, Jonathan M Davis via Digitalmars-d wrote:
> > On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote:
> > > On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote:
> > > > On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote:
> > > > > On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
> > > > > > Saying that operating at the code point level - UTF-32 - is correct
> > > > > > is like saying that operating at UTF-16 instead of UTF-8 is correct.
> > > > > Could you please substantiate that? My understanding is that code unit
> > > > > is a higher-level Unicode notion independent of encoding, whereas code
> > > > > point is an encoding-dependent representation detail. -- Andrei
> > > Does walkLength yield the same number for all representations?
> > walkLength treats a code point like it's a character. My point is that
> > that's incorrect behavior. It will not result in correct string processing
> > in the general case, because a code point is not guaranteed to be a
> > full character.
> > ...
>
> What's "correct"? Maybe the user intended to count the number of code
> points in order to pre-allocate a dchar[] of the correct size.
>
> Generally, I don't see how algorithms become magically "incorrect" when
> applied to utf code units.

In the vast majority of cases what folks care about is full characters,
which is not what code points are. But the fact that they want different
things in different situation just highlights the fact that just converting
everything to code points by default is a bad idea. And even worse, code
points are usually the worst choice. Many operations don't require decoding
and can be done at the code unit level, meaning that operating at the code
point level is just plain inefficient. And the vast majority of the
operations that can't operate at the code point level, then need to operate
on full characters, which means that they need to be operating at the
grapheme level. Code points are in this weird middle ground that's useful in
some cases but usually isn't what you want or need.

We need to be able to operate at the code unit level, the code point level,
and the grapheme level. But defaulting to the code point level really makes
no sense.

> > walkLength does not report the length of a character as one in all cases
> > just like length does not report the length of a character as one in all
> > cases. walkLength is counting bigger units than length, but it's still
> > counting pieces of a character rather than counting full characters.
>
> The 'length' of a character is not one in all contexts.
> The following text takes six columns in my terminal:
>
> 日本語
> 123456

Well, that's getting into displaying characters which is a whole other can
of worms, but it also highlights that assuming that the programmer wants a
particular level of unicode is not a particularly good idea and that we
should avoid converting for them without being asked, since it risks being
inefficient to no benefit.

- Jonathan M Davis




Re: The Case Against Autodecode

2016-05-31 Thread Wyatt via Digitalmars-d

On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:


The 'length' of a character is not one in all contexts.
The following text takes six columns in my terminal:

日本語
123456


That's a property of your font and font rendering engine, not 
Unicode. (Also, it's probably not quite six columns; most fonts 
I've tested, 漢字 are rendered as something like 1.5 characters 
wide, assuming your terminal doesn't overlap them.)


-Wyatt


[Issue 16090] popFront generates out-of-bounds array index on corrupted utf-8 strings

2016-05-31 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=16090

--- Comment #2 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/e1af1b0b51ea9f29d4ff8076d73c03ba10bfc73c
fix issue 16090 - popFront generates out-of-bounds array index on corrupted
utf-8 strings

https://github.com/dlang/phobos/commit/279ccd7c5c8cebfb21a3138aecf7f3a85444e538
Merge pull request #4387 from aG0aep6G/16090

fix issue 16090 - popFront generates out-of-bounds array index on cor…

--


Re: The Case Against Autodecode

2016-05-31 Thread H. S. Teoh via Digitalmars-d
On Tue, May 31, 2016 at 02:30:08PM -0400, Andrei Alexandrescu via Digitalmars-d 
wrote:
[...]
> Does walkLength yield the same number for all representations?

Let's put the question this way. Given the following string, what do
*you* think walkLength should return?

şŭt̥ḛ́k̠

I think any reasonable person would have to say it should return 5,
because there are 5 visual "characters" here. Otherwise, what is even
the meaning of walkLength?! For it to return anything other than 5 means
that it's a leaky abstraction, because it's leaking low-level
"implementation details" of the Unicode representation of this string.

However, with the current implementation of autodecoding, walkLength
returns 11.  Can anyone reasonably argue that it's reasonable for
"şŭt̥ḛ́k̠".walkLength to equal 11?  What difference does this make if we
get rid of autodecoding, and walkLength returns 17 instead? *Both* are
wrong.

17 is actually the right answer if you're looking to allocate a buffer
large enough to hold this string, because that's the number of bytes it
occupies.

5 is the right answer to an end user who knows nothing about Unicode.

11 is the answer to a question that only makes sense to a Unicode
specialist, and that no layperson understands.

11 is the answer we currently give. And that, at the cost of
across-the-board performance degradation.  Yet you're seriously arguing
that 11 should be the right answer, by insisting that the current
implementation of autodecoding is "correct".  It boggles the mind.


T

-- 
Today's society is one of specialization: as you grow, you learn more and more 
about less and less. Eventually, you know everything about nothing.
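
For reference, all three counts for that string, as Phobos reports them 
(a sketch using the numbers from the post):

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string s = "şŭt̥ḛ́k̠";
    writeln(s.byCodeUnit.walkLength); // 17 - code units (bytes)
    writeln(s.walkLength);            // 11 - code points, the current default
    writeln(s.byGrapheme.walkLength); // 5  - graphemes, what the end user sees
}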


Re: The Case Against Autodecode

2016-05-31 Thread ag0aep6g via Digitalmars-d

On 05/31/2016 07:21 PM, Andrei Alexandrescu wrote:

Could you please substantiate that? My understanding is that code unit
is a higher-level Unicode notion independent of encoding, whereas code
point is an encoding-dependent representation detail. -- Andrei


You got the terms mixed up. Code unit is lower level. Code point is 
higher level.


One code point is encoded with one or more code units. char is a UTF-8 
code unit. wchar is a UTF-16 code unit. dchar is both a UTF-32 code unit 
and a code point, because in UTF-32 it's a 1-to-1 relation.
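
A small sketch of that relation (assuming the literal is the precomposed 
U+00F6):

import std.stdio : writeln;

void main()
{
    string  u8  = "ö"; // char  = UTF-8 code units
    wstring u16 = "ö"; // wchar = UTF-16 code units
    dstring u32 = "ö"; // dchar = UTF-32 code units, one per code point
    writeln(u8.length);  // 2 - two code units encode the single code point
    writeln(u16.length); // 1
    writeln(u32.length); // 1
}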


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 02:57 PM, Jonathan M Davis via Digitalmars-d wrote:

In addition, as soon as you have ubyte[], none of the string-related
functions work. That's fixable, but as it stands, operating on ubyte[]
instead of char[] is a royal pain.


That'd be nice to fix indeed. Please break the ground? -- Andrei


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 02:53 PM, Jonathan M Davis via Digitalmars-d wrote:

walkLength treats a code point like it's a character.


No, it treats a code point like it's a code point. -- Andrei


Re: The Case Against Autodecode

2016-05-31 Thread Andrei Alexandrescu via Digitalmars-d

On 05/31/2016 02:46 PM, Timon Gehr wrote:

On 31.05.2016 20:30, Andrei Alexandrescu wrote:

D's


Phobos'


foreach, too. -- Andrei



Re: Reddit announcements

2016-05-31 Thread John Colvin via Digitalmars-d-announce

On Tuesday, 31 May 2016 at 18:57:29 UTC, o-genki-desu-ka wrote:

Many nice announcements here last week. I put some on reddit.

https://www.reddit.com/r/programming/comments/4lwufi/d_embedded_database_v01_released/

https://www.reddit.com/r/programming/comments/4lwubv/c_to_d_converter_based_on_clang/

https://www.reddit.com/r/programming/comments/4lwu5p/coedit_2_ide_update_6_released/

https://www.reddit.com/r/programming/comments/4lwtxw/compiletime_sqlite_for_d_beta_release/

https://www.reddit.com/r/programming/comments/4lwtr0/button_a_fast_correct_and_elegantly_simple_build/

https://www.reddit.com/r/programming/comments/4lwtn9/first_release_of_powernex_an_os_kernel_written_in/


I'm a bit concerned that people will react negatively to them all 
being dumped at once.


Re: faster splitter

2016-05-31 Thread Chris via Digitalmars-d

On Tuesday, 31 May 2016 at 18:56:14 UTC, qznc wrote:



The mistake is to split on "," instead of ','. The slow 
splitter at the start of this thread searches for "\r\n".


Your lazy-skip algorithm looks great!

./benchmark.ldc
Search in Alice in Wonderland
     std: 168 ±6   +29 ( 107)   -3 ( 893)
  manual: 112 ±3   +28 (  81)   -1 ( 856)
    qznc: 149 ±4   +30 (  79)   -1 ( 898)
   Chris: 142 ±5   +28 ( 102)   -2 ( 898)
  Andrei: 153 ±3   +28 (  79)   -1 ( 919)
 Andrei2: 101 ±2   +34 (  31)   -1 ( 969)
Search random haystack with random needle
     std: 172 ±19  +61 ( 161)  -11 ( 825)
  manual: 161 ±47  +73 ( 333)  -35 ( 666)
    qznc: 163 ±21  +33 ( 314)  -15 ( 661)
   Chris: 190 ±47  +80 ( 302)  -33 ( 693)
  Andrei: 140 ±37  +60 ( 315)  -27 ( 676)
 Andrei2: 103 ±6   +57 (  64)   -2 ( 935)
(avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz


The Alice benchmark searches Alice in Wonderland for "find a pleasure 
in all their simple joys" and finds it in the last sentence.


Would it speed things up even more, if we put the function 
`computeSkip` into the loop or is this done automatically by the 
compiler?


Re: The Case Against Autodecode

2016-05-31 Thread Timon Gehr via Digitalmars-d

On 31.05.2016 20:53, Jonathan M Davis via Digitalmars-d wrote:

On Tuesday, May 31, 2016 14:30:08 Andrei Alexandrescu via Digitalmars-d wrote:
> On 5/31/16 2:11 PM, Jonathan M Davis via Digitalmars-d wrote:
> > On Tuesday, May 31, 2016 13:21:57 Andrei Alexandrescu via Digitalmars-d wrote:
> > > On 05/31/2016 01:15 PM, Jonathan M Davis via Digitalmars-d wrote:
> > > > Saying that operating at the code point level - UTF-32 - is correct
> > > > is like saying that operating at UTF-16 instead of UTF-8 is correct.
> > > Could you please substantiate that? My understanding is that code unit
> > > is a higher-level Unicode notion independent of encoding, whereas code
> > > point is an encoding-dependent representation detail. -- Andrei
> Does walkLength yield the same number for all representations?

walkLength treats a code point like it's a character. My point is that
that's incorrect behavior. It will not result in correct string processing
in the general case, because a code point is not guaranteed to be a
full character.
...


What's "correct"? Maybe the user intended to count the number of code 
points in order to pre-allocate a dchar[] of the correct size.


Generally, I don't see how algorithms become magically "incorrect" when 
applied to utf code units.



walkLength does not report the length of a character as one in all cases
just like length does not report the length of a character as one in all
cases. walkLength is counting bigger units than length, but it's still
counting pieces of a character rather than counting full characters.



The 'length' of a character is not one in all contexts.
The following text takes six columns in my terminal:

日本語
123456


Re: Variables should have the ability to be @nogc

2016-05-31 Thread Marco Leise via Digitalmars-d
Am Tue, 31 May 2016 15:53:44 +
schrieb Basile B. :

> This solution seems smarter than using the existing '@nogc' 
> attribute. Plus one also for the fact that nothing has to be done 
> in DMD.

I just constrained myself to what can be done in user code
from the start. :)

> Did you encounter the issue with protected and private members ?
> 
> When I tested the template I immediately got some warnings: DMD 
> interprets my 'getMember' calls as a deprecated abuse of bug 314, 
> and in dmd 2.069 I would get true errors.

Actually it is in a large half-ported code base from C++ and I
haven't ever had a running executable, nor did I test it with
recent dmd versions. My idea was to mostly have @nogc code,
but allow it for a transition time or places where GC use
does not have an impact. Here is the code, free to use for
all purposes.


enum GcScan { no, yes, automatic }
enum noScan = GcScan.no;

template gcScanOf(T)
{
    import std.typetuple;

    static if (is(T == struct) || is(T == union))
    {
        enum isGcScan(alias uda) = is(typeof(uda) == GcScan);

        GcScan findGcScan(List...)()
        {
            auto result = GcScan.automatic;
            foreach (attr; List) if (is(typeof(attr) == GcScan))
                result = attr;
            return result;
        }

        enum gcScanOf()
        {
            auto result = GcScan.no;
            foreach (i; Iota!(T.tupleof.length))
            {
                enum memberGcScan = findMatchingUda!(T.tupleof[i], isGcScan, true);
                static if (memberGcScan.length == 0)
                    enum eval = gcScanOf!(typeof(T.tupleof[i]));
                else
                    enum eval = evalGcScan!(memberGcScan, typeof(T.tupleof[i]));

                static if (eval)
                {
                    result = eval;
                    break;
                }
            }
            return result;
        }
    }
    else
    {
        static if (isStaticArray!T && is(T : E[N], E, size_t N))
            enum gcScanOf = is(E == void) ? GcScan.yes : gcScanOf!E;
        else
            enum gcScanOf = hasIndirections!T ? GcScan.yes : GcScan.no;
    }
}

enum evalGcScan(GcScan gc, T) = (gc == GcScan.automatic) ? gcScanOf!T : gc;

template findMatchingUda(alias symbol, alias func, bool optional = false,
                         bool multiple = false)
{
    import std.typetuple;

    enum symbolName = __traits(identifier, symbol);
    enum funcName   = __traits(identifier, func);

    template Filter(List...)
    {
        static if (List.length == 0)
            alias Filter = TypeTuple!();
        else static if (__traits(compiles, func!(List[0])) && func!(List[0]))
            alias Filter = TypeTuple!(List[0], Filter!(List[1 .. $]));
        else
            alias Filter = Filter!(List[1 .. $]);
    }

    alias filtered = Filter!(__traits(getAttributes, symbol));
    static assert(filtered.length <= 1 || multiple,
                  symbolName ~ " may only have one UDA matching " ~ funcName ~ ".");
    static assert(filtered.length >= 1 || optional,
                  symbolName ~ " requires a UDA matching " ~ funcName ~ ".");

    static if (multiple || optional)
        alias findMatchingUda = filtered;
    else static if (filtered.length == 1)
        alias findMatchingUda = filtered[0];
}

-- 
Marco



Is there any overhead iterating over a pointer using a slice?

2016-05-31 Thread Gary Willoughby via Digitalmars-d-learn

In relation to this thread:

http://forum.dlang.org/thread/ddckhvcxlyuvuiyaz...@forum.dlang.org

Where I asked about slicing a pointer, I have another question:

If I have a pointer and iterate over it using a slice, like this:

T* foo = 

foreach (element; foo[0 .. length])
{
...
}

Is there any overhead compared with pointer arithmetic in a for 
loop?

