Re: how to declare C's static function?

2016-03-28 Thread aki via Digitalmars-d-learn

On Tuesday, 29 March 2016 at 01:04:50 UTC, Mike Parker wrote:

On Monday, 28 March 2016 at 14:40:40 UTC, Adam D. Ruppe wrote:

On Monday, 28 March 2016 at 04:53:19 UTC, aki wrote:

So... You mean there is no way to declare functions
without exporting the symbol?


alas, no, even if it is private it can conflict on the outside 
(so stupid btw).




Seems to be fixed in the latest beta (finally):

http://dlang.org/changelog/2.071.0.html#dip22


Good news!



Re: how to declare C's static function?

2016-03-28 Thread Mike Parker via Digitalmars-d-learn

On Monday, 28 March 2016 at 14:40:40 UTC, Adam D. Ruppe wrote:

On Monday, 28 March 2016 at 04:53:19 UTC, aki wrote:

So... You mean there is no way to declare functions
without exporting the symbol?


alas, no, even if it is private it can conflict on the outside 
(so stupid btw).




Seems to be fixed in the latest beta (finally):

http://dlang.org/changelog/2.071.0.html#dip22
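Concretely, the conflict in question looks like this (a minimal sketch; the module names are made up):

```d
// a.d
module a;
extern (C) private void foo() {}

// b.d
module b;
extern (C) private void foo() {}
// pre-2.071: both modules emit the unmangled symbol _foo, so
// linking a.d and b.d together fails with a duplicate definition
```

According to the changelog entry above (and the reports in this thread), with the DIP22 changes a private symbol is no longer visible outside its module, so the two definitions should no longer clash.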


Re: char array weirdness

2016-03-28 Thread Steven Schveighoffer via Digitalmars-d-learn

On 3/28/16 7:06 PM, Anon wrote:

The compiler doesn't know that, and it isn't true in general. You could
have, for example, U+3042 in your char[]. That would be encoded as three
chars. It wouldn't make sense (or be correct) for val.front to yield
'\xe3' (the first byte of U+3042 in UTF-8).


I just want to interject to say that the compiler understands that 
char[] is an array of char code units just fine. It's Phobos that has a 
strange interpretation of it.


-Steve


Re: Strange behavior in console with UTF-8

2016-03-28 Thread Jonathan Villa via Digitalmars-d-learn
On Monday, 28 March 2016 at 18:28:33 UTC, Steven Schveighoffer 
wrote:

On 3/27/16 12:04 PM, Jonathan Villa wrote:

I can reproduce your issue on windows.

It works on Mac OS X.

I see different behavior on 32-bit (DMC stdlib) vs. 64-bit 
(MSVC stdlib). On both, the line is not read properly (I get a 
length of 0). On 32-bit, the program exits immediately, 
indicating it cannot read any more data.


On 64-bit, the program continues to allow input.

I don't think this is normal behavior; it should be filed as a 
bug. I'm not normally a Windows developer, but I would guess 
this is an issue with the Windows flavors of readln.


Please file here: https://issues.dlang.org under the Phobos 
component.


-Steve


Ok, I'm going to file it with your data. Thanks.

JV.


Re: char array weirdness

2016-03-28 Thread Jack Stouffer via Digitalmars-d-learn

On Monday, 28 March 2016 at 23:07:22 UTC, Jonathan M Davis wrote:

...


Thanks for the detailed responses. I think I'll compile this info 
and put it in a blog post so people can just point to it when 
someone else is confused.





Re: char array weirdness

2016-03-28 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Mar 28, 2016 at 04:07:22PM -0700, Jonathan M Davis via 
Digitalmars-d-learn wrote:
[...]
> The range API considers all strings to have an element type of dchar.
> char, wchar, and dchar are UTF code units - UTF-8, UTF-16, and UTF-32
> respectively. One or more code units make up a code point, which is
> actually something displayable but not necessarily what you'd call a
> character (e.g.  it could be an accent). One or more code points then
> make up a grapheme, which is really what a displayable character is.
> When Andrei designed the range API, he didn't know about graphemes -
> just code units and code points, so he thought that code points were
> guaranteed to be full characters and decided that that's what we'd
> operate on for correctness' sake.
[...]

Unfortunately, the fact that the default is *not* to use graphemes makes
working with non-European language strings pretty much just as ugly and
error-prone as working with bare char's in European language strings.

You gave the example of filter() returning wrong results when used with
a range of chars (if we didn't have autodecoding), but the same can be
said of using filter() *with* autodecoding on a string that contains
combining diacritics: your diacritics may get randomly reattached to
stuff they weren't originally attached to, or you may end up with wrong
sequences of Unicode code points (e.g. diacritics not attached to any
grapheme). Using filter() on Korean text, even with autodecoding, will
pretty much produce garbage. And so on.
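That failure mode is easy to demonstrate (a minimal sketch):

```d
import std.algorithm.iteration : filter;
import std.conv : to;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'x'.
    string s = "e\u0301x";
    // Filtering out the 'e' strands the combining accent, even with
    // auto-decoding: the result is valid code points but broken text.
    auto kept = s.filter!(c => c != 'e').to!string;
    assert(kept == "\u0301x"); // a dangling accent attached to nothing
}
```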

So in short, we're paying a performance cost for something that's only
arguably better but still not quite there, and this cost is attached to
almost *everything* you do with strings, regardless of whether you need
to (e.g., when you know you're dealing with pure ASCII data).  Even when
dealing with non-ASCII Unicode data, in many cases autodecoding
introduces a constant (and unnecessary!) overhead.  E.g., searching for
a non-ASCII character is equivalent to a substring search on the encoded
form of the character, and there is no good reason why Phobos couldn't
have done this instead of autodecoding every character while scanning
the string.  Regexes on Unicode strings could possibly be faster if the
regex engine internally converted literals in the regex into their
equivalent encoded forms and did the scanning without decoding. (IIRC
Dmitry did remark in some PR some time ago, to the effect that the regex
engine has been optimized to the point where the cost of autodecoding is
becoming visible, and the next step might be to bypass autodecoding.)

I argue that auto-decoding, as currently implemented, is a net minus,
even though I realize this is unlikely to change in this lifetime. It
charges a constant performance overhead yet still does not guarantee
things will behave as the user would expect (i.e., treat the string as
graphemes rather than code points).


T

-- 
We are in class, we are supposed to be learning, we have a teacher... Is it too 
much that I expect him to teach me??? -- RL


Re: char array weirdness

2016-03-28 Thread Jonathan M Davis via Digitalmars-d-learn
On Monday, March 28, 2016 16:02:26 H. S. Teoh via Digitalmars-d-learn wrote:
> For the time being, I'd recommend std.utf.byCodeUnit as a workaround.

Yeah, though as I've started using it, I've quickly found that enough
of Phobos doesn't support it yet that it's problematic, e.g.

https://issues.dlang.org/show_bug.cgi?id=15800

The situation will improve, but for the moment, the most reliable thing is
still to use strings as ranges of dchar but special case functions for them
so that they avoid decoding where necessary. The main problem is places like
filter, where even if you _know_ that you're just dealing with ASCII, the code
has to treat the string as a range of dchar anyway, because it has to decode
to match what's expected of auto-decoding. To some extent, using
std.string.representation gets around that, but it runs into problems
similar to those of byCodeUnit.
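For known-ASCII data, representation lets filter work on raw code units (a sketch):

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.string : representation;

void main()
{
    string s = "a1b2";
    // representation yields immutable(ubyte)[], so no decoding happens.
    auto digits = s.representation.filter!(b => '0' <= b && b <= '9');
    assert(digits.equal("12".representation));
}
```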

So, we have a ways to go.

- Jonathan M Davis



Re: char array weirdness

2016-03-28 Thread ag0aep6g via Digitalmars-d-learn

On 29.03.2016 00:49, Jack Stouffer wrote:

But the value fits into a char; a dchar is a waste of space. Why on
Earth would a different type be given for the front value than the type
of the elements themselves?


UTF-8 strings are decoded by the range primitives. That is, `front` 
returns one Unicode code point (type dchar) that's pieced together from 
up to four UTF-8 code units (type char). A code point does not fit into 
the 8 bits of a char.
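To make that concrete (a minimal sketch):

```d
import std.range.primitives : front;

void main()
{
    string s = "\u3042";          // U+3042, HIRAGANA LETTER A
    assert(s.length == 3);        // stored as three UTF-8 code units (chars)
    assert(s.front == '\u3042');  // front decodes them into a single dchar
}
```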


Re: char array weirdness

2016-03-28 Thread Anon via Digitalmars-d-learn

On Monday, 28 March 2016 at 23:06:49 UTC, Anon wrote:

Any because you're using ranges,


*And because you're using ranges,




Re: char array weirdness

2016-03-28 Thread Anon via Digitalmars-d-learn

On Monday, 28 March 2016 at 22:49:28 UTC, Jack Stouffer wrote:

On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:

On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:

void main () {
import std.range.primitives;
char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 
's'];

pragma(msg, ElementEncodingType!(typeof(val)));
pragma(msg, typeof(val.front));
}

prints

char
dchar

Why?


Unicode! `char` is UTF-8, which means a character can be from 
1 to 4 bytes. val.front gives a `dchar` (UTF-32), consuming 
those bytes and giving you a sensible value.


But the value fits into a char;


The compiler doesn't know that, and it isn't true in general. You 
could have, for example, U+3042 in your char[]. That would be 
encoded as three chars. It wouldn't make sense (or be correct) 
for val.front to yield '\xe3' (the first byte of U+3042 in UTF-8).



a dchar is a waste of space.


If you're processing Unicode text, you *need* to use that space. 
Any because you're using ranges, it is only 3 extra bytes, 
anyway. It isn't going to hurt on modern systems.


Why on Earth would a different type be given for the front 
value than the type of the elements themselves?


Unicode. A single char cannot hold a Unicode code point. A single 
dchar can.


Re: char array weirdness

2016-03-28 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Mar 28, 2016 at 10:49:28PM +, Jack Stouffer via Digitalmars-d-learn 
wrote:
> On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:
> >On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:
> >>void main () {
> >>import std.range.primitives;
> >>char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
> >>pragma(msg, ElementEncodingType!(typeof(val)));
> >>pragma(msg, typeof(val.front));
> >>}
> >>
> >>prints
> >>
> >>char
> >>dchar
> >>
> >>Why?
> >
> >Unicode! `char` is UTF-8, which means a character can be from 1 to 4
> >bytes. val.front gives a `dchar` (UTF-32), consuming those bytes and
> >giving you a sensible value.
> 
> But the value fits into a char; a dchar is a waste of space. Why on
> Earth would a different type be given for the front value than the
> type of the elements themselves?

Welcome to the world of auto-decoding.  Phobos ranges always treat any
string / wstring / dstring as a range of dchar, even if it's encoded as
UTF-8.

The pros and cons of auto-decoding have been debated to death several
times already. Walter hates it and wishes to get rid of it, but so far
Andrei has refused to budge.  Personally I lean on the side of killing
auto-decoding, but it seems unlikely to change at this point.  (But you
never know... if enough people revolt against it, maybe there's a small
chance Andrei could be convinced...)

For the time being, I'd recommend std.utf.byCodeUnit as a workaround.
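A quick sketch of what byCodeUnit changes:

```d
import std.range.primitives : walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "über";                // the 'ü' encodes as two UTF-8 code units
    assert(s.walkLength == 4);        // auto-decoding iterates 4 code points
    assert(s.byCodeUnit.length == 5); // raw code units, with O(1) length back
}
```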


T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N


Re: char array weirdness

2016-03-28 Thread Jonathan M Davis via Digitalmars-d-learn
On Monday, March 28, 2016 22:34:31 Jack Stouffer via Digitalmars-d-learn 
wrote:
> void main () {
>  import std.range.primitives;
>  char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
>  pragma(msg, ElementEncodingType!(typeof(val)));
>  pragma(msg, typeof(val.front));
> }
>
> prints
>
>  char
>  dchar
>
> Why?

static assert(is(ElementType!(typeof(val)) == dchar));

The range API considers all strings to have an element type of dchar. char,
wchar, and dchar are UTF code units - UTF-8, UTF-16, and UTF-32
respectively. One or more code units make up a code point, which is actually
something displayable but not necessarily what you'd call a character (e.g.
it could be an accent). One or more code points then make up a grapheme,
which is really what a displayable character is. When Andrei designed the
range API, he didn't know about graphemes - just code units and code points,
so he thought that code points were guaranteed to be full characters and
decided that that's what we'd operate on for correctness' sake.

In the case of UTF-8, a code point is made up of 1 - 4 code units of 8 bits
each. In the case of UTF-16, a code point is made up of 1 - 2 code units of
16 bits each. And in the case of UTF-32, a code unit is guaranteed to be a
single code point. So, by having the range API decode UTF-8 and UTF-16 to
UTF-32, strings then become ranges of dchar and avoid having code points
chopped up by stuff like slicing. So, while a code point is not actually
guaranteed to be a full character, certain classes of bugs are prevented by
operating on ranges of code points rather than code units. Of course, for
full correctness, graphemes need to be taken into account, and some
algorithms generally don't care whether they're operating on code units,
code points, or graphemes (e.g. find on code units generally works quite
well, whereas something like filter would be a complete disaster if you're
not actually dealing with ASCII).

Arrays of char and wchar are termed "narrow strings" - hence isNarrowString
is true for them (but not arrays of dchar) - and the range API does not
consider them to have slicing, be random access, or have length, because as
ranges of dchar, those operations would be O(n) rather than O(1). However,
because of this mess of whether an algorithm works best when operating on
code units or code points and the desire to avoid decoding to code points if
unnecessary, many algorithms special case narrow strings in order to
operate on them more efficiently. So, ElementEncodingType was introduced for
such cases. ElementType gives you the element type of the range, and for
everything but narrow strings ElementEncodingType is the same as
ElementType, but in the case of narrow strings, whereas ElementType is
dchar, ElementEncodingType is the actual element type of the array - hence
why ElementEncodingType!(typeof(val)) is char in your code above.

The correct way to deal with this is really to understand Unicode well
enough to know when you should be dealing at the code unit, code point, or
grapheme level and write your code accordingly, but that's not exactly easy.
So, in some respects, just operating on strings as dchar simplifies things
and reduces bugs relating to breaking up code points, but it does come with
an efficiency cost, and it does make the range API more confusing when it
comes to operating on narrow strings. And it isn't even fully correct,
because it doesn't take graphemes into account. But it's what we're stuck
with at this point.

std.utf provides byCodeUnit and byChar to iterate by code unit or specific
character types, and std.uni provides byGrapheme for iterating by grapheme
(along with plenty of other helper functions). So, the tools to deal with
ranges of characters more precisely are there, but they do require some
understanding of Unicode, and they don't always interact with the rest of
Phobos very well, since they're newer (e.g. std.conv.to doesn't fully work
with byCodeUnit yet, even though it works with ranges of dchar just fine).
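The three levels can be seen side by side (a minimal sketch using a combining diacritic):

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character
    string s = "e\u0301";
    assert(s.byCodeUnit.length == 3);     // code units: 'e' + two bytes for U+0301
    assert(s.walkLength == 2);            // code points: 'e' and the accent
    assert(s.byGrapheme.walkLength == 1); // graphemes: one displayed character
}
```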

- Jonathan M Davis



Re: char array weirdness

2016-03-28 Thread Jack Stouffer via Digitalmars-d-learn

On Monday, 28 March 2016 at 22:43:26 UTC, Anon wrote:

On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:

void main () {
import std.range.primitives;
char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
pragma(msg, ElementEncodingType!(typeof(val)));
pragma(msg, typeof(val.front));
}

prints

char
dchar

Why?


Unicode! `char` is UTF-8, which means a character can be from 1 
to 4 bytes. val.front gives a `dchar` (UTF-32), consuming those 
bytes and giving you a sensible value.


But the value fits into a char; a dchar is a waste of space. Why 
on Earth would a different type be given for the front value than 
the type of the elements themselves?


Re: char array weirdness

2016-03-28 Thread Anon via Digitalmars-d-learn

On Monday, 28 March 2016 at 22:34:31 UTC, Jack Stouffer wrote:

void main () {
import std.range.primitives;
char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
pragma(msg, ElementEncodingType!(typeof(val)));
pragma(msg, typeof(val.front));
}

prints

char
dchar

Why?


Unicode! `char` is UTF-8, which means a character can be from 1 
to 4 bytes. val.front gives a `dchar` (UTF-32), consuming those 
bytes and giving you a sensible value.


char array weirdness

2016-03-28 Thread Jack Stouffer via Digitalmars-d-learn

void main () {
import std.range.primitives;
char[] val = ['1', '0', 'h', '3', '6', 'm', '2', '8', 's'];
pragma(msg, ElementEncodingType!(typeof(val)));
pragma(msg, typeof(val.front));
}

prints

char
dchar

Why?


Re: How to be more careful about null pointers?

2016-03-28 Thread Adam D. Ruppe via Digitalmars-d-learn

On Monday, 28 March 2016 at 21:01:19 UTC, cy wrote:

No exception raised for dereferencing a null.


If it didn't give the error, either you swallowed it or you 
didn't actually dereference null.


The latter is a kinda strange happenstance, but if you are 
calling a static or final method on an object and it doesn't 
actually use any member variables, it is possible for the call to 
succeed even if null.


void doSomething(void* self) {
  // if you never actually use self
}

// then this is no error:

doSomething(null);


What is the db library you are using? Did you compile it along 
with your program or use a .lib with it?


Re: How to be more careful about null pointers?

2016-03-28 Thread cy via Digitalmars-d-learn

On Monday, 28 March 2016 at 21:01:19 UTC, cy wrote:
I invoked db.find_chapter.bindAll(8,4), when db was a null 
pointer.


No, no, no it's worse than that. What I did was (db.)find_chapter 
= (db.)backend.prepare("...") when backend was null, and got no 
error. find_chapter was garbage of course, but there was no 
error. And only much later, when I called 
db.find_chapter.bindAll(...) did it react to the garbage data, by 
messily segfaulting. db was a good pointer, but db.backend was 
bad, and db.backend.prepare() didn't even flinch getting passed a 
null this, and dereferencing that null internally to construct 
Statement(Database,string).


How to be more careful about null pointers?

2016-03-28 Thread cy via Digitalmars-d-learn
I finally found the null pointer. It took a week. I was assigning 
"db = db" when I should have been assigning "this.db = db". 
Terrible, I know. But...


I invoked db.find_chapter.bindAll(8,4), when db was a null 
pointer. There was no null pointer error. No exception raised for 
dereferencing a null. I'm not in release mode. Assertions are 
enabled. Shouldn't that have raised a null pointer exception?


Instead, it accesses db as if db were not null, producing a 
garbage structure in find_chapter, which bindAll chokes on, then 
causes the whole program to segfault.


I realize enforce(db).find_chapter would work, but... I thought D 
was more careful about null pointers? At least enough to die on 
dereferencing them?


Re: Strange behavior in console with UTF-8

2016-03-28 Thread Steven Schveighoffer via Digitalmars-d-learn

On 3/27/16 12:04 PM, Jonathan Villa wrote:

On Saturday, 26 March 2016 at 16:34:34 UTC, Steven Schveighoffer wrote:

On 3/25/16 6:47 PM, Jonathan Villa wrote:

On Friday, 25 March 2016 at 13:58:44 UTC, Steven Schveighoffer wrote:

[...]


OK, the following inputs I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one input is enough to reproduce the behaviour.

JV


It's the same thing Ali suggested (if I understand it correctly) and the
behaviour is the same.

You just have to send a UTF-8 char to reproduce the mess, independently
of the char type you send.



At this point, I think knowing exactly what input you are sending
would be helpful. Can you attach a file which has the input that
causes the error? Or just paste the input into your post.



The following chars I've tested: á, é, í, ó, ú, ñ, à, è, ì, ò, ù.
Just one input of those is enough to reproduce the behaviour.


I can reproduce your issue on windows.

It works on Mac OS X.

I see different behavior on 32-bit (DMC stdlib) vs. 64-bit (MSVC 
stdlib). On both, the line is not read properly (I get a length of 0). 
On 32-bit, the program exits immediately, indicating it cannot read any 
more data.


On 64-bit, the program continues to allow input.

I don't think this is normal behavior; it should be filed as a bug. I'm 
not normally a Windows developer, but I would guess this is an issue 
with the Windows flavors of readln.


Please file here: https://issues.dlang.org under the Phobos component.

-Steve


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-28 Thread wobbles via Digitalmars-d-learn

On Sunday, 27 March 2016 at 07:45:00 UTC, ParticlePeter wrote:

On Wednesday, 23 March 2016 at 20:00:55 UTC, wobbles wrote:

[...]


Thanks Wobbles, I took your approach. There were some minor 
issues, here is a working version:


[...]


Great, thanks for fixing it up!


Re: how to declare C's static function?

2016-03-28 Thread aki via Digitalmars-d-learn

On Monday, 28 March 2016 at 14:40:40 UTC, Adam D. Ruppe wrote:

On Monday, 28 March 2016 at 04:53:19 UTC, aki wrote:

So... You mean there is no way to declare functions
without exporting the symbol?


alas, no, even if it is private it can conflict on the outside 
(so stupid btw).


Is it all the same function being referenced? Just importing 
from there would be ok.


Thank you for clarifying.
Now I know I have to change the name as a result.

Thanks,
aki.



Re: how to declare C's static function?

2016-03-28 Thread Adam D. Ruppe via Digitalmars-d-learn

On Monday, 28 March 2016 at 04:53:19 UTC, aki wrote:

So... You mean there is no way to declare functions
without exporting the symbol?


alas, no, even if it is private it can conflict on the outside 
(so stupid btw).


Is it all the same function being referenced? Just importing from 
there would be ok.




Re: how to declare C's static function?

2016-03-28 Thread Jacob Carlborg via Digitalmars-d-learn

On 2016-03-28 06:02, aki wrote:

Hello,

When porting a legacy app written in C to D,
I ran into a problem.

file a.d:
extern (C) private void foo() {}

file b.d:
extern (C) private void foo() {}

  Error 1: Previous Definition Different : _foo

In C, "static void foo() {}" does not
export the symbol outside the compilation unit.
In D, the function foo() above conflicts even if
it is private. How can I declare C's static function?


Can you declare the function as "package" in one module and import it 
into the other module?
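Jacob's suggestion would look something like this (a minimal sketch; the module and package names are made up):

```d
// mypkg/impl.d
module mypkg.impl;
extern (C) package void foo() {}

// mypkg/user.d
module mypkg.user;
import mypkg.impl; // share the single definition instead of redefining it
```

This only helps when both modules can live in the same package and genuinely want one shared implementation; otherwise renaming one of the functions, as aki concluded elsewhere in the thread, is the remaining option.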


--
/Jacob Carlborg