Re: to delete the '\0' characters

2022-09-23 Thread Salih Dincer via Digitalmars-d-learn

On Friday, 23 September 2022 at 22:17:51 UTC, Paul Backus wrote:
Apologies for the confusion. You can use 
[stripRight](https://phobos.dpldocs.info/std.string.stripRight.2.html)


We have a saying: Estaghfirullah!

Thank you all so much because it has been very useful for me.

I learned two things:

* First, we can use strip() functions with parameters:
https://dlang.org/phobos/std_algorithm_mutation.html#.strip

(examples are very nice)

* Second, we could walk through the string in reverse and with 
indexOf():

https://github.com/dlang/phobos/blob/master/std/string.d#L3418

**Source Code:**
```d
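//import std.string : stripRight;/*
string stripRight(string str, const(char)[] chars)
{
  // empty, back and popBack for strings live in std.range.primitives
  import std.range.primitives : back, empty, popBack;
  import std.string : indexOf;
  for (; !str.empty; str.popBack())
  {
    if (chars.indexOf(str.back) == -1)
      break;
  }
  return str;
}//*/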
```
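A quick check of the first point, as a sketch: `std.algorithm.mutation` has its own `strip`/`stripRight` that take the element to remove. The renamed imports (`stripElem`, `stripRightElem`) are only there to keep them apart from `std.string.stripRight`, and the sample strings are made up.

```d
import std.algorithm.mutation : stripElem = strip, stripRightElem = stripRight;

unittest
{
    // Only the trailing '\0' elements are dropped.
    assert("the two\0\0".stripRightElem('\0') == "the two");
    // strip() works on both ends; an embedded '\0' stays where it is.
    assert("\0a\0b\0".stripElem('\0') == "a\0b");
}
```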

Delicious...

SDB@79


Re: to delete the '\0' characters

2022-09-23 Thread Paul Backus via Digitalmars-d-learn

On Friday, 23 September 2022 at 18:37:59 UTC, Salih Dincer wrote:
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
Is there a more accurate way to delete **the '\0' characters 
at the end of the string?**


* character**S**
* at the **END**
* of the **STRING**


Apologies for the confusion. You can use [`stripRight`][1] for 
this:


```d
import std.string: stripRight;
import std.stdio: writeln;

void main()
{
    string[] samples = [
        "the one\0", "the two\0\0", "the three\0\0\0", "the four\0\0\0\0",
        "the five\0\0\0\0\0", "the six\0\0\0\0\0\0",
        "the seven\0\0\0\0\0\0\0", "the eight\0\0\0\0\0\0\0\0"
    ];

    foreach (s; samples) {
        writeln(s.stripRight("\0"));
    }
}
```

[1]: https://phobos.dpldocs.info/std.string.stripRight.2.html


Re: to delete the '\0' characters

2022-09-23 Thread Ali Çehreli via Digitalmars-d-learn

On 9/23/22 11:37, Salih Dincer wrote:

> * character**S**
> * at the **END**
> * of the **STRING**

I think the misunderstanding is due to the following data you've posted 
earlier (I am abbreviating):


53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 6E 64 
65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0
4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 6C 64 
131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 0


You must have meant there were multiple strings there (apparently on 
separate lines) but I assumed you were showing a single string with 0 
bytes inside the string. (Word wrap must have contributed to the 
misunderstanding.)


Ali

P.S. With that understanding, now I think searching from the end for the 
first non-zero byte may be faster than searching from the beginning for 
the first zero; but again, it depends on the data.





Re: to delete the '\0' characters

2022-09-23 Thread Salih Dincer via Digitalmars-d-learn
On Friday, 23 September 2022 at 14:38:35 UTC, Jesse Phillips 
wrote:


You should be explicit with requirements.


Sorry, I mostly speak Turkish, and English is a foreign language 
for me, but I think what I wrote was clear. What do you think 
when you look at the text I quote below?


On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
Is there a more accurate way to delete **the '\0' characters at 
the end of the string?**


* character**S**
* at the **END**
* of the **STRING**


```d
auto splitz(string s) {
return s.splitter('\0')
   .filter!(x => !x.empty);
}
```


By the way, if we're going to filter, why are we splitting? 
Anyway! For this implementation, indexOf() is a powerful enough 
tool. In fact, it's pretty fast, since there are at most 8 '\0' 
characters and they are all at the end of the string, so the 
search can start 8 characters before the end. For example:


```d
void main()
{
  string[] samples = ["the one\0", "the two\0\0", "the three\0\0\0",
                      "the four\0\0\0\0", "the five\0\0\0\0\0",
                      "the six\0\0\0\0\0\0", "the seven\0\0\0\0\0\0\0",
                      "the eight\0\0\0\0\0\0\0\0"];

  import std.stdio : writefln;
  foreach(s; samples)
  {
    // assumes every sample is at least 8 characters long
    auto start = s.length - 8;
    string res = s.splitZeros!false(start);
    writefln("%(%02X%)", cast(ubyte[])res);
  }
}

string splitZeros(bool keepSep)(string s, size_t start = 0)
{
  auto keep = keepSep ? 0 : 1;

  import std.string : indexOf;
  // indexOf returns -1 when nothing is found, so +1 gives 0 (false)
  if(auto seekPos = s.indexOf('\0', start) + 1)
  {
    return s[0..seekPos - keep];
  }
  return s;
}
```
SDB@79


Re: to delete the '\0' characters

2022-09-23 Thread Jesse Phillips via Digitalmars-d-learn

On Friday, 23 September 2022 at 08:50:42 UTC, Salih Dincer wrote:
On Thursday, 22 September 2022 at 21:49:36 UTC, Ali Çehreli 
wrote:

On 9/22/22 14:31, Salih Dincer wrote:

If you have multiple '\0' chars that you will continue looking 
for, how about the following?


It can be preferred in terms of working at ranges.  But it 
isn't useful in terms of having more than one character and 
moving away from strings. For example:


```d
auto data = [ "hello", "and", "goodbye", "world" ];
auto hasZeros = data.joiner("\0\0").text;
// "hello\0\0and\0\0goodbye\0\0world"

// joiner only puts separators between elements: 3 * 2 zeros
assert(hasZeros.count('\0') == 6);
assert(hasZeros.splitz.walkLength == data.length * 2 - 1);

auto range = hasZeros.splitz;
// ("hello", "", "and", "", "goodbye", "", "world")
```
SDB@79



You should be explicit with requirements. It was hard to tell if 
your original code was correct.


```d
auto splitz(string s) {
return s.splitter('\0')
   .filter!(x => !x.empty);
}
```


Re: to delete the '\0' characters

2022-09-23 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 22 September 2022 at 21:49:36 UTC, Ali Çehreli wrote:

On 9/22/22 14:31, Salih Dincer wrote:

If you have multiple '\0' chars that you will continue looking 
for, how about the following?


It can be preferred in terms of working at ranges.  But it isn't 
useful in terms of having more than one character and moving away 
from strings. For example:


```d
auto data = [ "hello", "and", "goodbye", "world" ];
auto hasZeros = data.joiner("\0\0").text;
// "hello\0\0and\0\0goodbye\0\0world"

// joiner only puts separators between elements: 3 * 2 zeros
assert(hasZeros.count('\0') == 6);
assert(hasZeros.splitz.walkLength == data.length * 2 - 1);

auto range = hasZeros.splitz;
// ("hello", "", "and", "", "goodbye", "", "world")
```
SDB@79


Re: to delete the '\0' characters

2022-09-23 Thread Quirin Schroll via Digitalmars-d-learn
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
Is there a more accurate way to delete the '\0' characters at 
the end of the string?


Accurate? No. Your code works. Correct is correct, no matter 
efficiency or style.


I tried functions in this module: 
https://dlang.org/phobos/std_string.html


[code]


You won’t do it any shorter than this if returning a range of 
`dchar` is fine:

```d
auto removez(const(char)[] string, char ch = '\0')
{
import std.algorithm.iteration;
return string.splitter(ch).joiner;
}
```
If `dchar` is a problem and a range is not what you want,
```d
inout(char)[] removez(inout(char)[] chars) @safe pure nothrow
{
import std.array, std.algorithm.iteration;
auto data = cast(const(ubyte)[])chars;
auto result = data.splitter(0).joiner.array;
return (() inout @trusted => cast(inout(char)[])result)();
}
```
Bonus: Works with any kind of array of qualified char. As 
`string` is simply `immutable(char)[]`, `removez` returns a 
`string` given a `string`, but returns a `char[]` given a 
`char[]`, etc.


Warning: I do not know if the `@trusted` expression is really 
okay. The cast is not `@safe` because of type qualifiers: If 
`inout` becomes nothing (i.e. mutable), the cast removes `const`. 
I suspect that it is still okay because the result of `array` is 
unique. Maybe others know better?


Re: to delete the '\0' characters

2022-09-22 Thread Ali Çehreli via Digitalmars-d-learn

On 9/22/22 14:31, Salih Dincer wrote:

> string splitz(string s)
> {
>import std.string : indexOf;
>auto seekPos = s.indexOf('\0');
>return seekPos > 0 ? s[0..seekPos] : s;
> }

If you have multiple '\0' chars that you will continue looking for, how 
about the following?


import std;

auto splitz(string s) {
return s.splitter('\0');
}

unittest {
auto data = [ "hello", "and", "goodbye", "world" ];
auto hasZeros = data.joiner("\0").text;
assert(hasZeros.count('\0') == 3);
assert(hasZeros.splitz.equal(data));
}

void main() {
}

Ali




Re: to delete the '\0' characters

2022-09-22 Thread Salih Dincer via Digitalmars-d-learn
On Thursday, 22 September 2022 at 20:53:28 UTC, Salih Dincer 
wrote:


```d
string splitz(string s)
{
  import std.string : indexOf;
  size_t seekPos = s.indexOf('\0');
  return s[0..seekPos];
}
```


I ignored the possibility of not finding '\0'. I'm fixing it now:

```d
string splitz(string s)
{
  import std.string : indexOf;
  auto seekPos = s.indexOf('\0');
  // >= 0 so that a '\0' at index 0 is handled as well
  return seekPos >= 0 ? s[0..seekPos] : s;
}
```

But I also wish it could be like this:

```d
string splitz(string s)
{
  import std.string : indexOf;
  if(auto seekPos = s.indexOf('\0') > 0)
  {
return s[0..seekPos];
  }
  return s;
}
```
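As written, this last variant cannot do what is wished for: the whole comparison is the initializer, so `seekPos` ends up as a `bool` rather than an index. One way to keep the declaration inside the `if`, sketched below, is to add 1 so that "not found" (-1) becomes 0 and tests false. `splitzIf` is a hypothetical name used only for this illustration.

```d
import std.string : indexOf;

string splitzIf(string s)
{
    // indexOf returns -1 when nothing is found, so +1 gives 0 (false).
    if (auto next = s.indexOf('\0') + 1)
    {
        return s[0 .. next - 1];
    }
    return s;
}

unittest
{
    assert(splitzIf("abc\0\0") == "abc");
    assert(splitzIf("abc") == "abc");
    assert(splitzIf("\0\0") == "");   // a '\0' at index 0 is found too
}
```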

SDB@79




Re: to delete the '\0' characters

2022-09-22 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 22 September 2022 at 15:22:06 UTC, Ali Çehreli wrote:

On 9/22/22 08:19, Ali Çehreli wrote:
> ```d
> string noZeroes(string s)
> {
>  return s.byCodeUnit.filter!(c => c != '\0');
> }
> ```
That won't compile; the return type must be 'auto'.

Ali


Thank you for all the valuable information you wrote. I chose to 
split because the '\0' are at the end of the string:


```d
string splitz(string s)
{
  import std.string : indexOf;
  size_t seekPos = s.indexOf('\0');
  return s[0..seekPos];
}
```

SDB@79


Re: to delete the '\0' characters

2022-09-22 Thread Ali Çehreli via Digitalmars-d-learn

On 9/22/22 08:19, Ali Çehreli wrote:

> string noZeroes(string s)
> {
>  return s.byCodeUnit.filter!(c => c != '\0');
> }

That won't compile; the return type must be 'auto'.
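With that correction applied, a version that compiles might look like this. It is a sketch; the imports are the ones the snippet appears to rely on.

```d
import std.algorithm : equal, filter;
import std.array : array;
import std.range : take;
import std.utf : byCodeUnit;

// 'auto' because filter returns a lazy range, not a string.
auto noZeroes(string s)
{
    return s.byCodeUnit.filter!(c => c != '\0');
}

unittest
{
    assert("a\0b\0\0".noZeroes.equal("ab"));
    assert("a\0b\0\0".noZeroes.take(1).equal("a"));
    string copy = "a\0b\0\0".noZeroes.array;   // allocate only when needed
    assert(copy == "ab");
}
```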

Ali




Re: to delete the '\0' characters

2022-09-22 Thread Ali Çehreli via Digitalmars-d-learn

On 9/22/22 03:53, Salih Dincer wrote:
> Is there a more accurate way to delete the '\0' characters at the end of
> the string? I tried functions in this module:
> https://dlang.org/phobos/std_string.html

Just a reminder: the following modules are always relevant as well, because 
strings are arrays, which are ranges:


  std.range
  std.algorithm
  std.array

>r ~= c;

Stefan Koch once said the ~ operator should be called "the slow 
operator". Meaning, if you want to make your code slow, then use that 
operator. :)


The reason is, that operation may need to allocate memory from the heap 
and copy existing elements there. And any memory allocation may trigger 
a garbage collection cycle.


Of course, none of that matters if we are talking about a short string. 
However, it may become a dominating reason why a program may be slow.
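
If a copy is needed anyway, one common way to limit repeated reallocation is `std.array.appender` with an up-front `reserve`. This is a sketch of that swapped-in technique, not something proposed in the thread, and `removeZeroesAppender` is a hypothetical name.

```d
import std.array : appender;

string removeZeroesAppender(string s)
{
    auto buf = appender!string();
    buf.reserve(s.length);        // reserve room for the worst case up front
    foreach (char c; s)
    {
        if (c != '\0')
            buf.put(c);
    }
    return buf.data;
}

unittest
{
    assert(removeZeroesAppender("a\0b\0\0") == "ab");
    assert(removeZeroesAppender("") == "");
}
```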


I was going to suggest Paul Backus' solution as well but I may leave the 
array part out in my own code until I really need it:


string noZeroes(string s)
{
return s.byCodeUnit.filter!(c => c != '\0');
}

Now, a caller may be happy without an array:

auto a = s.noZeroes.take(10);

And another can easily add a .array when really needed:

auto b = s.noZeroes.array;

That may be seen as premature optimization but I see it as avoiding a 
premature pessimization because I did not put in any extra work there. 
But again, this all depends on each program.


If we were talking about mutable elements and the order of elements did 
not matter, then the fastest option would be to remove with 
SwapStrategy.unstable:


import std;

void main() {
auto arr = [ 1, 0, 2, 0, 0, 3, 4, 5 ];
arr = remove!(i => i == 0, SwapStrategy.unstable)(arr);
writeln(arr);
}

unstable works by swapping the first 0 that it finds with the last 
non-zero that it finds and continues in that way. No memory is 
allocated. As a result, the order of elements will not be preserved but 
unstable can be very fast compared to .stable (which is the default) 
because .stable must move elements to the left (multiple times in some 
cases) and can be expensive especially for some types.


The result of the program above is the following:

[1, 5, 2, 4, 3]

Zeros are removed but the order is not preserved.
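
For comparison, here is a sketch of the default (stable) strategy on the same data, which keeps the remaining elements in order. `withoutZeros` is a hypothetical helper name.

```d
import std.algorithm.mutation : remove, SwapStrategy;

int[] withoutZeros(int[] arr)
{
    // SwapStrategy.stable is the default; it is spelled out here for clarity.
    return remove!(i => i == 0, SwapStrategy.stable)(arr);
}

unittest
{
    auto arr = [ 1, 0, 2, 0, 0, 3, 4, 5 ];
    arr = withoutZeros(arr);
    assert(arr == [ 1, 2, 3, 4, 5 ]);
}
```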

And very important: Don't forget to assign remove's return value back to 
'arr'. ;)


I know this will not work for a string but something to keep in mind...

Ali




Re: to delete the '\0' characters

2022-09-22 Thread Paul Backus via Digitalmars-d-learn
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
Is there a more accurate way to delete the '\0' characters at 
the end of the string? I tried functions in this module: 
https://dlang.org/phobos/std_string.html


```d
auto foo(string s)
{
  string r;
  foreach(c; s)
  {
if(c > 0)
{
  r ~= c;
}
  }
  return r;
}
```


```d
import std.algorithm : filter;
import std.utf : byCodeUnit;
import std.array : array;

string removeZeroes(string s)
{
return s.byCodeUnit
.filter!(c => c != '\0')
.array;
}
```


Re: to delete the '\0' characters

2022-09-22 Thread Salih Dincer via Digitalmars-d-learn

On Thursday, 22 September 2022 at 13:29:43 UTC, user1234 wrote:

Two remarks:

1. The zero implicitly added to literals is not part of the 
string. For example, s[$-1] will not give 0 unless you added it 
explicitly to a literal.


2. Your code removes all the 0s, not only the ones at the end. As 
that still achieves what you want, maybe try 
[`stripRight()`](https://dlang.org/phobos/std_string.html#.stripRight). The second overload allows you to specify the characters to remove.


As I mentioned earlier stripRight() and others don't work. What 
I'm talking about is not the terminating character. Actually, I'm 
the one who added the \0 character, and they are multiple. For 
example:


4B 6F 72 6B 6D 61 20 73 F6 6E 6D 65 7A 20 62 75 20 15F 61 66 61 
6B 6C 61 72 64 61 20 79 FC 7A 65 6E 20 61 6C 20 73 61 6E 63 61 
6B 0 0
53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 
6E 64 65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0
4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 
6C 64 131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 
0
4F 20 62 65 6E 69 6D 64 69 72 20 6F 20 62 65 6E 69 6D 20 6D 69 6C 
6C 65 74 69 6D 69 6E 64 69 72 20 61 6E 63 61 6B 0 0
C7 61 74 6D 61 20 6B 75 72 62 61 6E 20 6F 6C 61 79 131 6D 20 E7 
65 68 72 65 6E 69 20 65 79 20 6E 61 7A 6C 131 20 68 69 6C 61 6C 0 
0
4B 61 68 72 61 6D 61 6E 20 131 72 6B 131 6D 61 20 62 69 72 20 67 
FC 6C 20 6E 65 20 62 75 20 15F 69 64 64 65 74 20 62 75 20 63 65 
6C E2 6C 0 0 0 0 0 0


Thanks, SDB@79


Re: to delete the '\0' characters

2022-09-22 Thread user1234 via Digitalmars-d-learn
On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer 
wrote:
Is there a more accurate way to delete the '\0' characters at 
the end of the string? I tried functions in this module: 
https://dlang.org/phobos/std_string.html


```d
auto foo(string s)
{
  string r;
  foreach(c; s)
  {
if(c > 0)
{
  r ~= c;
}
  }
  return r;
}
```

SDB@79


Two remarks:

1. The zero implicitly added to literals is not part of the 
string. For example, s[$-1] will not give 0 unless you added it 
explicitly to a literal.


2. Your code removes all the 0s, not only the ones at the end. As 
that still achieves what you want, maybe try 
[`stripRight()`](https://dlang.org/phobos/std_string.html#.stripRight). The second overload allows you to specify the characters to remove.
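
Both remarks can be checked with a few assertions. This is a sketch; the sample strings are made up.

```d
import std.string : stripRight;

unittest
{
    string s = "abc";
    assert(s.length == 3);          // the implicit terminator is not counted
    assert(s[$ - 1] == 'c');

    string t = "abc\0\0";           // zeros written explicitly are part of the string
    assert(t.length == 5);
    assert(t[$ - 1] == '\0');

    // The overload with a second argument strips the listed characters.
    assert(t.stripRight("\0") == "abc");
}
```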


Re: to delete the '\0' characters

2022-09-22 Thread ag0aep6g via Digitalmars-d-learn

On 22.09.22 13:14, ag0aep6g wrote:

On 22.09.22 12:53, Salih Dincer wrote:

[...]

```d
auto foo(string s)
{
   string r;
   foreach(c; s)
   {
 if(c > 0)
 {
   r ~= c;
 }
   }
   return r;
}
```

[...]

Here's a snippet that's a bit shorter than yours and doesn't copy the data:

     while (s.length > 0 && s[$ - 1] == '\0')
     {
     s = s[0 .. $ - 1];
     }
     return s;

But do you really want to allow embedded '\0's? I.e., should 
foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?


Whoops. Your code actually turns "foo\0bar" into "foobar", removing the 
embedded '\0'. So my supposed alternative is wrong.


Still, you usually want to stop at the first '\0'.


Re: to delete the '\0' characters

2022-09-22 Thread ag0aep6g via Digitalmars-d-learn

On 22.09.22 12:53, Salih Dincer wrote:
Is there a more accurate way to delete the '\0' characters at the end of 
the string? I tried functions in this module: 
https://dlang.org/phobos/std_string.html


```d
auto foo(string s)
{
   string r;
   foreach(c; s)
   {
     if(c > 0)
     {
   r ~= c;
     }
   }
   return r;
}
```


I don't understand what you mean by "more accurate".

Here's a snippet that's a bit shorter than yours and doesn't copy the data:

while (s.length > 0 && s[$ - 1] == '\0')
{
s = s[0 .. $ - 1];
}
return s;

But do you really want to allow embedded '\0's? I.e., should 
foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?


Usually, it's the first '\0' that signals the end of a string. In that 
case you better start the search at the front and stop at the first hit.
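
The three readings of the requirement give different results once a '\0' is embedded. Here is a sketch with hypothetical helper names (`dropAllZeros`, `dropTrailingZeros`, `stopAtFirstZero`):

```d
import std.algorithm : filter;
import std.array : array;
import std.string : indexOf, stripRight;
import std.utf : byCodeUnit;

// Drop every '\0', wherever it is (what the original foo does).
string dropAllZeros(string s)
{
    return s.byCodeUnit.filter!(c => c != '\0').array;
}

// Drop only the trailing run of '\0'.
string dropTrailingZeros(string s)
{
    return s.stripRight("\0");
}

// Treat the first '\0' as the end of the string.
string stopAtFirstZero(string s)
{
    immutable pos = s.indexOf('\0');
    return pos < 0 ? s : s[0 .. pos];
}

unittest
{
    enum input = "foo\0bar\0";
    assert(dropAllZeros(input) == "foobar");
    assert(dropTrailingZeros(input) == "foo\0bar");
    assert(stopAtFirstZero(input) == "foo");
}
```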


to delete the '\0' characters

2022-09-22 Thread Salih Dincer via Digitalmars-d-learn
Is there a more accurate way to delete the '\0' characters at the 
end of the string? I tried functions in this module: 
https://dlang.org/phobos/std_string.html


```d
auto foo(string s)
{
  string r;
  foreach(c; s)
  {
if(c > 0)
{
  r ~= c;
}
  }
  return r;
}
```

SDB@79