Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-28 Thread wobbles via Digitalmars-d-learn

On Sunday, 27 March 2016 at 07:45:00 UTC, ParticlePeter wrote:

On Wednesday, 23 March 2016 at 20:00:55 UTC, wobbles wrote:

[...]


Thanks Wobbles, I took your approach. There were some minor 
issues, here is a working version:


[...]


Great, thanks for fixing it up!


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-27 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 20:00:55 UTC, wobbles wrote:
Again, totally untested, but I think logically it should work. 
( No D compiler on this machine so it mightn't even compile :] )


Thanks Wobbles, I took your approach. There were some minor 
issues, here is a working version:


auto multiSlice(string data, string[] delims)  {

   import std.algorithm : canFind;
   import std.string : indexOf;

   struct MultiSliceRange  {
      string m_str;
      string[] m_delims;

      bool empty(){
         return m_str.length == 0;
      }

      void popFront(){
         auto idx = findNextIndex;
         m_str = m_str[idx..$];
      }

      string front(){
         auto idx = findNextIndex;
         return m_str[0..idx];
      }

      // Index of the nearest delimiter that is not at position 0.
      // Falls back to m_str.length, so the last chunk runs to the end
      // of the string instead of slicing with size_t.max.
      private size_t findNextIndex()  {
         size_t index = m_str.length;
         foreach(delim; m_delims)  {
            if(m_str.canFind(delim))  {
               auto foundIndex = m_str.indexOf(delim);
               if(index > foundIndex && foundIndex > 0)  {
                  index = foundIndex;
               }
            }
         }
         return index;
      }
   }

   return MultiSliceRange(data, delims);
}
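
One limitation worth noting: indexOf(delim) always reports the first occurrence, so a delimiter that sits at position 0 of the remaining string hides any later occurrence of the same delimiter. The following is only a sketch of one way around that, assuming it is acceptable to start each search at index 1. The names tokenChunks, Chunks and nextCut are made up for this example and are not part of Phobos.

```d
import std.algorithm.comparison : equal;
import std.string : indexOf;

/// Hypothetical helper: lazily yields slices of `data`, each running up to,
/// but not including, the next occurrence of any delimiter.
auto tokenChunks(string data, string[] delims)
{
    static struct Chunks
    {
        string rest;
        string[] delims;

        @property bool empty() const { return rest.length == 0; }

        @property string front() { return rest[0 .. nextCut()]; }

        void popFront() { rest = rest[nextCut() .. $]; }

        // Position of the nearest delimiter found at index 1 or later,
        // or rest.length when no further delimiter exists.
        private size_t nextCut()
        {
            size_t cut = rest.length;
            foreach (d; delims)
            {
                immutable pos = rest.indexOf(d, 1);
                if (pos > 0 && cast(size_t) pos < cut)
                    cut = pos;
            }
            return cut;
        }
    }

    return Chunks(data, delims);
}

void main()
{
    // FORMAT appears twice; each occurrence starts its own chunk.
    auto text = "FORMAT\ntype: x\nTIME\nsteps: 3\nFORMAT\ntype: y\n";
    auto parts = tokenChunks(text, ["FORMAT", "TIME"]);
    assert(equal(parts, [
        "FORMAT\ntype: x\n",
        "TIME\nsteps: 3\n",
        "FORMAT\ntype: y\n"
    ]));
}
```

Every chunk is a slice of the input string, so no string data is copied. A token name that also appears inside a value line would still be treated as a delimiter here.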


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread Simen Kjaeraas via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 18:10:05 UTC, ParticlePeter wrote:

Thanks Simen,
your tokenCounter is inspirational, for the rest I'll take some 
time for testing.


My pleasure. :) Testing it on your example data shows it to work 
there. However, as stated above, the documentation says it's 
undefined, so future changes (even optimizations and bugfixes) to 
Phobos could make it stop working:


"This predicate must be an equivalence relation, that is, it must 
be reflexive (pred(x,x) is always true), symmetric (pred(x,y) == 
pred(y,x)), and transitive (pred(x,y) && pred(y,z) implies 
pred(x,z)). If this is not the case, the range returned by 
chunkBy may assert at runtime or behave erratically."



But some additional thoughts from my side:
I get all the lines of the file into one range. Calling array 
on it should give me an array, but how would I use find to get 
an index into this array?
With the indices I could slice up the array into four slices, 
no allocation required. If there is no easy way to just get an 
index instead of a range, I would try to use something like 
the tokenCounter to find all the indices.


The chunkBy example should not allocate. chunkBy itself is lazy, 
as are its sub-ranges. No copying of string contents is 
performed. So unless you have very specific reasons to use 
slicing, I don't see why chunkBy shouldn't be good enough.


Full disclosure:
There is a malloc call in RefCounted, which is used for 
optimization purposes when chunkBy is called on a forward range. 
When chunkBy is called on an array, that's a 6-word allocation 
(24 bytes on 32-bit, 48 bytes on 64-bit), happening once. There 
are no other dependencies that allocate.


Such is the beauty of D. :)

--
  Simen


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread wobbles via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
I need to parse an ascii with multiple tokens. The tokens can 
be seen as keys. After every token there is a bunch of lines 
belonging to that token, the values.

The order of tokens is unknown.

I would like to read the file in as a whole string, and split 
the string with:

splitter(fileString, [token1, token2, ... tokenN]);

And would like to get a range of strings each starting with 
tokenX and ending before the next token.


Does something like this exist?

I know how to parse the string line by line and create new 
strings and append the appropriate lines, but I don't know how 
to do this with a lazy result range and new allocations.


This isn't tested, but this is my first thought:

import std.stdio : writefln;
import std.string : indexOf;

void main(){
    string testString = "this:is:a-test;";
    foreach(str; testString.multiSlice([":","-",";"]))
        writefln("Got: %s", str);
}

auto multiSlice(string data, string[] delims){
    struct MultiSliceRange{
        string m_str;
        string[] m_delims;

        bool empty(){
            return m_str.length == 0;
        }

        void popFront(){
            auto idx = findNextIndex;
            m_str = m_str[idx..$];
        }

        string front(){
            auto idx = findNextIndex;
            return m_str[0..idx];
        }

        // Index of the nearest delimiter that is not at position 0,
        // or the length of the remaining string if there is none.
        private size_t findNextIndex(){
            size_t foundIndex = m_str.length;
            foreach(delim; m_delims){
                auto idx = m_str.indexOf(delim);
                if(idx > 0 && idx < foundIndex){
                    foundIndex = idx;
                }
            }
            return foundIndex;
        }
    }

    return MultiSliceRange(data, delims);
}


Again, totally untested, but I think logically it should work. ( 
No D compiler on this machine so it mightn't even compile :] )


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 15:23:38 UTC, Simen Kjaeraas wrote:

[...]


Thanks Simen,
your tokenCounter is inspirational, for the rest I'll take some 
time for testing.


But some additional thoughts from my side:
I get all the lines of the file into one range. Calling array on 
it should give me an array, but how would I use find to get an 
index into this array?
With the indices I could slice up the array into four slices, no 
allocation required. If there is no easy way to just get an index 
instead of a range, I would try to use something like the 
tokenCounter to find all the indices.
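
For the index question, std.algorithm.searching.countUntil returns a position where find returns the remaining range. The sketch below assumes the file has already been read into a string[] of lines and that a token is a whole line. The helper name nextTokenIndex is invented for this example.

```d
import std.algorithm.searching : canFind, countUntil;

/// Hypothetical helper: index of the first token line at or after `from`,
/// or lines.length if there is none.
size_t nextTokenIndex(string[] lines, string[] tokens, size_t from)
{
    // countUntil yields an offset into the searched slice, or -1.
    auto offset = lines[from .. $].countUntil!(l => tokens.canFind(l));
    return offset < 0 ? lines.length : from + offset;
}

void main()
{
    auto lines = ["FORMAT", "type: ensight gold", "",
                  "TIME", "time set: 1", "number of steps: 3"];
    auto tokens = ["FORMAT", "TIME"];

    string[][] blocks;
    size_t start = nextTokenIndex(lines, tokens, 0);
    while (start < lines.length)
    {
        auto end = nextTokenIndex(lines, tokens, start + 1);
        blocks ~= lines[start .. end]; // a slice; the lines are not copied
        start = end;
    }

    assert(blocks == [
        ["FORMAT", "type: ensight gold", ""],
        ["TIME", "time set: 1", "number of steps: 3"]
    ]);
}
```

Collecting the slices into blocks does allocate the outer array. If that matters, the loop body can consume each lines[start .. end] directly instead of storing it.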






Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 14:20:12 UTC, Andrea Fontana wrote:

Any input => output example?


Sure, it is the EnSight Gold case file format:

FORMAT
type:  ensight gold

GEOMETRY
model:   1 exgold2.geo**

VARIABLE
scalar per node: 1 Stress exgold2.scl**
vector per node: 1 Displacement   exgold2.dis**

TIME
time set:  1
number of steps:   3
filename start number: 0
filename increment:    1
time values:   1.0   2.0   3.0


The separators would be ["FORMAT", "TIME", "VARIABLE", 
"GEOMETRY"].
Neither the blank lines between the blocks nor the order of the 
separators in the file is known in advance.
I would expect a range of four ranges of lines: one for each 
text block above.





Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread Simen Kjaeraas via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
[...]


Without a bit more detail, it's a bit hard to help.

std.algorithm.splitter has an overload that takes a function 
instead of a separator:


import std.algorithm;
auto a = "a,b;c";
auto b = a.splitter!(e => e == ';' || e == ',');
assert(equal(b, ["a", "b", "c"]));

However, not only are the separators lost in the process, it only 
allows single-element separators. This might be good enough given 
the information you've divulged, but I'll hazard a guess it isn't.


My next stop is std.algorithm.chunkBy:

auto a = ["a","b","c", "d", "e"];
auto b = a.chunkBy!(e => e == "a" || e == "d");
auto result = [
    tuple(true, ["a"]), tuple(false, ["b", "c"]),
    tuple(true, ["d"]), tuple(false, ["e"])
];

No assert here, since the ranges in the tuples are not arrays. My 
immediate concern is that two consecutive tokens with no 
intervening values will mess it up. Also, the result looks a bit 
messy. A little more involved, and according to documentation not 
guaranteed to work:


bool isToken(string s) {
    return s == "a" || s == "d";
}

bool tokenCounter(string s) {
    static string oldToken;
    static bool counter = true;
    if (s.isToken && s != oldToken) {
        oldToken = s;
        counter = !counter;
    }
    return counter;
}

unittest {
    import std.algorithm;
    import std.stdio;
    import std.typecons;
    import std.array;

    auto a = ["a","b","c", "d", "e", "a", "d"];
    auto b = a.chunkBy!tokenCounter.map!(e=>e[1]);
    auto result = [
        ["a", "b", "c"],
        ["d", "e"],
        ["a"],
        ["d"]
    ];
    writeln(b);
    writeln(result);
}

Again no assert, but b and result have basically the same 
contents. Also handles consecutive tokens neatly (but consecutive 
identical tokens will be grouped together).


Hope this helps.

--
  Simen


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread Andrea Fontana via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 12:00:15 UTC, ParticlePeter wrote:
[...]


Any input => output example?


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:

Stupid typos:

I need to parse an ascii file with multiple tokens. ...

...
to do this with a lazy result range and without new allocations.