parsing fastq files with D

2016-03-23 Thread eastanon via Digitalmars-d-learn
Fastq is a format for storing DNA sequences together with the 
associated quality information, often encoded as ASCII characters. 
Each entry typically spans 4 lines; for example, 2 fastq entries 
would look like this.


@seq1
TTAAAT
+
?+BBB/DHH@
@seq2
GACCCTTTGCA
+
?+BHB/DIH@

I do not have a lot of D experience and I am writing a simple 
parser to help work with these files. Ideally it should be fast 
with a low memory footprint. I am working with very large files of 
this type, which can be up to 1GB.


module fastq;

import std.stdio;
import std.file;
import std.exception;
import std.algorithm;
import std.string;

struct Record{

string sequence;
string quals;
string name;
}

auto Records(string filename){

static auto toRecords(S)(S str){

auto res = findSplitBefore(str,"+\n");

auto seq = res[0];
auto qual = res[1];

return Record(seq,qual);
}

string text = cast(string)std.file.read(filename);

enforce(text.length > 0 && text[0] == '@');
text = text[1 .. $];

auto entries = splitter(text,'@');

return map!toRecords(entries);
}

The issue with this is that the "+" character can also be part of 
the quality information, and I am using it to split the quality 
information from the sequence information, so the parser ends up 
splitting the quality information itself, which is wrong.
Ideally I do not want to use regex. I have heard of ragel for 
parsing but have never used it; such a solution would also be 
welcome, since I read it can be very fast.


What is the idiomatic way to capture the sequence name (the first 
line, starting with the @ character), the sequence (line 2) and 
the quality scores (line 4)?
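
One way to sidestep the "+" (and "@") ambiguity is to rely on line positions rather than on marker characters. The following is only a minimal sketch, assuming every record occupies exactly four lines with no wrapped sequences; `parseFastq` is a hypothetical helper name, and the `Record` fields mirror the struct from the question.

```d
import std.array : array;
import std.exception : enforce;
import std.string : lineSplitter;

struct Record
{
    string name;
    string sequence;
    string quals;
}

// Hypothetical helper: assumes each record is exactly four lines
// (header, sequence, "+" separator, qualities).
Record[] parseFastq(string text)
{
    // Each line is a slice of `text`; the characters are not copied.
    auto lines = text.lineSplitter.array;
    enforce(lines.length % 4 == 0, "truncated FASTQ input");

    Record[] records;
    for (size_t i = 0; i < lines.length; i += 4)
    {
        enforce(lines[i].length > 0 && lines[i][0] == '@',
                "expected '@' header line");
        enforce(lines[i + 2].length > 0 && lines[i + 2][0] == '+',
                "expected '+' separator line");
        // Only the position decides what a line means, so '+' or '@'
        // inside the quality string is harmless.
        records ~= Record(lines[i][1 .. $], lines[i + 1], lines[i + 3]);
    }
    return records;
}

void main()
{
    import std.stdio : writeln;

    foreach (r; parseFastq("@seq1\nTTAAAT\n+\n?+BBB/\n"))
        writeln(r.name, " ", r.sequence, " ", r.quals);
}
```

For files around 1GB the same four-line logic could probably be driven by `File.byLine` instead of reading everything at once; in that case each line would need to be copied (for example with `.idup`), because `byLine` reuses its buffer.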


Re: How do you append to a dynamic array using move semantics?

2016-03-23 Thread cy via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 23:44:55 UTC, ag0aep6g wrote:

You got the order of arguments wrong here. Source goes first,


Oh, derp. Thanks. Right then... it works as expected.


Re: parsing HTML for a web robot (crawler) like application

2016-03-23 Thread Adam D. Ruppe via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 10:49:03 UTC, Nordlöw wrote:

HTML-docs here:

http://dpldocs.info/experimental-docs/arsd.dom.html


Indeed, though the docs are still a work in progress (the lib is 
now about 6 years old, but until recently, ddoc blocked me from 
using examples in the comments so I didn't bother. I've fixed 
that now though, but haven't finished writing them all up).



Basic idea though for web scraping:

auto document = new Document();
document.parseGarbage(your_html_string);

// supports most of the CSS syntax, which you might also know from jQuery

Element[] elements = document.querySelectorAll("css selector");
// or if you just want the first hit or null if none...
Element element = document.querySelector("css selector");


And once you have a reference:

element.innerText
element.innerHTML

to print its contents in some form.



You can do a lot more too (a LOT more), but just these functions 
should get you started.



The parseGarbage function will also need you to compile in the 
characterencodings.d file from my same github. It will handle 
charset detection and translation as well as tag soup parsing. I 
use it for a lot of web scraping myself.


Re: Compiler Specific dub Dependencies

2016-03-23 Thread Mike Parker via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 20:30:04 UTC, Jack Stouffer wrote:
Is there any way in dub to specify that a module should only be 
linked and compiled for DMD and not for LDC?


I am using the Economic Modeling containers library, and 
because it uses std.experimental.allocator, it can't be used 
with LDC through dub. I have coded in such a way with static 
if's that LDC will still compile without it, but dub will try 
to compile it anyway because it's in the dependencies JSON 
dictionary.


I would try using a default configuration with a platform 
specification. Never done it before, but it would look like this 
in SDLang:


configuration "default-app" {
platforms "dmd"
}

http://code.dlang.org/package-format?lang=sdl#configuration-settings
http://code.dlang.org/package-format?lang=json#configuration-settings


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread Simen Kjaeraas via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 18:10:05 UTC, ParticlePeter wrote:

Thanks Simen,
your tokenCounter is inspirational, for the rest I'll take some 
time for testing.


My pleasure. :) Testing it on your example data shows it to work 
there. However, as stated above, the documentation says it's 
undefined, so future changes (even optimizations and bugfixes) to 
Phobos could make it stop working:


"This predicate must be an equivalence relation, that is, it must 
be reflexive (pred(x,x) is always true), symmetric (pred(x,y) == 
pred(y,x)), and transitive (pred(x,y) && pred(y,z) implies 
pred(x,z)). If this is not the case, the range returned by 
chunkBy may assert at runtime or behave erratically."



But some additional thoughts from my side:
I get all the lines of the file into one range. Calling array 
on it should give me an array, but how would I use find to get 
an index into this array?
With the indices I could slice up the array into four slices, 
no allocation required. If there is no easy way to just get an 
index instead of a range, I would try to use something like 
the tokenCounter to find all the indices.


The chunkBy example should not allocate. chunkBy itself is lazy, 
as are its sub-ranges. No copying of string contents is 
performed. So unless you have very specific reasons to use 
slicing, I don't see why chunkBy shouldn't be good enough.


Full disclosure:
There is a malloc call in RefCounted, which is used for 
optimization purposes when chunkBy is called on a forward range. 
When chunkBy is called on an array, that's a 6-word allocation 
(24 bytes on 32-bit, 48 bytes on 64-bit), happening once. There 
are no other dependencies that allocate.


Such is the beauty of D. :)

--
  Simen


Re: How do you append to a dynamic array using move semantics?

2016-03-23 Thread ag0aep6g via Digitalmars-d-learn

On 24.03.2016 00:44, ag0aep6g wrote:

On 24.03.2016 00:26, cy wrote:

++items.length
move(items[$-1],item); // Error: struct Thing is not copyable because it is annotated with @disable


You got the order of arguments wrong here. Source goes first, target
second. Works for me with `move(item, items[$-1]);`.


Though it should compile the other way around, too. And it does for me.


Re: How do you append to a dynamic array using move semantics?

2016-03-23 Thread ag0aep6g via Digitalmars-d-learn

On 24.03.2016 00:26, cy wrote:

++items.length
move(items[$-1],item); // Error: struct Thing is not copyable because it is annotated with @disable


You got the order of arguments wrong here. Source goes first, target 
second. Works for me with `move(item, items[$-1]);`.


How do you append to a dynamic array using move semantics?

2016-03-23 Thread cy via Digitalmars-d-learn

struct Thing {
  @disable this(this);
}
...
items ~= move(item); // Error: struct Thing is not copyable because it is annotated with @disable


++items.length
move(items[$-1],item); // Error: struct Thing is not copyable because it is annotated with @disable


appender(items).put(move(item)); // Error: template std.array.Appender!(Thing[]).Appender.put cannot deduce function from argument types !()(Thing)


...?


Re: inout and templates don't mix...

2016-03-23 Thread Ali Çehreli via Digitalmars-d-learn

On 03/23/2016 02:31 PM, cy wrote:

> struct Someop(Type) {
>Type thing;
>void foo() {
>  thing.bar();
>}
> }
>
> struct Foo {
>void bar() {
>  import std.stdio: writeln;
>  writeln("bar");
>}
> }
>
> struct Bar {
>void thingy(inout(Foo) foo) inout {
>  auto op = Someop(foo);

The following is a workaround for this example:

  auto op = Someop!Foo(foo);

I'm not sure whether Someop's implicit constructor should take part in 
deducing Someop's Type template parameter.


Ali



Re: Checking if a port is listening

2016-03-23 Thread Lucien via Digitalmars-d-learn

On Saturday, 19 March 2016 at 18:24:38 UTC, Marc Schütz wrote:

On Saturday, 19 March 2016 at 09:55:13 UTC, Lucien wrote:

const int MAX = 64;
Socket[] sockets = new Socket[MAX];
string ipb = "192.168.0.";

for (int i = 1; i < MAX; i++) {


Here's the reason for your SEGV: You need to start at 0, 
because otherwise `sockets[0]` is `null`. When you add that to 
the SocketSet, it will trigger the segfault. I guess you want 
to skip the 0 because it represents the subnet address; in that 
case, you simply mustn't add `sockets[0]` to the set.


But then there is another problem: you're using `select()` the 
wrong way. The point of using select() is that you can check 
things asynchronously. Your code should be structured like this 
(pseudo code):


auto ss = new SocketSet();
foreach(i; 1 .. MAX) {
    auto s = new Socket(...);
    s.blocking = false;
    s.connect(...);
    ss.add(s);
}

while(ss.count > 0) {
    auto write_ss = ss.dup;
    auto status = Socket.select(null /* read */, write_ss /* write */,
                                null /* error */, 500.msecs);

    // for a connect()ing socket, writeability means connected
    if(status < 0)
        writeln("interrupted, retrying");
    else if(status == 0)
        writeln("timeout, retrying");
    else {
        writeln(status, " socket(s) changed state");
        foreach(fd; 0 .. write_ss.maxfd+1) {
            // check whether this socket has changed
            if(!write_ss.isSet(fd)) continue;
            // if yes, remove it from the original SocketSet
            ss.remove(fd);
            writeln("successfully connected to 192.168.0.", fd+1);
        }
    }
}


This code works fine :
--
import std.stdio;
import std.socket;
import std.conv;
import core.time;
import core.thread;

void main()
{
const int MAX = 254, TRIES = 5;
Socket[] sockets = new Socket[MAX];
string ipb = "192.168.0.";
SocketSet ss = new SocketSet();


for (int i = 0; i < MAX; i++) {
string ip = ipb~to!string(i+1);

Socket s = new Socket(AddressFamily.INET, 
std.socket.SocketType.STREAM, ProtocolType.TCP);

s.blocking = false;
InternetAddress ia = new InternetAddress(ip, 22);
sockets[i] = s;
s.connect(ia);
ss.add(s);
}
Thread.sleep(100.msecs);
for (int t = 0; t < TRIES; t++)
{
SocketSet write_ss = ss;
int status = Socket.select(null, write_ss, null, 
100.msecs);


if(status < 0)
writeln("interrupted, retrying");
else if(status == 0)
{
writeln("timeout, retrying");
} else {
writeln(status, " socket(s) changed state");
for (int i = 0; i < write_ss.tupleof[1] -2; i++) {

string ip = "192.168.0."~to!string(i+1);
Socket fd = sockets[i];
if(!ss.isSet(fd))
continue;
ss.remove(fd);
writeln("successfully connected to ", ip);
}
}
}
writeln("DONE");
}
--

When I remove the Thread.sleep, it doesn't find all addresses. 
Why?


inout and templates don't mix...

2016-03-23 Thread cy via Digitalmars-d-learn

halp

There's a module that tries to define complex operations on both 
const and non-const structs, since it's the same operation for 
both. So every function that invokes those operations is 
copy-pasted twice, just with "const" added. Switching to inout to 
eliminate that huge amount of code duplication causes an error 
that I can't figure out how to fix.


struct Someop(Type) {
  Type thing;
  void foo() {
thing.bar();
  }
}

struct Foo {
  void bar() {
import std.stdio: writeln;
writeln("bar");
  }
}

struct Bar {
  void thingy(inout(Foo) foo) inout {
auto op = Someop(foo);
op.foo();
  }
}

void main() {
  Foo foo;
  Bar bar;
  bar.thingy(foo);
}

=>

Error: struct derp.Someop cannot deduce function from argument 
types !()(inout(Foo))


if I put in Someop!(typeof(foo))(foo) it gives the error:

Error: variable derp.Someop!(inout(Foo)).Someop.thing only 
parameters or stack based variables can be inout


...even though Someop is a struct allocated on the stack.

What I'm dealing with is like:

struct Bar {
  void thingy(Foo foo) {
auto op = Someop(foo);
//...lotsastuff...
op.foo();
  }
  void thingy(const(Foo) foo) const {
auto op = Someop(foo);
//...lotsastuff...
op.foo();
  }
  // repeat ad-nauseum...
}


Re: If I understand const right...

2016-03-23 Thread ag0aep6g via Digitalmars-d-learn

On 23.03.2016 22:26, ag0aep6g wrote:

On 23.03.2016 22:18, cy wrote:

On Wednesday, 23 March 2016 at 21:10:49 UTC, ag0aep6g wrote:

[...]

b = new int(*b + 1);

Here "b" is pointing to mutable heap allocated data, which got cast to
constant.

with b = b + 1, it's still constant memory.


It's stack memory. Its constness isn't any more physical than with `new`.


PS: You can also allocate an explicitly immutable int:


b = new immutable(int)(*b + 1);



Re: If I understand const right...

2016-03-23 Thread ag0aep6g via Digitalmars-d-learn

On 23.03.2016 22:18, cy wrote:

On Wednesday, 23 March 2016 at 21:10:49 UTC, ag0aep6g wrote:

[...]

b = new int(*b + 1);

Here "b" is pointing to mutable heap allocated data, which got cast to
constant.

with b = b + 1, it's still constant memory.


It's stack memory. Its constness isn't any more physical than with `new`.


Re: If I understand const right...

2016-03-23 Thread cy via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 21:10:49 UTC, ag0aep6g wrote:

Just to be 100% clear: you're adding to the pointer here,


No, that's what I meant to do.


b = new int(*b + 1);
Here "b" is pointing to mutable heap allocated data, which got 
cast to constant.


with b = b + 1, it's still constant memory.




Re: If I understand const right...

2016-03-23 Thread ag0aep6g via Digitalmars-d-learn

On 23.03.2016 21:52, cy wrote:

const(int)[2] a = [23,24];
const(int)* b = a;


Should be: const(int)* b = a.ptr;


writeln(," always constant");
writeln(a, " always constant");


There's some subtlety here. `a` itself is not const, but its elements 
are. `a` being a fixed-sized array, you can't actually change anything 
when the elements are not mutable. Things would be different with a 
dynamic array.



writeln(a[0]," always constant");
writeln(," always constant");


The address of a local variable isn't exactly const/immutable in the 
sense of the type system, I think. It simply doesn't change during the 
run of the function, and afterwards it's not considered alive anymore.



writeln(b," always mutable");
writeln(*b, "constant");


Yup and yup.


b = b + 1;


Just to be 100% clear: you're adding to the pointer here, not to the 
value that's being pointed at. It's 24, because that's the second item 
in `a`.


You can also allocate a whole new int and set it to `*b + 1`. I 
think that's closer to your original goal.



b = new int(*b + 1);



writeln(*b, "also constant, but a different one.");

something like that...
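
The distinction discussed in this reply can be condensed into a small sketch: a `const(int)*` is a mutable pointer to const data, so the pointee cannot be written through it, while the pointer itself can be re-aimed. The function name `rebindDemo` is only illustrative and does not come from the thread.

```d
import std.stdio : writeln;

// Illustrative helper, not part of the original thread.
int rebindDemo()
{
    const(int)[2] a = [23, 24];
    const(int)* b = a.ptr;      // mutable pointer to const data

    // Writing through b is rejected at compile time ...
    static assert(!__traits(compiles, *b = 1));

    // ... but b itself may be pointed somewhere else.
    b = b + 1;                  // pointer arithmetic: now points at a[1]
    assert(*b == 24);

    b = new int(*b + 1);        // fresh heap int, seen as const through b
    return *b;
}

void main()
{
    writeln(rebindDemo());
}
```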




getOverloads, but also include all the imported members

2016-03-23 Thread Yuxuan Shui via Digitalmars-d-learn

Say:

module one;
void func(int a){}

/

module two;
import one;
void func(float a){}

Is there a way to get both func() in module two?


If I understand const right...

2016-03-23 Thread cy via Digitalmars-d-learn

a = a + 1

a is const, a + 1 is const, yet a can't be assigned to a + 1. And 
I think the reason is like...


const(int) a = 23;
while(something()) {
  a = a + 1;
}

in the first iteration, a is set to 23, and the value of "a + 1" 
is 24, but where is the computer gonna store that 24? It can't 
store it where 23 is, because that's constant data. In a register 
variable? What about the next iteration? A runtime queue of 
previously calculated consts that builds up with each iteration?


...not gonna happen. So since there's nowhere to store that 24 
(without some non-const variable to store it in), you can't point 
"a" at the new address, even if 24 itself would fit inside 
another constant bit of memory just fine.


I'm actually used to the runtime queue thing, from scheme and the 
like. "a = a + 1" allocates a new bit of memory, made immutable 
and never changed from then on, storing "a + 1" in it and 
pointing a at it. And if "a + 1" has already been calculated, it 
finds that old value and reuses it.


So I think the reason you can't assign to a constant variable is 
that there's no locating/initializing of new constant memory on 
the fly, to have a place to put that 24, 25, etc. Variables, even 
mutable variables, always have the same address, and any of us 
who are confused can think of assignment as a storage operation, 
more like erlang's "=>" rather than scheme's "(let)".


To "change" an address, you have to use a mutable pointer (or the 
like). The variable will always have the same address, but 
there's a second address stored at that address, and since the 
storage is mutable, that second address can be changed, by 
mutating the memory stored at the first address.


So like...

const(int)[2] a = [23,24];
const(int)* b = a;
writeln(," always constant");
writeln(a, " always constant");
writeln(a[0]," always constant");
writeln(," always constant");
writeln(b," always mutable");
writeln(*b, "constant");
b = b + 1;
writeln(*b, "also constant, but a different one.");

something like that...


Compiler Specific dub Dependencies

2016-03-23 Thread Jack Stouffer via Digitalmars-d-learn
Is there any way in dub to specify that a module should only be 
linked and compiled for DMD and not for LDC?


I am using the Economic Modeling containers library, and because 
it uses std.experimental.allocator, it can't be used with LDC 
through dub. I have coded in such a way with static if's that LDC 
will still compile without it, but dub will try to compile it 
anyway because it's in the dependencies JSON dictionary.


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread wobbles via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
I need to parse an ascii with multiple tokens. The tokens can 
be seen as keys. After every token there is a bunch of lines 
belonging to that token, the values.

The order of tokens is unknown.

I would like to read the file in as a whole string, and split 
the string with:

splitter(fileString, [token1, token2, ... tokenN]);

And would like to get a range of strings each starting with 
tokenX and ending before the next token.


Does something like this exist?

I know how to parse the string line by line and create new 
strings and append the appropriate lines, but I don't know how 
to do this with a lazy result range and new allocations.


This isn't tested, but this is my first thought:

import std.stdio : writefln;
import std.string : indexOf;

void main(){
    string testString = "this:is:a-test;";
    foreach(str; testString.multiSlice([":", "-", ";"]))
        writefln("Got: %s", str);
}

auto multiSlice(string str, string[] delims){
    static struct MultiSliceRange{
        string m_str;
        string[] m_delims;

        bool empty(){
            return m_str.length == 0;
        }

        // Drop the front piece together with the delimiter that ends it.
        void popFront(){
            size_t delimLen;
            auto idx = findNextIndex(delimLen);
            m_str = m_str[idx + delimLen .. $];
        }

        string front(){
            size_t delimLen;
            return m_str[0 .. findNextIndex(delimLen)];
        }

        // Index of the earliest delimiter, or m_str.length if none is left.
        private size_t findNextIndex(out size_t delimLen){
            size_t foundIndex = m_str.length;
            foreach(delim; m_delims){
                auto i = m_str.indexOf(delim);
                if(i >= 0 && cast(size_t) i < foundIndex){
                    foundIndex = cast(size_t) i;
                    delimLen = delim.length;
                }
            }
            return foundIndex;
        }
    }

    return MultiSliceRange(str, delims);
}


Again, totally untested, but I think logically it should work. ( 
No D compiler on this machine so it mightn't even compile :] )


Re: Variant.type bug ?

2016-03-23 Thread Voitech via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 19:18:50 UTC, Chris Wright wrote:
Consider the `coerce` method: 
http://dpldocs.info/experimental-docs/std.variant.VariantN.coerce.html


Example:

import std.variant;
class A {}
class B : A {}

void main()
{
A b = new B;
auto bb = Variant(b).coerce!B;
assert (bb !is null);
}


Magnificent! Thank you ! :)


Re: pass a struct by value/ref and size of the struct

2016-03-23 Thread kinke via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:35:49 UTC, ZombineDev wrote:
If the object is larger than the size of a register on the 
target machine, it is implicitly passed by ref


That's incorrect. As Johan pointed out, this is somewhat true for 
the Win64 ABI (but it firstly copies the argument before passing 
a pointer to it!), but it's not for the 32-bit x86 and x86_64 
System V (used on all non-Windows platforms) ABIs. System V is 
especially elaborate and may pass structs up to twice the size of 
a register in 2 registers. Bigger structs passed by value are 
blitted into the function arguments stack in memory. They are 
then accessed by the callee via a stack offset, that's correct, 
but I wouldn't call that implicit-by-ref-passing, as copying does 
take place, unless the optimizer decides it's unnecessary.


So passing structs > 64-bit by value on Win64 never pays off 
(there's always an indirection); using `const ref(T)` where 
possible makes sure you at least elide the copy. But then again, 
you'll very soon find out that it's not really an option as 
rvalues cannot be passed byref in D, something a lot of people 
[including myself if not already obvious :)] hate about D.
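
A workaround that is often used for the rvalue restriction, though not discussed in this post, is a template function with an `auto ref` parameter: lvalues are then bound by reference and rvalues by value. The sketch below is only an illustration under that assumption; `Big`, `total`, `boundByRef` and `makeBig` are made-up names.

```d
struct Big
{
    long[8] data;
}

// `auto ref` is only available on template functions, hence the empty ().
long total()(auto ref const(Big) b)
{
    long sum = 0;
    foreach (x; b.data)
        sum += x;
    return sum;
}

// Reports how the argument was bound in this particular instantiation.
bool boundByRef()(auto ref const(Big) b)
{
    return __traits(isRef, b);
}

Big makeBig()
{
    Big b;
    b.data[] = 1;
    return b;
}

void main()
{
    auto big = makeBig();
    assert(total(big) == 8);         // lvalue argument
    assert(total(makeBig()) == 8);   // rvalue argument, which plain `ref` would reject
    assert(boundByRef(big));         // lvalue: passed by reference
    assert(!boundByRef(makeBig()));  // rvalue: passed by value
}
```

The cost is one template instantiation per binding kind, so this does not help for functions that cannot be templates, such as virtual methods.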


Re: Variant.type bug ?

2016-03-23 Thread Chris Wright via Digitalmars-d-learn
Consider the `coerce` method:
http://dpldocs.info/experimental-docs/std.variant.VariantN.coerce.html

Example:

import std.variant;
class A {}
class B : A {}

void main()
{
A b = new B;
auto bb = Variant(b).coerce!B;
assert (bb !is null);
}


Re: Updating D-based apps without recompiling it

2016-03-23 Thread Chris Wright via Digitalmars-d-learn
On Wed, 23 Mar 2016 12:21:33 +, Ozan wrote:
> Enterprise applications in productive environments requires smooth
> updating mechanisms without recompiling or reinstalling.

The industry standard is to build on a build server and stop the 
application to update, but to have enough redundancy that users don't see 
any interruption of service. That's how Google and Amazon do it.

There are a bare handful of systems that let you avoid that process. In 
general, it's hard enough for humans to reason about how their 
application's durable state will handle application updates; adding 
volatile state into the picture is much harder, and for little gain.


Re: byChunk odd behavior?

2016-03-23 Thread cym13 via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 03:17:05 UTC, Hanh wrote:

Thanks for your help everyone.

I agree that the issue is due to the misusage of an InputRange 
but what is the semantics of 'take' when applied to an 
InputRange? It seems that calling it invalidates the range; in 
which case what is the recommended way to get a few bytes and 
keep on advancing.


Doing *anything* to a range invalidates it (or at least you 
should expect it to); a range is read-once. Never reuse a range. 
Some ranges can be saved in order to use a copy, but never 
expect a range to be implicitly reusable.



For instance, to read a ushort, I use
range.read!(ushort)()
Unfortunately, it reads a single value.

For now, I use a loop

foreach (element ; range.enumerate) {
  buffer[i] = range.front;
  range.popFront();
  }

Is there a more idiomatic way to do the same thing?


Two ways, the first one being for reference:

import std.range: enumerate;
// enumerate yields (index, element) pairs, in that order
foreach (index, element ; range.enumerate) {
buffer[index] = element;
}

And the other one

In Scala, 'take' consumes bytes from the iterator. So the same 
code would be

buffer = range.take(N).toArray


Then just do that!

import std.range, std.array;
auto buffer = range.take(N).array;

auto example = iota(0, 200, 5).take(5).array;
assert(example == [0, 5, 10, 15, 20]);



Re: byChunk odd behavior?

2016-03-23 Thread Chris Wright via Digitalmars-d-learn
On Wed, 23 Mar 2016 03:17:05 +, Hanh wrote:
> In Scala, 'take' consumes bytes from the iterator. So the same code
> would be buffer = range.take(N).toArray

import std.range, std.array;
auto bytes = byteRange.takeExactly(N).array;

There's also take(N), but if the range contains fewer than N elements, it 
will only give you as many as the range contains. If you're trying to 
deserialize something, takeExactly is probably better.


http://dpldocs.info/experimental-docs/std.range.takeExactly.html
http://dpldocs.info/experimental-docs/std.array.array.1.html
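
The difference between the two functions, and one way to keep advancing afterwards, can be sketched as follows. This assumes the data is available as a `ubyte[]` slice; `readN` is a hypothetical helper, not a Phobos function.

```d
import std.array : array;
import std.range : iota, take, takeExactly;

// Hypothetical helper: copies the first n bytes and advances the slice.
// Taking from a slice does not consume it, so the caller's slice is
// moved forward explicitly.
ubyte[] readN(ref ubyte[] data, size_t n)
{
    auto head = data.takeExactly(n).array;
    data = data[n .. $];
    return head;
}

void main()
{
    // take stops early when the source is shorter than requested.
    assert(iota(0, 3).take(5).array == [0, 1, 2]);

    // takeExactly yields exactly N elements when the source has enough.
    assert(iota(0, 10).takeExactly(4).array == [0, 1, 2, 3]);

    ubyte[] bytes = [1, 2, 3, 4, 5];
    ubyte[] expectedHead = [1, 2];
    ubyte[] expectedRest = [3, 4, 5];
    assert(readN(bytes, 2) == expectedHead);
    assert(bytes == expectedRest);
}
```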


Re: Something wrong with GC

2016-03-23 Thread ag0aep6g via Digitalmars-d-learn

On 22.03.2016 16:56, ag0aep6g wrote:

I've filed an issue: https://issues.dlang.org/show_bug.cgi?id=15821


And it's been fixed:
https://github.com/D-Programming-Language/druntime/pull/1519

Since the issue was a regression, the fix was made against the stable 
branch. It's going to be in the next release. But it's not yet in 
master, which means it's also not going to be in the nightlies for now.


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 15:23:38 UTC, Simen Kjaeraas wrote:

Without a bit more detail, it's a bit hard to help.

std.algorithm.splitter has an overload that takes a function 
instead of a separator:


import std.algorithm;
auto a = "a,b;c";
auto b = a.splitter!(e => e == ';' || e == ',');
assert(equal(b, ["a", "b", "c"]));

However, not only are the separators lost in the process, it 
only allows single-element separators. This might be good 
enough given the information you've divulged, but I'll hazard a 
guess it isn't.


My next stop is std.algorithm.chunkBy:

auto a = ["a","b","c", "d", "e"];
auto b = a.chunkBy!(e => e == "a" || e == "d");
auto result = [
tuple(true, ["a"]), tuple(false, ["b", "c"]),
tuple(true, ["d"]), tuple(false, ["e"])
];

No assert here, since the ranges in the tuples are not arrays. 
My immediate concern is that two consecutive tokens with no 
intervening values will mess it up. Also, the result looks a 
bit messy. A little more involved, and according to 
documentation not guaranteed to work:


bool isToken(string s) {
return s == "a" || s == "d";
}

bool tokenCounter(string s) {
static string oldToken;
static bool counter = true;
if (s.isToken && s != oldToken) {
oldToken = s;
counter = !counter;
}
return counter;
}

unittest {
import std.algorithm;
import std.stdio;
import std.typecons;
import std.array;

auto a = ["a","b","c", "d", "e", "a", "d"];
auto b = a.chunkBy!tokenCounter.map!(e=>e[1]);
auto result = [
["a", "b", "c"],
["d", "e"],
["a"],
["d"]
];
writeln(b);
writeln(result);
}

Again no assert, but b and result have basically the same 
contents. Also handles consecutive tokens neatly (but 
consecutive identical tokens will be grouped together).


Hope this helps.

--
  Simen


Thanks Simen,
your tokenCounter is inspirational, for the rest I'll take some 
time for testing.


But some additional thoughts from my side:
I get all the lines of the file into one range. Calling array on 
it should give me an array, but how would I use find to get an 
index into this array?
With the indices I could slice up the array into four slices, no 
allocation required. If there is no easy way to just get an index 
instead of a range, I would try to use something like the 
tokenCounter to find all the indices.






Re: Finding out names in shared libraries

2016-03-23 Thread Jacob Carlborg via Digitalmars-d-learn

On 2016-03-23 16:17, Ozan wrote:

Hi


If I want to use a class or a function in a shared library, it is
necessary to use funny names like
"D7myclass10getMyClassFZC7myclass7MyClass".

Is it possible to get a list of all the names in shared library? What is
the schema behind these names? Is there a listing for "D7", "10",
"FZC7", "7" and so on?


Not exactly sure what you need. But it's possible to get all classes at 
runtime like this [1].


For functions, there's really no pretty way to do that. You can either 
implement runtime reflection using compile time reflection, which will 
most likely require you to modify the code you want to inspect. Or you 
can inspect the symbol table in the binary/shared library, which is a 
bit complicated and platform dependent.


[1] 
https://github.com/D-Programming-Language/druntime/blob/master/src/object.d#L973


--
/Jacob Carlborg


Re: Updating D-based apps without recompiling it

2016-03-23 Thread Jacob Carlborg via Digitalmars-d-learn

On 2016-03-23 18:15, Jesse Phillips wrote:


Do you have an example of this being done in any other language?


In Erlang it's possible to hot swap code. I'm not sure how it works 
though. But if we're talking servers, the easiest is to have multiple 
instances and restart one at the time with the new code.


--
/Jacob Carlborg


Re: Updating D-based apps without recompiling it

2016-03-23 Thread Jesse Phillips via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 12:21:33 UTC, Ozan wrote:

Hi


Enterprise applications in productive environments requires 
smooth updating mechanisms without recompiling or reinstalling. 
It's not possible to stop an enterprise application, then run 
"dub --reforce" and wait until finish. Mostly only few 
functions need to be replaced.


Has someone experience with handling upgrading/updating D-Apps 
on the fly?


Working with dynamic libraries or distributed components is not 
secure enough,
but maybe there are solutions, maybe around base calls and 
functions or completely different.



Regards, Ozan


Do you have an example of this being done in any other language? 
Essentially whatever code is being replaced, you're going to need 
to recompile it. If you're not using dynamic/shared libraries 
Adam is pointing you in the right direction.


If it is a desktop application then it is probably easiest if it 
communicates to a local service that provides the "replaceable" 
functions, when you stand up the new service the app can transfer 
the communication to it.


I can't speak to your security concerns.


Re: Finding out names in shared libraries

2016-03-23 Thread Ozan via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 15:17:18 UTC, Ozan wrote:
If I want to use a class or a function in a shared library, it 
is necessary to use funny names like 
"D7myclass10getMyClassFZC7myclass7MyClass".


Is it possible to get a list of all the names in shared 
library? What is the schema behind these names? Is there a 
listing for "D7", "10", "FZC7", "7" and so on?


Solved in the core.demangle module...
The demangle module converts mangled D symbols to a 
representation similar to what would have existed in code.


Regards, Ozan
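
For reference, a minimal sketch of calling `core.demangle.demangle` on the name from the question. It assumes the symbol carries the usual `_D` prefix, which is missing in the string quoted above; `prettyName` is just an illustrative wrapper.

```d
import core.demangle : demangle;
import std.stdio : writeln;

// Illustrative wrapper: returns the readable form of a mangled D symbol.
// demangle hands back its input unchanged if it cannot parse the name.
string prettyName(string mangled)
{
    return demangle(mangled).idup;
}

void main()
{
    // Assumption: the leading underscore is added here because D mangled
    // names start with "_D".
    writeln(prettyName("_D7myclass10getMyClassFZC7myclass7MyClass"));
}
```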


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 14:20:12 UTC, Andrea Fontana wrote:

Any input => output example?


Sure, it is ensight gold case file format:

FORMAT
type:  ensight gold

GEOMETRY
model:   1exgold2.geo**

VARIABLE
scalar per node: 1 Stress exgold2.scl**
vector per node: 1 Displacement   exgold2.dis**

TIME
time set:  1
number of steps:   3
filename start number: 0
filename increment:1
time values:   1.0   2.0   3.0


The separators would be ["FORMAT", "TIME", "VARIABLE", 
"GEOMETRY"].
The blank lines between the blocks and the order of the 
separators in the file is not known.
I would expect a range of four ranges of lines: one for each 
text-block above.
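
A rough sketch of that grouping, assuming each keyword sits alone on its own line as in the example above. Unlike the lazy result asked for, it builds an associative array, so it allocates the arrays of line slices (the line contents themselves are not copied). `splitSections` is a hypothetical name.

```d
import std.algorithm.searching : canFind;
import std.string : lineSplitter, strip;

// Hypothetical helper: groups the non-blank lines under the most recent
// section keyword. Lines before the first keyword are ignored.
string[][string] splitSections(string text, string[] tokens)
{
    string[][string] sections;
    string current;
    foreach (line; text.lineSplitter)
    {
        auto trimmed = line.strip;
        if (trimmed.length == 0)
            continue;                   // blank lines between blocks
        if (tokens.canFind(trimmed))
        {
            current = trimmed;          // a new block starts here
            sections[current] = null;
        }
        else if (current.length)
            sections[current] ~= trimmed;
    }
    return sections;
}

void main()
{
    import std.stdio : writeln;

    auto s = splitSections(
        "FORMAT\ntype:  ensight gold\n\nTIME\ntime set:  1\n",
        ["FORMAT", "TIME", "VARIABLE", "GEOMETRY"]);
    writeln(s);
}
```

Because the result is keyed by keyword, the order of the blocks in the file does not matter.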





Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread Simen Kjaeraas via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
I need to parse an ascii with multiple tokens. The tokens can 
be seen as keys. After every token there is a bunch of lines 
belonging to that token, the values.

The order of tokens is unknown.

I would like to read the file in as a whole string, and split 
the string with:

splitter(fileString, [token1, token2, ... tokenN]);

And would like to get a range of strings each starting with 
tokenX and ending before the next token.


Does something like this exist?

I know how to parse the string line by line and create new 
strings and append the appropriate lines, but I don't know how 
to do this with a lazy result range and new allocations.


Without a bit more detail, it's a bit hard to help.

std.algorithm.splitter has an overload that takes a function 
instead of a separator:


import std.algorithm;
auto a = "a,b;c";
auto b = a.splitter!(e => e == ';' || e == ',');
assert(equal(b, ["a", "b", "c"]));

However, not only are the separators lost in the process, it only 
allows single-element separators. This might be good enough given 
the information you've divulged, but I'll hazard a guess it isn't.


My next stop is std.algorithm.chunkBy:

import std.algorithm;
import std.typecons : tuple;

auto a = ["a","b","c", "d", "e"];
auto b = a.chunkBy!(e => e == "a" || e == "d");
auto result = [
    tuple(true, ["a"]), tuple(false, ["b", "c"]),
    tuple(true, ["d"]), tuple(false, ["e"])
];

No assert here, since the ranges in the tuples are not arrays. My 
immediate concern is that two consecutive tokens with no 
intervening values will mess it up. Also, the result looks a bit 
messy. A little more involved, and according to documentation not 
guaranteed to work:


bool isToken(string s) {
    return s == "a" || s == "d";
}

bool tokenCounter(string s) {
    static string oldToken;
    static bool counter = true;
    if (s.isToken && s != oldToken) {
        oldToken = s;
        counter = !counter;
    }
    return counter;
}

unittest {
    import std.algorithm;
    import std.stdio;
    import std.typecons;
    import std.array;

    auto a = ["a","b","c", "d", "e", "a", "d"];
    auto b = a.chunkBy!tokenCounter.map!(e=>e[1]);
    auto result = [
        ["a", "b", "c"],
        ["d", "e"],
        ["a"],
        ["d"]
    ];
    writeln(b);
    writeln(result);
}

Again no assert, but b and result have basically the same 
contents. Also handles consecutive tokens neatly (but consecutive 
identical tokens will be grouped together).


Hope this helps.

--
  Simen


Finding out names in shared libraries

2016-03-23 Thread Ozan via Digitalmars-d-learn

Hi


If I want to use a class or a function in a shared library, it is 
necessary to use funny names like 
"D7myclass10getMyClassFZC7myclass7MyClass".


Is it possible to get a list of all the names in a shared library? 
What is the schema behind these names? Is there a listing for 
"D7", "10", "FZC7", "7" and so on?



Regards, Ozan




Re: Variant.type bug ?

2016-03-23 Thread Voitech via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 12:52:24 UTC, Adam D. Ruppe wrote:

On Wednesday, 23 March 2016 at 08:01:36 UTC, Voitech wrote:
Hi Variant stores variant.type as not the "highest" in 
hierarchy.


Yeah, it stores the static type. You can use it to get that 
then do a normal dynamic cast to test for a more derived type.


OK, but how do I handle a situation like this?

class TypeHolder{
    import std.variant;
    Variant[TypeInfo] data;

    void add(T)(T value){
        data[typeid(value)]=value;
    }

    T getByType(T)(){
        Variant retVar=data.get(typeid(T),Variant(null));
        T val=retVar.get!T; //fails
        return val;
    }

}
unittest{
    import std.variant;
    A a= new A;
    B b= new B;
    C c = new C;

    A ab= new B;
    A ac = new C;
    TypeHolder holder = new TypeHolder;
    holder.add(a);
    holder.add(ab);
    holder.add(ac);
    assert(holder.data.length==3);
    A result=holder.getByType!A;
    assert(result==a);
    result=holder.getByType!B; //fails
    assert(result==ab);
    result=holder.getByType!C; //fails
    assert(result==ac);
}

I could hold the objects in another AA, Object[TypeInfo] rather than 
Variant[TypeInfo]. Or is there a way to get the supertype of the 
provided T?
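A sketch of the Object[TypeInfo] variant mentioned above, not a definitive answer. The class name ObjectHolder and the cut-down A, B, C hierarchy are made up for illustration. It relies on typeid(value) returning the dynamic type when value is a class reference, so the key matches the most derived type even if the static type is A.

```d
class A {}
class B : A {}
class C : B {}

/// Hypothetical holder keyed by the dynamic type of each stored object.
class ObjectHolder
{
    private Object[TypeInfo] data;

    void add(T : Object)(T value)
    {
        // For class references typeid yields the run-time (dynamic) type.
        data[typeid(value)] = value;
    }

    /// Exact-type lookup: returns null if no object of type T was added.
    T getByType(T : Object)()
    {
        if (auto p = typeid(T) in data)
            return cast(T) (*p); // dynamic downcast from Object
        return null;
    }
}

void main()
{
    A a = new A;
    A ab = new B; // static type A, dynamic type B
    A ac = new C; // static type A, dynamic type C

    auto holder = new ObjectHolder;
    holder.add(a);
    holder.add(ab);
    holder.add(ac);

    assert(holder.getByType!A is a);
    assert(holder.getByType!B is ab);
    assert(holder.getByType!C is ac);
}
```

This only finds exact matches: asking for A does not return a stored B. For the supertype part of the question, std.traits.BaseClassesTuple!T lists the base classes of T at compile time.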





Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread Andrea Fontana via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 12:00:15 UTC, ParticlePeter wrote:
On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter 
wrote:


Stupid typos:

I need to parse an ascii

file

with multiple tokens. ...


...

to do this with a lazy result range and

without

new allocations.


Any input => output example?


Re: parsing HTML for a web robot (crawler) like application

2016-03-23 Thread Andrea Fontana via Digitalmars-d-learn
On Wednesday, 23 March 2016 at 09:02:37 UTC, Martin Tschierschke 
wrote:

Hello!
I want to set up a web robot to detect changes on certain web 
pages or sites.
Any hint to similar projects or libraries at dub or git to look 
at,

before starting to develop my own RegExp for parsing?

Best regards
mt.


See also: http://code.dlang.org/packages/htmld


Re: Variant.type bug ?

2016-03-23 Thread Adam D. Ruppe via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 08:01:36 UTC, Voitech wrote:
Hi Variant stores variant.type as not the "highest" in 
hierarchy.


Yeah, it stores the static type. You can use it to get that then 
do a normal dynamic cast to test for a more derived type.
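A small sketch of that suggestion, using a cut-down A and B and a hypothetical helper holdsB: fetch the value with the static type Variant recorded, then downcast.

```d
import std.variant : Variant;

class A {}
class B : A {}

/// True if the Variant holds a reference stored as A whose object is a B.
bool holdsB(Variant v)
{
    // Variant.type reports the static type that was assigned.
    if (v.type != typeid(A))
        return false;
    A obj = v.get!A;
    // A dynamic cast yields null when the object is not a B.
    return (cast(B) obj) !is null;
}

void main()
{
    A ab = new B;
    Variant v = ab;

    assert(v.type == typeid(A));     // static type
    assert(typeid(ab) == typeid(B)); // typeid on a class reference is dynamic
    assert(holdsB(v));
}
```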




Re: Updating D-based apps without recompiling it

2016-03-23 Thread Adam D. Ruppe via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 12:21:33 UTC, Ozan wrote:
Has someone experience with handling upgrading/updating D-Apps 
on the fly?


The way I always did it was to simply have old and new running 
side-by-side in the transition.


So, without stopping the old version, compile the new one and 
start it. Tell the web server to start using the new one for all 
new connections without breaking any existing connections.


Then when all existing connections are finished, you can stop the 
old one and remove it.


Updating D-based apps without recompiling it

2016-03-23 Thread Ozan via Digitalmars-d-learn

Hi


Enterprise applications in production environments require 
smooth updating mechanisms without recompiling or reinstalling. 
It's not possible to stop an enterprise application, then run 
"dub --reforce" and wait until it finishes. Mostly, only a few 
functions need to be replaced.


Does anyone have experience with upgrading/updating D apps on 
the fly?


Working with dynamic libraries or distributed components is not 
secure enough,
but maybe there are solutions, maybe around base calls and 
functions or completely different.



Regards, Ozan


Re: Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn

On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:

Stupid typos:

I need to parse an ascii

file

with multiple tokens. ...


...

to do this with a lazy result range and

without

new allocations.







Does something like std.algorithm.iteration:splitter with multiple seperators exist?

2016-03-23 Thread ParticlePeter via Digitalmars-d-learn
I need to parse an ascii with multiple tokens. The tokens can be 
seen as keys. After every token there is a bunch of lines 
belonging to that token, the values.

The order of tokens is unknown.

I would like to read the file in as a whole string, and split the 
string with:

splitter(fileString, [token1, token2, ... tokenN]);

And would like to get a range of strings each starting with 
tokenX and ending before the next token.


Does something like this exist?

I know how to parse the string line by line and create new 
strings and append the appropriate lines, but I don't know how to 
do this with a lazy result range and new allocations.


Re: Something wrong with GC

2016-03-23 Thread thedeemon via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 13:46:41 UTC, stunaep wrote:

So what am I to do?


Just learn more about available containers and their semantics. 
Maybe you don't need Array!T when there is a simple T[].
If you think you do need Array, then think about memory 
management: where are you going to allocate the data - in the GC 
heap or outside it. Depending on your answers there are different 
approaches. It's all solvable if you pause and think what exactly 
you're trying to do.
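To make the two options concrete, here is a small sketch with made-up function names. It assumes Array means std.container.Array, which manages its own malloc-based storage, while a plain T[] that is appended to lives on the GC heap.

```d
import std.container.array : Array;

/// Collects 0 .. n into a built-in slice; appending allocates on the GC heap.
int[] collectWithSlice(int n)
{
    int[] result;
    foreach (i; 0 .. n)
        result ~= i;
    return result;
}

/// Collects 0 .. n into std.container.Array, which frees its storage
/// deterministically instead of relying on the GC.
Array!int collectWithArray(int n)
{
    Array!int result;
    foreach (i; 0 .. n)
        result.insertBack(i);
    return result;
}

void main()
{
    import std.algorithm.comparison : equal;

    auto slice = collectWithSlice(3);
    auto array = collectWithArray(3);

    assert(slice == [0, 1, 2]);
    assert(equal(array[], slice)); // array[] gives a range over the elements
}
```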




Re: parsing HTML for a web robot (crawler) like application

2016-03-23 Thread Nordlöw via Digitalmars-d-learn
On Wednesday, 23 March 2016 at 09:06:37 UTC, Rene Zwanenburg 
wrote:
Adam's dom.d will get you pretty far. I believe it can also 
handle documents that aren't completely well-formed.


https://github.com/adamdruppe/arsd/blob/master/dom.d


HTML-docs here:

http://dpldocs.info/experimental-docs/arsd.dom.html

through Adam's own web service.


Re: parsing HTML for a web robot (crawler) like application

2016-03-23 Thread Martin Tschierschke via Digitalmars-d-learn
On Wednesday, 23 March 2016 at 09:06:37 UTC, Rene Zwanenburg 
wrote:

[...]


Adam's dom.d will get you pretty far. I believe it can also 
handle documents that aren't completely well-formed.


https://github.com/adamdruppe/arsd/blob/master/dom.d

Thank you! This forum has an incredibly fast auto-responder ;-)



Re: parsing HTML for a web robot (crawler) like application

2016-03-23 Thread Rene Zwanenburg via Digitalmars-d-learn
On Wednesday, 23 March 2016 at 09:02:37 UTC, Martin Tschierschke 
wrote:

Hello!
I want to set up a web robot to detect changes on certain web 
pages or sites.
Any hint to similar projects or libraries at dub or git to look 
at,

before starting to develop my own RegExp for parsing?

Best regards
mt.


Adam's dom.d will get you pretty far. I believe it can also 
handle documents that aren't completely well-formed.


https://github.com/adamdruppe/arsd/blob/master/dom.d


parsing HTML for a web robot (crawler) like application

2016-03-23 Thread Martin Tschierschke via Digitalmars-d-learn

Hello!
I want to set up a web robot to detect changes on certain web 
pages or sites.
Any hint to similar projects or libraries at dub or git to look 
at,

before starting to develop my own RegExp for parsing?

Best regards
mt.
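Independent of the parsing question, the change-detection part can be sketched with Phobos alone. This is a swapped-in technique, not HTML parsing: it fingerprints the whole page text, so any byte-level difference counts as a change. Fetching the page (for example with std.net.curl) is left out, and the names Fingerprint, fingerprint and hasChanged are made up.

```d
import std.digest.sha : sha256Of;

alias Fingerprint = ubyte[32]; // size of a SHA-256 digest

/// Hashes the raw page text.
Fingerprint fingerprint(const(char)[] page)
{
    return sha256Of(page);
}

/// Compares the page against the last stored fingerprint and updates it.
bool hasChanged(const(char)[] page, ref Fingerprint last)
{
    auto current = fingerprint(page);
    bool changed = current != last;
    last = current;
    return changed;
}

void main()
{
    Fingerprint last = fingerprint("<html><body>v1</body></html>");

    assert(!hasChanged("<html><body>v1</body></html>", last));
    assert(hasChanged("<html><body>v2</body></html>", last));
}
```

Pages with dynamic parts (timestamps, ads) would report a change on every fetch; hashing only the extracted content of interest is where an HTML parser comes in.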


Variant.type bug ?

2016-03-23 Thread Voitech via Digitalmars-d-learn
Hi, Variant stores variant.type as not the "highest" type in the 
hierarchy. Like this:

A a= new A;
A b = new B; //B:A
Variant bVar=Variant(b);
bVar.type will be typeid(A), not typeid(B). Is this intentional? 
If so, is there a way to get the "concrete" type of the "b" 
variable, like when passing it to a template function?


//just test function for B not used with other type
void templateFunc(T)(T v){
    import std.variant;
    typeof(v) val=v;//concrete type ??
    Variant var=val;
    assert(var.type==typeid(B));//fails
}

unittest{
    A b= new B;
    templateFunc(b);
}

Types and unittests:

module typeTest;
import std.traits;
import std.meta;
class A{

    void a(){}
}

class B:A{
    int b(){
        return 1;
    }

}

class C:B,D{
    string c(){
        return "";
    }
    override int d() {
        return 0;
    }
}

interface D{
    int d();
}

//just test function for B not used with other type
void templateFunc(T)(T v){
    import std.variant;
    typeof(v) val=v;//concrete type ??
    Variant var=val;
    assert(var.type==typeid(B));//fails
}

unittest{
    A b= new B;
    templateFunc(b);
}

unittest{
    import std.variant;
    A a= new A;
    B b= new B;
    C c = new C;

    A ab= new B;
    A ac = new C;

    Variant variant;
    variant=a;
    assert(typeid(a) == variant.type);
    variant=b;
    assert(typeid(b) == variant.type);
    variant=c;
    assert(typeid(c) == variant.type);
    variant=ab;
    assert(typeid(ab) == variant.type); //fails
    variant=ac;
    assert(typeid(ac) == variant.type); //fails
}