Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Uknown via Digitalmars-d-announce

On Friday, 11 May 2018 at 23:46:16 UTC, Dmitry Olshansky wrote:
On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer 
wrote:

On 5/11/18 1:30 AM, Dmitry Olshansky wrote:
On Thursday, 10 May 2018 at 23:22:02 UTC, Steven 
Schveighoffer wrote:
grep on Mac is a piece of sheat, sadly and I don’t know why 
exactly (too old?). Use some 3-rd party thing like ‘sift’ 
written in Go.


You can always use GNU grep. The one that comes with macOS is 
pretty old and slow. If you have MacPorts, it's just `port install 
grep`. I'm sure brew will have a similar package for GNU grep.


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Dmitry Olshansky via Digitalmars-d-announce
On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer 
wrote:

On 5/11/18 1:30 AM, Dmitry Olshansky wrote:
On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer 
wrote:
OK, so at dconf I spoke with a few very smart guys about how 
I can use mmap to make a zero-copy buffer. And I implemented 
this on the plane ride home.


However, I am struggling to find a use case for this that 
showcases why you would want to use it. While it does work, 
and works beautifully, it doesn't show any measurable 
difference vs. the array allocated buffer that copies data 
when it needs to extend.


I’d start with something clinically synthetic.
Say your record size is exactly half of buffer + 1 byte. If 
you were to extend the size of buffer, it would amortize.


Hm.. this wouldn't work, because the idea is to keep some of 
the buffer full. What will happen here is that the buffer will 
extend to be able to accommodate the extra byte, and then you 
are back to having less of the buffer full at once. Iopipe is 
not afraid to increase the buffer :)


Then you cannot test it in such way.





Basically:
16 MB fixed buffer
vs
16 MB mmap-ed ring

Where you read pieces in 8M+1 blocks. Yes, we are aiming to 
blow the CPU cache there. Otherwise the CPU cache is so fast that 
an occasional copy is zilch; once we hit primary memory it’s not. 
Adjust sizes for your CPU.


This isn't how it will work. The system looks at the buffer and 
says "oh, I can just read 8MB - 1 byte," which gives you 2 
bytes less than you need. Then you need the extra 2 bytes, so 
it will increase the buffer to hold at least 2 records.


I do get the point of having to go outside the cache. I'll look 
and see if maybe specifying a 1000 line context helps ;)


Nope. Consider reading binary records where you know length in 
advance and skip over it w/o need to touch every byte. There it 
might help. If you touch every byte and do something the cost of 
copying the tail is zilch.


One example is the netstring format, which is:

13:Hello, world!,

Basically the length in ASCII digits, then ‘:’, followed by that many 
UTF-8 code units and a closing ‘,’.

No decoding necessary.

Torrent files use something similar I think (length-prefixed strings), 
maybe other formats too. It is a nice example that avoids scans to find 
delimiters.
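
A minimal sketch of the skip-over idea, in Python purely for illustration. It assumes the standard netstring layout (`<length>:<payload>,`); `iter_netstrings` is a hypothetical helper, not part of iopipe. The point is that only the short length header is examined, and the payload is stepped over by offset arithmetic.

```python
def iter_netstrings(buf: bytes):
    """Yield (start, end) offsets of each payload in buf.

    Assumed layout: ASCII decimal length, ':', payload bytes, ','.
    Payload bytes are never inspected, only skipped.
    """
    pos = 0
    n = len(buf)
    while pos < n:
        colon = buf.index(b":", pos)      # raises ValueError if no ':' is left
        header = buf[pos:colon]
        if not header.isdigit():
            raise ValueError("length header must be ASCII digits")
        start = colon + 1
        end = start + int(header)
        if buf[end:end + 1] != b",":      # also catches a truncated payload
            raise ValueError("truncated or malformed netstring")
        yield start, end
        pos = end + 1                     # jump straight to the next record


data = b"13:Hello, world!,6:iopipe,"
payloads = [data[s:e] for s, e in iter_netstrings(data)]
```

With a format like this, the cost per record is dominated by moving the window forward, which is the situation where avoiding a copy of the buffer tail might become visible.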




Update: nope, still pretty much the same.

The amount of work done per byte though has to be minimal to 
actually see anything.


Right, this is another part of the problem -- if copying is so 
rare compared to the other operations, then the difference is 
going to be lost in the noise.


What I have learned here is:

1. Ring buffers are really cool (I still love how it works) and 
perform as well as normal buffers


This is also good. Normal ring buffers usually suck in the speed 
department.



2. The use cases are much smaller than I thought
3. In most real-world applications, they are a wash, and not 
worth the OS tricks needed to use it.
4. iopipe makes testing with a different kind of buffer really 
easy, which was one of my original goals. So I'm glad that 
works!


I'm going to (obviously) leave them there, hoping that someone 
finds a good use case, but I can say that my extreme excitement 
at getting it to work was depressed quite a bit when I found it 
didn't really gain much in terms of performance for the use 
cases I have been doing.
Should be mostly trivial in fact. I mean our first designs for 
IOpipe is where I wanted regex to work with it.


Basically - if we started a match, extend window until we get 
it or lose it. Then release up to the next point of potential 
start.


I'm thinking it's even simpler than that. All matches are dead 
on a line break (it's how grep normally works), so you simply 
have to parse the lines and run each one via regex. What I 
don't know is how much it costs regex to startup and run on an 
individual line.


It is malloc/free/addRange/removeRange for each call. I optimized 
2.080 to reuse the most recently used engine w/o these costs but I’ll 
have to check if it covers all cases.
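
The line-at-a-time approach can be sketched as follows. This is an illustration in Python, not D: `grep_lines` is a hypothetical name and Python's `re` module merely stands in for std.regex. The pattern is compiled once outside the loop, so that only the per-line search cost remains inside it.

```python
import re


def grep_lines(pattern, lines):
    """Yield (line_number, line) for every line matching pattern."""
    rx = re.compile(pattern)          # one-time setup, outside the loop
    for lineno, line in enumerate(lines, start=1):
        if rx.search(line):           # a match can never span a line break
            yield lineno, line


hits = list(grep_lines(r"io\w+", ["foo", "iopipe rocks", "bar io"]))
```

Whether the per-call setup matters in practice depends on the regex engine, which is exactly the open question in the exchange above.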




One thing I could do to amortize is keep 2N lines in the 
buffer, and run the regex on a whole context's worth of lines, 
then dump them all.


I believe integrating iopipe awareness into regex will easily 
make it 50% faster. A guesstimate though.




I don't get why grep is so bad at this, since it is supposedly


grep on Mac is a piece of sheat, sadly and I don’t know why 
exactly (too old?). Use some 3-rd party thing like ‘sift’ written 
in Go.




-Steve




Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Joakim via Digitalmars-d-announce
On Friday, 11 May 2018 at 16:07:26 UTC, Steven Schveighoffer 
wrote:

On 5/11/18 11:44 AM, Steven Schveighoffer wrote:

On 5/10/18 7:22 PM, Steven Schveighoffer wrote:


[...]


Shameful note: Macos grep is BSD grep, and is not NEARLY as 
fast as GNU grep, which has much better performance (and is 2x 
as fast as iopipe_search on my Linux VM, even when printing 
line numbers).


So at least there is something to strive for :)


More testing reveals that as I increase the context lines to 
print, iopipe performs better than GNU grep. A shocking thing 
is that at 9 lines of context, grep goes up slightly, but all 
of a sudden at 10 lines of context, it doubles in the time 
taken (and is now slower than the iopipe_search).


Also noting: my Linux VM does not have ldc, so these are dmd 
numbers.


-Steve


What stops you from downloading a linux release from here?

https://github.com/ldc-developers/ldc/releases


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Jon Degenhardt via Digitalmars-d-announce
On Friday, 11 May 2018 at 15:44:04 UTC, Steven Schveighoffer 
wrote:

On 5/10/18 7:22 PM, Steven Schveighoffer wrote:

Shameful note: Macos grep is BSD grep, and is not NEARLY as 
fast as GNU grep, which has much better performance (and is 2x 
as fast as iopipe_search on my Linux VM, even when printing 
line numbers).


Yeah, the MacOS default versions of the Unix text processing 
tools are really slow. It's worth installing the GNU versions if 
doing performance comparisons on MacOS, or because you work with 
large files. Homebrew and MacPorts both have the GNU versions. 
Some relevant packages: coreutils, grep, gsed (sed), gawk (awk).


Most tools are in coreutils. Many will be installed with a 'g' 
prefix by default, leaving the existing tools in place. e.g. 
'cut' will be installed as 'gcut' unless specified otherwise.


--Jon



Re: Funding for code-d/serve-d

2018-05-11 Thread Mike Parker via Digitalmars-d-announce

On Sunday, 6 May 2018 at 16:31:02 UTC, Meta wrote:

I'm a little unclear how OpenCollective works. Do you have to 
specifically donate to this goal, or does every donation made 
just go to that? Furthermore, I don't really want to create an 
OpenCollective account just to donate; I'd prefer to do it 
directly with my Paypal. Is that possible?


All donations at OpenCollective go toward the current goal. At 
the moment, we have no way to automatically link donations at 
PayPal with the OC goal.


Going forward, we'll investigate how to tie this all together, 
but for now if the OC account is a blocker for you a PayPal 
donation with a note that it's for the VS Code Plugin is 
sufficient. I'll keep track manually until we have a better 
system.




Also, does anyone have an image of the supporter t-shirts? If 
possible I want to donate an amount to get one of those.


I'll throw the images on a web page before the weekend is out.




Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Steven Schveighoffer via Digitalmars-d-announce

On 5/11/18 11:44 AM, Steven Schveighoffer wrote:

On 5/10/18 7:22 PM, Steven Schveighoffer wrote:

However, this example *does* show the power of iopipe -- it handles 
all flavors of unicode with one template function, is quite 
straightforward (though I want to abstract the line tracking code, 
that stuff is really tricky to get right). Oh, and it's roughly 10x 
faster than grep, and a bunch faster than fgrep, at least on my 
machine ;) I'm tempted to add regex processing to see if it still 
beats grep.


Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU 
grep, which has much better performance (and is 2x as fast as 
iopipe_search on my Linux VM, even when printing line numbers).


So at least there is something to strive for :)


More testing reveals that as I increase the context lines to print, 
iopipe performs better than GNU grep. A shocking thing is that at 9 
lines of context, grep goes up slightly, but all of a sudden at 10 lines 
of context, it doubles in the time taken (and is now slower than the 
iopipe_search).


Also noting: my Linux VM does not have ldc, so these are dmd numbers.

-Steve


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Jonathan M Davis via Digitalmars-d-announce
On Friday, May 11, 2018 11:44:04 Steven Schveighoffer via
Digitalmars-d-announce wrote:
> Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU
> grep, which has much better performance (and is 2x as fast as
> iopipe_search on my Linux VM, even when printing line numbers).

Curiously, the grep on FreeBSD seems to be GNU's grep with some additional
patches, though I expect that it's a ways behind whatever GNU is releasing
now, because while they were willing to put some GPLv2 stuff in FreeBSD,
they have not been willing to have anything to do with GPLv3. FreeBSD's grep
claims to be version 2.5.1-FreeBSD, whereas ports has the gnugrep package
which is version 2.27, so that implies a fairly large version difference
between the two. I have no idea how they compare in terms of performance.
Either way, I would have expected FreeBSD to be using their own
implementation, not something from GNU, especially since they seem to be
trying to purge GPL stuff from FreeBSD. So, the fact that FreeBSD is using
GNU's grep is a bit surprising. If I had to guess, I would guess that they
switched to the GNU version at some point in the past, because it was easier
to grab it than to make what they had faster, but I don't know. Either way,
it sounds like Mac OS X either didn't take their grep from FreeBSD in this
case, or they took it from an older version before FreeBSD switched to
using GNU's grep.

- Jonathan M Davis


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Steven Schveighoffer via Digitalmars-d-announce

On 5/10/18 7:22 PM, Steven Schveighoffer wrote:

However, this example *does* show the power of iopipe -- it handles all 
flavors of unicode with one template function, is quite straightforward 
(though I want to abstract the line tracking code, that stuff is really 
tricky to get right). Oh, and it's roughly 10x faster than grep, and a 
bunch faster than fgrep, at least on my machine ;) I'm tempted to add 
regex processing to see if it still beats grep.


Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU 
grep, which has much better performance (and is 2x as fast as 
iopipe_search on my Linux VM, even when printing line numbers).


So at least there is something to strive for :)

-Steve


Re: autowrap v0.0.1 - Automatically wrap existing D code for use in Python and Excel

2018-05-11 Thread Atila Neves via Digitalmars-d-announce

On Thursday, 10 May 2018 at 19:50:40 UTC, Nikos wrote:

In my dub.sdl file I have


configuration "python35" {
 subConfiguration "autowrap" "python35"
}


and I run


dub build --config=python35


which still tries to find python36. Why doesn't it look for 3.5?


Copy + paste error, sorry. Fixed now.

Atila


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Uknown via Digitalmars-d-announce
On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer 
wrote:

[...]
I do get the point of having to go outside the cache. I'll look 
and see if maybe specifying a 1000 line context helps ;)


Update: nope, still pretty much the same.



I'm sure someone will find some good show off program.

The amount of work done per byte though has to be minimal to 
actually see anything.


Right, this is another part of the problem -- if copying is so 
rare compared to the other operations, then the difference is 
going to be lost in the noise.


What I have learned here is:

1. Ring buffers are really cool (I still love how it works) and 
perform as well as normal buffers

2. The use cases are much smaller than I thought
3. In most real-world applications, they are a wash, and not 
worth the OS tricks needed to use it.


Now I need to learn all about ring-buffers. Do you have any good 
starting points?


4. iopipe makes testing with a different kind of buffer really 
easy, which was one of my original goals. So I'm glad that 
works!


That satisfying feeling when the code works exactly the way you 
wanted it to!


I'm going to (obviously) leave them there, hoping that someone 
finds a good use case, but I can say that my extreme excitement 
at getting it to work was depressed quite a bit when I found it 
didn't really gain much in terms of performance for the use 
cases I have been doing.


I'm sure someone will find a place where it's useful.


However, this example *does* show the power of iopipe -- it 
handles all flavors of unicode with one template function, is 
quite straightforward (though I want to abstract the line 
tracking code, that stuff is really tricky to get right). Oh, 
and it's roughly 10x faster than grep, and a bunch faster 
than fgrep, at least on my machine ;) I'm tempted to add 
regex processing to see if it still beats grep.


Should be mostly trivial in fact. I mean our first designs for 
IOpipe is where I wanted regex to work with it.


Basically - if we started a match, extend window until we get 
it or lose it. Then release up to the next point of potential 
start.


I'm thinking it's even simpler than that. All matches are dead 
on a line break (it's how grep normally works), so you simply 
have to parse the lines and run each one via regex. What I 
don't know is how much it costs regex to startup and run on an 
individual line.


One thing I could do to amortize is keep 2N lines in the 
buffer, and run the regex on a whole context's worth of lines, 
then dump them all.


iopipe is looking like a great library!

I don't get why grep is so bad at this, since it is supposedly 
doing the matching without line boundaries. I was actually 
quite shocked when iopipe was that much faster -- even when I'm 
not asking grep to print out line numbers (so it doesn't 
actually ever really have to keep track of lines).


-Steve


That reminds me of this great blog post detailing grep's 
performance:

http://ridiculousfish.com/blog/posts/old-age-and-treachery.html

Also, one of the original authors of grep wrote about its 
performance optimizations, for anyone interested:

https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Ali Çehreli via Digitalmars-d-announce

On 05/11/2018 06:28 AM, Steven Schveighoffer wrote:

> 1. Ring buffers are really cool (I still love how it works) and perform
> as well as normal buffers
> 2. The use cases are much smaller than I thought

There is the LMAX Disruptor, which was open sourced a few years ago along 
with a large number of articles, describing its history and design in 
great detail. Because of the large number of articles like this one



https://mechanitis.blogspot.com/2011/06/dissecting-disruptor-whats-so-special.html

it's impossible to find the one that had left an impression on me at the 
time I read it. The article was describing their story from the 
beginning to finally getting to their current design, starting from a 
simple std::map, lock contentions and other concurrency pitfalls. They 
finally settled on a multi-producer-single-consumer design where the 
consumer works on one thread. This was giving them the biggest CPU cache 
advantage.


The producers and the consumer share a ring buffer for communication. 
Perhaps the example you're looking for is in there somewhere. :)


Ali



Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Steven Schveighoffer via Digitalmars-d-announce

On 5/11/18 5:55 AM, Kagamin wrote:

On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote:
However, I am struggling to find a use case for this that showcases 
why you would want to use it. While it does work, and works 
beautifully, it doesn't show any measurable difference vs. the array 
allocated buffer that copies data when it needs to extend.


Depends on OS and hardware. I would expect mmap implementation to be 
slower as it reads file in chunks of 4kb and relies on page faults.


As Dmitry hinted at, there actually is no file involved. I'm mapping 
just straight memory to 2 segments. In fact, in my test application, I'm 
using stdin as the input, which may not even involve a file.


It's just as fast as using memory, the only cool part is that you can 
write a buffer that wraps to the beginning as if it were a normal array.


What surprises me is that the copying for the normal buffer doesn't hurt 
performance that much. I suppose this should probably have been 
expected, as CPUs are really really good at processing consecutive 
memory, and the copying you end up having to do is generally small 
compared to the rest of your app.


-Steve


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Steven Schveighoffer via Digitalmars-d-announce

On 5/11/18 1:30 AM, Dmitry Olshansky wrote:

On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote:
OK, so at dconf I spoke with a few very smart guys about how I can use 
mmap to make a zero-copy buffer. And I implemented this on the plane 
ride home.


However, I am struggling to find a use case for this that showcases 
why you would want to use it. While it does work, and works 
beautifully, it doesn't show any measurable difference vs. the array 
allocated buffer that copies data when it needs to extend.


I’d start with something clinically synthetic.
Say your record size is exactly half of buffer + 1 byte. If you were to 
extend the size of buffer, it would amortize.


Hm.. this wouldn't work, because the idea is to keep some of the buffer 
full. What will happen here is that the buffer will extend to be able to 
accommodate the extra byte, and then you are back to having less of the 
buffer full at once. Iopipe is not afraid to increase the buffer :)




Basically:
16 MB fixed buffer
vs
16 MB mmap-ed ring

Where you read pieces in 8M+1 blocks. Yes, we are aiming to blow the CPU 
cache there. Otherwise the CPU cache is so fast that an occasional copy is 
zilch; once we hit primary memory it’s not. Adjust sizes for your CPU.


This isn't how it will work. The system looks at the buffer and says 
"oh, I can just read 8MB - 1 byte," which gives you 2 bytes less than 
you need. Then you need the extra 2 bytes, so it will increase the 
buffer to hold at least 2 records.
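
The arithmetic behind that objection, spelled out. This only restates the numbers from the exchange above; the variable names are hypothetical, and nothing here describes how iopipe actually decides to grow its buffer.

```python
MB = 1024 * 1024

buffer_size = 16 * MB        # the fixed buffer from the proposed test
record_size = 8 * MB + 1     # "half of buffer + 1 byte"

# Space left once the first record sits in the buffer: 8 MB - 1 byte.
free_after_first = buffer_size - record_size

# The second record is this many bytes too large for the remaining space.
shortfall = record_size - free_after_first

# Smallest buffer that holds two whole records at once.
needed_for_two = 2 * record_size
```

So a buffer that is allowed to grow ends up 2 bytes larger than 16 MB and never has to wrap mid-record, which is why the synthetic test does not force the copy it was designed to provoke.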


I do get the point of having to go outside the cache. I'll look and see 
if maybe specifying a 1000 line context helps ;)


Update: nope, still pretty much the same.

The amount of work done per byte though has to be minimal to actually 
see anything.


Right, this is another part of the problem -- if copying is so rare 
compared to the other operations, then the difference is going to be 
lost in the noise.


What I have learned here is:

1. Ring buffers are really cool (I still love how it works) and perform 
as well as normal buffers

2. The use cases are much smaller than I thought
3. In most real-world applications, they are a wash, and not worth the 
OS tricks needed to use it.
4. iopipe makes testing with a different kind of buffer really easy, 
which was one of my original goals. So I'm glad that works!


I'm going to (obviously) leave them there, hoping that someone finds a 
good use case, but I can say that my extreme excitement at getting it to 
work was depressed quite a bit when I found it didn't really gain much 
in terms of performance for the use cases I have been doing.


in the buffer. But alas, it's roughly the same, even with large number 
of lines for context (like 200).


However, this example *does* show the power of iopipe -- it handles 
all flavors of unicode with one template function, is quite 
straightforward (though I want to abstract the line tracking code, 
that stuff is really tricky to get right). Oh, and it's roughly 10x 
faster than grep, and a bunch faster than fgrep, at least on my 
machine ;) I'm tempted to add regex processing to see if it still 
beats grep.


Should be mostly trivial in fact. I mean our first designs for IOpipe is 
where I wanted regex to work with it.


Basically - if we started a match, extend window until we get it or lose 
it. Then release up to the next point of potential start.


I'm thinking it's even simpler than that. All matches are dead on a line 
break (it's how grep normally works), so you simply have to parse the 
lines and run each one via regex. What I don't know is how much it costs 
regex to startup and run on an individual line.


One thing I could do to amortize is keep 2N lines in the buffer, and run 
the regex on a whole context's worth of lines, then dump them all.


I don't get why grep is so bad at this, since it is supposedly doing the 
matching without line boundaries. I was actually quite shocked when 
iopipe was that much faster -- even when I'm not asking grep to print 
out line numbers (so it doesn't actually ever really have to keep track 
of lines).


-Steve


Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Dmitry Olshansky via Digitalmars-d-announce

On Friday, 11 May 2018 at 09:55:10 UTC, Kagamin wrote:
On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer 
wrote:
However, I am struggling to find a use case for this that 
showcases why you would want to use it. While it does work, 
and works beautifully, it doesn't show any measurable 
difference vs. the array allocated buffer that copies data 
when it needs to extend.


Depends on OS and hardware. I would expect mmap implementation 
to be slower as it reads file in chunks of 4kb and relies on 
page faults.


It doesn’t. Instead it has a buffer mmaped twice side by side. 
Therefore you can avoid copy at the end when it wraps around.


Otherwise it’s the same buffering as usual.
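
A toy model of what the double mapping buys, in Python purely for illustration. This is NOT a real mmap trick: Python's `mmap` module cannot place two mappings side by side, so the mirroring is emulated by writing every byte into both halves of a buffer twice the ring size. `MirroredRing` and its methods are hypothetical names, not iopipe's API.

```python
class MirroredRing:
    """Ring buffer whose storage appears twice, back to back.

    A real implementation maps the same pages at two adjacent virtual
    addresses; here the second copy is maintained by hand.
    """

    def __init__(self, size):
        self.size = size
        self.mem = bytearray(2 * size)
        self.head = 0       # offset of the oldest unread byte, 0 <= head < size
        self.count = 0      # number of unread bytes

    def write(self, data):
        if len(data) > self.size - self.count:
            raise ValueError("not enough free space")
        tail = (self.head + self.count) % self.size
        for i, b in enumerate(data):
            j = (tail + i) % self.size
            self.mem[j] = b
            self.mem[j + self.size] = b    # emulate the second mapping
        self.count += len(data)

    def window(self):
        """All unread bytes as ONE contiguous view, even across the wrap."""
        return memoryview(self.mem)[self.head:self.head + self.count]

    def release(self, n):
        if n > self.count:
            raise ValueError("cannot release more than is buffered")
        self.head = (self.head + n) % self.size
        self.count -= n


ring = MirroredRing(8)
ring.write(b"abcdef")
ring.release(5)              # only b"f" remains, at offset 5
ring.write(b"ghijk")         # wraps past the end of the 8-byte ring
wrapped = bytes(ring.window())
```

An ordinary ring buffer would have to hand back two slices here, or copy the tail to the front; with the mirrored layout the consumer sees a plain array slice.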



Re: iopipe v0.0.4 - RingBuffers!

2018-05-11 Thread Kagamin via Digitalmars-d-announce
On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer 
wrote:
However, I am struggling to find a use case for this that 
showcases why you would want to use it. While it does work, and 
works beautifully, it doesn't show any measurable difference 
vs. the array allocated buffer that copies data when it needs 
to extend.


Depends on OS and hardware. I would expect mmap implementation to 
be slower as it reads file in chunks of 4kb and relies on page 
faults.