Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 23:46:16 UTC, Dmitry Olshansky wrote: On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer wrote: On 5/11/18 1:30 AM, Dmitry Olshansky wrote: On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote: grep on Mac is a piece of sheat, sadly, and I don’t know why exactly (too old?). Use some third-party thing like ‘sift’ written in Go. You can always use GNU grep. The one that comes with macOS is pretty old and slow. If you have MacPorts, it's just `port install grep`. I'm sure brew will have a similar package for GNU grep.
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer wrote: On 5/11/18 1:30 AM, Dmitry Olshansky wrote: On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote: OK, so at DConf I spoke with a few very smart guys about how I can use mmap to make a zero-copy buffer. And I implemented this on the plane ride home. However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend. I’d start with something clinically synthetic. Say your record size is exactly half of buffer + 1 byte. If you were to extend the size of buffer, it would amortize. Hm.. this wouldn't work, because the idea is to keep some of the buffer full. What will happen here is that the buffer will extend to be able to accommodate the extra byte, and then you are back to having less of the buffer full at once. iopipe is not afraid to increase the buffer :) Then you cannot test it in such a way. Basically: 16 MB buffer fixed vs 16 MB mmap-ed ring, where you read pieces in 8M+1 blocks. Yes, we are aiming to blow the CPU cache there. Otherwise CPU cache is so fast that an occasional copy is zilch; once we hit primary memory it’s not. Adjust sizes for your CPU. This isn't how it will work. The system looks at the buffer and says "oh, I can just read 8MB - 1 byte," which gives you 2 bytes less than you need. Then you need the extra 2 bytes, so it will increase the buffer to hold at least 2 records. I do get the point of having to go outside the cache. I'll look and see if maybe specifying a 1000 line context helps ;) Nope. Consider reading binary records where you know the length in advance and skip over them without needing to touch every byte. There it might help. If you touch every byte and do something, the cost of copying the tail is zilch. One example is netstring, which is: 13,Hello, world! 
Basically: the length in ASCII digits, a ‘,’, then that many UTF-8 code units. No decoding necessary. Torrent files use that I think, maybe other files. It's a nice example that avoids scans to find delimiters. Update: nope, still pretty much the same. The amount of work done per byte though has to be minimal to actually see anything. Right, this is another part of the problem -- if copying is so rare compared to the other operations, then the difference is going to be lost in the noise. What I have learned here is: 1. Ring buffers are really cool (I still love how it works) and perform as well as normal buffers. This is also good. Normal ring buffers usually suck in the speed department. 2. The use cases are much smaller than I thought 3. In most real-world applications, they are a wash, and not worth the OS tricks needed to use it. 4. iopipe makes testing with a different kind of buffer really easy, which was one of my original goals. So I'm glad that works! I'm going to (obviously) leave them there, hoping that someone finds a good use case, but I can say that my extreme excitement at getting it to work was depressed quite a bit when I found it didn't really gain much in terms of performance for the use cases I have been doing. Should be mostly trivial in fact. I mean our first designs for iopipe were where I wanted regex to work with it. Basically - if we started a match, extend window until we get it or lose it. Then release up to the next point of potential start. I'm thinking it's even simpler than that. All matches are dead on a line break (it's how grep normally works), so you simply have to parse the lines and run each one via regex. What I don't know is how much it costs regex to start up and run on an individual line. It is malloc/free/addRange/removeRange for each call. I optimized 2.080 to reuse the most recently used engine without these costs, but I’ll have to check if it covers all cases. 
One thing I could do to amortize is keep 2N lines in the buffer, and run the regex on a whole context's worth of lines, then dump them all. I believe integrating iopipe awareness into regex will easily make it 50% faster. A guesstimate though. I don't get why grep is so bad at this, since it is supposedly doing the matching without line boundaries. grep on Mac is a piece of sheat, sadly, and I don’t know why exactly (too old?). Use some third-party thing like ‘sift’ written in Go. -Steve
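A minimal C sketch of the length-prefixed ("netstring") record idea from Dmitry's post: the reader parses the decimal length and jumps straight past the payload instead of scanning for a delimiter. C is used only for illustration (iopipe is a D library), and `parse_record`/`count_records` are hypothetical names. The post writes a comma after the length; D. J. Bernstein's netstring format actually uses a colon plus a trailing comma (`13:Hello, world!,`), and bencode in .torrent files uses `13:Hello, world!`, so the separator is a parameter here and no trailing terminator is checked.

```c
#include <assert.h>
#include <stddef.h>

/* Parse one record from buf[0..len): ASCII decimal length, the byte `sep`,
 * then exactly that many payload bytes.
 * Returns the number of bytes consumed, 0 if more input is needed,
 * or -1 if the header is malformed. */
static long parse_record(const char *buf, size_t len, char sep,
                         const char **payload, size_t *payload_len)
{
    size_t i = 0, n = 0;

    assert(payload != NULL && payload_len != NULL);
    /* Read the decimal length (overflow is not checked in this sketch). */
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        n = n * 10 + (size_t)(buf[i] - '0');
        i++;
    }
    if (i == len)
        return 0;           /* header not complete yet: need more input */
    if (i == 0 || buf[i] != sep)
        return -1;          /* no digits, or unexpected separator */
    i++;                    /* skip the separator */
    if (len - i < n)
        return 0;           /* payload not complete yet: need more input */
    *payload = buf + i;     /* payload is used in place, no decoding */
    *payload_len = n;
    return (long)(i + n);   /* jump straight past the payload, no scan */
}

/* Count complete records at the front of buf[0..len). */
static int count_records(const char *buf, size_t len, char sep)
{
    const char *payload;
    size_t payload_len;
    size_t off = 0;
    int count = 0;
    long used;

    while ((used = parse_record(buf + off, len - off, sep,
                                &payload, &payload_len)) > 0) {
        off += (size_t)used;
        count++;
    }
    return count;
}
```

A return of 0 means "read more input and retry", which is exactly the point where a buffer has to extend (or, with a mirrored ring, simply wrap).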
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 16:07:26 UTC, Steven Schveighoffer wrote: On 5/11/18 11:44 AM, Steven Schveighoffer wrote: On 5/10/18 7:22 PM, Steven Schveighoffer wrote: [...] Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers). So at least there is something to strive for :) More testing reveals that as I increase the context lines to print, iopipe performs better than GNU grep. A shocking thing is that at 9 lines of context, grep goes up slightly, but all of a sudden at 10 lines of context, it doubles in the time taken (and is now slower than the iopipe_search). Also noting: my Linux VM does not have ldc, so these are dmd numbers. -Steve What stops you from downloading a linux release from here? https://github.com/ldc-developers/ldc/releases
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 15:44:04 UTC, Steven Schveighoffer wrote: On 5/10/18 7:22 PM, Steven Schveighoffer wrote: Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers). Yeah, the MacOS default versions of the Unix text processing tools are really slow. It's worth installing the GNU versions if doing performance comparisons on MacOS, or because you work with large files. Homebrew and MacPorts both have the GNU versions. Some relevant packages: coreutils, grep, gsed (sed), gawk (awk). Most tools are in coreutils. Many will be installed with a 'g' prefix by default, leaving the existing tools in place. e.g. 'cut' will be installed as 'gcut' unless specified otherwise. --Jon
Re: Funding for code-d/serve-d
On Sunday, 6 May 2018 at 16:31:02 UTC, Meta wrote: I'm a little unclear how OpenCollective works. Do you have to specifically donate to this goal, or does every donation made just go to that? Furthermore, I don't really want to create an OpenCollective account just to donate; I'd prefer to do it directly with my Paypal. Is that possible? All donations at OpenCollective go toward the current goal. At the moment, we have no way to automatically link donations at PayPal with the OC goal. Going forward, we'll investigate how to tie this all together, but for now if the OC account is a blocker for you a PayPal donation with a note that it's for the VS Code Plugin is sufficient. I'll keep track manually until we have a better system. Also, does anyone have an image of the supporter t-shirts? If possible I want to donate an amount to get one of those. I'll throw the images on a web page before the weekend is out.
Re: iopipe v0.0.4 - RingBuffers!
On 5/11/18 11:44 AM, Steven Schveighoffer wrote: On 5/10/18 7:22 PM, Steven Schveighoffer wrote: However, this example *does* show the power of iopipe -- it handles all flavors of unicode with one template function, is quite straightforward (though I want to abstract the line tracking code, that stuff is really tricky to get right). Oh, and it's roughly 10x faster than grep, and a bunch faster than fgrep, at least on my machine ;) I'm tempted to add regex processing to see if it still beats grep. Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers). So at least there is something to strive for :) More testing reveals that as I increase the context lines to print, iopipe performs better than GNU grep. A shocking thing is that at 9 lines of context, grep goes up slightly, but all of a sudden at 10 lines of context, it doubles in the time taken (and is now slower than the iopipe_search). Also noting: my Linux VM does not have ldc, so these are dmd numbers. -Steve
Re: iopipe v0.0.4 - RingBuffers!
On Friday, May 11, 2018 11:44:04 Steven Schveighoffer via Digitalmars-d-announce wrote:
> Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers).
Curiously, the grep on FreeBSD seems to be GNU's grep with some additional patches, though I expect that it's a ways behind whatever GNU is releasing now, because while they were willing to put some GPLv2 stuff in FreeBSD, they have not been willing to have anything to do with GPLv3. FreeBSD's grep claims to be version 2.5.1-FreeBSD, whereas ports has the gnugrep package, which is version 2.27, so that implies a fairly large version difference between the two. I have no idea how they compare in terms of performance. Either way, I would have expected FreeBSD to be using their own implementation, not something from GNU, especially since they seem to be trying to purge GPL stuff from FreeBSD. So, the fact that FreeBSD is using GNU's grep is a bit surprising. If I had to guess, I would guess that they switched to the GNU version at some point in the past, because it was easier to grab it than to make what they had faster, but I don't know. Either way, it sounds like Mac OS X either didn't take their grep from FreeBSD in this case, or they took it from an older version before FreeBSD switched to GNU's grep. - Jonathan M Davis
Re: iopipe v0.0.4 - RingBuffers!
On 5/10/18 7:22 PM, Steven Schveighoffer wrote: However, this example *does* show the power of iopipe -- it handles all flavors of unicode with one template function, is quite straightforward (though I want to abstract the line tracking code, that stuff is really tricky to get right). Oh, and it's roughly 10x faster than grep, and a bunch faster than fgrep, at least on my machine ;) I'm tempted to add regex processing to see if it still beats grep. Shameful note: Macos grep is BSD grep, and is not NEARLY as fast as GNU grep, which has much better performance (and is 2x as fast as iopipe_search on my Linux VM, even when printing line numbers). So at least there is something to strive for :) -Steve
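Steve mentions wanting to abstract the line-tracking code. One way to keep it correct with a sliding window is to count newlines only in bytes as they are released, so each byte is counted exactly once no matter how the input was split into reads. The C sketch below works under that assumption; `line_tracker`, `lt_release`, and `lt_line_at` are hypothetical names, not iopipe's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remembers the 1-based line number of the first byte
 * still held in the sliding window. */
typedef struct {
    unsigned long line;
} line_tracker;

/* Count '\n' bytes in data[0..n), letting memchr skip between them. */
static unsigned long count_newlines(const char *data, size_t n)
{
    const char *p = data;
    const char *end = data + n;
    unsigned long count = 0;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++;                    /* continue after the newline just found */
    }
    return count;
}

/* Call just before the window drops window[0..n): those bytes are counted
 * exactly once, regardless of chunk boundaries. */
static void lt_release(line_tracker *t, const char *window, size_t n)
{
    assert(t != NULL);
    t->line += count_newlines(window, n);
}

/* Line number of window[off], scanning only bytes not yet released. */
static unsigned long lt_line_at(const line_tracker *t, const char *window,
                                size_t off)
{
    assert(t != NULL);
    return t->line + count_newlines(window, off);
}
```

The design choice is that line numbers are derived lazily from released data, so a search loop only pays for newline counting on the region between the last release and the current match.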
Re: autowrap v0.0.1 - Automatically wrap existing D code for use in Python and Excel
On Thursday, 10 May 2018 at 19:50:40 UTC, Nikos wrote: In my dub.sdl file I have `configuration "python35" { subConfiguration "autowrap" "python35" }` and I run `dub build --config=python35`, which still tries to find python36. Why doesn't it look for 3.5? Copy + paste error, sorry. Fixed now. Atila
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 13:28:58 UTC, Steven Schveighoffer wrote: [...] I do get the point of having to go outside the cache. I'll look and see if maybe specifying a 1000 line context helps ;) Update: nope, still pretty much the same. I'm sure someone will find some good show-off program. The amount of work done per byte though has to be minimal to actually see anything. Right, this is another part of the problem -- if copying is so rare compared to the other operations, then the difference is going to be lost in the noise. What I have learned here is: 1. Ring buffers are really cool (I still love how it works) and perform as well as normal buffers 2. The use cases are much smaller than I thought 3. In most real-world applications, they are a wash, and not worth the OS tricks needed to use it. Now I need to learn all about ring buffers. Do you have any good starting points? 4. iopipe makes testing with a different kind of buffer really easy, which was one of my original goals. So I'm glad that works! That satisfying feeling when the code works exactly the way you wanted it to! I'm going to (obviously) leave them there, hoping that someone finds a good use case, but I can say that my extreme excitement at getting it to work was depressed quite a bit when I found it didn't really gain much in terms of performance for the use cases I have been doing. I'm sure someone will find a place where it's useful. However, this example *does* show the power of iopipe -- it handles all flavors of unicode with one template function, is quite straightforward (though I want to abstract the line tracking code, that stuff is really tricky to get right). Oh, and it's roughly 10x faster than grep, and a bunch faster than fgrep, at least on my machine ;) I'm tempted to add regex processing to see if it still beats grep. Should be mostly trivial in fact. I mean our first designs for iopipe were where I wanted regex to work with it. 
Basically - if we started a match, extend window until we get it or lose it. Then release up to the next point of potential start. I'm thinking it's even simpler than that. All matches are dead on a line break (it's how grep normally works), so you simply have to parse the lines and run each one via regex. What I don't know is how much it costs regex to startup and run on an individual line. One thing I could do to amortize is keep 2N lines in the buffer, and run the regex on a whole context's worth of lines, then dump them all. iopipe is looking like a great library! I don't get why grep is so bad at this, since it is supposedly doing the matching without line boundaries. I was actually quite shocked when iopipe was that much faster -- even when I'm not asking grep to print out line numbers (so it doesn't actually ever really have to keep track of lines). -Steve That reminds me of this great blog post detailing grep's performance: http://ridiculousfish.com/blog/posts/old-age-and-treachery.html Also, one of the original authors of grep wrote about its performance optimizations, for anyone interested: https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
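Steve's per-line strategy (compile the pattern once, then run it on each line, since grep-style matches never cross a line break) can be sketched in C with POSIX `<regex.h>`. This is an illustration only: the thread is about D's std.regex, and `count_matching_lines` is a hypothetical helper. To avoid copying each line, the sketch temporarily overwrites each '\n' with '\0' so `regexec` sees a NUL-terminated line, which assumes a writable, NUL-terminated buffer.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Count lines of buf[0..len) that match `pattern` (POSIX extended syntax).
 * The pattern is compiled once and run on one line at a time.
 * Requires buf[len] == '\0'. Returns -1 if the pattern does not compile. */
static int count_matching_lines(char *buf, size_t len, const char *pattern)
{
    regex_t re;
    size_t start = 0;
    int count = 0;

    assert(buf != NULL && buf[len] == '\0');
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    while (start < len) {
        char *nl = memchr(buf + start, '\n', len - start);

        if (nl != NULL)
            *nl = '\0';                     /* terminate this line in place */
        if (regexec(&re, buf + start, 0, NULL, 0) == 0)
            count++;
        if (nl == NULL)
            break;                          /* last line had no trailing '\n' */
        *nl = '\n';                         /* restore the caller's buffer */
        start = (size_t)(nl - buf) + 1;
    }
    regfree(&re);
    return count;
}
```

The batching alternative Steve mentions (running one search over 2N lines at once) would instead compile with `REG_NEWLINE`, so `^`, `$` and `.` respect line breaks inside the block; that variant is not shown.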
Re: iopipe v0.0.4 - RingBuffers!
On 05/11/2018 06:28 AM, Steven Schveighoffer wrote:
> 1. Ring buffers are really cool (I still love how it works) and perform as well as normal buffers
> 2. The use cases are much smaller than I thought
There is the LMAX Disruptor, which was open sourced a few years ago along with a large number of articles describing its history and design in great detail. Because of the large number of articles like this one https://mechanitis.blogspot.com/2011/06/dissecting-disruptor-whats-so-special.html it's impossible to find the one that left an impression on me at the time I read it. That article described their story from the beginning to their current design, starting from a simple std::map, lock contention, and other concurrency pitfalls. They finally settled on a multi-producer-single-consumer design where the consumer works on one thread. This gave them the biggest CPU cache advantage. The producers and the consumer share a ring buffer for communication. Perhaps the example you're looking for is in there somewhere. :) Ali
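For readers asking where to start with ring buffers, here is a minimal single-producer/single-consumer ring in C11, assuming a power-of-two capacity so wrap-around is a bit mask. It is a deliberate simplification: the Disruptor described above has multiple producers and considerably more machinery, none of which is modelled here. The names are hypothetical. The head and tail counters run freely and are only masked when indexing, so `tail - head` is always the number of queued items, even after `size_t` wraps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8   /* must be a power of two: index wrap is a cheap mask */
_Static_assert((RING_CAP & (RING_CAP - 1)) == 0, "RING_CAP must be a power of two");

typedef struct {
    int slots[RING_CAP];
    atomic_size_t head;   /* next slot to read; written only by the consumer */
    atomic_size_t tail;   /* next slot to write; written only by the producer */
} spsc_ring;

static void ring_init(spsc_ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false if the ring is full. */
static bool ring_push(spsc_ring *r, int value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_CAP)
        return false;                                   /* full */
    r->slots[tail & (RING_CAP - 1)] = value;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the ring is empty. */
static bool ring_pop(spsc_ring *r, int *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;                                   /* empty */
    *out = r->slots[head & (RING_CAP - 1)];
    /* Release: the slot is read before the producer may reuse it. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Each counter has exactly one writer, which is what lets this version avoid locks and compare-and-swap; supporting several producers would need something like an atomic claim on `tail`, which is not shown.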
Re: iopipe v0.0.4 - RingBuffers!
On 5/11/18 5:55 AM, Kagamin wrote: On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote: However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend. Depends on OS and hardware. I would expect mmap implementation to be slower as it reads the file in chunks of 4 KB and relies on page faults. As Dmitry hinted at, there actually is no file involved. I'm just mapping straight memory into 2 segments. In fact, in my test application, I'm using stdin as the input, which may not even involve a file. It's just as fast as using memory; the only cool part is that you can write a buffer that wraps to the beginning as if it were a normal array. What surprises me is that the copying for the normal buffer doesn't hurt performance that much. I suppose this should probably have been expected, as CPUs are really really good at processing consecutive memory, and the copying you end up having to do is generally small compared to the rest of your app. -Steve
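To make "the array allocated buffer that copies data when it needs to extend" concrete, here is a C sketch of a linear buffer. It assumes a policy (not taken from iopipe) of first sliding the unconsumed tail to the front and only then growing; all names are hypothetical. The `bytes_copied` counter illustrates Steve's observation: each compaction moves only the bytes not yet consumed, which is usually small next to the data being processed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical linear buffer: the unconsumed bytes are data[start..end). */
typedef struct {
    char *data;
    size_t cap, start, end;
    size_t bytes_copied;    /* instrumentation: bytes moved by compaction */
} lin_buf;

/* Make room for `need` more bytes after `end`. First reclaim the consumed
 * prefix by moving the unconsumed tail to the front (the copy a mirrored ring
 * buffer avoids); grow only if that is still not enough. realloc may also
 * copy; that cost is not counted. Returns 0 on success, -1 on failure. */
static int lb_reserve(lin_buf *b, size_t need)
{
    size_t live = b->end - b->start;
    size_t newcap;
    char *p;

    if (b->cap - b->end >= need)
        return 0;                         /* already enough room */
    if (b->start > 0) {
        memmove(b->data, b->data + b->start, live);
        b->bytes_copied += live;
        b->start = 0;
        b->end = live;
        if (b->cap - b->end >= need)
            return 0;                     /* compaction was enough */
    }
    newcap = b->cap ? b->cap : 64;
    while (newcap - live < need)
        newcap *= 2;                      /* overflow not handled in this sketch */
    p = realloc(b->data, newcap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = newcap;
    return 0;
}

static int lb_append(lin_buf *b, const char *src, size_t n)
{
    if (lb_reserve(b, n) != 0)
        return -1;
    memcpy(b->data + b->end, src, n);
    b->end += n;
    return 0;
}

/* Mark n bytes as processed; they are reclaimed lazily by lb_reserve. */
static void lb_consume(lin_buf *b, size_t n)
{
    assert(n <= b->end - b->start);
    b->start += n;
}
```

For example, after appending 10 bytes and consuming 8, a 60-byte append into a 64-byte buffer triggers a 2-byte compaction and no reallocation.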
Re: iopipe v0.0.4 - RingBuffers!
On 5/11/18 1:30 AM, Dmitry Olshansky wrote: On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote: OK, so at DConf I spoke with a few very smart guys about how I can use mmap to make a zero-copy buffer. And I implemented this on the plane ride home. However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend. I’d start with something clinically synthetic. Say your record size is exactly half of buffer + 1 byte. If you were to extend the size of buffer, it would amortize. Hm.. this wouldn't work, because the idea is to keep some of the buffer full. What will happen here is that the buffer will extend to be able to accommodate the extra byte, and then you are back to having less of the buffer full at once. iopipe is not afraid to increase the buffer :) Basically: 16 MB buffer fixed vs 16 MB mmap-ed ring, where you read pieces in 8M+1 blocks. Yes, we are aiming to blow the CPU cache there. Otherwise CPU cache is so fast that an occasional copy is zilch; once we hit primary memory it’s not. Adjust sizes for your CPU. This isn't how it will work. The system looks at the buffer and says "oh, I can just read 8MB - 1 byte," which gives you 2 bytes less than you need. Then you need the extra 2 bytes, so it will increase the buffer to hold at least 2 records. I do get the point of having to go outside the cache. I'll look and see if maybe specifying a 1000 line context helps ;) Update: nope, still pretty much the same. The amount of work done per byte though has to be minimal to actually see anything. Right, this is another part of the problem -- if copying is so rare compared to the other operations, then the difference is going to be lost in the noise. What I have learned here is:
1. Ring buffers are really cool (I still love how it works) and perform as well as normal buffers 2. The use cases are much smaller than I thought 3. In most real-world applications, they are a wash, and not worth the OS tricks needed to use it. 4. iopipe makes testing with a different kind of buffer really easy, which was one of my original goals. So I'm glad that works! I'm going to (obviously) leave them there, hoping that someone finds a good use case, but I can say that my extreme excitement at getting it to work was depressed quite a bit when I found it didn't really gain much in terms of performance for the use cases I have been doing. [...] in the buffer. But alas, it's roughly the same, even with a large number of lines of context (like 200). However, this example *does* show the power of iopipe -- it handles all flavors of unicode with one template function, is quite straightforward (though I want to abstract the line tracking code, that stuff is really tricky to get right). Oh, and it's roughly 10x faster than grep, and a bunch faster than fgrep, at least on my machine ;) I'm tempted to add regex processing to see if it still beats grep. Should be mostly trivial in fact. I mean our first designs for iopipe were where I wanted regex to work with it. Basically - if we started a match, extend window until we get it or lose it. Then release up to the next point of potential start. I'm thinking it's even simpler than that. All matches are dead on a line break (it's how grep normally works), so you simply have to parse the lines and run each one via regex. What I don't know is how much it costs regex to start up and run on an individual line. One thing I could do to amortize is keep 2N lines in the buffer, and run the regex on a whole context's worth of lines, then dump them all. I don't get why grep is so bad at this, since it is supposedly doing the matching without line boundaries. 
I was actually quite shocked when iopipe was that much faster -- even when I'm not asking grep to print out line numbers (so it doesn't actually ever really have to keep track of lines). -Steve
Re: iopipe v0.0.4 - RingBuffers!
On Friday, 11 May 2018 at 09:55:10 UTC, Kagamin wrote: On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote: However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend. Depends on OS and hardware. I would expect mmap implementation to be slower as it reads the file in chunks of 4 KB and relies on page faults. It doesn’t. Instead, it has a buffer mmapped twice, side by side, so you can avoid the copy at the end when it wraps around. Otherwise it’s the same buffering as usual.
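Dmitry's description (one buffer mapped twice, side by side) can be sketched with POSIX `mmap`. Assumptions: a POSIX system, and an unlinked temporary file under /tmp as the shared backing object (Linux code often uses `memfd_create` instead). The thread does not show how iopipe does it, so this is not its implementation, and `mirror_alloc`/`mirror_free` are hypothetical names. After setup, `base[i]` and `base[i + size]` are the same byte, so a record that wraps past the end can still be read as one contiguous slice.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose mkstemp, ftruncate, MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* spelling used by some BSD-derived systems */
#endif

/* Map `size` bytes twice, back to back, so base[i] and base[i + size] are the
 * same byte. size must be a positive multiple of the page size.
 * Returns NULL on failure. */
static char *mirror_alloc(size_t size)
{
    char path[] = "/tmp/mirrorbufXXXXXX";
    char *base;
    int fd;

    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                  /* no name on disk; the mappings keep it alive */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    /* Reserve 2*size bytes of contiguous address space... */
    base = mmap(NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    /* ...then map the same file pages over both halves. */
    if (mmap(base, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
        mmap(base + size, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        munmap(base, 2 * size);
        close(fd);
        return NULL;
    }
    close(fd);
    return base;
}

static void mirror_free(char *base, size_t size)
{
    munmap(base, 2 * size);
}
```

A reader holding a `head` offset in [0, size) can hand out `base + head` for up to `size` bytes without ever copying the tail back to the front; the cost moves to setup (two extra mappings) instead of to the read path.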
Re: iopipe v0.0.4 - RingBuffers!
On Thursday, 10 May 2018 at 23:22:02 UTC, Steven Schveighoffer wrote: However, I am struggling to find a use case for this that showcases why you would want to use it. While it does work, and works beautifully, it doesn't show any measurable difference vs. the array allocated buffer that copies data when it needs to extend. Depends on OS and hardware. I would expect mmap implementation to be slower as it reads the file in chunks of 4 KB and relies on page faults.