Re: [go-nuts] Type func binding and interfaces
> > Any code that keeps data aligned to memory page and disk page sizes is > automatically significantly faster, because misalignment automatically > doubles the amount of memory that has to be accessed to satisfy a request. > This is why Binary Heaps are way slower than B-heaps. My opinion is without measurement you will miss necessary knowledge found between assumptions. The package testing benchmark has non-obvious features for repeatable data: https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go I see you have logic tests (https://github.com/calibrae-project/bast/blob/master/pkg/bast/bast_test.go), and benchmarks are another step. Matt On Wednesday, April 25, 2018 at 9:37:48 AM UTC-5, Louki Sumirniy wrote: > > Always the elements are inserted according to greater than and less than. > equal can't happen. The first value inserted will be the root to begin > with, but if the tree gets heavy on one side, you rotate the root to > rebalance. from any given node, you know that you will find the element you > are looking for according to the node you are at, and its greater or less > than test. You can know this is always going to be true because that's how > the inserts will populate the tree, and the 'completeness' constraint > enforces a requirement for the 'bunching up' of data. So long as there is > space between two nodes, they can be as high up in the tree as the distance > between them, each row downwards allows a power of two sequence of > increasing numbers. The 'completeness' constraint is extended by not > allowing orphans, every child has a parent, until you get to the root, so > vice versa, you know that you will find every element in the tree by > walking downwards and going left or right depending on greater than and > less than, this is the comparability constraint. You should be able to see > that a number of architectural features in this data storage system imply > heuristics for optimisation. 
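To make the suggestion concrete, a repeatable benchmark using a fixed seed might look like the sketch below. The `insert` function and its map backing are placeholders so the skeleton compiles, not the bast API; `testing.Benchmark` lets it run outside `go test`:

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// insert is a stand-in for the real tree insert; a map is used here only
// so this benchmark skeleton is self-contained and runnable.
var store = map[uint32]struct{}{}

func insert(k uint32) { store[k] = struct{}{} }

func BenchmarkInsert(b *testing.B) {
	rng := rand.New(rand.NewSource(1)) // fixed seed: repeatable data every run
	keys := make([]uint32, b.N)
	for i := range keys {
		keys[i] = rng.Uint32()
	}
	b.ResetTimer() // exclude setup cost from the measurement
	for i := 0; i < b.N; i++ {
		insert(keys[i])
	}
}

func main() {
	res := testing.Benchmark(BenchmarkInsert)
	fmt.Println(res.N > 0)
}
```

Normally this would live in a `_test.go` file and be run with `go test -bench=.`; the fixed seed is what makes successive runs comparable.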
> > Just to update, reading about B-heaps and how it structures storage, I > realised that I need to think of the tree in terms of 4kb blocks (well, for > most platforms) as this is the amount of memory that will be accessed in > one operation. However many rows a payload size fits into 4kb is the size > of each subtree. So for 32 bit values, I have one page at the top, and then > 1024 pages that are the left and right pairs of the 512 elements at the > bottom of the first page, and, of course, this repeats again. This changes > how I have to implement the walk functions, as when it overflows past 512, > I know I have moved to the second page-row. > > So... I have to kinda go back to the drawing board a little on the > structuring of the algorithm since it is working on 4kb page sizes. I may > have to switch up the design so that the pages are the primary indices of > the first dimension, and then the pages themselves, the subtrees, can be > allocated as fixed length arrays, which also simplifies how Go will deal > with them. Further, instead of allocating whole rows at a time, only those > rows that have been breached require allocation of a new page for a > subtree. In this respect, then the question about search cost becomes a > little different, since we use 4k pages, and we know walking them goes on > inside 14Gb/s SRAM cache, performance optimisation by tree balancing has a > somewhat different character than the Binary Heap linear mapping. > > Why do I seem so confident about the performance of this, potentially? > Look up B-heaps and Varnish reverse proxy. It is designed at the low level > with the same basic concept in mind, using B-heaps - structure the heap > segments according to memory pages. If you are familiar with filesystems, > you may know about block alignment. It's not quite so important for > spinning disks but for flash and S/DRAM, memory is only ever handled on the > hardware level in these block sizes. 
Disks usually use 512 byte blocks in > the FS level and generally most disks have 512 byte blocks, some have 1k, > 2k, 4k or 8k. Most current generation FS's use 4k blocks as the default > size, for the reason it is a 1:1 mapping with memory pages. > > The difference with what I am designing is that it does not order > vertically, it orders horizontally. It's basically fractal, each subtree > has the same number of potential subtrees below it. > > Any code that keeps data aligned to memory page and disk page sizes is > automatically significantly faster, because misalignment automatically > doubles the amount of memory that has to be accessed to satisfy a request. > This is why Binary Heaps are way slower than B-heaps. > > Anyway, because many of my assumptions based on my conceptual model have > changed, and big thanks to you guys, I have to rework the design to account > for this. Thanks :p > > On Wednesday, 25 April 2018 15:11:15 UTC+3, rog wrote: >> >> On 25 April 2018 at 10:24, Louki Sumirniy >>
Re: [go-nuts] Type func binding and interfaces
Always the elements are inserted according to greater than and less than. equal can't happen. The first value inserted will be the root to begin with, but if the tree gets heavy on one side, you rotate the root to rebalance. from any given node, you know that you will find the element you are looking for according to the node you are at, and its greater or less than test. You can know this is always going to be true because that's how the inserts will populate the tree, and the 'completeness' constraint enforces a requirement for the 'bunching up' of data. So long as there is space between two nodes, they can be as high up in the tree as the distance between them, each row downwards allows a power of two sequence of increasing numbers. The 'completeness' constraint is extended by not allowing orphans, every child has a parent, until you get to the root, so vice versa, you know that you will find every element in the tree by walking downwards and going left or right depending on greater than and less than, this is the comparability constraint. You should be able to see that a number of architectural features in this data storage system imply heuristics for optimisation. Just to update, reading about B-heaps and how it structures storage, I realised that I need to think of the tree in terms of 4kb blocks (well, for most platforms) as this is the amount of memory that will be accessed in one operation. However many rows a payload size fits into 4kb is the size of each subtree. So for 32 bit values, I have one page at the top, and then 1024 pages that are the left and right pairs of the 512 elements at the bottom of the first page, and, of course, this repeats again. This changes how I have to implement the walk functions, as when it overflows past 512, I know I have moved to the second page-row. So... I have to kinda go back to the drawing board a little on the structuring of the algorithm since it is working on 4kb page sizes. 
I may have to switch up the design so that the pages are the primary indices of the first dimension, and then the pages themselves, the subtrees, can be allocated as fixed length arrays, which also simplifies how Go will deal with them. Further, instead of allocating whole rows at a time, only those rows that have been breached require allocation of a new page for a subtree. In this respect, then the question about search cost becomes a little different, since we use 4k pages, and we know walking them goes on inside 14Gb/s SRAM cache, performance optimisation by tree balancing has a somewhat different character than the Binary Heap linear mapping. Why do I seem so confident about the performance of this, potentially? Look up B-heaps and Varnish reverse proxy. It is designed at the low level with the same basic concept in mind, using B-heaps - structure the heap segments according to memory pages. If you are familiar with filesystems, you may know about block alignment. It's not quite so important for spinning disks but for flash and S/DRAM, memory is only ever handled on the hardware level in these block sizes. Disks usually use 512 byte blocks in the FS level and generally most disks have 512 byte blocks, some have 1k, 2k, 4k or 8k. Most current generation FS's use 4k blocks as the default size, for the reason it is a 1:1 mapping with memory pages. The difference with what I am designing is that it does not order vertically, it orders horizontally. It's basically fractal, each subtree has the same number of potential subtrees below it. Any code that keeps data aligned to memory page and disk page sizes is automatically significantly faster, because misalignment automatically doubles the amount of memory that has to be accessed to satisfy a request. This is why Binary Heaps are way slower than B-heaps. Anyway, because many of my assumptions based on my conceptual model have changed, and big thanks to you guys, I have to rework the design to account for this. 
Thanks :p On Wednesday, 25 April 2018 15:11:15 UTC+3, rog wrote: > > On 25 April 2018 at 10:24, Louki Sumirniy >wrote: > > As you look deeper into the link discussing the B-heap you can see that > actually I am pretty much exactly following the same general structure in > my algorithm. The structure will align neatly with page boundaries and that > means less page faults and reduced pressure on the virtual memory mapping > system that means actually the tree can go a lot larger while cutting down > delays on retrieving data. I intuitively understood this in my own visual > model and this picture in particular is exactly how I visualised the data > storage map: > > To expand slightly on why this data structure isn't good for > searching, imagine you're searching for the number 40 in either of > those pictured data structures. When you're looking at the root, you > have to make a decision whether to move to the left or the right >
Re: [go-nuts] Type func binding and interfaces
> > I worked for a little while on the C++ server application for the Steem > network node, and I was intending to remove a whole swathe of code relating > to protocol changes at various hard forks. The number of times I ran across > poorly ordered if/then (not even using switch!) that would perform > unnecessary comparisons in more cases than not, to me it explained the > ballooning amount of time that was required to play the blockchain into a > database shared file. What I’ve been saying is you should look at a performance measure to prove a hypothesis like this one. Removing that code may or may not work. I think you’ve mixed optimizing the algorithm and code, and we have to look at one at a time to effectively reach your goal. A program that measures progress toward the goal (performance) is a foundation I think we can more seriously start from for the code part. Can you share something like this with us? Matt On Wednesday, April 25, 2018 at 4:08:17 AM UTC-5, Louki Sumirniy wrote: > > I think that it's not necessarily non-idiomatic to use closures instead of > interfaces in Go, it's more that Go has had interfaces longer than it's had > closures, and so more code has been written this way. > > In Angular 2+ you have the option of embedding HTML, CSS and TS code > inside one file, or instead having 4, with the main file importing the > other three. I don't like this for several reasons, but mainly because it > makes a more complex interlinking between them, and I think even you can > say that if your application is big enough, all this extra import work will > also cost a lot of time, though in Go of course that time is not so big in > the first place due to how efficient its compiler is. > > The way I see it is that closures and interfaces are two ways to do > exactly the same thing, binding namespaces together. 
One requires more > accessory declarations and lines of code, not a huge extra overhead, but > then for exported stuff - well, this is the one upside of it but I don't > think it is that great, gofmt wants to see header comments on every > exported function. Which is a nice idea, in theory, but in my opinion if > you need comments your naming scheme sucks. > > That's also why I created distinct comparators with meaningful names > instead of creating named return values. b.IsEqual(data, cursor) compared > to b.Compare(data, cursor) == EQUAL is not just a lot longer, but harder to > read. I think technically it may also consume more code cache space to > implement this extra operation, though on the other side, there is a small > extra overhead for having three instead of 1. The compare function has to > run three cases, most concisely expressed as > > switch{ > case b.Store(c.index)>d: > return GREATER; > case b.Store(c.index)<d: > return LESSER > } > return EQUAL > > If my function only needs to know if it's equal (for example a search tree > walk), it's just done two comparison/branch operations for absolutely no > reason, and if I switch the order to benefit search, I raise the cost of > determining which direction to step next after no match on a node. > > I worked for a little while on the C++ server application for the Steem > network node, and I was intending to remove a whole swathe of code relating > to protocol changes at various hard forks. The number of times I ran across > poorly ordered if/then (not even using switch!) that would perform > unnecessary comparisons in more cases than not, to me it explained the > ballooning amount of time that was required to play the blockchain into a > database shared file. Literally, over a week, at that point, almost 9 > months ago, and last I heard 270Gb of memory is required because this > playback takes a stupid amount of time (effectively, eternity) unless the > shared file is stored in a ramdisk. 
> > So, just to be clear, I am not using OOPish techniques because I am a > believer, and I don't think that convention should dictate effective > programming either. Closures are a more compact notation than interfaces, > and for me this is equally important as writing code that does not waste > cycles doing things for the sake of some arbitrary model that does not > greatly improve maintainability and readability at the cost of overhead > that will stack up the more this approach is used. > > Heaps have certainly got some disadvantages compared to bucket sorts and > reference based trees but the one thing they have is data locality. Data > locality can be a huge advantage because of CPU cache misses being avoided. > Cache memory on my i5 CPU is 14Gb/s write speed. The DDR4-2400 memory in my > system writes at 4Gb/s. This is linear writing, in the case of the main > memory, so not only does conventional binary tree architecture cause a very > high bandwidth load on the main memory, it also seeks the data randomly > which adds even
Re: [go-nuts] Type func binding and interfaces
On 25 April 2018 at 10:24, Louki Sumirniy wrote: > As you look deeper into the link discussing the B-heap you can see that > actually I am pretty much exactly following the same general structure in my > algorithm. The structure will align neatly with page boundaries and that > means fewer page faults and reduced pressure on the virtual memory mapping > system that means actually the tree can go a lot larger while cutting down > delays on retrieving data. I intuitively understood this in my own visual > model and this picture in particular is exactly how I visualised the data > storage map: To expand slightly on why this data structure isn't good for searching, imagine you're searching for the number 40 in either of those pictured data structures. When you're looking at the root, you have to make a decision whether to move to the left or the right child, but you can only see 2 and 3. How do you decide that you need to go left? Now imagine you're searching for 31 - how do you decide that the correct direction is right? -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [go-nuts] Type func binding and interfaces
On 25 April 2018 at 10:08, Louki Sumirniy wrote: > I think that it's not necessarily non-idiomatic to use closures instead of > interfaces in Go, it's more that Go has had interfaces longer than it's had > closures, and so more code has been written this way. Sorry, that's just not true. Go had both interfaces and closures on the first day of its public release.
Re: [go-nuts] Type func binding and interfaces
Roger is right. A heap can be a good structure for a priority queue but not for search. That is because it is partially ordered and siblings are not in any sorted order. In any case heaps are typically implemented with a vector. Go already has a pkg for it. go doc heap. Seems like you’re doing something with blockchains. If you can clearly describe the precise problem you’re trying to solve (or point to an existing description), without getting into implementation issues, we may be able to help. Right now it is hard to reverse engineer the problem from your code and messages to this list. > On Apr 25, 2018, at 2:08 AM, Louki Sumirniy >wrote: > > I think that it's not necessarily non-idiomatic to use closures instead of > interfaces in Go, it's more that Go has had interfaces longer than it's had > closures, and so more code has been written this way. > > In Angular 2+ you have the option of embedding HTML, CSS and TS code inside > one file, or instead having 4, with the main file importing the other three. > I don't like this for several reasons, but mainly because it makes a more > complex interlinking between them, and I think even you can say that if your > application is big enough, all this extra import work will also cost a lot of > time, though in Go of course that time is not so big in the first place due > to how efficient its compiler is. > > The way I see it is that closures and interfaces are two ways to do exactly > the same thing, binding namespaces together. One requires more accessory > declarations and lines of code, not a huge extra overhead, but then for > exported stuff - well, this is the one upside of it but I don't think it is > that great, gofmt wants to see header comments on every exported function. > Which is a nice idea, in theory, but in my opinion if you need comments your > naming scheme sucks. > > That's also why I created distinct comparators with meaningful names instead > of creating named return values. 
b.IsEqual(data, cursor) compared to > b.Compare(data, cursor) == EQUAL is not just a lot longer, but harder to > read. I think technically it may also consume more code cache space to > implement this extra operation, though on the other side, there is a small > extra overhead for having three instead of 1. The compare function has to run > three cases, most concisely expressed as > > switch{ > case b.Store(c.index)>d: > return GREATER; > case b.Store(c.index)<d: > return LESSER > } > return EQUAL > > If my function only needs to know if it's equal (for example a search tree > walk), it's just done two comparison/branch operations for absolutely no > reason, and if I switch the order to benefit search, I raise the cost of > determining which direction to step next after no match on a node. > > I worked for a little while on the C++ server application for the Steem > network node, and I was intending to remove a whole swathe of code relating > to protocol changes at various hard forks. The number of times I ran across > poorly ordered if/then (not even using switch!) that would perform > unnecessary comparisons in more cases than not, to me it explained the > ballooning amount of time that was required to play the blockchain into a > database shared file. Literally, over a week, at that point, almost 9 months > ago, and last I heard 270Gb of memory is required because this playback takes > a stupid amount of time (effectively, eternity) unless the shared file is > stored in a ramdisk. > > So, just to be clear, I am not using OOPish techniques because I am a > believer, and I don't think that convention should dictate effective > programming either. Closures are a more compact notation than interfaces, and > for me this is equally important as writing code that does not waste cycles > doing things for the sake of some arbitrary model that does not greatly > improve maintainability and readability at the cost of overhead that will > stack up the more this approach is used. 
> > Heaps have certainly got some disadvantages compared to bucket sorts and > reference based trees but the one thing they have is data locality. Data > locality can be a huge advantage because of CPU cache misses being avoided. > Cache memory on my i5 CPU is 14Gb/s write speed. The DDR4-2400 memory in my > system writes at 4Gb/s. This is linear writing, in the case of the main > memory, so not only does conventional binary tree architecture cause a very > high bandwidth load on the main memory, it also seeks the data randomly which > adds even more delays due to the linearity of the memory cells. > > To illustrate my point, here is a couple of articles I found relating to this > and how data locality has brought massive performance benefits in reverse > proxy servers: https://queue.acm.org/detail.cfm?id=1814327 and I found this > story from this
Re: [go-nuts] Type func binding and interfaces
As you look deeper into the link discussing the B-heap you can see that actually I am pretty much exactly following the same general structure in my algorithm. The structure will align neatly with page boundaries and that means fewer page faults and reduced pressure on the virtual memory mapping system that means actually the tree can go a lot larger while cutting down delays on retrieving data. I intuitively understood this in my own visual model and this picture in particular is exactly how I visualised the data storage map: (this necessarily also requires a 1-root) but then the B-heap reduces the fragmentation of the memory through the page even further: So, I probably will be considering revising the mapping to correspond to this as I can see why it would reduce paging during both searches and 'bubbling up' (in my case, bunching up, as in inwards) required to maintain a high fill ratio. On Wednesday, 25 April 2018 12:08:17 UTC+3, Louki Sumirniy wrote: > > I think that it's not necessarily non-idiomatic to use closures instead of > interfaces in Go, it's more that Go has had interfaces longer than it's had > closures, and so more code has been written this way. > > In Angular 2+ you have the option of embedding HTML, CSS and TS code > inside one file, or instead having 4, with the main file importing the > other three. I don't like this for several reasons, but mainly because it > makes a more complex interlinking between them, and I think even you can > say that if your application is big enough, all this extra import work will > also cost a lot of time, though in Go of course that time is not so big in > the first place due to how efficient its compiler is. > > The way I see it is that closures and interfaces are two ways to do > exactly the same thing, binding namespaces together. 
One requires more > accessory declarations and lines of code, not a huge extra overhead, but > then for exported stuff - well, this is the one upside of it but I don't > think it is that great, gofmt wants to see header comments on every > exported function. Which is a nice idea, in theory, but in my opinion if > you need comments your naming scheme sucks. > > That's also why I created distinct comparators with meaningful names > instead of creating named return values. b.IsEqual(data, cursor) compared > to b.Compare(data, cursor) == EQUAL is not just a lot longer, but harder to > read. I think technically it may also consume more code cache space to > implement this extra operation, though on the other side, there is a small > extra overhead for having three instead of 1. The compare function has to > run three cases, most concisely expressed as > > switch{ > case b.Store(c.index)>d: > return GREATER; > case b.Store(c.index)<d: > return LESSER > } > return EQUAL > > If my function only needs to know if it's equal (for example a search tree > walk), it's just done two comparison/branch operations for absolutely no > reason, and if I switch the order to benefit search, I raise the cost of > determining which direction to step next after no match on a node. > > I worked for a little while on the C++ server application for the Steem > network node, and I was intending to remove a whole swathe of code relating > to protocol changes at various hard forks. The number of times I ran across > poorly ordered if/then (not even using switch!) that would perform > unnecessary comparisons in more cases than not, to me it explained the > ballooning amount of time that was required to play the blockchain into a > database shared file. Literally, over a week, at that point, almost 9 > months ago, and last I heard 270Gb of memory is required because this > playback takes a stupid amount of time (effectively, eternity) unless the > shared file is stored in a ramdisk. 
> > So, just to be clear, I am not using OOPish techniques because I am a > believer, and I don't think that convention should dictate effective > programming either. Closures are a more compact notation than interfaces, > and for me this is equally important as writing code that does not waste > cycles doing things for the sake of some arbitrary model that does not > greatly improve maintainability and readability at the cost of overhead > that will stack up the more this approach is used. > > Heaps have certainly got some disadvantages compared to bucket sorts and > reference based trees but the one thing they have is data locality. Data > locality can be a huge advantage because of CPU cache misses being avoided. > Cache memory on my i5 CPU is 14Gb/s write speed. The DDR4-2400 memory in my > system writes at 4Gb/s. This is linear writing, in the case of the main > memory, so not only does conventional binary tree architecture cause a very > high bandwidth load on the main memory, it also seeks the data randomly > which adds even more delays due to the
Re: [go-nuts] Type func binding and interfaces
I think that it's not necessarily non-idiomatic to use closures instead of interfaces in Go, it's more that Go has had interfaces longer than it's had closures, and so more code has been written this way. In Angular 2+ you have the option of embedding HTML, CSS and TS code inside one file, or instead having 4, with the main file importing the other three. I don't like this for several reasons, but mainly because it makes a more complex interlinking between them, and I think even you can say that if your application is big enough, all this extra import work will also cost a lot of time, though in Go of course that time is not so big in the first place due to how efficient its compiler is. The way I see it is that closures and interfaces are two ways to do exactly the same thing, binding namespaces together. One requires more accessory declarations and lines of code, not a huge extra overhead, but then for exported stuff - well, this is the one upside of it but I don't think it is that great, gofmt wants to see header comments on every exported function. Which is a nice idea, in theory, but in my opinion if you need comments your naming scheme sucks. That's also why I created distinct comparators with meaningful names instead of creating named return values. b.IsEqual(data, cursor) compared to b.Compare(data, cursor) == EQUAL is not just a lot longer, but harder to read. I think technically it may also consume more code cache space to implement this extra operation, though on the other side, there is a small extra overhead for having three instead of 1. 
The compare function has to run three cases, most concisely expressed as switch{ case b.Store(c.index)>d: return GREATER; case b.Store(c.index)<d: return LESSER } return EQUAL If my function only needs to know if it's equal (for example a search tree walk), it's just done two comparison/branch operations for absolutely no reason, and if I switch the order to benefit search, I raise the cost of determining which direction to step next after no match on a node. I worked for a little while on the C++ server application for the Steem network node, and I was intending to remove a whole swathe of code relating to protocol changes at various hard forks. The number of times I ran across poorly ordered if/then (not even using switch!) that would perform unnecessary comparisons in more cases than not, to me it explained the ballooning amount of time that was required to play the blockchain into a database shared file. Literally, over a week, at that point, almost 9 months ago, and last I heard 270Gb of memory is required because this playback takes a stupid amount of time (effectively, eternity) unless the shared file is stored in a ramdisk. So, just to be clear, I am not using OOPish techniques because I am a believer, and I don't think that convention should dictate effective programming either. Closures are a more compact notation than interfaces, and for me this is equally important as writing code that does not waste cycles doing things for the sake of some arbitrary model that does not greatly improve maintainability and readability at the cost of overhead that will stack up the more this approach is used. Heaps have certainly got some disadvantages compared to bucket sorts and reference based trees but the one thing they have is data locality. Data locality can be a huge advantage because of CPU cache misses being avoided. Cache memory on my i5 CPU is 14Gb/s write speed. The DDR4-2400 memory in my system writes at 4Gb/s. This is linear writing, in the case of the main memory, so not only does conventional binary tree architecture cause a very high bandwidth load on the main memory, it also seeks the data randomly which adds even more delays due to the linearity of the memory cells. To illustrate my point, here is a couple of articles I found relating to this and how data locality has brought massive performance benefits in reverse proxy servers: https://queue.acm.org/detail.cfm?id=1814327 and I found this story from this stackexchange topic: https://stackoverflow.com/questions/6531543/efficient-implementation-of-binary-heaps The benefit of exploiting the properties of cpu and memory caching yielded a net boost of nearly 10x. I can't see how, at bare minimum, based on the ratios of cache and memory write speed on my system, I won't see close to or around 3x improvement compared to reference based binary trees, and potentially a similar kind of improvement compared to bucket sorting (which is how Cuckoo Cycle searches for cycles in hashchains). I don't know anything about what actual result it will have, but I am building it anyway, and I will use closures because I personally prefer the notation. On Wednesday, 25 April 2018 10:48:28 UTC+3, rog wrote: > > On 25 April 2018 at 08:05, Louki Sumirniy wrote: > > > https://stackoverflow.com/questions/6531543/efficient-implementation-of-binary-heaps > > > > > Pretty much what I'm working on here is this, except with left to right > sort > > instead of vertical. I think this guy's work will help me iron out the > > performance issues. > > You do know that heaps aren't a great data structure for searching, right? > > > Another thing, that is more on topic more specifically, is that > collections > > of interface methods introduce a significant overhead, compared to > simply > > passing the pointer to the structure as a parameter. I am thinking that > > maybe a way to hide this parameter passing is by using closures, which > bind > > in the namespace from a hypothetical initialiser function, without > actually > > having to specify the pointer passing across. 
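Written out as a compilable sketch, the three-way comparator and the cheaper single-test variant from this message look roughly like the following. The slice-plus-index store is a simplification standing in for the real bast types:

```go
package main

import "fmt"

type ordering int

const (
	LESSER  ordering = iota - 1 // -1
	EQUAL                       // 0
	GREATER                     // 1
)

// compare is the three-way comparator: two branches must be taken in
// the worst case before EQUAL can be returned.
func compare(store []uint32, index int, d uint32) ordering {
	switch {
	case store[index] > d:
		return GREATER
	case store[index] < d:
		return LESSER
	}
	return EQUAL
}

// isEqual answers only the equality question with a single branch,
// which is the cheaper test wanted for a search-tree walk.
func isEqual(store []uint32, index int, d uint32) bool {
	return store[index] == d
}

func main() {
	s := []uint32{10, 20, 30}
	fmt.Println(compare(s, 1, 25), compare(s, 1, 15), isEqual(s, 1, 20))
}
```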
The structure is created > and > > allocated by the initialising function (constructor, if you like) and > > because all of the functions are declared within the namespace as > closures, > > the compiler implicitly passes the pointer to the struct without having > to > > specify it. > > You don't need to do this. You're still thinking in traditional OO > terms. I'd suggest trying to think in a more Go like way instead. FWIW > I tried to point you in a more sensible direction with this code: > https://play.golang.org/p/wv0T8T8Ynns > > > I don't know exactly what FP paradigm says about structures and > collections > > of functions exactly, as prima facie it looks like OOP. But closures are > a > > uniquely FP mechanism, and really they are just a way to merge > namespaces, > > and by doing this we don't have to pass the pointer to the function > > explicitly, as it is already accessible. > > What gives you the idea that closures are a uniquely FP mechanism? > > rog.
Re: [go-nuts] Type func binding and interfaces
On 25 April 2018 at 08:05, Louki Sumirniy wrote: > https://stackoverflow.com/questions/6531543/efficient-implementation-of-binary-heaps > > Pretty much what I'm working on here is this, except with left to right sort > instead of vertical. I think this guy's work will help me iron out the > performance issues. You do know that heaps aren't a great data structure for searching, right? > Another thing, that is more on topic more specifically, is that collections > of interface methods introduce a significant overhead, compared to simply > passing the pointer to the structure as a parameter. I am thinking that > maybe a way to hide this parameter passing is by using closures, which bind > in the namespace from a hypothetical initialiser function, without actually > having to specify the pointer passing across. The structure is created and > allocated by the initialising function (constructor, if you like) and > because all of the functions are declared within the namespace as closures, > the compiler implicitly passes the pointer to the struct without having to > specify it. You don't need to do this. You're still thinking in traditional OO terms. I'd suggest trying to think in a more Go like way instead. FWIW I tried to point you in a more sensible direction with this code: https://play.golang.org/p/wv0T8T8Ynns > I don't know exactly what FP paradigm says about structures and collections > of functions exactly, as prima facie it looks like OOP. But closures are a > uniquely FP mechanism, and really they are just a way to merge namespaces, > and by doing this we don't have to pass the pointer to the function > explicitly, as it is already accessible. What gives you the idea that closures are a uniquely FP mechanism? rog.
For more options, visit https://groups.google.com/d/optout.
Re: [go-nuts] Type func binding and interfaces
https://stackoverflow.com/questions/6531543/efficient-implementation-of-binary-heaps Pretty much what I'm working on here is this, except with left to right sort instead of vertical. I think this guy's work will help me iron out the performance issues. Another thing, that is more on topic more specifically, is that collections of interface methods introduce a significant overhead, compared to simply passing the pointer to the structure as a parameter. I am thinking that maybe a way to hide this parameter passing is by using closures, which bind in the namespace from a hypothetical initialiser function, without actually having to specify the pointer passing across. The structure is created and allocated by the initialising function (constructor, if you like) and because all of the functions are declared within the namespace as closures, the compiler implicitly passes the pointer to the struct without having to specify it. I don't know exactly what FP paradigm says about structures and collections of functions exactly, as prima facie it looks like OOP. But closures are a uniquely FP mechanism, and really they are just a way to merge namespaces, and by doing this we don't have to pass the pointer to the function explicitly, as it is already accessible. At least this is the tack I'm taking for today's outing into attempting to create a binary search tree with cache data locality. To some extent the benefits of this have been demonstrated with some other, relatively new algorithms like the one linked to above, and many of the same principles will apply. Index starting at 0 shortens the walk functions significantly. Also, there is a question about the nil value for nodes. As someone pointed out, a non-trivial amount of hash results will be zero. Unless I'm mistaken, this also means that at that point, a hashchain would repeat, as this common originating value necessarily produces a common output. 
So in fact, I can use the zero sentinel or nil node payload, because we can consider the hashchain terminated if it does make a zero, and there is no point going any further as the remainder of elements of such a hashchain will be the exact same pattern as starting from a zero nonce. On Tuesday, 24 April 2018 17:30:30 UTC+3, matthe...@gmail.com wrote: > > I'd suggest starting with the basic algorithm without any abstraction >> (just hard-code in the type you want to store), then benchmark/tweak >> the algorithm, and only then try to make it general. > > > This is my conclusion too. Abstracting the code is a lot of churn if we’re > not sure performance without abstraction works. We’ll be able to help on > the Go front better with a foundation. > > Matt > > On Tuesday, April 24, 2018 at 9:22:21 AM UTC-5, Louki Sumirniy wrote: >> >> Reading through the wikipedia description of a heap, and especially a >> binary heap... it's a heap. But that's not a sexy name! Technically it's >> not a heap because it sorts left to right, heaps sort bottom to top. >> >> I am stripping down my code and directly declaring the struct variables >> as function types, and I will be removing the method-interface completely, >> as I don't need it. I will be able to then isolate the external interface >> functions and 'hide' the internals while being able to easily link the >> scope of functions to the structure. >> >> If I really need a secondary value to flag empty or not, then the memory >> efficient way would be to have a bit-indexed flag for this property, so for >> every 32 bits of data one bit of flag and keep this in an orthogonal entry >> in the structure. You are correct, though the odds are small, probably >> 1/2^40 for 32 bit hashes making zeros, that kinda works out to 2^32/2^40 >> frequency of all zeroes and I think that amounts to 0.390625% being zero. >> That would have been a problem! 
So for 30 rows of 32 bits I will need 2^30 >> array elements, 2^30/8 bytes to store the empty/fill. I can't think of a >> more space efficient way to store this property. About 134Mb would be >> required for this, based on 30 rows. It can, however, be extended at the >> same time as the tree so its size will keep this proportion linear to the >> size of the tree array, it's not a big overhead compared to the 4Gb the 30 >> row store will use. >> >> If it seems like this is excessive numbers, this stores 1/4 of the total >> magnitude of possible values of 32 bit hashes, or half hashes. In the >> application I have in mind, probably most of the time trees won't grow to >> more than about 15 rows anyway, which is not that much, before the required >> pattern is found in a hashchain, at least until the number of miners on the >> network gets really big. I am writing a Proof of Work algorithm and I was >> aiming for something that would scale down to short time periods at low >> difficulty. >> >> This tree store would satisfy that by allowing solutions to pop out at a >> particular sequence in a
Re: [go-nuts] Type func binding and interfaces
> > I'd suggest starting with the basic algorithm without any abstraction > (just hard-code in the type you want to store), then benchmark/tweak > the algorithm, and only then try to make it general. This is my conclusion too. Abstracting the code is a lot of churn if we’re not sure performance without abstraction works. We’ll be able to help on the Go front better with a foundation. Matt On Tuesday, April 24, 2018 at 9:22:21 AM UTC-5, Louki Sumirniy wrote: > > Reading through the wikipedia description of a heap, and especially a > binary heap... it's a heap. But that's not a sexy name! Technically it's > not a heap because it sorts left to right, heaps sort bottom to top. > > I am stripping down my code and directly declaring the struct variables as > function types, and I will be removing the method-interface completely, as > I don't need it. I will be able to then isolate the external interface > functions and 'hide' the internals while being able to easily link the > scope of functions to the structure. > > If I really need a secondary value to flag empty or not, then the memory > efficient way would be to have a bit-indexed flag for this property, so for > every 32 bits of data one bit of flag and keep this in an orthogonal entry > in the structure. You are correct, though the odds are small, probably > 1/2^40 for 32 bit hashes making zeros, that kinda works out to 2^32/2^40 > frequency of all zeroes and I think that amounts to 0.390625% being zero. > That would have been a problem! So for 30 rows of 32 bits I will need 2^30 > array elements, 2^30/8 bytes to store the empty/fill. I can't think of a > more space efficient way to store this property. About 134Mb would be > required for this, based on 30 rows. It can, however, be extended at the > same time as the tree so its size will keep this proportion linear to the > size of the tree array, it's not a big overhead compared to the 4Gb the 30 > row store will use. 
> > If it seems like this is excessive numbers, this stores 1/4 of the total > magnitude of possible values of 32 bit hashes, or half hashes. In the > application I have in mind, probably most of the time trees won't grow to > more than about 15 rows anyway, which is not that much, before the required > pattern is found in a hashchain, at least until the number of miners on the > network gets really big. I am writing a Proof of Work algorithm and I was > aiming for something that would scale down to short time periods at low > difficulty. > > This tree store would satisfy that by allowing solutions to pop out at a > particular sequence in a hash chain derived from a nonce, and so workers > would be able to find solutions in a much more temporally distributed > manner than occurs with Cuckoo Cycle and Equihash, which both use array > sorts to do their searches. Hence the reason for using this type of > structure. And the reason for wanting to use a compact underlying storage > is precisely about maximising the caching efficiency in the CPU, because I > want to write a solver that can't be sped up significantly with a faster > memory bus and/or memory cell retrieval latency (and further, retrieval > latency in memory cells still has a minor but significant amount of extra > latency for random access, thus wanting to reduce the memory footprint, and > increase linearity). It makes absolutely no sense to use a reference based > binary tree for values that are as small as the references themselves, as > this implies automatically a 4x size of memory required. > > ok. again, big thanks for your help with this, I really will stop > ruminating, I promise! >
Re: [go-nuts] Type func binding and interfaces
Reading through the wikipedia description of a heap, and especially a binary heap... it's a heap. But that's not a sexy name! Technically it's not a heap because it sorts left to right, heaps sort bottom to top. I am stripping down my code and directly declaring the struct variables as function types, and I will be removing the method-interface completely, as I don't need it. I will be able to then isolate the external interface functions and 'hide' the internals while being able to easily link the scope of functions to the structure. If I really need a secondary value to flag empty or not, then the memory efficient way would be to have a bit-indexed flag for this property, so for every 32 bits of data one bit of flag and keep this in an orthogonal entry in the structure. You are correct, though the odds are small, probably 1/2^40 for 32 bit hashes making zeros, that kinda works out to 2^32/2^40 frequency of all zeroes and I think that amounts to 0.390625% being zero. That would have been a problem! So for 30 rows of 32 bits I will need 2^30 array elements, 2^30/8 bytes to store the empty/fill. I can't think of a more space efficient way to store this property. About 134Mb would be required for this, based on 30 rows. It can, however, be extended at the same time as the tree so its size will keep this proportion linear to the size of the tree array, it's not a big overhead compared to the 4Gb the 30 row store will use. If it seems like this is excessive numbers, this stores 1/4 of the total magnitude of possible values of 32 bit hashes, or half hashes. In the application I have in mind, probably most of the time trees won't grow to more than about 15 rows anyway, which is not that much, before the required pattern is found in a hashchain, at least until the number of miners on the network gets really big. I am writing a Proof of Work algorithm and I was aiming for something that would scale down to short time periods at low difficulty. 
This tree store would satisfy that by allowing solutions to pop out at a particular sequence in a hash chain derived from a nonce, and so workers would be able to find solutions in a much more temporally distributed manner than occurs with Cuckoo Cycle and Equihash, which both use array sorts to do their searches. Hence the reason for using this type of structure. And the reason for wanting to use a compact underlying storage is precisely about maximising the caching efficiency in the CPU, because I want to write a solver that can't be sped up significantly with a faster memory bus and/or memory cell retrieval latency (and further, retrieval latency in memory cells still has a minor but significant amount of extra latency for random access, thus wanting to reduce the memory footprint, and increase linearity). It makes absolutely no sense to use a reference based binary tree for values that are as small as the references themselves, as this implies automatically a 4x size of memory required. ok. again, big thanks for your help with this, I really will stop ruminating, I promise!
Re: [go-nuts] Type func binding and interfaces
I spent a bit of time thinking about all the indirections in my code and decided I'm going hybrid functional/procedural on this, and minimising the visible overhead that the OOP constructs are clearly creating. One nice thing about Go's hostile attitude towards OOP is that when you find the way to do it, you realise 'oh, that's why go doesn't like it'. soo, for me :) unsubscribing and not ruminating any further (*sigh of relief in the peanut gallery*) :D On Tuesday, 24 April 2018 14:59:22 UTC+3, Louki Sumirniy wrote: > > I'll just be brief this time and hit the points quickly: > > Yes, technically it is a Complete Binary Tree but you won't find a > Complete Implementation of one anywhere, I have looked, and there isn't. > The concept has been rejected without being tested by anyone who has > considered it at least what I can find. > > Those relational functions you mentioned are not correct for a tree with > the array index 0 being the root, this is them here: > > Parent: > > c.Index = c.Index>>1 - (c.Index+1)%2 > > Left Child: > > c.Index = c.Index<<1 ^ 1 > > Right Child: > > c.Index = c.Index<<1 + 2 > > I have used bitwise operators because it is always powers of two, and thus > will be faster on a processor with bitshift and no worse than using *2 > without. I used an XOR on left because the shift is left and thus the LSB > can be flipped without computation to add one. The right one adds two, this > is related to using 0 as the root also. > > The parent function is the one that got me the most excited when I came up > with it - it avoids the use of comparison/branch functions by taking > advantage of the property of the tree rooted at 0, that every left child is > odd and every right is even (that's what the IsOdd function returns - and > computes it with &1, and is used in my Left and Right walk functions that > 'flatten' the tree and walk it like it's a sorted list). 
By adding one to > the index you flip its odd/even status, giving you a 1 if it then becomes > odd. This effectively is a 'move inwards' operation. It saves on comparison > and branching and I know on 32 bit integers bit shift, addition and modulus > are very fast. (and the code is very short in the cache) > > By rooting the tree at array index 0 it becomes possible to pick any given > index and know if it's a left or right child by whether it's odd or even, > and the special case of zero, which is neither (yes, I know it's referred > to as even but it's not really a number, according to number theory), tells > you also you can't move upwards. > > Compared to a potential and mostly filled tree of 30 rows, the amount of > space taken up by the code is quite small, with the separation of functions > for tree and data type, there is no wasted space repeating functionality > that can be abstracted. Plus, for up to around 20 rows of tree, the array > will fit mostly within a cpu cache, and since operations take place on one > side and the other, and mainly affect small segments of the subtree, each > subsequent row does not have to be fully loaded into the cache for most > operations. > > Technically my approach is more like a functional programming approach to > the problem, as I am passing pointers to functions in order to implement > the data specific elements, only the interface for the array store is OOP > style (and it DOES have to be encapsulated in a struct because Go slices do > not work with interfaces). > > As to exposing things, in fact an external payload data type library can > decide exactly what it will expose. I am only writing a base implementation > with uint32 because to test it I have to fill the data with *something* and > quite likely 32 bit hashes will be actually used in my intended application > for this library, as the heads and tails from 64bit hashes. 
Re: [go-nuts] Type func binding and interfaces
On 24 April 2018 at 12:59, Louki Sumirniy wrote: > I'll just be brief this time and hit the points quickly: > > Yes, technically it is a Complete Binary Tree but you won't find a Complete > Implementation of one anywhere, I have looked, and there isn't. The concept > has been rejected without being tested by anyone who has considered it at > least what I can find. Actually, I think it's probably a "full binary tree" you're thinking of rather than a complete binary tree, although I can't tell until you implement some of the insert or search operations. FYI a complete implementation of a complete binary tree can be found in the Go standard library as part of the heap implementation: see https://golang.org/pkg/container/heap so it's hard to say the concept has been rejected by everyone. It's the standard algorithm for a binary heap. > Technically my approach is more like a functional programming approach to > the problem, as I am passing pointers to functions in order to implement the > data specific elements, only the interface for the array store is OOP style > (and it DOES have to be encapsulated in a struct because Go slices do not > work with interfaces). I'm sorry, there really is nothing functional-programming-like about your implementation. Use of function values does not imply a functional programming style. > As to exposing things, in fact an external payload data type library can > decide exactly what it will expose. I am only writing a base implementation > with uint32 because to test it I have to fill the data with *something* and > quite likely 32 bit hashes will be actually used in my intended application > for this library, as the heads and tails from 64bit hashes. If you're using 32 or 64 bit hashes, you really should not assume you cannot get zero-valued hashes.
Re: [go-nuts] Type func binding and interfaces
I'll just be brief this time and hit the points quickly: Yes, technically it is a Complete Binary Tree but you won't find a Complete Implementation of one anywhere, I have looked, and there isn't. The concept has been rejected without being tested by anyone who has considered it at least what I can find. Those relational functions you mentioned are not correct for a tree with the array index 0 being the root, this is them here: Parent: c.Index = c.Index>>1 - (c.Index+1)%2 Left Child: c.Index = c.Index<<1 ^ 1 Right Child: c.Index = c.Index<<1 + 2 I have used bitwise operators because it is always powers of two, and thus will be faster on a processor with bitshift and no worse than using *2 without. I used an XOR on left because the shift is left and thus the LSB can be flipped without computation to add one. The right one adds two, this is related to using 0 as the root also. The parent function is the one that got me the most excited when I came up with it - it avoids the use of comparison/branch functions by taking advantage of the property of the tree rooted at 0, that every left child is odd and every right is even (that's what the IsOdd function returns - and computes it with &1, and is used in my Left and Right walk functions that 'flatten' the tree and walk it like it's a sorted list). By adding one to the index you flip its odd/even status, giving you a 1 if it then becomes odd. This effectively is a 'move inwards' operation. It saves on comparison and branching and I know on 32 bit integers bit shift, addition and modulus are very fast. (and the code is very short in the cache) By rooting the tree at array index 0 it becomes possible to pick any given index and know if it's a left or right child by whether it's odd or even, and the special case of zero, which is neither (yes, I know it's referred to as even but it's not really a number, according to number theory), tells you also you can't move upwards. 
Compared to a potential and mostly filled tree of 30 rows, the amount of space taken up by the code is quite small, with the separation of functions for tree and data type, there is no wasted space repeating functionality that can be abstracted. Plus, for up to around 20 rows of tree, the array will fit mostly within a cpu cache, and since operations take place on one side and the other, and mainly affect small segments of the subtree, each subsequent row does not have to be fully loaded into the cache for most operations. Technically my approach is more like a functional programming approach to the problem, as I am passing pointers to functions in order to implement the data specific elements, only the interface for the array store is OOP style (and it DOES have to be encapsulated in a struct because Go slices do not work with interfaces). As to exposing things, in fact an external payload data type library can decide exactly what it will expose. I am only writing a base implementation with uint32 because to test it I have to fill the data with *something* and quite likely 32 bit hashes will be actually used in my intended application for this library, as the heads and tails from 64bit hashes.
Re: [go-nuts] Type func binding and interfaces
On 24 April 2018 at 09:58, Louki Sumirniy wrote: > > > On Tuesday, 24 April 2018 11:15:34 UTC+3, rog wrote: >> >> On 23 April 2018 at 19:58, Louki Sumirniy >> wrote: > > >> >> > I set many of those identifiers to export because I wanted to keep >> > separate >> > derived libraries that replaced the storage types embedded into the main >> > library. These limitations would not exist were the 'back store' over >> > the >> > wire or on disk. They only exist for the storage that is most accessible >> > to >> > the program, the main memory. >> >> I don't understand this. If you're replacing the underlying storage >> types, why would you need to export details of an algorithm that the >> storage types shouldn't need to have anything to do with? > > > Because if they can't be accessed outside of the scope of the file I am > forced to add the alternative data types within the same file. I can't parse this sentence. There's nothing stopping you from having alternative storage implementations outside the package AFAICS. Also, file scope isn't really a thing in Go (*). > This is a > massive loss of flexibility (and means an importing application cannot > implement its own variant). In my experience, allowing external code to "implement its own variant" by exporting all the component pieces of an algorithm and allowing it to arbitrarily vary them is a recipe for cryptic APIs and broken invariants. This is a hallmark of class-hierarchy-based OOP, which is a style that Go specifically avoids. I'd suggest exposing as few moving parts as possible. If someone wants to change the algorithm, let them fork the code. >> > This idea that I should be hiding all the functions smells like OOP to >> > me. >> >> This isn't OOP - it's just good code hygiene. By hiding implementation >> details, you make the public interface clear and you prevent clients >> from relying on internal implementation details that you might want to >> change. 
> > > How do I override the functions to implement the data abstraction if the > overridden functions are not exported? Don't do that. You've got two principal abstractions here - the underlying storage (more-or-less a slice) and the algorithm itself. It's important to be able to change the storage implementation, but the algorithm itself does not need to be able to be overridden. I think you're still thinking as if you're in a subclass-based language. >> Private functions are not "security by obscurity". Private functions >> aren't "obscure" (you can see them in the source perfectly well) but >> they are "secure" because you cannot access them. > > > This is nonsense because all I have to do is fork and expose and what > exactly is the nature of the danger of me knowing the internals? There is no danger in *you* knowing the internals, but there is danger in *external code* depending on the internals. >> I think you overestimate the novelty of your algorithm. It sounds to >> me like it bears significant resemblance to a "complete binary tree" >> (see https://xlinux.nist.gov/dads/HTML/completeBinaryTree.html). > > > A complete binary tree does not have a concept of empty nodes. It must do, because a complete binary tree can be implemented by an array and does not have to have exactly 2^n nodes, so by implication some of the nodes in the array may be empty. > It uses references to bind nodes together, thus the overhead of memory > storage, and the randomness (usually insert order) in the array of nodes. If by "references", you mean "pointers", that's not true. As the link I included says: It can be efficiently implemented as an array, where a node at index i has children at indexes 2i and 2i+1 and a parent at index i/2 That sounds very like what you're doing. > The random memory locations of the node data is precisely what this > algorithm aims to eliminate, as well as reducing memory overhead. Your algorithm can waste up to half the overall storage space. 
I don't see how that's "reducing memory overhead" in general. > Benchmarks would require a completed algorithm. I will absolutely be > profiling it once it's done, in order to work out the comparative processing > loads for each type of operation. I have a vague concept of the heuristics > for optimisation operations, but I had been putting off filling these parts > in because I still hadn't achieved the data type abstraction at that point. I'd suggest starting with the basic algorithm without any abstraction (just hard-code in the type you want to store), then benchmark/tweak the algorithm, and only then try to make it general. > Thanks for your discussion about this... it's implicitly complimentary to > have my ruminations actually read and considered. Your ruminations are verbose enough that I suspect I won't be reading them much more in the future, I'm afraid, but I hope my thoughts have been useful to you. rog. (*) technically, package identifiers are in
Re: [go-nuts] Type func binding and interfaces
Looking more into "Complete" binary trees, it's somewhat relevant in that this structure probably should be pushed upwards and the upper reaches of the tree made as complete as possible. Given a random source of data to fill the tree with, the odds are that the middle values will dominate due to normal/gaussian distribution, the only optimisation problem there has to do with when laterally adjacent nodes are numerically adjacent, yet they are towards the top of the tree. These nodes block any further inserts below them and need to be pushed apart, I call this 'declumping', it's one of the elements of the heuristics I have developed so far. On Tuesday, 24 April 2018 12:02:44 UTC+3, Louki Sumirniy wrote: > > I just dug into the data structure spec for Btrfs, figuring it would have > trees in it, and no, they are not implemented in the way I am working here > either. > > Yes, it may well be that I am on a wild goose chase with this, but hey, > I've learned an awful lot of cool stuff now that will serve me well later > anyway, regardless of whether this succeeds or fails when the rubber hits > the road. > > On Tuesday, 24 April 2018 11:58:39 UTC+3, Louki Sumirniy wrote: >> >> >> >> On Tuesday, 24 April 2018 11:15:34 UTC+3, rog wrote: >>> >>> On 23 April 2018 at 19:58, Louki Sumirniy wrote: >>> >> >> >>> > I set many of those identifiers to export because I wanted to keep >>> separate >>> > derived libraries that replaced the storage types embedded into the >>> main >>> > library. These limitations would not exist were the 'back store' over >>> the >>> > wire or on disk. They only exist for the storage that is most >>> accessible to >>> > the program, the main memory. >>> >>> I don't understand this. If you're replacing the underlying storage >>> types, why would you need to export details of an algorithm that the >>> storage types shouldn't need to have anything to do with? 
>>> >> >> Because if they can't be accessed outside of the scope of the file I am >> forced to add the alternative data types within the same file. This is a >> massive loss of flexibility (and means an importing application cannot >> implement its own variant). In my test/scratch/thingy, even within the same >> folder direct access was impossible because of file scoping, so I >> capitalised the fields, because if it can't be done even in the same >> folder, in a different pkg/name folder for another implementing library I >> won't be able to do this either >> >> >>> > Regarding the use of zero as an nil, that is already accepted by >>> numerous >>> > built in types (error, for example), and in this case the reason why I >>> have >>> > put that into the *abstract* type is because the data I am writing >>> this >>> > algorithm for is hashes, which are never zero, at least in practise, >>> over a >>> > period of decades, no hash functions produce zero out of any >>> significant >>> > amount of known data. >>> >>> Why does your algorithm need to know about the possible nil-ness of >>> the stored items? It doesn't use that property now, at any rate. >>> >> >> Zero is the default signifier of an empty node. Zero acts as the sentinel >> for the search to detect when it has fallen off the edge of a branch. But >> zero isn't necessarily the 'empty node' value, an implementation using >> strings needs to compare to "", and struct and slice types would need to >> test for nil, and one may conceive of an application where the empty field >> sentinel actually has a value (and its initialisation would require filling >> the data that way, of course) >> >> >>> >>> > This idea that I should be hiding all the functions smells like OOP to >>> me. >>> >>> This isn't OOP - it's just good code hygiene. By hiding implementation >>> details, you make the public interface clear and you prevent clients >>> from relying on internal implementation details that you might want to >>> change. 
>>> >> >> How do I override the functions to implement the data abstraction if the >> overridden functions are not exported? Internal implementation details for >> this specific library will not change once I hit the beta milestone. This >> is not a complex application, it is just a data storage/search library. >> Somewhere along the line there will have to be some exported elements that >> can be passed to and fro and possibly some of these will be encapsulated >> like this. But encapsulation can be a big obstacle to reuse and derivation. >> >> Specific points that clearly do not need to be visible need to be pointed >> out. If someone is *extending* this then to be honest, they should just be >> forking it anyway, rather than importing it. Also, I don't think that is as >> easy to do in Go as it would be in an OOP language, as I had to do very >> specific things in the base type struct spec in order to enable overriding. >> In an OOP language you can override until the
Re: [go-nuts] Type func binding and interfaces
I just dug into the data structure spec for Btrfs, figuring it would have trees in it, and no, they are not implemented in the way I am working here either. Yes, it may well be that I am on a wild goose chase with this, but hey, I've learned an awful lot of cool stuff now that will serve me well later anyway, regardless of whether this succeeds or fails when the rubber hits the road.
Re: [go-nuts] Type func binding and interfaces
On Tuesday, 24 April 2018 11:15:34 UTC+3, rog wrote:

> On 23 April 2018 at 19:58, Louki Sumirniy wrote:
> > I set many of those identifiers to export because I wanted to keep separate derived libraries that replaced the storage types embedded into the main library. These limitations would not exist were the 'back store' over the wire or on disk. They only exist for the storage that is most accessible to the program, the main memory.
>
> I don't understand this. If you're replacing the underlying storage types, why would you need to export details of an algorithm that the storage types shouldn't need to have anything to do with?

Because if they can't be accessed outside of the scope of the file, I am forced to add the alternative data types within the same file. This is a massive loss of flexibility (and means an importing application cannot implement its own variant). In my test/scratch/thingy, even within the same folder direct access was impossible because of file scoping, so I capitalised the fields; if it can't be done even in the same folder, then in a different pkg/name folder for another implementing library I won't be able to do this either.

> > Regarding the use of zero as a nil, that is already accepted by numerous built-in types (error, for example), and in this case the reason why I have put that into the *abstract* type is because the data I am writing this algorithm for is hashes, which are never zero; at least in practice, over a period of decades, no hash functions produce zero out of any significant amount of known data.
>
> Why does your algorithm need to know about the possible nil-ness of the stored items? It doesn't use that property now, at any rate.

Zero is the default signifier of an empty node. Zero acts as the sentinel for the search to detect when it has fallen off the edge of a branch. But zero isn't necessarily the 'empty node' value: an implementation using strings needs to compare to "", struct and slice types would need to test for nil, and one may conceive of an application where the empty-field sentinel actually has a value (and its initialisation would require filling the data that way, of course).

> > This idea that I should be hiding all the functions smells like OOP to me.
>
> This isn't OOP - it's just good code hygiene. By hiding implementation details, you make the public interface clear and you prevent clients from relying on internal implementation details that you might want to change.

How do I override the functions to implement the data abstraction if the overridden functions are not exported? Internal implementation details for this specific library will not change once I hit the beta milestone. This is not a complex application, it is just a data storage/search library. Somewhere along the line there will have to be some exported elements that can be passed to and fro, and possibly some of these will be encapsulated like this. But encapsulation can be a big obstacle to reuse and derivation.

Specific points that clearly do not need to be visible need to be pointed out. If someone is *extending* this then, to be honest, they should just be forking it anyway rather than importing it. Also, I don't think that is as easy to do in Go as it would be in an OOP language, as I had to do very specific things in the base type struct spec in order to enable overriding. In an OOP language you can override until the cows come home, and this is the 'fragility' problem of OOP classes that Go specifically aims to eliminate by enforcing composition.

If a user of the library were to embed the structure in another type, they would get access to these private elements anyway, and then by doing this on an import rather than a fork, they would be depending on me not to change the way it works. Everything I have exposed is specifically exposed to allow an external library to simply reimplement the set of accessors. These accessors need to know the Cursor type also, as this is how the tree is walked.

If I was writing a much more complex library then clearly there needs to be exposure of only that which is interface, and the interface needs to be immutable until version bumps.

> > As it stands with how it has to be implemented in Go, the accessing of these would-be-private-in-C++ functions is very cumbersome due to the need to use type assertions. I have tinkered with this for educational value but I don't want to have to deal with this in the actual implementation. A key insight of Schneier's work is 'security by obscurity' is weak. Private functions fit that maxim pretty well.
>
> Private functions are not "security by obscurity". Private functions aren't "obscure" (you can see them in the source perfectly well) but they are "secure" because you cannot access them.
Re: [go-nuts] Type func binding and interfaces
On 23 April 2018 at 19:58, Louki Sumirniy wrote:

> The type function bindings are not the problem, the problem is the fact that despite in every other way the signatures match, they will not be found by the compiler to bind to the type, and thus the need to wrap the invocations.
>
> I maybe should explain that I have had some experience programming with Vala, which I suppose borrows something from C#, a competitor to Go, that in my opinion mostly goes too far on the OOP side, but in Vala, it was something that eventually pretty much put me off the language, having to specify in many library functions 'bound' variables as parameters, 'in' and 'out'. The code I have written essentially follows this model. I don't like this model, and I like how Go otherwise allows the flexibility of composition, but it puts this artificial barrier up against polymorphism.
>
> Note that another area that is very similar in its intent is the way that you can't bind slices to interfaces. Having already figured out the above problem, it only took me about an hour to figure out how to encapsulate the slice in a struct and bypass this limitation.

You can bind slices to interfaces, although you'll need to define a new slice type and some methods if you want it to do anything slice-specific.

> What puzzles me though, is why anyone thinks it is a good thing that it should be made so complicated to simply allow a DATA STORAGE library to switch out different back-store types.

I don't understand why you think it's so complex to do this. If you look at the code I posted, you'll see that the Store type is very straightforward to implement - most of the code could even be generated automatically, in fact.

> As for your points:
>
> I set many of those identifiers to export because I wanted to keep separate derived libraries that replaced the storage types embedded into the main library. These limitations would not exist were the 'back store' over the wire or on disk. They only exist for the storage that is most accessible to the program, the main memory.

I don't understand this. If you're replacing the underlying storage types, why would you need to export details of an algorithm that the storage types shouldn't need to have anything to do with?

> Regarding the use of zero as a nil, that is already accepted by numerous built-in types (error, for example), and in this case the reason why I have put that into the *abstract* type is because the data I am writing this algorithm for is hashes, which are never zero; at least in practice, over a period of decades, no hash functions produce zero out of any significant amount of known data.

Why does your algorithm need to know about the possible nil-ness of the stored items? It doesn't use that property now, at any rate.

> This idea that I should be hiding all the functions smells like OOP to me.

This isn't OOP - it's just good code hygiene. By hiding implementation details, you make the public interface clear and you prevent clients from relying on internal implementation details that you might want to change.

> As it stands with how it has to be implemented in Go, the accessing of these would-be-private-in-C++ functions is very cumbersome due to the need to use type assertions. I have tinkered with this for educational value but I don't want to have to deal with this in the actual implementation. A key insight of Schneier's work is 'security by obscurity' is weak. Private functions fit that maxim pretty well.

Private functions are not "security by obscurity". Private functions aren't "obscure" (you can see them in the source perfectly well) but they are "secure" because you cannot access them.

> I think that there is no real benefit in doing this, since after all, this is the source, and if you can be told you can't access that thing that is 'private' then it's not terribly private in reality, only for arbitrary philosophical reasons excluded from your programmatic access without forking the code.

As above, doing this is considered good code hygiene. If you do decide to fork the code and make something public, then it's obvious that you're breaking the API boundaries.
Re: [go-nuts] Type func binding and interfaces
On Mon, Apr 23, 2018 at 8:58 PM Louki Sumirniy <louki.sumirniy.stal...@gmail.com> wrote: > The type function bindings are not the problem, the problem is the fact that despite in every other way the signatures match, they will not be found by the compiler to bind to the type, and thus the need to wrap the invocations. I understand all the words, can parse the sentence easily, but still have no ... idea what you're saying. Even after guessing that most probably by 'type function binding' you mean simply a method. Using well-defined terms, appearing in the language specification, can make the ideas communicated much more comprehensible to others. -- -j -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [go-nuts] Type func binding and interfaces
The type function bindings are not the problem; the problem is the fact that despite the signatures matching in every other way, they will not be found by the compiler to bind to the type, and thus the need to wrap the invocations.

I maybe should explain that I have had some experience programming with Vala, which I suppose borrows something from C#, a competitor to Go that in my opinion mostly goes too far on the OOP side. In Vala it was something that eventually pretty much put me off the language: having to specify, in many library functions, 'bound' variables as parameters, 'in' and 'out'. The code I have written essentially follows this model. I don't like this model, and I like how Go otherwise allows the flexibility of composition, but it puts this artificial barrier up against polymorphism.

Note that another area that is very similar in its intent is the way that you can't bind slices to interfaces. Having already figured out the above problem, it only took me about an hour to figure out how to encapsulate the slice in a struct and bypass this limitation.

What puzzles me, though, is why anyone thinks it is a good thing that it should be made so complicated to simply allow a DATA STORAGE library to switch out different back-store types.

As for your points:

I set many of those identifiers to export because I wanted to keep separate derived libraries that replaced the storage types embedded into the main library. These limitations would not exist were the 'back store' over the wire or on disk. They only exist for the storage that is most accessible to the program, the main memory.

Regarding the use of zero as a nil, that is already accepted by numerous built-in types (error, for example), and in this case the reason why I have put that into the *abstract* type is because the data I am writing this algorithm for is hashes, which are never zero; at least in practice, over a period of decades, no hash functions produce zero out of any significant amount of known data.

This idea that I should be hiding all the functions smells like OOP to me. As it stands with how it has to be implemented in Go, the accessing of these would-be-private-in-C++ functions is very cumbersome due to the need to use type assertions. I have tinkered with this for educational value but I don't want to have to deal with it in the actual implementation. A key insight of Schneier's work is that 'security by obscurity' is weak. Private functions fit that maxim pretty well. I think that there is no real benefit in doing this, since after all, this is the source, and if you can be told you can't access that thing that is 'private' then it's not terribly private in reality, only for arbitrary philosophical reasons excluded from your programmatic access without forking the code.

The search functions are already in there; it's a piece of cake, given the ability to test for equality, lesser and greater. That was probably one of the first things I implemented, because it's so simple. Nobody has ever implemented a binary search tree with a dense store like this before, and so the insert and delete functions are not there yet. Just use your imagination for a little bit: even with a 4-row *dense* tree, consider what happens when I delete a node with children, let's say at row 3, with two children. Where do those children go? They are now orphans, as their parent has been cut out of the tree.

So, critical to the design of BAST is that it be possible to perform short optimisations during delete/insert write operations that do not cost too much time, but at the same time do not cause the search cost to become excessively spread out; and specifically with delete, you can't have orphans. They have to be pushed up and inwards. It's not complicated, but it gets more complicated the more nodes are children of the severed node. It's not a model for tree structures that ANYONE is familiar with. Unlike simple, stupid newbie mistakes, this is intentional. I am writing this algorithm because, by its architecture, it reduces random access of memory, and in this it gains a benefit of reduced cache misses in the CPU and thus will have, potentially, a serious performance advantage compared to any search-and-sort algorithm, whether it be hashtables or vector-reference-based binary search trees.

Yes, on the last point, it always uses power of two because it is modelling a dense binary tree, hence the name 'Bifurcation Array'. Thus also why I haven't finished the insert/delete functions. You can't just spin things around like in a vector-based BST. There is a cost in search linearity for the differentials of path lengths from the root, but here's the thing - if you want to dodge allocating a new row, you could instead find the value that you could push inwards to the other side, and trace a path until there is an inner empty child that you can 'rotate' into. Most of the time this will
Re: [go-nuts] Type func binding and interfaces
On 22 April 2018 at 23:20, Louki Sumirniy wrote:

> I essentially am trying to find an effective method in Go, preferably not too wordy, that lets me create an abstract data type, a struct, and a set of functions that bind to a different data type, and that I can write, preferably not in too much code, a change that allows the data type of the embedded data to be changed. It's basically kinda inheritance, but after much fiddling I found a hackish sorta way that isn't *too* boilerplate filled:
>
> type nullTester func(*Bast, uint32) bool
>
> type Bast struct {
>     ...
>     isNull nullTester
>     ...
> }
>
> func isNull(b *Bast, d uint32) bool {
>     return d == 0
> }
>
> func NewBast() (b *Bast) {
>     ...
>     b.isNull = isNull
>     ...
> }
>
> // IsNull - tests if a value in the tree is null
> func (b *Bast) IsNull(d uint32) bool {
>     return b.isNull(b, d)
> }
>
> Now, bear in mind I haven't shown all of the code. But there is a slice array in the Bast struct, and it is defined as an interface{}, and isNull is one of a set of operators that have to be written to match the type used in the slice store. This might be a bad example because it doesn't actually act on the interface-typed slice, but the point here is just this:
>
> It does not appear to be possible to make the type specification from the top line match the function signature of the type-bound function at the bottom of the code snippet. I haven't been able to find anything that shows that a func type can have a method binding.
>
> https://github.com/calibrae-project/bast/blob/master/pkg/bast/bast.go is where my WiP lives. This slightly hacky solution seems sound to me, I just don't like to be forced to use workarounds like this. If a type signature cannot be written that matches a method, yet I can do it this way, I don't see what purpose this serves as far as any kind of correctness and bug-resistance issues go. I would have to deal with a lot more potential bugs if I had to concretely implement this library for the sake of 1 slice and 7 functions out of a much larger library that conceptually is intended to only deal with comparable, mainly numerical values anyway.

I don't really understand why you think you want all those function types. Your code seems to mix concerns quite a bit, but interfaces in Go are generally closely focused on a particular thing. I don't understand anything about your algorithm, but ISTM you could do something like this: https://play.golang.org/p/wv0T8T8Ynns

Some changes I made in doing that:

- unexported a load of identifiers that really don't deserve to be part of the public API
- made Cursor into a lightweight by-value type
- defined a Store interface that doesn't require knowledge of the Cursor data type.

A few other observations:

- requiring a value type to know whether an item is null or not does not seem great when you're storing integers - who is to say that the zero uint32 is not a valid thing to store?
- requiring callers to know about AddRow (and hence how the tree is organised) does not seem ideal.
- there's no way to search for or insert items, but I guess you'll be doing that later.
- it looks like your algorithm will always use a power-of-two multiple of the element size, which could end up very inefficient when the tree is large.

cheers, rog.