Hello, the challenge is to create an option to change tables during the tourney if the player wants to, like the option to make a deal at the final table. Regards, Hugo
2013/11/29 Marcelo Cerri <mhce...@linux.vnet.ibm.com>

> Hi Andy.
>
> On Thu, Nov 28, 2013 at 09:11:35AM +0100, Andy Polyakov wrote:
> > >Any comments on that?
> >
> > In one word "no-o-o-o-o-o-o". :-) In more words. Preferred way to
> > integrate processor-specific code is plotted in Intel AES-NI and
> > SPARC T4 modules. And "preferred" does not really mean "matter of
> > choice". [s390x module is usually mentioned in the context, and the
> > answer is I wish I had time to do something about it.]
> >
>
> Can you be more specific about that? What do you disagree with? Is it
> the way I'm checking for the processor's capabilities or the fact that I
> included functions to encrypt and decrypt just individual blocks? Both?
> Or anything else?
>
> Regarding block encryption, my idea is to first provide optimization
> for it and then include optimization for the most common cipher modes
> (CBC, CTR and so on). In that way, any cipher mode without a specific
> optimization can still have some level of performance improvement.
>
> > >>This patch series adds the initial support for POWER8 new cryptographic
> > >>instructions.
> > >>
> > >>Different versions of the ppc_vcipher_AES_[en|de]crypt were tested and
> > >>no significant performance gains were found, even using multiple vector
> > >>registers to load all sub-keys in advance.
> >
> > You naturally won't observe difference in single-block function.
>
> Yes, I agree. For single block encryption the bottleneck will be the
> vcipher instruction latency, and almost any implementation will perform
> similarly.
>
> > Because all instructions are high latency and are dependent on each
> > other, so there is a lot of "free slots" to execute all the
> > collateral instructions. While it's not self-obvious that gain from
> > pre-loading key schedule can be observed in single-threaded
> > benchmark even in code with interleaved instructions in
> > parallelizable modes, there might be other factors to consider. The
> > POWER8 processor is SMT (right?), and it should be advantageous to
> > pre-load for stream operations, so that there is more memory bus
> > bandwidth available to the other threads. Or it might be more
> > appropriate to use the "free slots" [which will be less numerous in
> > parallelizable modes] for other things, for example maintaining
> > counter values in CTR...
> >
> > >>Because of that, the version
> > >>included in this series was chosen based on readability.
> >
> > Why not folded loop then?
>
> I was talking specifically about changing the order in which keys are
> loaded. I don't see any problem in using a loop instead.
>
> >
> > >>The performance
> > >>gain is about 5x in a non-final hardware.
> >
> > More important question is what is theoretical asymptotic limit, how
> > far are we from it and how to get there. Well, answer is naturally
> > mode-specific subroutines, but it doesn't change the point. One
> > should discuss even absolute numbers, not only relative improvement.
>
> I understand your point. And yes, only mode-specific routines will be
> able to get maximum performance from that. I can post absolute numbers,
> but in the end any notion of improvement can only be obtained by
> comparing them with the current assembly or C implementation results,
> especially because this is not final hardware and the results will not
> reflect the performance of the final hardware.
>
> >
> > >>The patch "perlasm/ppc-xlate.pl: vcipher instructions support" is not
> > >>necessary for newer versions of GCC and I'd like to hear opinions if
> > >>it's worth to include it or not.
> >
> > Absolutely. And it applies to all new instructions. One can choose
> > to implement module-specific instructions in module itself and
> > common ones in ppc-xlate, e.g. vcipher in AES module and ldxvd2x in
> > ppc-xlate.
>
> Ok.
>
> >
> > >>Feel free to ask me any questions regarding the code.
> >
> > Doesn't one need to take care of vrsave? If it's not required on
> > Linux, is it required elsewhere? [It was required on MacOS X].
>
> You are right. I think that Linux doesn't rely on VRSAVE right now. But
> it might be better to save and set it properly. I will get more
> information on that and I will let you know.
>
> >
> > Is presented code endian-neutral? Manual doesn't discuss endianness
> > in vcipher context, so I assume that instruction operation does not
> > depend on current endianness. Which would require split endian
> > operation for loading data, I assume in little-endian mode.
>
> I think that load and store operations will need some adjustment. I can
> include untested code for it, but I don't have access yet to a
> little-endian environment on POWER8.
>
> >
> > As for ld/stxvd2x for data. Manual "threatens" with penalties on
> > cache line and page boundaries, and it doesn't seem to actually make
> > promise that it always works with byte alignment across page
> > boundaries. Yes, OS surely handles it by serving the exception, but
> > we don't want it to happen. Wouldn't it be more appropriate to
> > adhere to l/stvx? [See just committed vpaes-ppc.pl module for
> > example.]
> >
> > As for page boundaries in ld/stxvd2x. Key schedule is aligned at 64
> > bits (in e_aes.c) and this doesn't preclude possibility for a
> > ld/stxvd2x to cross page boundary. And if there is penalty, it might
> > get costly [because of recurring nature of references to key
> > schedule]. Should one consider lvx even for key schedule?
> >
>
> They work with non-aligned data but I will check if there are any issues
> with page boundaries.
>
> I'd like to include those changes in an incremental way, what do you
> think? I think it would avoid wasting time with submitting huge patches
> that might need to be completely redone.
>
> Regards,
> Marcelo
>
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> Development Mailing List                       openssl-dev@openssl.org
> Automated List Manager                           majord...@openssl.org
>
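
On the page-boundary question quoted above (key schedule aligned at 64 bits, loaded with 16-byte ld/stxvd2x), the concern can be made concrete with a small C sketch. It is only an illustration under stated assumptions: the helper names and the 4 KiB page size are hypothetical and not taken from the patch or from OpenSSL, and POWER Linux systems often run with 64 KiB pages, so the real page size should be passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages. Pass the real page size
 * of the target system (often 64 KiB on POWER Linux) in practice. */
#define EXAMPLE_PAGE_SIZE 4096u

/* Returns 1 if an access of `len` bytes starting at `addr` touches two
 * different pages, 0 otherwise. `page_size` must be a power of two. */
static int crosses_page(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t mask;

    assert(len > 0 && page_size != 0 && (page_size & (page_size - 1)) == 0);
    mask = ~(uintptr_t)(page_size - 1);
    return (addr & mask) != ((addr + len - 1) & mask);
}

/* Counts how many of `nkeys` consecutive 16-byte round keys starting at
 * `base` would need a 16-byte load that straddles a page boundary.
 * Hypothetical helper, not part of any OpenSSL module. */
static unsigned count_crossing_round_keys(uintptr_t base, unsigned nkeys,
                                          size_t page_size)
{
    unsigned i, n = 0;

    for (i = 0; i < nkeys; i++)
        n += (unsigned)crosses_page(base + 16u * i, 16, page_size);
    return n;
}
```

With a key schedule that is only 8-byte aligned, a base address 8 bytes short of a multiple of 16 can put one round key across a page boundary, and that load recurs for every block processed. With a 16-byte-aligned base, no 16-byte round-key load can cross a page. For AES-256, `nkeys` would be 15.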