Re: [PATCH 0/4] Initial POWER8 support

Hugo Eduardo L Kfouri Sat, 30 Nov 2013 13:23:32 -0800

Hello,
the challenge is to create an option to change tables during the tourney
if the player wants to, like the option to make a deal at the final table.
Regards,
Hugo



2013/11/29 Marcelo Cerri <mhce...@linux.vnet.ibm.com>

> Hi Andy.
>
> On Thu, Nov 28, 2013 at 09:11:35AM +0100, Andy Polyakov wrote:
> > >Any comments on that?
> >
> > In one word "no-o-o-o-o-o-o". :-) In more words: the preferred way to
> > integrate processor-specific code is laid out in the Intel AES-NI and
> > SPARC T4 modules. And "preferred" does not really mean "a matter of
> > choice". [The s390x module usually comes up in this context, and the
> > answer is that I wish I had time to do something about it.]
> >
>
> Can you be more specific? What do you disagree with? Is it the way
> I'm checking for the processor's capabilities, or the fact that I included
> functions to encrypt and decrypt only individual blocks? Both? Or
> something else?
>
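The runtime-dispatch pattern under discussion (test a capability bit once, then route to a hardware-specific routine) can be sketched in C as below. This is a minimal illustration, not how OpenSSL or the patch actually detects POWER8: the value 0x02000000 for PPC_FEATURE2_VEC_CRYPTO in Linux's AT_HWCAP2 word is our understanding and should be checked against the kernel headers, and every name here is hypothetical.

```c
/* Assumed value of PPC_FEATURE2_VEC_CRYPTO in Linux's AT_HWCAP2 word. */
#define SKETCH_PPC_FEATURE2_VEC_CRYPTO 0x02000000UL

/* Hypothetical labels for the two code paths. */
typedef enum { AES_IMPL_GENERIC, AES_IMPL_VCIPHER } aes_impl;

/* Decide once, from the capability word, which code path to use. */
static aes_impl select_aes_impl(unsigned long hwcap2)
{
    return (hwcap2 & SKETCH_PPC_FEATURE2_VEC_CRYPTO) ? AES_IMPL_VCIPHER
                                                     : AES_IMPL_GENERIC;
}

#if defined(__linux__) && defined(__powerpc64__)
#include <sys/auxv.h>
/* On Linux/POWER the kernel reports CPU features through the aux vector. */
static unsigned long read_hwcap2(void) { return getauxval(AT_HWCAP2); }
#else
/* Elsewhere, report no crypto capability so the generic path is chosen. */
static unsigned long read_hwcap2(void) { return 0; }
#endif
```

A caller would evaluate select_aes_impl(read_hwcap2()) once at initialization and store the resulting function pointer, so the per-block code never re-tests the capability.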
> Regarding block encryption, my idea is to first provide an optimization
> for it and then add optimizations for the most common cipher modes
> (CBC, CTR and so on). That way, any cipher mode without a specific
> optimization can still get some performance improvement.
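To illustrate why a faster single-block function helps modes that have no dedicated code, here is a minimal generic CTR mode written against a block-function pointer. The typedef mirrors the shape of a 16-byte block callback but is our own; toy_block is a hypothetical XOR stand-in for testing and is not AES.

```c
#include <stddef.h>

/* Single-block encrypt callback: 16 bytes in, 16 bytes out. */
typedef void (*block128_fn)(const unsigned char in[16], unsigned char out[16],
                            const void *key);

/* Big-endian increment of the full 128-bit counter block. */
static void ctr128_inc(unsigned char ctr[16])
{
    for (int i = 15; i >= 0; i--) {
        if (++ctr[i] != 0)
            break; /* no carry into the next byte */
    }
}

/* Generic CTR: any faster `block` speeds this up with no mode-specific code. */
static void ctr_encrypt(const unsigned char *in, unsigned char *out, size_t len,
                        const void *key, unsigned char ctr[16], block128_fn block)
{
    unsigned char ks[16];
    while (len > 0) {
        size_t n = len < 16 ? len : 16;
        block(ctr, ks, key);            /* keystream = E(key, counter) */
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] ^ ks[i];
        ctr128_inc(ctr);
        in += n;
        out += n;
        len -= n;
    }
}

/* Hypothetical toy block function for testing only: XOR with a 16-byte key. */
static void toy_block(const unsigned char in[16], unsigned char out[16],
                      const void *key)
{
    const unsigned char *k = key;
    for (int i = 0; i < 16; i++)
        out[i] = in[i] ^ k[i];
}
```

The trade-off Andy raises later still applies: this loop calls the block function one block at a time, so it inherits the single-block latency bound, whereas a mode-specific routine can interleave several independent counter blocks.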
>
> > >>This patch series adds initial support for the new POWER8 cryptographic
> > >>instructions.
> > >>
> > >>Different versions of ppc_vcipher_AES_[en|de]crypt were tested and
> > >>no significant performance gains were found, even when using multiple
> > >>vector registers to load all sub-keys in advance.
> >
> > You naturally won't observe a difference in a single-block function.
>
> Yes, I agree. For single-block encryption the bottleneck is the
> vcipher instruction latency, and almost any implementation will perform
> similarly.
>
> > Because all instructions are high-latency and depend on each
> > other, there are a lot of "free slots" in which to execute all the
> > collateral instructions. While it's not self-evident that a gain from
> > pre-loading the key schedule can be observed in a single-threaded
> > benchmark, even in code with interleaved instructions in
> > parallelizable modes, there might be other factors to consider. The
> > POWER8 processor is SMT (right?), and it should be advantageous to
> > pre-load for stream operations, so that more memory bus
> > bandwidth is available to the other threads. Or it might be more
> > appropriate to use the "free slots" [which will be less numerous in
> > parallelizable modes] for other things, for example maintaining
> > counter values in CTR...
> >
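A toy cycle model makes the latency argument concrete. It assumes each block is a chain of `rounds` dependent instructions with latency `lat` cycles, and that the core issues at most one such instruction per cycle; real pipelines are more complex, and the latency figure used in the usage note is hypothetical, not a measured POWER8 number.

```c
/* Idealized model: with n independent blocks interleaved, each round step
 * costs max(lat, n) cycles for all n blocks together (dependency-bound when
 * n < lat, issue-bound when n >= lat). */
static double cycles_per_block(unsigned rounds, unsigned lat, unsigned n)
{
    unsigned step = lat > n ? lat : n;
    return (double)rounds * step / n;
}

/* Same figure expressed per byte of a 16-byte AES block. */
static double cycles_per_byte(unsigned rounds, unsigned lat, unsigned n)
{
    return cycles_per_block(rounds, lat, n) / 16.0;
}
```

With 10 rounds and a hypothetical 8-cycle latency, one block costs 80 cycles in this model regardless of how the key schedule is loaded, while 8 interleaved blocks approach 10 cycles per block (0.625 cycles/byte). That is why single-block variants all look alike, and why the idle cycles in the single-block case are "free slots".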
> > >>Because of that, the version
> > >>included in this series was chosen based on readability.
> >
> > Why not a folded loop, then?
>
> I was referring specifically to changing the order in which the keys are
> loaded. I don't see any problem with using a loop instead.
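For comparison, a folded loop walks the key schedule with a single loop body instead of unrolling every round. The sketch below only mirrors the structure (initial key XOR, full rounds, last round, loosely corresponding to vxor, vcipher and vcipherlast); toy_round and toy_last_round are hypothetical stand-ins on a 32-bit state and are not AES.

```c
#include <stdint.h>

/* Hypothetical stand-in for a full round (vcipher): rotate, then add key. */
static uint32_t toy_round(uint32_t s, uint32_t rk)
{
    s = (s << 7) | (s >> 25); /* placeholder for the AES round mixing */
    return s ^ rk;
}

/* Hypothetical stand-in for the final round (vcipherlast). */
static uint32_t toy_last_round(uint32_t s, uint32_t rk) { return s ^ rk; }

/* Folded loop: one body handles every middle round, so the same code
 * serves 10, 12 or 14 rounds via the `rounds` parameter. */
static uint32_t encrypt_folded(uint32_t s, const uint32_t *rk, unsigned rounds)
{
    s ^= rk[0];                           /* initial AddRoundKey */
    for (unsigned r = 1; r < rounds; r++)
        s = toy_round(s, rk[r]);
    return toy_last_round(s, rk[rounds]);
}

/* Fully unrolled equivalent for rounds == 3, for comparison. */
static uint32_t encrypt_unrolled3(uint32_t s, const uint32_t rk[4])
{
    s ^= rk[0];
    s = toy_round(s, rk[1]);
    s = toy_round(s, rk[2]);
    return toy_last_round(s, rk[3]);
}
```

In a single-block routine the loop overhead can sit in the same idle slots as the key loads, so folding costs little while shrinking code size.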
>
> >
> > >>The performance
> > >>gain is about 5x on non-final hardware.
> >
> > The more important question is what the theoretical asymptotic limit
> > is, how far we are from it, and how to get there. Well, the answer is
> > naturally mode-specific subroutines, but that doesn't change the point.
> > One should discuss absolute numbers too, not only relative improvement.
>
> I understand your point. And yes, only mode-specific routines will be
> able to get maximum performance out of it. I can post absolute numbers,
> but in the end any notion of improvement can only be obtained by
> comparing them with the results of the current assembly or C
> implementation, especially because this is not final hardware and the
> results will not reflect the performance of the final hardware.
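Absolute figures are usually reported as cycles per byte, which can be derived from a timed run and the core clock and then set against a theoretical bound. A minimal sketch, assuming the clock frequency is known and constant during the measurement; the function names and example numbers are illustrative, not results from this patch.

```c
/* Convert a timed run into cycles per byte, assuming a constant clock. */
static double measured_cpb(double bytes, double seconds, double clock_hz)
{
    return seconds * clock_hz / bytes;
}

/* Relative improvement between an old and a new cycles-per-byte figure. */
static double speedup(double old_cpb, double new_cpb)
{
    return old_cpb / new_cpb;
}
```

For example, 1e9 bytes processed in 2 s on a 3 GHz core is 6 cycles/byte. A "5x" speedup means something different depending on whether it takes 20 cpb to 4 cpb or 5 cpb to 1 cpb, which is why the absolute number and its distance from the bound matter.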
>
> >
> > >>The patch "perlasm/ppc-xlate.pl: vcipher instructions support" is not
> > >>necessary for newer versions of GCC, and I'd like to hear opinions on
> > >>whether it's worth including.
> >
> > Absolutely. And it applies to all new instructions. One can choose
> > to implement module-specific instructions in the module itself and
> > common ones in ppc-xlate, e.g. vcipher in the AES module and lxvd2x
> > in ppc-xlate.
>
> Ok.
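A translator fallback of this kind typically emits the raw instruction word (e.g. via `.long`) for assemblers that lack the mnemonic. Below is a C sketch of the VX-form encoding; the extended opcodes (1288, 1289, 1352, 1353) are our reading of Power ISA 2.07 and should be verified against the manual, and vx_form is a hypothetical helper, not a ppc-xlate.pl function.

```c
#include <stdint.h>

/* Extended opcodes as we understand them from Power ISA 2.07. */
enum {
    XO_VCIPHER = 1288,
    XO_VCIPHERLAST = 1289,
    XO_VNCIPHER = 1352,
    XO_VNCIPHERLAST = 1353
};

/* VX-form: primary opcode 4 in the top 6 bits, then 5-bit VRT, VRA, VRB,
 * then the 11-bit extended opcode in the low bits. */
static uint32_t vx_form(unsigned vrt, unsigned vra, unsigned vrb, unsigned xo)
{
    return (UINT32_C(4) << 26) | ((vrt & 31u) << 21) | ((vra & 31u) << 16)
         | ((vrb & 31u) << 11) | (xo & 0x7FFu);
}
```

Under these assumptions, `vcipher v0,v0,v1` encodes as 0x10000D08, which a fallback would write as `.long 0x10000D08`.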
>
> >
> > >>Feel free to ask me any questions regarding the code.
> >
> > Doesn't one need to take care of vrsave? If it's not required on
> > Linux, is it required elsewhere? [It was required on MacOS X].
>
> You are right. I think that Linux doesn't rely on VRSAVE right now, but
> it might be better to save and set it properly anyway. I will get more
> information on that and let you know.
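VRSAVE (SPR 256) is a 32-bit mask with big-endian bit numbering: bit 0, the most significant bit, marks v0 as live and bit 31 marks v31. A routine that honors it would save the old value, OR in the bits for the registers it clobbers, and restore the old value on exit. A minimal sketch of computing that mask; vrsave_mask is a hypothetical helper.

```c
#include <stdint.h>

/* Build the VRSAVE bits for the vector registers a routine clobbers.
 * Bit 0 (MSB) corresponds to v0, bit 31 (LSB) to v31. */
static uint32_t vrsave_mask(const unsigned *regs, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++)
        mask |= UINT32_C(0x80000000) >> (regs[i] & 31u);
    return mask;
}
```

A routine using v0 through v11 would therefore OR 0xFFF00000 into VRSAVE in its prologue.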
>
> >
> > Is the presented code endian-neutral? The manual doesn't discuss
> > endianness in the vcipher context, so I assume that the instruction's
> > operation does not depend on the current endianness. That would require
> > separate handling per endianness when loading data, I assume in
> > little-endian mode.
>
> I think that load and store operations will need some adjustment. I can
> include untested code for it, but I don't have access to a
> little-endian POWER8 environment yet.
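The adjustment can be modeled in plain C. The sketch assumes (to be checked against the ISA) that in little-endian mode lxvd2x loads each doubleword as a little-endian integer, so that in big-endian register byte numbering the bytes of each doubleword appear reversed; a vperm with a fixed mask then restores memory order. permute16 is a single-source simplification of vperm, and all names are hypothetical.

```c
/* Assumed model of lxvd2x in little-endian mode: doubleword 0 from
 * mem[0..7], doubleword 1 from mem[8..15], each byte-reversed when viewed
 * in big-endian register byte order. */
static void model_lxvd2x_le(const unsigned char mem[16], unsigned char reg[16])
{
    for (int i = 0; i < 8; i++) {
        reg[i] = mem[7 - i];
        reg[8 + i] = mem[15 - i];
    }
}

/* Single-source simplification of vperm: out[i] = in[perm[i]]. */
static void permute16(const unsigned char in[16], const unsigned char perm[16],
                      unsigned char out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[perm[i] & 15];
}

/* Permute mask that reverses the bytes inside each doubleword. */
static const unsigned char LE_FIX[16] = {7, 6, 5, 4, 3, 2, 1, 0,
                                         15, 14, 13, 12, 11, 10, 9, 8};
```

On a big-endian build the permute would be skipped entirely, so the fixup costs one extra instruction per load or store only in little-endian mode.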
>
> >
> > As for ld/stxvd2x for data: the manual "threatens" penalties on
> > cache-line and page boundaries, and it doesn't seem to actually
> > promise that it always works with byte alignment across page
> > boundaries. Yes, the OS surely handles it by serving the exception,
> > but we don't want that to happen. Wouldn't it be more appropriate to
> > stick to l/stvx? [See the just-committed vpaes-ppc.pl module for an
> > example.]
> >
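The lvx-based alternative relies on lvx ignoring the low 4 bits of the address, so an unaligned 16 bytes is assembled from two aligned loads and a permute (the classic lvsl + vperm idiom). A C model of that idiom follows; model_lvx and unaligned_load are hypothetical names, and offsets are taken relative to a buffer treated as 16-byte aligned.

```c
#include <stdint.h>
#include <string.h>

/* Model of lvx: the effective address is truncated to a 16-byte boundary. */
static void model_lvx(const unsigned char *base, uintptr_t off, unsigned char v[16])
{
    memcpy(v, base + (off & ~(uintptr_t)15), 16);
}

/* Two aligned loads, then select 16 bytes starting at (off & 15) from the
 * 32-byte concatenation lo||hi, as lvsl + vperm would. */
static void unaligned_load(const unsigned char *base, uintptr_t off, unsigned char out[16])
{
    unsigned char lo[16], hi[16];
    unsigned sh = (unsigned)(off & 15);
    model_lvx(base, off, lo);
    model_lvx(base, off + 15, hi); /* +15, not +16: never touches the next quadword when aligned */
    for (unsigned i = 0; i < 16; i++) {
        unsigned k = sh + i;
        out[i] = k < 16 ? lo[k] : hi[k - 16];
    }
}
```

Because each lvx touches only an aligned quadword, no single access can straddle a cache line or page, which is the property Andy is after. Using off + 15 for the second load also means an already aligned pointer never reads past its own quadword.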
> > As for page boundaries in ld/stxvd2x: the key schedule is aligned at
> > 64 bits (in e_aes.c), and this doesn't rule out the possibility of an
> > ld/stxvd2x crossing a page boundary. And if there is a penalty, it
> > might get costly [because of the recurring nature of references to the
> > key schedule]. Should one consider lvx even for the key schedule?
> >
>
> They work with unaligned data, but I will check whether there are any
> issues with page boundaries.
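The page-boundary concern can be checked arithmetically: a 16-byte access at an address that is only 8-byte aligned can straddle a boundary, while a 16-byte-aligned one cannot. A minimal sketch; the 4096-byte page and 128-byte cache line are illustrative parameters (Linux on ppc64 may well be configured with 64 KiB pages), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `len` bytes at `addr` span a power-of-two `boundary`
 * (e.g. a page or a cache line). */
static bool crosses(uintptr_t addr, size_t len, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr & (boundary - 1)) + len > boundary;
}

/* Round `addr` up to the next multiple of a power-of-two `align`. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For instance, a key schedule starting at offset 4088 in a 4096-byte page is 8-byte aligned, yet its first 16-byte load crosses into the next page; aligning the schedule to 16 bytes removes the case entirely.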
>
> I'd like to include those changes incrementally. What do you
> think? I think it would avoid wasting time submitting huge patches
> that might need to be completely redone.
>
> Regards,
> Marcelo
>
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> Development Mailing List                       openssl-dev@openssl.org
> Automated List Manager                           majord...@openssl.org
>
