Hi everyone,

To follow up on a previous mail, I have a few questions about the total bytes transferred by the following Vec operations (assuming vectors of length N and sequential mode):

VecTDot()
VecDot()
VecNorm()
VecScale()
VecSet()
VecAXPY()
VecAYPX()
VecWAXPY()
VecPointwiseMult()

1) For the first three operations, I am loading two vectors (only one for VecNorm), so that's 2*N*8 bytes transferred (N*8 for VecNorm). For storing, am I simply storing one scalar, or am I individually storing all N components as they are being summed up?

2) For the next five operations, which take a scalar argument, is that scalar loaded only once or N times?

3) Do any of the above operations "overlap" with or depend on one another? For instance, if my solver invokes VecTDot X times, does it also invoke, say, VecPointwiseMult X times?

This is all theoretical (i.e., assuming I am bypassing the cache and the write-allocate policy).

Thanks,
Justin
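P.S. To make questions 1 and 2 concrete, below is a rough sketch of the counting I have in mind. It is only a model under my own assumptions: 8-byte scalars, no cache and no write-allocate, each scalar argument loaded once and kept in a register, and each reduction (dot or norm) storing only its single result. Those assumptions are exactly what I'd like confirmed or corrected. The names TRAFFIC and bytes_moved are just illustrative and are not PETSc API.

```python
# Rough theoretical memory-traffic model for sequential Vec kernels.
# Assumptions (mine, not PETSc's): 8-byte scalars, no caching and no
# write-allocate, any scalar argument is loaded once and kept in a register,
# and a reduction (dot or norm) writes only its single result.

SCALAR_BYTES = 8

# Per operation: (vectors read, vectors written, scalars loaded, scalars stored)
TRAFFIC = {
    "VecTDot":          (2, 0, 0, 1),  # result = x^T y
    "VecDot":           (2, 0, 0, 1),  # result = y^H x
    "VecNorm":          (1, 0, 0, 1),  # only x is read
    "VecScale":         (1, 1, 1, 0),  # x = alpha*x
    "VecSet":           (0, 1, 1, 0),  # x[i] = alpha
    "VecAXPY":          (2, 1, 1, 0),  # y = alpha*x + y
    "VecAYPX":          (2, 1, 1, 0),  # y = x + beta*y
    "VecWAXPY":         (2, 1, 1, 0),  # w = alpha*x + y
    "VecPointwiseMult": (2, 1, 0, 0),  # w = x .* y
}


def bytes_moved(op: str, n: int) -> int:
    """Theoretical bytes moved by one call of `op` on length-n vectors."""
    reads, writes, scalars_in, scalars_out = TRAFFIC[op]
    return ((reads + writes) * n + scalars_in + scalars_out) * SCALAR_BYTES


if __name__ == "__main__":
    n = 1_000_000
    for op in TRAFFIC:
        print(f"{op:18s} {bytes_moved(op, n):>12,d} bytes")
```

Under this model VecTDot costs 16*N + 8 bytes; if the answer to question 1 is that all N partial sums are stored, the "scalars stored" column would become N instead of 1.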
