Oh, one last thing: timing can be dubious, as it is typically inferred
from clock cycles in the CPU, hence I would advise you to time it
yourself, for instance by doing:
date
...
date
It depends on the underlying timing function used.
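A minimal sketch of what I mean (the input/output file names are just placeholders for your own setup):

    date
    mpirun -np 4 siesta < input.fdf > out.np4
    date

The difference between the two date stamps then gives you the wall-clock time independently of siesta's internal timers.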
2015-11-03 14:03 GMT+01:00 Nick Papior <[email protected]>:
That is crazy, and it is _amazing_!
Nevertheless, I would still not recommend doing these kinds of things.
2015-11-03 13:37 GMT+01:00 RCP <[email protected]>:
Good morning!
Please have a look at the outcome of the crazy "mpirun -np 9 ..." exercise:
----------------------------------------------------------------------
* Running on  9 nodes in parallel
 ... snipped ...
siesta: iscf   Eharris(eV)      E_KS(eV)   FreeEng(eV)   dDmax  Ef(eV)
siesta:    1  -124261.2908  -124261.2891  -124261.2891  0.0001 -2.5494
timer: Routine,Calls,Time,% = IterSCF        1    1733.886  99.53
elaps: Routine,Calls,Wall,% = IterSCF        1     435.030  99.53
-----------------------------------------------------------------------
Amazing: the elaps row is pretty close to the 410.0 (or so) of my
previous posts.
However, yes, I do seem to have a misunderstanding about the inner
workings of the code (well, in a sense, this was my first question),
because the timing info at the end of the output file says "diagon"
takes only about 1/3 of the total time:
------------------------------------------------------------------
elaps: ELAPSED times:
elaps: Routine            Calls   Time/call   Tot.time        %
elaps: siesta                 1     712.814    712.814    100.00
...
elaps: diagon                 1     225.603    225.603     31.65
elaps: cdiag                  2      47.763     95.527     13.40
elaps: cdiag1                 2       0.920      1.840      0.26
elaps: cdiag2                 2       3.335      6.670      0.94
elaps: cdiag3                 2      42.961     85.922     12.05
elaps: cdiag4                 2       0.512      1.025      0.14
elaps: DHSCF4                 1      71.874     71.874     10.08
elaps: dfscf                  1      70.398     70.398      9.88
elaps: overfsm                1       0.269      0.269      0.04
elaps: optical                1       0.000      0.000      0.00
-------------------------------------------------------------------
Take care,
Roberto
On 11/03/2015 08:28 AM, Nick Papior wrote:
2015-11-03 12:10 GMT+01:00 RCP <[email protected]>:
  Hi,
  Thanks for your time and sharing of wisdom.
  In general terms I do agree with you, Nick, in the sense that
  running several sequential, independent tasks (wien2k)
  simultaneously is not equivalent to running a set of
  inter-communicating MPI tasks.
  However, here we are talking about a peculiar situation:
  parallelization over k-points is, essentially, an
  embarrassingly parallel problem, at least for my rather
  large cell (97 atoms). The sequential gathering of
  results from the different k-points, building the new charge
  density, and so on, should take negligible time compared
  to the time spent by a single task in diagonalizing a large
  matrix.
! ! NO ! ! ;)
Parallelization across k-points in siesta is NOT the same as an
embarrassingly parallel problem across k-points.
The _only_ thing in siesta that is parallelized embarrassingly is the
diagonalisation part (after having communicated all Hamiltonian elements
to all other nodes). Everything else is MPI parallelized: grid
operations, construction of the Hamiltonian, etc.!
Yes, even though the diagonalization is embarrassingly parallel and it
_should_ take the longest time, your assumption that the diagonalization
part is still the most time-consuming turns out to be wrong.
Furthermore, 96 atoms ~ 1000 orbitals, which is not that big a matrix
to diagonalize.
 Please look at the timing output for clarity on this.
  Of course, oversubscribing the CPUs must hurt performance
  at some point, and this is most likely worse for MPI tasks than
  for truly independent ones. But to me, 5 MPI tasks competing for
  4 cores does not look like such a terrible scenario.
MPI is not sequential programming, and any assumption you have about
oversubscribing is, put simply, wrong. The 5 MPI tasks do not just
compete for 4 cores; siesta's MPI tasks are linearly dependent on each
other (as written in the last mail) and hence they have to keep up with
each other all the time. If the MPI program were fully embarrassingly
parallel, then yes, you could perhaps have a point, but siesta is not
such a code.
How you can keep saying that oversubscribing cannot be that damaging
for performance (in fact improves it) is really baffling to me :)
  Moreover, np=5 and np=4 resulted in almost the same elapsed
  time. It is hard to believe that my expected time win for
  np=5 was (almost) exactly compensated by the performance loss.
Try doubling your system size and doing the same calculation.
  Nice discussion, guys. I'll do a little more research and let you
  know if something worthwhile comes out.
  Take care,
  Roberto
  On 11/02/2015 07:21 PM, Salvador Barraza-Lopez wrote:
    Could not be clearer, Nick.
    Ricardo, if you type top on your machine, you'll see two SIESTA
    processes competing for one core's time, each performing at 50%
    at most.
    Other cores will wait for these processes whenever an operation
    among all cores is necessary in the algorithm (e.g., a sum or a
    distributed matrix product)... thus those other cores just have
    to wait for the processes that compete for the same core's time
    to finish, degrading performance.
    -Salvador
    ------------------------------------------------------------------------
    *From:* [email protected] <[email protected]> on behalf of Nick Papior <[email protected]>
    *Sent:* Monday, November 2, 2015 4:08 PM
    *To:* [email protected]
    *Subject:* Re: [SIESTA-L] Puzzled about ParallelOverK feature
    2015-11-02 22:37 GMT+01:00 RCP <[email protected]>:
      Hi Nick,
      Please take my word: I'm not a computer guru, but I started
      using computers before the PC era :-).
      I know hyperthreading is evil for scientific calculations;
      the threads are even disabled in the BIOS. It is not that.
      Why I'm saying np=5 should take less time than np=4, even if
      my PC is a quad core, is as follows.
    This is a wrong statement!
    By this argument, anything that can be embarrassingly parallelized
    would take less than or equal time whenever you use as many
    processes as there are sequential divisions, regardless of the
    number of cores available.
      Distribution of k-points is round robin, and assume the k-points
      (the trimmed, real ones, not the M&K grid) take about the same
      time to process.
      Thus for np=4 I need 3 "time steps" to get the job done,
      namely (4 + 4 + 1) when seen from the k-points' perspective.
      On the other hand, for np=5 the time taken would be
      something like 2 * 1/0.80 = 2.5, or even shorter,
      1/0.80 + 1 = 2.25.
      What is flawed in this argument?
    Your flaw lies in using more processes than there are cores
    available; this has nothing to do with the number of k-points, and
    your figures are based on a sequential program governed by the OS,
    not a parallel program (from what I've gathered).
    You should try running a simple OpenMP program with
    OMP_NUM_THREADS=4 and 5 and see whether that also degrades
    performance; something along the lines of the sketch below.
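    A minimal sketch, assuming you have some OpenMP test binary at
    hand (./omp_test is just a placeholder, not a real program):

        export OMP_NUM_THREADS=4
        time ./omp_test      # matches the 4 physical cores
        export OMP_NUM_THREADS=5
        time ./omp_test      # oversubscribed: expect slower, not faster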
    Oversubscribing your CPU is _heavily_ hurting performance and,
    yes, oversubscribing can make your program run worse than using
    only as many processes as you have cores, especially when using
    MPI.
    By your argument you would get the same performance by doing
    mpirun -np 9, no? Try that and you will see that it gets slower
    and slower the more processes you throw at it.
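    If you want the numbers side by side, a minimal sketch of such a
    scan (input/output file names are placeholders for your own
    setup):

        for np in 4 5 9 ; do
          date
          mpirun -np $np siesta < input.fdf > out.np$np
          date
        done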
    MPI is not sequential and comparing the
execution of a parallel and
    sequential program is, at best, erroneous.
    The reason it runs _perfectly_ for your wien2k calculations (from
    what you say they are sequential programs) is that the processes
    there do NO communication with each other, meaning that each
    process can be halted/resumed at any time without notifying
    anything but the running process. With your wien2k np=5 the OS can
    pause and resume processes as it pleases with *relatively* little
    impact on the performance; there is some, but not that much. This
    is because each process is not dependent on the others and will
    try to finish its own work before moving on.
    With MPI (siesta) this is _very_ wrong. Most MPI programs are
    communication bound (i.e. not embarrassingly parallelized using
    MPI). The data is distributed and every process is dependent on
    the others; no process can progress without informing the other
    processes.
    This means 1) every processor does some work, 2) all processors
    communicate with each other, 3) repeat from step 1). Now do
    steps 1 to 3 a couple of million times and the OS becomes flooded
    with stops/resumes (basically; not in its entirety, but for
    brevity).
    Whenever you use MPI you should never use more processes than
    you have cores available.
    (https://www.open-mpi.org/faq/?category=running#oversubscribing)
    If you time your execution with timings of the MPI calls, you
    should most likely see immense increases in communication times,
    as the processes wait all the time; test this if you want clearer
    proof!
    Bottom line: never use more MPI processes than you have physical
    processors.
    If you still want more explanations, turn to the MPI developers
    for the technical details. All I can say is: never use more MPI
    processes than you have physical cores.
      Best regards,
      Roberto
      On 11/02/2015 05:50 PM, Nick Papior wrote:
        2015-11-02 21:37 GMT+01:00 RCP <[email protected]>:
          Thank you Nick and Salvador for your comments.
          So Nick, basically you're saying that diagonalization time
          might be playing no role. That is at variance, for instance,
          with Wien2k, where diagonalization is the most time-consuming
          step. In fact, my expectation is correct for it; verified
          with a similar cell and 9 k-points.
        No, I am definitely not saying that! But I have no idea about
        how your system is set up.
        Diagonalization _is_ a big part of the computation.
        How have you specified the k-points? Is it 9 k-points, or 9
        k-points in the Monkhorst-Pack grid?
          In that case "top" shows a first stage of 5 processes
          running at about 4/5 = 80% CPU power (and more or less
          stable) and a 2nd stage of 4 procs running at 100%. This is
          not MPI, but a parallel strategy based on scripts (hope you
          are aware).
        wien2k is not siesta.
        If wien2k is script based, i.e. sequential runs with
        self-managed processes, then sure, they behave _very_
        differently and wien2k should give you the desired speedup.
        Your figures sound like hyperthreading to me.
          The same experiment performed with "mpirun -np 5 ..." and
          Siesta shows more jumpy figures for CPU usage. One task might
          be at 100%, another at 60%, and so on, as if Linux were
          playing with the tasks like a juggler.
        You are still implying usage of a quad-core machine
        (quad == 4), and 4 < 5.
        If you _only_ have 4 processors (intel hyperthreads do _not_
        count as processors), then your assumption is not correct.
        How would you expect a speedup by using 1 more process than
        you have cores on your system?
        If you see this juggling, it sounds like quad == 4 and not 5.
          To give you some feeling, please look at the numbers here:
-----------------------------------------------------------------------
          * Running on  4 nodes in parallel
           ... snipped ...
          siesta: iscf   Eharris(eV)      E_KS(eV)   FreeEng(eV)   dDmax  Ef(eV)
          siesta:    1  -124261.2908  -124261.2891  -124261.2891  0.0001 -2.5494
          timer: Routine,Calls,Time,% = IterSCF        1    1637.906  99.72
          elaps: Routine,Calls,Wall,% = IterSCF        1     410.919  99.72

          * Running on  5 nodes in parallel
           ... snipped ...
          siesta: iscf   Eharris(eV)      E_KS(eV)   FreeEng(eV)   dDmax  Ef(eV)
          siesta:    1  -124261.2908  -124261.2891  -124261.2891  0.0001 -2.5494
          timer: Routine,Calls,Time,% = IterSCF        1    1654.558  99.64
          elaps: Routine,Calls,Wall,% = IterSCF        1     415.150  99.64
------------------------------------------------------------------------
          Those elapsed times are so close ... there must be an easy
          explanation.
        Yes, if you are using mpirun -np 5 on a quad-core machine,
        then the explanation is easy and your numbers are irrelevant.
          Best,
          Roberto
          On 11/02/2015 04:14 PM, Nick Papior wrote:
            Basically:
            Diag.ParallelOverK false
              uses scalapack to diagonalize the Hamiltonian
            Diag.ParallelOverK true
              uses lapack to diagonalize the Hamiltonian
            If you have a very large system, you will not get anything
            out of using the latter option (apart from using an
            enormous amount of memory). Only for an _extreme_ number of
            k-points is the latter favourable; there are exceptions.
            The latter is intended for small bulk calculations with
            many k-points.
            Lastly, you have a quad-core machine and run mpirun -np 5,
            and expect that to run faster. That is a wrong assumption.
            Secondly, diagonalization is not everything in the program;
            check your TIMES file to figure out whether it _is_ the
            diagonalization or a mixture.
            2015-11-02 19:42 GMT+01:00 RCP <[email protected]>:
              Dear everyone,
              I seem to have a misunderstanding about how the
              Diag.ParallelOverK feature works; any comment would be
              much appreciated.
              I've got a large metallic cell, though still with 9
              k-points, that runs on a quad PC; moreover, routine
              diagkp shows the k-points are distributed round robin
              among the processes. Thus I was expecting
              "mpirun -np 5 ..." to run significantly faster than
              "mpirun -np 4 ...", as judged from the elapsed time of
              the individual scf steps.
              Clearly, in the latter case, the 9th k-point would be
              taken by process 0 while the other three would remain
              waiting, right?
              However, my expectations turned out to be wrong; in fact
              the 2nd alternative appears to be a tiny bit faster.
              Why?
              Thanks in advance,
              Roberto P.
            --
            Kind regards Nick
        --
        Kind regards Nick
    --
    Kind regards Nick
  --
  |---------------------------------------------------------------------|
  |  Dr. Roberto C. Pasianot         Phone: 54 11 4839 6709             |
  |  Gcia. Materiales, CAC-CNEA      FAX  : 54 11 6772 7362             |
  |  Avda. Gral. Paz 1499            Email: [email protected]    |
  |  1650 San Martin, Buenos Aires                                      |
  |  ARGENTINA                                                          |
  |---------------------------------------------------------------------|
--
Kind regards Nick
--
Kind regards Nick
--
Kind regards Nick