Re: [Beowulf] immersion

2024-04-07 Thread Scott Atchley
On Sun, Mar 24, 2024 at 2:38 PM Michael DiDomenico wrote: > i'm curious if others think DLC might hit a power limit sooner or later, > like Air cooling already has, given chips keep climbing in watts. > What I am worried about is power per blade/node. The Cray EX design used in Frontier has a

Re: [Beowulf] immersion

2024-03-24 Thread Scott Atchley
olution like the HPE stuff > The ORv3 rack design's maximum power is the number of power shelves times the power per shelf. Reach out to me directly at @ ornl.gov and I can connect you with some vendors. > > > On Sun, Mar 24, 2024 at 1:46 PM Scott Atchley > wrote: > >> On Sat,

Re: [Beowulf] immersion

2024-03-24 Thread Scott Atchley
On Sat, Mar 23, 2024 at 10:40 AM Michael DiDomenico wrote: > i'm curious to know > > 1 how many servers per vat or U > 2 i saw a slide mention 1500w/sqft, can you break that number into kw per > vat? > 3 can you shed any light on the heat exchanger system? it looks like > there's just two pipes

Re: [Beowulf] [External] anyone have modern interconnect metrics?

2024-01-22 Thread Scott Atchley
On Mon, Jan 22, 2024 at 11:16 AM Prentice Bisbal wrote: > > >> > Another interesting topic is that nodes are becoming many-core - any >> > thoughts? >> >> Core counts are getting too high to be of use in HPC. High core-count >> processors sound great until you realize that all those cores are

Re: [Beowulf] [External] anyone have modern interconnect metrics?

2024-01-20 Thread Scott Atchley
On Fri, Jan 19, 2024 at 9:40 PM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > > Yes, someone is sure to say "don't try characterizing all that stuff - > > it's your application's performance that matters!" Alas, we're a generic > > "any kind of research computing" organization, so

Re: [Beowulf] [EXTERNAL] Re: anyone have modern interconnect metrics?

2024-01-18 Thread Scott Atchley
s you’d solder to a board. And > there are plenty of XAUI->optical kinds of interfaces. And optical cables > are cheap and relatively rugged. > > > > > > *From:* Beowulf *On Behalf Of *Scott Atchley > *Sent:* Wednesday, January 17, 2024 7:18 AM > *To:* Larry Stewart

Re: [Beowulf] anyone have modern interconnect metrics?

2024-01-17 Thread Scott Atchley
While I was at Myricom, the founder, Chuck Seitz, used to say that there was Ethernet and Ethernot. He tied Myricom's fate to Ethernet's 10G PHYs. On Wed, Jan 17, 2024 at 9:08 AM Larry Stewart wrote: > I don't know what the networking technology of the future will be like, > but it will be

Re: [Beowulf] anyone have modern interconnect metrics?

2024-01-17 Thread Scott Atchley
I don't think that UE networks are available yet. On Wed, Jan 17, 2024 at 3:13 AM Jan Wender via Beowulf wrote: > Hi Mark, hi all, > > The limitations of Ethernet seem to be recognised by many participants in > the network area. That is the reason for the founding of the Ultra-Ethernet >

Re: [Beowulf] Checkpointing MPI applications

2023-03-27 Thread Scott Atchley
On Thu, Mar 23, 2023 at 3:46 PM Christopher Samuel wrote: > On 2/19/23 10:26 am, Scott Atchley wrote: > > > We are looking at SCR for Frontier with the idea that users can store > > checkpoints on the node-local drives with replication to a buddy node. > > SCR will manage

Re: [Beowulf] Checkpointing MPI applications

2023-02-19 Thread Scott Atchley
Hi Chris, It looks like it tries to checkpoint application state without checkpointing the application or its libraries (including MPI). I am curious if the checkpoint sizes are similar to or significantly larger than the application's typical outputs/checkpoints. If they are much larger, the time to
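As a rough check of why checkpoint size matters, write time is roughly checkpoint size divided by the sustained file-system bandwidth the job sees; a minimal sketch with assumed, illustrative numbers only:

# Hedged illustration: checkpoint write time ~= size / sustained bandwidth.
# The size and bandwidth below are assumptions, not measurements.
ckpt_size_tb = 2.0                  # per-checkpoint size in TB (assumed)
fs_bandwidth_tb_per_s = 0.5         # sustained PFS bandwidth in TB/s (assumed)
print(ckpt_size_tb / fs_bandwidth_tb_per_s, "seconds per checkpoint")   # 4.0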

Re: [Beowulf] Top 5 reasons why mailing lists are better than Twitter

2022-11-21 Thread Scott Atchley
We have OpenMPI running on Frontier with libfabric. We are using HPE's CXI (Cray eXascale Interface) provider instead of RoCE though. On Sat, Nov 19, 2022 at 2:57 AM Matthew Wallis via Beowulf < beowulf@beowulf.org> wrote: > > > ;-) > > 1. Less spam. > 2. Private DMs, just email the person. >

Re: [Beowulf] likwid vs stream (after HPCG discussion)

2022-03-20 Thread Scott Atchley
On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky wrote: > If so, it turns out that for the HPC user, stream gives a more > important estimate - the application is translated by the compiler > (they do not write in assembler - except for modules from mathematical > libraries), and stream will

Re: [Beowulf] Data Destruction

2021-09-29 Thread Scott Atchley
> On 9/29/2021 10:06 AM, Scott Atchley wrote: > > Are you asking about selectively deleting data from a parallel file system > (PFS) or destroying drives after removal from the system either due to > failure or system decommissioning? > > For the latter, DOE does not allow us t

Re: [Beowulf] Data Destruction

2021-09-29 Thread Scott Atchley
Are you asking about selectively deleting data from a parallel file system (PFS) or destroying drives after removal from the system either due to failure or system decommissioning? For the latter, DOE does not allow us to send any non-volatile media offsite once it has had user data on it. When

Re: [Beowulf] AMD and AVX512

2021-06-16 Thread Scott Atchley
On Wed, Jun 16, 2021 at 1:15 PM Prentice Bisbal via Beowulf < beowulf@beowulf.org> wrote: > Did anyone else attend this webinar panel discussion with AMD hosted by > HPCWire yesterday? It was titled "AMD HPC Solutions: Enabling Your > Success in HPC" > >

Re: [Beowulf] Project Heron at the Sanger Institute [EXT]

2021-02-04 Thread Scott Atchley
On Thu, Feb 4, 2021 at 9:23 AM Jörg Saßmannshausen < sassy-w...@sassy.formativ.net> wrote: > One of the things I heard a few times is the use of GPUs for the analysis. > Is > that something you are doing as well? ORNL definitely is. We were the first to contribute cycles to the COVID-19 HPC

Re: [Beowulf] Julia on POWER9?

2020-10-16 Thread Scott Atchley
% hostname -f login1.summit.olcf.ornl.gov % module avail |& grep julia forge/19.0.4 ibm-wml-ce/1.6.1-1 julia/1.4.2 (E) ppt/2.4.0-beta2 (D) vampir/9.5.0 (D) [atchley@login1]~ % module avail julia

Re: [Beowulf] Best case performance of HPL on EPYC 7742 processor ...

2020-08-17 Thread Scott Atchley
I do not have any specific HPL hints. I would suggest setting the BIOS to NUMAs-Per-Socket to 4 (NSP-4). I would try running 16 processes, one per CCX - two per CCD, with an OpenMP depth of 4. Dell's HPC blog has a few articles on tuning Rome:
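For context, the 16-rank layout falls out of the 7742's topology (64 cores, 8 CCDs, two 4-core CCXs per CCD); a small sketch of that arithmetic, not a tuned run script:

# EPYC 7742 topology arithmetic behind "16 ranks x 4 OpenMP threads".
cores_per_socket = 64
ccds = 8
ccx_per_ccd = 2
cores_per_ccx = cores_per_socket // (ccds * ccx_per_ccd)   # 4 cores per CCX
mpi_ranks = ccds * ccx_per_ccd                              # 16 ranks: one per CCX, two per CCD
omp_threads_per_rank = cores_per_ccx                        # i.e. OMP_NUM_THREADS=4
assert mpi_ranks * omp_threads_per_rank == cores_per_socket
print(mpi_ranks, "ranks x", omp_threads_per_rank, "threads =", cores_per_socket, "cores")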

Re: [Beowulf] Power per area

2020-03-11 Thread Scott Atchley
1.035 in Perth :) > > Come and have a look at our Houston DC :) > > > > On Wed, Mar 11, 2020 at 3:37 AM Scott Atchley > wrote: > >> Hi everyone, >> >> I am wondering whether immersion cooling makes sense. We are most limited >> by datacenter floor spac

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
e to connect you with their founder/inventor. > > --Jeff > > On Tue, Mar 10, 2020 at 1:08 PM Scott Atchley > wrote: > >> Hi Jeff, >> >> Interesting, I have not seen this yet. >> >> Looking at their 52 kW rack's dimensions, it works out to 3.7 kW/ft^2
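The 3.7 kW/ft^2 figure is just rack power over footprint; a quick sketch, where the ~14 ft^2 footprint is inferred from the numbers quoted above rather than taken from any vendor data sheet:

# Power density = rack power / floor area. The footprint is back-calculated
# from the figures in the message, not a vendor specification.
rack_power_kw = 52.0
footprint_sqft = 14.0          # assumed/inferred: 52 kW / 3.7 kW/ft^2 ~= 14 ft^2
print(round(rack_power_kw / footprint_sqft, 1), "kW/ft^2")   # ~3.7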

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
e that helps a bit. > > Jörg > > Am Dienstag, 10. März 2020, 20:26:18 GMT schrieb David Mathog: > > On Tue, 10 Mar 2020 15:36:42 -0400 Scott Atchley wrote: > > > To make the exercise even more fun, what is the weight per square foot > > > for > > &g

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
On Tue, Mar 10, 2020 at 4:26 PM David Mathog wrote: > On Tue, 10 Mar 2020 15:36:42 -0400 Scott Atchley wrote: > > > To make the exercise even more fun, what is the weight per square foot > > for > > immersion systems? Our data centers have a limit of 250 or 500 pounds

Re: [Beowulf] Power per area

2020-03-10 Thread Scott Atchley
iego. > > https://ddcontrol.com/ > > --Jeff > > On Tue, Mar 10, 2020 at 12:37 PM Scott Atchley > wrote: > >> Hi everyone, >> >> I am wondering whether immersion cooling makes sense. We are most limited >> by datacenter floor space. We can manage to bring in mo

[Beowulf] Power per area

2020-03-10 Thread Scott Atchley
Hi everyone, I am wondering whether immersion cooling makes sense. We are most limited by datacenter floor space. We can manage to bring in more power (up to 40 MW for Frontier) and install more cooling towers (ditto), but we cannot simply add datacenter space. We have asked to build new building

Re: [Beowulf] Interactive vs batch, and schedulers

2020-01-17 Thread Scott Atchley
Hi Jim, While we allow both batch and interactive, the scheduler handles them the same. The scheduler uses queue time, node count, requested wall time, project id, and other factors to determine when jobs run. We have backfill turned on so that when the scheduler allocates a large job and the time to
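For readers unfamiliar with backfill: once the scheduler reserves a start time for the next large job, smaller jobs can run in the gap if their requested wall time fits before that reservation. A minimal sketch of the idea, not the scheduler actually used here:

# Minimal EASY-backfill sketch: start queued jobs that fit (in nodes and time)
# before the reservation held for the highest-priority large job.
def backfill(free_nodes, queue, now, large_job_start):
    started = []
    for job in queue:                      # each job: {"id", "nodes", "walltime"}
        fits_nodes = job["nodes"] <= free_nodes
        fits_time = now + job["walltime"] <= large_job_start
        if fits_nodes and fits_time:
            free_nodes -= job["nodes"]
            started.append(job["id"])
    return started

queue = [{"id": "a", "nodes": 10, "walltime": 3600},
         {"id": "b", "nodes": 500, "walltime": 600}]
print(backfill(free_nodes=100, queue=queue, now=0, large_job_start=7200))  # ['a']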

Re: [Beowulf] HPC demo

2020-01-14 Thread Scott Atchley
when I was on a project in Oak Ridge. Was that it? > > > > John McCulloch | PCPC Direct, Ltd. | desk 713-344-0923 > > > > *From:* Scott Atchley > *Sent:* Tuesday, January 14, 2020 7:19 AM > *To:* John McCulloch > *Cc:* beowulf@beowulf.org > *Subject:* Re: [Beowulf] HPC de

Re: [Beowulf] HPC demo

2020-01-14 Thread Scott Atchley
We still have Tiny Titan even though Titan is gone. It allows users to toggle processors on and off and the display has a mode where the "water" is color-coded by the processor, which has a corresponding light. You can see the frame rate go up as you add processors

Re: [Beowulf] traverse @ princeton

2019-10-11 Thread Scott Atchley
es of > v3, we get full EDR to both CPU sockets. > > Bill > > On 10/10/19 12:57 PM, Scott Atchley wrote: > > That is better than 80% peak, nice. > > > > Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third > rack? > > > > You went with

Re: [Beowulf] traverse @ princeton

2019-10-10 Thread Scott Atchley
That is better than 80% peak, nice. Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third rack? You went with a single-port HCA per socket and not the shared, dual-port HCA in the shared PCIe slot? On Thu, Oct 10, 2019 at 8:48 AM Bill Wichser wrote: > Thanks for the kind words.

[Beowulf] Exascale Day (10/18 aka 10^18)

2019-10-04 Thread Scott Atchley
Cray is hosting an online panel with speakers from ANL, LLNL, ORNL, ECP, and Cray on Oct. 18: https://www.cray.com/resources/exascale-day-panel-discussion ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your

Re: [Beowulf] [EXTERNAL] Re: HPE completes Cray acquisition

2019-09-27 Thread Scott Atchley
Cray: This one goes up to 10^18 On Fri, Sep 27, 2019 at 12:08 PM Christopher Samuel wrote: > On 9/27/19 7:40 AM, Lux, Jim (US 337K) via Beowulf wrote: > > > “A HPE company” seems sort of bloodless and corporate. I would kind of > > hope for something like “CRAY – How Fast Do You Want to Go?”

Re: [Beowulf] HPE completes Cray acquisition

2019-09-25 Thread Scott Atchley
These companies complement each other. HPE is working on some very cool technologies and their purchasing power should help reduce costs. Cray has experience with the leadership-scale systems, several generations of HPC interconnects, and optimizing scientific software. We are waiting to find out

Re: [Beowulf] Titan is no more

2019-08-05 Thread Scott Atchley
111_titan-ornl-olcf-activity-6563128357154762753-ce3m > > > On Sun, Aug 4, 2019 at 1:19 PM Scott Atchley > wrote: > >> Hi everyone, >> >> Titan completed it last job Friday and was powered down at 1 pm. I >> imagine the room was a lot quieter after that. &g

[Beowulf] Titan is no more

2019-08-04 Thread Scott Atchley
Hi everyone, Titan completed its last job Friday and was powered down at 1 pm. I imagine the room was a lot quieter after that. Once Titan and the other systems in the room are removed, work will begin on putting in the new, stronger floor that will hold Frontier. Scott

Re: [Beowulf] HPE to acquire Cray

2019-05-20 Thread Scott Atchley
Geez, I take one day of vacation and this happens. My phone was lit up all day. On Fri, May 17, 2019 at 1:20 AM Kilian Cavalotti < kilian.cavalotti.w...@gmail.com> wrote: > >

Re: [Beowulf] LFortran ... a REPL/Compiler for Fortran

2019-03-25 Thread Scott Atchley
Hmm, how does this compare to Flang? On Sun, Mar 24, 2019 at 12:33 PM Joe Landman wrote: > See https://docs.lfortran.org/ . Figured Jeff Layton would like this :D > > > -- > Joe Landman > e: joe.land...@gmail.com > t: @hpcjoe > w:

Re: [Beowulf] Large amounts of data to store and process

2019-03-13 Thread Scott Atchley
I agree with your take about slower progress on the hardware front and that software has to improve. DOE funds several vendors to do research to improve technologies that will hopefully benefit HPC, in particular, as well as the general market. I am reviewing a vendor's latest report on

Re: [Beowulf] Introduction and question

2019-02-23 Thread Scott Atchley
Yes, you belong. :-) On Sat, Feb 23, 2019 at 9:41 AM Will Dennis wrote: > Hi folks, > > > > I thought I’d give a brief introduction, and see if this list is a good > fit for my questions that I have about my HPC-“ish” infrastructure... > > > > I am a ~30yr sysadmin (“jack-of-all-trades” type),

Re: [Beowulf] Simulation for clusters performance

2019-01-04 Thread Scott Atchley
You may also want to look at Sandia's Structural Simulation Toolkit and Argonne's CODES. On Thu, Jan 3, 2019 at 6:26 PM Benson Muite wrote: > There are a number of tools. A possible starting point is: > >

Re: [Beowulf] If you can help ...

2018-11-09 Thread Scott Atchley
Done and I reposted your request on LinkedIn as well. On Fri, Nov 9, 2018 at 8:28 AM Douglas Eadline wrote: > > Everyone: > > This is a difficult email to write. For years we (Lara Kisielewska, > Tim Wilcox, Don Becker, myself, and many others) have organized > and staffed the Beowulf Bash each

Re: [Beowulf] If I were specifying a new custer...

2018-10-11 Thread Scott Atchley
What do your apps need? • Lots of memory? Perhaps Power9 or Naples with 8 memory channels? Also, Cavium ThunderX2. • More memory bandwidth? Same as above. • Max single thread performance? Intel or Power9? • Are your apps GPU enabled? If not, do you have budget/time/expertise to do the work?

Re: [Beowulf] New Spectre attacks - no software mitigation - what impact for HPC?

2018-07-17 Thread Scott Atchley
seems to have got mixed in there also! > The main thrust dual ARM based and RISC-V > > Also I like the plexiglass air shroud pictured at Barcelona. I saw > something similar at the HPE centre in Grenoble. > Damn good idea. > > > > > > > > On 17 July 201

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-10 Thread Scott Atchley
On Sun, Jun 10, 2018 at 4:53 AM, Chris Samuel wrote: > On Sunday, 10 June 2018 1:22:07 AM AEST Scott Atchley wrote: > > > Hi Chris, > > Hey Scott, > > > We have looked at this _a_ _lot_ on Titan: > > > > A Multi-faceted Approach to Job Placement for Im

Re: [Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

2018-06-09 Thread Scott Atchley
Hi Chris, We have looked at this _a_ _lot_ on Titan: A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems https://ieeexplore.ieee.org/document/7877165/ This issue we have is small jobs "inside" large jobs interfering with the larger jobs. The item that is

Re: [Beowulf] HPC Systems Engineer Positions

2018-06-01 Thread Scott Atchley
We have three HPC Systems Engineer positions open in the Technology Integration group within the National Center for Computational Science at ORNL. All are available from http://jobs.ornl.gov. On Fri, Jun 1, 2018 at 9:20 AM, Mahmood Sayed wrote: > Hello fellow HPC community. > > I have

Re: [Beowulf] Heterogeneity in a tiny (two-system cluster)?

2018-02-16 Thread Scott Atchley
If it is memory bandwidth limited, you may want to consider AMD's EPYC which has 33% more bandwidth. On Fri, Feb 16, 2018 at 3:41 AM, John Hearns via Beowulf < beowulf@beowulf.org> wrote: > Oh, and while you are at it. > DO a bit of investigation on how the FVCOM model is optimised for use with

Re: [Beowulf] Intel kills Knights Hill, Xeon Phi line "being revised"

2017-11-18 Thread Scott Atchley
ween labs is just too much nonsense. If these research > projects were a start-up, it would have failed hard. > > [1] https://en.wikipedia.org/wiki/X87 > > > > On Sat, Nov 18, 2017 at 8:50 PM, Scott Atchley <e.scott.atch...@gmail.com> > wrote: > >> Hmm, c

Re: [Beowulf] Intel kills Knights Hill, Xeon Phi line "being revised"

2017-11-18 Thread Scott Atchley
Hmm, can you name a large processor vendor who has not accepted US government research funding in the last five years? See DOE's FastForward, FastForward2, DesignForward, DesignForward2, and now PathForward. On Fri, Nov 17, 2017 at 9:18 PM, Jonathan Engwall < engwalljonathanther...@gmail.com>

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-13 Thread Scott Atchley
Are you logging something goes to the disk in the local case, but that is competing for network bandwidth when NFS mounting? On Wed, Sep 13, 2017 at 2:15 PM, Scott Atchley <e.scott.atch...@gmail.com> wrote: > Are you swapping? > > On Wed, Sep 13, 2017 at 2:14 PM, Andrew Latham <

Re: [Beowulf] Varying performance across identical cluster nodes.

2017-09-13 Thread Scott Atchley
Are you swapping? On Wed, Sep 13, 2017 at 2:14 PM, Andrew Latham wrote: > ack, so maybe validate you can reproduce with another nfs root. Maybe a > lab setup where a single server is serving nfs root to the node. If you > could reproduce in that way then it would give some

Re: [Beowulf] Poor bandwith from one compute node

2017-08-17 Thread Scott Atchley
I would agree that the bandwidth points at 1 GigE in this case. For IB/OPA cards running slower than expected, I would recommend ensuring that they are using the correct amount of PCIe lanes. On Thu, Aug 17, 2017 at 12:35 PM, Joe Landman wrote: > > > On 08/17/2017 12:00

Re: [Beowulf] Hyperthreading and 'OS jitter'

2017-07-22 Thread Scott Atchley
I would imagine the answer is "It depends". If the application uses the per-CPU caches effectively, then performance may drop when HT shares the cache between the two processes. We are looking at reserving a couple of cores per node on Summit to run system daemons if the user requests. If the user

Re: [Beowulf] Register article on Epyc

2017-06-22 Thread Scott Atchley
Hi Mark, I agree that these are slightly noticeable but they are far less than accessing a NIC on the "wrong" socket, etc. Scott On Thu, Jun 22, 2017 at 9:26 AM, Mark Hahn wrote: > But now, with 20+ core CPUs, does it still really make sense to have >> dual socket systems

Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Scott Atchley
In addition to storage, if you use GPUs for compute, the single socket is compelling. If you rely on the GPUs for the parallel processing, then the CPUs are just for serial acceleration and handling I/O. A single socket with 32 cores and 128 lanes of PCIe can handle up to eight GPUs with four CPU

Re: [Beowulf] Register article on Epyc

2017-06-21 Thread Scott Atchley
The single socket versions make sense for storage boxes that can use RDMA. You can have two EDR ports out the front using 16 lanes each. For the storage, you can have 32-64 lanes internally or out the back for NVMe. You even have enough lanes for two ports of HDR, when it is ready, and 48-64 lanes
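A quick lane-budget sketch for the storage-node case above, assuming the 128-lane single-socket EPYC total and x16 per EDR port as described:

# PCIe lane budget for a single-socket EPYC storage node (counts from the message above).
total_lanes = 128
front_end_lanes = 2 * 16    # two EDR HCA ports at x16 each
remaining = total_lanes - front_end_lanes
print(remaining, "lanes left for NVMe drives and future HDR ports")   # 96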

Re: [Beowulf] Suggestions to what DFS to use

2017-02-15 Thread Scott Atchley
Hi Chris, Check with me in about a year. After using Lustre for over 10 years to initially serve ~10 PB of disk and now serve 30+ PB with very nice DDN gear, later this year we will be installing 320 PB (250 PB useable) of GPFS (via IBM ESS storage units) to support Summit, our next gen HPC

Re: [Beowulf] genz interconnect?

2016-10-12 Thread Scott Atchley
The Gen-Z site looks like it has a detailed FAQ. The CCIX FAQ is a little more sparse. The ARM link you posted is a good overview. On Wed, Oct 12, 2016 at 8:11 AM, Michael Di Domenico wrote: > anyone have any info on this? there isn't much out there on the web. > the

Re: [Beowulf] AMD cards with integrated SSD slots

2016-07-27 Thread Scott Atchley
None have AMD CPUs? Number three Titan has AMD Interlagos CPUs and NVIDIA GPUs. Given that the Fiji can access HBM at 512 GB/s, accessing NVM at 4 GB/s will feel rather slow albeit much better than 1-2 GB/s connected to the motherboard's PCIe. On Wed, Jul 27, 2016 at 5:53 PM, Brian Oborn

Re: [Beowulf] NFS HPC survey results.

2016-07-22 Thread Scott Atchley
Did you mean IB over Ethernet (IBoE)? I thought IP over IB (IPoIB) has been around long before RoCE. On Thu, Jul 21, 2016 at 7:34 PM, Christopher Samuel wrote: > Thanks so much Bill, very much appreciated! > > On 21/07/16 09:19, Bill Broadley wrote: > > > 15) If IB what transport

[Beowulf] Anyone using Apache Mesos?

2015-11-11 Thread Scott Atchley
Someone asked me and I said I would ask around. Thanks, Scott ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Re: [Beowulf] Semour Cray 90th Anniversary

2015-10-14 Thread Scott Atchley
On Wed, Oct 14, 2015 at 3:58 PM, Prentice Bisbal < prentice.bis...@rutgers.edu> wrote: > On 10/03/2015 01:54 PM, Nathan Pimental wrote: > > Very nice article. Are cray computers still made, and how popular are > they? How pricey are they? :) > > > Yes, Argonne National Lab (ANL) announced in

Re: [Beowulf] Accelio

2015-08-20 Thread Scott Atchley
They are using this as a basis for the XioMessenger within Ceph to get RDMA support. On Thu, Aug 20, 2015 at 9:24 AM, John Hearns john.hea...@xma.co.uk wrote: I saw this mentioned on the Mellanox site. Has anyone come across it: http://www.accelio.org/ Looks interesting. Dr.

[Beowulf] China aims for 100 PF

2015-07-17 Thread Scott Atchley
They will use a homegrown GPDSP (general purpose DSP) accelerator in lieu of the Intel Knights Landing accelerators: http://www.theplatform.net/2015/07/15/china-intercepts-u-s-intel-restrictions-with-homegrown-supercomputer-chips/ Also, hints about their interconnect and file system upgrades.

Re: [Beowulf] Paper describing Google's queuing system Borg

2015-04-21 Thread Scott Atchley
Is Omega the successor? The Borg paper mentions Omega: Omega [69] supports multiple parallel, specialized “verticals” that are each roughly equivalent to a Borgmaster minus its persistent store and link shards. Omega schedulers use optimistic concurrency control to manipulate a shared repre-

Re: [Beowulf] interesting article on HPC vs evolution of 'big data' analysis

2015-04-09 Thread Scott Atchley
On Wed, Apr 8, 2015 at 9:56 PM, Greg Lindahl lind...@pbm.com wrote: On Wed, Apr 08, 2015 at 03:57:34PM -0400, Scott Atchley wrote: There is concern by some and outright declaration by others (including hardware vendors) that MPI will not scale to exascale due to issues like rank state

Re: [Beowulf] CephFS

2015-04-09 Thread Scott Atchley
No, but you might find this interesting: http://dl.acm.org/citation.cfm?id=2538562 On Thu, Apr 9, 2015 at 11:24 AM, Tom Harvill u...@harvill.net wrote: Hello, Question: is anyone on this list using CephFS in 'production'? If so, what are you using it for (ie. scratch/tmp, archive,

Re: [Beowulf] interesting article on HPC vs evolution of 'big data' analysis

2015-04-08 Thread Scott Atchley
There is concern by some and outright declaration by others (including hardware vendors) that MPI will not scale to exascale due to issues like rank state growing too large for 10-100 million endpoints, lack of reliability, etc. Those that make this claim then offer up their favorite solution (a
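To make the rank-state concern concrete, multiply per-peer state by the endpoint count; the 1 KB-per-peer figure below is purely an illustrative assumption, not a measured MPI number:

# Illustrative only: per-peer connection state assumed to be ~1 KB.
endpoints = 100_000_000             # 10^8 endpoints, per the message above
state_per_peer_bytes = 1024         # assumption for illustration
per_process_state_gb = endpoints * state_per_peer_bytes / 1e9
print(round(per_process_state_gb), "GB of peer state per process")   # ~102 GB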

Re: [Beowulf] Mellanox Multi-host

2015-03-11 Thread Scott Atchley
Looking at this and the above link: http://www.mellanox.com/page/press_release_item?id=1501 It seems that the OCP Yosemite is a motherboard that allows four compute cards to be plugged into it. The compute cards can even have different CPUs (x86, ARM, Power). The Yosemite board has the NIC and

[Beowulf] Summit

2014-11-14 Thread Scott Atchley
This is what's next: https://www.olcf.ornl.gov/summit/ Scott ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Re: [Beowulf] 10Gb/s iperf test point (TCP) available ?

2010-10-15 Thread Scott Atchley
On Oct 14, 2010, at 10:37 PM, Christopher Samuel wrote: Apologies if this is off topic, but I'm trying to check what speeds the login nodes to our cluster and BlueGene can talk at and the only 10Gb/s iperf server I've been given access to so far (run by AARNET) showed me just under 1Gb/s.

Re: [Beowulf] 48-port 10gig switches?

2010-09-02 Thread Scott Atchley
On Sep 2, 2010, at 12:58 PM, David Mathog wrote: A lot of 1 GbE switches use around 15W/port so I thought 10 GbE switches would be real fire breathers. It doesn't look that way though, the power consumption cited here:
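For scale, the per-port figure quoted above implies a sizeable per-switch draw; a trivial sketch:

# Per-switch power implied by ~15 W/port on a 48-port 1 GbE switch.
ports = 48
watts_per_port = 15
print(ports * watts_per_port, "W for a fully populated switch")   # 720 W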

Re: [Beowulf] OT: recoverable optical media archive format?

2010-06-10 Thread Scott Atchley
On Jun 10, 2010, at 3:20 PM, David Mathog wrote: Jesse Becker and others suggested: http://users.softlab.ntua.gr/~ttsiod/rsbep.html I tried it and it works, mostly, but definitely has some warts. To start with I gave it a negative control - a file so badly corrupted it should NOT

Re: [Beowulf] Q: IB message rate large core counts (per node)?

2010-02-24 Thread Scott Atchley
On Feb 23, 2010, at 6:16 PM, Brice Goglin wrote: Greg Lindahl wrote: now that I'm inventorying ignorance, I don't really understand why RDMA always seems to be presented as a big hardware issue. wouldn't it be pretty easy to define an eth or IP-level protocol to do remote puts, gets, even

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Scott Atchley
On Feb 20, 2010, at 1:49 PM, Paul Johnson wrote: What are the reasons to prefer one or the other? Why choose? You can install both and test with your application to see if there is a performance difference (be sure to keep your runtime environment paths correct - don't mix libraries and

Re: [Beowulf] Performance tuning for Jumbo Frames

2009-12-15 Thread Scott Atchley
On Dec 14, 2009, at 12:57 PM, Alex Chekholko wrote: Set it as high as you can; there is no downside except ensuring all your devices are set to handle that large unit size. Typically, if the device doesn't support jumbo frames, it just drops the jumbo frames silently, which can result in
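One common way to verify a path really carries jumbo frames is a don't-fragment probe sized to the MTU minus the IP and ICMP headers; the arithmetic for a 9000-byte MTU:

# Largest ICMP payload that fits a 9000-byte MTU without fragmentation.
mtu = 9000
ip_header = 20
icmp_header = 8
print(mtu - ip_header - icmp_header)   # 8972; e.g. "ping -M do -s 8972 <host>" on Linux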

Re: [Beowulf] Re: scalability

2009-12-10 Thread Scott Atchley
On Dec 10, 2009, at 9:56 AM, Jörg Saßmannshausen wrote: I have heard of Open-MX before, do you need special hardware for that? No, any Ethernet driver on Linux. http://open-mx.org Scott ___ Beowulf mailing list, Beowulf@beowulf.org sponsored by

Re: [Beowulf] mpd ..failed ..!

2009-11-16 Thread Scott Atchley
On Nov 14, 2009, at 7:24 AM, Zain elabedin hammade wrote: I installed mpich2 - 1.1.1-1.fc11.i586.rpm . You should ask this on the mpich list at: https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss I wrote on every machine : mpd mpdtrace -l You started stand-alone MPD rings of size

Re: [Beowulf] large scratch space on cluster

2009-09-29 Thread Scott Atchley
On Sep 29, 2009, at 10:09 AM, Jörg Saßmannshausen wrote: However, I was wondering whether it does make any sense to somehow 'export' that scratch space to other nodes (4 cores only). So, the idea behind that is, if I need a vast amount of scratch space, I could use the one in the 8 core

Re: [Beowulf] large scratch space on cluster

2009-09-29 Thread Scott Atchley
On Sep 29, 2009, at 1:13 PM, Scott Atchley wrote: On Sep 29, 2009, at 10:09 AM, Jörg Saßmannshausen wrote: However, I was wondering whether it does make any sense to somehow 'export' that scratch space to other nodes (4 cores only). So, the idea behind that is, if I need a vast amount

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 12:10 PM, Dave Love wrote: When I test Open-MX, I turn interrupt coalescing off. I run omx_pingpong to determine the lowest latency (LL). If the NIC's driver allows one to specify the interrupt value, I set it to LL-1. Right, and that's what I did before, with sensible

Re: [Beowulf] Re: typical latencies for gigabit ethernet

2009-06-29 Thread Scott Atchley
On Jun 29, 2009, at 1:44 PM, Scott Atchley wrote: Right, and that's what I did before, with sensible results I thought. Repeating it now on Centos 5.2 and OpenSuSE 10.3, it doesn't behave sensibly, and I don't know what's different from the previous SuSE results apart, probably, from the minor

Re: [Beowulf] 10 GbE

2009-02-11 Thread Scott Atchley
On Feb 11, 2009, at 7:57 AM, Igor Kozin wrote: Hello everyone, we are embarking on evaluation of 10 GbE for HPC and I was wondering if someone has already had experience with Arista 7148SX 48 port switch or/and Netxen cards. General pros and cons would be greatly appreciated and in

Re: [Beowulf] tcp error: Need ideas!

2009-01-25 Thread Scott Atchley
On Jan 25, 2009, at 10:13 AM, Gerry Creager wrote: -bash-3.2# ethtool -K rx off no offload settings changed You missed the interface here. You should try: -bash-3.2# ethtool -K eth1 rx off -bash-3.2# ethtool -k eth1 Offload parameters for eth1: rx-checksumming: on tx-checksumming: on

Re: [Beowulf] Odd SuperMicro power off issues

2008-12-08 Thread Scott Atchley
Hi Chris, We had a customer with Opterons experience reboots with nothing in the logs, etc. The only thing we saw with ipmitool sel list was: 1 | 11/13/2007 | 10:49:44 | System Firmware Error | We traced to a HyperTransport deadlock, which by default reboots the node. Our engineer

Re: [Beowulf] Security issues

2008-10-27 Thread Scott Atchley
On Oct 25, 2008, at 11:17 PM, Marian Marinov wrote: Also a good security addition will be adding SELinux, RSBAC or GRSecurity to the kernel and actually using any of these. Bear in mind, that there may be performance trade-offs. Enabling SELinux will cut 2 Gb/s off a 10 Gb/s link as

Re: [Beowulf] Has DDR IB gone the way of the Dodo?

2008-10-03 Thread Scott Atchley
On Oct 3, 2008, at 2:24 PM, Bill Broadley wrote: QDR over fiber should be reasonably priced, here's hoping that the days of Myrinet 250MB/sec optical cables will return. Corrections/comments welcome. I am not in sales and I have no access to pricing besides our list prices, but I am told

Re: [Beowulf] scratch File system for small cluster

2008-09-25 Thread Scott Atchley
On Sep 25, 2008, at 10:19 AM, Joe Landman wrote: We have measured NFSoverRDMA speeds (on SDR IB at that) at 460 MB/s, on an RDMA adapter reporting 750 MB/s (in a 4x PCIe slot, so ~860 MB/s max is what we should expect for this). Faster IB hardware should result in better performance,

Re: [Beowulf] Gigabit Ethernet and RDMA

2008-08-11 Thread Scott Atchley
Hi Gus, Are you trying to find software for NICs you currently have? Or are you looking for gigabit Ethernet NICs that natively support some form of kernel-bypass/zero-copy? I do not know of any of the latter (do Chelsio or others offer 1G NICs with iWarp?). As for the former, there

Re: [Beowulf] Roadrunner picture

2008-07-16 Thread Scott Atchley
On Jul 16, 2008, at 6:50 PM, John Hearns wrote: On Wed, 2008-07-16 at 23:29 +0100, John Hearns wrote: To answer your question more directly, Panasas is a storage cluster to complement your compute cluster. Each storage blade is connected into a shelf (chassis) with an internal ethernet

Re: [Beowulf] automount on high ports

2008-07-02 Thread Scott Atchley
On Jul 2, 2008, at 7:22 AM, Carsten Aulbert wrote: Bogdan Costescu wrote: Have you considered using a parallel file system ? We looked a bit into a few, but would love to get any input from anyone on that. What we found so far was not really convincing, e.g. glusterFS at that time was

Re: [Beowulf] automount on high ports

2008-07-02 Thread Scott Atchley
On Jul 2, 2008, at 10:09 AM, Gerry Creager wrote: Although I believe Lustre's robustness is very good these days, I do not believe that it will work in your setting. I think that they currently do not recommend mounting a client on a node that is also working as a server as you are

Re: [Beowulf] How Can Microsoft's HPC Server Succeed?

2008-04-03 Thread Scott Atchley
On Apr 3, 2008, at 3:52 PM, Kyle Spaans wrote: On Wed, Apr 2, 2008 at 7:39 PM, Chris Dagdigian [EMAIL PROTECTED] wrote: spew out a terabyte per day of raw data and many times that stuff needs to be post processed and distilled down into different forms. A nice little 8-core box running a

Re: [Beowulf] Cheap SDR IB

2008-01-30 Thread Scott Atchley
On Jan 30, 2008, at 6:20 PM, Gilad Shainer wrote: For BW, Lx provides ~1400MB/s, EX is ~1500MB/s and ConnectX is ~1900MB/s uni-directional on PCIe Gen2. Feel free to contact me directly for more info. Gilad. My god, IB bandwidths always confuse me. :-) I thought IB SDR was 10 Gb/s signal
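The usual source of confusion is signaling rate versus data rate: 4x IB links through QDR use 8b/10b encoding, so only 80% of the signaling rate carries data (the measured figures quoted above are lower still due to PCIe and protocol overheads). A small sketch of the nominal numbers:

# 4x InfiniBand signaling vs. data rates (8b/10b encoding through QDR).
rates_gbps = {"SDR": 10, "DDR": 20, "QDR": 40}   # signaling rate, Gb/s
for gen, signal in rates_gbps.items():
    data_gbps = signal * 8 / 10                   # 8b/10b: 80% carries data
    print(gen, data_gbps / 8, "GB/s per direction")   # SDR 1.0, DDR 2.0, QDR 4.0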

Re: [Beowulf] Really efficient MPIs??

2007-11-28 Thread Scott Atchley
On Nov 28, 2007, at 8:49 AM, Charlie Peck wrote: On Nov 28, 2007, at 8:04 AM, Jeffrey B. Layton wrote: Unless you are using a gigabit ethernet, Open-MPI is noticeably less efficient than LAM-MPI over that fabric. I suspect at some point in the future gige will catch-up but for now my

Re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread Scott Atchley
On Nov 26, 2007, at 3:27 PM, David Mathog wrote: I ran a little test over the Thanksgiving holiday to see how common random errors in nonECC memory are. I used the memtest86+ bit fade test mode, which writes all 1s, waits 90 minutes, checks the result, then does the same thing for all 0s.

Re: [Beowulf] Problem with Single RAID disk larger than 2TB and Linux

2007-10-03 Thread Scott Atchley
Is someone using a signed int to represent the 1 KB blocks? 2 * 1024 * 1024 * 1024 * 1024 = 2199023255552 Scott On Oct 3, 2007, at 7:29 AM, Anand Vaidya wrote: Dear Beowulfers, We ran into a problem with large disks which I suspect is fairly common, however the usual solutions are not
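The suspicion is easy to check: 2 TiB expressed in 1 KB blocks is exactly one more than the largest signed 32-bit integer. A quick sketch:

# 2 TiB in 1 KiB blocks vs. the signed 32-bit limit.
bytes_2tib = 2 * 1024**4           # 2,199,023,255,552 bytes
blocks_1k = bytes_2tib // 1024     # 2,147,483,648 blocks
INT32_MAX = 2**31 - 1              # 2,147,483,647
print(blocks_1k, blocks_1k > INT32_MAX)   # True: the block count overflows a signed int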

Re: [Beowulf] Passwordless ssh - strange problem

2007-09-14 Thread Scott Atchley
On Sep 14, 2007, at 1:14 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Checked that - it's 700. On original host (ssh from) and target (ssh to), tun: $ ls -al ~/.ssh also, try: $ ssh -vvv target Please post back with results and contents of /etc/ssh/*config. Scott

Re: [Beowulf] MPI2007 out - strange pop2 results?

2007-07-21 Thread Scott Atchley
at ISC, you did exactly the same as what my dear friends from Qlogic did... sorry, I could not resist... G -Original Message- From: Scott Atchley [mailto:[EMAIL PROTECTED] Sent: Friday, July 20, 2007 6:21 PM To: Gilad Shainer Cc: Kevin Ball; beowulf@beowulf.org Subject: Re: [Beowulf

Re: [Beowulf] MPI2007 out - strange pop2 results?

2007-07-20 Thread Scott Atchley
Gilad, And you would never compare your products against our deprecated drivers and five year old hardware. ;-) Sorry, couldn't resist. My colleagues are rolling their eyes... Scot On Jul 20, 2007, at 2:55 PM, Gilad Shainer wrote: Hi Kevin, I believe that your company is using this list

Re: [Beowulf] Performance characterising a HPC application

2007-03-21 Thread Scott Atchley
On Mar 21, 2007, at 12:05 AM, Mark Hahn wrote: if the net is a bandwidth bottleneck, then you'd see lots of back- to-back packets, adding up to near wire-speed. if latency is the issue, you'll see relatively long delays between request and response (in NFS, for instance). my real point is
