Re: hw addr
Now I am confused. Please correct me if I am wrong. When I create a guest LAN of QDIO type, VM can handle it for its guests; it doesn't need a real HiperSockets connection. Guests use a virtual NIC, so it can work even when there are no real OSA adapters. Of course there is at least one for VM itself, but the guest LAN does not use it. My qdio.o is 1.145, qeth.o is 1.337.4.5; those are the latest versions I know of. During guest IPL I get:

qdio: loading QDIO base support version 2 ($Revision: 1.145 $/$Revision: 1.66.4.1 $)
qdio: Was not able to determine general characteristics of all Hydras aboard.
qeth: loading qeth S/390 OSA-Express driver ($Revision: 1.337.4.5 $/$Revision: 1.113.4.1 $/$Revision: 1.42.4.1 $:IPv6:VLAN)
qeth: allocated 0 spare buffers
qeth: Trying to use card with devnos 0x700/0x701/0x702
qeth: Device 0x700/0x701/0x702 is an OSD Express card (level: 2938) with link type Gigabit Eth (no portname needed by interface)
qeth: VLAN not supported on eth0
qeth: IPv6 not supported on eth0
qeth: Could not set up broadcast filtering on eth0: 0x2, continuing
eth0: no IPv6 routers present

The guest LAN works with static addresses; only DHCP fails. Am I right to say it has nothing to do with real hardware errors, and that it is a software error either in VM or in the qdio/qeth drivers?

I don't have a VM background and I didn't get an answer on the VM list regarding my receive/apply status; maybe someone here can answer it (although these two mailing lists have a huge intersection :)). Can anybody explain this?

PTF: UM30652  APAR: VM63172
Receive status: RECEIVED.10/24/03.06:20:56.MAINT
Apply status: APPLIED.07/03/03.15:37:38.MAINT

Thank you
Marian

--- Dennis Musselwhite [EMAIL PROTECTED] wrote:
> Hi, APAR VM63172 (PTF UM30652) provided or improved support for some
> functions that the Linux device drivers needed to better support
> applications like DHCP. Our focus at that time was the QDIO model
> because it included the necessary broadcast support.
APAR VM63172 made it possible for the qeth driver to obtain the MAC address that we generated for the virtual NIC. I can imagine a couple of reasons why the HiperSockets connections might still show a MAC of zeroes:

(1) Older qeth drivers did not send the request for a MAC address to our simulated adapters because of IPv6 capabilities. Apparently the hardware added this support along with IPv6, so our lack of IPv6 capability in z/VM 4.3.0 implied that we also could not handle the request for a MAC address. The qeth developers released an update to fix this before VM63172 was released, so make sure you have recent qdio/qeth modules.

(2) The real HiperSockets facility DOES NOT provide a MAC address at all. It is entirely possible that qeth does not bother sending the request to a HiperSockets adapter (real or simulated).

If you define a TYPE QDIO adapter on the same guest and bring up the interface, the ifconfig command should report the same MAC address that you see with CP QUERY NIC DETAILS. That would indicate that your drivers are new enough to handle the request.

Regards, Dennis
Dennis Musselwhite ([EMAIL PROTECTED])
z/VM Development -- CP Network Simulation -- IBM Endicott NY

= === Marian Gasparovic ===
The mere thought hadn't even begun to speculate about the merest possibility of crossing my mind.
GateD for Linux/390 SuSE 7?
Hello list,

We've just moved to z/VM 4.4 and I'm having trouble with my SuSE7 Linux/390. VM's RouteD is having a problem with SuSE7's routed packets. We are running the VM RouteD using supply control RIP2B, but Linux is sending RIP1 packets. This is resulting in intermittent connections to the guest, and eventually VM deletes the route entirely. IBM wants us to see if we can get the Linux to send RIP2 packets, which I believe can only be done with gateD. Is there an rpm available for SuSE 7 Linux?

The connection will stay active if defined as a permanent route in ETC GATEWAYS in VM, with routeD not running at all on Linux. This solution is working for now, but I'd like to have routeD/gateD running on the Linux guest so I can remove the permanent entry in ETC GATEWAYS.

Any suggestions will be appreciated. Thanks!

David Booher, Systems Programmer
Development Support Manager
Quest Software
4320 Winfield Rd, Suite 500
Warrenville, IL 60555
630.836.3196
http://www.quest.com
Re: GateD for Linux/390 SuSE 7?
You might try Zebra or its replacement Quagga. This implements just about all the routing protocols, and it can be configured to handle as many or as few as you want.

David
Re: GateD for Linux/390 SuSE 7?
> IBM wants us to see if we can get the Linux to send RIP2 packets, which
> I believe only can be done with gateD. Is there an rpm available for
> SuSE 7 Linux?

Use Zebra instead of gated. It supports RIP2.

-- db
RH linux, qeth/qdio and such
I was able to load a more current version of the RH kernel, etc., and displays and response look much better. qeth/qdio appears to be in this kernel, and ifconfig displays eth0 with the data I entered, but I still can't get to the Linux guest. Admittedly, networking isn't my strong suit. I've talked to the network guys and they have helped me as much as they can. We both are confused by some of the terminology.

To define my network environment a little more: all of this is behind a firewall. We have a z800 with two OSA Express cards, therefore a total of 4 ports. Port 0 (card 1) has 10.140.1.22 as its IP addr; port 0 (card 2) has 10.140.1.24. When we first installed the box I tried to put both of these on card 1, port 0 and port 1 respectively. I never could get it to work. Come to find out (and I may have the terminology wrong), you can't have two IP addrs with the same subnet on the same card. When I moved 10.140.1.24 to card 2, port 0, my original network started to work. Right or wrong, it is working. However, if it is wrong, please tell me, as I don't want to run into problems later on.

To continue: I read somewhere that Linux should or must be installed in its own subnet. I seem to remember this from the early days but had forgotten it. Is this still true? Thus the IP addr of 10.140.2.x.

Back to the Linux config. I've listed below the prompts during network config when I start Linux:

Enter the IP address of your new Linux guest: 10.140.2.40
(No problem with this one.)

Enter the network address of the new Linux guest: 10.140.2.0
(I'm not 100% sure of this answer; from looking at countless examples, the 4th octet should be 0 (zero).)

Enter the netmask of the new Linux guest: 255.255.255.0
(No problem with this one.)

Enter the broadcast address for the new Linux guest: 10.140.2.255
(Not 100% sure about this one either; most examples code it this way, i.e., the 4th octet is 255.)

Enter the default gateway: 10.140.2.254
(This is where I get confused. a) Should it be 255 instead of 254? If so, is the reply to the previous question wrong? Or b) does it mean the default gateway for the rest of my network? In that case it would be 10.140.1.254, but the 254/255 question still lingers.)

Enter your DNS server(s), separated by colons ( : ): 162.133.1.19:162.133.1.22
(This one isn't clear to me, i.e., does it want the DNS server name or the IP addr? I assume IP addr(s). It doesn't fuss about it; however, I've never entered the DNS server name(s).)

Enter your DNS search domain(s) (if any), separated by colons ( : ): vm.llic.com
(I'm not sure about this one either; however, it has to be one of two replies (for us): vm.llic.com or llic.com. I have tried llic.com and it doesn't seem to make a difference.)

I've included a new console listing below. Again, TIA.

Steve Gentry
Lafayette Life Ins. Co.

console
Ready;
redhat
004 FILES PURGED
RDR FILE 0048 SENT FROM LNXRH01 PUN WAS 0048 RECS 039K CPY 001 A NOHOLD NOKEEP
RDR FILE 0049 SENT FROM LNXRH01 PUN WAS 0049 RECS 0001 CPY 001 A NOHOLD NOKEEP
RDR FILE 0050 SENT FROM LNXRH01 PUN WAS 0050 RECS 067K CPY 001 A NOHOLD NOKEEP
003 FILES CHANGED
003 FILES CHANGED
Linux version 2.4.21-1.1931.2.399.ent ([EMAIL PROTECTED]) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-16)) #1 SMP Wed Aug 20 15:22:21 EDT 2003
We are running under VM (31 bit mode)
This machine has no PFIX support
This machine has an IEEE fpu
On node 0 totalpages: 32768
zone(0): 32768 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/ram0 ro ip=off DASD=200-20F
Highest subchannel number detected (hex) : 0012
Calibrating delay loop...
607.84 BogoMIPS
Memory: 118480k/131072k available (2062k kernel code, 0k reserved, 547k data, 316k init)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 8192 (order: 3, 32768 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
debug: Initialization complete
POSIX conformance testing by UNIFIX
Detected 1 CPU's
Boot cpu address 0
cpu 0 phys_idx=0 vers=FF ident=02107A machine=2066 unused=
Starting migration thread for cpu 0
init_mach : starting machine check handler
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
mach_handler : ready
mach_handler : waiting for wakeup
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 8192
aio_setup: sizeof(struct page) = 52
pty: 2048 Unix98 ptys configured
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 16384K size 1024 blocksize
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
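[Editor's note] The network, broadcast, and gateway questions in the post above are all derivable from the IP address and netmask. A quick sketch using Python's ipaddress module; the 10.140.2.x values are the ones from the post, nothing here is specific to s390:

```python
import ipaddress

# The installer's answers, taken from the post above.
iface = ipaddress.ip_interface("10.140.2.40/255.255.255.0")

net = iface.network
print(net.network_address)    # 10.140.2.0   -> the "network address" answer
print(net.broadcast_address)  # 10.140.2.255 -> the "broadcast address" answer

# A default gateway must be an ordinary host in the same subnet, so
# .255 (the broadcast address) can never be a gateway; .254 can be.
gw = ipaddress.ip_address("10.140.2.254")
print(gw in net.hosts())      # True
print(ipaddress.ip_address("10.140.2.255") in net.hosts())  # False
```

So the answer to the 254-vs-255 question is that .255 is reserved for broadcast and cannot be a gateway at all.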
Re: RH linux, qeth/qdio and such
On Tue, 2003-10-28 at 09:56, Steve Gentry wrote:
> I was able to load a more current version of the RH kernel, etc.
> Enter the default gateway: 10.140.2.254 (This is where I get confused.
> a) should it be 255 instead of 254? If so, is the reply to the previous
> question wrong? or b) does it mean the default gateway for the rest of
> my network. In this case it would be 10.140.1.254 (but the 254/255
> question still lingers).

If the OSA is dedicated (that is, you're not running in a guest LAN), then you should have 10.140.1.22 or .24 as the address and the same netmask, etc. If this *is* a guest LAN and VM is running one of those addresses, then your default gateway becomes whatever VM's interface on that guest LAN is. So you probably need to give the VM stack an interface coupled to the guest LAN at 10.140.2.1 or something, and make *that* address your gateway.

What's happening now is that Linux sees a 24-bit subnet, but there's nothing else on it, hence it can't get outside of it, because it doesn't have a router to go through.

Adam
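[Editor's note] Adam's point can be checked mechanically: a default gateway is only usable if it sits inside a subnet the host has an interface on. A minimal sketch using the addresses from this thread (the 10.140.2.1 VM-stack address is Adam's hypothetical, not something configured anywhere yet):

```python
import ipaddress

# The guest's only interface subnet (from the thread): 10.140.2.40/24
linux_net = ipaddress.ip_network("10.140.2.0/24")

# A gateway must be directly reachable, i.e. on-link in that subnet.
real_lan_router = ipaddress.ip_address("10.140.1.254")   # router on the real LAN
guest_lan_router = ipaddress.ip_address("10.140.2.1")    # hypothetical VM stack address

print(real_lan_router in linux_net)   # False: Linux cannot ARP for it
print(guest_lan_router in linux_net)  # True: usable as the default gateway
```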
Re: RH linux, qeth/qdio and such
On Tuesday, 10/28/2003 at 10:56 EST, Steve Gentry [EMAIL PROTECTED] wrote:
> I was able to load a more current version of the RH kernel, etc. and
> displays and response look much better. qeth/qdio appears to be in this
> kernel, IFCONFIG diplays eth0 with the data I entered, but I still
> can't get to the Linux guest. Admittedly, networking isn't my strong
> suit. I've talked to the network guys and they have helped me as much
> as they can. We both are confused by some of the terminology. We have a
> z800 with two osa express cards, therefore a total of 4 ports. Port
> 0(card 1) has 10.140.1.22 as it's ip addr Port 0(card 2) has
> 10.140.1.24. When we first installed the box I tried to put both of
> these on card 1, port 0 and port 1 respectively.

Terminology: Both ports on a single OSA Express card are port 0, because there are actually two OSA devices (chpids) in each card, each with a single port (0). There is an OSA-2 (old style) combo ethernet/token ring card that has two ports on a single chpid, which required you to worry about ports 0 and 1.

A real, live drawing of the network you are trying to build is always the best place to start. Seeing it in picture form clarifies what you are trying to do and will lead you to the Right Answer.

> I never could get it to work. Well come to find out and I may have the
> terminology wrong, you can't have two ip addrs with the same subnet on
> the same card.

Yes, you can.

> When I moved 10.140.1.24 to card 2, port 0, my original network started
> to work. Right or wrong it is working. However, if it is wrong, please
> tell me as I don't want to run into problems later on.

I suspect issues in the switch or cabling.

> To continue, I read some where that Linux should or must be installed
> in it's own subnet. I seem to remember this from the early days but had
> forgotten it. Is this still true? Thus the ip addr of 10.140.2.x

If you are connecting Linux to a guest LAN on z/VM, then yes, it needs to be in its own subnet (unless you're using the z/VM 4.4 virtual switch). If you are giving real OSA subchannels to the Linux guest, then it must have IP addresses in the real LAN subnet.

> Back to the linux config: I've listed below the prompts during network
> config when I start linux.
> Enter the IP address of you new Linux guest: 10.140.2.40 (No problem
> with this one)

Wrong subnet, I think, based on the above.

> Enter the network address of the new Linux guest: 10.140.2.0 (I'm not
> 100% sure of this answer, from looking at countless examples, the 4th
> octet should be 0(zero).)

Correct. But you can just press ENTER on this question and it will select the right value based on the netmask.

> Enter the netmask of the new Linux guest: 255.255.255.0 (No problem
> with this one)
> Enter the broadcast address for the new Linux guest: 10.140.2.255 (Not
> 100% about this one either, most examples code it this way i.e., the
> 4th octet is 255)

Correct. Again, ENTER will give you a good default.

> Enter the default gateway: 10.140.2.254 (This is where I get confused.
> a) should it be 255 instead of 254? If so, is the reply to the previous
> question wrong? or b) does it mean the default gateway for the rest of
> my network. In this case it would be 10.140.1.254 (but the 254/255
> question still lingers).

It is the IP address of the router on the 10.140.2 subnet. But based on the above discussion, you don't really have a .2 subnet, so there is no router. If you had assigned an IP address to Linux in the .1 subnet, then 10.140.1.254 would be the correct answer.

> Enter your DNS server(s), seperated by colons( : ): 162.133.1.19:162.133.1.22
> (This one isn't clear to me, i.e., does it want the DNS server name or
> the ip addr. I assume ip addr(s). It doesn't fuss about it. However,
> I've never entered the DNS server name(s)).

IP addresses. If you specified names, it couldn't resolve them into IP addresses without knowing where the DNS servers are!

> Enter your DNS search domain(s) (if any), seperated by colons ( : ):
> vm.llic.com (I'm not sure about this one either, however it has to be
> one of two replies (for us) vm.llic.com or llic.com. I have tried
> llic.com and it doesn't seem to make a difference)

This is just the list of domain names Linux will append to any host name you use if you don't provide one. E.g., foohost will be treated as foohost.vm.llic.com, but barhost.ibm.com will not be affected. You're right, though, it won't have any effect on connectivity.

Alan Altmark
Sr. Software Engineer
IBM z/VM Development
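[Editor's note] Alan's description of the search list can be mimicked in a few lines. This is a toy model of the resolver behavior (simplified: any dot makes a name "qualified"; real resolvers use an ndots threshold), using the domain names from the post:

```python
def search_candidates(host, search=("vm.llic.com",)):
    """Mimic how a stub resolver expands unqualified host names
    using the search list from /etc/resolv.conf (simplified)."""
    if "." in host:
        return [host]                       # already qualified: used as-is
    return [f"{host}.{d}" for d in search]  # try each search domain

print(search_candidates("foohost"))          # ['foohost.vm.llic.com']
print(search_candidates("barhost.ibm.com"))  # ['barhost.ibm.com']
```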
Re: GateD for Linux/390 SuSE 7?
On Tuesday, 10/28/2003 at 08:59 CST, David Booher [EMAIL PROTECTED] wrote:
> Hello list, We've just moved to z/VM 4.4 and I'm having trouble with my
> SuSE7 Linux/390. VM's RouteD is having a problem with SuSE7's RouteD
> packets. We are running the VM RouteD using supply control RIP2B, but
> Linux is sending RIP1 packets. This is resulting in intermittent
> connections to the guest and eventually VM deletes the route entirely.
> Any suggestions will be apprectiated. Thanks!

Since you're on 4.4, you're likely using Guest LANs. That means only the virtual routers need to be running dynamic routing, as leaf nodes in a network do not require dynamic routing protocols. (They just send to their default gateway all the time.) But you didn't supply a picture of your network, so it's not possible to say for sure.

BTW, you should consider moving to MPROUTE when you have a few minutes to spare; it has a future, RouteD does not. (MPROUTE supports RIP1, RIP2, and OSPF.)

I am surprised you are using RIP2B instead of RIP2M. Do you not have multicast-capable hardware?

Alan Altmark
Sr. Software Engineer
IBM z/VM Development
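[Editor's note] If the Linux side does end up running Zebra/Quagga as suggested elsewhere in this thread, a minimal ripd.conf along these lines should make the guest send RIP version 2 updates. The interface name and the broadcast option are assumptions to match the VM stack's RIP2B setting; check your own values:

```
! /etc/quagga/ripd.conf -- hypothetical sketch, not a tested config
router rip
 version 2
 network eth0
!
! If the peer only accepts broadcast RIP-2 (RIP2B) rather than
! multicast (RIP2M), force v2 broadcast on the interface:
interface eth0
 ip rip v2-broadcast
```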
Perpetuating Myths about the zSeries
When Linux started on the S/390 over 3 years ago, a lot of work was done to see what Linux on the mainframe was good for. But that was with lower levels of Linux (2.2.16, 2.4.7) and slower machines (MP2000, MP3000, G5, G6). Now that T-Rex is GA, has anyone gone back and re-examined the mythology?

- Is T-Rex capable of a wider range of applications, with more CPU-intensive workloads?
- How well do the current kernels (2.4.21 - RHEL3 or SLES8 SP3) scale, both in CPU workloads and I/O workloads?
- One of the presentations I noticed from IBM Germany indicated that a few Linux images were better than either many images or a single one. It turns out the single Linux was limited by memory. How does 64-bit affect Linux performance if given a lot of memory?

The reason I speculate is our old friend, bogomips. For various S/390-zSeries processors, the numbers run something like this (SLES8 SP2, 2.4.19 kernel, except for the MP2000 at SLES7):

MP2000 - less than 200 bogomips
9672-ZZ7 (G6) - 630 bogomips
2064-116 (z1) - 820 bogomips
2084-B16 (T-Rex GA1) - 2400 bogomips!

The speed of the top-of-the-line zSeries has increased fourfold in the last 3-4 years. It seems that the literature is lagging what is now available in the field. Is zSeries more competitive now against other platforms than it was four years ago? (Caution: my 1749 MHz Intel registers 3538 bogomips, for whatever that is worth.)

=
Jim Sibley
Implementor of Linux on zSeries in the beautiful Silicon Valley

"Computers are useless. They can only give you answers." - Pablo Picasso
bogomips and vm guest performance
I've noticed that if I start 10 or 12 VM Linux guests at the same time, after they are up the bogomips vary by almost an order of magnitude (one guest may have 900 bogomips, another 100). Since bogomips are used for timing purposes within Linux, does the difference affect the relative performance of the various guests? (If you stagger the start, in my case with 1 second between XAUTOLOGs, the bogomips between guests are closer.)

=
Jim Sibley
Implementor of Linux on zSeries in the beautiful Silicon Valley
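[Editor's note] BogoMIPS is just the result of a timed busy-loop calibration at IPL. A Python sketch of the idea (illustrative only, not the kernel's actual calibrate_delay, which counts loops per timer tick): if CP steals the virtual CPU during the measurement window, which is exactly what happens when a dozen guests IPL at once, fewer iterations fit and the figure comes out low.

```python
import time

def calibrate(window=0.1):
    """Count how many trivial loop iterations fit in a wall-clock
    window. If the (virtual) CPU is preempted during the window,
    the count drops: the guest isn't slower per instruction, it
    just happened to measure during contention."""
    count = 0
    deadline = time.perf_counter() + window
    while time.perf_counter() < deadline:
        count += 1
    return count

print(calibrate())  # varies run to run with system load; that's the point
```

This is why staggered XAUTOLOGs yield more uniform numbers: each guest calibrates while the others are idle.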
Re: Perpetuating Myths about the zSeries
On Tuesday, 10/28/2003 at 08:57 PST, Jim Sibley [EMAIL PROTECTED] wrote:
> mp2000 - less than 200 bogomips
> 9672-zz7 (g6) 630 bogomips
> 2064-116 (z1) 820 bogomips
> 2084-b16 (Trexx GA1) 2400 bogomips!
> The speed of the top of the line zSeries has increased at four fold in
> the last 3-4 years. It seems that the literature is lagging what is now
> available in the field. Is zSeries more competitive now against other
> platforms than it was four years ago?

Why should anyone give a rat's behind about bogomips numbers? A four-fold increase in bogomips says only that bogomips runs 4 times as fast as it used to. Your question about comparisons of competitiveness is interesting, but not in the context of bogomips. I would ask whether TCO has improved in the last 3-4 years. The CPU selection is, of course, only one variable in the equation.

Alan Altmark
Sr. Software Engineer
IBM z/VM Development
Re: Perpetuating Myths about the zSeries
On Tue, 2003-10-28 at 10:57, Jim Sibley wrote:
> The speed of the top of the line zSeries has increased at four fold in
> the last 3-4 years.

I'd be amazed if Intel hasn't done at least this well too.

Adam
Re: hw addr
On Tuesday, 10/28/2003 at 12:26 PST, Marian Gasparovic [EMAIL PROTECTED] wrote:
> I don't have VM background and I didn't get answer on VM list regarding
> my receive/apply, maybe someone here would answer it. Can anybody
> explain this ?
> PTF: UM30652  APAR: VM63172
> Receive status: RECEIVED.10/24/03.06:20:56.MAINT
> Apply status: APPLIED.07/03/03.15:37:38.MAINT

This is a characteristic of the RSU process. We ship an apply table (SRVAPPS) on the RSU, but we do not ship a receive table (SRVRECS). That causes the date mismatch: the SRVRECS gets recreated on the date you apply the RSU, while the SRVAPPS comes from the tape. (The exception is reach-ahead service: PTFs customers have on their system that are not on the RSU will get reapplied.) It looks like you put on an RSU on 10/24 and it included UM30652. The date you list for the apply status does match what is in the RSU SRVAPPS for that PTF.

Alan Altmark
Sr. Software Engineer
IBM z/VM Development
Re: GateD for Linux/390 SuSE 7?
> Since you're on 4.4, you likely using Guest LANs. That means only the
> virtual routers need to be running dynamic routing, as leaf nodes in a
> network do not require dynamic routing protocols.

Unless they're multi-homed and/or need to adapt to routing failures or maintenance on their default-route destination.
Re: bogomips and vm guest performance
Bogomips are used for micro-timings in some device drivers; I don't think they are used for any other purpose. Mostly this is for devices with funny timing characteristics, such as not accepting a second command for a short period after getting a first command. The channel architecture of the mainframe isolates drivers from this kind of thing, so bogomips should not be used on the mainframe.

-----Original Message-----
From: Jim Sibley [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 28, 2003 9:01 AM
To: [EMAIL PROTECTED]
Subject: bogomips and vm guest performance
Re: Perpetuating Myths about the zSeries
On Tue, Oct 28, 2003 at 11:21:23AM -0600, Adam Thornton wrote:
| On Tue, 2003-10-28 at 10:57, Jim Sibley wrote:
| > The speed of the top of the line zSeries has increased at four fold
| > in the last 3-4 years.
|
| I'd be amazed if Intel hasn't done at least this well too.

It probably has. But CPU power isn't the whole story, either. I'd ask about how fast a machine built around an Intel/AMD CPU can deal with multiple devices concurrently transferring data for read or write I/O operations. If you need sheer computation power, zSeries is probably not right for you (how about PPC?). But if you need a large, high-traffic, high-uptime database, you don't really want the kinds of machines typically built around Intel CPUs.

--
| Phil Howard KA9WGN        | http://linuxhomepage.com/  http://ham.org/      |
| (first name) at ipal.net  | http://phil.ipal.org/  http://ka9wgn.ham.org/   |
Re: Perpetuating Myths about the zSeries
Ignoring BogoMIPS arguments for the time being and returning to what I think Jim was really asking: our original recommendations as to what types of workloads were good matches for the 390 architecture were based on the G5/G6 boxes. Now that we have the z990, with its enhanced instruction pipelining, improved floating-point performance, and faster cycle speed, do we need to revisit our assumptions about what workloads are good candidates?

I don't have a z990 (drats), so I can't answer this question. Actuarial calculations are probably still not good choices for zSeries, but what of the others we initially discarded?
Re: Perpetuating Myths about the zSeries
I have heard the story line, "If you have high transaction volume, then you don't want Big Blue iron." Well, my question then is: what is a transaction? Is it a computation? Prime number generation? A high-volume website? Or a large database with a TByte of data running 1000s of SQL statements in a brief moment of time, for example a web application adding users to a SecureWay LDAP (back-end is DB2)? So what is a transaction? When is an instruction not a transaction? And if everything is in some way a transaction, what exactly is Linux on the MF good for, other than supporting a virtual environment of previously installed and largely under-utilized distributed systems?

Would I want to run 100 systems in a given z/VM, each with some number of JVMs, say WebSphere? We are talking about putting 6 JVMs onto a single Linux guest. I am looking forward to this, as during our POC (proof of concept) we never tested with more than 1, maybe 2.

On the topic of availability, I am not sure I buy that the MF is better than 80x86 or Intel/AMD 64 server hardware. Today, everything is redundant and everything is hot-swappable. What is different is that a new or replacement stick of memory for the MF could cost you an automobile, and you won't find that stick of memory at the local computer store.

Thoughts???

Eric Sammons
(804)697-3925
FRIT - Infrastructure Engineering

Phil Howard wrote:
> It probably has. But CPU power isn't the whole story, either. I'd ask
> about how fast a machine built around an Intel/AMD CPU can deal with
> multiple devices concurrently transferring data for read or write I/O
> operations. If you need sheer computation power, zSeries is probably
> not right for you (how about PPC?). But if you need a large, high
> traffic, high uptime, database, you don't really want the kinds of
> machines typically built around Intel CPUs.
Re: Perpetuating Myths about the zSeries
Well, I have been told that some of the Intel servers are coming up to speed in the following areas, but in most other architectures you get an outage if you have a memory error. On a zArch box you might not even see this, because the hardware will replace failing memory automagically. This is done strictly in the hardware, by sweeping through memory doing testing during off periods to find these soft errors and fix them before they become a hard error.

And although it is not touted very much, the zArch implementations (and previous ones as well) are constantly doing internal cross-checking to verify correct results. That's one of the reasons it is slower than other CPUs. On a zArch machine, you know you are getting the correct result (correct as in the hardware did not cause the problem, not correct as in your program was doing what you thought it was doing <grin>). On some others, you are not sure, because a glitch could cause an undetected error. Sort of like running without parity on your memory. BTW, did you know that the data path on an Intel processor from the main memory to the CPU does not have parity or ECC? So, even with ECC memory, errors can creep in if the error occurs during this movement of data. The zMachines do use ECC on these internal data paths.

Likewise, every zArch box has at least one extra CP that cannot be assigned. If a CP fails, this extra CP can usually take over operation of the failing CP without the underlying software needing to do any kind of recovery at all. The software will get an indication of a hardware error and the box will call home. Again, on most other boxes this would result in an outage. Perhaps just a re-IPL (uh, reboot), but you might be down until the server is fixed or replaced (most likely replaced anymore).
--
John McKown
Senior Systems Programmer
UICI Insurance Center
Applications Solutions Team
+1.817.255.3225

This message (including any attachments) contains confidential information intended for a specific individual and purpose, and its content is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this transmission, or taking any action based on it, is strictly prohibited.

-----Original Message-----
From: Eric Sammons [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 28, 2003 12:27 PM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries
Re: Perpetuating Myths about the zSeries
The level of redundancy is not the same in the Intel/AMD world as it is in the mainframe. In many cases this does not matter. You only need the redundancy if something goes wrong. In many cases an Intel based server is very reliable. It is hard to compare CPU power, but it seems to me that Intel has a big advantage in CPU speed right now. Software is another matter. I am not a fan of Windows, and I find the number of hangs to be annoying. It does work well enough for most web applications though. Linux seems to be very reliable, although I don't have much direct experience with it. z/OS has its problems. When we went from a weekly IPL to a bi-weekly IPL our system crashed after about a week and a half because CSA was exhausted.

If I may ramble on a bit: one thing I have noticed is that all systems I have worked with have one common problem, which is programs that try to access memory regions outside of the allocated virtual memory for the process. On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem than just aborting the program. Users find this problem really annoying.

-Original Message-
From: Eric Sammons [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 10:27 AM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

snip
-- - | Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ | | (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ | -
Re: Perpetuating Myths about the zSeries
On Tuesday, 10/28/2003 at 01:27 EST, Eric Sammons [EMAIL PROTECTED] wrote: Thoughts??? I don't think we're trying to compare (in this discussion, anyway) the relative merits of different platforms. The question at hand is whether the latest generation of zSeries hardware and software have improved the environment for hosting Linux to the point that you could consider it for things you might not have 3 years ago. That [presumably] consists of an evaluation of management, reliability, function, performance, speed, capacity, and any other measurement you consider appropriate. Further, consider the technological advances of the discrete solutions. Are they advancing at the same rate as zSeries? In all areas? In some areas? Does that have any effect on decision-making? Please limit responses to 10 typewritten pages, double-spaced, pica. Unsigned submissions will not be accepted by the editors for publication. You have 2 hours. Please begin. (And no talking...) ;-) Where's a PhD candidate when you need one? Alan Altmark Sr. Software Engineer IBM z/VM Development
Re: Perpetuating Myths about the zSeries
-Original Message-
From: Fargusson.Alan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 12:49 PM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

snip If I may ramble on a bit: one thing I have noticed is that all systems I have worked with have one common problem, which is programs that try to access memory regions outside of the allocated virtual memory for the process. On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying.

Well, with UNIX (sigaction()) and z/OS programs (ESTAE / ESPIE), the programmer can catch this error and attempt to recover. I am not very Windows literate, but I'd lay odds it has something similar. I don't know what the OS itself could do to automatically fix this. I'd say this problem can be laid at the feet of the application programmer.

-- John McKown Senior Systems Programmer UICI Insurance Center Applications Solutions Team +1.817.255.3225
Re: Perpetuating Myths about the zSeries
Well my question then is, what is a transaction? A very good question, and exactly why the how many PCs can I consolidate? question is basically a useless one. The answer has to include what the PCs are doing and how they do it. It's comparing apples and pumpkins.

So what is a transaction? The best definition I've come up with is a sequence of operations that accomplishes a single unique unit of business typical to an application, defined by the type of problem and the type of application.

Would I want to run 100 systems in a given z/VM each with some number of JVMs, yes WebSphere? What do the applications running in the JVMs do? 8-)

On the topic of availability I am not sure I buy the whole MF is better than 80x86 or Intel / AMD 64 server hardware. Today, everything is redundant and everything is hot swappable. But not to the point of being able to intercept failed instructions and re-dispatch on pre-installed spare hardware, unless you've bought a real Tandem or some such system, at which point you're not paying much less than the equivalent zSeries. Correcting failure in flight isn't yet possible in Intel hardware systems, and even with the Opteron and Itanium, it won't be easy. None of the Intel systems share instruction pipelines yet. Now if some of the rumors about using a PowerPC core for the next-gen zSeries processors are true, or IBM licenses some of the PowerPC or zSeries multicore fab technology to AMD or Intel to make MCM-style platters of Intel engines, that might change the picture drastically. I don't see that happening, but it'd be a very interesting change in the landscape.

What is different is to get a new stick or replacement stick of memory for the MF could cost you an automobile and you won't find that stick of memory at the local computer store. No, you'll find your IBM CE showing up at your door with the correct replacement in his hand before you even know it failed...8-) -- db
Re: Perpetuating Myths about the zSeries
On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying. This is programmer error -- the hardware is doing exactly what it should do, methinks. Correcting the developers usually helps, although that's much harder. I've yet to find a programming language or toolset that doesn't do exactly what the programmer tells it to do, even if it's stupid...8-) -- db
Re: Perpetuating Myths about the zSeries
On Tuesday, 10/28/2003 at 10:49 PST, Fargusson.Alan [EMAIL PROTECTED] wrote: If I may ramble on a bit: one thing I have noticed is that all systems I have worked with have one common problem, which is programs that try to access memory regions outside of the allocated virtual memory for the process. On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying. [0C5, addressing exception, is what you get when you try to access memory not defined to your address space. 0C4, protection exception, is the result of trying to read or write memory that *is* part of your address space but for which memory protection mechanisms are in effect, such as storage keys and segment protection.] The problem isn't with the fault. The problem is in how the system handles it. If the system provides a way for the application to catch it (VM and MVS do), then you could let the application try to recover, but such exceptions usually mean the program is hosed in some fashion. Odds of a successful recovery (Oops! I'll go back and use the *right* pointer this time!) are poor. We separate the sheep from the goats when we look at how the system deals with such an error. It should, of course, be architecturally impossible for a wild address to destroy any part of the operating system, including any list of resources being used by the program. So, the operating system needs to close files, close sockets, release memory, decrement shared object counters, release held locks, delete semaphores, and so on. As though the program had Never Been. I've seen less robust operating systems lock up during this phase, or not clean up everything belonging to the program, including other programs. Alan Altmark Sr. Software Engineer IBM z/VM Development
Re: Perpetuating Myths about the zSeries
On Tue, 2003-10-28 at 13:09, David Boyes wrote: This is programmer error -- the hardware is doing exactly what it should do, methinks. Correcting the developers usually helps, although that's much harder. I've yet to find a programming language or toolset that doesn't do exactly what the programmer tells it to do, even if it's stupid...8-) I think you could make the case that PROLOG, when it's behaving nondeterministically, is *perhaps* not doing what the programmer tells it to. Oh, and there's Quantum INTERCAL--which, alas, lived on the late, lamented assurdo.com--which might, or might not, have been doing what you told it to do. Adam
Memory access faults.
Catching the fault with sigaction does not give you much opportunity to correct and continue. In fact it seems that you cannot continue from the signal handler. I don't have access to a Linux system, but I tried ignoring the fault on our z/OS Unix system, and the process went into an infinite loop. I suspect it retries the operation. One would need to be able to tell the system to skip the operation. I am not up on ESTAE. On Windows you don't have a way to catch this that I know of. Perhaps some undocumented API function allows this. The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it is actually the end user that suffers.

-Original Message-
From: McKown, John [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 10:57 AM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

-Original Message-
From: Fargusson.Alan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 12:49 PM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

snip If I may ramble on a bit: one thing I have noticed is that all systems I have worked with have one common problem, which is programs that try to access memory regions outside of the allocated virtual memory for the process. On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying.

Well, with UNIX (sigaction()) and z/OS programs (ESTAE / ESPIE), the programmer can catch this error and attempt to recover. I am not very Windows literate, but I'd lay odds it has something similar. I don't know what the OS itself could do to automatically fix this. I'd say this problem can be laid at the feet of the application programmer.
-- John McKown Senior Systems Programmer UICI Insurance Center Applications Solutions Team +1.817.255.3225
Re: Perpetuating Myths about the zSeries
Of course this is a programmer error, and the hardware is doing the right thing. But is the OS doing the right thing? The programmer didn't ask the OS to abort the program. -Original Message- From: David Boyes [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 28, 2003 11:09 AM To: [EMAIL PROTECTED] Subject: Re: Perpetuating Myths about the zSeries On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying. This is programmer error -- the hardware is doing exactly what it should do, methinks. Correcting the developers usually helps, although that's much harder. I've yet to find a programming language or toolset that doesn't do exactly what the programmer tells it to do, even if it's stupid...8-) -- db
Re: GateD for Linux/390 SuSE 7?
On Tue, Oct 28, 2003 at 10:29:18AM -0500, David Boyes wrote: IBM wants us to see if we can get the Linux to send RIP2 packets, which I believe only can be done with gateD. Is there an rpm available for SuSE 7 Linux? Use Zebra instead of gated. It supports RIP2. Use GNU Quagga instead of Zebra, as Zebra is dead and Quagga rose from its ashes. -- - mdz
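[For reference, enabling RIP version 2 in Quagga's ripd takes only a few lines of configuration. A minimal sketch -- the interface name eth0 is an assumption, not a tested configuration:]

```
! ripd.conf -- minimal sketch (hypothetical interface name)
router rip
 version 2
 network eth0
```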
Re: Memory access faults.
On Tue, 2003-10-28 at 13:32, Fargusson.Alan wrote: The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it actually the end user that suffers. Yes, but do you have a better suggestion? I mean, in the common case, you got this error because somewhere, there's a pointer that's pointing at something that's not yours to read (or write). Probably that's an application error; maybe it's a hardware failure. In any event, what *is* the correct behavior? You certainly don't want to give bad data to the end user. You don't want to give him whatever happens to be at that address, since it almost certainly isn't what he really wants and if he uses it he's basing a decision on bad information. What *do* you do other than say, Uh, this program tried to go grab hold of the wrong thing; please report a bug ? Adam
RHEL3 requires 256MB to be supported?
Hi list,

As is considered good practice, I've been trying to use VDISK swap with the DASD diagnose driver when possible. I was surprised to see the following message when IPLing RHEL3:

WARNING: Red Hat Enterprise Linux AS release 3 (Taroon) requires at least 256MB RAM to run as a supported configuration. (122MB detected) Normal startup will continue in 10 seconds.

(and be put in the 10 second penalty box!) Is this a side-effect of PC-server-think? (and what's RAM, don't they mean STORAGE? :)) What's interesting is RHEL-3 seems to use a lot less memory in a default install than SLES-8:

RHEL3:
# uname -a
Linux pbc9939.pok.ibm.com 2.4.21-4.EL #1 SMP Fri Oct 3 17:31:42 EDT 2003 s390 s390 s390 GNU/Linux
# free
             total       used       free     shared    buffers     cached
Mem:        125072      43456      81616          0       3624      17596
-/+ buffers/cache:      22236     102836
Swap:            0          0          0

SLES8+SP2:
# uname -a
Linux pbc99210 2.4.19-3suse-SMP #1 SMP Fri Sep 5 16:46:09 CDT 2003 s390 unknown
pbc99210:~ # free
             total       used       free     shared    buffers     cached
Mem:        126008     123244       2764          0      17504      30344
-/+ buffers/cache:      75396      50612
Swap:       304600       1892     302708

-Mike MacIsaac, IBM mikemac at us.ibm.com (845) 433-7061
Re: Perpetuating Myths about the zSeries
-Original Message-
From: Fargusson.Alan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 1:35 PM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

Of course this is a programmer error, and the hardware is doing the right thing. But is the OS doing the right thing? The programmer didn't ask the OS to abort the program.

Sure he did! He said: I'm too busy, stupid, or egotistical to handle this problem. You do something. In a case like this I cannot think of a generic action which could address the problem. If the program is attempting to update a protected or non-existent location, should it just ignore the store? What if the information to be stored is critical to the future running of the program? What now? On reading a protected or non-existent location, what should be returned? Binary zeros? What about combining these cases: a program thinks it has saved a critical calculation (such as your bonus for the year) but did it wrong, and the OS said OK, I'll just ignore that store. It then tries to get your bonus amount from that location, sees that it is zero, and doesn't give you your bonus. Wouldn't you prefer that something terrible happen so that the end user will be forced to check into it? Granted, a somewhat silly example, but the what to do simply cannot be generally answered by the OS. Only the application programmer can do this. And they refused (My programs are never in error, in error, in error, in error, ...)

Actually, believe it or not, on an old IBM DOS system, a data exception would cause a message similar to: JOB TERMINATED DUE TO PROGRAM REQUEST. A programmer screamed at me that his program did NOT request that the job be terminated! He was royally angry at this accusation. Not a good choice of words. I would have preferred something like: JOB TERMINATED DUE TO PROGRAMMER ERROR OR STUPIDITY!
BIG GRIN -- John McKown Senior Systems Programmer UICI Insurance Center Applications Solutions Team +1.817.255.3225
Re: Perpetuating Myths about the zSeries
I think you could make the case that PROLOG, when it's behaving nondeterministically, is *perhaps* not doing what the programmer tells it to. MMf. The argument on whether data-driven languages like Prolog or Standard ML are deterministic or not is a very fine line (and has nothing to do with Linux, so I won't go into it here). Since such languages *are* still rule-evaluation based, barring logical contradictions, they do have a predictable end state, thus at some frame of reference, the answer is a reflection of the logic the programmer coded. If there is not a deterministic end state, then you have a positive feedback loop and combinatorial explosion. Here be dragons indeed. Oh, and there's Quantum INTERCAL--which, alas, lived on the late, lamented assurdo.com--which might, or might not, have been doing what you told it to do. It's not clear that Intercal ever did *anything* useful, so I'll concede that one. -- db
Re: Memory access faults.
The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it actually the end user that suffers. Unfortunately, that's about the only place it *can* go. Users can't (or shouldn't be able to) change the code on the fly, or, if they can, they're better than the developer -- or at least, a lot more bored and have way too much free time. Methinks that it mostly results from not probing your environment at startup to determine what the limitations are, and then doing more rigorous checking that you don't violate those limits during operation. Way too many programmers assume infinite resources and don't cope with failure to acquire same. While we're airing pet peeves: why do people assume that changing programming languages will somehow fix this problem? It's just as possible to write rotten Java as rotten C or Fortran (in fact, it's possible to write bad Fortran in *any* programming language...8-)), and IDE's and all the other stuff doesn't fix bad programming practices any more than a fancy grease gun fixes seized bearings. It helps, but fused is still fused. (yes, it's been Tuesday all over. grump.) -- db
Re: Memory access faults.
Believe it or not, a lot of thought DOES go in at the Operating System level about the proper action to take for a given problem. When I was learning about Parallel Sysplex in zOS, for example, we were told that there are certain failures that can take down the entire sysplex (all participating machines). One of these is failure of the common time reference. If the clocks get out of sync, BOOM! At first glance, this sounds horribly radical, but the logic was simple: Maintaining data integrity is the first priority. Under certain types of failure situations, where data integrity was threatened, the best solution was to stop everything so it could be restarted in a controlled fashion.

Long, long ago, when MVS was first being designed, a conscious decision was made regarding recovery. Every routine in the OS had to be protected by a recovery routine, or by the recovery routine of its caller. The idea was to prevent a single application or component failure, no matter how serious, from affecting the rest of the workload. End user applications are EXTREMELY well insulated from the OS. Even OS component failures don't do anywhere near the harm they did in earlier operating systems. This doesn't mean the applications don't fail; they just don't usually take anyone else with them.

When an application abends, it's because something has happened that it can't handle. Not that it COULDN'T handle it if it wanted to. There are SPIE and STAE exits provided that can catch just about ANY error condition (including operator cancel), and try to do something about them, but covering all possible contingencies is just too much for the average programmer, and would make the apps orders of magnitude more complex. At least with zOS, you get a dump and diagnostic information that you can use to track down the problem. You also have manuals to explain the codes and error messages.
By default, you get almost NOTHING from Windows (there isn't even a manual to explain the failure codes), and very little from the various ix-es. -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] Behalf Of Adam Thornton Sent: Tuesday, October 28, 2003 2:43 PM To: [EMAIL PROTECTED] Subject: Re: [LINUX-390] Memory access faults. On Tue, 2003-10-28 at 13:32, Fargusson.Alan wrote: The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it actually the end user that suffers. Yes, but do you have a better suggestion? I mean, in the common case, you got this error because somewhere, there's a pointer that's pointing at something that's not yours to read (or write). Probably that's an application error; maybe it's a hardware failure. In any event, what *is* the correct behavior? You certainly don't want to give bad data to the end user. You don't want to give him whatever happens to be at that address, since it almost certainly isn't what he really wants and if he uses it he's basing a decision on bad information. What *do* you do other than say, Uh, this program tried to go grab hold of the wrong thing; please report a bug ? Adam
Re: Memory access faults.
Recovery is only as good as the language framework allows it to be. Compilers insulate you from the data and the hardware, and reduce your level of control over how errors are handled. But that's part of what you're buying by using a compiler in the first place: Not to have to worry about all those little details. Assembler programs have access to interrupt exits that allow recovery routines to get control. An infinitely smart programmer could conceivably write enough code to fix or recover from ANY failure, but how many of THOSE are there? And who writes in assembler anymore anyway? -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] Behalf Of David Boyes Sent: Tuesday, October 28, 2003 2:52 PM To: [EMAIL PROTECTED] Subject: Re: [LINUX-390] Memory access faults. The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it actually the end user that suffers. Unfortunately, that's about the only place it *can* go. Users can't (or shouldn't be able to) change the code on the fly, or, if they can, they're better than the developer -- or at least, a lot more bored and have way too much free time. Methinks that it mostly results from not probing your environment at startup to determine what the limitations are, and then doing more rigorous checking that you don't violate those limits during operation. Way too many programmers assume infinite resources and don't cope with failure to acquire same. While we're airing pet peeves: why do people assume that changing programming languages will somehow fix this problem? It's just as possible to write rotten Java as rotten C or Fortran (in fact, it's possible to write bad Fortran in *any* programming language...8-)), and IDE's and all the other stuff doesn't fix bad programming practices any more than a fancy grease gun fixes seized bearings. It helps, but fused is still fused. (yes, it's been Tuesday all over. grump.) -- db
Re: Memory access faults.
On Tuesday, 10/28/2003 at 01:43 CST, Adam Thornton [EMAIL PROTECTED] wrote: In any event, what *is* the correct behavior?

+------------------------------------------+
|           Application Failure            |
|                                          |
| Program XYZ has failed. Because it did   |
| not register for automatic failure       |
| reporting, there really isn't much you   |
| can do about it. But if you're really    |
| curious, you can look at the registers   |
| and a disassembly of 100 bytes           |
| surrounding the failure. Feel free to    |
| curse and pound on the desk in anger.    |
| It won't help. Really.                   |
|                                          |
| After 1 minute, CANCEL will be selected. |
|                                          |
|  +--------+  +--------+  +---------+     |
|  |  Look  |  | Cancel |  | restart |     |
|  | inside |  |        |  |   pgm   |     |
|  +--------+  +--------+  +---------+     |
+------------------------------------------+

Chuckie
Re: Memory access faults.
But, it is still the programmer's fault! Such things as bounds checking and reasonableness tests need to be instinctual if you are an applications programmer. Just like not breathing while you are under water! If the end user is suffering then they should be standing around the programmer's desk making sure he feels their pain and fixes it!

Fargusson.Alan [EMAIL PROTECTED]tb.ca.gov
Sent by: Linux on 390 Port [EMAIL PROTECTED]IST.EDU
10/28/2003 01:32 PM
Please respond to Linux on 390 Port
To: [EMAIL PROTECTED]
cc:
Subject: Memory access faults.

Catching the fault with sigaction does not give you much opportunity to correct and continue. In fact it seems that you cannot continue from the signal handler. I don't have access to a Linux system, but I tried ignoring the fault on our z/OS Unix system, and the process went into an infinite loop. I suspect it retries the operation. One would need to be able to tell the system to skip the operation. I am not up on ESTAE. On Windows you don't have a way to catch this that I know of. Perhaps some undocumented API function allows this. The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it is actually the end user that suffers.

-Original Message-
From: McKown, John [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 10:57 AM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

-Original Message-
From: Fargusson.Alan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 28, 2003 12:49 PM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries

snip If I may ramble on a bit: one thing I have noticed is that all systems I have worked with have one common problem, which is programs that try to access memory regions outside of the allocated virtual memory for the process. On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4.
I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying. Well, with UNIX (sigaction()) and z/OS programs (ESTAE / ESPIE), the programmer can catch this error and attempt to recover. I am not very Windows literate, but I'd lay odds it has something similar. I don't know what the OS itself could do to automatically fix this. I'd say this problem can be laid at the feet of the application programmer.

-- John McKown Senior Systems Programmer UICI Insurance Center Applications Solutions Team +1.817.255.3225
Re: Memory access faults.
On Tue, 2003-10-28 at 14:31, Alan Altmark wrote: | surrounding the failure. Feel free to | | curse and pound on the desk in anger.| | It won't help. Really. | I find that it helps quite a lot, myself. It doesn't help me get my job done any quicker, but I feel better. Adam
Re: Perpetuating Myths about the zSeries
- Original Message - From: David Boyes [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, October 28, 2003 1:09 PM Subject: Re: Perpetuating Myths about the zSeries On Windows this results in the famous general protection fault, on Unix it results in the famous segmentation fault, and on z/OS it is the famous SOC4. I wonder if there isn't a better way to deal with this problem then just aborting the program. Users find this problem really annoying. This is programmer error -- the hardware is doing exactly what it should do, methinks. Correcting the developers usually helps, although that's much harder. I've yet to find a programming language or toolset that doesn't do exactly what the programmer tells it to do, even if it's stupid...8-) So... you haven't used VBScript? Perhaps it barely qualifies as a programming language, but just today I wrote a script to delete a file, then create a new one with the same name. The new one has the same DateCreated as the file that was deleted. Unless you wait long enough between delete and create... then you get the current date/time. Working as designed, says Microsoft. :-/ -- jcf
Re: Memory access faults.
The answer to this may be: it depends. In a batch program it is probably best to abort the program. In a windowing environment it might be best to ask if the user wants to continue. Timesharing users might want an option to tell programs to continue (maybe an environment variable?). -Original Message- From: Adam Thornton [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 28, 2003 11:43 AM To: [EMAIL PROTECTED] Subject: Re: Memory access faults. On Tue, 2003-10-28 at 13:32, Fargusson.Alan wrote: The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it is actually the end user that suffers. Yes, but do you have a better suggestion? I mean, in the common case, you got this error because somewhere, there's a pointer that's pointing at something that's not yours to read (or write). Probably that's an application error; maybe it's a hardware failure. In any event, what *is* the correct behavior? You certainly don't want to give bad data to the end user. You don't want to give him whatever happens to be at that address, since it almost certainly isn't what he really wants and if he uses it he's basing a decision on bad information. What *do* you do other than say, Uh, this program tried to go grab hold of the wrong thing; please report a bug? Adam
Re: Memory access faults.
I like that. Reminds me of an old Windows program called First Aid that tried to help catch failures better than Windows itself did. My friends raved about it, but it never worked that well for me. -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] Behalf Of Alan Altmark Sent: Tuesday, October 28, 2003 3:32 PM To: [EMAIL PROTECTED] Subject: Re: [LINUX-390] Memory access faults. On Tuesday, 10/28/2003 at 01:43 CST, Adam Thornton [EMAIL PROTECTED] wrote: In any event, what *is* the correct behavior?

+------------------------------------------+
| Application Failure                      |
|                                          |
| Program XYZ has failed. Because it did   |
| not register for automatic failure       |
| reporting, there really isn't much you   |
| can do about it. But if you're really    |
| curious, you can look at the registers   |
| and a disassembly of 100 bytes           |
| surrounding the failure. Feel free to    |
| curse and pound on the desk in anger.    |
| It won't help. Really.                   |
|                                          |
| After 1 minute, CANCEL will be selected. |
|                                          |
|  +--------+  +--------+  +---------+     |
|  |  Look  |  | Cancel |  | restart |     |
|  | inside |  |        |  |   pgm   |     |
|  +--------+  +--------+  +---------+     |
+------------------------------------------+

Chuckie
Re: Memory access faults.
I want to say that I agree very strongly with you about programming languages. I think that the right language should be used for each application, but blaming C for buffer overflows is not helping solve the problem. In fact, some of the programs with problems (Outlook?) are written in VisualBasic, although VB is written in C, so I guess that doesn't prove anything. -Original Message- From: David Boyes [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 28, 2003 11:52 AM To: [EMAIL PROTECTED] Subject: Re: Memory access faults. The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it is actually the end user that suffers. Unfortunately, that's about the only place it *can* go. Users can't (or shouldn't be able to) change the code on the fly, or, if they can, they're better than the developer -- or at least, a lot more bored and have way too much free time. Methinks that it mostly results from not probing your environment at startup to determine what the limitations are, and then doing more rigorous checking that you don't violate those limits during operation. Way too many programmers assume infinite resources and don't cope with failure to acquire same. While we're airing pet peeves: why do people assume that changing programming languages will somehow fix this problem? It's just as possible to write rotten Java as rotten C or Fortran (in fact, it's possible to write bad Fortran in *any* programming language...8-)), and IDE's and all the other stuff doesn't fix bad programming practices any more than a fancy grease gun fixes seized bearings. It helps, but fused is still fused. (yes, it's been Tuesday all over. grump.) -- db
Re: Perpetuating Myths about the zSeries
Of course this is a programmer error, and the hardware is doing the right thing. But is the OS doing the right thing? The programmer didn't ask the OS to abort the program. Ostensibly the reason that the OS is limiting access is to do resource access or utilization controls. If the application is attempting to do something that violates the controls placed on a resource, then the OS has two choices -- extend the controls according to a policy, or deny the request. In either case, the application has to cope with either a soft failure or a hard failure. If the application programmer doesn't deal with a hard failure, or the system programmer doesn't deal with it by having a soft retry exit like ESTAE active, then what's the OS supposed to do? Guess? The Symbolics OS had a policy setting for this sort of thing (setq `sym-access-violation-policy `some-kind-of-bitmap-vector-that-I-can't-remember), but it did have a bunch of stability problems, so the default was to abort processes that violated their resource constraints. I don't know too many others that have tried to handle this beyond the give the control back to the app, and default to kill if no app handler.
Re: Memory access faults.
On Tue, 2003-10-28 at 14:54, Fargusson.Alan wrote: The answer to this may be: it depends. In a batch program it is probably best to abort the program. In a windowing environment it might be best to ask if the user wants to continue. Timesharing users might want an option to tell programs to continue (maybe an environment variable?). Well, OK, but if it's contingent, then there isn't a single best answer, and so you're back to putting it at the feet of the app programmer. The OS won't necessarily know whether I'm an interactive or batch user. Heck, some of my programs, running at the same time, in the same environment, may be considered batch or interactive, simply depending on how I am working with them that day. Adam
Re: Perpetuating Myths about the zSeries
Alan wrote: Why should anyone give a rat's behind about bogomips numbers? A four-fold increase in bogomips says only that bogomips runs 4 times as fast as it used to. Your question about comparisons of competitiveness is interesting, but not in the context of bogomips. I would ask if TCO has improved in the last 3-4 years. The CPU selection is, of course, only one variable in the equation. As a relative measure between the zSeries platforms, it is an indication of relative speed. Between platforms, it's really not useful. I totally agree that TCO is the main issue, but what I see is a bunch of myths developing around Linux on zSeries, and zSeries Linux is further ahead than the published measurements indicate. It's like the old TSO is slow myth vs CMS. In the early years of the S/360, TSO was slow and a lot of products tried to replace it (ROSCOE, etc). Once TSO got improved, the myth persisted. The performance literature for Linux is way behind what its real capabilities are today. = Jim Sibley Implementor of Linux on zSeries in the beautiful Silicon Valley Computers are useless. They can only give answers. Pablo Picasso
zSeries EXPO (Nov.10-14, 2003 Las Vegas)
(Cross posted to VMESA-L,LINUX-390, and IBM-MAIN). Hello zSeries enthusiasts, If the word about the upcoming tech conference has not yet reached you via other e-mail or web notices, please allow me to remind you of this annual technical education opportunity. IBM zSeries EXPO ... a technical conference focused on z/OS, VSE,z/VM, Linux on zSeries November 10-14, 2003 Las Vegas Hilton Whether you're new to zSeries, just need a refresher, or if you are experienced in some areas and want to gain more knowledge in others, you're bound to find something to help you sharpen your zSeries skills. Choose from the topics in these conference tracks: zSeries and Storage Technology * Keynote by Erich Clementi, General Manager, zSeries, IBM Corporation * Networking, The Internet, and the z/OS Comm Server * Data Center Operations and Management * Management Issues for a zSeries Environment * WLM, Performance, and Capacity Planning * WebSphere for z/OS, e-business and Java * z/OS, Parallel Sysplex, and Storage Software Virtualization Technology for Linux on zSeries and S/390 * z/VM and zSeries Virtualization Technology Basics * z/VM General Interest * z/VM Connectivity * z/VM System Management * z/VM Performance General Linux on zSeries Sessions * Introductory Linux for the Mainframe Systems Programmer Sessions * Linux on zSeries Installation Sessions * Networking with Linux on zSeries * Linux on zSeries Application Sessions * Linux on zSeries User Experience Sessions * Linux on zSeries Systems Management and Performance Sessions VSE/ESA General Interest Sessions (Mon - Wed) * VSE, z/VM and Linux General Interest Sessions (Mon-Wed) * VSE/ESA Sessions (Mon - Wed) * e-business and VSE/ESA Sessions (Mon- Wed) * CICS Transaction Server for VSE/ESA Sessions (Mon-Wed) ISV Sessions In addition to stand-up lecture, you can choose to attend hands-on-labs on these topics: - The Next Stage: RMF Spreadsheet Reporter Java Edition Hands-on Lab - Using EXCEL for Data Analysis: Hands-on Tutorial - 
W14 WebSphere for z/OS Administration Hands-on Lab - WebSphere Studio Application Monitor: Hands-on Lab - Monitoring WebSphere for z/OS Using Introscope: Hands-on Lab - Taming CTC Connections: Hands-on Lab - Implementing LDAP on z/OS Hands-on Lab: - HMC and Remote HMC Hands-on Workshop - Linux 101 Lab - Running z/VM to Host Linux -- Installation Lab - Linux for S/390 Installation Lab - Introduction to REXX hands-on-lab - VSE/ESA Workshop Conference web site: http://www-3.ibm.com/services/learning/conf/us/zseries/ Sessions abstracts/agenda: http://www-3.ibm.com/services/learning/conf/us/zseries/schedule.pdf Registration: http://www-3.ibm.com/services/learning/conf/us/zseries/regfee.html Pre-conference classes: http://www-3.ibm.com/services/learning/conf/us/zseries/preconf.html We look forward to seeing you at the zSeries EXPO in Las Vegas! Regards, zSeries EXPO agenda coordinators Pam Christina Glenn Anderson Julie Liesenfelt Chuck Morse
Re: Perpetuating Myths about the zSeries
On Tuesday, 10/28/2003 at 01:30 PST, Jim Sibley [EMAIL PROTECTED] wrote: It's like the old TSO is slow myth vs CMS. In the early years of the S/360, TSO was slow and a lot of products tried to replace it (ROSCOE, etc). Once TSO got improved, the myth persisted. Yes, but in this case Everyone Knows it's true! :-) The performance literature for Linux is way behind what its real capabilities are today. And here we agree. Bringing BogusMIPS into the discussion was like throwing a mouse in front of a cat. You distracted us from your real point: That things are better now than they used to be. Alan Altmark Sr. Software Engineer IBM z/VM Development
Re: GateD for Linux/390 SuSE 7?
On Tuesday, 10/28/2003 at 03:57 CST, David Booher [EMAIL PROTECTED] wrote: Some listers have mentioned I should go to MPROUTE, etc. That may be a possibility in the future, but as I see it, if ROUTED is still supported, it still should work - it worked fine on 2.4. Yes, ROUTED is still supported. Just letting you know that we will eventually remove it in favor of MPROUTE. No hurry, just FYI. Alan Altmark Sr. Software Engineer IBM z/VM Development
Re: Memory access faults.
I was thinking of batch in z/OS terms, where there is a distinction. If the OS does not have this distinction then you would treat batch and interactive the same for error handling. -Original Message- From: Adam Thornton [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 28, 2003 1:12 PM To: [EMAIL PROTECTED] Subject: Re: Memory access faults. On Tue, 2003-10-28 at 14:54, Fargusson.Alan wrote: The answer to this may be: it depends. In a batch program it is probably best to abort the program. In a windowing environment it might be best to ask if the user wants to continue. Timesharing users might want an option to tell programs to continue (maybe an environment variable?). Well, OK, but if it's contingent, then there isn't a single best answer, and so you're back to putting it at the feet of the app programmer. The OS won't necessarily know whether I'm an interactive or batch user. Heck, some of my programs, running at the same time, in the same environment, may be considered batch or interactive, simply depending on how I am working with them that day. Adam
Re: Memory access faults.
I do -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Hall, Ken (IDS ECCS) Sent: Tuesday, October 28, 2003 3:06 PM To: [EMAIL PROTECTED] Subject: Re: Memory access faults. Recovery is only as good as the language framework allows it to be. Compilers insulate you from the data and the hardware, and reduce your level of control over how errors are handled. But that's part of what you're buying by using a compiler in the first place: Not to have to worry about all those little details. Assembler programs have access to interrupt exits that allow recovery routines to get control. An infinitely smart programmer could conceivably write enough code to fix or recover from ANY failure, but how many of THOSE are there? And who writes in assembler anymore anyway? -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] Behalf Of David Boyes Sent: Tuesday, October 28, 2003 2:52 PM To: [EMAIL PROTECTED] Subject: Re: [LINUX-390] Memory access faults. The problem with laying this at the feet of the application programmer is that they are not perfect, and when the program fails it actually the end user that suffers. Unfortunately, that's about the only place it *can* go. Users can't (or shouldn't be able to) change the code on the fly, or, if they can, they're better than the developer -- or at least, a lot more bored and have way too much free time. Methinks that it mostly results from not probing your environment at startup to determine what the limitations are, and then doing more rigorous checking that you don't violate those limits during operation. Way too many programmers assume infinite resources and don't cope with failure to acquire same. While we're airing pet peeves: why do people assume that changing programming languages will somehow fix this problem? 
It's just as possible to write rotten Java as rotten C or Fortran (in fact, it's possible to write bad Fortran in *any* programming language...8-)), and IDE's and all the other stuff doesn't fix bad programming practices any more than a fancy grease gun fixes seized bearings. It helps, but fused is still fused. (yes, it's been Tuesday all over. grump.) -- db
Re: Perpetuating Myths about the zSeries
First of all, it was not a myth that TSO was slow when compared to CMS. And I'm not religious about CMS vs TSO. Second, I'd really like a concrete example of what performance literature is way behind for Linux. There were two redbooks this year that looked at many performance issues. If anything, they were productive in finding performance issues that needed to be addressed. From: Jim Sibley [EMAIL PROTECTED] It's like the old TSO is slow myth vs CMS. In the early years of the S/360, TSO was slow and a lot of products tried to replace it (ROSCOE, etc). Once TSO got improved, the myth persisted. The performance literature for Linux is way behind what its real capabilities are today. = Jim Sibley If you can't measure it, I'm Just NOT interested!(tm) // Barton Robinson - CBW Internet: [EMAIL PROTECTED] Velocity Software, Inc Mailing Address: 196-D Castro Street P.O. Box 390640 Mountain View, CA 94041 Mountain View, CA 94039-0640 VM Performance Hotline: 650-964-8867 Fax: 650-964-9012 Web Page: WWW.VELOCITY-SOFTWARE.COM //
Re: RHEL3 requires 256MB to be supported?
On Tuesday 28 October 2003 14:44, you wrote: Hi list, As is considered good practice, I've been trying to use VDISK swap with the DASD diagnose driver when possible. I was surprised to see the following message when IPLing RHEL3 WARNING: Red Hat Enterprise Linux AS release 3 (Taroon) requires at least 256MB RAM to run as a supported configuration. (122MB detected) Normal startup will continue in 10 seconds. (and be put in the 10 second penalty box!) Is this a side-effect of PC-server-think? (and what's RAM, don't they mean STORAGE? :)) You can run with as much or as little virtual storage defined to your guest as you like. If performance stinks, and you want to report it to Red Hat as problem, you'll need to set that value to at least 256MB and reproduce the problem before they'll accept the problem report. So, as always, do what makes sense for your environment. If re-IPLing the system after changing the VM size is not an issue, go with a VM size you like. Mark Post
Re: Perpetuating Myths about the zSeries
Jim said: Its like the old TSO is slow myth vs CMS. In the few years of the of the s/360, TSO was slow and a lot of products tried to replace it (ROSCOE, etc). Once TSO got improved, the myth persisted. In a shop with heavy use of both VM (CMS) and MVS one could gather evidence from objective comparison ... but not all of us live in such a shop. Shucks. So again, the myth persists, as do reports countering it. I'll tell ya an amazing myth: that the mainframe cannot sustain a high interrupt rate. I am still annoyed at the byte-at-a-time nature of interactive Unix, but I had to confess a decade ago, when exposed to AIX/370, that it doesn't kill your system. Ahh... the Cornell days. -- R;
Re: Memory access faults.
I was thinking of batch in z/OS terms, where there is a distinction. If the OS does not have this distinction then you would treat batch and interactive the same for error handling. That's one of the things I always thought was superior about the TOPS-20 and VMS batch systems. Batch and interactive weren't such different animals. But we digress widely from Linux... NQS is a lot closer to the TOPS-20/VMS model for batch (on Linux). Anyone done any experimenting with it recently? -- db