Re: [zones-discuss] networking
Hi,

The clprivnet0 interface is provided by Solaris Cluster. The clprivnet software is a kind of highly available trunking driver that communicates across all of the private networks that connect the machines of the cluster. This set of networks is often called the private interconnect. When there are Zone Clusters, the software automatically sets a subnet and IP addresses for each Zone Cluster. I believe that the cluster software does so as well for the Global Cluster. The administrator specifies a set of subnets and IP addresses for this purpose when configuring the cluster. If you have further questions related to clprivnet, I would suggest sending them to sunclus...@sun.com

Regards,
Ellard

On 02/16/10 14:19, Enda O'Connor wrote:

Hi,
Are you sure cluster is disabled? What does /usr/cluster/bin/status show?
Enda

On 16/02/2010 21:59, Dombrowski, Neil wrote:

-Original Message-
From: sowmini.varad...@sun.com [mailto:sowmini.varad...@sun.com]
Sent: Tuesday, February 16, 2010 1:16 PM
To: Dombrowski, Neil
Cc: zones-discuss@opensolaris.org
Subject: Re: [zones-discuss] networking

On (02/16/10 19:03), Dombrowski, Neil wrote:

I'm new to zones, and this appears to be a conundrum for me: I have a global zone that shows multiple default routes (on different interfaces). It also shows a third, separate interface (clprivnet0) with an IP that's not in anyone's documentation (actually there are two physical servers set up the same way). My guess is that these two servers were to be clustered at one point, but this was aborted before I came onboard. Regardless, the global zone's routing table looks busy. Is it because it's showing the routes for the zones? If so, is it possible to have the global zone routing differently than the local zones?

Hard to answer without more data on what the subnets for the various zones are, and what the desired routing is. The global zone's netstat may show routes that are only accessible from a non-global zone, so the fact that the routing table is busy does not say anything without more information about the subnet configuration.
--Sowmini

For an example, let's say zone1 has a default route using gateway 172.16.1.1 and zone2 has a default route using gateway 192.168.0.1. If I am logged into the global zone and it needs to send a packet to 10.10.10.10, will it use one of the non-global zones' default routes? Looking at /etc/defaultrouter for the global zone, it shows the gateway IPs for the two non-global zones, and also 10.10.10.1. When I try to traceroute to 10.10.10.10 it never shows a single hop (as if it's not going to any gateway). So, why am I not getting to 10.10.10.10? And if I removed the other default routes in the global zone, would I be damaging the routing for the local zones? If I add a static route in the global zone, will that be propagated to the non-global zones (I wouldn't want that)? If there's a good doc out there that explains this, I'd appreciate a pointer to it, or whatever advice you have for me.

Thanks,
Neil

___ zones-discuss mailing list zones-discuss@opensolaris.org
[zones-discuss] [Fwd: RAC on Zone Clusters BluePrint]
From: Ellard
Subject: RAC on Zone Clusters BluePrint

Recently, there were a number of queries for a current document on how to run RAC in non-global zones. We have just published this new document. Gia-Khanh and I published a small book in the BluePrint series that explains the following:

1) A very brief overview of Zone Clusters
2) An overview of how Sun Cluster supports RAC
3) A detailed example of how we configured one system to support RAC running in Zone Clusters
4) For RAC 9i/10g/11g on supported storage topologies, an outline of the steps needed to deploy RAC, along with the application management configuration (RGM Resource Group / Resource / Dependencies / Affinities)

The goal is to explain the use of Zone Clusters to support RAC.

Deploying Oracle Real Application Clusters (RAC) on Solaris Zone Clusters

The Wiki page URL is:
http://wikis.sun.com/display/BluePrints/Deploying+Oracle+Real+Application+Clusters+(RAC)+on+Solaris+Zone+Clusters

The sun.com mirror page URL is:
http://www.sun.com/offers/details/820-7661.xml

---

This document is intended for public use. Please contact me if you have any questions.
Re: [zones-discuss] S10 brand spec.
Hi Jerry,

This document provides a lot of useful information. In the section "solaris10 Brand: What's Not Emulated" you repeat some old information that is no longer correct: "One point to note is that TX will continue to be incompatible with branded zones." That statement probably dates to the time when the lx brand was the only branded zone other than native. Solaris Trusted Extensions (TX) does not support lx, and so it was correct at that time to state that TX does not support branded zones.

The BrandZ framework now supports multiple kinds of zones, including the native brand zone. The BrandZ framework provides a powerful mechanism for tailoring the behavior of a zone. The Sun Cluster organization has taken the native brand zone and used the BrandZ framework to add callbacks for notifying Sun Cluster software about various zone changes. This cluster brand zone is a branded zone, but is really a native zone with cluster hooks. Our goal is to make this cluster brand zone behave as much as possible just like the native brand zone. We have recently been successful in getting Sun Cluster to work with TX using the cluster brand zone. The Zone Cluster provides a cluster-wide security container. :) So please revise your statement about TX not being able to work with zones other than the native zone. Please note that this is a very recent development. The Sun Cluster organization has not announced support for this as a product offering. Here is the usual disclaimer that an engineering milestone is not the same as a product announcement.

In the past, there have been bugs where code was written that assumed that only a native zone supported various options. Please assume that both native and cluster brand zones need the same support.

The Sun Cluster organization is considering supporting Zone Clusters composed of non-global zones based upon the solaris10 container for Solaris.Next. This will probably require a cluster+solaris10 composite brand zone. We recognize that the Sun Cluster organization would have to create this composite zone. However, we do request that the design for the solaris10 zone make it possible for the Sun Cluster organization to create such a composite zone by reusing all or almost all solaris10 brand features. We would also be interested in reusing the p2v and v2v features.

TX on Solaris.Next will use a branded zone (according to the latest information that I have heard). The Sun Cluster organization will be interested in supporting a composite cluster+TX brand zone. The net result is that we recommend that the Solaris Zones team consider how zone features would be reused by the previously mentioned composite branded zones.

Regards,
Ellard

On 05/12/09 04:28, Jerry Jelinek wrote:

Enclosed is a first draft of a spec. for the S10 brand which we plan to submit for a PSARC inception review. Please send us any comments or questions.

Thanks,
Jerry

---

S10C: A Solaris 10 Branded Zone for Solaris.Next
Gerald Jelinek, Jordan Vaughan
Solaris Virtualization Technologies

[A note on terminology: This document uses the terms Solaris 10 and Solaris.Next very frequently. As such, the abbreviations S10 and S.next respectively are used interchangeably with the longer forms. The term virtualization is abbreviated as V12N.]

Part 1: Introduction

Each new minor release of Solaris brings with it the well-known problems of slow user adoption, slow ISV support, and concerns about compatibility. The compatibility concerns will be more pronounced with the release of S.next since it's anticipated that there will be greater than normal user-visible changes (e.g. the packaging system, etc.). Fortunately, since the last minor release of Solaris (Solaris 10), V12N techniques have become widespread and V12N can be used as a solution to ease the transition to the new version of Solaris. Zones [1] combined with a brand [2] are particularly well suited for this task since the host system is actually running S.next, whereas this is not necessarily the case with other V12N solutions. In addition, zones are usable on any system which runs S.next, which is also not the case with other V12N alternatives.

We already have a proven track record delivering this sort of zones/brand based solution to enable running earlier versions of Solaris on S10 [3, 4], so in one sense this case breaks little new ground. However, the earlier 'solaris8' and 'solaris9' brands were used to host releases that are very static as compared to hosting a zone running S10. In addition, S.next can be expected to continue to change rapidly for the foreseeable future. Given this, a 'solaris10' brand for S.next poses additional challenges for projects on both the S10 and S.next sides of the system. Many of these challenges are outside of the scope of an architectural review and include developer education, testing and
[zones-discuss] [Fwd: df -h in zone cluster]
Hi,

The question raised by Sunil seems to be a zones question. Does anyone have an explanation, or is this a bug?

Regards,
Ellard

-------- Original Message --------
Subject: df -h in zone cluster
Date: Thu, 30 Apr 2009 10:28:38 -0400
From: Sunil Sohani sunil.soh...@sun.com
To: sunclus...@sun.com

Hi,

IHAC running zone clusters. They are running some monitoring software within the zone cluster which does df -h to monitor disk space. Here is sample output:

# df -h
Filesystem                                 size   used  avail capacity  Mounted on
/                                           0K   8.1G    13G    39%    /
/dev                                        21G   8.1G    13G    39%    /dev
/lib                                        31G   9.6G    21G    32%    /lib
/platform                                   31G   9.6G    21G    32%    /platform
/sbin                                       31G   9.6G    21G    32%    /sbin
/usr                                        31G   9.6G    21G    32%    /usr
/usr/local                                  31G   9.6G    21G    32%    /usr/local
proc                                         0K     0K     0K     0%    /proc
ctfs                                         0K     0K     0K     0%    /system/contract
mnttab                                       0K     0K     0K     0%    /etc/mnttab
objfs                                        0K     0K     0K     0%    /system/object
swap                                        17G   376K    17G     1%    /etc/svc/volatile
fd                                           0K     0K     0K     0%    /dev/fd
swap                                        17G    64K    17G     1%    /tmp
swap                                        17G    56K    17G     1%    /var/run
/sbin                                       31G   9.6G    21G    32%    /var/cluster/sbin.org
/usr/cluster/lib/sc/ifconfig_client_proxy   31G   9.6G    21G    32%    /sbin/ifconfig
zdtgdbq01/odb                                0K    29K    86G     1%    /odb
zdtgdbq01/odb/dtg02/flashdata01              0K   5.1G   4.9G    52%    /odb/dtg02/flashdata01
zdtgdbq01/odb/dtg02/oraarch                  0K   525M   9.5G     6%    /odb/dtg02/oraarch
zdtgdbq01/odb/dtg02/orabackup                0K    14M  10.0G     1%    /odb/dtg02/orabackup
zdtgdbq01/odb/dtg02/orabin                   0K   2.2G   7.8G    22%    /odb/dtg02/orabin
zdtgdbq01/odb/dtg02/oradata01                0K   1.2G   8.8G    12%    /odb/dtg02/oradata01
zdtgdbq01/odb/dtg02/oradata02                0K   1.3G   8.7G    13%    /odb/dtg02/oradata02
zdtgdbq01/odb/oem01/orabin                   0K   1.9G   8.1G    19%    /odb/oem01/orabin

zdtgdbq01 is a ZFS pool used for an Oracle database that has been added to this zone cluster. The monitoring software looks at the size column, sees 0K, and starts sending alerts. Is there a solution for this? Or is that how it is going to be, and they need to change the way they monitor it? The customer thinks they haven't configured the resource group properly.

Sunil
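Whatever the underlying cause of the 0K values (the size column for the delegated ZFS datasets is not populated the way the monitoring software expects), one workaround on the monitoring side is to alert on the capacity or avail columns rather than size. The sketch below is only an illustration of that idea: the helper names are invented, and the sample rows are adapted from the output above, with one capacity value changed to 95% purely so that an alert fires.

```python
def parse_df_line(line):
    """Split one 'df -h' data line into its six columns.

    Assumes no embedded spaces in the filesystem or mount point,
    which holds for the output shown above.
    """
    fields = line.split()
    if len(fields) != 6:
        return None
    fs, size, used, avail, capacity, mount = fields
    return {"filesystem": fs, "size": size, "used": used,
            "avail": avail, "capacity": int(capacity.rstrip("%")),
            "mount": mount}

def over_threshold(df_output, limit=90):
    """Return mount points at or above `limit` percent capacity.

    Keys off the capacity column, so the 0K values in the size
    column for the delegated ZFS datasets never trigger an alert.
    """
    alerts = []
    for line in df_output.splitlines()[1:]:  # skip the header line
        row = parse_df_line(line)
        if row is not None and row["capacity"] >= limit:
            alerts.append(row["mount"])
    return alerts

# Rows adapted from the df -h output above; the swap row's capacity
# is changed to 95% here purely so that an alert fires.
df_sample = """Filesystem size used avail capacity Mounted-on
zdtgdbq01/odb 0K 29K 86G 1% /odb
zdtgdbq01/odb/dtg02/flashdata01 0K 5.1G 4.9G 52% /odb/dtg02/flashdata01
swap 17G 376K 17G 95% /etc/svc/volatile"""

print(over_threshold(df_sample, limit=90))
```

This keys off the percentage the kernel reports rather than trying to reconstruct a pool size the zone cannot see.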
Re: [zones-discuss] Patching clustered zones
Hi Geoff,

As an introduction, I work on the Sun Cluster development team. Questions about how to support the existing Sun Cluster product can be sent to sunclus...@sun.com; there are people there who answer questions about the existing product.

Sun Cluster today supports non-global zones in two ways:

1) HA-Containers - this approach makes it look like the non-global zone can fail over between machines. However, this approach is NOT based upon detaching a zone from one machine and then attaching the zone to another machine.

2) Another approach treats a non-global zone as a place where applications can be started/halted under Sun Cluster control. However, this approach does not provide isolation.

Sun Cluster will very, very, very soon ship SC3.2 update 2, which introduces a new zone feature: the Zone Cluster. This is a virtual cluster where each virtual node is a non-global zone. The application inside the Zone Cluster sees the Zone Cluster as a dedicated private cluster. This feature provides application fault isolation, security isolation, resource management, and license fee cost containment. We provide many ease-of-use features. I would be happy to provide details on this new feature.

Sun Cluster supports the use of ZFS as the root file system with SC3.2u2.

Sun Cluster supports several approaches for changing software:

1. Halt all nodes. Install new software on each node. Boot all nodes.

2. Rolling Upgrade - halt one node at a time. While in non-cluster mode, install new software on that node. Reboot that node into cluster mode. Repeat the process for each node until done. A portion of the cluster remains up at all times.

3. Quantum Leap - halt half the cluster and install new software in non-cluster mode. Quantum Leap then does a very quick handoff of services from the partition with old software to the partition with new software. Next, upgrade this second partition. Reboot the second partition in cluster mode, and the full cluster reforms.

We also support Live Upgrade. So it is possible to change software with minimal downtime. All of these approaches can be used with patches, as well as update releases. Approaches 1 and 3 always run a cluster with all nodes at the same release level. Rolling Upgrade supports the situation where the different nodes are at different OS and SC patch levels (but does nothing for application patches). Quantum Leap can be used to upgrade the OS, Sun Cluster, a 3rd-party file system or volume manager, application software, and any other software that you can put on a cluster.

At this point Sun Cluster does not have any need for a patch-on-attach function. If you still believe that there is an important upgrade scenario that Sun Cluster does not support, please let me know. I am the Technical Lead for the infrastructure area, which includes both Zone Clusters and upgrade technology.

Regards,
Ellard Roush

On 01/22/09 11:40, Geoff Lane wrote:

We are in the process of setting up a service consisting of SAN-based global storage which will host a number of ZFS-based zones, each running an application that must be made highly available. The zones are made highly available using Solaris Cluster and failover. This is all rather standard and is described in the Sun zone/cluster docs. For technical reasons all the zones will be full root zones.

However, we are having trouble finding a safe patching procedure that minimises application downtime. The zone docs tell us that a zone should be maintained at the same patch level as the global zone, but in a cluster this seems impossible without a total break in service. At some point the app zones will be running on one of the global zones forming the cluster but with mismatched patches. The patch-on-attach facility will be nice, but is it going to work in a cluster? Will the cluster notice that the zone is very slow booting up and treat it as a failure? Even if the cluster doesn't care, the apps are still down while the zone is being patched. Is there a solution that I've missed?

Thanks,
Re: [zones-discuss] documentation for zones
Hi,

Please see comments inline.

Ellard

Edward Pilatowicz wrote:

On Fri, Nov 21, 2008 at 01:02:14PM +0100, Maciej Browarski wrote:

Jerry Jelinek writes:

Maciej Browarski wrote:

Hello,
Is there any consolidated documentation about the structure of the config.xml and platform.xml files? The information about their content is spread across many documents, but I can't find exactly what options are correct and possible in these two files.

There are no docs because those are project-private interfaces. I assume you are trying to create your own brand? Perhaps you can tell us more about what you are trying to do.
Thanks,
Jerry

Yes, I am trying to understand how Zones work and how we can configure them. :)

Well, you shouldn't be modifying any of the parameters in platform.xml, config.xml, or any of the zone xml files. Once again, what are you trying to do?

So I have the questions below:

- What is the difference between the privilege sets default, prohibited and required in config.xml?

Well, the default privs are the privs that all zones get. The prohibited privs are ones that can't be added to zones by zonecfg. The required privs are ones that can't be removed from zones by zonecfg.

- Are these privileges only information for zoneadm on how to configure zones, or do they have an impact on creating and running zones? (That is, is the list of privileges also hard-coded in the kernel, with config.xml only describing them?)

Zone privs are not hardcoded into the kernel.

- If I change only the brand name in config.xml, I see this name later in zoneadm list -iv; does this affect only zoneadm list, or also kernel performance? (To be more clear: is there a native brand hard-coded in the kernel, such that a native zone is more privileged and faster than zones with other names and brands? What information exactly is carried in struct brand p_brand and p_brand_data in the proc_t structure?)

You will break things if you randomly change zone brand names. There is special handling for the native brand in the kernel. If a zone is of type native, the kernel doesn't invoke any of the optional brandz interposition callbacks. That said, I don't think you'd be able to see any observable performance differences.

[Ellard] Please note that there is a cluster brand zone. From the perspective of packaging/patching/updating, the cluster brand zone is identical to the native brand. If the native brand zone ever gets any other kind of special treatment, the cluster brand zone will need the same treatment. The cluster brand zone is really a native brand zone with cluster hooks.

The p_brand and p_brand_data structures are used to keep track of process brand-specific data.

- Which options determine that packages are also installed/updated from the global zone (so that I can have old packages, not updated in zones, but without the -G option)? I am aware that I can break dependencies between packages.

The packaging tools ignore all non-native (i.e., branded) zones. There is no brand flag that tells the packaging system to keep a branded zone in sync with the global zone.

[Ellard] No. The cluster brand zone is treated just like a native brand zone by the packaging tools. We have a PSARC contract on this point. The Solaris software should NEVER assume that a brand zone is always different from the native brand zone type. The cluster brand zone needs the same support as the native brand zone.

- If I clear the attach and detach options, will packages not be checked during zoneadm attach, and will the attach succeed?

I don't really understand this question.

ed
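The default/prohibited/required semantics Ed describes come down to simple set arithmetic: a zone starts from the default privileges, zonecfg may not add anything in the prohibited set, and may not remove anything in the required set. The sketch below is purely illustrative; the privilege names and the Python representation are invented for the example and have nothing to do with the actual project-private config.xml schema.

```python
# Illustrative model of the default / prohibited / required privilege
# sets described above. These set contents are invented for the
# example; the real values live in the brand's project-private
# config.xml.
DEFAULT    = {"proc_fork", "proc_exec", "sys_mount"}
PROHIBITED = {"sys_config", "proc_zone"}
REQUIRED   = {"proc_fork", "proc_exec"}

def effective_privs(added=(), removed=()):
    """Apply zonecfg-style privilege edits against the brand rules:
    privileges in PROHIBITED can never be added, and privileges in
    REQUIRED can never be removed."""
    illegal_add = set(added) & PROHIBITED
    illegal_rm = set(removed) & REQUIRED
    if illegal_add or illegal_rm:
        raise ValueError("rejected: add=%s remove=%s"
                         % (sorted(illegal_add), sorted(illegal_rm)))
    return (DEFAULT | set(added)) - set(removed)

print(sorted(effective_privs(added={"net_rawaccess"},
                             removed={"sys_mount"})))
```

So the three sets are enforcement rules applied to zonecfg edits, not just documentation.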
Re: [zones-discuss] Questions regarding Solaris containers
Hi,

The Sun Cluster Express release has already shipped a new feature called Zone Clusters. This is a virtual cluster where each virtual node is a zone. A major reason for developing this feature was to provide the ability to run Oracle RAC in zones. Oracle RAC requires a cluster environment; in other words, Oracle RAC requires multiple machines. The Zone Cluster satisfies the needs of Oracle RAC. We have successfully run RAC 9i, 10g, and 11g on the same hardware at the same time in different zone clusters.

The Sun Cluster Marketing organization always announces new features. As an engineer I cannot formally announce new support. However, I would be happy to provide more information for you, and can even demonstrate this feature in actual operation for interested people. The next product release of Sun Cluster will be SC3.2 update 2, early in 2009. I am most optimistic about supporting RAC with Zone Clusters soon. If anybody wants more detailed information, please contact me.

Regards,
Ellard

...

8. What databases are supported today for Solaris containers? As per the bigadmin document “db_in_containers”, only non-RAC Oracle is supported by containers. Is this still valid today, or is there support provided for Oracle RAC?

Oracle is supported. I understand that RAC support may be coming.

Is DB2 supported inside containers?

I don't know.

Steffen

Thanks in advance.

Regards,
-Narsimha
Re: [zones-discuss] Running Oracle Database inside Solaris 8/9 Container Using Sun Cluster
Hi Dr. Hung-Sheng Tsao,

Sun Cluster today supports a single-machine Oracle Database running inside a Solaris Container (also called a non-global zone), where the zone is a native brand zone. In the latest Sun Cluster Express release, Sun Cluster supports a new feature called a Zone Cluster, which is a virtual cluster where the virtual nodes are all non-global zones. We have run Oracle RAC 9i, 10g, and 11g concurrently inside different Zone Clusters on the same set of hardware. Our Marketing team formally announces new features in the commercial product. If you are interested in running Oracle RAC in zones, please contact me and I will provide details on how we will soon be supporting the ability to run Oracle RAC in zones.

Please note that I will be at a conference the rest of the week, so I will respond next week upon my return. If anyone else is interested in running RAC in zones, please contact me.

Regards,
Ellard

Dr. Hung-Sheng Tsao (LaoTsao) wrote:

Eric Li wrote:

Dear All,

Our customers would like to run an existing Oracle database inside a Solaris 8/9 container using Sun Cluster. Please kindly advise:

- Is this configuration certified by Oracle?
  [LaoTsao] not
- Will it be supported by Oracle?
  [LaoTsao] not
- Will Sun Cluster support this? (Sun Cluster 3.2 02/08?)
  [LaoTsao] ha-zone
- Any references?

Thank you in advance for your help.

Best regards,
Eric
Re: [zones-discuss] Zone management
Hi Nathan,

The Sun Cluster organization is introducing a new cluster brand zone that is based upon the native brand zone plus hooks for Sun Cluster. It would be nice if the design were flexible enough so that we could leverage your proposed tools for a brand zone other than native, or at least for a brand that is almost native.

Regards,
Ellard Roush

Nathan Dietsch wrote:

Hello All,

I am looking to the OpenSolaris community for input on your respective experiences with management tools for zones. I am currently looking at the options for managing zones and am looking for a tool that would let me:

* Install new zones according to a template
* Clone existing zones
* Detach and attach zones on different systems to facilitate migration
* Flag an update-on-attach operation for a zone (I realise that this has not yet been implemented in Solaris 10)
* Patch a zone
* Shutdown/restart a zone
* Handle the management of SMF services within a zone

Nice to have, but not overly necessary:

* Integration with Solaris Resource Manager
* Deploy a server personality to a zone (packages, conf files, mounts, SMF services, etc.)

I know that xVM Ops Centre can handle most of these tasks, but I am not sure that it handles the zone migration or update-on-attach components, and I have briefly read about Container Manager, which is part of SMC. I know that both Sun Cluster and VCS can handle the zone migration components, but they do not have the scope to handle the other tasks.

Are there any other tools out there that handle these sorts of tasks? How do you manage zones in your respective environments? Any and all input is much appreciated.

Kind Regards,
Nathan Dietsch
Re: [zones-discuss] Patches vs Updates - Zone Features
Hi Enda,

Enda O'Connor (Sun Microsystems Ireland) wrote:

Ellard Roush wrote:

Hi,

Solaris 10 update 4 introduced the BrandZ feature set. Solaris 10 update 5 will introduce more zone features. Today Sun Cluster requires that the Solaris 10 release be at least at the Solaris 10 update 3 level. We are proposing to ship a new feature in Sun Cluster that will use the BrandZ feature set and support the new zone features in Solaris 10 update 5. Naturally, this new feature will only be operational when the customer installs Solaris 10 update 5.

There are at least 2 ways to load new software:

1) Install the Solaris 10 update 5 release. In this case we know that everything works fine.

2) Install patches for Solaris 10 update 5. This approach loads all of the bug fixes, and does not load the new packages.

If a customer installs patches, will BrandZ and all the new zone features of Solaris 10 update 5 work? Or will the customer just get the bug fixes?

[Enda] They'll get everything in this case. 127127-11/127128-11 is the u5 kernel patch that will deliver all this. What features are you interested in?

[Ellard] We are using the BrandZ framework to support a cluster brand zone that is the same as the native brand zone with hooks added for our software. For example, we use the callbacks to learn when zones change state (up vs. down), while we still execute the original native brand functionality in these cases. We are going to support a Zone Cluster, which is a virtual cluster where each virtual node is a cluster brand zone. This will enable us to support cluster applications inside a zone environment. This means that we need S10u4 in order to get the BrandZ feature set. We also would like to support the new zone features of S10u5, which will probably include hard caps on CPUs.

[Enda] As for installing patches on zoned systems, there are a few things to be aware of:

1. Install the latest patch utilities first (119254/119255).

2. Always run patchadd -a patch-id first, before installing the patch. The -a does a dry run and, especially in the case of zones, will catch issues like zone dependency issues, unbootable zones, etc. No files get modified, so it allows you to identify certain types of issues (not all issues, mind you). The patchadd -a output is pretty hard to parse, but make sure to examine it closely for any issues relating to zones.

Enda

Thanks for the information.
Ellard

---

Our people in the field say that customers are much more willing to install a patch as opposed to installing an update. Some aspects of patching are murky. So your help is appreciated.

Regards,
Ellard
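Enda's advice to examine the patchadd -a dry-run output for zone-related messages can be partially automated. This is only a hypothetical illustration: the helper function and the sample lines are invented, and real patchadd output varies by patch and release, which is exactly why a human still needs to read whatever gets flagged.

```python
def zone_issues(dryrun_output):
    """Return the lines of a `patchadd -a` dry run that mention zones.

    A crude heuristic: any line containing the word 'zone' is flagged
    for a human to examine; it does not try to interpret the message.
    """
    return [line.strip()
            for line in dryrun_output.splitlines()
            if "zone" in line.lower()]

# Invented sample output -- real patchadd -a output differs, so treat
# this filter as a starting point for manual review, not a verdict.
dryrun = """Checking installed patches...
Validating patches...
ERROR: unable to boot zone: zone1
Dry run complete. No changes were made to the system."""

for issue in zone_issues(dryrun):
    print(issue)
```

Since the dry run modifies no files, it is safe to script this check into a pre-patch routine and abort before the real patchadd if anything is flagged.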
Re: [zones-discuss] The quick dirty guide to zones on iSCSI LUNs
Hi James,

James Carlson wrote:

Ellard Roush writes:

James Carlson wrote:

That point in time is as soon as your application can start. It need not have any dependencies at all.

Here is the other point that needs to be clarified. This is not an application. Applications do not start until much later. We have to get the cluster formed and cluster services established before applications run.

We probably have different definitions of that term. For networking, an application is something that uses the services provided by a transport or (for raw sockets) network layer protocol. I'm not talking about user applications; just things that use networking services in some way.

OK. Now I understand what you mean.

Your program (whatever it is) should not need dependencies on networking in order to be successful. As I suggested before, it's sometimes helpful to listen to routing sockets (you can get hints there about when it might be a good time to shorten a retry timer, and thus make your program respond more quickly), but it's not really a dependency issue.

The internal interfaces that we had to use are not well documented. Your explanation helps us understand what is probably going on.

It's hinted at in the documentation, but not as well documented as it should be. man -s 3socket connect says:

    underlying transport provider. Generally, stream sockets can
    successfully connect() only once. Datagram sockets can use
    [..]
    ECONNREFUSED  The attempt to connect was forcefully rejected.
                  The calling program should close(2) the socket
                  descriptor, and issue another socket(3SOCKET)
                  call to obtain a new descriptor before attempting
                  another connect() call.

That generally is also true for most unsuccessful connect() calls, and the advice under ECONNREFUSED is actually true for pretty much all failures. The exceptions are the non-failure failures -- EALREADY, EINPROGRESS, and EWOULDBLOCK. I think that issue is what the text is trying to dance around. You're partly connected (at least bound) after the real failures, and getting back to a clean state is easiest just by close() and trying again. The usual references (Stevens and others) have more detailed discussions.

The underlying problem is that for much of the BSD world, the code *is* the documentation, so whatever sockets did, well, that's what they do. (For what it's worth, this isn't even one of the darker corners. Raw socket behavior, for example, varies in mysterious ways across OS platforms and even across releases of a given OS.)

Thanks for the explanation. Our Quorum Server uses the approach that you suggested. We discovered it the hard way. We are now attempting to use iSCSI devices as quorum devices. I will share your insight with the iSCSI people.

Regards,
Ellard
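The connect(3SOCKET) advice quoted above -- close the failed descriptor and create a fresh socket before each retry, rather than reusing the one whose connect() failed -- can be sketched in a few lines. Python is used here for brevity; the same pattern applies to C code calling socket/connect/close directly. The demonstration grabs an ephemeral port from the kernel and closes the listener so that nothing is accepting on it, which produces ECONNREFUSED on the subsequent connect.

```python
import errno
import socket
import time

def connect_with_retry(host, port, attempts=3, delay=0.1):
    """Retry a refused connection the way connect(3SOCKET) advises:
    close the failed socket and create a fresh one for every retry,
    instead of reusing the descriptor whose connect() failed."""
    for attempt in range(attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((host, port))
            return s                 # success: caller owns the socket
        except OSError as e:
            s.close()                # discard the failed descriptor
            if e.errno != errno.ECONNREFUSED or attempt == attempts - 1:
                raise
            time.sleep(delay)

# Demonstrate ECONNREFUSED: grab a free port from the kernel, close
# the listener, then connect to the now-unoccupied port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
port = listener.getsockname()[1]
listener.close()

try:
    connect_with_retry("127.0.0.1", port, attempts=2, delay=0.05)
except OSError as e:
    print(e.errno == errno.ECONNREFUSED)
```

The key point is that the socket object created for a failed attempt is never reused; each retry starts from a clean descriptor, which is what gets you "back to a clean state" after the real failures.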
Re: [zones-discuss] The quick dirty guide to zones on iSCSI LUNs
Hi James, It is already well known that routes come and go. It is already well known that the way to determine whether a destination is reachable is to attempt to contact that destination. That is NOT the issue that I am raising. We have seen the following PROBLEM. Our code has a dependency upon Solaris network routing. After SMF reports that Solaris network routing initialization has begun. We attempt to contact the quorum device. That attempt fails. We wait and retry. ALL SUBSEQUENT retries fail !!! If we make the code sleep long enough for Solaris routing to complete initialization, then after a failed attempt to connect, then retries work whenever the route becomes available. The problem is that Solaris routing goes into an error state when we attempt to connect before it is ready. SMF starts services as soon as the service dependencies are satisfied. So we can and do attempt our first connection before Solaris routing is really ready ! We are not asking for indication as to when a route is present. We want to know when we can attempt to establish a connection without Solaris routing going into an error state that causes all subsequent attempts to connect to fail. We have found another recovery method for this problem. We do not just retry the connection. We destroy all network data structures (socket) This clears the bad state. retries then eventually succeed. Regards, Ellard James Carlson wrote: Ellard Roush writes: Thanks for explaining about how the routing situation changes dynamically. However, we have been aware of that for a long time. Sun Cluster (SC) is a High Availability product. We have customers that want recovery to occur in less than 2 seconds. While we have not achieved that goal, we are working in that direction. This means that some operations MUST complete very quickly. A late completion of an operation is a failure. Understood. As a general principle, though, you cannot demand that other systems do anything you want at any other time. 
When networking is involved, other independent systems are involved. In other words, I think the focus is on the wrong level here. The whole deployment -- the routers, bridges, and other infrastructure included -- must be designed to meet your goal, not _just_ this one bit of Solaris software. (And once that's done, the state of routing in Solaris may or may not be at issue.) More specifically, when a quorum device is unreachable for substantial periods of time, the unreachable quorum device is in a failed state as far as we are concerned. This is true even when the device might be reachable 60 seconds from now. The administrator must configure a quorum device that can be reached reliably in a short time period. The solution is easy at this level: send a packet. If you get a sensible response, then that system is in fact reachable. If you don't get a sensible response within the time constraint that you've set for yourself, then it's not. That's really the only information available. The current SMF information does not even tell us when the Solaris routing software can even accept attempts to communicate. That's correct. As I've already outlined *it doesn't know* and (more importantly) *it cannot in principle know*. Or, if you prefer: it always accepts attempts to communicate. It just won't always be successful in those attempts. We already know that the attempts can fail. Before the routing software in Solaris is ready, all attempts to communicate will fail. We just want to know when it is safe to try. We are not asking for a dependency upon when a specific route is present. We know that is not possible. We have encountered problems when an attempt is made before the routing software is ready. We want to access the quorum device as soon as we can for quicker recovery, but no sooner than can be achieved reliably. There's just no general solution to the problem. 
> If the only thing you care about is whether routing has established a route to somewhere, then (as I mentioned before) you can listen on a routing socket to observe the resulting RTM_ADD. I don't think that'll actually help you in your quest, but it's certainly doable, and it answers the immediate (and I think improperly formed) question of when the routing software in Solaris is 'ready'. For some value of 'ready', at least.
>
> There is simply *NO WAY* that the system can tell you a priori whether an attempt to transmit a packet will actually result in that packet being sent from the system (ARP can still fail, and Spanning Tree can disable ports silently), or whether delivery is possible. Only sending data can do that, and only then in retrospect. If you get an answer, then it must have worked.
>
> I strongly disagree that we should be offering any sort of 'routing is ready' checkpoint or SMF dependency. It'd be misleading at best, and would result in a new class of unsolvable failure modes.
Re: [zones-discuss] The quick dirty guide to zones on iSCSI LUNs
Hi James,

James Carlson wrote:
> Ellard Roush writes:
>> If we make the code sleep long enough for Solaris routing to complete initialization, then after a failed attempt to connect, retries work whenever the route becomes available. The problem is that Solaris routing goes into an error state when we attempt to connect before it is ready.
>
> OK, it sounds like we're talking at cross-purposes here.

Yes. But we finally seem to be reaching an understanding. That is progress.

> I haven't seen such a problem myself (it sounds like an application bug to me -- at a wild guess, possibly not handling dynamic interfaces correctly; see below). File a bug on solaris/kernel/tcp-ip.
>
> The TCP/IP stack itself is responsible for taking user data and matching it against kernel routes (actually, they're forwarding entries). The user-space routing daemons (the things controlled by SMF) neither know nor _care_ what the kernel is doing with user data packets, so dependencies on them won't help anything. Even if some sort of error state is possible in the kernel (again, I haven't seen such a thing, at least not described in those terms), I don't see how routing daemons are involved here or how anything iSCSI can do would affect them.
>
>> We are not asking for an indication of when a route is present. We want to know when we can attempt to establish a connection without Solaris routing going into an error state that causes all subsequent attempts to connect to fail.
>
> That point in time is as soon as your application can start. It need not have any dependencies at all.

Here is the other point that needs to be clarified. This is not an application. Applications do not start until much later. We have to get the cluster formed and cluster services established before applications run.

> If you prefer, you may depend on this service so that at least lo0 is plumbed up when you start: svc:/network/loopback:default. Most networking applications don't even need that, though.
>> We have found another recovery method for this problem. We do not just retry the connection; we destroy all network data structures (the socket). This clears the bad state, and retries then eventually succeed.
>
> It sounds to me like you're not dealing with dynamic interfaces correctly. If you don't explicitly bind a preferred address to use (most applications do not), then the kernel will choose an address for you. With UDP, this happens on a packet-by-packet basis. With TCP, though, it happens once, as the connect() request is started. When the kernel does this, it picks the best-matching kernel forwarding entry (at that moment in time) for the supplied destination IP address (UDP sendto() or TCP connect()), and then selects a source address based on the output interface that this entry points to.
>
> Other interfaces may come and go over time, and other routes may be learned or forgotten, but we _never_ go back and rewire that TCP source address. It perhaps doesn't sound like the best possible answer, but that's how BSD sockets have worked for decades, and it's expected behavior.
>
> If connect() fails or if you need to give up for some reason, there's no way to unbind. The proper procedure is to close the socket and build a new one.
>
> I think you're barking up the wrong tree by attempting to establish some sort of dependency on routing.

The internal interfaces that we had to use are not well documented. Your explanation helps us understand what is probably going on.

Regards,
Ellard

___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] The quick dirty guide to zones on iSCSI LUNs
Christine Tran wrote:
> roush wrote:
>> Sun Cluster plans to support an iSCSI disk as a quorum device. Sun Cluster accesses the iSCSI disk early in the boot process. When the iSCSI disk is on the same subnet as the cluster machines, things work. When the iSCSI disk is on a different subnet, the system cannot find the iSCSI disk (ENXIO). However, after Solaris is fully up we have no access problems. Solaris automatically boots up zones in many configurations. The point at which Solaris boots zones is later, so you may or may not hit this problem. I would be interested to hear whether you encounter this problem or not.
>
> Hi Ellard,
>
> No, I have not encountered this problem. The targets mount just in time for my zones. But it sounds to me like a dependency on svc:/network/routing/route:default for cluster could help this along?
>
> CT

Hi Christine,

We have dependencies upon routing. However, this dependency only lets us know when initialization of routing has started; it does not tell us when things are ready. iSCSI hides the fact that a network is involved, which complicates solving this issue. But we are working on it.

Thanks for the information. This helps confirm that we have a startup ordering problem.

Regards,
Ellard
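For reference, the kind of SMF dependency Christine suggests is declared in a service's manifest roughly as follows. This is a sketch under assumptions (the enclosing service and its manifest are hypothetical), and, as discussed in this thread, such a dependency only orders startup after the routing service has begun; it says nothing about whether routing is actually ready.

```xml
<!-- Hypothetical fragment of a service manifest: make our service
     wait until the routing service has at least started.  SMF
     ordering does not imply that any routes exist yet. -->
<dependency name="routing"
            grouping="require_all"
            restart_on="none"
            type="service">
  <service_fmri value="svc:/network/routing/route:default"/>
</dependency>
```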
Re: [zones-discuss] The quick dirty guide to zones on iSCSI LUNs
Hi Christine,

Interesting report. We will also be supporting the use of iSCSI with Sun Cluster. Here is one specific problem that we have encountered that may or may not affect you.

Sun Cluster plans to support an iSCSI disk as a quorum device. Sun Cluster accesses the iSCSI disk early in the boot process. When the iSCSI disk is on the same subnet as the cluster machines, things work. When the iSCSI disk is on a different subnet, the system cannot find the iSCSI disk (ENXIO). However, after Solaris is fully up we have no access problems. Solaris automatically boots up zones in many configurations. The point at which Solaris boots zones is later, so you may or may not hit this problem. I would be interested to hear whether you encounter this problem or not.

Regards,
Ellard Roush

Christine Tran wrote:

What is iSCSI? SCSI over TCP/IP. iSCSI makes remote disks look local. The remote host with the storage resource presents iSCSI targets; the client accessing the storage is the initiator. The iSCSI initiator was present in S10 3/05 and up; the iSCSI target went into S10 8/07.

Why zones on iSCSI? iSCSI frees you from the limitation of putting zones on local storage. The physical bits of the zoneroot can live anywhere accessible with a network connection. You can use the zone detach/attach function without SAN or shared storage. This ability circumvents a bunch of problems associated with zonepath on an NFS mount; for example, see RFE 4963321: hosting root filesystems for zones on NFS servers.

What's the catch? Speed of zone installation and patching depends on how fast your network is. Currently it doesn't look like you can do a standard upgrade on a box with zones on iSCSI LUNs, because there are no iSCSI packages in the miniroot.

What works? Installing and booting zones on iSCSI targets, patching in single-user mode, upgrading via LiveUpgrade.

How to do it? This is a quick write-up. I used a ZFS zvol but this is not necessary.
ZFS makes creating iSCSI targets PAINLESS and takes only one command. I placed the zonepath on a striped SVM volume because I was testing a specific config, for speed, and eventually I want to use an SVM mirror to provide redundancy for my zonepath. Most outputs are omitted; what is provided is for clarity.

1. create the targets
2. client discovery of target
3. label disk, lay down SVM, filesystem
4. configure zones
5. apply recommended patch cluster, LU patch cluster
6. lucreate, luupgrade, luactivate

nvd is a box running snv_80, but S10 8/07 is just as good. The client is running S10 8/07.

nvd# zpool create tran1 c0t18d0 c0t19d0
nvd# zpool create tran2 c0t20d0 c0t21d0
nvd# zfs create -V 16g tran1/xmen
nvd# zfs create -V 16g tran2/hulk
nvd# zfs set shareiscsi=on tran1/xmen
nvd# zfs set shareiscsi=on tran2/hulk
nvd# iscsitadm list target -v
Target: tran1/xmen
    iSCSI Name: iqn.1986-03.com.sun:02:4a46145b-8b71-69ab-8cee-c8a9c4367f0a
Target: tran2/hulk
    iSCSI Name: iqn.1986-03.com.sun:02:f57bbbf8-3504-4d9e-8c2b-ddfa45cfb641

~ iscsiadm add static-config iqn.1986-03.com.sun:02:4a46145b-8b71-69ab-8cee-c8a9c4367f0a,129.154.158.154
~ iscsiadm add static-config iqn.1986-03.com.sun:02:f57bbbf8-3504-4d9e-8c2b-ddfa45cfb641,129.154.158.154
~ iscsiadm modify discovery --static enable
~ devfsadm -i iscsi
~ iscsiadm list target -S
Target: iqn.1986-03.com.sun:02:f57bbbf8-3504-4d9e-8c2b-ddfa45cfb641
    OS Device Name: /dev/rdsk/c5t0103BA681D5F2A0047E84932d0s2
Target: iqn.1986-03.com.sun:02:4a46145b-8b71-69ab-8cee-c8a9c4367f0a
    OS Device Name: /dev/rdsk/c5t0103BA681D5F2A0047E84934d0s2

~ format
[...]
8. c5t0103BA681D5F2A0047E84932d0 SUN-SOLARIS-1 cyl 32766 alt 2 hd 4 sec 256
   /scsi_vhci/[EMAIL PROTECTED]
9. c5t0103BA681D5F2A0047E84934d0 SUN-SOLARIS-1 cyl 32766 alt 2 hd 4 sec 256
   /scsi_vhci/[EMAIL PROTECTED]
(label, partition)

[Striping, nologging and noatime for speed]
~ metainit d30 1 2 c5t0103BA681D5F2A0047E84932d0s0 c5t0103BA681D5F2A0047E84934d0s0 -i 32k
~ newfs -v /dev/md/dsk/d30
~ mount -F ufs -o nologging,noatime /dev/md/dsk/d30 /zones

[You need the mount-at-boot option == yes, otherwise it would not mount at boot, despite what the mount(1M) manpage says]
~ vi vfstab
/dev/md/dsk/d30 /dev/md/rdsk/d30 /zones ufs 1 yes nologging,noatime

~ zonecfg -z zone1
zonecfg:zone1> create
zonecfg:zone1> set zonepath=/zones/zone1
[...]
~ zoneadm -z zone1 install
~ zoneadm -z zone1 boot

{1} ok boot -s
Entering System Maintenance Mode

[iSCSI Initiator is present]
~ modinfo | grep -i iscsi
 36 13252e8 2b4a0 271 1 iscsi (Sun iSCSI Initiator v20061003-0)

[Target LUNs are present]
~ iscsiadm list target
Target: iqn.1986-03.com.sun:02:f57bbbf8-3504-4d9e-8c2b-ddfa45cfb641
Target: iqn.1986-03.com.sun:02:4a46145b-8b71-69ab-8cee-c8a9c4367f0a

[boot zones, apply patch cluster and LU patch cluster. sunsolve.sun.com has
Re: [zones-discuss] updating a zone when attaching
Hi Jerry,

This proposal mentions native zones. Please ensure that the cluster brand is treated as a native brand, as noted in PSARC 2007/304. By the way, PSARC 2007/304 was approved last week. The changes are now in Nevada. We have been working with the ON Install gate C-teams. The changes will go into the S10u4 gates once we receive notification of what date they want the putback to occur. After the changes get into both S10u4 gates, I will return to discuss the long-term solution for S10u5.

Regards,
Ellard

Jerry Jelinek wrote:

Enclosed is a draft of an ARC fast-track proposal I have been working on recently, in between a few other things. I would like to submit this for ARC review shortly, but I wanted to send this out to see if anybody had any comments before I do that. I have cc-ed the install-discuss alias as well, since there is some overlap, although this is probably most interesting to zones folks.

One additional comment: I believe this proposal should also address the recurring question about being able to migrate a zone from sun4u to sun4v (and back).

Please send me any comments or questions.

Thanks,
Jerry

---

SUMMARY:

This fast-track enhances the Solaris Zones [1] subsystem to address an existing RFE [2] requesting the ability to update a non-global zone when migrating from one machine to another. Currently when we migrate a zone, we validate that the destination host has the same pkg versions and patches for the zone-dependent packages as were installed on the source host. This is described in the zone migration ARC case [3]. While this is safe and ensures that the new host is capable of properly supporting the zone, it is also very restrictive. With this enhancement, if the new host has higher versions of the zone-dependent pkgs, or higher versions of patches for those pkgs, then when we attach the zone to the new host we will enable an update of the pkgs in the zone to match the new host. Patch binding is requested for this "update on attach" capability.
The stability of these interfaces is documented in the interface table below.

DETAILS:

"Update on attach" is different from a traditional zone upgrade. In a traditional upgrade, all native zones are upgraded as part of upgrading the base system, using a standard Solaris media image as the source for the pkgs to upgrade to. Pkg operations on pkgs with the SUNW_ALLZONES attribute set must be run from the global zone; the operation will be performed on all native zones, and this behavior is built into the pkg commands.

With "update on attach" we are only updating a single zone. We cannot depend on the basic pkg behavior, which updates all zones when a pkg is installed in the global zone. We cannot use standard Solaris media, since the host can have a variety of patches installed which have updated the base system pkgs beyond any specific Solaris release. Instead, what we want to do is similar to what happens when a zone is initially installed: the spooled pkg data and global zone files are the source for installing the zone. In this way the zone is installed with the correct pkg versions, along with any patches that have been applied to those pkgs.

We can do something similar for "update on attach". The zone 'attach' validation already generates a list of mismatched pkg versions and patches. We can use this information to determine which dependent pkgs need to be updated so that the zone can run properly on the new host. We will remove the obsolete versions of those pkgs and install the up-to-date versions from the pkg data spooled in the global zone. This procedure will preserve any editable or volatile files that are delivered by these pkgs. The normal pkg install scripts and class action scripts are run as part of this process, so any updates performed by these scripts will take place.

As described in [3], the dependent pkgs are those that have the SUNW_PKG_ALLZONES=true pkg attribute, as well as any pkgs installed in an inherited-pkg-dir. Only these pkgs will be updated to match the new host. We will ensure that we only update a zone to a host running the same or later versions of the dependent pkgs. For example, if the new host has a mix of higher and lower version patches as compared to the source host, then we will not allow an update during the attach.

By default the zone will not be updated during attach. Instead, the existing output listing the pkgs that are out of sync will continue to be printed. We will add a new option (-u) to the 'zoneadm attach' subcommand. When this option is used, zoneadm will update the necessary pkgs during the attach (assuming there are any to update). Because the zone has previously booted and run on the source host, it is
Re: [zones-discuss] Why is mount disabled for branded zones
Hi Enda,

This provides a good opportunity to clear up some misinformation. The BrandZ lx zone type does not use standard patch/package commands, but there will be BrandZ zone types that do. The Cluster group is now developing a cluster BrandZ zone type that uses the BrandZ callbacks to enhance a zone, and the cluster BrandZ uses standard patch/package commands. The Zones BrandZ team in Solaris told us that a BrandZ approach was the correct way to enhance a zone. We are now in the middle of correcting these problems. If you have information about places where this problem appears, please let us know so that we can fix the problem.

Thanks,
Ellard

Enda O'Connor (Sun Microsystems Ireland) wrote:
> Tirthankar wrote:
>> Hi,
>>
>> On my machine (running s10u4_06) I have 3 local zones.
>>
>> pship2 @ / $ zoneadm list -cv
>>   ID NAME    STATUS     PATH        BRAND     IP
>>    0 global  running    /           native    shared
>>    2 cz2     running    /zones/cz2  my_brand  shared
>>    5 cz4     running    /zones/cz4  native    shared
>>    - cz3     installed  /zones/cz3  lx        shared
>>
>> cz2 is a my_brand branded zone.
>>
>> pship2 @ / $ zoneadm -z cz2 mount
>> zoneadm: zone 'cz2': mount operation is invalid for branded zones.
>>
>> Why is the mount command disallowed for a branded zone? I can boot the zone using the normal zoneadm -z cz2 boot command.
>>
>> Note: The config.xml and platform.xml for my_brand are identical to the native brand except for the brand name.
>
> Hi,
>
> mount is an internal state used by the patch/package commands only. It basically does some mount magic, such that the zone's root is mounted with pieces lofs-mounted in from the global zone, plus /dev etc. Not really applicable to a zone that is not native, as it cannot be patched.
>
> Enda