[Users] General challenges w/ Ovirt 3.1

2012-09-29 Thread Hans Lellelid
Hi -

I apologize in advance that this email is less about a specific
problem and more a general inquiry as to the most recommended /
likely-to-be-successful path.

I am on my third attempt to get an oVirt system up and running (using
a collection of spare servers that meet the requirements set out in
the user guide).  I'm looking to implement a viable evolution from the
unscalable, stove-piped ESXi servers (which are free).  And while
I'm happy to learn more about the underpinnings, I recognize that to
really be a replacement for these VMware solutions, this has to mostly
Just Work -- and that's ultimately why I've given up on previous
occasions (after a couple days of work) and decided to revisit in 6
months.

My basic server setup is:
 - oVirt Mgmt (engine)
 - oVirt HV1 (hypervisor node)
 - oVirt HV2 (hypervisor node)
 - oVirt Disk (NFS share)

1st attempt: I installed the latest stable Node image (2.5.1) on the
HV1 and HV2 machines and re-installed the Mgmt server w/ Fedora 17
(64-bit) and all the latest stable engine packages.  For the first
time, Node installation and Engine setup all went flawlessly.  But I
could not mount the NFS shares.  Upon deeper research, this appeared
to be the known NFS bug; I was *sure* that the official
stable Node image would have had a downgraded kernel, but apparently
not :) I have no idea if there is an officially supported way to
downgrade the kernel on the Node images; the warnings say that any
changes will not persist, so I assume there is not.  (I am frankly a
little surprised that the official stable packages & ISO won't
actually work to mount NFS shares, which is the recommended storage
strategy and kinda critical to this thing!?)  FWIW, the oVirt Disk
system is a CentOS 6.2 system.
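A quick way to sanity-check both the kernel and the export from a node
before blaming oVirt itself (the hostname and export path below are
placeholders; the mount options are roughly what vdsm uses, but treat
them as a sketch):

```shell
# The 3.5.x kernels shipped with current F17/Node are the ones with the
# NFS regression; 3.4.x was reported to work.
uname -r

# Confirm the NFS server is exporting what we think it is
showmount -e ovirt-disk.example.com

# Try the mount by hand with vdsm-like options before letting the
# engine attempt it
mkdir -p /tmp/nfs-test
mount -t nfs -o soft,timeo=600,retrans=6,nosharecache \
    ovirt-disk.example.com:/export/data /tmp/nfs-test
umount /tmp/nfs-test
```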

2nd attempt: I re-installed the nodes as Fedora 17 boxes and
downgraded the kernels to 3.4.6-2.  Then I connected these from the
Engine (specifying the root pw) and watched the logs while things
installed.  After reboot neither of the servers were reachable.
Sitting in front of the console, I realized that networking was
refusing to start; several errors printed to the console looked like:

device-mapper: table: 253:??: multipath: error getting device (I don't
remember exactly what was after the 253:)

Calling multipath -ll yielded no output; calling multipath -r
re-issued the above errors.

Obviously the Engine did a lot of work there, setting up the bridge,
etc.  I did not spend a long time trying to untangle this.  (In
retrospect, I will probably go back and spend more time trying to
track this down, but it's difficult since I lose network & have to
stand at the console in the server room :))
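In case it's the problem I suspect (vdsm enabling multipathd, which
then chokes on plain local SATA disks), blacklisting the local disks
is a commonly suggested workaround -- the device name and WWID below
are placeholders:

```shell
# Find the WWID of the local disk that device-mapper is complaining about
/lib/udev/scsi_id --whitelisted --device=/dev/sda

# Blacklist it so multipath leaves it alone (use the WWID printed above)
cat >> /etc/multipath.conf <<'EOF'
blacklist {
    wwid "REPLACE-WITH-LOCAL-DISK-WWID"
}
EOF

# Reload the maps and verify the errors are gone
multipath -r
multipath -ll
```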

3rd attempt: I re-installed the nodes with Fedora 17 and attempted to
install VDSM manually by RPM.  Despite following the instructions to
turn off SSL (ssl=false in /etc/vdsm/vdsm.conf), I am seeing SSL
unknown-cert errors from the Python socket server every time the
engine tries to talk to the node.  I added the CA from the master
into /etc/pki/vdsm (since that was the commented-out path given in the
config file as the trust store) and added the server's cert there too,
but have no idea what form these files should take to be respected by
the Python server -- or if they are respected at all.  I couldn't find
this documented anywhere, so I left the servers spewing logs for the
weekend, figuring that I'll either give up or try another strategy on
Monday.
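If anyone can confirm: my understanding is that SSL has to be disabled
on *both* ends, or the handshake fails exactly like this -- vdsm.conf
on the node and engine-config on the engine.  Something like the
following (key names from memory, so treat this as a sketch):

```shell
# On each node: disable SSL for vdsm and restart it
sed -i 's/^# *ssl *=.*/ssl = false/' /etc/vdsm/vdsm.conf
systemctl restart vdsmd.service

# On the engine: stop expecting TLS when talking to hosts
engine-config -s SSLEnabled=false
service ovirt-engine restart
```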

So is there a general strategy that should get me to a working system
here?  I suspect that the Node image is not a good path, since it
appears to be incompatible with NFS mounting.  The
Fedora-17-installed-by-engine route sounds good, but there's a lot of
magic there & it obviously completely broke my systems.  Is that where
I should focus my efforts?  Should I ditch NFS storage and just try to
get something working with local-only storage on the nodes?  (Shared
storage would be a primary motivation for moving to oVirt, though.)

I am very excited for this to work for me someday.  I think it has
been frustrating to have such sparse (or outdated?) documentation and
such fundamental problems/bugs/configuration challenges.  I'm using
pretty standard (Dell) commodity servers (SATA drives, simple RAID
setups, etc.).

Sorry for no log output, I can provide more of that when back at work
on Monday, but this was more of a general inquiry on where I should
plan to take this.

Thanks in advance!
Hans
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] General challenges w/ Ovirt 3.1

2012-09-29 Thread Dave Neary

Hi,

On 09/29/2012 01:37 PM, Hans Lellelid wrote:

I apologize in advance that this email is less about a specific
problem and more a general inquiry as to the most recommended /
likely-to-be-successful path.


Having just gone through the process, I hope I can help a little! You 
might want to check (and add to) the Troubleshooting page where I 
documented the various hiccups I had, and how I addressed them:


http://wiki.ovirt.org/wiki/Troubleshooting

There are also the Node Troubleshooting and Troubleshooting NFS Storage
Issues pages, which might help you:
http://wiki.ovirt.org/wiki/Node_Troubleshooting and 
http://wiki.ovirt.org/wiki/Troubleshooting_NFS_Storage_Issues


Also, Jason Brooks's "Up and Running with oVirt 3.1" article is useful,
I think:
http://blog.jebpages.com/archives/up-and-running-with-ovirt-3-1-edition/



2nd attempt: I re-installed the nodes as Fedora 17 boxes and
downgraded the kernels to 3.4.6-2.  Then I connected these from the
Engine (specifying the root pw) and watched the logs while things
installed.  After reboot neither of the servers were reachable.
Sitting in front of the console, I realized that networking was
refusing to start; several errors printed to the console looked like:


When you say that they are not reachable, what do you mean? By default,
installing F17 as a node sets the iptables configuration to:


# oVirt default firewall configuration. Automatically generated by vdsm bootstrap script.

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
# vdsm
-A INPUT -p tcp --dport 54321 -j ACCEPT
# libvirt tls
-A INPUT -p tcp --dport 16514 -j ACCEPT
# SSH
-A INPUT -p tcp --dport 22 -j ACCEPT
# guest consoles
-A INPUT -p tcp -m multiport --dports 5634:6166 -j ACCEPT
# migration
-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT
# snmp
-A INPUT -p udp --dport 161 -j ACCEPT
# Reject any other input traffic
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited
COMMIT

So if you're trying to ping the nodes, you should see nothing, but ssh,
snmp and vdsm should be available.  If you have local console access to
the nodes, you should check the iptables config.
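A quick way to distinguish "firewalled" from "down" (hostnames are
whatever you called the nodes):

```shell
# From the engine: ping will be rejected by the rules above, but ssh
# (22) and vdsm (54321) should still answer if the node is actually up
nc -zv ovirt-hv1 22
nc -zv ovirt-hv1 54321

# On the node's console: confirm the generated rules are in place
iptables -L INPUT -n | grep -E 'dpt:(22|54321)'
```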


I don't understand why you would lose your network connection entirely, 
though. I don't think that the network config for the nodes is changed 
by the installer.



3rd attempt: I re-installed the nodes with Fedora 17 and attempted to
install VDSM manually by RPM.  Despite following the instructions to
turn off ssl (ssl=false in /etc/vdsm/vdsm.conf), I am seeing SSL
unknown cert errors from the python socket server with every attempt
of the engine to talk to the node.


Hopefully the Node Troubleshooting page (or somebody else) can help
you here; I'm afraid I can't.



The
Fedora-17-installed-by-engine route sounds good, but there's a lot of
magic there & it obviously completely broke my systems.  Is that where
I should focus my efforts?  Should I ditch NFS storage and just try to
get something working with local-only storage on the nodes?  (Shared
storage would be a primary motivation for moving to oVirt, though.)


I would focus on this approach, and would continue to aim to use NFS
storage.  It works fine as long as you are on the 3.4.x kernels.



I am very excited for this to work for me someday.  I think it has
been frustrating to have such sparse (or outdated?) documentation and
such fundamental problems/bugs/configuration challenges.  I'm using
pretty standard (Dell) commodity servers (SATA drives, simple RAID
setups, etc.).


The Quick Start Guide was useful to me, at least while everything went
well: http://wiki.ovirt.org/wiki/Quick_Start_Guide


Hope some of that is helpful!

Cheers,
Dave.

--
Dave Neary
Community Action and Impact
Open Source and Standards, Red Hat
Ph: +33 9 50 71 55 62 / Cell: +33 6 77 01 92 13


Re: [Users] General challenges w/ Ovirt 3.1

2012-09-29 Thread Joop

Dave Neary wrote:

Hi,

On 09/29/2012 01:37 PM, Hans Lellelid wrote:

I apologize in advance that this email is less about a specific
problem and more a general inquiry as to the most recommended /
likely-to-be-successful path.

Having been there, I know the difficulty of setting up oVirt, but with
the release of 3.1 things have improved immensely.  One problem
remains, and you have found it already: the NFS problem with kernel 3.5.x.

I have installed several Fedora 17 nodes and have had no problems so far.
I install from the Fedora 17 KDE 64-bit LiveCD, then add the oVirt repo
and install vdsm and its dependencies.  After that I add the node through
the engine, and once the installation is done and the node has rebooted
it functions as one would expect.

If you want, I can paste my bash history.
One thing to look out for: switch off NetworkManager beforehand
and configure and activate 'network'.
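On F17 that amounts to something like this (the ifcfg file name depends
on your NIC; em1 here is just an example):

```shell
# Swap NetworkManager out for the classic network service, which is
# what vdsm's bridge setup expects
systemctl stop NetworkManager.service
systemctl disable NetworkManager.service
systemctl enable network.service
systemctl start network.service

# Make sure the interface config itself opts out of NetworkManager
echo 'NM_CONTROLLED=no' >> /etc/sysconfig/network-scripts/ifcfg-em1
```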


Joop



Re: [Users] Increase storage domain

2012-09-29 Thread Ayal Baron


- Original Message -
> On Thu, Sep 27, 2012 at 11:08 AM, Ayal Baron aba...@redhat.com
> wrote:
>
>> Alan Johnson a...@datdec.com meant to write:
>> So, no change.
>
> This looks like an LVM issue. Have you tried deactivating the VG
> before pvresize?
>
> I have not, but I don't think I'll bother playing with that any more
> since there is a more accepted way of growing that has no
> significant downside and leaves open the potential for more
> functionality. Good to know that I should not have to make the
> change.
>
> I should mention that the host is running a minimal install of
> CentOS 6.3, updated, and then tweaked by oVirt. Perhaps there is
> some other package that enables this functionality?

No, there isn't.  LVM should work fine.

 
 
 
 
>> However, as I read this email, it occurred to me that some other
>> things might not be equal. Specifically, using multiple LUNs could
>> provide a means of shrinking the storage domain in the future. LVM
>> provides a simple means to remove a PV from a VG, but does the engine
>> support this in the CLI or GUI? That is, if a storage domain has
>> multiple LUNs in it, can those be removed at a later date?
>
> Not yet.
>
> Does this mean it is in the works? If not, where could I put in such a
> feature request?
>
> Certainly, I have no pressing need of this, but it seems like a
> fairly simple thing to implement, since I have done it so easily in
> the past with just a couple of commands outside of an oVirt
> environment. I believe the primary purpose of the LVM functionality
> was to enable removal of dying PVs before they take out an entire
> VG. No reason it would not work just as well to remove a healthy PV.
> It can take a long time to move all the extents off the requested PV,
> but there is a command to show the progress, so it would also be easy
> to wrap that into the GUI.

What's simple in a single host environment is really not that simple when it 
comes to clusters.
The tricky part is the coordination between the different hosts and doing it 
live or with minimal impact.
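For reference, the single-host sequence being described is just this
(VG and device names are made up; the cross-host coordination above is
exactly what it does not handle):

```shell
# Move all allocated extents off the PV we want to retire; pvmove
# reports progress as it copies (-i sets the report interval in seconds)
pvmove -i 10 /dev/mapper/lun2

# Once empty, drop the PV from the VG and wipe its LVM label
vgreduce my_vg /dev/mapper/lun2
pvremove /dev/mapper/lun2
```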
