Re: [PVE-User] Some features that I miss in the PVE WebUI

2020-05-22 Thread Thomas Lamprecht
Hi,

On 5/22/20 4:03 PM, Frank Thommen wrote:
> Dear all,
> 
> having worked with oVirt in the past there are some features that I am really 
> missing in PVE in my daily work:
> 
> a) a tabular overview over all virtual machines. This should/might also 
> include some performance data and the description.  See the attached partial 
> screenshot from oVirt, where this is implemented quite nicely. This is /not/ 
> to replace proper monitoring but to provide a quick overview over the 
> PVE-based intrastructure

Isn't this what Datacenter -> Search could provide? Note that the mailing list 
drops attachments, so your screenshot did not come through.

> 
> a1) the possibility to provide the virtual machines and containers with a 
> short, one-line description

Why not use the VM/CT notes? Editable over VM/CT -> Summary panel?

> 
> b) the possibility to use keywords from the Notes field or the description 
> (see a1 above) in the search box.  Our hosts are all named 
> -vm which forces us to keep a separate list for the mapping 
> of services to hostnames

Dominik has done some patches adding "tags", which would then be searchable.
Some backend support is there, but we had some discussion about how to integrate
them in the frontend. I think this will be picked up soonish and should provide
what you are looking for in b), and maybe also in a1.

cheers,
Thomas




Re: [PVE-User] Mellanox ConnectX-5 and SR-IOV

2020-05-15 Thread Thomas Lamprecht
On 5/15/20 9:00 AM, Uwe Sauter wrote:
> Chris,
> 
> thanks for taking a look.
> 
> 
> Am 14.05.20 um 23:13 schrieb Chris Hofstaedtler | Deduktiva:
>> * Uwe Sauter  [200514 22:23]:
>> [...]
>>> More details:
>>>
>>> I followed these two instructions:
>>>
>>> https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x
>>>
>>> https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connect-ib-connectx-4-with-kvm--infiniband-x
>>>
>>> At about halfway through the second site I can't go on because the 
>>> mentioned paths are not available in /sys:
>>> /sys/class/infiniband/mlx5_0/device/sriov/0/policy
>>> /sys/class/infiniband/mlx5_0/device/sriov/0/node
>>> /sys/class/infiniband/mlx5_0/device/sriov/0/port
>> Disclaimer: I don't have the hardware you're talking about.
>>
>> However, I took a quick look at the mainline kernel drivers and the
>> driver sources from mellanox.com - the mainline kernel does _NOT_
>> have the code for these files.
>>
>> I guess if you want to use that, you'll have to install the ofed
>> driver from mellanox.com (preferably by starting with the sources
>> for Ubuntu 20.04).
> As I mentioned I tried to install Mellanox OFED (for Debian) but it wants to 
> uninstall many PVE related packages. If anyone has
> successfully installed MOFED on PVE 6 and can provide instructions, I'd be 
> happy.
> 
> 

I do not have this HW either but from a quick look you need to do two
things:

# apt install pve-headers
(the script assumes that this is linux-headers-*, and thus fails here already)


Open the "install.pl" file in an editor and search for the "sub uninstall"
There add an early return:

return 0;

immediately after the opening { after that you should be able to build it.

The script then tries to install the packages directly with dpkg, which may fail.
If so, try to install them manually with `apt install /path/to/package.deb ...other.deb`;
you may want to pass all debs from the DEBS sub-directories to that one call.
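
Condensed into a rough sketch (the installer invocation and paths are assumptions
that depend on the MOFED release you downloaded, so adapt as needed):

# apt install pve-headers
(headers for the running PVE kernel; the script expects linux-headers-*)
# editor install.pl
(add the early "return 0;" in "sub uninstall" as described above)
# ./install.pl
(run the Mellanox build/install script as documented for the Debian tarball;
flags vary per release)
# apt install ./DEBS/*.deb
(only needed if the script's own dpkg call fails; adjust the glob so it covers
all DEBS sub-directories)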



Re: [PVE-User] Proxmox VE 6.2 released!

2020-05-13 Thread Thomas Lamprecht
Hi,

On 5/12/20 8:33 PM, Stephan Leemburg wrote:
> One question though. What is meant by:
> 
>  * Improve default settings to support hundreds to thousands* of
>    parallel running Containers per node (* thousands only with simple
>    distributions like Alpine Linux)
> 
> Is that setting feature nesting and running docker inside a lxc container or 
> in a kvm? Could you elaborate a little more on that statement?

No docker, no QEMU/KVM. Simply some sysctl default values - things like open
INotify watchers, entries in the ARP neighbor table, the maximum number of mmap
memory areas, the maximum number of kernel keyring entries, ... - got bumped to
ensure that systems running a lot of CTs do not need to do this fine-tuning
themselves.
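
For illustration, these are the kinds of knobs involved. The sysctl names are
real, but the values below are just placeholders, not the exact defaults the
release ships:

# cat /etc/sysctl.d/99-container-example.conf
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 1048576
net.ipv4.neigh.default.gc_thresh3 = 8192
vm.max_map_count = 262144
kernel.keys.maxkeys = 2000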

cheers,
Thomas




Re: [PVE-User] Newbie in cluster - architecture questions

2020-02-17 Thread Thomas Lamprecht
Hi,

On 2/17/20 2:21 PM, Demetri A. Mkobaranov wrote:
> 1) From the Proxmox manual it seems like a cluster, without HA, offers just 
> the ability to migrate a guest from a node to another one. Is this correct?

That plus some other things:
* you can manage all nodes through connecting to any node (multi master system)
* replicate ZFS backed VMs easily between nodes
* define backup jobs for VMs or resource pools and they'll run independently of 
where the VM currently is.

> 
> 1B) Basically the configuration of one node is replicated (like /etc/pve via 
> pmxcfs) on other nodes so that they are all aware of the guests and then 
> corosync makes each node aware of the status of any other node and probably 
> triggers the synchronization via pmxcfs. Right? Nothing more (unless further 
> configuration) ?

Yeah, basically yes. A basic Proxmox VE design principle is that each VM/CT
belongs to a node, so other nodes must not touch it, only redirect API calls
and the like to the owning node.

And yes, all VM/CT, storage, firewall and some other configuration is on
/etc/pve, which is a realtime shared configuration file system. Any change
to any file will get replicated to all nodes in a reliable, virtually
synchronous way.

> 
> 2) Can I have nodes belonging to one cluster but living in different 
> countries? Or in this case a multi-cluster is required (like 3 nodes in one 
> datacenter and a cluster in another datacenter somehow linked together) ?
> 

Theoretically yes, practically not really.
Clustering makes some assumptions about timing, so you need LAN-like latencies
between those nodes. It can work with round-trip times of <= 10 milliseconds,
but <= 2 milliseconds is ideal.
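
A quick way to sanity-check this between two prospective cluster nodes (the
node name below is a placeholder):

# ping -c 20 other-node.example.com

The reported round-trip times should ideally stay at or below ~2 ms and never
go much above 10 ms.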

Linking clusters is planned and some work is going on (the API Token series was
preparatory work for linking clusters) but it is not yet possible.

hope that clears things up a bit.

cheers,
Thomas



Re: [PVE-User] Broken link in Ceph Wiki

2020-02-16 Thread Thomas Lamprecht
Hi,

On 2/17/20 8:30 AM, Amin Vakil wrote:
> This link is broken and gives 404 not found.
> 
> http://ceph.com/papers/weil-thesis.pdf
> 
> I think this is the new and working link:
> 
> https://ceph.com/wp-content/uploads/2016/08/weil-thesis.pdf

Yes, that seems right, thanks for telling us. Could you please also link to the
page which contains the outdated link? I did not find it immediately in our wiki;
we have multiple pages regarding Ceph. That'd be great!

cheers,
Thomas



Re: [PVE-User] Per-VM backup hook scripts?

2020-02-11 Thread Thomas Lamprecht
Hi,

On 2/11/20 12:25 PM, Dmytro O. Redchuk wrote:
> Hi masters,
> 
> please is it possible to attach backup hook scripts on per-vm basics,
> via GUI or CLI?
> 

Currently you can only specify a hook script for a whole backup job, either by
uncommenting and setting the node-wide "script: /path/to/script" option or by
passing the "--script" option to a specific `vzdump` call.


But the VMID gets passed to the hook script, so you can perform specific steps
or actions based on that. Calling a specific per-VM script from there is also
possible.

See:
/usr/share/doc/pve-manager/examples/vzdump-hook-script.pl

on a Proxmox VE installation for an example.
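
As a rough sketch of the latter - a job-wide hook that dispatches to per-VM
scripts. The dispatcher path and the /etc/vzdump-hooks layout are made up for
this example; for the per-VM phases vzdump calls the hook as
<script> <phase> <mode> <vmid>:

#!/bin/sh
# referenced via "script: /usr/local/bin/vzdump-dispatch.sh" in /etc/vzdump.conf
phase="$1"; mode="$2"; vmid="$3"
case "$phase" in
    backup-start|backup-end|backup-abort)
        # run a VM-specific script if one exists, e.g. /etc/vzdump-hooks/101.sh
        [ -x "/etc/vzdump-hooks/$vmid.sh" ] && "/etc/vzdump-hooks/$vmid.sh" "$@"
        ;;
esac
exit 0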

cheers,
Thomas



Re: [PVE-User] Host Alias for InfluxDB Data

2020-01-15 Thread Thomas Lamprecht
On 1/15/20 3:58 PM, Martin Holub via pve-user wrote:
> Is there any way to add an Alias instead the reported "hostname" for
> InfluxDB. My problem is, i have severall Hosts with hostname "s1" but
> FQDN like s1.$location.domain.tld. As the Influx exported seems to
> report only the hostname, but not the FQDN. I  now have all Metrics on
> one Host, so this make the export quite useless in my scenario.

Hi!

Either using the FQDN or having the possibility to set an alias for a host could
make sense; the problem seems valid. Can you please open an enhancement request
at https://bugzilla.proxmox.com so we can track this?

Cheers,
Thomas



Re: [PVE-User] Upgrade to 6.1 successfull, but no web ui on one node, cannot start VM's on that node

2019-12-06 Thread Thomas Lamprecht
On 12/6/19 2:21 PM, Lindsay Mathieson wrote:
> Solved it - there were a lot off ssl errors in syslog, needed to run:
> 
>  * pvecm updatecerts  -f
> 
> 
> Dunno how it became a problem as I've never fiddled with custom certs
> 

maybe you got hit by the stricter security policy on Debian 10:
https://www.debian.org/releases/buster/amd64/release-notes/ch-information.en.html#openssl-defaults

But our upgrade checker script should have noticed that, though - so
I'm not sure.

> On 6/12/2019 11:10 pm, Lindsay Mathieson wrote:
>> On 6/12/2019 11:07 pm, Tim Marx wrote:
>>> Try to ssh into that node and check the status of pve-cluster e.g.
>>> # systemctl status pve-cluster
>>>
>>> If it had problems while startup, try to restart the service.
>>
>>
>> Thanks - seems ok though:
>>
>>
>> systemctl status pve-cluster
>> ● pve-cluster.service - The Proxmox VE cluster filesystem
>>    Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor 
>> preset: enabled)
>>    Active: active (running) since Fri 2019-12-06 22:24:07 AEST; 44min ago
>>  Main PID: 2131 (pmxcfs)
>>     Tasks: 6 (limit: 4915)
>>    Memory: 30.2M
>>    CGroup: /system.slice/pve-cluster.service
>>    └─2131 /usr/bin/pmxcfs
>>
>> Dec 06 22:28:30 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:28:31 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:30:36 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:38:35 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:44:06 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:45:37 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:47:06 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 22:53:24 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 23:00:38 vnb pmxcfs[2131]: [status] notice: received log
>> Dec 06 23:08:25 vnb pmxcfs[2131]: [status] notice: received log
>>
>>
> 





Re: [PVE-User] pve5to6 : FAIL: Corosync transport explicitly set to 'udpu' instead of implicit default!

2019-12-05 Thread Thomas Lamprecht
On 12/6/19 7:27 AM, Lindsay Mathieson wrote:
> Thanks, that did the trick, with some sweaty moments :) cluster all updated
> to corosync 3.0 and healthy.

Great! Yeah, it's definitely slightly scary to do, but it was the only way
we found that works as an upgrade path forward - i.e., avoiding a cluster
rebuild, as was necessary for the PVE 3.4 to 4.x upgrade.

Hopefully the upgrade to 7.x will again be smooth as silk :)

cheers,
Thomas
 
> On Fri, 6 Dec 2019 at 15:51, Thomas Lamprecht 
> wrote:
> 
>> On 12/6/19 1:31 AM, Lindsay Mathieson wrote:
>>> As per the subject, I have the error : "FAIL: Corosync transport
>> explicitly set to 'udpu' instead of implicit default!"
>>>
>>>
>>> Can I ignore that for the upgrade? I had constant problems with
>> multicast, udpu is quite reliable.
>>>
>>
>> FAILures from the checker script are (almost) *never* ignore-able. :)
>>
>> In this case you will be glad to hear that with corosync 3, a new transport
>> technology was adoped, i.e., kronosnet. It currently is only capable of
>> unicast. The corosync internal multicast-udp and udpu stack was depreacated
>> and removed in favor of that. So having it set to udpu will fail the
>> upgrade.
>>
>> See:
>>
>> https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0#Cluster:_always_upgrade_to_Corosync_3_first
>>
>>
>> In your case, and a healthy cluster, I'd drop the transport while *not*
>> restarting corosync yet. That's a change which cannot be applied live, so
>> corosync will ignore it for now. Then you can continue with the upgrade
>> to corosync 3 - still on PVE 5/Stretch, see above.
>>
>> cheers,
>> Thomas
>>
>>
> 




Re: [PVE-User] pve5to6 : FAIL: Corosync transport explicitly set to 'udpu' instead of implicit default!

2019-12-05 Thread Thomas Lamprecht
On 12/6/19 1:31 AM, Lindsay Mathieson wrote:
> As per the subject, I have the error : "FAIL: Corosync transport explicitly 
> set to 'udpu' instead of implicit default!"
> 
> 
> Can I ignore that for the upgrade? I had constant problems with multicast, 
> udpu is quite reliable.
> 

FAILures from the checker script are (almost) *never* ignore-able. :)

In this case you will be glad to hear that with corosync 3, a new transport
technology was adopted, i.e., kronosnet. It is currently only capable of
unicast. The corosync-internal multicast-udp and udpu stack was deprecated
and removed in favor of that. So having it set to udpu will fail the upgrade.

See:
https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0#Cluster:_always_upgrade_to_Corosync_3_first


In your case, and a healthy cluster, I'd drop the transport while *not*
restarting corosync yet. That's a change which cannot be applied live, so
corosync will ignore it for now. Then you can continue with the upgrade
to corosync 3 - still on PVE 5/Stretch, see above.
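
For the config change itself, the usual careful way of editing the cluster-wide
corosync.conf works here too - roughly (a sketch; adapt to your setup):

# cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# editor /etc/pve/corosync.conf.new
(remove the "transport: udpu" line from the totem section and bump config_version)
# mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf

As said, do *not* restart corosync yet; the still-running corosync 2 simply
cannot apply that change live, which is fine at this point.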

cheers,
Thomas



Re: [PVE-User] Proxmox VE 6.1 released!

2019-12-05 Thread Thomas Lamprecht
On 12/5/19 8:47 AM, Uwe Sauter wrote:
> Am 05.12.19 um 07:58 schrieb Thomas Lamprecht:
>> On 12/4/19 11:17 PM, Uwe Sauter wrote:
>>> When trying to migrate VMs to a host that was already rebooted I get the 
>>> following in the task viewer window in the web ui:
>>>
>>> Check VM 109: precondition check passed
>>> Migrating VM 109
>>> Use of uninitialized value $val in pattern match (m//) at 
>>> /usr/share/perl5/PVE/RESTHandler.pm line 441.
>>> [...]
>>> Hope this is just cosmetic…
>>>
>>
>> It is, but I'm wondering why you get this.. Migration was just started 
>> normally
>> through the webinterface, or?
> 
> 
> I selected the server on the left, then bulk actions, migrate, all running 
> VMs, chose the target host and started migration.
> 

OK, bulk migration was the culprit: the old GUI could pass an undefined value
for the new "with local disks" option of bulk migrations, and that wasn't caught
correctly. Behavior was not impacted, but lots of ugly warnings are never
nice. Fixed in master, thanks a lot for reporting!

cheers,
Thomas




Re: [PVE-User] Proxmox VE 6.1 released!

2019-12-04 Thread Thomas Lamprecht
Hi,

On 12/4/19 11:17 PM, Uwe Sauter wrote:
> Hi,
> 
> upgraded a cluster of three servers to 6.1. Currently I'm in the process of 
> rebooting them one after the other.
> 

Upgrade from 5.4 to 6.1 or from 6.0 to 6.1 ?

> When trying to migrate VMs to a host that was already rebooted I get the 
> following in the task viewer window in the web ui:
> 
> Check VM 109: precondition check passed
> Migrating VM 109
> Use of uninitialized value $val in pattern match (m//) at 
> /usr/share/perl5/PVE/RESTHandler.pm line 441.
> trying to acquire lock...
>  OK
> Check VM 200: precondition check passed
> Migrating VM 200
> Use of uninitialized value $val in pattern match (m//) at 
> /usr/share/perl5/PVE/RESTHandler.pm line 441.
> Check VM 203: precondition check passed
> Migrating VM 203
> Use of uninitialized value $val in pattern match (m//) at 
> /usr/share/perl5/PVE/RESTHandler.pm line 441.
> Check VM 204: precondition check passed
> Migrating VM 204
> Use of uninitialized value $val in pattern match (m//) at 
> /usr/share/perl5/PVE/RESTHandler.pm line 441.
> Check VM 205: precondition check passed
> Migrating VM 205
> Use of uninitialized value $val in pattern match (m//) at 
> /usr/share/perl5/PVE/RESTHandler.pm line 441.
> All jobs finished, used 5 workers in total.
> TASK OK
> 
> 
> Hope this is just cosmetic…
> 

It is, but I'm wondering why you get this... The migration was just started
normally through the web interface, right?


regards,
Thomas

> 
> Regards,
> 
> Uwe
> 
> 
> 
> Am 04.12.19 um 10:38 schrieb Martin Maurer:
>> Hi all,
>>
>> We are very excited to announce the general availability of Proxmox VE 6.1.
>>
>> It is built on Debian Buster 10.2 and a specially modified Linux Kernel 5.3, 
>> QEMU 4.1.1, LXC 3.2, ZFS 0.8.2, Ceph 14.2.4.1 (Nautilus), Corosync 3.0, and 
>> more of the current leading open-source virtualization technologies.
>>
>> This release brings new configuration options available in the GUI which 
>> make working with Proxmox VE even more comfortable and secure. Editing the 
>> cluster-wide bandwidth limit for traffic types such as migration, 
>> backup-restore, clone, etc. is possible via the GUI. If the optional package 
>> ifupdown2 of the Debian network interface manager is installed, it’s now 
>> possible to change the network configuration and reload it in the Proxmox 
>> web interface without a reboot. We have improvements to 2-factor 
>> authentication with TOTP and U2F.
>>
>> The HA stack has been improved and comes with a new 'migrate' shutdown 
>> policy, migrating running services to another node on shutdown.
>>
>> In the storage backend, all features offered by newer kernels with Ceph and 
>> KRBD are supported with version 6.1.
>>
>> We have some notable bug fixes, one of them being the QEMU monitor timeout 
>> issue or stability improvements for corosync. Countless other bugfixes and 
>> smaller improvements are listed in the release notes.
>>
>> Release notes
>> https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_6.1
>>
>> Video intro
>> https://www.proxmox.com/en/training/video-tutorials/item/what-s-new-in-proxmox-ve-6-1
>>
>> Download
>> https://www.proxmox.com/en/downloads
>> Alternate ISO download:
>> http://download.proxmox.com/iso/
>>
>> Documentation
>> https://pve.proxmox.com/pve-docs/
>>
>> Community Forum
>> https://forum.proxmox.com
>>
>> Source Code
>> https://git.proxmox.com
>>
>> Bugtracker
>> https://bugzilla.proxmox.com
>>
>> FAQ
>> Q: Can I dist-upgrade Proxmox VE 6.0 to 6.1 with apt?
>> A: Yes, just via GUI or via CLI with apt update && apt dist-upgrade
>>
>> Q: Can I install Proxmox VE 6.1 on top of Debian Buster?
>> A: Yes, see https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Buster
>>
>> Q: Can I upgrade my Proxmox VE 5.4 cluster with Ceph Luminous to 6.x and 
>> higher with Ceph Nautilus?
>> A: This is a two step process. First, you have to upgrade Proxmox VE from 
>> 5.4 to 6.0, and afterwards upgrade Ceph from Luminous to Nautilus. There are 
>> a lot of improvements and changes, please follow exactly the upgrade 
>> documentation.
>> https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0
>> https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus
>>
>> Q: Where can I get more information about future feature updates?
>> A: Check our roadmap, forum, mailing list and subscribe to our newsletter.
>>
>> A big THANK YOU to our active community for all your feedback, testing, bug 
>> reporting and patch submitting!
>>





Re: [PVE-User] Proxmox VE 6.1 released!

2019-12-04 Thread Thomas Lamprecht
Hi,

On 12/4/19 3:36 PM, Olivier Benghozi wrote:
> I suggest you should just leave appart the proxmox iso installer. Had only 
> problems with it.

We did tens of test installations just this week on many different HW
combinations. Not a single one where it didn't work here.

We'd be happy if you shared your specific problems and HW, maybe on
https://bugzilla.proxmox.com/, so we can take a look at them.

cheers,
Thomas



Re: [PVE-User] Proxmox VE 6.1 released!

2019-12-04 Thread Thomas Lamprecht
On 12/4/19 3:33 PM, Roland @web.de wrote:
> thanks for making proxmox!
> 
> unfortunatly i cannot install it on fujitsu rx300 s6, mouse/keyboard
> won't work in installer screen anymore. 6.0 and before works without
> problems.

We try hard to fix installer issues, so more information would be great.

I guess it still works in the GRUB menu? (just to be sure)
If you select "Debug Mode", does it work there too (initially)?

As it worked with 6.0 and before, the problem could come from a regression
with the 5.3 kernel. We now also include more drivers, which actually
fixed an issue for the Mail Gateway (which re-uses the installer) with
keyboard input under Hyper-V.

regards,
Thomas

> 
> will test on other machine soon.
> 
> maybe someone has similar issuesor has a clue how to workaround
> 
> roland
> 
> 
> Am 04.12.19 um 12:11 schrieb Gilles Pietri:
> 
>> Le 04/12/2019 à 10:38, Martin Maurer a écrit :
>>> Hi all,
>>>
>>> We are very excited to announce the general availability of Proxmox VE 6.1.
>>>
>>> It is built on Debian Buster 10.2 and a specially modified Linux Kernel 
>>> 5.3, QEMU 4.1.1, LXC 3.2, ZFS 0.8.2, Ceph 14.2.4.1 (Nautilus), Corosync 
>>> 3.0, and more of the current leading open-source virtualization 
>>> technologies.
>>>
>>> This release brings new configuration options available in the GUI which 
>>> make working with Proxmox VE even more comfortable and secure. Editing the 
>>> cluster-wide bandwidth limit for traffic types such as migration, 
>>> backup-restore, clone, etc. is possible via the GUI. If the optional 
>>> package ifupdown2 of the Debian network interface manager is installed, 
>>> it’s now possible to change the network configuration and reload it in the 
>>> Proxmox web interface without a reboot. We have improvements to 2-factor 
>>> authentication with TOTP and U2F.
>>>
>>> The HA stack has been improved and comes with a new 'migrate' shutdown 
>>> policy, migrating running services to another node on shutdown.
>>>
>>> In the storage backend, all features offered by newer kernels with Ceph and 
>>> KRBD are supported with version 6.1.
>>>
>>> We have some notable bug fixes, one of them being the QEMU monitor timeout 
>>> issue or stability improvements for corosync. Countless other bugfixes and 
>>> smaller improvements are listed in the release notes.
>> Hi!
>>
>> This is amazing, thanks a lot for your work, it is appreciated. Proxmox
>> truly is a wonderful project and both the community and the company
>> behind it deserves every thanks and support for their nice work.
>>
>> Setting up the update test bed right now!
>>
>> Regards,
>> Gilles




Re: [PVE-User] VMs created in rapid succession are assigned the same IPv4 address

2019-12-02 Thread Thomas Lamprecht
Hey,

On 12/2/19 11:18 PM, Adrian Petrescu wrote:
> Hey all, I have a pretty intriguing issue.
> 
> I'm spinning up VMs through a Terraform 
> provider(https://github.com/Telmate/terraform-provider-proxmox
> if it matters), which goes through the /api2/json endpoints. They are
> all full clones of a simple ubuntu1804 template. Everything is working
> just fine when I spin them up one at a time. The VMs are all just using
> a simple vmbr0 bridge with CIDR 192.168.128.207/16.
> 
> However, if I use `count = N` (with N > 1) to create multiple VMs "at
> once" (I'm using scare quotes because they are still just individual
> calls to `POST /api2/json/nodes//qemu//clone` being fired off
> in rapid succession), then once everything comes up, I find that all the
> VMs in that batch were assigned the same IPv4 address, which makes all
> but one of them inaccessible.
> The IPv6 address is different, the MAC addresses are different, and if I
> reboot the VM, the IPv4 address gets reassigned to something unique as
> well, so it's not as if the parameterization is somehow forcing it. If I
> slow the calls down and make them one at a time, everything is fine. So
> it really does seem like the DHCP server has some sort of strange race
> condition that ignores the MAC. But surely any reasonable DHCP
> implementation can deal with such a basic case, so I must be missing
> something.

This really sounds like a bad DHCP, it's the one assigning the same IP
multiple times, which it really shouldn't. What DHCP server do you use?



Re: [PVE-User] proxmox-ve: 5.4-2 can't access webinterface after update

2019-11-25 Thread Thomas Lamprecht
Hi,

On 11/25/19 6:14 AM, k...@zimmer.net wrote:
> Hi,
> i updated my Proxmox VE Host (via 'apt-get update; apt-get upgrade').

That's the wrong way to upgrade a Proxmox VE host[0] and is probably the
cause for your problems. Use

apt-get update
apt-get dist-upgrade

or the more modern interface:
apt update
apt full-upgrade

[0]: 
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_system_software_updates

> Now the web interface is not accessible anymore. Additionally i have
> disconnected network devices in the webinterface before the update. I
> can start the VMs via 'qm start 100', but they cannot get network (and i
> don't know how to fix this on the command line).
> To fix the web interface i tried 'service pveproxy restart' and 'service
> pvedaemon restart' - with no success.
> To fix the network in the VMs i tried 'qm set 100 --net0
> model=virtio,link_down=1' and 'qm set 100 --net0
> model=virtio,link_down=0' - also without success.

> proxmox-ve: 5.4-2 (running kernel: 4.15.18-7-pve)

Proxmox VE 5.4-2 is the latest from this year, OK, but

> pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)

this manager version is from fall 2018 and very likely incompatible
with the rest of the stack - other packages may have this issue too.

> pve-kernel-4.15: 5.2-10
> pve-kernel-4.13: 5.2-2
> pve-kernel-4.15.18-7-pve: 4.15.18-27
> pve-kernel-4.15.18-4-pve: 4.15.18-23
> pve-kernel-4.13.16-4-pve: 4.13.16-51
> pve-kernel-4.13.16-1-pve: 4.13.16-46
> pve-kernel-4.13.13-4-pve: 4.13.13-35
> pve-kernel-4.4.98-3-pve: 4.4.98-103
> corosync: 2.4.4-pve1
> criu: 2.11.1-1~bpo90
> glusterfs-client: 3.8.8-1
> ksm-control-daemon: 1.2-2
> libjs-extjs: 6.0.1-2
> libpve-access-control: 5.0-8
> libpve-apiclient-perl: 2.0-5
> libpve-common-perl: 5.0-41
> libpve-guest-common-perl: 2.0-18
> libpve-http-server-perl: 2.0-14
> libpve-storage-perl: 5.0-30
> libqb0: 1.0.3-1~bpo9
> lvm2: 2.02.168-pve6
> lxc-pve: 3.1.0-7
> lxcfs: 3.0.3-pve1
> novnc-pve: 1.0.0-3
> proxmox-widget-toolkit: 1.0-28
> pve-cluster: 5.0-38
> pve-container: 2.0-29
> pve-docs: 5.4-2
> pve-firewall: 3.0-22
> pve-firmware: 2.0-7
> pve-ha-manager: 2.0-9
> pve-i18n: 1.1-4
> pve-libspice-server1: 0.14.1-2
> pve-qemu-kvm: 3.0.1-4
> pve-xtermjs: 3.12.0-1
> qemu-server: 5.0-38
> smartmontools: 6.5+svn4324-1
> spiceterm: 3.0-5
> vncterm: 1.5-3
> # service pveproxy status
> ● pveproxy.service - PVE API Proxy Server
> Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor
> preset: enabled)
> Active: active (running) since Mon 2019-11-25 05:10:18 CET; 24min ago
> Process: 7797 ExecStop=/usr/bin/pveproxy stop (code=exited,
> status=0/SUCCESS)
> Process: 7803 ExecStart=/usr/bin/pveproxy start (code=exited,
> status=0/SUCCESS)
> Main PID: 7830 (pveproxy)
> Tasks: 4 (limit: 19660)
> Memory: 123.4M
> CPU: 2.107s
> CGroup: /system.slice/pveproxy.service
> ├─7830 pveproxy
> ├─7833 pveproxy worker
> ├─7834 pveproxy worker
> └─7835 pveproxy worker
> Nov 25 05:31:36 holodoc pveproxy[7835]: Can't locate object method
> "set_request_host" via package "PVE::RPCEnvironment" at
> /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1206.
> Nov 25 05:31:41 holodoc pveproxy[7835]: problem with client
> 192.168.1.86; Connection timed out
> # service pvedaemon status
> ● pvedaemon.service - PVE API Daemon
> Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor
> preset: enabled)
> Active: active (running) since Mon 2019-11-25 05:11:16 CET; 25min ago
> Process: 7991 ExecStop=/usr/bin/pvedaemon stop (code=exited,
> status=0/SUCCESS)
> Process: 8001 ExecStart=/usr/bin/pvedaemon start (code=exited,
> status=0/SUCCESS)
> Main PID: 8021 (pvedaemon)
> Tasks: 4 (limit: 19660)
> Memory: 114.8M
> CPU: 1.791s
> CGroup: /system.slice/pvedaemon.service
> ├─8021 pvedaemon
> ├─8024 pvedaemon worker
> ├─8025 pvedaemon worker
> └─8026 pvedaemon worker
> Nov 25 05:11:15 holodoc systemd[1]: Starting PVE API Daemon...
> Nov 25 05:11:16 holodoc pvedaemon[8021]: starting server
> Nov 25 05:11:16 holodoc pvedaemon[8021]: starting 3 worker(s)
> Nov 25 05:11:16 holodoc pvedaemon[8021]: worker 8024 started
> Nov 25 05:11:16 holodoc pvedaemon[8021]: worker 8025 started
> Nov 25 05:11:16 holodoc pvedaemon[8021]: worker 8026 started
> Nov 25 05:11:16 holodoc systemd[1]: Started PVE API Daemon.
> Any ideas how to fix this?
> Best regards,
> Kai





Re: [PVE-User] multi-function devices, webGUI and fix #2436

2019-11-23 Thread Thomas Lamprecht
Hi,

On 11/23/19 9:29 AM, arjenvanweel...@gmail.com wrote:
> Hi,
> 
> Yesterday evening, I was surprised by the same PCI passthrough issue as
> described in 
> https://forum.proxmox.com/threads/pci-passthrough-not-working-after-update.60580/
> . A VM failed to start with the error "no pci device info for
> device '00:xy.0'", while working fine before the apt-get dist-upgrade
> and reboot. 
> Once it was clear what the issue was, it was easily resolved by adding
> : to the hostpci entries.
> Unfortunately, the Help button/documentation does not mention this.
> 
> This issue did not occur for multi-function devices. Also, when using
> the webUI and enabling "All Functions" for the device, the setting is
> changed from "hostpci0: :00:xy.0" to "hostpci0: 00.xy".
> Changing it the other way around (disable multi-function in webUI) does
> not add the required ":", which will fail at the next VM start.
> Is this an intended difference or is it an oversight that will change
> (unexpectedly) in the future?
> 

We had a fix which allowed using PCI domains other than the default;
while not common, we had people report the need.
That fix seems to have caused some unintended fallout. I just uploaded
qemu-server in version 6.0-17 with a small fix for it; I'll check
around a bit and if it looks good it may get to no-subscription soon.
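
Until that fix reaches the repositories, the manual workaround stays the one you
found: write the hostpci entry with an explicit PCI domain in the VM config in
/etc/pve/qemu-server/<vmid>.conf, e.g. (the IDs below are placeholders for your
device):

hostpci0: 0000:00:1b.0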

Thanks for reporting!

cheers,
Thomas



Re: [PVE-User] Reboot on psu failure in redundant setup

2019-11-08 Thread Thomas Lamprecht
On 11/8/19 4:22 PM, Mark Adams wrote:
> I didn't configure it to do
> this myself, so is this an automatic feature? Everything I have read says
> it should be configured manually.

Maybe my previous mail did not answer this point in a good way.

You need to configure *hardware-based* Watchdogs manually. But the
fallback will *always* be the Linux Kernel Softdog (which is very
reliable, from experience ^^) - else, without fencing, HA recovery
could never be done in a safe way (double resource usage).

cheers,
Thomas



Re: [PVE-User] Reboot on psu failure in redundant setup

2019-11-08 Thread Thomas Lamprecht
Hi,

On 11/8/19 4:35 PM, Daniel Berteaud wrote:
> - Le 8 Nov 19, à 16:22, Mark Adams m...@openvs.co.uk a écrit :
>> Hi All,
>>
>> This cluster is on 5.4-11.
>>
>> This is most probably a hardware issue either with ups or server psus, but
>> wanted to check if there is any default watchdog or auto reboot in a
>> proxmox HA cluster.
>>
>> Explanation of what happened:
>>
>> All servers have redundant psu, being fed from separate ups in
>> separate racks on separate feeds. One of the UPS went out, and when it did
>> all nodes rebooted. They were functioning normally after the reboot, but I
>> wasn't expecting the reboot to occur.
>>
>> When the UPS went down, it also took down all of the core network because
>> the power was not connected up in a redundant fashion. Ceph and "LAN"
>> traffic was blocked because of this. Did a watchdog reboot each node
>> because it lost contact with its cluster peers? I didn't configure it to do
>> this myself, so is this an automatic feature? Everything I have read says
>> it should be configured manually.
>>
>> Thanks in advance.
> 
> Yes, that's expected. If all nodes are isolated from each other, they will be 
> self-fenced (using a software watchdog) to prevent any corruption and allow 
> services to be recovered on the quorate part of the cluster. In your case, 
> there was no quorate part, as there was no network at all.

Small addition: it can also be a HW watchdog, if configured [0].

And yes, as soon as you enable an HA service, that node and the current
HA manager node will enable and pull up a watchdog. And if the node hangs
or there is a quorum loss for more than 60s, the watchdog updates will stop
and the node will get self-fenced soon afterwards (within not more than a few
seconds).

[0]: 
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_configure_hardware_watchdog
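
For completeness, enabling a hardware watchdog boils down to selecting the
module to load instead of the softdog fallback, roughly like this (the module
name is just an example and depends on your board; see the linked docs):

# editor /etc/default/pve-ha-manager
WATCHDOG_MODULE=ipmi_watchdog

The watchdog-mux service reads this on startup, so it takes effect after a
service restart or reboot.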

cheers,
Thomas




Re: [PVE-User] GPG signature error running pveam update

2019-10-16 Thread Thomas Lamprecht
Hi,

On 10/15/19 4:43 PM, Adam Weremczuk wrote:
> Hi all,
> 
> It started failing following Debian 9->10 and PVE 5->6 upgrade:
> 
> pveam update
> update failed - see /var/log/pveam.log for details
> 
> "apt-key list" wasn't showing it so I've added it:
> 
> wget 
> https://github.com/turnkeylinux/turnkey-keyring/raw/master/turnkey-release-keyring.gpg
> apt-key add turnkey-release-keyring.gpg
> OK
> 
> It's now listed and looks ok at the first glance:
> 
> /etc/apt/trusted.gpg
> 
> pub   rsa2048 2008-08-15 [SC] [expires: 2023-08-12]
>   694C FF26 795A 29BA E07B  4EB5 85C2 5E95 A16E B94D
> uid   [ unknown] Turnkey Linux Release Key 
> 
> The errors in "pveam update" and pveam.log haven't gone away though:
> 
> 2019-10-15 15:34:31 starting update
> 2019-10-15 15:34:31 start download 
> http://download.proxmox.com/images/aplinfo-pve-6.dat.asc
> 2019-10-15 15:34:31 download finished: 200 OK
> 2019-10-15 15:34:31 start download 
> http://download.proxmox.com/images/aplinfo-pve-6.dat.gz
> 2019-10-15 15:34:31 download finished: 200 OK
> 2019-10-15 15:34:31 signature verification: gpgv: Signature made Fri Sep 27 
> 14:53:26 2019 BST
> 2019-10-15 15:34:31 signature verification: gpgv: using RSA key 
> 353479F83781D7F8ED5F5AC57BF2812E8A6E88E0
> 2019-10-15 15:34:31 signature verification: gpgv: Can't check signature: No 
> public key
> 2019-10-15 15:34:31 unable to verify signature - command '/usr/bin/gpgv -q 
> --keyring /usr/share/doc/pve-manager/trustedkeys.gpg 
> /var/lib/pve-manager/apl-info/pveam-download.proxmox.com.tmp.31480.asc 
> /var/lib/pve-manager/apl-info/pveam-download.proxmox.com.tmp.31480' failed: 
> exit code 2
> 2019-10-15 15:34:31 start download 
> https://releases.turnkeylinux.org/pve/aplinfo.dat.asc
> 2019-10-15 15:34:31 download finished: 200 OK
> 2019-10-15 15:34:31 start download 
> https://releases.turnkeylinux.org/pve/aplinfo.dat.gz
> 2019-10-15 15:34:32 download finished: 200 OK
> 2019-10-15 15:34:32 signature verification: gpgv: Signature made Sun Aug  4 
> 08:49:59 2019 BST
> 2019-10-15 15:34:32 signature verification: gpgv: using RSA key 
> 694CFF26795A29BAE07B4EB585C25E95A16EB94D
> 2019-10-15 15:34:32 signature verification: gpgv: Good signature from 
> "Turnkey Linux Release Key "
> 2019-10-15 15:34:32 update successful
> 
> Am I doing something wrong?
> 

No, we were doing something wrong :/

The trusted keyring is not regenerated all the time; it would normally be
regenerated when a new key file is added, but in our case the build happens in
a temporary directory with all files having the same timestamp - so GNU make
did not know that it needed to regenerate the trusted keyring file.
As keys are added/removed only about every ~2 years, doing this manually -
running the update target in the source and committing the result to git - was
forgotten here.

I'll fix this up and release a follow-up pve-manager soon; thanks for the report
and sorry for any inconvenience caused.

cheers,
Thomas




Re: [PVE-User] running Debian 10 containers in PVE 5.4

2019-10-15 Thread Thomas Lamprecht
Hi,

On 10/15/19 11:58 AM, Adam Weremczuk wrote:
> Hello,
> 
> I'm running PVE 5.4-13 (Debian 9.11 based) using free no-subscription repos.
> 
> Recently I've deployed a few Debian 10.0 containers which I later upgraded to 
> 10.1.
> 
> I'm having constant issues with these CTs such as delayed start and console 
> availability (up to 15 minutes), unexpected network disconnections etc.
> 
> No such issues for Debian 9.x containers.
> 
> Is running Debian 10.x over 9.11 officially supported?

Somewhat, but the relatively new systemd inside Debian 10 and other
newer distros is not always that compatible with current container
environments.

As a start I'd enable the "nesting" feature in the CT's options; it
should help with quite a few issues by allowing systemd to set up its
own cgroups in the CT. This option can only be set as root, and while
it has no real security implications, it still exposes more of the host
inside the CT.
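
For example, on the CLI (the CT ID 101 is a placeholder; restart the CT
afterwards so the change takes effect):

# pct set 101 --features nesting=1
# pct stop 101 && pct start 101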

There's some work underway to improve the CT experience with newer
distribution versions, but that will still need quite a bit of time and
will only become available in the PVE 6.x series, AFAICT.

> 
> Will switching to paid community repositories greatly improve my experience?

No; while the enterprise version is surely more stable, it does not have
more features than the community one.

cheers,
Thomas



Re: [PVE-User] Trouble Creating CephFS

2019-10-11 Thread Thomas Lamprecht
Hi,

On 10/10/19 5:47 PM, JR Richardson wrote:
> Hi All,
> 
> I'm testing ceph in the lab. I constructed a 3 node proxmox cluster
> with latest 5.4-13 PVE all updates done yesterday and used the
> tutorials to create ceph cluster, added monitors on each node, added 9
> OSDs, 3 disks per ceph cluster node, ceph status OK.

Just to be sure: you did all that using the PVE web interface? Which tutorials
do you mean? And why not with 6.0? Starting out with Nautilus now will save you
one major Ceph (and PVE) upgrade.

> 
> From the GUI of the ceph cluster, when I go to CephFS, I can create
> MSDs, 1 per node OK, but their state is up:standby. When I try to
> create a CephFS, I get timeout error. But when I check Pools,
> 'cephfs_data' was created with 3/2 128 PGs and looks OK, ceph status
> health_ok.

Hmm, so no MDS gets up and ready into the active state...
Was "cephfs_metadata" also created?

You could check the output of:
# ceph fs status
# ceph mds stat


> 
> I copied over keyring and I can attach this to an external PVE as RBD
> storage but I don't get a path parameter so the ceph storage will only
> allow for raw disk images. If I try to attatch as CephFS, the content
> does allow for Disk Image. I need the ceph cluster to export the
> cephfs so I can attach and copy over qcow2  images. I can create new
> disk and spin up VMs within the ceph storage pool. Because I can
> attach and use the ceph pool, I'm guessing it is considered Block
> storage, hence the raw disk only creation for VM HDs. How do I setup
> the ceph to export the pool as file based?

with either CephFS or, to be technically complete, by creating a filesystem on
a Rados Block Device (RBD).

> 
> I came across this bug:
> https://bugzilla.proxmox.com/show_bug.cgi?id=2108
> 
> I'm not sure it applies but sounds similar to what I'm seeing. It's

I really think that this exact bug cannot apply to you if you run 5.4.
If you did not see any:

>  mon_command failed - error parsing integer value '': Expected option value 
> to be integer, got ''in"}

errors in the log, then it cannot be this bug. Not saying that it cannot
possibly be a bug, just not this one, IMO.

cheers,
Thomas



Re: [PVE-User] upgrade smartmon tools?

2019-10-01 Thread Thomas Lamprecht
Hi,

On 10/1/19 4:08 AM, Roberto Alvarado wrote:
> Hi Folks,
> 
> Do you know what is the best way to upgrade the smartmon tools package to 
> version 7?
> 

just upgrade to Proxmox VE 6.x, it has smartmontools 7:

# apt show smartmontools
Package: smartmontools
Version: 7.0-pve2

For now we have no plans to backport it to 5.4.

> I have some problems with nvme units and smartmont 6.x, but with the 7 
> version all works without problem, but some proxmox packages depends on 
> smartmon and I dont want to create a problem with this update.

You could re-build it yourself from our git [0] if you really cannot upgrade
and aren't scared of installing some dev packages and running some `make` :)

# git clone git://git.proxmox.com/git/smartmontools.git
# cd smartmontools
# make submodule
# make deb

[0]: https://git.proxmox.com/?p=smartmontools.git;a=summary

cheers,
Thomas



Re: [PVE-User] AMD ZEN 2 (EPYC 7002 aka "rome") kernel requirements

2019-09-27 Thread Thomas Lamprecht
Hi,

On 9/27/19 10:30 AM, Mark Adams wrote:
> Hi All,
> 
> I'm trying out one of these new processors, and it looks like I need at
> least 5.2 kernel to get some support, preferably 5.3.
> 

We're working on a 5.3-based kernel; it may take a bit until a build gets
released for testing, though.

But the things required for that newer platform to work will also be
backported to older kernels.



Re: [PVE-User] Images on CephFS?

2019-09-25 Thread Thomas Lamprecht
Hi,

On 9/25/19 3:46 PM, Mark Schouten wrote:
> Hi,
> 
> Just noticed that this is not a PVE 6-change. It's also changed in 5.4-3. 
> We're using this actively, which makes me wonder what will happen if we 
> stop/start a VM using disks on CephFS...

Huh, AFAICT we never allowed that; the git history of the CephFS
storage plugin is quite short [0], so you can confirm that yourself.
The initial commit did not allow VM/CT images either [1].

[0]: 
https://git.proxmox.com/?p=pve-storage.git;a=history;f=PVE/Storage/CephFSPlugin.pm;h=c18f8c937029d46b68aeafded5ec8d0a9d9c30ad;hb=HEAD
[1]: 
https://git.proxmox.com/?p=pve-storage.git;a=commitdiff;h=e34ce1444359ee06f50dd6907c0937d10748ce05

> 
> Any way we can enable it again?

IIRC, the rationale was that if Ceph is used, RBD will be preferred
for CT/VM disks anyhow - but CephFS seems to be quite performant, and as
all functionality should be there (or can be added easily) we could
enable it just fine.

I'm just scratching my head over how you were able to use it for images if
the plugin was never told to allow it...

cheers,
Thomas

> 
> -- 
> Mark Schouten
> Tuxis B.V.
> https://www.tuxis.nl/ | +31 318 200208
> 
> -- Original Message --
> From: "Mark Schouten" 
> To: "PVE User List" 
> Sent: 9/19/2019 9:15:17 AM
> Subject: [PVE-User] Images on CephFS?
> 
>>
>> Hi,
>>
>> We just built our latest cluster with PVE 6.0. We also offer CephFS 'slow 
>> but large' storage with our clusters, on which people can create images for 
>> backupservers. However, it seems that in PVE 6.0, we can no longer use 
>> CephFS for images?
>>
>>
>> Cany anybody confirm (and explain?) or am I looking in the wrong direction?
>>
>> -- 
>> Mark Schouten 
>>
>> Tuxis, Ede, https://www.tuxis.nl
>>
>> T: +31 318 200208
>>
>>
> 
> 




Re: [PVE-User] Bug report: Syntax error in /etc/aliases

2019-09-03 Thread Thomas Lamprecht
On 03.09.19 12:39, Musee Ullah via pve-user wrote:
> On 2019/09/03 3:14, Uwe Sauter wrote:
>> I'd suggest to do:
>> sed -i -e 's/^www:/www: /' /etc/aliases
>>
>> so that lines that were changed by a user are also caught.
> 
> just pointing out that consecutive package updates'll continuously add
> more spaces with the above since it doesn't check if there's already a
> space.
> 
> sed -E -i -e 's/^www:(\w)/www: \1/' /etc/aliases
> 
> 

That's why I said "at one single package version transition". Independent
of what exactly we finally do, I'd always guard it with a version check
inside a postinst debhelper script, e.g., like:


if dpkg --compare-versions "$2" 'lt' '6.0-X'; then
sed ...
fi

thus it happens only if an upgrade transitions from a version before "6.0-x"
(no matter how old) to a version equal to or newer than "6.0-x".
No point in checking every time; if an admin changed it back to something
"bad" then it was probably intentional, or at least not our fault like it is
here. :)

But your suggestion itself would work fine, in general.

cheers,
Thomas



Re: [PVE-User] Bug report: Syntax error in /etc/aliases

2019-09-03 Thread Thomas Lamprecht
Hi Uwe,

On 03.09.19 09:18, Uwe Sauter wrote:
> Hi all,
> 
> on a freshly installed PVE 6 my /etc/aliases looks like:
> 
> # cat /etc/aliases
> postmaster: root
> nobody: root
> hostmaster: root
> webmaster: root
> www:root
> 
> and I get this output from mailq
> 
> # mailq
> -Queue ID-  --Size-- Arrival Time -Sender/Recipient---
> 2F38327892 5452 Fri Aug 30 23:25:46  MAILER-DAEMON
>   (alias database unavailable)
>  root@px-golf.localdomain
> 
> 30E0F27893 5548 Fri Aug 30 23:25:46  MAILER-DAEMON
>   (alias database unavailable)
>  root@px-golf.localdomain
> 
> 
> 
> If I change the last line in the aliases file to "www: root" (with a space as 
> the format requires as the man page says), recreate
> the alias database and flush the mail queues, everything looks fine.
> 
> # sed -i -e 's,www:root,www: root,g' /etc/aliases
> # newaliases
> # postqueue -f
> # mailq
> Mail queue is empty
> 
> 
> Looks like the package that adds the www entry makes an error.


Yes, you're right! Many thanks for the report, fixed for the next ISO release.

@Fabian: we should probably do a postinst hook which fixes this up?

Doing
# sed -i -e 's/^www:root$/www: root/' /etc/aliases

at one single package version transition could be enough.
I'd say checksum-matching the file to check whether it was modified since shipping
is not really required, as such matched entries are simply not correct anyway.

cheers,
Thomas



Re: [PVE-User] ZFS - question

2019-08-14 Thread Thomas Lamprecht
Hi,

Am 8/13/19 um 10:37 AM schrieb lord_Niedzwiedz:
> 
> I run a "Stop" backup on proxmox, it shuts down the machine.
> Starts making a copy.
> But it immediately turns it on ("restarts only", doesn't stop for the 
> duration of the copy !! - why ??).
> "resuming VM again after 21 seconds" ?? !! why like this ?
> 
> Is it better to make a Snapshot copy ?? (its qestion number three ;) )

First, can you please post such threads as new ones, not as a reply to
an existing one ("Proxmox VE 6.0 released!" in this case)? That'd be
great, as it keeps inboxes a bit cleaner and it just makes sense to
separate different, especially new, topics and questions by thread.
Thanks!

That said, "Stop" mode backups shut the machine down first, to bring it
in a consistent state. Then, the machine is started again in a "paused"
state - here we save the state of the consistent disk blocks, and then
resume the VM while we start backing up the data. If the VM writes to
some data we did not yet backed up we can detect that and save the changes
somewhere else (simplified said) while we backup the consistent data.

This mode guarantees that the backup was made from a consistent state,
but also that the VM can resume its operation relatively fast again.


If your underlying storage supports snapshots in a performant way -
as, among others, ZFS, LVM-thin or a qcow2-backed disk do - you can
use that mode too. But it does not have the same consistency guarantees
stop mode has; there is some small chance of inconsistency - it
depends entirely on the applications and workload running inside the VM.
Mostly the inconsistency risk here is the same as with a power loss -
if the applications running inside can cope with that well, you won't
have issues.

You can use snapshot as a full replacement for stop mode in combination
with the guest agent, though, as then we issue a "guest-fsfreeze-freeze"
before doing the snapshot - that should ensure consistency.

See:
https://pve.proxmox.com/pve-docs/chapter-vzdump.html#_backup_modes
for some documentation regarding this.
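
For example, enabling the agent for a VM and then running a snapshot-mode
backup could look like this (VM ID 100 and the storage name are placeholders;
the qemu-guest-agent also needs to be installed inside the guest):

# qm set 100 --agent enabled=1
# vzdump 100 --mode snapshot --storage local

With the agent active, vzdump issues the fsfreeze/thaw around the snapshot as
described above.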

cheers,
Thomas



Re: [PVE-User] Proxmox VE 6.0 released!

2019-08-12 Thread Thomas Lamprecht
Am 8/6/19 um 3:57 PM schrieb Hervé Ballans:
> Our OSDs are currently in 'filestore' backend. Does Nautilus handle this 
> backend or do we have to migrate OSDs in 'Bluestore' ?

Nautilus can still handle Filestore.
But, we do not support adding new Filestore OSDs through our tooling
any more (you can still use ceph-volume directly, though) - just FYI.

cheers,
Thomas



Re: [PVE-User] running Buster CT on 5.4.6

2019-07-17 Thread Thomas Lamprecht
Hi,

On 7/16/19 5:16 PM, Adam Weremczuk wrote:
> I've just deployed a test Debian 10.0 container on PVE 5.4.6 from the default 
> template.
> 
> It installed fine, network is working ok across the LAN and I can ssh to it.
> 
> Regardless whether I disable IPv6 or not (net.ipv6.conf.ens4.disable_ipv6 = 
> 1) I'm getting the following errors:
> 
> ping 8.8.8.8
> connect: Network is unreachable

Hmm, strange, works just fine here (tested on both PVE 6.0 and 5.4).

> 
> ping google.com
> connect: Cannot assign requested address
> 
> host google.com
> google.com has address 172.217.169.46
> (DNS working fine)
> 
> I've never had such problems for any out of the box Debian 9 containers.
> 
> Any idea what's wrong and how to fix it?

Any firewall setup? Also, can you post the 

# ip addr
# ip route

outputs from inside the CT?

cheers,
Thomas



Re: [PVE-User] Proxmox VE 6.0 released!

2019-07-17 Thread Thomas Lamprecht
On 7/16/19 5:37 PM, Alain péan wrote:
> I shall indeed test carefully on a test cluster. But the problem is that I 
> have one still in filestore, and the other in bluestore, so perhaps, I shall 
> have to migrate all to bluestore in a first step...

You can still use Filestore-backed clusters; you just cannot use our
tooling to add new Filestore-backed OSDs (but you could use ceph-volume
for that), see:

https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_ceph_filestore

So there's no direct, immediate need to upgrade them, although I'd set
up new OSDs with Bluestore, as this is more future-proof.
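
If you really need another Filestore OSD in the meantime, creating it directly
with ceph-volume would look roughly like this (device names are placeholders):

# ceph-volume lvm create --filestore --data /dev/sdX --journal /dev/sdY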

cheers,
Thomas




Re: [PVE-User] adding ceph osd nodes

2019-07-17 Thread Thomas Lamprecht
On 7/17/19 12:47 PM, mj wrote:
> Question: is it possible to add some extra ceph OSD storage nodes, without 
> proxmox virtualisation, and thus without the need to purchase additional 
> proxmox licenses?
> 
> Anyone doing that?
> 
> We are wondering for example if the extra mon nodes & OSDs would show up in 
> the pve gui.

Most of our data comes from directly talking to the monitors / ceph cluster
over RADOS, so the ceph point of view is our point of view.

You may need to do a bit more manual work in setting those "external"
nodes up, though: copying the auth key(s), the bootstrap keyring, the initial
config, ...






Re: [PVE-User] Corosync Upgrade Issue (v2 -> v3)

2019-07-17 Thread Thomas Lamprecht
On 7/16/19 9:28 PM, Ricardo Correa wrote:
> systemd[1]: Starting The Proxmox VE cluster filesystem...
> systemd[1]: pve-cluster.service: Start operation timed out. Terminating.
> pmxcfs[13267]: [main] crit: read error: Interrupted system call

That's strange - that is an error happening early in the startup code.
After we fork the child process which becomes the daemon doing the actual
work, we wait in the parent for it to signal readiness; for that a simple pipe
is used from which a byte is read. Your read is getting interrupted - something
which normally should not happen.

Can you try to start in it the foreground:
# systemctl stop pve-cluster
# pmxcfs -f

and see what happens there.

Also you probably should try to finish the upgrade:
# apt -f install



Re: [PVE-User] Corosync Upgrade Issue (v2 -> v3)

2019-07-17 Thread Thomas Lamprecht
On 7/16/19 11:26 PM, Chris Hofstaedtler | Deduktiva wrote:
> * Fabian Grünbichler  [190716 21:55]:
> [..]
>>>
>>> dpkg: error processing package corosync (--configure):
>>>  dependency problems - leaving unconfigured
>>> Processing triggers for libc-bin (2.24-11+deb9u4) ...
>>> Processing triggers for pve-ha-manager (2.0-9) ...
>>> Processing triggers for pve-manager (5.4-11) ...
>>
>> if you followed the upgrade guidelines and ONLY upgraded corosync here, 
>> these two triggers should not be triggered...
> 
> I've done a corosync-only upgrade the other day and the triggers are
> indeed triggered.

Yes, and that's correct. We actively trigger the pve-cluster (pmxcfs) service
and the "pve-api-updates" trigger - which then triggers manager and ha-manager.

See the commit below and its in-lined commit message for details. Basically, the
new libqb is incompatible: as the new corosync is freshly started it loads the
new libqb, while pmxcfs still has the old one loaded and cannot communicate
anymore, thus it needs to be restarted - and subsequently all daemons using
IPCC calls to pmxcfs:

https://git.proxmox.com/?p=libqb.git;a=commitdiff;h=5abd5865b8d2d0cf245e4b3085a08fb22bf6e7fd

cheers,
Thomas

> 
> dpkg.log (truncated):
> 
> 2019-07-07 21:14:45 startup archives unpack
> 2019-07-07 21:14:46 upgrade libcorosync-common4:amd64 2.4.4-pve1 
> 3.0.2-pve2~bpo9
> 2019-07-07 21:14:46 status triggers-pending libc-bin:amd64 2.24-11+deb9u4
> 2019-07-07 21:14:46 status half-configured libcorosync-common4:amd64 
> 2.4.4-pve1
> 2019-07-07 21:14:46 status unpacked libcorosync-common4:amd64 2.4.4-pve1
> 2019-07-07 21:14:46 status half-installed libcorosync-common4:amd64 2.4.4-pve1
> 2019-07-07 21:14:46 status half-installed libcorosync-common4:amd64 2.4.4-pve1
> 2019-07-07 21:14:46 status unpacked libcorosync-common4:amd64 3.0.2-pve2~bpo9
> 2019-07-07 21:14:46 status unpacked libcorosync-common4:amd64 3.0.2-pve2~bpo9
> 2019-07-07 21:14:46 upgrade libqb0:amd64 1.0.3-1~bpo9 1.0.5-1~bpo9+2
> 2019-07-07 21:14:46 status triggers-pending pve-ha-manager:amd64 2.0-9
> 2019-07-07 21:14:46 status triggers-pending pve-manager:amd64 5.4-10
> ...
> 
> full dpkg.log:
>  https://gist.github.com/zeha/9d47a95776d375d6f386b89c5be4a35a
> 
> 
> Chris
> 





Re: [PVE-User] Proxmox VE 6.0 released!

2019-07-16 Thread Thomas Lamprecht
On 7/16/19 4:57 PM, Thomas Lamprecht wrote:
> On 7/16/19 4:38 PM, Alain péan wrote:
>> *ceph-disk has been removed*: After upgrading it is not possible to create 
>> new OSDs without upgrading to Ceph Nautilus.
>>
>> So it willbe mandatory to upgrade to Ceph Nautilus, in addition to the other 
>> changes ?
> 
> yes, if you upgrade to 6.x you will need to upgrade Ceph to Nautilus sooner 
> or later.
> 
> See:
> 
> http://intranet.proxmox.com/index.php/Upgrade_from_5.x_to_6.0
> http://intranet.proxmox.com/index.php/Ceph_Luminous_to_Nautilus

sorry, meant:

https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0
https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus




Re: [PVE-User] Proxmox VE 6.0 released!

2019-07-16 Thread Thomas Lamprecht
On 7/16/19 4:38 PM, Alain péan wrote:
> *ceph-disk has been removed*: After upgrading it is not possible to create 
> new OSDs without upgrading to Ceph Nautilus.
> 
> So it willbe mandatory to upgrade to Ceph Nautilus, in addition to the other 
> changes ?

yes, if you upgrade to 6.x you will need to upgrade Ceph to Nautilus sooner or 
later.

See:

http://intranet.proxmox.com/index.php/Upgrade_from_5.x_to_6.0
http://intranet.proxmox.com/index.php/Ceph_Luminous_to_Nautilus

We upgraded internal production clusters and a lot of test setups
without issues, so I think it should be doable, especially if tried
out in a test setup first.

cheers,
Thomas




Re: [PVE-User] JS reloading page

2019-07-08 Thread Thomas Lamprecht
Hi,

On 7/8/19 6:04 PM, bsd--- via pve-user wrote:
> Hello, 
> 
> There is a JS in Proxmox VE v.5.4.6 which reloads the page and forces all 
> menu item at the top every 5". 

A full page reload? We only do that on cluster creation, as there
the website's TLS certificate changed, and thus it's necessary.

> This is really very annoying because we have a quite extensive list of hosts 
> / devices and it always puts back the list at the top ! 

Do you mean in the "Resource Tree" on the left?
Could you please share some details like browser/OS used,
and also exactly on which component the issue is showing up?

thanks!

> 
> Is there a way to remove this somehow ? 
> 
> You should really consider removing this JS feature it is painful and totally 
> useless. 
> 
> 
> Thanks. 




Re: [PVE-User] Debian buster inside PVE KVM

2019-07-08 Thread Thomas Lamprecht
Am 7/8/19 um 12:13 PM schrieb Fabian Grünbichler:
> On Mon, Jul 08, 2019 at 09:10:48AM +0200, Thomas Lamprecht wrote:
>> Am 7/8/19 um 8:05 AM schrieb Fabian Grünbichler:
>>> On Mon, Jul 08, 2019 at 02:16:34AM +0200, Chris Hofstaedtler | Deduktiva 
>>> wrote:
>>>> Hello,
>>>>
>>>> while doing some test upgrades I ran into the buster RNG problem [1],
>>>> where the newer kernel and systemd use a lot more randomness during
>>>> boot, causing startup delays.
>>>>
>>>> Very clearly noticable in dmesg:
>>>> [1.500056] random: fast init done
>>>> [  191.700840] random: crng init done
>>>> [  191.701445] random: 7 urandom warning(s) missed due to ratelimiting
>>>>
>>>> I couldn't find a supported way of enabling virtio_rng [2] in PVE
>>>> 5.4 or the 6.0 beta. As a test, I've set "args: -device
>>>> virtio-rng-pci" and that appears to work - the VM auto-loads the
>>>> virtio_rng kmod and "crng init done" happens at ~4s after poweron.
>>>
>>> yes, that's the way to go for now.
>>>
>>>> Are there any recommendations at this time or plans for adding
>>>> virtio_rng?
>>>
>>> filed [1] to keep track of adding proper support, as it sounds like a
>>> simple enough but worthwhile feature to me :)
>>>
>>> 1: https://bugzilla.proxmox.com/show_bug.cgi?id=2264
>>>
>>
>> The request for this is a bit older, and then some concerns about
>> possible depleting the hosts entropy pool were raised.
>> Maybe we want to ship havedged, or at least recommend it in docs if no
>> other "high" bandwitdh (relatively speaking) HW rng source is
>> available on the host... ATM, I cannot find the discussion, sorry,
>> IIRC it was on a mailing list of ours..
> 
> haveged is surrounded by some controversy especially for usage inside
> VMs, since it relies on jitter via timer instructions that may or may
> not be passed through to the actual hardware, and most recommendations
> actually err on the side of "stay away unless you have no choice"(see
> 1, 2 and the stuff linked there).

OK, those are the issues I was concerned about. Thanks
for pointing at them!

> 
> virtio-rng does have the issue of potentially depleting the host's
> entropy pool, with a proper HWRNG, this is not really an issue. it is
> possible to ratelimit the virtio-rng device (max-bytes/period
> parameter).
> 
> offering as opt-in it with the proper caveat ("only enable if your host
> can provide lots of entropy") is probably better than pointing at
> potentially problematic solutions?

Definitively.

> 
> VMs with CPU types that pass in rdrand/rdseed are also "fixed".
> 
> 1: https://wiki.debian.org/BoottimeEntropyStarvation
> 2: https://wiki.archlinux.org/index.php/Haveged
> 



Re: [PVE-User] Debian buster inside PVE KVM

2019-07-08 Thread Thomas Lamprecht
Am 7/8/19 um 9:56 AM schrieb arjenvanweel...@gmail.com:
> Is just installing haveged sufficient? Can the Proxmox-team decide to
> add haveged to it's dependencies? Or is more discussion required? 

It'd be, the service is then enabled and running by default.

For me it'd be OK to add it as a dependency or recommends somewhere,
but I have to say that I did not look too much into possible
bad implications or what people with good knowledge of statistics/
randomness think about haveged; it should not be too bad, AFAICT,
or at least hopefully better than nothing ^^

> I'll have a look but cannot guarantee anything.

Appreciated!



Re: [PVE-User] Debian buster inside PVE KVM

2019-07-08 Thread Thomas Lamprecht
Am 7/8/19 um 9:34 AM schrieb arjenvanweel...@gmail.com:
> Having this (as an option) in the GUI would be very nice, 
> and 'apt-get install haveged' is quick and easy.

Opt-in is surely no problem; my concerns would rather be about
the case where we just add this for all VMs with Linux as ostype
(because why not, VMs can only profit from it). As said, the
single thing to look out for is that enough entropy is available.

And sure, it's easy to install haveged, we're using a sane Linux
distro as base, after all ;) But one *needs* to actually do it, else bad
or no entropy can harm too.
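
(As a quick, purely illustrative sanity check, the entropy currently
available to the kernel pool on a host or inside a guest can be inspected
with:

# cat /proc/sys/kernel/random/entropy_avail

Values that stay very low for long periods usually mean the pool is starved.)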

It'd be great if you would like to assemble a patch for this,
it shouldn't be too much work.



Re: [PVE-User] Debian buster inside PVE KVM

2019-07-08 Thread Thomas Lamprecht
Am 7/8/19 um 8:05 AM schrieb Fabian Grünbichler:
> On Mon, Jul 08, 2019 at 02:16:34AM +0200, Chris Hofstaedtler | Deduktiva 
> wrote:
>> Hello,
>>
>> while doing some test upgrades I ran into the buster RNG problem [1],
>> where the newer kernel and systemd use a lot more randomness during
>> boot, causing startup delays.
>>
>> Very clearly noticable in dmesg:
>> [1.500056] random: fast init done
>> [  191.700840] random: crng init done
>> [  191.701445] random: 7 urandom warning(s) missed due to ratelimiting
>>
>> I couldn't find a supported way of enabling virtio_rng [2] in PVE
>> 5.4 or the 6.0 beta. As a test, I've set "args: -device
>> virtio-rng-pci" and that appears to work - the VM auto-loads the
>> virtio_rng kmod and "crng init done" happens at ~4s after poweron.
> 
> yes, that's the way to go for now.
> 
>> Are there any recommendations at this time or plans for adding
>> virtio_rng?
> 
> filed [1] to keep track of adding proper support, as it sounds like a
> simple enough but worthwhile feature to me :)
> 
> 1: https://bugzilla.proxmox.com/show_bug.cgi?id=2264
> 

The request for this is a bit older, and back then some concerns about
possibly depleting the host's entropy pool were raised.
Maybe we want to ship haveged, or at least recommend it in the docs if no
other "high" bandwidth (relatively speaking) HW RNG source is
available on the host... ATM I cannot find the discussion, sorry,
IIRC it was on a mailing list of ours.
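
For anyone wanting to try the mentioned workaround in the meantime, a minimal
sketch - VMID and the rate-limit values are just examples - would be:

# qm set 100 --args '-device virtio-rng-pci,max-bytes=1024,period=1000'

where the max-bytes/period pair limits how much entropy the guest may pull
from the host per period (in milliseconds).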




Re: [PVE-User] [pve-devel] Proxmox VE 6.0 beta released!

2019-07-05 Thread Thomas Lamprecht
Hi,

On 7/5/19 9:32 AM, mj wrote:
> Looks like a great new release!
> 
> Does corosync 3.0 mean that the notes on 
> [https://pve.proxmox.com/wiki/Multicast_notes] are no longer relevant?

We will update the documentation and wiki articles regarding this in
the following days; it should be ready by the final PVE 6 release.

> 
> Anything else/new to consider/check to make sure that kronosnet will work 
> nicely?

Ports stayed the same, communication is UDP unicast by default, and
our firewall now has better default allow rules for the cluster
networks used - so no, normally not too much special handling should
be needed.

Note: multicast may not be gone forever, kronosnet has some plans
to add support for it, but that may need quite a bit of time, and
even then we will try to keep support for the unicast kronosnet
transport, if possible.

cheers,
Thomas

> 
> MJ
> 
> On 7/5/19 9:27 AM, Eneko Lacunza wrote:
>> Hi Martin,
>>
>> Thanks a lot for your hard work, Maurer-ITans and the rest of developers...
>>
>> It seems that in PVE 6.0, with corosync 3.0, multicast won't be used by 
>> default? I think it could be interesting to have a PVE_6.x cluster wiki page 
>> to explain a bit the new cluster, max nodes, ...
>>
>> Also, thanks for taking the time to develop, test and describe a way for 
>> in-place upgrade without having to re-create the cluster, I think it would 
>> make the life easier for a lot of us...
>>
>> Cheers!!
>>
>> El 4/7/19 a las 21:06, Martin Maurer escribió:
>>> Hi all!
>>>
>>> We're happy to announce the first beta release for the Proxmox VE 6.x 
>>> family! It's based on the great Debian Buster (Debian 10) and a 5.0 kernel, 
>>> QEMU 4.0, ZFS 0.8.1, Ceph 14.2.1, Corosync 3.0 and countless improvements 
>>> and bugfixes. The new installer supports ZFS root via UEFI, for example you 
>>> can boot a ZFS mirror on NVMe SSDs (using systemd-boot instead of grub). 
>>> The full release notes will be available together with the final release 
>>> announcement.
>>>
>>> For more details, see:
>>> https://forum.proxmox.com/threads/proxmox-ve-6-0-beta-released.55670/




Re: [PVE-User] Cluster does not start, corosync timeout...

2019-07-04 Thread Thomas Lamprecht
On 7/4/19 12:35 PM, Marco Gaiarin wrote:
> We had a major power outgage here, and our cluster have some trouble on
> restart. The worster was:
> 
>  Jul  3 19:58:40 pvecn1 corosync[3443]:  [MAIN  ] Corosync Cluster Engine 
> ('2.4.4-dirty'): started and ready to provide service.
>  Jul  3 19:58:40 pvecn1 corosync[3443]:  [MAIN  ] Corosync built-in features: 
> dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro 
> bindnow
>  Jul  3 19:58:40 pvecn1 corosync[3443]: notice  [MAIN  ] Corosync Cluster 
> Engine ('2.4.4-dirty'): started and ready to provide service.
>  Jul  3 19:58:40 pvecn1 corosync[3443]: info[MAIN  ] Corosync built-in 
> features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp 
> pie relro bindnow
>  Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Start operation timed 
> out. Terminating.
>  Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Unit entered failed 
> state.

Hmm, that's strange. Do you have the full log between "19:58:40" and
"20:00:09"? Normally there should be some more info, at least for
corosync and pve-cluster; e.g., the following output would be great:

journalctl -u corosync -u pve-cluster --since "2019-07-03 19:58:40" --until "2019-07-03 20:00:09"

> 
> But... some host in the cluster missed from /etc/hosts: this suffices
> to have corosync not to start correctly?
> 

Depends on the config; as you stated yourself, with multicast it normally
won't be an issue, but maybe the switch had some issues with multicast
initially after the power outage - just a guess.

> 
> Looking at docs (https://pve.proxmox.com/pve-docs/pve-admin-guide.html):
> 
>  While it’s often common use to reference all other nodenames in /etc/hosts 
> with their IP this is not strictly necessary for a cluster, which normally 
> uses multicast, to work. It maybe useful as you then can connect from one 
> node to the other with SSH through the easier to remember node name.
> 
> this mean i've not multicast correctly working? I was sure i had...

can you please post your corosync.conf ?




Re: [PVE-User] pve-firewall, clustering and HA gone bad

2019-06-25 Thread Thomas Lamprecht
On 6/25/19 9:44 AM, Thomas Lamprecht wrote:
> And as also said (see quote below), for more specific hinters I need the raw
> logs, unmerged and as untouched as possible.

It may just be that I did not see the mail in my inbox, so it looks like
you already sent it to me - sorry about missing it.



Re: [PVE-User] pve-firewall, clustering and HA gone bad

2019-06-25 Thread Thomas Lamprecht
On 6/25/19 9:10 AM, Mark Schouten wrote:
> On Thu, Jun 13, 2019 at 12:34:28PM +0200, Thomas Lamprecht wrote:
>>> 2: ha-manager should not be able to start the VM's when they are running
>>> elsewhere
>>
>> This can only happen if fencing fails, and that fencing works is always
>> a base assumption we must take (as else no HA is possible at all).
>> So it would be interesting why fencing did not worked here (see below
>> for the reason I could not determine that yet as I did not have your logs
>> at hand)
> 
> Reading the emails from that specific night, I saw this message:
> 
>  The node 'proxmox01' failed and needs manual intervention.
> 
>  The PVE HA manager tries to fence it and recover the
>  configured HA resources to a healthy node if possible.
> 
>  Current fence status: SUCCEED
>  fencing: acknowledged - got agent lock for node 'proxmox01'
> 
> This seems to suggest that the cluster is confident that the fencing
> succeeded. How does it determine that?
> 

It got the other node's LRM agent lock through pmxcfs.

Normal LRM cycle is

0. startup
1. (re-)acquire agent lock, if OK go to 2, else to 4
2. do work (start, stop, migrate resources)
3. go to 1
4. no lock: if we had the lock once we stop watchdog updates, stop doing
   anything, wait for either quorum again (<60s) or the watchdog to trigger
   (>=60)
   if we never had the lock just poll for it continuously

Locks can be held by only one node. If the CRM sees a node offline for >120
seconds (IIRC) it tries to acquire the lock from that node; once it has it,
it knows that the HA stack on the other side cannot start any actions
anymore - and if your "unfreeze before watchdog enable" had not happened,
the node would have got fenced by the watchdog.
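
(As a side note, the current manager/LRM states and the service states can be
inspected at any time with:

# ha-manager status

and the cluster-wide locks live below /etc/pve/priv/lock/.)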

The lock and recovery action itself was not the direct root cause, as said,
the most I could take out from the logs you sent was:
> ...
> So, the "unfreeze before the respective LRM got active+online with watchdog"
> seems the cause of the real wrong behavior here in your log, it allows the
> recovery to happen, as else frozen services wouldn't not have been recovered
> (that mechanism exactly exists to avoid such issues during a upgrade, where
> one does not want to stop or migrate all HA VM/CTs)

And as also said (see quote below), for more specific hints I need the raw
logs, unmerged and as untouched as possible.

On 6/13/19 6:29 PM, Thomas Lamprecht wrote:
> While you interpolated the different logs into a single time-line it does not
> seem to match everywhere, for my better understanding could you please send 
> me:
> 
> * corosync.conf
> * the journal or syslog of proxmox01 and proxmox03 around "Jun 12 01:38:16"
>   plus/minus ~ 5 minutes, please in separated files, no interpolation and as
>   unredacted as possible
> * information if you have a HW watchdog or use the Linux soft-dog
> 
> that would be appreciated.
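
(Side note for anybody reading along, purely illustrative: whether the Linux
softdog or a HW watchdog module is in use can usually be seen with

# lsmod | grep -e softdog -e wdt
# journalctl -b -u watchdog-mux

the latter being the journal of our watchdog multiplexer service.)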



Re: [PVE-User] pve-firewall, clustering and HA gone bad

2019-06-14 Thread Thomas Lamprecht
Hi,

On 6/13/19 10:08 PM, JR Richardson wrote:
>> On 6/13/19 3:29 PM, Horace wrote:
>>> Should this stuff be in 'help' documentation ?
>>
>> The thing with the resolved ringX_ addresses?
>>
>> Hmm, it would not hurt if something regarding this is written there.
>> But it isn't as black and white, and often depends a lot on the
>> preferences of the admin(s) and their setup/environment.
>>
>> Some hints could probably given, especially for a IPv6 addition/switch,
>> as the getaddrinfo preference of IPv6 over IPv4 if both are configured
>> has often bitten people (see /etc/gai.conf , man gai.conf), not only with
>> clustering or PVE.
>>
>> A few other hints could probably thrown into that too..
>> Stefan (CCd), would you be willing to take a look at this and expand the
>> "Cluster Network" section from the pvecm chapter in pve-docs a bit
>> regarding this? That'd be great.
>>
> 
> Hi All,
> 
> Sorry to hijack thread, but I was about to perform a 10 node cluster
> upgrade and after reading above, I have some reservations.
> 
> I did a mix of versions 4.x and 5.x nodes over the last couple of
> years and my corosync.conf file has a mix of 'ring0_addr' entries as
> DNS name and IP Address. All node hosts files are up to date with all
> nodes in the cluster. I'm running PVE 5.2-5 across all nodes, seems to
> be working fine, no issues.
> 
> Should I update corosync.conf 'ring0_addr:' entries to all IP
> Addresses before attempting the upgrade?

If you did no host network change(s) you really should be fine.

Mark Schouten's issue was mainly due to a few things coming
together; even if he had had the ring0_addr's resolved, the firewall would
still have blocked in this case, as the local_net calculation still picked up
the new IPv6 net as primary first, AFAICT.

> 
> If so, I assume I have to stop the pmxcfs and or corosync, update the
> file on any node, then restart cluster service on the that node to
> push update to all nodes?

That would work, but it's more intrusive than it needs to be. What I would do is:

1. Do an omping check[0] *first* with all the addresses you plan to replace
   the ring0_addr hostnames with, as this shows whether the cluster nodes can
   talk to each other through those addresses at all (see the example
   invocation at the end of this mail). You can also get the currently used
   IPs with the following command (maybe grep for 'ip'):
   # corosync-cmapctl runtime.member

2. # cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new

3. # editor /etc/pve/corosync.conf.new

4. change all ring0_addr entries to their respective IP counterpart; as we use
   the _exact_ same addresses as before, just written out, you can do this all
   at once. If you /change/ the addresses to other ones it should not be
   done this way, at least not if you aren't really comfortable with corosync
   and have played around a lot with such stuff (in testing systems).

5. ensure you increased the config_version by one

6. save and diff to ensure the changes you're about to enact are OK:
   # diff -u /etc/pve/corosync.conf /etc/pve/corosync.conf.new

7. now let's enforce the changes cluster-wide:
   # mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf

8. pmxcfs sees that the corosync config changed and that the config version is
   newer; each node thus copies it over from /etc/pve/corosync.conf to
   /etc/corosync/corosync.conf

9. The journalctl/syslog should show some message about corosync reloading
   the config, possibly telling you that it cannot enact the ring0_addr change
   of its own node during runtime, which is OK _here_ as we did not change the
   address at all, just switched to another representation of it.

As said, that's for an address change which does not really change the address ;)
Else, I probably would

1. stop pve-ha-lrm everywhere, then pve-ha-crm (order is important)

2. do the edits as above, pmxcfs and corosync must still run, triple check
   the changes, ensure that the new network is reachable from all nodes
   (omping can help)

3. enforce config by moving the .new over the real one.

4. # systemctl restart corosync pve-cluster   # on every node

5. start ha services again.

[0]: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements
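
As an example for the omping check in step 1 above (the addresses are
placeholders, adapt them to your nodes), run the following on all nodes in
parallel:

# omping -c 600 -i 1 -q 192.168.30.75 192.168.30.76 192.168.30.77

If the reported packet loss stays around 0% the new addresses should be fine
for corosync.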



Re: [PVE-User] pve-firewall, clustering and HA gone bad

2019-06-13 Thread Thomas Lamprecht
On 6/13/19 3:29 PM, Horace wrote:
> Should this stuff be in 'help' documentation ?

The thing with the resolved ringX_ addresses?

Hmm, it would not hurt if something regarding this is written there.
But it isn't as black and white, and often depends a lot on the
preferences of the admin(s) and their setup/environment.

Some hints could probably be given, especially for an IPv6 addition/switch,
as the getaddrinfo preference of IPv6 over IPv4 if both are configured
has often bitten people (see /etc/gai.conf , man gai.conf), not only with
clustering or PVE.

A few other hints could probably be thrown in there too.
Stefan (CCd), would you be willing to take a look at this and expand the
"Cluster Network" section from the pvecm chapter in pve-docs a bit
regarding this? That'd be great.

> 
> On 6/13/19 12:29 PM, Thomas Lamprecht wrote:
>> On 6/13/19 1:30 PM, Mark Schouten wrote:
>>> On Thu, Jun 13, 2019 at 12:34:28PM +0200, Thomas Lamprecht wrote:
>>>> Hi,
>>>> Do your ringX_addr in corosync.conf use the hostnames or the resolved
>>>> addresses? As with nodes added on newer PVE (at least 5.1, IIRC) we try
>>>> to resolve the nodename and use the resolved address to exactly avoid
>>>> such issues. If it don't uses that I recommend changing that instead
>>>> of the all nodes in al /etc/hosts approach.
>>> It has the hostnames. It's a cluster upgraded from 4.2 up to current.
>> OK, I suggest that you change that to the resolved IPs and add a "name"
>> property, if not already there (at the moment not to sure when I added
>> the "name" per-default to the config, it was sometime in a 4.x release)
>> IOW, the config's "nodelist" section should look something like:
>>
>> ...
>> nodelist {
>>    node {
>>  name: prod1
>>  nodeid: 1
>>  quorum_votes: 1
>>  ring0_addr: 192.168.30.75
>>    }
>>    node {
>>  name: prod2
>>  nodeid: 2
>>  quorum_votes: 1
>>  ring0_addr: 192.168.30.76
>>    }
>>    ...
>> }
>>
>> As said in the previous reply, that should avoid most issues of this kind,
>> and avoid the need for the /etc/host stuff on all hosts.
>>




Re: [PVE-User] pve-firewall, clustering and HA gone bad

2019-06-13 Thread Thomas Lamprecht
On 6/13/19 1:30 PM, Mark Schouten wrote:
> On Thu, Jun 13, 2019 at 12:34:28PM +0200, Thomas Lamprecht wrote:
>> Hi,
>> Do your ringX_addr in corosync.conf use the hostnames or the resolved
>> addresses? As with nodes added on newer PVE (at least 5.1, IIRC) we try
>> to resolve the nodename and use the resolved address to exactly avoid
>> such issues. If it don't uses that I recommend changing that instead
>> of the all nodes in al /etc/hosts approach.
> 
> It has the hostnames. It's a cluster upgraded from 4.2 up to current.

OK, I suggest that you change that to the resolved IPs and add a "name"
property, if not already there (at the moment I'm not too sure when I added
the "name" property per default to the config, it was sometime in a 4.x release).
IOW, the config's "nodelist" section should look something like:

...
nodelist {
  node {
name: prod1
nodeid: 1
quorum_votes: 1
ring0_addr: 192.168.30.75
  }
  node {
name: prod2
nodeid: 2
quorum_votes: 1
ring0_addr: 192.168.30.76
  }
  ...
}

As said in the previous reply, that should avoid most issues of this kind,
and avoid the need for the /etc/hosts entries on all hosts.

> 
>>> It seems that pve-firewall tries to detect localnet, but failed to do so
>>> correct. localnet should be 192.168.1.0/24, but instead it detected the
>>> IPv6 addresses. Which isn't entirely incorrect, but IPv6 is not used for
>>> clustering, so I should open IPv4 in the firewall not IPv6. So it seems
>>> like nameresolving is used to define localnat, and not what corosync is
>>> actually using.
>>
>> From a quick look at the code: That seems true and is definitively the
>> wrong behavior :/
> 
> Ok, I'll file a bug for that.

Thanks!

>>> 2: ha-manager should not be able to start the VM's when they are running
>>> elsewhere
>>
>> This can only happen if fencing fails, and that fencing works is always
>> a base assumption we must take (as else no HA is possible at all).
>> So it would be interesting why fencing did not worked here (see below
>> for the reason I could not determine that yet as I did not have your logs
>> at hand)
> 
> We must indeed make assumptions. Are there ways we can assume better? :)

Hmm, hard, as fencing must work. And it normally does, as long as:
* the fence device works (in this case the watchdog)
* no manual tinkering on HA was involved (no finger pointing, really, but
  while we try to fend off some manual changes, one can get the pve-ha-*
  services into states where the VM is running but the watchdog got closed)
* there's no bug (naturally) - but with the simulation & regression tests this
  should be covered against in principle.

It will work, but a closer analysis of your incident will hopefully show
what the case was here, and if there could be enhancements against that.

>> The list trims attachments, could you please send them directly to my
>> address? I'd really like to see those.
> 
> Attached again, so you should receive it now.
> 

OK got the attachment now, thanks! 

I think I got the relevant part below:

> Jun 12 01:37:56 proxmox01 pve-ha-lrm[3729778]: status change 
> wait_for_agent_lock => active
> Jun 12 01:37:56 proxmox01 pve-ha-lrm[3729778]: successfully acquired lock 
> 'ha_agent_proxmox01_lock'
> Jun 12 01:37:56 proxmox01 pve-ha-lrm[3729778]: watchdog active

-> upgrade stuff on proxmox01, thus restart of pve-ha-lrm

> Jun 12 01:38:05 proxmox01 pve-ha-lrm[3729778]: received signal TERM
> Jun 12 01:38:05 proxmox01 pve-ha-lrm[3729778]: restart LRM, freeze all 
> services
> Jun 12 01:38:14 proxmox03 pve-ha-crm[3084869]: service 'vm:100': state 
> changed from 'started' to 'freeze'
> ...
> Jun 12 01:38:14 proxmox03 pve-ha-crm[3084869]: service 'vm:800': state 
> changed from 'started' to 'freeze'

-> ... all got frozen (which is OK)

> Jun 12 01:38:16 proxmox01 pve-ha-lrm[3729778]: watchdog closed (disabled)
> Jun 12 01:38:18 proxmox01 pve-ha-lrm[3731520]: status change startup => 
> wait_for_agent_lock

-> here, proxmox01 does not have the LRM lock yet and is not yet active (!),
   but the current master (proxmox03) already unfreezes proxmox01's services:

> Jun 12 01:38:24 proxmox03 pve-ha-crm[3084869]: service 'vm:100': state 
> changed from 'freeze' to 'started'
> ...
> Jun 12 01:38:24 proxmox03 pve-ha-crm[3084869]: service 'vm:800': state 
> changed from 'freeze' to 'started'

(remember this for below: the fact that those services got unfrozen before (!)
the watchdog was active again reads as _very_ worrisome to me. They really
shouldn't be, as freeze exists exactly to avoid issues during an
upgrade/restart of HA without stopping all services)

-> now quorum breaks because the firewall allows the IPv6 net instead of the IPv4 one, a bit later the H

Re: [PVE-User] pve-firewall, clustering and HA gone bad

2019-06-13 Thread Thomas Lamprecht
Hi,

On 6/13/19 11:47 AM, Mark Schouten wrote:
> Let me start off with saying that I am not fingerpointing at anyone,
> merely looking for how to prevent sh*t from happening again!
> 
> Last month I emailed about issues with pve-firewall. I was told that
> there were fixes in the newest packages, so this maintenance I started
> with upgrading pve-firewall before anything else. Which went well for
> about all the clusters I upgraded.
> 
> Then I ended up at the last (biggest, 9 nodes) cluster, and stuff got
> pretty ugly. Here's what happened:
> 
> 1: I enabled IPv6 on the cluster interfaces in the last month. I've done
> this before on other clusters, nothing special there. So I added the
> IPv6 addresses on the interfaces and added all nodes in all the
> /etc/hosts files. I've had issues with not being able to start clusters
> because hostnames could not resolve, so all my nodes in all my clusters
> have all the hostnames and addresses of their respective peers in
> /etc/hosts.

Do your ringX_addr entries in corosync.conf use the hostnames or the resolved
addresses? With nodes added on newer PVE (at least 5.1, IIRC) we try
to resolve the nodename and use the resolved address exactly to avoid
such issues. If it doesn't use that, I recommend changing it instead
of the all-nodes-in-all-/etc/hosts approach.

> 2: I upgraded pve-firewall on all the nodes, no issues there
> 3: I started dist-upgrading on proxmox01 and proxmox02, and restarting
> pve-firewall with `pve-firewall restart` because of [1] and noticed that
> pvecm status did not list any of the other nodes in list of peers. So we
> had:
>   proxmox01: proxmox01
>   proxmox02: proxmox02
>   proxmox03-proxmox09: proxmox03-proxmox09
> 
> Obviously, /etc/pve was readonly on proxmox01 and proxmox02, since they
> had no quorum.
> 4: HA is heavily used on this cluster. Just about all VM's have it
> enabled. So since 'I changed nothing', I restarted pve-cluster a few
> times on the broken nodes. Nothing helped.
> 4: I then restarted pve-cluster on proxmox03, and all of the sudden,
> proxmox01 looked happy again.
> 5: In the meantime, ha-manager had kicked in and started VM's on other
> nodes, but did not actually let proxmox01 fence itself, but I did not
> notice this.
> 6: I tried restarting pve-cluster on yet another node, and then all
> nodes except proxmox01 and proxmox02 fenced themselves, rebooting
> alltogether.
> 
> After rebooting, the cluster was not completely happy, because the
> firewall was still confused. So why was this firewall confused? Nothing
> changed, remember? Well, nothing except bullet 1.
> 
> It seems that pve-firewall tries to detect localnet, but failed to do so
> correct. localnet should be 192.168.1.0/24, but instead it detected the
> IPv6 addresses. Which isn't entirely incorrect, but IPv6 is not used for
> clustering, so I should open IPv4 in the firewall not IPv6. So it seems
> like nameresolving is used to define localnat, and not what corosync is
> actually using.

From a quick look at the code: That seems true and is definitively the
wrong behavior :/

> 
> I fixed the current situation by adding the correct [ALIASES] in
> cluster.fw, and now all is well (except for the broken VM's that were
> running on two nodes and have broken images).
> 
> So I think there are two issues here:
> 1: pve-firewall should better detect the IP's used for essential
> services

Yes, granted, that probably should be the case. I'll try to take a look at this.
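
(For anyone hitting the same situation: the override Mark mentions looks
roughly like the following in /etc/pve/firewall/cluster.fw - the network is of
course just an example:

[ALIASES]
local_network 192.168.1.0/24

This replaces the auto-detected local_network alias the management rules are
built from.)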

> 2: ha-manager should not be able to start the VM's when they are running
> elsewhere

This can only happen if fencing fails, and that fencing works is always
a base assumption we must make (as otherwise no HA is possible at all).
So it would be interesting why fencing did not work here (see below
for the reason I could not determine that yet, as I did not have your logs
at hand).

> 
> Obviously, this is a faulty situation which causes unexpected results.
> Again, I'm not pointing fingers, I would like to discuss how we can
> improve these kind of faulty situations.
> 
> In the attachment, you can find a log with dpkg, pmxcfs, pve-ha-(lc)rm
> from all nodes. So maybe someone can better asses what went wrong.

The list trims attachments, could you please send them directly to my
address? I'd really like to see those.

> 
> [1]: https://bugzilla.proxmox.com/show_bug.cgi?id=1823
> 

[1] flew under my radar..



Re: [PVE-User] Roadmap - improved SDN support

2019-05-23 Thread Thomas Lamprecht
Hi,

On 5/23/19 10:43 AM, Thomas Naumann wrote:
> there is an extra point "improved SDN support" under roadmap in
> official proxmox-wiki. Who can give a hint what this means in detail?
> 

Maybe you did not see it, but Alexandre already answered the same mail
on pve-devel[0].

[0]: https://pve.proxmox.com/pipermail/pve-devel/2019-May/037069.html


cheers,
Thomas



Re: [PVE-User] Source code for Kernel with patches

2019-05-21 Thread Thomas Lamprecht
On 5/17/19 4:07 PM, Igor Podlesny wrote:
> On Fri, 17 May 2019 at 17:59, Saint Michael  wrote:
>>
>> Maybe you should share the patch here so we benefit from it.
> 
> Thomas said everything is kept in public git repository, what else are
> you looking to benefit from? :)
> 

The original poster of the thread planned to add some patches on top
of our publicly available ones, and the one you replied to thus requested
that they be shared, so that if they improve things they can be integrated
here by us and thus everyone using Proxmox kernels can profit from them.
At least, that's my interpretation - and we're always glad about improvements
being shared back, but we also know that this is a bit of work with some
initial "learning curve", so I certainly do not judge anybody who refrains
from it for such reasons. :-)

cheers,
Thomas




Re: [PVE-User] Support for Ceph Nautilus?

2019-05-17 Thread Thomas Lamprecht
On 5/17/19 9:53 AM, Christian Balzer wrote:
> On Fri, 17 May 2019 08:05:21 +0200 Thomas Lamprecht wrote:
>> On 5/17/19 4:27 AM, Christian Balzer wrote:
>>> is there anything that's stopping the current PVE to work with an
>>> externally configured Ceph Nautilus cluster?  
>>
>>
>> Short: rather not, you need to try out to be sure though.

Dominik reminded me that with the new Ceph release scheme you should always
be able to use clients one stable version older or newer
than the server version just fine; so, as Mimic was no stable release here,
you should be able to use Luminous clients with a Nautilus server without
issues.

>>
> Whenever Ceph Nautilus packages drop for either Stretch or Buster, I'm
> quite aware of the issues you mention below.

Ah, I did not understand your question that way, sorry. As said,
they'll drop when PVE 6 drops, and that will very probably be after
Buster :-)

>  
>> You probably cannot use the kernel RBD as it's support may be to old
>> for Nautilus images.
>>
>> The userspace libraries we use else would need to be updated to Nautilus
>> to ensure working fine with everything changed in Nautilus.
>>
>> So why don't we, the Proxmox development community, don't just update to
>> Nautilus?
>>
>> That ceph version started to employ programming language features available
>> in only relative recent compiler versions, sadly the one which _every_ binary
>> in Stretch is build with (gcc 6.3) does not supports those features - we
>> thought about workarounds, but felt very uneasy of all of them - the compiler
>> and it's used libc is such a fundamental base of a Linux Distro that we 
>> cannot
>> change that for a single application without hurting stability and bringing 
>> up
>> lots of problems in any way.
>>
>> So Nautilus will come first with Proxmox VE 6.0 based upon Debian Buster,
>> compiled with gcc 8, which supports all those new shiny used features.
>>
> Any timeline (he asks innocently, knowing the usual Debian release delays)?
> 

For that it can make sense to ask the Debian release team, as at the moment
no public information is available. One can only speculate on the count and
types of release-blocking bugs; with that and past releases in mind one can
probably roughly extrapolate the timeline and be correct to within ± a month or so.

>>>
>>> No matter how many bugfixes and backports have been done for Luminous, it
>>> still feels like a very weak first attempt with regards to making
>>> Bluestore the default storage and I'd rather not deploy anything based on
>>> it.  
>>
>> We use Bluestore in our own Infrastructure without issues, and lot's of PVE 
>> user
>> do also - if that make you feelings shift a bit to the better.
>>
> 
> The use case I have is one where Bluestore on the same HW would perform
> noticeably worse than filestore and with cache tiers being in limbo
> (neither recommended nor replaced with something else) 
> 
> At this point in time I would like to deploy Nautilus and then forget
> about it, that installation will not be upgraded ever and retired before
> the usual 5 years are up.

Sounds like a security nightmare if not heavily contained, but I guess you know
that having that on the internet is not the best idea. Proxmox release
cycles are normally supported for about ~3 years from the initial .0 release,
at least in the past, and that not only means security updates but also any
issue or question which will come up after those >3 years (you probably know
that and can deal with it yourself, I do not question that - just as a
reminder for anybody reading this).

cheers,
Thomas




Re: [PVE-User] Source code for Kernel with patches

2019-05-17 Thread Thomas Lamprecht
Hi,

On 5/17/19 2:57 AM, Mike O'Connor wrote:
> Hi Guys
> 
> Where can I download the source code for the PVE kernels with there
> patches (including old releases) ? I want to apply a patch to fix an issue.
> 

All our sources are available at: https://git.proxmox.com/

For cloning the kernel do:
$ git clone git://git.proxmox.com/git/pve-kernel
$ cd pve-kernel

# init submodules (would be done automatically on first build, but for
# applying patches you may want to have it earlier)
$ make submodule

on-top patches belong in the "patches/kernel" directory as .patch file,
which the 'patch' tool should be able understand.

What I often do is changing into the submodule directory, currently that'd
be:
$ git submodule update --init  # ensure all is checked out as desired
$ cd submodules/ubuntu-bionic

changing to a new branch
$ git checkout -b backport-foo

then applying the upstream/backported patches:
$ git am <patch-file(s)>

then formatting them out to the correct directory:
$ git format-patch -s -o ../../patches/kernel/ --start-number=100 --no-numbered --no-signature --zero-commit Ubuntu-4.15.0-46.49..

(You may want to change --start-number and the last parameter above; the
latter is the commit/tag/branch your work is based on. If you add just a
single patch you could also use "-1" ("-2" for two, ...) instead of
"Ubuntu-4.15.0-X.Y".)

Hope that helps.

cheers,
Thomas



Re: [PVE-User] Support for Ceph Nautilus?

2019-05-17 Thread Thomas Lamprecht
Hi,

On 5/17/19 4:27 AM, Christian Balzer wrote:
> 
> Hello,
> 
> is there anything that's stopping the current PVE to work with an
> externally configured Ceph Nautilus cluster?


Short: rather not, you need to try out to be sure though.

You probably cannot use the kernel RBD client, as its support may be too old
for Nautilus images.

The userspace libraries we use else would need to be updated to Nautilus
to ensure working fine with everything changed in Nautilus.

So why don't we, the Proxmox development community, just update to
Nautilus?

That Ceph version started to employ programming language features available
only in relatively recent compiler versions; sadly, the one which _every_
binary in Stretch is built with (gcc 6.3) does not support those features. We
thought about workarounds, but felt very uneasy about all of them - the
compiler and its libc are such a fundamental base of a Linux distro that we
cannot change them for a single application without hurting stability and
bringing up lots of problems one way or another.

So Nautilus will come first with Proxmox VE 6.0 based upon Debian Buster,
compiled with gcc 8, which supports all those new shiny used features.

> 
> No matter how many bugfixes and backports have been done for Luminous, it
> still feels like a very weak first attempt with regards to making
> Bluestore the default storage and I'd rather not deploy anything based on
> it.

We use Bluestore in our own infrastructure without issues, and lots of PVE
users do as well - if that makes your feelings shift a bit to the better.

cheers,
Thomas

> 
> Regards,
> 
> Christian
> 




Re: [PVE-User] VM ID 4 digits -> no view in Backup-List

2019-05-15 Thread Thomas Lamprecht
Hi,

On 5/15/19 9:34 AM, Anton Blau wrote:
> Hello,
> 
> for better clarity, I have assigned 4-digit IDs for some VMs (eg 1250).
> 
> In the menu Data Center -> Backup -> Add these VMs do no longer appear.
> 
> Is this a bug or did I do something wrong?
> 

Same as Dominic, I cannot reproduce this here. Can you provide information
about the browser you use and the output of `pveversion -v`?

cheers,
Thomas



Re: [PVE-User] Ceph and firewalling

2019-05-09 Thread Thomas Lamprecht
On 5/9/19 10:09 AM, Mark Schouten wrote:
> On Thu, May 09, 2019 at 07:53:50AM +0200, Alexandre DERUMIER wrote:
>> But to really be sure to not have the problem anymore :
>>
>> add in /etc/sysctl.conf
>>
>> net.netfilter.nf_conntrack_tcp_be_liberal = 1
> 
> This is very useful info. I'll create a bug for Proxmox, so they can
> consider it to set this in pve-firewall, which seems a good default if
> you ask me.
> 

IMO this is not a sensible default, it makes conntrack almost void:

>   nf_conntrack_tcp_be_liberal - BOOLEAN
>   [...]
>   If it's non-zero, we mark *only out of window RST segments* as 
> INVALID.

The more relevant flag, nf_conntrack_tcp_loose (If it is set to zero,
we disable picking up already established connections) is already on
(non-zero) by default.
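
(For reference, the currently active values can be checked with, e.g.:

# sysctl net.netfilter.nf_conntrack_tcp_loose net.netfilter.nf_conntrack_tcp_be_liberal

just an illustration - the nf_conntrack module needs to be loaded for those
keys to exist.)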

The issue you ran into - pve-cluster (pmxcfs) was upgraded and restarted
and pve-firewall thought that the user had deleted all rules and thus
flushed them - is already fixed for the most common cases
(package upgrade and normal restart of pve-cluster), so this shouldn't
be an issue with pve-firewall in version 3.0-20 anymore.

But Stoiko offered to take another look at this and try adding additional
error handling for the case where the firewall config read fails (as in
pmxcfs not mounted) and keep the current rules untouched in that case (i.e.,
no remove, no add), or maybe also move the management rules above the
conntrack rule - but we need to take a close look here to ensure this has no
unintended side effects.

cheers,
Thomas



Re: [PVE-User] What is the purpose of second gluster server in storage config?

2019-05-08 Thread Thomas Lamprecht
On 5/8/19 10:15 AM, Igor Podlesny wrote:
> On Wed, 8 May 2019 at 15:02, Thomas Lamprecht  wrote:
> [...]
>>> -- I didn't open no ticket, neither did I __complain__. I just let
>>> others know there's a pitfall, meanwhile thoroughly describing what it
>>> was. That's it.
>>
>> In a mail were a user ask where to open a request for this (i.e., 
>> pro-actively
>> trying to do something to make it better), you just wrote:
>> "Proxmox tells you go suffer, that's what happens"
> 
> You're shamelessly making it up.
> Let's see how it really was:
> https://pve.proxmox.com/pipermail/pve-user/2019-May/170632.html
> 
> So, "go suffer tells you proxmox" was direct reply to
> "Proxmox recognize the failure and mounts the secondary?"
> 
> Kinda very different story (a bit). But still no dramatic.

OK, here I just read the question about where to report this and saw
the "Proxmox ..." answer. As "Proxmox" is "we" but is also often used for
Proxmox VE (PVE), this leads to two totally different interpretations - so
sorry for reading too hastily and assuming bad faith here.

> 
>> (i.e., pro-actively trying to do something to make it better), you just 
>> wrote:
> 
> I've "pro-actively" reported that issue here on mail list on April 6:
> https://pve.proxmox.com/pipermail/pve-user/2019-April/170542.html
> 
> The only response I've gotten was yours: "Anyway, I'd be open for
> patches regarding this."

Yeah, still am, from all possible contributors - not only you, but yes
the Bugzilla request should've been there too.

cheers,
Thomas




Re: [PVE-User] What is the purpose of second gluster server in storage config?

2019-05-08 Thread Thomas Lamprecht
On 5/8/19 9:37 AM, Igor Podlesny wrote:
> On Wed, 8 May 2019 at 14:14, Thomas Lamprecht  wrote:
>> On 5/8/19 8:57 AM, Igor Podlesny wrote:
>>> On Wed, 8 May 2019 at 13:11, Thomas Lamprecht  
>>> wrote:
> [...]
>>> In short: pain, suffering and all That.
>>>
>>
>> Yes, things are not always perfect. But instead of complaining, in a bit
>> dramatic way, maybe just open a enhancement/bug request, so that this gets
>> finally tracked and resolved or, as we're open source, you can naturally
>> also always provide a patch yourself, to get this kick started.
> 
> (Well, if you're speaking personally to me...)
> 
> -- I didn't open no ticket, neither did I __complain__. I just let
> others know there's a pitfall, meanwhile thoroughly describing what it
> was. That's it.
In a mail where a user asked where to open a request for this (i.e.,
pro-actively trying to do something to make it better), you just wrote:
"Proxmox tells you go suffer, that's what happens"

...

> 
> Also, where did you find any drama even for a bit? :) If Proxmox
> developers aren't able to start fixing it due absence of a ticket, and
> something/someone prohibits them to open such a ticket themselves --
> that's more dramatic. ;-)
> 

There's a lot to do. GlusterFS isn't currently the most popular nor most
feature-complete choice in software-defined storage, so if a user can't even
be bothered to open a short Bugzilla request to ensure it gets tracked, why
should anybody bother fixing it, testing it, and releasing it for free for
them? Also, if I choose what to work on next I probably won't remember a
single user list post from a month ago, but triaging it in Bugzilla gives it a
bigger chance of staying visible. No, nothing prohibits us from doing it
ourselves, but if that's too much to ask then it's probably not really a pain
for you anyway, which makes it low priority automatically.

cheers,
Thomas



Re: [PVE-User] What is the purpose of second gluster server in storage config?

2019-05-08 Thread Thomas Lamprecht
On 5/3/19 12:57 PM, Igor Podlesny wrote:
> On Fri, 3 May 2019 at 14:44, Iztok Gregori  wrote:
>>
>> Hi to all!
>>
>> So what happens when one of the configured servers fails, Proxmox
>> recognize the failure and mounts the secondary? If this so the running
> 
> Proxmox tells you go suffer, that's what happens. )
> 

What? 

>> I think that the possibility to pass multiple servers to QEMU should be 
>> implemented. Where can I open a feature request?


You can go over to https://bugzilla.proxmox.com/ and open an enhancement
request there.

cheers,
Thomas



Re: [PVE-User] Cluster mixed hardware behavior (CPUs)

2019-04-26 Thread Thomas Lamprecht
Am 4/26/19 um 4:56 PM schrieb Roland @web.de:
>> will run at the lowest common denominator. In other words, if you have 3
>> hosts each with CPU frequencies being 2.1 GHz, 2.3 GHz, and 2.5 GHz
>> respectively, the entire cluster will run at a 2.1 GHz level.
> 
> huh, really? never heard of that, where is that information from?
> 
> from what i know, clustering is about cpu features, not speed, see
> https://communities.vmware.com/thread/545610
> 
> What vmware does is EVC ( https://kb.vmware.com/s/article/1003212 ) ,
> that way you can mask what cpu features/instructions should be used
> within the cluster.
> 
> my proxmox knowledge is limited, but from what i can see you set what
> cpu features being used at VM level (hardware ->processors ->type)

Exactly, and the default "kvm64" is a CPU type with a reduced CPU feature
(flag) set, so that, in theory, you can live-migrate between all 64-bit
amd64/x86_64 based host CPUs. You may still get into issues if you mix vendors
(AMD, Intel), but between hosts of the same vendor you normally really won't
have issues.

(Inter-vendor migration may work too, but it causes trouble far more often,
it seems.)
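
As an illustration (the VMID is made up), the CPU type can be set explicitly
with:

# qm set 100 --cpu kvm64

or via the web UI under the VM's Hardware -> Processors -> Type, as Roland
mentioned.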

cheers,
Thomas

> 
> roland
> 
> 
> 
> 
> Am 26.04.19 um 14:05 schrieb Craig Jones:
>> Hello,
>>
>> To my understanding, in the vSphere world, a cluster with hosts of mixed
>> CPU frequencies and generations (let's assume consistent manufacturer)
>> will run at the lowest common denominator. In other words, if you have 3
>> hosts each with CPU frequencies being 2.1 GHz, 2.3 GHz, and 2.5 GHz
>> respectively, the entire cluster will run at a 2.1 GHz level. Is this
>> the same in the Proxmox world or what would be the equivalent behavior
>> for a Proxmox cluster?
>>
>> Thanks,
>> Craig
>>
>>



Re: [PVE-User] Cluster mixed hardware behavior (CPUs)

2019-04-26 Thread Thomas Lamprecht
Am 4/26/19 um 2:05 PM schrieb Craig Jones:
> Hello,
> 
> To my understanding, in the vSphere world, a cluster with hosts of mixed 
> CPU frequencies and generations (let's assume consistent manufacturer) 
> will run at the lowest common denominator. In other words, if you have 3 
> hosts each with CPU frequencies being 2.1 GHz, 2.3 GHz, and 2.5 GHz 
> respectively, the entire cluster will run at a 2.1 GHz level. Is this 
> the same in the Proxmox world or what would be the equivalent behavior 
> for a Proxmox cluster?
> 

No, Proxmox VE does not have the GHz logic VMware products have.

You add cores to the VM and the VM can use them. If you want to reduce
total CPU usage you can set CPU limits; with this you could add 4 cores
to a VM but limit it to a maximum of 200% (which could be all 4 cores at
50%, or 2 at 100%, ...).

see:
https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_cpu
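
As an illustration of the above (the VMID is made up):

# qm set 101 --cores 4 --cpulimit 2

gives the VM 4 cores but caps it at two cores worth of CPU time, i.e. 200%.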

For this, and various other reasons, we recommend homogeneous nodes in
a cluster; while differences to a certain degree do not really matter,
a very asymmetrical setup (e.g., a 10-year-old quad-core and a new
48-core CPU) does not make things easier.

cheers,
Thomas

> Thanks,
> Craig
> 
> 
> 




Re: [PVE-User] Windows Server 2003

2019-04-25 Thread Thomas Lamprecht
Am 4/24/19 um 8:13 PM schrieb David Lawley:
> I did that as part of the migration
> 

and the guest agent works? i.e., things like
# qm guest cmd VMID get-osinfo

also the guest config could be interesting:
# qm config VMID


> Serial driver?  Don't have have any odd devices showing up in the device list
> 
> 
> 
> On 4/24/2019 2:02 PM, Mark Adams wrote:
>> Haven't tried this myself, but have you updated the qemu-agent and serial
>> driver to check it's not that?
>>
>> On Wed, 24 Apr 2019, 18:59 David Lawley,  wrote:
>>
>>> I know, its an oldie, but.. Windows Server 2003
>>>
>>> But since moving it to PVE 5.4 (from3.4) its does not reboot/restart on
>>> its own.  You can select a restart, but it does not. You have to start
>>> it manually.
>>>
>>> Migration from 3.4 was done via backup then restore into 5.4
>>>
>>> Poking around, but if someone has crossed this path before me I would
>>> appreciate a nudge in the right direction
>>>
>>> Thanks a bunch



Re: [PVE-User] API users

2019-04-24 Thread Thomas Lamprecht
Am 4/24/19 um 1:26 PM schrieb Mark Schouten:
> 
> The goal would indeed be to be able to limit the less secured users to 
> specific source addresses. At this moment, we managed to limit API-calls by 
> looking for the X-requested-by header, combined with the API URL with an 
> exclude for novnc, but the user is still able to login to the web frontend.
> 
> API-users and API-client-addresses would be the best fix, if you ask me.

This sounds legitimate and would be the easiest solution providing a
"real" fix for this, and at the moment I cannot think of an easy workaround
achieving something like it. Could you please open an "enhancement" request at
https://bugzilla.proxmox.com/ ? It probably won't be seen as too high
priority, but it shouldn't be too hard either, once one really thinks about
what makes sense (black/whitelist? per realm or per user, ...?).

cheers,
Thomas

> But since the GUI just uses the API, I guess that is more difficult than 
> you'd expect. :/
> 
> --
> 
> Mark Schouten 
> 
> Tuxis, Ede, https://www.tuxis.nl
> 
> T: +31 318 200208 
>  
> 
> 
> 
> - Original Message -
> 
> 
> From: Thomas Lamprecht (t.lampre...@proxmox.com)
> Date: 24-04-2019 12:34
> To: PVE User List (pve-user@pve.proxmox.com), Mark Schouten (m...@tuxis.nl), 
> Dominik Csapak (d.csa...@proxmox.com)
> Subject: Re: [PVE-User] API users
> 
> 
> Am 4/24/19 um 12:19 PM schrieb Mark Schouten:
>>
>> Hi,
>>
>> Sorry, that doesn't answer my question. I want users that have 2FA to be 
>> able to use the GUI, and I want to be able to disallow the GUI for certain 
>> users. I know that the GUI just uses the API as a backend.
> 
> That's not possible, what's your use case for this? If one has API access he 
> can do everything you can do through WebUI anyway?
> 
> And even _if_ we would add some sort of "WebUI" lockout, the API user could 
> just setup pve-manager's WebUI part to point at the API backend endpoint and 
> use that one.
> Or the user could just create a own gui? So I think this is not really 
> dooable and does not fits at all with REST APIs... You just can't control the 
> frontend there...
> 
> If you want to make internal API users more secure you can choose a random, 
> very big (e.g. 64 chars) password for them and be done, nobody will guess 
> that and the user name in a realistic time with the 3 seconds block on wrong 
> login?
> 
> What could _maybe_ make sense is to allow to restrict logins from certain 
> (sub)networks only, so that internal users are not exposed to less trusted 
> networks...
> 
>>
>> By 'do not allow access to /', do you mean for the user, or at a HTTP-level? 
>> Because at HTTP-level, that would completely disable the GUI, which you 
>> obviously don't want. Or do you mean in the permissions for the user?
>>
>> Thanks,
>>
>> --
>>
>> Mark Schouten 
>>
>> Tuxis, Ede, https://www.tuxis.nl
>>
>> T: +31 318 200208 
>>  
>>
>>
>>
>> - Originele bericht -
>>
>>
>> Van: Dominik Csapak (d.csa...@proxmox.com)
>> Datum: 24-04-2019 12:08
>> Naar: PVE User List (pve-user@pve.proxmox.com), Mark Schouten (m...@tuxis.nl)
>> Onderwerp: Re: [PVE-User] API users
>>
>>
>> On 4/24/19 11:54 AM, Mark Schouten wrote:
>>>
>>> Hi,
>>>
>>> we want all users to authenticate using 2FA, but we also want to use the 
>>> API externally, and 2FA with the API is quite difficult.
>>>
>>> In the latest version, you can enable 2FA per user, but you cannot disable 
>>> GUI access for e.g. API users. So a API user can just login without 2FA. Is 
>>> there a way to enable 2FA, and disable the GUI for users without 2FA? 
>>> Perhaps by revoking a rolepermission?
>>>
>>
>> Hi,
>>
>> The GUI and TFA are two independent things. The GUI uses the API in the
>> same way as any external api client would use it (via ajax calls).
>> If you want to disable just the gui, simply do not allow access to '/'
>> via a reverse proxy or something similar.
>>
>> If you want to enforce TFA, you have to enable it on the realm, then it
>> is enforced for all users of that realm
>>
>> The per user TFA is to enable single users to enhance the security of
>> their account, not to enforce using them.
>>
>> hope this answers your question
>>
>>
>>
>> ___
>> pve-user mailing list
>> pve-user@pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
> 
> 
> 
> 


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] API users

2019-04-24 Thread Thomas Lamprecht
Am 4/24/19 um 12:19 PM schrieb Mark Schouten:
> 
> Hi,
> 
> Sorry, that doesn't answer my question. I want users that have 2FA to be able 
> to use the GUI, and I want to be able to disallow the GUI for certain users. 
> I know that the GUI just uses the API as a backend.

That's not possible - what's your use case for this? Anyone with API access 
can do everything they could do through the WebUI anyway.

And even _if_ we added some sort of "WebUI" lockout, the API user could just 
set up pve-manager's WebUI part to point at the API backend endpoint and use 
that one.
Or the user could just create their own GUI. So I think this is not really 
doable and does not fit at all with REST APIs... You just can't control the 
frontend there...

If you want to make internal API users more secure you can choose a random, 
very long (e.g. 64 chars) password for them and be done; nobody will guess 
both that and the user name in a realistic time, given the 3 second block on 
wrong logins.
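As a rough sketch (the user name is just a placeholder):

# openssl rand -base64 48

gives you a ~64 character random string, which you can then set for the API
user with:

# pveum passwd automation@pve

(pveum prompts for the password, so it does not end up in your shell history).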

What could _maybe_ make sense is to allow restricting logins to certain 
(sub)networks only, so that internal users are not exposed to less trusted 
networks...

> 
> By 'do not allow access to /', do you mean for the user, or at a HTTP-level? 
> Because at HTTP-level, that would completely disable the GUI, which you 
> obviously don't want. Or do you mean in the permissions for the user?
> 
> Thanks,
> 
> --
> 
> Mark Schouten 
> 
> Tuxis, Ede, https://www.tuxis.nl
> 
> T: +31 318 200208 
>  
> 
> 
> 
> - Originele bericht -
> 
> 
> Van: Dominik Csapak (d.csa...@proxmox.com)
> Datum: 24-04-2019 12:08
> Naar: PVE User List (pve-user@pve.proxmox.com), Mark Schouten (m...@tuxis.nl)
> Onderwerp: Re: [PVE-User] API users
> 
> 
> On 4/24/19 11:54 AM, Mark Schouten wrote:
>>
>> Hi,
>>
>> we want all users to authenticate using 2FA, but we also want to use the API 
>> externally, and 2FA with the API is quite difficult.
>>
>> In the latest version, you can enable 2FA per user, but you cannot disable 
>> GUI access for e.g. API users. So a API user can just login without 2FA. Is 
>> there a way to enable 2FA, and disable the GUI for users without 2FA? 
>> Perhaps by revoking a rolepermission?
>>
> 
> Hi,
> 
> The GUI and TFA are two independent things. The GUI uses the API in the
> same way as any external api client would use it (via ajax calls).
> If you want to disable just the gui, simply do not allow access to '/'
> via a reverse proxy or something similar.
> 
> If you want to enforce TFA, you have to enable it on the realm, then it
> is enforced for all users of that realm
> 
> The per user TFA is to enable single users to enhance the security of
> their account, not to enforce using them.
> 
> hope this answers your question
> 
> 
> 
> ___
> pve-user mailing list
> pve-user@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> 


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Request to update wiki, risk of dataloss

2019-04-12 Thread Thomas Lamprecht
Hi,

On 4/12/19 4:41 PM, Mark Schouten wrote:
> Hi,
> 
> I'm in the process of upgrading some older 4.x clusters with Ceph to current 
> versions. All goes well, but we hit a bug that is understandable, but 
> undocumented. To prevent others from hitting it, I think it would be wise to 
> document to issue.
> 
> It is when you already upgraded Ceph to Luminous and not yet Proxmox to 5.x. 
> Resizing a disk makes Proxmox request the current disk size. Because Luminous 
> says 'GiB' or 'MiB' and older versions say 'GB' or 'MB', $size = 
> 0+$increaseresizesize. A customer of mine resized their disk from 50GB to 
> 20GB because of this issue.
> 
> So maybe a warning on https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous 
> and/or https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0 is possible?
> 

I added one here:
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous#Check_cluster_status_and_adjust_settings
at the bottom; hope this makes it clearer.

Thanks for the suggestion!

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Proxmox VE 5.4 released!

2019-04-11 Thread Thomas Lamprecht
On 4/11/19 2:47 PM, Uwe Sauter wrote:
> Thanks for all your effort. Two questions though:
> 
> 
> From the release notes:
> 
> HA improvements and added flexibility
> 
> It is now possible to set a datacenter wide HA policy which can change 
> the way guests are treated upon a Node shutdown or
> reboot. The choices are:
> freeze: always freeze services, independent of the shutdown type 
> (reboot, poweroff)
> failover: never freeze services, this means that a service will get 
> recovered to another node if possible and if the
> current node does not come back up in the grace period of 1 minute.
> default: this is the current behavior, freeze on reboot but do not 
> freeze on poweroff
> 
> 
> This seems like an improvement but it seems I cannot find the place where 
> this can be configured. The documentation also seems to
> lack the necessary information. Can you point to the right docs?

Just quickly addressing this as I'm on the go: DC -> Options should have a new 
entry in the WebUI; see also: man datacenter.cfg
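As a rough sketch from memory (please double check the exact option name in
man datacenter.cfg), the entry could look like:

# cat /etc/pve/datacenter.cfg
ha: shutdown_policy=failover

which would select the 'failover' behaviour described in the release notes.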

> 
> Regarding failover: do you really think that 1 minute is enough? Most server 
> hardware boots slower, especially if there are a lot
> of disks to be discovered. Also, wouldn't actively migrating services to 
> other nodes before reboot/shutdown make more sense than
> relying on HA to discover that the service is down?
> 
> Regards,
> 
>   Uwe
> 
> Am 11.04.19 um 12:10 schrieb Martin Maurer:
>> Hi all!
>>
>> We are very pleased to announce the general availability of Proxmox VE 5.4.
>>
>> Built on Debian 9.8 (Stretch) and a specially modified Linux Kernel 4.15, 
>> this version of Proxmox VE introduces a new wizard for
>> installing Ceph storage via the user interface, and brings enhanced 
>> flexibility with HA clustering, hibernation support for
>> virtual machines, and support for Universal Second Factor (U2F) 
>> authentication.
>>
>> The new features of Proxmox VE 5.4 focus on usability and simple management 
>> of the software-defined infrastructure as well as on
>> security management.
>>
>> Countless bugfixes and more smaller improvements are listed in the release 
>> notes.
>>
>> Release notes
>> https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_5.4
>>
>> Video tutorial
>> What's new in Proxmox VE 5.4? 
>> https://www.proxmox.com/en/training/video-tutorials
>>
>> Download
>> https://www.proxmox.com/en/downloads
>> Alternate ISO download:
>> http://download.proxmox.com/iso/
>>
>> Documentation
>> https://pve.proxmox.com/pve-docs/
>>
>> Source Code
>> https://git.proxmox.com
>>
>> Bugtracker
>> https://bugzilla.proxmox.com
>>
>> FAQ
>> Q: Can I install Proxmox VE 5.4 on top of Debian Stretch?
>> A: Yes, see https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Stretch
>>
>> Q: Can I upgrade Proxmox VE 5.x to 5.4 with apt?
>> A: Yes, just via GUI or via CLI with apt update && apt dist-upgrade
>>
>> Q: Can I upgrade Proxmox VE 4.x to 5.4 with apt dist-upgrade?
>> A: Yes, see https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0. If you 
>> Ceph on V4.x please also check
>> https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous. Please note, Proxmox VE 
>> 4.x is already end of support since June 2018, see
>> Proxmox VE Support Lifecycle 
>> https://forum.proxmox.com/threads/proxmox-ve-support-lifecycle.35755/
>>
>> Many THANKS to our active community for all your feedback, testing, bug 
>> reporting and patch submitting!
>>
> 
> ___
> pve-user mailing list
> pve-user@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> 


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] What is the purpose of second gluster server in storage config?

2019-04-06 Thread Thomas Lamprecht
On 4/6/19 8:39 AM, Igor Podlesny wrote:
> -- Beyond of the obvious "well, it's for redundancy". That's obvious..
> but "What subsystems and under what circumstances are gonna use it?"
> -- isn't at all.
> 
> I have strong suspicion that qemu-kvm isn't capable of fail-over
> switching in case its primary gluster server went down. At least
> preliminary tests showed that on single node's (that is a part of
> replicated gluster cluster setup) failure VM's disks are gone
> (effectively are kinda pulled off for VMs).
> 
> Second server's IP isn't even seen among arguments of kvm process.
> 
> Moreover, /proc/mounts doesn't even contain any mention of
> backup-volfile-servers or similar options.
> 


Quite a while ago QEMU only supported a single server when passing a
GlusterFS backed volume, thus Proxmox VE probes both configured servers
on the relevant GlusterFS ports and passes the first one available.

Then, some QEMU releases ago, support for passing multiple GlusterFS
servers was added to QEMU, which it can now use to fail over.
Nobody changed our QEMU interface to adapt to that new, even better
possibility...

Maybe it came from the fact that the more scalable and better integrated
Ceph was already on the plate and GlusterFS usage in PVE declined a bit,
or maybe it just wasn't noticed by GlusterFS users, so nobody stepped up or
made much noise about changing that. Anyway, I'd be open to patches regarding
this.

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Boot disk corruption after Ceph OSD destroy with cleanup

2019-03-26 Thread Thomas Lamprecht
On 3/22/19 3:17 PM, Eneko Lacunza wrote:
> Hi Alwin,
> 
> El 22/3/19 a las 15:04, Alwin Antreich escribió:
>> On Fri,On a point release, a ISO is generated and the release info is needed
>> On a point release, a ISO is generated and the release info is needed
>> for that.
>>
>> The volume of package updates alone makes a separate announcment of
>> changes sensless. The changelog shows what changed from one version to
>> the other and with an 'apt update' and 'apt list --upgradable' one can
>> see what packages have upgrades. And if needed, with a little bit of
>> shell scripting you can get all the changelogs directly from the repo
>> server.
> It was just a suggestion. I suppose it's just fine to leave server-destroying 
> bugs fixed and unannounced to your users :)
> 

You already get emails if upgrades are available for your server, at least
if you correctly configured an email address during installation, or after
installation for the root@pam user in DC -> Users tab.

We provide the channels to get this information and even actively notify about
new updates being available. On grave issues which affect all or most users we
also make additional posts over our various channels (e.g., the apt transport
bug, meltdown/spectre, ...).

So no, it wasn't unannounced, it's documented publicly in our changelog and
bugzilla, as Alwin mentioned, and if you configured the servers correctly
you got an email about pending updates.

Looking at our full stack as a complex Linux distribution, multiple bugs
(including security and logic flaws) come to light every week. Depending on
your setup, the specific technologies you use, and the environment your servers
are exposed to (e.g., public internet vs. contained LAN), a lot of them may be
potentially server destroying if you include take-over possibilities, the fact
that not all admins can trust the VMs and CTs running on their system, and
simple logic flaws, be it in our own stack or an upstream component we use.

Making a separate announcement would then effectively be a mirror of the
changelog (which is already there), as quite a few package releases may
include a fix which is relevant for a certain set of setups. And there would
be so many that one would be hard pressed to read and remember them all; also,
the real big fish would then have a higher chance of going unnoticed. It is
easier to just upgrade once packages are released, an event you already get
notified about.

So while I understand your pain here, I'd rather have users update frequently,
as all updates are important, and have us use the time to fix more bugs and add
features, than write announcements for every update, information which is
already indirectly available to read.

cheers,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] why ;-]

2019-03-26 Thread Thomas Lamprecht
On 3/26/19 8:09 AM, lord_Niedzwiedz wrote:
> root@ave:~# apt upgrade
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> Calculating upgrade... Done
> The following packages have been kept back:
>   zfs-initramfs zfsutils-linux
> 0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
> 
> 
> Proxmox, why ? ;)
> 
> Why ??!!  ;-D

See:
https://pve.proxmox.com/pipermail/pve-user/2019-March/170505.html
https://forum.proxmox.com/threads/upgrade-to-kernel-4-15-18-12-pve-goes-boom.52564/#post-24

There's context about the issue and steps to resolve it.

cheers,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] PSA - PVE 4.15.18-35

2019-03-20 Thread Thomas Lamprecht
On 3/20/19 3:28 PM, DL Ford wrote:
> I am not sure if this will effect everyone or if something really strange 
> just happened to my system, but after upgrading to PVE 4.15.18-35, all of my 
> network name assignments have gone back to the old style (e.g. in my case 
> enp4s0 is now eth0, enp6s0 is now eth1, etc).
> 
> Hopefully I will save some of you the time I had to spend diagnosing this 
> issue, you just have manually edit /etc/network/interfaces and change the 
> names of the physical NICs. As far as I can tell Proxmox has picked up the 
> changes in the GUI on it's own.
> 

can you post your latest apt actions, e.g., /var/log/apt/history.log (add .1 if 
it already rotated).

I suspect that you ran into an issue[0] and now are not running systemd 
anymore, but openRC or sysV-init...

Can you try:


# apt -f install
# apt install --reinstall systemd
# apt purge insserv sysv-rc initscripts openrc
# apt update && apt full-upgrade


[0]: 
https://forum.proxmox.com/threads/upgrade-to-kernel-4-15-18-12-pve-goes-boom.52564/#post-24

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] upgrade 2.3 to 3.0

2019-03-08 Thread Thomas Lamprecht
On 3/7/19 7:42 PM, David Lawley wrote:
> sorry brain faart
> 
> I'm on 3,x, more work.
> 
> I think the last time I researched this I just decided it was time for a 
> refresh anyway

Surely the easiest and cleanest way; then you can also go straight to a
fresh PVE 5.X installation.

> 
> On 3/7/2019 1:37 PM, David Lawley wrote:
>> I got a few old servers that I'm thinking about pushing an update
>>
>>
>> Is the pve-upgrade-2.3-to-3.0  script still valid?
>>


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Yubico doesn't work anymore

2019-03-01 Thread Thomas Lamprecht
Hi,

On 3/1/19 11:09 AM, Patrick Westenberg wrote:
> Hi everyone,
> 
> I configured PAM authentication to use yubico but I can't login anymore.
> 
> Mar  1 11:02:23 pve01 pvedaemon[4917]: authentication failure;
> rhost=172.31.0.1 user=root@pam msg=Invalid response from server: 410 Gone
> 
> Is it possible, that the proxmox stuff didn't update their
> implementation as Yubico deactivated deprecated ciphers and non-secured
> traffic?
> 
> https://status.yubico.com/2018/11/26/deprecating-yubicloud-v1-protocol-plain-text-requests-and-old-tls-versions/
> 
> proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
> pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
> pve-kernel-4.15: 5.3-2
> pve-kernel-4.15.18-11-pve: 4.15.18-33
> pve-kernel-4.15.18-3-pve: 4.15.18-22
> pve-kernel-4.15.17-1-pve: 4.15.17-9
> corosync: 2.4.4-pve1
> criu: 2.11.1-1~bpo90
> gfs2-utils: 3.1.9-2
> glusterfs-client: 3.8.8-1
> ksm-control-daemon: 1.2-2
> libjs-extjs: 6.0.1-2
> libpve-access-control: 5.1-3
> libpve-apiclient-perl: 2.0-5
> libpve-common-perl: 5.0-46

that's the issue, it should work again with pve-common in version
5.0-47 (or newer) which includes:
https://git.proxmox.com/?p=pve-common.git;a=commit;h=3b3ae60e0934a74b7cc34634740e720d574de3e2

> libpve-guest-common-perl: 2.0-20
> libpve-http-server-perl: 2.0-11
> libpve-storage-perl: 5.0-38
> libqb0: 1.0.3-1~bpo9
> lvm2: 2.02.168-pve6
> lxc-pve: 3.1.0-3
> lxcfs: 3.0.3-pve1
> novnc-pve: 1.0.0-2
> openvswitch-switch: 2.7.0-3
> proxmox-widget-toolkit: 1.0-22
> pve-cluster: 5.0-33
> pve-container: 2.0-34
> pve-docs: 5.3-2
> pve-edk2-firmware: 1.20181023-1
> pve-firewall: 3.0-17
> pve-firmware: 2.0-6
> pve-ha-manager: 2.0-6
> pve-i18n: 1.0-9
> pve-libspice-server1: 0.14.1-2
> pve-qemu-kvm: 2.12.1-1
> pve-xtermjs: 3.10.1-1
> qemu-server: 5.0-46
> smartmontools: 6.5+svn4324-1
> spiceterm: 3.0-5
> vncterm: 1.5-3
> 
> 
> Regards
> Patrick
> ___
> pve-user mailing list
> pve-user@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> 


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Shared storage recommendations

2019-02-26 Thread Thomas Lamprecht
Hi,

On 2/25/19 6:22 PM, Frederic Van Espen wrote:
> Hi,
> 
> We're designing a new datacenter network where we will run proxmox nodes on
> about 30 servers. Of course, shared storage is a part of the design.
> 
> What kind of shared storage would anyone recommend based on their
> experience and what kind of network equipment would be essential in that
> design? Let's assume for a bit that budget is not constrained too much. We
> should be able to afford a vendor specific iSCSI device, or be able to
> implement an open source solution like Ceph.
> 
> Concerning storage space and IOPS requirements, we're very modest in the
> current setup (about 13TB storage space used, very modest IOPS, about 6500
> write IOPS and 4200 read IOPS currently distributed in the whole network
> according to the prometheus monitoring).
> 
> Key in the whole setup is day to day maintainability and scalability.

I'd use Ceph then. Scalability is something Ceph is just made for, and
maintainability is also really not too bad, IMO. You can put CTs and VMs on
normal block devices (RBD) and also have a file based shared FS (CephFS), both
well integrated into the PVE frontend/backend, which other shared storage
systems aren't.

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] CVE-2019-8912

2019-02-25 Thread Thomas Lamprecht
On 2/25/19 6:03 PM, José Manuel Giner wrote:
> According to this link, Proxmox VE 5 is affected.
> 
> https://www.cloudlinux.com/cloudlinux-os-blog/entry/major-9-8-vulnerability-affects-multiple-linux-kernels-cve-2019-8912-af-alg-release
> 
> We have a patch?
> 

Ah yeah, the hyped CVE ^^ But yes, a kernel with a fix[0] for this use-after-free
is available with pve-kernel-4.15.18-11-pve in version 4.15.18-34[1]. At the time
of writing it's only in the pvetest repository; as the update is not too big it
will come to the other repos (no-subscription and enterprise) probably this week,
or early next week if no regression emerges.

cheers,
Thomas

[0]: 
https://git.proxmox.com/?p=pve-kernel.git;a=commitdiff;h=cf6ea5cf3482781a5e93bb88f526c821bba7ca0d
[1]: 
https://git.proxmox.com/?p=pve-kernel.git;a=commitdiff;h=9bd09ca97abb37c24e3b0fe50e31d8fdf6f59ea5


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] APT CVE-2019-3462 (please read before upgrading!)

2019-01-25 Thread Thomas Lamprecht
On 1/23/19 10:27 AM, Fabian Grünbichler wrote:
> The APT package manager used by Proxmox VE and Proxmox Mail Gateway was
> recently discovered to be affected by CVE-2019-3462, allowing a
> Man-In-The-Middle or malicious mirror server to execute arbitrary code
> with root privileges when affected systems attempt to install upgrades.
> 
> To securely upgrade your systems, run the following commands as root:
> 
> # apt -o Acquire::http::AllowRedirect=false update
> # apt -o Acquire::http::AllowRedirect=false full-upgrade
> 
> and verify that apt is now at least version 1.4.9 on Debian Stretch:
> 
> $ apt -v
> apt 1.4.9 (amd64)
> 
> Please see the Debian Security Advisory for details:
> https://www.debian.org/security/2019/dsa-4371
> 

To allow you to install Proxmox VE with a package management system version not
affected by this issue, we additionally released a new Proxmox VE 5.3 ISO
containing the fix for CVE-2019-3462 and all other security fixes since the
first 5.3 ISO. Get it from:

https://www.proxmox.com/en/downloads/category/iso-images-pve
http://download.proxmox.com/iso/proxmox-ve_5.3-2.iso

All container templates based on apt (Debian and Ubuntu) were also updated
yesterday.

cheers,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Kill LXC...

2019-01-23 Thread Thomas Lamprecht
Hi,

On 1/23/19 7:03 PM, Gilberto Nunes wrote:
> I am facing some trouble with 2 LXC that cannot access either kill it.
> Already try lxc-stop -k -n  but no effect.
> Any advice will be welcome...

Does it have processes in the "D" (uninterruptible) state?
Probably because of some network mount where it has IO pending.

In this case, where whatever storage device/mount does not come back to
complete the IO, your only way out may be a reboot...

You could check with ps either on the host (lots of processes) or from
inside the container, e.g. with:
# pct exec VMID ps faux
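If you only want to see the uninterruptible ones on the host, something like
this works as well:

# ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

(it lists the PID, the state, the kernel function the process is sleeping in,
and the command line).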

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Is there a way to permit to start up VMs when no quorum?

2019-01-07 Thread Thomas Lamprecht
On 1/7/19 7:39 PM, Denis Morejon wrote:
> Could you give me an example please?

Dietmar did already; research "split brain".

>> In practice, I know a lot of people that are afraid of building a cluster 
>> because of the lost of quorum, an have a plain html web page with the url of 
>> each node instead. And this is sad. This is like assuming that the most 
>> important thing is to have the VMs UP!

No, while important, it really isn't the *most* important thing.
The most important thing is no (data/shared resource) corruption of the VM/CT,
and our quorum mechanisms are one part of the machinery to keep it that way.

If your network is stable and multicast works (on smaller systems, or medium
ones with _really fast_ switches, unicast works just as well), quorum really
should not be a problem.

And as really a lot of our users have clusters configured I do not think that
"a lot of people are afraid of using them".

> 
> El 7/1/19 a las 13:23, Dietmar Maurer escribió:
>>> I don't know the idea behind keeping a VM from starting up when no
>>> quorum. It has been maybe, since my point of view, the worst of managing
>>> Proxmox cluster, because the stability of services (VM up and running)
>>> had to be first (before the sync of information, for instance).
>>>
>>> Is there a way to bypass this and permit to start up a VM even on no quorum?
>> No. This is required to avoid split brain ...
>>
>>


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Container problem

2018-12-04 Thread Thomas Lamprecht
On 12/4/18 10:27 AM, lord_Niedzwiedz wrote:
> root@hayne:~# systemctl start pve-container@108
> Job for pve-container@108.service failed because the control process exited 
> with error code.
> See "systemctl status pve-container@108.service" and "journalctl -xe" for 
> details.
>
> root@hayne:~# systemctl status pve-container@108.service
> ● pve-container@108.service - PVE LXC Container: 108
>Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor 
> preset: enabled)
>Active: failed (Result: exit-code) since Tue 2018-12-04 10:25:45 CET; 12s 
> ago
>  Docs: man:lxc-start
>man:lxc
>man:pct
>   Process: 9268 ExecStart=/usr/bin/lxc-start -n 108 (code=exited, 
> status=1/FAILURE)
>
> Dec 04 10:25:44 hayne systemd[1]: Starting PVE LXC Container: 108...
> Dec 04 10:25:45 hayne systemd[1]: pve-container@108.service: Control process 
> exited, code=exited status=1
> Dec 04 10:25:45 hayne systemd[1]: Failed to start PVE LXC Container: 108.
> Dec 04 10:25:45 hayne systemd[1]: pve-container@108.service: Unit entered 
> failed state.
> Dec 04 10:25:45 hayne systemd[1]: pve-container@108.service: Failed with 
> result 'exit-code'.
>


How about at least some minimal context and more telling logs? ^^

# lxc-start -n 108 -l DEBUG -o ct108-start.log

optionally add the "-F" flag to start the CT in foreground..

cheers,
Thomas



___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Cluster network via directly connected interfaces?

2018-11-22 Thread Thomas Lamprecht
On 11/22/18 7:29 PM, Frank Thommen wrote:
> Please excuse, if this is too basic, but after reading 
> https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the 
> cluster/corosync network could be built by directly connected network 
> interfaces.  I.e not like this:
> 
>  +---+
>  | pve01 |--+
>  +---+  |
>     |
>  +---+ ++
>  | pve02 |-| network switch |
>  +---+ ++
>     |
>  +---+  |
>  | pve03 |--+
>  +---+
> 
> 
> but like this:
> 
>  +---+
>  | pve01 |---+
>  +---+   |
>  |   |
>  +---+   |
>  | pve02 |   |
>  +---+   |
>  |   |
>  +---+   |
>  | pve03 |---+
>  +---+
> 
> (all connections 1Gbit, there are currently not plans to extend over three 
> nodes)
> 
> I can't see any drawback in that solution.  It would remove one layer of 
> hardware dependency and potential spof (the switch).  If we don't trust the 
> interfaces, we might be able to configure a second network with the three 
> remaining interfaces.
> 
> Is such a "direct-connection" topology feasible?  Recommended? Strictly not 
> recommended?

Full mesh is certainly not bad. For the cluster network (corosync) latency is
the key; bandwidth isn't really needed much. So this is surely not bad.
We also use some 10G (or 40G, not sure) full mesh for a Ceph cluster network - you
save on a not-too-cheap switch and get full bandwidth and good latency.
The limiting factor is that this gets quite complex for bigger clusters, but besides
that it doesn't really have any drawbacks for inter-cluster connects, AFAICT.

For multicast you need to try it out; as Uwe said, I'm currently not sure. It could
work, as Linux can route multicast just fine (mrouter), but I don't remember exactly
anymore - sorry.

But if you try it, it'd be great if you report back. Otherwise unicast is always an
option at those cluster sizes - you really shouldn't have a problem as long as you do
not put storage traffic together with corosync (cluster) on the same net (corosync
gets too many latency spikes then).

> 
> I am currently just planning and thinking and there is no cluster (or even a 
> PROXMOX server) in place.
> 
> Cheers
> frank




___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Request for backport of Ceph bugfix from 12.2.9

2018-11-08 Thread Thomas Lamprecht
On 11/8/18 1:43 PM, Alwin Antreich wrote:
> On Wed, Nov 07, 2018 at 09:01:09PM +0100, Uwe Sauter wrote:
>> This is a bug in 12.2.8 [1] and has been fixed in this PR [2].
>>
>> Would it be possible to get this backported as it is not recommended to 
>> upgrade to 12.2.9?
> Possible yes, but it looks like that the Ceph version 12.2.10 may be
> soon released, including this fix.
> https://www.spinics.net/lists/ceph-users/msg49112.html
> 
> For now I would wait with backporting, as we would need to test a
> backported 12.2.8 as well as we will with a new 12.2.10.

This is a minimal proposed change which looks just right, so much testing
may not be needed, i.e., just test the changed part once - at least if it
applies cleanly.

But as a Ceph rollout is quite a bit of work, besides that I agree with
Alwin that it probably makes sense to wait for the soon arriving 12.2.10.

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Quick question regarding node removal

2018-11-06 Thread Thomas Lamprecht
Hi,

On 11/6/18 12:56 PM, Uwe Sauter wrote:
> Hi,
> 
> in the documentation to pvecm [1] it says:
> 
> 
> At this point you must power off hp4 and make sure that it will not power on 
> again (in the network) as it is.
> Important:
> As said above, it is critical to power off the node before removal, and make 
> sure that it will never power on again (in the
> existing cluster network) as it is. If you power on the node as it is, your 
> cluster will be screwed up and it could be difficult
> to restore a clean cluster state.
> 
> 
> Am I right to assume that this is due to the configuration on the node which 
> is to be removed? If I reinstall that node I can
> reuse hostname and IP addresses?

Yes, exactly. It's mostly for the reason that the removed node still thinks
that it is part of the cluster and still has access to the cluster
communication (through '/etc/corosync/authkey').

So re-installing works fine.

You could also separate it without re-installing, see:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_separate_node_without_reinstall

Here I recommend that you test this first (if the target is anything
production related) - e.g. in a virtual PVE cluster (PVE in VMs).

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Changing votes and quorum

2018-10-30 Thread Thomas Lamprecht
Hi!

Am 10/29/2018 um 05:36 PM schrieb Dewangga Alam:
> On 29/10/18 16.14, Thomas Lamprecht wrote:
>> Am 10/28/2018 um 02:54 PM schrieb Dewangga Alam: Hello!
>>
>> I was new in proxmox and am trying to build large scale proxmox
>> 5.2 cluster (>128 nodes). My `/etc/pve/corosync.conf` configuration
>> like :
>>
>> ``` nodelist { node { name: node1 nodeid: 1 quorum_votes: 1 
>> ring0_addr: 192.168.24.2 } ... skip till nodes 28 ... node { name:
>> node28 nodeid: 28 quorum_votes: 1 ring0_addr: 192.168.24.110 } } 
>> quorum { provider: corosync_votequorum expected_votes: 16
>>
>>> expected_votes must be your real highest expected votes.
> 
> [..]
>>
>> last_man_standing: 1 last_man_standing_window: 2
>>
>>
>>
>> } totem { cluster_name: px-cluster1 config_version: 90 window_size:
>> 300 interface { ringnumber: 0 } ip_version: ipv4 secauth: on 
>> transport: udpu version: 2 } ```
>>
>> My cluster have 28 nodes in each rack. and total nodes will 28
>> nodes*5 racks. So it will 140 nodes in a cluster. From the
>> adjustment above, I wonder there's affected to pve/corosync.conf
>> configuration, but in fact, it didn't.
>>
>> So my basic question, when I invoke `pvecm status` in a node, the 
>> result wasn't as I expect. Then, is it possible to change
>> votequorum configuration?
>>
>>> What wasn't as expected? That your set expected_votes is not 
>>> "accepted" by corosync? That is expected behaviour.
>>
>>> What is the real problem you want to solve?
> 
> I want to build > 32 nodes in one cluster,

cool!

> and I am expect that
> expected_votes can be controlled lower than real votes. I thought, it
> should be make a quorum if the 50%+1 formulas aren't met.

No, that isn't needed. Highest expected can not be smaller than the
actual number of online quorate nodes. That's like saying you have an
election in your small country with 10 people (and thus 10 expected votes)
but you receive (as an example) 16 votes - something is fishy. Corosync here
just thinks that the census (in this case you) was wrong and uses the
higher number.

Although, if you set expected_votes and have a higher total node count,
but only as many nodes as (or fewer than) your manually set expected_votes
are online/quorate, then you will see that it stays at your set number
and you actually can have quorum with "fewer" nodes. But as soon as more
nodes come online, corosync's expected vote number will rise.

When enabling last_man_standing, the highest expected vote count also
scales down: if some nodes go down (or lose cluster communication for
another reason) but there are enough nodes left to have a working, quorate
cluster, then after a specified time window - when all stays working -
corosync recalculates its expected votes and you can then lose additional
nodes.
Hope that helps a bit.

cheers,
Thomas

> 
> Is it a best practice?
> 


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Changing votes and quorum

2018-10-29 Thread Thomas Lamprecht
Hi!

Am 10/28/2018 um 02:54 PM schrieb Dewangga Alam:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> Hello!
> 
> I was new in proxmox and am trying to build large scale proxmox 5.2
> cluster (>128 nodes). My `/etc/pve/corosync.conf` configuration like :
> 
> ```
> nodelist {
>   node {
> name: node1
> nodeid: 1
> quorum_votes: 1
> ring0_addr: 192.168.24.2
>   }
>   ... skip till nodes 28 ...
>   node {
> name: node28
> nodeid: 28
> quorum_votes: 1
> ring0_addr: 192.168.24.110
>   }
> }
> quorum {
>   provider: corosync_votequorum
>   expected_votes: 16

expected_votes must be your real highest expected votes.

>   last_man_standing: 1
>   last_man_standing_window: 2



> }
> totem {
>   cluster_name: px-cluster1
>   config_version: 90
>   window_size: 300
>   interface {
> ringnumber: 0
>   }
>   ip_version: ipv4
>   secauth: on
>   transport: udpu
>   version: 2
> }
> ```
> 
> My cluster have 28 nodes in each rack. and total nodes will 28 nodes*5
> racks. So it will 140 nodes in a cluster. From the adjustment above, I
> wonder there's affected to pve/corosync.conf configuration, but in
> fact, it didn't.
> 
> So my basic question, when I invoke `pvecm status` in a node, the
> result wasn't as I expect. Then, is it possible to change votequorum
> configuration?

What wasn't as expected? That your set expected_votes is not
"accepted" by corosync? That is expected behaviour.

What is the real problem you want to solve?

> 
> 
> ```
> Quorum information
> - --
> Date: Sun Oct 28 20:51:42 2018
> Quorum provider:  corosync_votequorum
> Nodes:28
> Node ID:  0x0002
> Ring ID:  1/27668
> Quorate:  Yes
> 
> Votequorum information
> - --
> Expected votes:   28

if more nodes are online than you set expected it will automatically use
the real node count, i.e. formula would be

expected = max(user_set_expected, #nodes_quorate_online)

Note that last_man_standing is not really recommended by us. If you
employ it nonetheless, please test it carefully before rolling it out
in production; maybe also take a look at the wait_for_all flag to be
a bit on the safe side for cluster cold boots.

> Highest expected: 28
> Total votes:  28
> Quorum:   15
> Flags:Quorate LastManStanding
> ```
> -BEGIN PGP SIGNATURE-
> 
> iQIzBAEBCAAdFiEEZpxiw/Jg6pEte5xQ5X/SIKAozXAFAlvVv6UACgkQ5X/SIKAo
> zXBVdg/9GryQp4HbCmTYfHx1wlbESRlTsBViNCoKLCYcnbdYK9oZa6WCO34C5vHq
> RGSmvqT8CPk1exlXQYvHRNZwBJHzjZ8t5CtQXxOrW+SiIlSWcuW6iw+UKDrZASqI
> wwmJIy0Snu6GqP3Fb6OLfpU5rzGgHARQPBlSxAG7q8U7ZSzQIJ/bm7OnSc/R6Ghk
> Xv/GsCfYD3iDlkkGsUb0xG4f2X7o539OP4su5j+cYcjfKn/l+ffsi0P//hZn21He
> oltgUq6i9v6VJpwkc8rVIZ7/WI/yOqfiwV4OsYiJpASxxzvysJEKedLLp1fyNTuB
> a4D3JXy9caYurbEWGeApKnvJILtr2E1APDJWqXH82MYaY/HgNn97bkdvKnmaq/dD
> 3/TwaUUOn2Mk74Pw2WxqXHQp7heAZW1+O/pb6qtgIDIMsZRB4zZRddfbh6X7ySoi
> Ca8WIh9LkWhD5T1qZPhT2lFyWgLww1ZxCGbEKkVSH7MUsMG8+YwY9NwCTmFdiQoJ
> y9gNSMBJ4l8Fj+qyuvj0zd9Wmr3nW6/AsW+6edWUN9py0tKFxgOoqFTJDCn288AC
> ytl/a3V/iVDh21GnltkUdoUmZAaYIjty3CPLs3uTNqLaI/7c3FQAw8+13W1Oa1qR
> zrBdNa4fkljJw9ew1x5h4zW466HWb5wm1O/HAG9HmPKBQk9UH48=
> =wQxp
> -END PGP SIGNATURE-


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Ceph repository

2018-10-23 Thread Thomas Lamprecht
Hi,

On 10/22/18 5:29 PM, Eneko Lacunza wrote:
> El 22/10/18 a las 17:17, Eneko Lacunza escribió:
>>
>> I'm looking at Ceph Jewel to Luminuous wiki page as preparation for a PVE 4 
>> to 5 migration:
>> https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
>>
>> I see that after the procedure, there would be 2 repositories with ceph 
>> packages; the official ceph.com repo and the PVE repo.
>>
>> Is this necessary, or can we leave ceph.com repo commented?
>>
> This is addressed in the following wiki:
> https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0
> 
> Wouldn't be better just to migrate directly using Proxmox Ceph repository?

The thing is: you have to upgrade to Ceph Luminous *before* you upgrade from 
latest
PVE 4.4 (jessie based) to PVE 5.X (stretch based), and we do not provide 
luminous
packages for PVE 4.4 (jessie based). 

So, to solve this you need to upgrade to Ceph Luminous with the ceph.com jessie
repos, but you should ensure that you install a lower, known working version, e.g.
ceph=12.2.5. Then you can upgrade PVE, and after that you can switch out the
ceph.com repository for our stretch based one.
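Roughly, i.e.:

# apt-cache policy ceph

lists the exact version strings available, then something like

# apt install ceph=12.2.5-<revision>

(with the revision taken from the policy output, and related packages such as
ceph-osd/ceph-mon pinned to the same version) installs the known working
release. The version string above is only an example.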

If, for whatever reason, you don't do a dist-upgrade and already have PVE 5.X
installed, then you can directly use our Ceph repositories.

cheers,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] please help setup correctly proxmox cluster

2018-10-22 Thread Thomas Lamprecht
On 10/22/18 7:02 AM, Юрий Авдеев wrote:
> What I need: Two hosts (node1 and node2) with one virtual machine in 
> replication without shared storage. 
> If one of two hosts is dead - virtual machine will starts in other hosts. 
> Node3 is online, only for quorum, not for virt. 
> I using ZFS for storage on hosts. Please help me to understand, how I must 
> set up this thing correctly.
> Great thanks everybody!

Either just give one node more votes (preferred) or instead of starting
up a VM with PVE for quorum just execute `pvecm expected 1` if you are sure
the other node *is really* dead.
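For the first variant (two real nodes, no quorum VM), a rough sketch of the
relevant corosync.conf fragment (name and address are placeholders; edit the
config the usual way and bump config_version):

  node {
    name: node1
    nodeid: 1
    quorum_votes: 2
    ring0_addr: 192.168.1.1
  }

With two votes on node1 and one on node2, node1 stays quorate on its own if
node2 dies (but not the other way round).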

It's hard to say what's wrong without any logs. But running a PVE node as a VM
inside the cluster itself should never be done, at least not outside of testing
purposes - too many intertwined dependencies, it's just not a good, stable design.


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] I lost the cluster communication in a 10 nodes cluster

2018-10-15 Thread Thomas Lamprecht
On 10/12/18 6:57 PM, Denis Morejon wrote:
> The 10 nodes lost the communication with each other. And they were working 
> fine for a month. They all have version 5.1.
> 

any environment changes? E.g., switch change or software update
(which then could block multicast)?

Can you also check whether the omping tests still go through:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_cluster_network
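I.e., something along the lines of (node names are placeholders):

# omping -c 10000 -i 0.001 -F -q node1 node2 node3
# omping -c 600 -i 1 -q node1 node2 node3

run on all nodes at the same time; the second, longer run also shows whether
multicast stops working after some minutes (IGMP querier issues).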

> 
> All nodes have the same date/time and show a status like this:
> 
> root@proxmox11:~# pvecm status
> 
> Quorum information
> --
> Date: Fri Oct 12 11:55:59 2018
> Quorum provider:  corosync_votequorum
> Nodes:    1
> Node ID:  0x0007
> Ring ID:  7/60372
> Quorate:  No
> 
> Votequorum information
> --
> Expected votes:   10
> Highest expected: 10
> Total votes:  1
> Quorum:   6 Activity blocked
> Flags:
> 
> Membership information
> --
>     Nodeid  Votes Name
> 0x0007  1 192.168.80.11 (local)
> 
> 


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Proxmox usb automount

2018-10-04 Thread Thomas Lamprecht
On 10/4/18 9:22 AM, lord_Niedzwiedz wrote:
> root@hayneee:~# apt install pve5-usb-automount
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> E: Unable to locate package pve5-usb-automount
> 
>> apt install pve5-usb-automount
>>
>>
>>
>>
>> On Oct 3, 2018, at 05:16, lord_Niedzwiedz (sir_misi...@o2.pl) wrote:
>>
>>  Hi,
>>
>> How can I easily add auto-mounting of USB devices to Proxmox?
>>
>> example:
>> /dev/sde1    /media/sde1

Normally this is done by a desktop environment, which is not often used on
headless Proxmox VE setups.

You may want to take a look at udisks2[0][1]; there are some wikis[2]/tutorials
showing its usage.
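With udisks2 installed, mounting is then a one-liner, e.g. (the device name is
just an example):

# udisksctl mount -b /dev/sde1

which mounts it below /media or /run/media, depending on the setup.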

Playing around with udev/automountfs could also be an option.

[0]: https://www.freedesktop.org/wiki/Software/udisks/
[1]: https://packages.debian.org/en/stretch/udisks2
[2]: https://wiki.archlinux.org/index.php/udisks


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Virtio-nic or other

2018-09-20 Thread Thomas Lamprecht
Hi,

On 9/20/18 6:22 PM, Gilberto Nunes wrote:
> HI there
> 
> PVE 5.2
> CentOS guest with kernel 2.6.32
> 
> With is safer: virtio or realtek?

hard to say, but 2.6.32 has virtio-net support, and it's normally
faster, so I'd start there. If you still run into problems you
can always try realtek too. Normally CentOS ports a lot of fixes
back to their dusty old kernel, so virtio-net should really be worth
a shot.
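Switching the model is a single config change, roughly (VMID, MAC and bridge
are placeholders; keep the existing MAC so the guest does not see a new NIC):

# qm set <VMID> -net0 virtio=<existing MAC>,bridge=vmbr0

Use e1000 or rtl8139 instead of virtio to test the other models.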

> 
> I was using e1000 but get bug with that kernel...
> Virtio-nic has bug in this kernel too?
> 

Would help to specify which bug exactly :)

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


[PVE-User] Alpine Linux Templates update to address bugs in its package manager APK

2018-09-14 Thread Thomas Lamprecht
Hi all,

As you may have read[0], some bugs in the package manager APK in Alpine Linux 
surfaced. The most serious one allows Remote Code Execution (RCE) if the host 
suffers a Man-In-The-Middle attack.

To mitigate this please update your APK version to:
* Alpine Linux v3.5: 2.6.10
* Alpine Linux v3.6: 2.7.6
* Alpine Linux v3.7: 2.10.1
* Alpine Linux v3.8: 2.10.1

(or later).

We updated all our provided template images to a newer version including those 
fixes[1].
We also unlinked the problematic ones; this is something we normally don't do, 
as we usually only remove them from the index, but it seemed justified in this 
case.
So you will have to update the appliance info index manually (or wait till the
pve-daily-update.timer triggers and updates it automatically):

# pveam update

Then you should have an up to date index and will be able to download Alpine 
Linux
images again.

Upgrading an existing container:
If you mistrust your network you can download the package and verify its 
signature manually.
Either use 'apk fetch' and check the downloaded updates with 'apk verify', or 
download the package manually from a mirror.

From https://mirrors.alpinelinux.org/ select a mirror of your choice, ideally 
with
https, open it and navigate to your version and architecture, e.g.:
https://repository.fit.cvut.cz/mirrors/alpine/v3.8/main/x86_64/

search for 'apk-tools-static' and download the respective version, e.g.:

# wget 
https://repository.fit.cvut.cz/mirrors/alpine/v3.8/main/x86_64/apk-tools-2.10.1-static-r0.apk
# wget 
https://repository.fit.cvut.cz/mirrors/alpine/v3.8/main/x86_64/apk-tools-static-2.10.1-r0.apk

then verify manually:

# apk verify apk-tools-static-2.10.1-r0.apk

if all's OK you can install it:

# apk add ./apk-tools-static-2.10.1-r0.apk
(apk may still fetch indexes, but it installs from local)

A check could also be done by extracting the .apk in a tmp directory, e.g.:
# mkdir /tmp/apk
# tar xf apk-tools-static-2.10.1-r0.apk -C /tmp/apk

then verify its contents and signatures manually - this can also be done on 
another box, if you cannot trust the CT (currently) at all.

cheers,
Thomas

[0]: https://justi.cz/security/2018/09/13/alpine-apk-rce.html
[1]: 
https://git.alpinelinux.org/cgit/apk-tools/commit/?id=6484ed9849f03971eb48ee1fdc21a2f128247eb1

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Proxmox upgrade 4 - 5

2018-09-10 Thread Thomas Lamprecht
Hi,

On 9/10/18 10:49 AM, John Crisp wrote:
> I have been critical of some things in the past with Proxmox, so to be
> even handed I thought I'd just drop a note to say over the weekend I did
> 2 in place upgrades from v4 -> v5
> 
> Both went pretty well as smooth as silk, the only issue being that on
> one of them the old boot partition didn't have enough room for a new
> kernel and the dist-upgrade failed when trying to regenerate initramfs.
> (it was originally a V3 upgraded to v4) Slightly scary minutes.
> 
> Perhaps a note on the wiki to say check for space before upgrading older
> systems would be helpful (assuming you have an older system with this)
> 

Yes, it makes sense to note this! A bit of bad luck to run into this right on a
major upgrade...

added:
> ensure your /boot partition, if any, has enough space for a new kernel (min 
> 60MB) - e.g., by removing old unused kernels (see pveversion -v)
here: https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0#Preconditions
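A quick pre-check could look like this:

# df -h /boot
# dpkg -l 'pve-kernel-*' | grep ^ii

then remove no longer needed kernel packages, e.g.:

# apt remove pve-kernel-4.4.35-1-pve

(the version above is only an example; keep the currently running kernel, see
uname -r, and the newest one).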

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] HA Timing question

2018-09-10 Thread Thomas Lamprecht
On 9/7/18 4:28 PM, Klaus Darilion wrote:
> Am 07.09.2018 um 10:35 schrieb Dietmar Maurer:
>>> But what is the timing for starting VM100 on another node? Is it
>>> guaranteed that this only happens after 60 seconds? 
>>
>> yes, that is the idea.
> 
> I miss the point how this is achieved. Is there somewhere a timer of 60s
> before starting a VM on some other node? Where exactly in case I need to
> tune this? E.g if I would like to have such reboots and VM starting only
> after 5 minutes of cluster problems.

Ha, I guess you're the first who wants to increase this delay; most want it
to be in the range of mere seconds.

The problem is, there's the $fence_delay in the HA::NodeStatus module, which is
the delay after which a node gets marked as dead-to-be-fenced.
Then there's the node's watchdog which, even if you increase the delay above,
will still trigger if the node is not quorate for 60 seconds, so this would need
changing too. As for the locks, they are per node and time out 2 minutes after
the last update; as a node (or the current manager) can only do something
if it holds this lock, a time increase here should not be too problematic -
theoretically, but it is not tested at all.
I'm just telling you what is where, not encouraging it; if you still want to hack
around: great, but I wouldn't recommend starting in production :)

> 
> Are there some other not yet mentioned relevant timers in Proxmox
> (besides the timers in Corosync)?

Maybe give our HA documentation, especially the "How It Works"[0] and
"Fencing"[1] chapters, a read.

[0]: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_how_it_works
[1]: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_fencing

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Fedora 28

2018-09-07 Thread Thomas Lamprecht
On 9/6/18 10:33 AM, lord_Niedzwiedz wrote:
>         Hi,
> I get yours offical Fedora 27.


you should now be able to get the Fedora 28 template directly from us.
# pveam update

should pull the newest appliance index (gets normally done automatically,
once a day) then either download it through the WebUI or with CLI:

# pveam download STORAGE fedora-28-default_20180907_amd64.tar.xz

cheers,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] windows 2008 enterprise with uefi

2018-08-28 Thread Thomas Lamprecht
PVE 5.2 contains a newer version of OVMF (the EFI implementation we use) with
a lot of fixes; updating could help - but is certainly no guarantee -
especially as your Windows is already in the process of booting.

On 8/28/18 11:20 AM, lists wrote:
> If, during windows iso boot, I press F8 for advanced options, and I select 
> safe mode, I can the last line displayed is:
> 
> Loaded: \windows\system32\drivers\disk.sys
> 
> Then progress stops.
> 
> I changed the VM disk config from IDE to SCSI, to see if that made a 
> difference. But alas, no further progress... :-(
> 

Hmm, maybe try SCSI but set the scsihw (SCSI Controller in the VM's Option Tab)
to LSI 53C895A? Or SATA, Windows is often a bit picky about this stuff...
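Roughly, for the VM from your config:

# qm set 107 --scsihw lsi

sets the controller type ('lsi' is the LSI 53C895A); the disk then needs to be
attached as scsi0 instead of ide0, and the bootdisk setting adjusted accordingly.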

> On 28-8-2018 10:59, lists wrote:
>> Here is the VM config:
>>
>>> balloon: 0
>>> bios: ovmf
>>> boot: dcn
>>> bootdisk: ide0
>>> cores: 2
>>> efidisk0: local-btrfs:107/vm-107-disk-2.qcow2,size=128K
>>> ide0: local-btrfs:107/vm-107-disk-1.qcow2,size=233G
>>> ide2: local-btrfs:iso/win2008_enterprise_x64.iso,media=cdrom
>>> memory: 8192
>>> name: win8-uefi
>>> net0: e1000=BA:7F:7B:91:9B:1C,bridge=vmbr0
>>> numa: 0
>>> ostype: win8
>>> scsihw: virtio-scsi-pci
>>> smbios1: uuid=626ef0a6-0161-492f-80b7-fbc437c4cdb2
>>> sockets: 2

Did you have a 2 socket system with 2 cores each on the bare metal setup?
If not, please use 1 socket with 4 cores; suddenly having a NUMA system could
also be a problem for Windows.

cheers,
Thomas

>>
>> OS Type set to Windows 8.x/2012/2012r2
>> KVM Hardware virtualization: yes
>> Freez cpu at startup: no
>> Protection: no
>> Qemu Agent: no
>> Bus: IDE
>>
>> Anything else I need to tell?
>>
>> MJ
>>
>> On 28-8-2018 10:54, Lindsay Mathieson wrote:
>>> What virtual hardware are you giving it? bus, disk etc.
>>>
>>> On 28/08/2018 6:06 PM, lists wrote:
 Hi,

 I am trying to move a physical windows 2008 enterprise uefi installation 
 (Version 6.0.6002 Service Pack 2 Build 6002) into proxmox, and I'm getting 
 nowhere.

 Tried all kinds of approaches, and this was my latest attempt:

 Creating a full system backup using windows backup, and then boot the 
 windows install iso in proxmox, to perform a system restore from this 
 backup into proxmox.

 But as soon as I enable uefi in my proxmox VM config, the windows iso no 
 longer boots. However, the physical server IS this same OS in uefi mode, 
 the combination should work, I guess.

 Anyone with a tip or a tric..?

 This is proxmox 4.4-20, so it's a bit older. I could try it on a fresh new 
 proxmox 5.2 install, but first I wanted to ask here.

 Anyone?

 MJ
 ___
 pve-user mailing list
 pve-user@pve.proxmox.com
 https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>
>>>
>> ___
>> pve-user mailing list
>> pve-user@pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> ___
> pve-user mailing list
> pve-user@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Problems with HP Smart Array P400 and ZFS

2018-08-24 Thread Thomas Lamprecht
Hi,

On 8/24/18 11:51 AM, Dreyer, Jan, SCM-IT wrote:
> Hi,
> 
> my configuration:
> HP DL380 G5 with Smart Array P400
> Proxmox VE 5.2-1
> name: 4.4.128-1-pve #1 SMP PVE 4.4.128-111 (Wed, 23 May 2018 14:00:02 +) 
> x86_64 GNU/Linux
> This system is currently running ZFS filesystem version 5.
> 
> My problem: When trying to update to a higher kernel (I tried 4.10 and 4.15 
> series), the initrd is not able to detect the cciss devices, and as such not 
> able to load the ZFS pools, including the root pool.

FYI: cciss is an alias to hpsa since 4.14:

> commit 253d2464df446456c0bba5ed4137a7be0b278aa8
> Author: Hannes Reinecke 
> Date:   Tue Aug 15 08:58:08 2017 +0200
> 
> scsi: cciss: Drop obsolete driver
> 
> The hpsa driver now has support for all boards the cciss driver
> used to support, so this patch removes the cciss driver and
> make hpsa an alias to cciss.
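To check which driver actually binds to the controller you can use something
like:

# lspci -nnk | grep -A 3 -i 'Smart Array'

which should show 'Kernel driver in use: hpsa' on the newer kernels, if the
device was detected at all.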


> Falling back to kernel 4.4 doesn’t let me use the ZFS cache file system 
> though. :-(
> 
> Any hints on how to detect the raid controller device in initrd?
> 

Sounds a bit like:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765105

Are your firmware versions all up to date?

I did not find related issues searching the LKML, Ubuntu's kernel devel or
related lists. (We very recently ported back a fix for hpsa but it was related
to clean shutdown, no changes for detection/bring up AFAICT). 

regards,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] PVE kernel (sources, build process, etc.)

2018-08-22 Thread Thomas Lamprecht
On 8/22/18 9:58 AM, Uwe Sauter wrote:
> Am 22.08.18 um 09:55 schrieb Thomas Lamprecht:
>> On 8/22/18 9:48 AM, Uwe Sauter wrote:
>>> Hi all,
>>>
>>> some quick questions:
>>>
>>> * As far as I can tell the PVE kernel is a modified version of Ubuntu 
>>> kernels, correct?
>>>   Modifications can be viewed in the pve-kernel.git repository ( 
>>> https://git.proxmox.com/?p=pve-kernel.git;a=tree ).
>>>
>>
>> Yes, in the respective git branch (master is currently 4.13 and 
>> pve-kernel-4.15 is, you guessed it, 4.15).
>> patches/ includes on-top bug/security fixes; also some out-of-tree modules 
>> get included (ZFS, igb, e1000e).
> 
> I'm mostly interested in the myri10ge driver right now. From what I can tell, 
> you do ship this particular driver without modification?
> 

You're right, no modifications of this module.

# modinfo myri10ge
filename:   
/lib/modules/4.15.18-2-pve/kernel/drivers/net/ethernet/myricom/myri10ge/myri10ge.ko
firmware:   myri10ge_rss_eth_z8e.dat
firmware:   myri10ge_rss_ethp_z8e.dat
firmware:   myri10ge_eth_z8e.dat
firmware:   myri10ge_ethp_z8e.dat
license:Dual BSD/GPL
version:1.5.3-1.534
author: Maintainer: h...@myri.com
description:Myricom 10G driver (10GbE)
srcversion: 46526E4E4E82667CBFF2D7C
alias:  pci:v14C1d0009sv*sd*bc*sc*i*
alias:  pci:v14C1d0008sv*sd*bc*sc*i*
depends:dca
retpoline:  Y
intree: Y
name:   myri10ge
vermagic:   4.15.18-2-pve SMP mod_unload modversions 
parm:   myri10ge_fw_name:Firmware image name (charp)
parm:   myri10ge_fw_names:Firmware image names per board (array of 
charp)
parm:   myri10ge_ecrc_enable:Enable Extended CRC on PCI-E (int)
parm:   myri10ge_small_bytes:Threshold of small packets (int)
parm:   myri10ge_msi:Enable Message Signalled Interrupts (int)
parm:   myri10ge_intr_coal_delay:Interrupt coalescing delay (int)
parm:   myri10ge_flow_control:Pause parameter (int)
parm:   myri10ge_deassert_wait:Wait when deasserting legacy interrupts 
(int)
parm:   myri10ge_force_firmware:Force firmware to assume aligned 
completions (int)
parm:   myri10ge_initial_mtu:Initial MTU (int)
parm:   myri10ge_napi_weight:Set NAPI weight (int)
parm:   myri10ge_watchdog_timeout:Set watchdog timeout (int)
parm:   myri10ge_max_irq_loops:Set stuck legacy IRQ detection threshold 
(int)
parm:   myri10ge_debug:Debug level (0=none,...,16=all) (int)
parm:   myri10ge_fill_thresh:Number of empty rx slots allowed (int)
parm:   myri10ge_max_slices:Max tx/rx queues (int)
parm:   myri10ge_rss_hash:Type of RSS hashing to do (int)
parm:   myri10ge_dca:Enable DCA if possible (int)

> 
>>
>>> * pve-kernel 4.13 is based on 
>>> http://kernel.ubuntu.com/git/ubuntu/ubuntu-artful.git/ ?
>>>
>>
>> Yes. (Note that this may not get many updates anymore)
>>
>>> * pve-kernel 4.15 is based on 
>>> http://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/ ?
>>>
>>
>> Yes. We're normally on the latest stable release tagged on the master branch.
>>
> 
> I'll checkout both and compare the myri10ge drivers…
> 

What's your exact issue, if I may ask?
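
In case it helps with that comparison: assuming the usual git:// clone URLs for
the cgit trees above (the clones are full kernel trees, so they take a while),
diffing just that driver between the two Ubuntu bases boils down to something like

# git clone git://kernel.ubuntu.com/ubuntu/ubuntu-artful.git
# git clone git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git
# diff -ru {ubuntu-artful,ubuntu-bionic}/drivers/net/ethernet/myricom/myri10ge

Check out the exact tags you actually build from first if you need more than a
rough comparison.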

cheers,
Thomas



___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Mainline kernel

2018-08-22 Thread Thomas Lamprecht
Hi,

On 8/21/18 9:01 PM, Gilberto Nunes wrote:
> Hi there
> 
> Can I download a kernel from here:
> 
> http://kernel.ubuntu.com/~kernel-ppa/mainline/
> 
> And use it with proxmox?
> 

You can, just install the .deb with dpkg, but you won't have ZFS and
a few other things included.

It may work, but you're totally on your own, as Proxmox VE is only
tested and tailored to our latest shipped kernel(s).
You may get a fix that is not yet backported, but you may also run into new bugs.

I heavily advise against doing this in production, at least not without a
lot of prior testing on similar test setups.
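
For reference, a minimal sketch of such a test install, assuming you grabbed the
.debs for one build from the URL above (the file names below are placeholders;
newer series split the kernel into linux-image and linux-modules packages, so
install both):

# dpkg -i linux-image-*_amd64.deb linux-modules-*_amd64.deb
# reboot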


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] PVE kernel (sources, build process, etc.)

2018-08-22 Thread Thomas Lamprecht
Hi Uwe,

On 8/22/18 9:48 AM, Uwe Sauter wrote:
> Hi all,
> 
> some quick questions:
> 
> * As far as I can tell the PVE kernel is a modified version of Ubuntu 
> kernels, correct?
>   Modifications can be viewed in the pve-kernel.git repository ( 
> https://git.proxmox.com/?p=pve-kernel.git;a=tree ).
> 

Yes, in the respective git branch (master is currently 4.13 and pve-kernel-4.15 
is, you guessed it, 4.15).
patches/ includes on-top bug/security fixes; some out-of-tree modules also get 
included (ZFS, igb, e1000e).
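
If you want to look at those yourself, roughly (branch name as listed in the
web view above):

# git clone git://git.proxmox.com/git/pve-kernel.git
# git -C pve-kernel checkout pve-kernel-4.15
# ls pve-kernel/patches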

> * pve-kernel 4.13 is based on 
> http://kernel.ubuntu.com/git/ubuntu/ubuntu-artful.git/ ?
> 

Yes. (Note that this may not get many updates anymore)

> * pve-kernel 4.15 is based on 
> http://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/ ?
> 

Yes. We're normally on the latest stable release tagged on the master branch.

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] [pve-devel] applied: [PATCH qemu-server] agent: import used check_agent_error method

2018-07-30 Thread Thomas Lamprecht

Am 07/30/2018 um 11:57 AM schrieb lyt_yudi:

sorry, got it

It’s fixed!



great, thanks for reporting and testing again!


On July 30, 2018, at 5:54 PM, lyt_yudi wrote:

Hi



On July 30, 2018, at 4:43 PM, Thomas Lamprecht wrote:


# pvesh create nodes/localhost/qemu/131/agent/set-user-password --password test123456 --username root --crypted 0
Can't use string ("password") as a HASH ref while "strict refs" in use at /usr/share/perl5/PVE/QemuServer/Agent.pm line 62.


Ah, yes, I found the issue and pushed a fix. Much thanks for the report.

Thanks


But with this new patch, it reports errors:


On July 30, 2018, at 5:05 PM, Thomas Lamprecht wrote:

-use PVE::QemuServer::Agent qw(agent_available agent_cmd);
+use PVE::QemuServer::Agent qw(agent_available agent_cmd check_agent_error);
use MIME::Base64 qw(encode_base64 decode_base64);
use JSON;


# pvesh create nodes/localhost/qemu/131/agent/set-user-password --password test123456 --username root --crypted 0
Can't use string ("password") as a HASH ref while "strict refs" in use at /usr/share/perl5/PVE/QemuServer/Agent.pm line 62.


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user






___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Can't set password with qm guest passwd

2018-07-30 Thread Thomas Lamprecht

Am 07/30/2018 um 09:15 AM schrieb lyt_yudi:




On July 30, 2018, at 2:25 PM, Dominik Csapak wrote:

Yes, there was a Perl import missing; I already sent a fix to the devel list, 
see:

https://pve.proxmox.com/pipermail/pve-devel/2018-July/033180.html 



thanks

This is a new error

# pvesh create nodes/localhost/qemu/131/agent/set-user-password --password test123456 --username root --crypted 0
Can't use string ("password") as a HASH ref while "strict refs" in use at /usr/share/perl5/PVE/QemuServer/Agent.pm line 62.



Ah, yes, I found the issue and pushed a fix. Much thanks for the report.

cheers,
Thomas


___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


Re: [PVE-User] Unable to join cluster

2018-07-30 Thread Thomas Lamprecht

(note: re-sent as I forgot to hit reply-all, so the list wasn't included)

Am 07/27/2018 um 08:45 PM schrieb Eric Germann:
> I have two new Proxmox boxes in a virgin cluster.  No VMs, etc.  The only thing set up on them is networking.
>
> I created a cluster on the first one successfully.
>
> However, when I try to join the second to the cluster, I get the following error:
>
> Starting worker failed: unable to parse worker upid 'UPID:pve02:1876:00026B6B:5B5B6751:clusterjoin:2001:470:e2fc:10::160:root@pam:' (500)
>
> The nodes are all IPv4 and IPv6 enabled.  The cluster IP as shown in the config is IPv6.  Is this the issue?
>

Yes, this could indeed be a bug where our UPID parser cannot handle the
encoded IPv6 address...

> If I put the v6 address in brackets, same error.
>
> If I substitute the IPv4 address, I get the following error
>
> Etablishing API connection with host '172.28.10.160'
> TASK ERROR: 401 401 authentication failure
>

Huh, that's a bit weird. Are you sure you have the correct credentials?

> Thoughts?  They haven’t been up more than 1 hr when this occurs.
>

For now you may want to use the 'pvecm add' CLI command with --use_ssh
as a parameter; with this you should be able to work around the issue while
we take a look at it.
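
Something along the lines of the following, run on the node that should join,
with the address of the node the cluster was created on (172.28.10.160 in your
test above):

# pvecm add 172.28.10.160 --use_ssh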

cheers,
Thomas

___
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

