[ovirt-devel] Re: Purging inactive maintainers from vdsm-master-maintainers
I also agree with the proposal. It's sad to turn in my keys but I'm likely unable to perform many duties expected of a maintainer at this point. I know that people can still find me via the git history :) On Thu, Nov 28, 2019 at 3:37 AM Milan Zamazal wrote: > Dan Kenigsberg writes: > > > On Wed, Nov 27, 2019 at 4:33 PM Francesco Romani > wrote: > >> > >> On 11/27/19 3:25 PM, Nir Soffer wrote: > > > >> > I want to remove inactive contributors from vdsm-master-maintainers. > >> > > >> > I suggest the simple rule of 2 years of inactivity for removing from > >> > this group, > >> > based on git log. > >> > > >> > See the list below for current status: > >> > https://gerrit.ovirt.org/#/admin/groups/106,members > >> > >> > >> No objections, keeping the list minimal and current is a good idea. > > > > > > I love removing dead code; I feel a bit different about removing old > > colleagues. Maybe I'm just being nostalgic. > > > > If we introduce this policy (which I understand is healthy), let us > > give a long warning period (6 months?) before we apply the policy to > > existing dormant maintainers. We should also make sure that we > > actively try to contact a person before he or she is dropped. > > I think this is a reasonable proposal. 
> > Regards, > Milan > ___ > Devel mailing list -- devel@ovirt.org > To unsubscribe send an email to devel-le...@ovirt.org > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/devel@ovirt.org/message/QCMGKR2IRYTITM2T3YMLXGOZCT4BHYGL/ > -- Adam Litke He / Him / His Principal Software Engineer Red Hat <https://www.redhat.com/> ali...@redhat.com ___ Devel mailing list -- devel@ovirt.org To unsubscribe send an email to devel-le...@ovirt.org Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/2HBNMTBENTMVYI543OZWH2MNROTMLVXH/
Re: [ovirt-devel] vdsm stable branch maintainership
+1 On Tue, Jan 9, 2018 at 8:17 AM, Francesco Romani <from...@redhat.com> wrote: > On 01/09/2018 12:43 PM, Dan Kenigsberg wrote: > > Hello, > > > > I would like to nominate Milan Zamazal and Petr Horacek as maintainers > > of vdsm stable branches. This job requires understanding of vdsm > > packaging and code, a lot of attention to details and awareness of the > > requirements of other components and teams. > > > > I believe that both Milan and Petr have these qualities. I am certain > > they would work in responsive caution when merging and tagging patches > > to the stable branches. > > > > vdsm maintainers, please confirm if you approve. > > +1 > > -- > Francesco Romani > Senior SW Eng., Virtualization R&D > Red Hat > IRC: fromani github: @fromanirh > > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] jsonrpc go client
On Fri, Jul 14, 2017 at 9:32 AM, Piotr Kliczewski < piotr.kliczew...@gmail.com> wrote: > On Fri, Jul 14, 2017 at 3:14 PM, Dan Kenigsberg <dan...@redhat.com> wrote: > > On Fri, Jul 14, 2017 at 3:11 PM, Piotr Kliczewski > > <piotr.kliczew...@gmail.com> wrote: > >> All, > >> > >> I pushed a very simple jsonrpc go client [1] which allows talking to > >> vdsm. I had a request to create it, but if there are more people > >> willing to use it I am happy to maintain it. > Awesome Piotr! Thanks for the great work. > >> > >> Please let me know if you find any issues with it or you have any > >> feature requests. > > > > Interesting. Which use case do you see for this client? > > Currently, Vdsm has very few clients: Engine, vdsm-client, mom and > > hosted-engine. Too often we forget about the non-Engine ones and break > > them, so I'd be happy to learn more about a 5th. > > Adam asked for the client for his storage related changes. I am not > sure about the specific use case. > I am looking at implementing a vdsm flexvol driver for kubernetes. This would allow kubernetes pods to access vdsm volumes using the native PV and PVC mechanisms. > > > > > Regarding https://github.com/pkliczewski/vdsm-jsonrpc-go/blob/master/example/main.go > > : programming without exceptions and try-except is a pain. Don't you > > need to check the retval of Subscribe and disconnect on failure? > > The example is by no means perfect, and you are correct. I will fix. > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [ovirt-users] Feature: enhanced OVA support
Great feature! I am glad to see you plan to use the existing imageio framework for transferring data. Will you allow export of VMs from a particular snapshot? I guess that's how you'll have to do it if you want to support export of running VMs. I think you should definitely have a comment in the ovf to indicate that an OVA was generated by oVirt. People will try to use this new feature to import random OVAs from who knows where. I'd also recommend adding a version to this comment, or perhaps even a schema version in case you need to deal with compatibility issues in the future. I agree with Yaniv Kaul that we should offer to sparsify the VM to optimize it for export. We should also return compressed data. When exporting, does it make sense to cache the stored OVA file in some sort of ephemeral storage (host local is fine, storage domain may be better) in order to allow the client to resume or restart an interrupted download without having to start from scratch? On Sun, May 14, 2017 at 9:56 AM, Arik Hadas <aha...@redhat.com> wrote: > Hi everyone, > > We would like to share our plan for extending the currently provided > support for OVA files with: > 1. Support for uploading OVA. > 2. Support for exporting a VM/template as OVA. > 3. Support for importing OVA that was generated by oVirt (today, we only > support those that are VMware-compatible). > 4. Support for downloading OVA. > > This can be found on the feature page > <http://www.ovirt.org/develop/release-management/features/virt/enhance-import-export-with-ova/> > . > > Your feedback and cooperation will be highly appreciated. > > Thanks, > Arik > > > ___ > Users mailing list > us...@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users > > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
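The generator annotation suggested above could be sketched as an XML comment in the OVF envelope. Note this is purely illustrative: the comment text, version key, and element name below are invented, since the thread does not define any actual schema.

```python
# Hypothetical OVF generator annotation; the marker text and version key
# are illustrative only, not a defined oVirt format.
import xml.etree.ElementTree as ET

envelope = ET.Element("Envelope")
envelope.append(ET.Comment("Generated by oVirt; export-schema-version=1.0"))

data = ET.tostring(envelope).decode()
# An importer can cheaply check for the marker before attempting an
# oVirt-specific import path for a random OVA from "who knows where".
is_ovirt_ova = "Generated by oVirt" in data
```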
Re: [ovirt-devel] [VDSM] Adding Pylint to 'check' target
I like the current structure of the make check rule which has an increasing number of sub-targets (pep8, pyflakes, tests, etc) so it is still easy to run individual targets if the check rule is more than you need. For me adding this is a big +1. On Mon, May 15, 2017 at 12:04 PM, Dan Kenigsberg <dan...@redhat.com> wrote: > On Mon, May 15, 2017 at 4:47 PM, Fred Rolland <froll...@redhat.com> wrote: > > Hi, > > > > We are introducing Pylint to be performed as part of the 'check' target. > > Once that patch [1] is merged, every execution of 'make check' will > > also include a Pylint analysis. > > > > Note that execution time will be longer by about 2 minutes. > > > > However, you can use the 'jobs' flag to tell 'make' to execute recipes > > simultaneously. > > Be aware that the output of the jobs will be interleaved. > > > > For example, running 'make' with two parallel jobs: > > > > make --jobs=2 check > > > > Regards, > > Freddy > > > > [1] https://gerrit.ovirt.org/#/c/76390/ > > I'm not as frequent a user of `make check` as I used to be, but I'm > cool with this addition. I'd like to hear if others are bothered. > > Dan. > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
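The interleaving behavior described above is easy to reproduce with a throwaway Makefile; the target names and the /tmp path here are demo-only, not vdsm's real targets.

```shell
# Write a tiny Makefile with two independent sub-targets (demo-only names),
# then run them in parallel the same way 'make --jobs=2 check' would.
printf '.PHONY: check a b\ncheck: a b\na:\n\t@echo "job a done"\nb:\n\t@echo "job b done"\n' > /tmp/demo.mk
# With --jobs=2 both recipes may run simultaneously and their output can interleave.
make -f /tmp/demo.mk --jobs=2 check
```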
Re: [ovirt-devel] Vdsm merge rights
+2 :) On Fri, May 12, 2017 at 6:16 AM, Nir Soffer <nsof...@redhat.com> wrote: > +1 > > בתאריך יום ו׳, 12 במאי 2017, 12:59, מאת Fabian Deutsch < > fdeut...@redhat.com>: > >> +1 >> >> On Fri, May 12, 2017 at 11:25 AM, Edward Haas <eh...@redhat.com> wrote: >> > Good news! +2 >> > >> > On Fri, May 12, 2017 at 11:27 AM, Piotr Kliczewski <pklic...@redhat.com >> > >> > wrote: >> >> >> >> +1 >> >> >> >> On Fri, May 12, 2017 at 9:14 AM, Dan Kenigsberg <dan...@redhat.com> >> wrote: >> >>> >> >>> I'd like to nominate Francesco to the vdsm-maintainers >> >>> >> >>> https://gerrit.ovirt.org/#/admin/groups/uuid- >> becbf722723417c336de6c1646749678acae8b89 >> >>> list, so he can merge patches without waiting for Nir, Adam or me. >> >>> >> >>> I believe that he proved to be thorough and considerate (and paranoid) >> >>> as the job requires. >> >>> >> >>> Vdsm maintainers, please approve. >> >>> >> >>> Dan >> >> >> >> >> > >> > >> > ___ >> > Devel mailing list >> > Devel@ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/devel >> ___ >> Devel mailing list >> Devel@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > ___ > Devel mailing list > Devel@ovirt.org > http://lists.ovirt.org/mailman/listinfo/devel > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] New design for the Gerrit UI
I really like the colors on the patternfly scheme. Great job! On Thu, May 4, 2017 at 9:25 AM, Evgheni Dereveanchin <edere...@redhat.com> wrote: > Thanks everyone for the great feedback! > > So there's two options I see now: > 1) keep the default header scheme with white background, just add the > project logo into the corner > 2) try to adapt to the Patternfly scheme as used in oVirt's Admin UI > currently. > > I've swapped the header background color to #393f45 as used in oVirt for a > quick test: > https://gerrit-staging.phx.ovirt.org/ > > Is this more readable? If yes - I can continue working in this direction > to add gradients > and other patternfly style elements. Otherwise I'll just go with option 1 > and stick to the default style we have now. > > On Thu, May 4, 2017 at 2:45 PM, Martin Sivak <msi...@redhat.com> wrote: > >> > It will help if someone can suggest an alternate CSS which we can use >> or specific color codes, >> >> Well.. keep it as it is or make it really dark (like the patternfly >> menu). I do not care about logos but big area filled with non-neutral color >> is always going to be an issue. >> >> Martin >> >> On Thu, May 4, 2017 at 2:15 PM, Eyal Edri <ee...@redhat.com> wrote: >> >>> >>> >>> On Thu, May 4, 2017 at 3:05 PM, Martin Perina <mper...@redhat.com> >>> wrote: >>> >>>> I agree with Milan and Martin, even after few minutes looking at it, >>>> the green >>>> with combination of white background just made my eyes burning :-( >>>> >>>> Would it be possible to use more darker colors (at least for top >>>> banner/menu)? >>>> For example darker colors we use in oVirt engine welcome page ... >>>> >>> >>> Thanks for the feedback, >>> It will help if someone can suggest an alternate CSS which we can use or >>> specific color codes, >>> otherwise it will be long trial and error process until we'll find >>> something that will suite everyone. 
>>> >>> >>> >>>> >>>> >>>> Martin >>>> >>>> On Thu, May 4, 2017 at 5:53 AM, Martin Sivak <msi...@redhat.com> wrote: >>>> >>>>> I agree with Milan here. The light green background makes the menu >>>>> items to be almost unreadable, the search button (slightly different >>>>> green color) blends with the background and generally the color pulls >>>>> my eyes away from the content. I wouldn't feel comfortable looking at >>>>> the screen for a whole day. >>>>> >>>>> Martin >>>>> >>>>> On Thu, May 4, 2017 at 9:57 AM, Milan Zamazal <mzama...@redhat.com> >>>>> wrote: >>>>> > Evgheni Dereveanchin <edere...@redhat.com> writes: >>>>> > >>>>> >> The Infra team is working on customizing the look of Gerrit to make >>>>> it fit >>>>> >> better with other oVirt services. I want to share the result of this >>>>> >> effort. Hopefully we can gather some feedback before applying the >>>>> design to >>>>> >> oVirt's instance of Gerrit. >>>>> >> >>>>> >> Please visit the Staging instance to check it out: >>>>> >> >>>>> >> https://gerrit-staging.phx.ovirt.org/ >>>>> > >>>>> > Thank you for the preview. While it fits better with oVirt services, >>>>> > there is one thing that makes me uncomfortable with it: low contrast. >>>>> > The top green bar is probably directly violating Web Accessibility >>>>> > Guidelines (AA level; see >>>>> > https://www.w3.org/TR/WCAG20/#visual-audio-contrast-contrast), but I >>>>> > find all the green parts harder to read than in the current version. >>>>> > So it would be nice if the contrast could be improved. 
>>>>> > >>>>> > Thanks, >>>>> > Milan >>>>> > ___ >>>>> > Devel mailing list >>>>> > Devel@ovirt.org >>>>> > http://lists.ovirt.org/mailman/listinfo/devel >>>>> ___ >>>>> Devel mailing list >>>>> Devel@ovirt.org >>>>> http://lists.ovirt.org/mailman/listinfo/devel >>>>> >>>> >>>> >>>> ___ >>>> Infra mailing list >>>> in...@ovirt.org >>>> http://lists.ovirt.org/mailman/listinfo/infra >>>> >>>> >>> >>> >>> -- >>> >>> Eyal edri >>> >>> >>> ASSOCIATE MANAGER >>> >>> RHV DevOps >>> >>> EMEA VIRTUALIZATION R >>> >>> >>> Red Hat EMEA <https://www.redhat.com/> >>> <https://red.ht/sig> TRIED. TESTED. TRUSTED. >>> <https://redhat.com/trusted> >>> phone: +972-9-7692018 <+972%209-769-2018> >>> irc: eedri (on #tlv #rhev-dev #rhev-integ) >>> >> >> >> ___ >> Devel mailing list >> Devel@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/devel >> > > > > -- > Regards, > Evgheni Dereveanchin > > ___ > Infra mailing list > in...@ovirt.org > http://lists.ovirt.org/mailman/listinfo/infra > > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [VDSM] granting network+2 to Eddy
+1 On Tue, Feb 28, 2017 at 6:03 AM, Francesco Romani <from...@redhat.com> wrote: > > On 02/28/2017 08:32 AM, Dan Kenigsberg wrote: > > Hi, > > > > After more than a year of substantial contribution to Vdsm networking, > > and after several months of me upgrading his score, I would like to > > nominate Eddy as a maintainer for network-related code in Vdsm, in > > master and stable branches. > > > > Current Vdsm maintainers and others: please approve my suggestion if > > you agree with it. > > Approved > > -- > Francesco Romani > Red Hat Engineering Virtualization R & D > IRC: fromani > > -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [VDSM] Correct implementation of virt-sysprep job
On 06/12/16 22:06 +0200, Arik Hadas wrote: Adam, :) You seem upset. Sorry if I touched on a nerve... Just out of curiosity: when you write "v2v has promised" - what exactly do you mean? the tool? Richard Jones (the maintainer of virt-v2v)? Shahar and I that implemented the integration with virt-v2v? I'm not aware of such a promise by any of these options :) Some history... Earlier this year Nir, Francesco (added), Shahar, and I began discussing the similarities between what storage needed to do with external commands and what was designed specifically for v2v. I am not sure if you were involved in the project at that time. The plan was to create common infrastructure that could be extended to fit the unique needs of the verticals. The v2v code was going to be moved over to the new infrastructure (see [1]) and the only thing that stopped the initial patch was lack of a VMWare testing environment for verification. At that time storage refocused on developing verbs that used the new infrastructure and have been maintaining its suitability for general use. Conversion of v2v -> Host Jobs is obviously a lower priority item and much more difficult now due to the early missed opportunity. Anyway, let's say that you were given such a promise by someone and thus consider that mechanism to be deprecated - it doesn't really matter. I may be biased but I think my opinion does matter. The current implementation doesn't well fit to this flow (it requires per-volume job, it creates leases that are not needed for template's disks, ...) and with the "next-gen API" with proper support for virt flows not even being discussed with us (and iiuc also not with the infra team) yet, I don't understand what do you suggest except for some strong, though irrelevant, statements. If you are willing to engage in a good-faith technical discussion I am sure I can help you to understand. These operations to storage demand some form of locking protection. 
If volume leases aren't appropriate then perhaps we should use the VM Leases / xleases that Nir is finishing off for 4.1 now. I suggest loud and clear to reuse (not to add dependencies, not to enhance, ..) an existing mechanism for a very similar flow of virt-v2v that works well and simple. I clearly remember discussions involving infra (hello Oved), virt (hola Michal), and storage where we decided that new APIs performing async operations involving external commands should use the HostJobs infrastructure instead of adding more information to Host Stats. These were the "famous" entity polling meetings. Of course plans can change but I have never been looped into any such discussions. Do you "promise" to implement your "next gen API" for 4.1 as an alternative? I guess we need the design first. On Tue, Dec 6, 2016 at 5:04 PM, Adam Litke <ali...@redhat.com> wrote: On 05/12/16 11:17 +0200, Arik Hadas wrote: On Mon, Dec 5, 2016 at 10:05 AM, Nir Soffer <nsof...@redhat.com> wrote: On Sun, Dec 4, 2016 at 8:50 PM, Shmuel Melamud <smela...@redhat.com> wrote: > > Hi! > > I'm currently working on integration of virt-sysprep into oVirt. > > Usually, if user creates a template from a regular VM, and then creates new VMs from this template, these new VMs inherit all configuration of the original VM, including SSH keys, UDEV rules, MAC addresses, system ID, hostname etc. It is unfortunate, because you cannot have two network devices with the same MAC address in the same network, for example. > > To avoid this, user must clean all machine-specific configuration from the original VM before creating a template from it. You can do this manually, but there is virt-sysprep utility that does this automatically. > > Ideally, virt-sysprep should be seamlessly integrated into template creation process. But the first step is to create a simple button: user selects a VM, clicks the button and oVirt executes virt-sysprep on the VM. > > virt-sysprep works directly on VM's filesystem. 
It accepts list of all disks of the VM as parameters: > > virt-sysprep -a disk1.img -a disk2.img -a disk3.img > > The architecture is as follows: command on the Engine side runs a job on VDSM side and tracks its success/failure. The job on VDSM side runs virt-sysprep. > > The question is how to implement the job correctly? > > I thought about using storage jobs, but they are designed to work only with a single volume, correct? New
Re: [ovirt-devel] [VDSM] Correct implementation of virt-sysprep job
voked on VM rather than particular disk makes it less suitable. These are more appropriately called HostJobs and the have the following semantics: - They represent an external process running on a single host - They are not persisted. If the host or vdsm restarts, the job is aborted - They operate on entities. Currently storage is the first adopter of the infrastructure but virt was going to adopt these for the next-gen API. Entities can be volumes, storage domains, vms, network interfaces, etc. - Job status and progress is reported by the Host Jobs API. If a job is not present, then the underlying entitie(s) must be polled by engine to determine the actual state. 3. V2V jobs - no mechanism is provided to resume failed jobs, no leases, etc This is the old infra upon which Host Jobs are built. v2v has promised to move to Host Jobs in the future so we should not add new dependencies to this code. I have some arguments for using V2V-like jobs [1]: 1. creating template from vm is rarely done - if host goes unresponsive or any other failure is detected we can just remove the template and report the error We can chose this error handling with Host Jobs as well. 2. the phase of virt-sysprep is, unlike typical storage operation, short - reducing the risk of failures during the process Reduced risk of failures is never an excuse to have lax error handling. The storage flavored host jobs provide tons of utilities for making error handling standardized, easy to implement, and correct. 3. during the operation the VM is down - by locking the VM/template and its disks on the engine side, we render leases-like mechanism redundant Eventually we want to protect all operations on storage with sanlock leases. This is safer and allows for a more distributed approach to management. Again, the use of leases correctly in host jobs requires about 5 lines of code. The benefits of standardization far outweigh any perceived simplification resulting from omitting it. 4. 
in the worst case - the disk will not be corrupted (only some of the data might be removed). Again, the way engine chooses to handle job failures is independent of the mechanism. Let's separate that from this discussion. So I think that the mechanism for storage jobs is an over-kill for this case. We can keep it simple by generalise the V2V-job for other virt-tools jobs, like virt-sysprep. I think we ought to standardize on the Host Jobs framework where we can collaborate on unit tests, standardized locking and error handling, abort logic, etc. When v2v moves to host jobs then we will have a unified method of handling ephemeral jobs that are tied to entities. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
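The HostJob semantics listed above (not persisted, tied to an entity, abortable, status polled by engine) can be sketched roughly as follows. All class and field names here are invented for illustration; this is not vdsm's actual jobs API.

```python
# Illustrative-only sketch of HostJob semantics: in-memory (lost on vdsm
# restart), scoped to one entity, abortable, with polled progress/status.
import threading
import uuid

class HostJob:
    def __init__(self, entity_id, work):
        self.id = str(uuid.uuid4())
        self.entity_id = entity_id   # the volume/domain/VM the job operates on
        self.status = "pending"
        self.progress = 0            # percent, reported via the jobs API
        self._aborted = threading.Event()
        self._work = work            # generator yielding (step, total_steps)

    def run(self):
        self.status = "running"
        for step, total in self._work():
            if self._aborted.is_set():
                self.status = "aborted"
                return
            self.progress = 100 * step // total
        self.status = "done"

    def abort(self):
        self._aborted.set()

def work():
    for i in range(1, 5):
        yield i, 4

job = HostJob("vol-1", work)
job.run()
```

If the job is absent when engine polls (e.g. after a host restart), engine falls back to inspecting the underlying entity, exactly as the message describes.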
Re: [ovirt-devel] [VDSM] FAIL: test_intra_domain_copy('block', 'cow', 'cow') (storage_sdm_copy_data_test.TestCopyDataDIV)
mg convert -p -t none -T none -f qcow2 /var/tmp/tmpKpf5ys/mnt/blockSD/beb999a9-a90b-4c49-9d4c-7b21b3adb164/images/c406fa29-72c8-4a26-9601-2bd5e0b1cbd0/1b8476de-236b-4c8e-aea0-134b213b7cb7 -O qcow2 -o compat=0.10 /var/tmp/tmpKpf5ys/mnt/blockSD/beb999a9-a90b-4c49-9d4c-7b21b3adb164/images/939d015f-e57e-4bf1-9013-b648fae347ae/0e7c49ae-6c7c-4526-89c2-17b923c70dfa (cwd None) (qemuimg:247) 21:16:08 2016-11-16 21:14:26,805 DEBUG (MainThread) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/dd iflag=direct skip=5 bs=512 if=/var/tmp/tmpKpf5ys/dev/beb999a9-a90b-4c49-9d4c-7b21b3adb164/metadata count=1 (cwd None) (commands:69) 21:16:08 2016-11-16 21:14:26,820 DEBUG (MainThread) [storage.Misc.excCmd] SUCCESS: = '1+0 records in\n1+0 records out\n512 bytes copied, 0.000380116 s, 1.3 MB/s\n'; = 0 (commands:93) 21:16:08 2016-11-16 21:14:26,821 DEBUG (MainThread) [storage.Misc] err: ['1+0 records in', '1+0 records out', '512 bytes copied, 0.000380116 s, 1.3 MB/s'], size: 512 (misc:138) 21:16:08 2016-11-16 21:14:26,823 INFO (MainThread) [storage.VolumeManifest] Tearing down volume beb999a9-a90b-4c49-9d4c-7b21b3adb164/0e7c49ae-6c7c-4526-89c2-17b923c70dfa justme True (blockVolume:386) 21:16:08 2016-11-16 21:14:26,824 INFO (MainThread) [storage.VolumeManifest] Tearing down volume beb999a9-a90b-4c49-9d4c-7b21b3adb164/1b8476de-236b-4c8e-aea0-134b213b7cb7 justme True (blockVolume:386) 21:16:08 2016-11-16 21:14:26,824 INFO (MainThread) [root] Job 'f8a60ab8-2ba9-473d-bf46-82080c283137' completed (jobs:203) 21:16:08 2016-11-16 21:14:26,825 INFO (MainThread) [root] Job 'f8a60ab8-2ba9-473d-bf46-82080c283137' will be deleted in 3600 seconds (jobs:245) 21:16:08 - >> end captured logging << - -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
[ovirt-devel] [VDSM] Failing make check
Hi Piotr, I am now seeing consistent Jenkins failures during make check (when producing the schema html doc) and I suspect this[1] change. Can you take a look please? [1] https://gerrit.ovirt.org/#/c/56387/ -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [vdsm] tests failures
On 10/11/16 15:43 +0200, Nir Soffer wrote: On Thu, Nov 10, 2016 at 3:39 PM, Piotr Kliczewski <piotr.kliczew...@gmail.com> wrote: All, Few mins ago I saw build [1] failure due to: 13:36:42 ERROR: test_abort_during_copy('block') (storage_sdm_copy_data_test.TestCopyDataDIV) 13:36:42 -- 13:36:42 Traceback (most recent call last): 13:36:42 File "/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/testlib.py", line 133, in wrapper 13:36:42 return f(self, *args) 13:36:42 File "/home/jenkins/workspace/vdsm_master_check-patch-el7-x86_64/vdsm/tests/storage_sdm_copy_data_test.py", line 279, in test_abort_during_copy 13:36:42 raise RuntimeError("Timeout waiting for thread") 13:36:42 RuntimeError: Timeout waiting for thread Adam, can you take a look? Is this happening all of the time or intermittently? If intermittent then we can increase the timeout or just ignore for the time being since it's probably caused by an overloaded Jenkins slave. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
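The failure pattern in the traceback (join a worker thread with a timeout and fail loudly if it is still alive) has roughly this shape; the timeout values are illustrative, not the test's actual settings.

```python
import threading
import time

def wait_for_thread(t, timeout=5):
    # Same shape as the failing check: join with a timeout and raise if the
    # thread outlives it, e.g. on an overloaded Jenkins slave.
    t.join(timeout)
    if t.is_alive():
        raise RuntimeError("Timeout waiting for thread")

t = threading.Thread(target=lambda: time.sleep(0.1))
t.start()
wait_for_thread(t)  # returns normally; a slower thread would raise
```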
Re: [ovirt-devel] system tests failing on template export
On 17/10/16 11:51 +0200, Piotr Kliczewski wrote: Adam, I see constant failures due to this and found: 2016-10-17 03:55:21,045 ERROR (jsonrpc/3) [storage.TaskManager.Task] Task=`8989d694-7099-449b-bd66-4d63786be089`::Unexpected error (task:870) Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 877, in _run return fn(*args, **kargs) File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 2212, in getAllTasksInfo allTasksInfo = sp.getAllTasksInfo() File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper raise SecureError("Secured object is not in safe state") SecureError: Secured object is not in safe state This usually indicates that the SPM role has been lost which happens most likely due to connection issues with the storage. What is the storage environment being used for the system tests? Please take a look not sure whether it is related. You can find latest build here [1] Thanks, Piotr [1] http://jenkins.ovirt.org/job/ovirt_master_system-tests/668/ On Fri, Oct 14, 2016 at 11:22 AM, Evgheni Dereveanchin <edere...@redhat.com> wrote: Hello, We've got several cases today where system tests failed when attempting to export templates: http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/testReport/junit/(root)/004_basic_sanity/template_export/ Related engine.log looks something like this: https://paste.fedoraproject.org/449936/47643643/raw/ I could not find any obvious issues in SPM logs, could someone please take a look to confirm what may be causing this issue? Full logs from the test are available here: http://jenkins.ovirt.org/job/ovirt_master_system-tests/655/artifact/ Regards, Evgheni Dereveanchin ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] SPM or SDM for a new verb?
On 05/07/16 12:07 +0300, Shmuel Melamud wrote: Hi! I'm writing code for a new verb (sparsifyInplace) in VDSM and got two different opinions about whether to use SPM or SDM for it: 1) SDM is the new correct approach, need to use it. 2) SDM is on early stage and may be changed significantly, so it is better to use SPM as mature and reliable approach. What's your opinion? SDM is definitely the better way to go, if you can, since it will make less work for you in the future and also make your verb use host resources more efficiently. My guess is that sparsifyInplace just needs to run a command against a volume path that is visible to a selected vdsm host and wait for it to complete. Do you intend for this to be run also while a VM is using the volume? For SDM verbs in vdsm there is a basic formula. All verbs are asynchronous. A new public API function is created in HSM. This function unpacks parameters and then creates and schedules a HostJob instance. The HostJob performs any necessary locking and does the work. It also has an interface for progress reporting and for aborting the operation. Engine monitors HostJobs using a public vdsm API. While it's true that SDM is in early stages, the underlying infrastructure that you will need has been upstream for awhile now. I'll be happy to provide some additional details if you have further questions. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
[ovirt-devel] [VDSM] Test fails only under make check
I have written a new test [1] and when running 'make check' I get a nasty ImportError (see below). When running the same test using run_tests_local.sh directly it works fine. Any ideas what might be going on? [1] https://gerrit.ovirt.org/#/c/60060/1/tests/storage_hsm_test.py == ERROR: Failure: ImportError (No module named 'Queue') -- Traceback (most recent call last): File "/usr/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest raise self.exc_val.with_traceback(self.tb) File "/usr/lib/python3.4/site-packages/nose/loader.py", line 418, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.4/imp.py", line 235, in load_module return load_source(name, filename, file) File "/usr/lib64/python3.4/imp.py", line 171, in load_source module = methods.load() File "", line 1220, in load File "", line 1200, in _load_unlocked File "", line 1129, in _exec File "", line 1471, in exec_module File "", line 321, in _call_with_frames_removed File "/home/alitke/src/vdsm/tests/storage_hsm_test.py", line 26, in from storagetestlib import fake_file_env File "/home/alitke/src/vdsm/tests/storagetestlib.py", line 24, in from storagefakelib import FakeLVM File "/home/alitke/src/vdsm/tests/storagefakelib.py", line 32, in from storage import lvm as real_lvm File "/home/alitke/src/vdsm/vdsm/storage/lvm.py", line 41, in from vdsm.storage import devicemapper File "/home/alitke/src/vdsm/lib/vdsm/storage/devicemapper.py", line 30, in from vdsm.storage import misc File "/home/alitke/src/vdsm/lib/vdsm/storage/misc.py", line 36, in import Queue ImportError: No module named 'Queue' -- -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
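The error above is the Python 2 to 3 rename: `Queue` became `queue` in Python 3, and the traceback shows `misc.py` doing a bare `import Queue` while nose here runs under Python 3.4. A conventional compatibility shim, one possible fix but not necessarily the one vdsm adopted, looks like:

```python
# six-style compatibility import: keeps working on Python 2 while picking up
# the renamed module on Python 3.
try:
    import Queue as queue  # Python 2
except ImportError:
    import queue           # Python 3

q = queue.Queue()
q.put("item")
```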
Re: [ovirt-devel] Undelivered mail warnings from Gerrit
On 02/06/16 11:39 +0200, Tomáš Golembiovský wrote: Hi, for the last two weeks I've been getting lots of warnings about undelivered mail from Gerrit. The importnat thing in the message being: The original message was received at Wed, 1 Jun 2016 14:57:54 -0400 from gerrit.ovirt.org [127.0.0.1] - Transcript of session follows - <vdsm-patc...@fedorahosted.org>... Deferred: Connection timed out with hosted-lists01.fedoraproject.org. Warning: message still undelivered after 4 hours Will keep trying until message is 5 days old Anyone else experiencing the same problem? Is this being worked on? It's affecting me quite severely also. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [VDSM] New versioning scheme
On 01/06/16 13:51 +0300, Nir Soffer wrote: Hi all, We are going to branch 4.0 today, and it is a good time to update our versioning scheme. I suggest we use the standard ovirt versioning, used by most projects: 1. master vdsm-4.19.0-201606011345.gitxxxyyy 2. 4.0 vdsm-4.18.1 The important invariant is that any build from master is considered newer compared with the stable builds, since master always contains all stable code, and new code. Second invariant, the most recent build from master is always newer compared with any other master build - the timestamp enforces this. Thoughts? +1 -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
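Both invariants can be illustrated with plain comparisons; real packaging goes through RPM version comparison, so this is only a sketch using the version numbers from the mail.

```python
# Invariant 1: any master build (4.19.0) sorts after any stable build (4.18.x).
master = (4, 19, 0)
stable = (4, 18, 1)
assert master > stable

# Invariant 2: YYYYMMDDHHMM snapshot stamps compare chronologically even as
# plain strings, so the most recent master build always wins.
stamps = ["201606011345", "201605301200", "201606020900"]
newest = max(stamps)
```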
Re: [ovirt-devel] [vdsm] Another network test failing
On 17/05/16 09:45 +0300, Dan Kenigsberg wrote: On Tue, May 17, 2016 at 01:10:19AM +0300, Nir Soffer wrote: On Mon, May 16, 2016 at 4:52 PM, Adam Litke <ali...@redhat.com> wrote: > On 15/05/16 15:10 +0300, Dan Kenigsberg wrote: >> >> On Sun, May 15, 2016 at 10:33:30AM +0300, Edward Haas wrote: >>> >>> On Tue, May 10, 2016 at 8:19 PM, Adam Litke <ali...@redhat.com> wrote: >>> >>> > On 10/05/16 18:08 +0300, Dan Kenigsberg wrote: >>> > >>> >> On Mon, May 09, 2016 at 02:48:43PM -0400, Adam Litke wrote: >>> >> >>> >>> When running make check on my local system I often (but not always) >>> >>> get the following error: >>> >>> >>> >> >>> >> Do you have any clue related to when this happens? (your pwd, >>> >> pythonpath) >>> >> >>> > >>> > Maybe it's a side effect of the way nose loads and runs tests? >>> > >>> > Did it begin with the recent move of netinfo under vdsm.network? >>> >> https://gerrit.ovirt.org/56713 (CommitDate: Thu May 5) or did you see >>> >> it >>> >> earlier? >>> >> >>> > >>> > That's possible. It only started happening recently. It seems to >>> > fail only when run under 'make check' but not when run via >>> > ./run_tests_local.sh. >>> >>> >>> Is it possible that on the same machine you have installed an older vdsm >>> version >>> and it somehow conflicts? (resolving vdsm from the site-packages instead >>> from >>> the local workspace) >> >> >> Or maybe you have *.pyc from an older directory structure left in your >> working directory? > > > I think this was the issue. Removing *.pyc from the source tree fixed > it. Thanks! git clean -dxf is very useful from time to time Yet very dangerous in another times (yes, once upon a time I had the only copy of a helper script hiding within the leafs of a git tree) Been there too. Maybe it's time we fix 'make clean'. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
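The failure mode discussed here -- bytecode left behind after a module moved -- can be cleaned up without the collateral damage of `git clean -dxf`. A sketch of the narrower cleanup (the file names are illustrative):

```python
# Remove only stale compiled bytecode under a tree, leaving untracked
# source files (the "only copy of a helper script" case) untouched.
import os
import tempfile

tree = tempfile.mkdtemp()
# Simulate leftovers from an old layout: a .pyc with no matching .py,
# next to an untracked script that must survive the cleanup.
open(os.path.join(tree, "netinfo.pyc"), "w").close()
open(os.path.join(tree, "helper.py"), "w").close()

for root, _dirs, files in os.walk(tree):
    for name in files:
        if name.endswith(".pyc"):
            os.remove(os.path.join(root, name))

print(sorted(os.listdir(tree)))  # ['helper.py']
```

The shell equivalent is `find . -name '*.pyc' -delete`, and `git clean -dxn` previews what the destructive variant would remove.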
Re: [ovirt-devel] [vdsm] Another network test failing
On 15/05/16 15:10 +0300, Dan Kenigsberg wrote: On Sun, May 15, 2016 at 10:33:30AM +0300, Edward Haas wrote: On Tue, May 10, 2016 at 8:19 PM, Adam Litke <ali...@redhat.com> wrote: > On 10/05/16 18:08 +0300, Dan Kenigsberg wrote: > >> On Mon, May 09, 2016 at 02:48:43PM -0400, Adam Litke wrote: >> >>> When running make check on my local system I often (but not always) >>> get the following error: >>> >> >> Do you have any clue related to when this happens? (your pwd, >> pythonpath) >> > > Maybe it's a side effect of the way nose loads and runs tests? > > Did it begin with the recent move of netinfo under vdsm.network? >> https://gerrit.ovirt.org/56713 (CommitDate: Thu May 5) or did you see it >> earlier? >> > > That's possible. It only started happening recently. It seems to > fail only when run under 'make check' but not when run via > ./run_tests_local.sh. Is it possible that on the same machine you have installed an older vdsm version and it somehow conflicts? (resolving vdsm from the site-packages instead from the local workspace) Or maybe you have *.pyc from an older directory structure left in your working directory? I think this was the issue. Removing *.pyc from the source tree fixed it. Thanks! -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [vdsm] Another network test failing
On 10/05/16 18:08 +0300, Dan Kenigsberg wrote: On Mon, May 09, 2016 at 02:48:43PM -0400, Adam Litke wrote: When running make check on my local system I often (but not always) get the following error: Do you have any clue related to when this happens? (your pwd, pythonpath) Maybe it's a side effect of the way nose loads and runs tests? Did it begin with the recent move of netinfo under vdsm.network? https://gerrit.ovirt.org/56713 (CommitDate: Thu May 5) or did you see it earlier? That's possible. It only started happening recently. It seems to fail only when run under 'make check' but not when run via ./run_tests_local.sh. $ rpm -qa | grep libvirt-python libvirt-python-1.2.18-1.fc23.x86_64 == ERROR: Failure: ImportError (No module named 'libvirt') -- Traceback (most recent call last): File "/usr/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest raise self.exc_val.with_traceback(self.tb) File "/usr/lib/python3.4/site-packages/nose/loader.py", line 418, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.4/imp.py", line 235, in load_module return load_source(name, filename, file) File "/usr/lib64/python3.4/imp.py", line 171, in load_source module = methods.load() File "", line 1220, in load File "", line 1200, in _load_unlocked File "", line 1129, in _exec File "", line 1471, in exec_module File "", line 321, in _call_with_frames_removed File "/home/alitke/src/vdsm/tests/network/models_test.py", line 27, in from vdsm.network.netinfo import bonding, mtus File "/home/alitke/src/vdsm/lib/vdsm/network/netinfo/__init__.py", line 26, in from vdsm import libvirtconnection File "/home/alitke/src/vdsm/lib/vdsm/libvirtconnection.py", line 29, in import libvirt ImportError: No 
module named 'libvirt' -- Ran 234 tests in 11.084s FAILED (SKIP=42, errors=1) -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
[ovirt-devel] [vdsm] Another network test failing
When running make check on my local system I often (but not always) get the following error: $ rpm -qa | grep libvirt-python libvirt-python-1.2.18-1.fc23.x86_64 == ERROR: Failure: ImportError (No module named 'libvirt') -- Traceback (most recent call last): File "/usr/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest raise self.exc_val.with_traceback(self.tb) File "/usr/lib/python3.4/site-packages/nose/loader.py", line 418, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/usr/lib64/python3.4/imp.py", line 235, in load_module return load_source(name, filename, file) File "/usr/lib64/python3.4/imp.py", line 171, in load_source module = methods.load() File "", line 1220, in load File "", line 1200, in _load_unlocked File "", line 1129, in _exec File "", line 1471, in exec_module File "", line 321, in _call_with_frames_removed File "/home/alitke/src/vdsm/tests/network/models_test.py", line 27, in from vdsm.network.netinfo import bonding, mtus File "/home/alitke/src/vdsm/lib/vdsm/network/netinfo/__init__.py", line 26, in from vdsm import libvirtconnection File "/home/alitke/src/vdsm/lib/vdsm/libvirtconnection.py", line 29, in import libvirt ImportError: No module named 'libvirt' -- Ran 234 tests in 11.084s FAILED (SKIP=42, errors=1) -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] Vdsm api package
On 29/03/16 21:01 +0300, Nir Soffer wrote: Hi all, In the Vdsm call, we discussed a way to expose vdsm errors to its clients (e.g., engine, hosted engine agent/setup). The idea is to have a vdsmapi package, holding: - errors.py - all public errors - events.py - all events sent by vdsm - client.py - library for communicating with vdsm - schema.py - the client will use this to autogenerate online help and validate messages - schema.yaml - we probably need several files (gluster, events, etc.) This will allow other projects talking with vdsm to do: from vdsmapi import client, errors ... try: client.list(vmId="xxxyyy") except errors.NoSuchVM: ... (this is a fake example, the real api may be different) Engine can build-require vdsmapi, and generate a Java module with the public errors from the vdsmapi/errors.py module, instead of keeping this hardcoded in engine and updating it every time vdsm adds a new public error. Vdsm will use this package when building responses to clients. Edi was concerned about sharing the errors module, so maybe we need a package: vdsmapi/ errors/ network.py virt.py storage.py gluster.py We can still expose all the errors via errors/__init__.py, so clients do not have to care about the area of the application the error comes from. Thoughts? Seems like a fantastic idea. Would engine builds of the master branch always fetch the errors module from vdsm's master branch or would there be some synchronization points? -- Adam Litke
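The flat-namespace idea -- per-area error modules re-exported from `errors/__init__.py` -- can be sketched in a few lines (all class and module names here are illustrative, echoing the "fake example" caveat above; the real vdsm error hierarchy may differ):

```python
# Per-area modules (errors/storage.py, errors/virt.py, ...) would each
# define their exceptions; errors/__init__.py re-exports them so a client
# can catch errors without knowing which area of vdsm raised them.

class VdsmError(Exception):
    """Base class every public vdsm error would share."""

class NoSuchVM(VdsmError):           # would live in errors/virt.py
    pass

class StorageDomainDown(VdsmError):  # would live in errors/storage.py
    pass

# Client code needs only the flat namespace and the shared base class:
def lookup_vm(vm_id):
    raise NoSuchVM(vm_id)

try:
    lookup_vm("xxxyyy")
except VdsmError as e:
    print(type(e).__name__)  # NoSuchVM
```

A shared base class also answers the synchronization question in part: engine-side code generation only needs the public class names and hierarchy, not the implementation.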
[ovirt-devel] oVirt presentations with reveal.js
I recently gave an oVirt talk at FOSDEM and decided to use reveal.js[1] to create my slides[2]. Reveal.js is a really slick framework for creating HTML5-based presentations that look clean and modern. You write your slides in HTML markup and a JavaScript library provides everything else (it's a long list of features, so see the demo presentation for details[3]). I was pretty happy with the oVirt-themed slides I created, so I decided to package them up into a reusable template. To try it out simply: 1. git clone https://github.com/aglitke/reveal.js.git ovirt-template 2. cd ovirt-template 3. git checkout -b ovirt-template 4. firefox index.html The main changes I have made are to add a floating oVirt logo to the upper-right corner of the slides and a footer. Both can display while presenting and when the slides are printed. I am not a graphic designer or a web developer, so I am keen to accept contributions from people with more experience in this area. Enjoy! [1] https://github.com/hakimel/reveal.js [2] http://aglitke.github.io/fosdem-2016/#/ [3] http://lab.hakim.se/reveal-js/#/ -- Adam Litke
Re: [ovirt-devel] Changing the name of VDSM in oVirt 4.0.
On 26/01/16 19:26 +0200, Nir Soffer wrote: On Tue, Jan 26, 2016 at 5:29 PM, Yaniv Dary <yd...@redhat.com> wrote: I suggest for ease of use and tracking we change the versioning to align to the engine (4.0.0 in oVirt 4.0 GA) to make it easy to know which version was in which release and also change the package naming to something like ovirt-host-manager\ovirt-host-agent. When we think about the names, we should consider all the components installed or running on the host. Here is the current names and future options: Also consider that we have discussed breaking vdsmd into its sub-components. In that case we'd need names for: vdsm-storage vdsm-virt vdsm-network etc I am thinking of vdsm as a service provider to the engine. Today it provides a virtualization hypervisor, a storage repository, network configuration services, etc. I think using the word 'provider' is too long (and possibly too vague). We could just make up something to represent the concept of an endpoint that ovirt-engine uses to get things done. For example, an engine often connects to gears to get things done (but gear is already taken by OpenShift, sadly). How about ovirt-minion? :) ovirt-target? ovirt-element? ovirt-unit? Also consider that an abbreviation or acronym is still okay. Thanks for reading to the bottom of my pre-coffee stream of consciousness. Of the alternatives listed below, I'd be inclined to support 'ovirt-host*'. Current names: vdsmd supervdsmd vdsm-tool vdsClient (we have also two hosted engine daemons, I don't remember the names) Here are some options in no particular order to name these components: Alt 1: ovirt-hypervisor ovirt-hypervisor-helper ovirt-hypervisor-tool ovirt-hyperviosr-cli Alt 2: ovirt-host ovirt-host-helper ovirt-host-tool ovirt-host-cli Alt 3: ovirt-agent ovirt-agent-helper ovirt-agent-tool ovirt-agent-cli Thoughts? Nir -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
[ovirt-devel] Merging patches into vdsm without CI
Hi all, Recent breakage in the vdsm CI flows has caused the change upload trigger to be disabled, so CI scores are no longer being automatically applied to uploaded changes. As a result, patches cannot be merged into vdsm. I have a queue of patches which are otherwise ready for merge (which have passed CI in the past but needed rebasing). These patches have been stalled for almost a week now. What can we do to unfreeze the vdsm development process in the short and long term? Earlier today I worked with Sandro and David on manually running CI on my dev machine but am getting 100s of failures (so it looks like this won't even be a good short-term solution). -- Adam Litke
Re: [ovirt-devel] Vdsm: extending maintainers team
On 04/08/15 09:58 +0100, Dan Kenigsberg wrote: If you follow Vdsm development, you probably have noticed that we are short of active maintainers. Thankfully, we have great developers that - in my opinion - can fill that gap. I am impressed by the quality of their reviews, their endurance, and most importantly - their ability to unbreak whatever code they approve. I'd like to nominate - Nir Soffer - for storage - Francesco Romani - for virt - Piotr Kliczewski - for infra For the time being, I would like to keep my own single point of merger (unless I'm away, of course). Active and former maintainers: please approve A big +2 from me! This is really needed and Nir, Francesco, and Piotr are absolutely the right candidates for maintainership. (My apologies for the delay in responding as I was on PTO.) -- Adam Litke
Re: [ovirt-devel] Stomp regression in vdsm master
On 12/06/15 15:13 +0200, Piotr Kliczewski wrote: On Fri, Jun 12, 2015 at 3:11 PM, Adam Litke ali...@redhat.com wrote: On 12/06/15 11:46 +0200, Piotr Kliczewski wrote: On Fri, Jun 12, 2015 at 11:28 AM, Michal Skrivanek michal.skriva...@redhat.com wrote: On 12 Jun 2015, at 02:38, Adam Litke wrote: On 09/06/15 08:41 +0200, Piotr Kliczewski wrote: Adam, Thank you for reporting. There is work in parallel on the engine side so please refresh your engine as well. The changes that you listed should work with engine 3.5 but will fail as you described for older master. I upgraded engine to latest master (72368f3) and vdsm as well (718909e) and connections were still completely broken between my engine and vdsm until I reverted https://gerrit.ovirt.org/#/c/38451/ . I think there is something real here. I got similar reports from Omer as of yesterday ~noon, both sides latest Is vdsm-jsonrpc-java latest? I have the following in my local maven repo: $ find ~/.m2 -name \*jsonrpc\*.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.1-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.1-SNAPSHOT.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.1-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.1-20150420.133832-3.jar above is the latest merged. Can you share your logs? Attached. /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.0.15/vdsm-jsonrpc-java-client-1.0.15.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.0-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.0-SNAPSHOT.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.0-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.0-20150407.125052-6.jar Thanks, Piotr On Mon, Jun 8, 2015 at 10:54 PM, Adam Litke ali...@redhat.com wrote: Hi Piotr, Today I refreshed my vdsm master branch and got the 4 commits at the bottom of this email (among others). 
My engine started having connection timeouts to vdsm (100% connectivity failure). Reverting the commits resolved the problem for me. I don't have logs at the moment but just wanted to share this info in case anyone else started experiencing connectivity problems to vdsm. 14897fea06e8f21ae99144ee0294b21e08ea0892 stomp: calling super explicitly ed12db391f2f147443baf52b5519d51ad5bd3410 stomp: allow single stomp reactor ac85274145cd82eec804e3585b3cd12a6c13261a stompreactor: fix naming of default destination c80ab0657d4f0454c3141aadeadcf134e5f16de7 stomp: server side subscriptions -- Adam Litke -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel -- Adam Litke -- Adam Litke ovirt-logs.tgz Description: application/gzip ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] Stomp regression in vdsm master
On 12/06/15 15:36 +0200, Piotr Kliczewski wrote: On Fri, Jun 12, 2015 at 3:23 PM, Adam Litke ali...@redhat.com wrote: On 12/06/15 15:13 +0200, Piotr Kliczewski wrote: On Fri, Jun 12, 2015 at 3:11 PM, Adam Litke ali...@redhat.com wrote: On 12/06/15 11:46 +0200, Piotr Kliczewski wrote: On Fri, Jun 12, 2015 at 11:28 AM, Michal Skrivanek michal.skriva...@redhat.com wrote: On 12 Jun 2015, at 02:38, Adam Litke wrote: On 09/06/15 08:41 +0200, Piotr Kliczewski wrote: Adam, Thank you for reporting. There is work in parallel on the engine side so please refresh your engine as well. The changes that you listed should work with engine 3.5 but will fail as you described for older master. I upgraded engine to latest master (72368f3) and vdsm as well (718909e) and connections were still completely broken between my engine and vdsm until I reverted https://gerrit.ovirt.org/#/c/38451/ . I think there is something real here. I got similar reports from Omer as of yesterday ~noon, both sides latest Is vdsm-jsonrpc-java latest? I have the following in my local maven repo: $ find ~/.m2 -name \*jsonrpc\*.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.1-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.1-SNAPSHOT.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.1-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.1-20150420.133832-3.jar above is the latest merged. Can you share your logs? Attached. Looking at the logs I do not see any issues but I do not see any processed messages on vdsm side. Please apply this patch [1] it should solve this issue. Indeed it does. Thanks! 
[1] https://gerrit.ovirt.org/#/c/38819 /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.0.15/vdsm-jsonrpc-java-client-1.0.15.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.0-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.0-SNAPSHOT.jar /home/alitke/.m2/repository/org/ovirt/vdsm-jsonrpc-java/vdsm-jsonrpc-java-client/1.1.0-SNAPSHOT/vdsm-jsonrpc-java-client-1.1.0-20150407.125052-6.jar Thanks, Piotr On Mon, Jun 8, 2015 at 10:54 PM, Adam Litke ali...@redhat.com wrote: Hi Piotr, Today I refreshed my vdsm master branch and got the 4 commits at the bottom of this email (among others). My engine started having connection timeouts to vdsm (100% connectivity failure). Reverting the commits resolved the problem for me. I don't have logs at the moment but just wanted to share this info in case anyone else started experiencing connectivity problems to vdsm. 14897fea06e8f21ae99144ee0294b21e08ea0892 stomp: calling super explicitly ed12db391f2f147443baf52b5519d51ad5bd3410 stomp: allow single stomp reactor ac85274145cd82eec804e3585b3cd12a6c13261a stompreactor: fix naming of default destination c80ab0657d4f0454c3141aadeadcf134e5f16de7 stomp: server side subscriptions -- Adam Litke -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel -- Adam Litke -- Adam Litke -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] Stomp regression in vdsm master
On 09/06/15 08:41 +0200, Piotr Kliczewski wrote: Adam, Thank you for reporting. There is work in parallel on the engine side so please refresh your engine as well. The changes that you listed should work with engine 3.5 but will fail as you described for older master. I upgraded engine to latest master (72368f3) and vdsm as well (718909e) and connections were still completely broken between my engine and vdsm until I reverted https://gerrit.ovirt.org/#/c/38451/ . I think there is something real here. Thanks, Piotr On Mon, Jun 8, 2015 at 10:54 PM, Adam Litke ali...@redhat.com wrote: Hi Piotr, Today I refreshed my vdsm master branch and got the 4 commits at the bottom of this email (among others). My engine started having connection timeouts to vdsm (100% connectivity failure). Reverting the commits resolved the problem for me. I don't have logs at the moment but just wanted to share this info in case anyone else started experiencing connectivity problems to vdsm. 14897fea06e8f21ae99144ee0294b21e08ea0892 stomp: calling super explicitly ed12db391f2f147443baf52b5519d51ad5bd3410 stomp: allow single stomp reactor ac85274145cd82eec804e3585b3cd12a6c13261a stompreactor: fix naming of default destination c80ab0657d4f0454c3141aadeadcf134e5f16de7 stomp: server side subscriptions -- Adam Litke
[ovirt-devel] Stomp regression in vdsm master
Hi Piotr, Today I refreshed my vdsm master branch and got the 4 commits at the bottom of this email (among others). My engine started having connection timeouts to vdsm (100% connectivity failure). Reverting the commits resolved the problem for me. I don't have logs at the moment but just wanted to share this info in case anyone else started experiencing connectivity problems to vdsm. 14897fea06e8f21ae99144ee0294b21e08ea0892 stomp: calling super explicitly ed12db391f2f147443baf52b5519d51ad5bd3410 stomp: allow single stomp reactor ac85274145cd82eec804e3585b3cd12a6c13261a stompreactor: fix naming of default destination c80ab0657d4f0454c3141aadeadcf134e5f16de7 stomp: server side subscriptions -- Adam Litke
Re: [ovirt-devel] [ACTION NEEDED] Packages for 3.5.3-RC1
On 14/05/15 11:46 +0200, Sandro Bonazzola wrote: # Please review the list of rpms / jobs: # ovirt-hosted-engine-ha-1.2.6 http://jenkins.ovirt.org/job/ovirt-hosted-engine-ha_any_create-rpms_manual/7/ # otopi-1.3.2 http://jenkins.ovirt.org/job/otopi_any_create-rpms_manual/16/ # ovirt-engine-dwh-3.5.3_rc http://jenkins.ovirt.org/job/manual-build-tarball/516/ # ovirt-engine-reports-3.5.3_rc http://jenkins.ovirt.org/job/manual-build-tarball/517/ # qemu-kvm-ev-2.1.2-23.el7_1.3 http://jenkins.ovirt.org/job/qemu_master_create-rpms-el7-x86_64_merged/3/ # qemu-kvm-rhev-0.12.1.2-2.448.el6_6.3 http://jenkins.ovirt.org/job/qemu-kvm-rhev_create-rpms_el6/506/ # ovirt-log-collector-3.5.3-0.1.master.git8b7826f http://jenkins.ovirt.org/job/ovirt-log-collector_3.5_create-rpms-fc20-x86_64_merged/25/ http://jenkins.ovirt.org/job/ovirt-log-collector_3.5_create-rpms-el7-x86_64_merged/15/ http://jenkins.ovirt.org/job/ovirt-log-collector_3.5_create-rpms-el6-x86_64_merged/26/ # ovirt-hosted-engine-setup-1.2.4-0.0.master.git62654a6 http://jenkins.ovirt.org/job/ovirt-hosted-engine-setup_3.5_create-rpms-fc20-x86_64_merged/60/ http://jenkins.ovirt.org/job/ovirt-hosted-engine-setup_3.5_create-rpms-el7-x86_64_merged/55/ http://jenkins.ovirt.org/job/ovirt-hosted-engine-setup_3.5_create-rpms-el6-x86_64_merged/61/ # ovirt-engine-3.5.3_rc1 http://jenkins.ovirt.org/job/manual-build-tarball/518/ # vdsm-4.16.16 http://jenkins.ovirt.org/job/manual-build-tarball/523/ #mom, to be released in Fedora and EPEL #ACTION: Adam, please provide the tarball to be released in src. See http://jenkins.ovirt.org/job/manual-build-tarball/524/ Fedora updates are in-progress... #optimizer http://jenkins.ovirt.org/job/ovirt-optimizer_master_create-rpms_merged/72/ # ovirt-node-plugin-hosted-engine # ACTION Fabian / Douglas to provide version to be released. -- Sandro Bonazzola Better technology. Faster innovation. Powered by community collaboration. 
See how it works at redhat.com -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] [vdsm][mom][jsonrpc] new VDSM interface for MOM
On 13/05/15 22:53 +0200, Piotr Kliczewski wrote: On Wed, May 13, 2015 at 9:52 PM, Adam Litke ali...@redhat.com wrote: On 11/05/15 04:28 -0400, Francesco Romani wrote: Hi everyone, I'm working to brush up and enhance my old hack https://gerrit.ovirt.org/#/c/37827/1 That patch adds a new MOM interface, to talk with VDSM using the RPC interface. On top of that, I want to make efficient use of VDSM API (avoid redundant call, possibly issuing only one getAllVmStats call and caching the results, and so forth) Next step will be to backport optimizations to current vdsmInterface. Or maybe, even replacing the new vdsminterface with the new one I'm developing :) I'd like to use the blessed JSON-RPC interface, but what's the recommended way to do that? What is (or will be!) the official recommended VDSM external client interface? I thought about patch https://gerrit.ovirt.org/#/c/39203/ But my _impression_ is that patch will depend on VDSM's internal reactor, thus is not very suitable to be used into an external process. I've written my own extremely crude client using the stomp library. Nir also has a patch [1] on gerrit to do this. Maybe he can provide some insight. It'd be nice if the vdsm-yajsonrpc package could provide a full-featured client class that could be easily integrated into projects like MOM. I will try to provide simple client for people to use. Thanks Piotr! I'm sure you can come up with a much more elegant way to stitch the existing classes together to do what we need. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
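Until such a client library exists, the message shape itself is plain JSON-RPC 2.0. A sketch of framing and parsing one request (the method name is an illustrative vdsm-style verb, and the STOMP transport is omitted entirely):

```python
import json
import uuid

# A JSON-RPC 2.0 request as a vdsm client would frame it; the transport
# layer (STOMP over TCP) is out of scope for this sketch.
request = {
    "jsonrpc": "2.0",
    "method": "Host.getAllVmStats",  # illustrative method name
    "params": {},
    "id": str(uuid.uuid4()),
}
wire = json.dumps(request)

# A server reply pairs the same id with either "result" or "error",
# which is what lets a client correlate async responses.
decoded = json.loads(wire)
assert decoded["id"] == request["id"]
print(decoded["method"])  # Host.getAllVmStats
```

Because responses carry the request id, a client class mainly needs a dispatch table from pending ids to callers on top of whatever transport it uses.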
Re: [ovirt-devel] [vdsm][mom][jsonrpc] new VDSM interface for MOM
On 11/05/15 04:28 -0400, Francesco Romani wrote: Hi everyone, I'm working to brush up and enhance my old hack https://gerrit.ovirt.org/#/c/37827/1 That patch adds a new MOM interface, to talk with VDSM using the RPC interface. On top of that, I want to make efficient use of VDSM API (avoid redundant call, possibly issuing only one getAllVmStats call and caching the results, and so forth) Next step will be to backport optimizations to current vdsmInterface. Or maybe, even replacing the new vdsminterface with the new one I'm developing :) I'd like to use the blessed JSON-RPC interface, but what's the recommended way to do that? What is (or will be!) the official recommended VDSM external client interface? I thought about patch https://gerrit.ovirt.org/#/c/39203/ But my _impression_ is that patch will depend on VDSM's internal reactor, thus is not very suitable to be used into an external process. I've written my own extremely crude client using the stomp library. Nir also has a patch [1] on gerrit to do this. Maybe he can provide some insight. It'd be nice if the vdsm-yajsonrpc package could provide a full-featured client class that could be easily integrated into projects like MOM. [1] https://gerrit.ovirt.org/#/c/35181/ -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] Engine on Fedora 21
On 02/02/15 15:41 -0500, Greg Sheremeta wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Juan Hernández jhern...@redhat.com Cc: devel@ovirt.org Sent: Monday, February 2, 2015 2:26:29 PM Subject: Re: [ovirt-devel] Engine on Fedora 21 On 02/02/15 13:55 +0100, Juan Hernández wrote: On 02/02/2015 07:56 AM, Sandro Bonazzola wrote: Il 29/01/2015 22:30, Adam Litke ha scritto: On 29/01/15 16:18 -0500, Yedidyah Bar David wrote: - Original Message - From: Adam Litke ali...@redhat.com To: devel@ovirt.org Sent: Thursday, January 29, 2015 9:46:27 PM Subject: [ovirt-devel] Engine on Fedora 21 Hi all, Today I tried running engine on my Fedora 21 laptop. I tried two approaches for deploying jboss: using the ovirt-jboss-as package, and by downloading and unpacking jboss-7.1.1 into /usr/share as I have done in the past. engine-setup runs without errors but when I try to start engine the application does not seem to deploy in jboss and there are no errors reported (engine.log is empty). Is there a reasonable expectation that I should be able to get this working on F21 or am I wasting my time? Does anyone have any ideas on how I can resolve the startup issues? Which Java version did you try to use it with? java-1.8.0-openjdk-1.8.0.31-3.b13.fc21.x86_64 Did you have a look at [1]? In short: won't be, wait for f22. Yeah, didn't see much documentation of specific issues and the tracker bug looks pretty clean as far as general engine usability goes. Everything should be installable right now in F21 but jboss-as 7.1 doesn't work with java 1.8. We'll need to move to wildfly or backport java7 in order to make it working. Alternatively, if it is for development purposes only, you may want to consider using JBoss EAP 6.x instead of JBoss AS 7.1.1. 
The root cause of the incompatibility has been fixed there (and in WildFly): https://issues.jboss.org/browse/WFLY-2057 You can get JBoss EAP from here: http://www.jboss.org/products/eap/download Then you can unzip it to your favorite directory and use it during installation of oVirt Engine: # engine-setup --jboss-home=/whatever/jboss-eap-6.3 It should work well with Java 8. If it doesn't work it is good to know, as we will need to fix it eventually. Thanks for the suggestions everyone. I ended up installing openJDK-1.7 alongside the stock 1.8 and it's working again for me. via yum or did you just download it? Both :) I had to use yumdownloader to grab the rpms from the f20 repo and then install them using rpm --nodeps (since the 1.8 rpms from f21 have an Obsoletes: openjdk-1.7). Perhaps depending on your answer, can't this fulfill the F21 support feature? I'll try to give those other JBoss versions a try soon though. -- Adam Litke
[ovirt-devel] Engine on Fedora 21
Hi all,

Today I tried running engine on my Fedora 21 laptop. I tried two approaches for deploying jboss: using the ovirt-jboss-as package, and by downloading and unpacking jboss-7.1.1 into /usr/share as I have done in the past. engine-setup runs without errors, but when I try to start engine the application does not seem to deploy in jboss and there are no errors reported (engine.log is empty). Is there a reasonable expectation that I should be able to get this working on F21, or am I wasting my time? Does anyone have any ideas on how I can resolve the startup issues? See logs attached...

-- Adam Litke

14:42:59,814 INFO [org.jboss.modules] JBoss Modules version 1.1.1.GA
14:42:59,962 INFO [org.jboss.msc] JBoss MSC version 1.0.2.GA
14:42:59,994 INFO [org.jboss.as] JBAS015899: JBoss AS 7.1.1.Final Brontes starting
14:43:02,359 INFO [org.xnio] XNIO Version 3.0.3.GA
14:43:02,369 INFO [org.jboss.as.logging] JBAS011502: Removing bootstrap log handlers
14:44:55,300 INFO [org.jboss.as.logging] JBAS011503: Restored bootstrap log handlers
14:44:55,340 INFO [com.arjuna.ats.jbossatx] ARJUNA032018: Destroying TransactionManagerService
14:44:55,341 INFO [com.arjuna.ats.jbossatx] ARJUNA032014: Stopping transaction recovery manager
14:44:55,832 INFO [org.apache.coyote.http11.Http11Protocol] Pausing Coyote HTTP/1.1 on http--0.0.0.0-8443
14:44:55,832 INFO [org.apache.coyote.http11.Http11Protocol] Stopping Coyote HTTP/1.1 on http--0.0.0.0-8443
14:44:55,832 INFO [org.apache.coyote.http11.Http11Protocol] Pausing Coyote HTTP/1.1 on http--0.0.0.0-8080
14:44:55,834 INFO [org.apache.coyote.http11.Http11Protocol] Stopping Coyote HTTP/1.1 on http--0.0.0.0-8080
14:45:01,218 INFO [org.jboss.as] JBAS015950: JBoss AS 7.1.1.Final Brontes stopped in 5933ms
2015-01-29 14:43:02,379 INFO [org.xnio.nio] (MSC service thread 1-1) XNIO NIO Implementation Version 3.0.3.GA
2015-01-29 14:43:02,381 INFO [org.jboss.as.security] (ServerService Thread Pool -- 31) JBAS013101: Activating Security Subsystem
2015-01-29 14:43:02,383 INFO [org.jboss.as.security] (MSC service thread 1-4) JBAS013100: Current PicketBox version=4.0.7.Final
2015-01-29 14:43:02,486 INFO [org.jboss.as.naming] (ServerService Thread Pool -- 28) JBAS011800: Activating Naming Subsystem
2015-01-29 14:43:02,488 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 23) JBAS010280: Activating Infinispan subsystem.
2015-01-29 14:43:02,709 INFO [org.jboss.as.connector] (MSC service thread 1-13) JBAS010408: Starting JCA Subsystem (JBoss IronJacamar 1.0.9.Final)
2015-01-29 14:43:03,353 INFO [org.jboss.remoting] (MSC service thread 1-2) JBoss Remoting version 3.2.3.GA
2015-01-29 14:43:04,106 INFO [org.jboss.as.connector.subsystems.datasources] (ServerService Thread Pool -- 19) JBAS010404: Deploying non-JDBC-compliant driver class org.postgresql.Driver (version 9.1)
2015-01-29 14:43:04,111 INFO [org.jboss.as.naming] (MSC service thread 1-10) JBAS011802: Starting Naming Service
2015-01-29 14:43:05,082 INFO [org.jboss.as.remoting] (MSC service thread 1-12) JBAS017100: Listening on /127.0.0.1:8703
2015-01-29 14:43:05,152 INFO [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-3) Starting Coyote HTTP/1.1 on http--0.0.0.0-8080
2015-01-29 14:43:06,622 INFO [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-15) Starting Coyote HTTP/1.1 on http--0.0.0.0-8443
2015-01-29 14:43:06,669 INFO [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-13) JBAS010400: Bound data source [java:/ENGINEDataSourceNoJTA]
2015-01-29 14:43:06,670 INFO [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-13) JBAS010400: Bound data source [java:/ENGINEDataSource]

___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
[ovirt-devel] mom-0.4.3
Hi all,

A recent commit [1] to vdsm (a99bacf) introduced a dependency on mom-0.4.3 but mom-0.4.3 does not yet exist. To work around this problem you may build a pre-release src.rpm [2] of mom-0.4.3 that includes the needed functionality. Once we have enough content for a mom point release I'll build the official upstream packages and release an update. Sorry for the inconvenience.

[1] http://gerrit.ovirt.org/#/c/35407/
[2] http://people.redhat.com/~alitke/mom-0.4.3-0.0.aglpre.fc20.src.rpm

-- Adam Litke
Re: [ovirt-devel] contEIOVMs regression?
On 21/11/14 10:03 -0500, Nir Soffer wrote:

- Original Message -
From: Adam Litke ali...@redhat.com
To: Nir Soffer nsof...@redhat.com
Cc: devel@ovirt.org, Francesco Romani from...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Dan Kenigsberg dan...@redhat.com
Sent: Friday, November 21, 2014 4:46:13 PM
Subject: Re: contEIOVMs regression?

On 20/11/14 17:37 -0500, Nir Soffer wrote:

- Original Message -
From: Adam Litke ali...@redhat.com
To: devel@ovirt.org
Cc: Nir Soffer nsof...@redhat.com, Francesco Romani from...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Dan Kenigsberg dan...@redhat.com
Sent: Thursday, November 20, 2014 9:15:33 PM
Subject: contEIOVMs regression?

Hi list,

I am taking a look at Bug 1157421 [1] which describes a situation where VMs that are paused with an -EIO error are not automatically resumed after the problem with storage has been corrected. I have some patches [2] on gerrit that resolve the problem. Since this appears to be a regression I am looking at a non-intrusive way to fix it in the 3.5 branch. There is some disagreement on the proper way to fix this so I am hoping we can arrive at a solution through an open discussion.

The main issue at hand is with the Event/Callback mechanism we use to call clientIF.contEIOVMs. According to my experiments and this online discussion [3], weakref does not work for instance methods such as clientIF.contEIOVMs. Our Event class uses weakref to prevent it from holding references to registered callback functions.

Why is making the event system more correct required to fix [1]?

I see two easy ways to fix the regression:

I don't follow, what is the regression?

Assuming that at some point contEIOVMs actually worked and was able to automatically resume VMs, then we have a regression because, given the weakref problems I am describing herein, there is no way that it is working now. The only way we don't have a regression is if this code has never worked to begin with.
The current code in master does work - when I fixed this last time, the problem was that we did not register the callback before starting the monitors, and that the monitors did not issue a state change the first time a monitor checked the domain state. I verified that contEIOVMs is called and that it does try to continue VMs.

Very curious. I am working with 3.5.0. The main difference (other than branch) is that I am working in an environment with no connected storage pool. Though I still can't see how the weakref stuff could be working in master.

If this does not break now (with current code), please open a bug.

1) Treat clientIF as a singleton class (which it is) and make contEIOVMs a module-level method which gets the clientIF instance and calls its bound contEIOVMs method. See my patches [2] for the code behind this idea.

This is the wrong direction. There is only one place using that horrible getInstance(), and it also could just create the single instance that we need. We should remove getInstance() instead of using it in new code.

2) Allow Event to maintain a strong reference on the bound clientIF.contEIOVMs method. This will allow the current code to work as designed but will change the Event implementation to accommodate this specific use case. Since no one else appears to be using this code, it should have no functional impact.

The code is already holding a strong reference now, no change is needed :-)

I disagree. From vdsm/storage/misc.py:

class Event(object):
    ...
    def register(self, func, oneshot=False):
        with self._syncRoot:
            self._registrar[id(func)] = (weakref.ref(func), oneshot)
            # ^^^ He's dead Jim
    ...

The function is converted into a weak reference. Since, in this case, the function is an instance method, the reference is immediately dead on arrival. I have verified this with debugging statements in my environment.

So you suggest that taking a weakref to an instance method returns a dead reference?
I thought that the problem is that an instance method keeps a hard reference to the instance, so the weakref is useless.

Yeah, try out this test program to see what I mean:

#!/usr/bin/env python
import weakref
from functools import partial

class A(object):
    def __init__(self):
        self.r1 = weakref.ref(self.a)
        self.r2 = partial(A.a, weakref.proxy(self))

    def a(self):
        print "Hello from a"

def main():
    obj = A()
    print obj.r1
    obj.r2()

if __name__ == '__main__':
    main()

-- Adam Litke
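The dead-on-arrival behavior discussed above, and one stdlib way around it that still avoids keeping the instance alive, can be sketched in a few lines. This is an illustrative sketch only: ClientIF and contEIOVMs here are hypothetical stand-ins mirroring the names in the thread, not the real vdsm classes, and it uses Python 3's weakref.WeakMethod rather than anything vdsm actually did.

```python
import weakref


class ClientIF:
    """Hypothetical stand-in for vdsm's clientIF singleton."""

    def __init__(self):
        self.resumed = []

    def contEIOVMs(self):
        self.resumed.append("vm")


cif = ClientIF()

# A plain weakref to a bound method is dead on arrival: each attribute
# access builds a fresh bound-method object, which is collected as soon
# as weakref.ref() returns (CPython refcounting).
dead = weakref.ref(cif.contEIOVMs)
assert dead() is None

# weakref.WeakMethod re-binds the method on demand: the callback stays
# usable while the instance lives, without the registry keeping the
# instance alive.
cb = weakref.WeakMethod(cif.contEIOVMs)
method = cb()
assert method is not None
method()
assert cif.resumed == ["vm"]

# Once the instance goes away, the weak method dies with it.
del method, cif
assert cb() is None
```

An Event-style registry could store `WeakMethod` objects for bound methods and plain `weakref.ref` for free functions, which is roughly the behavior the original `weakref.ref(func)` line was presumably aiming for.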
[ovirt-devel] contEIOVMs regression?
Hi list,

I am taking a look at Bug 1157421 [1] which describes a situation where VMs that are paused with an -EIO error are not automatically resumed after the problem with storage has been corrected. I have some patches [2] on gerrit that resolve the problem. Since this appears to be a regression I am looking at a non-intrusive way to fix it in the 3.5 branch. There is some disagreement on the proper way to fix this so I am hoping we can arrive at a solution through an open discussion.

The main issue at hand is with the Event/Callback mechanism we use to call clientIF.contEIOVMs. According to my experiments and this online discussion [3], weakref does not work for instance methods such as clientIF.contEIOVMs. Our Event class uses weakref to prevent it from holding references to registered callback functions. I see two easy ways to fix the regression:

1) Treat clientIF as a singleton class (which it is) and make contEIOVMs a module-level method which gets the clientIF instance and calls its bound contEIOVMs method. See my patches [2] for the code behind this idea.

2) Allow Event to maintain a strong reference on the bound clientIF.contEIOVMs method. This will allow the current code to work as designed but will change the Event implementation to accommodate this specific use case. Since no one else appears to be using this code, it should have no functional impact.

Are there any other ideas I'm missing? I am aware of plans to refactor this code for 3.6 but I am more interested in a short-term, practical solution to address the current regression. Thanks for offering your insight on this problem.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1157421
[2] http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:bug1157421,n,z

-- Adam Litke
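Option (2) above can be sketched with a toy Event class that simply stores the callback object itself, so a registered bound method (and therefore its instance) stays alive while registered. The names Event, ClientIF, and contEIOVMs mirror the discussion, but this is a minimal illustration under those assumptions, not the vdsm implementation.

```python
import threading


class Event:
    """Toy event: register() keeps a strong reference to the callback."""

    def __init__(self):
        self._lock = threading.Lock()
        self._registrar = {}

    def register(self, func, oneshot=False):
        with self._lock:
            # Strong reference: the bound method object we were handed
            # (and the instance behind it) cannot be collected while it
            # remains registered.
            self._registrar[id(func)] = (func, oneshot)

    def emit(self, *args, **kwargs):
        with self._lock:
            callbacks = list(self._registrar.items())
        for key, (func, oneshot) in callbacks:
            func(*args, **kwargs)
            if oneshot:
                with self._lock:
                    self._registrar.pop(key, None)


class ClientIF:
    """Hypothetical stand-in for the clientIF singleton."""

    def __init__(self):
        self.calls = 0

    def contEIOVMs(self):
        self.calls += 1


cif = ClientIF()
event = Event()
event.register(cif.contEIOVMs)
event.emit()
event.emit()
assert cif.calls == 2
```

The trade-off is exactly the one debated in the thread: the registry now pins clientIF in memory, which is harmless for a process-lifetime singleton but would leak for short-lived objects.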
Re: [ovirt-devel] Unstable network connections after installing vdsm
On 10/11/14 11:19 -0500, Ondřej Svoboda wrote:

Hi Adam, there are some known issues, which depend on the versions of software you are running. If you are using EL7 (or Fedoras?) you may want to switch SELinux to permissive mode and turn off NetworkManager (both are separate problems with bugs open for them [1, 2]). Then give it another go. I think this could be your case. Please begin with NM, which is my suspect.

Using Fedora 20, and I think this was the culprit. On my newly installed machines I have yet to reproduce the problem with NM disabled. Have we considered making the vdsm package conflict with NetworkManager? Or is this just a temporary situation?

If you are on EL6 you might be experiencing the traffic control (tc) utility or even the kernel not supporting certain commands. I finally came around to looking at this problem so I might be able to sort it out (or ask Toni for help). In the meantime, could you let us know what version of VDSM, selinux-policy and NetworkManager you are running? Could you attach /var/log/vdsm/supervdsm.log and /var/log/vdsm/vdsm.log? Does something (NetworkManager!) in the journal seem fishy?

No repro, but here are the package versions:

vdsm-4.16.0-522.git4a3768f.fc20.x86_64
NetworkManager-0.9.9.0-46.git20131003.fc20.x86_64
selinux-policy-3.12.1-193.fc20.noarch

-- Adam Litke
[ovirt-devel] Unstable network connections after installing vdsm
I've been experiencing peculiar and annoying networking behavior on my oVirt development hosts, and I'm hoping someone familiar with vdsm networking configuration can help me get to the bottom of it. My setup is two mini-Dells acting as virt hosts and ovirt engine running on my laptop. The dells get their network config from a cobbler instance running on my laptop, which also provides PXE services. After freshly installing the dells, I get a nice, stable network connection. After installing vdsm, the connection seems to drop occasionally. I have to visit the machine, log into the console, and execute 'dhclient ovirtmgmt'. This fixes the problem again for a while.

Does this sound like anything someone has seen before? What would be the best way to start debugging/diagnosing this issue? Thanks in advance for your responses.

-- Adam Litke
Re: [ovirt-devel] [ovirt-users] OVIRT-3.5-TEST-DAY-3: replace XML-rpc with JSON-rpc
On 17/09/14 15:46 -0400, Francesco Romani wrote:

- Original Message -
From: Francesco Romani from...@redhat.com
To: devel@ovirt.org
Cc: users us...@ovirt.org
Sent: Wednesday, September 17, 2014 5:33:01 PM
Subject: [ovirt-users] OVIRT-3.5-TEST-DAY-3: replace XML-rpc with JSON-rpc

Everything I tried went OK, and logs look good to me. I ran into a few hiccups, which I mention for the sake of completeness:

- VDSM refused to start or run VMs initially: libvirt config included relics from a past environment on the same box, not JSON-rpc's fault. Fixed with a new config and (later) a reboot.
- Trying recovery, Engine took longer than expected to sync up with VDSM. I have no hard data and a feeling is not enough to file a BZ, so I didn't.
- Still trying recovery, one and just one time Engine had stale data from VDSM (reported two VMs as present which actually aren't). Not sure it was related to JSON-rpc, can't reproduce, so didn't file a BZ.

I need to partially amend this statement as, running more benchmarks/profiling, I got this twice in a row:

INFO:root:starting 100 vms
INFO:root:start: serial execution
INFO:root:Starting VM: XS_C000
INFO:root:Starting VM: XS_C001
INFO:root:Starting VM: XS_C002
Traceback (most recent call last):
  File "./observe.py", line 154, in <module>
    data = bench(host, 'XS_C%03i', first, last, api, outfile, mins * 60.)
  File "./observe.py", line 122, in bench
    start(vms)
  File "./observe.py", line 66, in start
    vm.start()
  File "./observe.py", line 54, in start
    self._handle.start()
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 16507, in start
    headers={"Correlation-Id":correlation_id}
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 118, in request
    persistent_auth=self._persistent_auth)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 140, in __doRequest
    persistent_auth=persistent_auth
  File "/usr/lib/python2.7/site-packages/ovirtsdk/web/connection.py", line 134, in doRequest
    raise RequestError, response
ovirtsdk.infrastructure.errors.RequestError:
status: 400
reason: Bad Request
detail: Network error during communication with the Host.

(this is a runner script using the oVirt SDK for Python; source is available on demand and will be published anyway soon[ish])

On engine logs I see something like this: http://fpaste.org/134263/

Since the above is way too vague to file a meaningful BZ, I'm now continuing the investigation to see if there is a bug somewhere or if it's a hiccup of my local environment.

I just want to note that I have been experiencing vague, intermittent jsonRPC issues with my environment also. I have filed 1143042 which I believe to be a symptom of unreliable communication. It seems to me that we have a definite problem to work out.

-- Adam Litke
[ovirt-devel] mom-0.4.2 released (Karma requested)
Hi all, I have released mom-0.4.2 and submitted updates for f20[1], el6[2], and epel7[3]. If you have a spare cycle, please install this new version from the updates-testing repository and add a comment in the fedora updates system. This will help expedite the rollout of this update. Thanks! [1] https://admin.fedoraproject.org/updates/FEDORA-2014-10757/mom-0.4.2-1.fc20 [2] https://admin.fedoraproject.org/updates/mom-0.4.2-1.el6 [3] https://admin.fedoraproject.org/updates/mom-0.4.2-1.el7 -- Adam Litke
Re: [ovirt-devel] mom-0.4.2 released (Karma requested)
On 12/09/14 18:09 +0200, Sven Kieske wrote:

I see no new branch created at http://gerrit.ovirt.org/#/admin/projects/mom,branches ?

This is all still in the master branch. We decided there is no point in creating branches for most releases since we're just releasing straight releases into fedora. If there becomes a need for a stable branch in the future, more will need to change than just a branch in mom. We'll need to ship mom RPMs in oVirt and make sure that vdsm depends on the oVirt version instead of the latest upstream version. Hoping to avoid all of this for now.

Also a changelog would be awesome to have (there seem to be no huge changes since 0.4.1).

Nothing huge here. The purpose of this release is to allow vdsm to bump the version of mom it depends on, for enabling the memory ballooning functional tests.

Thanks for the new release anyway!

My pleasure.

-- Adam Litke
[ovirt-devel] mom-0.4.2 release and oVirt-3.5
Hi all, Dan has asked for a new release of mom (so that vdsm can be sure to depend on the latest code upstream). I would like to do one more release prior to oVirt-3.5 in order to get anything required for 3.5 features in the upstream Fedora/EPEL repos. Is there anything else that will be needed for this release? -- Adam Litke
[ovirt-devel] How do I build oVirt Jenkins jobs locally
Hey guys, I am following http://www.ovirt.org/Local_Jenkins_For_The_People in order to set up a build env for ovirt-engine. I've got the basic setup running but I'd like to be able to build rpms in the same way that we do on oVirt.org. I came across the 'jenkins' repo in gerrit but I can't figure out how to use that to create an XML file for the create_rpms job suitable for import into jenkins. Can anyone point me in the right direction? -- Adam Litke
Re: [ovirt-devel] What does your oVirt development environment look like?
On 21/08/14 09:17 -0400, Greg Sheremeta wrote:

Good idea. Thanks, and thank you for sharing. I work on the UI, so I don't have much of a need for a complex setup. I have the two mini dells, and then I have two much more powerful personal machines that I use for work -- machine 1 (dauntless) is my main development machine, and machine 2 (starbase) is my main home server. I compile and run engine on dauntless, and starbase serves NFS and SMB. I don't have iscsi set up, although I probably should learn this. I use nested virt for all my hosts,

For a Friday afternoon project you might want to check out this easy-to-follow guide for targetcli. It's what I use for software iSCSI and it works pretty well for me: https://wiki.archlinux.org/index.php/ISCSI_Target

so mini dell 1 and mini dell 2 both run Fedora 20 and I basically just remote to them to install VMs via virt-manager. I had cobbler running at one point, but I got frustrated with it one too many times and gave up. Now I just have a giant collection of isos available via NFS (and scattered on the desktops of the mini dells :)) I typically install fresh hosts using the F20 network-install iso. It's a little slower, but very reliable.

Yeah, I am wondering if this would be a better approach (though I really do like the unattended PXE installations I can do with cobbler).

I tend to not need more than one or two database instances at a time. I gave up using my laptop for primary development because I need three monitors on my dev rig, and my laptop supports two max. (I'm currently heartbroken at the lack of USB3 video for linux. See [1].) I basically use my laptop as a remote viewer to dauntless now when I'm working in bed or wanting to sit out on the porch. (RealVNC encrypted mode -- I use an xrandr script to toggle off two of dauntless's monitors, and then I full-screen VNC.) Old pic of my desk: [2]

Wow, I feel really low-tech with my single widescreen monitor here.
Dauntless, starbase, the dells, and all monitors are connected to a giant UPS. Home network equipment is all connected to another UPS. I've given some thought to building a distributed compile of ovirt (specifically the GWT part -- maybe distribute each permutation to worker nodes), but I was under the impression that most people just use their laptop for work. I think a distributed compile would be pretty nice for me, but not sure how many people would use it. ?

I try to compile engine as infrequently as possible. Due to what it does to my running system, I usually reboot afterwards too.

-- Adam Litke
[ovirt-devel] What does your oVirt development environment look like?
Ever since starting to work on oVirt around 3 years ago I've been striving for the perfect development and test environment. I was inspired by Yaniv's recent deep dive on Foreman integration and thought I'd ask people to share their setups and any tips and tricks so we can all become better, more efficient developers.

My setup consists of my main work laptop and two mini-Dell servers. I run the engine on my laptop and I serve NFS and iSCSI (using targetcli) from this system as well. I use the ethernet port on the laptop to connect it to a subnet with the two Dell systems. Some goals for my setup are:

- Easy provisioning of the virt-hosts so I can quickly test on Fedora and CentOS without spending lots of time reinstalling
- Ability to test block and nfs storage
- Automation of test scenarios involving engine and hosts

To help me reach these goals I've deployed cobbler on my laptop and it does a pretty good job at managing PXE boot configurations for my hosts (and VMs) so they can be automatically installed as needed. After viewing Yaniv's presentation, it seems that Foreman/Puppet are the way of the future, but it does seem a bit more involved to set up. I am definitely curious if others are using Foreman in their personal dev/test environment and can offer some insight on how that is working out.

Thanks, and I look forward to reading about more of your setups! If we get enough of these, maybe this could make a good section of the wiki.

-- Adam Litke
Re: [ovirt-devel] What does your oVirt development environment look like?
On 15/08/14 15:57 -0400, Yair Zaslavsky wrote: - Original Message - From: ybronhei ybron...@redhat.com To: Adam Litke ali...@redhat.com, devel@ovirt.org Sent: Friday, August 15, 2014 7:36:23 PM Subject: Re: [ovirt-devel] What does your oVirt development environment look like? On 08/15/2014 09:32 AM, Adam Litke wrote: Ever since starting to work on oVirt around 3 years ago I've been striving for the perfect development and test environment. I was inspired by Yaniv's recent deep dive on Foreman integration and thought I'd ask people to share their setups and any tips and tricks so we can all become better, more efficient developers. My setup consists of my main work laptop and two mini-Dell servers. I run the engine on my laptop and I serve NFS and iSCSI (using targetcli) from this system as well. I use the ethernet port on the laptop to connect it to a subnet with the two Dell systems. Some goals for my setup are: - Easy provisioning of the virt-hosts so I can quickly test on Fedora and CentOS without spending lots of time reinstalling - Ability to test block and nfs storage - Automation of test scenarios involving engine and hosts To help me reach these goals I've deployed cobbler on my laptop and it does a pretty good job at managing PXE boot configurations for my hosts (and VMs) so they can be automatically intalled as needed. After viewing Yaniv's presentation, it seems that Forman/Puppet are the way of the future but it does seem a bit more involved to set up. I am definitely curious if others are using Foreman in their personal dev/test environment and can offer some insight on how that is working out. Thanks, and I look forward to reading about more of your setups! If we get enough of these, maybe this could make a good section of the wiki. 
Happy to hear :) for those who missed - https://www.youtube.com/watch?v=gozX891kYAY

Each one has their own needs and goals I guess, but if you say it might help, I'll never say no to sharing :P I have 3 dells under my desk. I compile the engine a lot and it's heavy for my laptop, so I clone my local working directory and build it on the strongest mini-dell using a local jenkins server (http://www.ovirt.org/Local_Jenkins_For_The_People). The other 2 I use as hypervisors when needed. Provisioning them is done by me manually :/.. cobbler PXE boot could help with an already defined image.. Other than that, I have an nfs mount for storage and a few VMs for compilation and small tests.

Haven't used Jenkins for the People for quite some time; it's awesome though. Yaniv, does your Jenkins build all your local branches?

I don't have much to share; my environment is even simpler. I am sure it's common knowledge, but still a reminder (even if only a new developer can benefit from it, it will be good) - you can create a database schema per each branch you work on, and if you need to switch between branches, you don't have to destroy your current database. Quite helpful, I must say, for someone who works 100% on engine-related stuff.

Thanks for sharing... How do you manage your multiple db schemas? Just with the engine-backup and engine-restore commands?

-- Adam Litke
Re: [ovirt-devel] What does your oVirt development environment look like?
On 15/08/14 16:20 -0400, Alon Bar-Lev wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Yair Zaslavsky yzasl...@redhat.com Cc: devel@ovirt.org Sent: Friday, August 15, 2014 11:17:05 PM Subject: Re: [ovirt-devel] What does your oVirt development environment look like? On 15/08/14 15:57 -0400, Yair Zaslavsky wrote: - Original Message - From: ybronhei ybron...@redhat.com To: Adam Litke ali...@redhat.com, devel@ovirt.org Sent: Friday, August 15, 2014 7:36:23 PM Subject: Re: [ovirt-devel] What does your oVirt development environment look like? On 08/15/2014 09:32 AM, Adam Litke wrote: Ever since starting to work on oVirt around 3 years ago I've been striving for the perfect development and test environment. I was inspired by Yaniv's recent deep dive on Foreman integration and thought I'd ask people to share their setups and any tips and tricks so we can all become better, more efficient developers. My setup consists of my main work laptop and two mini-Dell servers. I run the engine on my laptop and I serve NFS and iSCSI (using targetcli) from this system as well. I use the ethernet port on the laptop to connect it to a subnet with the two Dell systems. Some goals for my setup are: - Easy provisioning of the virt-hosts so I can quickly test on Fedora and CentOS without spending lots of time reinstalling - Ability to test block and nfs storage - Automation of test scenarios involving engine and hosts To help me reach these goals I've deployed cobbler on my laptop and it does a pretty good job at managing PXE boot configurations for my hosts (and VMs) so they can be automatically intalled as needed. After viewing Yaniv's presentation, it seems that Forman/Puppet are the way of the future but it does seem a bit more involved to set up. I am definitely curious if others are using Foreman in their personal dev/test environment and can offer some insight on how that is working out. Thanks, and I look forward to reading about more of your setups! 
If we get enough of these, maybe this could make a good section of the wiki.

Happy to hear :) for those who missed - https://www.youtube.com/watch?v=gozX891kYAY

Each one has their own needs and goals I guess, but if you say it might help, I'll never say no to sharing :P I have 3 dells under my desk. I compile the engine a lot and it's heavy for my laptop, so I clone my local working directory and build it on the strongest mini-dell using a local jenkins server (http://www.ovirt.org/Local_Jenkins_For_The_People). The other 2 I use as hypervisors when needed. Provisioning them is done by me manually :/.. cobbler PXE boot could help with an already defined image.. Other than that, I have an nfs mount for storage and a few VMs for compilation and small tests.

Haven't used Jenkins for the People for quite some time; it's awesome though. Yaniv, does your Jenkins build all your local branches?

I don't have much to share; my environment is even simpler. I am sure it's common knowledge, but still a reminder (even if only a new developer can benefit from it, it will be good) - you can create a database schema per each branch you work on, and if you need to switch between branches, you don't have to destroy your current database. Quite helpful, I must say, for someone who works 100% on engine-related stuff.

Thanks for sharing... How do you manage your multiple db schemas? Just with the engine-backup and engine-restore commands?

Just create N empty databases, install each environment to a different PREFIX, and when running engine-setup select one for each environment.

Even better. Thank you!

Refer to README.developer in the engine repo. BTW: with proper listen-port customization, you can even have N engine instances running on the same machine at the same time.

Alon

-- Adam Litke
Re: [ovirt-devel] [QA] [ACTION REQUIRED] oVirt 3.5.0 RC2 status
On 12/08/14 17:43 +0200, Sandro Bonazzola wrote:

Hi, tomorrow we should compose oVirt 3.5.0 RC2 starting at 08:00 UTC. We still have the following blockers list:

Bug ID   Whiteboard  Status  Summary
1127294  storage     POST    Live Merge: Resolve unknown merge status in vdsm after host crash
1109920  storage     POST    Live Merge: Extend internal block volumes during merge

There are several patches for master (6) that must be merged and backported to 3.5. Thanks Francesco for your reviews (I will repost the series this afternoon for followup review). I would appreciate a look by those I've included as reviewers (you received a separate email from me) so we can converge on these ASAP.

-- Adam Litke
Re: [ovirt-devel] python-ioprocess for el7?
On 14/07/14 10:12 -0400, Douglas Schilling Landgraf wrote:

On 07/12/2014 11:54 PM, Douglas Schilling Landgraf wrote:

On 07/11/2014 07:46 AM, Dan Kenigsberg wrote:

On Thu, Jul 10, 2014 at 05:01:21PM -0400, Adam Litke wrote:

Hi, I am looking for python-ioprocess RPMs (new enough for the latest vdsm requirements). Can anyone point me in the right direction? Thanks!

Looking at
https://admin.fedoraproject.org/updates/search/python-pthreading
https://admin.fedoraproject.org/updates/search/python-cpopen
https://admin.fedoraproject.org/updates/search/ioprocess
I can confirm that we are missing quite a few of our dependencies for el7. Douglas, Yaniv: can you have them built? I see that http://dl.fedoraproject.org/pub/epel/beta/7/x86_64/ already exists, and I hope to see our packages there.

Sure, please refresh; python-cpopen and python-pthreading should be there now. However, ioprocess requires Saggi's interaction.

Hi Adam, I got access to build ioprocess; it should be soon at http://dl.fedoraproject.org/pub/epel/beta/7/x86_64/ or right now you can get it via: http://koji.fedoraproject.org/koji/taskinfo?taskID=7137068

Hmm, this seems to still be version 0.3-2.

-- Adam Litke
[ovirt-devel] Custom fencing with virsh_fence
Hi all,

I am trying to configure custom fencing using fence_virsh in order to test out fencing flows with my virtualized oVirt hosts. I'm getting a failure when clicking the Test button. Can someone help me to diagnose the problem? I have applied the following settings using engine-config:

~/ovirt-engine/bin/engine-config -s CustomVdsFenceType=xxxvirt
~/ovirt-engine/bin/engine-config -s CustomFenceAgentMapping=xxxvirt=virsh
~/ovirt-engine/bin/engine-config -s CustomVdsFenceOptionMapping=xxxvirt:address=ip,username=username,password=password

(note that engine-config seems to arbitrarily limit the number of mapped options to 3. Seems like a bug to me).

Here is the log output in engine.log:

2014-07-15 11:43:34,813 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (http--0.0.0.0-8080-1) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host centennial from cluster block was chosen as a proxy to execute Status command on Host cascade.
2014-07-15 11:43:34,813 INFO [org.ovirt.engine.core.bll.FenceExecutor] (http--0.0.0.0-8080-1) Using Host centennial from cluster block as proxy to execute Status command on Host
2014-07-15 11:43:34,815 INFO [org.ovirt.engine.core.bll.FenceExecutor] (http--0.0.0.0-8080-1) Executing Status Power Management command, Proxy Host:centennial, Agent:virsh, Target Host:, Management IP:192.168.2.101, User:root, Options:
2014-07-15 11:43:34,816 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) START, FenceVdsVDSCommand(HostName = centennial, HostId = a34f7dbc-dd99-4831-a1a9-54c411080ec1, targetVdsId = b6b9d480-e20f-411a-9b9c-883fac32a4e5, action = Status, ip = 192.168.2.101, port = , type = virsh, user = root, password = **, options = ''), log id: 24f33bda
2014-07-15 11:43:34,875 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) Failed in FenceVdsVDS method, for vds: centennial; host: 192.168.2.103
2014-07-15 11:43:34,876 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) Command FenceVdsVDSCommand(HostName = centennial, HostId = a34f7dbc-dd99-4831-a1a9-54c411080ec1, targetVdsId = b6b9d480-e20f-411a-9b9c-883fac32a4e5, action = Status, ip = 192.168.2.101, port = , type = virsh, user = root, password = **, options = '') execution failed. Exception: ClassCastException: [Ljava.lang.Object; cannot be cast to java.lang.String
2014-07-15 11:43:34,877 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) FINISH, FenceVdsVDSCommand, log id: 24f33bda

-- Adam Litke
Re: [ovirt-devel] Custom fencing with virsh_fence
On 15/07/14 17:59 +0200, Juan Hernandez wrote: On 07/15/2014 05:51 PM, Adam Litke wrote: Hi all, I am trying to configure custom fencing using fence_virsh in order to test out fencing flows with my virtualized oVirt hosts. I'm getting a failure when clicking the Test button. Can someone help me to diagnose the problem? I have applied the following settings using engine-config: ~/ovirt-engine/bin/engine-config -s CustomVdsFenceType=xxxvirt ~/ovirt-engine/bin/engine-config -s CustomFenceAgentMapping=xxxvirt=virsh ~/ovirt-engine/bin/engine-config -s CustomVdsFenceOptionMapping=xxxvirt:address=ip,username=username,password=password (note that engine-config seems to arbitrarily limit the number of mapped options to 3. Seems like a bug to me). Here is the log output in engine.log: 2014-07-15 11:43:34,813 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (http--0.0.0.0-8080-1) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host centennial from cluster block was chosen as a proxy to execute Status command on Host cascade. 
2014-07-15 11:43:34,813 INFO [org.ovirt.engine.core.bll.FenceExecutor] (http--0.0.0.0-8080-1) Using Host centennial from cluster block as proxy to execute Status command on Host 2014-07-15 11:43:34,815 INFO [org.ovirt.engine.core.bll.FenceExecutor] (http--0.0.0.0-8080-1) Executing Status Power Management command, Proxy Host:centennial, Agent:virsh, Target Host:, Management IP:192.168.2.101, User:root, Options: 2014-07-15 11:43:34,816 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) START, FenceVdsVDSCommand(HostName = centennial, HostId = a34f7dbc-dd99-4831-a1a9-54c411080ec1, targetVdsId = b6b9d480-e20f-411a-9b9c-883fac32a4e5, action = Status, ip = 192.168.2.101, port = , type = virsh, user = root, password = **, options = ''), log id: 24f33bda 2014-07-15 11:43:34,875 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) Failed in FenceVdsVDS method, for vds: centennial; host: 192.168.2.103 2014-07-15 11:43:34,876 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) Command FenceVdsVDSCommand(HostName = centennial, HostId = a34f7dbc-dd99-4831-a1a9-54c411080ec1, targetVdsId = b6b9d480-e20f-411a-9b9c-883fac32a4e5, action = Status, ip = 192.168.2.101, port = , type = virsh, user = root, password = **, options = '') execution failed. Exception: ClassCastException: [Ljava.lang.Object; cannot be cast to java.lang.String 2014-07-15 11:43:34,877 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (http--0.0.0.0-8080-1) FINISH, FenceVdsVDSCommand, log id: 24f33bda Looks like this bug: https://bugzilla.redhat.com/1114977 Indeed it is. So I looked at the host to see what the failure was and I get the following messages. It looks like engine is not passing the contents of the 'Slot' UI field as the port option. 
This, even after I changed the param mapping like so:

engine-config -s CustomVdsFenceOptionMapping=xxxvirt:address=ip,username=username,password=password,slot=port

Thread-440::DEBUG::2014-07-15 15:06:46,997::API::1165::vds::(fenceNode) fenceNode(addr=192.168.2.101,port=,agent=virsh,user=root,passwd=,action=status,secure=,options==block-cascade)
Thread-440::DEBUG::2014-07-15 15:06:46,997::utils::594::root::(execCmd) /usr/sbin/fence_virsh (cwd None)
Thread-440::DEBUG::2014-07-15 15:06:47,035::utils::614::root::(execCmd) FAILED: err = WARNING:root:Parse error: Ignoring unknown option '=block-cascade'\n\nERROR:root:Failed: You have to enter plug number or machine identification\n\nERROR:root:Please use '-h' for usage\n\n; rc = 1
Thread-440::DEBUG::2014-07-15 15:06:47,035::API::1152::vds::(fence) rc 1 inp agent=fence_virsh ipaddr=192.168.2.101 login=root action=status passwd= =block-cascade out [] err [WARNING:root:Parse error: Ignoring unknown option '=block-cascade', '', 'ERROR:root:Failed: You have to enter plug number or machine identification', '', ERROR:root:Please use '-h' for usage, '']
Thread-440::DEBUG::2014-07-15 15:06:47,035::API::1188::vds::(fenceNode) rc 1 in agent=fence_virsh ipaddr=192.168.2.101 login=root action=status passwd= =block-cascade out [] err [WARNING:root:Parse error: Ignoring unknown option '=block-cascade', '', 'ERROR:root:Failed: You have to enter plug number or machine identification', '', ERROR:root:Please use '-h' for usage, '']

-- Adam Litke
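As an aside, the malformed `=block-cascade` option in the vdsm log above is consistent with a key=value option builder that emits an empty option name when a UI field (here, "Slot") has no working entry in the option mapping. A hypothetical illustration in Python (this is not the actual engine code):

```python
# Hypothetical illustration (not the actual engine code) of how a key=value
# option builder degenerates when a UI field has no mapped option name: the
# "slot" value is emitted with an empty key, matching the "=block-cascade"
# that fence_virsh rejects in the log above.
def build_options(mapping, ui_values):
    """mapping: UI field name -> agent option name; ui_values: UI field -> value."""
    pairs = []
    for field, value in ui_values.items():
        option_name = mapping.get(field, "")   # missing mapping -> empty key
        pairs.append("%s=%s" % (option_name, value))
    return ",".join(pairs)

print(build_options({}, {"slot": "block-cascade"}))                # -> =block-cascade
print(build_options({"slot": "port"}, {"slot": "block-cascade"}))  # -> port=block-cascade
```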
Re: [ovirt-devel] [vdsm] VM recovery now depends on HSM
On 10/07/14 08:40 +0200, Michal Skrivanek wrote: On Jul 9, 2014, at 15:38 , Nir Soffer nsof...@redhat.com wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Michal Skrivanek michal.skriva...@redhat.com Cc: devel@ovirt.org Sent: Wednesday, July 9, 2014 4:19:09 PM Subject: Re: [ovirt-devel] [vdsm] VM recovery now depends on HSM On 09/07/14 13:11 +0200, Michal Skrivanek wrote: On Jul 8, 2014, at 22:36 , Adam Litke ali...@redhat.com wrote: Hi all, As part of the new live merge feature, when vdsm starts and has to recover existing VMs, it calls VM._syncVolumeChain to ensure that vdsm's view of the volume chain matches libvirt's. This involves two kinds of operations: 1) sync VM object, 2) sync underlying storage metadata via HSM. This means that HSM must be up (and the storage domain(s) that the VM is using must be accessible. When testing some rather eccentric error flows, I am finding this to not always be the case. Is there a way to have VM recovery wait on HSM to come up? How should we respond if a required storage domain cannot be accessed? Is there a mechanism in vdsm to schedule an operation to be retried at a later time? Perhaps I could just schedule the sync and it could be retried until the required resources are available. I've briefly discussed with Federico some time ago that IMHO the syncVolumeChain needs to be changed. It must not be part of VM's create flow as I expect this quite a bottleneck in big-scale environment (it is now in fact not executing only on recovery but on all 4 create flows!). I don't know how yet, but we need to find a different way. Now you just added yet another reason. So…I too ask for more insights:-) Sure, so... We switched to running syncVolumeChain at all times to cover a very rare scenario: 1. VM is running on host A 2. User initiates Live Merge on VM 3. Host A experiences a catastrophic hardware failure before engine can determine if the merge succeeded or failed 4. 
VM is restarted on Host B Since (in this case) the host cannot know if a live merge was in progress on the previous host, it needs to always check. Some ideas to mitigate: 1. When engine recreates a VM on a new host and a Live Merge was in progress, engine could call a verb to ask the host to synchronize the volume chain. This way, it only happens when engine knows it's needed and engine can be sure that the required resources (storage connections and domains) are present. This seems like the right approach. +1 I like the only when needed, since indeed we can assume the scenario is unlikely to happen most of the times (but very real indeed) Ok. I will need to expose a synchronizeDisks virt verb for this. It will be called by engine whenever a VM moves between hosts prior to a block job being resolved. ## # @VM.synchronizeDisks: # # Tell vdsm to synchronize disk metadata with the live VM state # # @vmID: The UUID of the VM # # Since: 4.16.0 ## {'command': {'class': 'VM', 'name': 'synchronizeDisks'}, 'data': {'vmID': 'UUID'}} Greg, you can call this after VmStats from the new host indicates that the block job is indeed not there anymore. You want to call it before you fetch the VM definition to check the volume chain. This way you can be sure that the new host has refreshed the config in case it was out of sync. Federico, I am thinking about how to handle the case where someone would try a cold merge here instead of starting the VM. I guess they cannot because engine will have the disk locked. Maybe that is good enough for now. 2. The syncVolumeChain call runs in the recovery case to ensure that we clean up after any missed block job events from libvirt while vdsm was stopped/restarting. can we clean up later on, does it need to be on recovery? Can it be delayed - requested by engine a little bit later? This question is where I could use some help from the experts :) Here is the scenario in question: How serious is a temporary metadata inconsistency? 1. 
Live merge starts for VM on a host 2. vdsm crashes 3. qemu completes the live merge operation and rewrites the qcow chain 4. libvirt emits an event (missed by vdsm, which is not running) 5. vdsm starts and recovers the VM. At this point, the vm conf has an outdated view of the disk. In the case of an active layer merge, the volumeID of the disk will have changed and at least one volume is removed from the chain. For an internal volume merge, one or more volumes can be missing from the chain. In addition, the metadata on the storage side is outdated. As long as engine submits no operations which depend on an accurate picture of the volume chain until it has called synchronizeDisks() we should be okay. Does vdsm initiate any operations on its own that would be sensitive to this synchronization issue (i.e. disk stats)? We need this since vdsm recovers running VMs when it starts, before engine is connected. Actually, engine cannot talk with vdsm until it has finished the recovery.
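The reconciliation being discussed can be sketched with a simplified chain representation. This is an illustration only, not vdsm's actual VM._syncVolumeChain:

```python
# Illustration only (not vdsm's actual VM._syncVolumeChain): reconcile the
# volume chain recorded in the vm conf with the chain libvirt reports after
# a block job completed while vdsm was down.
def sync_volume_chain(conf_chain, libvirt_chain):
    """conf_chain / libvirt_chain: lists of volume IDs, base first, leaf last.
    Returns (corrected chain, set of volumes merged away that still need
    storage metadata cleanup via HSM)."""
    merged_away = set(conf_chain) - set(libvirt_chain)
    if merged_away:
        # Adopt libvirt's view; the vanished volumes must then be cleaned
        # up in the storage metadata.
        return list(libvirt_chain), merged_away
    return list(conf_chain), set()

# Active layer merge: the leaf "v3" was merged into "v2", so the disk's
# current volumeID changes from "v3" to "v2".
chain, removed = sync_volume_chain(["v1", "v2", "v3"], ["v1", "v2"])
```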
Re: [ovirt-devel] [vdsm] VM recovery now depends on HSM
Sorry, adding Greg...
Re: [ovirt-devel] [vdsm] VM recovery now depends on HSM
On 09/07/14 13:11 +0200, Michal Skrivanek wrote: On Jul 8, 2014, at 22:36 , Adam Litke ali...@redhat.com wrote: Hi all, As part of the new live merge feature, when vdsm starts and has to recover existing VMs, it calls VM._syncVolumeChain to ensure that vdsm's view of the volume chain matches libvirt's. This involves two kinds of operations: 1) sync VM object, 2) sync underlying storage metadata via HSM. This means that HSM must be up (and the storage domain(s) that the VM is using must be accessible. When testing some rather eccentric error flows, I am finding this to not always be the case. Is there a way to have VM recovery wait on HSM to come up? How should we respond if a required storage domain cannot be accessed? Is there a mechanism in vdsm to schedule an operation to be retried at a later time? Perhaps I could just schedule the sync and it could be retried until the required resources are available. I've briefly discussed with Federico some time ago that IMHO the syncVolumeChain needs to be changed. It must not be part of VM's create flow as I expect this quite a bottleneck in big-scale environment (it is now in fact not executing only on recovery but on all 4 create flows!). I don't know how yet, but we need to find a different way. Now you just added yet another reason. So…I too ask for more insights:-) Sure, so... We switched to running syncVolumeChain at all times to cover a very rare scenario: 1. VM is running on host A 2. User initiates Live Merge on VM 3. Host A experiences a catastrophic hardware failure before engine can determine if the merge succeeded or failed 4. VM is restarted on Host B Since (in this case) the host cannot know if a live merge was in progress on the previous host, it needs to always check. Some ideas to mitigate: 1. When engine recreates a VM on a new host and a Live Merge was in progress, engine could call a verb to ask the host to synchronize the volume chain. 
This way, it only happens when engine knows it's needed and engine can be sure that the required resources (storage connections and domains) are present. 2. The syncVolumeChain call runs in the recovery case to ensure that we clean up after any missed block job events from libvirt while vdsm was stopped/restarting. In this case, the block job info is saved in the vm conf so the recovery flow could be changed to query libvirt for block job status on only those disks where we know about a previous operation. For those found gone, we'd call syncVolumeChain. In this scenario, we still have to deal with the race with HSM initialization and storage connectivity issues. Perhaps engine should drive this case as well? -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
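Option 2 above could be sketched roughly like this, with hypothetical names for the saved job records (this is not vdsm's actual recovery code):

```python
# Sketch of option 2 above (hypothetical names, not vdsm's recovery code):
# on recovery, query block job status only for disks with a recorded job,
# and sync the chain only where the job vanished while vdsm was down.
def disks_needing_sync(saved_jobs, live_job_ids):
    """saved_jobs: dict of jobID -> diskID recorded in the vm conf.
    live_job_ids: set of jobIDs libvirt still reports as active.
    Returns sorted diskIDs whose job disappeared and need a chain sync."""
    return sorted(disk for job, disk in saved_jobs.items()
                  if job not in live_job_ids)
```

For example, if jobs "j1" and "j2" were recorded before the restart but libvirt now reports only "j2", only the disk behind "j1" needs a syncVolumeChain call.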
[ovirt-devel] oVirt 3.5 Test Day 1 Results
I tested:
* [RFE] Prevent host fencing while kdumping - http://www.ovirt.org/Fence_kdump
* hosted-engine-setup

Results:
Bug 1115123 -- hosted-engine-setup fails with ioprocess oop_impl enabled

Adding a host with Detect kdump flow set to on and without crashkernel command line parameter results in a warning in the log as expected. I ran out of time before I was able to configure crash dump detection for my host VMs correctly. Looking forward to more thorough testing on the next test day.

-- Adam Litke
[ovirt-devel] [vdsm] VM recovery now depends on HSM
Hi all,

As part of the new live merge feature, when vdsm starts and has to recover existing VMs, it calls VM._syncVolumeChain to ensure that vdsm's view of the volume chain matches libvirt's. This involves two kinds of operations: 1) sync the VM object, 2) sync the underlying storage metadata via HSM. This means that HSM must be up (and the storage domain(s) that the VM is using must be accessible). When testing some rather eccentric error flows, I am finding this to not always be the case.

Is there a way to have VM recovery wait on HSM to come up? How should we respond if a required storage domain cannot be accessed? Is there a mechanism in vdsm to schedule an operation to be retried at a later time? Perhaps I could just schedule the sync and it could be retried until the required resources are available.

Thanks for your insights.

-- Adam Litke
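The retry idea in the last question could look roughly like this. This is a hedged sketch with hypothetical names; real vdsm code would use its own scheduling infrastructure and sleep with backoff between polls instead of looping immediately:

```python
# Hedged sketch (hypothetical names, not vdsm's scheduler) of retrying the
# volume chain sync until HSM and the required storage domain are available.
# A real implementation would sleep with backoff between polls.
def retry_until_ready(operation, is_ready, attempts):
    """Run operation() once is_ready() returns True; give up after `attempts` polls."""
    for _ in range(attempts):
        if is_ready():
            operation()
            return True
    return False

# Stubbed demonstration: HSM becomes ready on the third poll.
state = {"polls": 0, "synced": False}

def hsm_ready():
    state["polls"] += 1
    return state["polls"] >= 3

def sync_volume_chain():
    state["synced"] = True

retry_until_ready(sync_volume_chain, hsm_ready, attempts=5)
```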
Re: [ovirt-devel] [VDSM][sampling] thread pool status and handling of stuck calls
On 07/07/14 10:53 -0400, Nir Soffer wrote:
* _sampleVmJobs uses virDomainBlockJobInfo, which needs to enter the QEMU monitor. However, this needs to run only if there are active block jobs. I don't have numbers, but I expect this sampler to be idle most of the time. Adam: This is related to live merge, right?

Yes, it is currently used only for live merge. It only calls libvirt when it expects a job to be running, so indeed it is pretty much a noop most of the time.

-- Adam Litke
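A minimal sketch of that "noop when idle" behavior, with a stubbed libvirt query (illustrative names only, not vdsm's actual sampler):

```python
# Illustrative sketch (not vdsm's actual sampler) of the behavior described
# above: consult libvirt only for VMs that are expected to have an active
# block job, so the sampler is effectively a noop the rest of the time.
class VmJobsSampler:
    def __init__(self, query_libvirt):
        self._query = query_libvirt   # callable(vm_id) -> job info dict; stubbed here
        self._expected = set()        # vm_ids with a block job we initiated

    def job_started(self, vm_id):
        self._expected.add(vm_id)

    def sample(self, vm_id):
        if vm_id not in self._expected:
            return None               # common cheap path: no QEMU monitor entry
        info = self._query(vm_id)
        if not info:
            self._expected.discard(vm_id)   # job gone; stop polling this VM
        return info

calls = []
def fake_libvirt_query(vm_id):
    calls.append(vm_id)
    return {}                          # pretend the job already finished

sampler = VmJobsSampler(fake_libvirt_query)
sampler.sample("vm1")                  # no job expected: libvirt is not called
sampler.job_started("vm1")
sampler.sample("vm1")                  # one libvirt call; job gone, so vm1 is dropped
```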
Re: [ovirt-devel] oVirt's MoM feature list ?
On 13/06/14 11:52 +, Vinod, Chegu wrote: Cc'ng Gilad Vinod From: Vinod, Chegu Sent: Friday, June 13, 2014 4:45 AM To: ali...@redhat.com Subject: oVirt's MoM feature list ? Hi Adam, Where can I find some information about the future features/enhancements that are planned in MoM ? Perhaps it was discussed already in some email group or in some presentation...If yes can you please point me to the same ? This is a great question for the devel list (added to cc:). I think Doron and Martin (added to cc:) will be able to give some better responses to this as well. -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
[ovirt-devel] [vdsm] Unifying VM device representations
Hi Martin,

I noticed that you are working on some patches to refactor the VM devices and deprecate self.conf['devices']. I am a big fan of this because my Live Merge code is far more complex than it should be, since some information lives in self.conf['devices'] and some lives in self.devices. Are you planning to change the recovery code's save/restore of vm.conf to work with the new device container you are creating? It would be nice to get my code working entirely independently of self.conf['devices'] if possible.

Also, when are you aiming to have this work completed? Live Merge is needed for 3.5. Will this work be ready before then?

Thanks!

-- Adam Litke
[ovirt-devel] vdsm tasks API design discussion
feature. Isn't this what we're trying to avoid? - Original Message - From: Adam Litke ali...@redhat.com To: Dan Kenigsberg dan...@redhat.com Cc: smizr...@redhat.com, ybronhei ybron...@redhat.com, devel@ovirt.org Sent: Thursday, May 1, 2014 8:28:14 PM Subject: Re: [ovirt-devel] short recap of last vdsm call (15.4.2014) On 01/05/14 17:53 +0100, Dan Kenigsberg wrote: On Wed, Apr 30, 2014 at 01:26:18PM -0400, Adam Litke wrote: On 30/04/14 14:22 +0100, Dan Kenigsberg wrote: On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote: hey, somehow we missed the summary of this call, and few big issues were raised there. so i would like to share it with all and hear more comments - task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu. Adam, Federico, may I revisit this question from another angle? Why does Vdsm needs to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process. A vmId + diskId can uniquely identify a block job at a single moment in time since qemu guarantees that only a single block job can run at any given point in time. But this gives us no way to differentiate two sequential jobs that run on the same disk. Therefore, without having an engine-supplied jobID, we can never be sure if a one job finished and another started since the last time we polled stats. Why would Engine ever want to initiate a new live merge of a (vmId,diskId) before it has a conclusive result of the previous success/failure of the previous attempt? As far as I understand, this should never happen, and it's actually good for the API to force avoidence of such a case. Additionally, engine-supplied UUIDs is part of a developing framework for next-generation async tasks. 
Engine prefers to use a single identifier to represent any kind of task (rather than some problem domain specific combination of UUIDs). Adhering to this rule will help us to converge on a single implementation of ng async tasks moving forward. I do not think that having a (virtual) table of task_id -> (vmId, diskId) in Vdsm is much simpler than having it on the Engine machine. It needs to go somewhere. As the designers of the API we felt it would be better for vdsm to hide the semantics of when a vmId,diskId tuple can be considered a unique identifier. If we ever do generalize the concept of a transient task to other users (setupNetworks, etc.) it would be a far more consumable API if engine didn't need to handle a bunch of special cases about what constitutes a job ID and the specifics of its lifetime. UUIDs are simple and already well-supported. Why make it more difficult than it has to be? I still find the notion of a new framework for async tasks quite useful. But as I requested before, I think we should design it first, so it fits all conceivable users. In particular, we should not tie it to the existence of a running VM. We'd better settle on persistence semantics that work for everybody (such as network tasks). Last time, the idea was struck down by Saggi and others from infra, who are afraid to repeat mistakes from the current task framework. Several famous quotes apply here. The only thing we have to fear is fear itself :) Sometimes perfect is the enemy of good. Tasks redesign was always going to be driven by the need to implement one feature at first. It just so happens that we volunteered to take a stab at it for live merge. It's clear that we won't be able to completely replace the old tasks and get this feature out in one pass.
We believe the general principles of our tasks are generally extensible to cover new use cases in the future: * Jobs are given an engine-supplied UUID when started * There is a well-known way to check if a job is running or not * There is a well-known way to test if a finished job succeeded or failed. I believe we did spend quite a bit of time in March coming up with a design for NG tasks. Unfortunately it was infra who made our jobs vm-specific by requiring the job status to be passed by getVMStats rather than an object-agnostic getJobsStatus stand-alone API that could conglomerate all job types into a single response. If we do not have a task id, we do not need to worry on how to pass it, and where to persist it. There are at least 3 reasons to persist a block job ID: * To associate a specific block job operation with a specific engine-initiated flow. * So that you can clean up after a job that completed when vdsm could not receive the completion event. But if Vdsm dies before it managed to clean up, Engine would have to perform the cleanup via another host. So having this short-loop cleanup is redundant. Fair enough. We'll be doing the volume chain scan for every native VM disk at VM startup. The only exception is if we are recovering
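The three principles above can be sketched as a small job registry keyed by an engine-supplied UUID, which also shows why two sequential jobs on the same (vmId, diskId) stay distinguishable. This is illustrative code, not vdsm's implementation:

```python
# Illustrative job registry (not vdsm's implementation) showing how an
# engine-supplied UUID keeps two sequential merges on the same (vmId, diskId)
# distinguishable, per the principles listed above.
import uuid

class JobRegistry:
    def __init__(self):
        self._jobs = {}                  # job_id -> (vm_id, disk_id)

    def start(self, vm_id, disk_id):
        job_id = str(uuid.uuid4())       # in oVirt the engine would supply this
        self._jobs[job_id] = (vm_id, disk_id)
        return job_id

    def finish(self, job_id):
        self._jobs.pop(job_id, None)

    def is_running(self, job_id):
        return job_id in self._jobs

registry = JobRegistry()
first = registry.start("vm1", "disk1")
registry.finish(first)
second = registry.start("vm1", "disk1")  # same disk, but a clearly distinct job
```

Polling by (vmId, diskId) alone could not tell the first job's completion apart from the second already running; the distinct UUIDs can.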
Re: [ovirt-devel] vdsm tasks API design discussion
and over. You could also have a special field containing the version of the configuration (I would make it a hash or a UUID and not a running number) that you would persist locally on the host after you finished configuring, since the local host is the scope of setupNetworks(). Hmm, interesting. It would save time and effort on scanning network properties. But you are introducing the persistence of task end-state. I thought this was something we are trying to avoid. It would allow you to not care about any of the error state and keep sending the same configuration if you think something bad happened, until the state is what you expect it to be or an error response actually manages to find its way back to you. By using the same task ID you are guaranteed to only have the operation running once at a time. I don't mind helping anyone with making their algorithms work, but there is no escaping from the limitations listed above. If we want to make oVirt truly scalable and robust we have to start thinking about algorithms that work despite errors and not just have error flows. Agreed. This is what the ngTasks framework is supposed to achieve for us. I think you are conflating the issue of listing active operations and high-level flow design. If the async operations that make up a complex flow are themselves idempotent, then we have achieved the above. It can be done with or without a vdsm API to list running jobs. Notice I don't even mention different systems of persistence and some tasks that you should be able to get state information about from more than one host. Some jobs can survive a VDSM restart since they are not in VDSM, like stuff in Gluster or QEMU. Yep, live merge is one such job. While we don't persist the job, we do remember that it was running so we can synchronize our state with the underlying hypervisor when we restart. To make it clear, the task API shouldn't really be that useful.
Task IDs are just there to match requests to responses internally because as I explained, jobs are hard to manage generally in such a system. This by no way means that if we see a use case emerging that requires some sort of infra we would not do it. I just think it would probably be tied to some common algorithm or idiom than something truly generic used by every API call. Maybe we are talking about two different things that cannot be combined. All I want is a generic way to list ongoing host-level operations that will be useful for live merge and others. If all you want is a protocol syncronization mechanism in the style of QMP then that is different. Perhaps we need both. I'll be happy to keep the jobID as a formal API parameter and other new APIs that spawn long-running operations could do the same. Then whatever token you want to pass on the wire does not matter to me at all. Hope I made things clearer, sorry if I came out a bit rude. I'm off, I have my country's birthday to celebrate. Thanks for participating in the discussion. In the end we will end up with superior code than if we had not had this discussion. Happy Yom HaAtzmaut! -- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
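The configuration-hash idea discussed earlier in this message could be sketched like this (hypothetical names; the real setupNetworks flow is much more involved):

```python
# Sketch of the configuration-hash idea (hypothetical names; not the real
# setupNetworks flow): hash the desired config canonically, persist the hash
# locally after a successful apply, and treat a resend of the same config as
# a harmless noop, making the call safely repeatable.
import hashlib
import json

def config_hash(config):
    # Canonical JSON so equal configs hash equally regardless of key order.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

class Host:
    def __init__(self):
        self.applied_hash = None   # persisted locally on the host
        self.apply_count = 0

    def setup_networks(self, config):
        digest = config_hash(config)
        if digest == self.applied_hash:
            return "noop"          # already configured; safe to resend
        self.apply_count += 1      # ...the actual reconfiguration would go here...
        self.applied_hash = digest
        return "applied"
```

With this, the engine can keep resending the same configuration after a suspected failure until either the host reports the expected hash or an error response finally arrives.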
Re: [ovirt-devel] short recap of last vdsm call (15.4.2014)
On 30/04/14 14:22 +0100, Dan Kenigsberg wrote: On Tue, Apr 22, 2014 at 02:54:29PM +0300, ybronhei wrote: hey, somehow we missed the summary of this call, and few big issues were raised there. so i would like to share it with all and hear more comments - task id in http header - allows engine to initiate calls with id instead of following vdsm response - federico already started this work, and this is mandatory for live merge feature afaiu.

Adam, Federico, may I revisit this question from another angle? Why does Vdsm need to know live-merge's task id? As far as I understand, (vmid, disk id) are enough to identify a live merge process.

A vmId + diskId can uniquely identify a block job at a single moment in time, since qemu guarantees that only a single block job can run at any given point in time. But this gives us no way to differentiate two sequential jobs that run on the same disk. Therefore, without having an engine-supplied jobID, we can never be sure if one job finished and another started since the last time we polled stats. Additionally, engine-supplied UUIDs are part of a developing framework for next-generation async tasks. Engine prefers to use a single identifier to represent any kind of task (rather than some problem domain specific combination of UUIDs). Adhering to this rule will help us to converge on a single implementation of ng async tasks moving forward.

If we do not have a task id, we do not need to worry on how to pass it, and where to persist it.

There are at least 3 reasons to persist a block job ID:
* To associate a specific block job operation with a specific engine-initiated flow.
* So that you can clean up after a job that completed when vdsm could not receive the completion event.
* Since we must ask libvirt about block job events on a per VM, per disk basis, tracking the devices on which we expect block jobs enables us to eliminate wasteful calls to libvirt.

Hope this makes the rationale a bit clearer...
-- Adam Litke ___ Devel mailing list Devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/devel
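The jobID argument above can be reduced to a few lines of code. This is an editorial sketch with invented names, not vdsm's actual job-tracking code: it shows why (vmId, diskId) alone cannot distinguish two sequential jobs on the same disk, while an engine-supplied jobID can.

```python
# Minimal sketch (illustrative names only): tracking block jobs by an
# engine-supplied jobID rather than by (vmId, diskId) alone.
import uuid

class BlockJobTracker:
    def __init__(self):
        # (vmId, diskId) -> jobId for jobs engine expects to be running
        self._jobs = {}

    def start_job(self, vm_id, disk_id, job_id):
        self._jobs[(vm_id, disk_id)] = job_id

    def poll(self, vm_id, disk_id, reported_job_id):
        """Compare the job reported for a disk against the one engine started."""
        expected = self._jobs.get((vm_id, disk_id))
        if expected is None:
            return "unknown-job"
        if reported_job_id != expected:
            # A different job on the same disk: the old one finished and a
            # new one started between two polls of the stats.
            return "job-replaced"
        return "job-running"

tracker = BlockJobTracker()
job1, job2 = str(uuid.uuid4()), str(uuid.uuid4())
tracker.start_job("vm-1", "disk-a", job1)
assert tracker.poll("vm-1", "disk-a", job1) == "job-running"
# Engine starts a second merge on the same disk after the first ends;
# without the jobID both polls would look identical:
tracker.start_job("vm-1", "disk-a", job2)
assert tracker.poll("vm-1", "disk-a", job2) == "job-running"
assert tracker.poll("vm-1", "disk-a", job1) == "job-replaced"
```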
Re: [Engine-devel] Share Your Thoughts
On 23/03/14 10:36 -0400, Gilad Chaplik wrote: AuditLog gets recycled after 30 days; the reason I stopped my VM may still be relevant. I would not make fields complex/composite; they need to be easily usable via the CLI, for example. I think we need multiple comments, so we need to think about the RESTful API anyhow. I guess that the next feature will be a reason for 'wipe after stop'/any other BE that needs reasoning. What about a new DB table (maybe called Annotations) that takes a business entity type, UUID, action type, timestamp, and reason string? Then the shutdown reason could be entered as a new row in the DB. It can be kept as long as we want it, and views can be adjusted to make these fields searchable. -- Adam Litke ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
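The proposed Annotations table can be sketched as follows. The engine's real schema is PostgreSQL and these column names are guesses from the description above; sqlite3 is used here only to keep the example self-contained and runnable.

```python
# Sketch of the hypothetical Annotations table proposed in the thread:
# (business entity type, UUID, action type, timestamp, reason string).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE annotations (
        entity_type TEXT NOT NULL,                      -- e.g. 'VM'
        entity_id   TEXT NOT NULL,                      -- business entity UUID
        action_type TEXT NOT NULL,                      -- e.g. 'Shutdown'
        created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        reason      TEXT                                -- free-form reason
    )
""")
conn.execute(
    "INSERT INTO annotations (entity_type, entity_id, action_type, reason) "
    "VALUES (?, ?, ?, ?)",
    ("VM", "0000-vm-uuid", "Shutdown", "maintenance window"),
)
# Unlike AuditLog, rows live until explicitly deleted, and views can
# expose them as searchable fields:
row = conn.execute(
    "SELECT reason FROM annotations "
    "WHERE entity_type = 'VM' AND action_type = 'Shutdown'"
).fetchone()
print(row[0])  # maintenance window
```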
Re: [Engine-devel] Asynchronous tasks for live merge
On 03/03/14 14:28 +, Dan Kenigsberg wrote: On Fri, Feb 28, 2014 at 09:30:16AM -0500, Adam Litke wrote: Hi all, As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM. The way I see this working from an engine perspective is: 1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow. 2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion. 3. A VmJob is inserted into the DB so we'll remember to handle it. 4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed. You describe a reasonable way for Vdsm to report whether an async operation has finished. However, may we instead use the opportunity to introduce generic hsm tasks? Sure, I am happy to have that conversation :) If I understand correctly, HSM tasks, while ideal, might be too complex to get right and would block the Live Merge feature for longer than we would like. Has anyone looked into what it would take to implement an HSM Tasks framework like this in vdsm? Are there any WIP implementations? If the scope of this is not too big, it can be completed relatively quickly, and the resulting implementation would cover all known use cases, then this could be worth it. It's important to support Live Merge soon. Regarding deprecation of the current tasks API: Could your suggested HSM Tasks framework be extended to cover SPM/SDM tasks as well? I would hope that it could.
In that case, we could look forward to a unified async task architecture in vdsm. I suggest to have something loosely modeled on posix fork/wait. - Engine asks Vdsm to start an API verb asynchronously and supplies a uuid. This is unlike fork(2), where the system chooses the pid, but that's required so that Engine could tell if the command has reached Vdsm in case of a network error. - Engine may monitor the task (a-la wait(WNOHANG)) Allon has communicated a desire to limit engine-side polling. Perhaps the active tasks could be added to the host stats? - When the task is finished, Engine may collect its result (a-la wait). Until that happens, Vdsm must report the task forever; restart or upgrade are no excuses. On reboot, though, all tasks are forgotten, so Engine may stop monitoring tasks on a fenced host. This could be a good compromise. I hate the idea of requiring engine to play janitor and clean up stale vdsm data, but there is not a much better way to do it. Allowing reboot to auto-clear tasks will at least provide some backstop to how long tasks could pile up if forgotten. This may be overkill for your use case, but it would come in useful for other cases. In particular, setupNetwork returns before it is completely done, since dhcp address acquisition may take too much time. Engine may poll getVdsCaps to see when it's done (or timeout), but it would be nicer to have a generic mechanism that can serve us all. If we were to consider this, I would want to vet the architecture against all known use cases for tasks to make sure we don't need to create a new framework in 3 months. Note that I'm suggesting a completely new task framework, at least on the Vdsm side, as the current one (with its broken persistence, arcane states and never-reliable rollback) is beyond redemption, imho. Are we okay with abandoning vdsm-side rollback entirely as we move forward? Won't that be a regression for at least some error flows (especially in the realm of SPM tasks)? 5.
When the job has completed, VDS Broker raises an event up to bll. Maybe this could be done via VmJobDAO on the stored VmJob? 6. Bll receives the event and issues a series of VDS commands to complete the operation: a) Verify the new image chain matches our expectations (the snap is no longer present in the chain). b) Delete the snapshot volume c) Remove the VmJob from the DB Could you guys review this proposed flow for sanity? The main conceptual gaps I am left with concern #5 and #6. What is the appropriate way for VDSBroker to communicate with BLL? Is there an event mechanism I can explore or should I use the database? I am leaning toward the database because it is persistent and will ensure #6 gets completed even if engine is restarted somewhere in the middle. For #6, is there an existing polling / event loop in bll that I can plug into? Thanks in advance for taking the time to think about this flow and for providing your insights!
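Steps 4 through 6 of the flow described in this thread amount to a diff between the jobs engine has persisted and the jobs vdsm still reports in the VM statistics. A rough sketch, with invented names (the real code would live in the Java VDS broker and bll, and this glosses over persistence and error handling):

```python
# Sketch of the engine-side monitoring idea: a VmJob row is persisted when
# a merge starts (step 3), and a job is considered finished once vdsm's
# per-VM stats stop reporting it (step 4), triggering cleanup (steps 5-6).
completed = []

# step 3: jobs engine started and persisted, keyed by engine-supplied jobID
persisted_vm_jobs = {"job-1": {"vm": "vm-1", "disk": "disk-a"}}

def on_vm_stats(reported_job_ids):
    """Called each monitoring cycle with the job IDs vdsm still reports."""
    finished = [j for j in persisted_vm_jobs if j not in reported_job_ids]
    for job_id in finished:
        job = persisted_vm_jobs.pop(job_id)
        # steps 5-6: raise the event to bll, verify the new image chain,
        # delete the snapshot volume, remove the VmJob row
        completed.append((job_id, job["vm"], job["disk"]))

on_vm_stats({"job-1"})  # vdsm still reports the job: nothing to do
on_vm_stats(set())      # job vanished from the stats: engine proceeds
print(completed)        # [('job-1', 'vm-1', 'disk-a')]
```

Persisting the job in the DB before acting on its completion is what makes step 6 survive an engine restart, which is the rationale given above for preferring the database over an in-memory event mechanism.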
Re: [Engine-devel] Asynchronous tasks for live merge
On 03/03/14 16:36 +0200, Itamar Heim wrote: On 03/03/2014 04:28 PM, Dan Kenigsberg wrote: On Fri, Feb 28, 2014 at 09:30:16AM -0500, Adam Litke wrote: Hi all, As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM. The way I see this working from an engine perspective is: 1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow. 2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion. 3. A VmJob is inserted into the DB so we'll remember to handle it. 4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed. You describe a reasonable way for Vdsm to report whether an async operation has finished. However, may we instead use the opportunity to introduce generic hsm tasks? I suggest to have something loosely modeled on posix fork/wait. - Engine asks Vdsm to start an API verb asynchronously and supplies a uuid. This is unlike fork(2), where the system chooses the pid, but that's required so that Engine could tell if the command has reached Vdsm in case of a network error. - Engine may monitor the task (a-la wait(WNOHANG)) - When the task is finished, Engine may collect its result (a-la wait). Until that happens, Vdsm must report the task forever; restart or upgrade are no excuses. On reboot, though, all tasks are forgotten, so Engine may stop monitoring tasks on a fenced host.
This may be overkill for your use case, but it would come in useful for other cases. In particular, setupNetwork returns before it is completely done, since dhcp address acquisition may take too much time. Engine may poll getVdsCaps to see when it's done (or timeout), but it would be nicer to have a generic mechanism that can serve us all. Note that I'm suggesting a completely new task framework, at least on the Vdsm side, as the current one (with its broken persistence, arcane states and never-reliable rollback) is beyond redemption, imho. 5. When the job has completed, VDS Broker raises an event up to bll. Maybe this could be done via VmJobDAO on the stored VmJob? 6. Bll receives the event and issues a series of VDS commands to complete the operation: a) Verify the new image chain matches our expectations (the snap is no longer present in the chain). b) Delete the snapshot volume c) Remove the VmJob from the DB Could you guys review this proposed flow for sanity? The main conceptual gaps I am left with concern #5 and #6. What is the appropriate way for VDSBroker to communicate with BLL? Is there an event mechanism I can explore or should I use the database? I am leaning toward the database because it is persistent and will ensure #6 gets completed even if engine is restarted somewhere in the middle. For #6, is there an existing polling / event loop in bll that I can plug into? Thanks in advance for taking the time to think about this flow and for providing your insights! The way I read Adam's proposal, there is no task entity at vdsm side to monitor, rather the state of the object the operation is performed on (similar to CreateVM, where the engine monitors the state of the VM, rather than the CreateVM request). Yeah, we use the term job in order to avoid assumptions and implications (i.e. rollback/cancel, persistence) that come with the word task.
Job essentially means libvirt Block Job, but I am trying to allow for extension in the future. Vdsm would collect block job information for devices it expects to have active block jobs and report them all under a single structure in the VM statistics. There would be no persistence of information so when a libvirt block job goes poof, vdsm will stop reporting it. -- Adam Litke
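The reporting scheme described above (query only the devices expected to have active block jobs, fold the results into one structure in the VM statistics, and silently drop jobs libvirt no longer knows about) might look roughly like this. Field and function names are invented for illustration and do not reflect vdsm's actual wire format:

```python
# Sketch: folding block-job info into the VM statistics vdsm already
# reports. No persistence: a job that libvirt stopped reporting simply
# disappears from the next stats sample.
def vm_stats_with_jobs(base_stats, expected_jobs, query_libvirt_job):
    """expected_jobs: {jobId: diskId}. query_libvirt_job(diskId) returns
    progress info for the disk's block job, or None once it has gone away."""
    jobs = {}
    for job_id, disk_id in expected_jobs.items():
        info = query_libvirt_job(disk_id)
        if info is not None:  # job vanished -> stop reporting it
            jobs[job_id] = {"disk": disk_id, "progress": info}
    stats = dict(base_stats)
    stats["vmJobs"] = jobs  # the single structure engine polls
    return stats

stats = vm_stats_with_jobs(
    {"status": "Up"},
    {"job-1": "disk-a", "job-2": "disk-b"},
    # pretend libvirt still reports a job on disk-a but not on disk-b:
    lambda disk: {"cur": 50, "end": 100} if disk == "disk-a" else None,
)
print(sorted(stats["vmJobs"]))  # ['job-1']  (job-2 is no longer reported)
```

Querying only the expected devices is what avoids the wasteful per-VM, per-disk libvirt calls mentioned earlier in the thread.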
[Engine-devel] Schema upgrade failure on master
Hi, I've recently rebased to master and it looks like the 03_05_0050_event_notification_methods.sql script is failing on schema upgrade. Is this a bug or am I doing something wrong? To upgrade I did the normal procedure with my development installation: make install-dev ... ~/ovirt/bin/engine-setup Got this result in the log file: psql:/home/alitke/ovirt-**FILTERED**/share/ovirt-**FILTERED**/dbscripts/upgrade/03_05_0050_event_notification_methods.sql:10: ERROR: column notification_method contains null values FATAL: Cannot execute sql command: --file=/home/alitke/ovirt-**FILTERED**/share/ovirt-**FILTERED**/dbscripts/upgrade/03_05_0050_event_notification_methods.sql 2014-03-03 17:20:34 DEBUG otopi.context context._executeMethod:152 method exception Traceback (most recent call last): File /usr/lib/python2.7/site-packages/otopi/context.py, line 142, in _executeMethod method['method']() File /home/alitke/ovirt-**FILTERED**/share/ovirt-**FILTERED**/setup/bin/../plugins/ovirt-**FILTERED**-setup/ovirt-**FILTERED**/db/schema.py, line 280, in _misc osetupcons.DBEnv.PGPASS_FILE File /usr/lib/python2.7/site-packages/otopi/plugin.py, line 451, in execute command=args[0], RuntimeError: Command '/home/alitke/ovirt-**FILTERED**/share/ovirt-**FILTERED**/dbscripts/schema.sh' failed to execute -- Adam Litke
[Engine-devel] Asynchronous tasks for live merge
Hi all, As part of our plan to support live merging of VM disk snapshots it seems we will need a new form of asynchronous task in ovirt-engine. I am aware of AsyncTaskManager but it seems to be limited to managing SPM tasks. For live merge, we are going to need something called VmTasks since the async command can be run only on the host that currently runs the VM. The way I see this working from an engine perspective is: 1. RemoveSnapshotCommand in bll is invoked as usual but since the VM is found to be up, we activate an alternative live merge flow. 2. We submit a LiveMerge VDS Command for each impacted disk. This is an asynchronous command which we need to monitor for completion. 3. A VmJob is inserted into the DB so we'll remember to handle it. 4. The VDS Broker monitors the operation via an extension to the already collected VmStatistics data. Vdsm will report active Block Jobs only. Once the job stops (in error or success) it will cease to be reported by vdsm and engine will know to proceed. 5. When the job has completed, VDS Broker raises an event up to bll. Maybe this could be done via VmJobDAO on the stored VmJob? 6. Bll receives the event and issues a series of VDS commands to complete the operation: a) Verify the new image chain matches our expectations (the snap is no longer present in the chain). b) Delete the snapshot volume c) Remove the VmJob from the DB Could you guys review this proposed flow for sanity? The main conceptual gaps I am left with concern #5 and #6. What is the appropriate way for VDSBroker to communicate with BLL? Is there an event mechanism I can explore or should I use the database? I am leaning toward the database because it is persistent and will ensure #6 gets completed even if engine is restarted somewhere in the middle. For #6, is there an existing polling / event loop in bll that I can plug into? Thanks in advance for taking the time to think about this flow and for providing your insights! 
-- Adam Litke
Re: [Engine-devel] mom RPMs for 3.4
On 01/02/14 22:48 +, Dan Kenigsberg wrote: On Fri, Jan 31, 2014 at 04:56:12PM -0500, Adam Litke wrote: On 31/01/14 08:36 +0100, Sandro Bonazzola wrote: On 30/01/2014 19:30, Adam Litke wrote: On 30/01/14 18:13 +, Dan Kenigsberg wrote: On Thu, Jan 30, 2014 at 11:49:42AM -0500, Adam Litke wrote: Hi Sandro, After updating the MOM project's build system, I have used jenkins to produce a set of RPMs that I would like to tag into the oVirt 3.4 release. Please see the jenkins job [1] for the relevant artifacts for EL6[2], F19[3], and F20[4]. Dan, should I submit a patch to vdsm to make it require mom >= 0.4.0? I want to be careful to not break people's environments this late in the 3.4 release cycle. What is the best way to minimize that damage? Hey, we're in beta. I prefer making this requirement explicit now over having users with supervdsmd.log rotating due to log spam. In that case, Sandro, can you let me know when those RPMs hit the ovirt repos (for master and 3.4) and then I will submit a patch to vdsm to require the new version. mom 0.4.0 has been built in last night's nightly job [1] and published to nightly by publisher job [2] so it's already available on nightly [3] For 3.4.0, a beta 2 release has been planned [4] for 2014-02-06 so we'll include your builds in that release. I presume the scripting for 3.4 release rpms will produce a version without the git-rev based suffix: i.e. mom-0.4.0-1.rpm? I need to figure out how to handle a problem that might be a bit unique to mom. MOM is used by non-oVirt users who install it from the main Fedora repository. I think it's fine that we are producing our own rpms in oVirt (that may have additional patches applied and may resync to upstream mom code more frequently than would be desired for the main Fedora repository). Given this, I think it makes sense to tag the oVirt RPMs with a special version suffix to indicate that these are oVirt-produced and not upstream Fedora.
For example: The next Fedora update will be mom-0.4.0-1.f20.rpm. The next oVirt update will be mom-0.4.0-1ovirt.f20.rpm. Is this the best practice for accomplishing my goals? One other thing I'd like to have the option of doing is to make vdsm depend on an ovirt distribution of mom so that the upstream Fedora version will not satisfy the dependency for vdsm. What is the motivation for this? You would not like to bother Fedora users with updates that are required only for oVirt? Yes, that was my thinking. It seems that oVirt requires updates more frequently than users that use MOM with libvirt directly, and the Fedora update process is a bit heavier than oVirt's at the moment. Vdsm itself is built, signed, and distributed via Fedora. It is also copied into the ovirt repo, for completeness' sake. Could MoM do the same? If vdsm is finding this to work well then surely I can do the same with MOM. The 0.4.0 build is in updates-testing right now and should be able to be tagged stable in a day or two.
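Whether mom-0.4.0-1ovirt.f20 actually upgrades mom-0.4.0-1.f20 hinges on RPM's release-tag comparison, in which "1ovirt" sorts after "1". The sketch below is a much-simplified version of that segment-wise comparison (the real rpmvercmp also handles tildes, carets and several edge cases), included only to make the ordering claim checkable:

```python
# Simplified sketch of rpm-style version segment comparison: split each
# string into runs of digits and runs of letters, compare run by run
# (numeric runs beat alphabetic runs), and if all shared runs are equal,
# the string with more runs left wins.
import re

def rpm_seg_cmp(a, b):
    """Return 1 if a sorts after b, -1 if before, 0 if equal."""
    sa = re.findall(r"\d+|[a-zA-Z]+", a)
    sb = re.findall(r"\d+|[a-zA-Z]+", b)
    for x, y in zip(sa, sb):
        xd, yd = x.isdigit(), y.isdigit()
        if xd and not yd:
            return 1          # numeric segment beats alphabetic
        if yd and not xd:
            return -1
        x2, y2 = (int(x), int(y)) if xd else (x, y)
        if x2 != y2:
            return 1 if x2 > y2 else -1
    return (len(sa) > len(sb)) - (len(sa) < len(sb))

# '1ovirt' has a leftover segment after the shared '1', so it sorts
# after plain '1' and the oVirt build would win in the depsolver:
assert rpm_seg_cmp("1ovirt", "1") == 1
assert rpm_seg_cmp("1", "2") == -1
```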
Re: [Engine-devel] mom RPMs for 3.4
On 31/01/14 08:36 +0100, Sandro Bonazzola wrote: On 30/01/2014 19:30, Adam Litke wrote: On 30/01/14 18:13 +, Dan Kenigsberg wrote: On Thu, Jan 30, 2014 at 11:49:42AM -0500, Adam Litke wrote: Hi Sandro, After updating the MOM project's build system, I have used jenkins to produce a set of RPMs that I would like to tag into the oVirt 3.4 release. Please see the jenkins job [1] for the relevant artifacts for EL6[2], F19[3], and F20[4]. Dan, should I submit a patch to vdsm to make it require mom >= 0.4.0? I want to be careful to not break people's environments this late in the 3.4 release cycle. What is the best way to minimize that damage? Hey, we're in beta. I prefer making this requirement explicit now over having users with supervdsmd.log rotating due to log spam. In that case, Sandro, can you let me know when those RPMs hit the ovirt repos (for master and 3.4) and then I will submit a patch to vdsm to require the new version. mom 0.4.0 has been built in last night's nightly job [1] and published to nightly by publisher job [2] so it's already available on nightly [3] For 3.4.0, a beta 2 release has been planned [4] for 2014-02-06 so we'll include your builds in that release. I presume the scripting for 3.4 release rpms will produce a version without the git-rev based suffix: i.e. mom-0.4.0-1.rpm? I need to figure out how to handle a problem that might be a bit unique to mom. MOM is used by non-oVirt users who install it from the main Fedora repository. I think it's fine that we are producing our own rpms in oVirt (that may have additional patches applied and may resync to upstream mom code more frequently than would be desired for the main Fedora repository). Given this, I think it makes sense to tag the oVirt RPMs with a special version suffix to indicate that these are oVirt-produced and not upstream Fedora. For example: The next Fedora update will be mom-0.4.0-1.f20.rpm. The next oVirt update will be mom-0.4.0-1ovirt.f20.rpm.
Is this the best practice for accomplishing my goals? One other thing I'd like to have the option of doing is to make vdsm depend on an ovirt distribution of mom so that the upstream Fedora version will not satisfy the dependency for vdsm. Thoughts?
[Engine-devel] mom RPMs for 3.4
Hi Sandro, After updating the MOM project's build system, I have used jenkins to produce a set of RPMs that I would like to tag into the oVirt 3.4 release. Please see the jenkins job [1] for the relevant artifacts for EL6[2], F19[3], and F20[4]. Dan, should I submit a patch to vdsm to make it require mom >= 0.4.0? I want to be careful to not break people's environments this late in the 3.4 release cycle. What is the best way to minimize that damage? [1] http://jenkins.ovirt.org/view/All/job/manual-build-tarball/179/ [2] http://jenkins.ovirt.org/view/All/job/manual-build-tarball/179/label=centos6-host/artifact/exported-artifacts/mom-0.4.0-1.el6.noarch.rpm [3] http://jenkins.ovirt.org/view/All/job/manual-build-tarball/179/label=fedora19-host/artifact/exported-artifacts/mom-0.4.0-1.fc19.noarch.rpm [4] http://jenkins.ovirt.org/view/All/job/manual-build-tarball/179/label=fedora20-host/artifact/exported-artifacts/mom-0.4.0-1.fc20.noarch.rpm
Re: [Engine-devel] mom RPMs for 3.4
On 30/01/14 18:13 +, Dan Kenigsberg wrote: On Thu, Jan 30, 2014 at 11:49:42AM -0500, Adam Litke wrote: Hi Sandro, After updating the MOM project's build system, I have used jenkins to produce a set of RPMs that I would like to tag into the oVirt 3.4 release. Please see the jenkins job [1] for the relevant artifacts for EL6[2], F19[3], and F20[4]. Dan, should I submit a patch to vdsm to make it require mom >= 0.4.0? I want to be careful to not break people's environments this late in the 3.4 release cycle. What is the best way to minimize that damage? Hey, we're in beta. I prefer making this requirement explicit now over having users with supervdsmd.log rotating due to log spam. In that case, Sandro, can you let me know when those RPMs hit the ovirt repos (for master and 3.4) and then I will submit a patch to vdsm to require the new version.
Re: [Engine-devel] Copy reviewer scores on trivial rebase/commit msg changes
On 18/01/14 01:48 +0200, Itamar Heim wrote: I'd like to enable these - comments welcome: 1. label.Label-Name.copyAllScoresOnTrivialRebase If true, all scores for the label are copied forward when a new patch set is uploaded that is a trivial rebase. A new patch set is considered as trivial rebase if the commit message is the same as in the previous patch set and if it has the same code delta as the previous patch set. This is the case if the change was rebased onto a different parent. This can be used to enable sticky approvals, reducing turn-around for trivial rebases prior to submitting a change. Defaults to false. 2. label.Label-Name.copyAllScoresIfNoCodeChange If true, all scores for the label are copied forward when a new patch set is uploaded that has the same parent commit as the previous patch set and the same code delta as the previous patch set. This means only the commit message is different. This can be used to enable sticky approvals on labels that only depend on the code, reducing turn-around if only the commit message is changed prior to submitting a change. Defaults to false. I am a bit late to the party but +1 from me for trying both. I guess it will be quite rare that something bad happens here. So unlikely, that the time saved on all the previous patches will far offset the lost time for fixing the corner cases.
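For reference, the two options quoted above are per-label settings in Gerrit's project.config. Enabling them for the review label would look roughly like this (the label name Code-Review is assumed here; the actual label name depends on the project's configuration):

```ini
[label "Code-Review"]
    copyAllScoresOnTrivialRebase = true
    copyAllScoresIfNoCodeChange = true
```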
Re: [Engine-devel] oVirt 3.4.0 alpha repository closure failure
On 10/01/14 10:01 +, Dan Kenigsberg wrote: On Fri, Jan 10, 2014 at 08:48:52AM +0100, Sandro Bonazzola wrote: Hi, oVirt 3.4.0 alpha repository has been composed but alpha has not been announced due to repository closure failures: on CentOS 6.5: # repoclosure -r ovirt-3.4.0-alpha -l ovirt-3.3.2 -l base -l epel -l glusterfs-epel -l updates -l extra -l glusterfs-noarch-epel -l ovirt-stable -n Reading in repository metadata - please wait Checking Dependencies Repos looked at: 8 base epel glusterfs-epel glusterfs-noarch-epel ovirt-3.3.2 ovirt-3.4.0-alpha ovirt-stable updates Num Packages in Repos: 16581 package: mom-0.3.2-20140101.git2691f25.el6.noarch from ovirt-3.4.0-alpha unresolved deps: procps-ng Adam, this seems like a real bug in http://gerrit.ovirt.org/#/c/22087/ : el6 still carries the older procps (which is, btw, provided by procps-ng). Done. http://gerrit.ovirt.org/23137 package: vdsm-hook-vhostmd-4.14.0-1.git6fdd55f.el6.noarch from ovirt-3.4.0-alpha unresolved deps: vhostmd Douglas, could you add a with_vhostmd option to the spec, and have it default to 0 on el*, and to 1 on fedoras? Thanks, Dan.
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 11:41 -0500, Alexander Wels wrote: On Monday, January 06, 2014 11:27:07 AM Adam Litke wrote: On 06/01/14 11:19 -0500, Alexander Wels wrote: Adam, Is this just when you first login into the webadmin or whenever you go to the VM tab? In other words if you login, then switch to the templates tab and back again to the VM tab does it still not load? What about when you manually refresh the grid? Thanks for the quick response! It doesn't load at all -- first time or any other time when revisiting. In some cases in the past I would have luck by clicking the blue refresh icon but that doesn't help either. I have force refreshed the browser (Chrome) to no avail. I guess the next step is to completely restart the browser (hmm, no luck there either). Okay, then something else is going on, are there any errors in the server log? From server.log there are no ERRORs but this message may be related: 2014-01-06 13:01:54,209 WARN [org.jboss.resteasy.spi.ResteasyDeployment] (http--0.0.0.0-8080-4) Application.getSingletons() returned unknown class type: org.ovirt.engine.api.restapi.util.VmHelper Alexander On Monday, January 06, 2014 11:02:02 AM Adam Litke wrote: Hi all, I am working with the latest ovirt-engine git and am finding some strange behavior with the UI. The list of VMs never populates and I am stuck with the loading indicator. All other tabs behave normally (Hosts, Templates, Storage, etc). Also, the list of VMs can be loaded normally using the REST API. Any ideas what may be causing this strange behavior?
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 11:44 -0500, Einav Cohen wrote: - Original Message - From: Alexander Wels aw...@redhat.com Sent: Monday, January 6, 2014 11:41:38 AM On Monday, January 06, 2014 11:27:07 AM Adam Litke wrote: On 06/01/14 11:19 -0500, Alexander Wels wrote: Adam, Is this just when you first login into the webadmin or whenever you go to the VM tab? In other words if you login, then switch to the templates tab and back again to the VM tab does it still not load? What about when you manually refresh the grid? Thanks for the quick response! It doesn't load at all -- first time or any other time when revisiting. In some cases in the past I would have luck by clicking the blue refresh icon but that doesn't help either. I have force refreshed the browser (Chrome) to no avail. I guess the next step is to completely restart the browser (hmm, no luck there either). Okay, then something else is going on, are there any errors in the server log? In addition to server logs: maybe also provide client logs (see instructions in [1])? thanks. 
[1] http://lists.ovirt.org/pipermail/users/2013-December/018494.html GET http://localhost:8080/ovirt-engine/webadmin/Reports.xml 404 (Not Found) 4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:16328 Mon Jan 06 13:05:00 GMT-500 2014 com.google.gwt.logging.client.LogConfiguration SEVERE: (TypeError) stack: TypeError: Cannot call method 'kk' of null at LSj (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:11622:58) at Object.JTl [as h_] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17112:15349) at l7j (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15200:166) at Object.n7j (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:16142:328) at r2j (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15199:140) at rjk (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:7040:19) at Object._jk [as qT] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17088:17294) at Object.r5j [as tV] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17085:15904) at hIj (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15873:85) at Object.kIj [as tV] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17082:510) at uKj (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:11859:40) at Object.xKj [as tV] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17082:20018) at OJj (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15471:172) at Object.RJj [as Ch] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17082:19443) at Object.jAd [as ue] 
(http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17019:23272) at cR (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:14512:137) at Object.vR (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17019:13248) at XMLHttpRequest.anonymous (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:11884:65) at _q (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:8351:29) at cr (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15114:57) at XMLHttpRequest.anonymous (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:12521:45): Cannot call method 'kk' of null com.google.gwt.core.client.JavaScriptException: (TypeError) stack: TypeError: Cannot call method 'kk' of null at LSj (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:11622:58) at Object.JTl [as h_] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17112:15349) at l7j (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15200:166) at Object.n7j (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:16142:328) at r2j (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:15199:140) at rjk (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:7040:19) at Object._jk [as qT] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17088:17294) at Object.r5j [as tV] (http://localhost:8080/ovirt-engine/webadmin/4DD22D2F78BB84E2940BB7ADF6163F25.cache.html:17085:15904
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 13:12 -0500, Alexander Wels wrote: On Monday, January 06, 2014 01:03:31 PM Adam Litke wrote: On 06/01/14 11:41 -0500, Alexander Wels wrote: On Monday, January 06, 2014 11:27:07 AM Adam Litke wrote: On 06/01/14 11:19 -0500, Alexander Wels wrote: Adam, Is this just when you first login into the webadmin or whenever you go to the VM tab? In other words if you login, then switch to the templates tab and back again to the VM tab does it still not load? What about when you manually refresh the grid? Thanks for the quick response! It doesn't load at all -- first time or any other time when revisiting. In some cases in the past I would have luck by clicking the blue refresh icon but that doesn't help either. I have force refreshed the browser (Chrome) to no avail. I guess the next step is to completely restart the browser (hmm, no luck there either). Okay, then something else is going on, are there any errors in the server log? From server.log there are no ERRORs but this message may be related: 2014-01-06 13:01:54,209 WARN [org.jboss.resteasy.spi.ResteasyDeployment] (http--0.0.0.0-8080-4) Application.getSingletons() returned unknown class type: org.ovirt.engine.api.restapi.util.VmHelper Don't think that is related, as currently the web admin uses GWT RPC to communicate with the engine, and not the REST interface. So, ovirt-engine/var/log/ovirt-engine/server.log and ovirt-engine/var/log/ovirt-engine/engine.log have nothing in them? From engine.log: 2014-01-06 13:10:34,428 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-50) Executing a command: java.util.concurrent.FutureTask , but note that there are 0 tasks in the queue. This repeats quite regularly... Other than that, nothing looks relevant. Alexander On Monday, January 06, 2014 11:02:02 AM Adam Litke wrote: Hi all, I am working with the latest ovirt-engine git and am finding some strange behavior with the UI.
The list of VMs never populates and I am stuck with the loading indicator. All other tabs behave normally (Hosts, Templates, Storage, etc). Also, the list of VMs can be loaded normally using the REST API. Any ideas what may be causing this strange behavior? ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 13:30 -0500, Alexander Wels wrote: Yes either compile in PRETTY mode or run in GWT debug mode. Depending on how comfortable you are with doing either one. Ok I think we're getting somewhere... When compiled in draft mode the client errors look like this: GET http://localhost:8080/ovirt-engine/webadmin/Reports.xml 404 (Not Found) C5287D41B71197763AB3125431813688.cache.html:44792 Mon Jan 06 14:08:21 GMT-500 2014 com.google.gwt.logging.client.LogConfiguration SEVERE: (TypeError) stack: TypeError: Cannot call method 'get__Ljava_lang_Object_2Ljava_lang_Object_2' of null at org_ovirt_engine_ui_uicommonweb_dataprovider_AsyncDataProvider_getDisplayTypes__ILorg_ovirt_engine_core_compat_Version_2Ljava_util_List_2 (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:180644:360) at org_ovirt_engine_ui_uicommonweb_dataprovider_AsyncDataProvider_hasSpiceSupport__ILorg_ovirt_engine_core_compat_Version_2Z (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:181442:10) at Object.org_ovirt_engine_ui_uicommonweb_models_vms_SpiceConsoleModel_canBeSelected__Z [as canBeSelected__Z] (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:248240:199) at org_ovirt_engine_ui_uicommonweb_models_VmConsolesImpl_$canSelectProtocol__Lorg_ovirt_engine_ui_uicommonweb_models_VmConsolesImpl_2Lorg_ovirt_engine_ui_uicommonweb_models_ConsoleProtocol_2Z (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:187341:282) at org_ovirt_engine_ui_uicommonweb_models_VmConsolesImpl_$setDefaultSelectedProtocol__Lorg_ovirt_engine_ui_uicommonweb_models_VmConsolesImpl_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:187391:9) at 
Object.org_ovirt_engine_ui_uicommonweb_models_VmConsolesImpl_VmConsolesImpl__Lorg_ovirt_engine_core_common_businessentities_VM_2Lorg_ovirt_engine_ui_uicommonweb_models_Model_2Lorg_ovirt_engine_ui_uicommonweb_ConsoleOptionsFrontendPersister$ConsoleContext_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:187407:3) at org_ovirt_engine_ui_uicommonweb_models_ConsoleModelsCache_$updateCache__Lorg_ovirt_engine_ui_uicommonweb_models_ConsoleModelsCache_2Ljava_lang_Iterable_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:185252:1037) at org_ovirt_engine_ui_uicommonweb_models_vms_VmListModel_$setItems__Lorg_ovirt_engine_ui_uicommonweb_models_vms_VmListModel_2Ljava_lang_Iterable_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:194985:3) at Object.org_ovirt_engine_ui_uicommonweb_models_vms_VmListModel_setItems__Ljava_lang_Iterable_2V [as setItems__Ljava_lang_Iterable_2V] (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:195275:3) at Object.org_ovirt_engine_ui_uicommonweb_models_SearchableListModel$2_onSuccess__Ljava_lang_Object_2Ljava_lang_Object_2V [as onSuccess__Ljava_lang_Object_2Ljava_lang_Object_2V] (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:186570:23) at org_ovirt_engine_ui_frontend_Frontend$1_$onSuccess__Lorg_ovirt_engine_ui_frontend_Frontend$1_2Lorg_ovirt_engine_ui_frontend_communication_VdcOperation_2Lorg_ovirt_engine_core_common_queries_VdcQueryReturnValue_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:168839:1451) at Object.org_ovirt_engine_ui_frontend_Frontend$1_onSuccess__Ljava_lang_Object_2Ljava_lang_Object_2V [as onSuccess__Ljava_lang_Object_2Ljava_lang_Object_2V] (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:168871:3) at 
org_ovirt_engine_ui_frontend_communication_OperationProcessor$2_$onSuccess__Lorg_ovirt_engine_ui_frontend_communication_OperationProcessor$2_2Lorg_ovirt_engine_ui_frontend_communication_VdcOperation_2Ljava_lang_Object_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:173172:217) at Object.org_ovirt_engine_ui_frontend_communication_OperationProcessor$2_onSuccess__Ljava_lang_Object_2Ljava_lang_Object_2V [as onSuccess__Ljava_lang_Object_2Ljava_lang_Object_2V] (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:173190:3) at org_ovirt_engine_ui_frontend_communication_GWTRPCCommunicationProvider$4_$onSuccess__Lorg_ovirt_engine_ui_frontend_communication_GWTRPCCommunicationProvider$4_2Ljava_util_ArrayList_2V (http://localhost:8080/ovirt-engine/webadmin/C5287D41B71197763AB3125431813688.cache.html:172948:675) at Object.org_ovirt_engine_ui_frontend_communication_GWTRPCCommunicationProvider$4_onSuccess__Ljava_lang_Object_2V [as onSuccess__Ljava_lang_Object_2V]
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 14:32 -0500, Daniel Erez wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Alexander Wels aw...@redhat.com Cc: engine-devel@ovirt.org Sent: Monday, January 6, 2014 9:11:48 PM Subject: Re: [Engine-devel] UI: VM list not populating

Might be an issue of a stale osinfo properties file; 'displayProtocols' has recently been introduced by [1]. Try overwriting osinfo-defaults.properties with the updated one from the latest bits: /ovirt-engine/packaging/conf/osinfo-defaults.properties -> $HOME/ovirt-engine/share/ovirt-engine/conf

[1] http://gerrit.ovirt.org/#/c/18677/14/packaging/conf/osinfo-defaults.properties

Thanks for the suggestion but it did not seem to resolve the issue. Also, my properties file has os.other.displayProtocols.value and os.other.spiceSupport.value. This seems different from [1] above, which indicates that the spiceSupport key is removed entirely. ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
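The overwrite step suggested in that message can be scripted. A minimal sketch, assuming a local ovirt-engine git checkout and an install-dev prefix like the ones discussed in the thread; the `sync_osinfo` function and both path arguments are illustrative, not part of any ovirt tooling:

```shell
# Hedged sketch: copy the packaged osinfo defaults over the installed copy
# so newly added keys (e.g. os.other.displayProtocols.value) are picked up.
# sync_osinfo SRC_TREE PREFIX
sync_osinfo() {
    src="$1/packaging/conf/osinfo-defaults.properties"
    dest="$2/share/ovirt-engine/conf/osinfo-defaults.properties"
    if [ -f "$src" ]; then
        # Overwrite the installed file with the one from the source tree.
        cp "$src" "$dest" && echo "updated $dest"
    else
        echo "source not found: $src" >&2
        return 1
    fi
}
```

A typical invocation for the setup described in the thread might be `sync_osinfo "$HOME/src/ovirt-engine" "$HOME/ovirt-engine"`, followed by an engine restart so the new properties are reloaded.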
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 15:31 -0500, Daniel Erez wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Daniel Erez de...@redhat.com Cc: Alexander Wels aw...@redhat.com, engine-devel@ovirt.org Sent: Monday, January 6, 2014 9:51:57 PM Subject: Re: [Engine-devel] UI: VM list not populating On 06/01/14 14:32 -0500, Daniel Erez wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Alexander Wels aw...@redhat.com Cc: engine-devel@ovirt.org Sent: Monday, January 6, 2014 9:11:48 PM Subject: Re: [Engine-devel] UI: VM list not populating Might be an issue of a stale osinfo properties file, 'displayProtocols' has recently been introduced by [1] Try overwriting osinfo-defaults.properties with the updated one from latest bits /ovirt-engine/packaging/conf/osinfo-defaults.properties -- $HOME/ovirt-engine/share/ovirt-engine/conf [1] http://gerrit.ovirt.org/#/c/18677/14/packaging/conf/osinfo-defaults.properties Thanks for the suggestion but it did not seem to resolve the issue. Also, my proprties file has os.other.displayProtocols.value and os.other.spiceSupport.value. This seems different from [1] above which indicates that the spiceSupport key is removed entirely. Actually spiceSupport key was added a bit later by: http://gerrit.ovirt.org/#/c/18220/17/packaging/conf/osinfo-defaults.properties Can you please check if VMs list is displayed correctly from the userportal? (I just wonder if there's some race in 'initCache/initDisplayTypes' mechanism). Does not work in the User Portal either. I don't know if this is related, but I have started to observe some new errors in server.log. 
I wonder if I have done too much rebasing and schema upgrading on my local DB:

2014-01-06 15:39:20,451 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-31) Failed to refresh VDS , vds = 203848b8-1d84-4c01-a267-c11280d0ad0f : lager, error = org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [select * from getinterface_viewbyvds_id(?, ?, ?)]; nested exception is org.postgresql.util.PSQLException: The column name qos_overridden was not found in this ResultSet., continuing.:
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [select * from getinterface_viewbyvds_id(?, ?, ?)]; nested exception is org.postgresql.util.PSQLException: The column name qos_overridden was not found in this ResultSet.
    at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:98) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:603) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:706) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:154) [dal.jar:]
    at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.doExecute(PostgresDbEngineDialect.java:120) [dal.jar:]
    at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc.jar:3.1.1.RELEASE]
    at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:137) [dal.jar:]
    at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeReadList(SimpleJdbcCallsHandler.java:103) [dal.jar:]
    at org.ovirt.engine.core.dao.network.InterfaceDaoDbFacadeImpl.getAllInterfacesForVds(InterfaceDaoDbFacadeImpl.java:167) [dal.jar:]
    at org.ovirt.engine.core.dao.network.InterfaceDaoDbFacadeImpl.getAllInterfacesForVds(InterfaceDaoDbFacadeImpl.java:150) [dal.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.updateNetworkData(VdsBrokerObjectsBuilder.java:930) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.updateVDSDynamicData(VdsBrokerObjectsBuilder.java:326) [vdsbroker.jar
Re: [Engine-devel] UI: VM list not populating
On 06/01/14 15:56 -0500, Daniel Erez wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Daniel Erez de...@redhat.com Cc: Alexander Wels aw...@redhat.com, engine-devel@ovirt.org Sent: Monday, January 6, 2014 10:42:08 PM Subject: Re: [Engine-devel] UI: VM list not populating On 06/01/14 15:31 -0500, Daniel Erez wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Daniel Erez de...@redhat.com Cc: Alexander Wels aw...@redhat.com, engine-devel@ovirt.org Sent: Monday, January 6, 2014 9:51:57 PM Subject: Re: [Engine-devel] UI: VM list not populating On 06/01/14 14:32 -0500, Daniel Erez wrote: - Original Message - From: Adam Litke ali...@redhat.com To: Alexander Wels aw...@redhat.com Cc: engine-devel@ovirt.org Sent: Monday, January 6, 2014 9:11:48 PM Subject: Re: [Engine-devel] UI: VM list not populating Might be an issue of a stale osinfo properties file, 'displayProtocols' has recently been introduced by [1] Try overwriting osinfo-defaults.properties with the updated one from latest bits /ovirt-engine/packaging/conf/osinfo-defaults.properties -- $HOME/ovirt-engine/share/ovirt-engine/conf [1] http://gerrit.ovirt.org/#/c/18677/14/packaging/conf/osinfo-defaults.properties Thanks for the suggestion but it did not seem to resolve the issue. Also, my proprties file has os.other.displayProtocols.value and os.other.spiceSupport.value. This seems different from [1] above which indicates that the spiceSupport key is removed entirely. Actually spiceSupport key was added a bit later by: http://gerrit.ovirt.org/#/c/18220/17/packaging/conf/osinfo-defaults.properties Can you please check if VMs list is displayed correctly from the userportal? (I just wonder if there's some race in 'initCache/initDisplayTypes' mechanism). Does not work in the User Portal either. I don't know if this is related, but I have started to observe some new errors in server.log. 
I wonder if I have done too much rebasing and schema upgrading on my local DB: Yeah, looks like the DB needs upgrading... (if you don't have any important data you can just try creating a new one). Regarding the user portal, I'm guessing you don't see any VMs as you have to assign permissions to them first from the webadmin. Can you try creating some new VMs from the user portal, to see if the list is displayed correctly. Also, look whether you get a similar error in the engine log file as the webadmin. New VMs created in the admin portal and user portal do not show up in the list. I just see the animated boxes indicating that the data is loading. The same error appears in the engine.log. I will try to blow away the data and start over. 2014-01-06 15:39:20,451 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-31) Failed to refresh VDS , vds = 203848b8-1d84-4c01-a267-c11280d0ad0f : lager, error = org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [select * from getinterface_viewbyvds_id(?, ?, ?)]; nested exception is org.postgresql.util.PSQLException: The column name qos_overridden was not found in this ResultSet., continuing.: org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [select * from getinterface_viewbyvds_id(?, ?, ?)]; nested exception is org.postgresql.util.PSQLException: The column name qos_overridden was not found in this ResultSet. 
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:98) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:80) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:603) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666) [spring-jdbc.jar:3.1.1.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:706) [spring-jdbc.jar:3.1.1.RELEASE] at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:154) [dal.jar:] at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect
Re: [Engine-devel] ovirt-engine build segfault on Fedora 20
On 02/01/14 21:53 -0500, Greg Sheremeta wrote: Caution on upgrading your dev machine to Fedora 20. GWT compilation of safari (for Chrome) causes a segfault during the build. Strangely, the build appears to work, so I'm not sure what the net effect of the segfault is. If you only compile for gecko (Firefox) [the default], you won't see the segfault. In other words,

make clean install-dev PREFIX=$HOME/ovirt-engine DEV_EXTRA_BUILD_FLAGS_GWT_DEFAULTS=-Dgwt.userAgent=gecko1_8,safari

causes the segfault. But

make install-dev PREFIX=$HOME/ovirt-engine

works just fine. I've duplicated this with both OpenJDK and Oracle JDK.

I can confirm this on my F20 system with OpenJDK as well. So far I have not observed any problems with the resulting build. ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
[Engine-devel] Engine on Fedora 20
Has anyone had success running ovirt-engine on Fedora 20? I upgraded my system on Wednesday and thought everything was fine but then I started getting the following error: 2013-12-19 14:53:31,447 ERROR [org.ovirt.engine.core.bll.Backend] (MSC service thread 1-5) Error in getting DB connection. The database is inaccessible. Original exception is: DataAccessResourceFailureException: Error retreiving database metadata; nested exception is org.springframework.jdbc.support.MetaDataAccessException: Could not get Connection for extracting meta data; nested exception is org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/ENGINEDataSource Has anyone encountered this recently? ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
Re: [Engine-devel] Engine on Fedora 20
On 19/12/13 15:05 -0500, Adam Litke wrote: Has anyone had success running ovirt-engine on Fedora 20? I upgraded my system on Wednesday and thought everything was fine but then I started getting the following error: 2013-12-19 14:53:31,447 ERROR [org.ovirt.engine.core.bll.Backend] (MSC service thread 1-5) Error in getting DB connection. The database is inaccessible. Original exception is: DataAccessResourceFailureException: Error retreiving database metadata; nested exception is org.springframework.jdbc.support.MetaDataAccessException: Could not get Connection for extracting meta data; nested exception is org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/ENGINEDataSource Has anyone encountered this recently?

Thanks to alonb for his help on IRC. As it turns out, I had a poorly configured pg_hba.conf file that only started causing problems on F20. To fix it, I replaced the contents with the following two lines:

host engine engine 0.0.0.0/0 md5
host engine engine ::0/0 md5

Otherwise, it seems to be working fine. ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
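For reference, those two entries laid out in pg_hba.conf's column format (connection type, database, user, client address, auth method); this is a sketch of the relevant lines only, assuming the stock Fedora layout where the file lives at /var/lib/pgsql/data/pg_hba.conf:

```
# TYPE  DATABASE  USER    ADDRESS      METHOD
host    engine    engine  0.0.0.0/0    md5
host    engine    engine  ::0/0        md5
```

PostgreSQL has to reload its configuration (e.g. `systemctl reload postgresql` or `pg_ctl reload`) before a pg_hba.conf change takes effect.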
Re: [Engine-devel] UX: Display VM Downtime in the UI
On 18/12/13 16:04 -0500, Malini Rao wrote: - Original Message - From: Adam Litke ali...@redhat.com To: engine-devel@ovirt.org Sent: Wednesday, December 18, 2013 9:42:59 AM Subject: [Engine-devel] UX: Display VM Downtime in the UI Hi UX developers, My recent change: http://gerrit.ovirt.org/#/c/22429/ adds support for tracking the time a VM was last stopped and presenting it in the REST API. I would also like to expose this information in the admin portal. This feature has been requested by end users and is useful for managing lots of VMs which may not be used frequently. My idea is to change the 'Uptime' column in the VMs tab to 'Uptime / Downtime' or some equivalent and more compact phrasing. If the VM is Up, then last_start_time would be used to calculate uptime. If the VM is Down, then last_stop_time would be used to calculate downtime. This helps to make efficient use of the column space. Thanks for your comments! MR: I like the idea in general but can we extend to other states as well? Then we could have the col be called something like 'Time in I would argue that 'Up' and 'Down' are the only persistent states where a VM can linger for a user-controlled amount of time. The others (WaitForLaunch, PoweringDown, etc) are just transitions with their own system defined timeouts. Because of this, it really only makes sense to denote uptime and downtime. When the VM is in another state, this column would be empty. current state'. Also, I think since this col is so far from the first column that has the status icon, we should have a tooltip on the value that says ' Uptime' , 'down time' or 'Status time'. Agree on the tooltip. I am not sure how column sorting is being implemented, but if we combine uptime and downtime into a single column we have an opportunity to provide a really intuitive sort where the longest uptime machines are at the top and the longest downtime machines are at the bottom. 
This could be accomplished by treating uptime as a positive interval and downtime as a negative interval.

MR: That's an interesting idea. Not sure how that would translate if we did all states and times. Then I would think you would do descending order within each state, but then we would have to fix a sequence for the display of the various statuses based on the statuses that matter most.

This is much simpler if you just work with Up and Down. Questions for you all:

- Do you support the idea of changing the Uptime column to include Downtime as well, or would you prefer a new column instead?

MR: I do not like the idea of introducing new columns for this purpose since at any given time, only one of the columns will be populated. Another idea is to remove this column altogether and include the time for the current status as a tooltip on the status icon preceding the name.

What about adding the uptime/downtime to the status column itself? I don't necessarily think this will muddy the status much since there is still an icon on the left. ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
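The signed-interval sort floated in this thread can be sketched in a few lines. Everything below (class, field, and method names) is illustrative, not actual ovirt-engine code: the idea is just that uptime contributes a positive key, downtime a negative one, and sorting descending puts the longest-running VMs first and the longest-down VMs last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the proposed uptime/downtime column sort; hypothetical types.
public class VmTimeSort {
    enum Status { UP, DOWN, OTHER }

    static class VmRow {
        final String name;
        final Status status;
        final long secondsInState; // since last_start_time or last_stop_time
        VmRow(String name, Status status, long secondsInState) {
            this.name = name;
            this.status = status;
            this.secondsInState = secondsInState;
        }
    }

    // Up counts positive, Down counts negative; transient states
    // (WaitForLaunch, PoweringDown, ...) contribute 0, matching the
    // suggestion that the column stays empty for them.
    static long sortKey(VmRow vm) {
        switch (vm.status) {
            case UP:   return vm.secondsInState;
            case DOWN: return -vm.secondsInState;
            default:   return 0;
        }
    }

    static List<VmRow> sortedByTime(List<VmRow> vms) {
        List<VmRow> out = new ArrayList<>(vms);
        out.sort(Comparator.comparingLong(VmTimeSort::sortKey).reversed());
        return out;
    }
}
```

With this key, one column and one sort order cover both cases, which is the intuitive ordering described above.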
[Engine-devel] Java Newbie: Renaming some functions to fix findbugs warnings
Hello, I am working on resolving some warnings produced by findbugs and am looking for some advice on how to properly resolve the problem. The Frontend class has several pairs of methods where a capitalized version is a deprecated static form and the camelCase version is the instance method. For example: @Deprecated public static void RunQuery(...) - and - public void runQuery(...) In both cases the parameters are the same so simply renaming RunQuery to runQuery will result in a conflict. Since I am new to Java and the ovirt-engine project I am looking for some advice on how to fix the function name without breaking the code or people's sense of aesthetics. Since this is a deprecated function, would it be terrible to rename it to 'runQueryStatic' or 'runQueryDeprecated'? Since the language provides syntactic annotations for 'static' and 'deprecated', both of these names feel dirty but I am not sure what would be better. Thanks for helping out a newbie! --Adam ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
Re: [Engine-devel] Java Newbie: Renaming some functions to fix findbugs warnings
Adam, We are aware of this issue and we actually have a patch somewhat ready to solve the issue [1]. We made the RunQuery/RunAction/etc methods deprecated to encourage people to no longer use them. We have a patch ready to remove all current uses of RunQuery/RunAction/etc from the code base, but haven't gotten around to rebasing/merging the patch. Alexander [1] http://gerrit.ovirt.org/#/c/18413/ Thanks for the detail! Looks like fixing this properly is far from a beginner's task. ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
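To make the clash concrete: Java does not allow a static and an instance method with the same name and parameter list in one class, which is exactly why the plain rename fails. A minimal sketch of the situation and the usual bridge (the deprecated old name delegating to the new instance method until callers are migrated); the class below is a stand-in, not the real Frontend:

```java
// Stand-in for the Frontend naming clash discussed above; not engine code.
public class Frontend {
    private static final Frontend INSTANCE = new Frontend();

    public static Frontend getInstance() {
        return INSTANCE;
    }

    private String lastQuery;

    /** The non-deprecated instance form callers should migrate to. */
    public void runQuery(String query) {
        lastQuery = query;
    }

    /**
     * Legacy static form. Renaming it to runQuery(String) would collide
     * with the instance method above, so it keeps its old capitalized
     * name and simply delegates until all call sites are converted.
     */
    @Deprecated
    public static void RunQuery(String query) {
        getInstance().runQuery(query);
    }

    public String getLastQuery() {
        return lastQuery;
    }
}
```

This keeps old callers compiling while every call ends up routed through the one instance method, which is why removing the remaining RunQuery/RunAction call sites (as in the patch referenced above) is the real fix rather than a rename.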
Re: [Engine-devel] 3.2 features for release notes
On Thu, Jan 24, 2013 at 07:30:07AM -0800, Itamar Heim wrote: doron/adam: not sure about status of vdsm-mom in 3.2? mom is enabled by default for hosts in 3.2 and will control KSM only. No user-visible changes are expected as this is primarily an infrastructure change to enable more advanced SLA in the next release. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
Re: [Engine-devel] [vdsm] RFC: New Storage API
On Tue, Jan 22, 2013 at 11:36:57PM +0800, Shu Ming wrote: 2013-1-15 5:34, Ayal Baron: image and volume are overused everywhere and it would be extremely confusing to have multiple meanings to the same terms in the same system (we have image today which means virtual disk and volume which means a part of a virtual disk). Personally I don't like the distinction between image and volume done in ec2/openstack/etc, seeing as they're treated as different types of entities there while the only real difference is mutability (images are read-only, volumes are read-write). To move to the industry terminology we would need to first change all references we have today to image and volume in the system (I would say also on the ovirt-engine side) to align with the new meaning. Despite my personal dislike of the terms, I definitely see the value in converging on the same terminology as the rest of the industry, but to do so would be an arduous task which is out of scope of this discussion imo (patches welcome though ;)

Another distinction between Openstack and oVirt is how Nova/ovirt-engine look upon storage systems. In Openstack, a stand-alone storage service (Cinder) exports the raw storage block device to Nova. On the other hand, in oVirt the storage system is tightly bound to the cluster scheduling system, which integrates the storage sub-system, the VM dispatching sub-system, and the ISO image sub-system. This combination makes all of the sub-systems form an integrated whole which is easy to deploy, but it also makes them more opaque and harder to reuse and maintain. This new storage API proposal gives us an opportunity to separate these sub-systems into new components which export better, loosely coupled APIs to VDSM.

A very good point and an important goal in my opinion. I'd like to see ovirt-engine become more of a GUI for configuring the storage component (like it does for Gluster) rather than the centralized manager of storage.
The clustered storage should be able to take care of itself as long as the peer hosts can negotiate the SDM role. It would be cool if someone could actually dedicate a non-virtualization host where its only job is to handle SDM operations. Such a host could choose to only deploy the standalone HSM service and not the complete vdsm package. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
[Engine-devel] Managing async tasks
On today's vdsm call we had a lively discussion around how asynchronous operations should be handled in the future. In an effort to include more people in the discussion and to better capture the resulting conversation I would like to continue that discussion here on the mailing list. A lot of ideas were thrown around about how 'tasks' should be handled in the future. There are a lot of ways that it can be done. To determine how we should implement it, it's probably best if we start with a set of requirements. If we can first agree on these, it should be easy to find a solution that meets them. I'll take a stab at identifying a first set of POSSIBLE requirements:

- Standardized method for determining the result of an operation

This is a big one for me because it directly affects the consumability of the API. If each verb has different semantics for discovering whether it has completed successfully, then the API will be nearly impossible to use easily. Sorry. That's my list :) Hopefully others will be willing to add other requirements for consideration. From my understanding, task recovery (stop, abort, rollback, etc) will not be generally supported and should not be a requirement. -- Adam Litke a...@us.ibm.com IBM Linux Technology Center ___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
Re: [Engine-devel] Managing async tasks
On Mon, Dec 17, 2012 at 03:12:34PM -0500, Saggi Mizrahi wrote: This is an addendum to my previous email. - Original Message - From: Saggi Mizrahi smizr...@redhat.com To: Adam Litke a...@us.ibm.com Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-devel@ovirt.org, vdsm-de...@lists.fedorahosted.org Sent: Monday, December 17, 2012 2:52:06 PM Subject: Re: Managing async tasks - Original Message - From: Adam Litke a...@us.ibm.com To: Saggi Mizrahi smizr...@redhat.com Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-devel@ovirt.org, vdsm-de...@lists.fedorahosted.org Sent: Monday, December 17, 2012 2:16:25 PM Subject: Re: Managing async tasks On Mon, Dec 17, 2012 at 12:15:08PM -0500, Saggi Mizrahi wrote: - Original Message - From: Adam Litke a...@us.ibm.com To: vdsm-de...@lists.fedorahosted.org Cc: Dan Kenigsberg dan...@redhat.com, Ayal Baron aba...@redhat.com, Saggi Mizrahi smizr...@redhat.com, Federico Simoncelli fsimo...@redhat.com, engine-devel@ovirt.org Sent: Monday, December 17, 2012 12:00:49 PM Subject: Managing async tasks On today's vdsm call we had a lively discussion around how asynchronous operations should be handled in the future. In an effort to include more people in the discussion and to better capture the resulting conversation I would like to continue that discussion here on the mailing list. A lot of ideas were thrown around about how 'tasks' should be handled in the future. There are a lot of ways that it can be done. To determine how we should implement it, it's probably best if we start with a set of requirements. If we can first agree on these, it should be easy to find a solution that meets them. I'll take a stab at identifying a first set of POSSIBLE requirements: - Standardized method for determining the result of an operation This is a big one for me because it directly affects the consumability of the API. 
If each verb has different semantics for discovering whether it has completed successfully, then the API will be nearly impossible to use easily. Since there is no way to be sure whether some tasks completed successfully or failed, especially around the murky waters of storage, I say this requirement should be removed. At least not in the context of a task. I don't agree. Please feel free to convince me with some examples. If we cannot provide feedback to a user as to whether their request has been satisfied or not, then we have some bigger problems to solve. If VDSM sends a write command to a storage server and the connection hangs up before the ACK has returned, the operation has been committed but VDSM has no way of knowing that it happened; as far as VDSM is concerned it got an ETIMEO or EIO. This is the same problem that the engine has with VDSM. If VDSM creates an image\VM\network\repo but the connection hangs up before the response can be sent back, as far as the engine is concerned the operation times out. This is an inherent issue with clustering. This is why I want to move away from tasks being *the* trackable objects. Tasks should be short. As short as possible. Run VM should just persist the VM information on the VDSM host and return. The rest of the tracking should be done using the VM ID. Create image should return once VDSM has persisted the information about the request on the repository and created the metadata files. Tracking should be done on the repo or the imageId. The thing is that I know how long a VM object should live (or an Image object), so tracking it is straightforward. How long a task should live is very problematic and quite context specific. It depends on what the task is. I think it's quite confusing from an API standpoint to have every task have a different scope, id requirement and life-cycle.
VDSM has two types of APIs:

- CRUD objects - VM, Image, Repository, Bridge, Storage Connections
- General transient methods - getBiosInfo(), getDeviceList()

The latter are quite simple to manage. They don't need any special handling. If you lost a getBiosInfo() call you just send another one, no harm done. The same is even true for things that change the host, like getDeviceList().

What we are really arguing about is fitting the CRUD objects into some generic task-oriented scheme. I'm saying it's a waste of time, as you can quite easily have flows to recover from each operation:

- Create - check if the object exists
- Read - read again
- Update - either update again or read and update if update
Re: [Engine-devel] [vdsm] RFC: New Storage API
operation, it will tell it to value one over the other. For example, whether to copy all the data or just create a qcow based on a snapshot. The default is space.

You might have also noticed that it is never explicitly specified where to look for existing images. This is done purposefully; VDSM will always look in all connected repositories for existing objects. For very large setups this might be problematic. To mitigate the problem you have these options: participatingRepositories=[repoId, ...], which tells VDSM to narrow the search to just these repositories, and imageHints={imgId: repoId}, which will force VDSM to look for those image IDs just in those repositories and fail if it doesn't find them there.

___ vdsm-devel mailing list vdsm-de...@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel

--
---
舒明 Shu Ming
Open Virtualization Engineering; CSTL, IBM Corp.
Tel: 86-10-82451626 Tieline: 9051626
E-mail: shum...@cn.ibm.com or shum...@linux.vnet.ibm.com
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian District, Beijing 100193, PRC

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center

___ Engine-devel mailing list Engine-devel@ovirt.org http://lists.ovirt.org/mailman/listinfo/engine-devel
Re: [Engine-devel] [vdsm] RFC: New Storage API
On Fri, Dec 07, 2012 at 02:53:41PM -0500, Saggi Mizrahi wrote:

snip

1) Can you provide more info on why there is an exception for 'lvm based block domain'? It's not coming out clearly.

File based domains are responsible for syncing up object manipulation (creation\deletion). The backend is responsible for making sure it all works, either by having a single writer (NFS) or having its own locking mechanism (gluster). In our LVM based domains VDSM is responsible for basic object manipulation. The current design uses an approach where there is a single host responsible for object creation\deletion: the SRM\SDM\SPM\S?M. If we ever find a way to make it fully clustered without a big hit in performance, the S?M requirement will be removed from that type of domain.

I would like to see us maintain a LOCALFS domain as well. For this, we would also need SRM, correct?

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center
Re: [Engine-devel] [vdsm] RFC: New Storage API
On Mon, Dec 10, 2012 at 02:03:09PM -0500, Saggi Mizrahi wrote:

- Original Message -
From: Adam Litke a...@us.ibm.com
To: Saggi Mizrahi smizr...@redhat.com
Cc: Deepak C Shetty deepa...@linux.vnet.ibm.com, engine-devel engine-devel@ovirt.org, VDSM Project Development vdsm-de...@lists.fedorahosted.org
Sent: Monday, December 10, 2012 1:49:31 PM
Subject: Re: [vdsm] RFC: New Storage API

On Fri, Dec 07, 2012 at 02:53:41PM -0500, Saggi Mizrahi wrote:

snip

1) Can you provide more info on why there is an exception for 'lvm based block domain'? It's not coming out clearly.

File based domains are responsible for syncing up object manipulation (creation\deletion). The backend is responsible for making sure it all works, either by having a single writer (NFS) or having its own locking mechanism (gluster). In our LVM based domains VDSM is responsible for basic object manipulation. The current design uses an approach where there is a single host responsible for object creation\deletion: the SRM\SDM\SPM\S?M. If we ever find a way to make it fully clustered without a big hit in performance, the S?M requirement will be removed from that type of domain.

I would like to see us maintain a LOCALFS domain as well. For this, we would also need SRM, correct?

No, why?

Sorry, nevermind. I was thinking of a scenario with multiple clients talking to a single vdsm and making sure they don't stomp on one another. This is probably not something we are going to care about though.

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center
Re: [Engine-devel] VDSM tasks, the future
On Tue, Dec 04, 2012 at 10:35:01AM -0500, Saggi Mizrahi wrote: Because I started hinting about how VDSM tasks are going to look going forward, I thought it's better that I just write everything in an email so we can talk about it in context. This is not set in stone and I'm still debating things myself, but it's very close to being done.

Don't debate them yourself, debate them here! Even better, propose your idea in schema form to show how a command might work exactly.

- Everything is asynchronous.

The nature of message based communication is that you can't have synchronous operations. This is not really debatable because it's just how TCP\AMQP\messaging works.

Can you show how a traditionally synchronous command might work? Let's take Host.getVmList as an example.

- Task IDs will be decided by the caller.

This is how json-rpc works and also makes sense, because now the engine can track the task without needing a stage where we give it the task ID back. IDs are reusable as long as no one else is using them at the time, so they can be used for synchronizing operations between clients (making sure a command is only executed once on a specific host, without locking).

- Tasks are transient.

If VDSM restarts, it forgets all the task information. There are 2 ways to have persistent tasks:

1. The task creates an object that you can continue to work on in VDSM. The new storage API does that by the fact that copyImage() returns once the target volume has been created but before the data has been fully copied. From that moment on, the state of the copy can be queried from any host using getImageStatus(), and the specific copy operation can be queried with getTaskStatus() on the host performing it. After VDSM crashes, depending on policy, either VDSM will create a new task to continue the copy, or someone else will send a command to continue the operation, and that will be a new task.

2. VDSM tasks just start other operations, trackable not through the task interface. For example Gluster.
gluster.startVolumeRebalance() will return once it has been registered with Gluster. gluster.getOperationStatuses() will return the state of the operation from any host. Each call is a task in itself.

I worry about this approach because every command has a different semantic for checking progress. For migration, we have to check VM status on the src and dest hosts. For image copy we need to use a special status call on the dest image. It would be nice if there was a unified method for checking on an operation. Maybe that can be completion events:

    Client:                  vdsm:
    -------                  -----
    Image.copy(...)    -->
                       <--   Operation Started
    Wait for event ...
                       <--   Event: Operation <id> done <code>

For an early error:

    Client:                  vdsm:
    -------                  -----
    Image.copy(...)    -->
                       <--   Error: <code>

- No task tags.

They are silly, and the caller can mangle whatever it wants into the task ID if it really wants to tag tasks.

Yes. Agreed.

- No explicit recovery stage.

VDSM will be crash-only; there should be efforts to make everything crash-safe. If that is problematic, as in the case of networking, VDSM will recover on start without having a task for it.

How does this work in practice for something like creating a new image from a template?

- No clean Task.

Tasks can be started by any number of hosts; this means that there is no way to own all tasks. There could be cases where VDSM starts tasks on its own, and thus they have no owner at all. The caller needs to continually track the state of VDSM. We will have broadcast events to mitigate polling.

If a disconnected client might have missed a completion event, it will need to check state. This means each async operation that changes state must document a procedure for checking progress of a potentially ongoing operation. For Image.copy, that procedure would be to look up the new image and check its state.

- No revert.

Impossible to implement safely.

How do the engine folks feel about this? I am ok with it :)

- No SPM\HSM tasks.

SPM\SDM is no longer necessary for all domain types (only for some types).
What used to be SPM tasks (tasks that persist and can be restarted on other hosts) are covered in the previous bullet points.

A nice simplification.

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center
Re: [Engine-devel] [vdsm] RFC: New Storage API
information (like Volume.getInfo)? (I see some more info below...)

All operations return once the operation has been committed to disk, NOT when the operation actually completes. This is done so that:

- operations come to a stable state as quickly as possible.
- in cases where there is an SDM, only a small portion of the operation actually needs to be performed on the SDM host.
- no matter how many times the operation fails, and on how many hosts, you can always resume the operation and choose when to do it.
- you can stop an operation at any time and remove the resulting object, making a distinction between "stop because the host is overloaded" and "I don't want that image".

This means that after calling any operation that creates a new image, the user must then call getImageStatus() to check what the status of the image is. The status of the image can be either optimized, degraded, or broken. Optimized means that the image is available and you can run VMs off it. Degraded means that the image is available and will run VMs, but there might be a better way for VDSM to represent the underlying data. Broken means that the image can't be used at the moment, probably because not all the data has been set up on the volume.

Apart from that, VDSM will also return the last persisted status information, which will contain:

- hostID - the last host to try and optimize or fix the image
- stage - X/Y (eg. 1/10), the last persisted stage of the fix.

Do you have some examples of what the stages would be? I think these should be defined in enums so that the user can check on what the individual stages mean. What happens when the low level implementation of an operation changes? The meaning of the stages will change completely.

- percent_complete - -1 or 0-100, the last persisted completion percentage of the aforementioned stage. -1 means that no progress is available for that operation.
- last_error - This will only be filled if the operation failed because of something other than IO or a VDSM crash, for obvious reasons. It will usually be set if the task was manually stopped.

The user can either be satisfied with that information, or ask the host specified in hostID whether it is still working on that image by checking its running tasks.

checkStorageRepository(self, repositoryId, options={}):

A method to go over a storage repository and scan for any existing problems. This includes degraded\broken images and deleted images that have not yet been physically deleted\merged. It returns a list of Fix objects. Fix objects come in 4 types:

- clean - cleans data; run them to get more space.
- optimize - run them to optimize a degraded image.

What is an example of a degraded image?

- merge - merges two images together. Doing this sometimes makes more images ready for optimizing or cleaning. The reason it is different from optimize is that unmerged images are considered optimized.
- mend - mends a broken image.

What does this mean?

The user can read these types and prioritize fixes. Fixes also contain opaque Fix data, and they should be sent as received to fixStorageRepository(self, repositoryId, fix, options={}), which will start a fix operation.

Could we have an automatic fix mode where vdsm just does the right thing (for most things)?

All major operations automatically start the appropriate Fix to bring the created object to an optimized\degraded state (the one that is quicker) unless one of the options is AutoFix=False. This is only useful for repos that might not be able to create volumes on all hosts (SDM) but would like to have the actual IO distributed in the cluster.

Another common option is the strategy option. It currently has 2 possible values, space and performance: in case VDSM has 2 ways of completing the same operation, it tells VDSM to value one over the other. For example, whether to copy all the data or just create a qcow based on a snapshot.
The default is space.

I like this a lot.

You might have also noticed that it is never explicitly specified where to look for existing images. This is done purposefully; VDSM will always look in all connected repositories for existing objects. For very large setups this might be problematic. To mitigate the problem you have these options: participatingRepositories=[repoId, ...], which tells VDSM to narrow the search to just these repositories, and imageHints={imgId: repoId}, which will force VDSM to look for those image IDs just in those repositories and fail if it doesn't find them there.

I would like to have a better way of specifying these optional parameters without burying them in an options structure. I will think a little more about this. Strategy could just be two optional flags in a 'flags' argument. For the participatingRepositories and imageHints options, I think we need to use real parameters.

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center
Re: [Engine-devel] RFD: API: Identifying vdsm objects in the next-gen API
On Mon, Dec 03, 2012 at 03:57:42PM -0500, Saggi Mizrahi wrote:

- Original Message -
From: Adam Litke a...@us.ibm.com
To: Saggi Mizrahi smizr...@redhat.com
Cc: engine-de...@linode01.ovirt.org, Dan Kenigsberg dan...@redhat.com, Federico Simoncelli fsimo...@redhat.com, Ayal Baron aba...@redhat.com, vdsm-de...@lists.fedorahosted.org
Sent: Monday, December 3, 2012 3:30:21 PM
Subject: Re: RFD: API: Identifying vdsm objects in the next-gen API

On Thu, Nov 29, 2012 at 04:52:14PM -0500, Saggi Mizrahi wrote:

They are not future proof, as the paradigm is completely different. Storage domain IDs are not static any more (and are not guaranteed to be unique or the same across the cluster). Image IDs represent the ID of the projected data and not the actual unique path. Just as an example, to run a VM you give a list of domains that might contain the needed images in the chain and the image ID of the tip. The paradigm has changed, and most calls take a variable number of images and domains. Furthermore, the APIs themselves are completely different. So future proofing is not really an issue.

I don't understand this at all. Perhaps we could all use some education on the planned architectural changes. If I can pass an arbitrary list of domainIDs that _might_ contain the data, why wouldn't I just pass all of them every time? In that case, why are they even required, since vdsm would have to search anyway?
It's for optimization mostly; the engine usually has a good idea of where things are, and having it give hints to VDSM can speed up the search process. Also, the engine knows how transient some storage pieces are. If you have a domain that is only there for backup, or owned by another manager sharing the host, you don't want your VMs using the disks that are on that storage, effectively preventing it from being removed (though we do have plans to have qemu switch base snapshots at runtime for just that).

This is not a clean design. If the search is slow, then maybe we need to improve caching internally. Making a client cache a bunch of internal IDs to pass around sounds like a complete layering violation to me.

You can't cache this; if the same template exists on 2 different NFS domains, only the engine has enough information to know which you should use. We only have the engine give us this information when starting a VM or merging\copying an image that resides on multiple domains. It is also completely optional.

I didn't like it either. Is it even valid for the same template (with identical uuids) to exist in two places? I thought uuids aren't supposed to collide.

I can envision some scenario where a cached storagedomain/storagepool relationship becomes invalid because another user detached the storagedomain. In that case, the API just returns the normal error about sd XXX not being attached to sp XXX. So I don't see any problem here.

As to making the current API a bit simpler: as I said, making the IDs opaque is problematic, as currently the engine is responsible for creating the IDs.

As I mentioned in my last post, the engine still can specify the IDs when the object is first created. From that point forward the ID never changes, so it can be baked into the identifier.

Where will this identifier be persisted?

Furthermore, some calls require you to play with these (making a template instead of a snapshot).
Also, the full chain and topology need to be completely visible to the engine.

Please provide a specific example of how you play with the IDs. I can guess where you are going, but I don't want to divert the thread.

The relationship between volumes and images is deceptive at the moment. IMG is the chain and volume is a member; IMGUUID is only used for verification and to detect when we hit a template going up the chain. When you do operations on images, guarantees are made about the resulting IDs. When you copy an image, you assume you know all the new IDs, as they remain the same. With your method I can't tell what the new opaque result is going to be. Preview mode (another abomination being deprecated) relies on the disconnect between imgUUID and volUUID. Live migration currently moves a lot of the responsibility to the engine. No client
Re: [Engine-devel] [vdsm] [ATTENTION] vdsm-bootstrap/host deployment (pre-3.2)
On Thu, Nov 29, 2012 at 10:00:12AM +0200, Dan Kenigsberg wrote: On Wed, Nov 28, 2012 at 03:29:35PM -0600, Adam Litke wrote: On Wed, Nov 28, 2012 at 03:45:28PM -0500, Alon Bar-Lev wrote:

- Original Message -
From: Dan Kenigsberg dan...@redhat.com
To: Alon Bar-Lev alo...@redhat.com
Cc: VDSM Project Development vdsm-de...@lists.fedorahosted.org, engine-devel engine-devel@ovirt.org, users us...@ovirt.org
Sent: Wednesday, November 28, 2012 10:39:42 PM
Subject: Re: [vdsm] [ATTENTION] vdsm-bootstrap/host deployment (pre-3.2)

On Wed, Nov 28, 2012 at 02:57:17PM -0500, Alon Bar-Lev wrote:

No... we need it as compatibility with older engines... We keep minimum changes there for legacy, until end-of-life.

Is there an EoL statement for oVirt-3.1? We can make sure that oVirt-3.2's vdsm installs properly with ovirt-3.1's vdsm-bootstrap, or even require that Engine must be upgraded to ovirt-3.2 before upgrading any of the hosts. Is it too harsh on our vast install base? us...@ovirt.org, please chime in!

I tried to find such a statement, but the more I dig, the more I find that we need to support old legacy.

Why, exactly? Fedora gives no such guarantees (heck, I'm stuck with an unupgradable F16). Should we be any better than our (currently single) platform?

We should start to detach from specific distro procedures.

* legacy-removed: change machine-wide core file pattern

    # echo /var/lib/vdsm/core > /proc/sys/kernel/core_pattern

Yeah, qemu-kvm and libvirtd are much more stable than in the old days, but wouldn't we want to keep a means to collect the corpses of dead processes from hypervisors? It has helped us nail down nasty bugs, even in Python.

It does not mean it should be at /var/lib/vdsm ... :)

I don't get the joke :-(. If you mind the location, we can think of somewhere else to put the core dumps. Would it be hard to reinstate a parallel feature in otopi?

I usually do not make any jokes... A global system setting should not go into a package-specific location.
Usually core dumps are off by default. I like this approach, as an unattended system may quickly consume all disk space because of dumps.

If a host fills up with dumps so quickly, it's a sign that it should not be used for production, and that someone should look into the cores. (P.S. we have a logrotate rule for them in vdsm)

There should be a vdsm-debug-aids (or similar) package to perform such changes.

Again, I don't think vdsm should (by default) modify any system-wide parameter such as this. But I will be happy to hear more views.

I agree with your statement above that a single package should not override a global system setting. We should really work to remove as many of these from vdsm as we possibly can. It will help to make vdsm a much safer/well-behaved package.

I'm fine with dropping these from vdsm, but I think they are good for ovirt; we would like to (be able to) enforce policy on our nodes. If configuring core dumps is removed from vdsm, it should go somewhere else, or our log-collector users would miss their beloved dumps.

Yes, I agree. From my point of view the plan was to do the following:

1. Remove unnecessary system configuration changes. This includes things like Royce's supervdsm startup process patch (and accompanying sudo-supervdsm conversions), which allows us to remove some of the sudo configuration.
2. Isolate the remaining tweaks into vdsm-tool.
3. Provide a service/program that can be run to configure a system to work in an ovirt-engine controlled cluster.

Doing this allows vdsm to be safely installed on any system as a basic prerequisite for other software.

--
Adam Litke a...@us.ibm.com
IBM Linux Technology Center