Jay and I have been working on some SSM problems uncovered in pages recently ported from Perl to Java, and we've been asked to share some of the lessons learned for those porting SSM pages in the future.

The issues surfaced when someone reported UI timeouts during package operations with about 10k systems in the SSM. (This doesn't happen with the Perl pages, as the timeout is between Tomcat and Apache.) To remedy this we modified the actions to use the new async MessageQueue and run the work in the background, which lets us return quickly in the UI. Once the work was in the background we got a first look at just how long this operation would take on 10,000 systems: the longest run was 90 minutes before I gave up and killed it. The commit is here:

http://git.fedorahosted.org/git/?p=spacewalk.git;a=commitdiff;h=52f29b24213b699be3eb742ef37c3797b3048eef

When using this new asynchronous framework, it's not safe to pass Hibernate objects into the event for use in the action, which executes in another thread. Several times we saw Hibernate exceptions about a collection being shared between two open sessions (intermittent, not sure exactly how to reproduce) when sharing just a User object. Changing to share only the user ID and look up the user on the other side fixes the issue.

The crucial performance killer was a once-per-server SystemManager.lookupByIdAndUser call, which not only looks up Hibernate objects one at a time but also does an additional isAvailableToUser check right afterwards. That check is completely unnecessary given that we can already see the system in the SSM. ServerFactory.lookupByIdAndOrg avoids the check, but even so, querying for Hibernate objects one at a time was brutally slow. It looks as though ultimately all we need to schedule these actions are the server IDs, but switching to those would have required a lot of work on ActionFactory that we didn't dare get into just yet.
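
For the asynchronous handoff described above (sharing only the user ID and looking the user up on the other side), here is a minimal sketch of the pattern. It does not use the real MessageQueue or Hibernate APIs: the User, PackageActionEvent and lookupUserById names are hypothetical stand-ins, and a plain ExecutorService plays the part of the background thread. In real code the lookup would go through the worker thread's own Hibernate session.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IdOnlyEventSketch {

    /** Hypothetical stand-in for a Hibernate-managed User entity. */
    public static final class User {
        public final long id;
        public final String login;

        public User(long id, String login) {
            this.id = id;
            this.login = login;
        }
    }

    /** Hypothetical event payload: carries only plain values, never entities. */
    public static final class PackageActionEvent {
        public final long userId;

        public PackageActionEvent(long userId) {
            this.userId = userId;
        }
    }

    /** Hypothetical user store standing in for a database lookup. */
    public static final Map<Long, User> USERS = new ConcurrentHashMap<>();

    /** Looks the user up by ID on whichever thread calls this method. */
    static User lookupUserById(long id) {
        User user = USERS.get(id);
        if (user == null) {
            throw new IllegalArgumentException("No such user: " + id);
        }
        return user;
    }

    /** Runs on the worker thread and re-fetches the user from the ID. */
    static String handle(PackageActionEvent event) {
        User user = lookupUserById(event.userId);
        return "scheduled for " + user.login;
    }

    /** Hands the event to a background thread and waits for the result. */
    public static String publishAndWait(PackageActionEvent event) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = worker.submit(() -> handle(event));
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Background action failed", e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        USERS.put(42L, new User(42L, "admin"));
        System.out.println(publishAndWait(new PackageActionEvent(42L)));
    }
}
```

The point of the design is that nothing attached to the request thread's session crosses the thread boundary; the event holds a long, and the worker loads its own copy of the object.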
A bugzilla has been filed to examine the ID-only approach in the future. In the meantime we introduced a bulk server query in Hibernate:

<query name="Server.findByIdsAndOrgId">
    <![CDATA[from com.redhat.rhn.domain.server.Server as s where ORG_ID = :orgId and s.id in (:serverIds)]]>
</query>

Even with this, you can still get caught out if serverIds has more than 1000 entries (with Oracle):

java.sql.SQLException: ORA-01795: maximum number of expressions in a list is 1000

So ServerFactory.lookupByIdsAndUser() has to chop the incoming list into chunks of 1000 and assemble the final list in application code. Doing this took the 90+ minute operation down to around 1.5 - 2 minutes on my hardware.

Our last performance issue had to do with scheduling actions. Creating one action per server again pushed us over an hour of runtime if you reuse the existing, rather clean, methods in ActionFactory. Converting to use parent actions cuts this down to about half a minute.

Long story short, be very wary of how you use Hibernate when working on the SSM. Single-server method calls very quickly break down into abysmal performance, but bulk Hibernate queries can still look up many objects in a reasonable timeframe.

Cheers,

Devan

--
Devan Goodwin <[email protected]>
Software Engineer
Spacewalk / RHN Satellite
Halifax, Canada
650.567.9039x79267

_______________________________________________
Spacewalk-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/spacewalk-devel
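
For reference, the chunking described above (splitting the ID list into groups of at most 1000 to stay under ORA-01795, then assembling the results in application code) could be sketched roughly as follows. This is not the actual ServerFactory.lookupByIdsAndUser() code: the class and method names are hypothetical, and the bulk query is passed in as a function so that the sketch runs without Hibernate or a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ChunkedLookupSketch {

    /** Oracle rejects IN lists with more than 1000 expressions (ORA-01795). */
    public static final int MAX_IN_LIST = 1000;

    /** Splits ids into consecutive sublists of at most size elements. */
    public static <T> List<List<T>> chunk(List<T> ids, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += size) {
            int end = Math.min(start + size, ids.size());
            // Copy so each chunk is independent of the caller's list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Runs bulkQuery once per chunk and concatenates the results.
     * bulkQuery is a hypothetical stand-in for executing the named
     * query with one chunk bound to its IN-list parameter.
     */
    public static <I, R> List<R> lookupInChunks(
            List<I> ids, Function<List<I>, List<R>> bulkQuery) {
        List<R> results = new ArrayList<>();
        for (List<I> part : chunk(ids, MAX_IN_LIST)) {
            results.addAll(bulkQuery.apply(part));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 2500; i++) {
            ids.add(i);
        }
        // 2500 IDs become three queries: 1000, 1000 and 500 IDs.
        System.out.println(chunk(ids, MAX_IN_LIST).size());
    }
}
```

With 10,000 systems this amounts to ten bulk queries instead of 10,000 single-object lookups.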
