Jay and I have been working on some System Set Manager (SSM) problems
uncovered in pages recently ported from Perl to Java, and we've been
asked to share the lessons learned with anyone porting SSM pages in the
future.

The issues were uncovered when someone reported UI timeouts during
package operations with about 10,000 systems in the SSM. (This doesn't
happen with the Perl pages, as the timeout is between Tomcat and
Apache.) To remedy this, we modified the actions to use the new
asynchronous MessageQueue and run the work in the background, allowing
the UI to return quickly. With the work in the background, we got a
first look at just how long this operation takes on 10,000 systems: the
longest run went 90 minutes before I gave up and killed it.
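For anyone new to this pattern, here is a minimal sketch of the idea:
the request thread enqueues the work and returns at once, and a
background thread does the slow part. A plain java.util.concurrent
ExecutorService stands in for Spacewalk's MessageQueue purely for
illustration. This is not the MessageQueue API, and the names
BackgroundSsmWork and schedulePackageOperation are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: hand slow SSM work to a background thread so the
 * web request can return immediately. A plain ExecutorService stands in
 * for Spacewalk's MessageQueue; this is NOT the MessageQueue API.
 */
public class BackgroundSsmWork {

    // One worker thread, loosely analogous to a single queue consumer.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Called on the request thread: enqueue the work and return at once. */
    public Future<Integer> schedulePackageOperation(List<Long> serverIds) {
        // Copy the IDs so later changes by the caller can't affect the worker.
        final List<Long> ids = new ArrayList<>(serverIds);
        return worker.submit(() -> {
            // Placeholder: the real job would do the bulk lookups and
            // action scheduling here, then report how many servers it handled.
            return ids.size();
        });
    }

    /** Stop accepting work and wait briefly for queued work to finish. */
    public void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BackgroundSsmWork work = new BackgroundSsmWork();
        Future<Integer> result = work.schedulePackageOperation(Arrays.asList(1L, 2L, 3L));
        System.out.println("Request returned; work continues in the background");
        System.out.println("Processed " + result.get() + " servers");
        work.shutdown();
    }
}
```

The UI action only ever calls schedulePackageOperation and returns.
Whoever needs the outcome can check the Future (or, in the real system,
the scheduled actions) later.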

Commit is here:

http://git.fedorahosted.org/git/?p=spacewalk.git;a=commitdiff;h=52f29b24213b699be3eb742ef37c3797b3048eef


When using this new asynchronous framework, it's not safe to pass
Hibernate objects into the event for use in the action, which executes
in another thread. Several times we saw Hibernate exceptions about
sharing a collection between two open sessions (intermittent, and we're
not sure exactly how to reproduce them) when sharing nothing more than a
User object. Passing just the user ID and looking up the user on the
other side fixes the issue.
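Here is a sketch of what "pass IDs, not Hibernate objects" can look
like. The event carries only the user ID and server IDs as plain
values, and the action resolves the User on the worker thread. The
names PackageOperationEvent and resolveUser are hypothetical. The lookup
function stands in for whatever lookup by ID the action performs
through its own Hibernate session. It is an assumption, not a real
Spacewalk signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongFunction;

/**
 * Hypothetical event payload that carries only plain IDs, never Hibernate
 * entities, so nothing bound to the request's session crosses into the
 * worker thread.
 */
public final class PackageOperationEvent {
    private final long userId;
    private final List<Long> serverIds;

    public PackageOperationEvent(long userId, List<Long> serverIds) {
        this.userId = userId;
        // Unmodifiable copy: the worker sees a stable snapshot of the IDs.
        this.serverIds = Collections.unmodifiableList(new ArrayList<>(serverIds));
    }

    public long getUserId() {
        return userId;
    }

    public List<Long> getServerIds() {
        return serverIds;
    }

    /**
     * Called on the worker thread. The function is a stand-in for a lookup
     * by ID through the worker's own Hibernate session (assumed, not shown).
     */
    public <U> U resolveUser(LongFunction<U> lookupInWorkerSession) {
        return lookupInWorkerSession.apply(userId);
    }
}
```

Because the payload is just longs, the event can safely outlive the
request's session. Every entity the action touches is loaded fresh in
the session that actually uses it.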

The crucial performance killer was a per-server call to
SystemManager.lookupByIdAndUser. Not only does it look up Hibernate
objects one at a time, it also performs an additional isAvailableToUser
check right afterward, which is completely unnecessary given that we can
already see the system in the SSM. ServerFactory.lookupByIdAndOrg
avoids this check, but even so, querying for Hibernate objects one at a
time was brutally slow.

It looks as though all we ultimately need to schedule these actions are
the server IDs, but switching to them would have required a lot of work
on ActionFactory that we didn't dare get into just yet (a Bugzilla bug
has been filed for future examination). Instead, we introduced a bulk
server query in Hibernate:

    <query name="Server.findByIdsAndOrgId">
        <![CDATA[from com.redhat.rhn.domain.server.Server as s where
    ORG_ID = :orgId and s.id in (:serverIds)]]>
    </query>

Even with this, you can still get caught out on Oracle if serverIds has
more than 1000 entries:

    java.sql.SQLException: ORA-01795: maximum number of expressions in a list is 1000

So ServerFactory.lookupByIdsAndUser() has to chop up the incoming list
into chunks of 1000 and assemble the final list in application code.
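The chunking boils down to something like the sketch below. It is a
reconstruction of the idea, not the committed code, and ChunkedLookup
and lookupInChunks are hypothetical names. The per-chunk query is passed
in as a function so the sketch runs without a database. The Javadoc
shows roughly how that function might wrap the Server.findByIdsAndOrgId
named query using Hibernate 3's Session.getNamedQuery and
Query.setParameterList, but treat that part as an untested assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical helper: runs an IN-list query in chunks no larger than
 * Oracle's 1000-expression limit (ORA-01795) and concatenates the results
 * in application code.
 */
public final class ChunkedLookup {

    /** Oracle rejects IN lists with more than 1000 expressions. */
    public static final int MAX_IN_LIST = 1000;

    private ChunkedLookup() {
    }

    /**
     * @param ids        all IDs to look up
     * @param queryChunk runs one query for a chunk of at most MAX_IN_LIST IDs.
     *                   With Hibernate 3 it might look roughly like this
     *                   (assumed, untested):
     *                   chunk -> session.getNamedQuery("Server.findByIdsAndOrgId")
     *                                   .setParameter("orgId", orgId)
     *                                   .setParameterList("serverIds", chunk)
     *                                   .list()
     */
    public static <T> List<T> lookupInChunks(List<Long> ids,
            Function<List<Long>, List<T>> queryChunk) {
        List<T> results = new ArrayList<>(ids.size());
        for (int start = 0; start < ids.size(); start += MAX_IN_LIST) {
            int end = Math.min(start + MAX_IN_LIST, ids.size());
            // subList is only a view; copy it before handing it to the query.
            List<Long> chunk = new ArrayList<>(ids.subList(start, end));
            results.addAll(queryChunk.apply(chunk));
        }
        return results;
    }
}
```

For 10,000 systems this issues 10 queries instead of 10,000 single-object
lookups. The combined list follows chunk order. Within a chunk, however,
rows come back in whatever order the database returns them unless the
query adds an ORDER BY.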

This took the 90+ minute operation down to around 1.5 to 2 minutes on
my hardware.

Our last performance issue had to do with scheduling actions. Reusing
the existing (and rather clean) methods in ActionFactory creates one
action per server, which again pushed runtime past an hour. Converting
to parent actions cut this down to about half a minute.

Long story short, be very wary of how you use Hibernate when working on
the SSM. Single-server method calls quickly degrade into abysmal
performance, but bulk Hibernate queries can still look up many objects
in a reasonable timeframe.

Cheers,

Devan



-- 
  Devan Goodwin <[email protected]>
  Software Engineer     Spacewalk / RHN Satellite
  Halifax, Canada       650.567.9039x79267

_______________________________________________
Spacewalk-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/spacewalk-devel
