On Fri 2008-05-23 10:30, amjad ali wrote:
> Would you please explain a bit about "unassembled structures"?
> Does Discontinuous Galerkin Method falls into this category?
I'm doing some work on this, so I'll try to answer. There are two components which can be ``unassembled'': the matrix application and the preconditioner. In general, unassembled makes the most sense for semi-structured approximations where there is natural data granularity similar to L1 cache size.

A standard example is the spectral or p-version finite element method. In these methods, the element stiffness matrix is dense and can be very large, but it is possible to apply it without storing the entries. For instance, suppose we have polynomial order p-1 on a hexahedral element. Then there are p^3 element degrees of freedom and the global matrix will have p^6 nonzeros contributed by this element (if we assemble it). For p=10, this is already big and will make preconditioning very expensive. On the other hand, we can apply the element Jacobian in O(p^3) space and O(p^4) time if we exploit a tensor product basis. Even if we don't use tensor products, the space requirement can stay the same (only the time goes back up to O(p^6)). For high order p, this is far fewer operations, but the important point with regard to FLOP/s is that there is now more work per amount of memory touched.

If there are N elements, the total number of degrees of freedom is O(N p^3) and the assembled matrix has O(N p^6) entries, so multiplication costs O(N p^6) in both time and space. If applied locally (i.e. scatter to the local basis, apply there, scatter back), the space requirement is O(N p^3) and the time is O(N p^4) or O(N p^6) depending on the basis. In addition, the element operations can often be done entirely in L1 cache. Clearly this puts much less stress on the memory bus.

Of course, just applying the Jacobian fast won't cut it; we also need a preconditioner. One way is to assemble a sparser approximation to the global Jacobian and apply standard preconditioners to it. Another is to apply a domain decomposition preconditioner which exploits the data granularity. For instance, Schur complement preconditioners can be formed based on solves on a single element. These are most attractive when there is a `fast' way to solve the local problem on a single element, but they can be expected to increase the FLOP/s rate either way because the memory access pattern requires more work for a given amount of memory. (I don't know a lot about DD preconditioners; I'm using the former approach, assembling a sparser approximation of the Jacobian.)

Discontinuous Galerkin happens to be easy to implement for high order elements, and the scatter from the global to the local basis is trivial.

Jed
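[Editor's note: for concreteness, here is a minimal sketch of the sum-factorized element apply described above. The p^3 element coefficients are contracted against 1D reference matrices one direction at a time, giving O(p^4) work per element instead of the O(p^6) of a dense element matrix. The storage convention (u[(i*p+j)*p+k]) and all names are illustrative assumptions, not taken from the original post.]

/* Sum-factorized application of the tensor-product operator
 *   v = (A (x) B (x) C) u
 * on one hexahedral element with p coefficients per direction.
 * u and v hold p^3 values indexed as u[(i*p + j)*p + k]; A, B, C are
 * dense p-by-p matrices stored row-major.  Each of the three sweeps
 * costs O(p^4), versus O(p^6) for multiplying by the assembled
 * p^3-by-p^3 element matrix.  t1 and t2 are scratch arrays of size p^3. */
void tensor_apply(int p,
                  const double *A, const double *B, const double *C,
                  const double *u, double *v,
                  double *t1, double *t2)
{
  /* Contract the last index with C: t1[i,j,k] = sum_l C[k,l] u[i,j,l] */
  for (int i = 0; i < p; i++)
    for (int j = 0; j < p; j++)
      for (int k = 0; k < p; k++) {
        double s = 0.0;
        for (int l = 0; l < p; l++) s += C[k*p + l] * u[(i*p + j)*p + l];
        t1[(i*p + j)*p + k] = s;
      }
  /* Contract the middle index with B: t2[i,j,k] = sum_l B[j,l] t1[i,l,k] */
  for (int i = 0; i < p; i++)
    for (int j = 0; j < p; j++)
      for (int k = 0; k < p; k++) {
        double s = 0.0;
        for (int l = 0; l < p; l++) s += B[j*p + l] * t1[(i*p + l)*p + k];
        t2[(i*p + j)*p + k] = s;
      }
  /* Contract the first index with A: v[i,j,k] = sum_l A[i,l] t2[l,j,k] */
  for (int i = 0; i < p; i++)
    for (int j = 0; j < p; j++)
      for (int k = 0; k < p; k++) {
        double s = 0.0;
        for (int l = 0; l < p; l++) s += A[i*p + l] * t2[(l*p + j)*p + k];
        v[(i*p + j)*p + k] = s;
      }
}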
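[Editor's note: and here is a rough sketch of how the two pieces might be wired up in PETSc: the unassembled operator lives in a MatShell whose multiply scatters to the element basis, calls a per-element kernel such as tensor_apply() above, and scatters back, while a separately assembled, sparser approximation P is handed to the KSP as the preconditioning matrix. MatCreateShell, MatShellSetOperation, and KSPSetOperators are real PETSc calls, but the sketch assumes a recent PETSc API (PetscCall, two-argument KSPSetOperators), not the one current in 2008, and the AppCtx fields and the ApplyElement/ScatterToElement/ScatterAddFromElement helpers are placeholders for whatever the application provides.]

#include <petscksp.h>

/* Application context for the unassembled (matrix-free) operator. */
typedef struct {
  PetscInt nelem; /* number of elements */
  PetscInt p;     /* coefficients per direction on each element */
  /* ... element geometry, 1D reference matrices, scatter data, ... */
} AppCtx;

/* Placeholders to be supplied by the application. */
extern PetscErrorCode ScatterToElement(AppCtx *user, PetscInt e, Vec x, PetscScalar *xe);
extern PetscErrorCode ApplyElement(AppCtx *user, PetscInt e, const PetscScalar *xe, PetscScalar *ye);
extern PetscErrorCode ScatterAddFromElement(AppCtx *user, PetscInt e, const PetscScalar *ye, Vec y);

/* MATOP_MULT callback: y = A x applied element by element, never
 * forming the global matrix. */
static PetscErrorCode MatMult_Unassembled(Mat A, Vec x, Vec y)
{
  AppCtx      *user;
  PetscScalar *xe, *ye;
  PetscInt     nloc;

  PetscFunctionBeginUser;
  PetscCall(MatShellGetContext(A, &user));
  nloc = user->p * user->p * user->p;
  PetscCall(PetscMalloc2(nloc, &xe, nloc, &ye));
  PetscCall(VecZeroEntries(y));
  for (PetscInt e = 0; e < user->nelem; e++) {
    PetscCall(ScatterToElement(user, e, x, xe));      /* global -> element basis */
    PetscCall(ApplyElement(user, e, xe, ye));         /* O(p^4) work, fits in L1 */
    PetscCall(ScatterAddFromElement(user, e, ye, y)); /* element -> global */
  }
  PetscCall(PetscFree2(xe, ye));
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* Hook the shell operator and a sparser assembled approximation P into
 * a KSP: A is applied unassembled, P is what the preconditioner
 * (e.g. -pc_type ilu or -pc_type gamg) actually sees. */
static PetscErrorCode SetupSolver(MPI_Comm comm, AppCtx *user, PetscInt nglobal, Mat P, KSP *ksp)
{
  Mat A;

  PetscFunctionBeginUser;
  PetscCall(MatCreateShell(comm, PETSC_DECIDE, PETSC_DECIDE, nglobal, nglobal, user, &A));
  PetscCall(MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MatMult_Unassembled));
  PetscCall(KSPCreate(comm, ksp));
  PetscCall(KSPSetOperators(*ksp, A, P)); /* apply A, precondition with P */
  PetscCall(KSPSetFromOptions(*ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}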
