On Wed, May 24, 2017 at 1:13 PM, Michał Dereziński <[email protected]> wrote:
> On 24.05.2017, at 10:44, Matthew Knepley <[email protected]> wrote:
>
>> On Wed, May 24, 2017 at 12:37 PM, Michał Dereziński
>> <[email protected]> wrote:
>>
>>> Great! Then I have a follow-up question:
>>>
>>> My goal is to be able to load the full matrix X from disk, while at
>>> the same time, in parallel, performing computations on the submatrices
>>> that have already been loaded. Essentially, I want to think of X as a
>>> block matrix (where the blocks are horizontal, spanning the full width
>>> of the matrix), where I'm loading one block at a time, and all the
>>> blocks that have already been loaded are combined using MatCreateNest,
>>> so that I can perform computations on that portion of the matrix.
>>
>> I need to understand better. So
>>
>> 1) You want to load a sparse matrix from disk
>
> Yes, the matrix is sparse, stored on disk in row-wise chunks (one per
> process), with a total size of around 3TB.
>
>> 2) You are imagining that it is loaded row-wise, since you can do a
>> calculation with some rows before others are loaded.
>>
>> What calculation, a MatMult?
>> How long does that MatMult take compared to loading?
>
> Yes, a MatMult.
> I already have a more straightforward implementation where the matrix
> is loaded completely at the beginning, and then all of the
> multiplications are performed.
> Based on the loading time and computation time with the current
> implementation, it appears that most of the computation time could be
> subsumed into the loading time.
>
>> 3) If you are talking about a dense matrix, you should be loading in
>> parallel using MPI-I/O. We do this for Vec.
>>
>> Before you do complicated programming, I would assure myself that the
>> performance gain is worth it.
>>
>>> In this scenario, every process needs to be simultaneously loading
>>> the next block of X and performing computations on the previously
>>> loaded portion. My strategy is for each MPI process to spawn a thread
>>> for data loading (so that the memory between the process and the
>>> thread is shared), while the process does computations. My concern is
>>> that the data-loading thread may be using up computational resources
>>> of the processor, even though it is mainly doing I/O. Will this be an
>>> issue? What is the best way to minimize the CPU time of this parallel
>>> data-loading scheme?
>>
>> Oh, you want to load each block in parallel, but there are many
>> blocks. I would really caution you against using threads. They are
>> death to clean code. Use non-blocking reads.
>
> I see. Could you expand on your suggestion regarding non-blocking
> reads? Are you proposing that each process makes an asynchronous read
> request in between every, say, MatMult operation?

Check this out: http://beige.ucs.indiana.edu/I590/node109.html

PETSc does not do this currently, but it sounds like you are handling
the load.

  Thanks,

     Matt
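To make the non-blocking-read suggestion above concrete, here is a minimal
sketch (not something PETSc provides, and not code from this thread) of how
each rank could overlap reading its next chunk with a MatMult on the rows it
has already assembled, using MPI-I/O's nonblocking MPI_File_iread_at; the
reads themselves also go through MPI-I/O, as suggested earlier in the thread
for the parallel load. The single file "X.dat", the chunk size, the per-rank
offsets, and the elided assembly step are illustrative assumptions.

#include <petscmat.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  Mat         Xloaded = NULL;          /* rows of X assembled so far (creation elided) */
  Vec         x = NULL, y = NULL;      /* work vectors (creation elided)               */
  MPI_File    fh;
  MPI_Request req;
  MPI_Status  status;
  double     *buf;
  MPI_Offset  my_offset = 0;           /* this rank's first chunk offset (assumed)     */
  int         chunk_doubles = 1 << 20; /* chunk size in doubles (assumed)              */
  int         nchunks = 8;             /* chunks per rank (assumed)                    */

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_File_open(PETSC_COMM_WORLD, "X.dat", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
  buf = (double *)malloc((size_t)chunk_doubles * sizeof(double));

  for (int i = 0; i < nchunks; i++) {
    /* 1) post the read of the next chunk; this call returns immediately */
    MPI_File_iread_at(fh, my_offset, buf, chunk_doubles, MPI_DOUBLE, &req);

    /* 2) compute on what has already been assembled while the read runs */
    if (Xloaded) MatMult(Xloaded, x, y);

    /* 3) only now block until this chunk has actually arrived */
    MPI_Wait(&req, &status);

    /* 4) turn buf into a new row block, fold it into Xloaded
          (e.g. via MatCreateNest), and advance to the next chunk */
    my_offset += (MPI_Offset)chunk_doubles * sizeof(double);
  }

  MPI_File_close(&fh);
  free(buf);
  PetscFinalize();
  return 0;
}

The essential pattern is only that the read is posted before the MatMult and
waited on afterwards, so the file system and the floating-point work proceed
at the same time.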
>> Thanks,
>>
>>    Matt
>
>>> Thanks,
>>> Michal.
>>>
>>> On 24.05.2017, at 04:55, Matthew Knepley <[email protected]> wrote:
>>>
>>>> On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to be able to perform matrix operations on several contiguous
>>>>> submatrices of a full matrix, without redundantly allocating memory
>>>>> for the submatrices (in addition to the memory that is already
>>>>> allocated for the full matrix).
>>>>> I tried using MatGetSubMatrix, but this function appears to allocate
>>>>> the additional memory.
>>>>> The other way I found to do this is to create the smallest
>>>>> submatrices I need first, then use MatCreateNest to combine them
>>>>> into bigger ones (including the full matrix).
>>>>> The documentation of MatCreateNest seems to indicate that it does
>>>>> not allocate additional memory for storing the new matrix.
>>>>> Is this the right approach, or is there a better one?
>>>>
>>>> Yes, that is the right approach.
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>> Thanks,
>>>>> Michal Derezinski.

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
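On the MatCreateNest point discussed earlier in the thread: the nest stores
references to its sub-matrices rather than copying them, so combining the
row blocks loaded so far costs no extra matrix storage. Below is a minimal
sketch of what that combination step could look like; the helper name
CombineLoadedBlocks and the nloaded-by-1 block layout (each block spanning
the full width of X) are illustrative assumptions, not code from the thread.

#include <petscmat.h>

/* blocks[0..nloaded-1] hold the horizontal slices of X loaded so far,
   each already assembled on its own set of rows (assumed setup). */
PetscErrorCode CombineLoadedBlocks(Mat blocks[], PetscInt nloaded, Mat *Xsofar)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* nloaded x 1 grid: the blocks are stacked vertically and each spans the
     full column width of X; passing NULL index sets lets PETSc derive the
     row/column layouts from the blocks themselves */
  ierr = MatCreateNest(PETSC_COMM_WORLD, nloaded, NULL, 1, NULL, blocks, Xsofar);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

MatCreateVecs on the resulting nest then gives vectors with a compatible
layout for a MatMult over everything loaded so far, and the nest can be
rebuilt each time a new block arrives, since only references are held.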
