> On 24 May 2017, at 12:28, Jed Brown <[email protected]> wrote:
> 
> Michał Dereziński <[email protected] 
> <mailto:[email protected]>> writes:
> 
>>> On 24 May 2017, at 12:06, Jed Brown <[email protected]> wrote:
>>> 
>>> Okay, do you have more parameters than observations?  
>> 
>> No (not necessarily). The biggest matrix is 50M observations and 12M 
>> parameters.
>> 
>>> And each segment
>>> of the matrix will be fully distributed?
>> 
>> Yes.
>> 
>>> Do you have a parallel file
>>> system?
>> 
>> Yes.
>> 
>>> Is your matrix sparse or dense?
>> 
>> Yes.
> 
> By that you mean sparse?
> 

Yes, sorry, that’s what I meant.

> You'll need some sort of segmented storage (could be separate files or a
> file format that allows seeking).  (If the matrix is generated by some
> other process, you'd benefit from skipping the file system entirely, but
> I understand that may not be possible.)
> 

I have the segmented storage in place.

> I would use MatNest, creating a new one after each segment is loaded.
> There isn't currently a MatLoadBegin/End interface, but that could be
> created if it would be useful.

Ok, yeah, that was my plan with MatNest. 

As far as overlapping loading with computation goes, the feedback I'm hearing so far is:
1. Don't do it unless you really have to;
2. If you do, use asynchronous reads (e.g., MPI_File_iread) rather than spawning a separate reader thread.

Does this make sense?

Thanks,
Michal.

