Make sure that the files can be ordered, of course. Losing the ordering can be really bad.
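
For what it's worth, here is a minimal sketch of one way to keep the part files in a stable order. It assumes the usual Hadoop output names (part-NNNNN, part-r-NNNNN, part-m-NNNNN) and a local copy of the output directory. The function name ordered_part_files is made up for illustration; this is plain Python, not the Mahout or HDFS API.

```python
import re
from pathlib import Path

# Assumed naming: part-00000, part-r-00000 or part-m-00000.
# Anything else in the directory (e.g. _SUCCESS, _logs) is ignored.
_PART_RE = re.compile(r"^part-(?:[mr]-)?(\d+)$")


def ordered_part_files(directory):
    """Return part-file names in `directory`, sorted by numeric part index."""
    parts = []
    for path in Path(directory).iterdir():
        match = _PART_RE.match(path.name)
        if match:
            # Sort on the integer index rather than the raw string, so the
            # order does not depend on how the index is zero-padded.
            parts.append((int(match.group(1)), path.name))
    parts.sort()
    return [name for _, name in parts]
```

Reading the files in the returned order gives the same row sequence on every run, provided the job that wrote them assigned part indices consistently.
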
On Sun, Nov 13, 2011 at 10:34 PM, Jake Mannix <[email protected]> wrote:

> Yeah, in particular, DistributedRowMatrix "is" simply a
> SequenceFile<IntWritable,VectorWritable>, when in its serialized form.
> As such, this "file" can be (and typically is) a series of part-* files
> in a directory (typically on HDFS).
>
> -jake
>
> On Sun, Nov 13, 2011 at 10:23 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > It's my understanding drm can be multifile. In fact, stuff like
> > seq2sparse will produce multifile output, being a MR job itself.
> >
> > On Nov 12, 2011 3:23 PM, "Lance Norskog" <[email protected]> wrote:
> >
> > > Is there a convention for multi-file matrices? For example, the
> > > DistributedRowMatrix?
> > >
> > > --
> > > Lance Norskog
> > > [email protected]
