On Nov 2, 2011, at 11:50 AM, Jake Mannix wrote:

> Ah, ok, I was looking at an older source tree. Then in that case, no
> *release* we've had touches them, and nowhere in the codebase does anyone
> currently use the bindings, even if it is the case that if you *did* use
> them, they would indeed get serialized with the matrix.

Are you sure about that? I thought that was something we added a long time ago. Perhaps it doesn't look released b/c some of that stuff got moved around?

>
> Which is why I was asking the question: does anyone use these, or even
> remember they really exist? I do lots of processing with matrices which
> happen to have both row and column labels, but it's really not a terribly
> big hassle to know you have to hang onto a dictionary somewhere which
> translates the ids to labels. In fact, it's far more often that I find I'm
> reusing the same dictionary again and again as I'm exploring the data set,
> rebuilding the numeric matrix in different ways. In this case, the
> dictionaries live on HDFS all nice and safe, and I've got a pile of numeric
> serialized (DistributedRow-)Matrix instances which all use that dictionary,
> but don't replicate it with them.

Makes sense. Of course, if you don't use them, there's no harm, right? I'm not against removing them, but it does strike me that we had a pretty lengthy discussion about them way back when and Ted and Jeff went through a few iterations to add them in.

>
> -jake
>
> On Wed, Nov 2, 2011 at 8:08 AM, Grant Ingersoll <[email protected]> wrote:
>
>>
>> On Nov 2, 2011, at 10:58 AM, Jake Mannix wrote:
>>
>>> On Wed, Nov 2, 2011 at 7:34 AM, Grant Ingersoll <[email protected]> wrote:
>>>
>>>> What functionality, specifically, are you proposing to remove?
>>>
>>> I'm suggesting we kill, from Matrix.java and descendants, all of the
>>> following methods:
>>>
>>>   Map<String, Integer> getColumnLabelBindings();
>>>   Map<String, Integer> getRowLabelBindings();
>>>   void setColumnLabelBindings(Map<String, Integer> bindings);
>>>   void setRowLabelBindings(Map<String, Integer> bindings);
>>>   double get(String rowLabel, String columnLabel);
>>>   void set(String rowLabel, String columnLabel, double value);
>>>   void set(String rowLabel, String columnLabel, int row, int column, double value);
>>>   void set(String rowLabel, double[] rowData);
>>>   void set(String rowLabel, int row, double[] rowData);
>>>
>>>> I know we had a lot of discussion around some of this stuff way back when
>>>> as to how best to do it, but of course, that doesn't mean it has uptake.
>>>> If it's on the Matrix, then doesn't it more easily get shipped around via
>>>> the Writables vs. requiring the user to do that? Not sure it is an issue,
>>>> but it's one less piece of code someone else has to write.
>>>
>>> MatrixWritable does not, in fact, serialize the labels along with the
>>> matrix, it turns out. There are two methods for (de-)serializing them
>>> separately:
>>>
>>>   public static void readLabels(DataInput in,
>>>                                 Map<String, Integer> columnLabelBindings,
>>>                                 Map<String, Integer> rowLabelBindings)
>>>       throws IOException;
>>>
>>>   public static void writeLabelBindings(DataOutput out,
>>>                                         Map<String, Integer> columnLabelBindings,
>>>                                         Map<String, Integer> rowLabelBindings)
>>>       throws IOException;
>>>
>>> but neither of these is used anywhere in the codebase (even in tests).
>>>
>>
>> writeMatrix calls them. Line 162 of MatrixWritable.
>>

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
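
For illustration only, the "hang onto a dictionary somewhere" approach Jake describes might look roughly like the sketch below. It makes several assumptions: `LabelDictionary` and its methods are hypothetical names that do not exist in Mahout, plain `double[][]` arrays stand in for Matrix instances, and nothing here touches MatrixWritable or HDFS. It only shows one label-to-index dictionary being reused across several numeric matrices without being stored in any of them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Mahout class): label bindings kept apart from the numeric data. */
public class LabelDictionary {
    private final Map<String, Integer> rowIds = new HashMap<>();
    private final Map<String, Integer> columnIds = new HashMap<>();

    /** Binds a row label to a row index. */
    public void bindRow(String label, int index) {
        rowIds.put(label, index);
    }

    /** Binds a column label to a column index. */
    public void bindColumn(String label, int index) {
        columnIds.put(label, index);
    }

    /** Looks up a cell by labels in any matrix that shares this dictionary. */
    public double get(double[][] matrix, String rowLabel, String columnLabel) {
        Integer row = rowIds.get(rowLabel);
        Integer column = columnIds.get(columnLabel);
        if (row == null || column == null) {
            throw new IllegalArgumentException(
                "Unbound label: " + rowLabel + ", " + columnLabel);
        }
        return matrix[row][column];
    }

    public static void main(String[] args) {
        LabelDictionary dict = new LabelDictionary();
        dict.bindRow("doc1", 0);
        dict.bindRow("doc2", 1);
        dict.bindColumn("apple", 0);
        dict.bindColumn("pear", 1);

        // Two different numeric views of the same data set, one shared dictionary.
        double[][] counts = {{1.0, 2.0}, {3.0, 4.0}};
        double[][] weights = {{0.1, 0.2}, {0.3, 0.4}};

        System.out.println(dict.get(counts, "doc2", "pear"));
        System.out.println(dict.get(weights, "doc1", "apple"));
    }
}
```

The trade-off discussed in the thread is visible here: the matrices stay purely numeric and cheap to rebuild, but the caller is responsible for keeping the dictionary alongside them.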
