Re: How to create distributed matrixes from hive tables.

2015-01-20 Thread Xiangrui Meng
You can get a SchemaRDD from the Hive table, map it into an RDD of Vectors, and then construct a RowMatrix. The transformations are lazy, so there is no external storage requirement for intermediate data.

-Xiangrui

On Sun, Jan 18, 2015 at 4:07 AM, guxiaobo1982 wrote:
> Hi,
>
> We have large datase
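A minimal PySpark sketch of the reply's steps, written against the newer API (SchemaRDD was renamed DataFrame in Spark 1.3). It assumes Spark 2.x or later built with Hive support. The table name and column names are hypothetical. It produces the IndexedRowMatrix the question asks about. The Spark imports sit inside the function so the row-mapping helper can be used without Spark installed.

```python
from typing import Any, List, Sequence, Tuple


def row_to_indexed_pair(row: Any, index_col: str,
                        feature_cols: Sequence[str]) -> Tuple[int, List[float]]:
    """Turn one table row into (row index, dense feature values).

    `row` only needs to support row[column_name], so this works for a
    plain dict as well as a pyspark.sql.Row.
    """
    return int(row[index_col]), [float(row[c]) for c in feature_cols]


def build_indexed_row_matrix(table: str, index_col: str,
                             feature_cols: Sequence[str]):
    """Sketch: Hive table -> RDD of IndexedRow -> IndexedRowMatrix.

    `table`, `index_col` and `feature_cols` are hypothetical inputs
    naming an existing Hive table and its columns.
    """
    from pyspark.sql import SparkSession
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # select() and map() are lazy transformations: nothing is read from
    # Hive or materialized until an action runs on the resulting matrix.
    rows = spark.table(table).select(index_col, *feature_cols).rdd

    def to_indexed_row(r):
        idx, values = row_to_indexed_pair(r, index_col, feature_cols)
        return IndexedRow(idx, Vectors.dense(values))

    return IndexedRowMatrix(rows.map(to_indexed_row))
```

For example, `build_indexed_row_matrix("mydb.features", "row_id", ["f1", "f2", "f3"]).numRows()` would trigger the actual read from Hive. If row indices are not needed, which matches the reply exactly, map each row to `Vectors.dense(values)` only and pass that RDD to `RowMatrix` from the same `pyspark.mllib.linalg.distributed` module.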

How to create distributed matrixes from hive tables.

2015-01-18 Thread guxiaobo1982
Hi,

We have large datasets whose data is already in a format suitable for Spark MLlib matrices, but they are pre-computed by Hive and stored inside Hive. My question is: can we create a distributed matrix, such as an IndexedRowMatrix, directly from Hive tables, avoiding reading data from the Hive tables and feeding them into an emp