You can get a SchemaRDD from the Hive table, map it into an RDD of
Vectors, and then construct a RowMatrix. The transformations are lazy,
so there is no external storage requirement for intermediate data.
-Xiangrui
On Sun, Jan 18, 2015 at 4:07 AM, guxiaobo1982 wrote:
Hi,
We have large datasets in a format suitable for Spark MLlib matrices, but they are
pre-computed by Hive and stored in Hive. My question is: can we create a
distributed matrix such as IndexedRowMatrix directly from Hive tables,
avoiding reading data from Hive tables and feeding them into an emp