Hi, Igniters!
I am looking for a way to load data from a Spark RDD or DataFrame into an
Ignite cache declared as IgniteCache<Integer, Object[]> dataCache, so that I
can run Ignite ML algorithms on it.
As I understand it, the current Ignite-Spark integration stores Spark
RDDs/DataFrames in Ignite to improve the performance of Spark jobs, so this
implementation doesn't help in my case. Am I correct?
Do you know how to make this small ETL step more efficient, i.e. without
collecting all the data on one node as in the example below?
IgniteCache<Integer, Object[]> cache = getCache(ignite);

SparkSession spark = SparkSession
    .builder()
    .appName("SparkForIgnite")
    .master("local")
    .config("spark.executor.instances", "2")
    .getOrCreate();

Dataset<Row> ds = <ds in Spark>;
ds.show();

List<Row> data = ds.collectAsList(); // stupid solution: collects everything on the driver

for (int i = 0; i < data.size(); i++) {
    // allocate a fresh array per row, otherwise all cache entries share the same array
    Object[] parsedRow = new Object[14];
    for (int j = 0; j < 14; j++)
        parsedRow[j] = data.get(i).get(j);
    cache.put(i, parsedRow);
}

spark.stop();
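One possible way to avoid the driver-side collect (a sketch only, not tested against a cluster: the config path "ignite-config.xml" is an assumption, the cache name "dataCache" comes from the declaration above, and the key is derived from Spark's monotonically_increasing_id, which yields longs, so the cache would need Long keys instead of Integer) is to write from the executors themselves with foreachPartition and an IgniteDataStreamer:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.monotonically_increasing_id;

// Sketch: stream rows into the cache from each executor instead of the driver.
// Assumes the executors can reach the Ignite cluster, and that
// "ignite-config.xml" and "dataCache" match your actual setup.
Dataset<Row> withId = ds.withColumn("id", monotonically_increasing_id());

withId.toJavaRDD().foreachPartition(rows -> {
    Ignition.setClientMode(true); // join the cluster as a client, not a server
    try (Ignite client = Ignition.start("ignite-config.xml");
         IgniteDataStreamer<Long, Object[]> streamer = client.dataStreamer("dataCache")) {
        while (rows.hasNext()) {
            Row row = rows.next();
            int n = row.size() - 1; // last column is the generated id
            Object[] vals = new Object[n];
            for (int j = 0; j < n; j++)
                vals[j] = row.get(j);
            streamer.addData(row.getLong(n), vals);
        }
    } // closing the streamer flushes the remaining buffered entries
});
```

Each partition opens its own client connection, so the data never passes through the driver; the data streamer also batches puts, which should be noticeably faster than per-entry cache.put calls.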
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/