Thanks for the response, Yanbo. Here is the source (it uses the sample_libsvm_data.txt file used in the mllib examples).
-Raj

------------- IOTest.scala -------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object IOTest {
  val InputFile = "/tmp/sample_libsvm_data.txt"
  val OutputDir = "/tmp/out"

  def main(args: Array[String]): Unit = {
    val sconf = new SparkConf().setAppName("test").setMaster("local[*]")
    val sqlc = new SQLContext(new SparkContext(sconf))

    // Load the libsvm sample; "features" is a vector column at this point.
    val df = sqlc.read.format("libsvm").load(InputFile)
    df.show(); df.printSchema()

    // Round-trip the dataframe through JSON.
    df.write.format("json").mode("overwrite").save(OutputDir)
    val data = sqlc.read.format("json").load(OutputDir)

    // After reloading, "features" shows up as a struct instead of a vector.
    data.show(); data.printSchema()
  }
}
-----------------------

On Feb 26, 2016, at 12:47 AM, Yanbo Liang <yblia...@gmail.com> wrote:

Hi Raj,

Could you share your code so that others can help diagnose this issue? Which version did you use?
I cannot reproduce this problem in my environment.

Thanks
Yanbo

2016-02-26 10:49 GMT+08:00 raj.kumar <raj.ku...@hooklogic.com>:

Hi,

I am using mllib. I use the ml vectorization tools to create the vectorized input dataframe for the ml/mllib machine-learning models, with schema:

root
 |-- label: double (nullable = true)
 |-- features: vector (nullable = true)

To avoid repeated vectorization, I am trying to save and load this dataframe using

df.write.format("json").mode("overwrite").save( url )
val data = Spark.sqlc.read.format("json").load( url )

However, when I load the dataframe, the newly loaded dataframe has the following schema:

root
 |-- features: struct (nullable = true)
 |    |-- indices: array (nullable = true)
 |    |    |-- element: long (containsNull = true)
 |    |-- size: long (nullable = true)
 |    |-- type: long (nullable = true)
 |    |-- values: array (nullable = true)
 |    |    |-- element: double (containsNull = true)
 |-- label: double (nullable = true)

which the machine-learning models do not recognize.

Is there a way I can save and load this dataframe without the schema changing?
I assume it has to do with the fact that Vector is not a basic type.

Thanks,
-Raj
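
One workaround that may be worth trying is sketched below. It rests on an assumption, not on anything confirmed in this thread: that the Parquet data source stores the full Spark SQL schema (including the user-defined vector type) alongside the data, whereas JSON keeps only the underlying struct fields. The object name ParquetRoundTrip and the output path /tmp/out_parquet are hypothetical; everything else mirrors IOTest.scala above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical variant of IOTest that round-trips through Parquet instead of JSON.
object ParquetRoundTrip {
  val InputFile = "/tmp/sample_libsvm_data.txt"
  val OutputDir = "/tmp/out_parquet" // hypothetical path, chosen for this sketch

  def main(args: Array[String]): Unit = {
    val sconf = new SparkConf().setAppName("parquet-round-trip").setMaster("local[*]")
    val sc = new SparkContext(sconf)
    val sqlc = new SQLContext(sc)

    // Same input as before: "features" is a vector column.
    val df = sqlc.read.format("libsvm").load(InputFile)
    df.printSchema()

    // Assumption: Parquet keeps the Spark SQL schema with the data,
    // so the vector type should survive the save/load cycle.
    df.write.format("parquet").mode("overwrite").save(OutputDir)
    val data = sqlc.read.format("parquet").load(OutputDir)
    data.printSchema()

    // Compare the type of the "features" column before and after the round trip.
    val sameType = df.schema("features").dataType == data.schema("features").dataType
    println(s"features column type preserved: $sameType")

    sc.stop()
  }
}
```

If the printed schemas match, the reloaded dataframe should be usable by the ml/mllib models without re-running the vectorization step. If JSON is a hard requirement, the struct column would presumably have to be converted back to a vector after loading.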