[ https://issues.apache.org/jira/browse/SPARK-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592398#comment-14592398 ]
Alexander Ulanov edited comment on SPARK-8449 at 6/18/15 7:53 PM:
------------------------------------------------------------------

It seems that using the official HDF5 reader is not a viable choice for Spark because of its platform-dependent binaries. We need to look for a pure Java implementation. Apparently, there is one called netCDF: http://www.unidata.ucar.edu/blogs/news/entry/netcdf_java_library_version_44. It might be tricky to use because its license is not Apache. However, it is worth a look.

> HDF5 read/write support for Spark MLlib
> ---------------------------------------
>
>                 Key: SPARK-8449
>                 URL: https://issues.apache.org/jira/browse/SPARK-8449
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.4.0
>            Reporter: Alexander Ulanov
>             Fix For: 1.4.1
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Add support for reading and writing the HDF5 file format to/from LabeledPoint. HDFS and the local file system have to be supported. Other Spark formats are to be discussed.
> Interface proposal:
> /* path - directory path in any Hadoop-supported file system URI */
> MLUtils.saveAsHDF5(sc: SparkContext, path: String, data: RDD[LabeledPoint]): Unit
> /* path - file or directory path in any Hadoop-supported file system URI */
> MLUtils.loadHDF5(sc: SparkContext, path: String): RDD[LabeledPoint]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
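The save/load contract proposed in the issue can be sketched without Spark or an HDF5 library. The following is a minimal, hypothetical plain-Java mock: `LabeledPoint` is a stand-in for MLlib's class, an in-memory map stands in for a Hadoop-supported file system, and the method names mirror the proposal. None of this is the actual Spark or HDF5 API; it only illustrates the round-trip behavior the proposal asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Mock stand-in for org.apache.spark.mllib.regression.LabeledPoint.
class LabeledPoint {
    final double label;
    final double[] features;
    LabeledPoint(double label, double[] features) {
        this.label = label;
        this.features = features;
    }
}

// Hypothetical sketch of the proposed MLUtils methods. The real proposal
// targets RDD[LabeledPoint] and HDF5 files on HDFS or the local file system;
// here a HashMap plays the role of the file system.
class MockMLUtils {
    private static final Map<String, List<LabeledPoint>> store = new HashMap<>();

    /* path - directory path (here: a key into the mock store) */
    static void saveAsHDF5(String path, List<LabeledPoint> data) {
        store.put(path, new ArrayList<>(data));
    }

    /* path - file or directory path (here: a key into the mock store) */
    static List<LabeledPoint> loadHDF5(String path) {
        return store.getOrDefault(path, new ArrayList<>());
    }
}
```

A round trip through this mock preserves the records, which is the essential property any HDF5-backed implementation would also have to guarantee.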