Hello Mans,

The streaming DataSource APIs are still evolving and are not public yet, so there is no official documentation. In fact, there is a new DataSourceV2 API (in Spark 2.3) that we are migrating towards, so at this point it is hard to make any concrete suggestion. You can take a look at the classes DataSourceV2, DataReader, and MicroBatchDataReader in the Spark source code, along with their implementations.
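To give a sense of the overall shape, the V2 reader side follows a reader-factory pattern: a reader produces per-partition factories, and each factory creates a DataReader that iterates rows via next()/get()/close(). Below is a minimal, self-contained sketch of that pattern in plain Java with no Spark dependency. The interface and class names here (DataReader, DataReaderFactory, RangeReaderFactory) only mirror the shape of the real Spark 2.3 interfaces; they are illustrative stand-ins, not the actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Spark's per-partition row iterator: advance with next(),
// read the current row with get(), release resources with close().
interface DataReader<T> extends AutoCloseable {
    boolean next();  // advance to the next row; false when the partition is exhausted
    T get();         // current row, valid only after next() returned true
    void close();
}

// Stand-in for the serializable factory that the driver ships to executors;
// each factory creates one DataReader for its partition.
interface DataReaderFactory<T> {
    DataReader<T> createDataReader();
}

// A toy "source" that serves a fixed integer range as a single partition.
class RangeReaderFactory implements DataReaderFactory<Integer> {
    private final int start, end;
    RangeReaderFactory(int start, int end) { this.start = start; this.end = end; }
    public DataReader<Integer> createDataReader() {
        return new DataReader<Integer>() {
            private int current = start - 1;
            public boolean next() { return ++current < end; }
            public Integer get() { return current; }
            public void close() {}
        };
    }
}

public class Demo {
    // Drain one partition: the same loop an executor runs against a DataReader.
    static List<Integer> readAll(DataReaderFactory<Integer> f) {
        List<Integer> out = new ArrayList<>();
        try (DataReader<Integer> r = f.createDataReader()) {
            while (r.next()) out.add(r.get());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two "partitions" of one logical source, read independently and concatenated.
        List<Integer> rows = new ArrayList<>();
        for (DataReaderFactory<Integer> f :
                Arrays.asList(new RangeReaderFactory(0, 3), new RangeReaderFactory(3, 5))) {
            rows.addAll(readAll(f));
        }
        System.out.println(rows);  // [0, 1, 2, 3, 4]
    }
}
```

In the real API there is additionally an offset-tracking layer for micro-batch execution (start/end offsets, commit), but the per-partition read path has essentially this structure.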
Hope this helps.

TD

On Jan 25, 2018 8:36 PM, "M Singh" <mans2si...@yahoo.com.invalid> wrote:

Hi:

I am trying to create a custom structured streaming source and would like to know if there is any example or documentation on the steps involved. I've looked at some of the methods available in SparkSession, but these are internal to the sql package:

  private[sql] def internalCreateDataFrame(
      catalystRows: RDD[InternalRow],
      schema: StructType,
      isStreaming: Boolean = false): DataFrame = {
    // TODO: use MutableProjection when rowRDD is another DataFrame and the applied
    // schema differs from the existing schema on any field data type.
    val logicalPlan = LogicalRDD(
      schema.toAttributes,
      catalystRows,
      isStreaming = isStreaming)(self)
    Dataset.ofRows(self, logicalPlan)
  }

Please let me know where I can find the appropriate API or documentation.

Thanks

Mans