Hello experts,

I have one additional question: how can I read binary files into a stream reader object? (The goal is to get the data into a Kafka server.)
I looked into the DataStreamReader API (https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-DataStreamReader.html#option) and other Google results and didn't find an option for binary files.

Any help would be very much appreciated! (Thanks again for Ilya's helpful information below; it works fine on the sparkContext object.)

Regards,
Peter

On Thu, Sep 5, 2019 at 3:09 PM Ilya Matiach <il...@microsoft.com> wrote:

> Hi Peter,
>
> You can use the spark.readImages API in spark 2.3 for reading images:
>
> https://databricks.com/blog/2018/12/10/introducing-built-in-image-data-source-in-apache-spark-2-4.html
> https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/
> https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.ml.image.ImageSchema$
>
> There’s also a spark package for spark versions older than 2.3:
>
> https://github.com/Microsoft/spark-images
>
> Thank you, Ilya
>
> *From:* Peter Liu <peter.p...@gmail.com>
> *Sent:* Thursday, September 5, 2019 2:13 PM
> *To:* dev <d...@spark.apache.org>; User <user@spark.apache.org>
> *Subject:* Re: read image or binary files / spark 2.3
>
> Hello experts,
>
> I have a quick question: which API allows me to read image files or binary
> files (for SparkSession.readStream) from a local/hadoop file system in
> Spark 2.3?
>
> I have been browsing the following documentation and googling for it and
> didn't find a good example/documentation:
>
> https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
> https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.package
>
> Any hint/help would be very much appreciated!
>
> Thanks!
>
> Peter