Hi Spark experts,

I have a customer who wants to monitor incoming data files (in XML format),
analyze them, and then write the analyzed data into a database. Each file is
about 30 MB (possibly smaller in the future). Spark Streaming seems
promising.

After reading up on Spark Streaming and searching for how it handles XML
files, I could not find an existing Spark Streaming utility that recognizes
a whole XML file and parses it. fileStream seems to be line-oriented. One
suggestion is to put the whole XML file on a single line, but that requires
pre-processing the files, which adds extra I/O.
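
To make the goal concrete, here is a minimal sketch (plain Python, no Spark)
of the per-file step I have in mind: take one complete XML document as a
single string and turn it into rows for the DB. The function name
parse_whole_file, the <record> element, and the sample path are all
hypothetical; the real schema is different. The (path, content) input shape
is what SparkContext.wholeTextFiles returns in batch mode, and what I am
missing is a way to get the same shape from a stream.

```python
import xml.etree.ElementTree as ET


def parse_whole_file(path, content):
    """Parse one complete XML document and return one dict per <record>.

    `path` and `content` mirror the (file path, file content) pairs that
    SparkContext.wholeTextFiles yields; the <record> tag is a placeholder
    for whatever element the real files use.
    """
    root = ET.fromstring(content)
    rows = []
    for rec in root.iter("record"):
        row = dict(rec.attrib)                 # copy XML attributes as columns
        row["source_file"] = path              # remember which file the row came from
        row["value"] = (rec.text or "").strip()
        rows.append(row)
    return rows


# Hypothetical sample document standing in for one incoming file.
sample = """<records>
  <record id="1">alpha</record>
  <record id="2">beta</record>
</records>"""

rows = parse_whole_file("incoming/sample.xml", sample)
```

Ideally each new file would reach a function like this as one record, without
first rewriting the file onto a single line.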

Can anyone shed some light on this? It would be great if there were some
sample code for me to start with.

Thanks

Yong
