Thanks, Akhil.

I will give it a try and then get back to you.

Yong

On Mon, Jun 22, 2015 at 8:25 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Like this?
>
> val rawXmls = ssc.fileStream[LongWritable, Text, XmlInputFormat](path)
>
>
> Thanks
> Best Regards
>
> On Mon, Jun 22, 2015 at 5:45 PM, Yong Feng <fengyong...@gmail.com> wrote:
>
>> Thanks a lot, Akhil
>>
>> I saw this mail thread before, but I still do not understand how to use
>> Mahout's XmlInputFormat in Spark Streaming (I am not a Spark Streaming
>> expert yet ;-)). Could you show me some sample code?
>>
>> Thanks in advance,
>>
>> Yong
>>
>> On Mon, Jun 22, 2015 at 6:44 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> You can use fileStream for that; look at Mahout's XmlInputFormat
>>> <https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java>.
>>> It should give you each full XML object as one record (as opposed to
>>> an XML record spread across multiple line records in textFileStream).
>>> Also, this thread
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/Parsing-a-large-XML-file-using-Spark-td19239.html>
>>> has some discussion around it.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Jun 22, 2015 at 12:23 AM, Yong Feng <fengyong...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Spark Experts
>>>>
>>>> I have a customer who wants to monitor incoming data files (in XML
>>>> format), analyze them, and then put the analyzed data into a DB.
>>>> The size of each file is about 30MB (or even less in the future). Spark
>>>> Streaming seems promising.
>>>>
>>>> After learning Spark Streaming and googling how Spark Streaming
>>>> handles XML files, I found that there seems to be no existing Spark
>>>> Streaming utility that recognizes a whole XML file and parses it. The
>>>> fileStream seems line-oriented. There is a suggestion to put the whole
>>>> XML file on one line; however, that requires pre-processing the files,
>>>> which adds extra I/O.
>>>>
>>>> Can anyone shed some light on this? It would be great if there were some
>>>> sample code for me to start with.
>>>>
>>>> Thanks
>>>>
>>>> Yong
>>>>
>>>>
>>>
>>
>
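
For reference, here is a minimal sketch of the idea behind the "one full XML object per record" behaviour mentioned earlier in the thread. It is not Mahout's XmlInputFormat and it does not use Spark; it only assumes that records are delimited by a start tag and an end tag and collects everything in between as one string. The names XmlRecordSplitter and splitRecords, the tags, and the sample data are all made up for illustration.

```scala
// Illustrative sketch only: tag-delimited record splitting on an in-memory string.
// This is NOT Mahout's XmlInputFormat; all names here are hypothetical.
object XmlRecordSplitter {

  /** Returns every substring that begins with `startTag` and ends with `endTag`. */
  def splitRecords(content: String, startTag: String, endTag: String): List[String] = {
    @annotation.tailrec
    def loop(from: Int, acc: List[String]): List[String] = {
      val start = content.indexOf(startTag, from)
      if (start < 0) acc.reverse
      else {
        val end = content.indexOf(endTag, start + startTag.length)
        if (end < 0) acc.reverse // unterminated record: drop it
        else {
          val stop = end + endTag.length
          loop(stop, content.substring(start, stop) :: acc)
        }
      }
    }
    loop(0, Nil)
  }

  def main(args: Array[String]): Unit = {
    // A multi-line file: a line-oriented reader would see 8 separate records here.
    val file =
      """<events>
        |  <event id="1">
        |    <value>10</value>
        |  </event>
        |  <event id="2">
        |    <value>20</value>
        |  </event>
        |</events>""".stripMargin

    // The trailing space in the start tag keeps it from matching "<events>".
    val records = splitRecords(file, "<event ", "</event>")
    records.foreach { r =>
      println(s"--- record spanning ${r.count(_ == '\n') + 1} lines ---")
      println(r)
    }
  }
}
```

Each element of the result is a complete <event> element, so it can be handed to an XML parser as a unit, which is what a line-oriented reader cannot provide without pre-processing the files.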
