If no pre-built solution exists, writing your own should not be too
difficult. I suggest looking at a parser combinator library such as
FastParse to build it.

http://www.lihaoyi.com/fastparse/
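
As a rough illustration of how small a hand-written tokenizer can be, here
is a minimal sketch in plain Python (not FastParse, and not a parser
combinator; just string splitting). It assumes the input is a single X12
interchange starting with an ISA segment, takes the element separator from
the character right after "ISA", and takes the segment terminator from the
character right after the 16th ISA element. The names parse_x12 and SAMPLE
are made up for this example, and SAMPLE is a trimmed copy of the data
quoted below. It does no validation against any X12 standard.

```python
import json


def parse_x12(text):
    """Split one X12 interchange into a list of {"tag", "elements"} dicts."""
    text = text.lstrip()
    if not text.startswith("ISA") or len(text) < 4:
        raise ValueError("expected the interchange to start with an ISA segment")
    element_sep = text[3]
    # ISA carries 16 elements; the 16th is the one-character component
    # separator and is immediately followed by the segment terminator.
    header = text.split(element_sep, 16)
    if len(header) < 17 or len(header[16]) < 2:
        raise ValueError("incomplete ISA header")
    segment_term = header[16][1]

    segments = []
    for raw in text.split(segment_term):
        raw = raw.strip("\r\n")  # drop line breaks between segments, keep spaces
        if raw:
            tag, *elements = raw.split(element_sep)
            segments.append({"tag": tag, "elements": elements})
    return segments


# Trimmed sample, assumed to be representative of the quoted data.
SAMPLE = (
    "ISA*00*          *00*          *ZZ*D00XXX         *ZZ*00AA           "
    "*070305*1832*^*00501*676048320*0*P*\\~\n"
    "ST*834*0001*005010X220A1~\n"
    "N1*P5*PAYER 1*FI*999999999~\n"
    "REF*0F*00389999~\n"
)

print(json.dumps(parse_x12(SAMPLE)[1]))
# {"tag": "ST", "elements": ["834", "0001", "005010X220A1"]}
```

Because the function only needs the text of one interchange, it could
probably be applied per file on Spark, for example with
sc.wholeTextFiles(path).mapValues(parse_x12), assuming each file holds one
interchange and fits in memory on an executor.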

Regards,
Kurt

On Tue, Mar 13, 2018 at 7:47 AM Aakash Basu <aakash.spark....@gmail.com>
wrote:

> Thanks again for the detailed explanation; I would like to go through it.
>
> In my case, I have to parse large-scale *.as2*, *.P3193*, *.edi* and *.txt*
> data, map it to the respective standards, and then build JSON (so XML
> doesn't come into the picture) containing the following (a small example
> of EDI):
>
> ISA*00*          *00*          *ZZ*D00XXX         *ZZ*00AA           
> *070305*1832*^*00501*676048320*0*P*\~
> GS*BE*D00XXX*00AA*20150305*1832*260007982*X*005010X220A1~
> ST*834*0001*005010X220A1~
> BGN*00*88880070301  00*20150305*181245****4~
> DTP*007*D8*20150301~
> N1*P5*PAYER 1*FI*999999999~
> N1*IN*KCMHSAS*FI*999999999~
> INS*Y*18*030*XN*A*C   **FT~
> REF*0F*00389999~
> REF*1L*000003409999~
> REF*3H*K129999A~
> DTP*356*D8*20150301~
> NM1*IL*1*DOE*JOHN*A***34*999999999~
> N3*777 ELM ST~
> N4*ALLEGAN*MI*49010**CY*03~
> DMG*D8*19670330*M**O~
> LUI***ESSPANISH~
> HD*030**AK*064703*IND~
> DTP*348*D8*20150301~
> AMT*P3*45.34~
> REF*17*E  1F~
> SE*20*0001~
> GE*1*260007982~
> IEA*1*676048320~
>
>
>
> Thanks,
> Aakash.
>
> On Tue, Mar 13, 2018 at 6:37 PM, Darin McBeath <ddmcbe...@yahoo.com>
> wrote:
>
>> I'm not familiar with EDI, but perhaps one option might be
>> spark-xml-utils (https://github.com/elsevierlabs-os/spark-xml-utils).
>> You could transform the XML to the XML format required by the xml-to-json
>> function and then return the JSON. Spark-xml-utils wraps the open source
>> Saxon project and supports XPath, XQuery, and XSLT. Spark-xml-utils
>> doesn't parallelize the parsing of an individual document, but if you have
>> your documents split across a cluster, the processing can be parallelized.
>> We use this package extensively within our company to process millions of
>> XML records. If you happen to be attending Spark Summit in a few months,
>> someone will be presenting on this topic (
>> https://databricks.com/session/mining-the-worlds-science-large-scale-data-matching-and-integration-from-xml-corpora
>> ).
>>
>>
>> Below is an XQuery snippet.
>>
>> let $retval :=
>>      <map>
>>        <string key="doi">{$doi}</string>
>>        <string key="cid">{$cid}</string>
>>        <string key="pii">{$pii}</string>
>>        <string key="contentType">{$content-type}</string>
>>        <string key="srctitle">{$srctitle}</string>
>>        <string key="documentType">{$document-type}</string>
>>        <string key="documentSubtype">{$document-subtype}</string>
>>        <string key="publicationDate">{$publication-date}</string>
>>        <string key="articleTitle">{$article-title}</string>
>>        <string key="issn">{$issn}</string>
>>        <string key="isbn">{$isbn}</string>
>>        <string key="lang">{$lang}</string>
>>        {$tables}
>>      </map>
>>
>> return xml-to-json($retval)
>>
>>
>> Darin.
>>
>> On Tuesday, March 13, 2018, 8:52:42 AM EDT, Aakash Basu <
>> aakash.spark....@gmail.com> wrote:
>>
>>
>> Hi Jörn,
>>
>> Thanks for the quick reply. I have already built an EDI-to-JSON parser from
>> scratch using the 811 and 820 standard mapping document. It can run on any
>> standard and for any type of EDI. But my build is in native Python and
>> doesn't leverage Spark's parallel processing, which I want to use for
>> large volumes of EDI data.
>>
>> Any pointers on that?
>>
>> Thanks,
>> Aakash.
>>
>> On Tue, Mar 13, 2018 at 3:44 PM, Jörn Franke <jornfra...@gmail.com>
>> wrote:
>>
>> Maybe there are commercial ones. You could also use some of the open source
>> parsers for XML.
>>
>> However, XML is very inefficient and you need to do a lot of tricks to
>> make it run in parallel. This also depends on the type of EDI message, etc.
>> Sophisticated unit testing and performance testing are key.
>>
>> Nevertheless, it is also not as difficult as I made it sound just now.
>>
>> > On 13. Mar 2018, at 10:36, Aakash Basu <aakash.spark....@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > Has anyone built a parallel, large-scale X12 EDI parser to XML or JSON
>> > using Spark?
>> >
>> > Thanks,
>> > Aakash.
>>
>>
>>
>
