Hi all:

I'm reviving a thread from long ago now that I've gotten a few minutes
to look at this question again: How is XML data best parsed using a
SAX parser in Pharo Smalltalk?

I tried to look at the GenomeTools project that Miguel references
below, but it seems that the class he mentions (GTNCBIBlastParser) is
no longer in it. Perhaps there's a newer, better example of how to
drive the SAX parser somewhere?

> Message: 4
> Date: Tue, 20 Jul 2010 12:25:29 -0500
> From: Miguel Enrique Cobá Martínez <[email protected]>
> Subject: Re: [Pharo-project] Markup Builder in Smalltalk (XMLWriter)
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="UTF-8"
>
> This good summary should go directly to the collaborative book.
>
>
> On Tue, 20-07-2010 at 14:11 -0300, Hernán Morales Durand wrote:
>> An XML parser just creates a representation of an XML document
>> according to a parsing model. Ideally you should choose an XML parser
>> specifically for your needs. You have different parsing models:
>>
>> -Tree Parser: This is what you will read everywhere as the "DOM parser"
>> -Event Parser: This is denoted by S*X and could be
>> --SAX Parser: Known as the "Push parser"
>> --StAX Parser: Also known as the "Pull parser"
>> -VTD Parser: This is known as "Virtual Token Descriptor"
>>
>> Now there are several classifications depending on the parser
>> characteristics and what you want to do or how. You may be interested
>> in:
>>
>> Making modifications or just processing?
>> -For modifications: The parser creates long-lived representations of
>> the XML document (necessary for modifications): You should choose DOM
>> or VTD
>> --You *need* to query or modify the objects (parser creates nodes): DOM
>> --You do not need the objects (parser creates integer and location
>> caches): VTD
>> -For processing: The parser doesn't create long-lived objects: SAX or StAX.
>>
>> Type of Access
>> -Back-and-forth: Access the data after the parsing is complete: DOM or VTD
>> --Massive or very frequent access: Choose DOM
>> --Rare or simple access: Choose VTD
>> -Sequential: Access the data while you're processing the document: SAX
>> or StAX
>> --Processing all tokens: SAX
>> --Processing only the tokens of interest (allows skipping forward): StAX
>>
>> Briefly
>> -Streaming applications (very large documents): SAX or StAX
>> -Database applications: DOM or VTD
>> -Hardware acceleration?: VTD
>>
>> For the S*X parsers you need to know the XML token types because, for
>> example in the case of XMLParser in Pharo/Squeak, you probably would
>> subclass SAXHandler and override one or several methods in the content
>> category to do your own processing. See GTNCBIBlastParser in
>> http://www.squeaksource.com/GenomeTools.html for an example of a SAX
>> Parser.
>>
>> XML token types:
>> Start element: <Hit>....
>> End element: ...</Hit>
>> Text: <...>Text value</...>
>> etc.
>>
>> For DOM usage examples you may see
>> http://community.ofset.org/index.php/Les_bases_de_XML_dans_Squeak (it
>> is in French but is a good document)
>>
>> What we have in Pharo/Squeak
>>
>> Parsers:
>> 1) XMLParser : Supports SAX and DOM. 
>> http://www.squeaksource.com/XMLSupport.html
>> 2) VWXML Parser : Supports SAX and DOM (AFAIK)
>> http://www.squeaksource.com/VWXML.html
>> 3) XMLPullParser : Supports StAX. 
>> http://www.squeaksource.com/XMLPullParser.html
>>
>> XML Query tools
>> 1) Pastell : Supports X-Path like queries. Requires XMLParser.
>> http://www.squeaksource.com/Pastell.html
>> 2) XPath library : Supports XPath partially. Requires XMLParser.
>> http://www.squeaksource.com/XPath.html
>>
>> There are several additional tools in SqueakSource that I haven't
>> reviewed yet.
>> A VTD parser would be ideal for Smalltalk because it uses integer
>> arrays, reducing the object allocation overhead in memory. I haven't
>> found an implementation of an XML VTD parser in Smalltalk as of today.
>> Cheers,
>>
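
For comparison, the push model Hernán describes above can be sketched
with Python's standard xml.sax module. This is not Pharo code and says
nothing about the exact SAXHandler selectors in XMLParser; the
HitCollector class, the collect_hits helper and the sample document are
all made up for illustration. The shape is the same, though: subclass
the handler, override the start-element, text and end-element callbacks,
and let the parser push tokens at you.

```python
import xml.sax


class HitCollector(xml.sax.ContentHandler):
    """Collects the text of every <Hit> element (hypothetical element name)."""

    def __init__(self):
        super().__init__()
        self.hits = []        # finished text values, in document order
        self._buffer = None   # None while outside a <Hit> element

    def startElement(self, name, attrs):
        # Start-element token: begin buffering when a <Hit> opens.
        if name == "Hit":
            self._buffer = []

    def characters(self, content):
        # Text token: may arrive in several chunks, so accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # End-element token: the text is complete, so store it.
        if name == "Hit" and self._buffer is not None:
            self.hits.append("".join(self._buffer).strip())
            self._buffer = None


def collect_hits(xml_text):
    """Parse xml_text and return the text of each <Hit> element."""
    handler = HitCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.hits


# Illustrative input only; not taken from any real BLAST output.
SAMPLE = "<Hits><Hit>first</Hit><Other>skip</Other><Hit>second</Hit></Hits>"
print(collect_hits(SAMPLE))
```

The handler keeps only the state it needs between callbacks (here one
buffer), which is what lets a SAX-style parser work through very large
documents without building a tree. In XMLParser the analogous step
would presumably be a SAXHandler subclass overriding the corresponding
methods in its content category, as the quoted message suggests.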

Thanks,
-- 
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 917E DDB7 C911 9EC1 0CD9  C06B 06C4 835F 0BB8 7336
