Thanks for your suggestions.

One question, though: I just tried "xmllint --stream --schema" on a test
file in which the start and end tags of an element do not match, as follows
(the xsd file is attached):

<?xml version="1.0" encoding="UTF-8"?>
<InstrumentReferenceData CreationTime="14:20:00" CreationDate="2008-08-25"
Version="0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="test.xsd">
 <InstrumentRecord>
  <SourceAuthority>DE</SourceAuthorityXYZ>
  <InstrumentIdentification>DE0007164600ww</InstrumentIdentification>
 </InstrumentRecord>
</InstrumentReferenceData>

What I got was:

xmllint --stream --schema test.xsd test.err.xml
Unimplemented block at xmlschemas.c:28270
test.err.xml validates
test.err.xml : failed to parse

I do not understand why it says that the file validates and then that
parsing failed. How should this be interpreted?
Is the output of xmllint still reliable when an unimplemented block is
reported?
Moreover, there is no error message describing the mismatched tag, so it is
impossible to tell what exactly went wrong (which means xmllint would not
be useful for validating our incoming files).
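
To make the distinction concrete: a mismatched end tag is a well-formedness
(parsing) error, which is a separate question from schema validity. Below is
a minimal sketch that checks only well-formedness, in streaming fashion. It
uses Python's standard-library parser rather than libxml2, so it says nothing
about what xmllint does internally; the function name well_formedness_error
is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# First sample from this message, with the mismatched end tag.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<InstrumentReferenceData CreationTime="14:20:00" CreationDate="2008-08-25"
Version="0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="test.xsd">
 <InstrumentRecord>
  <SourceAuthority>DE</SourceAuthorityXYZ>
  <InstrumentIdentification>DE0007164600ww</InstrumentIdentification>
 </InstrumentRecord>
</InstrumentReferenceData>
"""


def well_formedness_error(stream):
    """Return None if the stream parses, else the parser's error message.

    Hypothetical helper: checks well-formedness only, not schema validity.
    """
    try:
        # iterparse reads the input incrementally instead of all at once.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            elem.clear()  # discard the content of elements already seen
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Prints the parser's message for the mismatched tag.
    print(well_formedness_error(io.BytesIO(SAMPLE)))
```

With a check like this run first, a file that is not well-formed can be
rejected with a line and column before any schema validation is attempted.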

The same thing happens if the opening tag of an element name is misspelled:

<?xml version="1.0" encoding="UTF-8"?>
<InstrumentReferenceData CreationTime="14:20:00" CreationDate="2008-08-25"
Version="0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="test.xsd">
 <InstrumentRecord>
  <SourceAuthorityXYZ>DE</SourceAuthority>
  <InstrumentIdentification>DE0007164600ww</InstrumentIdentification>
 </InstrumentRecord>
</InstrumentReferenceData>

In this case I get:

Schemas validity error : Element 'SourceAuthorityXYZ': This element is not
expected. Expected is ( SourceAuthority ).
Unimplemented block at xmlschemas.c:28270
test.err.xml validates
test.err.xml : failed to parse

Something similar happens if I validate the file with xmlSchemaValidateFile
since it relies on xmlSchemaValidateStream just like the --stream option (if
I understand correctly).
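
Until this is clarified, one conservative way to read output like the above
is to let any "failed to parse" line override a "validates" line. The sketch
below does that. The function names are hypothetical, the "fails to validate"
message is an assumption about xmllint's wording for invalid files, and the
classification rules are my own reading of the two outputs shown in this
message, not documented xmllint behaviour.

```python
import subprocess


def classify_xmllint_output(text):
    """Classify xmllint messages; a parse failure overrides 'validates'."""
    lines = [line.strip() for line in text.splitlines()]
    # Wording taken from the outputs quoted in this message.
    if any(line.endswith(": failed to parse") for line in lines):
        return "not-well-formed"
    if any("Schemas validity error" in line for line in lines):
        return "invalid"
    # Assumed wording for a file that does not validate.
    if any(line.endswith(" fails to validate") for line in lines):
        return "invalid"
    if any(line.endswith(" validates") for line in lines):
        return "valid"
    return "unknown"


def run_xmllint(xsd_path, xml_path):
    """Run xmllint with the options used in this message and classify it."""
    proc = subprocess.run(
        ["xmllint", "--stream", "--schema", xsd_path, xml_path],
        capture_output=True,
        text=True,
    )
    # Messages may arrive on either stream, so look at both.
    return classify_xmllint_output(proc.stdout + proc.stderr)


if __name__ == "__main__":
    print(run_xmllint("test.xsd", "test.err.xml"))
```

Both outputs quoted above would be classified as "not-well-formed" by this
sketch, even though each contains a "validates" line.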

Thanks,
Massimo


2008/8/29 Daniel Veillard <[EMAIL PROTECTED]>

>  On Thu, Aug 28, 2008 at 04:44:41PM +0200, bagnacauda wrote:
> > Hello,
> >
> > An external company is going to send us very large xml files - up to
> > 400MB - which will have to be
> > - validated against a schema (if validation fails, a report of all errors
> > found by the parser is produced and processing is stopped)
> > - processed in order to use their data to update our database
> >
> > Now I'm wondering what is the best approach to handle these files since
> > the processing is quite simple but the files are REALLY large.
> >
> > What is best in terms of performance: SAX or the reader?
> > Has anybody ever met with this problem?
>
>  I have parsed/validated 4+GB files with libxml2. 400MB is not that big
> believe me.
>  I would suggest for validation simplicity to just fork off
>  xmllint --schema ....xsd --stream your_big_file.xml
> as an entry point test.
>  then IMHO the speed of your database will be the limiting factor on
> import so use the way cleaner reader API for the import code, it
> will avoid a whole class of problems (entities) and have a way
> friendlier API, while being quite fast enough. Parsing itself shouldn't
> take much more than 10s. Your database may crawl for a while though ...
>
> Daniel
>
> --
> Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
> [EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
> http://veillard.com/ | virtualization library  http://libvirt.org/
>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">

	<xs:simpleType name="InstrumentIdentificationType">
		<xs:restriction base="xs:string">
			<xs:whiteSpace value="collapse"/>
			<xs:pattern value="[A-Z]{2}([A-Z]|[0-9]){9}[0-9]"/>
		</xs:restriction>
	</xs:simpleType>
	<xs:simpleType name="AuthorityKeyType">
		<xs:restriction base="xs:string">
			<xs:whiteSpace value="collapse"/>
			<xs:pattern value="[A-Z]{2}"/>
		</xs:restriction>
	</xs:simpleType>
	
	<xs:element name="InstrumentReferenceData">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="InstrumentRecord" type="InstrumentRecordType" maxOccurs="unbounded"/>
			</xs:sequence>
			<xs:attribute name="CreationDate" type="xs:date" use="required"/>
			<xs:attribute name="CreationTime" type="xs:time" use="required"/>
			<xs:attribute name="Version" type="xs:string" use="required" fixed="0.9"/>
		</xs:complexType>
	</xs:element>
	<xs:complexType name="InstrumentRecordType">
		<xs:sequence>
			<xs:element name="SourceAuthority" type="AuthorityKeyType"/>
			<xs:element name="InstrumentIdentification" type="InstrumentIdentificationType"/>
		</xs:sequence>
	</xs:complexType>
</xs:schema>
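
A side note on the two simple types in the schema above: XSD patterns must
match the whole value, after the whiteSpace facet has been applied. The
sketch below approximates that in Python with re.fullmatch. It is only an
approximation (XSD and Python regex syntax differ in general, although these
two patterns use constructs common to both), and the helper names are
hypothetical.

```python
import re

# Patterns copied from the simple types in the schema above.
PATTERNS = {
    "InstrumentIdentificationType": r"[A-Z]{2}([A-Z]|[0-9]){9}[0-9]",
    "AuthorityKeyType": r"[A-Z]{2}",
}


def collapse(value):
    """Approximate whiteSpace='collapse': trim and squeeze whitespace runs."""
    return " ".join(value.split())


def matches_type(type_name, value):
    """Return True if the collapsed value matches the whole pattern."""
    return re.fullmatch(PATTERNS[type_name], collapse(value)) is not None


if __name__ == "__main__":
    # Value used in the test files in this message.
    print(matches_type("InstrumentIdentificationType", "DE0007164600ww"))
    print(matches_type("AuthorityKeyType", "DE"))
```

By this reading, the InstrumentIdentification value used in the test files
would not match its pattern either, because of the trailing "ww".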
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
