I think you're on to something when you inspect the document contents. However, I'd consider three changes to make the process more robust, and there are a couple of simple command-line tests you can run to try to discover whether your input is being altered before your program sees it. (I think this is virtually certain.)
 
The first change to the test program, and probably the most important, is to hex dump your input rather than printing it with printf(). printf("%s") stops at the first NUL byte, and it can hide differences such as carriage returns or other non-printing bytes.
 
Second, make sure you're dumping the entire document. If the problem occurs late in a file bigger than 5000 bytes, you won't see it as the program is currently written.
 
Finally, consider altering the test program to dump the input regardless of where the input comes from. If the output differs when the input comes from stdin versus the file, then either cat/type has altered the stream, or the standard-input handling has.
 
The first command line test is to compare the results of "cat sim.xml | hexdump" and "hexdump sim.xml". (I'm assuming your Linux box has hexdump.) It seems likely to me that they will differ, in which case either cat or the pipe has altered the stream. Smart money would be on cat being the culprit, though I don't know why it would change the stream.
 
The second command line test is "./test < sim.xml", which will stream sim.xml directly to your test program's standard input (rather than allowing cat a chance to alter it). If this works, cat is almost certainly guilty of transforming the file.


From: Aditya Kulkarni [mailto:[EMAIL PROTECTED]
Sent: Monday, November 01, 2004 5:01 AM
To: [EMAIL PROTECTED]
Subject: "Invalid Document Structure" error during parsing

Hi all,

I am getting the “Invalid document structure” error while parsing an XML file. The scenario is as follows:

[1] Platform: Red Hat Linux 9.0 and 32-bit Windows

[2] Single-threaded

[3] I am passing an instance of the StdInInputSource class as the argument to the XercesDOMParser::parse() method.

[4] My test program is invoked as: cat sim.xml | ./test (type sim.xml | test on Win32)

[5] When I run my program on the same XML file directly (invoked as ./test sim.xml), it gives the desired output. In that case I pass an instance of the LocalFileInputSource class as the argument to the XercesDOMParser::parse() method.

What could be causing this message?

Here is the snippet of code that I am talking about.

<snip>

void XMLParser::parse(void)
{
    char            whoami[] = "XMLParser::parse";
    XMLCh           *x_file_name = NULL;

    // [0]
    XMLPlatformUtils::Initialize();

    // [1]
    m_parser = new XercesDOMParser();
    NULLCHECK(m_parser, whoami, "Failed to create XercesDOMParser instance", XML_PARSER_NOMEM);

    // [2]
    m_parser->setDoNamespaces(true);
    m_parser->setValidationScheme(XercesDOMParser::Val_Always);
    m_parser->setDoSchema(true);
    m_parser->setCreateEntityReferenceNodes(false);
    m_parser->setCreateCommentNodes(false);
    m_parser->setIncludeIgnorableWhitespace(false);

    m_ehandler = (ErrorHandler *)new HandlerBase();
    if (NULL != m_ehandler) {
        m_parser->setErrorHandler(m_ehandler);
    }

    // create input source.
    if (m_readFromStdin == true) {
        m_inputSource = new StdInInputSource();
        NULLCHECK(m_inputSource, whoami, "StdInInputSource failed", XML_PARSER_NOMEM);
    } else {
        x_file_name = XMLString::transcode(m_xmlFile.c_str());
        NULLCHECK(x_file_name, whoami, "XMLString::transcode failed", XML_PARSER_FAILED);
        m_inputSource = new LocalFileInputSource((const XMLCh *)x_file_name);
        NULLCHECK(m_inputSource, whoami, "LocalFileInputSource failed", XML_PARSER_NOMEM);
    }

    // [3]
    try {
        m_parser->parse((const InputSource &)*m_inputSource);
        //        m_parser->parse((const char *)m_xmlFile.c_str());
    } catch (const XMLException &x) {
        handleXMLException(x);
    } catch (const DOMException &d) {
        handleDOMException(d);
    } catch (const SAXParseException &sp) {
        handleSAXParseException(sp);
    } catch (const SAXException &s) {
        handleSAXException(s);
    }

    if (m_parser->getErrorCount() != 0) {
        printf("Errors encountered during parsing file %s\n", m_xmlFile.c_str());
        return XML_PARSER_FAILED;
    }

</snip>

 

I also tried looking at the contents that reach the test program through the StdInInputSource. The input I get from StdInInputSource appears identical to what is in the file. Here is the code I used to see what comes from the StdInInputSource.

 

<snip>

    // check if input stream is correct
    if (m_readFromStdin == true) {
        bin_in_stream = m_inputSource->makeStream();
        if (NULL == bin_in_stream) {
            printf("bin_in_stream == NULL\n");
            return XML_PARSER_FAILED;
        }
        memset(inbuf, 0, 5000);
        bin_in_stream->readBytes((XMLByte *const)inbuf, 4999);
        printf("%s\n", inbuf);
    }

</snip>

 

Thanks,

Aditya Kulkarni
