> From: Itay Eliaz [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, April 24, 2002 12:03 PM
> I'm trying to implement an InputSource based on a TCP socket for the XML parser.
> I do this since my connection is not HTTP, and therefore I can't use the
> URLInputSource.
> To do this I derived from the InputSource and BinInputStream classes.
> My problem is the document ends but the socket is still open and the parser
> hangs.
> In my implementation of the BinInputStream::readBytes method, the whole
> document is read, but in the next method call it hangs since it has reached
> neither maxToRead nor EOF.

(I think I understand your problem, but if not, feel free to correct me and/or ignore this message entirely.)

This is more a TCP issue than a Xerces one. TCP doesn't make any guarantees about record boundaries; it's a connection-oriented octet-stream protocol. That means you have no guarantee that one call to recv() will read all the data written to the socket by the peer. It also means that, while the connection stays open, a blocking recv() simply waits for more data: nothing in the byte stream itself tells you the peer has finished. The sending program must communicate the end of its transmission in some fashion. The usual approaches are:

1. An application protocol that delimits the application data. This can be as simple as sending a size value immediately before the data. Make sure, of course, to send it in some canonical form over the wire (network byte order, for example) and to translate it on the receiving end into something you can use. Or it can be a flexible, powerful protocol with features to handle many kinds of conditions and room for expansion. Like, say, HTTP.

2. Data that's self-delimiting, with some kind of sentinel value at the end. This works only if you can reserve an octet value for the sentinel. Even then, it lacks the first method's advantage of simplifying buffer-manipulation code in languages that don't offer automatic buffer management.

3. A TCP half-close: the sending side sends a TCP FIN, indicating that it will not be sending any more data. The sockets API lets you do this with the shutdown() function, shutting down only the sending direction. See pretty much any reference on programming with sockets for more information.

4. Send record-boundary information on another channel.
   There's no advantage to this for new applications; it's a kluge to get around unavoidable design problems with existing code.

5. Consider the transmission over when a timer expires without new data being received. This is appropriate only in limited circumstances, usually as part of error recovery.

If you use method 3 (the half-close), subsequent calls to recv() will return 0 (the socket equivalent of EOF). Otherwise, when your readBytes detects the end of the peer's transmission, it will have to record that fact somewhere. Subsequent calls to readBytes can then simulate EOF instead of calling recv() again and blocking. (That's assuming I understand the semantics of readBytes; I haven't looked at that code.)

Now, since XML documents are self-delimiting (assuming you send only a single document at a time), you *could* use the document itself to implement method 2. But that would in effect require readBytes to do some parsing itself, which is definitely a backwards and error-prone approach.

I'd say 1 or 3 is the way to go. Personally, I'd lean toward 1, as the most powerful, and have the sender implement a very basic HTTP/1.0 user agent. (You don't want HTTP/1.1, as that would require supporting the Chunked transfer-encoding and various other features beyond what you need.) But 3 is simple and elegant, if the sender only wants to send one document, get a response, and close the connection. Just make sure that readBytes handles a return value of 0 from recv() correctly, by reporting end of input to the parser.

Michael Wojcik
Principal Software Systems Developer, Micro Focus
Department of English, Miami University
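Here is a minimal sketch of method 1 (a size value before the data) in C++, assuming POSIX sockets. The wire format is a hypothetical one: a 4-byte unsigned length in network byte order, followed by that many bytes of XML.

`FramedSocketReader` is a hypothetical stand-alone class. Its `readBytes` has the same shape as a `BinInputStream::readBytes` override, but it does not derive from the Xerces class. It assumes the parser treats a return value of 0 as end of input. `sendFramed`, `sendAll` and `recvAll` are likewise illustrative names, not part of any library.

```cpp
#include <algorithm>     // std::min
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>       // std::strerror
#include <stdexcept>
#include <string>
#include <arpa/inet.h>   // htonl, ntohl
#include <sys/socket.h>  // send, recv
#include <sys/types.h>   // ssize_t

// Write all n bytes; send() may accept only part of the buffer per call.
static void sendAll(int fd, const void* data, std::size_t n) {
    const char* p = static_cast<const char*>(data);
    while (n > 0) {
        ssize_t w = send(fd, p, n, 0);
        if (w < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error(std::strerror(errno));
        }
        p += w;
        n -= static_cast<std::size_t>(w);
    }
}

// Sender (hypothetical framing): 4-byte big-endian length, then the document.
void sendFramed(int fd, const std::string& doc) {
    std::uint32_t netLen = htonl(static_cast<std::uint32_t>(doc.size()));
    sendAll(fd, &netLen, sizeof netLen);
    sendAll(fd, doc.data(), doc.size());
}

// Read exactly n bytes; returns false if the peer closed the connection first.
static bool recvAll(int fd, void* data, std::size_t n) {
    char* p = static_cast<char*>(data);
    while (n > 0) {
        ssize_t r = recv(fd, p, n, 0);
        if (r == 0) return false;  // peer sent FIN
        if (r < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error(std::strerror(errno));
        }
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

// Receiver with a readBytes()-shaped method (not a real Xerces subclass).
class FramedSocketReader {
public:
    explicit FramedSocketReader(int fd) : fd_(fd) {}

    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        if (!haveLength_) {
            std::uint32_t netLen = 0;
            if (!recvAll(fd_, &netLen, sizeof netLen))
                throw std::runtime_error("connection closed before length prefix");
            remaining_ = ntohl(netLen);
            haveLength_ = true;
        }
        // Whole document delivered: simulate EOF without touching the socket,
        // so the caller never blocks on a connection that is still open.
        if (remaining_ == 0) return 0;

        std::size_t want = std::min<std::size_t>(maxToRead, remaining_);
        ssize_t r;
        do {
            r = recv(fd_, toFill, want, 0);
        } while (r < 0 && errno == EINTR);
        if (r < 0) throw std::runtime_error(std::strerror(errno));
        if (r == 0) throw std::runtime_error("connection closed mid-document");
        remaining_ -= static_cast<std::size_t>(r);
        // May be fewer than want; the caller is assumed to call again.
        return static_cast<std::size_t>(r);
    }

private:
    int fd_;
    bool haveLength_ = false;
    std::size_t remaining_ = 0;
};
```

Because the reader knows how many bytes to expect, it returns 0 once they have all arrived and never calls recv() again. The parser therefore doesn't hang, and the socket stays open for the receiver's reply.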
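Here is a minimal sketch of method 3 (the half-close) in C++, assuming POSIX sockets, where shutdown() takes SHUT_WR to close only the sending direction. `sendDocumentAndHalfClose` and `HalfCloseSocketReader` are hypothetical names. The reader's `readBytes` only mirrors the shape of a `BinInputStream::readBytes` override. It assumes the parser treats a return value of 0 as end of input.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>       // std::strerror
#include <stdexcept>
#include <string>
#include <sys/socket.h>  // send, recv, shutdown, SHUT_WR
#include <sys/types.h>   // ssize_t

// Sender: write the whole document, then send a FIN with shutdown(SHUT_WR).
// The socket can still receive the peer's response afterwards.
void sendDocumentAndHalfClose(int fd, const std::string& doc) {
    const char* p = doc.data();
    std::size_t n = doc.size();
    while (n > 0) {
        ssize_t w = send(fd, p, n, 0);
        if (w < 0) {
            if (errno == EINTR) continue;
            throw std::runtime_error(std::strerror(errno));
        }
        p += w;
        n -= static_cast<std::size_t>(w);
    }
    if (shutdown(fd, SHUT_WR) != 0)
        throw std::runtime_error(std::strerror(errno));
}

// Receiver: recv() returning 0 after the peer's FIN is passed on as EOF.
class HalfCloseSocketReader {
public:
    explicit HalfCloseSocketReader(int fd) : fd_(fd) {}

    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        // recv() keeps returning 0 after a FIN; the flag just skips the call.
        if (eof_) return 0;
        ssize_t r;
        do {
            r = recv(fd_, toFill, maxToRead, 0);
        } while (r < 0 && errno == EINTR);
        if (r < 0) throw std::runtime_error(std::strerror(errno));
        if (r == 0) eof_ = true;  // orderly shutdown by the sender
        return static_cast<std::size_t>(r);
    }

private:
    int fd_;
    bool eof_ = false;
};
```

Only the sender's outgoing direction is shut down. The receiver can therefore still write its response on the same socket before both sides close it.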
