Michael has made some excellent points. In my case, however, the XML
documents come from a 3rd party source, so options 1 and 3 are not
available. Instead I used a callback strategy, with the BinInputStream
object as the registered callback object. The callback fires when the
parser sees the end of the top-level tag, i.e. when it has seen the entire
XML document, and notifies the BinInputStream that there is no more need to
read. That object then sets a flag, which readBytes() checks before doing a
read on the socket (option #2 below). My readBytes() blocks only for a very
short time and returns what it has read, so the flag can be set and used
before the next invocation of readBytes(). The registered object has to
implement a simple notification interface. Combined with a timeout-based
read (for the very reasons Michael described so well), this works for us
without having to change the behavior of the 3rd party software.
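
A minimal sketch of the flag-plus-timeout idea, assuming POSIX sockets. The
names SocketStream and DocumentEndListener are made up for illustration, and
the class is deliberately not derived from the real Xerces BinInputStream so
that it compiles on its own. It also assumes that returning 0 from
readBytes() is what the caller treats as end of input, and that a read
timeout should be treated as end of transmission:

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical notification interface: the parser-side handler calls
// documentComplete() once it has seen the end of the top-level element.
struct DocumentEndListener {
    virtual ~DocumentEndListener() = default;
    virtual void documentComplete() = 0;
};

// Stand-in for a BinInputStream subclass (illustrative, not the Xerces API).
class SocketStream : public DocumentEndListener {
public:
    SocketStream(int fd, int timeoutMs) : fd_(fd), timeoutMs_(timeoutMs) {}

    void documentComplete() override { done_ = true; }

    // Returns the number of bytes read; 0 tells the caller there is no more.
    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        assert(toFill != nullptr);
        if (done_) return 0;              // flag set by the callback: simulate EOF

        pollfd pfd{};
        pfd.fd = fd_;
        pfd.events = POLLIN;
        int ready = poll(&pfd, 1, timeoutMs_);   // block only briefly
        if (ready <= 0) {                 // timeout or poll error: give up
            done_ = true;
            return 0;
        }

        ssize_t n = recv(fd_, toFill, maxToRead, 0);
        if (n <= 0) {                     // peer closed or recv error
            done_ = true;
            return 0;
        }
        return static_cast<std::size_t>(n);   // may be less than maxToRead
    }

private:
    int fd_;
    int timeoutMs_;
    bool done_ = false;
};
```

The flag is checked before the socket is touched, so once the callback has
fired readBytes() never blocks again. The timeout only covers the case where
the callback never fires, e.g. a truncated document.
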
Good luck.
-steve

> -----Original Message-----
> From: Michael Wojcik [SMTP:[EMAIL PROTECTED]]
> Sent: Wednesday, April 24, 2002 1:30 PM
> To:   [EMAIL PROTECTED]
> Subject:      RE: TCP socket InputSource 
> 
> > From: Itay Eliaz [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, April 24, 2002 12:03 PM
> 
> > I'm trying to implement an InputSource based on a TCP socket for the
> > XML parser.
> > I do this since my connection is not HTTP, and therefore I can't use the
> > URLInputSource.
> > To do this I derived from the InputSource and BinInputStream classes.
> > My problem is the document ends but the socket is still open and the
> > parser hangs.
> > In my implementation of the BinInputStream::readBytes method, the whole
> > document is read, but in the next method call it hangs since it reached
> > neither maxToRead nor EOF.
> 
> (I think I understand your problem, but if not feel free to correct me
> and/or ignore this message entirely.)
> 
> This is more a TCP issue than a Xerces one.  TCP doesn't make any
> guarantees about record boundaries - it's a connection-oriented
> octet-stream protocol.  That means you have no guarantee that one call to
> recv() will read all the data written to the socket by the peer.
> 
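
To illustrate the point about record boundaries, here is a minimal sketch,
assuming POSIX sockets, of a loop that keeps calling recv() until the
requested number of bytes has arrived. The name recvAll is made up for this
example:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Reads exactly `want` bytes unless the peer closes the connection or an
// error occurs first. Returns the number of bytes actually read.
std::size_t recvAll(int fd, unsigned char* buf, std::size_t want) {
    assert(buf != nullptr);
    std::size_t got = 0;
    while (got < want) {
        ssize_t n = recv(fd, buf + got, want - got, 0);
        if (n > 0) {
            got += static_cast<std::size_t>(n);  // one recv() may return only part
        } else if (n == 0) {
            break;                               // peer closed its sending side
        } else if (errno == EINTR) {
            continue;                            // interrupted by a signal: retry
        } else {
            break;                               // real error: report what we have
        }
    }
    return got;
}
```

A loop like this only helps when the receiver already knows how many bytes
to expect, which is exactly what the approaches listed next are about.
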
> The sending program must communicate the end of its transmission in some
> fashion.  The usual approaches are:
> 
> 1. An application protocol that delimits the application data.  This can
> be as simple as sending a size value immediately before the data (making
> sure, of course, to send it in some canonical form over the wire and
> translate it on the receiving end into something you can use).  Or it can
> be a flexible, powerful protocol with features to handle many kinds of
> conditions and room for expansion.  Like, say, HTTP.
> 
> 2. Data that's self-delimiting, with some kind of sentinel value at the
> end.  Good only if you can reserve an octet value for the sentinel, and
> even so it lacks the advantage of the first method in simplifying
> buffer-manipulation code in languages that don't offer automatic buffer
> manipulation.
> 
> 3. A TCP half-close: the sending side sends a TCP FIN, indicating that it
> will not be sending any more data.  The sockets API lets you do this with
> the shutdown() function.  See pretty much any reference on programming
> with sockets for more information.
> 
> 4. Send record-boundary information on another channel.  There's no
> advantage here for new applications; it's a kluge to get around
> unavoidable design problems with existing code.
> 
> 5. Consider the transmission over when a timer expires without new data
> being received.  This is only appropriate in limited circumstances,
> usually as part of error recovery.
> 
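
As a rough illustration of the simplest form of approach 1, the sketch below
sends a 4-byte size in network byte order ahead of each document. It assumes
POSIX sockets; sendFrame, recvFrame, readFully, writeFully and the 16 MB cap
are all made up for this example and are not part of any existing protocol:

```cpp
#include <arpa/inet.h>   // htonl, ntohl
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Arbitrary upper bound so a corrupt length cannot trigger a huge allocation.
constexpr std::uint32_t kMaxFrameBytes = 16u * 1024u * 1024u;

static bool readFully(int fd, void* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0) return false;        // closed or failed before the frame ended
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

static bool writeFully(int fd, const void* buf, std::size_t len) {
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0) return false;
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Sender: size first, in a canonical (big-endian) form, then the data.
bool sendFrame(int fd, const std::string& payload) {
    if (payload.size() > kMaxFrameBytes) return false;
    std::uint32_t wire = htonl(static_cast<std::uint32_t>(payload.size()));
    return writeFully(fd, &wire, sizeof wire) &&
           writeFully(fd, payload.data(), payload.size());
}

// Receiver: translate the size back to host order, then read exactly that much.
bool recvFrame(int fd, std::string& out) {
    std::uint32_t wire = 0;
    if (!readFully(fd, &wire, sizeof wire)) return false;
    std::uint32_t len = ntohl(wire);
    if (len > kMaxFrameBytes) return false;
    out.resize(len);
    return len == 0 || readFully(fd, &out[0], len);
}
```

With framing like this the receiver knows where each document ends without
closing the connection, so several documents can share one socket.
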
> If you use method 3 (the half-close), subsequent calls to recv() will
> return 0 (the socket equivalent of EOF).  Otherwise, when your readBytes
> notes the end of the transmission from the peer, it will have to note that
> fact somewhere, so that subsequent calls to readBytes can simulate EOF.
> (That's assuming I understand the semantics of readBytes; I haven't looked
> at that code.)
> 
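
A minimal sketch of the half-close from both sides, assuming POSIX sockets.
HalfCloseReader and sendDocumentAndHalfClose are made-up names, and the
reader is a stand-in for a readBytes implementation rather than the actual
Xerces class:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Sender side: write the whole document, then shut down the sending
// direction only. The socket stays open for reading the response.
bool sendDocumentAndHalfClose(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = send(fd, data, len, 0);
        if (n <= 0) return false;
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return shutdown(fd, SHUT_WR) == 0;
}

// Receiver side: remember that EOF was seen so later calls return 0
// immediately instead of going back to the socket.
class HalfCloseReader {
public:
    explicit HalfCloseReader(int fd) : fd_(fd) {}

    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        assert(toFill != nullptr);
        if (eof_) return 0;
        ssize_t n = recv(fd_, toFill, maxToRead, 0);
        if (n <= 0) {            // 0 means the peer half-closed; <0 is an error
            eof_ = true;
            return 0;
        }
        return static_cast<std::size_t>(n);
    }

    bool atEnd() const { return eof_; }

private:
    int fd_;
    bool eof_ = false;
};
```

Because only the sending direction is shut down, the receiver can still
write a response on the same connection after it has seen the 0 from recv().
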
> Now, since XML documents are self-delimiting (assuming you only send a
> single document at a time), you *could* use the document itself to
> implement method 2, but that would in effect require readBytes to do some
> parsing itself - definitely a backwards and error-prone approach.
> 
> I'd say 1 or 3 is the way to go.  Personally, I'd lean toward 1, as the
> most powerful, and have the sender implement a very basic HTTP/1.0 user
> agent.  (You don't want HTTP/1.1, as that would require supporting the
> Chunked transfer-encoding and various other features excessive to your
> requirements.)  But 3 is simple and elegant, if the sender only wants to
> send one document, get a response, and close the connection.  Just make
> sure you handle a return value of 0 from recv() correctly in readBytes.
> 
> Michael Wojcik
> Principal Software Systems Developer, Micro Focus
> Department of English, Miami University
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]



