Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

>> If you have an arbitrary fragment of data, don't fiddle with it.
>
> This is your scenario. The simple concept of a unique "start" of text
> does not exist in live streams that can start anywhere. So you cannot
> always expect that U+FEFF or U+FFFE will only exist once in a stream
> and necessarily at the starting position where you begin reading it,
> because you may already be past the initial creation of the stream
> without having any way to come back to the "start".
An "arbitrary fragment of data" -- I'm going to keep using the exact
same phrase until it sinks in -- DOES have a start and an end. THAT is
my scenario.

> Your assumption just assumes that you can always "rewind" your file,

My assumption assumes no such thing.

> Now you will argue: this live stream is not plain text, it has a
> binary structure.

Well, yes.

> Yes but only if your consumer application wants to process the full
> multiplex. Typically clients will demultiplex the stream and pass it
> down to a simpler client that absolutely does not care about the
> transport multiplex format. If that downstream client is just used to
> display the incoming text, it will just wait for text that will be
> buffered line by line and displayed immediately when there's a newline
> separator. But even in this case, each line may have been fragmented
> so that each fragment will contain a leading BOM which will not
> necessarily be stripped

Question: Why did the process that broke the stream into fragments add
leading BOMs?

> (you have also incorrectly asuumed that a text stream is necessarily
> transported over a "reliable" protocol like TCP where there can be no
> data loss in the middle

Really. I think you have incorrectly asuumed my asuumption.

> Texts are inherently fragmentable. Initially they are transcripts of
> human communication and nobody in real life is permanently connected
> to someone else and able to remember everything that was said by
> someone else.

OK, I think we are far enough removed from Unicode to end this.

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell