RE: [codec] StatefulDecoders
Alex,

Sorry about the delay. I'm a bit snowed under at the moment.

I've attached producers/consumers that process Object instances instead of type-specific data. Some differences between these interfaces and those you've already built are:

1. These have no monitor facility. All errors result in a CodecException being thrown. There is no concept of a warning.
2. These have a finalize concept, which is needed for implementations such as base64 where padding on the final data block is required.
3. These have flush methods to allow the chain to be flushed without being finalized.
4. These have a propagating flag which allows finalize/flush calls to be propagated through codec chains. By default this is true but it can be set to false. This is necessary when a single consumer (eg. OutputStreamConsumer) receives streams from multiple sources and each of those sources is finalized before the next is started. In this case you don't want the OutputStreamConsumer to be finalized (and the underlying stream closed) multiple times; you want this to occur only after the final input source completes.

Hope this makes sense. With regards to each difference:

1. Not sure of the correct approach here. I threw exceptions because it was simpler to implement and made it harder to end up with silent errors occurring. A monitor approach is more flexible although perhaps harder to use from a client perspective.
2. I believe you will need to add a finalize concept. Some codecs require notification that this is the final processing call (ie. Base64).
3. Flush isn't critical. I just added it for completeness.
4. A propagating option isn't critical and Java IOStreams don't have this concept. However a common problem is where you wish to feed the result of several streams into a single stream without the close on each top-level stream calling close on the receiving stream. Another way of overcoming this is to create a special no-op, non-propagating codec that you insert into the chain to prevent these calls propagating.

I meant to create some sample code using my interfaces to compare with yours but I can't get it done at the moment. You're obviously clued in on what's required, any differences between mine and yours are relatively small and I'm sure either would suit the purposes of codec. Given that I'm taking way too long to do anything at the moment I'll leave it in your capable hands. My current work should ease up in a few weeks and I'll try to give you a hand again then.

Cheers,
Brett

> Could you give some examples of how this would look just using
> Objects instead of specific types to implement what the DecoderStack
> does here:
>
> http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/trunk/codec-stateful/src/java/org/apache/commons/codec/stateful/DecoderStack.java?rev=9724&root=Apache-SVN&view=auto
>
> And go on to show how it's used like in this test code here:
>
> http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/trunk/codec-stateful/src/test/org/apache/commons/codec/stateful/DecoderStackTest.java?rev=9724&root=Apache-SVN&view=auto
>
> Specifically I'm referring to the usage of the DecoderStack in the
> testDecode() example which shows the chaining really simply.
> Perhaps looking at the two use cases we can come to a better
> conclusion.
>
> > Does the above make sense? If so, please give it careful consideration
> > because I originally used the callback design and modified it to use
> > producers/consumers because I think it is actually simpler and is much
> > more flexible.
>
> Yes it makes sense I just want to see it and play with it. Can you whip
> it up and we'll begin getting a feel for both sets of interfaces.
>
> > If you're still not convinced I guess I'll have to give in and go with
> > the flow ;-)
>
> Nah we'll try to come to an understanding.

[Attachments: Producer.java, Consumer.java]
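The Producer.java and Consumer.java attachments are not preserved in the archive. A minimal sketch of what they might have looked like, reconstructed purely from the four points in the message above (every name and signature here is a guess, not the real attachment):

    // CodecException.java - all errors surface as exceptions (point 1 above).
    public class CodecException extends Exception {
        public CodecException(String message) {
            super(message);
        }
    }

    // Consumer.java
    public interface Consumer {
        // Process a block of data.
        void consume(Object data) throws CodecException;

        // Push buffered data down the chain without ending it (point 3).
        void flush() throws CodecException;

        // Signal the final block so codecs like base64 can emit padding
        // (point 2); named finish() here because finalize() would clash
        // with java.lang.Object.finalize().
        void finish() throws CodecException;

        // Control whether flush/finish calls propagate down the chain (point 4).
        void setPropagating(boolean propagating);
    }

    // Producer.java
    public interface Producer {
        // Attach the consumer that receives this producer's output.
        void setConsumer(Consumer consumer);
    }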
RE: [codec] StatefulDecoders
Alex,

I haven't had a chance to respond to your email yet. I'll try to do so tonight. I'll knock up a couple of quick interfaces for comparison at the same time.

Cheers,
Brett

> -----Original Message-----
> From: Alex Karasulu [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 24 March 2004 12:23 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] StatefulDecoders
>
> Brett,
>
> Ok let's take a breath and dive into this email :-).
RE: [codec] StatefulDecoders
> Take a look at the "Reclaiming Type Safety" section in this > article on the > > event notification pattern here: > > > > http://members.ispwest.com/jeffhartkopf/notifier/ > Cool, that's a neat way of achieving type safety. Avoiding downcasts (eg. Object to byte[]) is a good thing. It still relies on a runtime check but is only performed in one piece of code instead of every implementation of an event receiver. Advantages: Type safety enforced in a single class instead of using downcasts within each event receiver. Single event method defined in interfaces instead of methods per event type. No need to define separate interfaces per event type. Disadvantages: No compile time type checking, incorrect types may not be picked up during development. Runtime overhead to perform reflection on event receiver class and locate type specific event receiver method. Runtime overhead converting from generic event type to specific event type. I don't know if compile time or runtime checks should be used but if runtime checks are chosen then this pattern is a good way of enforcing them. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: [codec] StatefulDecoders
> -----Original Message-----
> From: Noel J. Bergman [mailto:[EMAIL PROTECTED]
> Sent: Monday, 8 March 2004 3:25 PM
> To: Jakarta Commons Developers List
> Subject: RE: [codec] StatefulDecoders
>
> Consider the Cocoon (http://cocoon.apache.org/) pipeline for the concept of
> a fully event-driven programming approach, although their implementation has
> far too much overhead for codec purposes (or the regex events I mentioned).

I still intend to look at this although my list of reading seems to grow daily ...

> IMO, we want a consistent interface that provides the fundamental
> operations, and then we can build convenience on top of that interface.
>
> Your interface is closer to what I had in mind than Alex's, at least using
> more of the generic terminology. The codec domain can be expressed using
> the pipeline interface, or if we want a codec specific interface, Alex's
> could be a convenience layer on top of the pipeline.
>
> What I had imagined is an approach where each element in the pipeline
> supports registering for a variety of event notifications. Some of them may
> be generic, some of them may be domain specific. Most codec uses would use
> generic events.
>
> The key is being able to go to an object and register for semantic events.
> So if you assume that a transformer is both a producer and a consumer, you
> could register a transformer with a datasource (producer), and register
> downstream consumers that want the decoded data with the transformer. Yes,
> we would want to allow both fan-in and fan-out where appropriate.

I think my design already supports most of the above ideas, they just have to be implemented as required for the particular usage. For example, a Base64Encoder already supports registering for events of type byte[]. A MIMEMultipartDecoder could generate events of type MIMEPart. The event types are not specified by the existing implementation, they can be added as necessary for the particular feature.

Fan-out is achieved by creating a multicast stage accepting single events from a producer and passing them to multiple consumers of the same type (although I haven't implemented a multicast stage because I haven't needed it yet :-); a sketch of one follows at the end of this message. Fan-in can already be handled by setting a single consumer as the destination for multiple producers.

Perhaps this isn't quite what you're envisaging. You may like to see a more generic approach that allows events and pipelines to be described in more abstract terms. Unfortunately I can't see a way of achieving this without making the API complex and imposing overhead. If you're looking for a more powerful approach, should it be implemented outside of codec where runtime issues aren't quite as critical?

I guess it depends on what problems you're trying to solve. If you wish to process large streams of data in an efficient manner my implementation is a good fit; if you're looking to process structured data (eg. MIME) it can be extended to fit as required; if you're looking to use it as the basis of communication and processing within a server then it isn't up to the task. However, isn't that last point outside the scope of codec and more in the realm of other designs/libraries such as SEDA?

> As for the details of message transport ... it seems to me that we already
> have multiple options, so I'm not sure that we want to roll our own versus
> adopting and/or adapting an existing one.
>
> We have JMS, and the new concurrency package coming in JDK 1.5. One thing
> that got botched in JDK 1.5 is that they removed the Puttable and Takable
> interfaces that Doug Lea had used in his library, instead merging their
> functionality directly onto the queue interface, falsely believing that a
> message queue is a Collection. A number of us argued for those interfaces,
> and Doug proposed a change, but it was vetoed by Sun.
>
> I see a few options, such as:
>
> - we pick up the necessary interfaces from Doug's concurrent
>   library, and deal with java.util.concurrent down the road.
>
> - we use JMS interfaces in a very simplified form.
>
> As potentially whacked as the idea might be, considering the complexity of
> JMS, I believe that we could selectively use JMS interfaces without undue
> complexity or hurting performance. Basically, we'd ignore the things that
> don't make any sense in our context. Take a look at MessageProducer,
> MessageConsumer and MessageListener. Intelligently, they are just
> interfaces. We don't need multi-threading, network transports, etc., in
> general, although by using those interfaces, they would be available where
> applications warranted them.
>
> ref:
> http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/jms/package-summary.html
>
> Alternatively, I had this odd thought that we
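A sketch of the multicast stage mentioned above, which Brett says he has not implemented. The ByteConsumer shape is assumed from the rest of the thread, so treat this as illustrative only:

    import java.util.ArrayList;
    import java.util.List;

    // Stand-ins for the library's types (assumed shapes).
    class CodecException extends Exception { }

    interface ByteConsumer {
        void consume(byte[] data, int offset, int length) throws CodecException;
    }

    // Fan-out stage: hands each incoming block to every registered consumer.
    public class MulticastByteConsumer implements ByteConsumer {
        private final List consumers = new ArrayList(); // elements are ByteConsumer

        public void addConsumer(ByteConsumer consumer) {
            consumers.add(consumer);
        }

        public void consume(byte[] data, int offset, int length) throws CodecException {
            for (int i = 0; i < consumers.size(); i++) {
                ((ByteConsumer) consumers.get(i)).consume(data, offset, length);
            }
        }
    }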
RE: [codec] StatefulDecoders
I just responded to one of your emails Alex, apologies for it being about 5 days late! I've also got some reading to do ...

> -----Original Message-----
> From: Alex Karasulu [mailto:[EMAIL PROTECTED]
> Sent: Monday, 8 March 2004 6:17 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] StatefulDecoders
>
> Noel,
>
> This will keep me busy for a while and thanks for taking the time to
> respond. Please bear with me as I try to understand your view point.
>
> Alex
RE: [codec] StatefulDecoders
> How about we put our minds together and finalize some of this stuff so I
> can start writing some codecs that can be added back to this project?

Yeah definitely, sounds like we're trying to solve the same problem here. I haven't responded to your previous emails because I haven't contributed before and was leaving opinions to those who've actually proven themselves.

> > > In general, I have long preferred the pipeline/event model to the
> > > approach that Alex had, where it would give data to the codec, and
> > > then poll it for
>
> Agreed! My approach was not the best but have you had a chance at looking
> at the new interfaces that I sent out with the callbacks? Shall I resend
> those?

I still have them here. I'll comment on them further down.

> Let me just list the requirements one more time:
>
> 1). Interfaces should allow for implementations that perform piecemeal
> decodes
>    - enables implementations to have constant sized processing footprints
>    - enables implementations to have efficient non-blocking and streaming
>      operation

Agreed.

> 2). Easily understood and simple to use

Agreed, although this needs to be weighed up with any conflicting requirements.

> 3). Interfaces should in no way shape or form restrict or limit the
> performance of implementations whatever they may be.

Agreed, although without knowing all of these implementations in advance we can never be sure ;-)

> > You're right, my design has no concept of structured content. It was
> > developed to solve a particular problem (ie. efficient streamable data
> > manipulation). If API support for structured content is required then
> > my implementation doesn't (yet) support it.
>
> You can build on this separately no? There is no need to have the codec
> interfaces take this into account other than allow this decoration in the
> future rather than inhibit it.

Yes, I can build on it separately, however a new set of producers and consumers is needed for each type of structured data. I don't see this as a problem because trying to make this too generic may lead to loss of performance and a complicated API.

> > I'll use engine for want of a better word to describe an element in a
> > pipeline performing some operation on the data passing through it.
>
> SEDA calls this a stage btw.

Much better :-)

> With codecs the encoding is variable right? It could be anything.
> Something has to generate events/callbacks that delimit logical units of
> the encoding whatever that may be. For some encodings that you mentioned
> (base64) there may not be a data structure but the unit of encoding must
> be at least two characters for base64 I think. Please correct me if I'm
> wrong.

3 byte input, 4 byte output for encoding, and 4 byte input, 3 byte output for decoding. Input is padded if not a multiple of 3 bytes (a small sketch of this follows below).

> So there is some minimum unit size that can range from one byte to
> anything and this is determined by the codec's encoding and reflected in
> some form of callback. SAX uses callbacks to allow builders that are
> content aware to do their thing right? Now I'm not suggesting that a
> base64 codec's encoder/decoder pairs make callbacks on every 2 or single
> byte (depending on your direction). In the case of such a non-structured
> decoder the buffer size would be the determining factor or the end of the
> stream.

Agreed.
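A minimal sketch of the block arithmetic just described, showing why a streaming base64 encoder needs an explicit end-of-data call. This is illustrative only, not Brett's Base64Encode class:

    public class StreamingBase64Sketch {
        private final byte[] pending = new byte[3];
        private int pendingCount;

        // Bytes arrive in arbitrary chunks but are only encodable in
        // complete 3-byte groups (each producing 4 output characters).
        public void consume(byte[] data, int off, int len) {
            for (int i = 0; i < len; i++) {
                pending[pendingCount++] = data[off + i];
                if (pendingCount == 3) {
                    emitGroup(pending, 3);
                    pendingCount = 0;
                }
            }
        }

        // Without this call the last 1 or 2 bytes would sit in the buffer
        // forever; this is the finalize notification discussed in the thread.
        public void finish() {
            if (pendingCount > 0) {
                emitGroup(pending, pendingCount); // emits '=' padding
                pendingCount = 0;
            }
        }

        private void emitGroup(byte[] group, int count) {
            // encoding table lookup and output omitted for brevity
        }
    }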
> So I think we need to use callbacks to let decoders tell us when they hit
> some notable event that needs attention whatever that may be.

I agree in principle here although I'm not sure that I agree with the structure of callbacks. I'll explain more later.

> > > operations. These are pipelines; receiving content on one end,
> > > performing operations, and generating events down a chain. More than
> > > one event could be generated at any point, and the chain can have
> > > multiple paths.
>
> This, the pipelining notion, IMHO is overly complicated for building out
> codec interfaces. The pipeline can be built from the smaller simpler parts
> we are discussing now. We must try harder to constrain the scope of a
> codec's definition.
>
> Noel as you know I have built servers based on pipelined components before
> and am trying it all over again. We must spare those wanting to implement
> simple codecs like base64 from these concepts let alone the language
> around them. The intended use of codecs by some folks may not be so
> grandiose. They may simply need it to just convert a byte buffer and be
> done with it. There is no reason why we should cloud this picture for the
> simple user.

I agree that we definitely don't want to introduce complexity and computational overhead for simple cases. However I think m
RE: [codec] StatefulDecoders
Noel,

Sorry about the delay, I've been away for a few days.

> In general, I have long preferred the pipeline/event model to the approach
> that Alex had, where it would give data to the codec, and then poll it for
> state. However, I don't see something in your implementation that I think
> we want. We want to be able to have structured content handlers and
> customized events depending upon the content handler and the registered
> event handlers. This could be particularly important in a streaming
> approach to MIME content. And I also desperately want a regex in this same
> model.

You're right, my design has no concept of structured content. It was developed to solve a particular problem (ie. efficient streamable data manipulation). If API support for structured content is required then my implementation doesn't (yet) support it.

I'll use engine, for want of a better word, to describe an element in a pipeline performing some operation on the data passing through it. An API aware of structured content shouldn't complicate the creation of simple engines such as base64 which pay no attention to data structure. Ideally, a structured API would extend an unstructured API and only those engines requiring structured features would need to use it.

I'm having trouble visualising a design that supports structured content without being specific to a particular type of structured content. Do you have some examples of what operations you would like a structured data API to support? Do you see interactions between pipeline elements being strongly typed?

My design uses the concepts of producers and consumers, and I'd like to see those ideas preserved. Engines are both consumers and producers but the first and last elements in a chain (or pipeline) are only producers and consumers respectively, allowing I/O to be decoupled from the pipeline operations. For example, my design uses an OutputStreamConsumer to write pipeline result data to an OutputStream, an OutputStreamProducer to receive data written to an OutputStream and pass it into a pipeline, and an InputStreamProducer to pump data from an input stream and pass it into a pipeline.

A structured content API can extend the producer/consumer ideas by passing data types understood by the structured content in question. For example, a multipart MIME decoding engine (a consumer of byte data, hence a ByteConsumer) could produce MIME parts (making it a MIMEPartProducer). A MIMEPartConsumer would receive MIMEPart objects (which are in turn ByteByteEngines but extended with a MIME type property) and connect them to a consumer capable of handling the byte data contained in the MIME part.

The above example would involve the definition of several new interfaces (MIMEPart extending ByteByteEngine adding a MIME type property, MIMEPartProducer extending Producer, MIMEPartConsumer extending Consumer) and new classes to implement the new interfaces with the desired behaviour; a rough sketch follows at the end of this message. Any other structured content types could be handled in similar ways with new "event" types being defined and relevant producer and consumer interfaces created to support them. Perhaps a more generic method can be devised but weak typing and degraded performance are hard to avoid.

> Drop the word "conversion".

Yep, agreed.

> Conversion is simply one of many possible operations. These are pipelines;
> receiving content on one end, performing operations, and generating events
> down a chain. More than one event could be generated at any point, and the
> chain can have multiple paths.

If the above can be achieved without introducing a large overhead (both runtime and coding overhead) for simple operations then it sounds good.

Is it worth considering the possibility of a pipeline receiving data from more than one source? This may be necessary when composing multipart MIME messages. Then again, a multipart MIME consumer class may be a better solution using similar ideas to those described earlier (ie. a MIMEPartConsumer which combines all parts into a single byte stream).

I'm not sure how much sense I've made above, hopefully some ;-)

Brett
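The rough sketch of the MIME interfaces described above, reconstructed purely from the prose. None of these exist in the posted library, and the method names are guesses:

    // Stand-ins for the library's base types (assumed).
    interface Producer { }
    interface Consumer { }
    interface ByteByteEngine extends Producer, Consumer { }
    class CodecException extends Exception { }

    // A decoded part: a byte engine extended with the MIME type property.
    interface MIMEPart extends ByteByteEngine {
        String getMimeType(); // e.g. "text/plain"
    }

    interface MIMEPartProducer extends Producer {
        void setMIMEPartConsumer(MIMEPartConsumer consumer);
    }

    interface MIMEPartConsumer extends Consumer {
        // Receives each decoded part and routes its byte content to a
        // consumer capable of handling it.
        void consumePart(MIMEPart part) throws CodecException;
    }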
RE: [codec] StatefulDecoders
I probably sound like a broken record but here goes :-) If I'm barking up the wrong tree, let me know and I'll stop making noise on this list ...

Many of the problems being discussed here have been solved in the library I've posted previously. An up-to-date version can be found here:

http://www32.brinkster.com/bretthenderson/bhcodec-0.7.zip

It uses generic interfaces for communication between all components that allow the use of streams, byte arrays, NIO, etc. to be plugged together as necessary. NIO isn't currently supported but I expect it would be trivial to add it. The library can be visualised as a collection of data consumers and producers (a codec engine implements both). No distinction is made between encoding and decoding (they are the same thing in my view from an API perspective).

One problem I see with the current codec project is that every new use case that is envisaged tends to require extensions to the current interfaces. The above library is designed to be more generic and allow a more pluggable approach where new functionality doesn't impact every codec implementation.

It uses a push model internally but pull-model utility classes wrapped around underlying push classes can be used to implement pull functionality where necessary (a sketch of this idea follows at the end of this message). It does not require JDK 1.4 although NIO could be plugged in if necessary.

I understand that people don't want to spend time looking at every pet project people have come up with but I think this could be useful in commons. There is a lot to look at and I guess that is discouraging people from taking the time to look at it.

Should I propose this library as a separate project? (How do I do this?) Perhaps as a more generic codec library that could potentially be used by commons-codec once it has matured. It may be too large a change to fit into the existing codec project as it currently stands.

I've offered this several times now and while there doesn't seem to be any major opposition to the idea, there hasn't been strong support either. I'm not sure how to proceed. Is this something that can be placed in a sandbox for people to play with?

Cheers,
Brett

> -----Original Message-----
> From: Noel J. Bergman [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 24 February 2004 2:25 PM
> To: Jakarta Commons Developers List
> Subject: RE: [codec] StatefulDecoders
>
> > This brings up an interesting issue: How do we potentially package and
> > deliver some code that depends on Java 1.4. In a second [codec] jar?
>
> There are several issues, but let me address what I consider to be the key
> one: we have to design the core code as push-model. If we were to design
> the code as pull-model, we would lose the thread of execution inside the
> callee. We don't want the callee blocking on I/O and returning when
> finished. But with a non-blocking callee, we can then use either a NIO or
> IO wrapper as necessary.
>
> Obviously the interface between the I/O handling wrapper and the data
> handling core will have to be Java 2 < 1.4 compatible.
>
> --- Noel
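A sketch of the pull-over-push wrapper just mentioned. This is a guess at how such a utility could look; the stand-in types below, including the isEmpty()/readByte() accessors and the finish() call, are assumptions rather than the library's actual API:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Stand-ins approximating the library's types; signatures assumed.
    class CodecException extends Exception {
        CodecException(String message) { super(message); }
    }

    interface ByteEngine {
        void setConsumer(BufferByteConsumer consumer); // simplified for the sketch
        void consume(byte[] data, int offset, int length) throws CodecException;
        void finish() throws CodecException;           // end-of-data notification
    }

    // Naive buffer consumer; illustration only (copies on every read).
    class BufferByteConsumer {
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
        private int readPos;
        public void consume(byte[] data, int offset, int length) {
            buf.write(data, offset, length);
        }
        public boolean isEmpty() { return readPos >= buf.size(); }
        public int readByte() { return buf.toByteArray()[readPos++] & 0xff; }
    }

    public class PullWrapperInputStream extends InputStream {
        private final InputStream source;
        private final ByteEngine engine;   // push-model codec stage
        private final BufferByteConsumer output = new BufferByteConsumer();
        private final byte[] readBuf = new byte[4096];
        private boolean finished;

        public PullWrapperInputStream(InputStream source, ByteEngine engine) {
            this.source = source;
            this.engine = engine;
            engine.setConsumer(output);    // the engine pushes its results here
        }

        public int read() throws IOException {
            try {
                // Pull: keep pushing source data through the engine until the
                // output buffer has something to hand back, or input runs out.
                while (output.isEmpty() && !finished) {
                    int n = source.read(readBuf);
                    if (n < 0) {
                        engine.finish();   // lets e.g. base64 emit its padding
                        finished = true;
                    } else {
                        engine.consume(readBuf, 0, n);
                    }
                }
                return output.isEmpty() ? -1 : output.readByte();
            } catch (CodecException e) {
                throw new IOException(e.getMessage());
            }
        }
    }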
RE: [codec] More thoughts on CharSets and Encoders (references: RE: [codec] Streamable Codec Framework)
> Does CharSet/Util's in [lang] approach a similar > functionality to nio.charset? After reviewing the codebase, > my viewpoint is no, as it is more for "building" charsets, > than for using them (authors rebuttals always welcome). I'd also be interested to see if this functionality exists somewhere. > I think [httpclients] static ASCII methods (once in > HttpConstants) now also in [codec-multipart] are very similar > in functionality to the idea of CharsetEncoders/Decoders of > nio.charset. > > So we begin to have functionality for charset's in [lang] and for > encoders in [codec]. How do we bring this all together? I'd > like to see > similar CharsetEncoding/Decoding capabilities as nio (with > the eventual > goal of actually having Jakarta Commons converge to using nio of > Charsets in the future. > > As a possible bridge for transition I think a CharsetEncoder > API in [codec] that duplicates that of nio.charset would form > an excellent path for convergence. The eventual goal once > j2sdk1.3 was no longer in service would be to simply refactor > Apache Projects dependent on this API to use NIO instead. Does the CharSetEncode class in my library approach the functionality you require? http://www32.brinkster.com/bretthenderson/BHCodec-0.6.zip Internally it uses an OutputStreamWriter which leverages JDK functionality albeit in a somewhat inelegant way. I would expect performance to be fairly reasonable however. I intend to write a corresponding CharSetDecode class but haven't gotten around to this yet. If you have any interest I can up the priority. It will use an InputStreamReader internally unless better alternatives are found. If at some point in the future JDK 1.4 becomes an accepted base I will be reworking CharSetEncode to use java.nio features because they provide a cleaner interface than wrapping streams. > >> If JDK1.4 is considered a sufficient base, I could > > > > I think tha considering 1.3.1 as the base requirement is safe. > > Unfortunately, as discussed on this list under various > heading, making > > 1.4 a requirement is too aggressive. > > > > Gary > > Yes, we're still supporting 1.3 in many cases, BUT, wouldn't we want > convergence eventually to the API's provided by the j2sdk? > AND, by that > point in the future, is j2sdk 1.3 even going to be in play? I will always be leaving a CharSetEncode feature in my library because it allows charset conversion to be performed within a processing chain but I would see the internal implementation moving to java.nio eventually. Brett - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: [codec] multipart encoders/decoders
> > There are obviously advantages to having a single unified framework
> > and if possible it would be the ideal result. Unfortunately I have run
> > into performance disadvantages so far. I haven't tried it for a while
> > but in the past my base 64 conversion has not been as fast as the
> > existing codec implementation for small conversions. For common
> > algorithms such as base 64 it may make sense to have two
> > implementations optimised for different purposes.
>
> That does not seem justified at first. Optimize last if at all... ;-)

Hehe, you're right. I guess it just feels wrong pushing for stream support in codec when its introduction will incur overhead for non-streamed cases. Of course in 99.9% of those cases the performance difference will be immeasurable in the overall application :-)

> > In addition, I'm not familiar with language codecs but you mentioned
> > it makes no sense to use these in streams.
>
> One of the things to keep in mind is that for simple cases the f/w should
> be invisible to the client code. For example:
>
> DigestUtil.md5Hex(new FileInputStream("boo.txt"));
>
> Gary

Hmm, that is definitely worth remembering. The more generic I made the design, the more coding was required in order to use it :-( Perhaps a symptom of over-engineering, I hope not.

There are a few ways I can think of dealing with this.

1. Do nothing. Force people to learn a new and more complicated API.
2. Create a new API that supports streaming, leaving the existing API in place for the existing functionality and common use cases not requiring stream support.
3. Add stream support to the existing API.
4. Create an API supporting stream processing and re-implement the existing API using it.

Of these I think:

1. A non-starter but I had to list it. Backwards compatibility and usability being two reasons.
2. This is a valid approach but leaves two distinct code-bases to support. I hope there are other options available.
3. In most projects this tends to be the way things are done. In this case I'm not sure that it's practical; it may get fairly messy and create an unmaintainable codebase. I really need to spend more time looking at the existing APIs in detail though.
4. I think a variation on this idea could work well in practice. Codec could be conceptually designed in various layers. It could have a low level API that is modular and supports stream based processing. My library or some equivalent would fit this purpose. A second layer could then provide simplified access to the library for the most common use cases, implementing the existing API and adding new functionality as desired.

To give an example, the md5Hex call above could be implemented as follows:

    // DigestUtil.md5Hex(new FileInputStream("boo.txt"));
    public class DigestUtil {
        ...
        public static String md5Hex(InputStream inputStream) throws CodecException {
            BufferByteConsumer result = new BufferByteConsumer();
            ChainByteEngine chain = new ChainByteEngine(result);
            chain.append(new MD5());
            chain.append(new AsciiHexEncode());
            new InputStreamProducer(chain, inputStream).pump();
            return new String(result.read());
        }
        ...
    }

There's some overhead in initialisation but most classes are fairly lightweight. All of the above classes have been implemented if you wish to have a look. I updated some of the classes last night, a copy can be found at:

http://www32.brinkster.com/bretthenderson/BHCodec-0.6.zip

Brett
RE: [codec] multipart encoders/decoders
> Here are a few good rules of thumb:
>
> 1. Commons exists as an effort to encourage code reuse. The Streamable
> framework presented was interesting, but I'd like us to find an existing
> streamable Base64 implementation inside of the ASF codebase.

I have no problems with this but so far I haven't seen anything like this that doesn't sub-class InputStream and OutputStream. Sub-classing InputStream and OutputStream is problematic because it forces you to code algorithms around IOStream semantics (InputStream coding is not simple, especially the available() method) and it forces you to make the encoder sub-class OutputStream and the decoder sub-class InputStream, unless you write an implementation for each stream type. Providing a single InputStream implementation that can use an underlying codec engine simplifies development and testing of new algorithms considerably and removes the distinction between input and output streams.

> 3. No need to expressly focus on a framework (at all). Codec is FIRSTLY a
> functional beast, even if the solution is inelegant. If there is an
> existing streamable Base64 in ASF, I'd recommend copying it outright and
> placing it in the codec package. Over time, it can move towards a unified
> streamable framework.

I'm often guilty of over designing things, hey it's fun :-) Would it help if I didn't call my classes a "framework"? It's no more than a few common interfaces and some implementation codecs. There are no factories, or service providers, or other abstractions complicating things. It really isn't much different to the existing codec interfaces except that they are written to support streaming and separate input and output interfaces.

I really think there's a need for all streamable codecs to follow common interfaces. Perhaps InputStream and OutputStream are sufficient but I think there's a better way.
RE: [codec] multipart encoders/decoders
> [snip]
> > 1. Commons exists as an effort to encourage code reuse. The
> > Streamable framework presented was interesting, but I'd like us to
> > find an existing streamable Base64 implementation inside of the ASF
> > codebase.
>
> Not for Base64 but Ant has:
>
> o MD5 and SHA checksum computation:
>   http://ant.apache.org/manual/CoreTasks/checksum.html

Everyone will be getting tired of my emails soon ...

I've had a look at Ant and they are using java.security.MessageDigest directly. I think it goes back to JDK 1.2 so I assume it's okay to use within codec. It supports MD5 and SHA-1. We would have to create wrapper classes if we want them to support the relevant codec interfaces but this should be straightforward (see the sketch below). Of course if we want to implement other algorithms such as SHA-256 or SHA-512 we either have to write our own or rely on the user to have the relevant security providers installed within their JVM.

It would be interesting to compare performance between the Sun provided MD5 and codec.
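The wrapper idea above could look something like this: java.security.MessageDigest exposed in the shape of the library's byte consumer (the consume signature here is assumed from the thread, not a confirmed interface):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class MessageDigestConsumer {
        private final MessageDigest digest;

        public MessageDigestConsumer(String algorithm) throws NoSuchAlgorithmException {
            digest = MessageDigest.getInstance(algorithm); // "MD5", "SHA-1"
        }

        // Matches the assumed shape of the library's ByteConsumer interface.
        public void consume(byte[] data, int offset, int length) {
            digest.update(data, offset, length);
        }

        // Completes the hash; MessageDigest resets itself after digest().
        public byte[] result() {
            return digest.digest();
        }
    }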
RE: [codec] multipart encoders/decoders
> It is accomplished under "jakarta-commons-sandbox/codec-multipart".
>
> > (2) Can we agree on /what/ streamable codecs are (sorry but I like to
> > point out the obvious when starting something like this). Recognize
> > the current impls alternatives.
>
> Yes sorry, I think there are two ideas running around here:
>
> (a) Actual "inline" Stream Encoders/Decoders (SSL etc.) that require no
> knowledge of the length of the content. Probably extend
> "FilterOutputStreams" etc.
>
> (b) Encoders/Decoders that actually work by passing their content
> through streaming to manage larger amounts of data efficiently. Data for
> which the length is probably already known (Files). An interface which
> supports handing objects and Streams manages this:

Can you give some examples of algorithms where the length needs to be known in advance? My code may break horribly with such an algorithm :-(
RE: [codec] multipart encoders/decoders
> (3) Should the Producer/Consumer framework submitted be retrofitted into
> the current [codec] Encoder/Decoder framework? Personally, I like the
> specificity of "Encoder" and "Decoder" for interfaces. This means the
> current i/f would be expanded.

I'm not terribly attached to Producer and Consumer but couldn't think of a better alternative. Perhaps Source and Sink, but they may be no better.

I'm not keen on Encoder and Decoder because I don't believe there's a need to make the distinction between the two, plus I might have to rewrite some code ;-) Both encoders and decoders are processing data, it is the algorithm that decides if it is encoding or decoding. In some cases the words encode and decode may not make sense. If you're modifying line endings on a file are you encoding or decoding?

Producer and Consumer relate to different things. Base64Encode for example implements ByteProducer and ByteConsumer by way of the ByteEngine interface because it consumes byte data and produces byte data. OutputStreamConsumer only implements ByteConsumer because it consumes byte data but sends the data to a location outside the scope of the library (ie. an OutputStream).

> (4) The other way around: should the current [codec] be recast in the
> proposed Producer/Consumer f/w. I am not wild about the genericity of
> "Producer" and "Consumer" as names.
>
> (5) I am assuming that two f/w's in [codec] are undesirable. It would be
> good to agree or disagree on this previous statement as a starter! ;-)

There are obviously advantages to having a single unified framework and if possible it would be the ideal result. Unfortunately I have run into performance disadvantages so far. I haven't tried it for a while but in the past my base 64 conversion has not been as fast as the existing codec implementation for small conversions. For common algorithms such as base 64 it may make sense to have two implementations optimised for different purposes.

In addition, I'm not familiar with language codecs but you mentioned it makes no sense to use these in streams.
RE: [codec] Streamable Codec Framework
> I suspect we are going to need something along the lines of a "Streamable
> Encoder/Decoder" for the Multipart stuff. If we look at HttpClient's
> MultipartPostMethod, there is a framework for basically joining multiple
> sources (files, strings, byte arrays) together into an OutputStream which
> is multipart encoded. I want to attempt to maintain this strategy when
> isolating the code out of HttpClient and into the multipart sandbox
> project. I suspect that your Streamable Consumer/Producer stuff could
> also be advantageous for multipart encoding/decoding. At least I want to
> make sure we're not inventing the same wheel.

I'll try to look at the HttpClient code to get a feel for how it hangs together. From what I can gather my code should plug in fairly cleanly. My code doesn't specify any type of IO interface as any interface can be adapted in by implementing the relevant consumers and producers. I've tried to design the framework such that the actual codec algorithms have no knowledge of the source or destination of the data they process. This allows them to be far more generic and greatly increases their usefulness.

> Specifically, I see we're going to need interfaces other than the
> existing codec ones because they pass around byte[] and Object when
> encoding/decoding. We need to maintain that the content will be streamed
> from its native datastructure when it's consumed by a consumer
> (HttpClient MultipartPost for instance) or, when it is used to "decode",
> that the Objects produced are built efficiently off an InputStream (ie.
> Files are immediately written to the FileSystem, Strings or byte[]s are
> maintained in memory).

My framework doesn't specify any particular type of data although byte oriented processing is the only fleshed out implementation at the moment. All it cares about is that a producer is available to generate data from an external source and a matching consumer is available to pass it to a destination. Every producer must have a matching consumer. A consumer can be called directly by clients. Typically an engine (implementing both consumer and producer) will sit in the middle performing some kind of translation/encoding/decoding on the data. It "consumes" input data and "produces" output data.

Using this structure, processing chains can be defined so that multiple transforms can be performed on the same data all in a stream oriented fashion. To cut a long story short, chains can be defined to access data from streams/buffers/etc, perform relevant translations (re-using small in-memory buffers to eliminate garbage collection) and pass data to output streams/buffers/etc. Due to the stream support, data of arbitrary size can be processed.

> Either way, I'm currently "tidying" up a maven project directory to be
> committed into the sandbox for the new multipart codec stuff. Once it's
> in place we could add your code to it as well.

Let me know if you want to import any of my code and I'll do any necessary package reorganisation.
RE: [codec] Streamable Codec Framework
Thanks for the reply.

Yep, it is a completely different framework. I wrote the framework before looking at the current commons codec component so there are no relationships between the two. Hopefully there are ways of incorporating ideas between the two. Are you hoping to incorporate streamed processing into the existing design or create new classes to achieve this? I have no preferences either way but it could be tricky to re-use the existing interfaces. I'll have to look at this further though.

I'll look at Ant as soon as I can to see how it approaches the problem.

Hmm, your simple example already uncovers a gap in my design :-) Should be easily solved though. Phew. Currently I can't process all data off an input stream writing to a destination without some manual coding. However I can achieve this by creating a "Producer" that reads InputStreams. I will call this InputStreamProducer. My InputStreamProducer will implement ByteProducer and will have a method (eg. pump()) which pumps data from the provided input stream to the "ByteConsumer" attached to it.

Using my new InputStreamProducer I can perform MD5 hex encoding of an input stream creating a result String as follows:

    // Create processing objects.
    MD5 md5 = new MD5();
    AsciiHexEncode asciiHexEncode = new AsciiHexEncode();
    InputStreamProducer source = new InputStreamProducer(inputStream);
    BufferByteConsumer resultBuf = new BufferByteConsumer();
    String result;

    // Set up processing chain.
    source.setConsumer(md5);
    md5.setConsumer(asciiHexEncode);
    asciiHexEncode.setConsumer(resultBuf);

    // Process all available data.
    source.pump();

    // Obtain result hash.
    result = new String(resultBuf.getData());

If I eliminate all calls to ".setConsumer()" by adding the necessary constructors to accept consumers, the above code can be shortened to:

    BufferByteConsumer resultBuf = new BufferByteConsumer();
    String result;

    // Create md5 hash of data from inputStream.
    new InputStreamProducer(inputStream,
        new MD5(new AsciiHexEncode(resultBuf))).pump();
    result = new String(resultBuf.getData());

What do you think? Static utility methods could simplify the above even further if necessary. It's definitely more complex than the existing approach but it is very flexible in that it allows arbitrary processing chains to be defined and allows for simple integration with IO Streams. Each class performs a very small, well-defined purpose and can be coupled to build complex processing chains. Processing should be efficient although more setup time is required.

Supporting Reader/Writer and any other IO classes should be as simple as defining the relevant Consumer/Producer implementations to interact with them. Codec algorithms won't require modification.

Cheers,
Brett

> -----Original Message-----
> From: Gary Gregory [mailto:[EMAIL PROTECTED]
> Sent: Friday, 9 January 2004 4:15 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Streamable Codec Framework
>
> Hello,
>
> Streamable codecs make a lot of sense for some codecs (but perhaps not
> for the language codecs). Thanks for bringing the topic up. I took a very
> quick look at the code you refer to and it seems to be a separate
> framework from what we have in [codec] today (I could be wrong of
> course), especially the whole Producer/Consumer business.
>
> A simple example I can think of that could drive an implementation could
> be:
>
> InputStream inputStream = ... new File(...);
> DigestUtil.md5Hex(inputStream);
>
> It would be interesting to see how Ant implements MD5 and SHA.
>
> This probably means that Encoder.encode(Object) should also handle
> I/O/Streams and Reader/Writer...
>
> Gary
RE: [codec] Streamable Codec Framework
There seemed to be definite interest in streamable codecs but the list has gone fairly quiet. I am interested in participating in work of this kind but I'm not sure how to proceed. I don't think this deserves to be a standalone project as it seems to fit fairly well into the scope of the current codec package and I don't want to step on any toes with respect to the existing codec project. I believe Gary Gregory and Tim O'Brien are the two primary codec committers. Gary and Tim, your thoughts would be most appreciated.

I'm providing the code mentioned in the below message as an example because I believe it is more effective to discuss working code than talk about abstract ideas.

> -----Original Message-----
> From: Brett Henderson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 13 November 2003 10:33 AM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Streamable Codec Framework
>
> I made some changes to the code I supplied previously, it can be found at
> the following URL.
>
> http://www32.brinkster.com/bretthenderson/BHCodec-0.5.zip
>
> The main differences relate to the codec interfaces and support for data
> types other than "byte", the encoding algorithms are largely unchanged.
>
> A quick summary of the framework is as follows:
>
> The framework is based around consumers and producers; consumers accept
> incoming data and producers produce outgoing data. A consumer implements
> the Consumer interface and a producer implements the Producer interface.
>
> Specialisations of these interfaces are used for each type of data to be
> converted. For example there are currently ByteConsumer, ByteProducer,
> CharConsumer and CharProducer interfaces.
>
> The engine package contains classes (and interfaces) that are both
> consumers and producers (ie. accept incoming data and produce result
> data). For example there is a ByteEngine interface that extends the
> ByteConsumer and ByteProducer interfaces and is in turn implemented by
> the Base64Encode concrete class.
>
> Engines may consume one kind of data and produce another; the
> CharByteEngine interface defines an engine that consumes characters and
> produces bytes. This is implemented by the CharSetEncode class
> (untested).
>
> The consumer package contains classes that consume data and perform an
> action on the data that doesn't allow it to be accessed via producer
> functionality. For example, the BufferByteConsumer class acts as a
> receiving buffer for encoding results, and the OutputStreamConsumer
> writes all data to an OutputStream.
>
> The producer package contains classes that produce data for the framework
> but don't accept data via consumer functionality. For example, the
> OutputStreamProducer is an OutputStream that "produces" all data passed
> to it.
>
> The io package contains classes that fit into the java.io functionality
> and are neither consumers nor producers in the framework sense. For
> example, the CodecOutputStream is a FilterOutputStream that uses an
> internal ByteEngine to perform a transformation on the data passing
> through it.
>
> JUnit tests exist for most classes in the framework. All testing is
> performed using JUnit. If there is no unit test for a class, it can be
> considered untested.
>
> The framework is now generic enough to handle data of any type and allow
> classes to be defined which can accept any kind of data and/or produce
> any kind of data. All data can be processed in a "streamy" fashion. For
> example, encoding engines implementing the ByteEngine interface can be
> plugged into CodecOutputStream or CodecInputStream and used for stream
> functionality without directly supporting java.io streams.
>
> Using the CharSetEncode and (currently non-existent) CharSetDecode, it
> should be possible to encode character data to base64 then write the
> result to a Writer. This should go part way towards helping Konstantin
> with his XML conversions.
>
> Sorry about the brain dump but there is a fair bit contained in the zip
> file and I thought some explanation would be useful.
>
> Any feedback on the above is highly welcome. I don't plan on making too
> many more changes unless it is deemed useful.
>
> Brett
RE: [codec] Streamable Codec Framework
I made some changes to the code I supplied previously, it can be found at the following URL.

http://www32.brinkster.com/bretthenderson/BHCodec-0.5.zip

The main differences relate to the codec interfaces and support for data types other than "byte", the encoding algorithms are largely unchanged.

A quick summary of the framework is as follows:

The framework is based around consumers and producers; consumers accept incoming data and producers produce outgoing data. A consumer implements the Consumer interface and a producer implements the Producer interface.

Specialisations of these interfaces are used for each type of data to be converted. For example there are currently ByteConsumer, ByteProducer, CharConsumer and CharProducer interfaces.

The engine package contains classes (and interfaces) that are both consumers and producers (ie. accept incoming data and produce result data). For example there is a ByteEngine interface that extends the ByteConsumer and ByteProducer interfaces and is in turn implemented by the Base64Encode concrete class.

Engines may consume one kind of data and produce another; the CharByteEngine interface defines an engine that consumes characters and produces bytes. This is implemented by the CharSetEncode class (untested).

The consumer package contains classes that consume data and perform an action on the data that doesn't allow it to be accessed via producer functionality. For example, the BufferByteConsumer class acts as a receiving buffer for encoding results, and the OutputStreamConsumer writes all data to an OutputStream.

The producer package contains classes that produce data for the framework but don't accept data via consumer functionality. For example, the OutputStreamProducer is an OutputStream that "produces" all data passed to it.

The io package contains classes that fit into the java.io functionality and are neither consumers nor producers in the framework sense. For example, the CodecOutputStream is a FilterOutputStream that uses an internal ByteEngine to perform a transformation on the data passing through it.

JUnit tests exist for most classes in the framework. All testing is performed using JUnit. If there is no unit test for a class, it can be considered untested.

The framework is now generic enough to handle data of any type and allow classes to be defined which can accept any kind of data and/or produce any kind of data. All data can be processed in a "streamy" fashion. For example, encoding engines implementing the ByteEngine interface can be plugged into CodecOutputStream or CodecInputStream and used for stream functionality without directly supporting java.io streams.

Using the CharSetEncode and (currently non-existent) CharSetDecode, it should be possible to encode character data to base64 then write the result to a Writer. This should go part way towards helping Konstantin with his XML conversions.

Sorry about the brain dump but there is a fair bit contained in the zip file and I thought some explanation would be useful.

Any feedback on the above is highly welcome. I don't plan on making too many more changes unless it is deemed useful.

Brett
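To make the layering above concrete, a reconstruction of the interface relationships from the prose. The real declarations are in the zip; the method names here are assumptions:

    // Marker interfaces; type-specific methods live in the specialisations.
    interface Consumer { }
    interface Producer { }

    class CodecException extends Exception { }

    interface ByteConsumer extends Consumer {
        void consume(byte[] data, int offset, int length) throws CodecException;
    }

    interface ByteProducer extends Producer {
        void setConsumer(ByteConsumer consumer);
    }

    interface CharConsumer extends Consumer {
        void consume(char[] data, int offset, int length) throws CodecException;
    }

    // A stage that consumes bytes and produces bytes, e.g. Base64Encode.
    interface ByteEngine extends ByteConsumer, ByteProducer { }

    // A stage that consumes chars and produces bytes, e.g. CharSetEncode.
    interface CharByteEngine extends CharConsumer, ByteProducer { }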
RE: [codec] Streamable Codec Framework
1.2.2 it is then :-)

I agree with maintaining 1.2.2 compatibility, it is a bit harsh to require 1.4 to perform base64 encoding. Unfortunately it would make life a lot easier with regards to charset encoding ...

It should be possible to use OutputStreamWriter and InputStreamReader internally to perform the conversions without incurring much of a performance overhead. For example a CharByteEngine??? could use OutputStreamWriter internally to perform charset encoding. In many cases OutputStreamWriter and InputStreamReader can be used directly, it is the cases where byte to char conversion is required during output streaming that require an encoder for transforming between chars and bytes. Perhaps I'm missing something here though ...

I also think it would be useful to be able to perform charset conversion without depending on streams.

> -----Original Message-----
> From: Gary Gregory [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 11 November 2003 4:19 AM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Streamable Codec Framework
>
> Yes, no problem, 1.2.2.
>
> Gary
>
> > -----Original Message-----
> > From: Tim O'Brien [mailto:[EMAIL PROTECTED]
> > Sent: Monday, November 10, 2003 08:10
> > To: Jakarta Commons Developers List
> > Subject: RE: [codec] Streamable Codec Framework
> >
> > Oleg, this is understood - 1.2.2 should be our LCD for codec.
> >
> > Tim
> >
> > On Mon, 10 Nov 2003 [EMAIL PROTECTED] wrote:
> >
> > > Tim, Gary, et al
> > > Streamable codec framework would be a welcome addition to Commons
> > > Codec. However, as far as we (Commons HttpClient) are concerned, the
> > > decision to ditch java 1.2.2 support would render Codec unusable for
> > > us (and I'd guess a few other projects that still need to maintain
> > > java 1.2.2 compatibility). Not that we like it too much, but because
> > > lots of our users still demand it.
RE: [codec] Streamable Codec Framework
I think the design of the codec framework could cover your requirements but it will require more functionality than it currently has. > > > > Some of the goals I was working towards were: > > > > 1. No memory allocation during streaming. This eliminates > > > > garbage collection during large conversions. > > > Cool. I got large conversions... I'm already at > > > mediumblob in mysql , and it goes up/down XML > > stream > > > :) > > > > I have a lot to learn here. While I have some > > knowledge > > of XML (like every other developer on the planet), I > > have never used it for large data sets or used SAX > > parsing. > > Sounds like a good test to find holes in the design > > :-) > > It's easy. You got callback, where you can gobble up > string buffers with incoming chars for element > contents. ( and there is a lot of this stuff... ) > After tag is closed, you have all the chars in a big > string buffer, and get another callback - in this > callback you have to convert data, and do whatever > necessary ( in my case, create input stream, and pass > it to database ) This could be tricky, it's something I've been thinking about but would like feedback from others about the best way of going about it. The data you have available is in character format. The base64 codec engine operates on byte buffers. The writer you want to write to requires the data to be in character format. I have concentrated on byte processing for now because it is the most common requirement. XML processing requires that characters be used instead. It makes no sense to perform base64 conversion on character arrays directly because base64 is only 8-bit aware (you could split each character into two bytes but this would blow out the result buffer size where chars only contain ASCII data). I think it makes more sense to perform character to byte conversion separately (perhaps through extensions to existing framework) and then perform base64 encoding on the result. I guess this is a UTF-16 to UTF-8 conversion ... What support is there within the JDK for performing character to byte conversion? JDK1.4 has the java.nio.charset package but I can't see an equivalent for JDK1.3 and lower, they seem to use com.sun classes internally when charset conversion is required. If JDK1.4 is considered a sufficient base, I could extend the current framework to provide conversion engines that translate from one data representation to another. I could then create a new CodecEngine interface to handle character buffers (eg. CodecEngineChar). > > > > 3. Customisable receivers. All codecs utilise > > > > receivers to > > > > handle conversion results. This allows > > different > > > > outputs such as > > > > streams, in-memory buffers, etc to be supported. > > > > > > And writers :) Velocity directives use them. > > > > Do you mean java.io.Writer? If so I haven't > > included > > direct support for them because I focused on raw > > byte > > streams. However it shouldn't be hard to add a > > receiver to write to java.io.Writer instances. > > > My scenarios: > - I'm exporting information as base64 to XML with help > ov velocity. I do it through custom directive - > in this directive I get a Writer from velocity, where > I have to put my data. > > Ideally codec would do: read input stream - encode - > put it into writer without allocating too much > memory. 
> I'm importing information:
> - I have a stream (string) of base64 data -
> codec gives me an input stream which is fed from this
> source, does not allocate too much memory, and behaves
> politely...

The current framework doesn't handle direct conversion from an input stream to an output stream, but this would be simple to add if required; the sketch below shows roughly what the import case could look like. Again, the hard part would be the char/byte issues.
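For the import scenario, something along these lines would work even with the existing buffer-based codec. This is illustrative only: it assumes the encoded input contains no whitespace or line breaks (in the streamable framework, line unwrapping would be a separate codec chained in front), and it still allocates a small result array per chunk, which the engine-based approach avoids.

import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.codec.binary.Base64;

// Sketch only: decodes base64 from an underlying stream in bounded chunks.
public class Base64DecodingInputStream extends InputStream {

    private final InputStream in;
    private final byte[] pending = new byte[4096]; // encoded bytes awaiting a full quantum
    private int pendingLen = 0;
    private byte[] decoded = new byte[0];
    private int pos = 0;
    private boolean eof = false;

    public Base64DecodingInputStream(InputStream in) {
        this.in = in;
    }

    public int read() throws IOException {
        while (pos >= decoded.length) {
            if (!fill()) {
                return -1;
            }
        }
        return decoded[pos++] & 0xff;
    }

    public void close() throws IOException {
        in.close();
    }

    // Reads more encoded data and decodes the largest whole number of
    // 4-byte base64 quanta, carrying any remainder over to the next call.
    private boolean fill() throws IOException {
        if (eof) {
            return false;
        }
        int n = in.read(pending, pendingLen, pending.length - pendingLen);
        if (n < 0) {
            eof = true;
        } else {
            pendingLen += n;
        }
        int usable = eof ? pendingLen : pendingLen - (pendingLen % 4);
        if (usable == 0) {
            return !eof; // need more input, or genuinely finished
        }
        byte[] chunk = new byte[usable];
        System.arraycopy(pending, 0, chunk, 0, usable);
        System.arraycopy(pending, usable, pending, 0, pendingLen - usable);
        pendingLen -= usable;
        decoded = Base64.decodeBase64(chunk);
        pos = 0;
        return true;
    }
}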
RE: [codec] Streamable Codec Framework
> > I noticed Alexander Hvostov's recent email containing streamable
> > base64 codecs. Given that the current codec implementations are
> > oriented around in-memory buffers, is there room for an
> > alternative codec framework supporting stream functionality? I
> > realise the need for streamable codecs may not be that great but
> > it does seem like a gap in the current library.
>
> I'm in the need. So we are at least 3 :)
>
> > Some of the goals I was working towards were:
> > 1. No memory allocation during streaming. This eliminates
> > garbage collection during large conversions.
>
> Cool. I got large conversions... I'm already at
> mediumblob in mysql, and it goes up/down XML stream :)

I have a lot to learn here. While I have some knowledge of XML (like every other developer on the planet), I have never used it for large data sets or used SAX parsing. Sounds like a good test to find holes in the design :-)

> > 4. Customisable receivers. All codecs utilise receivers to
> > handle conversion results. This allows different outputs such as
> > streams, in-memory buffers, etc to be supported.
>
> And writers :) Velocity directives use them.

Do you mean java.io.Writer? If so, I haven't included direct support for them because I focused on raw byte streams. However, it shouldn't be hard to add a receiver to write to java.io.Writer instances - a rough sketch follows below.

> I'll give it a look and come back later today :)

I look forward to your feedback.
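Such a receiver could be as simple as the following sketch. The receive signature here is illustrative rather than the framework's actual Receiver interface; it relies on base64/hex output being pure ASCII, so a plain byte-to-char widening is safe.

import java.io.IOException;
import java.io.Writer;

// Sketch only: forwards codec output to a java.io.Writer. The char buffer
// is reused across calls, so no memory is allocated during streaming.
public class WriterReceiver {

    private final Writer writer;
    private final char[] chars = new char[4096];

    public WriterReceiver(Writer writer) {
        this.writer = writer;
    }

    // Called by a codec engine with each block of converted data.
    public void receive(byte[] data, int off, int len) throws IOException {
        for (int done = 0; done < len; ) {
            int n = Math.min(len - done, chars.length);
            for (int i = 0; i < n; i++) {
                // Base64/hex output is ASCII, so widening is lossless.
                chars[i] = (char) (data[off + done + i] & 0xff);
            }
            writer.write(chars, 0, n);
            done += n;
        }
    }
}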
[codec] Streamable Codec Framework
I just realised I left off "[codec]" in the subject. Sorry about that.

-----Original Message-----
From: Brett Henderson [mailto:[EMAIL PROTECTED]
Sent: Monday, 3 November 2003 10:47 AM
To: [EMAIL PROTECTED]
Subject: Streamable Codec Framework

Hi All,

I noticed Alexander Hvostov's recent email containing streamable base64 codecs. Given that the current codec implementations are oriented around in-memory buffers, is there room for an alternative codec framework supporting stream functionality? I realise the need for streamable codecs may not be that great, but it does seem like a gap in the current library.

I have done some work in this area over the last couple of months as a small hobby project and have produced a small framework for streamable codecs. Some of the goals I was working towards were:

1. No memory allocation during streaming. This eliminates garbage collection during large conversions.
2. Pipelineable codecs. This allows multiple codecs to be chained together and treated as a single codec, so codecs such as base64 can be broken into two components (base64 and line wrapping codecs).
3. Single OutputStream and InputStream implementations which utilise codec engines internally. This eliminates the need to produce a buffer-based engine and a stream engine for every codec. Note that this requires codec engines to be written in a manner that supports streaming.
4. Customisable receivers. All codecs utilise receivers to handle conversion results. This allows different outputs such as streams, in-memory buffers, etc to be supported.
5. Direction agnostic codecs. Decoupling the engine from the streams allows the engines to be used in ways other than originally intended, i.e. you can perform base64 encoding during reads from an InputStream.

I have produced base64 and ASCII hex codecs as a proof of concept and to evaluate performance. It isn't as fast as the current buffer-based codecs, but it is unlikely to ever be as fast due to the extra overheads associated with streaming. Both the base64 and ASCII hex implementations can produce a data rate of approximately 40MB/sec on a Pentium Mobile 1.5GHz notebook. With some performance tuning I'm sure this could be improved; I think array bounds checking is the largest performance hit.

The framework currently requires JDK1.4 (exception handling requires rework for JDK1.3). Running ant without arguments in the root directory will build the project, run all unit tests and run the performance tests. Note that the tests require junit to be available within ant. Javadocs are the only documentation at the moment.

Files can be found at: http://www32.brinkster.com/bretthenderson/BHCodec-0.2.zip

I hope someone finds this useful. I'm not trying to force my implementation on anybody and I'm sure it could be improved in many ways; I'm simply putting it forward as an optional approach. If it is decided that streamable codecs are a useful addition to commons I'd be glad to help.

Cheers,
Brett

PS. Some areas that currently need improving are:
1. Exception handling requires JDK1.4; it should be rewritten to support older Java versions.
2. BufferReceiver allocates memory continuously during streamed conversions; it should be fixed to recycle memory buffers.
3. Engines should have a new flush method added to allow them to hold off posting to receivers until their internal buffers fill up. This would prevent fragmented buffers during pipelined conversions.
4. OutputStream flush needs rework; it shouldn't call finalize, it should call the new flush method on the CodecEngines.
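To make the engine/receiver structure above concrete, the framework boils down to something like the following. These are illustrative reductions only - the actual names and signatures in BHCodec may differ.

// Sketch only: the rough shape of the engine/receiver split.

// A codec engine converts byte blocks and posts results to a receiver.
interface CodecEngine {
    void process(byte[] data, int off, int len, Receiver out) throws CodecException;
    // The "finalize" concept: process the final block, e.g. base64 padding.
    void finish(Receiver out) throws CodecException;
}

// A receiver handles converted output: a stream, an in-memory buffer,
// or another engine.
interface Receiver {
    void receive(byte[] data, int off, int len) throws CodecException;
}

class CodecException extends Exception {
}

// Pipelining: a receiver that feeds a downstream engine, so two engines
// (e.g. base64 + line wrapping) can be treated as a single codec.
class EngineReceiver implements Receiver {
    private final CodecEngine next;
    private final Receiver out;

    EngineReceiver(CodecEngine next, Receiver out) {
        this.next = next;
        this.out = out;
    }

    public void receive(byte[] data, int off, int len) throws CodecException {
        next.process(data, off, len, out);
    }
}

With this shape the base64 engine stays ignorant of line wrapping: chain base64 into a wrapping engine via an EngineReceiver and hand the pair to the single OutputStream implementation. It is also what makes the codecs direction agnostic - the same engine can be driven from an InputStream's read path or an OutputStream's write path.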