RE: [codec] StatefulDecoders
Alex,

Sorry about the delay. I'm a bit snowed under at the moment.

I've attached producers/consumers that process Object instances instead of type-specific data. Some differences between these interfaces and those you've already built are:

1. These have no monitor facility. All errors result in a CodecException being thrown. There is no concept of a warning.
2. These have a finalize concept, which is needed for implementations such as base64 where padding on the final data block is required.
3. These have flush methods to allow the chain to be flushed without being finalized.
4. These have a propagating flag which allows finalize/flush calls to be propagated through codec chains. By default this is true but it can be set to false. This is necessary when a single consumer (eg. OutputStreamConsumer) receives streams from multiple sources and each of those sources is finalized before the next is started. In this case you don't want the OutputStreamConsumer to be finalized (and the underlying stream closed) multiple times; you want this to occur only after the final input source completes.

Hope this makes sense. With regards to each difference:

1. Not sure of the correct approach here. I threw exceptions because it was simpler to implement and made it harder to end up with silent errors occurring. A monitor approach is more flexible although perhaps harder to use from a client perspective.
2. I believe you will need to add a finalize concept. Some codecs require notification that this is the final processing call (ie. Base64).
3. Flush isn't critical. I just added it for completeness.
4. A propagating option isn't critical and Java IOStreams don't have this concept. However a common problem is where you wish to feed the result of several streams into a single stream without the close on each top-level stream calling close on the receiving stream. Another way of overcoming this is to create a special no-op, non-propagating codec that you insert into the chain to prevent these calls propagating.

I meant to create some sample code using my interfaces to compare with yours but I can't get it done at the moment. You're obviously clued in on what's required, any differences between mine and yours are relatively small and I'm sure either would suit the purposes of codec. Given that I'm taking way too long to do anything at the moment I'll leave it in your capable hands. My current work should ease up in a few weeks and I'll try to give you a hand again then.

Cheers,
Brett

> Could you give some examples of how this would look just using
> Objects instead of specific types to implement what the DecoderStack
> does here:
>
> http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/trunk/codec-stateful/src/java/org/apache/commons/codec/stateful/DecoderStack.java?rev=9724&root=Apache-SVN&view=auto
>
> And go on to show how it's used like in this test code here:
>
> http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/trunk/codec-stateful/src/test/org/apache/commons/codec/stateful/DecoderStackTest.java?rev=9724&root=Apache-SVN&view=auto
>
> Specifically I'm referring to the usage of the DecoderStack in the
> testDecode() example which shows the chaining really simply.
> Perhaps looking at the two use cases we can come to a better
> conclusion.
>
> > Does the above make sense? If so, please give it careful consideration
> > because I originally used the callback design and modified it to use
> > producers/consumers because I think it is actually simpler and is much
> > more flexible.
>
> Yes it makes sense I just want to see it and play with it. Can you whip
> it up and we'll begin getting a feel for both sets of interfaces.
>
> > If you're still not convinced I guess I'll have to give in and go with
> > the flow ;-)
>
> Nah we'll try to come to an understanding.

[Attachments: Producer.java, Consumer.java]
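The Producer.java and Consumer.java attachments are not preserved in the archive. A minimal sketch of what they might have looked like, reconstructed purely from the four points in the message above (every name and signature here is a guess, not the real attachment):

    // CodecException.java - all errors surface as exceptions (point 1 above).
    public class CodecException extends Exception {
        public CodecException(String message) {
            super(message);
        }
    }

    // Consumer.java
    public interface Consumer {
        // Process a block of data.
        void consume(Object data) throws CodecException;

        // Push buffered data down the chain without ending it (point 3).
        void flush() throws CodecException;

        // Signal the final block so codecs like base64 can emit padding
        // (point 2); named finish() here because finalize() would clash
        // with java.lang.Object.finalize().
        void finish() throws CodecException;

        // Control whether flush/finish calls propagate down the chain (point 4).
        void setPropagating(boolean propagating);
    }

    // Producer.java
    public interface Producer {
        // Attach the consumer that receives this producer's output.
        void setConsumer(Consumer consumer);
    }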
RE: [codec] StatefulDecoders
Alex,

I haven't had a chance to respond to your email yet. I'll try to do so tonight. I'll knock up a couple of quick interfaces for comparison at the same time.

Cheers,
Brett

> -----Original Message-----
> From: Alex Karasulu [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 24 March 2004 12:23 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] StatefulDecoders
>
> Brett,
>
> Ok let's take a breath and dive into this email :-).
RE: [codec] StatefulDecoders
> Take a look at the "Reclaiming Type Safety" section in this > article on the > > event notification pattern here: > > > > http://members.ispwest.com/jeffhartkopf/notifier/ > Cool, that's a neat way of achieving type safety. Avoiding downcasts (eg. Object to byte[]) is a good thing. It still relies on a runtime check but is only performed in one piece of code instead of every implementation of an event receiver. Advantages: Type safety enforced in a single class instead of using downcasts within each event receiver. Single event method defined in interfaces instead of methods per event type. No need to define separate interfaces per event type. Disadvantages: No compile time type checking, incorrect types may not be picked up during development. Runtime overhead to perform reflection on event receiver class and locate type specific event receiver method. Runtime overhead converting from generic event type to specific event type. I don't know if compile time or runtime checks should be used but if runtime checks are chosen then this pattern is a good way of enforcing them. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: [codec] StatefulDecoders
> -----Original Message-----
> From: Noel J. Bergman [mailto:[EMAIL PROTECTED]
> Sent: Monday, 8 March 2004 3:25 PM
> To: Jakarta Commons Developers List
> Subject: RE: [codec] StatefulDecoders
>
> Consider the Cocoon (http://cocoon.apache.org/) pipeline for the concept of
> a fully event-driven programming approach, although their implementation has
> far too much overhead for codec purposes (or the regex events I mentioned).

I still intend to look at this although my list of reading seems to grow daily ...

> IMO, we want a consistent interface that provides the fundamental
> operations, and then we can build convenience on top of that interface.
>
> Your interface is closer to what I had in mind than Alex's, at least using
> more of the generic terminology. The codec domain can be expressed using
> the pipeline interface, or if we want a codec specific interface, Alex's
> could be a convenience layer on top of the pipeline.
>
> What I had imagined is an approach where each element in the pipeline
> supports registering for a variety of event notifications. Some of them may
> be generic, some of them may be domain specific. Most codec uses would use
> generic events.
>
> The key is being able to go to an object and register for semantic events.
> So if you assume that a transformer is both a producer and a consumer, you
> could register a transformer with a datasource (producer), and register
> downstream consumers that want the decoded data with the transformer. Yes,
> we would want to allow both fan-in and fan-out where appropriate.

I think my design already supports most of the above ideas, they just have to be implemented as required for the particular usage. For example, a Base64Encoder already supports registering for events of type byte[]. A MIMEMultipartDecoder could generate events of type MIMEPart. The event types are not specified by the existing implementation, they can be added as necessary for the particular feature.

Fan-out is achieved by creating a multicast stage accepting single events from a producer and passing them to multiple consumers of the same type (although I haven't implemented a multicast stage because I haven't needed it yet :-); a sketch of one follows at the end of this message. Fan-in can already be handled by setting a single consumer as the destination for multiple producers.

Perhaps this isn't quite what you're envisaging. You may like to see a more generic approach that allows events and pipelines to be described in more abstract terms. Unfortunately I can't see a way of achieving this without making the API complex and imposing overhead. If you're looking for a more powerful approach, should it be implemented outside of codec where runtime issues aren't quite as critical?

I guess it depends on what problems you're trying to solve. If you wish to process large streams of data in an efficient manner my implementation is a good fit; if you're looking to process structured data (eg. MIME) it can be extended to fit as required; if you're looking to use it as the basis of communication and processing within a server then it isn't up to the task. However, isn't that last point outside the scope of codec and more in the realm of other designs/libraries such as SEDA?

> As for the details of message transport ... it seems to me that we already
> have multiple options, so I'm not sure that we want to roll our own versus
> adopting and/or adapting an existing one.
>
> We have JMS, and the new concurrency package coming in JDK 1.5. One thing
> that got botched in JDK 1.5 is that they removed the Puttable and Takable
> interfaces that Doug Lea had used in his library, instead merging their
> functionality directly onto the queue interface, falsely believing that a
> message queue is a Collection. A number of us argued for those interfaces,
> and Doug proposed a change, but it was vetoed by Sun.
>
> I see a few options, such as:
>
> - we pick up the necessary interfaces from Doug's concurrent
>   library, and deal with java.util.concurrent down the road.
>
> - we use JMS interfaces in a very simplified form.
>
> As potentially whacked as the idea might be, considering the complexity of
> JMS, I believe that we could selectively use JMS interfaces without undue
> complexity or hurting performance. Basically, we'd ignore the things that
> don't make any sense in our context. Take a look at MessageProducer,
> MessageConsumer and MessageListener. Intelligently, they are just
> interfaces. We don't need multi-threading, network transports, etc., in
> general, although by using those interfaces, they would be available where
> applications warranted them.
>
> ref:
> http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/jms/package-summary.html
>
> Alternatively, I had this odd thought that we
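A sketch of the multicast stage mentioned above, which Brett says he has not implemented. The ByteConsumer shape is assumed from the rest of the thread, so treat this as illustrative only:

    import java.util.ArrayList;
    import java.util.List;

    // Stand-ins for the library's types (assumed shapes).
    class CodecException extends Exception { }

    interface ByteConsumer {
        void consume(byte[] data, int offset, int length) throws CodecException;
    }

    // Fan-out stage: hands each incoming block to every registered consumer.
    public class MulticastByteConsumer implements ByteConsumer {
        private final List consumers = new ArrayList(); // elements are ByteConsumer

        public void addConsumer(ByteConsumer consumer) {
            consumers.add(consumer);
        }

        public void consume(byte[] data, int offset, int length) throws CodecException {
            for (int i = 0; i < consumers.size(); i++) {
                ((ByteConsumer) consumers.get(i)).consume(data, offset, length);
            }
        }
    }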
RE: [codec] StatefulDecoders
I just responded to one of your emails Alex, apologies for it being about 5 days late! I've also got some reading to do ...

> -----Original Message-----
> From: Alex Karasulu [mailto:[EMAIL PROTECTED]
> Sent: Monday, 8 March 2004 6:17 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] StatefulDecoders
>
> Noel,
>
> This will keep me busy for a while and thanks for taking the time to
> respond. Please bear with me as I try to understand your view point.
>
> Alex
RE: [codec] StatefulDecoders
> How about we put our minds together and finalize some of this stuff so I
> can start writing some codecs that can be added back to this project?

Yeah definitely, sounds like we're trying to solve the same problem here. I haven't responded to your previous emails because I haven't contributed before and was leaving opinions to those who've actually proven themselves.

> > > In general, I have long preferred the pipeline/event model to the
> > > approach that Alex had, where it would give data to the codec, and
> > > then poll it for
>
> Agreed! My approach was not the best but have you had a chance at looking
> at the new interfaces that I sent out with the callbacks? Shall I resend
> those?

I still have them here. I'll comment on them further down.

> Let me just list the requirements one more time:
>
> 1). Interfaces should allow for implementations that perform piecemeal
> decodes
>    - enables implementations to have constant sized processing footprints
>    - enables implementations to have efficient non-blocking and streaming
>      operation

Agreed.

> 2). Easily understood and simple to use

Agreed, although this needs to be weighed up with any conflicting requirements.

> 3). Interfaces should in no way shape or form restrict or limit the
> performance of implementations whatever they may be.

Agreed, although without knowing all of these implementations in advance we can never be sure ;-)

> > You're right, my design has no concept of structured content. It was
> > developed to solve a particular problem (ie. efficient streamable data
> > manipulation). If API support for structured content is required then
> > my implementation doesn't (yet) support it.
>
> You can build on this separately no? There is no need to have the codec
> interfaces take this into account other than allow this decoration in the
> future rather than inhibit it.

Yes, I can build on it separately, however a new set of producers and consumers is needed for each type of structured data. I don't see this as a problem because trying to make this too generic may lead to loss of performance and a complicated API.

> > I'll use engine for want of a better word to describe an element in a
> > pipeline performing some operation on the data passing through it.
>
> SEDA calls this a stage btw.

Much better :-)

> With codecs the encoding is variable right? It could be anything.
> Something has to generate events/callbacks that delimit logical units of
> the encoding whatever that may be. For some encodings that you mentioned
> (base64) there may not be a data structure but the unit of encoding must
> be at least two characters for base64 I think. Please correct me if I'm
> wrong.

3 byte input, 4 byte output for encoding, and 4 byte input, 3 byte output for decoding. Input is padded if not a multiple of 3 bytes (a small sketch of this follows below).

> So there is some minimum unit size that can range from one byte to
> anything and this is determined by the codec's encoding and reflected in
> some form of callback. SAX uses callbacks to allow builders that are
> content aware to do their thing right? Now I'm not suggesting that a
> base64 codec's encoder/decoder pairs make callbacks on every 2 or single
> byte (depending on your direction). In the case of such a non-structured
> decoder the buffer size would be the determining factor or the end of the
> stream.

Agreed.
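A minimal sketch of the block arithmetic just described, showing why a streaming base64 encoder needs an explicit end-of-data call. This is illustrative only, not Brett's Base64Encode class:

    public class StreamingBase64Sketch {
        private final byte[] pending = new byte[3];
        private int pendingCount;

        // Bytes arrive in arbitrary chunks but are only encodable in
        // complete 3-byte groups (each producing 4 output characters).
        public void consume(byte[] data, int off, int len) {
            for (int i = 0; i < len; i++) {
                pending[pendingCount++] = data[off + i];
                if (pendingCount == 3) {
                    emitGroup(pending, 3);
                    pendingCount = 0;
                }
            }
        }

        // Without this call the last 1 or 2 bytes would sit in the buffer
        // forever; this is the finalize notification discussed in the thread.
        public void finish() {
            if (pendingCount > 0) {
                emitGroup(pending, pendingCount); // emits '=' padding
                pendingCount = 0;
            }
        }

        private void emitGroup(byte[] group, int count) {
            // encoding table lookup and output omitted for brevity
        }
    }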
> So I think we need to use callbacks to let decoders tell us when they hit
> some notable event that needs attention whatever that may be.

I agree in principle here although I'm not sure that I agree with the structure of callbacks. I'll explain more later.

> > > operations. These are pipelines; receiving content on one end,
> > > performing operations, and generating events down a chain. More than
> > > one event could be generated at any point, and the chain can have
> > > multiple paths.
>
> This, the pipelining notion, IMHO is overly complicated for building out
> codec interfaces. The pipeline can be built from the smaller simpler parts
> we are discussing now. We must try harder to constrain the scope of a
> codec's definition.
>
> Noel as you know I have built servers based on pipelined components before
> and am trying it all over again. We must spare those wanting to implement
> simple codecs like base64 from these concepts let alone the language
> around them. The intended use of codecs by some folks may not be so
> grandiose. They may simply need it to just convert a byte buffer and be
> done with it. There is no reason why we should cloud this picture for the
> simple user.

I agree that we definitely don't want to introduce complexity and computational overhead for simple cases. However I think m
RE: [codec] StatefulDecoders
Noel,

Sorry about the delay, I've been away for a few days.

> In general, I have long preferred the pipeline/event model to the approach
> that Alex had, where it would give data to the codec, and then poll it for
> state. However, I don't see something in your implementation that I think
> we want. We want to be able to have structured content handlers and
> customized events depending upon the content handler and the registered
> event handlers. This could be particularly important in a streaming
> approach to MIME content. And I also desperately want a regex in this same
> model.

You're right, my design has no concept of structured content. It was developed to solve a particular problem (ie. efficient streamable data manipulation). If API support for structured content is required then my implementation doesn't (yet) support it.

I'll use engine, for want of a better word, to describe an element in a pipeline performing some operation on the data passing through it. An API aware of structured content shouldn't complicate the creation of simple engines such as base64 which pay no attention to data structure. Ideally, a structured API would extend an unstructured API and only those engines requiring structured features would need to use it.

I'm having trouble visualising a design that supports structured content without being specific to a particular type of structured content. Do you have some examples of what operations you would like a structured data API to support? Do you see interactions between pipeline elements being strongly typed?

My design uses the concepts of producers and consumers, and I'd like to see those ideas preserved. Engines are both consumers and producers but the first and last elements in a chain (or pipeline) are only producers and consumers respectively, allowing I/O to be decoupled from the pipeline operations. For example, my design uses an OutputStreamConsumer to write pipeline result data to an OutputStream, an OutputStreamProducer to receive data written to an OutputStream and pass it into a pipeline, and an InputStreamProducer to pump data from an input stream and pass it into a pipeline.

A structured content API can extend the producer/consumer ideas by passing data types understood by the structured content in question. For example, a multipart MIME decoding engine (a consumer of byte data, hence a ByteConsumer) could produce MIME parts (making it a MIMEPartProducer). A MIMEPartConsumer would receive MIMEPart objects (which are in turn ByteByteEngines but extended with a MIME type property) and connect them to a consumer capable of handling the byte data contained in the MIME part.

The above example would involve the definition of several new interfaces (MIMEPart extending ByteByteEngine adding a MIME type property, MIMEPartProducer extending Producer, MIMEPartConsumer extending Consumer) and new classes to implement the new interfaces with the desired behaviour; a rough sketch follows at the end of this message. Any other structured content types could be handled in similar ways with new "event" types being defined and relevant producer and consumer interfaces created to support them. Perhaps a more generic method can be devised but weak typing and degraded performance are hard to avoid.

> Drop the word "conversion".

Yep, agreed.

> Conversion is simply one of many possible operations. These are pipelines;
> receiving content on one end, performing operations, and generating events
> down a chain. More than one event could be generated at any point, and the
> chain can have multiple paths.

If the above can be achieved without introducing a large overhead (both runtime and coding overhead) for simple operations then it sounds good.

Is it worth considering the possibility of a pipeline receiving data from more than one source? This may be necessary when composing multipart MIME messages. Then again, a multipart MIME consumer class may be a better solution using similar ideas to those described earlier (ie. a MIMEPartConsumer which combines all parts into a single byte stream).

I'm not sure how much sense I've made above, hopefully some ;-)

Brett
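The rough sketch of the MIME interfaces described above, reconstructed purely from the prose. None of these exist in the posted library, and the method names are guesses:

    // Stand-ins for the library's base types (assumed).
    interface Producer { }
    interface Consumer { }
    interface ByteByteEngine extends Producer, Consumer { }
    class CodecException extends Exception { }

    // A decoded part: a byte engine extended with the MIME type property.
    interface MIMEPart extends ByteByteEngine {
        String getMimeType(); // e.g. "text/plain"
    }

    interface MIMEPartProducer extends Producer {
        void setMIMEPartConsumer(MIMEPartConsumer consumer);
    }

    interface MIMEPartConsumer extends Consumer {
        // Receives each decoded part and routes its byte content to a
        // consumer capable of handling it.
        void consumePart(MIMEPart part) throws CodecException;
    }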
RE: [codec] StatefulDecoders
I probably sound like a broken record but here goes :-) If I'm barking up the wrong tree, let me know and I'll stop making noise on this list ...

Many of the problems being discussed here have been solved in the library I've posted previously. An up-to-date version can be found here:

http://www32.brinkster.com/bretthenderson/bhcodec-0.7.zip

It uses generic interfaces for communication between all components that allow the use of streams, byte arrays, NIO, etc. to be plugged together as necessary. NIO isn't currently supported but I expect it would be trivial to add it. The library can be visualised as a collection of data consumers and producers (a codec engine implements both). No distinction is made between encoding and decoding (they are the same thing in my view from an API perspective).

One problem I see with the current codec project is that every new use case that is envisaged tends to require extensions to the current interfaces. The above library is designed to be more generic and allow a more pluggable approach where new functionality doesn't impact every codec implementation.

It uses a push model internally but pull-model utility classes wrapped around underlying push classes can be used to implement pull functionality where necessary (a sketch of this idea follows at the end of this message). It does not require JDK 1.4 although NIO could be plugged in if necessary.

I understand that people don't want to spend time looking at every pet project people have come up with but I think this could be useful in commons. There is a lot to look at and I guess that is discouraging people from taking the time to look at it.

Should I propose this library as a separate project? (How do I do this?) Perhaps as a more generic codec library that could potentially be used by commons-codec once it has matured. It may be too large a change to fit into the existing codec project as it currently stands.

I've offered this several times now and while there doesn't seem to be any major opposition to the idea, there hasn't been strong support either. I'm not sure how to proceed. Is this something that can be placed in a sandbox for people to play with?

Cheers,
Brett

> -----Original Message-----
> From: Noel J. Bergman [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 24 February 2004 2:25 PM
> To: Jakarta Commons Developers List
> Subject: RE: [codec] StatefulDecoders
>
> > This brings up an interesting issue: How do we potentially package and
> > deliver some code that depends on Java 1.4. In a second [codec] jar?
>
> There are several issues, but let me address what I consider to be the key
> one: we have to design the core code as push-model. If we were to design
> the code as pull-model, we would lose the thread of execution inside the
> callee. We don't want the callee blocking on I/O and returning when
> finished. But with a non-blocking callee, we can then use either a NIO or
> IO wrapper as necessary.
>
> Obviously the interface between the I/O handling wrapper and the data
> handling core will have to be Java 2 < 1.4 compatible.
>
> --- Noel
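A sketch of the pull-over-push wrapper just mentioned. This is a guess at how such a utility could look; the stand-in types below, including the isEmpty()/readByte() accessors and the finish() call, are assumptions rather than the library's actual API:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Stand-ins approximating the library's types; signatures assumed.
    class CodecException extends Exception {
        CodecException(String message) { super(message); }
    }

    interface ByteEngine {
        void setConsumer(BufferByteConsumer consumer); // simplified for the sketch
        void consume(byte[] data, int offset, int length) throws CodecException;
        void finish() throws CodecException;           // end-of-data notification
    }

    // Naive buffer consumer; illustration only (copies on every read).
    class BufferByteConsumer {
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
        private int readPos;
        public void consume(byte[] data, int offset, int length) {
            buf.write(data, offset, length);
        }
        public boolean isEmpty() { return readPos >= buf.size(); }
        public int readByte() { return buf.toByteArray()[readPos++] & 0xff; }
    }

    public class PullWrapperInputStream extends InputStream {
        private final InputStream source;
        private final ByteEngine engine;   // push-model codec stage
        private final BufferByteConsumer output = new BufferByteConsumer();
        private final byte[] readBuf = new byte[4096];
        private boolean finished;

        public PullWrapperInputStream(InputStream source, ByteEngine engine) {
            this.source = source;
            this.engine = engine;
            engine.setConsumer(output);    // the engine pushes its results here
        }

        public int read() throws IOException {
            try {
                // Pull: keep pushing source data through the engine until the
                // output buffer has something to hand back, or input runs out.
                while (output.isEmpty() && !finished) {
                    int n = source.read(readBuf);
                    if (n < 0) {
                        engine.finish();   // lets e.g. base64 emit its padding
                        finished = true;
                    } else {
                        engine.consume(readBuf, 0, n);
                    }
                }
                return output.isEmpty() ? -1 : output.readByte();
            } catch (CodecException e) {
                throw new IOException(e.getMessage());
            }
        }
    }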
RE: [codec] More thoughts on CharSets and Encoders (references: RE: [codec] Streamable Codec Framework)
> Does CharSet/Util's in [lang] approach a similar > functionality to nio.charset? After reviewing the codebase, > my viewpoint is no, as it is more for "building" charsets, > than for using them (authors rebuttals always welcome). I'd also be interested to see if this functionality exists somewhere. > I think [httpclients] static ASCII methods (once in > HttpConstants) now also in [codec-multipart] are very similar > in functionality to the idea of CharsetEncoders/Decoders of > nio.charset. > > So we begin to have functionality for charset's in [lang] and for > encoders in [codec]. How do we bring this all together? I'd > like to see > similar CharsetEncoding/Decoding capabilities as nio (with > the eventual > goal of actually having Jakarta Commons converge to using nio of > Charsets in the future. > > As a possible bridge for transition I think a CharsetEncoder > API in [codec] that duplicates that of nio.charset would form > an excellent path for convergence. The eventual goal once > j2sdk1.3 was no longer in service would be to simply refactor > Apache Projects dependent on this API to use NIO instead. Does the CharSetEncode class in my library approach the functionality you require? http://www32.brinkster.com/bretthenderson/BHCodec-0.6.zip Internally it uses an OutputStreamWriter which leverages JDK functionality albeit in a somewhat inelegant way. I would expect performance to be fairly reasonable however. I intend to write a corresponding CharSetDecode class but haven't gotten around to this yet. If you have any interest I can up the priority. It will use an InputStreamReader internally unless better alternatives are found. If at some point in the future JDK 1.4 becomes an accepted base I will be reworking CharSetEncode to use java.nio features because they provide a cleaner interface than wrapping streams. > >> If JDK1.4 is considered a sufficient base, I could > > > > I think tha considering 1.3.1 as the base requirement is safe. > > Unfortunately, as discussed on this list under various > heading, making > > 1.4 a requirement is too aggressive. > > > > Gary > > Yes, we're still supporting 1.3 in many cases, BUT, wouldn't we want > convergence eventually to the API's provided by the j2sdk? > AND, by that > point in the future, is j2sdk 1.3 even going to be in play? I will always be leaving a CharSetEncode feature in my library because it allows charset conversion to be performed within a processing chain but I would see the internal implementation moving to java.nio eventually. Brett - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: [codec] multipart encoders/decoders
> > There are obviously advantages to having a single unified framework
> > and if possible it would be the ideal result. Unfortunately I have run
> > into performance disadvantages so far. I haven't tried it for a while
> > but in the past my base 64 conversion has not been as fast as the
> > existing codec implementation for small conversions. For common
> > algorithms such as base 64 it may make sense to have two
> > implementations optimised for different purposes.
>
> That does not seem justified at first. Optimize last if at all... ;-)

Hehe, you're right. I guess it just feels wrong pushing for stream support in codec when its introduction will incur overhead for non-streamed cases. Of course in 99.9% of those cases the performance difference will be immeasurable in the overall application :-)

> > In addition, I'm not familiar with language codecs but you mentioned
> > it makes no sense to use these in streams.
>
> One of the things to keep in mind is that for simple cases the f/w should
> be invisible to the client code. For example:
>
> DigestUtil.md5Hex(new FileInputStream("boo.txt"));
>
> Gary

Hmm, that is definitely worth remembering. The more generic I made the design, the more coding was required in order to use it :-( Perhaps a symptom of over-engineering, I hope not.

There are a few ways I can think of dealing with this.

1. Do nothing. Force people to learn a new and more complicated API.
2. Create a new API that supports streaming, leaving the existing API in place for the existing functionality and common use cases not requiring stream support.
3. Add stream support to the existing API.
4. Create an API supporting stream processing and re-implement the existing API using it.

Of these I think:

1. A non-starter but I had to list it. Backwards compatibility and usability being two reasons.
2. This is a valid approach but leaves two distinct code-bases to support. I hope there are other options available.
3. In most projects this tends to be the way things are done. In this case I'm not sure that it's practical; it may get fairly messy and create an unmaintainable codebase. I really need to spend more time looking at the existing APIs in detail though.
4. I think a variation on this idea could work well in practice. Codec could be conceptually designed in various layers. It could have a low level API that is modular and supports stream based processing. My library or some equivalent would fit this purpose. A second layer could then provide simplified access to the library for the most common use cases, implementing the existing API and adding new functionality as desired.

To give an example, the md5Hex call above could be implemented as follows:

    // DigestUtil.md5Hex(new FileInputStream("boo.txt"));
    public class DigestUtil {
        ...
        public static String md5Hex(InputStream inputStream) throws CodecException {
            BufferByteConsumer result = new BufferByteConsumer();
            ChainByteEngine chain = new ChainByteEngine(result);
            chain.append(new MD5());
            chain.append(new AsciiHexEncode());
            new InputStreamProducer(chain, inputStream).pump();
            return new String(result.read());
        }
        ...
    }

There's some overhead in initialisation but most classes are fairly lightweight. All of the above classes have been implemented if you wish to have a look. I updated some of the classes last night, a copy can be found at:

http://www32.brinkster.com/bretthenderson/BHCodec-0.6.zip

Brett
RE: [codec] multipart encoders/decoders
> Here are a few good rules of thumb:
>
> 1. Commons exists as an effort to encourage code reuse. The Streamable
> framework presented was interesting, but I'd like us to find an existing
> streamable Base64 implementation inside of the ASF codebase.

I have no problems with this but so far I haven't seen anything like this that doesn't sub-class InputStream and OutputStream. Sub-classing InputStream and OutputStream is problematic because it forces you to code algorithms around IOStream semantics (InputStream coding is not simple, especially the available() method) and it forces you to make the encoder sub-class OutputStream and the decoder sub-class InputStream, unless you write an implementation for each stream type. Providing a single InputStream implementation that can use an underlying codec engine simplifies development and testing of new algorithms considerably and removes the distinction between input and output streams.

> 3. No need to expressly focus on a framework (at all). Codec is FIRSTLY a
> functional beast, even if the solution is inelegant. If there is an
> existing streamable Base64 in ASF, I'd recommend copying it outright and
> placing it in the codec package. Over time, it can move towards a unified
> streamable framework.

I'm often guilty of over designing things, hey it's fun :-) Would it help if I didn't call my classes a "framework"? It's no more than a few common interfaces and some implementation codecs. There are no factories, or service providers, or other abstractions complicating things. It really isn't much different to the existing codec interfaces except that they are written to support streaming and separate input and output interfaces.

I really think there's a need for all streamable codecs to follow common interfaces. Perhaps InputStream and OutputStream are sufficient but I think there's a better way.
RE: [codec] multipart encoders/decoders
> [snip]
> > 1. Commons exists as an effort to encourage code reuse. The
> > Streamable framework presented was interesting, but I'd like us to
> > find an existing streamable Base64 implementation inside of the ASF
> > codebase.
>
> Not for Base64 but Ant has:
>
> o MD5 and SHA checksum computation:
>   http://ant.apache.org/manual/CoreTasks/checksum.html

Everyone will be getting tired of my emails soon ...

I've had a look at Ant and they are using java.security.MessageDigest directly. I think it goes back to JDK 1.2 so I assume it's okay to use within codec. It supports MD5 and SHA-1. We would have to create wrapper classes if we want them to support the relevant codec interfaces but this should be straightforward (see the sketch below). Of course if we want to implement other algorithms such as SHA-256 or SHA-512 we either have to write our own or rely on the user to have the relevant security providers installed within their JVM.

It would be interesting to compare performance between the Sun provided MD5 and codec.
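The wrapper idea above could look something like this: java.security.MessageDigest exposed in the shape of the library's byte consumer (the consume signature here is assumed from the thread, not a confirmed interface):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class MessageDigestConsumer {
        private final MessageDigest digest;

        public MessageDigestConsumer(String algorithm) throws NoSuchAlgorithmException {
            digest = MessageDigest.getInstance(algorithm); // "MD5", "SHA-1"
        }

        // Matches the assumed shape of the library's ByteConsumer interface.
        public void consume(byte[] data, int offset, int length) {
            digest.update(data, offset, length);
        }

        // Completes the hash; MessageDigest resets itself after digest().
        public byte[] result() {
            return digest.digest();
        }
    }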
RE: [codec] multipart encoders/decoders
> It is accomplished under "jakarta-commons-sandbox/codec-multipart".
>
> > (2) Can we agree on /what/ streamable codecs are (sorry but I like to
> > point out the obvious when starting something like this). Recognize
> > the current impls alternatives.
>
> Yes sorry, I think there are two ideas running around here:
>
> (a) Actual "inline" Stream Encoders/Decoders (SSL etc.) that require no
> knowledge of the length of the content. Probably extend
> "FilterOutputStreams" etc.
>
> (b) Encoders/Decoders that actually work by passing their content
> through streaming to manage larger amounts of data efficiently. Data for
> which the length is probably already known (Files). An interface which
> supports handing objects and Streams manages this:

Can you give some examples of algorithms where the length needs to be known in advance? My code may break horribly with such an algorithm :-(
RE: [codec] multipart encoders/decoders
> (3) Should the Producer/Consumer framework submitted be retrofitted into
> the current [codec] Encoder/Decoder framework? Personally, I like the
> specificity of "Encoder" and "Decoder" for interfaces. This means the
> current i/f would be expanded.

I'm not terribly attached to Producer and Consumer but couldn't think of a better alternative. Perhaps Source and Sink, but they may be no better.

I'm not keen on Encoder and Decoder because I don't believe there's a need to make the distinction between the two, plus I might have to rewrite some code ;-) Both encoders and decoders are processing data, it is the algorithm that decides if it is encoding or decoding. In some cases the words encode and decode may not make sense. If you're modifying line endings on a file are you encoding or decoding?

Producer and Consumer relate to different things. Base64Encode for example implements ByteProducer and ByteConsumer by way of the ByteEngine interface because it consumes byte data and produces byte data. OutputStreamConsumer only implements ByteConsumer because it consumes byte data but sends the data to a location outside the scope of the library (ie. an OutputStream).

> (4) The other way around: should the current [codec] be recast in the
> proposed Producer/Consumer f/w. I am not wild about the genericity of
> "Producer" and "Consumer" as names.
>
> (5) I am assuming that two f/w's in [codec] are undesirable. It would be
> good to agree or disagree on this previous statement as a starter! ;-)

There are obviously advantages to having a single unified framework and if possible it would be the ideal result. Unfortunately I have run into performance disadvantages so far. I haven't tried it for a while but in the past my base 64 conversion has not been as fast as the existing codec implementation for small conversions. For common algorithms such as base 64 it may make sense to have two implementations optimised for different purposes.

In addition, I'm not familiar with language codecs but you mentioned it makes no sense to use these in streams.
RE: [codec] Streamable Codec Framework
> I suspect we are going to need something along the lines of a "Streamable
> Encoder/Decoder" for the Multipart stuff. If we look at HttpClient's
> MultipartPostMethod, there is a framework for basically joining multiple
> sources (files, strings, byte arrays) together into an OutputStream which
> is multipart encoded. I want to attempt to maintain this strategy when
> isolating the code out of HttpClient and into the multipart sandbox
> project. I suspect that your Streamable Consumer/Producer stuff could
> also be advantageous for multipart encoding/decoding. At least I want to
> make sure we're not inventing the same wheel.

I'll try to look at the HttpClient code to get a feel for how it hangs together. From what I can gather my code should plug in fairly cleanly. My code doesn't specify any type of IO interface as any interface can be adapted in by implementing the relevant consumers and producers. I've tried to design the framework such that the actual codec algorithms have no knowledge of the source or destination of the data they process. This allows them to be far more generic and greatly increases their usefulness.

> Specifically, I see we're going to need interfaces other than the
> existing codec ones because they pass around byte[] and Object when
> encoding/decoding. We need to maintain that the content will be streamed
> from its native datastructure when it's consumed by a consumer
> (HttpClient MultipartPost for instance) or, when it is used to "decode",
> that the Objects produced are built efficiently off an InputStream (ie.
> Files are immediately written to the FileSystem, Strings or byte[]s are
> maintained in memory).

My framework doesn't specify any particular type of data although byte oriented processing is the only fleshed out implementation at the moment. All it cares about is that a producer is available to generate data from an external source and a matching consumer is available to pass it to a destination. Every producer must have a matching consumer. A consumer can be called directly by clients. Typically an engine (implementing both consumer and producer) will sit in the middle performing some kind of translation/encoding/decoding on the data. It "consumes" input data and "produces" output data.

Using this structure, processing chains can be defined so that multiple transforms can be performed on the same data all in a stream oriented fashion. To cut a long story short, chains can be defined to access data from streams/buffers/etc, perform relevant translations (re-using small in-memory buffers to eliminate garbage collection) and pass data to output streams/buffers/etc. Due to the stream support, data of arbitrary size can be processed.

> Either way, I'm currently "tidying" up a maven project directory to be
> committed into the sandbox for the new multipart codec stuff. Once it's
> in place we could add your code to it as well.

Let me know if you want to import any of my code and I'll do any necessary package reorganisation.
RE: [codec] Streamable Codec Framework
Thanks for the reply.

Yep, it is a completely different framework. I wrote the framework before looking at the current commons codec component so there are no relationships between the two. Hopefully there are ways of incorporating ideas between the two. Are you hoping to incorporate streamed processing into the existing design or create new classes to achieve this? I have no preferences either way but it could be tricky to re-use the existing interfaces. I'll have to look at this further though.

I'll look at Ant as soon as I can to see how it approaches the problem.

Hmm, your simple example already uncovers a gap in my design :-) Should be easily solved though. Phew. Currently I can't process all data off an input stream writing to a destination without some manual coding. However I can achieve this by creating a "Producer" that reads InputStreams. I will call this InputStreamProducer. My InputStreamProducer will implement ByteProducer and will have a method (eg. pump()) which pumps data from the provided input stream to the "ByteConsumer" attached to it.

Using my new InputStreamProducer I can perform MD5 hex encoding of an input stream creating a result String as follows:

    // Create processing objects.
    MD5 md5 = new MD5();
    AsciiHexEncode asciiHexEncode = new AsciiHexEncode();
    InputStreamProducer source = new InputStreamProducer(inputStream);
    BufferByteConsumer resultBuf = new BufferByteConsumer();
    String result;

    // Set up processing chain.
    source.setConsumer(md5);
    md5.setConsumer(asciiHexEncode);
    asciiHexEncode.setConsumer(resultBuf);

    // Process all available data.
    source.pump();

    // Obtain result hash.
    result = new String(resultBuf.getData());

If I eliminate all calls to ".setConsumer()" by adding the necessary constructors to accept consumers, the above code can be shortened to:

    BufferByteConsumer resultBuf = new BufferByteConsumer();
    String result;

    // Create md5 hash of data from inputStream.
    new InputStreamProducer(inputStream,
        new MD5(new AsciiHexEncode(resultBuf))).pump();
    result = new String(resultBuf.getData());

What do you think? Static utility methods could simplify the above even further if necessary. It's definitely more complex than the existing approach but it is very flexible in that it allows arbitrary processing chains to be defined and allows for simple integration with IO Streams. Each class performs a very small, well-defined purpose and can be coupled to build complex processing chains. Processing should be efficient although more setup time is required.

Supporting Reader/Writer and any other IO classes should be as simple as defining the relevant Consumer/Producer implementations to interact with them. Codec algorithms won't require modification.

Cheers,
Brett

> -----Original Message-----
> From: Gary Gregory [mailto:[EMAIL PROTECTED]
> Sent: Friday, 9 January 2004 4:15 PM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Streamable Codec Framework
>
> Hello,
>
> Streamable codecs make a lot of sense for some codecs (but perhaps not
> for the language codecs). Thanks for bringing the topic up. I took a very
> quick look at the code you refer to and it seems to be a separate
> framework from what we have in [codec] today (I could be wrong of
> course), especially the whole Producer/Consumer business.
>
> A simple example I can think of that could drive an implementation could
> be:
>
> InputStream inputStream = ... new File(...);
> DigestUtil.md5Hex(inputStream);
>
> It would be interesting to see how Ant implements MD5 and SHA.
>
> This probably means that Encoder.encode(Object) should also handle
> I/O/Streams and Reader/Writer...
>
> Gary
RE: [codec] Streamable Codec Framework
There seemed to be definite interest in streamable codecs but the list has gone fairly quiet. I am interested in participating in work of this kind but I'm not sure how to proceed. I don't think this deserves to be a standalone project as it seems to fit fairly well into the scope of the current codec package and I don't want to step on any toes with respect to the existing codec project. I believe Gary Gregory and Tim O'Brien are the two primary codec committers. Gary and Tim, your thoughts would be most appreciated.

I'm providing the code mentioned in the below message as an example because I believe it is more effective to discuss working code than talk about abstract ideas.

> -----Original Message-----
> From: Brett Henderson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 13 November 2003 10:33 AM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Streamable Codec Framework
>
> I made some changes to the code I supplied previously, it can be found at
> the following URL.
>
> http://www32.brinkster.com/bretthenderson/BHCodec-0.5.zip
>
> The main differences relate to the codec interfaces and support for data
> types other than "byte", the encoding algorithms are largely unchanged.
>
> A quick summary of the framework is as follows:
>
> The framework is based around consumers and producers; consumers accept
> incoming data and producers produce outgoing data. A consumer implements
> the Consumer interface and a producer implements the Producer interface.
>
> Specialisations of these interfaces are used for each type of data to be
> converted. For example there are currently ByteConsumer, ByteProducer,
> CharConsumer and CharProducer interfaces.
>
> The engine package contains classes (and interfaces) that are both
> consumers and producers (ie. accept incoming data and produce result
> data). For example there is a ByteEngine interface that extends the
> ByteConsumer and ByteProducer interfaces and is in turn implemented by
> the Base64Encode concrete class.
>
> Engines may consume one kind of data and produce another; the
> CharByteEngine interface defines an engine that consumes characters and
> produces bytes. This is implemented by the CharSetEncode class
> (untested).
>
> The consumer package contains classes that consume data and perform an
> action on the data that doesn't allow it to be accessed via producer
> functionality. For example, the BufferByteConsumer class acts as a
> receiving buffer for encoding results, and the OutputStreamConsumer
> writes all data to an OutputStream.
>
> The producer package contains classes that produce data for the framework
> but don't accept data via consumer functionality. For example, the
> OutputStreamProducer is an OutputStream that "produces" all data passed
> to it.
>
> The io package contains classes that fit into the java.io functionality
> and are neither consumers nor producers in the framework sense. For
> example, the CodecOutputStream is a FilterOutputStream that uses an
> internal ByteEngine to perform a transformation on the data passing
> through it.
>
> JUnit tests exist for most classes in the framework. All testing is
> performed using JUnit. If there is no unit test for a class, it can be
> considered untested.
>
> The framework is now generic enough to handle data of any type and allow
> classes to be defined which can accept any kind of data and/or produce
> any kind of data. All data can be processed in a "streamy" fashion. For
> example, encoding engines implementing the ByteEngine interface can be
> plugged into CodecOutputStream or CodecInputStream and used for stream
> functionality without directly supporting java.io streams.
>
> Using the CharSetEncode and (currently non-existent) CharSetDecode, it
> should be possible to encode character data to base64 then write the
> result to a Writer. This should go part way towards helping Konstantin
> with his XML conversions.
>
> Sorry about the brain dump but there is a fair bit contained in the zip
> file and I thought some explanation would be useful.
>
> Any feedback on the above is highly welcome. I don't plan on making too
> many more changes unless it is deemed useful.
>
> Brett
RE: [codec] Streamable Codec Framework
I made some changes to the code I supplied previously, it can be found at the following URL.

http://www32.brinkster.com/bretthenderson/BHCodec-0.5.zip

The main differences relate to the codec interfaces and support for data types other than "byte", the encoding algorithms are largely unchanged.

A quick summary of the framework is as follows:

The framework is based around consumers and producers; consumers accept incoming data and producers produce outgoing data. A consumer implements the Consumer interface and a producer implements the Producer interface.

Specialisations of these interfaces are used for each type of data to be converted. For example there are currently ByteConsumer, ByteProducer, CharConsumer and CharProducer interfaces.

The engine package contains classes (and interfaces) that are both consumers and producers (ie. accept incoming data and produce result data). For example there is a ByteEngine interface that extends the ByteConsumer and ByteProducer interfaces and is in turn implemented by the Base64Encode concrete class.

Engines may consume one kind of data and produce another; the CharByteEngine interface defines an engine that consumes characters and produces bytes. This is implemented by the CharSetEncode class (untested).

The consumer package contains classes that consume data and perform an action on the data that doesn't allow it to be accessed via producer functionality. For example, the BufferByteConsumer class acts as a receiving buffer for encoding results, and the OutputStreamConsumer writes all data to an OutputStream.

The producer package contains classes that produce data for the framework but don't accept data via consumer functionality. For example, the OutputStreamProducer is an OutputStream that "produces" all data passed to it.

The io package contains classes that fit into the java.io functionality and are neither consumers nor producers in the framework sense. For example, the CodecOutputStream is a FilterOutputStream that uses an internal ByteEngine to perform a transformation on the data passing through it.

JUnit tests exist for most classes in the framework. All testing is performed using JUnit. If there is no unit test for a class, it can be considered untested.

The framework is now generic enough to handle data of any type and allow classes to be defined which can accept any kind of data and/or produce any kind of data. All data can be processed in a "streamy" fashion. For example, encoding engines implementing the ByteEngine interface can be plugged into CodecOutputStream or CodecInputStream and used for stream functionality without directly supporting java.io streams.

Using the CharSetEncode and (currently non-existent) CharSetDecode, it should be possible to encode character data to base64 then write the result to a Writer. This should go part way towards helping Konstantin with his XML conversions.

Sorry about the brain dump but there is a fair bit contained in the zip file and I thought some explanation would be useful.

Any feedback on the above is highly welcome. I don't plan on making too many more changes unless it is deemed useful.

Brett
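To make the layering above concrete, a reconstruction of the interface relationships from the prose. The real declarations are in the zip; the method names here are assumptions:

    // Marker interfaces; type-specific methods live in the specialisations.
    interface Consumer { }
    interface Producer { }

    class CodecException extends Exception { }

    interface ByteConsumer extends Consumer {
        void consume(byte[] data, int offset, int length) throws CodecException;
    }

    interface ByteProducer extends Producer {
        void setConsumer(ByteConsumer consumer);
    }

    interface CharConsumer extends Consumer {
        void consume(char[] data, int offset, int length) throws CodecException;
    }

    // A stage that consumes bytes and produces bytes, e.g. Base64Encode.
    interface ByteEngine extends ByteConsumer, ByteProducer { }

    // A stage that consumes chars and produces bytes, e.g. CharSetEncode.
    interface CharByteEngine extends CharConsumer, ByteProducer { }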
RE: [codec] Streamable Codec Framework
1.2.2 it is then :-)

I agree with maintaining 1.2.2 compatibility, it is a bit harsh to require 1.4 to perform base64 encoding. Unfortunately it would make life a lot easier with regards to charset encoding ...

It should be possible to use OutputStreamWriter and InputStreamReader internally to perform the conversions without incurring much of a performance overhead. For example a CharByteEngine??? could use OutputStreamWriter internally to perform charset encoding. In many cases OutputStreamWriter and InputStreamReader can be used directly, it is the cases where byte to char conversion is required during output streaming that require an encoder for transforming between chars and bytes. Perhaps I'm missing something here though ...

I also think it would be useful to be able to perform charset conversion without depending on streams.

> -----Original Message-----
> From: Gary Gregory [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 11 November 2003 4:19 AM
> To: 'Jakarta Commons Developers List'
> Subject: RE: [codec] Streamable Codec Framework
>
> Yes, no problem, 1.2.2.
>
> Gary
>
> > -----Original Message-----
> > From: Tim O'Brien [mailto:[EMAIL PROTECTED]
> > Sent: Monday, November 10, 2003 08:10
> > To: Jakarta Commons Developers List
> > Subject: RE: [codec] Streamable Codec Framework
> >
> > Oleg, this is understood - 1.2.2 should be our LCD for codec.
> >
> > Tim
> >
> > On Mon, 10 Nov 2003 [EMAIL PROTECTED] wrote:
> >
> > > Tim, Gary, et al
> > > Streamable codec framework would be a welcome addition to Commons
> > > Codec. However, as far as we (Commons HttpClient) are concerned, the
> > > decision to ditch java 1.2.2 support would render Codec unusable for
> > > us (and I'd guess a few other projects that still need to maintain
> > > java 1.2.2 compatibility). Not that we like it too much, but because
> > > lots of our users still demand it.
RE: [codec] Streamable Codec Framework
I think the design of the codec framework could cover your requirements but it will require more functionality than it currently has. > > > > Some of the goals I was working towards were: > > > > 1. No memory allocation during streaming. This eliminates > > > > garbage collection during large conversions. > > > Cool. I got large conversions... I'm already at > > > mediumblob in mysql , and it goes up/down XML > > stream > > > :) > > > > I have a lot to learn here. While I have some > > knowledge > > of XML (like every other developer on the planet), I > > have never used it for large data sets or used SAX > > parsing. > > Sounds like a good test to find holes in the design > > :-) > > It's easy. You got callback, where you can gobble up > string buffers with incoming chars for element > contents. ( and there is a lot of this stuff... ) > After tag is closed, you have all the chars in a big > string buffer, and get another callback - in this > callback you have to convert data, and do whatever > necessary ( in my case, create input stream, and pass > it to database ) This could be tricky, it's something I've been thinking about but would like feedback from others about the best way of going about it. The data you have available is in character format. The base64 codec engine operates on byte buffers. The writer you want to write to requires the data to be in character format. I have concentrated on byte processing for now because it is the most common requirement. XML processing requires that characters be used instead. It makes no sense to perform base64 conversion on character arrays directly because base64 is only 8-bit aware (you could split each character into two bytes but this would blow out the result buffer size where chars only contain ASCII data). I think it makes more sense to perform character to byte conversion separately (perhaps through extensions to existing framework) and then perform base64 encoding on the result. I guess this is a UTF-16 to UTF-8 conversion ... What support is there within the JDK for performing character to byte conversion? JDK1.4 has the java.nio.charset package but I can't see an equivalent for JDK1.3 and lower, they seem to use com.sun classes internally when charset conversion is required. If JDK1.4 is considered a sufficient base, I could extend the current framework to provide conversion engines that translate from one data representation to another. I could then create a new CodecEngine interface to handle character buffers (eg. CodecEngineChar). > > > > 3. Customisable receivers. All codecs utilise > > > > receivers to > > > > handle conversion results. This allows > > different > > > > outputs such as > > > > streams, in-memory buffers, etc to be supported. > > > > > > And writers :) Velocity directives use them. > > > > Do you mean java.io.Writer? If so I haven't > > included > > direct support for them because I focused on raw > > byte > > streams. However it shouldn't be hard to add a > > receiver to write to java.io.Writer instances. > > > My scenarios: > - I'm exporting information as base64 to XML with help > ov velocity. I do it through custom directive - > in this directive I get a Writer from velocity, where > I have to put my data. > > Ideally codec would do: read input stream - encode - > put it into writer without allocating too much > memory. 
> I'm importing information:
> - I have a stream (string) of base64 data -
> codec gives me an input stream which is fed from this
> source, does not allocate too much memory, and behaves
> politely...

The current framework doesn't handle direct conversion from an input stream to an output stream, but this would be simple to add if required; the sketch below shows roughly what the import case could look like. Again, the hard part would be the char/byte issues.
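For the import scenario, something along these lines would work even with the existing buffer-based codec. This is illustrative only: it assumes the encoded input contains no whitespace or line breaks (in the streamable framework, line unwrapping would be a separate codec chained in front), and it still allocates a small result array per chunk, which the engine-based approach avoids.

import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.codec.binary.Base64;

// Sketch only: decodes base64 from an underlying stream in bounded chunks.
public class Base64DecodingInputStream extends InputStream {

    private final InputStream in;
    private final byte[] pending = new byte[4096]; // encoded bytes awaiting a full quantum
    private int pendingLen = 0;
    private byte[] decoded = new byte[0];
    private int pos = 0;
    private boolean eof = false;

    public Base64DecodingInputStream(InputStream in) {
        this.in = in;
    }

    public int read() throws IOException {
        while (pos >= decoded.length) {
            if (!fill()) {
                return -1;
            }
        }
        return decoded[pos++] & 0xff;
    }

    public void close() throws IOException {
        in.close();
    }

    // Reads more encoded data and decodes the largest whole number of
    // 4-byte base64 quanta, carrying any remainder over to the next call.
    private boolean fill() throws IOException {
        if (eof) {
            return false;
        }
        int n = in.read(pending, pendingLen, pending.length - pendingLen);
        if (n < 0) {
            eof = true;
        } else {
            pendingLen += n;
        }
        int usable = eof ? pendingLen : pendingLen - (pendingLen % 4);
        if (usable == 0) {
            return !eof; // need more input, or genuinely finished
        }
        byte[] chunk = new byte[usable];
        System.arraycopy(pending, 0, chunk, 0, usable);
        System.arraycopy(pending, usable, pending, 0, pendingLen - usable);
        pendingLen -= usable;
        decoded = Base64.decodeBase64(chunk);
        pos = 0;
        return true;
    }
}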
RE: [codec] Streamable Codec Framework
> > I noticed Alexander Hvostov's recent email containing streamable
> > base64 codecs. Given that the current codec implementations are
> > oriented around in-memory buffers, is there room for an
> > alternative codec framework supporting stream functionality? I
> > realise the need for streamable codecs may not be that great but
> > it does seem like a gap in the current library.
>
> I'm in the need. So we are at least 3 :)
>
> > Some of the goals I was working towards were:
> > 1. No memory allocation during streaming. This eliminates
> > garbage collection during large conversions.
>
> Cool. I got large conversions... I'm already at
> mediumblob in mysql, and it goes up/down XML stream :)

I have a lot to learn here. While I have some knowledge of XML (like every other developer on the planet), I have never used it for large data sets or used SAX parsing. Sounds like a good test to find holes in the design :-)

> > 4. Customisable receivers. All codecs utilise receivers to
> > handle conversion results. This allows different outputs such as
> > streams, in-memory buffers, etc to be supported.
>
> And writers :) Velocity directives use them.

Do you mean java.io.Writer? If so, I haven't included direct support for them because I focused on raw byte streams. However, it shouldn't be hard to add a receiver to write to java.io.Writer instances - a rough sketch follows below.

> I'll give it a look and come back later today :)

I look forward to your feedback.
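Such a receiver could be as simple as the following sketch. The receive signature here is illustrative rather than the framework's actual Receiver interface; it relies on base64/hex output being pure ASCII, so a plain byte-to-char widening is safe.

import java.io.IOException;
import java.io.Writer;

// Sketch only: forwards codec output to a java.io.Writer. The char buffer
// is reused across calls, so no memory is allocated during streaming.
public class WriterReceiver {

    private final Writer writer;
    private final char[] chars = new char[4096];

    public WriterReceiver(Writer writer) {
        this.writer = writer;
    }

    // Called by a codec engine with each block of converted data.
    public void receive(byte[] data, int off, int len) throws IOException {
        for (int done = 0; done < len; ) {
            int n = Math.min(len - done, chars.length);
            for (int i = 0; i < n; i++) {
                // Base64/hex output is ASCII, so widening is lossless.
                chars[i] = (char) (data[off + done + i] & 0xff);
            }
            writer.write(chars, 0, n);
            done += n;
        }
    }
}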
[codec] Streamable Codec Framework
I just realised I left off "[codec]" in the subject. Sorry about that.

-----Original Message-----
From: Brett Henderson [mailto:[EMAIL PROTECTED]
Sent: Monday, 3 November 2003 10:47 AM
To: [EMAIL PROTECTED]
Subject: Streamable Codec Framework

Hi All,

I noticed Alexander Hvostov's recent email containing streamable base64 codecs. Given that the current codec implementations are oriented around in-memory buffers, is there room for an alternative codec framework supporting stream functionality? I realise the need for streamable codecs may not be that great, but it does seem like a gap in the current library.

I have done some work in this area over the last couple of months as a small hobby project and have produced a small framework for streamable codecs. Some of the goals I was working towards were:

1. No memory allocation during streaming. This eliminates garbage collection during large conversions.
2. Pipelineable codecs. This allows multiple codecs to be chained together and treated as a single codec, so codecs such as base64 can be broken into two components (base64 and line wrapping codecs).
3. Single OutputStream and InputStream implementations which utilise codec engines internally. This eliminates the need to produce a buffer-based engine and a stream engine for every codec. Note that this requires codec engines to be written in a manner that supports streaming.
4. Customisable receivers. All codecs utilise receivers to handle conversion results. This allows different outputs such as streams, in-memory buffers, etc to be supported.
5. Direction agnostic codecs. Decoupling the engine from the streams allows the engines to be used in ways other than originally intended, i.e. you can perform base64 encoding during reads from an InputStream.

I have produced base64 and ASCII hex codecs as a proof of concept and to evaluate performance. It isn't as fast as the current buffer-based codecs, but it is unlikely to ever be as fast due to the extra overheads associated with streaming. Both the base64 and ASCII hex implementations can produce a data rate of approximately 40MB/sec on a Pentium Mobile 1.5GHz notebook. With some performance tuning I'm sure this could be improved; I think array bounds checking is the largest performance hit.

The framework currently requires JDK1.4 (exception handling requires rework for JDK1.3). Running ant without arguments in the root directory will build the project, run all unit tests and run the performance tests. Note that the tests require junit to be available within ant. Javadocs are the only documentation at the moment.

Files can be found at: http://www32.brinkster.com/bretthenderson/BHCodec-0.2.zip

I hope someone finds this useful. I'm not trying to force my implementation on anybody and I'm sure it could be improved in many ways; I'm simply putting it forward as an optional approach. If it is decided that streamable codecs are a useful addition to commons I'd be glad to help.

Cheers,
Brett

PS. Some areas that currently need improving are:
1. Exception handling requires JDK1.4; it should be rewritten to support older Java versions.
2. BufferReceiver allocates memory continuously during streamed conversions; it should be fixed to recycle memory buffers.
3. Engines should have a new flush method added to allow them to hold off posting to receivers until their internal buffers fill up. This would prevent fragmented buffers during pipelined conversions.
4. OutputStream flush needs rework; it shouldn't call finalize, it should call the new flush method on the CodecEngines.
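To make the engine/receiver structure above concrete, the framework boils down to something like the following. These are illustrative reductions only - the actual names and signatures in BHCodec may differ.

// Sketch only: the rough shape of the engine/receiver split.

// A codec engine converts byte blocks and posts results to a receiver.
interface CodecEngine {
    void process(byte[] data, int off, int len, Receiver out) throws CodecException;
    // The "finalize" concept: process the final block, e.g. base64 padding.
    void finish(Receiver out) throws CodecException;
}

// A receiver handles converted output: a stream, an in-memory buffer,
// or another engine.
interface Receiver {
    void receive(byte[] data, int off, int len) throws CodecException;
}

class CodecException extends Exception {
}

// Pipelining: a receiver that feeds a downstream engine, so two engines
// (e.g. base64 + line wrapping) can be treated as a single codec.
class EngineReceiver implements Receiver {
    private final CodecEngine next;
    private final Receiver out;

    EngineReceiver(CodecEngine next, Receiver out) {
        this.next = next;
        this.out = out;
    }

    public void receive(byte[] data, int off, int len) throws CodecException {
        next.process(data, off, len, out);
    }
}

With this shape the base64 engine stays ignorant of line wrapping: chain base64 into a wrapping engine via an EngineReceiver and hand the pair to the single OutputStream implementation. It is also what makes the codecs direction agnostic - the same engine can be driven from an InputStream's read path or an OutputStream's write path.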