On Wed, 2009-01-07 at 10:39 +0000, Andrew Church wrote:
> I think you're misunderstanding my approach. It's not an issue of
> "more" or "less" frames; I want to treat the two streams of frames as
> entirely separate, so filters are free to add, delete, reorder, delay
> (as with smartyuv and such), or do whatever they want and the core
> doesn't have to worry about keeping track. If anything, doing away
> with zero copy for now and making each filter copy frames to a
> distinct output buffer will help enforce this idea of having
> unconnected input and output streams, I think.
OK, here we go again, next try :) I'm going to explain my pet project, which I feel is very similar to the one you're proposing. Then I'll try to list the differences/similarities with respect to yours. This will be a *long* mail. :^)

Anyway, before starting, there is a key issue I have to raise. It sounds like we're talking about pretty radical changes in how transcode (the executable) works (I surely will), but I really want shorter development cycles for 1.2.0 and beyond. 1.1.0 took 2 years and counting. We know the first cause of that, by far, was lack of time, but I think it is better to use our development time efficiently.

Since we're talking about radical changes, and since there is a repository change in sight, what I'm proposing (and what I would *really* like to do) is a fresh start. Instead of modifying `transcode', let's add a new executable with its own source tree, keeping `transcode' fairly stable, without changing its fundamentals like the framebuffering model, processing stages, threading and so on. I volunteer for that. In practice, this means (e.g. in CVS HEAD):

    cd transcode-HEAD
    (cvs) mv src transcode
    (fix Makefiles and stuff)
    mkdir transcode-ng-whatever
    cvs add transcode-ng-whatever
    cd transcode-ng-whatever
    while !satisfied || !release-due {
        hack-here
        apply-some-fixes-back-to-transcode
    }

By sharing the whole module set and the internal libraries (and maybe more), that's no big deal IMO.

+++

The headline first. My pet project is to add a scripting interface to transcode and to build something similar *in principle* to avisynth (www.avisynth.org). This essentially needs:

1. a scripting interface
2. a whole new way for the modules to talk to each other and interchange data.

For #1, see
http://tcforge.berlios.de/archives/2008/10/11/scripting_interface/index.html
and/or drop me a mail separately. For #2, that's pretty much a synonym of "new framebuffer model". So, let's go.
By building a processing pipeline using transcode (but this is also true in general, with minor modifications, for other multimedia tools) we're essentially building a hidden tree of processing stages. Each processing stage is implemented by 0+ modules. The root is always the *muxer*, and processing stages which are no-ops can be elided for the sake of simplicity. The job of the framebuffer is to connect the stages and let them interchange data. Let's see a few examples.

1) transcode -i /dev/zero -o /dev/null

translates into the graph

    demux_null(A)->+->mux_null
                   |
    demux_null(V)--'

while something like

2) transcode -i file.avi -J levels,smartyuv,xsharpen -y xvid,lame,avi

translates into the graph (assuming file.avi is MJPEG/PCM)

    demux_avi(A)---------------------------------------------------------------------->encode_lame->+->mux_avi
                                                                                                    |
    demux_avi(V)->decode_ffmpeg(V)->filter_levels(V)->filter_smartyuv->filter_xsharpen->encode_xvid-'

A simpler example can be

3) transcode -i foo.bar --mplayer_probe -y xvid,lame,avi

    demux_mplayer(A)->encode_lame->+->mux_avi
                                   |
    demux_mplayer(V)->encode_xvid--'

From here on, and for the sake of brevity, I'll use "upstream" to mean a module that is nearer to the demuxer than a given one, and "downstream" to mean a module that is nearer to the muxer. Examples:

- in 1) above, the demux_null modules are upstream of mux_null;
- in 2) above, encode_lame is downstream of demux_avi(A);
- in 2) above, filter_levels is upstream of filter_smartyuv, which has filter_xsharpen downstream.

The way the transcode processing stages interchange data can be seen as a "push model", because every stage

1. does its job unconditionally (there are separate threads for each stage);
2. pushes each finished frame to the next stage;
3. pauses when the next stage can't fetch data.

That raises the need for buffering between stages, which is currently implemented using the central framebuffer code.

The new model is instead a "pull model": each stage pulls a frame from upstream as soon as it is ready to process it.
So, the central framebuffer is gone. Each stage talks directly with its upstream, so it has to link to exactly that module, instead of sitting and waiting for frames to come in on its FIFO. All this fancy stuff can be implemented fairly efficiently if we require that the caller provides the *destination* buffer for the upstream. Let's see some pseudocode:

    typedef struct tcprocessoritem_ TCProcessorItem;
    struct tcprocessoritem_ {
        TCProcessorItem *upstream;
        TCModule        *module;
        TCFrameBuffer   *frame;
        int (*get_frame)(TCProcessorItem *P, TCFrameBuffer *frame);
    };

(Let's consider only a generic

    int tc_module_process(TCModule *M, TCFrameBuffer *src, TCFrameBuffer *dst);

for simplicity; dealing with the actual module operations requires few changes.)

The get_frame callback is implemented like:

    static int generic_get_frame(TCProcessorItem *P, TCFrameBuffer *frame)
    {
        int err = TC_OK;
        if (!P->frame) {
            P->frame = tc_framebuffer_alloc(); /* lifetime equals the processor's */
        }
        err = P->upstream->get_frame(P->upstream, P->frame);
        if (!err) {
            err = tc_module_process(P->module, P->frame, frame);
        }
        return err;
    }

The leaves of the tree are special processor items (the demuxers) which effectively produce frames, dealing only with the destination buffer:

    static int source_get_frame(TCProcessorItem *P, TCFrameBuffer *frame)
    {
        return tc_module_demultiplex(P->module, frame, NULL);
        /* or return tc_module_demultiplex(P->module, NULL, frame); */
    }

We also have the root, which does something like:

    TCProcessorItem *A = get_audio_subtree();
    TCProcessorItem *V = get_video_subtree();
    TCFrameBuffer *AB = tc_framebuffer_alloc();
    TCFrameBuffer *VB = tc_framebuffer_alloc();

    while (!interrupted) {
        int err;
        err = A->get_frame(A, AB);
        if (err) {
            break;
        }
        err = V->get_frame(V, VB);
        if (err) {
            break;
        }
        tc_module_multiplex(Muxer, AB, VB);
    }

+++

That's essentially the core of my idea. We get (much) finer grained processing stages, no fixed slots, no need for frame IDs/sequence numbers, and the core doesn't need to keep track of anything.
We get fairly efficient frame passing (and there is room for further improvement, but in this design we simply don't have no-op stages :) There are of course open issues. Skipped frames are easy, since they are transparent to downstream. Cloned frames should take minimal effort (internal buffering, which should be easily generalized).

Of course the exposition above is the bare minimum, yet already pretty long, and a lot of things have to be defined exactly before we go, but IMHO the very basic idea is sound and VERY appealing to me (and BTW already implemented, even if not exactly like that, in some other software: for example, libavfilter [docs, not code] was an inspiration for me in thinking all of the above).

+++

I see some similarities with Andrew's proposal (maybe I'm wrong again, btw =)), and maybe this one is even more radical than Andrew's. I think the key point in both ideas is to eventually get rid of the central framebuffer, and that was one of the reasons I had a hard time understanding: as stated at the very beginning, I *DO NOT* want to get rid of the central framebuffer in the legacy transcode.

Still a lot to say, but a lot already said, and I don't want to DoS the list/discussion, so that's all for now. Comments welcome. (And don't be afraid to say "you still don't get it!" if that's the case ;))

Bests,
-- 
Francesco Romani // Ikitt
http://fromani.exit1.org  ::: transcode homepage
http://tcforge.berlios.de ::: transcode experimental forge