Hi,

On Sunday 11 January 2009, Francesco Romani wrote:
> I do agree completely.
> I've thought a bit more on this subject, and quickly come out with the
> following: a low-hanging fruit *seems to be* to use double buffering for
> work buffers:
>
> typedef struct tcprocessingnode_ TCProcessingNode;
> struct tcprocessingnode_ {
>       TCProcessingNode *upstream;
>       TCModule         *module;
>       TCFrameBuffer    *frames[2];
>       int              work; /* index of the framebuffer to use */
>
>       int              (*request_frame)(TCProcessorItem *P,
>                                         TCFrameBuffer *frame);
>       int              (*wait_frame)(TCProcessorItem *P,
>                                      TCFrameBuffer *frame);
> };
>
> static int generic_get_frame_mt_friendly(TCProcessorItem *P,
>                                          TCFrameBuffer *frame) {
>     int err = TC_OK;
>     /* init postconditions: frames[2] point to valid framebuffers, work=0 */
>     /* async, in a separate thread */
>     P->upstream->request_frame(P->upstream, P->frames[!P->work]);
>     /* blocks until ready, hopefully 99% of times just not blocks */
>     P->upstream->wait_frame(P->upstream, P->frames[P->work]);
>     err = tc_module_process(P->module, P->frames[P->work], frame);
>     P->work = !P->work; /* done with current buffer, so rotate it */
>     return err;
> }
>
> Locking and blocking has to be sorted out clearly (the above is just
> lacking) but the core idea is to let current step and upstream (and so on
> until leafnodes) to work in parallel. Hope it's clear enough, too :)
>
I think there is a better way to deal with it. Why not have a FIFO at each
edge of the tree? Each node runs in its own thread and basically asks the
upstream FIFO(s) for a frame, processes it, and stores it in the downstream
FIFO(s). It only blocks if no frames are available; otherwise it can
run at full speed. A demux or mux node simply has more than one
incoming or outgoing edge (FIFO). There are several ways to do it: either
the FIFOs live in the nodes or they are separate objects. Let's say we keep
them separate for now.
Sketch in pseudocode:
ProcessingNode {
        ProcessingEdge **parents;
        int numparents;
        ProcessingEdge **children;
        int numchildren;
        Module* module;
        void run(ProcessingNode*,Module*); // runs in a thread
        // eventually
        // once at the initial phase (single threaded)
        void processFrameInfo(ProcessingNode*,Module*);
}
void run(....){
        pull frames from parents (maybe paired with the FrameInfo)
        process frames with module
        push frames to children
}
processFrameInfo(....){
        get frame info(s) from parents
        ask module to emit possibly changed version
        store new frame info in children
}
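
To make the setup phase a bit more concrete, here is a rough C sketch. It
assumes a linear chain of 1:1 nodes, with the info flowing from the source
towards the sink, and a made-up FrameInfo that only carries width and height.
InfoFn, keep_size, halve_size and propagate_info are hypothetical names for
illustration, not existing transcode API.

```c
#include <stddef.h>

/* Hypothetical frame meta data; a real structure would carry more. */
typedef struct {
    int width;
    int height;
} FrameInfo;

/* A module callback mapping its input frame info to its output info. */
typedef FrameInfo (*InfoFn)(FrameInfo in);

/* Example module: a filter that leaves the geometry alone. */
static FrameInfo keep_size(FrameInfo in) {
    return in;
}

/* Example module: a resizer that halves both dimensions. */
static FrameInfo halve_size(FrameInfo in) {
    FrameInfo out = { in.width / 2, in.height / 2 };
    return out;
}

/*
 * Walk the chain once, single threaded, before any node thread starts.
 * edges[i] is the info on the FIFO feeding node i and edges[i + 1] the
 * info on the FIFO it writes to, so edges needs room for n + 1 entries.
 */
static void propagate_info(const InfoFn *nodes, size_t n,
                           FrameInfo source, FrameInfo *edges) {
    edges[0] = source;
    for (size_t i = 0; i < n; i++)
        edges[i + 1] = nodes[i](edges[i]);
}
```

With a chain of keep_size, halve_size, keep_size and a 720x576 source, the
last two edges would carry 360x288.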


The ProcessingEdge is a FIFO with frame meta info.
ProcessingEdge {
        Frame **buffer;
        int length;
        int indexIn;
        int indexOut;
        push(...);
        pull(...);
        // eventually also
        FrameInfo* info;
}
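
A minimal C sketch of such an edge and of a 1:1 node running on top of it,
using POSIX threads. Several things here are my assumptions and not part of
the proposal above: the FIFO is bounded (so push blocks when it is full),
frames are plain void pointers, a NULL frame marks the end of the stream, and
increment is only a stand-in for a real module.

```c
#include <pthread.h>
#include <stddef.h>

#define EDGE_CAPACITY 8 /* arbitrary FIFO depth for this sketch */

typedef struct {
    void           *buffer[EDGE_CAPACITY]; /* ring buffer of frame pointers */
    size_t          index_in;              /* next slot to write */
    size_t          index_out;             /* next slot to read */
    size_t          count;                 /* frames currently queued */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} ProcessingEdge;

static void edge_init(ProcessingEdge *e) {
    e->index_in = e->index_out = e->count = 0;
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->not_empty, NULL);
    pthread_cond_init(&e->not_full, NULL);
}

/* Append a frame; blocks only while the FIFO is full. */
static void edge_push(ProcessingEdge *e, void *frame) {
    pthread_mutex_lock(&e->lock);
    while (e->count == EDGE_CAPACITY)
        pthread_cond_wait(&e->not_full, &e->lock);
    e->buffer[e->index_in] = frame;
    e->index_in = (e->index_in + 1) % EDGE_CAPACITY;
    e->count++;
    pthread_cond_signal(&e->not_empty);
    pthread_mutex_unlock(&e->lock);
}

/* Take the oldest frame; blocks only while the FIFO is empty. */
static void *edge_pull(ProcessingEdge *e) {
    pthread_mutex_lock(&e->lock);
    while (e->count == 0)
        pthread_cond_wait(&e->not_empty, &e->lock);
    void *frame = e->buffer[e->index_out];
    e->index_out = (e->index_out + 1) % EDGE_CAPACITY;
    e->count--;
    pthread_cond_signal(&e->not_full);
    pthread_mutex_unlock(&e->lock);
    return frame;
}

/* A 1:1 node: one parent edge, one child edge, one processing callback. */
typedef struct {
    ProcessingEdge *parent;
    ProcessingEdge *child;
    void *(*process)(void *frame);
} ProcessingNode;

/*
 * Thread body. A NULL frame is treated as end of stream (an assumption
 * of this sketch) and is forwarded so that downstream nodes stop too.
 */
static void *node_run(void *arg) {
    ProcessingNode *n = arg;
    for (;;) {
        void *frame = edge_pull(n->parent);
        if (frame == NULL) {
            edge_push(n->child, NULL);
            return NULL;
        }
        edge_push(n->child, n->process(frame));
    }
}

/* Stand-in for a module: treats the frame as an int and increments it. */
static void *increment(void *frame) {
    int *value = frame;
    (*value)++;
    return frame;
}
```

The core would then start each node with something like
pthread_create(&tid, NULL, node_run, &node) after wiring up the edges.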

In practice we could actually hand-code different types of processing nodes:
one for 1-to-1 mapping and others for the mux and demux types. In the core we
only have to assemble the graph and kick off the threads.
I know that we don't want to change the interface for the modules, but in the
long run I think we should use the new architecture to allow the frame meta
data to change, e.g. to allow resizing and the like anywhere in the chain.
Also, modules could get multiple frames as input at once. Currently the
filters can already specify how many frames they need, but they still have to
store them internally. Anyway, this is just a minor thing.

Now if we really want multithreaded nodes, say for encoding, then we can have
a special ProcessingNodeMT that handles the frame order and so on. If we
restrict ourselves to parallelising modules that do not change the frame
order and the like, we get away with a simple sliding window or heap. From
the outside it looks the same, so this node also delivers ready frames to the
next FIFO.
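
The sliding window could be sketched in C roughly as below. This assumes
that every frame gets a sequence number when it enters the node and that the
workers finish frames in arbitrary order. ReorderWindow, window_put and
window_get are made-up names, and the locking a real multithreaded node
needs is left out on purpose.

```c
#include <stddef.h>

#define WINDOW 4 /* max frames in flight; arbitrary for this sketch */

typedef struct {
    void         *slot[WINDOW];  /* finished frames waiting for their turn */
    int           ready[WINDOW]; /* 1 if the matching slot holds a frame */
    unsigned long next;          /* sequence number to deliver next */
} ReorderWindow;

static void window_init(ReorderWindow *w) {
    for (size_t i = 0; i < WINDOW; i++) {
        w->slot[i] = NULL;
        w->ready[i] = 0;
    }
    w->next = 0;
}

/*
 * Store a finished frame under its sequence number. Returns 0 if seq
 * lies outside the current window, in which case the worker has to
 * wait until earlier frames have been delivered.
 */
static int window_put(ReorderWindow *w, unsigned long seq, void *frame) {
    if (seq < w->next || seq >= w->next + WINDOW)
        return 0;
    w->slot[seq % WINDOW] = frame;
    w->ready[seq % WINDOW] = 1;
    return 1;
}

/*
 * Hand out the next frame in the original order. Returns 1 and stores
 * the frame in *frame, or returns 0 if that frame is not finished yet.
 */
static int window_get(ReorderWindow *w, void **frame) {
    size_t i = w->next % WINDOW;
    if (!w->ready[i])
        return 0;
    *frame = w->slot[i];
    w->ready[i] = 0;
    w->next++;
    return 1;
}
```

The node would call window_get in a loop after every window_put and push
whatever comes out to the downstream FIFO, so from the outside the frames
still arrive in order.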

Concerning the memory, I would say that normally modules can just pass the
frame pointers through as long as they don't change the frame size and the
like. If they delete a frame, I think it is obvious that they delete the data
as well. If frames are cloned or modified in size, then the modules have to
allocate memory for them - also pretty obvious I think.
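
Spelled out as a small C sketch, these ownership rules could look like the
following. The Frame type with a single 8-bit plane and the functions
frame_new, frame_free, module_invert, module_halve and module_drop are all
hypothetical and only meant to illustrate the three cases.

```c
#include <stdlib.h>

/* Hypothetical frame: one 8-bit plane of width * height bytes. */
typedef struct {
    int            width;
    int            height;
    unsigned char *data;
} Frame;

static Frame *frame_new(int width, int height) {
    Frame *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->data = calloc((size_t)width * (size_t)height, 1);
    if (f->data == NULL) {
        free(f);
        return NULL;
    }
    f->width = width;
    f->height = height;
    return f;
}

/* Deleting a frame always deletes its data as well. */
static void frame_free(Frame *f) {
    if (f != NULL) {
        free(f->data);
        free(f);
    }
}

/* Same size in and out: work in place and pass the pointer through. */
static Frame *module_invert(Frame *in) {
    size_t n = (size_t)in->width * (size_t)in->height;
    for (size_t i = 0; i < n; i++)
        in->data[i] = 255 - in->data[i];
    return in;
}

/* Size changes: the module allocates the output and frees its input. */
static Frame *module_halve(Frame *in) {
    Frame *out = frame_new(in->width / 2, in->height / 2);
    if (out == NULL)
        return NULL; /* on failure the caller still owns 'in' */
    for (int y = 0; y < out->height; y++)
        for (int x = 0; x < out->width; x++)
            out->data[y * out->width + x] =
                in->data[(2 * y) * in->width + 2 * x];
    frame_free(in);
    return out;
}

/* Dropping a frame: free it and hand nothing downstream. */
static Frame *module_drop(Frame *in) {
    frame_free(in);
    return NULL;
}
```

The point of the design is that exactly one party owns a frame at any time:
whoever pulled it from a FIFO, until it is pushed on or freed.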

Just my 2 cents.
Regards!
        Georg

-- 
---- Georg Martius,  Tel: +49 177 6413311  -----
------- http://www.flexman.homeip.net ----------
