I would note that if this does end up in the specification, it can carry a rider that implementations should exhibit behaviour as demonstrated by the algorithm, not that they need to implement the algorithm itself.
However I leave it to the working group to ascertain whether they prefer this as a means of specification. Regards Keith > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Paul Kyzivat > Sent: Tuesday, September 30, 2008 2:53 PM > To: [EMAIL PROTECTED] > Cc: [email protected] > Subject: Re: [Sip] Is body handling NP-complete? > > Dale, > > I like the thorough treatment. It seems plausible to me. > > One implication of this is that body parts need to be > processed twice in a content-type+content-disposition+context > dependent way: once to extract references, and again for > final effect. (Hmm - is extracting of references dependent on > the disposition and context?) > > If you imagine that various content types are handled by > plugins, then the plugins will each require two entry points > - one for each of these processing modes. While that is > certainly possible, it might mean that plugins that already > exist for non-sip applications (e.g. email or html) may not > be reusable for sip. (I am not familiar with the content-type > plugins for other protocols such as email. But I presume that > email plugins may be useful in some cases, at least for MESSAGE.) > > Do plugins for email have this complexity? > > An alternative is to do all the processing in one pass, and > then at the end check if all body parts have been processed. > BUT that means all the side effects of processing will have > been realized before discovering that the message is in > error. That is certainly not ideal. > > > This example also points out that the requirement that references > > appear before the parts they reference is needed to make > the problem > > tractable. Otherwise, there could be circular dependencies between > > the choices for different multipart/alternatives, which might be > > impossible to analyze by an algorithm as simple as a tree-walk. > > This requirement is satisfied by lots of simple cases. But it > means that it is impossible to describe any kind of circular > structure. I'm not familiar enough with the use of > multipart/related to know if it is a restriction that is > tolerable there. Perhaps someone can comment on that. > > Thanks, > Paul > > [EMAIL PROTECTED] wrote: > > Referring back to last March, in > > > http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was > > contemplating whether the rules for testing a message body required > > searching through combinations of choices regarding which > alternatives > > of multipart/alternatives are chosen. As I then conceptualized the > > problem, the question was "Is there a choice of > alternatives such that > > all the constraints for 'successful processing' can be > met?" It turns > > out that if you phrase the question that way, the problem is > > NP-complete, which means in practice that any algorithm to solve it > > must do backtracking. > > > > The new formulation of the problem attempts to avoid this > by dividing > > the problem into three phases: (1) assemble a tree which > represents > > the dependencies of super-parts' being processable upon that of > > sub-parts, (2) walk the tree, tagging parts from the bottom > up as to > > whether they are processable, (3) if the overall tree is > processable, > > selecting for processing the correct subset of parts as directed by > > the tags in the tree. As long as the size of the tree is > commensurate > > with the number of body parts, the algorithm is inherently > linear, and > > clearly it doesn't involve backtracking. > > > > In regard to step (1), the tree is assembled: > > > > - the root node represents the message body > > - a node for a multipart/alternative part has one child for each > > component > > - a node for a multipart/mixed part has one child for each > > component > > - a note for a multipart/related part is not automatically given > > children for its components > > - a header in a part that refers to another component generates a > > child of that part that represents the referred-to component > > - similarly, a reference within a part that refers to another > > component generates a child of that part that represents the > > referred-to component (This is how multipart/related nodes get > > children.) > > > > Of course, if a component is referred to multiple times, > the component > > is represented by multiple nodes in the tree, one node for > each of its > > roles. > > > > However, there are subtleties I might be overlooking, so I > am going to > > revisit the embedding of the NP-complete problem "3-satisfiability" > > into MIME to see what it reveals about my sketch of an algorithm. > > > > Referring to the previous message: > > > > The reduction is as follows: > > > > http://en.wikipedia.org/wiki/NP-complete > > http://en.wikipedia.org/wiki/Reduction_%28complexity%29 > > http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability > > > > Choose an instance of the 3-satisfiability problem, > that is: a set of > > boolean variables A, B, C, etc. and a formula composed of: > > a series of "clauses" joined by AND, consisting of > > a series of variables and their negations > joined by OR. > > The problem is to determine if there is any assignment > of truth values > > to the variables which makes the formula true, that is, > which makes > > true one of the variables or negations in each of the clauses. > > > > For example, choose 4 variables A, B, C, and D, and the formula > > > > (A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B > OR NOT-D) > > > > This can be satisfied by setting A=TRUE, B=FALSE, > C=TRUE, and D=FALSE. > > > > Construct a multipart/mixed with each component labeled > > handling=required. Within that, for each variable, > have a part which > > is multipart/alternate with 2 parts. If processing > chooses the first > > part of the part corresponding to a variable, that will > model the > > variable being TRUE. If processing chooses the second > part of the > > part corresponding to a variable, that will model the > variable being > > FALSE. > > > > For example, our body starts: > > > > multipart/mixed > > multipart/alternative;handling=required > > first part corresponding to A (not > yet specified) > > second part corresponding to NOT-A > > multipart/alternative;handling=required > > first part corresponding to B > > second part corresponding to NOT-B > > multipart/alternative;handling=required > > first part corresponding to C > > second part corresponding to NOT-C > > multipart/alternative;handling=required > > first part corresponding to D > > second part corresponding to NOT-D > > > > For each clause of the formula, add an additional body > part, marked > > "Content-Disposition: by-reference; handling=required". > Give the part > > a 'cid' URL. The ability of the recipient to > successfully process > > this part will model making the corresponding clause TRUE. > > > > For example, our body is now: > > > > multipart/mixed > > multipart/alternative;handling=required > > first part corresponding to A (not > yet specified) > > second part corresponding to NOT-A > > multipart/alternative;handling=required > > first part corresponding to B > > second part corresponding to NOT-B > > multipart/alternative;handling=required > > first part corresponding to C > > second part corresponding to NOT-C > > multipart/alternative;handling=required > > first part corresponding to D > > second part corresponding to NOT-D > > part (not otherwise > specified);by-reference;handling=required;cid=clause1 > > part (not otherwise > specified);by-reference;handling=required;cid=clause2 > > part (not otherwise > > specified);by-reference;handling=required;cid=clause3 > > > > We now want to arrange things so that one of the last 3 > parts can be > > successfully processed only if we choose for processing > one of the > > earlier parts that corresponds to a variable value that > will make the > > corresponding clause true. This is done by embedding forward > > references in the variable-corresponding parts that refer to the > > clause-corresponding parts. > > > > For example: > > > > 1 multipart/mixed > > 2 multipart/alternative;handling=required > > 3 first part corresponding to A > > references cid:clause1 > > 4 second part corresponding to NOT-A > > references cid:clause3 > > 5 multipart/alternative;handling=required > > 6 first part corresponding to B > > references cid:clause1 > > references cid:clause2 > > 7 second part corresponding to NOT-B > > references cid:clause3 > > 8 multipart/alternative;handling=required > > 9 first part corresponding to C > > references cid:clause1 > > 10 second part corresponding to NOT-C > > references cid:clause2 > > 11 multipart/alternative;handling=required > > 12 first part corresponding to D > > references cid:clause2 > > 13 second part corresponding to NOT-D > > references cid:clause3 > > 14 part;by-reference;handling=required;cid=clause1 > > 15 part;by-reference;handling=required;cid=clause2 > > 16 part;by-reference;handling=required;cid=clause3 > > > > In order to process this body, the recipient must > choose exactly one > > of each pair of bodies in the multipart/alternative > body parts. This > > choice is equivalent to choosing a set of values for > the variables. > > The references embedded in the chosen parts refer to > the body parts > > that correspond to the clauses which are made true by the > > corresponding variable's value. In order to > successfully process the > > final body parts, the recipient must have chosen a set > of variable > > values that makes all of the clauses true. > > > > The resulting tree is: > > > > 1 -+- 2 --+- 3 ---- 14 > > | | > > | +- 4 ---- 16 > > | > > +- 5 --+- 6 --+- 14 > > | | | > > | | +- 15 > > | | > > | +- 7 ---- 16 > > | > > +- 8 --+- 9 ---- 14 > > | | > > | +- 10 --- 15 > > | > > +- 11 -+- 12 --- 15 > > | | > > | +- 13 --- 16 > > | > > +- 14 > > | > > +- 15 > > | > > +- 16 > > > > Though the number of nodes is greater than the number of > body parts, > > it's limited approximately by the number of appearances of > variables > > in the original boolean formula, so it's linear in the > number of body > > parts. So in building the tree, we don't have exponential effort. > > > > In the new approach, we walk the tree, tagging each node as > to whether > > it is processable. In the above model, we assume that all > the simple > > parts are processable, so all nodes labeled 2 through 16 are marked > > processable. And as we tag the nodes for processability, we choose > > for each multipart/alternative node which child will be processed > > (namely, the last child that is processable). In this tree, the > > choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13. > > > > Where things get interesting is deciding if node 1 is processable. > > All of its children are processable, of course. But the > crux of the > > construction is how the "by-reference;handling=required" > constraints > > on nodes 14, 15, and 16 are handled. In this algorithm, we check > > whether each part is referenced by a part that will be processed. > > Fortunately, all references to these nodes must have been > tree-walked > > before node 1 is examined, so can determine unambiguously > whether used > > references point to those nodes. In this case, we have chosen to > > process 2, 4, 5, 7, 8, 10, 11, and 13, which means that > there are used > > references to 16, 16, 15, and 16. This leaves 17 > unreferenced, and so > > node 1 is determined to be unprocessable. > > > > Note that the new algorithm avoids being NP-complete essentially by > > making choices as soon as they are seen. In this model, > values NOT-A, > > NOT-B, NOT-C, and NOT-D are chosen outright. Since the example > > formula is not satisfied, the body that encodes it is not > processable. > > > > This example also points out that the requirement that references > > appear before the parts they reference is needed to make > the problem > > tractable. Otherwise, there could be circular dependencies between > > the choices for different multipart/alternatives, which might be > > impossible to analyze by an algorithm as simple as a tree-walk. > > > > Dale > > _______________________________________________ > > Sip mailing list https://www.ietf.org/mailman/listinfo/sip > > This list is for NEW development of the core SIP Protocol Use > > [EMAIL PROTECTED] for questions on current sip Use > > [EMAIL PROTECTED] for new developments on the application of sip > > > _______________________________________________ > Sip mailing list https://www.ietf.org/mailman/listinfo/sip > This list is for NEW development of the core SIP Protocol Use > [EMAIL PROTECTED] for questions on current sip > Use [EMAIL PROTECTED] for new developments on the application of sip > _______________________________________________ Sip mailing list https://www.ietf.org/mailman/listinfo/sip This list is for NEW development of the core SIP Protocol Use [EMAIL PROTECTED] for questions on current sip Use [EMAIL PROTECTED] for new developments on the application of sip
