Re: Unnecessary complexity in DocOpCollector

Tad Glines Fri, 12 Feb 2010 10:04:52 -0800

OK, so my initial assumption of N-1 compositions was correct and I
just made a mistake somewhere e=when walking through the algorithm.


So, what you are saying is that the cost of a compose operation is
proportional to the size (number of items) in the two BufferedDocOps.

-Tad

On Thu, Feb 11, 2010 at 7:04 PM, Alexandre Mah <[email protected]> wrote:
> The number of compose calls is always N-1.  The difference in time
> complexity is due to the amount of work the compositions need to do,
> not the number of compositions that are done.
>
> The cost of composing two operations of sizes x and y is x + y, and
> the resulting operation has size roughly x + y.
>
> Suppose you have operations A, B, C, D, E, F, G, and H of sizes a, b,
> c, d, e, f, g, and h respectively.
>
> If you compose the operations in the following way:
> ((((((AB)C)D)E)F)G)H
> then the cost of composing A and B is (approximately) a + b, the cost
> of composing AB with C is a + b + c, the cost of composing ABC with D
> is a + b + c + d, and so on.
> (To make the illustration simpler, if we just consider the case where
> a = b = c = d = e = f = g = 1, then the cost is roughly 2 + 3 + 4 + 5
> + 6 + 7 + 8, and it's easy to see that this cost is quadratic if we
> generalise this calculation to an arbritrary number of operations.)
>
> However, if you compose the operations in the following way:
> ((AB)(CD))((EF)(GH))
> then the cost of composing A and B is (approximately) a + b, the cost
> of composing C and D is c + d, the cost of composing AB with CD is a +
> b + c + d, and so on.  The total cost of this composition would be 3 *
> (a + b + c + d + e + f + g + h), where the 3 in this expression can be
> more usefully thought of as log 8 (due to there being 8 operations).
> So composing in this way gives O(n log n) time complexity.
>
> Note that both ways of composing N operations perform exactly N-1
> compositions, but one has n^2 time complexity and the other has n log
> n time complexity (where n is the total size of the N operations).
>
> Basing the DocOpCollector on the algorithm for incrementing a binary
> number is just an elegant way to perform the more efficient way of
> composing.
>
> For example, adding operations A, B, C, D, E, and F to the collector
> gives the following sequence of states for the operations List
> (DocOpCollector's private member field), as the operations are added
> one by one.
>
> [A]
> [null, BA]
> [C,    BA]
> [null, null, DCBA]
> [E,    null, DCBA]
> [null, FE,   DCBA]
>
> (Here, I've reversed the composition notation so that the first
> operation is on the right.)
>
> You'll notice that this sequence of states can be interpreted as the
> little-endian binary representations of 1, 2, 3, 4, 5, and 6, if you
> consider null entries to be 0 and non-null entries to be 1.
>
> Also, note that every one of the original operations will have an
> influence on at most log N of the N-1 compositions performed, since
> every time an operation takes part in a composition, it is "promoted"
> to reside in a location that represents a more significant binary
> digit.  So the time complexity for composing the N operations of total
> size n is O(n log N), which is actually a bit better than O(n log n).
>
> Alex
>
> On 2/12/10, Tad Glines <[email protected]> wrote:
>> Ok, I see it now. For number of adds N where N is between 1 and 7 the
>> number of compose calls is N-1, but after that the number of calls to
>> compose per add starts dropping such that when N = 16 the number of
>> compose calls is only 13.
>>
>> -Tad
>>
>> On Thu, Feb 11, 2010 at 5:29 PM, Alexandre Mah <[email protected]> wrote:
>>> On 2/12/10, Tad Glines <[email protected]> wrote:
>>>> The construction of DocOpCollector seems unnecessarily complex.
>>>>
>>>> Here's the relevant snippet:
>>>>
>>>>   private final List<BufferedDocOp> operations = new
>>>> ArrayList<BufferedDocOp>();
>>>>
>>>>   public void add(BufferedDocOp operation) {
>>>>     ListIterator<BufferedDocOp> iterator = operations.listIterator();
>>>>     while (iterator.hasNext()) {
>>>>       BufferedDocOp nextOperation = iterator.next();
>>>>       if (nextOperation == null) {
>>>>         iterator.set(operation);
>>>>         return;
>>>>       }
>>>>       iterator.set(null);
>>>>       operation = compose(nextOperation, operation);
>>>>     }
>>>>     operations.add(operation);
>>>>   }
>>>>
>>>>   public BufferedDocOp composeAll() {
>>>>     BufferedDocOp result = null;
>>>>     for (BufferedDocOp operation : operations) {
>>>>       if (operation != null) {
>>>>         result = (result != null) ? compose(operation, result) :
>>>> operation;
>>>>       }
>>>>     }
>>>>     operations.clear();
>>>>     return result;
>>>>   }
>>>>
>>>> This seems like it could be more simply (and efficiently) expresses as:
>>>>   private BufferedDocOp collectedOp = null;
>>>>
>>>>   public void add(BufferedDocOp operation) {
>>>>     if (collectedOp == null)
>>>>       collectedOp = operation;
>>>>     else
>>>>       collectedOp = compose(collectedOp, operation);
>>>>   }
>>>>
>>>>   public BufferedDocOp finish() {
>>>>     BufferedDocOp result = collectedOp;
>>>>     collectedOp = null;
>>>>     return result;
>>>>   }
>>>>
>>>> Does anyone see a problem with the proposed change?
>>>
>>> The solution you propose has an O(n^2) time complexity for composing a
>>> sequence of operations of total size n, whereas the current code has
>>> an O(n log n) time complexity.
>>>
>>> Perhaps I can explain how the DocOpCollector works, since I think it's
>>> pretty interesting.
>>> If you interpret the operations List as a (little-endian) binary
>>> number, where a null in the list indicates a 0 and a non-null
>>> operation indicates a 1, then adding an operation to the collector
>>> works exactly like incrementing its binary number representation.
>>>
>>> An interesting byproduct of this algorithm is that interpreting the
>>> operations List as a binary number will tell you exactly how many
>>> operations have been collected so far.
>>>
>>> I hope that clears things up.
>>>
>>> Alex
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Wave Protocol" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/wave-protocol?hl=en.
>>>
>>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Wave Protocol" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/wave-protocol?hl=en.
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Wave Protocol" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/wave-protocol?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups "Wave 
Protocol" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/wave-protocol?hl=en.

Re: Unnecessary complexity in DocOpCollector

Reply via email to