Title: [178857] trunk/Source/WebCore
Revision
178857
Author
benja...@webkit.org
Date
2015-01-21 13:56:22 -0800 (Wed, 21 Jan 2015)

Log Message

Handle the transition on any character as a separate type of transition
https://bugs.webkit.org/show_bug.cgi?id=140711

Reviewed by Andreas Kling.

Instead of considering the universal transition as 127 transitions, it is now
handled as a separate type of transition.

The goal is to reduce the number of exit edge to consider for each node. Instead
of having 127 for any partition containing one universal transition, we have
as few exit edges as necessary + one universal transition.

In the NFA, the universal transition is stored directly on NFANode in a new
HashSet (transitionsOnAnyCharacter).
The target nodes are made exclusive between "transitionsOnAnyCharacter" and "transitions"
by construction. That is not strictly needed but it simplify debugging at the moment.

When converting a NFA to a DFA, we first find all the node that transition on any character.
Then, when we iterate over "real" transition, we also augment that set with the set on
any character.

When creating the DFA node, we first consider each "real" transition, then we have a single
"fallback" transition for any character that has not been handled yet.

When matching, we first search for any real transition. If there is none but a fallback exists,
we take the fallback.

* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::DFA::nextState):
(WebCore::ContentExtensions::printTransitions):
(WebCore::ContentExtensions::DFA::debugPrintDot):
(WebCore::ContentExtensions::printTransition): Deleted.
* contentextensions/DFANode.h:
* contentextensions/NFA.cpp:
(WebCore::ContentExtensions::NFA::addTransition):
(WebCore::ContentExtensions::NFA::addTransitionsOnAnyCharacter):
(WebCore::ContentExtensions::printTransitions):
(WebCore::ContentExtensions::NFA::debugPrintDot):
(WebCore::ContentExtensions::printTransition): Deleted.
* contentextensions/NFA.h:
* contentextensions/NFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::populateTransitions):
(WebCore::ContentExtensions::getOrCreateDFANode):
(WebCore::ContentExtensions::NFAToDFA::convert):
* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::GraphBuilder::generateTransition):

Modified Paths

Diff

Modified: trunk/Source/WebCore/ChangeLog (178856 => 178857)


--- trunk/Source/WebCore/ChangeLog	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/ChangeLog	2015-01-21 21:56:22 UTC (rev 178857)
@@ -1,3 +1,53 @@
+2015-01-21  Benjamin Poulain  <benja...@webkit.org>
+
+        Handle the transition on any character as a separate type of transition
+        https://bugs.webkit.org/show_bug.cgi?id=140711
+
+        Reviewed by Andreas Kling.
+
+        Instead of considering the universal transition as 127 transitions, it is now
+        handled as a separate type of transition.
+
+        The goal is to reduce the number of exit edge to consider for each node. Instead
+        of having 127 for any partition containing one universal transition, we have
+        as few exit edges as necessary + one universal transition.
+
+        In the NFA, the universal transition is stored directly on NFANode in a new
+        HashSet (transitionsOnAnyCharacter).
+        The target nodes are made exclusive between "transitionsOnAnyCharacter" and "transitions"
+        by construction. That is not strictly needed but it simplify debugging at the moment.
+
+        When converting a NFA to a DFA, we first find all the node that transition on any character.
+        Then, when we iterate over "real" transition, we also augment that set with the set on
+        any character.
+
+        When creating the DFA node, we first consider each "real" transition, then we have a single
+        "fallback" transition for any character that has not been handled yet.
+
+        When matching, we first search for any real transition. If there is none but a fallback exists,
+        we take the fallback.
+
+        * contentextensions/DFA.cpp:
+        (WebCore::ContentExtensions::DFA::nextState):
+        (WebCore::ContentExtensions::printTransitions):
+        (WebCore::ContentExtensions::DFA::debugPrintDot):
+        (WebCore::ContentExtensions::printTransition): Deleted.
+        * contentextensions/DFANode.h:
+        * contentextensions/NFA.cpp:
+        (WebCore::ContentExtensions::NFA::addTransition):
+        (WebCore::ContentExtensions::NFA::addTransitionsOnAnyCharacter):
+        (WebCore::ContentExtensions::printTransitions):
+        (WebCore::ContentExtensions::NFA::debugPrintDot):
+        (WebCore::ContentExtensions::printTransition): Deleted.
+        * contentextensions/NFA.h:
+        * contentextensions/NFANode.h:
+        * contentextensions/NFAToDFA.cpp:
+        (WebCore::ContentExtensions::populateTransitions):
+        (WebCore::ContentExtensions::getOrCreateDFANode):
+        (WebCore::ContentExtensions::NFAToDFA::convert):
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::GraphBuilder::generateTransition):
+
 2015-01-20  Antti Koivisto  <an...@apple.com>
 
         REGRESSION(r178180): Membuster regressed ~4%

Modified: trunk/Source/WebCore/contentextensions/DFA.cpp (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/DFA.cpp	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/DFA.cpp	2015-01-21 21:56:22 UTC (rev 178857)
@@ -65,12 +65,17 @@
 
     const DFANode& node = m_nodes[currentState];
     auto nextNode = node.transitions.find(character);
-    if (nextNode == node.transitions.end()) {
-        ok = false;
-        return 0;
+    if (nextNode != node.transitions.end()) {
+        ok = true;
+        return nextNode->value;
     }
-    ok = true;
-    return nextNode->value;
+    if (node.hasFallbackTransition) {
+        ok = true;
+        return node.fallbackTransition;
+    }
+    ok = false;
+    return 0;
+
 }
 
 const Vector<uint64_t>& DFA::actions(unsigned currentState) const
@@ -95,9 +100,12 @@
         dataLogF("\\\\%d-\\\\%d", rangeStart, rangeEnd);
 }
 
-static void printTransition(unsigned sourceNode, const HashMap<uint16_t, unsigned>& transitions)
+static void printTransitions(const Vector<DFANode>& graph, unsigned sourceNodeId)
 {
-    if (transitions.isEmpty())
+    const DFANode& sourceNode = graph[sourceNodeId];
+    const HashMap<uint16_t, unsigned>& transitions = sourceNode.transitions;
+
+    if (transitions.isEmpty() && !sourceNode.hasFallbackTransition)
         return;
 
     HashMap<unsigned, Vector<uint16_t>, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsPerTarget;
@@ -112,7 +120,7 @@
 
     // Then we go over each one an display the ranges one by one.
     for (const auto& transitionPerTarget : transitionsPerTarget) {
-        dataLogF("        %d -> %d [label=\"", sourceNode, transitionPerTarget.key);
+        dataLogF("        %d -> %d [label=\"", sourceNodeId, transitionPerTarget.key);
 
         Vector<uint16_t> incommingCharacters = transitionPerTarget.value;
         std::sort(incommingCharacters.begin(), incommingCharacters.end());
@@ -134,6 +142,9 @@
 
         dataLogF("\"];\n");
     }
+
+    if (sourceNode.hasFallbackTransition)
+        dataLogF("        %d -> %d [label=\"[fallback]\"];\n", sourceNodeId, sourceNode.fallbackTransition);
 }
 
 void DFA::debugPrintDot() const
@@ -174,7 +185,7 @@
 
     dataLogF("    {\n");
     for (unsigned i = 0; i < m_nodes.size(); ++i)
-        printTransition(i, m_nodes[i].transitions);
+        printTransitions(m_nodes, i);
 
     dataLogF("    }\n");
     dataLogF("}\n");

Modified: trunk/Source/WebCore/contentextensions/DFANode.h (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/DFANode.h	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/DFANode.h	2015-01-21 21:56:22 UTC (rev 178857)
@@ -41,6 +41,8 @@
 class DFANode {
 public:
     HashMap<uint16_t, unsigned> transitions;
+    bool hasFallbackTransition = false;
+    unsigned fallbackTransition;
     Vector<uint64_t> actions;
 
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING

Modified: trunk/Source/WebCore/contentextensions/NFA.cpp (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/NFA.cpp	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/NFA.cpp	2015-01-21 21:56:22 UTC (rev 178857)
@@ -53,6 +53,10 @@
     ASSERT(to < m_nodes.size());
     ASSERT(character);
 
+    NFANode& fromNode = m_nodes[from];
+    if (fromNode.transitionsOnAnyCharacter.contains(to))
+        return;
+
     auto addResult = m_nodes[from].transitions.add(character, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>());
     addResult.iterator->value.add(to);
 }
@@ -66,6 +70,18 @@
     addResult.iterator->value.add(to);
 }
 
+void NFA::addTransitionsOnAnyCharacter(unsigned from, unsigned to)
+{
+    ASSERT(from < m_nodes.size());
+    ASSERT(to < m_nodes.size());
+
+    NFANode& fromNode = m_nodes[from];
+    fromNode.transitionsOnAnyCharacter.add(to);
+
+    for (auto transitionSlot : fromNode.transitions)
+        transitionSlot.value.remove(to);
+}
+
 void NFA::setFinal(unsigned node, uint64_t ruleId)
 {
     ASSERT(!m_nodes[node].finalRuleIds.contains(ruleId));
@@ -114,8 +130,11 @@
     }
 }
 
-static void printTransition(unsigned sourceNode, const HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>>& transitions, uint16_t epsilonTransitionCharacter)
+static void printTransitions(const Vector<NFANode>& graph, unsigned sourceNode, uint16_t epsilonTransitionCharacter)
 {
+    const NFANode& node = graph[sourceNode];
+    const HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>>& transitions = node.transitions;
+
     HashMap<unsigned, HashSet<uint16_t>, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsPerTarget;
 
     for (const auto& transition : transitions) {
@@ -149,6 +168,9 @@
 
         dataLogF("\"];\n");
     }
+
+    for (unsigned targetOnAnyCharacter : node.transitionsOnAnyCharacter)
+        dataLogF("        %d -> %d [label=\"[any input]\"];\n", sourceNode, targetOnAnyCharacter);
 }
 
 void NFA::debugPrintDot() const
@@ -193,7 +215,7 @@
 
     dataLogF("    {\n");
     for (unsigned i = 0; i < m_nodes.size(); ++i)
-        printTransition(i, m_nodes[i].transitions, epsilonTransitionCharacter);
+        printTransitions(m_nodes, i, epsilonTransitionCharacter);
     dataLogF("    }\n");
     dataLogF("}\n");
 }

Modified: trunk/Source/WebCore/contentextensions/NFA.h (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/NFA.h	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/NFA.h	2015-01-21 21:56:22 UTC (rev 178857)
@@ -48,6 +48,7 @@
 
     void addTransition(unsigned from, unsigned to, char character);
     void addEpsilonTransition(unsigned from, unsigned to);
+    void addTransitionsOnAnyCharacter(unsigned from, unsigned to);
     void setFinal(unsigned node, uint64_t ruleId);
 
     unsigned graphSize() const;

Modified: trunk/Source/WebCore/contentextensions/NFANode.h (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/NFANode.h	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/NFANode.h	2015-01-21 21:56:22 UTC (rev 178857)
@@ -41,6 +41,7 @@
 class NFANode {
 public:
     HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>> transitions;
+    HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsOnAnyCharacter;
 
     Vector<uint64_t> finalRuleIds;
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING

Modified: trunk/Source/WebCore/contentextensions/NFAToDFA.cpp (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/NFAToDFA.cpp	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/NFAToDFA.cpp	2015-01-21 21:56:22 UTC (rev 178857)
@@ -291,9 +291,10 @@
     NodeIdSet m_targets[128];
 };
 
-static inline void populateTransitions(SetTransitions& setTransitions, const UniqueNodeIdSetImpl& sourceNodeSet, const Vector<NFANode>& graph, const Vector<Vector<unsigned>>& nfaNodeclosures)
+static inline void populateTransitions(SetTransitions& setTransitions, NodeIdSet& setFallbackTransition, const UniqueNodeIdSetImpl& sourceNodeSet, const Vector<NFANode>& graph, const Vector<Vector<unsigned>>& nfaNodeclosures)
 {
     ASSERT(!graph.isEmpty());
+    ASSERT(setFallbackTransition.isEmpty());
 #if !ASSERT_DISABLED
     for (const NodeIdSet& set : setTransitions)
         ASSERT(set.isEmpty());
@@ -302,16 +303,36 @@
     const unsigned* buffer = sourceNodeSet.buffer();
     for (unsigned i = 0; i < sourceNodeSet.m_size; ++i) {
         unsigned nodeId = buffer[i];
+        const NFANode& nfaSourceNode = graph[nodeId];
+        if (!nfaSourceNode.transitionsOnAnyCharacter.isEmpty())
+            setFallbackTransition.add(nfaSourceNode.transitionsOnAnyCharacter.begin(), nfaSourceNode.transitionsOnAnyCharacter.end());
+    }
+    for (unsigned targetNodeId : setFallbackTransition)
+        extendSetWithClosure(nfaNodeclosures, targetNodeId, setFallbackTransition);
+
+    for (unsigned i = 0; i < sourceNodeSet.m_size; ++i) {
+        unsigned nodeId = buffer[i];
         for (const auto& transitionSlot : graph[nodeId].transitions) {
             NodeIdSet& targetSet = setTransitions[transitionSlot.key];
             for (unsigned targetNodId : transitionSlot.value) {
                 targetSet.add(targetNodId);
                 extendSetWithClosure(nfaNodeclosures, targetNodId, targetSet);
             }
+            targetSet.add(setFallbackTransition.begin(), setFallbackTransition.end());
         }
     }
 }
 
+static ALWAYS_INLINE unsigned getOrCreateDFANode(const NodeIdSet& nfaNodeSet, const Vector<NFANode>& nfaGraph, Vector<DFANode>& dfaGraph, UniqueNodeIdSetTable& uniqueNodeIdSetTable, Vector<UniqueNodeIdSetImpl*>& unprocessedNodes)
+{
+    NodeIdSetToUniqueNodeIdSetSource nodeIdSetToUniqueNodeIdSetSource(dfaGraph, nfaGraph, nfaNodeSet);
+    auto uniqueNodeIdAddResult = uniqueNodeIdSetTable.add<NodeIdSetToUniqueNodeIdSetTranslator>(nodeIdSetToUniqueNodeIdSetSource);
+    if (uniqueNodeIdAddResult.isNewEntry)
+        unprocessedNodes.append(uniqueNodeIdAddResult.iterator->impl());
+
+    return uniqueNodeIdAddResult.iterator->impl()->m_dfaNodeId;
+}
+
 DFA NFAToDFA::convert(NFA& nfa)
 {
     Vector<NFANode>& nfaGraph = nfa.m_nodes;
@@ -337,7 +358,8 @@
         UniqueNodeIdSetImpl* uniqueNodeIdSetImpl = unprocessedNodes.takeLast();
 
         unsigned dfaNodeId = uniqueNodeIdSetImpl->m_dfaNodeId;
-        populateTransitions(transitionsFromClosedSet, *uniqueNodeIdSetImpl, nfaGraph, nfaNodeClosures);
+        NodeIdSet setFallbackTransition;
+        populateTransitions(transitionsFromClosedSet, setFallbackTransition, *uniqueNodeIdSetImpl, nfaGraph, nfaNodeClosures);
 
         // FIXME: there should not be any transition on key 0.
         for (unsigned key = 0; key < transitionsFromClosedSet.size(); ++key) {
@@ -346,18 +368,19 @@
             if (targetNodeSet.isEmpty())
                 continue;
 
-            NodeIdSetToUniqueNodeIdSetSource nodeIdSetToUniqueNodeIdSetSource(dfaGraph, nfaGraph, targetNodeSet);
-            auto uniqueNodeIdAddResult = uniqueNodeIdSetTable.add<NodeIdSetToUniqueNodeIdSetTranslator>(nodeIdSetToUniqueNodeIdSetSource);
-
-            unsigned targetNodeId = uniqueNodeIdAddResult.iterator->impl()->m_dfaNodeId;
+            unsigned targetNodeId = getOrCreateDFANode(targetNodeSet, nfaGraph, dfaGraph, uniqueNodeIdSetTable, unprocessedNodes);
             const auto addResult = dfaGraph[dfaNodeId].transitions.add(key, targetNodeId);
             ASSERT_UNUSED(addResult, addResult.isNewEntry);
 
-            if (uniqueNodeIdAddResult.isNewEntry)
-                unprocessedNodes.append(uniqueNodeIdAddResult.iterator->impl());
-
             targetNodeSet.clear();
         }
+        if (!setFallbackTransition.isEmpty()) {
+            unsigned targetNodeId = getOrCreateDFANode(setFallbackTransition, nfaGraph, dfaGraph, uniqueNodeIdSetTable, unprocessedNodes);
+            DFANode& dfaSourceNode = dfaGraph[dfaNodeId];
+            ASSERT(!dfaSourceNode.hasFallbackTransition);
+            dfaSourceNode.hasFallbackTransition = true;
+            dfaSourceNode.fallbackTransition = targetNodeId;
+        }
     } while (!unprocessedNodes.isEmpty());
 
     return DFA(WTF::move(dfaGraph), 0);

Modified: trunk/Source/WebCore/contentextensions/URLFilterParser.cpp (178856 => 178857)


--- trunk/Source/WebCore/contentextensions/URLFilterParser.cpp	2015-01-21 21:43:55 UTC (rev 178856)
+++ trunk/Source/WebCore/contentextensions/URLFilterParser.cpp	2015-01-21 21:56:22 UTC (rev 178857)
@@ -245,8 +245,7 @@
     {
         if (trivialAtom & hasNonCharacterMask) {
             ASSERT(trivialAtom & newlineClassIDBuiltinMask);
-            for (unsigned i = 1; i < 128; ++i)
-                m_nfa.addTransition(source, target, i);
+            m_nfa.addTransitionsOnAnyCharacter(source, target);
         } else {
             if (trivialAtom & caseInsensitiveMask) {
                 char character = static_cast<char>(trivialAtom & characterMask);
_______________________________________________
webkit-changes mailing list
webkit-changes@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to