Title: [181405] trunk
Revision
181405
Author
[email protected]
Date
2015-03-11 14:08:51 -0700 (Wed, 11 Mar 2015)

Log Message

Add basic support for BOL and EOL assertions to the URL Filter parser
https://bugs.webkit.org/show_bug.cgi?id=142568

Patch by Benjamin Poulain <[email protected]> on 2015-03-11
Reviewed by Alex Christensen.

Source/WebCore:

This patch adds heavily restricted support for BOL and EOL to the URL filter parser.

Both assertions must be the first/last term of their pattern. Any advanced combination
results in a parsing error.

The BOL assertion is easy to represent: currently, any pattern starts at the beginning
of a line and the NFA are generated accordingly.

I had two options to represent the EOL assertion:
1) Add a new special transition on EOL.
2) Add a new vector of actions to the states, conditional to the EOL input.

I picked the first option to avoid growing every state by a vector
that would be empty in the vast majority of cases.


On the matching side, the interpreter was modified to support transitions on '\0'.
DFABytecodeInstruction::CheckValue now stops when running on a character after
the end of the string.

DFABytecodeInstruction::Jump gets two fixes: First we now account for the index
to avoid going past the end of the input. Second, stop on '\0' too... the reason
is that the unconditional jump is only used for fallback edges of the DFA, fallback
edge are not supposed to accept '\0'.

* contentextensions/DFA.cpp:
(WebCore::ContentExtensions::printTransitions):
* contentextensions/DFABytecodeInterpreter.cpp:
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
* contentextensions/DFANode.h:
* contentextensions/NFA.cpp:
(WebCore::ContentExtensions::NFA::addTransition):
(WebCore::ContentExtensions::NFA::addEpsilonTransition):
(WebCore::ContentExtensions::printTransitions):
* contentextensions/NFANode.h:
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::populateTransitions):
(WebCore::ContentExtensions::NFAToDFA::convert):
* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::Term::Term):
(WebCore::ContentExtensions::Term::isEndOfLineAssertion):
(WebCore::ContentExtensions::GraphBuilder::assertionBOL):
(WebCore::ContentExtensions::GraphBuilder::assertionEOL):
(WebCore::ContentExtensions::GraphBuilder::sinkFloatingTermIfNecessary):

Tools:

* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
(TestWebKitAPI::TEST_F):

Modified Paths

Diff

Modified: trunk/Source/WebCore/ChangeLog (181404 => 181405)


--- trunk/Source/WebCore/ChangeLog	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/ChangeLog	2015-03-11 21:08:51 UTC (rev 181405)
@@ -1,3 +1,55 @@
+2015-03-11  Benjamin Poulain  <[email protected]>
+
+        Add basic support for BOL and EOL assertions to the URL Filter parser
+        https://bugs.webkit.org/show_bug.cgi?id=142568
+
+        Reviewed by Alex Christensen.
+
+        This patch adds heavily restricted support for BOL and EOL to the URL filter parser.
+
+        Both assertions must be the first/last term of their pattern. Any advanced combination
+        results in a parsing error.
+
+        The BOL assertion is easy to represent: currently, any pattern starts at the beginning
+        of a line and the NFA are generated accordingly.
+
+        I had two options to represent the EOL assertion:
+        1) Add a new special transition on EOL.
+        2) Add a new vector of actions to the states, conditional to the EOL input.
+
+        I picked the first option to avoid growing every state by a vector
+        that would be empty in the vast majority of cases.
+
+
+        On the matching side, the interpreter was modified to support transitions on '\0'.
+        DFABytecodeInstruction::CheckValue now stops when running on a character after
+        the end of the string.
+
+        DFABytecodeInstruction::Jump gets two fixes: First we now account for the index
+        to avoid going past the end of the input. Second, stop on '\0' too... the reason
+        is that the unconditional jump is only used for fallback edges of the DFA, fallback
+        edge are not supposed to accept '\0'.
+
+        * contentextensions/DFA.cpp:
+        (WebCore::ContentExtensions::printTransitions):
+        * contentextensions/DFABytecodeInterpreter.cpp:
+        (WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
+        * contentextensions/DFANode.h:
+        * contentextensions/NFA.cpp:
+        (WebCore::ContentExtensions::NFA::addTransition):
+        (WebCore::ContentExtensions::NFA::addEpsilonTransition):
+        (WebCore::ContentExtensions::printTransitions):
+        * contentextensions/NFANode.h:
+        * contentextensions/NFAToDFA.cpp:
+        (WebCore::ContentExtensions::populateTransitions):
+        (WebCore::ContentExtensions::NFAToDFA::convert):
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::Term::Term):
+        (WebCore::ContentExtensions::Term::isEndOfLineAssertion):
+        (WebCore::ContentExtensions::GraphBuilder::assertionBOL):
+        (WebCore::ContentExtensions::GraphBuilder::assertionEOL):
+        (WebCore::ContentExtensions::GraphBuilder::sinkFloatingTermIfNecessary):
+
 2015-03-11  Jer Noble  <[email protected]>
 
         [Mac] Update fullscreen placeholder UI to use Vibrancy.

Modified: trunk/Source/WebCore/contentextensions/DFA.cpp (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/DFA.cpp	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/DFA.cpp	2015-03-11 21:08:51 UTC (rev 181405)
@@ -78,7 +78,7 @@
 static void printTransitions(const Vector<DFANode>& graph, unsigned sourceNodeId)
 {
     const DFANode& sourceNode = graph[sourceNodeId];
-    const HashMap<uint16_t, unsigned>& transitions = sourceNode.transitions;
+    const DFANodeTransitions& transitions = sourceNode.transitions;
 
     if (transitions.isEmpty() && !sourceNode.hasFallbackTransition)
         return;

Modified: trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp	2015-03-11 21:08:51 UTC (rev 181405)
@@ -48,6 +48,7 @@
     
     unsigned programCounter = 0;
     unsigned urlIndex = 0;
+    bool urlIndexIsAfterEndOfString = false;
     Actions actions;
     
     // This should always terminate if interpreting correctly compiled bytecode.
@@ -59,17 +60,23 @@
             return actions;
 
         case DFABytecodeInstruction::CheckValue:
+            if (urlIndexIsAfterEndOfString)
+                return actions;
+
             // Check to see if the next character in the url is the value stored with the bytecode.
-            if (!url[urlIndex])
-                return actions; // Reached null character at end.
             if (url[urlIndex] == getBits<uint8_t>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode))) {
                 programCounter = getBits<unsigned>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode) + sizeof(uint8_t));
+                if (!url[urlIndex])
+                    urlIndexIsAfterEndOfString = true;
                 urlIndex++; // This represents an edge in the DFA.
             } else
                 programCounter += instructionSizeWithArguments(DFABytecodeInstruction::CheckValue);
             break;
 
         case DFABytecodeInstruction::Jump:
+            if (!url[urlIndex] || urlIndexIsAfterEndOfString)
+                return actions;
+
             programCounter = getBits<unsigned>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode));
             urlIndex++; // This represents an edge in the DFA.
             break;
@@ -82,7 +89,8 @@
         default:
             RELEASE_ASSERT_NOT_REACHED(); // Invalid bytecode.
         }
-        ASSERT(urlIndex <= urlCString.length()); // We should always terminate at a null character at the end of a String.
+        // We should always terminate before or at a null character at the end of a String.
+        ASSERT(urlIndex <= urlCString.length() || (urlIndexIsAfterEndOfString && urlIndex <= urlCString.length() + 1));
     }
     RELEASE_ASSERT_NOT_REACHED();
 }

Modified: trunk/Source/WebCore/contentextensions/DFANode.h (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/DFANode.h	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/DFANode.h	2015-03-11 21:08:51 UTC (rev 181405)
@@ -38,9 +38,12 @@
 
 // A DFANode abstract the transition table out of a DFA state. If a state is accepting, the DFANode also have
 // the actions for that state.
+
+typedef HashMap<uint16_t, unsigned, DefaultHash<uint16_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint16_t>> DFANodeTransitions;
+
 class DFANode {
 public:
-    HashMap<uint16_t, unsigned> transitions;
+    DFANodeTransitions transitions;
     bool hasFallbackTransition = false;
     unsigned fallbackTransition;
     Vector<uint64_t> actions;

Modified: trunk/Source/WebCore/contentextensions/NFA.cpp (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/NFA.cpp	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/NFA.cpp	2015-03-11 21:08:51 UTC (rev 181405)
@@ -51,13 +51,12 @@
 {
     ASSERT(from < m_nodes.size());
     ASSERT(to < m_nodes.size());
-    ASSERT(character);
 
     NFANode& fromNode = m_nodes[from];
     if (fromNode.transitionsOnAnyCharacter.contains(to))
         return;
 
-    auto addResult = m_nodes[from].transitions.add(character, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>());
+    auto addResult = m_nodes[from].transitions.add(character, NFANodeIndexSet());
     addResult.iterator->value.add(to);
 }
 
@@ -66,7 +65,7 @@
     ASSERT(from < m_nodes.size());
     ASSERT(to < m_nodes.size());
 
-    auto addResult = m_nodes[from].transitions.add(epsilonTransitionCharacter, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>());
+    auto addResult = m_nodes[from].transitions.add(epsilonTransitionCharacter, NFANodeIndexSet());
     addResult.iterator->value.add(to);
 }
 
@@ -133,13 +132,13 @@
 static void printTransitions(const Vector<NFANode>& graph, unsigned sourceNode, uint16_t epsilonTransitionCharacter)
 {
     const NFANode& node = graph[sourceNode];
-    const HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>>& transitions = node.transitions;
+    const HashMap<uint16_t, NFANodeIndexSet, DefaultHash<uint16_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint16_t>>& transitions = node.transitions;
 
-    HashMap<unsigned, HashSet<uint16_t>, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsPerTarget;
+    HashMap<unsigned, HashSet<uint16_t, DefaultHash<uint16_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint16_t>>, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsPerTarget;
 
     for (const auto& transition : transitions) {
         for (unsigned targetNode : transition.value) {
-            transitionsPerTarget.add(targetNode, HashSet<uint16_t>());
+            transitionsPerTarget.add(targetNode, HashSet<uint16_t, DefaultHash<uint16_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint16_t>>());
             transitionsPerTarget.find(targetNode)->value.add(transition.key);
         }
     }

Modified: trunk/Source/WebCore/contentextensions/NFANode.h (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/NFANode.h	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/NFANode.h	2015-03-11 21:08:51 UTC (rev 181405)
@@ -38,10 +38,14 @@
 namespace ContentExtensions {
 
 // A NFANode abstract the transition table out of a NFA state.
+
+typedef HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> NFANodeIndexSet;
+typedef HashMap<uint16_t, NFANodeIndexSet, DefaultHash<uint16_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint16_t>> NFANodeTransitions;
+
 class NFANode {
 public:
-    HashMap<uint16_t, HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>>> transitions;
-    HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> transitionsOnAnyCharacter;
+    HashMap<uint16_t, NFANodeIndexSet, DefaultHash<uint16_t>::Hash, WTF::UnsignedWithZeroKeyHashTraits<uint16_t>> transitions;
+    NFANodeIndexSet transitionsOnAnyCharacter;
 
     Vector<uint64_t> finalRuleIds;
 #if CONTENT_EXTENSIONS_STATE_MACHINE_DEBUGGING

Modified: trunk/Source/WebCore/contentextensions/NFAToDFA.cpp (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/NFAToDFA.cpp	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/NFAToDFA.cpp	2015-03-11 21:08:51 UTC (rev 181405)
@@ -41,7 +41,7 @@
 
 // FIXME: set a better initial size.
 // FIXME: include the hash inside NodeIdSet.
-typedef HashSet<unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> NodeIdSet;
+typedef NFANodeIndexSet NodeIdSet;
 
 static inline void epsilonClosureExcludingSelf(const Vector<NFANode>& nfaGraph, unsigned nodeId, unsigned epsilonTransitionCharacter, Vector<unsigned>& output)
 {
@@ -322,7 +322,8 @@
                 targetSet.add(targetNodId);
                 extendSetWithClosure(nfaNodeclosures, targetNodId, targetSet);
             }
-            targetSet.add(setFallbackTransition.begin(), setFallbackTransition.end());
+            if (transitionSlot.key)
+                targetSet.add(setFallbackTransition.begin(), setFallbackTransition.end());
         }
     }
 }
@@ -365,7 +366,6 @@
         NodeIdSet setFallbackTransition;
         populateTransitions(transitionsFromClosedSet, setFallbackTransition, *uniqueNodeIdSetImpl, nfaGraph, nfaNodeClosures);
 
-        // FIXME: there should not be any transition on key 0.
         for (unsigned key = 0; key < transitionsFromClosedSet.size(); ++key) {
             NodeIdSet& targetNodeSet = transitionsFromClosedSet[key];
 

Modified: trunk/Source/WebCore/contentextensions/URLFilterParser.cpp (181404 => 181405)


--- trunk/Source/WebCore/contentextensions/URLFilterParser.cpp	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Source/WebCore/contentextensions/URLFilterParser.cpp	2015-03-11 21:08:51 UTC (rev 181405)
@@ -81,6 +81,13 @@
         new (NotNull, &m_atomData.group) Group();
     }
 
+    enum EndOfLineAssertionTermTag { EndOfLineAssertionTerm };
+    explicit Term(EndOfLineAssertionTermTag)
+        : Term(CharacterSetTerm, false)
+    {
+        m_atomData.characterSet.characters.set(0);
+    }
+
     Term(const Term& other)
         : m_termType(other.m_termType)
         , m_quantifier(other.m_quantifier)
@@ -212,6 +219,11 @@
         }
     }
 
+    bool isEndOfLineAssertion() const
+    {
+        return m_termType == TermType::CharacterSet && m_atomData.characterSet.characters.bitCount() == 1 && m_atomData.characterSet.characters.get(0);
+    }
+
     Term& operator=(const Term& other)
     {
         destroy();
@@ -498,12 +510,22 @@
 
     void assertionBOL()
     {
-        fail(ASCIILiteral("Line boundary assertions are not supported yet."));
+        if (hasError())
+            return;
+
+        if (m_subtreeStart != m_subtreeEnd || m_floatingTerm.isValid() || !m_openGroups.isEmpty())
+            fail(ASCIILiteral("Start of line assertion can only appear as the first term in a filter."));
     }
 
     void assertionEOL()
     {
-        fail(ASCIILiteral("Line boundary assertions are not supported yet."));
+        if (hasError())
+            return;
+
+        sinkFloatingTermIfNecessary();
+        ASSERT(!m_floatingTerm.isValid());
+
+        m_floatingTerm = Term(Term::EndOfLineAssertionTerm);
     }
 
     void assertionWordBoundary(bool)
@@ -614,6 +636,15 @@
 
         ASSERT(m_lastPrefixTreeEntry);
 
+        if (m_hasProcessedEndOfLineAssertion) {
+            fail(ASCIILiteral("The end of line assertion must be the last term in an _expression_."));
+            m_floatingTerm = Term();
+            return;
+        }
+
+        if (m_floatingTerm.isEndOfLineAssertion())
+            m_hasProcessedEndOfLineAssertion = true;
+
         if (!m_openGroups.isEmpty()) {
             m_openGroups.last().extendGroupSubpattern(m_floatingTerm);
             m_floatingTerm = Term();
@@ -656,6 +687,7 @@
     PrefixTreeEntry* m_lastPrefixTreeEntry;
     Deque<Term> m_openGroups;
     Term m_floatingTerm;
+    bool m_hasProcessedEndOfLineAssertion { false };
 
     PrefixTreeEntry* m_newPrefixSubtreeRoot = nullptr;
     Term m_newPrefixStaringPoint;

Modified: trunk/Tools/ChangeLog (181404 => 181405)


--- trunk/Tools/ChangeLog	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Tools/ChangeLog	2015-03-11 21:08:51 UTC (rev 181405)
@@ -1,3 +1,13 @@
+2015-03-11  Benjamin Poulain  <[email protected]>
+
+        Add basic support for BOL and EOL assertions to the URL Filter parser
+        https://bugs.webkit.org/show_bug.cgi?id=142568
+
+        Reviewed by Alex Christensen.
+
+        * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
+        (TestWebKitAPI::TEST_F):
+
 2015-03-11  Carlos Garcia Campos  <[email protected]>
 
         [GTK] Add support for handling TLS errors to MiniBrowser

Modified: trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp (181404 => 181405)


--- trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp	2015-03-11 21:03:24 UTC (rev 181404)
+++ trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp	2015-03-11 21:08:51 UTC (rev 181405)
@@ -157,4 +157,62 @@
     testURL(backend, URL(URL(), "http://webkit.org/fobar"), { });
 }
 
+const char* matchPastEndOfStringFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".+\"}}]";
+
+TEST_F(ContentExtensionTest, MatchPastEndOfString)
+{
+    auto extensionData = ContentExtensions::compileRuleList(matchPastEndOfStringFilter);
+    auto extension = InMemoryCompiledContentExtension::create(WTF::move(extensionData));
+
+    ContentExtensions::ContentExtensionsBackend backend;
+    backend.addContentExtension("MatchPastEndOfString", extension);
+
+    testURL(backend, URL(URL(), "http://webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foo"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foobarbar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foofoobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foobarfoobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foob"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "http://webkit.org/foor"), { ContentExtensions::ActionType::BlockLoad });
+}
+
+const char* startOfLineAssertionFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"^foobar\"}}]";
+
+TEST_F(ContentExtensionTest, StartOfLineAssertion)
+{
+    auto extensionData = ContentExtensions::compileRuleList(startOfLineAssertionFilter);
+    auto extension = InMemoryCompiledContentExtension::create(WTF::move(extensionData));
+
+    ContentExtensions::ContentExtensionsBackend backend;
+    backend.addContentExtension("StartOfLineAssertion", extension);
+
+    testURL(backend, URL(URL(), "foobar://webkit.org/foobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "foobars:///foobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "foobarfoobar:///foobarfoobarfoobar"), { ContentExtensions::ActionType::BlockLoad });
+
+    testURL(backend, URL(URL(), "http://webkit.org/foobarfoo"), { });
+    testURL(backend, URL(URL(), "http://webkit.org/foobarf"), { });
+    testURL(backend, URL(URL(), "http://foobar.org/"), { });
+    testURL(backend, URL(URL(), "http://foobar.org/"), { });
+}
+
+const char* endOfLineAssertionFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".*foobar$\"}}]";
+
+TEST_F(ContentExtensionTest, EndOfLineAssertion)
+{
+    auto extensionData = ContentExtensions::compileRuleList(endOfLineAssertionFilter);
+    auto extension = InMemoryCompiledContentExtension::create(WTF::move(extensionData));
+
+    ContentExtensions::ContentExtensionsBackend backend;
+    backend.addContentExtension("EndOfLineAssertion", extension);
+
+    testURL(backend, URL(URL(), "http://webkit.org/foobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "file:///foobar"), { ContentExtensions::ActionType::BlockLoad });
+    testURL(backend, URL(URL(), "file:///foobarfoobarfoobar"), { ContentExtensions::ActionType::BlockLoad });
+
+    testURL(backend, URL(URL(), "http://webkit.org/foobarfoo"), { });
+    testURL(backend, URL(URL(), "http://webkit.org/foobarf"), { });
+}
+
 } // namespace TestWebKitAPI
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to