https://bugzilla.wikimedia.org/show_bug.cgi?id=51457
Web browser: ---
Bug ID: 51457
Summary: Excessive backtracking in attribute_preprocessor_text_line when parsing table cell
Product: Parsoid
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: tokenizer
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
Several busy ('hanging') workers in production were backtracking excessively
while parsing the pathological tables in
http://el.wikipedia.org/wiki/%CE%A0%CE%BF%CF%81%CE%B5%CE%AF%CE%B1_%CF%84%CF%89%CE%BD_%CE%BA%CF%85%CF%80%CF%81%CE%B9%CE%B1%CE%BA%CF%8E%CE%BD_%CE%BF%CE%BC%CE%AC%CE%B4%CF%89%CE%BD_%CF%83%CF%84%CE%B1_%CE%BA%CF%8D%CF%80%CE%B5%CE%BB%CE%BB%CE%B1_%CE%95%CF%85%CF%81%CF%8E%CF%80%CE%B7%CF%82
I tracked this down by attaching the node debugger to those workers.
Backtracking when parsing table cells with optional attributes is hard to
avoid, but in this case there might be a bug in the construction of the cache
key used for memoization. The large number of quote characters in these tables
additionally slows down the parsing of potential attributes.
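
To illustrate why the cache key matters, here is a minimal Python sketch of a
memoizing (packrat-style) parser for a toy grammar. It is not Parsoid code, and
the bug actually suspected above has not been identified. ToyParser,
position_key and path_key are hypothetical names. The assumption shown is that
a key which includes irrelevant state (here, the path of alternatives taken so
far) never hits the cache, so the same position is re-parsed on every
backtrack.

```python
from typing import Callable, Dict, Hashable, Optional

KeyFn = Callable[[str, int, str], Hashable]


def position_key(rule: str, pos: int, path: str) -> Hashable:
    # Key on rule name and input position only: the result depends on nothing else.
    return (rule, pos)


def path_key(rule: str, pos: int, path: str) -> Hashable:
    # Hypothetical buggy key: also includes how we got here, so it never repeats.
    return (rule, pos, path)


class ToyParser:
    """Toy grammar: nested <- '[' nested ']' / '[' nested ')' / 'a'."""

    def __init__(self, text: str, key_fn: KeyFn) -> None:
        self.text = text
        self.key_fn = key_fn
        self.cache: Dict[Hashable, Optional[int]] = {}
        self.evaluations = 0  # number of cache misses (rule bodies executed)

    def nested(self, pos: int, path: str = "") -> Optional[int]:
        # Returns the position after a match, or None on failure.
        key = self.key_fn("nested", pos, path)
        if key in self.cache:
            return self.cache[key]
        self.evaluations += 1
        result = self._nested_body(pos, path)
        self.cache[key] = result
        return result

    def _nested_body(self, pos: int, path: str) -> Optional[int]:
        text = self.text
        if pos >= len(text):
            return None
        if text[pos] == "a":
            return pos + 1
        if text[pos] != "[":
            return None
        # Both alternatives start by parsing the same inner expression,
        # so a failed closer forces a re-parse unless the cache is hit.
        for alt, closer in (("1", "]"), ("2", ")")):
            end = self.nested(pos + 1, path + alt)
            if end is not None and end < len(text) and text[end] == closer:
                return end + 1
        return None


if __name__ == "__main__":
    pathological = "[" * 12 + "a"  # unclosed brackets: every level fails
    for key_fn in (position_key, path_key):
        parser = ToyParser(pathological, key_fn)
        parser.nested(0)
        print(key_fn.__name__, parser.evaluations)
```

On the 13-character input above, the position-only key evaluates the rule body
13 times, while the path-dependent key evaluates it 8191 times; the count
doubles with each additional unclosed bracket.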
I have some WIP code that speeds things up a lot by not parsing attributes
with clearly invalid names, but I get some test failures in cases where the
PHP parser simply strips invalid attribute names. This needs further
investigation.
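
The early bail-out idea can be sketched in Python as follows. This is not the
WIP code mentioned above: parse_attribute and both regular expressions are
hypothetical, and the definition of a "clearly invalid" name (anything that
does not start with a letter, '_' or ':') is an assumption made for this
example.

```python
import re
from typing import Optional, Tuple

# Assumption: a plausible attribute name starts with a letter, '_' or ':'.
_NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")
# A double-quoted value; the negated class cannot backtrack across quotes.
_VALUE_RE = re.compile(r'\s*=\s*"([^"]*)"')


def parse_attribute(text: str, pos: int = 0) -> Optional[Tuple[str, str, int]]:
    """Return (name, value, next_pos), or None if no plausible name starts at pos."""
    name_match = _NAME_RE.match(text, pos)
    if name_match is None:
        # Clearly invalid name: give up before scanning for quotes at all.
        return None
    value_match = _VALUE_RE.match(text, name_match.end())
    if value_match is None:
        # Name without a value, for example a bare boolean attribute.
        return (name_match.group(0), "", name_match.end())
    return (name_match.group(0), value_match.group(1), value_match.end())


if __name__ == "__main__":
    print(parse_attribute('style="color:red"'))
    print(parse_attribute('"quoted" text'))
```

Returning None here treats the text as ordinary cell content. As noted above,
the PHP parser instead strips invalid attribute names in some cases, so a
check like this changes the output for those inputs.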