Smalyshev created this task. Herald added a subscriber: Aklapper.
TASK DESCRIPTION
On December 8, I noticed that the Updater was getting stuck on updates. It turns out there is a performance problem in the Updater code, specifically in RdfRepository.java, in this piece:
    Collection<Statement> aboutStatements = new HashSet<>(insertStatements);
    aboutStatements.removeAll(entityStatements);
    aboutStatements.removeAll(statementStatements);
    aboutStatements.removeAll(filtered(insertStatements).withSubjectStarts(uris.value()));
    aboutStatements.removeAll(filtered(insertStatements).withSubjectStarts(uris.reference()));
The problem is in the implementation of removeAll:
    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }

As we can see, when the set is not larger than c, removeAll does not go over the elements of c and remove them; instead it goes over the elements of the set and checks whether each one is in c. The problem is that in this case c is a filter over a list of about 100K elements, which means each contains() check scans the whole list (or close to it). This makes the whole procedure extremely slow.
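One possible way around this, sketched here as an illustration rather than as the actual fix: either remove the elements one by one, so that the filtered collection is iterated exactly once and contains() is never called on it, or copy the filtered collection into a HashSet first, so that contains() becomes a hash lookup. The class and method names below (RemoveAllSketch, removeEach, removeAllViaHashSet) are hypothetical and do not exist in RdfRepository.java; a plain List stands in for the filtered view, since List.contains() is also a linear scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class RemoveAllSketch {

    // Removes every element of toRemove from target by iterating toRemove
    // exactly once. contains() is never called on toRemove, so a slow
    // (linear-scan) collection costs one pass in total.
    static <T> boolean removeEach(Set<T> target, Iterable<? extends T> toRemove) {
        boolean modified = false;
        for (T item : toRemove) {
            modified |= target.remove(item);
        }
        return modified;
    }

    // Alternative: copy the slow collection into a HashSet first, so that
    // whichever branch removeAll picks, each lookup is a hash lookup.
    static <T> boolean removeAllViaHashSet(Set<T> target, Collection<? extends T> toRemove) {
        Set<T> lookup = new HashSet<>(toRemove);
        return target.removeAll(lookup);
    }

    public static void main(String[] args) {
        // The set is smaller than the list, which is the case where the
        // inherited removeAll would call contains() on the list.
        Set<String> about = new HashSet<>(Arrays.asList("a", "b", "c"));
        List<String> slow = new ArrayList<>(Arrays.asList("b", "c", "d", "e", "f"));

        removeEach(about, slow);
        System.out.println(about);
    }
}
```

With removeEach, the cost is one pass over the filtered collection plus one hash removal per element, regardless of which of the two collections is larger.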
To: Smalyshev
Cc: Wikidata-Query-Service, Aklapper, Smalyshev
