hoo created this task.
hoo added projects: Datasets-General-or-Unknown, Wikidata, Wikibase-DataModel, Wikidata-Sprint.
Herald added a subscriber: Aklapper.

TASK DESCRIPTION

We aren't currently making use of the normal entity caching mechanism when dumping entities, as we don't want to pollute the cache with millions of otherwise unused entities. I did some testing and got the following numbers:

The following tests were done with a caching entity revision lookup (as obtained from SqlStore) vs. a non-caching one. All tests used entity prefetching.
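To make the setup concrete, here is a minimal, language-neutral sketch of what entity prefetching buys: one batched backend call fills an in-memory buffer, and the per-entity lookups that follow are served from that buffer. The class and names here are hypothetical illustrations, not the actual Wikibase `EntityPrefetcher`/`EntityRevisionLookup` classes.

```python
# Hypothetical sketch (not the real Wikibase classes): a lookup that
# prefetches a whole batch in one backend call, so subsequent single
# get() calls are served from an in-memory buffer.

class PrefetchingLookup:
    def __init__(self, batch_loader):
        # batch_loader: callable taking a list of IDs and returning a
        # dict of ID -> entity data (stands in for one SQL query).
        self._batch_loader = batch_loader
        self._buffer = {}

    def prefetch(self, ids):
        # One round trip for the whole batch instead of one per entity.
        self._buffer.update(self._batch_loader(ids))

    def get(self, entity_id):
        if entity_id in self._buffer:
            return self._buffer[entity_id]
        # Buffer miss: fall back to a single-item batch.
        return self._batch_loader([entity_id]).get(entity_id)

# Usage: a plain dict stands in for the SQL store.
db = {"Q1": "universe", "Q2": "Earth"}
lookup = PrefetchingLookup(lambda ids: {i: db[i] for i in ids if i in db})
lookup.prefetch(["Q1", "Q2"])
print(lookup.get("Q2"))  # -> Earth
```

Both the cached and uncached configurations below sit on top of this kind of batching, so the timings isolate the effect of the cache layer itself.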

Random access to 5,000 entities in batches of 500:


Uncached:
1. 58.209151029587
2. 57.370378017426
3. 57.560675144196

Cached:
1. 55.036426067352
2. 54.444396972656
3. 53.03471493721

Q1 - Q5001 in batches of 500:

Uncached:
1. 109.29708003998
2. 83.949445009232

Cached:
1. 36.519347190857
2. 33.664337873459
3. 34.130303859711

Q1000000 - Q1005001 in batches of 500:

Uncached:
1. 47.415897130966
2. 20.073761940002

Cached:
1. 12.881020069122
2. 11.010383844376

Q40000000 - Q40005001 in batches of 500:

Uncached:
1. 51.220274925232

Cached:
1. 59.866926193237

Q41000000 - Q41005001 in batches of 500 (cached):
1. 58.677440881729

Q39000000 - Q39005001 in batches of 500 (uncached):
1. 51.507272005081

While this suggests some speedup (around 6.5% for random access), these numbers still include the time spent writing cache misses back to memcached. To exclude that cost, I hacked together a retrieve-only cache and tested it on mwdebug1001.
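To illustrate the idea behind the retrieve-only mode: a cache hit is served from the cache, and a miss falls through to the underlying lookup, but (unlike a normal write-through cache) the result is never written back, so misses don't pay the memcached-set cost. This is a hypothetical sketch of the concept, not the actual hacked `CachingEntityRevisionLookup` code; the class and attribute names are made up for illustration.

```python
# Hypothetical sketch of a 'retrieve-only' caching lookup: hits come
# from the cache, misses go to the backend, and nothing is ever
# written back into the cache.

class RetrieveOnlyCachingLookup:
    def __init__(self, cache, backend):
        self._cache = cache      # dict-like, stands in for memcached
        self._backend = backend  # callable: ID -> entity revision
        self.backend_calls = 0   # instrumentation for this sketch

    def get(self, entity_id):
        if entity_id in self._cache:
            return self._cache[entity_id]
        self.backend_calls += 1
        # Deliberately NO self._cache[entity_id] = ... here: misses
        # skip the cache write that a write-through cache would do.
        return self._backend(entity_id)

cache = {"Q1": "rev-100"}
lookup = RetrieveOnlyCachingLookup(cache, lambda eid: "rev-from-db")
print(lookup.get("Q1"))   # hit, served from cache
print(lookup.get("Q99"))  # miss, fetched from the backend
print(lookup.get("Q99"))  # still a miss: nothing was written back
```

The trade-off is that repeated misses stay misses, which is fine for a one-pass dump workload that touches each entity at most a few times.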

Access to 15,000 random entities (in batches of 500) with retrieve-only cache:

Uncached:
1. 85.038383960724
2. 89.931836843491

Cached:
1. 75.183009147644
2. 74.581010103226

This suggests a speedup of over 15% when accessing random entities, a workload that should be fairly similar to what the dumpers do (they need to access all entities eventually).
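One way to read that figure from the raw timings above: averaging the runs, the uncached configuration takes about 17% longer than the cached one, consistent with "over 15%". A quick sanity check:

```python
# Averaging the two retrieve-only-cache runs above and comparing:
# the uncached lookup takes roughly 17% longer than the cached one.

uncached = [85.038383960724, 89.931836843491]
cached = [75.183009147644, 74.581010103226]

avg_uncached = sum(uncached) / len(uncached)
avg_cached = sum(cached) / len(cached)

# Extra time spent when running without the cache, relative to cached.
speedup = avg_uncached / avg_cached - 1
print(f"{speedup:.1%}")  # -> 16.8%
```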

Code used (making use of the 'retrieve-only' hack):

<?php
// Uncached revision lookup (from SqlStore), with entity prefetching.
$wikibaseRepo = Wikibase\Repo\WikibaseRepo::getDefaultInstance();
$entityPrefetcher = $wikibaseRepo->getStore()->getEntityPrefetcher();
$revisionLookup = $wikibaseRepo->getEntityRevisionLookup( 'uncached' );

// Wrap it in the caching lookup, using the hacked 'retrieve-only' mode.
$revisionLookup = new Wikibase\Lib\Store\CachingEntityRevisionLookup(
	$revisionLookup,
	wfGetCache( $GLOBALS['wgMainCacheType'] ),
	60 * 60 * 24,
	'wikibase_shared/wikidata_1_31_0_wmf_3-wikidatawiki-hhvm:WikiPageEntityRevisionLookup',
	'retrieve-only'
);

$entityLookup = new Wikibase\Lib\Store\RevisionBasedEntityLookup( $revisionLookup );

// Fetch 15 batches of 500 random item IDs each (15,000 entities total).
$t0 = microtime( true );
for ( $i = 0; $i < 15; $i++ ) {
	$toFetch = [];
	for ( $j = 0; $j < 500; $j++ ) {
		$toFetch[] = new \Wikibase\DataModel\Entity\ItemId( 'Q' . mt_rand( 1, 42025257 ) );
	}
	$entityPrefetcher->prefetch( $toFetch );
	foreach ( $toFetch as $itemId ) {
		try {
			$entityLookup->getEntity( $itemId );
		} catch ( Wikibase\Lib\Store\RevisionedUnresolvedRedirectException $e ) {
			// Redirects are expected; just skip them.
		}
	}
}
echo microtime( true ) - $t0;

TASK DETAIL
https://phabricator.wikimedia.org/T178247

To: hoo
Cc: daniel, Aklapper, hoo, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Svick, Mbch331, jeremyb
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs