I'm also more worried about memory consumption than speed.

By far the biggest speed issue is that we load entire entities just to look up
a single label. This has been known for a while.
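To illustrate the difference, a minimal sketch of the two access patterns;
$entityLookup and $termLookup are stand-ins for the actual interfaces, not
confirmed names:

    // Wasteful: deserializes the entire entity (statements, sitelinks,
    // all terms) just to read one label.
    $item = $entityLookup->getEntity( $itemId );
    $label = $item->getLabel( 'en' );

    // Cheaper: fetch only the requested term, e.g. from a terms table,
    // without ever touching the full entity serialization.
    $label = $termLookup->getLabel( $itemId, 'en' );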

But with the new data model, we no longer do deferred unstubbing. Everything is
unserialized right away, always, even if in the end all we need is a single
label of the entity. That's especially bad when a lot of referenced entities
are involved, of course.

On top of that, PHP seems to "sometimes" get confused when memory is running
low. This seems "somehow" connected to ArrayObject. These effects are hard to
reproduce, though; we are not sure what exactly is going on.

In any case, we should try to be less wasteful with memory. Having a stub
implementation for StatementList would already help a lot. I'll be working on
removing the need to load so many entities in the first place (we already had
TermsLookup in the sprint, but didn't get around to working on it, partially
due to the problems on the live site).
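A rough sketch of what such a stub could look like; the class name and the
deserializer it wraps are made up for illustration:

    /**
     * Wraps the raw statements array and only deserializes it into
     * Statement objects when the list is actually accessed.
     */
    class StubStatementList implements IteratorAggregate, Countable {

        private $serialization;
        private $deserializer;
        private $statements = null;

        public function __construct( array $serialization, $statementListDeserializer ) {
            $this->serialization = $serialization;
            $this->deserializer = $statementListDeserializer;
        }

        // Deserializes on first access and frees the raw data.
        private function unstub() {
            if ( $this->statements === null ) {
                $this->statements = $this->deserializer->deserialize( $this->serialization );
                $this->serialization = null;
            }
            return $this->statements;
        }

        public function getIterator() {
            return new ArrayIterator( $this->unstub() );
        }

        public function count() {
            // Can be answered from the raw array, without unstubbing.
            return $this->statements === null
                ? count( $this->serialization )
                : count( $this->statements );
        }
    }

The count() case shows the point: many callers never need the actual
Statement objects at all.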

On 08.10.2014 09:43, Markus Krötzsch wrote:
> Btw, when doing such performance measures, it would be great to get some
> memory statistics from PHP as well. From my past as an SMW developer, I
> remember seeing incredible memory footprints of apparently simple PHP
> objects. OoM would be one of the most common causes for blank pages, much
> more common than timeouts, and even a single object in PHP can take up
> huge amounts of memory.
> 
> Markus
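(For reference, such statistics can be collected with stock PHP's
memory_get_usage() and memory_get_peak_usage(); a minimal sketch, where
$deserializer is just a placeholder:)

    $before = memory_get_usage();

    $entity = $deserializer->deserialize( json_decode( $json, true ) );

    printf(
        "delta: %.1f KB, peak: %.1f MB\n",
        ( memory_get_usage() - $before ) / 1024,
        memory_get_peak_usage() / ( 1024 * 1024 )
    );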
> 
> 
> On 07.10.2014 23:44, Jeroen De Dauw wrote:
>> Hey,
>>
>>     Thank you for making the measurements. Can you estimate the time for
>>     item Q183 specifically? Since it is 1000 entities weighing 19 MB,
>>     this means that on average the entities were 19 KB. Germany on the
>>     other hand is much larger, and it makes me wonder how it scales to
>>     that size.
>>
>>
>> Good point - I did not realize the outliers are that big. Q183 takes
>> ~415ms, which is rather long: ~25ms for json_decode, ~390ms for the
>> array -> objects conversion. In itself that is not a problem, though
>> perhaps something to look at after we have fixed the critical performance
>> issues. This also illustrates that one should be careful not to fully
>> deserialize entities when that is not needed, and that fully deserializing
>> a collection of entities in one request is something to avoid.
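(A minimal sketch of how those two phases can be timed separately;
$deserializer again stands in for the actual entity deserializer:)

    $t0 = microtime( true );
    $data = json_decode( $json, true ); // JSON -> nested arrays
    $t1 = microtime( true );
    $entity = $deserializer->deserialize( $data ); // arrays -> objects
    $t2 = microtime( true );

    printf(
        "json_decode: %.0f ms, objects: %.0f ms\n",
        ( $t1 - $t0 ) * 1000,
        ( $t2 - $t1 ) * 1000
    );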
>>
>> Do we have code that falls in the latter category? Even if we do only
>> partial deserialization, this is still going to be too costly for an
>> action done dozens of times during a request. We should also not simply
>> assume this is the case now and stop looking for what the critical
>> issues are.
>>
>> Cheers
>>
>> -- 
>> Jeroen De Dauw - http://www.bn2vs.com
>> Software craftsmanship advocate
>> Evil software architect at Wikimedia Germany
>> ~=[,,_,,]:3
>>
>>
> 
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikidata-tech mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
