On 19.12.2012 14:34, Friedrich Röhrs wrote:
> Hi,
>
> Sorry for my ignorance, if this is common knowledge: What is the use case for
> sorting millions of different measures from different objects?

Finding all cities with more than 100000 inhabitants requires the database to look through all values for the property "population" (or even all properties with countable values, depending on implementation and query planning), compare each value with "100000", and return those with a greater value. To speed this up, an index sorted by this value is needed.

> For cars there could be entries by the manufacturer, by some
> car-testing magazine, etc. I don't see how this could be adequately
> represented/sorted by a database only query.

If this cannot be done adequately at the database level, then it cannot be done efficiently, which means we will not allow it. So our task is to come up with an architecture that does allow this.

(One way to allow "scripted" queries like this to run efficiently is to run them in a massively parallel way, using a map/reduce framework. But that is not trivial either, and would require a whole new server infrastructure.)

> If however this is necessary, I still don't understand why it must affect the
> datavalue structure. If an index is necessary it could be done over a
> serialized representation of the value.

"Serialized" can mean a lot of things, but an index on some data blob is only useful for exact matches; it cannot be used for greater/lesser queries. We need to map our values to scalar data types that the database can understand directly and use for indexing.

> This needs to be done anyway, since the values are
> saved at a specific unit (which is just a wikidata item). To compare them on a
> database level they must all be saved at the same unit, or some sort of
> procedure must be used to compare them (or am I missing something again?).

If they measure the same dimension, they should be saved using the same unit (probably the SI base unit for that dimension). Saving values using different units would make it impossible to run efficient queries against these values, thereby defeating one of the major reasons for Wikidata's existence.
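
To make the point about units and indexes concrete, here is a rough sketch, not a proposal for the actual schema. It assumes a toy SQLite table; the table name, column name, item labels, and conversion table are all made up for illustration. Values are converted to one base unit on write, so a plain sorted index on a scalar column can answer a "greater than" query.

```python
import sqlite3

# Hypothetical conversion factors to the SI base unit for length (metre).
TO_METRES = {"m": 1.0, "km": 1000.0, "cm": 0.01}


def to_base_unit(amount, unit):
    """Convert a length to metres so all stored values are directly comparable."""
    return amount * TO_METRES[unit]


def build_db(rows):
    """Store normalized values in a scalar column the database can index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measure (item TEXT, value_m REAL)")
    # A sorted index on the scalar value; usable for range comparisons.
    conn.execute("CREATE INDEX measure_value ON measure (value_m)")
    conn.executemany(
        "INSERT INTO measure VALUES (?, ?)",
        [(item, to_base_unit(amount, unit)) for item, amount, unit in rows],
    )
    return conn


def longer_than(conn, metres):
    """Range query on the normalized column, smallest match first."""
    cur = conn.execute(
        "SELECT item FROM measure WHERE value_m > ? ORDER BY value_m",
        (metres,),
    )
    return [row[0] for row in cur]


# Three made-up items, entered in three different units.
conn = build_db([("A", 2, "km"), ("B", 150000, "cm"), ("C", 900, "m")])
print(longer_than(conn, 1000))
```

If the values were instead stored as entered (2, 150000, 900) together with their units, or as an opaque serialized blob, the same comparison would need a per-row conversion step and the index could not be used for it.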

I don't see a way around this.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.