[Wikidata-bugs] [Maniphest] [Commented On] T76373: Evaluate Titan as graph storage/query engine for Wikidata Query service

GWicke Tue, 23 Dec 2014 09:13:13 -0800

GWicke added a comment.

In https://phabricator.wikimedia.org/T76373#941917, @Manybubbles wrote:


> I'd spin up a new one - probably just on a single node.  I think in the long 
> run we probably can run this on the production search cluster but for now 
> let's keep it off just in case it does something stupid.  I can put together 
> some puppet changes to put a single node elasticsearch instance on 
> einsteinium.


Awesome, that would be great! I think storage is not very fast there (disks 
IIRC), but maybe the 64G of memory will compensate.

> > For Date, I wonder if support can't be added to Titan, since Elastic AFAIK 
> > supports dates.
>
> It sure does.  They are parsed and formatted automatically but amount to a 
> java long since epoch under the hood.  As @GWicke said that means they can't 
> reach back until the big bang.  If instead of dealing in dates we dealt in 
> _seconds_ since epoch we could reach back to the big bang so long as current 
> estimates are right to within an order of magnitude.  Instead of 292 million 
> years ago we'd have 292 billion years ago.


Seconds wouldn't actually work, but years will, especially if represented as a 
double. The time since the big bang, expressed in years, comfortably fits into 
the 52-bit mantissa of a double, so we wouldn't get weird rounding artifacts 
like reading back 13.834520923424352345234523 billion years after storing 13.8, 
and we'd still have plenty of range for other cosmic dates.
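
A quick sanity check of the ranges involved, as a rough sketch. The Julian 
year of 365.25 days and the 13.8 billion year figure are approximations chosen 
for illustration, and all names are made up for this example. It shows that a 
whole number of years back to the big bang is exactly representable in a 
double, while the same span counted in seconds exceeds 2^53 and so could no 
longer be stored exactly in a double:

```python
# Rough range check for the date representations discussed in this thread.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an approximation

LONG_MAX = 2**63 - 1       # largest signed 64-bit integer (a Java long)
DOUBLE_EXACT_MAX = 2**53   # every integer up to here is exact in a double

BIG_BANG_YEARS = 13_800_000_000  # roughly 13.8 billion years (assumed figure)


def years_covered(ticks_per_second):
    """Years representable by a signed 64-bit count at the given resolution."""
    return LONG_MAX / ticks_per_second / SECONDS_PER_YEAR


ms_range = years_covered(1000)  # milliseconds since epoch
s_range = years_covered(1)      # seconds since epoch

big_bang_seconds = BIG_BANG_YEARS * SECONDS_PER_YEAR

# Whole years since the big bang stay below 2**53, so they are exact.
years_exact = BIG_BANG_YEARS < DOUBLE_EXACT_MAX
# The same span in seconds is far above 2**53, so a double would round it.
seconds_exact = big_bang_seconds < DOUBLE_EXACT_MAX
# Converting the year count to a float and back loses nothing.
round_trip = int(float(BIG_BANG_YEARS)) == BIG_BANG_YEARS

print(f"long of milliseconds covers ~{ms_range:.3g} years")
print(f"long of seconds covers      ~{s_range:.3g} years")
print(f"years exact in a double: {years_exact}, seconds exact: {seconds_exact}")
```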

> Elasticsearch is based on Joda Time which can handle negative years just 
> fine.  It can't handle negative years that far back though.  I've filed an 
> issue <https://github.com/elasticsearch/elasticsearch/issues/9048> for it but 
> I imagine we'll be on our own.  I believe they are getting infinite precision 
> numbers at some point but the kind of dates we handle are probably best 
> stored in floating point instead.


Parsing out the year portion of an ISO 8601 timestamp in the API frontend is 
not all that hard, so we could just go ahead and index a double 
representation of the year in Titan. For dates within the long cut-off, we can 
also convert the full timestamp to a long (or double) and store it in a 
finer-grained index. The front-end will then need to switch between year-based 
and finer-grained indexing after looking at the string representation of the 
timestamp in the query.
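
A minimal sketch of that front-end decision, under some assumptions: the 
timestamps are taken to look like "+00000002013-01-01T00:00:00Z" (an optional 
sign followed by a year of arbitrary length), the cut-off is the rough 292 
million year range of a long of milliseconds, and the function and index names 
are hypothetical. The conversion of the full timestamp for the finer-grained 
index is left out:

```python
import re

# Optional sign, then a year with any number of digits, then -MM-DDT...
_TIMESTAMP = re.compile(r"^([+-]?\d+)-(\d{2})-(\d{2})T")

# Assumed cut-off: roughly what a signed 64-bit millisecond count can cover.
LONG_CUTOFF_YEARS = 292_000_000


def parse_year(timestamp):
    """Return the (possibly negative, possibly huge) year as an int."""
    match = _TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError(f"not an ISO 8601-style timestamp: {timestamp!r}")
    return int(match.group(1))


def choose_index(timestamp):
    """Pick the year-based or the finer-grained index for a timestamp.

    Returns ("year", <year as float>) outside the cut-off, and
    ("fine", <original timestamp>) inside it; the caller would still have
    to convert the latter to a long before querying.
    """
    year = parse_year(timestamp)
    if abs(year) > LONG_CUTOFF_YEARS:
        return ("year", float(year))
    return ("fine", timestamp)


print(choose_index("-13798000000-00-00T00:00:00Z"))
print(choose_index("+00000002013-01-01T00:00:00Z"))
```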

> Looking at mixed indexes I wonder how they are backed to Elasticsearch.  
> Lucene/Elasticsearch pretty much indexes everything independently and then 
> ANDs the results of traversing multiple indexes together to get the answer.  
> That deserves some looking into.


The Titan documentation says that mixed indexes efficiently support 
multi-predicate queries on the same index, so I assume that Titan sends off all 
predicates to Elasticsearch at once, and lets it deal with efficiently ANDing.
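
To illustrate what "all predicates at once" could look like on the 
Elasticsearch side, here is a sketch that combines several predicates into a 
single bool query with "must" clauses. This is only an illustration of the 
idea, not what Titan is known to emit; the helper and the field names are 
hypothetical:

```python
import json


def multi_predicate_query(exact, ranges):
    """Build one Elasticsearch request body that ANDs all predicates.

    exact:  {field: value} pairs that must match exactly (term clauses)
    ranges: {field: {"gt"/"gte"/"lt"/"lte": bound}} range predicates
    """
    clauses = [{"term": {field: value}} for field, value in exact.items()]
    clauses += [{"range": {field: bounds}} for field, bounds in ranges.items()]
    # Every clause under "must" has to match, so Elasticsearch does the ANDing.
    return {"query": {"bool": {"must": clauses}}}


# Hypothetical fields: an exact match plus a range on the year-based index.
body = multi_predicate_query(
    exact={"type": "supernova"},
    ranges={"year": {"lt": -1_000_000_000.0}},
)
print(json.dumps(body, indent=2))
```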


TASK DETAIL
  https://phabricator.wikimedia.org/T76373

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign 
<username>.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev, GWicke
Cc: Smalyshev, Manybubbles, GWicke, JanZerebecki, aude, Lydia_Pintscher, 
Eloquence, aaron, jkroll, Wikidata-bugs, daniel



_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
