GWicke added a comment.

In https://phabricator.wikimedia.org/T76373#941917, @Manybubbles wrote:

> I'd spin up a new one - probably just on a single node. I think in the long
> run we probably can run this on the production search cluster but for now
> let's keep it off just in case it does something stupid. I can put together
> some puppet changes to put a single node Elasticsearch instance on
> einsteinium.

Awesome, that would be great! I think storage is not very fast there (disks IIRC), but maybe the 64G of memory will compensate.

> > For Date, I wonder if support can't be added to Titan, since Elastic AFAIK
> > supports dates.
>
> It sure does. They are parsed and formatted automatically but amount to a
> Java long since epoch under the hood. As @GWicke said that means they can't
> reach back until the big bang. If instead of dealing in dates we dealt in
> _seconds_ since epoch we could reach back to the big bang so long as current
> estimates are right to within an order of magnitude. Instead of 292 million
> years ago we'd have 292 billion years ago.

Seconds wouldn't actually work, but years will, especially if represented as a double. The big bang represented as years comfortably fits into the 52-bit mantissa of a double, so we wouldn't get weird rounding artifacts like reading back 13.834520923424352345234523 billion years after storing 13.8. And we'd still have plenty of range for other cosmic dates.

> Elasticsearch is based on Joda Time which can handle negative years just
> fine. It can't handle negative years that far back though. I've filed an
> issue <https://github.com/elasticsearch/elasticsearch/issues/9048> for it but
> I imagine we'll be on our own. I believe they are getting infinite precision
> numbers at some point but the kind of dates we handle are probably best
> stored in floating point instead.

Parsing out the year portion of an ISO 8601 timestamp in the API frontend is not all that hard, so we could just go ahead and index on a double representation of that in Titan.
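
To make the numbers above concrete, here is a rough sketch of the year extraction and the range arithmetic. The helper names are made up for illustration and are not Titan or Elasticsearch API. It assumes timestamps with an optional sign and a year of four or more digits, and a mean Gregorian year of 365.2425 days.

```python
import re

# Hypothetical helpers for illustration only; not part of Titan or Elasticsearch.

# Assumed input shape: optional sign, year of 4+ digits, then -MM-DD.
_YEAR_RE = re.compile(r"^([+-]?\d{4,})-\d{2}-\d{2}")

# Mean Gregorian year: 365.2425 days * 86400 seconds.
SECONDS_PER_YEAR = 31_556_952

JAVA_LONG_MAX = 2**63 - 1


def year_as_double(timestamp: str) -> float:
    """Extract the year portion of a timestamp and return it as a double."""
    match = _YEAR_RE.match(timestamp)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {timestamp!r}")
    return float(int(match.group(1)))


def is_exact_in_double(year: int) -> bool:
    """True if the integer year survives a round trip through a double."""
    return int(float(year)) == year


def long_range_years(units_per_second: int) -> float:
    """Years reachable by a signed 64-bit count of the given time unit."""
    return JAVA_LONG_MAX / (units_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(year_as_double("2014-12-04T10:00:00Z"))
    print(year_as_double("-13800000000-01-01T00:00:00Z"))
    print(is_exact_in_double(-13_800_000_000))
    print(long_range_years(1000))  # milliseconds in a long
    print(long_range_years(1))     # seconds in a long
```

With milliseconds the long covers roughly 292 million years either side of the epoch, with seconds roughly 292 billion, which matches the figures quoted above. A whole-year value around 13.8 billion is far below 2^53, so it round-trips through a double unchanged.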
For dates within the long cut-off, we can also convert that to long (or double), and store that in a finer-grained index. We'll then need to switch between year-based indexing and finer-grained indexing in the front-end after looking at a string representation of a timestamp in the query.

> Looking at mixed indexes I wonder how they are backed to Elasticsearch.
> Lucene/Elasticsearch pretty much indexes everything independently and then
> ANDs the results of traversing multiple indexes together to get the answer.
> That deserves some looking into.

The documentation says that it efficiently supports multi-predicate queries on the same index, so I assume that it sends off all predicates to elasticsearch at once, and lets it deal with efficiently ANDing.

TASK DETAIL
  https://phabricator.wikimedia.org/T76373

To: Smalyshev, GWicke
Cc: Smalyshev, Manybubbles, GWicke, JanZerebecki, aude, Lydia_Pintscher, Eloquence, aaron, jkroll, Wikidata-bugs, daniel
