Manybubbles added a comment. In https://phabricator.wikimedia.org/T88549#1018178, @Fceller wrote:
> Hi, I'm the CTO of ArangoDB, so my comments are most certainly biased. I
> still would like to tell you about our opinions on the raised issues, namely
> full-text indexes and blueprint.

Thanks for replying!

> (1) We do not believe that TP is helpful in a shared environment. Gremlin is
> a nice language, but it requires you to move a lot of data into the client.
> This works very well if you can embed the database and keep it in the same
> process space. As soon as you need to shard the data and spread it to many
> servers you will move a lot of data between Gremlin and the DBservers.
> Therefore we decided to create a Javascript version of Gremlin which runs
> directly on the shards thus minimising the amount of moved data. Therefore it
> is indeed true that we did not add support for TP3 because we believe it
> will be of limited use.

Have a look at what they are working on now in their master branch. I think they've struck on a good notion: they are _heavily_ deprecating predicates in favor of anonymous filters, and those should be possible to optimize.

> (2) Fulltext indexes are not our main expertise. We think that search engines
> like ElasticSearch, Solr are much better in this - especially when it comes
> to stemming, different languages, phonetic searches. There is an
> ElasticSearch plugin to use ElasticSearch as fulltext search engine for
> ArangoDB. The fulltext index is indeed very slow when building. We want to
> speed up the process and hopefully can improve there over time (see also the
> next bullet point). I assume that you are using a fulltext index in your
> example, right?

That makes sense. I don't plan on using full text indexes in this project at all unless something unexpected comes up. Even so, we have much more experience with Elasticsearch and Lucene so it'd make sense to go there.

> (3) We decided to keep the indexes only in memory. The reasons are as follows.
>
> There are various possibilities:
>
> (1) use memory only indexes (this is currently implemented in ArangoDB)
> (2) use disk-based indexes (this is currently implemented in CouchDB)
> (3) disk-backed with a file-system like clean flag
> (4) other solutions like keeping only parts in memory, use memory as a
> cache, and so on are also possible
>
> There is a trade-off:
>
> Runtime behaviour:
>
> (1) this is the fastest solution
> (2) this is the slowest solution because you need to ensure that there are
> no inconsistencies even in case of a server crash. If you have a look at what
> CouchDB does you will see what I mean. You need to do much more synching than
> in (1).
> (3) could be nearly as fast as (1)
>
> Startup behaviour:
>
> (1) this is the slowest solution
> (2) this is the fastest solution
> (3) depends: with a clean shutdown as fast as (2), with a crash as slow as
> (1)
>
> So if you expect your server to crash often, then (1) might not be a good
> idea. If you expect your server to run stable, then (1) might be much faster
> during normal operations. The best of all worlds would be (3). ArangoDB
> currently uses (1), but we want to switch to (3).

You could also go with a Lucene-like write-once behavior. I don't know that it'd be a good match at all, though. It matches well with the infrequently updated, asynchronous nature of full text search, but it feels like it'd be more troublesome for something like ArangoDB. It's also probably more work to implement than clean shutdown. Anyway, I'm sure you've spent more time thinking about it than I have.
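
For readers following the index discussion, option (3) from the quoted list (a disk-backed index guarded by a clean flag) can be sketched roughly as below. This is only an illustration of the general idea, not ArangoDB's implementation: the class `CleanFlagIndex`, the file names `index.json` and `CLEAN`, and the plain dict standing in for the durable document store are all assumptions made up for this example.

```python
import json
import os


class CleanFlagIndex:
    """Toy in-memory index that is persisted only on clean shutdown.

    Hypothetical sketch of option (3); all names here are made up.
    """

    def __init__(self, directory, documents):
        self.index_path = os.path.join(directory, "index.json")
        self.flag_path = os.path.join(directory, "CLEAN")
        # Stand-in for the durable document store: {doc_id: value}.
        self.documents = documents
        self.index = {}
        self.rebuilt = False

    def open(self):
        if os.path.exists(self.flag_path) and os.path.exists(self.index_path):
            # Clean shutdown last time: load the saved index (fast start).
            with open(self.index_path) as f:
                self.index = json.load(f)
            self.rebuilt = False
        else:
            # No clean flag: assume a crash and rebuild from the documents
            # (slow start, same cost as a memory-only index).
            self.index = {}
            for doc_id, value in self.documents.items():
                self.index.setdefault(value, []).append(doc_id)
            self.rebuilt = True
        # Remove the flag while running, so a crash leaves no flag behind.
        if os.path.exists(self.flag_path):
            os.remove(self.flag_path)

    def insert(self, doc_id, value):
        # At runtime only memory is touched, as with a memory-only index.
        self.documents[doc_id] = value
        self.index.setdefault(value, []).append(doc_id)

    def lookup(self, value):
        return self.index.get(value, [])

    def close(self):
        # Persist the index first, then write the flag that vouches for it.
        with open(self.index_path, "w") as f:
            json.dump(self.index, f)
            f.flush()
            os.fsync(f.fileno())
        with open(self.flag_path, "w") as f:
            f.write("clean\n")
```

Calling `open()` after a `close()` loads the saved index, while calling `open()` after a simulated crash (no `close()`) falls back to a full rebuild, which mirrors the startup trade-off described in the quoted list.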
To: Smalyshev, Manybubbles
Cc: Neunhoef, Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel
