Fceller added a comment.

Hi, I'm the CTO of ArangoDB, so my comments are certainly biased. Still, I would like to share our view on the issues raised, namely Blueprints (TinkerPop) support and full-text indexes.
(1) We do not believe that TinkerPop (TP) is helpful in a sharded environment. Gremlin is a nice language, but it requires you to move a lot of data to the client. This works very well if you can embed the database and keep it in the same process space. As soon as you need to shard the data and spread it across many servers, you will move a lot of data between Gremlin and the DBservers. We therefore decided to create a JavaScript version of Gremlin that runs directly on the shards, minimising the amount of data moved. So it is indeed true that we did not add support for TP3, because we believe it would be of limited use.

(2) Full-text indexes are not our main area of expertise. We think that search engines like ElasticSearch or Solr are much better at this, especially when it comes to stemming, different languages, and phonetic searches. There is an ElasticSearch plugin for using ElasticSearch as the full-text search engine for ArangoDB. The full-text index is indeed very slow to build. We want to speed up the process and hope to improve it over time (see also the next point). I assume that you are using a full-text index in your example, right?

(3) We decided to keep the indexes in memory only, for the reasons below. There are various possibilities:
(1) memory-only indexes (this is what ArangoDB currently implements)
(2) disk-based indexes (this is what CouchDB currently implements)
(3) disk-backed indexes with a file-system-like clean flag
(4) other solutions, such as keeping only parts of the index in memory or using memory as a cache

There is a trade-off:

Runtime behaviour:
(1) is the fastest solution.
(2) is the slowest solution, because you need to ensure that there are no inconsistencies even in the case of a server crash. If you look at what CouchDB does, you will see what I mean: you need to do much more syncing than in (1).
(3) could be nearly as fast as (1).

Startup behaviour:
(1) is the slowest solution.
(2) is the fastest solution.
(3) depends: after a clean shutdown it is as fast as (2); after a crash it is as slow as (1).

So if you expect your server to crash often, (1) might not be a good idea. If you expect your server to run stably, (1) might be much faster during normal operation. The best of all worlds would be (3). ArangoDB currently uses (1), but we want to switch to (3).

TASK DETAIL
https://phabricator.wikimedia.org/T88549

To: Smalyshev, Fceller
Cc: Fceller, JanZerebecki, Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, aude, GWicke, daniel

_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
