Hi all,

After reading the excellent book "Learning Apache Drill: Query and Analyze Distributed Data Sources with SQL", my classmate and I wanted to write a Drill storage plugin of our own. We found that most DFS and NFS are already supported by Drill, so we chose a relatively new and promising distributed file system, IPFS.

So we built Minerva, a Drill storage plugin that connects IPFS's decentralized storage with Drill's flexible query engine. Any data file stored on IPFS can be accessed from Drill's query interface, just like a file stored on a local disk.

The basic idea is very simple: run a Drill instance alongside the IPFS daemon, and you can connect to other users on IPFS who are also running Minerva. If one of those users happens to have stored the file you are trying to query, Drill can send the execution plan to that node, which executes the operations locally and returns the results. Of course, other users can benefit from your node as well, if you are sharing the data they want. If enough people run Minerva, data sharing and querying can be made distributed and more efficient!

The query process is as follows:

0. The user inputs an SQL statement, referencing a file on IPFS by its CID.
1. The Foreman resolves the CIDs of the "pieces" of the data file, as well as the IPFS providers of these pieces, by querying the IPFS DHT.
2. The Foreman distributes jobs to the drillbits running on the providers.
3. The drillbits on the providers read data from the pieces of the file on their local disks, perform any necessary relational operations, and return the results to the Foreman.
4. The Foreman returns the results to the user.

Thanks to the modular design of Drill, we could write this storage plugin rather "easily". The plugin now supports basic query operations, both read and write, but only works with JSON and CSV files. It is not very stable yet, and the performance is still poor, mainly because DHT queries on IPFS take too long. We will try to fix these problems in the future.

If you are interested, we have made a few slides that explain the ideas in detail:
https://www.slideshare.net/BowenDing4/minerva-ipfs-storage-plugin-for-ipfs

Any suggestions are welcome. ^_^

Find the code on GitHub: https://github.com/bdchain/Minerva

Best,
Wang Liang
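
P.S. In case it helps, here is a toy sketch of the query process above in Python. It is not the plugin code (Minerva itself lives inside Drill). The dictionaries stand in for the IPFS DHT and for each node's local disk, and every name in it (`resolve_pieces`, `assign_work`, `run_fragment`, the CIDs and the node names) is made up for illustration. Picking the first provider for each piece is also just an assumption of the sketch, not necessarily how the Foreman chooses.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for IPFS; nothing here talks to a real daemon.
PIECES = {"root-cid": ["piece-a", "piece-b"]}  # file CID -> CIDs of its pieces
PROVIDERS = {  # piece CID -> nodes that store it (plays the role of the DHT)
    "piece-a": ["node-1"],
    "piece-b": ["node-2", "node-1"],
}
LOCAL_DATA = {  # (node, piece CID) -> rows on that node's local disk
    ("node-1", "piece-a"): [{"id": 1, "v": 10}, {"id": 2, "v": 25}],
    ("node-1", "piece-b"): [{"id": 3, "v": 30}, {"id": 4, "v": 5}],
    ("node-2", "piece-b"): [{"id": 3, "v": 30}, {"id": 4, "v": 5}],
}


def resolve_pieces(file_cid):
    """Step 1: map a file CID to its pieces and the providers of each piece."""
    return {piece: PROVIDERS[piece] for piece in PIECES[file_cid]}


def assign_work(piece_providers):
    """Step 2: pick one provider per piece (assumption: the first one listed)."""
    plan = defaultdict(list)
    for piece, nodes in piece_providers.items():
        plan[nodes[0]].append(piece)
    return dict(plan)


def run_fragment(node, pieces, predicate):
    """Step 3: a 'drillbit' scans its local pieces and filters the rows."""
    rows = []
    for piece in pieces:
        rows.extend(r for r in LOCAL_DATA[(node, piece)] if predicate(r))
    return rows


def query(file_cid, predicate):
    """Steps 0 and 4: accept the request, then gather the partial results."""
    plan = assign_work(resolve_pieces(file_cid))
    results = []
    for node, pieces in plan.items():
        results.extend(run_fragment(node, pieces, predicate))
    return sorted(results, key=lambda r: r["id"])


if __name__ == "__main__":
    # Roughly: SELECT * FROM <file> WHERE v >= 10
    print(query("root-cid", lambda r: r["v"] >= 10))
```

The point of the sketch is only that each piece is filtered on the node that already holds it, so just the matching rows travel back to the Foreman.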