Hi Wang,

This looks interesting. Would you consider submitting this as a PR once you are satisfied with the performance?

--C

> On Jul 6, 2019, at 5:31 AM, 王亮 <[email protected]> wrote:
>
> Hi all,
>
> After reading that excellent book "Learning Apache Drill: Query and Analyze
> Distributed Data Sources with SQL", my classmate and I also wanted to write
> a Drill storage plugin. We found that most DFS and NFS systems are already
> supported by Drill, so we chose a relatively new and promising distributed
> file system, IPFS.
>
> So we built Minerva, a Drill storage plugin that connects IPFS's
> decentralized storage and Drill's flexible query engine. Any data file
> stored on IPFS can be easily accessed from Drill's query interface, just
> like a file stored on a local disk. The basic idea is very simple: run a
> Drill instance alongside the IPFS daemon, and you can connect to other
> users on IPFS who are also using Minerva. If one of the users happens to
> have stored the file you are trying to query, then Drill can send the
> execution plan to that node, which executes the operations locally and
> returns the results. Of course, other users can benefit from your node as
> well, if you are sharing the data they want. If there are enough people
> running Minerva, data sharing and querying can be made distributed and
> more efficient!
>
> The query process is as follows:
> 0 The user inputs an SQL statement, referencing a file on IPFS by its CID;
> 1 The Foreman resolves the CIDs of the "pieces" of the data file, as well
> as the IPFS providers of these pieces, by querying the DHT of IPFS;
> 2 The Foreman distributes jobs to drillbits running on the providers;
> 3 Drillbits on the providers read data from the piece of file on their
> local disk, perform any necessary relational operations, and return results
> to the Foreman;
> 4 The Foreman returns the results to the user.
>
> Thanks to the modular design of Drill, we could rather "easily" write this
> storage plugin. The plugin now supports basic query operations, both read
> and write, but only works with JSON and CSV files. It is not very stable
> for now, and the performance is still poor, mainly because it takes too
> long to do DHT queries on IPFS. We will try to address these problems in
> the future.
>
> If you are interested, we have made a few slides that explain the ideas in
> detail:
> https://www.slideshare.net/BowenDing4/minerva-ipfs-storage-plugin-for-ipfs
>
> Any suggestion is welcome. ^_^
>
> Find the code on GitHub: https://github.com/bdchain/Minerva
>
> Best,
> Wang Liang
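
For readers following the thread, the query process in the quoted message (steps 0 to 4) can be sketched roughly as below. This is only an in-memory Python simulation under stated assumptions, not Minerva's code or Drill's API: the `Drillbit` and `Foreman` classes, the `links` and `providers` dictionaries (standing in for the IPFS DAG and DHT provider records), and the "pick the first provider" rule are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration; none of these names come from
# Minerva, Drill, or IPFS.


@dataclass
class Drillbit:
    """A node that stores some pieces locally and can scan them."""
    node_id: str
    local_pieces: dict = field(default_factory=dict)  # piece CID -> list of rows

    def execute(self, piece_cid, predicate):
        # Step 3: read the piece from local storage and filter it locally.
        return [row for row in self.local_pieces[piece_cid] if predicate(row)]


@dataclass
class Foreman:
    """Resolves pieces and providers, then fans the work out."""
    links: dict      # root CID -> ordered piece CIDs (stand-in for the IPFS DAG)
    providers: dict  # piece CID -> node ids (stand-in for DHT provider records)
    drillbits: dict  # node id -> Drillbit

    def query(self, root_cid, predicate):
        results = []
        # Step 1: resolve the pieces of the file and who provides each one.
        for piece_cid in self.links[root_cid]:
            candidates = self.providers.get(piece_cid, [])
            if not candidates:
                raise LookupError(f"no provider found for piece {piece_cid}")
            # Step 2: assign the piece to one provider (assumed: the first).
            drillbit = self.drillbits[candidates[0]]
            # Step 3 runs on the provider; the Foreman collects the rows.
            results.extend(drillbit.execute(piece_cid, predicate))
        # Step 4: hand the merged rows back to the user.
        return results


def demo():
    node_a = Drillbit("node-a", {"piece-1": [{"id": 1, "v": 10}, {"id": 2, "v": 25}]})
    node_b = Drillbit("node-b", {"piece-2": [{"id": 3, "v": 40}]})
    foreman = Foreman(
        links={"root-cid": ["piece-1", "piece-2"]},
        providers={"piece-1": ["node-a"], "piece-2": ["node-b"]},
        drillbits={"node-a": node_a, "node-b": node_b},
    )
    # Step 0: the user's query, reduced here to a filter over one file's rows.
    return foreman.query("root-cid", lambda row: row["v"] > 20)
```

In this toy version the provider lookup is a dictionary read, so it is instantaneous. In the real system that lookup is a DHT query over the network, which is the step the message identifies as the main performance bottleneck.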
