The conclusion of HBaseConAsia 2019 will be available later. For now, here are my notes from the round table meeting after the conference. A bit long...
First we talked about splittable meta. At Xiaomi we have a cluster with nearly 200k regions, where meta is easily overloaded and cannot recover. Anoop said we could try read replicas, but agreed that read replicas cannot solve all the problems; in the end we still need to split meta.

Then we talked about SQL. Allan Yang said that most of their customers want a secondary index, even more than SQL. For a globally strongly consistent secondary index, we agreed that the only safe way is to use transactions; other 'local' solutions run into trouble when regions are split or merged. Xiaomi has a global secondary index solution; maybe we should open source it?

Then we came back to SQL. We talked about Phoenix; its problem is well known: not stable enough. We even had a user on the mailing list say he/she would never use Phoenix again. Alibaba and Huawei both have in-house SQL solutions, and Huawei also presented theirs at HBaseConAsia 2019; they will try to open source it. We could also introduce a SQL proxy in the hbase-connectors repo: no push-down support at first, with all logic done on the proxy side, and optimize later.

Some people said that the current feature set for 3.0.0 is not good enough to attract more users, especially small companies: only internal improvements, no user-visible features. SQL and secondary index are very important.

Yu Li talked about CCSMap; we still want it to be released in 3.0.0. One problem is its relationship with in-memory compaction: theoretically they should not conflict, but in practice they do. The Xiaomi folks mentioned that in-memory compaction still has bugs, even in basic mode: the MVCC writePoint may get stuck and hang the region server. Jieshan Bi asked why not just use CCSMap to replace the ConcurrentSkipListMap; Yu Li said the point is better memory usage, since the index and the data can be placed together.

Then we started to talk about HBase on cloud. For now it is a bit difficult to deploy HBase on cloud, as we need to deploy ZooKeeper and HDFS first.
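To make the "transactions are the only safe way" point about global secondary indexes concrete, here is a purely illustrative Python sketch (none of these names are real HBase APIs): if the data row and its index row are written as two independent mutations, a crash in between leaves the index inconsistent, while an atomic apply keeps both or neither.

```python
# Toy model of a table plus a global secondary index on a "city" column.
# All names are invented for illustration; this is not the HBase client API.

data = {}   # row key -> {"city": value}
index = {}  # "city:<value>:<row>" -> row key

def put_non_atomic(row, city, crash_between=False):
    """Two independent writes: unsafe if we crash in the middle."""
    data[row] = {"city": city}
    if crash_between:
        return  # simulated crash: the index update is lost
    index[f"city:{city}:{row}"] = row

def put_transactional(row, city, crash=False):
    """Both mutations applied atomically: either both land or neither."""
    if crash:
        return  # crash before commit: nothing becomes visible
    data[row] = {"city": city}
    index[f"city:{city}:{row}"] = row

put_non_atomic("u1", "beijing", crash_between=True)
# The data row exists but its index entry is missing: a dangling read path.
missing = [r for r in data if f"city:{data[r]['city']}:{r}" not in index]
print(missing)  # ['u1']
```

The same dangling-entry problem is why 'local' index schemes break across region splits and merges: the two writes stop being co-located, so atomicity can no longer be had for free.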
Then we talked about HBOSS and the WAL abstraction (HBASE-20952). Wellington said that HBOSS basically works; it uses s3a and ZooKeeper to help simulate the semantics of HDFS. We could introduce our own 'FileSystem' interface, not the Hadoop one, and remove the dependency on atomic renaming so that 'FileSystem' implementations become easier to write. On the WAL abstraction, Wellington said some people are still working on it, but for now they focus on patching Ratis rather than abstracting the WAL system first. We agreed that a better way is to abstract the WAL system at a level higher than FileSystem, so that we could even use Kafka to store the WAL.

Then we talked about the FPGA usage for compaction at Alibaba. Jieshan Bi said that at Huawei they offload compaction to the storage layer. As an open source solution, maybe we could offload compaction to Spark and then use something like bulk load to let the region server pick up the new HFiles. The problem with doing compaction inside the region server is CPU cost and GC pressure: we need to scan every cell, so the CPU cost is high. Yu Li talked about the page-based compaction in the Flink state store; maybe it could also benefit HBase.

Then it was time for MOB. Huawei said MOB cannot solve their problem. The data still has to be read through RPC, and it also puts pressure on the memstore, since the memstore is still fairly small compared to MOB cells. We will also flush a lot even though there are only a small number of MOB cells in the memstore, so we still need to compact a lot. So maybe the suitable scenario for MOB is when most of your data is small and only a small fraction is a bit larger; there MOB can improve performance, and users do not need another system to store the larger values. Huawei said they implement the logic on the client side: if the data is larger than a threshold, the client goes to another storage system rather than HBase.
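The core of the "offload compaction to Spark, then bulk load the result" idea is that a major compaction is essentially a k-way merge of sorted files where only the newest version of each key survives, and that merge does not have to run inside the region server. A minimal sketch, with HFiles modeled as plain sorted lists (the real format and the Spark plumbing are of course far more involved):

```python
import heapq

# Each "HFile" is modeled as a list of (key, seq, value) tuples, sorted by
# key. Purely illustrative; real HFiles carry much more structure.

def compact(hfiles):
    """K-way merge keeping only the newest version (highest seq) per key."""
    # Sort each key group newest-first so the first occurrence wins.
    merged = heapq.merge(*hfiles, key=lambda kv: (kv[0], -kv[1]))
    out, last_key = [], None
    for key, seq, value in merged:
        if key != last_key:          # first (newest) version of this key
            out.append((key, seq, value))
            last_key = key
    return out

f1 = [("a", 1, "old-a"), ("c", 2, "c2")]
f2 = [("a", 3, "new-a"), ("b", 1, "b1")]
print(compact([f1, f2]))
# [('a', 3, 'new-a'), ('b', 1, 'b1'), ('c', 2, 'c2')]
```

Because the merge only needs sorted inputs and one pass, it maps naturally onto an external batch job, leaving the region server to do a cheap atomic swap of the resulting file, much like an ordinary bulk load.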
Alibaba said that if we want to support large blobs, we need to introduce a streaming API. The Kuaishou folks said they do not use MOB; they just store the data on HDFS and the index in HBase, the typical solution.

Then we talked about which company will host next year's HBaseConAsia. It will be Tencent or Huawei, or both, probably in Shenzhen. And since there is no HBaseCon in America any more (it is called 'NoSQL Day' now), maybe next year we could just call the conference HBaseCon.

Then we came back to SQL again. Alibaba said that most of their customers are migrating from old businesses, so they need 'full' SQL support; that is why they need Phoenix. And lots of small companies want to run OLAP queries directly on the database; they do not want to use ETL. So maybe in the SQL proxy (planned above) we should delegate OLAP queries to Spark SQL or something else, rather than just rejecting them. A Phoenix committer said that the Phoenix community is currently re-evaluating its relationship with HBase, because lots of things broke when upgrading to HBase 2.1.x. They plan to break the tie between Phoenix and HBase, which means Phoenix plans to also run on other storage systems. Note: this was not said at the meeting, but personally I think this may be good news; since Phoenix is no longer HBase-only, we have more reasons to introduce our own SQL layer.

Then we talked about Kudu. It is faster than HBase on scans. If we want to increase scan performance we need a larger block size, but that leads to slower random reads, so there is a trade-off. The Kuaishou folks asked whether HBase could store HFiles in a columnar format. The answer is no; as said above, it would slow down random reads. But we could learn from what Google has done in Bigtable: write a copy of the data in Parquet format to another FileSystem, so users can just scan the Parquet files for better analytical performance.
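The client-side pattern Huawei and Kuaishou both described (keep big values out of HBase, keep only a pointer there) can be sketched as a size-threshold router. Everything below is invented for illustration; the dicts stand in for an HBase table and a blob store such as HDFS:

```python
BLOB_THRESHOLD = 1024  # bytes; the actual cut-off would be tuned per workload

hbase = {}       # stand-in for an HBase table: row -> (kind, payload)
blob_store = {}  # stand-in for HDFS / object storage: path -> bytes

def put(row, value):
    """Small values go inline into 'HBase'; large values are written to the
    blob store and only a reference is kept in the table."""
    if len(value) <= BLOB_THRESHOLD:
        hbase[row] = ("inline", value)
    else:
        path = f"/blobs/{row}"
        blob_store[path] = value
        hbase[row] = ("ref", path)

def get(row):
    """Reads resolve references transparently, so callers see one API."""
    kind, payload = hbase[row]
    return payload if kind == "inline" else blob_store[payload]

put("small", b"x" * 10)
put("large", b"y" * 4096)
print(get("large") == b"y" * 4096)  # True: resolved through the pointer
```

This keeps huge cells away from the memstore and compactions entirely, which is exactly the pressure MOB was said not to relieve; the cost is that the client must now manage consistency between two systems.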
And if they want the newest data, they can ask HBase for it, and that portion should be small. This is more like a complete solution in which HBase is only one piece, but at least we could introduce some APIs in HBase so users can build the solution in their own environment. And if you do not care about the newest data, you can also use replication to replicate the data to ES or other systems and search there.

Then Didi talked about their problems using HBase. They use Kylin, so they also have lots of regions, and meta is a problem for them too. The pressure on ZooKeeper is also a problem, as the replication queues are stored on zk. Since 2.1, ZooKeeper is only used as an external storage in the replication implementation, so it is possible to switch to other storages, such as etcd. But it is still a bit difficult to store the data in a system table: right now we need to start the replication system before the WAL system, but if we want to store the replication data in an HBase table, the WAL system obviously must be started before the replication system, as we need the region of that system table online first, and opening it writes an open marker to the WAL. We need to find a way to break this deadlock.

They also mentioned that the rsgroup feature creates a big znode on ZooKeeper, as they have lots of tables. We have HBASE-22514, which aims to solve this problem. And last, they shared their experience upgrading from 0.98 to 1.4.x: the versions should be compatible, but in practice there were problems. They agreed to post a blog about this.

And the Flipkart folks said they will open source their test suite, which focuses on consistency (Jepsen?). This is good news; hopefully we will get another useful tool besides ITBLL.

That's all. Thanks for reading.
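The read side of the Parquet-copy idea discussed above boils down to merging a possibly stale bulk snapshot with the small delta of newest rows served by HBase, with the newest rows winning. A minimal sketch with invented names:

```python
# Stale analytical copy (think: Parquet files scanned by Spark).
snapshot = {"u1": 10, "u2": 20, "u3": 30}
# Small delta of the newest rows, fetched from HBase.
recent = {"u2": 25, "u4": 40}

def merged_view(snapshot, recent):
    """Bulk data from the snapshot, overridden by the newest rows."""
    view = dict(snapshot)
    view.update(recent)   # recent writes shadow the stale copy
    return view

print(merged_view(snapshot, recent))
# {'u1': 10, 'u2': 25, 'u3': 30, 'u4': 40}
```

The point of the meeting discussion is that HBase itself would only need to expose APIs for producing the copy and for bounding the "recent" delta; the merge would live in the user's analytical layer.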