Thanks for the thorough write-up, Duo. Made for a good read.

S

On Fri, Jul 26, 2019 at 6:43 AM 张铎 (Duo Zhang) <palomino...@gmail.com> wrote:
The conclusion of HBaseConAsia 2019 will be available later. Here are the notes from the round table meeting after the conference. A bit long...

First we talked about splittable meta. At Xiaomi we have a cluster with nearly 200k regions, and meta is very easy to overload and then cannot recover. Anoop said we could try read replicas, but agreed that read replicas cannot solve all the problems; in the end we still need to split meta.

Then we talked about SQL. Allan Yang said that most of their customers want a secondary index, even more than SQL. For a globally strongly consistent secondary index, we agreed that the only safe way is to use transactions. Other 'local' solutions get into trouble when splitting/merging. Xiaomi has a global secondary index solution; should we open source it?

Then back to SQL. We talked about Phoenix; its problem is well known: not stable enough. We even had a user on the mailing list say he/she would never use Phoenix again. Alibaba and Huawei both have in-house SQL solutions, and Huawei also talked about theirs at HBaseConAsia 2019; they will try to open source it. We could introduce a SQL proxy in the hbase-connectors repo. No push-down support at first, with all the logic done on the proxy side; we can optimize later.

Some people said that the current feature set for 3.0.0 is not good enough to attract more users, especially small companies: only internal improvements, no user-visible features. SQL and secondary indexes are very important.

Yu Li talked about CCSMap; we still want it released in 3.0.0. One problem is its relationship with in-memory compaction. Theoretically they should have no conflicts, but in practice they do. The Xiaomi folks also mentioned that in-memory compaction still has some bugs, even in basic mode: the MVCC write point may get stuck and hang the region server. And Jieshan Bi asked why not just use CCSMap to replace CSLM.
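For context on that question: CSLM is the JDK ConcurrentSkipListMap, which allocates a separate node object per entry, while CCSMap aims to keep index and data close together. A loose, stdlib-only sketch of the flat-layout idea (the class below is invented for illustration, not actual CCSMap code):

```python
import bisect

class FlatMemstore:
    """Toy sorted memstore: keys and values live in flat parallel arrays,
    so the 'index' (sorted keys) sits next to the data, unlike a skip
    list which allocates one heap node object per entry."""

    def __init__(self):
        self._keys = []   # sorted rowkeys
        self._vals = []   # values, parallel to _keys

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value          # overwrite existing entry
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            yield self._keys[i], self._vals[i]
            i += 1

m = FlatMemstore()
for k, v in [("row3", "c"), ("row1", "a"), ("row2", "b")]:
    m.put(k, v)
print(list(m.scan("row1", "row3")))   # [('row1', 'a'), ('row2', 'b')]
```

This trades O(log n) skip-list inserts for O(n) array inserts and ignores concurrency entirely; it only illustrates the data-locality point, not the real design.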
Yu Li said this is for better memory usage: the index and the data can be placed together.

Then we started to talk about HBase on cloud. For now it is a bit difficult to deploy HBase on cloud, as we need to deploy ZooKeeper and HDFS first. Then we talked about HBOSS and the WAL abstraction (HBASE-20952). Wellington said HBOSS basically works; it uses S3A and ZooKeeper to help simulate HDFS's operations. We could introduce our own 'FileSystem' interface, not the Hadoop one, and we could remove the 'atomic renaming' dependency so that 'FileSystem' implementations become easier. On the WAL abstraction, Wellington said some people are still working on it, but for now they are focused on patching Ratis rather than abstracting the WAL system first. We agreed that a better way is to abstract the WAL system at a level higher than FileSystem, so maybe we could even use Kafka to store the WAL.

Then we talked about the FPGA usage for compaction at Alibaba. Jieshan Bi said that at Huawei they offload compaction to the storage layer. As an open source solution, maybe we could offload compaction to Spark and then use something like bulk load to let the region server pick up the new HFiles. The problem with doing compaction inside the region server is the CPU cost and GC pressure: we need to scan every cell, so the CPU cost is high. Yu Li talked about the page-based compaction in the Flink state store; maybe it could also benefit HBase.

Then it was time for MOB. Huawei said MOB cannot solve their problem. We still need to read the data through RPC, and it also puts pressure on the memstore, since the memstore is fairly small compared to a MOB cell. We also flush a lot even though there are only a small number of MOB cells in the memstore, so we still need to compact a lot.
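A back-of-envelope calculation makes the flush-pressure point concrete (a rough sketch: 128 MB is the usual memstore flush threshold, and the cell sizes below are illustrative, not measurements):

```python
# How often a memstore flushes as cell size grows. With large cells,
# far fewer cells fit before the flush threshold trips, so flushes
# (and the compactions that follow) become much more frequent.
FLUSH_SIZE = 128 * 1024 * 1024  # bytes, typical memstore flush threshold

def cells_per_flush(cell_size_bytes):
    """Number of cells that fit in the memstore before a flush triggers."""
    return FLUSH_SIZE // cell_size_bytes

for size, label in [(200, "200 B ordinary cell"),
                    (100 * 1024, "100 KB MOB cell"),
                    (1024 * 1024, "1 MB MOB cell")]:
    print(f"{label}: flush every {cells_per_flush(size):,} cells")
```

Going from 200-byte cells to 1 MB cells cuts the cells-per-flush count from hundreds of thousands to 128, which is the pressure described above.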
So maybe the suitable scenario for MOB is: most of your data is still small, and only a small portion is a bit larger. There MOB can improve performance, and users do not need another system to store the larger data.
Huawei said they implement the logic on the client side: if the data is larger than a threshold, the client goes to another storage system rather than HBase.
Alibaba said that if we want to support large blobs, we need to introduce a streaming API.
And Kuaishou said they do not use MOB; they just store the data on HDFS and the index in HBase, the typical solution.

Then we talked about which company will host next year's HBaseConAsia. It will be Tencent or Huawei, or both, probably in Shenzhen. And since there is no HBaseCon in America any more (it is called 'NoSQL Day' now), maybe next year we could just call the conference HBaseCon.

Then back to SQL again. Alibaba said that most of their customers are migrating from old businesses, so they need 'full' SQL support. That's why they need Phoenix. And lots of small companies want to run OLAP queries directly on the database; they do not want to use ETL. So maybe in the SQL proxy (planned above) we should delegate OLAP queries to Spark SQL or something else, rather than just rejecting them.

A Phoenix committer said that the Phoenix community is currently re-evaluating its relationship with HBase, because lots of things broke when upgrading to HBase 2.1.x. They plan to break the tie between Phoenix and HBase, which means Phoenix plans to also run on other storage systems.
Note: this was not said at the meeting, but personally I think this may be good news: if Phoenix is no longer HBase-only, we have more reason to introduce our own SQL layer.

Then we talked about Kudu. It is faster than HBase on scan.
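Part of Kudu's scan advantage comes from its columnar layout. A toy, self-contained sketch of why scanning one field is cheaper when that field is stored contiguously (synthetic data, nothing Kudu- or HBase-specific):

```python
# Summing one field over row-major records walks every field of every
# record; a column store keeps that field as one flat array, which is
# friendlier to the cache and compresses better.
rows = [{"key": i, "clicks": i % 7, "payload": "x" * 50}
        for i in range(10_000)]

# Row-major scan: touch whole records to read a single field.
row_total = sum(r["clicks"] for r in rows)

# Columnar layout: the 'clicks' column is one contiguous array,
# built once (e.g. at write time), scanned without the payloads.
clicks_col = [r["clicks"] for r in rows]
col_total = sum(clicks_col)

assert row_total == col_total   # same answer, different access pattern
print(col_total)
```

The trade-off runs the other way for point reads: reassembling one row from many column arrays is slower than reading it in place, which is the same scan-versus-random-read tension noted below for block sizes.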
If we want to increase scan performance we should use a larger block size, but that leads to slower random reads, so there is a trade-off.
The Kuaishou folks asked whether HBase could support storing HFiles in a columnar format. The answer is no; as said above, it would slow down random reads. But we could learn from what Google did in Bigtable: write a copy of the data in Parquet format to another FileSystem, and users could scan the Parquet files for better analytics performance. If they want the newest data they can still ask HBase for it, and that should be a small amount. This is more of a full solution than an HBase-only feature, but at least we could introduce some APIs in HBase so users can build it in their own environment. And if you do not care about the newest data, you could also use replication to replicate the data to ES or other systems and search there.

Then Didi talked about their problems using HBase. They use Kylin, so they also have lots of regions, and meta is a problem for them too. The pressure on ZooKeeper is also a problem, as the replication queues are stored on zk. Since 2.1, ZooKeeper is only used as an external storage in the replication implementation, so it is possible to switch to other storages such as etcd. But it is still a bit difficult to store the data in a system table: right now we need to start the replication system before the WAL system, but if we want to store the replication data in an HBase table, obviously the WAL system must be started before the replication system, as we need the region of that system table online first, and opening it writes an open marker to the WAL. We need to find a way to break this deadlock.
They also mentioned that the rsgroup feature creates a big znode on ZooKeeper, as they have lots of tables. We have HBASE-22514, which aims to solve that problem.
And last, they shared their experience upgrading from 0.98 to 1.4.x.
The versions should be compatible, but in practice there were problems. They agreed to post a blog about this.

And the Flipkart folks said they will open source their test suite, which focuses on consistency (Jepsen?). This is good news; hopefully we will have another useful tool besides ITBLL.

That's all. Thanks for reading.