So that's good: it handles reads using a load balancer in front of the Kylin query nodes and lets you scale nodes up/down with ECS. But what if your EMR (single master) goes down? Are those clustered across different AZs as well?
On Tue, Aug 7, 2018 at 6:54 PM Chase Zhang <[email protected]> wrote:

> Hi Sonny,
>
> I think *reload config*, instead of reload metadata, will have the same effect of wiping the cache of cubes. Please have a try. (You have to do this on each query node.)
>
> Our Kylin instances and EMR are started separately. The EMR was started first; then we use Docker (ECS) to start Kylin. To customize the properties without building a new Docker image, we've written our own preload script (a sketch of such a script appears at the end of this thread). We make templates of configs like kylin.properties with some fields filled with placeholders. Once the container is started, the script first replaces those placeholders with values from environment variables; the IP address of the EMR master is set there.
>
> As for the Kylin-to-EMR mapping: we have only one Kylin master node per EMR cluster, but the query nodes are deployed with auto scaling, which means their number changes according to the situation.
>
> I'm afraid we don't have a video (even if there were one, it would be in Chinese, which I don't think would be helpful). Our Dockerfile hasn't been open sourced yet. I will follow the progress and notify you if there is any news.
>
> On Aug 7, 2018, 11:12 PM +0800, Sonny Heer <[email protected]>, wrote:
>
> Thanks Chase. I'm assuming the wipe-cache is the same as "Reload Metadata" under the "System" tab in the Kylin UI. We did try a reload metadata via the UI, but that didn't seem to update the query node.
>
> The other key problem is how your team coordinated Kylin and EMR; where to connect is also a hardcoded property in kylin.properties. Did you bring up Kylin and EMR at the same time, so that the Kylin bootstrap has the EMR master node IPs? Is there a 1:1 mapping of Kylin node to EMR cluster?
>
> Is there a video of that slide deck? I'd also be curious to look at your Docker image if it's available. Thanks.
>
> On Mon, Aug 6, 2018 at 8:37 PM Chase Zhang <[email protected]> wrote:
>
>> Hi Sonny,
>>
>> I'm Chase from Strikingly. As Shaofeng has mentioned our solution, I'd like to give a brief introduction to it in case it is helpful to you.
>>
>> To my understanding, your key problem is how to coordinate the Kylin master node and its query nodes.
>>
>> Currently, the Kylin master side must have hard-coded target URLs for all query nodes, and once a cube is built, the master node notifies the query nodes to update their metadata. This is because Kylin keeps a cache of the related configs; even though HBase has the latest values, the cache might be out of date.
>>
>> Luckily, Kylin provides a RESTful API for updating the cache (see http://kylin.apache.org/docs23/howto/howto_use_restapi.html#wipe-cache).
>>
>> In theory, you can trigger this API manually to bring a query node's metadata cache up to date. But if you have multiple query instances, this becomes troublesome.
>>
>> Unlike other big data solutions, Kylin's architecture is simple. It does not depend on a service discovery component like ZooKeeper. This makes Kylin easy to deploy and use, but if you have more advanced demands, such as auto scaling, hard-coded query node IP addresses and ports might not be good enough.
>>
>> To mitigate this problem, we have developed a tool set. The basic ideas are as follows (a rough sketch of steps 3 and 4 appears right after this message):
>>
>> 1. Deploy Kylin with Docker containers.
>> 2. Make a separate scheduler that triggers builds and monitors their status through the RESTful API on the master nodes.
>> 3. Use AWS's Target Group as a service discovery solution. As the query nodes run inside a target group, we can use AWS's API to get every instance's IP address and serving port.
>> 4. Knowing that a cube build has finished, as well as the entry point of each query node, the scheduler can make RESTful API calls to the query nodes one by one to update their caches.
>>
>> Furthermore, we now have some more advanced cache management logic (for example, avoiding cache invalidation when a build fails and waiting for the next build to recover). We embedded all of this logic in our own scheduler.
>>
>> Hope this reply will help you.
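A minimal sketch, in Python, of the discovery-plus-refresh step Chase outlines in points 3 and 4 above. It assumes boto3 and requests are available, an IP-type target group for the query nodes, and the wipe-cache endpoint from the REST API page linked above; the target group ARN, port, and credentials are placeholders, not values from this thread.

```python
import boto3
import requests

TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:..."  # placeholder
KYLIN_AUTH = ("ADMIN", "KYLIN")  # Kylin's default basic-auth credentials


def query_node_addresses(target_group_arn):
    """Return (ip, port) pairs for healthy targets in the target group.

    Assumes an IP-type target group, so Target['Id'] is a private IP;
    for instance-type groups you would resolve instance IDs via EC2 first.
    """
    elbv2 = boto3.client("elbv2")
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return [
        (d["Target"]["Id"], d["Target"]["Port"])
        for d in health["TargetHealthDescriptions"]
        if d["TargetHealth"]["State"] == "healthy"
    ]


def refresh_cache(ip, port, entity="all", cache_key="all", event="update"):
    """Ask one query node to refresh its metadata cache.

    The path follows the wipe-cache doc linked in the thread
    (PUT /kylin/api/cache/{type}/{name}/{action}); verify it against
    your Kylin version.
    """
    url = f"http://{ip}:{port}/kylin/api/cache/{entity}/{cache_key}/{event}"
    resp = requests.put(url, auth=KYLIN_AUTH, timeout=30)
    resp.raise_for_status()


if __name__ == "__main__":
    for ip, port in query_node_addresses(TARGET_GROUP_ARN):
        refresh_cache(ip, port)
        print(f"refreshed metadata cache on {ip}:{port}")
```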
>> On Aug 7, 2018, 3:28 AM +0800, Sonny Heer <[email protected]>, wrote:
>>
>> [image: Screen Shot 2018-08-06 at 10.27.35 AM.png]
>>
>> In this diagram (from the slide deck), is each HBase a different EMR cluster? If so, how is Kylin configured to connect to both? Notice that the Kylin query node shows a line connecting to both clusters. Thanks for the input...
>>
>> On Mon, Aug 6, 2018 at 10:56 AM Sonny Heer <[email protected]> wrote:
>>
>>> ShaoFeng,
>>>
>>> Is Strikingly open to sharing their work? It appears our use case is similar, and I would love to see how much of their work matches ours.
>>>
>>> On Mon, Aug 6, 2018 at 7:01 AM Sonny Heer <[email protected]> wrote:
>>>
>>>> Does that require an HA cluster and Kylin installed on its own instance? EMR doesn't spin up services as HA on its master node. I'd be curious to see what Strikingly has done and whether they have it deployed on AWS.
>>>>
>>>> On Sun, Aug 5, 2018 at 10:57 PM ShaoFeng Shi <[email protected]> wrote:
>>>>
>>>>> Hi Sonny,
>>>>>
>>>>> You can configure an R/W-separated deployment with two EMRs: one is Hadoop only and the other is the HBase cluster. On the EC2 instance that runs Kylin, install both the Hadoop and HBase clients/configuration, and then tell Kylin that you have Hadoop and HBase in two clusters (refer to the blog). Kylin will run jobs in the W cluster and bulk-load HFiles to the R cluster.
>>>>>
>>>>> https://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>>>>>
>>>>> Many Kylin users run this R/W-separated architecture. I once tried it on Azure with two clusters and it worked well. It's not tested with EMR, but I think they are similar.
>>>>>
>>>>> 2018-08-06 10:55 GMT+08:00 Sonny Heer <[email protected]>:
>>>>>
>>>>>> Yeah, it would be great if Kylin could have a centralized metastore in RDS.
>>>>>>
>>>>>> The big problem for us now is this:
>>>>>>
>>>>>> Two EMR clusters, each running Kylin on the master node. Both share the HBase S3 root dir.
>>>>>>
>>>>>> Cluster A creates a cube and does a build. Cluster B can see the cube as it builds in "Monitor", but once the cube is finished, it is "READY" only in cluster A (where the job was launched).
>>>>>>
>>>>>> We need somewhat isolated Kylin nodes that can still share the same backend. This is a big win, since each cluster can then scale read/write independently in EMR; that is our goal. Having read and write in the same cluster doesn't work for various reasons...
>>>>>>
>>>>>> It seems Kylin is really close, since monitoring of the cube is in sync when sharing the same HBase backend.
>>>>>>
>>>>>> Using a read replica did not work: when we tried to log in from the replica, Kylin wasn't able to work.
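The symptom Sonny describes (the cube showing READY only on the cluster that ran the build) is what the wipe-cache call discussed above is meant to address. Here is a rough sketch of how a small external scheduler could tie the two clusters together: trigger the build on the write cluster's Kylin, wait for the job, then refresh the read cluster's cache. The hostnames, cube name, and credentials are placeholders, and the rebuild/jobs/cache endpoints are taken from the Kylin REST API how-to; verify them against your Kylin version.

```python
import time
import requests

WRITE_KYLIN = "http://write-cluster-master:7070"  # placeholder hostname
READ_KYLIN = "http://read-cluster-master:7070"    # placeholder hostname
AUTH = ("ADMIN", "KYLIN")                         # Kylin's default credentials


def trigger_build(cube_name, start_ms, end_ms):
    """Kick off a build on the write cluster's Kylin.

    PUT /kylin/api/cubes/{cube}/rebuild is from the Kylin REST API how-to;
    it returns a job instance whose 'uuid' identifies the build job.
    """
    resp = requests.put(
        f"{WRITE_KYLIN}/kylin/api/cubes/{cube_name}/rebuild",
        json={"startTime": start_ms, "endTime": end_ms, "buildType": "BUILD"},
        auth=AUTH, timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["uuid"]


def wait_for_job(job_id, poll_seconds=60):
    """Poll GET /kylin/api/jobs/{id} until the job leaves its running states."""
    while True:
        resp = requests.get(f"{WRITE_KYLIN}/kylin/api/jobs/{job_id}",
                            auth=AUTH, timeout=60)
        resp.raise_for_status()
        status = resp.json().get("job_status")
        if status not in ("NEW", "PENDING", "RUNNING"):
            return status
        time.sleep(poll_seconds)


def refresh_read_side_cache():
    """Ask the read cluster's Kylin to refresh its metadata cache
    (same wipe-cache endpoint as in the earlier sketch)."""
    resp = requests.put(f"{READ_KYLIN}/kylin/api/cache/all/all/update",
                        auth=AUTH, timeout=60)
    resp.raise_for_status()


if __name__ == "__main__":
    job_id = trigger_build("my_cube", 0, int(time.time() * 1000))  # hypothetical cube
    status = wait_for_job(job_id)
    if status == "FINISHED":
        refresh_read_side_cache()
    else:
        # Mirrors Chase's point about not invalidating caches on failed builds.
        print(f"build ended with status {status}; caches left untouched")
```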
>>>>>> On Sun, Aug 5, 2018 at 7:01 PM ShaoFeng Shi <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Sonny,
>>>>>>>
>>>>>>> EMR HBase read replica is a great feature, but we didn't try it. Are you going to use this feature, or do you just want to deploy Kylin as a cluster?
>>>>>>>
>>>>>>> Would putting the Kylin metadata in RDS make it easier for you?
>>>>>>>
>>>>>>> 2018-08-04 0:05 GMT+08:00 Sonny Heer <[email protected]>:
>>>>>>>
>>>>>>>> We'd like to use EMR HBase read replicas if possible. We had some issues using this strategy, since Kylin requires write capability from all nodes (on login, for example).
>>>>>>>>
>>>>>>>> The idea is to cluster Kylin using multiple EMRs, with Kylin on each master node. If this isn't possible, we may go with the separate-instance approach, but that is prone to errors as the EMR libs have to be copied around...
>>>>>>>>
>>>>>>>> ref:
>>>>>>>> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
>>>>>>>>
>>>>>>>> Does anyone else have experience with, or can share their use case on, EMR?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On Thu, Aug 2, 2018 at 2:32 PM Sonny Heer <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Is it possible in the new version of Kylin to have multiple EMR clusters, each with Kylin installed on the master node, but talking to the same S3 location?
>>>>>>>>>
>>>>>>>>> e.g. one Write EMR cluster and one Read EMR cluster?
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Shaofeng Shi 史少锋
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
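For completeness, here is a minimal sketch of the kind of preload script Chase mentions near the top of the thread: it renders kylin.properties from a template by substituting ${VAR} placeholders with environment variables at container start-up. The paths, variable names, and placeholder syntax are illustrative, not Strikingly's actual tooling.

```python
import os
import string

TEMPLATE_PATH = "/opt/kylin/conf/kylin.properties.template"  # hypothetical path
OUTPUT_PATH = "/opt/kylin/conf/kylin.properties"


def render(template_path, output_path):
    """Substitute ${NAME} placeholders in the template with environment variables."""
    with open(template_path) as f:
        template = string.Template(f.read())
    # substitute() raises KeyError if a placeholder has no matching variable,
    # which is usually what you want at container start-up.
    rendered = template.substitute(os.environ)
    with open(output_path, "w") as f:
        f.write(rendered)


if __name__ == "__main__":
    # A template line such as
    #   kylin.server.cluster-servers=${QUERY_NODE_LIST}
    # would get QUERY_NODE_LIST (and, say, the EMR master address) injected
    # by the ECS task definition's environment before Kylin starts.
    render(TEMPLATE_PATH, OUTPUT_PATH)
```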
