Hi ShaoFeng,

Thanks a lot for the pointer on the lambda mode - yes, that's exactly what I need :)
Is there perhaps documentation on this? For now I was trying to get this working 'empirically' and finally succeeded, but some of my conclusions may be wrong. This is what I concluded:

- the Hive table must have the same name as the streaming table (the name given to the data source)
- the cube can't be built from the UI (to build the historic segments from the data in Hive), but it can be built using the REST API
- the cube build engine must be MapReduce; with Spark as the build engine I got the exception "Cannot adapt to interface org.apache.kylin.engine.spark.ISparkOutput"
- endTime must be non-overlapping with the streaming data; when I had an overlap, the streaming data coming from Kafka did not show up in the output - I guess this is what you meant by "the segments from Hive will overwrite the segments from Kafka"

Are these correct conclusions? Is there anything else I should be aware of?
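For the record, the REST call I'm triggering the historical build with looks roughly like the sketch below (written with Python's requests library just for illustration - the host, credentials, cube name and timestamps are placeholders, and the exact endpoint path may differ between Kylin versions):

import requests

# Sketch only: host, credentials, cube name and timestamps are placeholders.
# startTime/endTime are epoch milliseconds (GMT); per the observation above,
# endTime must not overlap the real-time (Kafka) segments.
resp = requests.put(
    "http://localhost:7070/kylin/api/cubes/my_lambda_cube/rebuild",
    auth=("ADMIN", "KYLIN"),
    json={
        "startTime": 1546300800000,  # 2019-01-01 00:00:00 GMT
        "endTime": 1560384000000,    # 2019-06-13 00:00:00 GMT
        "buildType": "BUILD",
    },
)
print(resp.status_code, resp.text[:200])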
Many thanks,
Andras


On Tue, Jun 25, 2019 at 9:19 AM ShaoFeng Shi <[email protected]> wrote:

> Hello Andras,
>
> Kylin's realtime-OLAP feature supports a "Lambda" mode (mentioned in
> https://kylin.apache.org/blog/2019/04/12/rt-streaming-design/), which
> means you can define a fact table whose data comes from both Kafka and
> Hive. The only requirement is that all the cube columns appear in both
> the Kafka data and the Hive data. I think that may fit your need. The
> cube can be built from Kafka, and in the meanwhile it can also be built
> from Hive; the segments from Hive will overwrite the segments from
> Kafka (as usually the Hive data is more accurate). When querying the
> cube, Kylin will first query the historical segments, and then the
> real-time segments (adding the max time of the historical segments as
> the condition).
>
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: [email protected]
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: [email protected]
> Join Kylin dev mail group: [email protected]
>
>
> On Mon, Jun 24, 2019 at 11:29 PM, Andras Nagy <[email protected]> wrote:
>
>> Dear Ma,
>>
>> Thanks for your reply.
>>
>> Slightly related to my original question on the hybrid model, I was
>> wondering if it's possible to combine a batch and a streaming cube. I
>> realized this is not possible, as a hybrid model can only be created
>> from cubes of the same model (and a model points to either a batch or
>> a streaming data source).
>>
>> The use case would be this:
>> - we have a large amount of streaming data in Kafka that we would like
>>   to process with Kylin streaming
>> - Kafka retention is only a few days, so if we need to change anything
>>   in the cubes (e.g. introduce a new metric or dimension which has been
>>   present in the events, but not in the cube definition), we can only
>>   reprocess a few days' worth of data in the streaming model
>> - the raw events are also written to a data lake for long-term storage
>> - the data written to the data lake could be used to feed the historic
>>   data into a batch Kylin model (and cubes)
>> - I'm looking for a way to combine these, so if we want to change
>>   anything in the cubes, we can recalculate them for the historic data
>>   as well
>>
>> Is there a way to achieve this with current Kylin? (Without
>> implementing a custom query layer that combines the two cubes.)
>>
>> Best regards,
>> Andras
>>
>>
>> On Fri, Jun 14, 2019 at 6:43 AM Ma Gang <[email protected]> wrote:
>>
>>> Hi Andras,
>>>
>>> Currently it doesn't support consuming from specified offsets; it only
>>> supports consuming from the start offset or the latest offset. If you
>>> want to consume from the start offset, you need to set the
>>> configuration kylin.stream.consume.offsets.latest to false on the
>>> cube's overrides page.
>>>
>>> If you do need to start from specified offsets, please create a JIRA
>>> request, but I think it is hard for a user to know what offsets should
>>> be set for all the partitions.
>>>
>>> At 2019-06-13 22:34:59, "Andras Nagy" <[email protected]> wrote:
>>>
>>> Dear Ma,
>>>
>>> Thank you very much!
>>>
>>> > 1) yes, you can specify a configuration in the new cube, to consume
>>> > data from start offset
>>> That is, an offset value for each partition of the topic? That would
>>> be good - could you please point me to where to do this in practice,
>>> or point me to what I should read? (I haven't found it on the cube
>>> designer UI - perhaps this is something that's only available on the
>>> API?)
>>>
>>> Many thanks,
>>> Andras
>>>
>>>
>>> On Thu, Jun 13, 2019 at 1:14 PM Ma Gang <[email protected]> wrote:
>>>
>>>> Hi Andras,
>>>>
>>>> 1) Yes, you can specify a configuration in the new cube to consume
>>>> data from the start offset.
>>>>
>>>> 2) It should work, but I haven't tested it yet.
>>>>
>>>> 3) As I remember, we currently use the Kafka 1.0 client library, so
>>>> it is better to use that version or later. I'm sure that versions
>>>> before 0.9.0 cannot work, but I'm not sure whether 0.9.x works or
>>>> not.
>>>>
>>>>
>>>> Ma Gang
>>>> Email: [email protected]
>>>>
>>>> On 06/13/2019 18:01, Andras Nagy <[email protected]> wrote:
>>>> Greetings,
>>>>
>>>> I have a few questions related to the new streaming (real-time OLAP)
>>>> implementation.
>>>>
>>>> 1) Is there a way to have data reprocessed from Kafka? E.g. I change
>>>> a cube definition and drop the cube (or add a new cube definition)
>>>> and want to have the data that is still available on Kafka
>>>> reprocessed to build the changed cube (or the new cube). Is this
>>>> possible?
>>>>
>>>> 2) Does the hybrid model work with streaming cubes (to combine two
>>>> cubes)?
>>>>
>>>> 3) What is the minimum Kafka version required? The tutorial asks to
>>>> install Kafka 1.0 - is this the minimum required version?
>>>>
>>>> Thank you very much,
>>>> Andras
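P.S. Regarding the start-offset setting Ma Gang mentions above: as far as I can tell it is entered as a plain key/value pair on the cube's Configuration Overrides page, i.e.

kylin.stream.consume.offsets.latest=false

and, if I read the cube metadata correctly (this part is my assumption), it ends up under "override_kylin_properties" in the cube desc JSON:

"override_kylin_properties": {
  "kylin.stream.consume.offsets.latest": "false"
}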
