Dear Ma,

Thanks for your reply.
Slightly related to my original question on the hybrid model, I was wondering if it is possible to combine a batch and a streaming cube. I realized this is not possible, as a hybrid model can only be created from cubes of the same model (and a model points to either a batch or a streaming data source).

The use case would be this:
- we have a large amount of streaming data in Kafka that we would like to process with Kylin streaming
- Kafka retention is only a few days, so if we need to change anything in the cubes (e.g. introduce a new metric or dimension that has been present in the events, but not in the cube definition), we can only reprocess a few days' worth of data in the streaming model
- the raw events are also written to a data lake for long-term storage
- the data written to the data lake could be used to feed the historic data into a batch Kylin model (and cubes)
- I'm looking for a way to combine these, so that if we want to change anything in the cubes, we can recalculate them for the historic data as well

Is there a way to achieve this with current Kylin? (Without implementing a custom query layer that combines the two cubes.)

Best regards,
Andras

On Fri, Jun 14, 2019 at 6:43 AM Ma Gang <[email protected]> wrote:

> Hi Andras,
>
> Currently it doesn't support consuming from specified offsets; it only
> supports consuming from startOffset or latestOffset. If you want to
> consume from startOffset, you need to set the configuration
> kylin.stream.consume.offsets.latest to false in the cube's overrides page.
>
> If you do need to start from specified offsets, please create a jira
> request, but I think it is hard for users to know what offsets should be
> set for all partitions.
>
> At 2019-06-13 22:34:59, "Andras Nagy" <[email protected]> wrote:
>
> Dear Ma,
>
> Thank you very much!
>
> > 1) yes, you can specify a configuration in the new cube, to consume data
> > from start offset
>
> That is, an offset value for each partition of the topic?
> That would be good - could you please point me to where to do this in
> practice, or point me to what I should read? (I haven't found it on the
> cube designer UI - perhaps this is something that's only available on the
> API?)
>
> Many thanks,
> Andras
>
> On Thu, Jun 13, 2019 at 1:14 PM Ma Gang <[email protected]> wrote:
>
>> Hi Andras,
>>
>> 1) yes, you can specify a configuration in the new cube, to consume data
>> from start offset
>>
>> 2) It should work, but I haven't tested it yet
>>
>> 3) as I remember, we currently use the Kafka 1.0 client library, so it is
>> better to use that version or later. I'm sure that versions before 0.9.0
>> cannot work, but I'm not sure whether 0.9.x works or not.
>>
>> Ma Gang
>> Email: [email protected]
>>
>> On 06/13/2019 18:01, Andras Nagy <[email protected]> wrote:
>>
>> Greetings,
>>
>> I have a few questions related to the new streaming (real-time OLAP)
>> implementation.
>>
>> 1) Is there a way to have data reprocessed from Kafka? E.g. I change a
>> cube definition and drop the cube (or add a new cube definition), and I
>> want the data that is still available on Kafka to be reprocessed to build
>> the changed cube (or new cube). Is this possible?
>>
>> 2) Does the hybrid model work with streaming cubes (to combine two cubes)?
>>
>> 3) What is the minimum Kafka version required? The tutorial asks to
>> install Kafka 1.0; is this the minimum required version?
>>
>> Thank you very much,
>> Andras
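For readers following along: the override Ma Gang describes lives in the cube's property overrides, which also appear under "override_kylin_properties" in the cube's JSON descriptor. A minimal sketch of that fragment is below; the cube name is a hypothetical placeholder, and only the one property shown is taken from this thread:

```json
{
  "name": "my_streaming_cube",
  "override_kylin_properties": {
    "kylin.stream.consume.offsets.latest": "false"
  }
}
```

Setting the property to "false" makes a newly built cube consume from the earliest retained Kafka offsets rather than the latest, which is what allows reprocessing the few days of data still held by Kafka retention.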
