Hello Andras,

Kylin's real-time OLAP feature supports a "Lambda" mode (described in
https://kylin.apache.org/blog/2019/04/12/rt-streaming-design/), which means
you can define a fact table whose data comes from both Kafka and Hive. The
only requirement is that all the cube columns appear in both the Kafka data
and the Hive data. I think that may fit your need. The cube can be built
from Kafka and, at the same time, from Hive; the segments built from Hive
will overwrite the segments built from Kafka (since the Hive data is
usually more accurate). When querying the cube, Kylin first queries the
historical segments and then the real-time segments, adding the max time of
the historical segments as the boundary condition between the two.
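
For example, the historical (Hive) side can be built like a normal batch
cube, e.g. through the cube rebuild REST API. Below is a rough Python
sketch; the host, credentials, cube name, and time range are placeholders
for your own environment:

    # Sketch: trigger a batch (Hive) build of a Lambda cube via Kylin's
    # REST API. Host, credentials, cube name, and time range are
    # placeholder assumptions -- adjust them to your deployment.
    import base64
    import json
    import urllib.request

    KYLIN_HOST = "http://localhost:7070"  # assumption: default Kylin port
    CUBE_NAME = "my_lambda_cube"          # hypothetical cube name
    # Default sandbox credentials; replace with your own account.
    auth = base64.b64encode(b"ADMIN:KYLIN").decode()

    # Build one historical segment for a fixed time range (epoch millis).
    payload = json.dumps({
        "startTime": 1559347200000,  # 2019-06-01 00:00:00 UTC
        "endTime": 1560556800000,    # 2019-06-15 00:00:00 UTC
        "buildType": "BUILD",
    }).encode()

    req = urllib.request.Request(
        f"{KYLIN_HOST}/kylin/api/cubes/{CUBE_NAME}/rebuild",
        data=payload,
        method="PUT",
        headers={
            "Authorization": "Basic " + auth,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())

The segment built this way covers the given time range and, as described
above, will overwrite any real-time segments that overlap it.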


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]




Andras Nagy <[email protected]> 于2019年6月24日周一 下午11:29写道:

> Dear Ma,
>
> Thanks for your reply.
>
> Slightly related to my original question on the hybrid model, I was
> wondering if it's possible to combine a batch cube and a streaming cube. I
> realized this is not possible, as a hybrid model can only be created from
> cubes of the same model (and a model points to either a batch or a
> streaming data source).
>
> The usecase would be this:
> - we have a large amount of streaming data in Kafka that we would like to
> process with Kylin streaming
> - Kafka retention is only a few days, so if we need to change anything in
> the cubes (e.g. introduce a new metric or dimension that is present in the
> events but not in the cube definition), we can only reprocess a few days'
> worth of data in the streaming model
> - the raw events are also written to a data lake for long-term storage
> - the data written to the data lake could be used to feed the historic
> data into a batch Kylin model (and cubes)
> - I'm looking for a way to combine these, so if we want to change anything
> in the cubes, we can recalculate them for the historic data as well
>
> Is there a way to achieve this with current Kylin? (Without implementing a
> custom query layer that combines the two cubes.)
>
> Best regards,
> Andras
>
> On Fri, Jun 14, 2019 at 6:43 AM Ma Gang <[email protected]> wrote:
>
>> Hi Andras,
>>
>> Currently it doesn't support consuming from specified offsets; it only
>> supports consuming from the start offset or the latest offset. If you
>> want to consume from the start offset, you need to set the configuration
>> kylin.stream.consume.offsets.latest to false in the cube's overrides page.
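>>
>> For example, on the cube's overrides page that would be the single entry:
>>
>>     kylin.stream.consume.offsets.latest=false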
>>
>> If you do need to start from specified offsets, please create a JIRA
>> request, but I think it is hard for a user to know what offsets should be
>> set for all the partitions.
>>
>> At 2019-06-13 22:34:59, "Andras Nagy" <[email protected]>
>> wrote:
>>
>> Dear Ma,
>>
>> Thank you very much!
>>
>> > 1) Yes, you can specify a configuration in the new cube to consume data
>> > from the start offset
>> That is, an offset value for each partition of the topic? That would be
>> good - could you please point me to where to do this in practice, or to
>> what I should read? (I haven't found it in the cube designer UI - perhaps
>> this is something that's only available through the API?)
>>
>> Many thanks,
>> Andras
>>
>>
>>
>> On Thu, Jun 13, 2019 at 1:14 PM Ma Gang <[email protected]> wrote:
>>
>>> Hi Andras,
>>> 1) Yes, you can specify a configuration in the new cube to consume data
>>> from the start offset.
>>>
>>> 2) It should work, but I haven't tested it yet.
>>>
>>> 3) As I remember, we currently use the Kafka 1.0 client library, so it
>>> is better to use that version or later. I'm sure that versions before
>>> 0.9.0 cannot work, but I'm not sure whether 0.9.x works or not.
>>>
>>>
>>>
>>> Ma Gang
>>> Email: [email protected]
>>>
>>> On 06/13/2019 18:01, Andras Nagy <[email protected]> wrote:
>>> Greetings,
>>>
>>> I have a few questions related to the new streaming (real-time OLAP)
>>> implementation.
>>>
>>> 1) Is there a way to have data reprocessed from Kafka? E.g., if I change
>>> a cube definition and drop the cube (or add a new cube definition), can
>>> data that is still available in Kafka be reprocessed to build the
>>> changed (or new) cube? Is this possible?
>>>
>>> 2) Does the hybrid model work with streaming cubes (to combine two
>>> cubes)?
>>>
>>> 3) What is the minimum Kafka version required? The tutorial asks to
>>> install Kafka 1.0; is this the minimum required version?
>>>
>>> Thank you very much,
>>> Andras
