Dear Ma,

Thanks for your reply.
Slightly related to my original question on the hybrid model, I was wondering if it is possible to combine a batch and a streaming cube. I realized this is not possible, as a hybrid model can only be created from cubes of the same model (and a model points to either a batch or a streaming data source).

The use case would be this:
- we have a large amount of streaming data in Kafka that we would like to process with Kylin streaming
- Kafka retention is only a few days, so if we need to change anything in the cubes (e.g. introduce a new metric or dimension that has been present in the events, but not in the cube definition), we can only reprocess a few days' worth of data in the streaming model
- the raw events are also written to a data lake for long-term storage
- the data written to the data lake could be used to feed the historic data into a batch Kylin model (and cubes)
- I'm looking for a way to combine these, so that if we want to change anything in the cubes, we can recalculate them for the historic data as well

Is there a way to achieve this with current Kylin? (Without implementing a custom query layer that combines the two cubes.)

Best regards,
Andras

On Fri, Jun 14, 2019 at 6:43 AM Ma Gang <[email protected]> wrote:

> Hi Andras,
>
> Currently it doesn't support consuming from specified offsets; it only
> supports consuming from startOffset or latestOffset. If you want to
> consume from startOffset, you need to set the configuration
> kylin.stream.consume.offsets.latest to false in the cube's overrides page.
>
> If you do need to start from specified offsets, please create a jira
> request, but I think it is hard for users to know what offsets should be
> set for all partitions.
>
> At 2019-06-13 22:34:59, "Andras Nagy" <[email protected]> wrote:
>
> Dear Ma,
>
> Thank you very much!
>
> > 1) yes, you can specify a configuration in the new cube, to consume data
> > from start offset
>
> That is, an offset value for each partition of the topic?
> That would be good - could you please point me to where to do this in
> practice, or point me to what I should read? (I haven't found it on the
> cube designer UI - perhaps this is something that's only available on the
> API?)
>
> Many thanks,
> Andras
>
> On Thu, Jun 13, 2019 at 1:14 PM Ma Gang <[email protected]> wrote:
>
>> Hi Andras,
>>
>> 1) yes, you can specify a configuration in the new cube, to consume data
>> from start offset
>>
>> 2) It should work, but I haven't tested it yet
>>
>> 3) as I remember, we currently use the Kafka 1.0 client library, so it is
>> better to use that version or later. I'm sure that versions before 0.9.0
>> cannot work, but I'm not sure whether 0.9.x works or not.
>>
>> Ma Gang
>> Email: [email protected]
>>
>> On 06/13/2019 18:01, Andras Nagy <[email protected]> wrote:
>>
>> Greetings,
>>
>> I have a few questions related to the new streaming (real-time OLAP)
>> implementation.
>>
>> 1) Is there a way to have data reprocessed from Kafka? E.g. I change a
>> cube definition and drop the cube (or add a new cube definition), and I
>> want the data that is still available on Kafka to be reprocessed to build
>> the changed cube (or new cube). Is this possible?
>>
>> 2) Does the hybrid model work with streaming cubes (to combine two cubes)?
>>
>> 3) What is the minimum Kafka version required? The tutorial asks to
>> install Kafka 1.0; is this the minimum required version?
>>
>> Thank you very much,
>> Andras
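For readers following along: the override Ma Gang describes lives in the cube's property overrides, which also appear under "override_kylin_properties" in the cube's JSON descriptor. A minimal sketch of that fragment is below; the cube name is a hypothetical placeholder, and only the one property shown is taken from this thread:

```json
{
  "name": "my_streaming_cube",
  "override_kylin_properties": {
    "kylin.stream.consume.offsets.latest": "false"
  }
}
```

Setting the property to "false" makes a newly built cube consume from the earliest retained Kafka offsets rather than the latest, which is what allows reprocessing the few days of data still held by Kafka retention.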
