Re: blink SQL: getting rowtime from Kafka

Jark Wu Thu, 17 Oct 2019 19:17:19 -0700

Hi Zijie,

Your sqlTimestamp field most likely contains null values, so an NPE is thrown when the ts is read.
Currently the watermark assigner requires every record to have a non-null ts.
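
One way to check for this before the data reaches Flink is sketched below in Python. The helper name missing_timestamp is hypothetical, and the sketch assumes the rowtime is read from a top-level sqlTimestamp key, as the schema properties in the quoted DDL suggest. Run against the sample message as pasted in the quoted mail, it reports the field as missing, because there sqlTimestamp appears nested inside algoExtent rather than at the top level.

```python
import json


def missing_timestamp(raw_message: str, field: str = "sqlTimestamp") -> bool:
    """Return True if the top-level `field` is absent or null in a JSON message.

    Hypothetical helper for illustration; it only inspects top-level keys.
    """
    record = json.loads(raw_message)
    return record.get(field) is None


# The sample message from the quoted mail, as pasted there.
sample = (
    '{"requestId": "rrrr", "algoExtent": {"duration": 12, '
    '"adType ": "FEED_568_320", "mAdId": "1910141050233527", '
    '"sqlTimestamp": "2019-10-17 19:08:01"}}'
)

print(missing_timestamp(sample))  # prints True: no top-level sqlTimestamp
```

If the real messages carry sqlTimestamp at the top level and it is never null, the same check returns False for them.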


Best,
Jark 

> On Oct 17, 2019, at 20:25, Zijie Lu <[email protected]> wrote:
> 
> CREATE TABLE requests(
> `rowtime` TIMESTAMP,
> `requestId` VARCHAR,
> `algoExtent` ROW(`mAdId` VARCHAR))
> with (
>  'connector.type' = 'kafka',
>  'connector.version' = 'universal',
>  'connector.topic' = 'test_request',
>  'connector.startup-mode' = 'latest-offset',
>  'connector.properties.0.key' = 'zookeeper.connect',
>  'connector.properties.0.value' = '10.107.116.42:2181',
>  'connector.properties.1.key' = 'bootstrap.servers',
>  'connector.properties.1.value' = '10.107.116.42:9092',
>  'connector.properties.2.key' = 'group.id',
>  'connector.properties.2.value' = 'test_request',
>  'update-mode' = 'append','format.type' = 'json',
>  'format.json-schema' = '{type: "object", properties: {sqlTimestamp: {
> type: "string"}, requestId: { type: "string"}, "algoExtent": {type:
> "object", "properties": {"mAdId": {type: "string"}}}}}',
>  'schema.0.rowtime.timestamps.type' = 'from-field',
>  'schema.0.rowtime.timestamps.from' = 'sqlTimestamp',
>  'schema.0.rowtime.watermarks.type' = 'periodic-ascending')
> I also tried this definition, and it fails with the same error.
> 
> On Thu, 17 Oct 2019 at 20:22, Zijie Lu <[email protected]> wrote:
> 
>> This definition does work with the old planner, though.
>> 
>> On Thu, 17 Oct 2019 at 19:49, Zijie Lu <[email protected]> wrote:
>> 
>>> I used the blink planner to define the following table:
>>> CREATE TABLE requests(
>>> `rowtime` TIMESTAMP,
>>> `requestId` VARCHAR,
>>> `algoExtent` ROW(`mAdId` VARCHAR))
>>> with (
>>>  'connector.type' = 'kafka',
>>>  'connector.version' = 'universal',
>>>  'connector.topic' = 'test_request',
>>>  'connector.startup-mode' = 'latest-offset',
>>>  'connector.properties.0.key' = 'zookeeper.connect',
>>>  'connector.properties.0.value' = '10.107.116.42:2181',
>>>  'connector.properties.1.key' = 'bootstrap.servers',
>>>  'connector.properties.1.value' = '10.107.116.42:9092',
>>>  'connector.properties.2.key' = 'group.id',
>>>  'connector.properties.2.value' = 'test_request',
>>>  'update-mode' = 'append','format.type' = 'json',
>>>  'format.derive-schema' = 'true',
>>>  'schema.0.rowtime.timestamps.type' = 'from-field',
>>>  'schema.0.rowtime.timestamps.from' = 'sqlTimestamp',
>>>  'schema.0.rowtime.watermarks.type' = 'periodic-ascending')
>>> The messages in Kafka have the following format:
>>> {"requestId": "rrrr", "algoExtent": {"duration": 12, "adType ": "FEED_568_320", "mAdId": "1910141050233527", "sqlTimestamp": "2019-10-17 19:08:01"}}
>>> But it fails at runtime with this error:
>>> Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:654)
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
>>>        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
>>>        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
>>>        at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:310)
>>>        at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
>>>        at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>>>        at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
>>>        at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
>>>        at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
>>>        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
>>>        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
>>>        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:202)
>>> Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:654)
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
>>>        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
>>>        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
>>>        at SourceConversion$4.processElement(Unknown Source)
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
>>>        ... 13 more
>>> Caused by: java.lang.NullPointerException
>>>        at org.apache.flink.table.dataformat.GenericRow.getLong(GenericRow.java:58)
>>>        at org.apache.flink.table.planner.plan.nodes.physical.stream.PeriodicWatermarkAssignerWrapper.extractTimestamp(StreamExecTableSourceScan.scala:202)
>>>        at org.apache.flink.table.planner.plan.nodes.physical.stream.PeriodicWatermarkAssignerWrapper.extractTimestamp(StreamExecTableSourceScan.scala:194)
>>>        at org.apache.flink.streaming.runtime.operators.TimestampsAndPeriodicWatermarksOperator.processElement(TimestampsAndPeriodicWatermarksOperator.java:64)
>>>        at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:637)
>>>        ... 19 more
>>> How should rowtime be defined in blink?
>>> 
>>
