Re: flink-1.11 DDL kafka-to-hive issue

2020-07-22 Posted by kcz
Thanks, everyone. There is a demo on the WeChat official account now; I'll compare it with my code and take a look.

Re: flink-1.11 DDL kafka-to-hive issue

2020-07-21 Posted by Jingsong Li
How is your source table defined? Are you sure the watermark is advancing? (You can check it in the Flink UI.)

Try removing 'sink.partition-commit.trigger'='partition-time' and see?
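
For reference, a minimal sketch of a Kafka source table whose watermark can actually advance; the topic, servers, group id, and format below are placeholders, not settings from this thread:

String sourceSql = "CREATE TABLE stream_tmp.source_table (\n" +
        "  host STRING,\n" +
        "  url STRING,\n" +
        "  public_date TIMESTAMP(3),\n" +
        // without a WATERMARK definition, the 'partition-time' trigger can never fire
        "  WATERMARK FOR public_date AS public_date - INTERVAL '5' SECOND\n" +
        ") WITH (\n" +
        "  'connector' = 'kafka',\n" +
        "  'topic' = 'log_topic',\n" +
        "  'properties.bootstrap.servers' = 'localhost:9092',\n" +
        "  'properties.group.id' = 'testGroup',\n" +
        "  'scan.startup.mode' = 'latest-offset',\n" +
        "  'format' = 'json'\n" +
        ")";
tableEnv.executeSql(sourceSql);

With 'partition-time', a partition is only committed once the watermark passes the partition's time plus 'sink.partition-commit.delay'.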

Best,
Jingsong



Re: flink-1.11 DDL kafka-to-hive issue

2020-07-21 Posted by Leonard Xu
Hi,

Was the Hive table created in Flink? If so, did you use the Hive dialect when creating it? You can refer to [1] for how to set it.

Best
Leonard Xu
[1] 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_dialect.html#use-hive-dialect
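
A minimal sketch of the dialect switch (assuming the HiveCatalog is already registered and set as the current catalog; only the Hive table DDL needs the Hive dialect):

// import org.apache.flink.table.api.SqlDialect;
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);    // parse the CREATE TABLE with Hive syntax
tableEnv.executeSql(hiveSql);
tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT); // switch back for regular Flink SQL statements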




Re: flink-1.11 DDL kafka-to-hive issue

2020-07-21 Posted by kcz
There has never been any data, and I can't tell what is wrong; the table already exists in Hive. I tested writing to HDFS with the DDL and that works fine.

Re: flink-1.11 DDL kafka-to-hive issue

2020-07-21 Posted by Jark Wu
Try configuring the rolling policy?
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html#sink-rolling-policy-rollover-interval
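
For example (the values here are illustrative, not recommendations), the sink table's TBLPROPERTIES could additionally set:

String rollingProps =
        "  'sink.rolling-policy.file-size'='128MB',\n" +        // roll a file once it reaches this size
        "  'sink.rolling-policy.rollover-interval'='1 min',\n" + // roll open files after this long
        "  'sink.rolling-policy.check-interval'='1 min',\n";     // how often the policy is evaluated

Note that with a bulk format like Parquet, in-progress files are only finalized on a successful checkpoint, so checkpointing must be enabled before any data becomes visible.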

Best,
Jark



Re: flink-1.11 DDL kafka-to-hive issue

2020-07-21 Posted by JasonLee
hi
Has the Hive table never had any data, or does data show up after a while?


JasonLee
Email: 17610775...@163.com

Signature is customized by Netease Mail Master


flink-1.11 DDL kafka-to-hive issue

2020-07-21 Posted by kcz
hive-1.2.1
Checkpoints are completing successfully (I looked in the chk directory and checkpoint data is indeed there, and Kafka has data as well), but the Hive table has no data. Am I missing something?
String hiveSql = "CREATE TABLE stream_tmp.fs_table (\n" +
        "  host STRING,\n" +
        "  url STRING\n" +
        ") PARTITIONED BY (public_date STRING) " +
        "STORED AS PARQUET " +
        "TBLPROPERTIES (\n" +
        "  'sink.partition-commit.delay'='0 s',\n" +
        "  'sink.partition-commit.trigger'='partition-time',\n" +
        "  'sink.partition-commit.policy.kind'='metastore,success-file'\n" +
        ")";
tableEnv.executeSql(hiveSql);


tableEnv.executeSql("INSERT INTO stream_tmp.fs_table SELECT host, url, " +
        "DATE_FORMAT(public_date, 'yyyy-MM-dd') FROM stream_tmp.source_table");