[jira] [Updated] (FLINK-9407) Support orc rolling sink writer

[ https://issues.apache.org/jira/browse/FLINK-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xintong Song updated FLINK-9407:
    Fix Version/s: (was: 1.14.0) 1.15.0

> Support orc rolling sink writer
> -------------------------------
>
>                 Key: FLINK-9407
>                 URL: https://issues.apache.org/jira/browse/FLINK-9407
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / FileSystem
>            Reporter: zhangminglei
>            Priority: Minor
>              Labels: auto-deprioritized-major, pull-request-available, usability
>             Fix For: 1.15.0
>
> Currently, we only support {{StringWriter}}, {{SequenceFileWriter}}, and
> {{AvroKeyValueSinkWriter}}. I would suggest adding an ORC writer for the
> rolling sink.
>
> FYI, below: I tested the PR and verified the results with Spark SQL, and we
> can read back the data that was written. I will add more tests in the next
> couple of days, including performance under compression with short checkpoint
> intervals, and more unit tests.
>
> {code:java}
> scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
> res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
>
> scala> res1.registerTempTable("tablerice")
> warning: there was one deprecation warning; re-run with -deprecation for details
>
> scala> spark.sql("select * from tablerice")
> res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
>
> scala> res3.show(3)
> +-----+---+-------+
> | name|age|married|
> +-----+---+-------+
> |Sagar| 26|  false|
> |Sagar| 30|  false|
> |Sagar| 34|  false|
> +-----+---+-------+
> only showing top 3 rows
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
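The ticket asks for an ORC implementation of the rolling sink's writer contract, alongside the existing {{StringWriter}}, {{SequenceFileWriter}}, and {{AvroKeyValueSinkWriter}}. As a rough illustration of that open/write/flush/close contract, here is a minimal stdlib-only Python sketch; the class and method names are illustrative (Flink's actual interface is a Java {{Writer}} in the filesystem connector), and a real ORC writer would batch rows into ORC stripes via the ORC library rather than emit text lines:

```python
import io


class Writer:
    """Illustrative analogue of the rolling sink's writer contract.

    Hypothetical names for illustration only; Flink's real interface
    is Java and lives in the filesystem connector.
    """

    def open(self, stream):
        ...

    def write(self, element):
        ...

    def flush(self):
        ...

    def close(self):
        ...


class LineWriter(Writer):
    """Plain-text stand-in for StringWriter.

    An ORC writer would implement the same contract but encode
    batched rows as ORC stripes instead of newline-delimited text.
    """

    def open(self, stream):
        self._stream = stream

    def write(self, element):
        # Serialize one element per line; an ORC writer would buffer
        # rows and encode them columnarly on flush.
        self._stream.write((str(element) + "\n").encode("utf-8"))

    def flush(self):
        self._stream.flush()

    def close(self):
        self.flush()


# The sink opens a writer per bucket file, feeds it elements, and
# closes it when the file rolls.
buf = io.BytesIO()
writer = LineWriter()
writer.open(buf)
for row in [("Sagar", 26, False), ("Sagar", 30, False)]:
    writer.write(row)
writer.close()
print(buf.getvalue().decode("utf-8"))
```

The point of the contract is that the sink handles rolling and checkpointing while the writer only handles serialization, which is why adding ORC support amounts to supplying one more writer implementation.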
Flink Jira Bot updated FLINK-9407:
    Labels: auto-deprioritized-major pull-request-available usability (was: pull-request-available stale-major usability)
    Priority: Minor (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue, or revive the public discussion.
Dawid Wysakowicz updated FLINK-9407:
    Fix Version/s: (was: 1.13.0) 1.14.0
Flink Jira Bot updated FLINK-9407:
    Labels: pull-request-available stale-major usability (was: pull-request-available usability)
Robert Metzger updated FLINK-9407:
    Fix Version/s: (was: 1.12.0) 1.13.0
Danny Chen updated FLINK-9407:
    Fix Version/s: (was: 1.11.0) 1.12.0
Stephan Ewen updated FLINK-9407:
    Labels: pull-request-available usability (was: pull-request-available)
Stephan Ewen updated FLINK-9407:
    Fix Version/s: 1.11.0
ASF GitHub Bot updated FLINK-9407:
    Labels: pull-request-available (was: )
zhangminglei updated FLINK-9407:
    Description: updated (noted that the compression performance tests will also cover short checkpoint intervals)
zhangminglei updated FLINK-9407:
    Labels: (was: patch-available pull-request-available)
zhangminglei updated FLINK-9407:
    Description: updated (wording fix: "what we had written down" instead of "what we written down")
zhangminglei updated FLINK-9407:
    Description: updated (wording fix: "more UTs" instead of "more UT tests")
zhangminglei updated FLINK-9407:
    Description: updated (added the Spark SQL verification notes and sample output to the original one-sentence description)
ASF GitHub Bot updated FLINK-9407:
    Labels: patch-available pull-request-available (was: patch-available)
zhangminglei updated FLINK-9407:
    Labels: patch-available (was: )
mingleizhang updated FLINK-9407:
    Description: updated (wording fix: "add an orc writer" instead of "add orc writer")