[GitHub] [flink] lirui-apache commented on a change in pull request #9264: [FLINK-13192][hive] Add tests for different Hive table formats

2019-08-02 Thread GitBox
URL: https://github.com/apache/flink/pull/9264#discussion_r310340281
 
 

 ##
 File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/connectors/hive/HiveTableOutputFormat.java
 ##
 @@ -124,7 +124,7 @@
 	private transient int numNonPartitionColumns;
 
 	// SerDe in Hive-1.2.1 and Hive-2.3.4 can be of different classes, make sure to use a common base class
 -	private transient Serializer serializer;
 +	private transient Serializer recordSerDe;
 
 Review comment:
   It has to be a serializer because we need it to serialize records. Besides, 
declaring the field as `Object` would mean calling the `serialize` method via 
reflection, and doing that for every record could hurt performance.
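The per-record reflection cost being avoided can be sketched with a self-contained toy; the `RecordSerializer` interface and `SimpleSerializer` class below are illustrative stand-ins, not Hive's actual `Serializer` API:

```java
import java.lang.reflect.Method;

public class ReflectionVsTyped {

    // Toy stand-in for Hive's Serializer interface (hypothetical, for illustration).
    public interface RecordSerializer {
        byte[] serialize(String record);
    }

    public static class SimpleSerializer implements RecordSerializer {
        @Override
        public byte[] serialize(String record) {
            return record.getBytes();
        }
    }

    // Typed path: cast once when the instance is created, then every
    // per-record call is a plain interface invocation.
    public static byte[] typedSerialize(Object instance, String record) {
        return ((RecordSerializer) instance).serialize(record);
    }

    // Reflective path: if the field were declared as Object, each record
    // would need a Method lookup and invoke() like this, adding per-record cost.
    public static byte[] reflectiveSerialize(Object instance, String record) throws Exception {
        Method m = instance.getClass().getMethod("serialize", String.class);
        return (byte[]) m.invoke(instance, record);
    }

    public static void main(String[] args) throws Exception {
        // Instantiate from a class name string, the way the output format
        // loads the table's serialization lib.
        Object serde = Class.forName(SimpleSerializer.class.getName())
                .getDeclaredConstructor().newInstance();
        System.out.println(new String(typedSerialize(serde, "row-1")));
        System.out.println(new String(reflectiveSerialize(serde, "row-1")));
    }
}
```

Declaring the field with the common base interface keeps the cast out of the per-record path entirely.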


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] lirui-apache commented on a change in pull request #9264: [FLINK-13192][hive] Add tests for different Hive table formats

2019-07-31 Thread GitBox
URL: https://github.com/apache/flink/pull/9264#discussion_r309091387
 
 

 ##
 File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
 ##
 @@ -494,8 +495,22 @@ private static CatalogBaseTable instantiateCatalogTable(Table hiveTable) {
 		String comment = properties.remove(HiveCatalogConfig.COMMENT);
 
 		// Table schema
 +		List<FieldSchema> fields;
 +		if (org.apache.hadoop.hive.ql.metadata.Table.hasMetastoreBasedSchema(hiveConf,
 +				hiveTable.getSd().getSerdeInfo().getSerializationLib())) {
 
 Review comment:
   Unfortunately we cannot use `Table.getCols()`, because that method gets the 
HiveConf from SessionState, which we don't have on the Flink side.




[GitHub] [flink] lirui-apache commented on a change in pull request #9264: [FLINK-13192][hive] Add tests for different Hive table formats

2019-07-30 Thread GitBox
URL: https://github.com/apache/flink/pull/9264#discussion_r309016831
 
 

 ##
 File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableOutputFormat.java
 ##
 @@ -256,10 +258,12 @@ public void configure(Configuration parameters) {
 	public void open(int taskNumber, int numTasks) throws IOException {
 		try {
 			StorageDescriptor sd = hiveTablePartition.getStorageDescriptor();
 -			serializer = (AbstractSerDe) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
 +			serializer = (Serializer) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
 +			Preconditions.checkArgument(serializer instanceof Deserializer,
 
 Review comment:
   Any suggestions for the name? `serDe`, maybe?




[GitHub] [flink] lirui-apache commented on a change in pull request #9264: [FLINK-13192][hive] Add tests for different Hive table formats

2019-07-30 Thread GitBox
URL: https://github.com/apache/flink/pull/9264#discussion_r309016668
 
 

 ##
 File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
 ##
 @@ -494,8 +495,22 @@ private static CatalogBaseTable instantiateCatalogTable(Table hiveTable) {
 		String comment = properties.remove(HiveCatalogConfig.COMMENT);
 
 		// Table schema
 +		List<FieldSchema> fields;
 +		if (org.apache.hadoop.hive.ql.metadata.Table.hasMetastoreBasedSchema(hiveConf,
 +				hiveTable.getSd().getSerdeInfo().getSerializationLib())) {
 
 Review comment:
   I think you mean `org.apache.hadoop.hive.ql.metadata.Table.getCols()`. But 
we only have an instance of `org.apache.hadoop.hive.metastore.api.Table` here. 
Do you think we should create an `org.apache.hadoop.hive.ql.metadata.Table` 
from it and call `getCols()`?
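The branch under discussion decides where the table schema comes from. A self-contained sketch of that control flow, using toy types and a hypothetical `hasMetastoreBasedSchema` stand-in rather than the real Hive API:

```java
import java.util.Arrays;
import java.util.List;

public class SchemaSourceSketch {

    // Hypothetical stand-in for Hive's check of whether a serde keeps its
    // column list in the metastore or derives the schema itself (e.g. Avro).
    public static boolean hasMetastoreBasedSchema(String serdeLib) {
        return !serdeLib.toLowerCase().contains("avro");
    }

    // Picks the field list the way the catalog branch does: metastore
    // columns when the serde relies on them, otherwise whatever the
    // serde itself reports.
    public static List<String> resolveFields(String serdeLib,
                                             List<String> metastoreCols,
                                             List<String> serdeReportedCols) {
        return hasMetastoreBasedSchema(serdeLib) ? metastoreCols : serdeReportedCols;
    }

    public static void main(String[] args) {
        List<String> fromMetastore = Arrays.asList("i", "s");
        List<String> fromSerde = Arrays.asList("x", "y", "z");
        System.out.println(resolveFields("LazySimpleSerDe", fromMetastore, fromSerde)); // [i, s]
        System.out.println(resolveFields("AvroSerDe", fromMetastore, fromSerde));       // [x, y, z]
    }
}
```

The point of the check is that for serdes like Avro the metastore column list may be empty or stale, so the catalog must ask the serde instead.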




[GitHub] [flink] lirui-apache commented on a change in pull request #9264: [FLINK-13192][hive] Add tests for different Hive table formats

2019-07-29 Thread GitBox
URL: https://github.com/apache/flink/pull/9264#discussion_r308516376
 
 

 ##
 File path: flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/batch/connectors/hive/TableEnvHiveConnectorTest.java
 ##
 @@ -85,4 +87,40 @@ public void testDefaultPartitionName() throws Exception {
 
 		hiveShell.execute("drop database db1 cascade");
 	}
 +
 +	@Test
 +	public void testDifferentFormats() throws Exception {
 +		String[] formats = new String[]{"orc", "parquet", "sequencefile"};
 
 Review comment:
   I'll add a test for csv. Since all the other test cases already use text 
tables, I don't think we need to cover text here.
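A minimal sketch of how such a format loop might build its DDL, assuming a hypothetical table layout (`db1.src` with columns `i int, s string`); the actual statements in the test may differ:

```java
public class FormatDdlSketch {

    // Builds a Hive-style DDL statement for a given storage format.
    // Table and column names are hypothetical; only "stored as" varies.
    public static String createTableDdl(String tableName, String format) {
        return "create table " + tableName + " (i int, s string) stored as " + format;
    }

    public static void main(String[] args) {
        // Same shape as the test: iterate the formats and run the same
        // write/read checks against each table.
        String[] formats = new String[]{"orc", "parquet", "sequencefile"};
        for (String format : formats) {
            System.out.println(createTableDdl("db1.src", format));
        }
    }
}
```

Looping over formats keeps the verification logic in one place, so adding a new format (such as csv) only extends the array.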




[GitHub] [flink] lirui-apache commented on a change in pull request #9264: [FLINK-13192][hive] Add tests for different Hive table formats

2019-07-29 Thread GitBox
URL: https://github.com/apache/flink/pull/9264#discussion_r308514138
 
 

 ##
 File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableOutputFormat.java
 ##
 @@ -256,10 +258,12 @@ public void configure(Configuration parameters) {
 	public void open(int taskNumber, int numTasks) throws IOException {
 		try {
 			StorageDescriptor sd = hiveTablePartition.getStorageDescriptor();
 -			serializer = (AbstractSerDe) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
 +			serializer = (Serializer) Class.forName(sd.getSerdeInfo().getSerializationLib()).newInstance();
 +			Preconditions.checkArgument(serializer instanceof Deserializer,
 
 Review comment:
   The problem is that `SerDeUtils.initializeSerDe` requires a `Deserializer`, 
so we have to do the cast if we want to reuse that util method. Since most, if 
not all, SerDe libs implement both `Serializer` and `Deserializer`, I suppose 
the cast is OK?
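The instantiate-cast-verify pattern in the diff can be illustrated with a self-contained sketch; the nested interfaces below are toy stand-ins for Hive's `Serializer`/`Deserializer`, and the helper names are invented for illustration:

```java
public class SerDeCheckSketch {

    // Toy stand-ins for Hive's Serializer and Deserializer interfaces.
    public interface Serializer {
        byte[] serialize(String record);
    }

    public interface Deserializer {
        String deserialize(byte[] bytes);
    }

    // Most SerDe libraries implement both sides; this toy one mirrors that.
    public static class EchoSerDe implements Serializer, Deserializer {
        @Override
        public byte[] serialize(String record) {
            return record.getBytes();
        }

        @Override
        public String deserialize(byte[] bytes) {
            return new String(bytes);
        }
    }

    // Instantiate by class name, cast to Serializer for writing, and verify
    // up front that the instance is also a Deserializer, so a later cast
    // during initialization cannot fail mid-job.
    public static Serializer loadSerDe(String className) throws Exception {
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        Serializer serializer = (Serializer) instance;
        if (!(serializer instanceof Deserializer)) {
            throw new IllegalArgumentException(className + " doesn't implement Deserializer");
        }
        return serializer;
    }

    public static void main(String[] args) throws Exception {
        Serializer s = loadSerDe(EchoSerDe.class.getName());
        System.out.println(new String(s.serialize("hello")));
    }
}
```

Failing fast in `open()` is preferable to discovering an incompatible serde class only when the first record is written.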

