[jira] [Commented] (SQOOP-3264) Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP

2017-11-29 Thread Michal Klempa (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270457#comment-16270457
 ] 

Michal Klempa commented on SQOOP-3264:
--

https://reviews.apache.org/r/64148/

> Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP
> --
>
> Key: SQOOP-3264
> URL: https://issues.apache.org/jira/browse/SQOOP-3264
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 1.4.6
>Reporter: Michal Klempa
>Priority: Minor
> Fix For: 1.4.7
>
>
> When importing the following JDBC SQL types:
> {code}
> public final static int DATE=  91;
> public final static int TIME=  92;
> public final static int TIMESTAMP   =  93;
> {code}
> Sqoop currently uses the org.apache.sqoop.hive.HiveTypes.toHiveType method, 
> where all of these types are mapped to the STRING type.
> Given that the JDBC value returned is in fact of type Long, let me propose 
> that we output the Hive types as:
> {code}
> DATE -> TIMESTAMP
> TIME -> BIGINT
> TIMESTAMP -> TIMESTAMP
> {code}
> This is also in line with org.apache.sqoop.manager.ConnManager.toAvroType, 
> where the mapping is:
> {code}
> case Types.DATE:
> case Types.TIME:
> case Types.TIMESTAMP:
>   return Type.LONG;
> {code}
> Some of the connectors override toJavaType:
> {code}
> org.apache.sqoop.manager.SQLServerManager
> org.apache.sqoop.manager.oracle.OraOopConnManager
> {code}
> which may indicate different handling.
> SQLServerManager uses Java String as the output type because of 
> timezones.
> The same holds true for OraOopConnManager, although it has a separate 
> boolean configuration property 'oraoop.timestamp.string' which controls 
> whether the import preserves timezones by converting date types to Java 
> String, or drops timezones and behaves the 'sqoop way'.
> Both of these connectors already handle these types as String by default, 
> so the proposed change would not affect them.
> Other connectors still need to be checked.
> Some of the connectors override toHiveType:
> {code}
> org.apache.sqoop.manager.oracle.OraOopConnManager
> {code}
> This connector uses the 'sqoop way':
> {code}
> String hiveType = super.toHiveType(sqlType);
> {code}
> and only when the type is not resolved does it decide on its own:
> {code}
> if (hiveType == null) {
>   // http://wiki.apache.org/hadoop/Hive/Tutorial#Primitive_Types
>   if (sqlType == OraOopOracleQueries.getOracleType("BFILE")
>   || sqlType == OraOopOracleQueries.getOracleType("INTERVALYM")
>   || sqlType == OraOopOracleQueries.getOracleType("INTERVALDS")
>   || sqlType == OraOopOracleQueries.getOracleType("NCLOB")
>   || sqlType == OraOopOracleQueries.getOracleType("NCHAR")
>   || sqlType == OraOopOracleQueries.getOracleType("NVARCHAR")
>   || sqlType == OraOopOracleQueries.getOracleType("OTHER")
>   || sqlType == OraOopOracleQueries.getOracleType("ROWID")
>   || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPTZ")
>   || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPLTZ")
>   || sqlType == OraOopOracleQueries.getOracleType("STRUCT")) {
> hiveType = "STRING";
>   }
>   if (sqlType == OraOopOracleQueries.getOracleType("BINARY_FLOAT")) {
> hiveType = "FLOAT";
>   }
>   if (sqlType == OraOopOracleQueries.getOracleType("BINARY_DOUBLE")) {
> hiveType = "DOUBLE";
>   }
> }
> {code}
> This code is affected by the proposed change. As the Hive TIMESTAMP is 
> timezone-less, we have to change the handling in this method to respect 
> the property 'oraoop.timestamp.string': if true, output the STRING Hive 
> type; if false, go the 'sqoop way'.
> The Hive type is only used when generating the table DDL (CREATE 
> statement), and Hive can properly recognize the JDBC-compliant 
> java.sql.Timestamp format "yyyy-MM-dd HH:mm:ss.fffffffff", so no 
> connector should be affected in a way that Hive could not read the 
> resulting column values.
> However, thorough testing should be done on all connectors before 
> releasing any column type behavior changes.
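
The DATE/TIME/TIMESTAMP mapping proposed in the description above could be sketched as follows. This is a hypothetical illustration only: the class and method names are invented for this sketch and are not the actual SQOOP-3264 patch.

```java
// Hypothetical sketch of the proposed date/time mapping; the class name
// ProposedHiveTypes is invented for illustration and is not part of Sqoop.
import java.sql.Types;

public class ProposedHiveTypes {

  /** Maps a java.sql.Types constant to a Hive DDL type per the proposal. */
  public static String toHiveType(int sqlType) {
    switch (sqlType) {
      case Types.DATE:      // 91
      case Types.TIMESTAMP: // 93
        return "TIMESTAMP";
      case Types.TIME:      // 92
        return "BIGINT";    // time-of-day carried as a plain long
      default:
        return null;        // unresolved; caller falls back to other logic
    }
  }

  public static void main(String[] args) {
    System.out.println(toHiveType(Types.DATE));      // TIMESTAMP
    System.out.println(toHiveType(Types.TIME));      // BIGINT
    System.out.println(toHiveType(Types.TIMESTAMP)); // TIMESTAMP
  }
}
```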
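
Similarly, the suggested handling in the OraOop connector (respecting 'oraoop.timestamp.string') could look roughly like this sketch. The class name and the boolean parameter are invented for illustration; the real patch would read the property from the job configuration:

```java
// Hypothetical sketch: how a toHiveType override could honor
// 'oraoop.timestamp.string'. Not the actual OraOopConnManager code.
import java.sql.Types;

public class OraOopHiveTypeSketch {

  /**
   * If the user asked for timezone-safe String handling, emit STRING for
   * the date/time family; otherwise use the proposed 'sqoop way' mapping.
   */
  public static String toHiveType(int sqlType, boolean timestampAsString) {
    boolean isDateTime = sqlType == Types.DATE
        || sqlType == Types.TIME
        || sqlType == Types.TIMESTAMP;
    if (timestampAsString && isDateTime) {
      return "STRING"; // preserves the timezone-aware textual form
    }
    switch (sqlType) {
      case Types.DATE:
      case Types.TIMESTAMP:
        return "TIMESTAMP";
      case Types.TIME:
        return "BIGINT";
      default:
        return null; // fall through to the Oracle-specific cases shown above
    }
  }
}
```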



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SQOOP-3264) Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP

2017-11-29 Thread Boglarka Egyed (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270432#comment-16270432
 ] 

Boglarka Egyed commented on SQOOP-3264:
---

Hi [~michal.klempa],

Thanks for reporting this issue!

Currently Apache Sqoop does not accept pull requests; we use [Apache's Review 
Board|https://reviews.apache.org/r/59833/] instead.

Please consider the following on Review Board:
* Project: Sqoop
* Summary: generate your summary using the issue's JIRA key + JIRA title
* Groups: add the relevant group so everyone on the project will know about 
your patch (sqoop)
* Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
* Repository: sqoop-trunk for Sqoop1 (or sqoop-sqoop2 for Sqoop2)

Please add the link of the review as an external/web link so it's easy to 
navigate to the review side.

If you would like to assign this ticket to yourself please let me know and I'll 
add you to the Contributors list.

Thanks,
Bogi


[jira] [Commented] (SQOOP-3264) Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269004#comment-16269004
 ] 

ASF GitHub Bot commented on SQOOP-3264:
---

GitHub user michalklempa opened a pull request:

https://github.com/apache/sqoop/pull/40

SQOOP-3264 Hive types for JDBC timestamp, date and time types are 
timestamp, timestamp and bigint respectively

This resolves https://issues.apache.org/jira/browse/SQOOP-3264, although 
further testing on different databases is still needed (help welcome).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/triviadata/sqoop 
SQOOP-3264_date_to_hive_timestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/sqoop/pull/40.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #40


commit ceefc33a6c5632f1300bddc73c20f7819959f10f
Author: Michal Klempa 
Date:   2017-11-28T16:30:28Z

SQOOP-3264 Hive types for JDBC timestamp, date and time types are 
timestamp, timestamp and bigint respectively



