[jira] [Commented] (SQOOP-3421) Importing data from Oracle to Parquet as incremental dataset name fails

2019-01-21 Thread Szabolcs Vasas (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747986#comment-16747986
 ] 

Szabolcs Vasas commented on SQOOP-3421:
---

Hi [~dmateusp],

You have encountered a Kite limitation here. Because the table name is 
specified in the SOME_SCHEMA.SOME_TABLE_NAME form, Kite tries to create a 
dataset with that name, but '.' is not permitted in Kite dataset names. 
You only see this error with the Parquet file format because Kite was used 
for Parquet reading and writing only.
The Kite dependency was removed from Sqoop a couple of months ago, so this 
issue is resolved on the latest trunk, but unfortunately we do not yet have 
a release that contains the fix.
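The naming rule quoted in the error message amounts to a simple character-class check; the following is a hypothetical sketch of that rule (not Kite's actual code), showing why the generated name with a '.' in it is rejected:

```python
import re

# Rule from the error message: dataset names must be
# alphanumeric plus '_'. The '.' from SCHEMA.TABLE therefore fails.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")

def is_valid_dataset_name(name):
    """Return True if the name satisfies the alphanumeric-plus-underscore rule."""
    return bool(VALID_NAME.match(name))

print(is_valid_dataset_name("47a2_SOME_SCHEMA_SOME_TABLE_NAME"))  # True
print(is_valid_dataset_name("47a2_SOME_SCHEMA.SOME_TABLE_NAME"))  # False: contains '.'
```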

By the way, the s3n file system is now deprecated, so you might want to use s3a in the future.

Regards,
Szabolcs

> Importing data from Oracle to Parquet as incremental dataset name fails
> ---
>
> Key: SQOOP-3421
> URL: https://issues.apache.org/jira/browse/SQOOP-3421
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Mateus Pires
>Priority: Minor
>
> Hi there, I'm trying to run the following to import an Oracle table into S3 
> as Parquet:
> {code:bash}
> sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL 
> --where="rownum < 100" --table SOME_SCHEMA.SOME_TABLE_NAME --password 
> some_password --username some_username --num-mappers 4 --split-by PRD_ID 
> --target-dir s3n://bucket/destination --temporary-rootdir 
> s3n://bucket/temp/destination --compress --check-column PRD_MODIFY_DT 
> --incremental lastmodified --map-column-java PRD_ATTR_TEXT=String --append
> {code}
> Version of Kite is: kite-data-s3-1.1.0.jar
> Version of Sqoop is: 1.4.7
> And I'm getting the following error:
> {code:text}
> 19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT 
> t.* FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
> 19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file 
> file:/etc/hive/conf.dist/hive-site.xml
> 19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: 
> org.kitesdk.data.ValidationException: Dataset name 
> 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not 
> alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name 
> 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not 
> alphanumeric (plus '_')
>   at 
> org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>   at 
> org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>   at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>   at 
> org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>   at 
> org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>   at org.kitesdk.data.Datasets.create(Datasets.java:239)
>   at org.kitesdk.data.Datasets.create(Datasets.java:307)
>   at 
> org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:156)
>   at 
> org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:130)
>   at 
> org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
>   at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
>   at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
>   at 
> org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
>   at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
>   at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
>   at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>   at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>   at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> {code}
> Importing as a text file instead avoids the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SQOOP-3421) Importing data from Oracle to Parquet as incremental dataset name fails

2019-01-21 Thread Daniel Mateus Pires (JIRA)


 [ 
https://issues.apache.org/jira/browse/SQOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Mateus Pires updated SQOOP-3421:
---
Description: 
Hi there, I'm trying to run the following to import an Oracle table into S3 as 
Parquet:


{code:bash}

sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL --where="rownum 
< 100" --table SOME_SCHEMA.SOME_TABLE_NAME --password some_password --username 
some_username --num-mappers 4 --split-by PRD_ID --target-dir 
s3n://bucket/destination --temporary-rootdir s3n://bucket/temp/destination 
--compress --check-column PRD_MODIFY_DT --incremental lastmodified 
--map-column-java PRD_ATTR_TEXT=String --append
{code}

Version of Kite is: kite-data-s3-1.1.0.jar
Version of Sqoop is: 1.4.7

And I'm getting the following error:

{code:text}
19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file 
file:/etc/hive/conf.dist/hive-site.xml
19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: 
org.kitesdk.data.ValidationException: Dataset name 
47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not 
alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name 
47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not 
alphanumeric (plus '_')
at 
org.kitesdk.data.ValidationException.check(ValidationException.java:55)
at 
org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
at 
org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
at 
org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
at org.kitesdk.data.Datasets.create(Datasets.java:239)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at 
org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:156)
at 
org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:130)
at 
org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
at 
org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at 
org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
{code}

Importing as a text file instead avoids the issue.

[jira] [Created] (SQOOP-3421) Importing data from Oracle to Parquet as incremental dataset name fails

2019-01-21 Thread Daniel Mateus Pires (JIRA)
Daniel Mateus Pires created SQOOP-3421:
--

 Summary: Importing data from Oracle to Parquet as incremental 
dataset name fails
 Key: SQOOP-3421
 URL: https://issues.apache.org/jira/browse/SQOOP-3421
 Project: Sqoop
  Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Daniel Mateus Pires


Hi there, I'm trying to run the following to import an Oracle table into S3 as 
Parquet:

sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL --where="rownum 
< 100" --table SOME_SCHEMA.SOME_TABLE_NAME --password some_password --username 
some_username --num-mappers 4 --split-by PRD_ID --target-dir 
s3n://bucket/destination --temporary-rootdir s3n://bucket/temp/destination 
--compress --check-column PRD_MODIFY_DT --incremental lastmodified 
--map-column-java PRD_ATTR_TEXT=String --append

Version of Kite is: kite-data-s3-1.1.0.jar
Version of Sqoop is: 1.4.7

And I'm getting the following error:

{code:text}
19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file 
file:/etc/hive/conf.dist/hive-site.xml
19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: 
org.kitesdk.data.ValidationException: Dataset name 
47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not 
alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name 
47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not 
alphanumeric (plus '_')
at 
org.kitesdk.data.ValidationException.check(ValidationException.java:55)
at 
org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
at 
org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
at 
org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
at org.kitesdk.data.Datasets.create(Datasets.java:239)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at 
org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:156)
at 
org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:130)
at 
org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
at 
org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at 
org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
{code}

Importing as a text file instead avoids the issue.





[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically

2019-01-21 Thread Denes Bodo (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747822#comment-16747822
 ] 

Denes Bodo commented on SQOOP-3042:
---

I opened SQOOP-3420 to track the requirements of the log message change.

> Sqoop does not clear compile directory under /tmp/sqoop-/compile 
> automatically
> 
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Eric Lin
>Assignee: Eric Lin
>Priority: Critical
>  Labels: patch
> Fix For: 3.0.0
>
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch, 
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch, 
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
>
> After running Sqoop, all the temp files generated by ClassWriter are left 
> behind on disk, so anyone can inspect those .java files to see the schema of 
> the tables that Sqoop has been interacting with. By default, the directory 
> is under /tmp/sqoop-/compile.
> In the class org.apache.sqoop.SqoopOptions, in the method getNonceJarDir(), 
> I can see that we do call "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
> hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
> // We created the directory. Use it.
> // If this directory is not actually filled with files, delete it
> // when the JVM quits.
> hashDir.deleteOnExit();
> break;
>   }
> }
> {code}
> However, I believe the deletion failed because the directory is not empty.
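File.deleteOnExit() only removes empty files and directories at JVM shutdown, which matches the observation above. One common alternative is to register a recursive cleanup at process exit instead. A minimal sketch of that idea (in Python, not Sqoop's actual code; `make_nonce_dir` is a hypothetical helper name):

```python
import atexit
import shutil
import tempfile

def make_nonce_dir(base_dir):
    """Create a temp compile dir and register a recursive cleanup at exit."""
    path = tempfile.mkdtemp(dir=base_dir)
    # Unlike a plain deleteOnExit(), shutil.rmtree removes the directory
    # even when it still contains generated .java/.class files.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

The key difference is that the cleanup is recursive, so leftover generated files do not prevent removal of the directory.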





[jira] [Commented] (SQOOP-3420) Invalid ERROR message initiates false alarms

2019-01-21 Thread Denes Bodo (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747806#comment-16747806
 ] 

Denes Bodo commented on SQOOP-3420:
---

[~ericlin], [~vasas], [~vasubramanian] Do you think it should be *info* level, 
or is *warn* more suitable in this case?
 * The former level was debug, which suggests it should not be *warn*.
 * On the other hand, this message is printed when Sqoop cannot perform a 
backup operation, so *warn* may be the more suitable level.

If any of us has more operational experience or contact with real-life users: 
which message level would be better?
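Whatever level is chosen, the pattern under discussion is logging a non-fatal backup failure below ERROR so log analysers do not raise alarms. A hedged sketch of that pattern (illustrative only, not Sqoop's code; `backup_generated_file` is a hypothetical name):

```python
import logging

logger = logging.getLogger("sqoop.import")

def backup_generated_file(move):
    """Attempt to back up the generated .java file; log at WARNING on failure."""
    try:
        move()
    except OSError as exc:
        # Non-fatal: the import itself succeeded, so avoid ERROR level,
        # which would trigger false alarms in log analysers.
        logger.warning("Could not back up generated .java file: %s", exc)
```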

> Invalid ERROR message initiates false alarms
> 
>
> Key: SQOOP-3420
> URL: https://issues.apache.org/jira/browse/SQOOP-3420
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Denes Bodo
>Priority: Critical
>  Labels: usability
>
> In SQOOP-3042, a debug-level message was changed to error level, which 
> causes false alarms in customers' log analysers. After understanding the 
> functionality, it is recommended to use an info-level message instead of 
> error when ImportTool is unable to back up the generated .java file.





[jira] [Created] (SQOOP-3420) Invalid ERROR message initiates false alarms

2019-01-21 Thread Denes Bodo (JIRA)
Denes Bodo created SQOOP-3420:
-

 Summary: Invalid ERROR message initiates false alarms
 Key: SQOOP-3420
 URL: https://issues.apache.org/jira/browse/SQOOP-3420
 Project: Sqoop
  Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Denes Bodo


In SQOOP-3042, a debug-level message was changed to error level, which causes 
false alarms in customers' log analysers. After understanding the 
functionality, it is recommended to use an info-level message instead of 
error when ImportTool is unable to back up the generated .java file.


