[jira] [Commented] (SQOOP-3421) Importing data from Oracle to Parquet as incremental dataset name fails
[ https://issues.apache.org/jira/browse/SQOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747986#comment-16747986 ]

Szabolcs Vasas commented on SQOOP-3421:
---

Hi [~dmateusp],

You have encountered a Kite limitation here. Because the table name is specified in SOME_SCHEMA.SOME_TABLE_NAME form, Kite tries to create a dataset with that name, but '.' is not permitted in Kite dataset names. You get this error only with the Parquet file format because Sqoop used Kite only for reading and writing Parquet.

The Kite dependency was removed from Sqoop a couple of months ago, so this issue is resolved in the latest trunk, but unfortunately we do not have any releases yet that contain the fix.

Btw, the s3n file system is now deprecated, so you might want to use s3a in the future.

Regards,
Szabolcs

> Importing data from Oracle to Parquet as incremental dataset name fails
> ---
>
> Key: SQOOP-3421
> URL: https://issues.apache.org/jira/browse/SQOOP-3421
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.7
> Reporter: Daniel Mateus Pires
> Priority: Minor
>
> Hi there, I'm trying to run the following to import an Oracle table into S3
> as Parquet:
> {code:bash}
> sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL \
>   --where="rownum < 100" --table SOME_SCHEMA.SOME_TABLE_NAME \
>   --password some_password --username some_username \
>   --num-mappers 4 --split-by PRD_ID \
>   --target-dir s3n://bucket/destination \
>   --temporary-rootdir s3n://bucket/temp/destination \
>   --compress --check-column PRD_MODIFY_DT --incremental lastmodified \
>   --map-column-java PRD_ATTR_TEXT=String --append
> {code}
> Version of Kite is: kite-data-s3-1.1.0.jar
> Version of Sqoop is: 1.4.7
> And I'm getting the following error:
> {code:text}
> 19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
> 19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
> 19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
>     at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>     at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>     at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>     at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>     at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>     at org.kitesdk.data.Datasets.create(Datasets.java:239)
>     at org.kitesdk.data.Datasets.create(Datasets.java:307)
>     at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:156)
>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:130)
>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
>     at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
>     at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>     at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> {code}
> Importing as text file instead solves the issue

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
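
The validation failure in the log above can be reproduced in isolation. The following is a minimal sketch, assuming the rule is exactly what the error message states (letters, digits, and '_' only). The class and method names are hypothetical, and this is not Kite's actual `Compatibility.checkDatasetName` implementation.

```java
import java.util.regex.Pattern;

final class DatasetNameCheck {
    // Assumption: approximates the rule quoted in the error message
    // ("alphanumeric (plus '_')"); not copied from Kite.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]+");

    // Returns true only if every character is a letter, a digit, or '_'.
    static boolean isValidDatasetName(String name) {
        return name != null && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Prefix taken from the error message in the report.
        String prefix = "47a2cf963b82475d8eba78c822403204_";
        // false: the schema-qualified table name contains '.'
        System.out.println(isValidDatasetName(prefix + "SOME_SCHEMA.SOME_TABLE_NAME"));
        // true: without the schema qualifier the name passes the check
        System.out.println(isValidDatasetName(prefix + "SOME_TABLE_NAME"));
    }
}
```

The sketch only illustrates the naming rule. Whether an unqualified table name is usable for a given import depends on the database user's default schema, which the report does not cover.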
[jira] [Updated] (SQOOP-3421) Importing data from Oracle to Parquet as incremental dataset name fails
[ https://issues.apache.org/jira/browse/SQOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Mateus Pires updated SQOOP-3421:
---

Description:
Hi there, I'm trying to run the following to import an Oracle table into S3 as Parquet:
{code:bash}
sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL \
  --where="rownum < 100" --table SOME_SCHEMA.SOME_TABLE_NAME \
  --password some_password --username some_username \
  --num-mappers 4 --split-by PRD_ID \
  --target-dir s3n://bucket/destination \
  --temporary-rootdir s3n://bucket/temp/destination \
  --compress --check-column PRD_MODIFY_DT --incremental lastmodified \
  --map-column-java PRD_ATTR_TEXT=String --append
{code}
Version of Kite is: kite-data-s3-1.1.0.jar
Version of Sqoop is: 1.4.7
And I'm getting the following error:
{code:text}
19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
    at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
    at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
    at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
    at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
    at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
    at org.kitesdk.data.Datasets.create(Datasets.java:239)
    at org.kitesdk.data.Datasets.create(Datasets.java:307)
    at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:156)
    at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:130)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
    at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
{code}
Importing as text file instead solves the issue

was:
Hi there, I'm trying to run the following to import an Oracle table into S3 as Parquet:
sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL --where="rownum < 100" --table SOME_SCHEMA.SOME_TABLE_NAME --password some_password --username some_username --num-mappers 4 --split-by PRD_ID --target-dir s3n://bucket/destination --temporary-rootdir s3n://bucket/temp/destination --compress --check-column PRD_MODIFY_DT --incremental lastmodified --map-column-java PRD_ATTR_TEXT=String --append
Version of Kite is: kite-data-s3-1.1.0.jar
Version of Sqoop is: 1.4.7
And I'm getting the following error:
{code:text}
19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
    at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
    at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
    at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
    at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
    at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
    at org.kitesdk.data.Datasets.create(Datasets.java:239)
    at org.kitesdk.data.Datasets.create(Datasets.java:307)
    at
[jira] [Created] (SQOOP-3421) Importing data from Oracle to Parquet as incremental dataset name fails
Daniel Mateus Pires created SQOOP-3421:
---

Summary: Importing data from Oracle to Parquet as incremental dataset name fails
Key: SQOOP-3421
URL: https://issues.apache.org/jira/browse/SQOOP-3421
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Daniel Mateus Pires

Hi there, I'm trying to run the following to import an Oracle table into S3 as Parquet:
sqoop import --connect jdbc:oracle:thin:@//some.host:1521/ORCL --where="rownum < 100" --table SOME_SCHEMA.SOME_TABLE_NAME --password some_password --username some_username --num-mappers 4 --split-by PRD_ID --target-dir s3n://bucket/destination --temporary-rootdir s3n://bucket/temp/destination --compress --check-column PRD_MODIFY_DT --incremental lastmodified --map-column-java PRD_ATTR_TEXT=String --append
Version of Kite is: kite-data-s3-1.1.0.jar
Version of Sqoop is: 1.4.7
And I'm getting the following error:
{code:text}
19/01/21 13:20:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SOME_SCHEMA.SOME_TABLE_NAME t WHERE 1=0
19/01/21 13:20:34 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
19/01/21 13:20:35 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name 47a2cf963b82475d8eba78c822403204_SOME_SCHEMA.SOME_TABLE_NAME is not alphanumeric (plus '_')
    at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
    at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
    at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
    at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
    at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
    at org.kitesdk.data.Datasets.create(Datasets.java:239)
    at org.kitesdk.data.Datasets.create(Datasets.java:307)
    at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:156)
    at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:130)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:132)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
    at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:454)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
{code}
Importing as text file instead solves the issue
[jira] [Commented] (SQOOP-3042) Sqoop does not clear compile directory under /tmp/sqoop-/compile automatically
[ https://issues.apache.org/jira/browse/SQOOP-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747822#comment-16747822 ]

Denes Bodo commented on SQOOP-3042:
---

I opened SQOOP-3420 to track the required change to the log message.

> Sqoop does not clear compile directory under /tmp/sqoop-/compile
> automatically
>
> Key: SQOOP-3042
> URL: https://issues.apache.org/jira/browse/SQOOP-3042
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.6
> Reporter: Eric Lin
> Assignee: Eric Lin
> Priority: Critical
> Labels: patch
> Fix For: 3.0.0
> Attachments: SQOOP-3042.1.patch, SQOOP-3042.2.patch,
> SQOOP-3042.4.patch, SQOOP-3042.5.patch, SQOOP-3042.6.patch,
> SQOOP-3042.7.patch, SQOOP-3042.9.patch
>
> After running Sqoop, all the temp files generated by ClassWriter are left
> behind on disk, so anyone can read those Java files to see the schemas of
> the tables that Sqoop has been interacting with. By default, the directory
> is under /tmp/sqoop-/compile.
> In class org.apache.sqoop.SqoopOptions, function getNonceJarDir(), I can see
> that we did add "deleteOnExit" on the temp dir:
> {code}
> for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
>   hashDir = new File(baseDir, RandomHash.generateMD5String());
>   while (hashDir.exists()) {
>     hashDir = new File(baseDir, RandomHash.generateMD5String());
>   }
>   if (hashDir.mkdirs()) {
>     // We created the directory. Use it.
>     // If this directory is not actually filled with files, delete it
>     // when the JVM quits.
>     hashDir.deleteOnExit();
>     break;
>   }
> }
> {code}
> However, I believe the deletion fails because the directory is not empty.
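
The behaviour described in the quoted report matches how `java.io.File` deletion works: a directory can be deleted only when it is empty, so a non-empty directory registered with `deleteOnExit()` is left behind. The sketch below shows one way to remove such a directory together with its contents. The class name, method name, and file names are hypothetical, and this is not necessarily how the SQOOP-3042 patches address the problem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CompileDirCleanup {
    // Deletes dir and everything beneath it, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse path order lists entries inside a directory before the
            // directory itself, so each directory is empty when it is deleted.
            ordered = paths.sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the hashed compile directory; the names are made up.
        Path hashDir = Files.createTempDirectory("sqoop-compile-demo");
        Files.createFile(hashDir.resolve("SOME_TABLE.java"));

        // File.delete() refuses to remove a non-empty directory.
        System.out.println(hashDir.toFile().delete()); // false

        deleteRecursively(hashDir);
        System.out.println(Files.exists(hashDir)); // false
    }
}
```

A cleanup like this could be run from a JVM shutdown hook or at the end of the tool run; which of the two suits Sqoop is a design decision the thread does not settle.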
[jira] [Commented] (SQOOP-3420) Invalid ERROR message initiates false alarms
[ https://issues.apache.org/jira/browse/SQOOP-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747806#comment-16747806 ]

Denes Bodo commented on SQOOP-3420:
---

[~ericlin], [~vasas], [~vasubramanian]

Do you think it should be *info* level, or is *warn* more suitable in this case?
* The former level was debug, so it should not be *warn*.
* This message is printed when Sqoop cannot perform a backup operation, so *warn* level is more suitable in this case.

If any of us has more operational experience and contact with real-life users: which level would be the better choice?

> Invalid ERROR message initiates false alarms
>
> Key: SQOOP-3420
> URL: https://issues.apache.org/jira/browse/SQOOP-3420
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.7
> Reporter: Denes Bodo
> Priority: Critical
> Labels: usability
>
> In SQOOP-3042, a debug message was changed to error level, which causes
> false alarms in customers' log analysers. After understanding the
> functionality, it is recommended to use an info-level message instead of
> error when ImportTool is unable to back up the generated .java file.
[jira] [Created] (SQOOP-3420) Invalid ERROR message initiates false alarms
Denes Bodo created SQOOP-3420:
---

Summary: Invalid ERROR message initiates false alarms
Key: SQOOP-3420
URL: https://issues.apache.org/jira/browse/SQOOP-3420
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.4.7
Reporter: Denes Bodo

In SQOOP-3042, a debug message was changed to error level, which causes false alarms in customers' log analysers. After understanding the functionality, it is recommended to use an info-level message instead of error when ImportTool is unable to back up the generated .java file.
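
The recommendation in this report can be illustrated with a small sketch: a failed backup of a generated source file is logged below error level and reported to the caller, so the run continues and a log analyser that alerts on ERROR entries is not triggered. Everything here is an assumption made for illustration: the class and method names are hypothetical, `java.util.logging` stands in for whatever logging library Sqoop uses, and a file copy stands in for the actual backup step in ImportTool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GeneratedSourceBackup {
    private static final Logger LOG =
        Logger.getLogger(GeneratedSourceBackup.class.getName());

    // Hypothetical helper: copies the generated file into backupDir.
    // Returns false instead of throwing when the copy fails.
    static boolean backup(Path generated, Path backupDir) {
        try {
            Files.createDirectories(backupDir);
            Files.copy(generated, backupDir.resolve(generated.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            // INFO rather than SEVERE: the failure is reported but does not
            // look like an error to tools that scan the log for ERROR lines.
            LOG.log(Level.INFO, "Could not back up generated file "
                    + generated + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("SOME_TABLE", ".java");
        Path dst = Files.createTempDirectory("backup-demo");
        System.out.println(backup(src, dst)); // true
    }
}
```

Switching `Level.INFO` to `Level.WARNING` is the only change needed if the discussion on the issue settles on *warn*.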