[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858997#comment-13858997 ]

Antonio Bastardo commented on HIVE-1996:

Is this bug solved in hive-0.10.0_cdh4.2.0_20130411_1129? I'm seeing this behavior in that version.

Regards

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.1
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch

Steps:
1. From the command line, copy the kv2.txt data file into the current user's HDFS directory:
{{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}
2. In Hive, create the table:
{{create table tst_src1 (key_ int, value_ string);}}
3. Load the data into the table from HDFS:
{{load data inpath './kv2.txt' into table tst_src1;}}
4. Repeat step 1.
5. Repeat step 3.

Expected: kv2.txt is renamed in HDFS and then copied to the destination, as per HIVE-307.

Actual: The file is renamed, but {{Hive.copyFiles}} doesn't see the change in {{srcs}}, because it continues to use the same array elements (with the old, un-renamed file names). It crashes with this error:

{noformat}
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
    at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
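The failure described under "Actual" can be modelled in a few lines of plain Java. This is a hypothetical, simplified sketch, not Hive's code: it uses java.nio.file on a local temporary directory instead of Hadoop's FileSystem API, and the class name StaleSrcsDemo, its helper methods, and the "_copy_N" rename suffix are all assumptions made for illustration. Conflicts are resolved by renaming the source file, but the move loop still reads the original array, so the renamed entry points at a path that no longer exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the HIVE-1996 failure, using java.nio.file instead of
// Hadoop's FileSystem. Names and the "_copy_N" suffix are assumptions.
final class StaleSrcsDemo {

    // Renames src next to itself if destDir already has a file of that name;
    // returns the path the source file now lives at.
    static Path renameIfConflict(Path src, Path destDir) {
        String name = src.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        String candidate = name;
        for (int n = 1; Files.exists(destDir.resolve(candidate)); n++) {
            candidate = base + "_copy_" + n + ext;
        }
        if (candidate.equals(name)) {
            return src; // no conflict
        }
        try {
            return Files.move(src, src.resolveSibling(candidate));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Buggy pattern: the new path returned by renameIfConflict is discarded,
    // so the move loop keeps using the original (now stale) array elements.
    static List<String> moveAll(Path[] srcs, Path destDir) {
        for (Path src : srcs) {
            renameIfConflict(src, destDir);
        }
        List<String> missing = new ArrayList<>();
        try {
            for (Path src : srcs) {
                if (Files.notExists(src)) {
                    // The reported stack trace shows a NullPointerException in
                    // Hive.copyFiles at this point; here we only record it.
                    missing.add(src.getFileName().toString());
                    continue;
                }
                Files.move(src, destDir.resolve(src.getFileName()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return missing;
    }

    // Steps 1-5: load kv2.txt, then load a second kv2.txt into the same table.
    static List<String> runScenario() {
        try {
            Path tmp = Files.createTempDirectory("hive1996");
            Path staging = Files.createDirectory(tmp.resolve("staging"));
            Path table = Files.createDirectory(tmp.resolve("tst_src1"));
            List<String> missing = new ArrayList<>();
            for (String data : new String[] {"first", "second"}) {
                Files.writeString(staging.resolve("kv2.txt"), data);
                missing.addAll(moveAll(new Path[] {staging.resolve("kv2.txt")}, table));
            }
            return missing;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Paths not found after rename: " + runScenario());
    }
}
```

Running main prints `Paths not found after rename: [kv2.txt]`: on the second load the source was renamed to kv2_copy_1.txt, but the loop still looked for kv2.txt.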
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859024#comment-13859024 ]

Hive QA commented on HIVE-1996:
---

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12503097/HIVE-1996.2.Patch

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/772/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/772/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-772/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/Driver.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
U hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
Fetching external item into 'hcatalog/src/test/e2e/harness'
Updated external to revision 1554298.
Updated to revision 1554298.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12503097

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.1
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166594#comment-13166594 ]

Namit Jain commented on HIVE-1996:

Yongqiang, can you take a look?

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147146#comment-13147146 ]

jirapos...@reviews.apache.org commented on HIVE-1996:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1610/
---

(Updated 2011-11-09 16:44:55.708392)

Review request for hive, Carl Steinbach and John Sichi.

Changes
---

Made the load rename configurable.

Summary
---

LOAD DATA INPATH fails when the table already contains a file of the same name. If a name conflict occurs, the file is renamed, but the load then still uses the old file name, so it fails. We have changed the code to load using the changed file name: it introduces a map that keeps each old file name and its changed name as a key/value pair, and the load uses this map.

This addresses bug HIVE-1996.
    https://issues.apache.org/jira/browse/HIVE-1996

Diffs (updated)
---

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1198626
  trunk/conf/hive-default.xml 1198626
  trunk/data/conf/hive-site.xml 1198626
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1198626
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1198626
  trunk/ql/src/test/queries/clientpositive/input47.q PRE-CREATION
  trunk/ql/src/test/results/clientpositive/input47.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1610/diff

Testing
---

Added a test case for this scenario.

Thanks,

chinna

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.2.Patch, HIVE-1996.Patch
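The "Changes" note says the load rename was made configurable. A minimal sketch of such a switch follows, assuming a single boolean setting: the real HiveConf property added by the patch is not named in this message, so `renameOnConflict`, the class LoadRenamePolicy, and the "_copy_N" naming scheme are all hypothetical. With renaming enabled, a conflicting name gets the first free suffixed name; with it disabled, the conflict fails the load.

```java
import java.util.Set;

// Hypothetical sketch of a configurable rename-on-conflict rule. The boolean
// stands in for the HiveConf setting added by the patch, whose real name is
// not given here; the "_copy_N" naming is also an assumption.
final class LoadRenamePolicy {
    private final boolean renameOnConflict;

    LoadRenamePolicy(boolean renameOnConflict) {
        this.renameOnConflict = renameOnConflict;
    }

    // Returns the name srcName should get in a table directory that already
    // holds the files named in existing.
    String targetName(String srcName, Set<String> existing) {
        if (!existing.contains(srcName)) {
            return srcName; // no conflict: keep the original name
        }
        if (!renameOnConflict) {
            // Rename disabled: the conflict fails the load.
            throw new IllegalStateException(
                    "File already exists in table directory: " + srcName);
        }
        int dot = srcName.lastIndexOf('.');
        String base = dot < 0 ? srcName : srcName.substring(0, dot);
        String ext = dot < 0 ? "" : srcName.substring(dot);
        int n = 1;
        String candidate = base + "_copy_" + n + ext;
        while (existing.contains(candidate)) {
            n++;
            candidate = base + "_copy_" + n + ext;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> table = Set.of("kv2.txt");
        System.out.println(new LoadRenamePolicy(true).targetName("kv2.txt", table));
    }
}
```

Here main prints `kv2_copy_1.txt`. Keeping the decision in one small function makes it easy to test both settings without touching a file system.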
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13109247#comment-13109247 ]

Chinna Rao Lalam commented on HIVE-1996:

Hi He Yongqiang,

There are two scenarios to consider here:

1) Rename disabled: we load one data folder that contains 10 files (1.txt, 2.txt, ..., 10.txt), and the table already contains a file with the same name, 5.txt. While loading 5.txt an exception is thrown and the operation fails, but the files already loaded (1.txt, 2.txt, ..., 4.txt) remain in the table. If the table instead already contains 6.txt, loading 6.txt throws the exception and the operation fails, but 1.txt, 2.txt, ..., 5.txt remain. So the result depends on the order of the files and can cause inconsistencies.

2) In the current implementation, org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Path, Path, FileSystem) also throws an exception if it is unable to rename any of the files, but some files from the same operation will already have been loaded.

Proposed solution: while loading, if an exception occurs for a file, record that file as unloaded and continue loading the remaining files. The operation then fails with the exception and the list of unloaded files, so the user can retry loading just those files. This way there is no inconsistent data.

Please give your inputs.

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
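The proposed solution can be sketched as a loop that attempts every file, collects the failures, and fails once at the end. This is a hypothetical in-memory model, not Hive code: the table directory is a Set of file names, a file is "unloaded" only when its name already exists (the rename-disabled case), and PartialLoad and LoadFailedException are illustrative names. Because no file stops the loop, which files get loaded no longer depends on their order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the proposed "continue and report" load.
// Table contents are a set of file names; a file counts as unloaded when its
// name already exists in the table (rename disabled).
final class PartialLoad {

    // Raised once at the end, carrying every file that could not be loaded.
    static final class LoadFailedException extends RuntimeException {
        private final List<String> unloaded;

        LoadFailedException(List<String> unloaded) {
            super("Files not loaded: " + unloaded);
            this.unloaded = List.copyOf(unloaded);
        }

        List<String> unloaded() {
            return unloaded;
        }
    }

    // Attempts every source file instead of stopping at the first conflict.
    static List<String> loadAll(List<String> srcNames, Set<String> tableFiles) {
        List<String> loaded = new ArrayList<>();
        List<String> unloaded = new ArrayList<>();
        for (String name : srcNames) {
            if (tableFiles.add(name)) { // add() returns false on a conflict
                loaded.add(name);
            } else {
                unloaded.add(name);     // note it and keep going
            }
        }
        if (!unloaded.isEmpty()) {
            throw new LoadFailedException(unloaded);
        }
        return loaded;
    }

    public static void main(String[] args) {
        Set<String> table = new HashSet<>(Set.of("5.txt"));
        List<String> srcs = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            srcs.add(i + ".txt");
        }
        try {
            loadAll(srcs, table);
        } catch (LoadFailedException e) {
            // The user can retry just these files.
            System.out.println("Retry: " + e.unloaded());
        }
    }
}
```

With the 10-file example from the comment and 5.txt already present, main prints `Retry: [5.txt]`, and the other nine files are loaded whether the conflict is at 5.txt or 6.txt.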
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103814#comment-13103814 ]

Chinna Rao Lalam commented on HIVE-1996:

Agreed, I will update the patch with configuration.

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088903#comment-13088903 ]

Chinna Rao Lalam commented on HIVE-1996:

This scenario works when loading from local, but fails when loading from the file system. To replicate this scenario I used this query:

{{create table load_overwrite2 (key string, value string) stored as textfile location 'file:/tmp1/load2_overwrite2';}}

As part of executing this query, file:/tmp1/load2_overwrite2 should be created. I have verified this in my environment and it works without failure. Please let me know if there are any issues.

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088911#comment-13088911 ]

jirapos...@reviews.apache.org commented on HIVE-1996:

Review request for hive, Carl Steinbach and John Sichi.

Summary
---

LOAD DATA INPATH fails when the table already contains a file of the same name. If a name conflict occurs, the file is renamed, but the load then still uses the old file name, so it fails. We have changed the code to load using the changed file name: it introduces a map that keeps each old file name and its changed name as a key/value pair, and the load uses this map.

This addresses bug HIVE-1996.
    https://issues.apache.org/jira/browse/HIVE-1996

Diffs
---

  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1160102
  trunk/ql/src/test/queries/clientpositive/input44.q PRE-CREATION
  trunk/ql/src/test/results/clientpositive/input44.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/1610/diff

Testing
---

Added a test case for this scenario.

Thanks,

chinna

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
[jira] [Commented] (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036197#comment-13036197 ]

Chinna Rao Lalam commented on HIVE-1996:

After the file name is changed, the load still uses the old name, which is why it fails. We have now changed the code to load using the changed file name: we introduced a map that keeps the old name and the changed name as a key/value pair, and the load uses this map.

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Chinna Rao Lalam
Attachments: HIVE-1996.Patch
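A sketch of the map-based approach described in this comment follows. It is hypothetical and simplified: it uses java.nio.file on a local temporary directory instead of Hadoop's FileSystem API, and the class RenameMapLoad, its helpers, and the "_copy_N" rename suffix are assumptions, not the patch's code. Each original source path is mapped to its current (possibly renamed) path, and the move loop looks up that current path instead of reusing the stale array element.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the map-based fix, using java.nio.file instead of
// Hadoop's FileSystem. Names and the "_copy_N" suffix are assumptions.
// Assumes source file names are distinct within a single load.
final class RenameMapLoad {

    // Renames src next to itself if destDir already has a file of that name;
    // returns the path the source file now lives at.
    static Path renameIfConflict(Path src, Path destDir) {
        String name = src.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        String candidate = name;
        for (int n = 1; Files.exists(destDir.resolve(candidate)); n++) {
            candidate = base + "_copy_" + n + ext;
        }
        if (candidate.equals(name)) {
            return src; // no conflict
        }
        try {
            return Files.move(src, src.resolveSibling(candidate));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void moveAll(Path[] srcs, Path destDir) {
        // Old path -> current path, filled in while resolving conflicts.
        Map<Path, Path> renamed = new HashMap<>();
        for (Path src : srcs) {
            renamed.put(src, renameIfConflict(src, destDir));
        }
        try {
            for (Path src : srcs) {
                Path current = renamed.get(src); // never the stale name
                Files.move(current, destDir.resolve(current.getFileName()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads kv2.txt twice into the same table directory; returns its contents.
    static List<String> runScenario() {
        try {
            Path tmp = Files.createTempDirectory("hive1996");
            Path staging = Files.createDirectory(tmp.resolve("staging"));
            Path table = Files.createDirectory(tmp.resolve("tst_src1"));
            for (String data : new String[] {"first", "second"}) {
                Files.writeString(staging.resolve("kv2.txt"), data);
                moveAll(new Path[] {staging.resolve("kv2.txt")}, table);
            }
            try (Stream<Path> files = Files.list(table)) {
                return files.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Running main prints `[kv2.txt, kv2_copy_1.txt]`: both loads end up in the table directory, the second one under its renamed file name.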
[jira] Commented: (HIVE-1996) LOAD DATA INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995499#comment-12995499 ]

Kirk True commented on HIVE-1996:

This is very closely related to, but not the same as, HIVE-307. That bug specifically pertains to {{LOCAL}} files.

LOAD DATA INPATH fails when the table already contains a file of the same name
Key: HIVE-1996
URL: https://issues.apache.org/jira/browse/HIVE-1996
Project: Hive
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Kirk True
Assignee: Kirk True