[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534371#comment-15534371 ]

Rajesh Balamohan commented on TEZ-2741:
---------------------------------------

Key/value references cannot be updated higher up in the chain. An alternative is to use {{--hiveconf hive.compute.splits.in.am=false}} to work around this issue.

> Hive on Tez does not work well with Sequence Files Schema changes
> -----------------------------------------------------------------
>
>                 Key: TEZ-2741
>                 URL: https://issues.apache.org/jira/browse/TEZ-2741
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajat Jain
>            Assignee: Gopal V
>         Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a different key class):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
>
> import java.io.IOException;
>
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
>     Configuration conf = new Configuration();
>     Job job = new Job(conf);
>     job.setJobName("Convert Text");
>     job.setJarByClass(Mapper.class);
>     job.setMapperClass(Mapper.class);
>     job.setReducerClass(Reducer.class);
>     // increase if you need sorting or a special number of files
>     job.setNumReduceTasks(0);
>     job.setOutputKeyClass(LongWritable.class);
>     job.setOutputValueClass(Text.class);
>     job.setOutputFormatClass(SequenceFileOutputFormat.class);
>     job.setInputFormatClass(TextInputFormat.class);
>     TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
>     SequenceFileOutputFormat.setOutputPath(job, new Path("/user/hive/foo/p=2/"));
>     // submit and wait for completion
>     job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: While processing file hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.LongWritable
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: While processing file hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key
> {code}
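The "wrong key class" failure above is raised because each SequenceFile records its key class name in its file header. As a hedged illustration (this is not Tez or Hive code, and it only handles the common case where the class name is under 128 bytes, so Hadoop's variable-length int length prefix is a single byte), a header can be peeked at to see which key class a part file declares:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hedged sketch (not Tez/Hive code): read the key class name out of a
// SequenceFile header. Assumes the class name is under 128 bytes, so the
// Hadoop variable-length int length prefix fits in a single byte.
public class SeqFileKeyClass {

    public static String keyClassName(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        byte[] magic = new byte[3];
        d.readFully(magic);                 // file starts with the bytes "SEQ"
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        d.readByte();                       // format version byte
        int len = d.readByte();             // single-byte vint length (assumed)
        byte[] name = new byte[len];
        d.readFully(name);                  // UTF-8 key class name
        return new String(name, StandardCharsets.UTF_8);
    }
}
```

Running something like this over the part files under {{foo/p=1}} and {{foo/p=2}} would surface the BytesWritable vs. LongWritable mismatch before any query runs.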
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526864#comment-15526864 ]

Hitesh Shah commented on TEZ-2741:
----------------------------------

[~rajesh.balamohan] any updates on this?
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456304#comment-15456304 ]

Rajesh Balamohan commented on TEZ-2741:
---------------------------------------

Thanks for reverting the patch, [~hitesh]. The issue is that the value being reset was local to the method, while higher-level apps (e.g. TestGroupedSplits.testFormat) may still be holding the original object, which is a different instance. Will check on fixing it.
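The aliasing problem Rajesh describes can be shown with plain Java (illustrative names only, not Tez code): reassigning a reference that is local to the reader changes nothing for a caller that already captured the old object.

```java
// Illustrative sketch (not Tez code): reassigning a local/field reference
// does not change what a caller that captured the earlier object sees.
public class AliasingSketch {
    static class MutableValue {
        final StringBuilder text = new StringBuilder();
    }

    static String demo() {
        MutableValue value = new MutableValue();
        value.text.append("record-1");

        // A higher-level consumer captures the reference, much as
        // TestGroupedSplits.testFormat effectively does.
        MutableValue captured = value;

        // The reader "resets" by pointing its variable at a fresh object...
        value = new MutableValue();
        value.text.append("record-2");

        // ...but the consumer still holds the original, untouched object.
        return captured.text + "|" + value.text;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // record-1|record-2
    }
}
```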
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450463#comment-15450463 ]

Hitesh Shah commented on TEZ-2741:
----------------------------------

[~rajesh.balamohan] [~gopalv] It seems this commit broke a unit test, TestGroupedSplits#testFormat to be specific. Any chance either of you can look at it soon, or should I revert the patch until it can be looked at?

> Fix For: 0.9.0
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425645#comment-15425645 ]

Rajesh Balamohan commented on TEZ-2741:
---------------------------------------

I haven't been able to test it with Pig+Hive in this case. Patch LGTM; +1. It creates the key/value pair when moving to the next split.
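The approved behavior ("creates KV when moving to next split") can be sketched with simplified stand-ins for the {{org.apache.hadoop.mapred.RecordReader}} interface; none of the names below are the actual Tez classes, and the real fix lives in Tez's grouped-split reader, not in this toy code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of "create key/value when moving to the next split": the grouped
// reader asks each child for fresh key/value objects instead of reusing the
// previous pair across children. Simplified stand-in types, not Tez code.
public class GroupedReaderSketch {
    static class Holder { String data; }

    interface ChildReader {
        Holder createKey();
        Holder createValue();
        boolean next(Holder key, Holder value); // fills key/value, false at EOF
    }

    static class ListReader implements ChildReader {
        private final Iterator<String[]> it;
        ListReader(List<String[]> records) { this.it = records.iterator(); }
        public Holder createKey() { return new Holder(); }
        public Holder createValue() { return new Holder(); }
        public boolean next(Holder k, Holder v) {
            if (!it.hasNext()) return false;
            String[] kv = it.next();
            k.data = kv[0];
            v.data = kv[1];
            return true;
        }
    }

    static List<String> readAll(List<ChildReader> children) {
        List<String> out = new ArrayList<>();
        for (ChildReader current : children) {
            // The fix in miniature: a fresh key/value pair per child, so a
            // child whose key class differs never writes into objects created
            // by the previous child.
            Holder key = current.createKey();
            Holder value = current.createValue();
            while (current.next(key, value)) {
                out.add(key.data + "=" + value.data);
            }
        }
        return out;
    }
}
```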
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423535#comment-15423535 ]

TezQA commented on TEZ-2741:
----------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12752341/TEZ-2741.1.patch
against master revision d3fd828.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.mapred.split.TestGroupedSplits

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1916//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1916//console

This message is automatically generated.
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423405#comment-15423405 ]

Hitesh Shah commented on TEZ-2741:
----------------------------------

[~rajesh.balamohan] any feedback on the patch?
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303501#comment-15303501 ] Gopal V commented on TEZ-2741: -- [~rajesh.balamohan]: the reported bug cannot be reproduced using Hive alone. The bug scenario requires Pig-written SequenceFiles in the same directory as the Hive-written ones (specifically, the Pig data was not generated via HCatalog).
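Gopal's observation, together with the hive.compute.splits.in.am=false workaround mentioned earlier in this thread, suggests the mechanism: when splits are computed in the AM and grouped, one task may read several files while reusing a single key object created from the first file's key class, whereas MR-style per-file readers create a fresh key object for each file. The following plain-Java sketch models that difference; it has no Hadoop or Tez dependency, and all names below are illustrative, not the real Tez API.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only: a "split" is just a file path plus the key class
// recorded in that SequenceFile's header.
public class GroupedSplitSketch {
    static final class FileSplit {
        final String path;
        final Class<?> keyClass;
        FileSplit(String path, Class<?> keyClass) { this.path = path; this.keyClass = keyClass; }
    }

    // MR-style: a fresh key object is created from each file's own key class,
    // so the header class always matches.
    static int readPerFile(List<FileSplit> splits) {
        int filesRead = 0;
        for (FileSplit s : splits) {
            filesRead++; // per-file key object: never mismatches
        }
        return filesRead;
    }

    // Grouped-split style: one key object is created from the first file's
    // class and reused across the group; a file with a different key class fails.
    static int readGrouped(List<FileSplit> splits) {
        Class<?> reusedKeyClass = splits.get(0).keyClass;
        int filesRead = 0;
        for (FileSplit s : splits) {
            if (!s.keyClass.equals(reusedKeyClass)) {
                throw new RuntimeException("wrong key class: " + s.keyClass.getName()
                        + " is not class " + reusedKeyClass.getName());
            }
            filesRead++;
        }
        return filesRead;
    }

    public static void main(String[] args) {
        List<FileSplit> mixed = Arrays.asList(
                new FileSplit("/user/hive/foo/p=1/000000_0", byte[].class), // stand-in for BytesWritable keys
                new FileSplit("/user/hive/foo/p=2/part-m-0", Long.class));  // stand-in for LongWritable keys
        System.out.println(readPerFile(mixed)); // prints 2: both files read
        try {
            readGrouped(mixed);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // the "wrong key class" failure
        }
    }
}
```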
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281195#comment-15281195 ] Rajesh Balamohan commented on TEZ-2741: --- Can you please provide details on which version of Hive/Tez has this issue? I tried it on Tez master + Hive 2.1.0-SNAPSHOT and do not see this issue. E.g., based on the txt file attached in this JIRA, I tried creating additional custom sequence files (in p=2 and p=3):
{noformat}
hive> select * from foo where p=1;
OK
fdaljf;lajdfla;jfl;a1
fdahflkadjf;lajdf 1
afdlja;fj;a 1
fa;ldajf;ja;dfa 1
j;fa;djf;lajf;af;lajfl;a1
afl;kdf;lajf;lajf;lajdlk;fjadl;fjal;jfal;kjfa;ldjfa;ljfa1
1
Time taken: 0.146 seconds, Fetched: 7 row(s)
hive> msck repair table foo;
OK
Time taken: 0.097 seconds
hive> select * from foo where p=2;
OK
test_0 2
test_1 2
test_2 2
test_3 2
test_4 2
test_5 2
test_6 2
test_7 2
test_8 2
test_9 2
hive> select * from foo where p=3;
OK
test_0 3
test_1 3
test_2 3
test_3 3
test_4 3
test_5 3
test_6 3
test_7 3
test_8 3
test_9 3
{noformat}
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278030#comment-15278030 ] Gopal V commented on TEZ-2741: -- Yes, this is ready to be reviewed.
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273350#comment-15273350 ] TezQA commented on TEZ-2741: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752341/TEZ-2741.1.patch against master revision c3b8b85.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.mapred.split.TestGroupedSplits
org.apache.tez.dag.app.rm.TestContainerReuse
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1699//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1699//console
This message is automatically generated.
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273102#comment-15273102 ] Hitesh Shah commented on TEZ-2741: -- [~gopalv] is this ready to be reviewed?
[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes
[ https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712130#comment-14712130 ] Rajat Jain commented on TEZ-2741: - Thanks, Gopal. I'll verify this patch today and let you know.