[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-09-29 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534371#comment-15534371
 ] 

Rajesh Balamohan commented on TEZ-2741:
---

Key/value references cannot be updated higher up in the chain. An alternative 
is to use {{--hiveconf hive.compute.splits.in.am=false}} as a workaround for 
this issue.

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-09-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526864#comment-15526864
 ] 

Hitesh Shah commented on TEZ-2741:
--

[~rajesh.balamohan] any updates on this? 

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-09-01 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456304#comment-15456304
 ] 

Rajesh Balamohan commented on TEZ-2741:
---

Thanks for reverting the patch [~hitesh]. The issue is that the value being 
reset is local to the method, while higher-level apps (e.g. 
TestGroupedSplits.testFormat) may still be holding and using the earlier 
object, which is different from the reset one. Will check on fixing it.
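
To illustrate the reference aliasing described above, here is a minimal, self-contained sketch (hypothetical class names, not the actual Tez reader or the reverted patch) showing why swapping the key object inside the reader is invisible to a caller that already captured the old object via {{createKey()}}:

{code}
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

// Hypothetical stand-in for a grouped-splits record reader; not Tez code.
public class KeyReuseSketch {
  static class FakeGroupedReader {
    private Writable key = new LongWritable();   // first split has LongWritable keys

    Writable createKey() {
      return key;                                // caller captures this reference once
    }

    void moveToNextSplit() {
      key = new BytesWritable();                 // reassignment is invisible to the caller
    }

    boolean next(Writable callerKey) {
      // The underlying SequenceFile reader checks the class of the object it
      // is asked to fill, so a stale LongWritable fails once the current
      // split's keys are BytesWritable -- the "wrong key class" error above.
      return callerKey.getClass() == key.getClass();
    }
  }

  public static void main(String[] args) {
    FakeGroupedReader reader = new FakeGroupedReader();
    Writable callerKey = reader.createKey();     // caller (e.g. a test) holds a LongWritable
    reader.moveToNextSplit();                    // reader now expects BytesWritable
    System.out.println("key classes still match: " + reader.next(callerKey)); // false
  }
}
{code}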

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-08-30 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450463#comment-15450463
 ] 

Hitesh Shah commented on TEZ-2741:
--

[~rajesh.balamohan] [~gopalv] It seems this commit broke a unit test, 
TestGroupedSplits#testFormat to be specific. Any chance either of you can 
look at it soon, or should I revert the patch until it can be looked at?

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Fix For: 0.9.0
>
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-08-17 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425645#comment-15425645
 ] 

Rajesh Balamohan commented on TEZ-2741:
---

I haven't been able to test it with Pig+Hive in this case.

Patch LGTM; +1. It creates the key/value objects afresh when moving to the next split.
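
For readers following along, a rough sketch of the approach described above ("creates KV when moving to next split") might look like the following; the class and helper names are assumptions, not the verbatim TEZ-2741 patch:

{code}
import java.io.IOException;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.RecordReader;

// Sketch only: re-create the key/value objects whenever the grouped reader
// advances to the next wrapped split, since that split's file may have been
// written with a different key class.
class GroupedSplitsReaderSketch<K, V> {
  private final InputSplit[] splits;          // the wrapped splits of one group
  private RecordReader<K, V> current;         // reader over splits[index]
  private int index = 0;
  private K key;
  private V value;

  GroupedSplitsReaderSketch(InputSplit[] splits, RecordReader<K, V> first) {
    this.splits = splits;
    this.current = first;
    this.key = current.createKey();           // objects valid for the first split
    this.value = current.createValue();
  }

  boolean nextKeyValue() throws IOException {
    while (!current.next(key, value)) {
      current.close();
      if (++index >= splits.length) {
        return false;                         // no more wrapped splits
      }
      current = openReaderFor(splits[index]); // hypothetical helper
      key = current.createKey();              // re-create KV for the new split;
      value = current.createValue();          // its key class may differ
    }
    return true;
  }

  K getCurrentKey()   { return key; }         // callers re-fetch after each advance
  V getCurrentValue() { return value; }

  private RecordReader<K, V> openReaderFor(InputSplit split) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}

With this shape, callers have to re-fetch the key/value after each advance instead of holding on to the pair obtained for the first split, which is the object-reuse concern raised elsewhere in this thread.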

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-08-16 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423535#comment-15423535
 ] 

TezQA commented on TEZ-2741:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12752341/TEZ-2741.1.patch
  against master revision d3fd828.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.hadoop.mapred.split.TestGroupedSplits

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1916//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1916//console

This message is automatically generated.

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-08-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423405#comment-15423405
 ] 

Hitesh Shah commented on TEZ-2741:
--

[~rajesh.balamohan] any feedback on the patch? 

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-05-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303501#comment-15303501
 ] 

Gopal V commented on TEZ-2741:
--

[~rajesh.balamohan]: the reported bug cannot be reproduced using Hive alone.

The bug scenario requires Pig-written SequenceFiles in the same directory as a 
Hive-written one (specifically, the Pig data was not generated via HCatalog).
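
To check whether a table directory is in that state, one can inspect the key/value classes recorded in each SequenceFile header. The small utility below is only a hedged illustration (the path is the example directory from this issue; it is not part of Tez or the patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileKeyClassCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path(args.length > 0 ? args[0] : "/user/hive/foo");
    FileSystem fs = dir.getFileSystem(conf);
    // Walk the partition directories and print each file's header classes.
    for (FileStatus partition : fs.listStatus(dir)) {
      for (FileStatus file : partition.isDirectory()
          ? fs.listStatus(partition.getPath())
          : new FileStatus[] { partition }) {
        printHeader(conf, file.getPath());
      }
    }
  }

  private static void printHeader(Configuration conf, Path file) {
    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
      // A LongWritable/BytesWritable mix across files is the bug scenario.
      System.out.println(file + " key=" + reader.getKeyClassName()
          + " value=" + reader.getValueClassName());
    } catch (Exception e) {
      System.out.println(file + " is not a readable SequenceFile: " + e.getMessage());
    }
  }
}
{code}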



> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-05-11 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281195#comment-15281195
 ] 

Rajesh Balamohan commented on TEZ-2741:
---

Can you please provide details on which version of Hive/Tez is hitting this 
issue? I tried it on Tez master + Hive 2.1.0-SNAPSHOT and do not see the problem.

For example, based on the txt file attached to this JIRA, I tried creating 
additional custom sequence files (in p=2 and p=3):
{noformat}
hive> select * from foo where p=1;
OK
fdaljf;lajdfla;jfl;a1
fdahflkadjf;lajdf   1
afdlja;fj;a 1
fa;ldajf;ja;dfa 1
j;fa;djf;lajf;af;lajfl;a1
afl;kdf;lajf;lajf;lajdlk;fjadl;fjal;jfal;kjfa;ldjfa;ljfa1
1
Time taken: 0.146 seconds, Fetched: 7 row(s)

hive> msck repair table foo;
OK
Time taken: 0.097 seconds

hive> select * from foo where p=2;
OK
test_0  2
test_1  2
test_2  2
test_3  2
test_4  2
test_5  2
test_6  2
test_7  2
test_8  2
test_9  2
hive> select * from foo where p=3;
OK
test_0  3
test_1  3
test_2  3
test_3  3
test_4  3
test_5  3
test_6  3
test_7  3
test_8  3
test_9  3
{noformat}



> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-05-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278030#comment-15278030
 ] 

Gopal V commented on TEZ-2741:
--

Yes, this is ready to be reviewed.

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-05-05 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273350#comment-15273350
 ] 

TezQA commented on TEZ-2741:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12752341/TEZ-2741.1.patch
  against master revision c3b8b85.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.hadoop.mapred.split.TestGroupedSplits
  org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1699//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1699//console

This message is automatically generated.

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2016-05-05 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273102#comment-15273102
 ] 

Hitesh Shah commented on TEZ-2741:
--

[~gopalv] is this ready to be reviewed? 

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key 

[jira] [Commented] (TEZ-2741) Hive on Tez does not work well with Sequence Files Schema changes

2015-08-25 Thread Rajat Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712130#comment-14712130
 ] 

Rajat Jain commented on TEZ-2741:
-

Thanks, Gopal. I'll verify this patch today and let you know.

> Hive on Tez does not work well with Sequence Files Schema changes
> -
>
> Key: TEZ-2741
> URL: https://issues.apache.org/jira/browse/TEZ-2741
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajat Jain
>Assignee: Gopal V
> Attachments: TEZ-2741.1.patch, garbled_text
>
>
> {code}
> hive> create external table foo (a string) partitioned by (p string) stored 
> as sequencefile location 'hdfs:///user/hive/foo'
> # A useless file with some text in hdfs
> hive> create external table tmp_foo (a string) location 
> 'hdfs:///tmp/random_data'
> hive> insert overwrite table foo partition (p = '1') select * from tmp_foo
> {code}
> After this step, {{foo}} contains one partition with a text file.
> Now use this Java program to generate the second sequence file (but with a 
> different key class)
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
> import java.io.IOException;
> public class SequenceFileWriter {
>   public static void main(String[] args) throws IOException,
>   InterruptedException, ClassNotFoundException {
> Configuration conf = new Configuration();
> Job job = new Job(conf);
> job.setJobName("Convert Text");
> job.setJarByClass(Mapper.class);
> job.setMapperClass(Mapper.class);
> job.setReducerClass(Reducer.class);
> // increase if you need sorting or a special number of files
> job.setNumReduceTasks(0);
> job.setOutputKeyClass(LongWritable.class);
> job.setOutputValueClass(Text.class);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> job.setInputFormatClass(TextInputFormat.class);
> TextInputFormat.addInputPath(job, new Path("/tmp/random_data"));
> SequenceFileOutputFormat.setOutputPath(job, new 
> Path("/user/hive/foo/p=2/"));
> // submit and wait for completion
> job.waitForCompletion(true);
>   }
> }
> {code}
> Now run {{select count(*) from foo;}}. It passes with MapReduce, but fails 
> with Tez with the following error:
> {code}
> hive> set hive.execution.engine=tez;
> hive> select count(*) from foo;
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1438013895843_0007_1_00, 
> diagnostics=[Task failed, taskId=task_1438013895843_0007_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class 
> org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: While processing file 
> hdfs://localhost:9000/user/hive/foo/p=2/part-m-0. wrong key class: 
> org.apache.hadoop.io.BytesWritable is not class