[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-7950:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to trunk.

StorageHandler resources aren't added to Tez Session if already Session is already Open
---
Key: HIVE-7950
URL: https://issues.apache.org/jira/browse/HIVE-7950
Project: Hive
Issue Type: Bug
Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0
Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, HIVE-7950.4.patch, HIVE-7950.5.patch, hive-7950-tez-WIP.diff

Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes that were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Attachment: HIVE-7950.5.patch

Fixed your nit, Sergey. Thanks for taking the time to review -- much appreciated.
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Attachment: HIVE-7950.4.patch

Updated patch with feedback from Sergey on RB.
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Attachment: HIVE-7950.3.patch

Updated patch. Needed to make a small change after getting past the Tez bug. Added some more unit tests and tried to clean things up.
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Attachment: hive-7950-tez-WIP.diff

I took a look at the tez branch to see if I could add more resources to an existing session as you described, [~sershe]. Looking at the javadoc, I feel like this patch should work, but the query still errors out when the map inside the dag fails due to missing classes.

I can see that the dag does get the extra jars localized:

{noformat}
2014-09-08 23:20:34,823 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Added additional resources : [[file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-fate-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-core-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-trace-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-start-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/zookeeper-3.4.6.jar]] to classpath
{noformat}

But I'm still getting a NoClassDefFoundException on a class which is in accumulo-core.jar:

{noformat}
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:183)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:384)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:281)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:73)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
    ... 12 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Attachment: HIVE-7950.2.patch

Patch against the tez branch which attempts to ensure that the Tez AM and the DAG both have the necessary extra local resources as required by a StorageHandler. Tried to add some tests which ensure modifications to TezTask work as expected.
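
For readers following the thread, the idea of topping up an already-open session with whatever resources a query still needs can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the attached patch: `SessionResourceSketch`, `parseJarList`, `missing` and `ensureResources` are hypothetical names, no Hive or Tez API is used, the jar names are examples, and the jar list is assumed to be a comma-separated value in the style of tmpjars.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical model of an open session's resource bookkeeping.
 * This is NOT Hive or Tez API; it only illustrates the set difference
 * between what a query requires and what the session already has.
 */
public class SessionResourceSketch {
    // Jars the (modelled) open session has already localized.
    private final Set<String> localized = new LinkedHashSet<>();

    public SessionResourceSketch(String... initial) {
        localized.addAll(Arrays.asList(initial));
    }

    /** Splits a comma-separated jar list (assumed tmpjars-style value). */
    public static Set<String> parseJarList(String value) {
        Set<String> jars = new LinkedHashSet<>();
        if (value == null) {
            return jars;
        }
        for (String part : value.split(",")) {
            String jar = part.trim();
            if (!jar.isEmpty()) {
                jars.add(jar);
            }
        }
        return jars;
    }

    /** Returns the required jars that the session has not localized yet. */
    public Set<String> missing(Set<String> required) {
        Set<String> result = new LinkedHashSet<>(required);
        result.removeAll(localized);
        return result;
    }

    /** Records the missing jars as localized and returns what was added. */
    public Set<String> ensureResources(Set<String> required) {
        Set<String> added = missing(required);
        localized.addAll(added);
        return added;
    }

    public static void main(String[] args) {
        SessionResourceSketch session = new SessionResourceSketch("hive-exec.jar");
        Set<String> required =
            parseJarList("hive-exec.jar, accumulo-core-1.6.0.jar,zookeeper-3.4.6.jar");
        // First query against the open session: two jars still need to be added.
        System.out.println(session.ensureResources(required));
        // Same requirements again: nothing left to add.
        System.out.println(session.ensureResources(required));
    }
}
```

The point of the sketch is that the check runs on every query rather than only when the session is first opened, which is the case the issue description says was being missed.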
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Summary: StorageHandler resources aren't added to Tez Session if already Session is already Open (was: AccumuloStorageHandler doesn't work with Hive on Tez)

StorageHandler resources aren't added to Tez Session if already Session is already Open
---
Key: HIVE-7950
URL: https://issues.apache.org/jira/browse/HIVE-7950
Project: Hive
Issue Type: Bug
Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0

Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that I've noticed already (probably more as I can get past the ones I already found):

* Jars added to the classpath via tmpjars (which is done by the copied HBase Utils class) aren't available in the Tez Map task -- need to compare to HBaseStorageHandler and see if there is something magic happening
* Configuration generated by the AccumuloStorageHandler doesn't make it all the way to the Configuration passed to the AccumuloOutputFormat (probably AccumuloInputFormat, too)

{noformat}
2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448)
    ... 23 more
Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Description:
Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes that were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added.

was:
Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that I've noticed already (probably more as I can get past the ones I already found):

* Jars added to the classpath via tmpjars (which is done by the copied HBase Utils class) aren't available in the Tez Map task -- need to compare to HBaseStorageHandler and see if there is something magic happening
* Configuration generated by the AccumuloStorageHandler doesn't make it all the way to the Configuration passed to the AccumuloOutputFormat (probably AccumuloInputFormat, too)

{noformat}
2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448)
    ... 23 more
Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:553)
    at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.getHiveRecordWriter(HivePassThroughOutputFormat.java:113)
    at
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HIVE-7950:
---
Attachment: HIVE-7950-1.diff