[jira] [Created] (HIVE-17083) DagUtils overwrites any credentials already added
Josh Elser created HIVE-17083: Summary: DagUtils overwrites any credentials already added Key: HIVE-17083 URL: https://issues.apache.org/jira/browse/HIVE-17083 Project: Hive Issue Type: Bug Components: Tez Reporter: Josh Elser Assignee: Josh Elser While working with a StorageHandler with hive.execution.engine=tez, I found that the credentials the storage handler was adding were not propagating to the DAG. After a bit of debugging/git-log, I found that DagUtils was overwriting the credentials which were already set. A quick local patch seems to make things work again. Will put together a quick unit test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
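To illustrate the difference between overwriting and merging credentials, here is a minimal sketch. It is not the HIVE-17083 patch and does not use Hive, Tez, or Hadoop classes: `CredentialStore`, `overwriteWith`, and `mergeFrom` are hypothetical names invented for this example, and a plain map stands in for the real credential container.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a job's credential store; not a Hive, Tez, or Hadoop class. */
final class CredentialStore {
    private final Map<String, String> tokens = new LinkedHashMap<>();

    void put(String alias, String token) {
        tokens.put(alias, token);
    }

    String get(String alias) {
        return tokens.get(alias);
    }

    /** Overwriting pattern: everything registered earlier is discarded. */
    void overwriteWith(CredentialStore other) {
        tokens.clear();
        tokens.putAll(other.tokens);
    }

    /** Merging pattern: existing entries are kept, only missing aliases are added. */
    void mergeFrom(CredentialStore other) {
        other.tokens.forEach(tokens::putIfAbsent);
    }

    public static void main(String[] args) {
        // Credentials a storage handler might have added before the DAG is built.
        CredentialStore dag = new CredentialStore();
        dag.put("accumulo", "token-from-storage-handler");

        // Credentials contributed later by another component.
        CredentialStore session = new CredentialStore();
        session.put("hdfs", "token-from-session");

        dag.mergeFrom(session);
        System.out.println("accumulo token kept: " + (dag.get("accumulo") != null));
        System.out.println("hdfs token added: " + (dag.get("hdfs") != null));
    }
}
```

With `overwriteWith` in place of `mergeFrom`, the storage handler's entry would be lost, which matches the symptom described in the report.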
[jira] [Created] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
Josh Elser created HIVE-16973: Summary: Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2 Key: HIVE-16973 URL: https://issues.apache.org/jira/browse/HIVE-16973 Project: Hive Issue Type: Bug Components: Accumulo Storage Handler Reporter: Josh Elser Assignee: Josh Elser Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. Looking into it, it seems like the bit-rot got pretty bad. You'll see something like the following: {noformat} Caused by: java.io.IOException: Failed to unwrap AuthenticationToken at org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) {noformat} It appears that some of the code paths changed since I first did my testing (or I just did poor testing) and the delegation token was never being fetched/serialized. There are also some issues with fetching the delegation token from Accumulo properly, which were addressed in ACCUMULO-4665. I believe it would also be best to just update the dependency to use Accumulo 1.7 (drop 1.6 support) as 1.6 is lacking in this regard. These changes would otherwise get much more complicated with reflection. Accumulo has moved on past 1.6, so let's do the same in Hive.
[jira] [Created] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler
Josh Elser created HIVE-11755: - Summary: Incorrect method called with Kerberos enabled in AccumuloStorageHandler Key: HIVE-11755 URL: https://issues.apache.org/jira/browse/HIVE-11755 Project: Hive Issue Type: Bug Affects Versions: 1.2.1 Reporter: Josh Elser Assignee: Josh Elser Fix For: 1.2.2 The following exception was noticed in testing out the AccumuloStorageHandler's OutputFormat: {noformat} java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can only be set once per job at org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146) at org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125) at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95) at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51) at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124) at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Job 
Submission failed with exception 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can only be set once per job)' {noformat} The OutputFormat implementation already had a method in place to account for this exception but the method accidentally wasn't getting called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
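The "can only be set once per job" failure above can be illustrated with a small sketch. This is not the Accumulo or Hive code: `SetOnceConfigurator`, its methods, and the `connector.principal` key are hypothetical, and a plain map stands in for the job configuration. It only shows the general shape of a guard that tolerates a repeated call instead of letting the exception abort job submission.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a configurator that refuses to be set twice. */
final class SetOnceConfigurator {
    static final String KEY = "connector.principal"; // made-up key for this sketch

    /** Strict setter: mirrors the "set once per job" behaviour by throwing on a second call. */
    static void setConnectorInfo(Map<String, String> conf, String principal) {
        if (conf.containsKey(KEY)) {
            throw new IllegalStateException("Connector info can only be set once per job");
        }
        conf.put(KEY, principal);
    }

    /** Tolerant wrapper: returns false when the info was already set instead of failing. */
    static boolean setConnectorInfoIfAbsent(Map<String, String> conf, String principal) {
        try {
            setConnectorInfo(conf, principal);
            return true;
        } catch (IllegalStateException alreadySet) {
            // The job was configured earlier in this process; keep the first value.
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("first call applied: " + setConnectorInfoIfAbsent(conf, "hive"));
        System.out.println("second call applied: " + setConnectorInfoIfAbsent(conf, "hive"));
    }
}
```

Per the report, a method of this kind already existed in the OutputFormat implementation; the bug was that the strict path was being called directly.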
[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing
[ https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271529#comment-14271529 ] Josh Elser commented on HIVE-8931: bq. Yes the HMS has code which depends specifically on the 0.9.2 version of thrift... I meant I'm assuming that the QTests themselves are exercising the metastore in such a way that the thrift dependency is directly needed (and not doing some mock thing). Test TestAccumuloCliDriver is not completing Key: HIVE-8931 URL: https://issues.apache.org/jira/browse/HIVE-8931 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Josh Elser Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing. Logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/
[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing
[ https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270685#comment-14270685 ] Josh Elser commented on HIVE-8931: Getting back to this, I'm a little stuck here. Backing up, {{hive-metastore}} is bringing in libthrift-0.9.2 which is breaking things. The qtests ultimately pull from $CLASSPATH to start the Accumulo minicluster (which includes stuff from HIVE_HADOOP_TEST_CLASSPATH), which ultimately comes back to the maven test classpath. Without getting libthrift-0.9.1 somehow on the maven classpath, I don't know where the libthrift-0.9.1.jar even exists in the local m2 repository (and thus can't do any trickery to substitute it in place of the libthrift-0.9.2 dependency). My assumption is that excluding libthrift from the hive-metastore dependency will break the other qtests (but that is only a guess). Assuming I can't exclude libthrift from hive-metastore, I'm not sure what I could even do at this point aside from introducing a new maven module specifically for the Accumulo qtests (which would give me carte blanche over the classpath). [~brocknoland], any ideas?
[jira] [Commented] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7
[ https://issues.apache.org/jira/browse/HIVE-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246017#comment-14246017 ] Josh Elser commented on HIVE-9082: Failure appears to be unrelated. Update Accumulo storage handler to build against Accumulo 1.7 Key: HIVE-9082 URL: https://issues.apache.org/jira/browse/HIVE-9082 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.15.0 Attachments: HIVE-9082.1.patch Currently, trunk doesn't compile against Accumulo 1.7.0-SNAPSHOT which is the current tip of Accumulo. 1.7.0 includes some API removals over the 1.5.x support we currently have, so we need to make some updates to the storage handler to get compilation/etc working again.
[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing
[ https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246304#comment-14246304 ] Josh Elser commented on HIVE-8931: HIVE-8829, update to Thrift 0.9.2, is what broke these tests. Accumulo expects to function with Thrift 0.9.1 and the tests just throw everything and their brother on the classpath. I'll have to see if I can add some trickery to the test driver to keep the extra dependencies from being added.
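One possible form of the "trickery" mentioned in this thread is to filter the classpath string before it is handed to the minicluster. The sketch below is only an illustration under that assumption; it is not what the Hive test driver does, and `ClasspathFilter` and `withoutJar` are hypothetical names. It drops every classpath entry whose file name starts with a given prefix.

```java
import java.io.File;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper that removes unwanted jars from a classpath string. */
final class ClasspathFilter {
    /** Returns the classpath without entries whose file name starts with jarPrefix. */
    static String withoutJar(String classpath, String jarPrefix, String separator) {
        return Arrays.stream(classpath.split(Pattern.quote(separator)))
                .filter(entry -> !new File(entry).getName().startsWith(jarPrefix))
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        // Filter the classpath of the running JVM as a quick demonstration.
        String current = System.getProperty("java.class.path");
        String filtered = withoutJar(current, "libthrift-0.9.2", File.pathSeparator);
        System.out.println(filtered);
    }
}
```

This only removes the conflicting jar; as noted in the thread, the expected libthrift-0.9.1 jar would still have to be located and added by some other means.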
[jira] [Updated] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7
[ https://issues.apache.org/jira/browse/HIVE-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-9082: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7
[ https://issues.apache.org/jira/browse/HIVE-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-9082: Attachment: HIVE-9082.1.patch Changes reference to a class in the accumulo-trace jar which was removed in 1.7.0 to one that exists across all versions. The reference to the class is used to pull in the jar to libjars. There are some other fixes which need to happen to solve Accumulo 1.7.0-SNAPSHOT compilation, but those can and should all be addressed in Accumulo (and not pushed down onto Hive).
[jira] [Created] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7
Josh Elser created HIVE-9082: Summary: Update Accumulo storage handler to build against Accumulo 1.7 Key: HIVE-9082 URL: https://issues.apache.org/jira/browse/HIVE-9082 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.15.0 Currently, trunk doesn't compile against Accumulo 1.7.0-SNAPSHOT which is the current tip of Accumulo. 1.7.0 includes some API removals over the 1.5.x support we currently have, so we need to make some updates to the storage handler to get compilation/etc working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing
[ https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220279#comment-14220279 ] Josh Elser commented on HIVE-8931: Thanks for pointing it out, [~brocknoland]. I'll try to take a look.
[jira] [Assigned] (HIVE-8931) Test TestAccumuloCliDriver is not completing
[ https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser reassigned HIVE-8931: Assignee: Josh Elser
[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing
[ https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220293#comment-14220293 ] Josh Elser commented on HIVE-8931: Btw, any idea when this test started timing out? That would be super helpful to bisect things (assuming it was at some point passing; it was for me when I wrote it, anyway).
[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats
[ https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207183#comment-14207183 ] Josh Elser commented on HIVE-8808: Thanks for looping me in, [~sushanth]. As far as I can recall, Accumulo's InputFormat classes are stateless, relying on the state to be provided through the JobConf/InputSplits as you described. I know we have some annoyances where multiple calls to the InputFormat which alter the JobConf are not idempotent (they typically throw an error if things are re-set). I work around most of that pain in the StorageHandler impl. Nothing is coming to mind that would be fundamentally broken if we get a re-used instance of the input format. Happy to help test/evaluate this too. HiveInputFormat caching cannot work with all input formats Key: HIVE-8808 URL: https://issues.apache.org/jira/browse/HIVE-8808 Project: Hive Issue Type: Bug Reporter: Brock Noland In {{HiveInputFormat}} we implement instance caching (see {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are stateless but I don't think this assumption is true, especially with regards to HBase.
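Why instance caching depends on statelessness can be shown with a short sketch. This is not Hive's {{getInputFormatFromCache}}: `FormatCache` and `StatefulFormat` are hypothetical names, and the cache is simply keyed by class. Any state left on the cached instance by one caller is visible to the next.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical per-class instance cache, similar in spirit to caching InputFormat instances. */
final class FormatCache {
    private static final Map<Class<?>, Object> CACHE = new ConcurrentHashMap<>();

    /** Returns the cached instance for the type, creating it on first use. */
    static <T> T get(Class<T> type, Supplier<T> factory) {
        return type.cast(CACHE.computeIfAbsent(type, key -> factory.get()));
    }

    /** A format that remembers the table it was configured for, so sharing it is unsafe. */
    static final class StatefulFormat {
        String table;
    }

    public static void main(String[] args) {
        StatefulFormat first = get(StatefulFormat.class, StatefulFormat::new);
        first.table = "table_for_query_one";

        // A second query in the same process receives the same instance, state included.
        StatefulFormat second = get(StatefulFormat.class, StatefulFormat::new);
        System.out.println("same instance: " + (first == second));
        System.out.println("leaked state: " + second.table);
    }
}
```

A format that reads everything it needs from the JobConf and InputSplits on each call, as described for Accumulo above, would not be affected by this kind of sharing.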
[jira] [Commented] (HIVE-8704) HivePassThroughOutputFormat cannot proxy more than one kind of OF (in one process)
[ https://issues.apache.org/jira/browse/HIVE-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194147#comment-14194147 ] Josh Elser commented on HIVE-8704: Nice find, [~sushanth]! Thanks for getting to the bottom of this. HivePassThroughOutputFormat cannot proxy more than one kind of OF (in one process) Key: HIVE-8704 URL: https://issues.apache.org/jira/browse/HIVE-8704 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-8704.patch HivePassThroughOutputFormat is a wrapper HiveOutputFormat used by Hive to allow access to StorageHandlers that use mapred OutputFormats as their primary implementation point, and do not implement HiveOutputFormat. However, HivePassThroughOutputFormat (henceforth called PTOF) has one major bug: it tracks the underlying OutputFormat that it is proxying by means of a static string in HiveFileFormatUtils. There are a few problems with this. a) For starters, it means that a given process can only process one PTOF-based output format. So, in the case of an HS2 instance, one thread attempting to start a job based on HBase and another on Accumulo will cause a problem: each will overwrite the other's real output format. 
This leads to bugs where a person trying to use a hbase table gets stack traces from Accumulo like the following: {noformat} ERROR exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(Expected Accumulo table name to be provided in job configuration)' java.lang.NullPointerException: Expected Accumulo table name to be provided in job configuration at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204) at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:61) at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:43) at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:87) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1071) at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:465) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1291) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1291) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1603) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1363) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1176) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1003) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:998) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144) at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69) at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at
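The static-string problem described in HIVE-8704 can be reduced to a few lines. The sketch below is an illustration only: `PassThroughSketch`, its fields, and the `sketch.passthru.real.outputformat` key are hypothetical, a plain map stands in for a JobConf, and keeping the name in per-job configuration is one possible alternative, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reduction of "one static string" versus per-job configuration. */
final class PassThroughSketch {
    // Process-wide state: the last writer wins for every job in the process.
    static String sharedRealOutputFormat;

    // Made-up configuration key used only in this sketch.
    static final String REAL_OF_KEY = "sketch.passthru.real.outputformat";

    static void configureShared(String realOutputFormat) {
        sharedRealOutputFormat = realOutputFormat;
    }

    static void configurePerJob(Map<String, String> jobConf, String realOutputFormat) {
        jobConf.put(REAL_OF_KEY, realOutputFormat);
    }

    static String resolvePerJob(Map<String, String> jobConf) {
        return jobConf.get(REAL_OF_KEY);
    }

    public static void main(String[] args) {
        // Two jobs in one process, for example two HS2 sessions.
        configureShared("HBaseOutputFormat");
        configureShared("AccumuloOutputFormat");
        System.out.println("shared value seen by the HBase job: " + sharedRealOutputFormat);

        Map<String, String> hbaseJob = new HashMap<>();
        Map<String, String> accumuloJob = new HashMap<>();
        configurePerJob(hbaseJob, "HBaseOutputFormat");
        configurePerJob(accumuloJob, "AccumuloOutputFormat");
        System.out.println("per-job value seen by the HBase job: " + resolvePerJob(hbaseJob));
    }
}
```

With the shared field, the HBase job resolves to the Accumulo output format, which is how an HBase user can end up with an Accumulo stack trace like the one above.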
[jira] [Assigned] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser reassigned HIVE-8363: Assignee: Josh Elser AccumuloStorageHandler compile failure hadoop-1 Key: HIVE-8363 URL: https://issues.apache.org/jira/browse/HIVE-8363 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.14.0 Reporter: Szehon Ho Assignee: Josh Elser Priority: Blocker There's an error about AccumuloStorageHandler compiling on hadoop-1. It seems the signature of split() is not the same. Looks like we should use another utility class to fix this. {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-accumulo-handler: Compilation failure [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/columns/ColumnMapper.java:[57,52] no suitable method found for split(java.lang.String,char) [ERROR] method org.apache.hadoop.util.StringUtils.split(java.lang.String,char,char) is not applicable {code}
[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160903#comment-14160903 ] Josh Elser commented on HIVE-7068: [~szehon], yeah, I can get a patch up there today. Integrate AccumuloStorageHandler Key: HIVE-7068 URL: https://issues.apache.org/jira/browse/HIVE-7068 Project: Hive Issue Type: New Feature Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch, HIVE-7068.4.patch [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to HBase. Some [initial work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done to support querying an Accumulo table using Hive already. It is not a complete solution as, most notably, the current implementation presently lacks support for INSERTs. I would like to polish up the AccumuloStorageHandler (presently based on 0.10), implement missing basic functionality and compare it to the HBaseStorageHandler (to ensure that we follow the same general usage patterns). I've also been in communication with [~bfem] (the initial author) who expressed interest in working on this again. I hope to coordinate efforts with him.
[jira] [Commented] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161329#comment-14161329 ] Josh Elser commented on HIVE-8363: I was confused as to how this was introduced. My guess is that HIVE-8257 correctly broke this. We were inadvertently using a Hadoop 2 method even when Hadoop 1 was specified.
[jira] [Updated] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-8363: Attachment: HIVE-8363.1.patch Patch switches from Hadoop's StringUtils to commons-lang's. We already had a dependency on commons-lang, and a 3-line fix is much better than introducing more shim code.
[jira] [Updated] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-8363: Fix Version/s: 0.14.0 Affects Version/s: (was: 0.14.0) Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-7789) Documentation for AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158075#comment-14158075 ] Josh Elser commented on HIVE-7789: Thanks, [~leftylev]! That's great. Documentation for AccumuloStorageHandler Key: HIVE-7789 URL: https://issues.apache.org/jira/browse/HIVE-7789 Project: Hive Issue Type: Task Components: Documentation Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation on its usage.
[jira] [Resolved] (HIVE-7789) Documentation for AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser resolved HIVE-7789. Resolution: Fixed Got a first-round of documentation up at https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration that I'm fairly happy with.
[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152661#comment-14152661 ] Josh Elser commented on HIVE-8257: bq. Could you add the optional tag to the jar: Yeah, I can do that. bq. Do you need the changes in the main pom.xml? The dependencyManagement section of the project pom is the proper place to declare the version. The way the two hadoop profiles are configured confuses that a little bit, but it is still the proper way to do so. If anything, I think the extra versions in the accumulo-handler/pom.xml are unnecessary, but I kept them there to follow suit with the other modules. Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat}
[jira] [Updated] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-8257: - Attachment: HIVE-8257.2.patch Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch, HIVE-8257.2.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152674#comment-14152674 ] Josh Elser commented on HIVE-8257: -- Thanks, [~vikram.dixit]! Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch, HIVE-8257.2.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152673#comment-14152673 ] Josh Elser commented on HIVE-8257: -- v2 patch attached with {{optional}} added to hadoop-client dependency. Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch, HIVE-8257.2.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-7789) Documentation for AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-7789 started by Josh Elser. Documentation for AccumuloStorageHandler Key: HIVE-7789 URL: https://issues.apache.org/jira/browse/HIVE-7789 Project: Hive Issue Type: Task Components: Documentation Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation on its usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7789) Documentation for AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152799#comment-14152799 ] Josh Elser commented on HIVE-7789: -- Started working on this at https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration Documentation for AccumuloStorageHandler Key: HIVE-7789 URL: https://issues.apache.org/jira/browse/HIVE-7789 Project: Hive Issue Type: Task Components: Documentation Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation on its usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151247#comment-14151247 ] Josh Elser commented on HIVE-8257: -- (bump [~vikram.dixit]) Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8257) Accumulo introduces old hadoop-client dependency
Josh Elser created HIVE-8257: Summary: Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-8257: - Attachment: HIVE-8257.1.patch Adds hadoop-client to dependencyManagement in parent pom and dependencies in the accumulo-storage pom. Verified that no old hadoop artifacts are in {{mvn dependency:tree}} and dist tarball no longer has old jars included in {{lib}}. Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8257) Accumulo introduces old hadoop-client dependency
[ https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-8257: - Status: Patch Available (was: Open) Accumulo introduces old hadoop-client dependency Key: HIVE-8257 URL: https://issues.apache.org/jira/browse/HIVE-8257 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Josh Elser Assignee: Josh Elser Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8257.1.patch It was brought to my attention that Accumulo is transitively bringing in some artifacts with the wrong version of Hadoop. Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get its necessary dependencies. Because there is no dependency with the correct version in Hive, this introduces hadoop-2.2.0 dependencies. A solution is to make sure that hadoop-client is set with the correct {{hadoop-20S.version}} or {{hadoop-23.version}}. Snippet from {{mvn dependency:tree -Phadoop-2}} {noformat} [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-accumulo-handler --- [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT [INFO] +- commons-lang:commons-lang:jar:2.6:compile [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile ... [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144194#comment-14144194 ] Josh Elser commented on HIVE-7950: -- Thanks for your help, [~sershe]! StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, HIVE-7950.4.patch, HIVE-7950.5.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: HIVE-7950.5.patch Fixed your nit, Sergey. Thanks for taking the time to review -- much appreciated. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, HIVE-7950.4.patch, HIVE-7950.5.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: HIVE-7950.4.patch Updated patch with feedback from Sergey on RB. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, HIVE-7950.4.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137640#comment-14137640 ] Josh Elser commented on HIVE-7950: -- Sure thing. RB is linked. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138144#comment-14138144 ] Josh Elser commented on HIVE-7984: -- Thanks, Sushanth -- much appreciated. AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez --- Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch, HIVE-7984.1.patch Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside of the Tez container. I'm guessing that there is a disconnect from the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136784#comment-14136784 ] Josh Elser commented on HIVE-7950: -- Test failures appear unrelated to me. Can anyone give this a review for me? StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136787#comment-14136787 ] Josh Elser commented on HIVE-7984: -- Test failure appears unrelated to me. Can anyone give this a review? It's a rather straightforward change. AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez --- Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch, HIVE-7984.1.patch Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside of the Tez container. I'm guessing that there is a disconnect from the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7984: - Attachment: HIVE-7984-1.patch Same changes, but named the original attachment wrong. Fixing suffix to trigger HIVE-QA AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez --- Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside of the Tez container. I'm guessing that there is a disconnect from the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: HIVE-7950.3.patch Updated patch. Needed to make a small change after getting past the Tez bug. Added some more unit tests and tried to clean things up. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: hive-7950-tez-WIP.diff I took a look at the tez branch to see if I could add more resources to an existing session as you described, [~sershe]. Looking at the javadoc, I feel like this patch should work, but the query still errors out when the map inside the dag fails due to missing classes. I can see that the dag does get the extra jars localized: {noformat} 2014-09-08 23:20:34,823 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Added additional resources : [[file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-fate-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-core-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-trace-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-start-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/zookeeper-3.4.6.jar]] to classpath {noformat} But I'm still getting a NoClassDefFoundException on a class which is in accumulo-core.jar: {noformat} Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:183) at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:384) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:281) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:73) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) ... 
12 more Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127407#comment-14127407 ] Josh Elser commented on HIVE-7950: -- Ok, I figured a bit more out here. I believe that the AM *is* correctly getting the extra jars from the storage handler as expected. The subsequent errors are coming from the containers that are started to actually run the DAG (rather than the coordination from the tez AM). The interesting part is that the patch (HIVE-7950-1.diff) which starts a brand new Session will result in a successful query. It seems like maybe Tez isn't passing the extra resources we added to the running session (AM) in Hive along to the DAG containers that actually run the query. I have no idea at this point if this is a problem in how hive is using tez or if it's a bug in tez itself... StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127440#comment-14127440 ] Josh Elser commented on HIVE-7950: -- Sure thing, [~gopalv]. I don't actually have to do any extra {{ADD JAR}} commands. The AccumuloStorageHandler constructs a list of jars that need to be passed along to the execution engine (via tmpjars in the Hadoop configuration). With the 'yarn' execution.engine, this works just fine -- the resources are localized and added to the Map/Reduce containers and things are great. When I try to run with 'tez', there are a few issues. The first is that, if a TezSessionState is already open (e.g. as happens when I just open the hive shell), it will have been started without those extra 'tmpjars' resources from the StorageHandler and the query will fail because we need those jars. Sergey mentioned that Tez 0.5.0 had a new method that would allow more resources to be added to an already started TezClient ({{TezClient#addAppMasterLocalFiles(Map<String, LocalResource>)}}). Implementing this (in the hive-7950-tez-WIP.diff attachment) appears to have successfully added the extra jars from the StorageHandler to the DAGAppMaster, but the containers started to actually run the query are missing those extra jars. Does that make sense? StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some classes which were added to tmpjars weren't making it into the container. 
When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
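To make the tmpjars mechanism from the comment above concrete, here is a minimal sketch in plain Java. It assumes the tmpjars value is a comma-separated list of jar URIs; the class and method names ({{TmpJarsSketch}}, {{parseTmpJars}}) are hypothetical and are not part of Hive or Tez. It only shows how such a value could be turned into a name-to-URI map, which is the shape of input that would then need to reach both the AM and the task containers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not a Hive or Tez class. */
final class TmpJarsSketch {

    /**
     * Splits a comma-separated tmpjars value into a map of file name -> URI.
     * Blank entries are skipped; if two URIs share a file name, the first wins.
     */
    static Map<String, String> parseTmpJars(String tmpJars) {
        Map<String, String> byName = new LinkedHashMap<>();
        if (tmpJars == null || tmpJars.trim().isEmpty()) {
            return byName;
        }
        for (String entry : tmpJars.split(",")) {
            String uri = entry.trim();
            if (uri.isEmpty()) {
                continue;
            }
            // Use the last path segment as the resource name.
            String name = uri.substring(uri.lastIndexOf('/') + 1);
            byName.putIfAbsent(name, uri);
        }
        return byName;
    }
}
```

Keying by file name is a design choice made here because the duplicate-resource error reported later in this thread is reported by file name; a real implementation may need a different key.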
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127485#comment-14127485 ] Josh Elser commented on HIVE-7950: -- You're the man. That was exactly what I needed. I completely missed that I needed to add the resources to the DAG as well. I'll clean up my changes and post an updated patch here later today after I poke/prod it some more. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127789#comment-14127789 ] Josh Elser commented on HIVE-7950: -- I have found one corner case which I'm still trying to maneuver around. The DAG code fails if local resources are added that already exist in the set of extra resources that are already going to be added. The problem is that you can't find out what resources are already set to be localized for a DAG. I can call {{DAG#addTaskLocalFiles(Map<String, LocalResource>)}} to add resources, but that will fail if any of them happen to be already loaded. This seems to be lacking compared to Vertex, which also has a {{getTaskLocalFiles()}} method. That's a Tez nit -- I can open a JIRA over there if you think that's necessary (or not already fixed upstream). This is the actual stack I'm trying to work around:
{noformat}
org.apache.tez.dag.api.TezUncheckedException: Attempting to add duplicate resource: accumulo-fate-1.6.0.jar
	at org.apache.tez.common.TezCommonUtils.addAdditionalLocalResources(TezCommonUtils.java:307)
	at org.apache.tez.dag.api.Vertex.addTaskLocalFiles(Vertex.java:256)
	at org.apache.tez.dag.api.DAG.createDag(DAG.java:643)
	at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:372)
	at org.apache.tez.client.TezClient.submitDAG(TezClient.java:342)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:385)
	at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:209)
	...
{noformat}
Essentially, we have a DAG, we tried to submit it to the Session we have, but the underlying application was dead (for testing purposes, because I {{kill}}'ed it, but this happens if you just wait long enough).
The code gets a {{SessionNotRunning}} exception, tries to {{closeAndOpen}} StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127798#comment-14127798 ] Josh Elser commented on HIVE-7950: -- Ugh, published too soon: The code gets a {{SessionNotRunning}} exception, tries to {{closeAndOpen}} session, and then ultimately fails when it goes to submit the DAG. I believe I need to figure out a way to ensure the DAG doesn't have the extra local resources (from the StorageHandler) in the case where we start up a new Session, and then DAG would get the resources from that new Session (as opposed to the old session which didn't have the extra resources to begin with), but I'm not 100% sure yet. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
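One way to picture the idea in the comment above (keep the DAG from receiving resources it already has) is the plain-Java sketch below. It assumes the caller keeps its own record of resource names it has already registered, since the earlier comment notes that the DAG does not expose that set. {{DuplicateResourceFilterSketch}} and {{withoutAlreadyRegistered}} are hypothetical names, and this is not the approach taken by any attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not a Hive or Tez class. */
final class DuplicateResourceFilterSketch {

    /**
     * Returns the candidate resources whose names are not already registered.
     * The input map is left untouched; insertion order is preserved.
     */
    static <V> Map<String, V> withoutAlreadyRegistered(
            Map<String, V> candidates, Set<String> alreadyRegistered) {
        Map<String, V> toAdd = new LinkedHashMap<>();
        for (Map.Entry<String, V> e : candidates.entrySet()) {
            if (!alreadyRegistered.contains(e.getKey())) {
                toAdd.put(e.getKey(), e.getValue());
            }
        }
        return toAdd;
    }
}
```

Whether filtering on the Hive side is sufficient depends on where the duplicates originate; the follow-up comment in this thread suggests it may not be.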
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127901#comment-14127901 ] Josh Elser commented on HIVE-7950: -- I think I finally got to the bottom of this, and it is broken with Tez-0.5.0. TezTask needs to be altered (as described by the previous discussion) to add the necessary StorageHandler resources to the DAG using
{code}
dag.addTaskLocalFiles(localResources);
{code}
For the case of the AccumuloStorageHandler, this adds the jars necessary to connect to Accumulo to {{commonTaskLocalFiles}} in {{DAG}}. Then, {{TezTask}} will eventually submit the DAG to be run.
{code}
try {
  // ready to start execution on the cluster
  sessionState.getSession().addAppMasterLocalFiles(resourceMap);
  dagClient = sessionState.getSession().submitDAG(dag);
} catch (SessionNotRunning nr) {
  console.printInfo("Tez session was closed. Reopening...");
  // close the old one, but keep the tmp files around
  TezSessionPoolManager.getInstance().closeAndOpen(sessionState, this.conf);
  console.printInfo("Session re-established.");
  dagClient = sessionState.getSession().submitDAG(dag);
}
{code}
Consider the case where we had a Session already created for the user, but the underlying application has exited, say due to a timeout. In the try block, we try to submit our DAG to run. In doing so, TezClient creates a DAGPlan from the DAG
{code}
DAGPlan dagPlan = dag.createDag(amConfig.getTezConfiguration());
{code}
When we create a {{DAGPlan}} from the {{DAG}}, we modify the {{DAG}} instance, adding the local resources to each {{Vertex}} in the {{DAG}}. Then, we identify that the underlying application has already died, and that we need to {{closeAndOpen}} a new Session. So, we get the {{SessionNotRunning}} exception, pop out to the catch block, and end up creating another {{DAGPlan}} from the {{DAG}} _that was already altered by the last attempt to submit it_.
As I'm looking at it, I don't think there's anything I can do at the Hive level to fix this because {{TezClient}} will always try to add duplicate resources to the {{Vertex}}'s in a {{DAG}} which throws an Exception and tanks the query. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
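The failure mode described in the comment above can be reproduced with a toy model in plain Java. The classes below ({{ToyVertex}}, {{ToyDag}}) are hypothetical stand-ins written for this illustration and do not use or mirror the real Tez API; the only behavior they borrow from the description is that building a plan copies the DAG-level files into each vertex, and that a vertex rejects a file name it already holds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model for illustration only; these are not Tez classes. */
final class RetryMutationSketch {

    static final class ToyVertex {
        final Map<String, String> taskLocalFiles = new LinkedHashMap<>();

        /** Rejects any file name that this vertex already holds. */
        void addTaskLocalFiles(Map<String, String> files) {
            for (Map.Entry<String, String> e : files.entrySet()) {
                if (taskLocalFiles.containsKey(e.getKey())) {
                    throw new IllegalStateException(
                            "Attempting to add duplicate resource: " + e.getKey());
                }
                taskLocalFiles.put(e.getKey(), e.getValue());
            }
        }
    }

    static final class ToyDag {
        final Map<String, String> commonTaskLocalFiles = new LinkedHashMap<>();
        final List<ToyVertex> vertices = new ArrayList<>();

        /** Building the plan mutates every vertex as a side effect. */
        int createPlan() {
            for (ToyVertex v : vertices) {
                v.addTaskLocalFiles(commonTaskLocalFiles);
            }
            return vertices.size();
        }
    }

    /** Simulates submit, session failure, then resubmit of the same DAG object. */
    static boolean secondPlanFails(ToyDag dag) {
        dag.createPlan(); // first submit attempt: vertices now hold the files
        try {
            dag.createPlan(); // retry after reopening the session
            return false;
        } catch (IllegalStateException duplicate) {
            return true;
        }
    }
}
```

In this model the retry fails whenever the DAG has at least one vertex and at least one common file, which matches the conclusion that the caller cannot avoid the error while reusing the same DAG object.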
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127923#comment-14127923 ] Josh Elser commented on HIVE-7950: -- Yeah, you're totally right. I was getting hung up on this edge case and forgot about the first issue to fix. I'll open a Tez JIRA for the above. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: HIVE-7950.2.patch Patch against the tez branch which attempts to ensure that the Tez AM and the DAG both have the necessary extra local resources as required by a StorageHandler. Tried to add some tests which ensure modifications to TezTask work as expected. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7984: - Summary: AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez (was: Configuration items from StorageHandler not passed to Tez Configuration) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez --- Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7984-1.diff Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside of the Tez container. I'm guessing that there is a disconnect from the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14123529#comment-14123529 ] Josh Elser commented on HIVE-7950: -- Ah, the tez branch is on 0.5.0 (trunk is on 0.4.1). What's the lifecycle on the tez branch, is it occasionally merged into trunk? StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) AccumuloStorageHandler doesn't work with Hive on Tez
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121463#comment-14121463 ] Josh Elser commented on HIVE-7950: -- Going to break the bigger issue down into more manageable pieces to fix. AccumuloStorageHandler doesn't work with Hive on Tez Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that I've noticed already (probably more as I can get past the ones I already found): * Jars added to the classpath via tmpjars (which is done by the copied HBase Utils class) aren't available in the Tez Map task -- need to compare to HBaseStorageHandler and see if there is something magic happening * Configuration generated by the AccumuloStorageHandler doesn't make it all the way to the Configuration passed to the AccumuloOutputFormat (probably AccumuloInputFormat, too) {noformat} 2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d} at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 
16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448) ... 23 more Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Summary: StorageHandler resources aren't added to Tez Session if already Session is already Open (was: AccumuloStorageHandler doesn't work with Hive on Tez) StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that I've noticed already (probably more as I can get past the ones I already found): * Jars added to the classpath via tmpjars (which is done by the copied HBase Utils class) aren't available in the Tez Map task -- need to compare to HBaseStorageHandler and see if there is something magic happening * Configuration generated by the AccumuloStorageHandler doesn't make it all the way to the Configuration passed to the AccumuloOutputFormat (probably AccumuloInputFormat, too) {noformat} 2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d} at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 
16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448) ... 23 more Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Description: Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. was: Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that I've noticed already (probably more as I can get past the ones I already found): * Jars added to the classpath via tmpjars (which is done by the copied HBase Utils class) aren't available in the Tez Map task -- need to compare to HBaseStorageHandler and see if there is something magic happening * Configuration generated by the AccumuloStorageHandler doesn't make it all the way to the Configuration passed to the AccumuloOutputFormat (probably AccumuloInputFormat, too) {noformat} 2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d} at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 
16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448) ... 23 more Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat at org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:553) at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.getHiveRecordWriter(HivePassThroughOutputFormat.java:113) at
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Status: Patch Available (was: Open) StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: HIVE-7950-1.diff StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration
Josh Elser created HIVE-7984: Summary: Configuration items from StorageHandler not passed to Tez Configuration Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside of the Tez container. I'm guessing that there is a disconnect from the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
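The disconnect described in the report above can be sketched in plain Java by treating both the task-side configuration and the StorageHandler's job properties as string maps. This is an assumption-laden illustration: {{JobPropertiesSketch}} and {{applyJobProperties}} are hypothetical names, and the real Hive and Tez code paths use their own configuration types. The sketch only shows the "re-set the properties in the configuration before the output format reads them" idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not a Hive or Tez class. */
final class JobPropertiesSketch {

    /**
     * Returns a copy of the task-side configuration with the StorageHandler's
     * job properties applied on top. Null values are skipped so that an unset
     * property never erases an existing setting.
     */
    static Map<String, String> applyJobProperties(
            Map<String, String> taskConf, Map<String, String> jobProperties) {
        Map<String, String> merged = new LinkedHashMap<>(taskConf);
        for (Map.Entry<String, String> e : jobProperties.entrySet()) {
            if (e.getValue() != null) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }
}
```

Letting the job properties override existing values is a choice made for this sketch; whether that precedence is right for a given handler is not something the report settles.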
[jira] [Commented] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121820#comment-14121820 ] Josh Elser commented on HIVE-7984: -- I think the cause is that {{PlanUtils.configureInputJobPropertiesForStorageHandler(TableDesc)}} or {{PlanUtils.configureOutputJobPropertiesForStorageHandler(TableDesc)}} aren't called in the Tez pipeline. Still trying to figure out where exactly that should go. Configuration items from StorageHandler not passed to Tez Configuration --- Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside of the Tez container. I'm guessing that there is a disconnect from the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122275#comment-14122275 ] Josh Elser commented on HIVE-7950: -- Nope, 0.4.1-incubating. Being able to add more resources to an existing session would certainly be preferable, though. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added.
[jira] [Commented] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122315#comment-14122315 ] Josh Elser commented on HIVE-7984: -- After a bunch of digging, I found that I could still work around this via the custom OutputFormat for Accumulo without having to actually dig into the calls to the StorageHandler with respect to the execution engine.
[jira] [Updated] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7984: - Attachment: HIVE-7984-1.diff Fixes the OutputFormat to be a little more resilient. Also removed a really nasty log.info statement that shouldn't have been committed in the first place.
[jira] [Updated] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7984: - Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-7928) There is no catch statement in Utils#updateMap
[ https://issues.apache.org/jira/browse/HIVE-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118298#comment-14118298 ] Josh Elser commented on HIVE-7928: -- [~skrho], I don't follow the reason for your change. The point of the try/finally is to ensure that the {{ZipFile}} is closed before the method returns. The code also does not handle the IOException that can be thrown and lets the caller deal with that exception ({{throws IOException}}). A try block does not always require a catch statement. This method looks fine to me as-is. There is no catch statement in Utils#updateMap -- Key: HIVE-7928 URL: https://issues.apache.org/jira/browse/HIVE-7928 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: skrho Assignee: skrho Priority: Minor Attachments: HIVE-7928_001.patch There is no catch statement in the Utils class (accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java, line 148). Without a catch statement, we can't know why an exception happened. I think we should add a catch statement and throw the exception.
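To illustrate the try/finally point from the HIVE-7928 comment above, here is a minimal sketch. It is not the actual {{Utils#updateMap}} code: the class {{ZipFileFinallySketch}} and the method {{entryNames}} are hypothetical names chosen for illustration. It assumes only the standard {{java.util.zip.ZipFile}} API and shows a try block with no catch: the {{ZipFile}} is closed in the finally block, and any IOException reaches the caller through {{throws IOException}}.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch; not the code from accumulo-handler's Utils class.
final class ZipFileFinallySketch {

    // Lists the entry names of a zip/jar file.
    // No catch block: an IOException propagates to the caller,
    // while finally guarantees the ZipFile is closed first.
    static List<String> entryNames(File jar) throws IOException {
        ZipFile zip = new ZipFile(jar); // throws IOException if the file cannot be opened
        try {
            List<String> names = new ArrayList<>();
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            return names;
        } finally {
            zip.close(); // runs on normal return and when an exception is thrown
        }
    }
}
```

A caller that wants to know why the call failed can still catch the IOException itself, so no information is lost by omitting the catch here.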
[jira] [Commented] (HIVE-7929) close of ZipOutputStream in Utils#jarDir() should be placed in finally block
[ https://issues.apache.org/jira/browse/HIVE-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118766#comment-14118766 ] Josh Elser commented on HIVE-7929: -- The catch block is unnecessary. I think the finally block should only contain {{zos.close()}}, with {{zos.closeEntry();}} and {{zipDir(dir, relativePath, zos, true);}} moved inside of the try block. For example:
{code}
try {
  ...
  zos.closeEntry();
  zipDir(dir, relativePath, zos, true);
} finally {
  zos.close();
}
{code}
Alternatively, it might be cleaner to do a try/finally in {{createJar(File, File)}} to close the JarOutputStream and completely remove the {{close()}} call in {{jarDir(File, String, ZipOutputStream)}}. Also, it may interest you that this code was borrowed from HBase. They may benefit from these same improvements in their codebase -- I forget which HBase version I copied this from, though. close of ZipOutputStream in Utils#jarDir() should be placed in finally block Key: HIVE-7929 URL: https://issues.apache.org/jira/browse/HIVE-7929 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: skrho Assignee: skrho Priority: Minor Labels: patch Attachments: HIVE-7929_001.patch In accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java, line 308: zos.closeEntry(); zipDir(dir, relativePath, zos, true); zos.close(); If an exception is thrown, the ZipOutputStream would be left unclosed.
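The shape suggested in the HIVE-7929 comment above can be tried out in isolation with the sketch below. It is a stand-in, not the real {{jarDir}}: {{ZipCloseSketch}}, {{writeEntries}} and {{CloseTrackingStream}} are hypothetical names, and the duplicate entry name is just an easy way to make {{putNextEntry}} throw. Only standard {{java.util.zip}} calls are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; not the code from accumulo-handler's Utils class.
final class ZipCloseSketch {

    // Hypothetical helper stream that records whether close() was called.
    static final class CloseTrackingStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Writes one empty entry per name. All work on the stream happens inside
    // the try block; the finally block contains only zos.close().
    static void writeEntries(OutputStream out, String... names) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(out);
        try {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name)); // throws ZipException on a duplicate name
                zos.closeEntry();
            }
        } finally {
            zos.close(); // also closes the underlying stream
        }
    }
}
```

Calling {{writeEntries(out, "a.txt", "a.txt")}} fails on the second entry, yet the underlying stream still ends up closed. On Java 7 or later, a try-with-resources statement around the {{ZipOutputStream}} would give the same guarantee with less code.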
[jira] [Created] (HIVE-7950) AccumuloStorageHandler doesn't work with Hive on Tez
Josh Elser created HIVE-7950: Summary: AccumuloStorageHandler doesn't work with Hive on Tez Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0
Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that I've noticed already (probably more as I can get past the ones I already found):
* Jars added to the classpath via tmpjars (which is done by the copied HBase Utils class) aren't available in the Tez Map task -- need to compare to HBaseStorageHandler and see if there is something magic happening
* Configuration generated by the AccumuloStorageHandler doesn't make it all the way to the Configuration passed to the AccumuloOutputFormat (probably AccumuloInputFormat, too)
{noformat}
2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448)
    ... 23 more
Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
    at org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:553)
    at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.getHiveRecordWriter(HivePassThroughOutputFormat.java:113)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:296)
    at
[jira] [Created] (HIVE-7789) Documentation for AccumuloStorageHandler
Josh Elser created HIVE-7789: Summary: Documentation for AccumuloStorageHandler Key: HIVE-7789 URL: https://issues.apache.org/jira/browse/HIVE-7789 Project: Hive Issue Type: Task Components: Documentation Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation on its usage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103266#comment-14103266 ] Josh Elser commented on HIVE-7068: -- Certainly -- created HIVE-7789. Integrate AccumuloStorageHandler Key: HIVE-7068 URL: https://issues.apache.org/jira/browse/HIVE-7068 Project: Hive Issue Type: New Feature Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch, HIVE-7068.4.patch [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to HBase. Some [initial work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done to support querying an Accumulo table using Hive already. It is not a complete solution as, most notably, the current implementation presently lacks support for INSERTs. I would like to polish up the AccumuloStorageHandler (presently based on 0.10), implement missing basic functionality and compare it to the HBaseStorageHandler (to ensure that we follow the same general usage patterns). I've also been in communication with [~bfem] (the initial author) who expressed interest in working on this again. I hope to coordinate efforts with him. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7068: - Attachment: HIVE-7068.4.patch Rebased patch (#4) on upstream changes. Changes over the last patch:
* Had to make some qtest fixes (after HIVE-7519)
* Deleted two commented lines I noticed in the source
Huge thank you to Nick, Sushanth, and Navis!
[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099196#comment-14099196 ] Josh Elser commented on HIVE-7068: -- [~ndimiduk], I agree with you completely. There's no reason that the column mapping stuff needs to be separated as it is now. I tried to make the ColumnMapping class hierarchy a bit cleaner over what was in the hbase-handler (it looked like there were already comments in the hbase-handler code saying that it would be good to clean it up in the future). I'd love to help converge these. Many thanks for taking the time to look through it.
[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082861#comment-14082861 ] Josh Elser commented on HIVE-7068: -- Any other Hive committers have some time to look at this and potentially help get this merged in? Would be greatly appreciated!
[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7068: - Attachment: HIVE-7068.2.patch Minor updates to the patch:
* Removes unnecessary whitespace/javadoc
* Adds a better exception when Accumulo connection information isn't in the hiveconf as required
* Pulls in more upstream changes from trunk
* Fixes accumulo qtest after HIVE-5771
This also re-triggers Hive QA, which appears to have failed for other reasons on the last patch. I'll update reviewboard as well if anyone wants to see the changes.
[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7068: - Attachment: HIVE-7068.3.patch Sorry, found another minor issue with serialization of strings as compared to what HBaseStorageHandler does. New patch allows binary encoding to be specified on strings without error (falls back to UTF8 serialization). Added a test for it too, and cleaned up some other nits I saw in fixing the bug.
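The fallback described in the HIVE-7068.3.patch note above (binary encoding requested on a string column is accepted and serialized as UTF-8) might look roughly like the sketch below. This is only an assumption about the behavior, not the patch itself: {{EncodingFallbackSketch}}, its {{Encoding}} enum, and both serialize methods are hypothetical names, and the int case is included purely for contrast.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and structure are assumptions, not the Hive patch.
final class EncodingFallbackSketch {

    // Hypothetical per-column encoding choice.
    enum Encoding { STRING, BINARY }

    // For an int, the two encodings really differ:
    // BINARY is 4 big-endian bytes, STRING is the UTF-8 decimal text.
    static byte[] serializeInt(int value, Encoding encoding) {
        if (encoding == Encoding.BINARY) {
            return ByteBuffer.allocate(4).putInt(value).array();
        }
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // For a string there is no separate binary form in this sketch, so a
    // BINARY request is accepted without error and falls back to UTF-8.
    static byte[] serializeString(String value, Encoding encoding) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}
```

With this shape, {{serializeString("abc", Encoding.BINARY)}} and {{serializeString("abc", Encoding.STRING)}} produce identical bytes, which is the "without error" behavior the comment describes.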
[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7068: - Fix Version/s: 0.14.0
[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7068: - Attachment: HIVE-7068.1.patch v1 of patch to add AccumuloStorageHandler.
[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7068: - Status: Patch Available (was: Open) First stab at adding an AccumuloStorageHandler to Hive. This is a lot of code, so I'll try to outline the high-level features.
* Builds on impl from Brian (as mentioned in description)
* Tended to mimic HBaseStorageHandler when it made sense to do so
* Predicate pushdown on rowid to query only relevant portions of Accumulo tables
* Predicate pushdown on non-rowid columns to filter server-side in Accumulo
* Support for external tables
* Hive Map pushdown to column family plus optional column qualifier prefix
* Binary and UTF8 serialization within Accumulo
* Extendable CompositeRowId and RowIdFactory interfaces for users
* Lots of unit tests
* A handful of borrowed qtests from HBaseStorageHandler
* Accumulo 1.5.1 and 1.6.0 support (hidden from users by reflection)
[jira] [Created] (HIVE-7068) Integrate AccumuloStorageHandler
Josh Elser created HIVE-7068: Summary: Integrate AccumuloStorageHandler Key: HIVE-7068 URL: https://issues.apache.org/jira/browse/HIVE-7068 Project: Hive Issue Type: New Feature Reporter: Josh Elser [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to HBase. Some [initial work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done to support querying an Accumulo table using Hive already. It is not a complete solution as, most notably, the current implementation presently lacks support for INSERTs. I would like to polish up the AccumuloStorageHandler (presently based on 0.10), implement missing basic functionality and compare it to the HBaseStorageHandler (to ensure that we follow the same general usage patterns). I've also been in communication with [~bfem] (the initial author) who expressed interest in working on this again. I hope to coordinate efforts with him. -- This message was sent by Atlassian JIRA (v6.2#6252)