
[jira] [Created] (HIVE-17083) DagUtils overwrites any credentials already added

2017-07-13 Thread Josh Elser (JIRA)

Josh Elser created HIVE-17083:
-

 Summary: DagUtils overwrites any credentials already added
 Key: HIVE-17083
 URL: https://issues.apache.org/jira/browse/HIVE-17083
 Project: Hive
  Issue Type: Bug
  Components: Tez
Reporter: Josh Elser
Assignee: Josh Elser


While working with a StorageHandler with hive.execution.engine=tez, I found 
that the credentials the storage handler was adding were not propagating to the 
dag.

After a bit of debugging/git-log, I found that DagUtils was overwriting the 
credentials which were already set. A quick local patch seems to make things 
work again. Will put together a quick unit test.
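
To illustrate the difference between overwriting and merging, here is a minimal sketch. It does not use Hadoop's Credentials class or the actual DagUtils code (neither is shown in this report); the map-based container, the class name CredentialMergeDemo and the token aliases are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a credential container keyed by alias.
// This is NOT Hadoop's Credentials class and NOT the real DagUtils logic.
final class CredentialMergeDemo {

    // Overwriting: whatever was already set is dropped and replaced wholesale.
    static Map<String, String> overwrite(Map<String, String> existing,
                                         Map<String, String> incoming) {
        return new LinkedHashMap<>(incoming);
    }

    // Merging: existing entries are kept, only missing aliases are added.
    static Map<String, String> merge(Map<String, String> existing,
                                     Map<String, String> incoming) {
        Map<String, String> merged = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            merged.putIfAbsent(e.getKey(), e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fromStorageHandler = new LinkedHashMap<>();
        fromStorageHandler.put("accumulo-token", "token-A"); // hypothetical alias
        Map<String, String> fromSession = new LinkedHashMap<>();
        fromSession.put("hdfs-token", "token-B");            // hypothetical alias

        // The storage handler's token is lost in the first case and kept in the second.
        System.out.println(overwrite(fromStorageHandler, fromSession).keySet());
        System.out.println(merge(fromStorageHandler, fromSession).keySet());
    }
}
```

With the overwrite variant the storage handler's entry never reaches the result, which matches the symptom described above.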



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-06-27 Thread Josh Elser (JIRA)

Josh Elser created HIVE-16973:
-

 Summary: Fetching of Delegation tokens (Kerberos) for 
AccumuloStorageHandler fails in HS2
 Key: HIVE-16973
 URL: https://issues.apache.org/jira/browse/HIVE-16973
 Project: Hive
  Issue Type: Bug
  Components: Accumulo Storage Handler
Reporter: Josh Elser
Assignee: Josh Elser


Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
Looking into it, it seems like the bit-rot got pretty bad. You'll see something 
like the following:

{noformat}
Caused by: java.io.IOException: Failed to unwrap AuthenticationToken
  at org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
  at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
{noformat}

It appears that some of the code paths changed since I first did my 
testing (or I just did poor testing) and the delegation token was never being 
fetched/serialized. There are also some issues with fetching the delegation 
token from Accumulo properly, which were addressed in ACCUMULO-4665.

I believe it would also be best to just update the dependency to use Accumulo 
1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
otherwise get much more complicated with reflection -- Accumulo has moved on 
past 1.6, so let's do the same in Hive.




[jira] [Created] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-07 Thread Josh Elser (JIRA)

Josh Elser created HIVE-11755:
-

 Summary: Incorrect method called with Kerberos enabled in 
AccumuloStorageHandler
 Key: HIVE-11755
 URL: https://issues.apache.org/jira/browse/HIVE-11755
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 1.2.2


The following exception was noticed in testing out the AccumuloStorageHandler's 
OutputFormat:

{noformat}
java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can only be set once per job
  at org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
  at org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
  at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
  at org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
  at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
  at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
  at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
  at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
  at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
  at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
  at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  Job Submission failed with exception 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can only be set once per job)'
{noformat}

The OutputFormat implementation already had a method in place to account for 
this exception but the method accidentally wasn't getting called.
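
For readers unfamiliar with the pattern, a tolerant "set once" wrapper might look roughly like the sketch below. This is an assumption about the general shape only: JobSettings, setConnectorInfoOnce and SetOnceDemo are hypothetical names, and the code does not use the Accumulo or Hive classes from the stack trace.

```java
import java.util.HashMap;
import java.util.Map;

final class SetOnceDemo {

    // Hypothetical job configuration that, like the error above,
    // rejects a second attempt to set the connector info.
    static final class JobSettings {
        private final Map<String, String> values = new HashMap<>();

        void setConnectorInfo(String principal) {
            if (values.containsKey("principal")) {
                throw new IllegalStateException(
                    "Connector info can only be set once per job");
            }
            values.put("principal", principal);
        }

        String principal() {
            return values.get("principal");
        }
    }

    // Tolerant wrapper: returns true if the value was set by this call,
    // false if the job was already configured. The first value always wins.
    static boolean setConnectorInfoOnce(JobSettings job, String principal) {
        try {
            job.setConnectorInfo(principal);
            return true;
        } catch (IllegalStateException alreadySet) {
            return false;
        }
    }

    public static void main(String[] args) {
        JobSettings job = new JobSettings();
        System.out.println(setConnectorInfoOnce(job, "hive")); // first call sets it
        System.out.println(setConnectorInfoOnce(job, "hive")); // repeat call is tolerated
    }
}
```

Calling the raw setter directly on a path that can run more than once per job (checkOutputSpecs, in the trace above) is what produces the IllegalStateException; routing the call through the wrapper avoids it.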



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing

2015-01-09 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271529#comment-14271529
 ] 

Josh Elser commented on HIVE-8931:
--

bq. Yes the HMS has code which depends specifically on the 0.9.2 version of 
thrift...

I meant I'm assuming that the QTests themselves are exercising the metastore in 
such a way that the thrift dependency is directly needed (and not doing some 
mock thing).

 Test TestAccumuloCliDriver is not completing
 

 Key: HIVE-8931
 URL: https://issues.apache.org/jira/browse/HIVE-8931
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Josh Elser

 Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing.
 Logs:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/




[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing

2015-01-08 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270685#comment-14270685
 ] 

Josh Elser commented on HIVE-8931:
--

Getting back to this, I'm a little stuck here. Backing up, {{hive-metastore}} 
is bringing in libthrift-0.9.2 which is breaking things. The qtests ultimately 
pull from $CLASSPATH to start the Accumulo minicluster (which includes stuff 
from HIVE_HADOOP_TEST_CLASSPATH), and that ultimately comes back to the maven 
test classpath. Without getting libthrift-0.9.1 somehow on the maven classpath, 
I don't know where the libthrift-0.9.1.jar even exists in the local m2 
repository (and thus can't do any trickery to substitute it in place of the 
libthrift-0.9.2 dependency). My assumption is that excluding libthrift from the 
hive-metastore dependency will break the other qtests (but that is only a 
guess).

Assuming I can't exclude libthrift from hive-metastore, I'm not sure what I 
could even do at this point aside from introducing a new maven module 
specifically for the Accumulo qtests (which would give me carte blanche over 
the classpath). [~brocknoland], any ideas? 

 Test TestAccumuloCliDriver is not completing
 

 Key: HIVE-8931
 URL: https://issues.apache.org/jira/browse/HIVE-8931
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Josh Elser

 Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing.
 Logs:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/




[jira] [Commented] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7

2014-12-14 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246017#comment-14246017
 ] 

Josh Elser commented on HIVE-9082:
--

Failure appears to be unrelated.

 Update Accumulo storage handler to build against Accumulo 1.7
 -

 Key: HIVE-9082
 URL: https://issues.apache.org/jira/browse/HIVE-9082
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.15.0

 Attachments: HIVE-9082.1.patch


 Currently, trunk doesn't compile against Accumulo 1.7.0-SNAPSHOT which is the 
 current tip of Accumulo. 1.7.0 includes some API removals over the 1.5.x 
 support we currently have, so we need to make some updates to the storage 
 handler to get compilation/etc working again.




[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing

2014-12-14 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246304#comment-14246304
 ] 

Josh Elser commented on HIVE-8931:
--

HIVE-8829, the update to Thrift 0.9.2, is what broke these tests. Accumulo 
expects to function with Thrift 0.9.1 and the tests just throw everything and 
their brother on the classpath.

I'll have to see if I can add some trickery to the test driver to keep the 
extra dependencies from being added.



 Test TestAccumuloCliDriver is not completing
 

 Key: HIVE-8931
 URL: https://issues.apache.org/jira/browse/HIVE-8931
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Josh Elser

 Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing.
 Logs:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/




[jira] [Updated] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7

2014-12-12 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-9082:
-
Status: Patch Available  (was: Open)

 Update Accumulo storage handler to build against Accumulo 1.7
 -

 Key: HIVE-9082
 URL: https://issues.apache.org/jira/browse/HIVE-9082
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.15.0

 Attachments: HIVE-9082.1.patch


 Currently, trunk doesn't compile against Accumulo 1.7.0-SNAPSHOT which is the 
 current tip of Accumulo. 1.7.0 includes some API removals over the 1.5.x 
 support we currently have, so we need to make some updates to the storage 
 handler to get compilation/etc working again.




[jira] [Updated] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7

2014-12-12 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-9082:
-
Attachment: HIVE-9082.1.patch

Changes reference to a class in the accumulo-trace jar which was removed in 
1.7.0 to one that exists across all versions. The reference to the class is 
used to pull in the jar to libjars.

There are some other fixes which need to happen to solve Accumulo 
1.7.0-SNAPSHOT compilation, but those can and should all be addressed in 
Accumulo (and not pushed down onto Hive).
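
As background on why a class reference is enough to pull in a jar: given a Class object, the JVM can usually report where that class was loaded from. The sketch below shows one stdlib-only way to do that lookup. It is an illustration, not the code in the patch, and Hadoop/Hive use their own utilities for locating jars; JarLocatorDemo and locationOf are hypothetical names.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

final class JarLocatorDemo {

    // Returns the jar or directory a class was loaded from, if the JVM knows it.
    // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no
    // CodeSource, so the result is empty for them.
    static Optional<URL> locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(source.getLocation());
    }

    public static void main(String[] args) {
        System.out.println("String: " + locationOf(String.class));
        System.out.println("Demo:   " + locationOf(JarLocatorDemo.class));
    }
}
```

If the referenced class is removed from a jar in a newer release, a lookup like this has nothing to resolve, which is why the reference has to point at a class that exists across all supported versions.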

 Update Accumulo storage handler to build against Accumulo 1.7
 -

 Key: HIVE-9082
 URL: https://issues.apache.org/jira/browse/HIVE-9082
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.15.0

 Attachments: HIVE-9082.1.patch


 Currently, trunk doesn't compile against Accumulo 1.7.0-SNAPSHOT which is the 
 current tip of Accumulo. 1.7.0 includes some API removals over the 1.5.x 
 support we currently have, so we need to make some updates to the storage 
 handler to get compilation/etc working again.




[jira] [Created] (HIVE-9082) Update Accumulo storage handler to build against Accumulo 1.7

2014-12-11 Thread Josh Elser (JIRA)

Josh Elser created HIVE-9082:


 Summary: Update Accumulo storage handler to build against Accumulo 
1.7
 Key: HIVE-9082
 URL: https://issues.apache.org/jira/browse/HIVE-9082
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.15.0


Currently, trunk doesn't compile against Accumulo 1.7.0-SNAPSHOT which is the 
current tip of Accumulo. 1.7.0 includes some API removals over the 1.5.x 
support we currently have, so we need to make some updates to the storage 
handler to get compilation/etc working again.




[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing

2014-11-20 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220279#comment-14220279
 ] 

Josh Elser commented on HIVE-8931:
--

Thanks for pointing it out, [~brocknoland]. I'll try to take a look.

 Test TestAccumuloCliDriver is not completing
 

 Key: HIVE-8931
 URL: https://issues.apache.org/jira/browse/HIVE-8931
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Josh Elser

 Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing.
 Logs:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/




[jira] [Assigned] (HIVE-8931) Test TestAccumuloCliDriver is not completing

2014-11-20 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reassigned HIVE-8931:


Assignee: Josh Elser

 Test TestAccumuloCliDriver is not completing
 

 Key: HIVE-8931
 URL: https://issues.apache.org/jira/browse/HIVE-8931
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Josh Elser

 Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing.
 Logs:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/




[jira] [Commented] (HIVE-8931) Test TestAccumuloCliDriver is not completing

2014-11-20 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220293#comment-14220293
 ] 

Josh Elser commented on HIVE-8931:
--

Btw, any idea when this test started timing out? That would be super helpful to 
bisect things (assuming it was at some point passing -- it was for me when I 
wrote it, anyways).

 Test TestAccumuloCliDriver is not completing
 

 Key: HIVE-8931
 URL: https://issues.apache.org/jira/browse/HIVE-8931
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Josh Elser

 Tests are taking 3 hours due to {{TestAccumuloCliDriver}} not finishing.
 Logs:
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1848/failed/TestAccumuloCliDriver/




[jira] [Commented] (HIVE-8808) HiveInputFormat caching cannot work with all input formats

2014-11-11 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207183#comment-14207183
 ] 

Josh Elser commented on HIVE-8808:
--

Thanks for looping me in, [~sushanth].

As far as I can recall, Accumulo's InputFormat classes are stateless, relying 
on the state to be provided through the JobConf/InputSplits as you described. I 
know we have some annoyances where multiple calls to the InputFormat which 
alter the JobConf are not idempotent (they typically throw an error if things 
are re-set). I work around most of that pain in the StorageHandler impl.

Nothing is coming to mind that would be fundamentally broken if we get a 
re-used instance of the input format. Happy to help test/evaluate this too.
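
To make the stateless-versus-stateful distinction concrete, here is a small sketch of why instance caching is only safe for formats that take all their state from the per-call configuration. Everything in it is hypothetical (the Format interface, both implementations and the cache); it is not Hive's getInputFormatFromCache or any Accumulo/HBase class.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class FormatCacheDemo {

    // Hypothetical stand-in for an input format; the map plays the role of a JobConf.
    interface Format {
        String describe(Map<String, String> jobConf);
    }

    // Stateless: everything is read from the configuration passed on each call.
    static final class StatelessFormat implements Format {
        public String describe(Map<String, String> jobConf) {
            return "table=" + jobConf.get("table");
        }
    }

    // Stateful: remembers the table from the first call, so a cached
    // instance returns stale answers for later jobs.
    static final class StatefulFormat implements Format {
        private String table;

        public String describe(Map<String, String> jobConf) {
            if (table == null) {
                table = jobConf.get("table");
            }
            return "table=" + table;
        }
    }

    private static final Map<String, Format> CACHE = new ConcurrentHashMap<>();

    // One shared instance per key, created on first use.
    static Format fromCache(String key, Supplier<Format> factory) {
        return CACHE.computeIfAbsent(key, k -> factory.get());
    }

    public static void main(String[] args) {
        Map<String, String> jobA = Collections.singletonMap("table", "t1");
        Map<String, String> jobB = Collections.singletonMap("table", "t2");

        Format stateless = fromCache("stateless", StatelessFormat::new);
        System.out.println(stateless.describe(jobA) + " / " + stateless.describe(jobB));

        Format stateful = fromCache("stateful", StatefulFormat::new);
        System.out.println(stateful.describe(jobA) + " / " + stateful.describe(jobB));
    }
}
```

The stateless variant answers correctly for both jobs from one cached instance; the stateful one keeps answering for the first job, which is the failure mode the issue description worries about.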

 HiveInputFormat caching cannot work with all input formats
 --

 Key: HIVE-8808
 URL: https://issues.apache.org/jira/browse/HIVE-8808
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland

 In {{HiveInputFormat}} we implement instance caching (see 
 {{getInputFormatFromCache}}). In HS2, this assumes that InputFormats are 
 stateless but I don't think this assumption is true, especially with regards 
 to HBase.




[jira] [Commented] (HIVE-8704) HivePassThroughOutputFormat cannot proxy more than one kind of OF (in one process)

2014-11-02 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194147#comment-14194147
 ] 

Josh Elser commented on HIVE-8704:
--

Nice find, [~sushanth]! Thanks for getting to the bottom of this.

 HivePassThroughOutputFormat cannot proxy more than one kind of OF (in one 
 process)
 --

 Key: HIVE-8704
 URL: https://issues.apache.org/jira/browse/HIVE-8704
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-8704.patch


 HivePassThroughOutputFormat is a wrapper HiveOutputFormat used by hive to 
 allow access to StorageHandlers that use mapred OutputFormats as their 
 primary implementation point, and do not implement HiveOutputFormat.
 However, HivePassThroughOutputFormat(henceforth called PTOF) has one major 
 bug - it tracks the underlying outputformat that it is proxying by means of a 
 static string in HiveFileFormatUtils. There are a few problems with this.
 a) For starters, it means that a given process can only process one 
 PTOF-based output format. So, in the case of a HS2 instance, where one thread 
 is attempting to start a job based on HBase and another on Accumulo will 
 cause a problem, and will overwrite each others' real output format. This 
 leads to bugs where a person trying to use a hbase table gets stack traces 
 from Accumulo like the following:
 {noformat}
 ERROR exec.Task: Job Submission failed with exception 
 'java.lang.NullPointerException(Expected Accumulo table name to be provided 
 in job configuration)'
 java.lang.NullPointerException: Expected Accumulo table name to be provided 
 in job configuration
   at 
 com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
   at 
 org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:61)
   at 
 org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:43)
   at 
 org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:87)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1071)
   at 
 org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:465)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1291)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1291)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1603)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1363)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1176)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1003)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:998)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:144)
   at 
 org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
   at 
 org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at

[jira] [Assigned] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1

2014-10-06 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reassigned HIVE-8363:


Assignee: Josh Elser

 AccumuloStorageHandler compile failure hadoop-1
 ---

 Key: HIVE-8363
 URL: https://issues.apache.org/jira/browse/HIVE-8363
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Affects Versions: 0.14.0
Reporter: Szehon Ho
Assignee: Josh Elser
Priority: Blocker

 There's an error about AccumuloStorageHandler compiling on hadoop-1.  It 
 seems the signature of split() is not the same.  Looks like we should use 
 another utility to fix this.
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-accumulo-handler: Compilation failure
 [ERROR] 
 /data/hive-ptest/working/apache-svn-trunk-source/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/columns/ColumnMapper.java:[57,52]
  no suitable method found for split(java.lang.String,char)
 [ERROR] method 
 org.apache.hadoop.util.StringUtils.split(java.lang.String,char,char) is not 
 applicable
 {code}




[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-10-06 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160903#comment-14160903
 ] 

Josh Elser commented on HIVE-7068:
--

[~szehon], yeah, I can get a patch up there today.

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch, 
 HIVE-7068.4.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.




[jira] [Commented] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1

2014-10-06 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161329#comment-14161329
 ] 

Josh Elser commented on HIVE-8363:
--

I was confused as to how this was introduced. My guess is that HIVE-8257 
correctly broke this. We were inadvertently using a Hadoop 2 method even when 
Hadoop 1 was specified.

 AccumuloStorageHandler compile failure hadoop-1
 ---

 Key: HIVE-8363
 URL: https://issues.apache.org/jira/browse/HIVE-8363
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Affects Versions: 0.14.0
Reporter: Szehon Ho
Assignee: Josh Elser
Priority: Blocker

 There's an error about AccumuloStorageHandler compiling on hadoop-1.  It 
 seems the signature of split() is not the same.  Looks like we should use 
 another utility to fix this.
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-accumulo-handler: Compilation failure
 [ERROR] 
 /data/hive-ptest/working/apache-svn-trunk-source/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/columns/ColumnMapper.java:[57,52]
  no suitable method found for split(java.lang.String,char)
 [ERROR] method 
 org.apache.hadoop.util.StringUtils.split(java.lang.String,char,char) is not 
 applicable
 {code}




[jira] [Updated] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1

2014-10-06 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-8363:
-
Attachment: HIVE-8363.1.patch

Patch switches from Hadoop's StringUtils to commons-lang's. We already had a 
dependency on commons-lang, and a 3 line fix is much better than introducing 
more shim code.

 AccumuloStorageHandler compile failure hadoop-1
 ---

 Key: HIVE-8363
 URL: https://issues.apache.org/jira/browse/HIVE-8363
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Reporter: Szehon Ho
Assignee: Josh Elser
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8363.1.patch


 There's an error about AccumuloStorageHandler compiling on hadoop-1.  It 
 seems the signature of split() is not the same.  Looks like we should use 
 another utility to fix this.
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-accumulo-handler: Compilation failure
 [ERROR] 
 /data/hive-ptest/working/apache-svn-trunk-source/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/columns/ColumnMapper.java:[57,52]
  no suitable method found for split(java.lang.String,char)
 [ERROR] method 
 org.apache.hadoop.util.StringUtils.split(java.lang.String,char,char) is not 
 applicable
 {code}




[jira] [Updated] (HIVE-8363) AccumuloStorageHandler compile failure hadoop-1

2014-10-06 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-8363:
-
Fix Version/s: 0.14.0
Affects Version/s: (was: 0.14.0)
   Status: Patch Available  (was: Open)

 AccumuloStorageHandler compile failure hadoop-1
 ---

 Key: HIVE-8363
 URL: https://issues.apache.org/jira/browse/HIVE-8363
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Reporter: Szehon Ho
Assignee: Josh Elser
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8363.1.patch


 There's an error about AccumuloStorageHandler compiling on hadoop-1.  It 
 seems the signature of split() is not the same.  Looks like we should use 
 another utility to fix this.
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
 on project hive-accumulo-handler: Compilation failure
 [ERROR] 
 /data/hive-ptest/working/apache-svn-trunk-source/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/columns/ColumnMapper.java:[57,52]
  no suitable method found for split(java.lang.String,char)
 [ERROR] method 
 org.apache.hadoop.util.StringUtils.split(java.lang.String,char,char) is not 
 applicable
 {code}




[jira] [Commented] (HIVE-7789) Documentation for AccumuloStorageHandler

2014-10-03 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158075#comment-14158075
 ] 

Josh Elser commented on HIVE-7789:
--

Thanks, [~leftylev]! That's great.

 Documentation for AccumuloStorageHandler
 

 Key: HIVE-7789
 URL: https://issues.apache.org/jira/browse/HIVE-7789
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation 
 on its usage.




[jira] [Resolved] (HIVE-7789) Documentation for AccumuloStorageHandler

2014-10-02 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HIVE-7789.
--
Resolution: Fixed

Got a first-round of documentation up at 
https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration that I'm 
fairly happy with.

 Documentation for AccumuloStorageHandler
 

 Key: HIVE-7789
 URL: https://issues.apache.org/jira/browse/HIVE-7789
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation 
 on its usage.




[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-29 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152661#comment-14152661
 ] 

Josh Elser commented on HIVE-8257:
--

bq. Could you add the optional tag to the jar:

Yeah, I can do that.

bq. Do you need the changes in the main pom.xml?

Declaring the version in dependencyManagement in the project pom is the proper 
place to do so. The way the two hadoop profiles are configured confuses that a 
little bit, but it is still the proper approach. If anything, I think the 
extra versions in the accumulo-handler/pom.xml are unnecessary, but I kept them 
there to follow suit with the other modules.

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Updated] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-29 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-8257:
-
Attachment: HIVE-8257.2.patch

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch, HIVE-8257.2.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-29 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152674#comment-14152674
 ] 

Josh Elser commented on HIVE-8257:
--

Thanks, [~vikram.dixit]!

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch, HIVE-8257.2.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-29 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152673#comment-14152673
 ] 

Josh Elser commented on HIVE-8257:
--

v2 patch attached with {{optional}} added to hadoop-client dependency.

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch, HIVE-8257.2.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Work started] (HIVE-7789) Documentation for AccumuloStorageHandler

2014-09-29 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7789 started by Josh Elser.

 Documentation for AccumuloStorageHandler
 

 Key: HIVE-7789
 URL: https://issues.apache.org/jira/browse/HIVE-7789
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation 
 on its usage.




[jira] [Commented] (HIVE-7789) Documentation for AccumuloStorageHandler

2014-09-29 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152799#comment-14152799
 ] 

Josh Elser commented on HIVE-7789:
--

Started working on this at 
https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration

 Documentation for AccumuloStorageHandler
 

 Key: HIVE-7789
 URL: https://issues.apache.org/jira/browse/HIVE-7789
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation 
 on its usage.




[jira] [Commented] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-28 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151247#comment-14151247
 ] 

Josh Elser commented on HIVE-8257:
--

(bump [~vikram.dixit])

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Created] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-25 Thread Josh Elser (JIRA)

Josh Elser created HIVE-8257:


 Summary: Accumulo introduces old hadoop-client dependency
 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0


It was brought to my attention that Accumulo is transitively bringing in some 
artifacts with the wrong version of Hadoop.

Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
its necessary dependencies. Because there is no dependency with the correct 
version in Hive, this introduces hadoop-2.2.0 dependencies.

A solution is to make sure that hadoop-client is set with the correct 
{{hadoop-20S.version}} or {{hadoop-23.version}}.

Snippet from {{mvn dependency:tree -Phadoop-2}}
{noformat}
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
hive-accumulo-handler ---
[INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
[INFO] +- commons-lang:commons-lang:jar:2.6:compile
[INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
...
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
...
{noformat}




[jira] [Updated] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-25 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-8257:
-
Attachment: HIVE-8257.1.patch

Adds hadoop-client to dependencyManagement in the parent pom and to the 
dependencies in the accumulo-handler pom. Verified that no old hadoop artifacts 
are in {{mvn dependency:tree}} and that the dist tarball no longer has old jars 
included in {{lib}}.
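
For readers without the attachment, the shape of such a change might look like the fragment below. This is a sketch under assumptions, not the contents of HIVE-8257.1.patch: the placement inside the hadoop-2 profile is a guess, and the {{optional}} element reflects the later v2 revision of the patch. Only the artifact coordinates and the {{hadoop-23.version}} property name come from this issue.

```xml
<!-- Parent pom.xml, assumed to sit inside the hadoop-2 profile -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop-23.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- accumulo-handler/pom.xml, assumed to sit inside the matching profile -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Redundant given dependencyManagement; kept to mirror the other modules -->
    <version>${hadoop-23.version}</version>
    <optional>true</optional>
  </dependency>
</dependencies>
```

The hadoop-1 profile would presumably use {{hadoop-20S.version}} in the same positions. Declaring hadoop-client directly pins its version, so Maven no longer resolves the 2.2.0 version that accumulo-core brings in transitively.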

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Updated] (HIVE-8257) Accumulo introduces old hadoop-client dependency

2014-09-25 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-8257:
-
Status: Patch Available  (was: Open)

 Accumulo introduces old hadoop-client dependency
 

 Key: HIVE-8257
 URL: https://issues.apache.org/jira/browse/HIVE-8257
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8257.1.patch


 It was brought to my attention that Accumulo is transitively bringing in some 
 artifacts with the wrong version of Hadoop.
 Accumulo-1.6.0 sets the Hadoop version at 2.2.0 and uses hadoop-client to get 
 its necessary dependencies. Because there is no dependency with the correct 
 version in Hive, this introduces hadoop-2.2.0 dependencies.
 A solution is to make sure that hadoop-client is set with the correct 
 {{hadoop-20S.version}} or {{hadoop-23.version}}.
 Snippet from {{mvn dependency:tree -Phadoop-2}}
 {noformat}
 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ 
 hive-accumulo-handler ---
 [INFO] org.apache.hive:hive-accumulo-handler:jar:0.14.0-SNAPSHOT
 [INFO] +- commons-lang:commons-lang:jar:2.6:compile
 [INFO] +- commons-logging:commons-logging:jar:1.1.3:compile
 [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
 ...
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
 [INFO] |  |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
 ...
 {noformat}




[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-22 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144194#comment-14144194
 ] 

Josh Elser commented on HIVE-7950:
--

Thanks for your help, [~sershe]!

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
 HIVE-7950.4.patch, HIVE-7950.5.patch, hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. It seems that classes which were added to 
 tmpjars weren't making it into the container. When a Tez Session is already 
 open, as is the normal case when simply using the `hive` command, the 
 resources aren't added.
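
The check that appears to be missing for an already-open session can be sketched as a set difference between the jars the query needs and the jars the session already has. This is a minimal illustration, assuming hypothetical names (SessionResourceCheck, missingResources) and plain string jar names; it is not the Hive or Tez API, and "hive-exec.jar" is an assumed placeholder for whatever the session started with.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class SessionResourceCheck {
    /**
     * Returns the jars required by the query that the open session has not
     * localized yet. Neither input set is modified.
     */
    public static Set<String> missingResources(Set<String> sessionJars, Set<String> requiredJars) {
        Set<String> missing = new LinkedHashSet<>(requiredJars);
        missing.removeAll(sessionJars);
        return missing;
    }

    public static void main(String[] args) {
        // Jars the session was opened with (assumed for illustration).
        Set<String> sessionJars = new LinkedHashSet<>(Arrays.asList("hive-exec.jar"));
        // Jars the storage handler added to tmpjars for this query.
        Set<String> requiredJars = new LinkedHashSet<>(Arrays.asList(
                "hive-exec.jar", "accumulo-core-1.6.0.jar", "zookeeper-3.4.6.jar"));
        System.out.println(missingResources(sessionJars, requiredJars));
    }
}
```

If the result is non-empty, the session either needs the extra resources added to it or needs to be restarted with them, which are the two approaches explored in the patches on this issue.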




[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-19 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Attachment: HIVE-7950.5.patch

Fixed your nit, Sergey. Thanks for taking the time to review -- much 
appreciated.

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
 HIVE-7950.4.patch, HIVE-7950.5.patch, hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. It seems that classes which were added to 
 tmpjars weren't making it into the container. When a Tez Session is already 
 open, as is the normal case when simply using the `hive` command, the 
 resources aren't added.




[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-18 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Attachment: HIVE-7950.4.patch

Updated patch with feedback from Sergey on RB.

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
 HIVE-7950.4.patch, hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. It seems that classes which were added to 
 tmpjars weren't making it into the container. When a Tez Session is already 
 open, as is the normal case when simply using the `hive` command, the 
 resources aren't added.




[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-17 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137640#comment-14137640
 ] 

Josh Elser commented on HIVE-7950:
--

Sure thing. RB is linked.

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
 hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. It seems that classes which were added to 
 tmpjars weren't making it into the container. When a Tez Session is already 
 open, as is the normal case when simply using the `hive` command, the 
 resources aren't added.




[jira] [Commented] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez

2014-09-17 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138144#comment-14138144
 ] 

Josh Elser commented on HIVE-7984:
--

Thanks, Sushanth -- much appreciated.

 AccumuloOutputFormat Configuration items from StorageHandler not re-set in 
 Configuration in Tez
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch, HIVE-7984.1.patch


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from the {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect from the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extrapolate connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers which means that this exact approach is not 
 sufficient.
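
The general idea of re-setting the storage handler's job properties on the configuration that the task sees can be sketched as below. Plain maps stand in for the Hadoop Configuration and for the inputJobProperties/outputJobProperties maps, and the property keys are hypothetical; this is an illustration of the idea, not the change in the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JobPropertiesCopy {
    /**
     * Copies every storage-handler job property into the job configuration,
     * overwriting any stale value already present under the same key.
     */
    public static void applyJobProperties(Map<String, String> jobProperties,
                                          Map<String, String> jobConf) {
        for (Map.Entry<String, String> entry : jobProperties.entrySet()) {
            jobConf.put(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        // Hypothetical connection properties gathered by the storage handler.
        Map<String, String> jobProperties = new LinkedHashMap<>();
        jobProperties.put("example.accumulo.instance", "test-instance");
        jobProperties.put("example.accumulo.zookeepers", "localhost:2181");

        // Stand-in for the configuration handed to the task container.
        Map<String, String> jobConf = new LinkedHashMap<>();
        applyJobProperties(jobProperties, jobConf);
        System.out.println(jobConf);
    }
}
```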




[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-16 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136784#comment-14136784
 ] 

Josh Elser commented on HIVE-7950:
--

Test failures appear unrelated to me. Can anyone give this a review for me?

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
 hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. It seems that classes which were added to 
 tmpjars weren't making it into the container. When a Tez Session is already 
 open, as is the normal case when simply using the `hive` command, the 
 resources aren't added.




[jira] [Commented] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez

2014-09-16 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136787#comment-14136787
 ] 

Josh Elser commented on HIVE-7984:
--

Test failure appears unrelated to me. Can anyone give this a review? It's a 
rather straightforward change.

 AccumuloOutputFormat Configuration items from StorageHandler not re-set in 
 Configuration in Tez
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch, HIVE-7984.1.patch


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from the {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect from the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extrapolate connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers which means that this exact approach is not 
 sufficient.




[jira] [Updated] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez

2014-09-10 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7984:
-
Attachment: HIVE-7984-1.patch

Same changes; I named the original attachment incorrectly. Fixing the suffix to 
trigger HIVE-QA.

 AccumuloOutputFormat Configuration items from StorageHandler not re-set in 
 Configuration in Tez
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from the {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect from the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extrapolate connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers which means that this exact approach is not 
 sufficient.




[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-10 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Attachment: HIVE-7950.3.patch

Updated patch. Needed to make a small change after getting past the Tez bug. 
Added some more unit tests and tried to clean things up.

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
 hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. It seems that classes which were added to 
 tmpjars weren't making it into the container. When a Tez Session is already 
 open, as is the normal case when simply using the `hive` command, the 
 resources aren't added.




[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Attachment: hive-7950-tez-WIP.diff

I took a look at the tez branch to see if I could add more resources to an 
existing session as you described, [~sershe]. Looking at the javadoc, I feel 
like this patch should work, but the query still errors out when the map inside 
the dag fails due to missing classes.

I can see that the dag does get the extra jars localized:
{noformat}
2014-09-08 23:20:34,823 INFO [AsyncDispatcher event handler] 
org.apache.tez.dag.app.dag.impl.DAGImpl: Added additional resources : 
[[file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-fate-1.6.0.jar,
 
file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-core-1.6.0.jar,
 
file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-trace-1.6.0.jar,
 
file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-start-1.6.0.jar,
 
file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/zookeeper-3.4.6.jar]]
 to classpath
{noformat}

But I'm still getting a NoClassDefFoundError on a class which is in 
accumulo-core.jar:
{noformat}
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:183)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:384)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:281)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:73)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
... 12 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at

[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127407#comment-14127407
]

Josh Elser commented on HIVE-7950:
--

Ok, I figured a bit more out here. I believe that the AM *is* correctly getting
the extra jars from the storage handler as expected. The subsequent errors are
coming from the containers that are started to actually run the DAG (rather
than from the coordination done by the Tez AM).

The interesting part is that the patch (HIVE-7950-1.diff), which starts a brand
new Session, results in a successful query. It seems like maybe Tez isn't
passing the extra resources we added to the running session (AM) in Hive along
to the DAG containers that actually run the query. I have no idea at this point
if this is a problem in how Hive is using Tez or if it's a bug in Tez
itself...

StorageHandler resources aren't added to Tez Session if already Session is
already Open
---

Key: HIVE-7950
URL: https://issues.apache.org/jira/browse/HIVE-7950
Project: Hive
Issue Type: Bug
Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0

Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff

Was trying to run some queries using the AccumuloStorageHandler with the Tez
execution engine. Classes which were added to tmpjars weren't making it into
the container. When a Tez Session is already open, as is the normal case when
simply using the `hive` command, the resources aren't added.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127440#comment-14127440
]

Josh Elser commented on HIVE-7950:
--

Sure thing, [~gopalv].

I don't actually have to do any extra {{ADD JAR}} commands. The
AccumuloStorageHandler constructs a list of jars that need to be passed along
to the execution engine (via tmpjars in the Hadoop configuration). With the
'yarn' execution.engine, this works just fine -- the resources are localized
and added to the Map/Reduce containers and things are great.

When I try to run with 'tez', there are a few issues. The first is that, if a
TezSessionState was already opened (e.g. as happens when I just open the hive
shell), it will have been started without those extra 'tmpjars' resources from
the StorageHandler and the query will fail because we need those jars.

Sergey mentioned that Tez 0.5.0 had a new method that would allow more
resources to be added to an already started TezClient
({{TezClient#addAppMasterLocalFiles(Map<String, LocalResource>)}}).
Implementing this (in the hive-7950-tez-WIP.diff attachment) appears to have
successfully added the extra jars from the StorageHandler to the DAGAppMaster,
but the containers started to actually run the query are missing those extra
jars.
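
To make the symptom concrete, here is a minimal sketch in plain Java. It is a
toy model, not Tez code: ToySession, ToyDag, taskCanLoad, the jar name and the
HDFS path are all made up for illustration, and it simply assumes that task
containers only see resources attached to the DAG, which is what the behavior
described above suggests.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy model only: ToySession and ToyDag are hypothetical stand-ins, not Tez classes.
class ResourceScopeSketch {

  /** Stand-in for an open session: files localized for the application master. */
  static class ToySession {
    final Map<String, String> appMasterFiles = new HashMap<>();

    void addAppMasterLocalFiles(Map<String, String> files) {
      appMasterFiles.putAll(files);
    }
  }

  /** Stand-in for a DAG: files localized for the task containers that run it. */
  static class ToyDag {
    final Map<String, String> taskFiles = new HashMap<>();

    void addTaskLocalFiles(Map<String, String> files) {
      taskFiles.putAll(files);
    }
  }

  /** Assumption of this sketch: a task only sees what was attached to the DAG. */
  static boolean taskCanLoad(ToyDag dag, String jarName) {
    return dag.taskFiles.containsKey(jarName);
  }

  public static void main(String[] args) {
    // Hypothetical jar name and path, for illustration only.
    Map<String, String> jars =
        Collections.singletonMap("accumulo-core.jar", "hdfs:///tmp/accumulo-core.jar");

    ToySession session = new ToySession();
    ToyDag dag = new ToyDag();

    // Adding the jars to the session alone leaves the task containers without them.
    session.addAppMasterLocalFiles(jars);
    System.out.println("AM only, task sees jar: " + taskCanLoad(dag, "accumulo-core.jar"));

    // Attaching the same jars to the DAG makes them visible to the tasks.
    dag.addTaskLocalFiles(jars);
    System.out.println("AM + DAG, task sees jar: " + taskCanLoad(dag, "accumulo-core.jar"));
  }
}
```

In this model the jars have to be registered in two places before a task can
load them; whether real Tez behaves exactly this way is not something the
sketch can establish.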

Does that make sense?


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127485#comment-14127485
 ] 

Josh Elser commented on HIVE-7950:
--

You're the man. That was exactly what I needed. I completely missed that I 
needed to add the resources to the DAG as well.

I'll clean up my changes and post an updated patch here later today after I 
poke/prod it some more.


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127789#comment-14127789
 ] 

Josh Elser commented on HIVE-7950:
--

I have found one corner case which I'm still trying to maneuver around. The DAG 
code fails if local resources are added that already exist among the extra 
resources that are already going to be added. The problem is that you can't 
find out what resources are already set to be localized for a DAG.

I can call {{DAG#addTaskLocalFiles(Map<String, LocalResource>)}} to add 
resources, but that will fail if any of them happen to be already loaded. This 
seems to be lacking compared to Vertex, which also has a 
{{getTaskLocalFiles()}} method. That's a Tez nit -- I can open a JIRA over 
there if you think that's necessary (or not already fixed upstream).

This is the actual stack I'm trying to work around:
{noformat}
org.apache.tez.dag.api.TezUncheckedException: Attempting to add duplicate 
resource: accumulo-fate-1.6.0.jar
at 
org.apache.tez.common.TezCommonUtils.addAdditionalLocalResources(TezCommonUtils.java:307)
at org.apache.tez.dag.api.Vertex.addTaskLocalFiles(Vertex.java:256)
at org.apache.tez.dag.api.DAG.createDag(DAG.java:643)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:372)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:342)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:385)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:209)
...
{noformat}
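
For illustration, the duplicate check can be modeled in a few lines of plain
Java. This is only a sketch under stated assumptions: StrictDag and
TrackingAdder are hypothetical names, IllegalStateException stands in for the
TezUncheckedException in the stack above, and the HDFS path is made up. Since
the toy DAG offers no getter, the caller keeps its own record of the names it
has handed over.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: StrictDag is a hypothetical stand-in, not the Tez DAG class.
class DuplicateResourceSketch {

  /** Rejects a resource name that was already added, and offers no getter. */
  static class StrictDag {
    private final Map<String, String> taskFiles = new HashMap<>();

    void addTaskLocalFiles(Map<String, String> files) {
      for (Map.Entry<String, String> e : files.entrySet()) {
        if (taskFiles.containsKey(e.getKey())) {
          throw new IllegalStateException(
              "Attempting to add duplicate resource: " + e.getKey());
        }
        taskFiles.put(e.getKey(), e.getValue());
      }
    }
  }

  /** Caller-side record of the names handed to the DAG so far. */
  static class TrackingAdder {
    private final StrictDag dag;
    private final Set<String> alreadyAdded = new HashSet<>();

    TrackingAdder(StrictDag dag) {
      this.dag = dag;
    }

    /** Adds only names not seen before; returns how many were actually added. */
    int addMissing(Map<String, String> files) {
      Map<String, String> fresh = new LinkedHashMap<>();
      for (Map.Entry<String, String> e : files.entrySet()) {
        // Set.add returns false when the name was recorded earlier.
        if (alreadyAdded.add(e.getKey())) {
          fresh.put(e.getKey(), e.getValue());
        }
      }
      dag.addTaskLocalFiles(fresh);
      return fresh.size();
    }
  }

  public static void main(String[] args) {
    StrictDag dag = new StrictDag();
    TrackingAdder adder = new TrackingAdder(dag);

    Map<String, String> jars = new HashMap<>();
    // The jar name comes from the stack trace; the path is hypothetical.
    jars.put("accumulo-fate-1.6.0.jar", "hdfs:///tmp/accumulo-fate-1.6.0.jar");

    System.out.println("added on first call: " + adder.addMissing(jars));
    System.out.println("added on second call: " + adder.addMissing(jars));
  }
}
```

Bookkeeping like this only helps when the caller is the sole party adding
resources; it does nothing if the library itself re-adds them internally.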

Essentially, we have a DAG, we tried to submit it to the Session we have, but 
the underlying application was dead (for testing purposes, because I 
{{kill}}'ed it, but this happens if you just wait long enough). The code gets a 
{{SessionNotRunning}} exception, tries to {{closeAndOpen}} 


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127798#comment-14127798
 ] 

Josh Elser commented on HIVE-7950:
--

Ugh, published too soon:

The code gets a {{SessionNotRunning}} exception, tries to {{closeAndOpen}} 
session, and then ultimately fails when it goes to submit the DAG. I believe I 
need to figure out a way to ensure the DAG doesn't have the extra local 
resources (from the StorageHandler) in the case where we start up a new 
Session, and then DAG would get the resources from that new Session (as opposed 
to the old session which didn't have the extra resources to begin with), but 
I'm not 100% sure yet.


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127901#comment-14127901
 ] 

Josh Elser commented on HIVE-7950:
--

I think I finally got to the bottom of this, and it is broken with Tez-0.5.0.

TezTask needs to be altered (as described by the previous discussion) to add 
the necessary StorageHandler resources to the DAG using

{code}
dag.addTaskLocalFiles(localResources);
{code}

For the case of the AccumuloStorageHandler, this adds the jars necessary to 
connect to Accumulo to {{commonTaskLocalFiles}} in {{DAG}}. Then, {{TezTask}} 
will proceed to eventually submit the DAG to be run.

{code}
try {
  // ready to start execution on the cluster
  sessionState.getSession().addAppMasterLocalFiles(resourceMap);
  dagClient = sessionState.getSession().submitDAG(dag);
} catch (SessionNotRunning nr) {
  console.printInfo("Tez session was closed. Reopening...");

  // close the old one, but keep the tmp files around
  TezSessionPoolManager.getInstance().closeAndOpen(sessionState, this.conf);
  console.printInfo("Session re-established.");

  dagClient = sessionState.getSession().submitDAG(dag);
}
{code}

Consider the case where we had a Session already created for the user, but the 
underlying application has exited, say due to a timeout. In the try block, we 
try to submit our DAG to run. In doing so, TezClient creates a DAGPlan from the 
DAG

{code}
DAGPlan dagPlan = dag.createDag(amConfig.getTezConfiguration());
{code}

When we create a {{DAGPlan}} from the {{DAG}}, we modify the {{DAG}} instance, 
adding the local resources to each {{Vertex}} in the {{DAG}}. Then, we identify 
that the underlying application has already died, and that we need to 
{{closeAndOpen}} a new Session. So, we get the {{SessionNotRunning}} exception, 
pop out to the catch block, and end up creating another {{DAGPlan}} from the 
{{DAG}} _that was already altered by the last attempt to submit it_.

As I'm looking at it, I don't think there's anything I can do at the Hive level 
to fix this, because {{TezClient}} will always try to add duplicate resources 
to the {{Vertex}} instances in a {{DAG}}, which throws an exception and tanks 
the query.
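
The failure mode can be reproduced with a toy model in plain Java. Everything
here is hypothetical (ToyDag, ToyVertex, createPlan, createPlanIdempotent, the
HDFS path) and nothing is taken from the Tez source; the sketch only assumes,
as described above, that building the plan copies the DAG-level files into each
vertex and rejects duplicates. The idempotent variant is one possible way to
make a retry safe, not a description of how Tez does or should do it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and do not reproduce Tez internals.
class RetrySubmitSketch {

  static class ToyVertex {
    final Map<String, String> taskFiles = new HashMap<>();
  }

  static class ToyDag {
    final Map<String, String> commonTaskLocalFiles = new HashMap<>();
    final List<ToyVertex> vertices = new ArrayList<>();

    /** Mutating plan creation: copies common files into every vertex, rejecting duplicates. */
    int createPlan() {
      int copied = 0;
      for (ToyVertex v : vertices) {
        for (Map.Entry<String, String> e : commonTaskLocalFiles.entrySet()) {
          if (v.taskFiles.containsKey(e.getKey())) {
            throw new IllegalStateException(
                "Attempting to add duplicate resource: " + e.getKey());
          }
          v.taskFiles.put(e.getKey(), e.getValue());
          copied++;
        }
      }
      return copied;
    }

    /** Idempotent variant: skips files a vertex already holds, so a retry is harmless. */
    int createPlanIdempotent() {
      int copied = 0;
      for (ToyVertex v : vertices) {
        for (Map.Entry<String, String> e : commonTaskLocalFiles.entrySet()) {
          if (v.taskFiles.putIfAbsent(e.getKey(), e.getValue()) == null) {
            copied++;
          }
        }
      }
      return copied;
    }
  }

  /** Builds a one-vertex DAG with a single common file (hypothetical path). */
  static ToyDag newDag() {
    ToyDag dag = new ToyDag();
    dag.commonTaskLocalFiles.put(
        "accumulo-fate-1.6.0.jar", "hdfs:///tmp/accumulo-fate-1.6.0.jar");
    dag.vertices.add(new ToyVertex());
    return dag;
  }

  public static void main(String[] args) {
    ToyDag dag = newDag();
    dag.createPlan(); // first submit attempt mutates the DAG
    try {
      dag.createPlan(); // retry against the reopened session reuses the same DAG
    } catch (IllegalStateException e) {
      System.out.println("retry failed: " + e.getMessage());
    }

    ToyDag other = newDag();
    other.createPlanIdempotent();
    other.createPlanIdempotent(); // second call finds nothing left to copy
    System.out.println("idempotent retry succeeded");
  }
}
```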


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127923#comment-14127923
 ] 

Josh Elser commented on HIVE-7950:
--

Yeah, you're totally right. I was getting hung up on this edge case and forgot 
about the first issue to fix. I'll open a Tez JIRA for the above.


[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-09 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Attachment: HIVE-7950.2.patch

Patch against the tez branch which attempts to ensure that the Tez AM and the 
DAG both have the necessary extra local resources as required by a 
StorageHandler. Tried to add some tests which ensure modifications to TezTask 
work as expected.

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, 
 hive-7950-tez-WIP.diff


 Was trying to run some queries using the AccumuloStorageHandler with the 
 Tez execution engine. Classes which were added to tmpjars weren't making it 
 into the container. When a Tez Session is already open, as is the normal 
 case when simply using the `hive` command, the resources aren't added.




[jira] [Updated] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez

2014-09-06 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7984:
-
Summary: AccumuloOutputFormat Configuration items from StorageHandler not 
re-set in Configuration in Tez  (was: Configuration items from StorageHandler 
not passed to Tez Configuration)

 AccumuloOutputFormat Configuration items from StorageHandler not re-set in 
 Configuration in Tez
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7984-1.diff


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from the {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect between the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extrapolate connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers which means that this exact approach is not 
 sufficient.
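
 As a rough illustration of the idea in the summary (re-setting the
 StorageHandler's job properties on the configuration the task actually
 reads), here is a sketch in plain Java. Plain maps stand in for the job
 properties and for the Hadoop Configuration; applyJobProperties is a
 hypothetical helper, and the property key and value are only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for job properties and for a Hadoop Configuration.
class JobPropertiesSketch {

  /**
   * Returns a copy of the task-side configuration with every job property applied on top.
   * Hypothetical helper; it does not correspond to a specific Hive or Tez method.
   */
  static Map<String, String> applyJobProperties(
      Map<String, String> taskConf, Map<String, String> jobProperties) {
    Map<String, String> merged = new HashMap<>(taskConf);
    merged.putAll(jobProperties);
    return merged;
  }

  public static void main(String[] args) {
    // Illustrative key and value; treat both as assumptions.
    Map<String, String> jobProperties = new HashMap<>();
    jobProperties.put("accumulo.instance.name", "accumulo");

    Map<String, String> taskConf = new HashMap<>();
    taskConf.put("hive.execution.engine", "tez");

    // Without the merge, the task-side configuration lacks the connection setting.
    System.out.println("before: " + taskConf.containsKey("accumulo.instance.name"));

    Map<String, String> merged = applyJobProperties(taskConf, jobProperties);
    System.out.println("after: " + merged.containsKey("accumulo.instance.name"));
  }
}
```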




[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-05 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123529#comment-14123529
 ] 

Josh Elser commented on HIVE-7950:
--

Ah, the tez branch is on 0.5.0 (trunk is on 0.4.1). What's the lifecycle on the 
tez branch? Is it occasionally merged into trunk?

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff


 Was trying to run some queries using the AccumuloStorageHandler with the 
 Tez execution engine. Classes which were added to tmpjars weren't making it 
 into the container. When a Tez Session is already open, as is the normal 
 case when simply using the `hive` command, the resources aren't added.




[jira] [Commented] (HIVE-7950) AccumuloStorageHandler doesn't work with Hive on Tez

2014-09-04 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121463#comment-14121463
 ] 

Josh Elser commented on HIVE-7950:
--

Going to break the bigger issue down into more manageable pieces to fix.

 AccumuloStorageHandler doesn't work with Hive on Tez
 

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. Some things that I've noticed already (probably 
 more as I can get past the ones I already found):
 * Jars added to the classpath via tmpjars (which is done by the copied HBase 
 Utils class) aren't available in the Tez Map task -- need to compare to 
 HBaseStorageHandler and see if there is something magic happening
 * Configuration generated by the AccumuloStorageHandler doesn't make it all 
 the way to the Configuration passed to the AccumuloOutputFormat (probably 
 AccumuloInputFormat, too)
 {noformat}
 2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183)
   ... 15 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459)
   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
   ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
   at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286)
   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496)
   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448)
   ... 23 more
 Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
   at

[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Summary: StorageHandler resources aren't added to Tez Session if already 
Session is already Open  (was: AccumuloStorageHandler doesn't work with Hive on 
Tez)

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0



[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Description: 
Was trying to run some queries using the AccumuloStorageHandler with the Tez 
execution engine. Classes which were added to tmpjars weren't making it into 
the container. When a Tez Session is already open, as is the normal case when 
simply using the `hive` command, the resources aren't added.


  was:
Was trying to run some queries using the AccumuloStorageHandler when using the 
Tez execution engine. Some things that I've noticed already (probably more as I 
can get past the ones I already found):

* Jars added to the classpath via tmpjars (which is done by the copied HBase 
Utils class) aren't available in the Tez Map task -- need to compare to 
HBaseStorageHandler and see if there is something magic happening
* Configuration generated by the AccumuloStorageHandler doesn't make it all the 
way to the Configuration passed to the AccumuloOutputFormat (probably 
AccumuloInputFormat, too)

{noformat}
2014-09-03 01:28:45,357 ERROR [TezChild] org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {row:a,col:d}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183)
... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448)
... 23 more
Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has not been configured for AccumuloOutputFormat
at org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:553)
at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.getHiveRecordWriter(HivePassThroughOutputFormat.java:113)
at

[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Status: Patch Available  (was: Open)

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. Some classes which were added to tmpjars weren't 
 making it into the container. When a Tez Session is already open, as is the 
 normal case when simply using the `hive` command, the resources aren't added.
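
The check implied by this report can be sketched roughly as follows. This is a minimal illustration, not the Hive or Tez API: the class SessionResourceCheck, its methods, and the jar names are all hypothetical, and resources are modeled as plain strings. The idea is only that an already-open session is safe to reuse when every resource the query needs is already known to it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hive or Tez.
final class SessionResourceCheck {
  private SessionResourceCheck() {}

  /** Returns the resources the query needs that the open session does not have. */
  static Set<String> missingResources(Set<String> sessionResources, Set<String> queryResources) {
    Set<String> missing = new LinkedHashSet<String>(queryResources);
    missing.removeAll(sessionResources);
    return missing;
  }

  /** An already-open session can be reused only if no needed resource is missing. */
  static boolean canReuseSession(Set<String> sessionResources, Set<String> queryResources) {
    return missingResources(sessionResources, queryResources).isEmpty();
  }
}
```

When canReuseSession returns false, a caller would either have to add the missing resources to the session or open a new one; which of the two is possible depends on the Tez version in use.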



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7950:
-
Attachment: HIVE-7950-1.diff

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. Some classes which were added to tmpjars weren't 
 making it into the container. When a Tez Session is already open, as is the 
 normal case when simply using the `hive` command, the resources aren't added.




[jira] [Created] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration

2014-09-04 Thread Josh Elser (JIRA)

Josh Elser created HIVE-7984:


 Summary: Configuration items from StorageHandler not passed to Tez 
Configuration
 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


Ran AccumuloStorageHandler queries with Tez and found that configuration 
elements that are pulled from {{-hiveconf}} and passed to the 
inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't 
available inside of the Tez container.

I'm guessing that there is a disconnect between the configuration that the 
StorageHandler creates and what the Tez container sees.

The HBaseStorageHandler likely doesn't run into this because it expects to have 
hbase-site.xml available via tmpjars (and can extract connection information 
from that file). Accumulo's site configuration file is not meant to be shared 
with consumers, which means that this exact approach is not sufficient.
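
To make the suspected disconnect concrete, here is a minimal sketch that models both configurations as plain maps. Everything in it is an assumption for illustration: JobPropertiesSketch and mergeJobProperties are hypothetical names, not Hive API, and the property keys in the usage note are examples only. It shows the step that appears to be missing on the Tez path, namely copying the handler's job properties into the configuration the task sees.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Hive's PlanUtils or Configuration API.
final class JobPropertiesSketch {
  private JobPropertiesSketch() {}

  /**
   * Returns a copy of the task-side configuration with the handler's job
   * properties added. Keys already set on the task side are left unchanged.
   */
  static Map<String, String> mergeJobProperties(Map<String, String> taskConf,
                                                Map<String, String> jobProperties) {
    Map<String, String> merged = new LinkedHashMap<String, String>(taskConf);
    for (Map.Entry<String, String> e : jobProperties.entrySet()) {
      if (!merged.containsKey(e.getKey())) {
        merged.put(e.getKey(), e.getValue());
      }
    }
    return merged;
  }
}
```

For example, merging job properties that contain a key such as accumulo.instance.name into a task configuration that lacks it yields a configuration an OutputFormat could read the instance name from. Without the merge, the lookup would come back empty, which matches the "Instance has not been configured" symptom.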




[jira] [Commented] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration

2014-09-04 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121820#comment-14121820
 ] 

Josh Elser commented on HIVE-7984:
--

I think the cause is that neither 
{{PlanUtils.configureInputJobPropertiesForStorageHandler(TableDesc)}} nor 
{{PlanUtils.configureOutputJobPropertiesForStorageHandler(TableDesc)}} is 
called in the Tez pipeline. Still trying to figure out where exactly those 
calls should go.

 Configuration items from StorageHandler not passed to Tez Configuration
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect between the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extract connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers, which means that this exact approach is not 
 sufficient.




[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-04 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122275#comment-14122275
 ] 

Josh Elser commented on HIVE-7950:
--

Nope, 0.4.1-incubating. Being able to add more resources to an existing session 
would certainly be preferable, though.

 StorageHandler resources aren't added to Tez Session if already Session is 
 already Open
 ---

 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7950-1.diff


 Was trying to run some queries using the AccumuloStorageHandler when using 
 the Tez execution engine. Some classes which were added to tmpjars weren't 
 making it into the container. When a Tez Session is already open, as is the 
 normal case when simply using the `hive` command, the resources aren't added.




[jira] [Commented] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration

2014-09-04 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122315#comment-14122315
 ] 

Josh Elser commented on HIVE-7984:
--

After a bunch of digging, I found that I could still work around this via the 
custom OutputFormat for Accumulo without having to actually dig into the calls 
to the StorageHandler with respect to the execution engine.

 Configuration items from StorageHandler not passed to Tez Configuration
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect between the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extract connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers, which means that this exact approach is not 
 sufficient.




[jira] [Updated] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration

2014-09-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7984:
-
Attachment: HIVE-7984-1.diff

Fixes the OutputFormat to be a little more resilient. Also removed a really 
nasty log.info statement that shouldn't have been committed in the first place.

 Configuration items from StorageHandler not passed to Tez Configuration
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7984-1.diff


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect between the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extract connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers, which means that this exact approach is not 
 sufficient.




[jira] [Updated] (HIVE-7984) Configuration items from StorageHandler not passed to Tez Configuration

2014-09-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7984:
-
Status: Patch Available  (was: Open)

 Configuration items from StorageHandler not passed to Tez Configuration
 ---

 Key: HIVE-7984
 URL: https://issues.apache.org/jira/browse/HIVE-7984
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7984-1.diff


 Ran AccumuloStorageHandler queries with Tez and found that configuration 
 elements that are pulled from {{-hiveconf}} and passed to the 
 inputJobProperties or outputJobProperties by the AccumuloStorageHandler 
 aren't available inside of the Tez container.
 I'm guessing that there is a disconnect between the configuration that the 
 StorageHandler creates and what the Tez container sees.
 The HBaseStorageHandler likely doesn't run into this because it expects to 
 have hbase-site.xml available via tmpjars (and can extract connection 
 information from that file). Accumulo's site configuration file is not meant 
 to be shared with consumers, which means that this exact approach is not 
 sufficient.




[jira] [Commented] (HIVE-7928) There is no catch statement in Utils#updateMap

2014-09-02 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118298#comment-14118298
 ] 

Josh Elser commented on HIVE-7928:
--

[~skrho], I don't follow the reason for your change. The point of the 
try/finally is to ensure that the {{ZipFile}} is closed before the method 
returns. The code also does not handle the IOException that can be thrown and 
lets the caller deal with that exception ({{throws IOException}}). A try block 
does not always require a catch statement. This method looks fine to me as-is.
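
As a small illustration of that pattern (a stand-in written for this thread, not the actual {{Utils#updateMap}} code; the class and method names are hypothetical), a try/finally with no catch closes the {{ZipFile}} on every path while still letting the IOException reach the caller:

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical stand-in for illustration; not the Hive Utils class.
final class ZipEntryCounter {
  private ZipEntryCounter() {}

  /**
   * Counts the entries in a zip file. Any IOException propagates to the
   * caller, and the finally block closes the ZipFile whether or not the
   * body of the try completes normally.
   */
  static int countEntries(File file) throws IOException {
    ZipFile zip = new ZipFile(file);
    try {
      return zip.size();
    } finally {
      zip.close();
    }
  }
}
```

The caller still sees the original exception and its stack trace, so nothing about the failure is hidden by omitting the catch.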

 There is no catch statement in Utils#updateMap
 --

 Key: HIVE-7928
 URL: https://issues.apache.org/jira/browse/HIVE-7928
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: skrho
Assignee: skrho
Priority: Minor
 Attachments: HIVE-7928_001.patch


 There is no catch statement in the Utils class (in 
 accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java, line 
 148).
 Without a catch statement, we can't know why an exception happened.
 I think we should add a catch statement and throw the exception.




[jira] [Commented] (HIVE-7929) close of ZipOutputStream in Utils#jarDir() should be placed in finally block

2014-09-02 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118766#comment-14118766
 ] 

Josh Elser commented on HIVE-7929:
--

The catch block is unnecessary. I think the finally block should only contain 
{{zos.close()}} with {{zos.closeEntry();}} and {{zipDir(dir, relativePath, zos, 
true);}} moved inside of the try block. For example:

{code}
try {
 ...
  zos.closeEntry();
  zipDir(dir, relativePath, zos, true);
} finally {
  zos.close();
}
{code}

Alternatively, it might be cleaner to do a try/finally in {{createJar(File, 
File)}} to close the JarOutputStream and completely remove the {{close()}} call 
in {{jarDir(File, String, ZipOutputStream)}}.
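
A rough sketch of that alternative, with ownership of the stream kept in one place. The names JarSketch, writeEntry and createArchive are hypothetical and only mirror the roles of {{jarDir}} and {{createJar}}; the in-memory stream is an assumption made to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical names for illustration; not the Hive or HBase Utils code.
final class JarSketch {
  private JarSketch() {}

  /** Writes one entry and leaves the stream open; the creator of the stream closes it. */
  static void writeEntry(ZipOutputStream zos, String name, String body) throws IOException {
    zos.putNextEntry(new ZipEntry(name));
    zos.write(body.getBytes(StandardCharsets.UTF_8));
    zos.closeEntry();
  }

  /** Creates the stream and closes it in a finally block, even if writing fails. */
  static byte[] createArchive(String name, String body) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ZipOutputStream zos = new ZipOutputStream(bytes);
    try {
      writeEntry(zos, name, body);
    } finally {
      zos.close();
    }
    return bytes.toByteArray();
  }
}
```

With this split, the helper that writes entries never has to reason about whether the stream is still open, and there is exactly one close() call to review.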

Also, it may interest you that this code was borrowed from HBase. They may 
benefit from these same improvements in their codebase -- I forget which HBase 
version I copied this from, though.

 close of ZipOutputStream in Utils#jarDir() should be placed in finally block
 

 Key: HIVE-7929
 URL: https://issues.apache.org/jira/browse/HIVE-7929
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: skrho
Assignee: skrho
Priority: Minor
  Labels: patch
 Attachments: HIVE-7929_001.patch


 In accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java, 
 line 308:
 zos.closeEntry();
 zipDir(dir, relativePath, zos, true);
 zos.close();
 If an exception is thrown, the ZipOutputStream would be left unclosed.




[jira] [Created] (HIVE-7950) AccumuloStorageHandler doesn't work with Hive on Tez

2014-09-02 Thread Josh Elser (JIRA)

Josh Elser created HIVE-7950:


 Summary: AccumuloStorageHandler doesn't work with Hive on Tez
 Key: HIVE-7950
 URL: https://issues.apache.org/jira/browse/HIVE-7950
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler, Tez
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


Was trying to run some queries using the AccumuloStorageHandler when using the 
Tez execution engine. Some things that I've noticed already (probably more to 
come once I can get past the ones I already found):

* Jars added to the classpath via tmpjars (which is done by the copied HBase 
Utils class) aren't available in the Tez Map task -- need to compare to 
HBaseStorageHandler and see if there is something magic happening
* Configuration generated by the AccumuloStorageHandler doesn't make it all the 
way to the Configuration passed to the AccumuloOutputFormat (probably 
AccumuloInputFormat, too)

{noformat}
2014-09-03 01:28:45,357 ERROR [TezChild] 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {row:a,col:d}
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:195)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:161)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:309)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {row:a,col:d}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:183)
... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
java.lang.IllegalStateException: Instance has not been configured for 
AccumuloOutputFormat
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:459)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:540)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.lang.IllegalStateException: Instance has not been 
configured for AccumuloOutputFormat
at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:286)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:496)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:448)
... 23 more
Caused by: java.io.IOException: java.lang.IllegalStateException: Instance has 
not been configured for AccumuloOutputFormat
at 
org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.getRecordWriter(AccumuloOutputFormat.java:553)
at 
org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.getHiveRecordWriter(HivePassThroughOutputFormat.java:113)
at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:296)
at

[jira] [Created] (HIVE-7789) Documentation for AccumuloStorageHandler

2014-08-19 Thread Josh Elser (JIRA)

Josh Elser created HIVE-7789:


 Summary: Documentation for AccumuloStorageHandler
 Key: HIVE-7789
 URL: https://issues.apache.org/jira/browse/HIVE-7789
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0


HIVE-7068 introduces an AccumuloStorageHandler. We need to add documentation on 
its usage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-19 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103266#comment-14103266
 ] 

Josh Elser commented on HIVE-7068:
--

Certainly -- created HIVE-7789.

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch, 
 HIVE-7068.4.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.




[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-18 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7068:
-

Attachment: HIVE-7068.4.patch

Rebased patch (#4) on top of upstream changes. Changes over the last patch:

* Had to make some qtest fixes (after HIVE-7519)
* Deleted two commented lines I noticed in the source

Huge thank you to Nick, Sushanth, and Navis!

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch, 
 HIVE-7068.4.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.




[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-15 Thread Josh Elser (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099196#comment-14099196
]

Josh Elser commented on HIVE-7068:
--

[~ndimiduk], I agree with you completely. There's no reason that the column
mapping stuff needs to be separated as it is now. I tried to make the
ColumnMapping class hierarchy a bit cleaner over what was in the hbase-handler
(it looked like there were already comments in the hbase-handler code saying
that it would be good to clean it up in the future). I'd love to help converge
these.

Many thanks for taking the time to look through it.

Integrate AccumuloStorageHandler

Key: HIVE-7068
URL: https://issues.apache.org/jira/browse/HIVE-7068
Project: Hive
Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0

Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch

[Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to
HBase. Some [initial
work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done
to support querying an Accumulo table using Hive already. It is not a
complete solution as, most notably, the current implementation presently
lacks support for INSERTs.
I would like to polish up the AccumuloStorageHandler (presently based on
0.10), implement missing basic functionality and compare it to the
HBaseStorageHandler (to ensure that we follow the same general usage
patterns).
I've also been in communication with [~bfem] (the initial author) who
expressed interest in working on this again. I hope to coordinate efforts
with him.


[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-01 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082861#comment-14082861
 ] 

Josh Elser commented on HIVE-7068:
--

Any other Hive committers have some time to look at this and potentially help 
get this merged in? Would be greatly appreciated!

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.




[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler

2014-07-29 Thread Josh Elser (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Josh Elser updated HIVE-7068:
-

Attachment: HIVE-7068.2.patch

Minor updates to the patch:

* Removes unnecessary whitespace/javadoc
* Adds a better exception when Accumulo connection information isn't in the
hiveconf as required.
* Pulls in more upstream changes from trunk
* Fixes accumulo qtest after HIVE-5771

Also re-triggering Hive QA, which appears to have failed for other reasons on
the last patch. I'll update reviewboard as well if anyone wants to see the changes.

Integrate AccumuloStorageHandler

Key: HIVE-7068
URL: https://issues.apache.org/jira/browse/HIVE-7068
Project: Hive
Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0

Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch


[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler

2014-07-29 Thread Josh Elser (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Josh Elser updated HIVE-7068:
-

Attachment: HIVE-7068.3.patch

Sorry, found another minor issue with serialization of strings as compared to
what HBaseStorageHandler does. New patch allows binary encoding to be specified
on strings without error (falls back to UTF8 serialization). Added a test for
it too, and cleaned up some other nits I saw in fixing the bug.
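
The fallback described here might look roughly like the following. This is a sketch under assumptions: StringEncodingSketch, its Encoding enum and both methods are hypothetical names rather than the handler's real classes, and Hive types are modeled as plain type-name strings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical types for illustration; not the AccumuloStorageHandler classes.
final class StringEncodingSketch {
  enum Encoding { STRING, BINARY }

  private StringEncodingSketch() {}

  /**
   * Resolves the encoding actually used for a column. Requesting BINARY on a
   * string column is accepted without error and falls back to STRING.
   */
  static Encoding effectiveEncoding(String hiveType, Encoding requested) {
    if ("string".equalsIgnoreCase(hiveType)) {
      return Encoding.STRING;
    }
    return requested;
  }

  /** Serializes a string value as UTF-8 bytes, which is the STRING encoding here. */
  static byte[] serializeString(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }
}
```

Non-string columns keep whatever encoding was requested, so the fallback only changes behavior for the case that used to raise an error.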

Integrate AccumuloStorageHandler

Key: HIVE-7068
URL: https://issues.apache.org/jira/browse/HIVE-7068
Project: Hive
Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 0.14.0

Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch


[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler

2014-07-24 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7068:
-

Fix Version/s: 0.14.0

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
 Fix For: 0.14.0


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.




[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler

2014-07-24 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7068:
-

Attachment: HIVE-7068.1.patch

v1 of patch to add AccumuloStorageHandler.

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive already. It is not a 
 complete solution as, most notably, the current implementation presently 
 lacks support for INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.




[jira] [Updated] (HIVE-7068) Integrate AccumuloStorageHandler

2014-07-24 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-7068:
-

Status: Patch Available  (was: Open)

First stab at adding an AccumuloStorageHandler to Hive. This is a lot of code, 
so I'll try to outline the high level features.


* Builds on impl from Brian (as mentioned in description)
* Tended to mimic HBaseStorageHandler when it made sense to do so
* Predicate pushdown on rowid to query only relevant portions of Accumulo tables
* Predicate pushdown on non-rowid columns to filter server-side in Accumulo
* Support for external tables
* Hive Map pushdown to column family plus optional column qualifier prefix
* Binary and UTF8 serialization within Accumulo
* Extendable CompositeRowId and RowIdFactory interfaces for users
* Lots of unit tests
* A handful of borrowed qtests from HBaseStorageHandler
* Accumulo 1.5.1 and 1.6.0 support (version differences hidden from users via reflection)


[jira] [Created] (HIVE-7068) Integrate AccumuloStorageHandler

2014-05-15 Thread Josh Elser (JIRA)

Josh Elser created HIVE-7068:


 Summary: Integrate AccumuloStorageHandler
 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser


[Accumulo|http://accumulo.apache.org] is a BigTable clone which is similar to 
HBase. Some [initial 
work|https://github.com/bfemiano/accumulo-hive-storage-manager] has already 
been done to support querying an Accumulo table using Hive. It is not a 
complete solution; most notably, the current implementation lacks support 
for INSERTs.

I would like to polish up the AccumuloStorageHandler (presently based on 0.10), 
implement missing basic functionality and compare it to the HBaseStorageHandler 
(to ensure that we follow the same general usage patterns).

I've also been in communication with [~bfem] (the initial author) who expressed 
interest in working on this again. I hope to coordinate efforts with him.



