[jira] [Commented] (HIVE-784) Support uncorrelated subqueries in the WHERE clause
[ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705581#comment-13705581 ] Navis commented on HIVE-784: Added some comments

Support uncorrelated subqueries in the WHERE clause --- Key: HIVE-784 URL: https://issues.apache.org/jira/browse/HIVE-784 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ning Zhang Assignee: Matthew Weaver Attachments: HIVE-784.1.patch.txt

Hive currently supports views only in the FROM clause; some Facebook use cases suggest that Hive should also support subqueries, such as those connected by IN/EXISTS, in the WHERE clause.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3691) TestDynamicSerDe failed with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705679#comment-13705679 ] Hudson commented on HIVE-3691: -- Integrated in Hive-trunk-hadoop2 #282 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/282/]) HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li and Renata Ghisloti via Ashutosh Chauhan) (Revision 1501687) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501687 Files : * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/dynamic_type/TestDynamicSerDe.java

TestDynamicSerDe failed with IBM JDK Key: HIVE-3691 URL: https://issues.apache.org/jira/browse/HIVE-3691 Project: Hive Issue Type: Bug Affects Versions: 0.7.1, 0.8.0, 0.9.0 Environment: ant-1.8.2, IBM JDK 1.6 Reporter: Bing Li Assignee: Bing Li Priority: Minor Fix For: 0.12.0 Attachments: HIVE-3691.1.patch-trunk.txt, HIVE-3691.1.patch.txt

The order of the output in the golden file differs between JDKs; the root cause is the HashMap implementation in the JDK.
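The failure mode described above is common for golden-file tests: HashMap iteration order is a JDK implementation detail, so output recorded on one JDK need not match another. A minimal sketch (not the actual TestDynamicSerDe fix, just an illustration of the deterministic-ordering idea) is to sort the keys before serializing:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class GoldenFileOrder {
    // HashMap iteration order varies across JDK implementations (e.g. IBM
    // vs. Oracle), so a golden file recorded on one JDK may not match the
    // output produced on another. Copying into a TreeMap sorts the keys,
    // yielding a layout that is stable on every JDK.
    static String canonical(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : new TreeMap<>(m).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("b", 2);
        m.put("a", 1);
        // Deterministic regardless of HashMap internals:
        System.out.print(canonical(m));
    }
}
```

The alternative fix, also common, is to keep the serializer untouched and sort (or set-compare) the actual and expected outputs inside the test itself.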
[jira] [Commented] (HIVE-4807) Hive metastore hangs
[ https://issues.apache.org/jira/browse/HIVE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705678#comment-13705678 ] Hudson commented on HIVE-4807: -- Integrated in Hive-trunk-hadoop2 #282 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/282/]) HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh Chauhan) (Revision 1501675) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501675 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ivy/libraries.properties * /hive/trunk/jdbc/build.xml * /hive/trunk/metastore/ivy.xml

Hive metastore hangs Key: HIVE-4807 URL: https://issues.apache.org/jira/browse/HIVE-4807 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Fix For: 0.12.0 Attachments: Hive-4807.0.patch, Hive-4807.1.patch, Hive-4807.2.patch

The Hive metastore hangs (does not accept any new connections) due to a bug in DBCP. The root cause analysis is here: https://issues.apache.org/jira/browse/DBCP-398. The fix is to change the Hive connection pool to BoneCP, which is natively supported by DataNucleus.
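Switching the pool is a configuration-level change on the DataNucleus side. As a hedged sketch (the exact property name and accepted value should be verified against your Hive/DataNucleus version), the selection could look like this in hive-site.xml:

```xml
<!-- Hypothetical hive-site.xml fragment: select BoneCP as the DataNucleus
     connection pool implementation for the metastore, instead of DBCP.
     Verify the property name and value against your Hive release notes. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>BoneCP</value>
</property>
```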
[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4838: --- Attachment: HIVE-4838.patch Running tests on the attached patch.

Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch

MapJoin is an essential component for high-performance joins in Hive, and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues:
* Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key and Row classes.
* The API of a logical table container is not defined, and therefore it's unclear which APIs HashMapWrapper needs to publicize. Additionally, HashMapWrapper has many unused public methods.
* HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units would be separated.
* HashTableSinkObjectCtx has unused fields and unused methods.
* CommonJoinOperator and children use ArrayList on the left-hand side when only List is required.
* There are unused classes (MRU, DCLLItem) and classes which duplicate functionality (MapJoinSingleKey and MapJoinDoubleKeys).
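The "ArrayList on the left-hand side" item is a standard program-to-the-interface cleanup. A tiny illustrative sketch (the method name is hypothetical, not from the Hive code):

```java
import java.util.ArrayList;
import java.util.List;

public class ProgramToInterface {
    // Declaring the variable and return type as List (the interface) rather
    // than ArrayList (the implementation) keeps callers decoupled from the
    // concrete class, so the backing structure can later be swapped (e.g.
    // for a specialized row container) without touching any call sites.
    static List<String> makeRow() {
        List<String> row = new ArrayList<>(); // was: ArrayList<String> row = ...
        row.add("key");
        row.add("value");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(makeRow().size());
    }
}
```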
[jira] [Updated] (HIVE-2991) Integrate Clover with Hive
[ https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky updated HIVE-2991: - Attachment: HIVE-clover-trunk--N1.patch HIVE-clover-branch-0.11--N1.patch HIVE-clover-branch-0.10--N1.patch

The attached HIVE-clover-xxx.patch files are somewhat updated versions of the clovering; we used them in parallel builds. Besides the clovering itself, the patches introduce the following changes:
1) The .q-file test generator is changed to split the generated test classes into groups of tests (10 test cases per class is the default). This avoids huge test classes, which is needed for parallelized and distributed builds.
2) Added a test-lightweight target that runs a batch of tests without re-generation/re-compilation. This is badly needed in parallelized and distributed builds.
3) Introduced a testcase-list parameter that allows passing several test class names to execute. The names are passed as a comma-separated list, with each name in the form **/a/b/c/TestFoo.*. The trailing asterisk is needed because the main project accepts .class names while HCatalog accepts .java names.
4) Several more improvements related to Clover instrumentation, reporting, etc.
Integrate Clover with Hive -- Key: HIVE-2991 URL: https://issues.apache.org/jira/browse/HIVE-2991 Project: Hive Issue Type: Test Components: Testing Infrastructure Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip

Atlassian has donated a license for their code coverage tool Clover to the ASF. Let's make use of it to generate code coverage reports to figure out which areas of Hive are well tested and which are not. More information about the license can be found in Hadoop jira HADOOP-1718
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705943#comment-13705943 ] Vikram Dixit K commented on HIVE-4675: -- I used this framework to run tests on hive on a single node. It took about half the time that it normally takes, which is great. However, I am unable to figure out which tests are failing. I got a message that goes: TestOrcHCatLoader has one or more failing tests... Also, it doesn't seem like the output is integrated with the ant testreport target. It would be great to see a summary of failing tests. Could you please elaborate on how to get an idea of the failing tests? Thanks!

Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4675.patch

The current ptest tool is great, but it has the following limitations:
- Requires an NFS filer
- Unless the NFS filer is dedicated, ptests can become IO bound easily
- Investigating failures is troublesome because the source directory for the failure is not saved
- Ignoring or isolating tests is not supported
- No unit tests for the ptest framework exist
It'd be great to have a ptest tool that addresses these limitations.
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705947#comment-13705947 ] Brock Noland commented on HIVE-4675: Hi, Great to hear! The TEST-*.xml files should be in the logs directory in the working dir. Typically we run this via jenkins, and then in the jenkins build script copy the TEST-*.xml files into a directory for jenkins to parse. I think we could generate some kind of report as well; do you want to create an enhancement request describing what you'd like? Brock

Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4675.patch

The current ptest tool is great, but it has the following limitations:
- Requires an NFS filer
- Unless the NFS filer is dedicated, ptests can become IO bound easily
- Investigating failures is troublesome because the source directory for the failure is not saved
- Ignoring or isolating tests is not supported
- No unit tests for the ptest framework exist
It'd be great to have a ptest tool that addresses these limitations.
[jira] [Assigned] (HIVE-2991) Integrate Clover with Hive
[ https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky reassigned HIVE-2991: Assignee: Ivan A. Veselovsky

Integrate Clover with Hive -- Key: HIVE-2991 URL: https://issues.apache.org/jira/browse/HIVE-2991 Project: Hive Issue Type: Test Components: Testing Infrastructure Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Ivan A. Veselovsky Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip

Atlassian has donated a license for their code coverage tool Clover to the ASF. Let's make use of it to generate code coverage reports to figure out which areas of Hive are well tested and which are not. More information about the license can be found in Hadoop jira HADOOP-1718
[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive
[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706089#comment-13706089 ] Jitendra Nath Pandey commented on HIVE-4160: Dmitry, Vinod,

There is a significant amount of vectorization work in expression evaluation, for example arithmetic expressions, logical expressions, aggregations, etc. Many of these expressions are pretty generic, and different systems are likely to have similar semantics for them. It should be possible to re-use this code with little change in pig or other systems. Re-using these expressions will require using the same vectorized representation of data in the processing engine, but that part of the code is also generic and re-usable. I think that could be a good starting point.

However, a bunch of the vectorization work is in operator code, where we have vectorized versions of the hive operators. These operators are closely tied to hive semantics and implementation. Therefore, it will need some restructuring in the hive code base as well to generalize these operators for re-use in other projects.

Also, at this point we should be thinking more generally about a common physical layer shared between pig and hive. These languages can continue to have different logical plans, but it would be desirable for them to share a common physical plan structure because they both use the same map-reduce runtime.
Vectorized Query Execution in Hive -- Key: HIVE-4160 URL: https://issues.apache.org/jira/browse/HIVE-4160 Project: Hive Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Hive-Vectorized-Query-Execution-Design.docx, Hive-Vectorized-Query-Execution-Design-rev2.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.pdf, Hive-Vectorized-Query-Execution-Design-rev4.docx, Hive-Vectorized-Query-Execution-Design-rev4.pdf, Hive-Vectorized-Query-Execution-Design-rev5.docx, Hive-Vectorized-Query-Execution-Design-rev5.pdf, Hive-Vectorized-Query-Execution-Design-rev6.docx, Hive-Vectorized-Query-Execution-Design-rev6.pdf, Hive-Vectorized-Query-Execution-Design-rev7.docx, Hive-Vectorized-Query-Execution-Design-rev8.docx, Hive-Vectorized-Query-Execution-Design-rev8.pdf, Hive-Vectorized-Query-Execution-Design-rev9.docx, Hive-Vectorized-Query-Execution-Design-rev9.pdf The Hive query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that this yields very low instructions per cycle [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization and data columns go through a layer of object inspectors that identify column type, deserialize data and determine appropriate expression routines in the inner loop. These layers of virtual method calls further slow down the processing. This work will add support for vectorized query execution to Hive, where, instead of individual rows, batches of about a thousand rows at a time are processed. Each column in the batch is represented as a vector of a primitive data type. The inner loop of execution scans these vectors very fast, avoiding method calls, deserialization, unnecessary if-then-else, etc. 
This substantially reduces CPU time used, and gives excellent instructions per cycle (i.e. improved processor pipeline utilization). See the attached design specification for more details.
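The description above can be sketched in a few lines: a "column" is a primitive array, a batch holds roughly a thousand rows, and the inner loop touches only primitives. This is an illustrative sketch of the idea, not Hive's actual VectorizedRowBatch classes:

```java
public class VectorizedAddDemo {
    static final int BATCH_SIZE = 1024; // roughly "a thousand rows at a time"

    // Each column is a vector of a primitive type. The inner loop runs with
    // no virtual method calls, no per-row object allocation, and no type
    // dispatch, which is what yields the high instructions-per-cycle.
    static void addColumns(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        long[] a = new long[BATCH_SIZE];
        long[] b = new long[BATCH_SIZE];
        long[] out = new long[BATCH_SIZE];
        for (int i = 0; i < BATCH_SIZE; i++) {
            a[i] = i;
            b[i] = 2L * i;
        }
        addColumns(a, b, out, BATCH_SIZE);
        System.out.println(out[10]); // 10 + 20
    }
}
```

Contrast this with row-at-a-time execution, where each addition would pass through object inspectors and boxed values; here the JIT can keep the whole loop in registers and even auto-vectorize it.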
[jira] [Commented] (HIVE-2991) Integrate Clover with Hive
[ https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706097#comment-13706097 ] Ashutosh Chauhan commented on HIVE-2991: [~iveselovsky] Seems like you have expanded the scope of this jira quite a bit. Your other changes (introducing targets in the build system) are quite useful, but they are orthogonal to clover integration (as far as I understand). I would suggest splitting the patch into three parts: one for clover integration, a second for improvements in the test infrastructure, and a third for improvements in the build infra.

Integrate Clover with Hive -- Key: HIVE-2991 URL: https://issues.apache.org/jira/browse/HIVE-2991 Project: Hive Issue Type: Test Components: Testing Infrastructure Affects Versions: 0.9.0 Reporter: Ashutosh Chauhan Assignee: Ivan A. Veselovsky Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip

Atlassian has donated a license for their code coverage tool Clover to the ASF. Let's make use of it to generate code coverage reports to figure out which areas of Hive are well tested and which are not. More information about the license can be found in Hadoop jira HADOOP-1718
[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive
[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706190#comment-13706190 ] Dmitriy V. Ryaboy commented on HIVE-4160: - Jitendra, I believe physical plan primitives for both Hive and Pig (and potentially others) are going to come in via Tez, as both Pig and Hive want to get off strict MR in the long term. I'll take a crack at extracting what's extractable. Right now Hive's UDAF reaches fairly deeply into this code, as you noted, but I think with a little restructuring this can be factored out.

Vectorized Query Execution in Hive -- Key: HIVE-4160 URL: https://issues.apache.org/jira/browse/HIVE-4160 Project: Hive Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Hive-Vectorized-Query-Execution-Design.docx, Hive-Vectorized-Query-Execution-Design-rev2.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.docx, Hive-Vectorized-Query-Execution-Design-rev3.pdf, Hive-Vectorized-Query-Execution-Design-rev4.docx, Hive-Vectorized-Query-Execution-Design-rev4.pdf, Hive-Vectorized-Query-Execution-Design-rev5.docx, Hive-Vectorized-Query-Execution-Design-rev5.pdf, Hive-Vectorized-Query-Execution-Design-rev6.docx, Hive-Vectorized-Query-Execution-Design-rev6.pdf, Hive-Vectorized-Query-Execution-Design-rev7.docx, Hive-Vectorized-Query-Execution-Design-rev8.docx, Hive-Vectorized-Query-Execution-Design-rev8.pdf, Hive-Vectorized-Query-Execution-Design-rev9.docx, Hive-Vectorized-Query-Execution-Design-rev9.pdf

The Hive query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that this yields very low instructions per cycle [MonetDB X100].
Also, Hive currently relies heavily on lazy deserialization, and data columns go through a layer of object inspectors that identify column type, deserialize data, and determine the appropriate expression routines in the inner loop. These layers of virtual method calls further slow down the processing. This work will add support for vectorized query execution to Hive, where, instead of individual rows, batches of about a thousand rows at a time are processed. Each column in the batch is represented as a vector of a primitive data type. The inner loop of execution scans these vectors very fast, avoiding method calls, deserialization, unnecessary if-then-else, etc. This substantially reduces CPU time used, and gives excellent instructions per cycle (i.e. improved processor pipeline utilization). See the attached design specification for more details.
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Summary: Reduce or eliminate the expensive Schema equals() check for AvroSerde (was: Speed up AvroSerde by checking hashcodes instead of equality)

Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch

The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
[jira] [Commented] (HIVE-4675) Create new parallel unit test environment
[ https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706236#comment-13706236 ] Vikram Dixit K commented on HIVE-4675: -- [~brocknoland] I have raised HIVE-4842 for the same. Thanks!

Create new parallel unit test environment - Key: HIVE-4675 URL: https://issues.apache.org/jira/browse/HIVE-4675 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.12.0 Attachments: HIVE-4675.patch

The current ptest tool is great, but it has the following limitations:
- Requires an NFS filer
- Unless the NFS filer is dedicated, ptests can become IO bound easily
- Investigating failures is troublesome because the source directory for the failure is not saved
- Ignoring or isolating tests is not supported
- No unit tests for the ptest framework exist
It'd be great to have a ptest tool that addresses these limitations.
[jira] [Created] (HIVE-4842) Hive parallel test framework 2 needs to summarize failures
Vikram Dixit K created HIVE-4842: Summary: Hive parallel test framework 2 needs to summarize failures Key: HIVE-4842 URL: https://issues.apache.org/jira/browse/HIVE-4842 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.12.0 Reporter: Vikram Dixit K Assignee: Brock Noland Priority: Minor Fix For: 0.12.0

Currently, when unit tests are run, there are multiple simple ways to consume the results, particularly the ant testreport target, which generates an html file for easily locating failures. The ptest2 framework coming from HIVE-4675 is great for running the tests in parallel, but it is not very easy to figure out the failing tests. It would be great to have output similar to that of the testreport target for easy consumption.
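Since the earlier comments note that ptest2 emits JUnit-style TEST-*.xml files, a summary like the one requested here can be derived from the `failures` and `errors` attributes on each file's root `<testsuite>` element. A hedged sketch (not the ptest2 implementation, and the directory layout is an assumption):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class TestReportSummary {
    // JUnit-style TEST-*.xml reports carry failure counts as attributes on
    // the root <testsuite> element; summing them across a directory gives
    // the kind of quick failure summary this issue asks for.
    static int failures(File reportDir) throws Exception {
        int total = 0;
        File[] files = reportDir.listFiles(
            (d, name) -> name.startsWith("TEST-") && name.endsWith(".xml"));
        if (files == null) {
            return 0;
        }
        for (File f : files) {
            Element suite = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(f).getDocumentElement();
            total += Integer.parseInt(suite.getAttribute("failures"))
                   + Integer.parseInt(suite.getAttribute("errors"));
        }
        return total;
    }
}
```

A fuller version would also collect the failing `<testcase>` names per suite, mirroring what ant's testreport target renders as HTML.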
[jira] [Commented] (HIVE-3756) LOAD DATA does not honor permission inheritance
[ https://issues.apache.org/jira/browse/HIVE-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706269#comment-13706269 ] Sushanth Sowmyan commented on HIVE-3756: I have a few more thoughts on this. Let's walk through an example.

Let's say parent dir d1 has permission/group combination A, and directory d2 inside the parent dir has permission/group combination B. In the case of non-partitioned tables, d1 will be the database/warehouse dir and d2 the table dir. In the case of partitioned tables, d1 will be the table directory and d2 the appropriate partition directories.

If we did not have the flag to inherit permissions on, then whatever data is loaded, be it files inside d2 (as during a load operation) or replacing d2 and everything in it (as during an insert overwrite operation), will have yet another permission/group combination C, which is a function of the user's current umask and the user's default group. The purpose behind the subdir-inherit-permissions flag is to make this behaviour go away, and to use the parent dir's permissions/group when possible. So far, so good. Let's say, for purposes of this entire discussion from now onwards, the flag to inherit permissions is on.

Now, if we load data into d2 without using overwrite, files inside d2 get permission B. If we load data into d2 using overwrite, we overwrite d2, and thus d2 takes on d1's permissions, and so do the files inside, resulting in d2 and the files inside d2 having permission/group combination A.

While this behaviour is consistent, I find that from a user's perspective, if they create a table (say unpartitioned), then chmod/chgrp it to B, and then try to load data into it using an insert overwrite, they still expect that they're only overwriting data inside the table dir, and their expectation is that the table still has permission/group combination B.
They don't want it to be replaced by A, the parent db dir's permissions/group, and they don't want C, from the umask and the current user's default group. Whether this requires a new flag that overrides hive.warehouse.subdir.inherit.perms, or whether hive.warehouse.subdir.inherit.perms itself should work this way, is still up for discussion, but there is now an additional requirement, namely: if the directory being moved in already exists and will be deleted so that the new one can be placed, then instead of going with the parent's permissions, it should go with the previous dir's permissions. Thoughts? This can be a separate jira if people feel it should be, but I think it's also a minor modification of this current jira.

LOAD DATA does not honor permission inheritance - Key: HIVE-3756 URL: https://issues.apache.org/jira/browse/HIVE-3756 Project: Hive Issue Type: Bug Components: Authorization, Security Affects Versions: 0.9.0 Reporter: Johndee Burks Assignee: Chaoyu Tang Attachments: HIVE-3756_1.patch, HIVE-3756.patch

When a LOAD DATA operation is performed, the resulting data in HDFS for the table does not maintain permission inheritance. This remains true even with hive.warehouse.subdir.inherit.perms set to true. The issue is easily reproducible by creating a table and loading some data into it. After the load is complete, just do a dfs -ls -R on the warehouse directory and you will see that the inheritance of permissions worked for the table directory but not for the data.
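The "inherit from parent" semantics under discussion can be sketched with java.nio.file against a local filesystem. This is a hypothetical illustration of the behavior, not Hive's actual implementation (which operates on HDFS paths through its own FileSystem API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class InheritPerms {
    // Sketch of "inherit permissions" semantics: after a directory is
    // (re)created under the warehouse, copy the parent directory's POSIX
    // permission bits onto it. The alternative behavior discussed above
    // would instead restore the bits the replaced directory used to have.
    static void inheritFromParent(Path child) throws IOException {
        Set<PosixFilePermission> parentPerms =
            Files.getPosixFilePermissions(child.getParent());
        Files.setPosixFilePermissions(child, parentPerms);
    }

    public static void main(String[] args) throws IOException {
        Path d1 = Files.createTempDirectory("d1");         // parent: combination A
        Path d2 = Files.createDirectory(d1.resolve("d2")); // child: umask-derived C
        inheritFromParent(d2);                             // d2 now carries A
        System.out.println(Files.getPosixFilePermissions(d2)
            .equals(Files.getPosixFilePermissions(d1)));
    }
}
```

The proposal in the comment amounts to snapshotting `getPosixFilePermissions` on the old d2 before the overwrite deletes it, and reapplying that snapshot instead of the parent's bits.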
[jira] [Updated] (HIVE-4055) add Date data type
[ https://issues.apache.org/jira/browse/HIVE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4055: - Status: Patch Available (was: Open)

add Date data type -- Key: HIVE-4055 URL: https://issues.apache.org/jira/browse/HIVE-4055 Project: Hive Issue Type: Sub-task Components: JDBC, Query Processor, Serializers/Deserializers, UDF Reporter: Sun Rui Attachments: Date.pdf, HIVE-4055.1.patch.txt, HIVE-4055.2.patch.txt, HIVE-4055.D11547.1.patch

Add Date data type, a new primitive data type which supports the standard SQL date type. Basically, the implementation can take HIVE-2272 and HIVE-2957 as references.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706299#comment-13706299 ] Mohammad Kamrul Islam commented on HIVE-4732: - Thanks Edward for the comments. We are now trying a different approach to address the same issue. A new patch is coming soon.

Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch

The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
[jira] [Created] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
Vikram Dixit K created HIVE-4843: Summary: Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch

Currently, there are static APIs in multiple locations in ExecDriver and MapRedTask that could be leveraged if put in the already existing utility class in the exec package. This would help make the code more maintainable, readable, and also re-usable by other run-time infra such as tez.
[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4843: - Attachment: HIVE-4843.1.patch

Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch

Currently, there are static APIs in multiple locations in ExecDriver and MapRedTask that could be leveraged if put in the already existing utility class in the exec package. This would help make the code more maintainable, readable, and also re-usable by other run-time infra such as tez.
Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/ --- Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-4732 https://issues.apache.org/jira/browse/HIVE-4732 Repository: hive-git Description --- From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial amount (nearly 40%) of the record-reading time. This patch minimizes the number of schema.equals() calls by pushing the check as late as possible and performing it as few times as possible. First, we added a unique ID for each record reader, which is then included in every AvroGenericRecordWritable. Then we introduced two new data structures (one HashSet and one HashMap) to store intermediate results and avoid duplicate checks. The HashSet contains the IDs of all record readers whose records need no re-encoding. The HashMap contains the re-encoders already in use; it works as a cache and allows re-encoders to be reused. With this change, our tests show nearly a 40% reduction in Avro record-reading time. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java dbc999f serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java c85ef15 serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb Diff: https://reviews.apache.org/r/12480/diff/ Testing --- Thanks, Mohammad Islam
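The caching scheme described in the review request can be sketched as follows. This is a hypothetical illustration under the assumptions stated in the description (a unique reader ID, a HashSet of readers needing no re-encoding, a HashMap caching re-encoders), not the patch's actual classes; all names here are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Sketch of the two data structures from the description: a HashSet of
// record-reader IDs whose records need no re-encoding, and a HashMap caching
// re-encoders per reader ID, so the expensive schema comparison runs at most
// once per reader instead of once per record.
public class ReEncoderCache {

    /** Stand-in for a real schema re-encoder; illustrative, not Hive's API. */
    public static final class ReEncoder {
        final String writerSchema, readerSchema;
        ReEncoder(String w, String r) { writerSchema = w; readerSchema = r; }
        Object reEncode(Object record) { return record; } // no-op in this sketch
    }

    private final Set<UUID> noReEncodingNeeded = new HashSet<>();
    private final Map<UUID, ReEncoder> reEncoders = new HashMap<>();

    /** Returns null when the reader's records can be used as-is. */
    public ReEncoder forReader(UUID readerId, String writerSchema, String readerSchema) {
        if (noReEncodingNeeded.contains(readerId)) {
            return null;                  // fast path: set lookup, no schema compare
        }
        ReEncoder cached = reEncoders.get(readerId);
        if (cached != null) {
            return cached;                // fast path: reuse the cached re-encoder
        }
        // Slow path: the expensive equality check runs once per reader ID.
        if (writerSchema.equals(readerSchema)) {
            noReEncodingNeeded.add(readerId);
            return null;
        }
        ReEncoder fresh = new ReEncoder(writerSchema, readerSchema);
        reEncoders.put(readerId, fresh);
        return fresh;
    }
}
```

After the first call for a given reader ID, every subsequent record from that reader resolves with a single hash lookup rather than a full Schema comparison.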
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Attachment: HIVE-4732.v1.patch Reduce or eliminate the expensive Schema equals() check for AvroSerde - Key: HIVE-4732 URL: https://issues.apache.org/jira/browse/HIVE-4732 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch The AvroSerde spends a significant amount of time checking schema equality. Changing to compare hashcodes (which can be computed once then reused) will improve performance.
[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706360#comment-13706360 ] Mohammad Kamrul Islam commented on HIVE-4732: - New patch is uploaded in RB: https://reviews.apache.org/r/12480/
[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-4732: Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706419#comment-13706419 ] Gunther Hagleitner commented on HIVE-4843: -- Can you create a review on RB or Phabricator, please?
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706460#comment-13706460 ] Rajesh Balamohan commented on HIVE-4331: This will be extremely beneficial for many use cases involving Hive, HBase, HCatalog, and Pig. In particular, one can host frequently changing data in HBase and access it in Hive/Pig/MapReduce via HCatalog. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will continue to function, but internally they will use the DefaultStorageHandler from Hive. They will be removed in a future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in HCat's storagehandler so that systems such as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward-compatibility breakage (except known issues as described in the Design Document). 5) Replace all instances in the HCat source code which point to HCatStorageHandler to use the HiveStorageHandler, including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira.
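The "pass-through" idea in point 2 above can be illustrated with a minimal sketch: a wrapper output format that simply delegates to whatever format the storage handler really uses, with no translation layer in between. The interfaces and names here are stand-ins invented for the example, not Hive's actual OutputFormat API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pass-through output format: every call is
// forwarded untouched to the handler-specific format it wraps.
public class PassThroughDemo {
    interface RecordWriter { void write(String record); }
    interface OutputFormat { RecordWriter getRecordWriter(List<String> sink); }

    /** Delegates every call to the wrapped, handler-specific format. */
    static final class PassThroughOutputFormat implements OutputFormat {
        private final OutputFormat actual;
        PassThroughOutputFormat(OutputFormat actual) { this.actual = actual; }
        @Override public RecordWriter getRecordWriter(List<String> sink) {
            return actual.getRecordWriter(sink);   // no translation in between
        }
    }

    public static List<String> run() {
        List<String> sink = new ArrayList<>();
        // Stand-in for an HBase-style format supplied by a storage handler.
        OutputFormat hbaseLike = s -> record -> s.add("hbase:" + record);
        OutputFormat wrapped = new PassThroughOutputFormat(hbaseLike);
        wrapped.getRecordWriter(sink).write("row1");
        return sink;
    }
}
```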
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706462#comment-13706462 ] Vikram Dixit K commented on HIVE-4843: -- https://reviews.apache.org/r/12476/
[jira] [Created] (HIVE-4844) Add char/varchar data types
Jason Dere created HIVE-4844: Summary: Add char/varchar data types Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc.
[jira] [Commented] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706519#comment-13706519 ] Jason Dere commented on HIVE-3745: -- Would it make more sense to support the SQL comparison semantics using new char data types, so that we don't break existing behavior for strings? I've created HIVE-4844. Hive does improper = based string comparisons for strings with trailing whitespaces - Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Assignee: Gang Tim Liu Compared to other systems such as DB2, MySQL, etc., which disregard trailing whitespaces when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is still whitespace sensitive and regards trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should consider this strongly, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL?
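The PADSPACE semantics quoted from the MySQL manual above can be sketched as follows. This is an assumption about how such a comparison could be implemented, for illustration only, not Hive's or MySQL's actual code: equality ignores trailing spaces, while plain Java string equality (Hive's current behavior) stays exact.

```java
// Illustrative PADSPACE-style comparison: '=' disregards trailing spaces.
public class PadSpaceCompare {
    /** Strip only trailing spaces, as PADSPACE collations do before comparing. */
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    /** Equality under PADSPACE: trailing spaces are not significant. */
    public static boolean padSpaceEquals(String a, String b) {
        return rtrim(a).equals(rtrim(b));
    }
}
```

Note that only trailing spaces are disregarded; leading spaces remain significant, which matches the manual's wording.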
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Assignee: Jason Dere
[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook
[ https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4841: -- Attachment: HIVE-4841.D11673.1.patch navis requested code review of HIVE-4841 [jira] Add partition level hook to HiveMetaHook. Reviewers: JIRA HIVE-4841 Add partition level hook to HiveMetaHook The current HiveMetaHook provides hooks for tables only. With a partition-level hook, external storage handlers could also be revised to exploit PPR. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11673 AFFECTED FILES hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/27615/ To: JIRA, navis Add partition level hook to HiveMetaHook Key: HIVE-4841 URL: https://issues.apache.org/jira/browse/HIVE-4841 Project: Hive Issue Type: Improvement Components: StorageHandler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4841.D11673.1.patch The current HiveMetaHook provides hooks for tables only. With a partition-level hook, external storage handlers could also be revised to exploit PPR.
[jira] [Commented] (HIVE-4841) Add partition level hook to HiveMetaHook
[ https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706557#comment-13706557 ] Navis commented on HIVE-4841: - I've consolidated the various methods add_partition_with_environment_context(), append_partition_with_environment_context(), and append_partition_by_name_with_environment_context() into a single entry point, add_partition_with_environment_context(), and passed all tests.
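A partition-level extension of HiveMetaHook, as described in this issue, could look something like the sketch below, mirroring the existing table-level pre/commit/rollback pattern around metastore operations. The method names and interfaces here are illustrative assumptions, not the patch's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partition-level hook, mirroring HiveMetaHook's table-level
// pre/commit/rollback shape; all names are invented for this sketch.
public class PartitionHookDemo {
    interface PartitionMetaHook {
        void preAddPartition(String table, String partSpec);
        void commitAddPartition(String table, String partSpec);
        void rollbackAddPartition(String table, String partSpec);
    }

    /** A recording implementation, standing in for an external storage handler. */
    static final class RecordingHook implements PartitionMetaHook {
        final List<String> events = new ArrayList<>();
        public void preAddPartition(String t, String p) { events.add("pre:" + t + "/" + p); }
        public void commitAddPartition(String t, String p) { events.add("commit:" + t + "/" + p); }
        public void rollbackAddPartition(String t, String p) { events.add("rollback:" + t + "/" + p); }
    }

    /** The metastore client would call the hook around its own metadata write. */
    static void addPartition(PartitionMetaHook hook, String table, String partSpec) {
        hook.preAddPartition(table, partSpec);
        try {
            // ... the metastore writes the partition metadata here ...
            hook.commitAddPartition(table, partSpec);
        } catch (RuntimeException e) {
            hook.rollbackAddPartition(table, partSpec);
            throw e;
        }
    }
}
```

With such a hook in place, an external storage handler sees each partition as it is added and can maintain its own per-partition state, which is what makes partition pruning (PPR) exploitable.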
[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook
[ https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4841: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated HIVE-4331: - Attachment: HIVE_4331.patch Initial patch; will put it on Review Board.
[jira] [Commented] (HIVE-4658) Make KW_OUTER optional in outer joins
[ https://issues.apache.org/jira/browse/HIVE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706581#comment-13706581 ] Edward Capriolo commented on HIVE-4658: --- Can we go +1? Make KW_OUTER optional in outer joins - Key: HIVE-4658 URL: https://issues.apache.org/jira/browse/HIVE-4658 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Edward Capriolo Priority: Trivial Attachments: hive-4658.2.patch.txt, HIVE-4658.D11091.1.patch For a really trivial migration issue.
[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of a year if a date or timestamp is given
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706586#comment-13706586 ] Edward Capriolo commented on HIVE-3404: --- +1 UDF to obtain the quarter of a year if a date or timestamp is given -- Key: HIVE-3404 URL: https://issues.apache.org/jira/browse/HIVE-3404 Project: Hive Issue Type: New Feature Components: UDF Reporter: Sanam Naz Attachments: HIVE-3404.1.patch.txt Hive's current releases lack a function which returns the quarter of a year when given a date or timestamp. The function QUARTER(date) would return the quarter from a date/timestamp. This can be used in HiveQL and will be useful for different domains like retail, finance, etc.
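The quarter computation described above reduces to simple month arithmetic. A minimal sketch, assuming the attached patch maps a month to its calendar quarter (this is illustrative, not the patch's actual UDF class):

```java
import java.time.LocalDate;

// Sketch of QUARTER(date): months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
public class QuarterUdf {
    public static int quarter(LocalDate date) {
        return (date.getMonthValue() - 1) / 3 + 1;
    }
}
```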
[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of a year if a date or timestamp is given
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706593#comment-13706593 ] Edward Capriolo commented on HIVE-3404: --- You also need to update show_functions.q
[jira] [Resolved] (HIVE-1446) Move Hive Documentation from the wiki to version control
[ https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-1446. --- Resolution: Fixed Move Hive Documentation from the wiki to version control Key: HIVE-1446 URL: https://issues.apache.org/jira/browse/HIVE-1446 Project: Hive Issue Type: Task Components: Documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: hive-1446.diff, hive-1446-part-1.diff, hive-logo-wide.png Move the Hive Language Manual (and possibly some other documents) from the Hive wiki to version control. This work needs to be coordinated with the hive-dev and hive-user community in order to avoid missing any edits as well as to avoid or limit unavailability of the docs.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706597#comment-13706597 ] Edward Capriolo commented on HIVE-2989: --- Did we ditch this idea? Should we close up shop? Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.10.0 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt Original Estimate: 672h Remaining Estimate: 672h This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if access to databasename.tablename in queries and use database X is turned off in conjunction). If db X wants to access one or more partitions from table T in db Y, the user will issue: CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N') New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs. The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link: ALTER LINK T@Y ADD PARTITION (ds='2012-04-27') The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command; we just specify DROP instead of ADD.
For querying the linked table, the X user will refer to it as T@Y. Link Tables will only have read access and not be writable. The entire Table Link along with all its imported partitions can be dropped as follows: DROP LINK TO T@Y The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables. When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too. Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. Views and external tables cannot be imported from one database to another.
[jira] [Commented] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706598#comment-13706598 ] Bhushan Mandhani commented on HIVE-2989: Hi, Bhushan Mandhani is no longer at Facebook so this email address is no longer being monitored. If you need assistance, please contact another person who is currently at the company.
[jira] [Resolved] (HIVE-2591) Hive 0.7.1 fails with Exception in thread main java.lang.NoSuchFieldError: type
[ https://issues.apache.org/jira/browse/HIVE-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-2591. --- Resolution: Won't Fix Hive 0.7.1 fails with Exception in thread main java.lang.NoSuchFieldError: type --- Key: HIVE-2591 URL: https://issues.apache.org/jira/browse/HIVE-2591 Project: Hive Issue Type: Bug Components: CLI, JDBC, SQL Affects Versions: 0.7.1 Environment: Intel Core2 Quad CPU Q8400 @2.66GHz 4 GB RAM Ubuntu 10.10 32 bit JDK 6.0_27 Apache Ant 1.8.0 Apache Hive 0.7.1 Apache Hadoop 0.20.203.0 Reporter: Prashanth Priority: Blocker Labels: hive Hi, When I try to invoke hive and type in SHOW TABLES in cli in the environment as explained above, I get Exception in thread main java.lang.NoSuchFieldError: type and I am not able to use it at all. Is there any temporary fix for this? Please let me know, if I am making any mistake here. I have downloaded Hive 0.7.1 from the download link as mentioned in the Hive Wiki. The download url is http://hive.apache.org/releases.html. /opt/hive-0.7.1$ hive WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 
Hive history file=/tmp/hadoop/hive_job_log_hduser_20190121_764439225.txt hive SHOW TABLES; Exception in thread main java.lang.NoSuchFieldError: type at org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1234) at org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:5942) at org.antlr.runtime.Lexer.nextToken(Lexer.java:89) at org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133) at org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127) at org.antlr.runtime.CommonTokenStream.setup(CommonTokenStream.java:127) at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:91) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:521) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:436) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:327) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) I am not sure what is the actual issue here or rather how to fix it. Can you please let me know if there is any workaround for this. Alternatively I tried building hive from the SVN source repo. I am neither able to build hive from SVN. I get the following error. 
[datanucleusenhancer] D:\hive\build\ivy\lib\default\zookeeper-3.3.1.jar [datanucleusenhancer] Exception in thread main java.lang.VerifyError: Expecting a stackmap frame at branch target 76 in method org.apache.hadoop.hive.metastore.model.MDatabase.jdoCopyField(Lorg/apache/hadoop/hive/metastore/model/MDatabase;I)V at offset 1 [datanucleusenhancer] at java.lang.Class.getDeclaredFields0(Native Method) [datanucleusenhancer] at java.lang.Class.privateGetDeclaredFields(Class.java:2308) [datanucleusenhancer] at java.lang.Class.getDeclaredFields(Class.java:1760) [datanucleusenhancer] at org.datanucleus.metadata.ClassMetaData.addMetaDataForMembersNotInMetaData(ClassMetaData.java:358) [datanucleusenhancer] at org.datanucleus.metadata.ClassMetaData.populate(ClassMetaData.java:199) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager$1.run(MetaDataManager.java:2394) [datanucleusenhancer] at java.security.AccessController.doPrivileged(Native Method) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.populateAbstractClassMetaData(MetaDataManager.java:2388) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.populateFileMetaData(MetaDataManager.java:2225) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.initialiseFileMetaDataForUse(MetaDataManager.java:925) [datanucleusenhancer] at org.datanucleus.metadata.MetaDataManager.loadMetadataFiles(MetaDataManager.java:399) [datanucleusenhancer] at
[jira] [Resolved] (HIVE-2989) Adding Table Links to Hive
[ https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-2989. --- Resolution: Won't Fix Adding Table Links to Hive -- Key: HIVE-2989 URL: https://issues.apache.org/jira/browse/HIVE-2989 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor, Security Affects Versions: 0.10.0 Reporter: Bhushan Mandhani Assignee: Bhushan Mandhani Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt Original Estimate: 672h Remaining Estimate: 672h This will add Table Links to Hive. This will be an alternate mechanism for a user to access tables and data in a database that is different from the one he is associated with. This feature can be used to provide access control (if, in conjunction, access to databasename.tablename in queries and the use database X command are turned off). If db X wants to access one or more partitions from table T in db Y, the user will issue: CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N') New partitions added to T will automatically be added to the link as well and become available to X. However, if the link is specified to be static, that will not be the case. The X user will then have to explicitly import each partition of T that he needs. The command above will not actually make any existing partitions of T available to X. Instead, we provide the following command to add an existing partition to a link: ALTER LINK T@Y ADD PARTITION (ds='2012-04-27') The user will need to execute the above for each existing partition that needs to be imported. For future partitions, Hive will take care of this. An imported partition can be dropped from a link using a similar command. We just specify DROP instead of ADD. For querying the linked table, the X user will refer to it as T@Y. 
Link Tables will be read-only and not writable. The entire Table Link along with all its imported partitions can be dropped as follows: DROP LINK TO T@Y The above commands are purely MetaStore operations. The implementation will rely on replicating the entire partition metadata when a partition is added to a link. For every link that is created, we will add a new row to table TBLS. The TBL_TYPE column will have a new kind of value LINK_TABLE (or STATIC_LINK_TABLE if the link has been specified as static). A new column LINK_TBL_ID will be added, which will contain the id of the imported table. It will be NULL for all other table types including the regular managed tables. When a partition is added to a link, the new row in the table PARTITIONS will point to the LINK_TABLE in the same database and not the master table in the other database. We will replicate all the metadata for this partition from the master database. The advantage of this approach is that fewer changes will be needed in query processing and DDL for LINK_TABLEs. Also, commands like SHOW TABLES and SHOW PARTITIONS will work as expected for LINK_TABLEs too. Of course, even though the metadata is not shared, the underlying data on disk is still shared. Hive still needs to know that when dropping a partition which belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. Views and external tables cannot be imported from one database to another. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
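The Table Link lifecycle described above can be summarized in one sketch. Note this is the syntax as proposed in this issue only; the feature was resolved Won't Fix, so none of these statements exist in any released version of Hive. T, Y, and the ds partition key are the example names used in the description.

```sql
-- Proposed (never merged) Table Link DDL from the HIVE-2989 description.
-- From database X, create a link to table T in database Y; STATIC would
-- disable automatic pickup of partitions added to T in the future.
CREATE LINK TO T@Y LINKPROPERTIES ('RETENTION'='N');

-- Import an existing partition of T into the link; one statement per
-- existing partition (new partitions flow in automatically for
-- non-static links).
ALTER LINK T@Y ADD PARTITION (ds='2012-04-27');

-- Drop an imported partition from the link (metadata only).
ALTER LINK T@Y DROP PARTITION (ds='2012-04-27');

-- Query the linked table (read-only) from database X.
SELECT * FROM T@Y WHERE ds = '2012-04-27';

-- Drop the link and all its imported partitions; the underlying
-- HDFS data belonging to the master table is untouched.
DROP LINK TO T@Y;
```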
[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-2608: -- Status: Open (was: Patch Available) Patch needs to be rebased. Do not require AS a,b,c part in LATERAL VIEW Key: HIVE-2608 URL: https://issues.apache.org/jira/browse/HIVE-2608 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Affects Versions: 0.10.0 Reporter: Igor Kabiljo Assignee: Navis Priority: Minor Attachments: HIVE-2608.D4317.5.patch Currently, it is required to state column names when LATERAL VIEW is used. That shouldn't be necessary, since the UDTF returns a struct which contains column names - and they should be used by default. For example, it would be great if this was possible: SELECT t.*, t.key1 + t.key4 FROM some_table LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t; 
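To make the proposed improvement concrete, here is a sketch contrasting the syntax Hive requires today with the syntax requested above, using the json_tuple example from the description (some_table and the json column are the hypothetical names from the issue):

```sql
-- Today: column aliases are mandatory after the lateral view alias.
SELECT t.key1, t.key2
FROM some_table
LATERAL VIEW json_tuple(json, 'key1', 'key2') t AS key1, key2;

-- Proposed: omit AS and take the column names from the UDTF's output
-- struct. Note json_tuple names its output columns c0, c1, ... by
-- default, so this relies on the UDTF exposing meaningful names.
SELECT t.*, t.key1 + t.key4
FROM some_table
LATERAL VIEW json_tuple(json, 'key1', 'key2', 'key3', 'key4') t;
```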
HiveServer2 JDBC Client - SQL SELECT exceptions
I am trying to execute a Hive query from a JDBC client. I am using HiveServer2 currently. A very basic query throws an SQL exception only from the JDBC client and not from the CLI. *The queries shown below execute successfully on the CLI.* From the JDBC client, *select * from tableA* works fine, whereas if I provide column names and execute the query from the JDBC client I run into errors: *select col1,col2 from tableA* throws the following SQL exception. Is anyone facing the same issue? Exception in thread main java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:159) at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:147) at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:182) at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:246) Is there a fix for the issue? Thanks, Varun -- Regards, Varun
[jira] [Commented] (HIVE-3488) Issue trying to use the thick client (embedded) from windows.
[ https://issues.apache.org/jira/browse/HIVE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706668#comment-13706668 ] Kanwaljit Singh commented on HIVE-3488: --- We are getting a similar error after dropping all partitions: java.io.IOException: cannot find dir = hdfs://HVEname:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1/emptyFile in pathToPartitionInfo: [hdfs://192.168.156.229:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921) Issue trying to use the thick client (embedded) from windows. - Key: HIVE-3488 URL: https://issues.apache.org/jira/browse/HIVE-3488 Project: Hive Issue Type: Bug Components: Windows Affects Versions: 0.8.1 Reporter: Rémy DUBOIS Priority: Critical I'm trying to execute a very simple SELECT query against my remote hive server. If I'm doing a SELECT * from table, everything works well. 
If I'm trying to execute a SELECT name from table, this error appears: {code:java} Job Submission failed with exception 'java.io.IOException(cannot find dir = /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])' 12/09/19 17:18:44 ERROR exec.Task: Job Submission failed with exception 'java.io.IOException(cannot find dir = /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])' java.io.IOException: cannot find dir = /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:290) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:257) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:104) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981) at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Unknown Source) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136) at 
org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191) at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:187) {code} Indeed, this dir (/user/hive/warehouse/test/city=paris/out.csv) can't be found because it is my data file, not a directory. Could you please help me?