date:20160923

[jira] [Created] (HIVE-14834) Reduce the retry attempts for HiveServer startup

2016-09-23 Thread Siddharth Seth (JIRA)

Siddharth Seth created HIVE-14834:
-

 Summary: Reduce the retry attempts for HiveServer startup
 Key: HIVE-14834
 URL: https://issues.apache.org/jira/browse/HIVE-14834
 Project: Hive
  Issue Type: Task
Reporter: Siddharth Seth


Currently, startup is retried up to 30 times with a 1-minute sleep between attempts, which adds up to about 30 minutes.

That seems a little too much. Early feedback (and failure) seems like a better 
approach.
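
To make the trade-off concrete, here is a minimal Python sketch of a bounded retry loop. It is not HiveServer2's actual startup code; the helper name `start_with_retries` and its default values are assumptions chosen for illustration only.

```python
import time


def start_with_retries(start_fn, max_attempts=3, sleep_seconds=60, sleep=time.sleep):
    """Call start_fn until it succeeds or max_attempts is exhausted.

    Hypothetical helper: sleeps between failed attempts and re-raises the
    last error once the attempts run out, so the caller sees the failure early.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return start_fn()
        except Exception as err:  # illustrative; real code would catch narrower errors
            last_error = err
            if attempt < max_attempts:
                sleep(sleep_seconds)  # no sleep after the final failed attempt
    raise last_error
```

With `max_attempts=30` and `sleep_seconds=60` this sketch would sleep 29 times before giving up, roughly the 30 minutes described above; with 3 attempts a failure surfaces after about 2 minutes.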



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-14835) Improve ptest2 build time

2016-09-23 Thread Prasanth Jayachandran (JIRA)

Prasanth Jayachandran created HIVE-14835:


 Summary: Improve ptest2 build time
 Key: HIVE-14835
 URL: https://issues.apache.org/jira/browse/HIVE-14835
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran


Two things can be improved:
1) ptest2 always downloads the jars needed to compile its own directory, which takes 
about 1m30s but should take only about 5s with cached jars. The reason is that 
maven.repo.local points to a path under WORKSPACE, which Jenkins cleans on 
every run.
2) For the Hive build we can use a parallel build and quiet the build output, 
which should shave off another 15-30s. 




Re: Review Request 52171: HIVE-14819 FunctionInfo for permanent functions shows TEMPORARY FunctionType

2016-09-23 Thread Jason Dere



> On Sept. 22, 2016, 11:53 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java, line 499
> > 
> >
> > why shouldn't they just be in one?
> 
> Jason Dere wrote:
> That's been the behavior since HIVE-2573 - permanent UDFs loaded at Hive 
> initialization are put in the system registry, and added to the session 
> registry once they are used by that session.
> I think they at least need to go to the session registry, it's also 
> useful since it can let us know that the UDF resources have already been 
> loaded for that session. If we completely remove the UDF loading behavior in 
> Hive initialization we might be able to remove permanent UDFs from the system 
> registry. But you kept this behavior around in HIVE-13596! :)

Looking at this a bit more, the initial permanent UDF loading mechanism in 
Hive.reloadFunctions() is actually slightly broken by HIVE-12857: it tries to 
instantiate the UDF class and add it to Registry.persistent. But 
reloadFunctions() was never intended to actually load all of the UDF jars, so 
instantiating the class fails and reloadFunctions() ends up not registering 
the function. I'll add a fix to persistent function detection related to this 
issue.
We can open a followup issue to revisit whether permanent functions should be 
loaded during Hive initialization.
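
The registry behavior discussed in this thread can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the `SessionRegistry` class, the dict-based storage, and the enum below are invented for illustration and are not Hive's Registry or FunctionInfo classes.

```python
from enum import Enum


class FunctionType(Enum):
    # Toy stand-in for the function type tags mentioned in this review.
    PERSISTENT = "PERSISTENT"
    TEMPORARY = "TEMPORARY"


class SessionRegistry:
    """Toy session-level registry backed by a shared system registry (plain dicts)."""

    def __init__(self, system_functions):
        self.system_functions = system_functions  # name -> FunctionType
        self.session_functions = {}

    def lookup(self, name):
        """Return the function's type, copying system entries into the session on first use."""
        if name not in self.session_functions:
            # Keep the original type instead of re-tagging the entry as TEMPORARY.
            self.session_functions[name] = self.system_functions[name]
        return self.session_functions[name]
```

In this model a permanent function still reports PERSISTENT after it has been copied into the session registry, and its presence there records that the session has already used it.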


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52171/#review150102
---


On Sept. 22, 2016, 7:21 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52171/
> ---
> 
> (Updated Sept. 22, 2016, 7:21 p.m.)
> 
> 
> Review request for hive and Sergey Shelukhin.
> 
> 
> Bugs: HIVE-14819
> https://issues.apache.org/jira/browse/HIVE-14819
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Patch to allow the registry to set PERSISTENT type when registering permanent 
> functions to the session registry. Previously all functions added to session 
> registry had the TEMPORARY tag
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 30ba996 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java 05926b5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java a16d9e5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java
>  911b86b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestFunctionRegistry.java 
> d2d5a1b 
> 
> Diff: https://reviews.apache.org/r/52171/diff/
> 
> 
> Testing
> ---
> 
> Added tests to TestFunctionRegistry
> 
> 
> Thanks,
> 
> Jason Dere
> 
>

Re: Review Request 52171: HIVE-14819 FunctionInfo for permanent functions shows TEMPORARY FunctionType

2016-09-23 Thread Jason Dere


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52171/
---

(Updated Sept. 23, 2016, 10:44 p.m.)


Review request for hive and Sergey Shelukhin.


Changes
---

Added a fix for the FunctionRegistry.isPermanentFunction() issue caused by 
reloadFunctions().


Bugs: HIVE-14819
https://issues.apache.org/jira/browse/HIVE-14819


Repository: hive-git


Description
---

Patch to allow the registry to set PERSISTENT type when registering permanent 
functions to the session registry. Previously all functions added to session 
registry had the TEMPORARY tag


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 30ba996 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java de74c3e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java 05926b5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java a16d9e5 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java
 911b86b 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestFunctionRegistry.java d2d5a1b 

Diff: https://reviews.apache.org/r/52171/diff/


Testing
---

Added tests to TestFunctionRegistry


Thanks,

Jason Dere

[jira] [Created] (HIVE-14837) JDBC: standalone jar is missing hadoop core dependencies

2016-09-23 Thread Gopal V (JIRA)

Gopal V created HIVE-14837:
--

 Summary: JDBC: standalone jar is missing hadoop core dependencies
 Key: HIVE-14837
 URL: https://issues.apache.org/jira/browse/HIVE-14837
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


{code}
2016/09/24 00:31:57 ERROR - jmeter.threads.JMeterThread: Test failed! java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at org.apache.hive.jdbc.HiveConnection.createUnderlyingTransport(HiveConnection.java:418)
    at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:438)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:225)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:182)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
{code}




[jira] [Created] (HIVE-14836) Implement predict pushing down in Vectorized Page reader

2016-09-23 Thread Ferdinand Xu (JIRA)

Ferdinand Xu created HIVE-14836:
---

 Summary: Implement predict pushing down in Vectorized Page reader
 Key: HIVE-14836
 URL: https://issues.apache.org/jira/browse/HIVE-14836
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


Currently we filter blocks using predicate pushdown. We should support it in 
the page reader as well to improve its efficiency. 




[jira] [Created] (HIVE-14828) Cloud/S3: Stats publishing should be on HDFS instead of S3

2016-09-23 Thread Rajesh Balamohan (JIRA)

Rajesh Balamohan created HIVE-14828:
---

 Summary: Cloud/S3: Stats publishing should be on HDFS instead of S3
 Key: HIVE-14828
 URL: https://issues.apache.org/jira/browse/HIVE-14828
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Minor


Currently, stats files are created in S3. Later, FSStatsAggregator reads 
these files back and populates the metastore (MS) again.

{noformat}
2016-09-23 05:57:46,772 INFO  [main]: fs.FSStatsPublisher (FSStatsPublisher.java:init(49)) - created : s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
2016-09-23 05:57:46,773 DEBUG [main]: fs.FSStatsAggregator (FSStatsAggregator.java:connect(53)) - About to read stats from : s3a://BUCKET/test/.hive-staging_hive_2016-09-23_05-57-34_309_2648485988937054815-1/-ext-10001
{noformat}

Instead, stats can be written directly to HDFS and read locally rather than 
from S3, which would save a couple of calls to S3.




[jira] [Created] (HIVE-14827) Micro benchmark for Parquet vectorized reader

2016-09-23 Thread Ferdinand Xu (JIRA)

Ferdinand Xu created HIVE-14827:
---

 Summary: Micro benchmark for Parquet vectorized reader
 Key: HIVE-14827
 URL: https://issues.apache.org/jira/browse/HIVE-14827
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


We need a microbenchmark to evaluate the throughput and execution time for 
Parquet vectorized reader.




[jira] [Created] (HIVE-14826) Support vectorization for Parquet

2016-09-23 Thread Ferdinand Xu (JIRA)

Ferdinand Xu created HIVE-14826:
---

 Summary: Support vectorization for Parquet
 Key: HIVE-14826
 URL: https://issues.apache.org/jira/browse/HIVE-14826
 Project: Hive
  Issue Type: New Feature
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


A vectorized Parquet reader can improve throughput and leverage the 
existing Hive vectorized execution engine. This is an umbrella ticket to 
track this feature.




Re: Review Request 51695: HIVE-5867: JDBC driver and beeline should support executing an initial SQL script

2016-09-23 Thread Jianguo Tian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51695/
---

(Updated Sept. 23, 2016, 7:31 a.m.)


Review request for hive and cheng xu.


Bugs: HIVE-5867
https://issues.apache.org/jira/browse/HIVE-5867


Repository: hive-git


Description
---

HIVE-5867: JDBC driver and beeline should support executing an initial SQL 
script


Diffs (updated)
-

  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 
ad96a6466dd1aadab71fc261f55be4639dcbe2bf 
  jdbc/src/java/org/apache/hive/jdbc/Utils.java 
3161566994d6c6e01de9d88a6e87295684619ffa 
  jdbc/src/test/org/apache/hive/jdbc/TestRunInitialSQL.java PRE-CREATION 

Diff: https://reviews.apache.org/r/51695/diff/


Testing
---

TestRunInitialSQL.java is a JUnit test class that tests the parseInitFile() 
method of HiveConnection.java. I tested some positive and negative cases to 
check whether they are parsed into SQL statements that can be executed 
successfully.


Thanks,

Jianguo Tian

Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-23 Thread Rui Li


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/
---

(Updated Sept. 23, 2016, 8:58 a.m.)


Review request for hive.


Bugs: HIVE-14412
https://issues.apache.org/jira/browse/HIVE-14412


Repository: hive-git


Description
---

The 1st patch to add timezone-aware timestamp.


Diffs (updated)
-

  common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
PRE-CREATION 
  contrib/src/test/queries/clientnegative/serde_regex.q a676338 
  contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
  contrib/src/test/results/clientnegative/serde_regex.q.out 0f9b036 
  contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
  hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
  hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
  jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 93f093f 
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java de74c3e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f28d33e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
7be628e 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
 ba41518 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 8b0db4a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 7ceb005 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 62bbcc6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9ba1865 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 82080eb 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java a718264 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java bccf5a6 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 17b892c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java efae82d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 9cbc114 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 5808c90 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java a7551cb 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java c961d14 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 570408a 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 259fde8 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java 5a31e61 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java 
PRE-CREATION 
  ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
  ql/src/test/queries/clientnegative/serde_regex2.q a29bb9c 
  ql/src/test/queries/clientnegative/serde_regex3.q 4e91f06 
  ql/src/test/queries/clientpositive/create_like.q bd39731 
  ql/src/test/queries/clientpositive/join43.q 12c45a6 
  ql/src/test/queries/clientpositive/serde_regex.q e21c6e1 
  ql/src/test/queries/clientpositive/timestamptz.q PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamptz_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamptz_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/timestamptz_3.q PRE-CREATION 
  ql/src/test/results/clientnegative/invalid_cast_from_binary_1.q.out acecbae 
  ql/src/test/results/clientnegative/invalid_cast_from_binary_2.q.out 41e1c80 
  ql/src/test/results/clientnegative/invalid_cast_from_binary_3.q.out 23e3403 
  ql/src/test/results/clientnegative/invalid_cast_from_binary_4.q.out 3541ef6 
  ql/src/test/results/clientnegative/invalid_cast_from_binary_5.q.out 177039c 
  ql/src/test/results/clientnegative/invalid_cast_from_binary_6.q.out 668380f 
  ql/src/test/results/clientnegative/serde_regex.q.out 7892bb2 
  ql/src/test/results/clientnegative/serde_regex2.q.out 1ceb387 
  ql/src/test/results/clientnegative/serde_regex3.q.out 028a24f 
  ql/src/test/results/clientnegative/wrong_column_type.q.out 6ff90ea 
  ql/src/test/results/clientpositive/create_like.q.out 0111c94 
  ql/src/test/results/clientpositive/join43.q.out 127d5d0 
  ql/src/test/results/clientpositive/serde_regex.q.out 7bebb0c 
  ql/src/test/results/clientpositive/timestamptz.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/timestamptz_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/timestamptz_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/timestamptz_3.q.out PRE-CREATION 
  serde/if/serde.thrift 6caad36 
  serde/src/gen/thrift/gen-cpp/serde_constants.h a5f33fb 
  serde/src/gen/thrift/gen-cpp/serde_constants.cpp 3a675bf 
  
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java
 04ed8f5 
  serde/src/gen/thrift/gen-php/org/apache/hadoop/hive/serde/Types.php 18c3991 
  serde/src/gen/thrift/gen-py/org_apache_hadoop_hive_serde/constants.py fafdc24

Re: Review Request 50787: Add a timezone-aware timestamp

2016-09-23 Thread Rui Li



> On Sept. 22, 2016, 11:20 a.m., Jason Dere wrote:
> > - How about compatibility with the various date functions 
> > (year()/month()/day()/etc)?

For most of the functions, TIMESTAMPTZ is implicitly converted to text, 
so I think we get correct results. I added some special handling in 
HOUR because some hours may be unavailable due to DST.
So far I've verified that the following functions work:

to_date
year
quarter
month
day
dayofmonth
hour
minute
second
weekofyear

Is it OK if we leave the others to follow-on tasks? I'd like to keep the patch small.


> On Sept. 22, 2016, 11:20 a.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java,
> >  line 58
> > 
> >
> > No conversions to/from DATE/TIMESTAMP?

Added conversion from date/timestamp to timestamptz. The default timezone is 
used for the converted timestamptz.
We can add conversion from numeric types in a follow-on task.


> On Sept. 22, 2016, 11:20 a.m., Jason Dere wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java,
> >  line 1109
> > 
> >
> > If the local timezone is different from the timezone in the 
> > TimestampTZ, is it possible that the year/month/day of the DATE might be 
> > different from the year/month/day of the TimestampTZ?

Good catch! It makes more sense to convert from the text representation than 
the time/nanos. So I convert the timestamptz to string first, and use that 
string to create the date. Same applies when converting to timestamp.
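
The discrepancy raised in the review comment can be reproduced with a small Python sketch. It uses Python's datetime module rather than Hive's TimestampTZ type, and the instant, the fixed +08:00 offset, and the helper names are assumptions picked for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical instant: 02:30 on 2016-09-23 in a zone eight hours ahead of UTC.
ZONE = timezone(timedelta(hours=8))
INSTANT = datetime(2016, 9, 23, 2, 30, tzinfo=ZONE)


def date_via_instant(ts, target_zone):
    """Shift the instant into target_zone first, then take the calendar date."""
    return ts.astimezone(target_zone).date()


def date_via_text(ts):
    """Render the timestamp with its own offset and parse the date part of that text."""
    text = ts.isoformat()  # '2016-09-23T02:30:00+08:00' for INSTANT
    return date.fromisoformat(text[:10])
```

Here `date_via_instant(INSTANT, timezone.utc)` lands on 2016-09-22, while `date_via_text(INSTANT)` stays on 2016-09-23, which is why going through the text representation preserves the year/month/day of the original value.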


- Rui


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50787/#review149983
---


On Sept. 22, 2016, 4:05 a.m., Rui Li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50787/
> ---
> 
> (Updated Sept. 22, 2016, 4:05 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14412
> https://issues.apache.org/jira/browse/HIVE-14412
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The 1st patch to add timezone-aware timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/test/org/apache/hadoop/hive/common/type/TestTimestampTZ.java 
> PRE-CREATION 
>   contrib/src/test/queries/clientnegative/serde_regex.q a676338 
>   contrib/src/test/queries/clientpositive/serde_regex.q d75d607 
>   contrib/src/test/results/clientnegative/serde_regex.q.out 0f9b036 
>   contrib/src/test/results/clientpositive/serde_regex.q.out 2984293 
>   hbase-handler/src/test/queries/positive/hbase_timestamp.q 0350afe 
>   hbase-handler/src/test/results/positive/hbase_timestamp.q.out 3918121 
>   jdbc/src/java/org/apache/hive/jdbc/HiveBaseResultSet.java 93f093f 
>   jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java 38918f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java de74c3e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java f28d33e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 
> 7be628e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/TypeConverter.java
>  ba41518 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 8b0db4a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 7ceb005 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 62bbcc6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 9ba1865 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 82080eb 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java a718264 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java 17b892c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java efae82d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToDouble.java 9cbc114 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToFloat.java 5808c90 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java a7551cb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java c961d14 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java 570408a 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToString.java 5cacd59 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java 259fde8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToTimestampTZ.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/serde_regex.q c9cfc7d 
>   ql/src/test/queries/clientnegative/serde_regex2.q a29bb9c 
>   ql/src/test/queries/clientnegative/serde_regex3.q 4e91f06 
>   ql/src/test/queries/clientpositive/create_like.q bd39731 
>

Discovering and sharing datasets in the clouds

2016-09-23 Thread Elliot West

Hi,

I’d like to get some thoughts on a Hive metastore system/feature that we’ve
been toying with to provide greater independence and flexibility to our
data teams when operating in cloud environments. It can probably best be
thought of as a federation layer for metastores, but from the user’s
perspective it’s more similar to the external or remote table features
found in some traditional RDBMSes.

For some time I’ve been involved in projects to move data processing
functions from a single, large, on-premises cluster to smaller, team-owned
clusters in the cloud. While this has had many benefits, it has
reinforced our reliance on the Hive metastore. We use it as the source of
truth for descriptions of our data, and as a directory to help us locate it.
Additionally, in the cloud it adds a layer of consistency to data stored on
eventually consistent file stores. Finally, it serves as a broadly
supported integration point for many data processing frameworks that we
might choose to use. The problem we may face is that in line with a
distributed, self-service ethos, teams will spin up their own metastores
and unintentionally create isolated silos of data. Naturally we often want
to share datasets and this arrangement, for all its benefits, is a
technological barrier to that.

To solve this problem we’ve been experimenting with a federated metastore.
Quite simply this is a service that presents a metastore Thrift API and
routes requests to different metastores based on some mappings derived from
the database and table names. By default we provide a companion ’local’
metastore instance that serves as the cluster’s primary read/write metadata
store. Users can then add read-only mappings to ‘external’ metastores using
database name prefixes such as:

‘extdb_’ → thrift://external.metastore:9038/
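
The routing idea can be sketched in a few lines of Python. This is not the prototype's implementation: the function name `resolve_metastore`, the longest-prefix-wins rule, and the local URI are assumptions made for illustration, and only the ‘extdb_’ mapping comes from the example above.

```python
def resolve_metastore(database_name, prefix_mappings, local_uri):
    """Return the metastore URI that should serve database_name.

    Assumed rule: the longest matching prefix wins, and names with no
    matching prefix are served by the local metastore.
    """
    matches = [prefix for prefix in prefix_mappings if database_name.startswith(prefix)]
    if not matches:
        return local_uri
    return prefix_mappings[max(matches, key=len)]


# Mapping taken from the example above; the local URI is a made-up placeholder.
MAPPINGS = {"extdb_": "thrift://external.metastore:9038/"}
LOCAL = "thrift://localhost:9083/"
```

Under these assumptions, `resolve_metastore("extdb_etl", MAPPINGS, LOCAL)` routes to the external metastore, while `resolve_metastore("etl", MAPPINGS, LOCAL)` falls through to the local one.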


With this implementation we’ve been able to perform joins between tables in
different metastores and query ‘remote’ tables as if they were local.
Conceptually this architecture can allow individual teams to take full
ownership of the publishing and maintenance of their datasets while being
free to share them with other teams or divisions. Additionally the costs
involved to transport and process the data downstream are borne by the
consumer, not the owner, which seems fair. Finally there is no requirement
for a centralised team to run and manage a single organisation-wide
metastore.

Below is an example of a cross-metastore query:

hive> show databases;
OK
default
etl   -- database in local metastore
extdb_etl -- database in thrift://external.metastore:9038/

-- example of what a query looks like joining data from a
-- 'local' db called 'etl' and a 'remote' db called
-- 'extdb_etl'

hive> select
  l.id
  , r.name
from
  etl.local_table l
join
  extdb_etl.remote_table r
on (
  r.id = l.id
)
where
  l.load_date = '2016-05-13'
;


To get to this point we’ve had to side-step some of the trickier issues such
as authentication (we simply turned it off for now!) and compatibility
across different metastore versions (we’ve stuck with one version only).
Clearly these need to be addressed if we were to use this in the real world.

I see that this recent Hortonworks blog post (
http://hortonworks.com/blog/making-elephant-fly-cloud/) alludes to issues
of ‘Shared Metadata and Governance’ and perhaps the role of the metastore
in this regard. Therefore I’m wondering where to take this next as I can
envisage a number of possible forms this feature could take:

   1. A separate stand-alone federation service that sits between Hive
   clients and metastore instances.
   2. Metastore federation added as a feature to the current Hive
   metastore. Similar to 1 but integrated into the current hive-metastore
   module.
   3. Support for remote tables added to Hive. This, while similar in
   implementation to 2, might provide a user experience consistent with that
   found with configuring external tables in traditional RDBMSes:

CREATE REMOTE TABLE my_table
AS their_database.their_table
ON SERVER 'thrift://external.metastore:9083/'


I’d appreciate any thoughts or suggestions.

Thanks,

Elliot.

Re: [DISCUSS] Hive 2.1.1 bug fix release

2016-09-23 Thread Sergio Pena

Thanks Jesus.
+1

I will take a look at some jiras useful for it.

Btw, many tests are failing on branch-2.1.
Are we going to fix them before the release?

On Thu, Sep 22, 2016 at 12:56 PM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> +1
>
> Thanks
> Prasanth
>
> > On Sep 22, 2016, at 5:57 AM, Jesus Camacho Rodriguez  hortonworks.com> wrote:
> >
> > Hi team,
> >
> > Since the release of 2.1.0, there have been more than 170 fixes
> > that went into branch-2.1 (awesome!). Thus, I think it is a good
> > time to release 2.1.1.
> >
> > If you would like some other fixes to be included, please tag
> > their target version as 2.1.1 so we do not miss them.
> >
> > If there are no objections, I propose to create an RC next week.
> >
> > Thanks,
> > Jesús
> >
>
>

Re: [DISCUSS] Hive 2.1.1 bug fix release

2016-09-23 Thread Jesus Camacho Rodriguez

Thanks Sergio.

Since only bug fixes were going in, I would expect that most failures
are just a matter of regenerating q files. But certainly, we should
make sure all tests are passing.

--
Jesús



On 9/23/16, 4:23 PM, "Sergio Pena"  wrote:

>Thanks Jesus.
>+1
>
>I will take a look at some jiras useful for it.
>
>Btw, many tests are failing on branch-2.1.
>Are we going to fix them before the release?
>
>On Thu, Sep 22, 2016 at 12:56 PM, Prasanth Jayachandran <
>pjayachand...@hortonworks.com> wrote:
>
>> +1
>>
>> Thanks
>> Prasanth
>>
>> > On Sep 22, 2016, at 5:57 AM, Jesus Camacho Rodriguez > hortonworks.com> wrote:
>> >
>> > Hi team,
>> >
>> > Since the release of 2.1.0, there have been more than 170 fixes
>> > that went into branch-2.1 (awesome!). Thus, I think it is a good
>> > time to release 2.1.1.
>> >
>> > If you would like some other fixes to be included, please tag
>> > their target version as 2.1.1 so we do not miss them.
>> >
>> > If there are no objections, I propose to create an RC next week.
>> >
>> > Thanks,
>> > Jesús
>> >
>>
>>

[jira] [Created] (HIVE-14829) metastore.sh fails due to classpath conflict with hive-service-rpc

2016-09-23 Thread Sergio Peña (JIRA)

Sergio Peña created HIVE-14829:
--

 Summary: metastore.sh fails due to classpath conflict with 
hive-service-rpc
 Key: HIVE-14829
 URL: https://issues.apache.org/jira/browse/HIVE-14829
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.1, 2.1.0
Reporter: Sergio Peña
Assignee: Sergio Peña


When attempting to run metastore.sh to start a new HMS server, the script 
fails because the metastore class cannot be found on the classpath. This 
happens because the new {{hive-service-rpc}} jar added in Hive 2.x 
conflicts with the {{hive-service}} jar.




Re: [DISCUSS] Hive 2.1.1 bug fix release

2016-09-23 Thread Sergio Pena

Cool.

I will start investigating these failures.

On Fri, Sep 23, 2016 at 10:40 AM, Jesus Camacho Rodriguez <
jcamachorodrig...@hortonworks.com> wrote:

> Thanks Sergio.
>
> Since only bug fixes were going in, I would expect that most failures
> are just a matter of regenerating q files. But certainly, we should
> make sure all tests are passing.
>
> --
> Jesús
>
>
>
> On 9/23/16, 4:23 PM, "Sergio Pena"  wrote:
>
> >Thanks Jesus.
> >+1
> >
> >I will take a look at some jiras useful for it.
> >
> >Btw, many tests are failing on branch-2.1.
> >Are we going to fix them before the release?
> >
> >On Thu, Sep 22, 2016 at 12:56 PM, Prasanth Jayachandran <
> >pjayachand...@hortonworks.com> wrote:
> >
> >> +1
> >>
> >> Thanks
> >> Prasanth
> >>
> >> > On Sep 22, 2016, at 5:57 AM, Jesus Camacho Rodriguez
>  >> hortonworks.com> wrote:
> >> >
> >> > Hi team,
> >> >
> >> > Since the release of 2.1.0, there have been more than 170 fixes
> >> > that went into branch-2.1 (awesome!). Thus, I think it is a good
> >> > time to release 2.1.1.
> >> >
> >> > If you would like some other fixes to be included, please tag
> >> > their target version as 2.1.1 so we do not miss them.
> >> >
> >> > If there are no objections, I propose to create an RC next week.
> >> >
> >> > Thanks,
> >> > Jesús
> >> >
> >>
> >>
>

[jira] [Created] (HIVE-14830) Move a majority of the MiniLlapCliDriver tests to use an inline AM

2016-09-23 Thread Siddharth Seth (JIRA)

Siddharth Seth created HIVE-14830:
-

 Summary: Move a majority of the MiniLlapCliDriver tests to use an 
inline AM
 Key: HIVE-14830
 URL: https://issues.apache.org/jira/browse/HIVE-14830
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth







[jira] [Created] (HIVE-14831) Missing Druid dependencies at runtime

2016-09-23 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-14831:
--

 Summary: Missing Druid dependencies at runtime
 Key: HIVE-14831
 URL: https://issues.apache.org/jira/browse/HIVE-14831
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The initial patch excluded some packages during shading that should have been 
included.




[jira] [Created] (HIVE-14833) Add vectorization support for date truncation UDFs

2016-09-23 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-14833:
--

 Summary: Add vectorization support for date truncation UDFs
 Key: HIVE-14833
 URL: https://issues.apache.org/jira/browse/HIVE-14833
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez







[jira] [Created] (HIVE-14832) Add support for 'date' and 'interval' types to date truncation UDFs

2016-09-23 Thread Jesus Camacho Rodriguez (JIRA)

Jesus Camacho Rodriguez created HIVE-14832:
--

 Summary: Add support for 'date' and 'interval' types to date 
truncation UDFs
 Key: HIVE-14832
 URL: https://issues.apache.org/jira/browse/HIVE-14832
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez






