[jira] [Created] (HIVE-14795) ALTER TABLE ... RENAME [OVERWRITE] [TO] to overwrite existing table

2016-09-19 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created HIVE-14795:


 Summary: ALTER TABLE ... RENAME [OVERWRITE] [TO] to overwrite 
existing table
 Key: HIVE-14795
 URL: https://issues.apache.org/jira/browse/HIVE-14795
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 2.1.0, 2.0.0, 1.1.0, 1.2.0
Reporter: Ruslan Dautkhanov


It would be great to have an OVERWRITE option in Hive's rename command:

ALTER TABLE ... RENAME [OVERWRITE] [TO], to overwrite the existing table if it 
exists.

There are many commands, like DROP TABLE IF EXISTS and LOAD DATA .. OVERWRITE, 
that specify how to behave when the target table already exists.

We currently have to check in our ETL pipelines (outside of Hive) whether the 
table already exists and take different actions depending on the result.

It would simplify that logic quite a bit.
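As a rough sketch of the branching our pipelines currently carry (the helper and table names below are hypothetical, for illustration only; with the proposed syntax this would collapse to a single statement):

```java
import java.util.ArrayList;
import java.util.List;

public class RenameOverwrite {
    // Statements an ETL pipeline currently has to issue by hand to emulate
    // the proposed "ALTER TABLE src RENAME OVERWRITE TO dst".
    public static List<String> renameStatements(String src, String dst, boolean overwrite) {
        List<String> stmts = new ArrayList<>();
        if (overwrite) {
            // Mirrors DROP TABLE IF EXISTS semantics: no error if dst is absent.
            stmts.add("DROP TABLE IF EXISTS " + dst);
        }
        stmts.add("ALTER TABLE " + src + " RENAME TO " + dst);
        return stmts;
    }

    public static void main(String[] args) {
        System.out.println(renameStatements("staging_events", "events", true));
    }
}
```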

The closest functionality I found is in MySQL: 
http://bugs.mysql.com/bug.php?id=36271

This is not a compatibility-breaking change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14794) HCatalog support to pre-fetch for Avro tables that use avro.schema.url.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-14794:
---

 Summary: HCatalog support to pre-fetch for Avro tables that use 
avro.schema.url.
 Key: HIVE-14794
 URL: https://issues.apache.org/jira/browse/HIVE-14794
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 2.1.0, 1.2.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan


HIVE-14792 introduces support to modify and add properties to table-parameters 
during query-planning. It prefetches remote Avro-schema information and stores 
it in TBLPROPERTIES, under {{avro.schema.literal}}.

We'll need similar support in {{HCatLoader}} to prevent excessive reads of 
schema-files in Pig queries.





[jira] [Created] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-19 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14793:
-

 Summary: Allow ptest branch to be specified, PROFILE override
 Key: HIVE-14793
 URL: https://issues.apache.org/jira/browse/HIVE-14793
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth


Post HIVE-14734, the profile is automatically determined. Add an option to 
override this via Jenkins. Also add an option to specify the branch from which 
ptest is built (this is currently hardcoded to github.com/apache/hive).





[jira] [Created] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-14792:
---

 Summary: AvroSerde reads the remote schema-file at least once per 
mapper, per table reference.
 Key: HIVE-14792
 URL: https://issues.apache.org/jira/browse/HIVE-14792
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.0, 1.2.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan


Avro tables that use "external" schema files stored on HDFS can cause excessive 
calls to {{FileSystem::open()}}, especially for queries that spawn large 
numbers of mappers.

This is because of the following code in {{AvroSerDe::initialize()}}:

{code:title=AvroSerDe.java|borderStyle=solid}
public void initialize(Configuration configuration, Properties properties)
    throws SerDeException {
  // ...
  if (hasExternalSchema(properties)
      || columnNameProperty == null || columnNameProperty.isEmpty()
      || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
    schema = determineSchemaOrReturnErrorSchema(configuration, properties);
  } else {
    // Get column names and sort order
    columnNames = Arrays.asList(columnNameProperty.split(","));
    columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);

    schema = getSchemaFromCols(properties, columnNames, columnTypes,
        columnCommentProperty);
    properties.setProperty(
        AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
        schema.toString());
  }
  // ...
}
{code}

For tables using {{avro.schema.url}}, the schema file is read remotely every 
time the SerDe is initialized (i.e. at least once per mapper). For queries with 
thousands of mappers, this leads to a stampede on the handful (3?) of datanodes 
that host the schema-file. In the best case, this causes slowdowns.

It would be preferable to distribute the Avro-schema to all mappers as part of 
the job-conf. The alternatives aren't exactly appealing:
# One can't rely solely on the {{column.list.types}} stored in the Hive 
metastore. (HIVE-14789).
# {{avro.schema.literal}} might not always be usable, because of the size-limit 
on table-parameters. The typical size of the Avro-schema file is between 
0.5-3MB, in my limited experience. Bumping the max table-parameter size isn't a 
great solution.

If the {{avro.schema.url}} file were read during query-planning, and made 
available as part of the table-properties (but not serialized into the 
metastore), the downstream logic would remain largely intact. I have a patch 
that does this.
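A minimal sketch of the planning-time prefetch (the method name is hypothetical; a local file stands in for the HDFS read that real code would do via {{FileSystem::open()}}):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class AvroSchemaPrefetch {
    // Read the schema file once during query planning and stash it in the
    // table properties under avro.schema.literal, so each mapper's SerDe
    // initialization finds the schema locally instead of re-opening the
    // remote file.
    public static void prefetch(Path schemaFile, Properties tableProps) throws IOException {
        if (tableProps.getProperty("avro.schema.literal") == null) {
            String schema = new String(Files.readAllBytes(schemaFile), StandardCharsets.UTF_8);
            tableProps.setProperty("avro.schema.literal", schema);
        }
    }
}
```

The guard keeps an explicitly set {{avro.schema.literal}} intact, so only tables that rely on {{avro.schema.url}} are touched.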







[jira] [Created] (HIVE-14791) LLAP: Use FQDN for all communication

2016-09-19 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-14791:
---

 Summary: LLAP: Use FQDN for all communication 
 Key: HIVE-14791
 URL: https://issues.apache.org/jira/browse/HIVE-14791
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.2.0
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: 2.2.0


{code}
llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: + socketAddress.getHostName());
llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: host = socketAddress.getHostName();
llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: public static String getHostName() {
llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: return InetAddress.getLocalHost().getHostName();
llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: String name = address.getHostName();
llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: builder.setAmHost(address.getHostName());
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), localAddress.get().getPort());
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: localAddress.get().getHostName(), vertex.getDagName(), qIdProto.getDagIndex(),
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: new ExecutionContextImpl(localAddress.get().getHostName()), env,
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: String hostName = MetricsUtils.getHostName();
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java: .setBindAddress(addr.getHostName())
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java: request.getContainerIdString(), executionContext.getHostName(), vertex.getDagName(),
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: String displayName = "LlapDaemonCacheMetrics-" + MetricsUtils.getHostName();
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java: new LlapProtocolClientImpl(new Configuration(), serverAddr.getHostName(),
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java: builder.setAmHost(getAddress().getHostName());
llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: String displayName = "LlapTaskSchedulerMetrics-" + MetricsUtils.getHostName();
{code}

On systems where the hostname does not match the FQDN, calling 
getCanonicalHostName() instead allows the hostname to be resolved when it is 
accessed from a different base domain.
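The difference, sketched with the plain JDK API (the hostnames in the comments are illustrative):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameDemo {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        // May return the configured short name, e.g. "node1", on hosts whose
        // name is not the FQDN.
        System.out.println("getHostName():          " + local.getHostName());
        // Performs a reverse lookup and returns the fully qualified name,
        // e.g. "node1.example.com", which remains resolvable from other
        // base domains.
        System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
    }
}
```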





[jira] [Created] (HIVE-14790) Jenkins is not displaying test results because 'set -e' is aborting the script too soon

2016-09-19 Thread Sergio Peña (JIRA)
Sergio Peña created HIVE-14790:
--

 Summary: Jenkins is not displaying test results because 'set -e' 
is aborting the script too soon
 Key: HIVE-14790
 URL: https://issues.apache.org/jira/browse/HIVE-14790
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña


Jenkins is not displaying test results because 'set -e' is aborting the script 
too soon





Re: Review Request 50525: HIVE-14341: Altered skewed location is not respected for list bucketing

2016-09-19 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50525/
---

(Updated Sept. 19, 2016, 9:02 p.m.)


Review request for hive.


Changes
---

Made the changes so that the desc command will show the skewed location for 
locations that were not updated explicitly.
With this patch, we no longer automatically collect the skew mapping from the 
directory, since that would cause problems when a location is updated 
explicitly.
Rather, given a query like select * from list_bucket_single where key=1: if the 
skew location for key 1 was updated explicitly, we use the new location from 
HMS; otherwise we fall back to the default location /list_bucket_single/key=1.
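The lookup described above can be sketched as follows (names are hypothetical; the actual change lives in the files in the diff below):

```java
import java.util.HashMap;
import java.util.Map;

public class SkewLocationLookup {
    // If HMS records an explicitly altered location for this skewed key,
    // use it; otherwise fall back to the default directory under the
    // table location.
    public static String locationFor(Map<String, String> hmsSkewedLocations,
                                     String tableLocation, String keyValue) {
        String explicit = hmsSkewedLocations.get(keyValue);
        return explicit != null ? explicit : tableLocation + "/key=" + keyValue;
    }

    public static void main(String[] args) {
        Map<String, String> skewMap = new HashMap<>();
        skewMap.put("1", "/custom/loc/key1");
        System.out.println(locationFor(skewMap, "/list_bucket_single", "1"));
        System.out.println(locationFor(skewMap, "/list_bucket_single", "2"));
    }
}
```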


Repository: hive-git


Description
---

HIVE-14341: Altered skewed location is not respected for list bucketing


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java e386717 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java da46854 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
 ba4f6a7 
  ql/src/test/queries/clientpositive/create_alter_list_bucketing_table1.q 
bf89e8f 
  ql/src/test/results/clientpositive/create_alter_list_bucketing_table1.q.out 
216d3be 

Diff: https://reviews.apache.org/r/50525/diff/


Testing
---


Thanks,

Aihua Xu



[jira] [Created] (HIVE-14789) Avro Table-reads bork when using SerDe-generated table-schema.

2016-09-19 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-14789:
---

 Summary: Avro Table-reads bork when using SerDe-generated 
table-schema.
 Key: HIVE-14789
 URL: https://issues.apache.org/jira/browse/HIVE-14789
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 2.0.1, 1.2.1
Reporter: Mithun Radhakrishnan


AvroSerDe allows one to skip the table-columns in a table-definition when 
creating a table, as long as the TBLPROPERTIES includes a valid 
{{avro.schema.url}} or {{avro.schema.literal}}. The table-columns are inferred 
from processing the Avro schema file/literal.

The problem is that the inferred schema might not be congruent with the actual 
schema in the Avro schema file/literal. Consider the following table definition:

{code:sql}
CREATE TABLE avro_schema_break_1
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record",
  "name": "Messages",
  "namespace": "net.myth",
  "fields": [
{
  "name": "header",
  "type": [
"null",
{
  "type": "record",
  "name": "HeaderInfo",
  "fields": [
{
  "name": "inferred_event_type",
  "type": [
"null",
"string"
  ],
  "default": null
},
{
  "name": "event_type",
  "type": [
"null",
"string"
  ],
  "default": null
},
{
  "name": "event_version",
  "type": [
"null",
"string"
  ],
  "default": null
}
  ]
}
  ]
},
{
  "name": "messages",
  "type": {
"type": "array",
"items": {
  "name": "MessageInfo",
  "type": "record",
  "fields": [
{
  "name": "message_id",
  "type": [
"null",
"string"
  ],
  "doc": "Message-ID"
},
{
  "name": "received_date",
  "type": [
"null",
"long"
  ],
  "doc": "Received Date"
},
{
  "name": "sent_date",
  "type": [
"null",
"long"
  ]
},
{
  "name": "from_name",
  "type": [
"null",
"string"
  ]
},
{
  "name": "flags",
  "type": [
"null",
{
  "type": "record",
  "name": "Flags",
  "fields": [
{
  "name": "is_seen",
  "type": [
"null",
"boolean"
  ],
  "default": null
},
{
  "name": "is_read",
  "type": [
"null",
"boolean"
  ],
  "default": null
},
{
  "name": "is_flagged",
  "type": [
"null",
"boolean"
  ],
  "default": null
}
  ]
}
  ],
  "default": null
}
  ]
}
  }
}
  ]
}');
{code}

This produces a table with the following schema:
{noformat}
2016-09-19T13:23:42,934 DEBUG [0ce7e586-13ea-4390-ac2a-6dac36e8a216 main] 
hive.log: DDL: struct avro_schema_break_1 { 
struct<inferred_event_type:string,event_type:string,event_version:string> header, 
list<struct<message_id:string,received_date:bigint,sent_date:bigint,from_name:string,flags:struct<is_seen:boolean,is_read:boolean,is_flagged:boolean>>> messages}
{noformat}

Data written to this table with the Avro schema from {{avro.schema.literal}}, 
via Pig's {{AvroStorage}}, cannot be read back in Hive using the generated 
table schema. This is the exception one sees:

{noformat}
java.io.IOException: org.apache.avro.AvroTypeException: Found 
net.myth.HeaderInfo, expecting union
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
{noformat}
[jira] [Created] (HIVE-14788) Investigate how to access permanent function without restarting HS2 if load balancer is configured

2016-09-19 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-14788:
---

 Summary: Investigate how to access permanent function without 
restarting HS2 if load balancer is configured
 Key: HIVE-14788
 URL: https://issues.apache.org/jira/browse/HIVE-14788
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Aihua Xu
Assignee: Aihua Xu


When a load balancer is configured for multiple HS2 servers, it seems we need 
to restart each HS2 server to get a permanent function to work. Since the 
"reload function" command issued from the client to refresh the global registry 
is not targeted at a specific HS2 server, some servers may not get refreshed 
and a ClassNotFoundException may be thrown later.

Investigate whether this is a real issue and, if so, find a good solution for it.
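One possible direction, sketched below (the URL scheme is standard HiveServer2 JDBC, but the port and host list are assumptions): bypass the load balancer and address each HS2 host directly when refreshing the registry.

```java
import java.util.ArrayList;
import java.util.List;

public class ReloadAllHs2 {
    // Build a direct JDBC URL per HS2 host so that "RELOAD FUNCTION" can be
    // issued to every server individually, instead of only whichever server
    // the load balancer happens to route to. Port 10000 is the usual
    // HiveServer2 default; adjust to the actual deployment.
    public static List<String> directUrls(List<String> hosts) {
        List<String> urls = new ArrayList<>();
        for (String host : hosts) {
            urls.add("jdbc:hive2://" + host + ":10000/default");
        }
        return urls;
    }
}
```

Each URL would then get its own connection executing the reload, e.g. via java.sql.DriverManager, so no server is left with a stale registry.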





[jira] [Created] (HIVE-14787) Ability to access DistributedCache from UDFs via Java API

2016-09-19 Thread Ilya Bystrov (JIRA)
Ilya Bystrov created HIVE-14787:
---

 Summary: Ability to access DistributedCache from UDFs via Java API
 Key: HIVE-14787
 URL: https://issues.apache.org/jira/browse/HIVE-14787
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: 1.1.0+cdh5.7.1
Reporter: Ilya Bystrov


I'm trying to create a custom function:

{{create function geoip as 'some.package.UDFGeoIp' using jar 
'hdfs:///user/hive/ext/HiveGeoIP.jar', file 'hdfs:///user/hive/ext/GeoIP.dat';}}

According to https://issues.apache.org/jira/browse/HIVE-1016, 
I should be able to access the file via {{new File("./GeoIP.dat");}} (in the 
overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}}).
But this doesn't work.

I use the following workaround, but it's ugly:
{code}
// Find the directory the UDF jar was loaded from, and look for GeoIP.dat next to it.
CodeSource codeSource = GenericUDFGeoIP.class.getProtectionDomain().getCodeSource();
File jarFile = new File(codeSource.getLocation().toURI().getPath());
String jarDir = jarFile.getParentFile().getPath();
File actualFile = new File(jarDir + "/GeoIP.dat");
{code}





Re: Review Request 51694: HIVE-14713 LDAP Authentication Provider should be covered with unit tests

2016-09-19 Thread Illya Yalovyy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51694/#review149493
---



Chaoyu,

Thank you for the great review. I have also made some minor changes on my side. 
I'll update this CR in a couple of days.

- Illya Yalovyy


On Sept. 7, 2016, 2:24 p.m., Illya Yalovyy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51694/
> ---
> 
> (Updated Sept. 7, 2016, 2:24 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Chaoyu Tang, Naveen Gangam, and 
> Szehon Ho.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.
> 
> 
> Diffs
> -
> 
>   service/pom.xml ecea719 
>   
> service/src/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java
>  efd5393 
>   service/src/java/org/apache/hive/service/auth/ldap/ChainFilterFactory.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/auth/ldap/CustomQueryFilterFactory.java
>  PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/DirSearch.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/DirSearchFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/Filter.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/FilterFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/GroupFilterFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/LdapSearch.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/LdapUtils.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/Query.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/QueryFactory.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/SearchResultHandler.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/auth/ldap/UserFilterFactory.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/auth/ldap/UserSearchFilterFactory.java
>  PRE-CREATION 
>   
> service/src/test/org/apache/hive/service/auth/TestLdapAtnProviderWithMiniDS.java
>  089a059 
>   
> service/src/test/org/apache/hive/service/auth/TestLdapAuthenticationProviderImpl.java
>  f276906 
>   service/src/test/org/apache/hive/service/auth/ldap/Credentials.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/LdapTestUtils.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestChainFilter.java 
> PRE-CREATION 
>   
> service/src/test/org/apache/hive/service/auth/ldap/TestCustomQueryFilter.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestGroupFilter.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestLdapSearch.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestLdapUtils.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestQuery.java 
> PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestQueryFactory.java 
> PRE-CREATION 
>   
> service/src/test/org/apache/hive/service/auth/ldap/TestSearchResultHandler.java
>  PRE-CREATION 
>   service/src/test/org/apache/hive/service/auth/ldap/TestUserFilter.java 
> PRE-CREATION 
>   
> service/src/test/org/apache/hive/service/auth/ldap/TestUserSearchFilter.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/51694/diff/
> 
> 
> Testing
> ---
> 
> ...hive/service> mvn clean test
> 
> ...
> 
> Results :
> 
> Tests run: 123, Failures: 0, Errors: 0, Skipped: 0
> 
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 04:18 min
> [INFO] Finished at: 2016-09-06T08:46:04-07:00
> [INFO] Final Memory: 66M/984M
> [INFO] 
> 
> 
> 
> Thanks,
> 
> Illya Yalovyy
> 
>



Re: Review Request 52029: HIVE-14753: Track the number of open/closed/abandoned sessions in HS2

2016-09-19 Thread Peter Vary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52029/#review149468
---



Just nits. Nice clean code. Thanks!


common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java (line 
209)


nit: Too long line (100 char)



common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java (line 
117)


nit: Too long line (100 char)



common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricVariableRatioGauge.java
 (line 24)


nit: extra line?



common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricVariableRatioGauge.java
 (line 33)


nit: indent by 4 space instead



common/src/test/org/apache/hadoop/hive/common/metrics/MetricsTestUtils.java 
(line 53)


nit: indent by 4 space instead



common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestMetricVariableRatioGauge.java
 (line 67)


nit: maybe add one test where the ratio is not a whole number?



service/src/java/org/apache/hive/service/cli/session/SessionManager.java (line 
260)


Documentation - we should document these statistics. The abandoned-session 
metric was not clear to me - I had to read the code :)



service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java
 (line 152)


nit: Too long line (100 char)



service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java
 (line 154)


nit: Too long line (100 char)



service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java
 (line 252)


nit: name (testAciveSessionTimeMetrics)


- Peter Vary


On Sept. 19, 2016, 1:21 p.m., Barna Zsombor Klara wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52029/
> ---
> 
> (Updated Sept. 19, 2016, 1:21 p.m.)
> 
> 
> Review request for hive, Gabor Szadovszky, Peter Vary, and Sergio Pena.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-14753: Track the number of open/closed/abandoned sessions in HS2
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java 
> 9be9b50aa02ff88816eb92079eaff9afa3e1be90 
>   common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java 
> 4297233ed12a7d9a2fa03ac3204e8335c0aed821 
>   
> common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsConstant.java
>  9dc96f9c6412720a891b5c55e2074049c893d780 
>   
> common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java
>  4c433678bd62ea74b80babce9856681192deb25f 
>   
> common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricVariableRatioGauge.java
>  PRE-CREATION 
>   common/src/test/org/apache/hadoop/hive/common/metrics/MetricsTestUtils.java 
> 46676589e6656d0f13f1931bfe67a63dd1920042 
>   
> common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestMetricVariableRatioGauge.java
>  PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 15bab0660fcb9a997d66f6ff0a5dbc0e39c37ae7 
>   
> service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java
>  5511c54ff431211f7f72deaa017c915b839dfb2a 
> 
> Diff: https://reviews.apache.org/r/52029/diff/
> 
> 
> Testing
> ---
> 
> Ran the unit tests in the common and the ql subprojects.
> Manually verified the metrics using the HS2 webui metric dump.
> 
> 
> Thanks,
> 
> Barna Zsombor Klara
> 
>



Review Request 52029: HIVE-14753: Track the number of open/closed/abandoned sessions in HS2

2016-09-19 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52029/
---

Review request for hive, Gabor Szadovszky, Peter Vary, and Sergio Pena.


Repository: hive-git


Description
---

HIVE-14753: Track the number of open/closed/abandoned sessions in HS2


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/metrics/LegacyMetrics.java 
9be9b50aa02ff88816eb92079eaff9afa3e1be90 
  common/src/java/org/apache/hadoop/hive/common/metrics/common/Metrics.java 
4297233ed12a7d9a2fa03ac3204e8335c0aed821 
  
common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsConstant.java
 9dc96f9c6412720a891b5c55e2074049c893d780 
  
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java
 4c433678bd62ea74b80babce9856681192deb25f 
  
common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/MetricVariableRatioGauge.java
 PRE-CREATION 
  common/src/test/org/apache/hadoop/hive/common/metrics/MetricsTestUtils.java 
46676589e6656d0f13f1931bfe67a63dd1920042 
  
common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestMetricVariableRatioGauge.java
 PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
15bab0660fcb9a997d66f6ff0a5dbc0e39c37ae7 
  
service/src/test/org/apache/hive/service/cli/session/TestSessionManagerMetrics.java
 5511c54ff431211f7f72deaa017c915b839dfb2a 

Diff: https://reviews.apache.org/r/52029/diff/


Testing
---

Ran the unit tests in the common and the ql subprojects.
Manually verified the metrics using the HS2 webui metric dump.


Thanks,

Barna Zsombor Klara



Re: Load performance with partitioned table

2016-09-19 Thread naveen mahadevuni
Hi Jörn,

1) We are using 4 identical AWS machines: 8 vCPUs, 32 GB RAM, 1 TB storage.
2) We set up bloom filters on only two other string columns, not all of them.
3) The data is event data, e.g. Syslog.
4) Queries usually run on a timestamp range with additional predicates on
other columns (mostly equality).
5) We use SNAPPY compression with 256 MB blocks.
6) ORC stripe size is 256 MB; HDFS block size is 128 MB.
7) The time for the first INSERT is 206 seconds and for the second 302
seconds.

Thanks,
Naveen

On Fri, Sep 16, 2016 at 4:57 AM, Jörn Franke  wrote:

> What is your hardware setup?
> Are the bloom filters necessary on all columns? Usually they make only
> sense for non-numeric columns. Updating bloom filters take time and should
> be avoided where they do not make sense.
> Can you provide an example of the data and the select queries that you
> execute on them?
> Do you use compression on the tables? If so which?
> What are the exact times and data volumes?
>
> > On 15 Sep 2016, at 19:56, naveen mahadevuni 
> wrote:
> >
> > Hi,
> >
> > I'm using ORC format for our table storage. The table has a timestamp
> > column (say TS) and 25 other columns. The other ORC properties we are
> > using are storage index and bloom filters. We are loading 100 million
> > records into this table on a 4-node cluster.
> >
> > Our source table is a text table with CSV format. In the source table
> > timestamp values come as BIGINT. In the INSERT SELECT, we use function
> > "from_unixtime(sourceTable.TS)" to convert the BIGINT values to
> timestamp
> > in the target ORC table. So the first INSERT SELECT into the
> > non-partitioned table looks like this:
> >
> > 1) INSERT INTO TARGET SELECT from_unixtime(ts), col1, col2... from
> SOURCE.
> >
> > I wanted to test by partitioning the table by date derived from this
> > timestamp, so I used "to_date(from_unixtime(TS))" in the new INSERT
> SELECT
> > with dynamic partitioning. The second one is
> >
> > 2) INSERT INTO TARGET PARTITION(datecol) SELECT from_unixtime(ts), col1,
> > col2... to_date(from_unixtime(ts)) as datecol from SOURCE.
> >
> > The load time increased by 50% from 1 to 2. I understand the second
> > statement involves creating many more partition directories and files.
> >
> > Is there anyway we can improve the load time? In the second INSERT
> SELECT,
> > will the result of the expression "from_unixtime(ts)" be reused in
> > "to_date(from_unixtime(ts))"?
> >
> > Thanks,
> > Naveen
>