[jira] [Resolved] (HIVE-14751) Add support for date truncation

2016-09-26 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-14751.

Resolution: Fixed

Thanks for reporting this [~pxiong]. I am closing this one again, and I created 
HIVE-14843 to follow up on this issue.

> Add support for date truncation
> ---
>
> Key: HIVE-14751
> URL: https://issues.apache.org/jira/browse/HIVE-14751
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14751.patch
>
>
> Add support for {{floor(<datetime> TO <timeunit>)}}, which is equivalent to 
> {{date_trunc(<timeunit>, <datetime>)}}.
> https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
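For concreteness, a minimal JDBC sketch of the new syntax described above; the connection URL, the FROM-less query, and the expected output are illustrative assumptions, not part of the patch:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FloorToUnitExample {
  public static void main(String[] args) throws Exception {
    // Assumes a local HiveServer2 with the FLOOR(<datetime> TO <timeunit>)
    // syntax from this patch; the URL is an assumption for illustration.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT FLOOR(CAST('2016-09-26 11:22:33' AS TIMESTAMP) TO MONTH)")) {
      while (rs.next()) {
        // Truncates to the start of the unit, like PostgreSQL's
        // date_trunc('month', ...): expected 2016-09-01 00:00:00
        System.out.println(rs.getString(1));
      }
    }
  }
}
{code}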



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: HIVE-14029.8.patch

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, 
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.8.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn't accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
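To illustrate the first API change in the list above, a hedged sketch of the Iterable-to-Iterator migration; the interface is a simplified stand-in, not Hive's actual SparkShuffler class:

{code}
import java.util.Arrays;
import java.util.Iterator;

// Simplified stand-in to show the Spark 2.0.0 signature change:
// call() now returns Iterator instead of Iterable.
interface Shuffler<K, V> {
  Iterator<V> call(K key, Iterator<V> values);  // was: Iterable<V> call(...)
}

class PassThroughShuffler implements Shuffler<String, Integer> {
  @Override
  public Iterator<Integer> call(String key, Iterator<Integer> values) {
    // A real shuffler would sort or merge; pass-through is enough to show
    // that callers can no longer re-iterate the result (an Iterator is one-shot).
    return values;
  }

  public static void main(String[] args) {
    Iterator<Integer> out = new PassThroughShuffler()
        .call("k", Arrays.asList(1, 2, 3).iterator());
    while (out.hasNext()) {
      System.out.println(out.next());
    }
  }
}
{code}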



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14676) JDBC driver should support executing an initial SQL script

2016-09-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu resolved HIVE-14676.
-
Resolution: Duplicate

> JDBC driver should support executing an initial SQL script
> --
>
> Key: HIVE-14676
> URL: https://issues.apache.org/jira/browse/HIVE-14676
> Project: Hive
>  Issue Type: Sub-task
>  Components: Clients, JDBC
>Reporter: Ferdinand Xu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-5867:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks, Jianguo, for the contribution.

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Fix For: 2.2.0
>
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script that is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for Beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that is 
> automatically executed after connecting. The script path can be specified via 
> the JDBC connection URL. For example 
> {noformat}
> jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added as a Beeline command line option like "-i 
> /home/user1/scripts/init.sql"
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc
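As a rough sketch, this is how a JDBC client would use the proposed URL parameter; the parameter name follows the proposal above, and the committed patch may spell it differently:

{code}
import java.sql.Connection;
import java.sql.DriverManager;

public class InitScriptConnect {
  public static void main(String[] args) throws Exception {
    // The init script is plain SQL (e.g. ADD JAR / SET statements) that the
    // driver would run right after the connection is established. The
    // parameter name here is the one proposed in this issue, an assumption.
    String url = "jdbc:hive2://localhost:10000/default"
        + ";initScript=/home/user1/scripts/init.sql";
    try (Connection conn = DriverManager.getConnection(url)) {
      // By this point the statements in init.sql would already have run.
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
{code}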



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14824) Separate fstype from cluster type in QTestUtil

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524840#comment-15524840
 ] 

Hive QA commented on HIVE-14824:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830431/HIVE-14824.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10630 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testMetadataQueriesWithSerializeThriftInTasks
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1308/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1308/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1308/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12830431 - PreCommit-HIVE-Build

> Separate fstype from cluster type in QTestUtil
> --
>
> Key: HIVE-14824
> URL: https://issues.apache.org/jira/browse/HIVE-14824
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14824.01.patch, HIVE-14824.02.patch
>
>
> The QTestUtil cluster type encodes the file system, e.g. 
> MiniClusterType.encrypted means mr + encrypted hdfs, spark means file://, mr 
> means hdfs, etc.
> These can be separated out. E.g. to add tests for tez against encrypted, and 
> llap against encrypted, I'd need to introduce 2 new cluster types.
> Instead it's better to separate the storage into its own types.
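A hedged sketch of the separation being proposed; the enum and field names are illustrative, not QTestUtil's actual API:

{code}
// Illustrative only: once engine and filesystem are independent axes, a new
// combination (e.g. TEZ + ENCRYPTED_HDFS) needs no new cluster type.
enum MiniClusterType { MR, TEZ, SPARK, LLAP, NONE }
enum FsType { LOCAL, HDFS, ENCRYPTED_HDFS }

class MiniClusterConfig {
  final MiniClusterType clusterType;
  final FsType fsType;

  MiniClusterConfig(MiniClusterType clusterType, FsType fsType) {
    this.clusterType = clusterType;
    this.fsType = fsType;
  }

  public static void main(String[] args) {
    // Previously this pairing would have required a dedicated cluster type.
    MiniClusterConfig c = new MiniClusterConfig(
        MiniClusterType.TEZ, FsType.ENCRYPTED_HDFS);
    System.out.println(c.clusterType + " on " + c.fsType);
  }
}
{code}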



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14824) Separate fstype from cluster type in QTestUtil

2016-09-26 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14824:
--
Attachment: HIVE-14824.02.patch

Updated patch.

> Separate fstype from cluster type in QTestUtil
> --
>
> Key: HIVE-14824
> URL: https://issues.apache.org/jira/browse/HIVE-14824
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14824.01.patch, HIVE-14824.02.patch
>
>
> The QTestUtil cluster type encodes the file system, e.g. 
> MiniClusterType.encrypted means mr + encrypted hdfs, spark means file://, mr 
> means hdfs, etc.
> These can be separated out. E.g. to add tests for tez against encrypted, and 
> llap against encrypted, I'd need to introduce 2 new cluster types.
> Instead it's better to separate the storage into its own types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-14751) Add support for date truncation

2016-09-26 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reopened HIVE-14751:


Hi [~jcamachorodriguez], as reported by [~ekoifman] and confirmed by my own 
testing, your patch introduced ambiguity into the grammar.
{code}
warning(200): IdentifiersParser.g:327:5:
Decision can match input such as "KW_DAY KW_TO KW_SECOND" using multiple 
alternatives: 2, 5

As a result, alternative(s) 5 were disabled for that input
warning(200): IdentifiersParser.g:327:5:
Decision can match input such as "KW_YEAR KW_TO KW_MONTH" using multiple 
alternatives: 1, 3

As a result, alternative(s) 3 were disabled for that input
{code} 

Could you please take another look? Thanks.

> Add support for date truncation
> ---
>
> Key: HIVE-14751
> URL: https://issues.apache.org/jira/browse/HIVE-14751
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14751.patch
>
>
> Add support for {{floor(<datetime> TO <timeunit>)}}, which is equivalent to 
> {{date_trunc(<timeunit>, <datetime>)}}.
> https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications

2016-09-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524596#comment-15524596
 ] 

Alan Gates commented on HIVE-13966:
---

All of the calls to the notifiers look like:
{code}
for (MetaStoreEventListener transactionalListener : transactionalListeners) {
  if (transactionalListener instanceof TransactionalMetaStoreEventListener) {
    transactionalListener.onCreateDatabase(new CreateDatabaseEvent(db, true, this));
  }
}
{code}
So you're creating a new event for every listener.  But the contents of these 
events are 90% final, so it's pretty much the same event for every listener.  
And as far as I can tell no one ever alters the contents of those events.  In 
the AlterTableHandler case it's worse because after creating the event you're 
setting the environment, which is again the same for every event.  Given that 
this is happening in the inner loop of the metastore operations and we want it 
to be as fast as possible it seems something like the following would be better:
{code}
CreateDatabaseEvent cde = null;
for (MetaStoreEventListener transactionalListener : transactionalListeners) {
  if (cde == null) cde = new CreateDatabaseEvent(db, true, this);
  transactionalListener.onCreateDatabase(cde);
}
{code}

Also, why are you checking the instance type every time?  As per my previous 
comment I'm not sure we need a separate interface type, but if we do, just 
assume that's what they are in the for loop rather than declaring the more 
general type and then doing an instanceof check for the specific one.

Why did you add the renameTable operation to RawStore?  That doesn't seem 
related to this.

> DbNotificationListener: can loose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Rahul Sharma
>Priority: Critical
> Attachments: HIVE-13966.1.patch, HIVE-13966.2.patch, HIVE-13966.pdf
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation--
> 3. commit() or rollback() based on result of the operation.
> 4. add entry to notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> This is still OK, as it is only a false positive.
> If the operation succeeds but adding to the notification log fails, the 
> user will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we will not have false 
> negatives.
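For reference, a compilable hedged sketch of the fix direction the description implies: write the notification entry inside the same transaction as the operation, so both commit or roll back together. The Store interface is a placeholder, not Hive's RawStore API:

{code}
// Placeholder store interface; not Hive's actual RawStore API.
interface Store {
  void openTransaction();
  boolean commitTransaction();
  void rollbackTransaction();
  void createDatabase(String name);
  void addNotificationLogEntry(String eventType, String dbName);
}

class TransactionalNotifier {
  static void createDatabase(Store ms, String name) {
    boolean success = false;
    try {
      ms.openTransaction();
      ms.createDatabase(name);                              // the operation
      ms.addNotificationLogEntry("CREATE_DATABASE", name);  // same transaction
      success = ms.commitTransaction();
    } finally {
      if (!success) {
        ms.rollbackTransaction();  // neither the DDL nor the log entry persists
      }
    }
  }
}
{code}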



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications

2016-09-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524561#comment-15524561
 ] 

Alan Gates commented on HIVE-13966:
---

I'd like to work on getting this patch committed.  I haven't finished reviewing 
it yet but I have one question from what I've seen so far.  Why create a new 
interface TransactionalMetaStoreEventListener?  Based on the changes in the 
config file the system will know which event listener to put in this group, so I'm 
not sure what the separate interface buys us.

> DbNotificationListener: can loose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Rahul Sharma
>Priority: Critical
> Attachments: HIVE-13966.1.patch, HIVE-13966.2.patch, HIVE-13966.pdf
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation--
> 3. commit() or rollback() based on result of the operation.
> 4. add entry to notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> This is still OK, as it is only a false positive.
> If the operation succeeds but adding to the notification log fails, the 
> user will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we will not have false 
> negatives.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14830) Move a majority of the MiniLlapCliDriver tests to use an inline AM

2016-09-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524142#comment-15524142
 ] 

Siddharth Seth commented on HIVE-14830:
---

[~mmccline] - this will end up running the Tez AM, as well as LLAP, within the 
same test process. I think that's what is required for maven.surefire.debug and 
IDE connectivity to work.
The existing TestMiniLlapCliDriver tests already run LLAP within the test 
process - it's only the AM which is spawned as a separate process. Partial 
debugging should already be possible.

> Move a majority of the MiniLlapCliDriver tests to use an inline AM
> --
>
> Key: HIVE-14830
> URL: https://issues.apache.org/jira/browse/HIVE-14830
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524138#comment-15524138
 ] 

Aihua Xu commented on HIVE-12222:
-

[~xuefuz] Can you take a second look at the patch? The tests are not related.

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch, 
> HIVE-12222.3.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, since the port numbers are unpredictable. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use it, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, or hit 
> various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run HS2 on a gateway or service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge node 
> needs to be fortified and enhanced. That is where the firewall rules and 
> auditing come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes this easier since we 
> can focus on a range to monitor and audit.
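A minimal sketch of the range-based binding this issue asks for; the range values and the linear retry strategy are assumptions for illustration, not the committed patch:

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBind {
  // Try each port in [low, high] and bind to the first free one, instead of
  // binding port 0 and letting the kernel pick an unpredictable port.
  static ServerSocket bindInRange(int low, int high) throws IOException {
    for (int port = low; port <= high; port++) {
      try {
        return new ServerSocket(port);
      } catch (IOException e) {
        // Port in use; try the next one in the configured range.
      }
    }
    throw new IOException("No free port in range " + low + "-" + high);
  }

  public static void main(String[] args) throws IOException {
    // A firewall can then be opened for just this range.
    try (ServerSocket s = bindInRange(30000, 30010)) {
      System.out.println("bound to port " + s.getLocalPort());
    }
  }
}
{code}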



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-14842) LLAP: Heatmap for CPU and cache utilization

2016-09-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14842 started by Gopal V.
--
> LLAP: Heatmap for CPU and cache utilization
> ---
>
> Key: HIVE-14842
> URL: https://issues.apache.org/jira/browse/HIVE-14842
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: UI
>
> This allows detecting, by visual inspection, nodes that are being ignored 
> during scheduling or nodes that failed and are restarting periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14842) LLAP: Heatmap for CPU and cache utilization

2016-09-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14842:
---
Affects Version/s: 2.2.0

> LLAP: Heatmap for CPU and cache utilization
> ---
>
> Key: HIVE-14842
> URL: https://issues.apache.org/jira/browse/HIVE-14842
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: UI
>
> This allows detecting, by visual inspection, nodes that are being ignored 
> during scheduling or nodes that failed and are restarting periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14842) LLAP: Heatmap for CPU and cache utilization

2016-09-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14842:
---
Labels: UI  (was: )

> LLAP: Heatmap for CPU and cache utilization
> ---
>
> Key: HIVE-14842
> URL: https://issues.apache.org/jira/browse/HIVE-14842
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: UI
>
> This allows detecting, by visual inspection, nodes that are being ignored 
> during scheduling or nodes that failed and are restarting periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14842) LLAP: Heatmap for CPU and cache utilization

2016-09-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-14842:
--

Assignee: Gopal V

> LLAP: Heatmap for CPU and cache utilization
> ---
>
> Key: HIVE-14842
> URL: https://issues.apache.org/jira/browse/HIVE-14842
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Gopal V
>Assignee: Gopal V
>
> This allows detecting, by visual inspection, nodes that are being ignored 
> during scheduling or nodes that failed and are restarting periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524106#comment-15524106
 ] 

Hive QA commented on HIVE-12222:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830374/HIVE-12222.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10631 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.hcatalog.mapreduce.TestHCatMultiOutputFormat.org.apache.hive.hcatalog.mapreduce.TestHCatMultiOutputFormat
org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1307/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1307/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1307/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12830374 - PreCommit-HIVE-Build

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch, 
> HIVE-12222.3.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, since the port numbers are unpredictable. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use it, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, or hit 
> various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run HS2 on a gateway or service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to run 

[jira] [Commented] (HIVE-14840) MSCK not adding the missing partitions to Hive Metastore when the partition names are not in lowercase

2016-09-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523902#comment-15523902
 ] 

Sergey Shelukhin commented on HIVE-14840:
-

This is by design if the FS is case sensitive iirc. Metastore won't be able to 
find this partition later.

> MSCK not adding the missing partitions to Hive Metastore when the partition 
> names are not in lowercase
> --
>
> Key: HIVE-14840
> URL: https://issues.apache.org/jira/browse/HIVE-14840
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Sushil Kumar S
>Assignee: Sushil Kumar S
>Priority: Minor
>  Labels: hive
>
> Hi,
>   There is a bug while running MSCK REPAIR TABLE EXTERNAL_TABLE_NAME on 
> Hive 1.2.1: all the partitions that are not present in the metastore are 
> listed but not added if the partition names are not in lowercase. In other 
> words, if an external path has a camel-cased name and value, i.e. 
> s3n://some_external_path/myPartition=01, it just gets listed as a partition not 
> found in the metastore but doesn't get added.
> However, I am not able to run ALTER TABLE MY_EXTERNAL_TABLE RECOVER PARTITIONS; 
> on hive 1.2, and based on the source code from hive-exec I am able to see under 
> org/apache/hadoop/hive/ql/parse/HiveParser.g:1001:1 that there's no token 
> matching in the grammar for RECOVER PARTITIONS.
> Example:
> - When external path = s3n://some_external_path/myPartition=01
> hive> MSCK REPAIR TABLE my_external_table;
> Partitions not in metastore: my_external_table:mypartition=01
> Time taken: 1.729 seconds, Fetched: 2 row(s)
> hive> show partitions foster.ola_raven_raven_users_raw;
> OK
> Time taken: 0.901 seconds, Fetched: 0 row(s)
> - When external path = s3n://some_external_path/mypartition=01
> hive> MSCK REPAIR TABLE my_external_table;
> Partitions not in metastore: my_external_table:mypartition=01
> Repair: Added partition to metastore my_external_table:mypartition=01
> Time taken: 1.729 seconds, Fetched: 2 row(s)
> hive> show partitions my_external_table;
> OK
> mypartition=01
> Time taken: 1.101 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Status: Patch Available  (was: In Progress)

Patch-3: the test case passed locally. Not sure why it failed here. Modified 
the test case slightly to see what's going on. 

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch, 
> HIVE-12222.3.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, since the port numbers are unpredictable. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use it, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, or hit 
> various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run HS2 on a gateway or service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge node 
> needs to be fortified and enhanced. That is where the firewall rules and 
> auditing come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes this easier since we 
> can focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Attachment: HIVE-12222.3.patch

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch, 
> HIVE-12222.3.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, since the port numbers are unpredictable. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use it, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, or hit 
> various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run HS2 on a gateway or service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge node 
> needs to be fortified and enhanced. That is where the firewall rules and 
> auditing come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes this easier since we 
> can focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Status: In Progress  (was: Patch Available)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, since the port numbers are unpredictable. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use it, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, or hit 
> various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run HS2 on a gateway or service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge node 
> needs to be fortified and enhanced. That is where the firewall rules and 
> auditing come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes this easier since we 
> can focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3173) implement getTypeInfo database metadata method

2016-09-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-3173:
-
   Resolution: Fixed
 Assignee: Xiu (Joe) Guo
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Patch (finally) committed.  Thanks, Xiu.

> implement getTypeInfo database metadata method 
> ---
>
> Key: HIVE-3173
> URL: https://issues.apache.org/jira/browse/HIVE-3173
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.1
>Reporter: N Campbell
>Assignee: Xiu (Joe) Guo
> Fix For: 2.2.0
>
> Attachments: Hive-3173.patch.txt
>
>
> The JDBC driver does not implement the database metadata method getTypeInfo. 
> Hence, an application cannot dynamically determine the available type 
> information and associated properties. 
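For illustration, a minimal JDBC sketch of the metadata call this patch implements; the connection URL is an assumption:

{code}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class GetTypeInfoExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default")) {
      DatabaseMetaData md = conn.getMetaData();
      // getTypeInfo() returns one row per supported type, with properties
      // such as precision, letting applications discover type support
      // dynamically instead of hard-coding it.
      try (ResultSet rs = md.getTypeInfo()) {
        while (rs.next()) {
          System.out.println(rs.getString("TYPE_NAME")
              + " precision=" + rs.getInt("PRECISION"));
        }
      }
    }
  }
}
{code}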



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-09-26 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523698#comment-15523698
 ] 

Sergio Peña commented on HIVE-14373:


[~poeppt] I left some comments on the RB.

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Thomas Poepping
> Attachments: HIVE-14373.02.patch, HIVE-14373.03.patch, 
> HIVE-14373.04.patch, HIVE-14373.05.patch, HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests cannot be executed by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify it works
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-26 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523571#comment-15523571
 ] 

Vihang Karajgaonkar commented on HIVE-5867:
---

Hi [~Ferd] I don't have any further comments/questions. Thanks for checking.

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script that is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for Beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that is 
> automatically executed after connecting. The script path can be specified via 
> the JDBC connection URL. For example 
> {noformat}
> jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be added as a Beeline command line option like "-i 
> /home/user1/scripts/init.sql"
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7973) Hive Replication Support

2016-09-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523520#comment-15523520
 ] 

Sushanth Sowmyan commented on HIVE-7973:


Follow-up phase uber JIRA created at HIVE-14841.

> Hive Replication Support
> 
>
> Key: HIVE-7973
> URL: https://issues.apache.org/jira/browse/HIVE-7973
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> A need for replication is a common one in many database management systems, 
> and it's important for hive to evolve support for such a tool as part of its 
> ecosystem. Hive already supports an EXPORT and IMPORT command, which can be 
> used to dump out tables, distcp them to another cluster, and 
> import/create from that. If we had a mechanism by which exports and imports 
> could be automated, it would establish the base with which replication can be 
> developed.
> One place where this kind of automation can be developed is with the aid of the 
> HiveMetaStoreEventHandler mechanisms, to generate notifications when certain 
> changes are committed to the metastore, and then translate those 
> notifications to export actions, distcp actions and import actions on another 
> cluster.
> Part of that already exists with the Notification system that is part of 
> hcatalog-server-extensions. Initially, this was developed to be able to 
> trigger a JMS notification, which an Oozie workflow can use to start off 
> actions keyed on the finishing of a job that used HCatalog to write to a 
> table. While this currently lives under hcatalog, the primary reason for its 
> existence has a scope well past hcatalog alone, and can be used as-is without 
> the use of HCatalog IF/OF. This can be extended, with the help of a library 
> which does that aforementioned translation. I also think that these sections 
> should live in a core hive module, rather than being tucked away inside 
> hcatalog.
> Once we have rudimentary support for table & partition replication, we can 
> then move on to further requirements of replication, such as metadata 
> replications (such as replication of changes to roles/etc), and/or optimize 
> away the requirement to distcp and use webhdfs instead, etc.
> This Story tracks all the bits that go into development of such a system - 
> I'll create multiple smaller tasks inside this as we go on.
> Please also see HIVE-10264 for documentation-related links for this, and 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment 
> for associated wiki (currently in progress)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14841) Replication - Phase 2

2016-09-26 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-14841:

Description: 
Per email sent out to the dev list, the current implementation of replication 
in hive has certain drawbacks, for instance :

* Replication follows a rubberbanding pattern, wherein different tables/ptns 
can be in a different/mixed state on the destination, so that unless all events 
are caught up on, we do not have an equivalent warehouse. Thus, this only 
satisfies DR cases, not load balancing usecases, and the secondary warehouse is 
really only seen as a backup, rather than as a live warehouse that trails the 
primary.
* The base implementation is a naive implementation, and has several 
performance problems, including a large amount of duplication of data for 
subsequent events, as mentioned in HIVE-13348, having to copy out entire 
partitions/tables when just a delta of files might be sufficient/etc. Also, 
using EXPORT/IMPORT allows us a simple implementation, but at the cost of tons 
of temporary space, much of which is not actually applied at the destination.

Thus, to track this, we now create a new branch (repl2) and an uber-jira (this 
one) to track experimental development towards improving this situation.

  was:
Per email sent out to the dev list, the current implementation of replication 
in hive has certain drawbacks, for instance :

* Replication follows a rubberbanding pattern, wherein different
tables/ptns can be in a different/mixed state on the destination, so
that unless all events are caught up on, we do not have an equivalent
warehouse. Thus, this only satisfies DR cases, not load balancing
usecases, and the secondary warehouse is really only seen as a backup,
rather than as a live warehouse that trails the primary.
* The base implementation is a naive implementation, and has several
performance problems, including a large amount of duplication of data
for subsequent events, as mentioned in HIVE-13348, having to copy out
entire partitions/tables when just a delta of files might be
sufficient/etc. Also, using EXPORT/IMPORT allows us a simple
implementation, but at the cost of tons of temporary space, much of
which is not actually applied at the destination.

Thus, to track this, we now create a new branch (repl2) and a uber-jira(this 
one) to track experimental development towards improvement of this situation.


> Replication - Phase 2
> -
>
> Key: HIVE-14841
> URL: https://issues.apache.org/jira/browse/HIVE-14841
> Project: Hive
>  Issue Type: New Feature
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sushanth Sowmyan
>
> Per email sent out to the dev list, the current implementation of replication 
> in hive has certain drawbacks, for instance :
> * Replication follows a rubberbanding pattern, wherein different tables/ptns 
> can be in a different/mixed state on the destination, so that unless all 
> events are caught up on, we do not have an equivalent warehouse. Thus, this 
> only satisfies DR cases, not load balancing usecases, and the secondary 
> warehouse is really only seen as a backup, rather than as a live warehouse 
> that trails the primary.
> * The base implementation is a naive implementation, and has several 
> performance problems, including a large amount of duplication of data for 
> subsequent events, as mentioned in HIVE-13348, having to copy out entire 
> partitions/tables when just a delta of files might be sufficient/etc. Also, 
> using EXPORT/IMPORT allows us a simple implementation, but at the cost of 
> tons of temporary space, much of which is not actually applied at the 
> destination.
> Thus, to track this, we now create a new branch (repl2) and an uber-jira (this 
> one) to track experimental development towards improving this situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14146) Column comments with "\n" character "corrupts" table metadata

2016-09-26 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523506#comment-15523506
 ] 

Aihua Xu commented on HIVE-14146:
-

[~pvary] Do you have the latest patch for reviewing? Can you attach a new one 
to run the precommit test first?

> Column comments with "\n" character "corrupts" table metadata
> -
>
> Key: HIVE-14146
> URL: https://issues.apache.org/jira/browse/HIVE-14146
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14146.10.patch, HIVE-14146.2.patch, 
> HIVE-14146.3.patch, HIVE-14146.4.patch, HIVE-14146.5.patch, 
> HIVE-14146.6.patch, HIVE-14146.7.patch, HIVE-14146.8.patch, 
> HIVE-14146.9.patch, HIVE-14146.patch, changes
>
>
> Create a table with the following(noting the \n in the COMMENT):
> {noformat}
> CREATE TABLE commtest(first_nm string COMMENT 'Indicates First name\nof an 
> individual');
> {noformat}
> Describe shows that now the metadata is messed up:
> {noformat}
> beeline> describe commtest;
> +-------------------+------------+-----------------------+--+
> |     col_name      | data_type  |        comment        |
> +-------------------+------------+-----------------------+--+
> | first_nm          | string     | Indicates First name  |
> | of an individual  | NULL       | NULL                  |
> +-------------------+------------+-----------------------+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14835) Improve ptest2 build time

2016-09-26 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523342#comment-15523342
 ] 

Sergio Peña commented on HIVE-14835:


LGTM
+1

> Improve ptest2 build time
> -
>
> Key: HIVE-14835
> URL: https://issues.apache.org/jira/browse/HIVE-14835
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14835.1.patch
>
>
> NO PRECOMMIT TESTS
> 2 things can be improved
> 1) ptest2 always downloads jars for compiling its own directory which takes 
> about 1m30s which should take only 5s with cache jars. The reason for that is 
> maven.repo.local is pointing to a path under WORKSPACE which will be cleaned 
> by jenkins for every run.
> 2) For hive build we can make use of parallel build and quite the output of 
> build which should shave off another 15-30s. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Attachment: HIVE-12222.2.patch

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark, since the port numbers are unpredictable. In other 
> words, users need to open the whole Hive port range 
> from the Data Node => HiveCLI (edge node).
> {code}
> this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, 
> and in order to use it, they need to log in to the edge node (via SSH). Now, 
> here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time: most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge python workflows, 
> etc.), which may cause the HS2 process to run into OOME, choke and die, or hit 
> various other resource issues, including login problems.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run HS2 on a gateway or service node, 
> separated from the HiveCLI.
> The logs are located in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to run 
> HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge node 
> needs to be fortified and enhanced. That is where the firewall rules and 
> auditing come in.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down the ports makes this easier since we 
> can focus on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Attachment: (was: HIVE-12222.2.patch)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14358) Add metrics for number of queries executed for each execution engine (mr, spark, tez)

2016-09-26 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523311#comment-15523311
 ] 

Yongzhi Chen commented on HIVE-14358:
-

LGTM +1

> Add metrics for number of queries executed for each execution engine (mr, 
> spark, tez)
> -
>
> Key: HIVE-14358
> URL: https://issues.apache.org/jira/browse/HIVE-14358
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Affects Versions: 2.1.0
>Reporter: Lenni Kuff
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14358.patch
>
>
> HiveServer2 currently has a metric for the total number of queries run since 
> the last restart, but it would be useful to also have metrics for the number of 
> queries run on each execution engine. This would improve supportability by 
> allowing users to get a high-level understanding of what workloads have been 
> running on the server. 
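For illustration, here is a minimal sketch of the bookkeeping such per-engine metrics need. It uses plain JDK types rather than Hive's actual metrics subsystem, and the class and method names are invented for the example.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class EngineQueryCounters {
  // one counter per execution engine name ("mr", "spark", "tez")
  private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  public void recordQuery(String engine) {
    counters.computeIfAbsent(engine, k -> new LongAdder()).increment();
  }

  public long queriesFor(String engine) {
    LongAdder a = counters.get(engine);
    return a == null ? 0L : a.sum();
  }

  public static void main(String[] args) {
    EngineQueryCounters m = new EngineQueryCounters();
    m.recordQuery("spark");
    m.recordQuery("spark");
    m.recordQuery("mr");
    System.out.println("spark=" + m.queriesFor("spark") + " mr=" + m.queriesFor("mr"));
  }
}
{code}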



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523301#comment-15523301
 ] 

Hive QA commented on HIVE-12222:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830337/HIVE-12222.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1305/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1305/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1305/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-09-26 14:56:47.851
+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1305/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-09-26 14:56:47.853
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
HEAD is now at 4ce5fe1 HIVE-14831: Missing Druid dependencies at runtime (Jesus 
Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git clean -f -d
warning: unable to access '/home/sseth/.config/git/ignore': Permission denied
Removing java/
Removing src/
+ git checkout master
warning: unable to access '/home/sseth/.config/git/ignore': Permission denied
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 4ce5fe1 HIVE-14831: Missing Druid dependencies at runtime (Jesus 
Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-09-26 14:56:48.786
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: No such 
file or directory
error: 
a/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java:
 No such file or directory
error: 
a/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java: 
No such file or directory
error: 
a/spark-client/src/test/java/org/apache/hive/spark/client/rpc/TestRpc.java: No 
such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12830337 - PreCommit-HIVE-Build

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 

[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Status: Patch Available  (was: In Progress)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Attachment: HIVE-12222.2.patch

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Status: In Progress  (was: Patch Available)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Attachment: (was: HIVE-12222.2.patch)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code

2016-09-26 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517298#comment-15517298
 ] 

Chaoyu Tang edited comment on HIVE-9423 at 9/26/16 2:12 PM:


Yeah, it makes sense. This patch improves the usability, though we might have 
missed some error cases (e.g. TTransportException: java.net.SocketException: 
"Connection reset") which also result from exceeding the max worker #.
Do TTransportException types like TTransportException.UNKNOWN or 
TTransportException.END_OF_FILE only happen at connection time? They could 
also happen during execution (after the connection), right? If so, are the 
messages like "Unknown HS2 problem when connecting to Thrift server" a little 
confusing, since they are specific to connection issues?


was (Author: ctang.ma):
Yeah, it makes sense. This patch improves the usability, though we might have 
missed some error cases (e.g. TTransportException: java.net.SocketException: 
"Connection reset") which also result from exceeding the max worker #.
Do TTransportException types like TTransportException.UNKNOWN or 
TTransportException.END_OF_FILE only happen at connection time? They could 
also happen during execution (after the connection), right? If so, is the 
message like "Unknown HS2 problem when connecting to Thrift server" a little 
confusing?

> HiveServer2: Provide the user with different error messages depending on the 
> Thrift client exception code
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behaviour of the background 
> thread pool so that it is well defined when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.
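As a sketch of the kind of mapping discussed in the comments above: the Thrift client exposes a type code on TTransportException, and a client could translate each code into a friendlier hint. The wording and structure below are illustrative assumptions, not the patch's actual messages; it requires the libthrift jar on the classpath.

{code}
import org.apache.thrift.transport.TTransportException;

public class ThriftErrorMessages {
  /**
   * Maps a Thrift transport exception code to a friendlier hint.
   * Illustrative only; real error handling would also consider context.
   */
  public static String describe(TTransportException e) {
    switch (e.getType()) {
      case TTransportException.END_OF_FILE:
        return "Connection closed by HiveServer2; it may have hit its "
             + "hive.server2.thrift.max.worker.threads limit.";
      case TTransportException.TIMED_OUT:
        return "Request to HiveServer2 timed out.";
      case TTransportException.NOT_OPEN:
        return "Connection to HiveServer2 is not open.";
      case TTransportException.ALREADY_OPEN:
        return "Connection to HiveServer2 is already open.";
      default:
        return "Unknown HiveServer2/Thrift transport problem: " + e.getMessage();
    }
  }
}
{code}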



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-9423) HiveServer2: Provide the user with different error messages depending on the Thrift client exception code

2016-09-26 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517298#comment-15517298
 ] 

Chaoyu Tang edited comment on HIVE-9423 at 9/26/16 2:06 PM:


Yeah, it makes sense. This patch improves the usability, though we might have 
missed some error cases (e.g. TTransportException: java.net.SocketException: 
"Connection reset") which also result from exceeding the max worker #.
Do TTransportException types like TTransportException.UNKNOWN or 
TTransportException.END_OF_FILE only happen at connection time? They could 
also happen during execution (after the connection), right? If so, is the 
message like "Unknown HS2 problem when connecting to Thrift server" a little 
confusing?


was (Author: ctang.ma):
Yeah, it makes sense. This patch improves the usability, though we might have 
missed some error cases (e.g. TTransportException: java.net.SocketException: 
"Connection reset") which also result from exceeding the max worker #.
+1

> HiveServer2: Provide the user with different error messages depending on the 
> Thrift client exception code
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.2.patch, HIVE-9423.3.patch, HIVE-9423.4.patch, 
> HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the # of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behaviour of the background 
> thread pool so that it is well defined when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available, and degrade gracefully under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14840) MSCK not adding the missing partitions to Hive Metastore when the partition names are not in lowercase

2016-09-26 Thread Sushil Kumar S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushil Kumar S updated HIVE-14840:
--
Labels: hive  (was: )

> MSCK not adding the missing partitions to Hive Metastore when the partition 
> names are not in lowercase
> --
>
> Key: HIVE-14840
> URL: https://issues.apache.org/jira/browse/HIVE-14840
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Sushil Kumar S
>Assignee: Sushil Kumar S
>Priority: Minor
>  Labels: hive
>
> Hi,
>   There is a bug when running MSCK REPAIR TABLE EXTERNAL_TABLE_NAME on 
> Hive 1.2.1: all the partitions that are not present in the metastore are 
> listed but not added if the partition names are not in lowercase. In other 
> words, if an external path has a camel-case name and value, e.g. 
> s3n://some_external_path/myPartition=01, it just gets listed as a partition not 
> found in the metastore but is not added.
> However, I am not able to run ALTER TABLE MY_EXTERNAL_TABLE RECOVER PARTITIONS; 
> on Hive 1.2, and based on the source code from hive-exec I can see under 
> org/apache/hadoop/hive/ql/parse/HiveParser.g:1001:1 that there is no token 
> in the grammar matching RECOVER PARTITIONS.
> Example:
> - When external path = s3n://some_external_path/myPartition=01
>hive> MSCK REPAIR TABLE my_external_table;
>Partitions not in metastore: my_external_table:mypartition=01
>Time taken: 1.729 seconds, Fetched: 2 row(s)
> hive> show partitions foster.ola_raven_raven_users_raw;
> OK
> Time taken: 0.901 seconds, Fetched: 0 row(s)
> - When external path = s3n://some_external_path/mypartition=01
> hive> MSCK REPAIR TABLE my_external_table;
> Partitions not in metastore: my_external_table:mypartition=01
> Repair: Added partition to metastore my_external_table:mypartition=01
> Time taken: 1.729 seconds, Fetched: 2 row(s)
>  hive> show partitions my_external_table;
>  OK
>  mypartition=01
>  Time taken: 1.101 seconds, Fetched: 1 row(s)
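A minimal sketch of the case handling at the heart of this report: the metastore keeps partition names in lowercase, so a case-sensitive comparison against filesystem directory names fails for names like myPartition=01. The helper below is invented for illustration and is not the actual MSCK code path.

{code}
import java.util.Locale;

public class PartitionNameCheck {
  /**
   * Case-insensitive comparison that would let "myPartition=01" on the
   * filesystem match "mypartition=01" in the metastore. Illustrative only.
   */
  public static boolean sameSpec(String fsDirName, String metastoreName) {
    return fsDirName.toLowerCase(Locale.ROOT).equals(
        metastoreName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(sameSpec("myPartition=01", "mypartition=01")); // true
    System.out.println("myPartition=01".equals("mypartition=01"));    // false: the reported symptom
  }
}
{code}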



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13316) Upgrade to Calcite 1.9

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523086#comment-15523086
 ] 

Hive QA commented on HIVE-13316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12797717/HIVE-13316.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1304/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1304/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1304/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-09-26 13:41:42.539
+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1304/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-09-26 13:41:42.541
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   de5e184..8015644  branch-2.1 -> origin/branch-2.1
+ git reset --hard HEAD
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
HEAD is now at 4ce5fe1 HIVE-14831: Missing Druid dependencies at runtime (Jesus 
Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git clean -f -d
warning: unable to access '/home/sseth/.config/git/ignore': Permission denied
+ git checkout master
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
warning: unable to access '/home/sseth/.config/git/ignore': Permission denied
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
HEAD is now at 4ce5fe1 HIVE-14831: Missing Druid dependencies at runtime (Jesus 
Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-09-26 13:41:44.491
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
warning: unable to access '/home/sseth/.config/git/attributes': Permission 
denied
error: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java: 
already exists in working directory
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelFactories.java:41
error: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelFactories.java: 
patch does not apply
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:35
error: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java: 
patch does not apply
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRexUtil.java:1
error: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRexUtil.java: patch 
does not apply
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java:16
error: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveReduceExpressionsRule.java:
 patch does not apply
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:63
error: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: patch 
does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}


[jira] [Updated] (HIVE-13316) Upgrade to Calcite 1.9

2016-09-26 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13316:
---
Summary: Upgrade to Calcite 1.9  (was: Upgrade to Calcite 1.7)

> Upgrade to Calcite 1.9
> --
>
> Key: HIVE-13316
> URL: https://issues.apache.org/jira/browse/HIVE-13316
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13316.01.patch, HIVE-13316.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Status: Patch Available  (was: In Progress)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

2016-09-26 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12222:

Status: In Progress  (was: Patch Available)

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Spark
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>Assignee: Aihua Xu
> Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. Would 
> need some help to review and update the fields in this JIRA ticket, thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is picked every 
> time the RPC Server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the 
> HiveCLI RPC Server and Spark due to the unpredictable port numbers. In other 
> words, users need to open the whole Hive port range 
> from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer<SocketChannel>() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command line 
> tool, and in order to use that, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> Could be true or not, but this is what I observe and encounter from time to 
> time. Most users will abuse the resources on that edge node (increasing 
> HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, 
> etc.); this may cause the HS2 process to run into an OOME, choke and die, and 
> hit various other resource issues, including login problems, etc.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway node or a service node, 
> separated from the HiveCLI.
> The logs are then in a different location, and monitoring and auditing are 
> easier when HS2 runs with a daemon user account, etc., so we don't want users 
> to run HiveCLI where HS2 is running.
> It's better to isolate the resources this way to avoid memory, file 
> handle, and disk space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; this is where the firewall rules 
> and auditing come in.
> - Regulation/compliance is another reason to monitor all traffic; specifying 
> and locking down the ports makes this easier since we can focus 
> on a range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14426) Extensive logging on info level in WebHCat

2016-09-26 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14426:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   Status: Resolved  (was: Patch Available)

Thanks [~pvary] for the patch.

> Extensive logging on info level in WebHCat
> --
>
> Key: HIVE-14426
> URL: https://issues.apache.org/jira/browse/HIVE-14426
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, 
> HIVE-14426.4.patch, HIVE-14426.5.patch, HIVE-14426.6.patch, 
> HIVE-14426.7.patch, HIVE-14426.8.patch, HIVE-14426.9-branch-2.1.patch, 
> HIVE-14426.9.patch, HIVE-14426.patch
>
>
> There is extensive logging in WebHCat at the info level, and even some 
> sensitive information could be logged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14426) Extensive logging on info level in WebHCat

2016-09-26 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522795#comment-15522795
 ] 

Chaoyu Tang commented on HIVE-14426:


Committed to branch-2.1.

> Extensive logging on info level in WebHCat
> --
>
> Key: HIVE-14426
> URL: https://issues.apache.org/jira/browse/HIVE-14426
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, 
> HIVE-14426.4.patch, HIVE-14426.5.patch, HIVE-14426.6.patch, 
> HIVE-14426.7.patch, HIVE-14426.8.patch, HIVE-14426.9-branch-2.1.patch, 
> HIVE-14426.9.patch, HIVE-14426.patch
>
>
> There is extensive logging in WebHCat at the info level, and even some 
> sensitive information could be logged.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522527#comment-15522527
 ] 

Hive QA commented on HIVE-14029:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830260/HIVE-14029.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10629 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1303/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1303/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1303/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12830260 - PreCommit-HIVE-Build

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, 
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** The InputMetrics constructor doesn't accept readMethod
> ** The methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
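To illustrate the first API change in the list above, here is a schematic sketch of the Iterable-to-Iterator signature change and the smallest possible migration. The interfaces are stand-ins invented for the example, not Hive's real SparkShuffler.

{code}
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ShufflerSignature {
  // Schematic stand-ins for the API change, not Hive's real interfaces.
  interface OldShuffler<T> {
    Iterable<T> call(Iterator<T> input);   // Spark 1.x style
  }

  interface NewShuffler<T> {
    Iterator<T> call(Iterator<T> input);   // Spark 2.0 style
  }

  public static void main(String[] args) {
    OldShuffler<String> old = input -> {
      // pretend this buffered list is the shuffle result
      List<String> buffered = Arrays.asList("a", "b");
      return buffered;
    };
    // The smallest migration: delegate to the old result and return its iterator.
    NewShuffler<String> migrated = input -> old.call(input).iterator();

    Iterator<String> out = migrated.call(Arrays.<String>asList().iterator());
    while (out.hasNext()) {
      System.out.println(out.next());
    }
  }
}
{code}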



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-14049) Password prompt in Beeline is continuously printed

2016-09-26 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14049 started by Miklos Csanady.
-
> Password prompt in Beeline is continuously printed
> --
>
> Key: HIVE-14049
> URL: https://issues.apache.org/jira/browse/HIVE-14049
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Miklos Csanady
>
> I'm experiencing this issue with a Mac, which was not occurring until 
> recently.
> {code}
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> beeline> !connect jdbc:hive2://localhost:10000
> Connecting to jdbc:hive2://localhost:10000
> Enter username for jdbc:hive2://localhost:10000: hive
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> ...
> {code}
> The 'Enter password for jdbc:hive2://localhost:10000:' line continues to 
> print until Enter is hit. From looking at the code in Commands.java (lines 
> 1413-1420), it's not quite clear why this happens on the second call to 
> readLine():
> {code}
> if (username == null) {
>   username = beeLine.getConsoleReader().readLine("Enter username for " + url 
> + ": ");
> }
> props.setProperty("user", username);
> if (password == null) {
>   password = beeLine.getConsoleReader().readLine("Enter password for " + url 
> + ": ",
>   new Character('*'));
> }
> {code}
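For reference, here is a self-contained sketch of the same jline 2.x readLine(prompt, mask) pattern the quoted snippet uses, assuming the jline2 ConsoleReader API that Beeline bundles; the prompts and handling are illustrative only.

{code}
import jline.console.ConsoleReader;

public class PasswordPrompt {
  public static void main(String[] args) throws Exception {
    ConsoleReader reader = new ConsoleReader();
    String user = reader.readLine("Enter username: ");
    // The '*' mask suppresses echo; terminal handling around this masked
    // read is one place the repeated-prompt symptom could originate.
    String password = reader.readLine("Enter password: ", '*');
    System.out.println("user=" + user + ", password length="
        + (password == null ? 0 : password.length()));
  }
}
{code}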



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work stopped] (HIVE-14049) Password prompt in Beeline is continuously printed

2016-09-26 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14049 stopped by Miklos Csanady.
-
> Password prompt in Beeline is continuously printed
> --
>
> Key: HIVE-14049
> URL: https://issues.apache.org/jira/browse/HIVE-14049
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Miklos Csanady
>
> I'm experiencing this issue with a Mac, which was not occurring until 
> recently.
> {code}
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> beeline> !connect jdbc:hive2://localhost:10000
> Connecting to jdbc:hive2://localhost:10000
> Enter username for jdbc:hive2://localhost:10000: hive
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> ...
> {code}
> The 'Enter password for jdbc:hive2://localhost:10000:' line continues to 
> print until Enter is hit. From looking at the code in Commands.java (lines 
> 1413-1420), it's not quite clear why this happens on the second call to 
> readLine():
> {code}
> if (username == null) {
>   username = beeLine.getConsoleReader().readLine("Enter username for " + url 
> + ": ");
> }
> props.setProperty("user", username);
> if (password == null) {
>   password = beeLine.getConsoleReader().readLine("Enter password for " + url 
> + ": ",
>   new Character('*'));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14049) Password prompt in Beeline is continuously printed

2016-09-26 Thread Miklos Csanady (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522443#comment-15522443
 ] 

Miklos Csanady commented on HIVE-14049:
---

If you still have this issue, please feel free to reopen this.

> Password prompt in Beeline is continuously printed
> --
>
> Key: HIVE-14049
> URL: https://issues.apache.org/jira/browse/HIVE-14049
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Miklos Csanady
>
> I'm experiencing this issue with a Mac, which was not occurring until 
> recently.
> {code}
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> beeline> !connect jdbc:hive2://localhost:10000
> Connecting to jdbc:hive2://localhost:10000
> Enter username for jdbc:hive2://localhost:10000: hive
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> ...
> {code}
> The 'Enter password for jdbc:hive2://localhost:10000:' line continues to 
> print until Enter is hit. From looking at the code in Commands.java (lines 
> 1413-1420), it's not quite clear why this happens on the second call to 
> readLine():
> {code}
> if (username == null) {
>   username = beeLine.getConsoleReader().readLine("Enter username for " + url 
> + ": ");
> }
> props.setProperty("user", username);
> if (password == null) {
>   password = beeLine.getConsoleReader().readLine("Enter password for " + url 
> + ": ",
>   new Character('*'));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-14049) Password prompt in Beeline is continuously printed

2016-09-26 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14049 started by Miklos Csanady.
-
> Password prompt in Beeline is continuously printed
> --
>
> Key: HIVE-14049
> URL: https://issues.apache.org/jira/browse/HIVE-14049
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Miklos Csanady
>
> I'm experiencing this issue with a Mac, which was not occurring until 
> recently.
> {code}
> Beeline version 2.2.0-SNAPSHOT by Apache Hive
> beeline> !connect jdbc:hive2://localhost:10000
> Connecting to jdbc:hive2://localhost:10000
> Enter username for jdbc:hive2://localhost:10000: hive
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> Enter password for jdbc:hive2://localhost:10000:
> ...
> {code}
> The 'Enter password for jdbc:hive2://localhost:10000:' line continues to 
> print until Enter is hit. From looking at the code in Commands.java (lines 
> 1413-1420), it's not quite clear why this happens on the second call to 
> readLine():
> {code}
> if (username == null) {
>   username = beeLine.getConsoleReader().readLine("Enter username for " + url 
> + ": ");
> }
> props.setProperty("user", username);
> if (password == null) {
>   password = beeLine.getConsoleReader().readLine("Enter password for " + url 
> + ": ",
>   new Character('*'));
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522397#comment-15522397
 ] 

Hive QA commented on HIVE-5867:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830253/HIVE-5867.3%20.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10625 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1302/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1302/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1302/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12830253 - PreCommit-HIVE-Build

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that 
> is automatically executed after connecting. The script path can be specified 
> via the JDBC connection URL. For example: 
> {noformat}
> jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be exposed as a Beeline command-line option, e.g. "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.
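A rough client-side sketch of the requested behavior (connect, then execute each statement of the init script), under these assumptions: the Hive JDBC driver is on the classpath, InitScriptDemo and runInitScript are hypothetical names, and the semicolon splitting is deliberately naive (it ignores quoted semicolons and comments):
{code}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InitScriptDemo {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "hive", "")) {
      runInitScript(conn, "/home/user1/scripts/init.sql");
    }
  }

  // Read the script and execute it one statement at a time, .hiverc-style.
  static void runInitScript(Connection conn, String path) throws Exception {
    String script = new String(Files.readAllBytes(Paths.get(path)));
    try (Statement stmt = conn.createStatement()) {
      for (String sql : script.split(";")) {
        if (!sql.trim().isEmpty()) {
          stmt.execute(sql.trim());
        }
      }
    }
  }
}
{code}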



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: (was: HIVE-14029.7.patch)

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, 
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns an Iterator instead of an Iterable (see the 
> sketch below)
> ** SparkListener -> JavaSparkListener
> ** The InputMetrics constructor no longer accepts readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrades:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
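To illustrate the first API change, a minimal sketch of adapting an Iterable-returning shuffle function to the Iterator-returning contract (OldShuffle, NewShuffle, and adapt are hypothetical simplified interfaces for illustration, not the actual Hive or Spark classes):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorMigrationDemo {
  // Spark 1.x-style contract: the result is an Iterable.
  interface OldShuffle<T, R> { Iterable<R> call(Iterator<T> input); }
  // Spark 2.0-style contract: the result is an Iterator.
  interface NewShuffle<T, R> { Iterator<R> call(Iterator<T> input); }

  // Adapting an old implementation is usually a single ".iterator()" call.
  static <T, R> NewShuffle<T, R> adapt(OldShuffle<T, R> old) {
    return input -> old.call(input).iterator();
  }

  public static void main(String[] args) {
    OldShuffle<Integer, Integer> doubler = in -> {
      List<Integer> out = new ArrayList<>();
      in.forEachRemaining(x -> out.add(x * 2));
      return out;
    };
    Iterator<Integer> it = adapt(doubler).call(Arrays.asList(1, 2, 3).iterator());
    it.forEachRemaining(System.out::println); // prints 2, 4, 6
  }
}
{code}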



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: HIVE-14029.7.patch

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, 
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns an Iterator instead of an Iterable
> ** SparkListener -> JavaSparkListener
> ** The InputMetrics constructor no longer accepts readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrades:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522284#comment-15522284
 ] 

Hive QA commented on HIVE-14029:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12830248/HIVE-14029.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10629 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1301/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1301/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1301/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12830248 - PreCommit-HIVE-Build

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, 
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns an Iterator instead of an Iterable
> ** SparkListener -> JavaSparkListener
> ** The InputMetrics constructor no longer accepts readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrades:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522214#comment-15522214
 ] 

Ferdinand Xu commented on HIVE-5867:


Thanks [~JonnyR] for the updates. LGTM, +1 pending the test results.

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that 
> is automatically executed after connecting. The script path can be specified 
> via the JDBC connection URL. For example: 
> {noformat}
> jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be exposed as a Beeline command-line option, e.g. "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5867) JDBC driver and beeline should support executing an initial SQL script

2016-09-26 Thread Jianguo Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianguo Tian updated HIVE-5867:
---
Attachment: HIVE-5867.3 .patch

> JDBC driver and beeline should support executing an initial SQL script
> --
>
> Key: HIVE-5867
> URL: https://issues.apache.org/jira/browse/HIVE-5867
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, JDBC
>Reporter: Prasad Mujumdar
>Assignee: Jianguo Tian
> Attachments: HIVE-5867.1.patch, HIVE-5867.2.patch, HIVE-5867.3 .patch
>
>
> HiveCLI supports the .hiverc script, which is executed at the start of the 
> session. This is helpful for things like registering UDFs, session-specific 
> configs, etc.
> This functionality is missing for beeline and JDBC clients. It would be 
> useful for the JDBC driver to support an init script with SQL statements that 
> is automatically executed after connecting. The script path can be specified 
> via the JDBC connection URL. For example: 
> {noformat}
> jdbc:hive2://localhost:10000/default;initScript=/home/user1/scripts/init.sql
> {noformat}
> This can be exposed as a Beeline command-line option, e.g. "-i 
> /home/user1/scripts/init.sql".
> To help the transition from HiveCLI to Beeline, we can keep the default init 
> script as $HOME/.hiverc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-26 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: (was: HIVE-14029.7.patch)

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, 
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch, 
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump 
> Spark up to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns an Iterator instead of an Iterable
> ** SparkListener -> JavaSparkListener
> ** The InputMetrics constructor no longer accepts readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics 
> return long instead of int
> * Dependency upgrades:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)