[jira] [Commented] (HIVE-10048) JDBC - Support SSL encryption regardless of Authentication mechanism

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372536#comment-14372536
 ] 

Hive QA commented on HIVE-10048:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706111/HIVE-10048.1.patch

{color:green}SUCCESS:{color} +1 7818 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3103/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3103/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3103/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706111 - PreCommit-HIVE-TRUNK-Build

> JDBC - Support SSL encryption regardless of Authentication mechanism
> 
>
> Key: HIVE-10048
> URL: https://issues.apache.org/jira/browse/HIVE-10048
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.0.0
>Reporter: Mubashir Kazia
>  Labels: newbie, patch
> Fix For: 1.2.0
>
> Attachments: HIVE-10048.1.patch
>
>
> The JDBC driver currently only supports SSL transport if the authentication 
> mechanism is SASL PLAIN with username and password. SSL transport should be 
> decoupled from the authentication mechanism. If the customer chooses Kerberos 
> authentication with SSL encryption over the wire, it should be supported. The 
> server side already supports this, but the driver does not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]

2015-03-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372530#comment-14372530
 ] 

Lefty Leverenz commented on HIVE-9693:
--

HiveConf (just trivia):
*  two parameter descriptions say "hbase metastore" -- please change to "HBase 
metastore"
*  if the description starts on a new line, it looks better in the generated 
template file (see hive.metastore.hbase.catalog.cache.size)
*  can "we will place" be changed to "Hive will place" in two parameter 
descriptions?
*  similarly, can "types that we need to cache" be changed to "types that need 
to be cached"?
*  optionally change "stats" to "statistics" in several parameter descriptions 
... but "stats" is used in other descriptions too, so maybe it should stay as is

> Introduce a stats cache for HBase metastore  [hbase-metastore branch]
> -
>
> Key: HIVE-9693
> URL: https://issues.apache.org/jira/browse/HIVE-9693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-9693.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372510#comment-14372510
 ] 

Hive QA commented on HIVE-9277:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706109/HIVE-9277.15.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7819 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3102/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3102/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3102/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706109 - PreCommit-HIVE-TRUNK-Build

> Hybrid Hybrid Grace Hash Join
> -
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: join
> Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
> HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
> HIVE-9277.13.patch, HIVE-9277.14.patch, HIVE-9277.15.patch, 
> High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is 
> slightly wrong
> * Expensive garbage-collection overhead can be avoided when the hash table 
> grows
> * A map join operator can be used even though the small table doesn't fit in 
> memory, as spilling some data from the build and probe sides will still be 
> cheaper than having to shuffle the large fact table
> The design is based on Hadoop's parallel processing capability and the 
> significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]

2015-03-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372501#comment-14372501
 ] 

Alan Gates commented on HIVE-9693:
--

HiveConf:
A default of 0.01 for METASTORE_HBASE_AGGREGATE_STATS_CACHE_MAX_VARIANCE seems 
really low.  I was thinking we should start closer to 0.1 or 0.2.

AggregateStatsCache:
For these classes that won't be used outside of the o.a.h.h.metastore.hbase 
package, I've been marking both classes and methods as package-scoped instead 
of public.  This way it's clear to people that this is not a public API.
I'm nervous about the read/write lock on AggregateStatsList.  I see why you're 
doing it, but readers could spend a long time in there comparing and block 
writers.  One thing we could do is have readers hold the lock only long enough 
to copy the list, then release it.  Or we could add a timeout to the write 
lock to bound how long the writer waits; if it can't acquire the lock in that 
time, it just drops the entry.
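For illustration only, a minimal sketch of the bounded-wait idea, assuming a ReentrantReadWriteLock guards the list (the class and method names here are hypothetical, not the patch's API):
{code}
// Hypothetical sketch: bound how long a writer waits behind readers; if the
// write lock cannot be acquired in time, drop the entry instead of blocking.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AggregateStatsListSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<Object> nodes = new ArrayList<>();

  boolean tryAdd(Object node, long timeoutMs) throws InterruptedException {
    if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      return false; // timed out behind readers; the caller just drops the entry
    }
    try {
      nodes.add(node);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}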
Line 46: we know a priori from the config value how big the hash table can 
get, so we should pass that capacity to the hash table's constructor to avoid 
possibly expensive resizings later.
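As a hedged sketch of that pre-sizing (the capacity would come from the HiveConf value; the names here are illustrative):
{code}
// Hypothetical sketch: the maximum number of cache nodes is known from
// configuration up front, so construct the map at that capacity once and
// avoid possibly expensive resizes as the cache fills.
import java.util.concurrent.ConcurrentHashMap;

class CacheConstructionSketch {
  static ConcurrentHashMap<String, Object> newCache(int maxCacheNodes) {
    return new ConcurrentHashMap<>(maxCacheNodes, 0.75f);
  }
}
{code}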
findBestMatch(): this should also check whether an AggregateStats entry is 
past its TTL.  If so, it should reject the entry even if it's a match.  This 
avoids very old data getting used in the case where the cache is large 
relative to the system usage and the cleaner hasn't been kicked off recently.
Line 149: bogus javadoc; there's no param candidateList.
Why have both a cleaner thread and evictOneNode?  It's simpler to spawn the 
cleaner and add the node, even if that puts the cache a few entries over.  The 
cleaner should first remove all timed-out nodes, and if the cache is still 
above the cleaning threshold it should start evicting nodes on an LRU basis.  
Actually, once it starts cleaning it should make sure it goes well below the 
threshold so it doesn't have to start cleaning again immediately.  Thus I'd 
add a {{private float cleanUntil = 0.8f}}.
maxFull and cleanUntil should be configurable.  You'll want that for testing 
purposes if nothing else.
In evictOneNode, if I understand the javadoc on ConcurrentHashMap correctly, 
you could get a key from the key set that returns null on get(), as some other 
thread could have already removed the key.  Moving evictOneNode into the 
cleaner thread would get rid of this issue as well, since it would guarantee 
that only one thread ever removes entries from the list.
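To make the race concrete, a small hedged sketch (names illustrative): the key-set iterator is weakly consistent, so an entry seen by the iterator may already be gone by the time get() runs.
{code}
// Hypothetical sketch of the evictOneNode race on a ConcurrentHashMap.
import java.util.concurrent.ConcurrentHashMap;

class EvictionSketch {
  private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

  void evictOne() {
    for (String key : cache.keySet()) {
      Object value = cache.get(key);
      if (value == null) {
        continue; // another thread removed the entry between keySet() and get()
      }
      cache.remove(key); // evict the first live entry found
      return;
    }
  }
}
{code}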
Line 323: the cast is unnecessary.

HBaseReadWrite
Lines 1419 and 1456: bogus javadoc; the param name is tblName, not tableName.

HBaseUtils
You should break the ColumnStatsAggregator interface and implementing classes 
into separate classes.  This is an area where we'll have more work to do 
(especially in combining NDVs) and it will be convenient to have it in separate 
code rather than letting this class get huge (or more huge, anyway).  You could 
even put it in an hbase.aggregator package if you wanted.

BitVector
line 60, bogus javadoc, no param index

BloomFilter
Are there not existing implementations of a bloom filter?  Do we have to 
implement one ourselves?

I haven't reviewed the tests yet, as you said you were still adding a bunch.


> Introduce a stats cache for HBase metastore  [hbase-metastore branch]
> -
>
> Key: HIVE-9693
> URL: https://issues.apache.org/jira/browse/HIVE-9693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-9693.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10038) Add Calcite's ProjectMergeRule.

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372475#comment-14372475
 ] 

Hive QA commented on HIVE-10038:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706104/HIVE-10038.patch

{color:red}ERROR:{color} -1 due to 605 failed/errored test(s), 7818 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_invalidate_column_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_array_map_access_nonconstant
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_evolution_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_udfs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_table_bincolserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_table_colserde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast_qualified_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_cast
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_comparison
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_nested_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_pad_convert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_varchar_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_colstats_all_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_big_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_escape
org.apache.hadoop.hive.

[jira] [Updated] (HIVE-10048) JDBC - Support SSL encryption regardless of Authentication mechanism

2015-03-20 Thread Mubashir Kazia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mubashir Kazia updated HIVE-10048:
--
Attachment: HIVE-10048.1.patch

> JDBC - Support SSL encryption regardless of Authentication mechanism
> 
>
> Key: HIVE-10048
> URL: https://issues.apache.org/jira/browse/HIVE-10048
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.0.0
>Reporter: Mubashir Kazia
>  Labels: newbie, patch
> Fix For: 1.2.0
>
> Attachments: HIVE-10048.1.patch
>
>
> The JDBC driver currently only supports SSL transport if the authentication 
> mechanism is SASL PLAIN with username and password. SSL transport should be 
> decoupled from the authentication mechanism. If the customer chooses Kerberos 
> authentication with SSL encryption over the wire, it should be supported. The 
> server side already supports this, but the driver does not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10045) LLAP : NPE in DiskRangeList helper init

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10045.
-
   Resolution: Fixed
Fix Version/s: llap

Will resolve this for now. The query runs forever for me: Map 1 tasks are 
timing out on the heartbeat, and I don't see exceptions in the container logs. 
Please file a different bug if that is not supposed to happen, or I may look 
later too.
This issue is resolved :)

> LLAP : NPE in DiskRangeList helper init
> ---
>
> Key: HIVE-10045
> URL: https://issues.apache.org/jira/browse/HIVE-10045
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> {noformat}
> hive> select count(*) from store_sales where ss_item_sk not in (select 
> i_item_sk from item);
> Warning: Map Join MAPJOIN[32][bigTable=store_sales] in task 'Map 1' is a 
> cross product
> Query ID = gunther_20150320161251_1fb94086-90aa-4fd2-b4b2-fca245850238
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1424502260528_1355)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 KILLED174  00  174   0 
> 174
> Map 3 ..   SUCCEEDED  1  100   0  
>  0
> Map 4 FAILED  1  001   4  
>  0
> Reducer 2 KILLED  1  001   0  
>  1
> Reducer 5 KILLED  1  001   0  
>  1
> 
> VERTICES: 01/05  [>>--] 0%ELAPSED TIME: 0.62 s
> 
> Status: Failed
> Vertex failed, vertexName=Map 4, vertexId=vertex_1424502260528_1355_5_00, 
> diagnostics=[Task failed, taskId=task_1424502260528_1355_5_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:308)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
>   ... 14 more
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(H

[jira] [Commented] (HIVE-10022) DFS in authorization might take too long

2015-03-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372444#comment-14372444
 ] 

Thejas M Nair commented on HIVE-10022:
--

That's right.


> DFS in authorization might take too long
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>
> I am testing a query like:
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is true,
> we end up calling doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object (or the 
> ancestor of the object) we are trying to authorize if the object does not 
> exist. The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we have a path a/b/c/d that we are trying to authorize.
> If a/b/c/d does not exist, we call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
> also does not exist.
> If the subtree under a/b has millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects.
> I do not completely understand why we have to check file permissions on all 
> the objects in a branch of the tree that we are not trying to read from or 
> write to.
> We could instead check the file permission on the ancestor that exists and, 
> if it matches what we expect, return true (a hedged sketch of that check 
> follows below).
> Please confirm whether this is a bug so that I can submit a patch; otherwise, 
> let me know what I am missing.
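For illustration, a minimal sketch of the suggested single-ancestor check against the Hadoop FileSystem API; the helper below is hypothetical, not the current Hive code, and checks only the owner bits for brevity:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

// Hypothetical sketch: instead of a DFS over the whole subtree, walk up to the
// nearest existing ancestor and check the permission on that one path only.
class AncestorPermissionSketch {
  static boolean isActionPermittedOnNearestAncestor(
      FileSystem fs, Path path, FsAction action) throws IOException {
    Path current = path;
    while (current != null && !fs.exists(current)) {
      current = current.getParent(); // climb until an existing ancestor is found
    }
    if (current == null) {
      return false;
    }
    // Real logic must also consider group/other bits and the calling user.
    return fs.getFileStatus(current).getPermission().getUserAction().implies(action);
  }
}
{code}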



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10047) LLAP: VectorMapJoinOperator gets an over-flow on batchSize of 1024

2015-03-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10047:
---
Assignee: Matt McCline

> LLAP: VectorMapJoinOperator gets an over-flow on batchSize of 1024
> --
>
> Key: HIVE-10047
> URL: https://issues.apache.org/jira/browse/HIVE-10047
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Matt McCline
> Fix For: llap
>
>
> Simple LLAP queries on constrained resources run into an exception which 
> suggests that the output batch size overflows:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory$VectorLongColumnAssign.assignLong(VectorColumnAssignFactory.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory$9.assignObjectValue(VectorColumnAssignFactory.java:293)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.internalForward(VectorMapJoinOperator.java:196)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:653)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:656)
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:752)
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:316)
> ... 22 more
> {code}
> The relevant problem is that the check for a full output batch sits outside 
> the loop here; it looks like it can be triggered during MxN joins, where more 
> values are produced than there were rows in the input batch (a hedged fix 
> sketch follows the quoted code).
> {code}
> for (int i = 0; i < values.length; ++i) {
>   vcas[i].assignObjectValue(values[i], outputBatch.size);
> }
> ++outputBatch.size;
> if (outputBatch.size == VectorizedRowBatch.DEFAULT_SIZE) {
>   flushOutput();
> }
> {code}
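As a hedged fix sketch (not necessarily the eventual patch), testing for a full batch before filling each output row keeps the row index within bounds:
{code}
// Hypothetical sketch: flush before assigning the next output row, so the row
// index passed to assignObjectValue() never reaches DEFAULT_SIZE.
if (outputBatch.size == VectorizedRowBatch.DEFAULT_SIZE) {
  flushOutput(); // resets outputBatch.size to 0
}
for (int i = 0; i < values.length; ++i) {
  vcas[i].assignObjectValue(values[i], outputBatch.size);
}
++outputBatch.size;
{code}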



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372443#comment-14372443
 ] 

Hive QA commented on HIVE-8746:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706100/HIVE-8746.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8239 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3100/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3100/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3100/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706100 - PreCommit-HIVE-TRUNK-Build

> ORC timestamp columns are sensitive to daylight savings time
> 
>
> Key: HIVE-8746
> URL: https://issues.apache.org/jira/browse/HIVE-8746
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
>Reporter: Owen O'Malley
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-8746.1.patch, HIVE-8746.2.patch, HIVE-8746.3.patch
>
>
> Hive uses Java's Timestamp class to manipulate timestamp columns. 
> Unfortunately, the textual parsing in Timestamp is done in local time while 
> the internal storage is in UTC.
> ORC mostly sidesteps this issue by computing the difference between the time 
> and a base time, also in local time, and storing that difference in the 
> file. Reading the file between timezones will mostly work correctly: 
> "2014-01-01 12:34:56" will read correctly in every timezone.
> However, moving between timezones with different daylight saving rules 
> creates trouble. In particular, moving from a computer in PST to UTC will 
> read "2014-06-06 12:34:56" as "2014-06-06 11:34:56".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10046) LLAP : disable ORC trace logging for now

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10046.
-
Resolution: Fixed

> LLAP : disable ORC trace logging for now
> 
>
> Key: HIVE-10046
> URL: https://issues.apache.org/jira/browse/HIVE-10046
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> For perf testing. 
> I just need a JIRA to commit the change



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-9277:

Attachment: HIVE-9277.15.patch

Upload 15th patch for testing.

If this one passes tests, it is ready to be committed. Thanks.

> Hybrid Hybrid Grace Hash Join
> -
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: join
> Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
> HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
> HIVE-9277.13.patch, HIVE-9277.14.patch, HIVE-9277.15.patch, 
> High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is 
> slightly wrong
> * Expensive garbage-collection overhead can be avoided when the hash table 
> grows
> * A map join operator can be used even though the small table doesn't fit in 
> memory, as spilling some data from the build and probe sides will still be 
> cheaper than having to shuffle the large fact table
> The design is based on Hadoop's parallel processing capability and the 
> significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10046) LLAP : disable ORC trace logging for now

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10046:
---

Assignee: Sergey Shelukhin

> LLAP : disable ORC trace logging for now
> 
>
> Key: HIVE-10046
> URL: https://issues.apache.org/jira/browse/HIVE-10046
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> For perf testing. 
> I just need a JIRA to commit the change



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10045) LLAP : NPE in DiskRangeList helper init

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372416#comment-14372416
 ] 

Sergey Shelukhin commented on HIVE-10045:
-

[~hagleitn] [~gopalv] fyi

> LLAP : NPE in DiskRangeList helper init
> ---
>
> Key: HIVE-10045
> URL: https://issues.apache.org/jira/browse/HIVE-10045
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Sergey Shelukhin
>
> {noformat}
> hive> select count(*) from store_sales where ss_item_sk not in (select 
> i_item_sk from item);
> Warning: Map Join MAPJOIN[32][bigTable=store_sales] in task 'Map 1' is a 
> cross product
> Query ID = gunther_20150320161251_1fb94086-90aa-4fd2-b4b2-fca245850238
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1424502260528_1355)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 KILLED174  00  174   0 
> 174
> Map 3 ..   SUCCEEDED  1  100   0  
>  0
> Map 4 FAILED  1  001   4  
>  0
> Reducer 2 KILLED  1  001   0  
>  1
> Reducer 5 KILLED  1  001   0  
>  1
> 
> VERTICES: 01/05  [>>--] 0%ELAPSED TIME: 0.62 s
> 
> Status: Failed
> Vertex failed, vertexName=Map 4, vertexId=vertex_1424502260528_1355_5_00, 
> diagnostics=[Task failed, taskId=task_1424502260528_1355_5_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:308)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
>   ... 14 more
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>   at 
> org.apa

[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-20 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372413#comment-14372413
 ] 

Wei Zheng commented on HIVE-9277:
-

I see what you are saying. I will remove those unnecessary prefixes in the 
HybridHashTableContainer class, but keep several in the MapJoinOperator class. 
Will upload a final (hopefully) patch later.

> Hybrid Hybrid Grace Hash Join
> -
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: join
> Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
> HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
> HIVE-9277.13.patch, HIVE-9277.14.patch, 
> High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is 
> slightly wrong
> * Expensive garbage-collection overhead can be avoided when the hash table 
> grows
> * A map join operator can be used even though the small table doesn't fit in 
> memory, as spilling some data from the build and probe sides will still be 
> cheaper than having to shuffle the large fact table
> The design is based on Hadoop's parallel processing capability and the 
> significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10045) LLAP : NPE in DiskRangeList helper init

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372410#comment-14372410
 ] 

Sergey Shelukhin commented on HIVE-10045:
-

Committed the fix for the NPE. Not sure if there's a deeper underlying problem yet.

> LLAP : NPE in DiskRangeList helper init
> ---
>
> Key: HIVE-10045
> URL: https://issues.apache.org/jira/browse/HIVE-10045
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Sergey Shelukhin
>
> {noformat}
> hive> select count(*) from store_sales where ss_item_sk not in (select 
> i_item_sk from item);
> Warning: Map Join MAPJOIN[32][bigTable=store_sales] in task 'Map 1' is a 
> cross product
> Query ID = gunther_20150320161251_1fb94086-90aa-4fd2-b4b2-fca245850238
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1424502260528_1355)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 KILLED174  00  174   0 
> 174
> Map 3 ..   SUCCEEDED  1  100   0  
>  0
> Map 4 FAILED  1  001   4  
>  0
> Reducer 2 KILLED  1  001   0  
>  1
> Reducer 5 KILLED  1  001   0  
>  1
> 
> VERTICES: 01/05  [>>--] 0%ELAPSED TIME: 0.62 s
> 
> Status: Failed
> Vertex failed, vertexName=Map 4, vertexId=vertex_1424502260528_1355_5_00, 
> diagnostics=[Task failed, taskId=task_1424502260528_1355_5_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>   at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:308)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
>   ... 14 more
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
>   at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveConte

[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372407#comment-14372407
 ] 

Sergey Shelukhin commented on HIVE-9277:


WRT the logger, I meant: why not create a separate logger if you need a logger 
prefix? A logger logs the class name it was created with, and can be 
configured by class to filter out messages, set the level, etc.
Why reinvent this in each log message?
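For example, a hedged sketch of the per-class logger (the class name here is illustrative):
{code}
// Hypothetical sketch: a dedicated per-class logger already carries the class
// name and can be filtered or leveled per class in the logging configuration,
// so no manual prefix is needed in each message.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class HybridHashTableContainerSketch {
  private static final Log LOG = LogFactory.getLog(HybridHashTableContainerSketch.class);

  void spillPartition(int partitionId) {
    LOG.debug("Spilling partition " + partitionId);
  }
}
{code}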

Can be fixed on commit, +1

> Hybrid Hybrid Grace Hash Join
> -
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: join
> Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
> HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
> HIVE-9277.13.patch, HIVE-9277.14.patch, 
> High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is 
> slightly wrong
> * Expensive garbage-collection overhead can be avoided when the hash table 
> grows
> * A map join operator can be used even though the small table doesn't fit in 
> memory, as spilling some data from the build and probe sides will still be 
> cheaper than having to shuffle the large fact table
> The design is based on Hadoop's parallel processing capability and the 
> significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10038) Add Calcite's ProjectMergeRule.

2015-03-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10038:

Attachment: HIVE-10038.patch

> Add Calcite's ProjectMergeRule.
> ---
>
> Key: HIVE-10038
> URL: https://issues.apache.org/jira/browse/HIVE-10038
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO, Logical Optimizer
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-10038.patch
>
>
> Helps to improve latency by shortening the operator pipeline: folds adjacent 
> projections into one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2015-03-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-8746:

Attachment: HIVE-8746.3.patch

Rebased patch to trunk

> ORC timestamp columns are sensitive to daylight savings time
> 
>
> Key: HIVE-8746
> URL: https://issues.apache.org/jira/browse/HIVE-8746
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
>Reporter: Owen O'Malley
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-8746.1.patch, HIVE-8746.2.patch, HIVE-8746.3.patch
>
>
> Hive uses Java's Timestamp class to manipulate timestamp columns. 
> Unfortunately, the textual parsing in Timestamp is done in local time while 
> the internal storage is in UTC.
> ORC mostly sidesteps this issue by computing the difference between the time 
> and a base time, also in local time, and storing that difference in the 
> file. Reading the file between timezones will mostly work correctly: 
> "2014-01-01 12:34:56" will read correctly in every timezone.
> However, moving between timezones with different daylight saving rules 
> creates trouble. In particular, moving from a computer in PST to UTC will 
> read "2014-06-06 12:34:56" as "2014-06-06 11:34:56".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372367#comment-14372367
 ] 

Hive QA commented on HIVE-9693:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706067/HIVE-9693.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3099/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3099/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3099/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3099/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
++ awk '{print $2}'
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1668182.

At revision 1668182.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706067 - PreCommit-HIVE-TRUNK-Build

> Introduce a stats cache for HBase metastore  [hbase-metastore branch]
> -
>
> Key: HIVE-9693
> URL: https://issues.apache.org/jira/browse/HIVE-9693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-9693.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372366#comment-14372366
 ] 

Hive QA commented on HIVE-8746:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706064/HIVE-8746.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3098/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3098/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3098/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3098/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java'
Reverted 
'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatTableInfo.java'
Reverted 
'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java'
Reverted 
'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/PartInfo.java'
Reverted 
'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatSplit.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/thirdparty 
itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target 
itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target service/target contrib/target serde/target beeline/target 
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
A    common/src/java/org/apache/hadoop/hive/common/DiskRangeList.java
A    common/src/java/org/apache/hadoop/hive/common/DiskRange.java
U    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java
U    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
U    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
U    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java
U    ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
A    ql/src/java/org/apache/hadoop/hive/ql/io/orc/MetadataReader.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
A    ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java
U    ql/src/j

[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372362#comment-14372362
 ] 

Hive QA commented on HIVE-9845:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706054/HIVE-9845.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7819 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.mapreduce.TestHCatOutputFormat.testSetOutput
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3097/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3097/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3097/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706054 - PreCommit-HIVE-TRUNK-Build

> HCatSplit repeats information making input split data size huge
> ---
>
> Key: HIVE-9845
> URL: https://issues.apache.org/jira/browse/HIVE-9845
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Rohini Palaniswamy
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-9845.1.patch
>
>
> Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data that 
> has even triple the number of splits (100K+ splits and tasks) does not hit 
> that issue.
> {code}
> HCatBaseInputFormat.java:
>  //Call getSplit on the InputFormat, create an
>   //HCatSplit for each underlying split
>   //NumSplits is 0 for our purposes
>   org.apache.hadoop.mapred.InputSplit[] baseSplits = 
> inputFormat.getSplits(jobConf, 0);
>   for(org.apache.hadoop.mapred.InputSplit split : baseSplits) {
> splits.add(new HCatSplit(
> partitionInfo,
> split,allCols));
>   }
> {code}
> Each HCatSplit duplicates the partition schema and the table schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10044) Allow interval params for year/month/day/hour/minute/second functions

2015-03-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10044:
--
Attachment: HIVE-10044.1.patch

attaching initial patch

> Allow interval params for year/month/day/hour/minute/second functions
> -
>
> Key: HIVE-10044
> URL: https://issues.apache.org/jira/browse/HIVE-10044
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-10044.1.patch
>
>
> Update the year/month/day/hour/minute/second functions to retrieve the 
> various fields of the year-month and day-time interval types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9771) HiveCombineInputFormat does not appropriately call getSplits on InputFormats for native tables

2015-03-20 Thread Luke Lovett (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lovett resolved HIVE-9771.
---
Resolution: Not a Problem

> HiveCombineInputFormat does not appropriately call getSplits on InputFormats 
> for native tables
> --
>
> Key: HIVE-9771
> URL: https://issues.apache.org/jira/browse/HIVE-9771
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0
> Hadoop 2.4.1
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>Reporter: Luke Lovett
>
> {{HiveCombineInputFormat}} never calls {{getSplits}} on a custom 
> {{InputFormat}} when those InputFormats are used by native tables. If I 
> {{set}} 
> {{hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;}}, then 
> {{getSplits}} is called. I'm not the first user to have experienced this 
> either; see [this post from the hive-user mailing 
> list|https://mail-archives.apache.org/mod_mbox/hive-user/201410.mbox/%3CCAENxBwy+XB1OB2ZOjz=4=NxKNMsWA==o0ibrd+gopxgqrj2...@mail.gmail.com%3E].
> The purpose of this ticket is to discover:
> - Is this difference in behavior between CombineHiveInputFormat and 
> HiveInputFormat intentional?
> - Is there any way of forcing CombineHiveInputFormat to call getSplits 
> on my own InputFormat? I was reading through the code for 
> CombineHiveInputFormat, and it looks like it might only call my own 
> InputFormat's getSplits method if the table is non-native. I'm not sure 
> if I'm interpreting this correctly.
> - For the purpose of creating an InputFormat to be used by other users, is it 
> better to set {{hive.input.format}} to work around this, or to 
> create a StorageHandler and make non-native tables?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9771) HiveCombineInputFormat does not appropriately call getSplits on InputFormats for native tables

2015-03-20 Thread Luke Lovett (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372333#comment-14372333
 ] 

Luke Lovett commented on HIVE-9771:
---

It was my misunderstanding that my own {{InputFormat}}'s version of 
{{getSplits()}} would be the decider for how splits were made. This is not the 
case.

StackOverflow explanation: 
http://stackoverflow.com/questions/29133275/custom-inputformat-getsplits-never-called-in-hive
Hadoop documentation (see "Map" section): 
http://wiki.apache.org/hadoop/HadoopMapReduce
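
For anyone hitting the same thing, a minimal sketch of the kind of custom InputFormat involved (assumed class name; plain mapred API): its getSplits() is consulted when hive.input.format is HiveInputFormat, but CombineHiveInputFormat computes splits itself for native tables and never reaches it.

{code}
// Hedged illustration, not Hive source: a custom InputFormat whose
// getSplits() only takes effect under HiveInputFormat; with
// CombineHiveInputFormat and a native table this method is bypassed.
import java.io.IOException;

import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class CustomSplitInputFormat extends TextInputFormat {
  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    // Custom split logic would go here.
    return super.getSplits(job, numSplits);
  }
}
{code}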


> HiveCombineInputFormat does not appropriately call getSplits on InputFormats 
> for native tables
> --
>
> Key: HIVE-9771
> URL: https://issues.apache.org/jira/browse/HIVE-9771
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0
> Hadoop 2.4.1
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>Reporter: Luke Lovett
>
> {{HiveCombineInputFormat}} never calls {{getSplits}} on a custom 
> {{InputFormat}} when those InputFormats are used by native tables. If I 
> {{set}} 
> {{hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;}}, then 
> {{getSplits}} is called. I'm not the first user to have experienced this 
> either; see [this post from the hive-user mailing 
> list|https://mail-archives.apache.org/mod_mbox/hive-user/201410.mbox/%3CCAENxBwy+XB1OB2ZOjz=4=NxKNMsWA==o0ibrd+gopxgqrj2...@mail.gmail.com%3E].
> The purpose of this ticket is to discover:
> - Is this difference in behavior between CombineHiveInputFormat and 
> HiveInputFormat intentional?
> - Is there any way of forcing CombineHiveInputFormat to call getSplits 
> on my own InputFormat? I was reading through the code for 
> CombineHiveInputFormat, and it looks like it might only call my own 
> InputFormat's getSplits method if the table is non-native. I'm not sure 
> if I'm interpreting this correctly.
> - For the purpose of creating an InputFormat to be used by other users, is it 
> better to set {{hive.input.format}} to work around this, or to 
> create a StorageHandler and make non-native tables?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9555) assorted ORC refactorings for LLAP on trunk

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372319#comment-14372319
 ] 

Sergey Shelukhin commented on HIVE-9555:


Filed HIVE-10042 as a follow-up.

> assorted ORC refactorings for LLAP on trunk
> ---
>
> Key: HIVE-9555
> URL: https://issues.apache.org/jira/browse/HIVE-9555
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.2.0
>
> Attachments: HIVE-9555.01.patch, HIVE-9555.02.patch, 
> HIVE-9555.03.patch, HIVE-9555.04.patch, HIVE-9555.05.patch, 
> HIVE-9555.06.patch, HIVE-9555.07.patch, HIVE-9555.08.patch, HIVE-9555.patch
>
>
> To minimize conflicts and given that ORC is being developed rapidly on trunk, 
> I would like to refactor some parts of ORC "in advance" based on the changes 
> in LLAP branch. Mostly it concerns making parts of ORC code (esp. SARG, but 
> also some internal methods) more modular and easier to use from alternative 
> codepaths. There's also a significant change to how data reading is handled: 
> BufferChunk inherits from DiskRange; the reader receives a list of 
> DiskRange-s (as before), but instead of making a list of buffer chunks it 
> replaces ranges with buffer chunks in the original (linked) list. 
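
A toy model of that linked-list substitution, with invented minimal types (not the actual ORC classes):

{code}
// Toy model only: the reader walks a linked list of DiskRange nodes and
// splices a BufferChunk in place of each range it has read, instead of
// building a second list of buffer chunks.
import java.util.LinkedList;
import java.util.ListIterator;

public class RangeListDemo {
  static class DiskRange { long offset; long end; }
  static class BufferChunk extends DiskRange { byte[] data; }

  static void replaceWithChunk(LinkedList<DiskRange> ranges,
                               DiskRange target, BufferChunk chunk) {
    for (ListIterator<DiskRange> it = ranges.listIterator(); it.hasNext(); ) {
      if (it.next() == target) {
        it.set(chunk);  // in-place substitution preserves list order
        return;
      }
    }
  }
}
{code}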



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372310#comment-14372310
 ] 

Sergey Shelukhin commented on HIVE-9277:


left minor feedback on the JIRA, overall looks ok.
[~mmokhtar] I hope the join has constant overhead when spilling is not needed :)

> Hybrid Hybrid Grace Hash Join
> -
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: join
> Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
> HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
> HIVE-9277.13.patch, HIVE-9277.14.patch, 
> High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is 
> slightly wrong
> * Expensive garbage collection overhead can be avoided when hash table grows
> * Join execution using a Map join operator even though the small table 
> doesn't fit in memory, as spilling some data from the build and probe sides 
> will still be cheaper than having to shuffle the large fact table
> The design was based on Hadoop’s parallel processing capability and the 
> significant amount of memory available.
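
A rough sketch of the partition-and-spill idea (invented names and bookkeeping; not the patch):

{code}
// Rough sketch: hash build rows into partitions; when the memory budget
// is exceeded, spill a whole partition and route its later rows to disk.
import java.util.ArrayList;
import java.util.List;

public class GracePartitionSketch {
  private final List<List<Object>> partitions = new ArrayList<>();
  private final boolean[] spilled;
  private final long budgetBytes;
  private long usedBytes = 0;

  public GracePartitionSketch(int numPartitions, long budgetBytes) {
    for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
    this.spilled = new boolean[numPartitions];
    this.budgetBytes = budgetBytes;
  }

  public void addBuildRow(Object row, long estBytes) {
    int p = (row.hashCode() & Integer.MAX_VALUE) % spilled.length;
    if (spilled[p]) { writeToDisk(p, row); return; }
    partitions.get(p).add(row);
    usedBytes += estBytes;
    if (usedBytes > budgetBytes) spill(pickVictim());
  }

  private int pickVictim() { return 0; /* e.g. the largest in-memory partition */ }
  private void spill(int p) { spilled[p] = true; /* stream partition p to disk, free its memory */ }
  private void writeToDisk(int p, Object row) { /* append to partition p's spill file */ }
}
{code}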



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9277) Hybrid Hybrid Grace Hash Join

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372311#comment-14372311
 ] 

Sergey Shelukhin commented on HIVE-9277:


er, on RB

> Hybrid Hybrid Grace Hash Join
> -
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: join
> Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch, 
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch, 
> HIVE-9277.06.patch, HIVE-9277.07.patch, HIVE-9277.08.patch, 
> HIVE-9277.13.patch, HIVE-9277.14.patch, 
> High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace 
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is 
> slightly wrong
> * Expensive garbage collection overhead can be avoided when hash table grows
> * Join execution using a Map join operator even though the small table 
> doesn't fit in memory, as spilling some data from the build and probe sides 
> will still be cheaper than having to shuffle the large fact table
> The design was based on Hadoop’s parallel processing capability and the 
> significant amount of memory available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2015-03-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7664:
--
Assignee: Matt McCline  (was: Gopal V)

> VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized 
> execution and takes 25% CPU
> -
>
> Key: HIVE-7664
> URL: https://issues.apache.org/jira/browse/HIVE-7664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Matt McCline
> Attachments: HIVE-7664.1.patch.txt, HIVE-7664.2.patch.txt
>
>
> In a Group by heavy vectorized Reducer vertex 25% of CPU is spent in 
> VectorizedBatchUtil.addRowToBatchFrom().
> Looked at the code of VectorizedBatchUtil.addRowToBatchFrom and it looks like 
> it wasn't optimized for Vectorized processing.
> addRowToBatchFrom is called for every row, and for each row and every column 
> in the batch getPrimitiveCategory is called to figure out the type of each 
> column. Column types are stored in a HashMap; for VectorGroupByOperator, 
> column types won't change between batches, so column types shouldn't be 
> looked up for every row.
> I recommend storing the column type in StructObjectInspector so that other 
> components can leverage this optimization.
> Also, addRowToBatchFrom has a case statement for every row and every column 
> used for type casting; I recommend encapsulating the type logic in templatized 
> methods.
> {code}
> Stack Trace                                               Sample Count  Percentage(%)
> VectorizedBatchUtil.addRowToBatchFrom                     86            26.543
>   AbstractPrimitiveObjectInspector.getPrimitiveCategory() 34            10.494
>   LazyBinaryStructObjectInspector.getStructFieldData      25            7.716
>   StandardStructObjectInspector.getStructFieldData        4             1.235
> {code}
> The query used:
> {code}
> select 
> ss_sold_date_sk
> from
> store_sales
> where
> ss_sold_date between '1998-01-01' and '1998-06-01'
> group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
> having sum(ss_list_price) > 50;
> {code}
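
A small sketch of the per-batch type caching the description recommends (illustrative shape only, reusing the existing PrimitiveCategory enum):

{code}
// Illustrative only: resolve each column's primitive category once per
// batch and reuse it for every row, instead of a HashMap lookup per cell.
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;

public class ColumnTypeCache {
  private PrimitiveCategory[] categories;

  public void init(PrimitiveCategory[] resolvedOncePerBatch) {
    this.categories = resolvedOncePerBatch;  // computed before the row loop
  }

  public PrimitiveCategory categoryOf(int column) {
    return categories[column];               // O(1) array read per cell
  }
}
{code}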



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9814) LLAP: JMX web-service end points for monitoring & metrics

2015-03-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9814:
--
Summary: LLAP: JMX web-service end points for monitoring & metrics  (was: 
LLAP: JMX end points for monitoring & metrics)

> LLAP: JMX web-service end points for monitoring & metrics
> -
>
> Key: HIVE-9814
> URL: https://issues.apache.org/jira/browse/HIVE-9814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-9814.wip1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10037) JDBC support for interval expressions

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372274#comment-14372274
 ] 

Hive QA commented on HIVE-10037:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706011/HIVE-10037.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7820 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3096/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3096/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3096/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706011 - PreCommit-HIVE-TRUNK-Build

> JDBC support for interval expressions
> -
>
> Key: HIVE-10037
> URL: https://issues.apache.org/jira/browse/HIVE-10037
> Project: Hive
>  Issue Type: Sub-task
>  Components: JDBC
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-10037.1.patch
>
>
> Allow interval expressions to be retrieved from JDBC/beeline



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10015) LLAP : add LLAP IO read debug tool

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10015:
---

Assignee: Sergey Shelukhin

> LLAP : add LLAP IO read debug tool
> --
>
> Key: HIVE-10015
> URL: https://issues.apache.org/jira/browse/HIVE-10015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Need a tool that will run the read pipeline on a file downloaded from the 
> cluster (one that causes some exception), for local debugging. It needs to 
> reproduce whatever environment is necessary (in the IO pipeline itself) from 
> the cluster (e.g. allocation sizes in the allocator, etc.). Cache contents can 
> be derived from the same file (optional, phase 2); incorrect pre-existing 
> cache contents would be out of the scope of this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9555) assorted ORC refactorings for LLAP on trunk

2015-03-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372247#comment-14372247
 ] 

Gopal V commented on HIVE-9555:
---

[~sershe]: Adding to Prasanth's +1 - LGTM.

Move the DiskRange classes to common/io/ for clarity.

Please file a follow-up JIRA to extract the TreeReaders out of the 
RecordReaderImpl into a reader.impl package.

Added to my scale tests - last patch tested bug-free in those runs.

> assorted ORC refactorings for LLAP on trunk
> ---
>
> Key: HIVE-9555
> URL: https://issues.apache.org/jira/browse/HIVE-9555
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-9555.01.patch, HIVE-9555.02.patch, 
> HIVE-9555.03.patch, HIVE-9555.04.patch, HIVE-9555.05.patch, 
> HIVE-9555.06.patch, HIVE-9555.07.patch, HIVE-9555.08.patch, HIVE-9555.patch
>
>
> To minimize conflicts and given that ORC is being developed rapidly on trunk, 
> I would like to refactor some parts of ORC "in advance" based on the changes 
> in LLAP branch. Mostly it concerns making parts of ORC code (esp. SARG, but 
> also some internal methods) more modular and easier to use from alternative 
> codepaths. There's also a significant change to how data reading is handled: 
> BufferChunk inherits from DiskRange; the reader receives a list of 
> DiskRange-s (as before), but instead of making a list of buffer chunks it 
> replaces ranges with buffer chunks in the original (linked) list. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9996) [CBO] Generate appropriate join operator as per algorithm selected by CBO

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372237#comment-14372237
 ] 

Jesus Camacho Rodriguez commented on HIVE-9996:
---

LGTM, though we still need to decide when it should go in.

+1

> [CBO] Generate appropriate join operator as per algorithm selected by CBO
> -
>
> Key: HIVE-9996
> URL: https://issues.apache.org/jira/browse/HIVE-9996
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Query Planning
>Affects Versions: cbo-branch
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: cbo-branch
>
> Attachments: HIVE-9996.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9996) [CBO] Generate appropriate join operator as per algorithm selected by CBO

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-9996:
--
Fix Version/s: cbo-branch

> [CBO] Generate appropriate join operator as per algorithm selected by CBO
> -
>
> Key: HIVE-9996
> URL: https://issues.apache.org/jira/browse/HIVE-9996
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Query Planning
>Affects Versions: cbo-branch
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: cbo-branch
>
> Attachments: HIVE-9996.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10039) LLAP : llap_partitioned test is broken

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10039.
-
   Resolution: Fixed
Fix Version/s: llap

> LLAP : llap_partitioned test is broken
> --
>
> Key: HIVE-10039
> URL: https://issues.apache.org/jira/browse/HIVE-10039
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10040:
---
Summary: CBO (Calcite Return Path): Pluggable cost modules [CBO branch]  
(was: Pluggable cost modules)

> CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
> --
>
> Key: HIVE-10040
> URL: https://issues.apache.org/jira/browse/HIVE-10040
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
>
> We should be able to deal with cost models in a modular way. Thus, the cost 
> model should be integrated within a Calcite MD provider that is pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10040:
---
Fix Version/s: (was: 1.2.0)
   cbo-branch

> CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
> --
>
> Key: HIVE-10040
> URL: https://issues.apache.org/jira/browse/HIVE-10040
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
>
> We should be able to deal with cost models in a modular way. Thus, the cost 
> model should be integrated within a Calcite MD provider that is pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10040:
---
Affects Version/s: cbo-branch

> CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
> --
>
> Key: HIVE-10040
> URL: https://issues.apache.org/jira/browse/HIVE-10040
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
>
> We should be able to deal with cost models in a modular way. Thus, the cost 
> model should be integrated within a Calcite MD provider that is pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2015-03-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-8746:

Affects Version/s: 1.1.0
   1.2.0
   0.11.0
   0.12.0
   0.13.0
   0.14.0
   1.0.0

> ORC timestamp columns are sensitive to daylight savings time
> 
>
> Key: HIVE-8746
> URL: https://issues.apache.org/jira/browse/HIVE-8746
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
>Reporter: Owen O'Malley
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-8746.1.patch, HIVE-8746.2.patch
>
>
> Hive uses Java's Timestamp class to manipulate timestamp columns. 
> Unfortunately the textual parsing in Timestamp is done in local time and the 
> internal storage is in UTC.
> ORC mostly sidesteps this issue by computing the difference between the time 
> and a base time, also in local time, and storing that difference in the file. 
> Reading the file between timezones will mostly work correctly: "2014-01-01 
> 12:34:56" will read correctly in every timezone.
> However, when moving between timezones with different daylight saving it 
> creates trouble. In particular, moving from a computer in PST to UTC will 
> read "2014-06-06 12:34:56" as "2014-06-06 11:34:56".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]

2015-03-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-9693:
---
Attachment: HIVE-9693.1.patch

Ran into some issues with my tests. Cleaning up and attaching those shortly, 
but I guess this will get the review started.

> Introduce a stats cache for HBase metastore  [hbase-metastore branch]
> -
>
> Key: HIVE-9693
> URL: https://issues.apache.org/jira/browse/HIVE-9693
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-9693.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2015-03-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-8746:

Attachment: HIVE-8746.2.patch

Addressed [~owen.omalley]'s review comments and updated golden files to reflect 
the file size change.

> ORC timestamp columns are sensitive to daylight savings time
> 
>
> Key: HIVE-8746
> URL: https://issues.apache.org/jira/browse/HIVE-8746
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-8746.1.patch, HIVE-8746.2.patch
>
>
> Hive uses Java's Timestamp class to manipulate timestamp columns. 
> Unfortunately the textual parsing in Timestamp is done in local time and the 
> internal storage is in UTC.
> ORC mostly sidesteps this issue by computing the difference between the time 
> and a base time, also in local time, and storing that difference in the file. 
> Reading the file between timezones will mostly work correctly: "2014-01-01 
> 12:34:56" will read correctly in every timezone.
> However, when moving between timezones with different daylight saving it 
> creates trouble. In particular, moving from a computer in PST to UTC will 
> read "2014-06-06 12:34:56" as "2014-06-06 11:34:56".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9825) CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO branch]

2015-03-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-9825.

Resolution: Fixed

Committed to branch. Thanks, Jesus!

> CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO 
> branch]
> ---
>
> Key: HIVE-9825
> URL: https://issues.apache.org/jira/browse/HIVE-9825
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
> Attachments: HIVE-9825.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-20 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9845:
---
Attachment: HIVE-9845.1.patch

The proposed fix.

> HCatSplit repeats information making input split data size huge
> ---
>
> Key: HIVE-9845
> URL: https://issues.apache.org/jira/browse/HIVE-9845
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Rohini Palaniswamy
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-9845.1.patch
>
>
> Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
> has even triple the number of splits (100K+ splits and tasks) does not hit 
> that issue.
> {code}
> HCatBaseInputFormat.java:
>   // Call getSplits on the InputFormat, create an
>   // HCatSplit for each underlying split
>   // NumSplits is 0 for our purposes
>   org.apache.hadoop.mapred.InputSplit[] baseSplits =
>       inputFormat.getSplits(jobConf, 0);
>   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
>     splits.add(new HCatSplit(
>         partitionInfo,
>         split, allCols));
>   }
> {code}
> Each hcatSplit duplicates partition schema and table schema.
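
As a hedged sketch of the general de-duplication idea (invented names; not necessarily what the attached patch does), each split can serialize only its per-split payload while the shared table schema is published once per job:

{code}
// Sketch only: a split payload that writes just its partition-specific
// fields; the shared table schema would be published once (e.g. in the
// job configuration) instead of being embedded in every split.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class SlimSplitPayload implements Writable {
  private final Text partitionPath = new Text();  // per-split data only

  @Override
  public void write(DataOutput out) throws IOException {
    partitionPath.write(out);  // note: no table schema serialized here
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    partitionPath.readFields(in);
  }
}
{code}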



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9825) CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO branch]

2015-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372164#comment-14372164
 ] 

Ashutosh Chauhan commented on HIVE-9825:


+1. There might be a potential issue in the multi-insert case (association of 
WindowFunctionSpec with ExprNodeConverter). Something to test when we enable 
multi-insert queries on the CBO path. Looks good otherwise.

> CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO 
> branch]
> ---
>
> Key: HIVE-9825
> URL: https://issues.apache.org/jira/browse/HIVE-9825
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
> Attachments: HIVE-9825.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372158#comment-14372158
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12706001/HIVE-10036.1.patch

{color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 7819 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_empty_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_vectorization_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_after_multiple_inserts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_vectorization_ppd
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_after_multiple_inserts
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testBloomFilter2
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInStream.testCompressed
org.apache.hadoop.hive.ql.io.orc.TestInStream.testDisjointBuffers
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagementV12[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagementV12[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testEmpty
org.apache.hive.hcatalog.pig.TestHCatLoader.testColumnarStorePushdown[3]
org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3]
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3095/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3095/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3095/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 39 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12706001 - PreCommit-HIVE-TRUNK-Build

> Writing ORC format big table causes OOM - too many fixed sized stream buffers
> -
>
> Key: HIVE-10036
> URL: https://issues.apache.org/jira/browse/HIVE-10036
> Project: Hive
>  Issue Type: Improvement
>Reporter: Selina Zhang
>Assignee: Selina Zhang
> Attachments: HIVE-10036.1.patch
>
>
> ORC writer keeps multiple output streams for each column. Each output stream is 
> allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
> table, the memory cost is unbearable, especially when HCatalog dynamic 
> partitioning is involved: several hundred files may be open and writing at the 
> same time (same problem for FileSinkOperator). 
> The global ORC memory manager controls the buffer size, but it only kicks in 
> at a 5000-row interval. An enhancement could be done here, but the problem is 
> that reducing the buffer size introduces worse compression and more IOs on the 
> read path. Sacrificing read performance is never a good choice. 
> I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded by 
> the existing configurable buffer size. Most of the streams do not need a 
> large buffer, so performance improved significantly. Compared to 
> Facebook's hive-dwrf, I measured a 2x performance gain with this fix. 
> Solving OOM for ORC completely may need a lot of effort, but this is 
> definitely low-hanging fruit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8116) clean up q file hive.log

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-8116.

Resolution: Won't Fix

> clean up q file hive.log
> 
>
> Key: HIVE-8116
> URL: https://issues.apache.org/jira/browse/HIVE-8116
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> I'll do it later, or someone can take it if desired.
> We should have a separate logging config for q tests, if we don't already, and 
> stuff like ZooKeeperSaslClient, FinalRequestProcessor, ClientCnxn, etc. should 
> be filtered out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8013) CBO:fix checkstyle warnings

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8013:
---
Assignee: (was: Sergey Shelukhin)

> CBO:fix checkstyle warnings
> ---
>
> Key: HIVE-8013
> URL: https://issues.apache.org/jira/browse/HIVE-8013
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-20 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372105#comment-14372105
 ] 

Yongzhi Chen commented on HIVE-7018:


Thank you [~xuefuz] & [~ctang.ma]

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
> Fix For: 1.2.0
>
> Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch
>
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10014) LLAP : investigate showing LLAP IO usage in explain

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10014.
-
   Resolution: Fixed
Fix Version/s: llap

thanks!

> LLAP : investigate showing LLAP IO usage in explain
> ---
>
> Key: HIVE-10014
> URL: https://issues.apache.org/jira/browse/HIVE-10014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10014.patch
>
>
> Not sure if input formats are created during explain, or if it is possible to 
> go deep enough to create them; we should show if LLAP input format will be 
> used for the query based on whether it's vectorized, original format is 
> supported, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10014) LLAP : investigate showing LLAP IO usage in explain

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372104#comment-14372104
 ] 

Sergey Shelukhin commented on HIVE-10014:
-

We don't have a separate work class.

> LLAP : investigate showing LLAP IO usage in explain
> ---
>
> Key: HIVE-10014
> URL: https://issues.apache.org/jira/browse/HIVE-10014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10014.patch
>
>
> Not sure if input formats are created during explain, or if it is possible to 
> go deep enough to create them; we should show if LLAP input format will be 
> used for the query based on whether it's vectorized, original format is 
> supported, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10014) LLAP : investigate showing LLAP IO usage in explain

2015-03-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372096#comment-14372096
 ] 

Ashutosh Chauhan commented on HIVE-10014:
-

looks ok to me. 
I assume you don't have {{LlapWork extends MapWork}}. If you had that, that 
would have been the ideal place to put the deriveLlap() method.

> LLAP : investigate showing LLAP IO usage in explain
> ---
>
> Key: HIVE-10014
> URL: https://issues.apache.org/jira/browse/HIVE-10014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10014.patch
>
>
> Not sure if input formats are created during explain, or if it is possible to 
> go deep enough to create them; we should show if LLAP input format will be 
> used for the query based on whether it's vectorized, original format is 
> supported, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-20 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372089#comment-14372089
 ] 

Jason Dere commented on HIVE-3454:
--

So this patch did not require any changes to MathExpr? How does the behavior of 
the vectorized cast change?

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp().
> Instead, however, a 1970-01-16 timestamp is returned.
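
For reference, the 1970 result falls out of a seconds-versus-milliseconds mix-up; a standalone Java illustration (not Hive code):

{code}
// java.sql.Timestamp takes milliseconds; unix_timestamp() returns
// seconds. Feeding seconds where millis are expected lands in
// mid-January 1970.
import java.sql.Timestamp;

public class SecondsVsMillis {
  public static void main(String[] args) {
    long epochSeconds = 1400000000L;                         // ~2014-05-13
    System.out.println(new Timestamp(epochSeconds));         // a 1970-01-16/17 date, zone-dependent
    System.out.println(new Timestamp(epochSeconds * 1000L)); // the intended 2014 date
  }
}
{code}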



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9859) Create bitwise left/right shift UDFs

2015-03-20 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372074#comment-14372074
 ] 

Pengcheng Xiong commented on HIVE-9859:
---

[~jdere], I will take a look tonight and get back to you. Thanks.

> Create bitwise left/right shift UDFs
> 
>
> Key: HIVE-9859
> URL: https://issues.apache.org/jira/browse/HIVE-9859
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Attachments: HIVE-9859.1.patch, HIVE-9859.2.patch, HIVE-9859.3.patch
>
>
> Signature:
> a << b
> a >> b
> a >>> b
> For example:
> {code}
> select 1 << 4, 8 >> 2, 8 >>> 2;
> OK
> 16   2   2
> {code}
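
The examples above presumably mirror Java's shift operators; a quick plain-Java reference check:

{code}
// Plain Java reference for the three shifts: << shifts left, >> is the
// arithmetic (sign-propagating) right shift, >>> is the logical
// (zero-fill) right shift.
public class ShiftDemo {
  public static void main(String[] args) {
    System.out.println(1 << 4);    // 16
    System.out.println(8 >> 2);    // 2
    System.out.println(8 >>> 2);   // 2
    System.out.println(-8 >> 2);   // -2 (sign preserved)
    System.out.println(-8 >>> 2);  // 1073741822 (zero-filled)
  }
}
{code}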



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10022) DFS in authorization might take too long

2015-03-20 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372028#comment-14372028
 ] 

Pankit Thapar commented on HIVE-10022:
--

So, if I understand this correctly: for the case where /tmp/dir does not 
exist, we should just check the permission on /tmp/, if it exists?


> DFS in authorization might take too long
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>
> I am testing a query like : 
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, 
> we end up calling doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object (or, if the 
> object does not exist, an ancestor of the object) we are trying to authorize. 
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we have a path a/b/c/d that we are trying to authorize.
> In case a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
> also does not exist.
> If the subtree under a/b contains millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects. 
> I do not completely understand why we have to check file permissions 
> on all the objects in a branch of the tree that we are not trying to read 
> from or write to. 
> We could have checked file permission on the ancestor that exists and, if it 
> matches what we expect, then returned true.
> Please confirm whether this is a bug so that I can submit a patch; else let me 
> know what I am missing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371995#comment-14371995
 ] 

Hive QA commented on HIVE-9675:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705991/HIVE-9675.7.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7864 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_types
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_update_noupdatepriv
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.testUpdateSomeColumnsUsed
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.testUpdateSomeColumnsUsedExprInSet
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.minorCompactAfterAbort
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.minorCompactWhileStreaming
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3094/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3094/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3094/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705991 - PreCommit-HIVE-TRUNK-Build

> Support START TRANSACTION/COMMIT/ROLLBACK commands
> --
>
> Key: HIVE-9675
> URL: https://issues.apache.org/jira/browse/HIVE-9675
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-9675.6.patch, HIVE-9675.7.patch
>
>
> Hive 0.14 added support for insert/update/delete statements with ACID 
> semantics.  Hive 0.14 only supports auto-commit mode.  We need to add support 
> for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate 
> transaction boundaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10022) DFS in authorization might take too long

2015-03-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371911#comment-14371911
 ] 

Thejas M Nair commented on HIVE-10022:
--

Let's say user A has permission on dir /tmp/, but not on the dir /tmp/dir1 (let's 
assume only the HS2 user has permissions for it). If we allow "create table .. 
partitioned by (p string) .. location '/tmp'", a select query on that table 
would then read all files under that dir. That was the intention of checking 
everything under that dir.

I agree we should handle the case where the given directory does not exist 
differently. If we go check parent permissions, I think we should stop 
doing this check for all files under that ancestor, and just resort to checking 
that particular dir.
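
A minimal sketch of that fallback, with assumed helper naming (not a patch):

{code}
// Sketch: when the target path is missing, walk up to the nearest
// existing ancestor and let the caller check only that directory's
// permission, rather than recursing over its whole subtree.
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AncestorLookup {
  public static Path nearestExistingAncestor(FileSystem fs, Path path)
      throws IOException {
    Path current = path;
    while (current != null && !fs.exists(current)) {
      current = current.getParent();
    }
    return current;  // null only if even the root does not exist
  }
}
{code}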


> DFS in authorization might take too long
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>
> I am testing a query like : 
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, 
> we end up calling doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object (or, if the 
> object does not exist, an ancestor of the object) we are trying to authorize. 
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we have a path a/b/c/d that we are trying to authorize.
> In case a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
> also does not exist.
> If the subtree under a/b contains millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects. 
> I do not completely understand why we have to check file permissions 
> on all the objects in a branch of the tree that we are not trying to read 
> from or write to. 
> We could have checked file permission on the ancestor that exists and, if it 
> matches what we expect, then returned true.
> Please confirm whether this is a bug so that I can submit a patch; else let me 
> know what I am missing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10037) JDBC support for interval expressions

2015-03-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10037:
--
Attachment: HIVE-10037.1.patch

Attaching initial patch.
There is no interval type in JDBC, so year-month intervals and day-time 
intervals both use java.sql.Types.OTHER as the JDBC type.
Also makes some changes in JdbcColumn/HiveResultSetMetaData to allow a more 
accurate column precision/display size in the case of interval types, and to 
allow users to get the HiveIntervalYearMonth/HiveIntervalDayTime values 
when calling ResultSet.getObject().
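
A hedged client-side sketch of what that looks like (connection URL and query are placeholders):

{code}
// Sketch of reading an interval column over JDBC: the column reports
// java.sql.Types.OTHER, and getObject() is expected to return the Hive
// interval classes per the patch.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Types;

public class IntervalJdbcSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT INTERVAL '2-1' YEAR TO MONTH")) {
      if (rs.getMetaData().getColumnType(1) == Types.OTHER && rs.next()) {
        Object interval = rs.getObject(1);  // HiveIntervalYearMonth expected
        System.out.println(interval);
      }
    }
  }
}
{code}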

> JDBC support for interval expressions
> -
>
> Key: HIVE-10037
> URL: https://issues.apache.org/jira/browse/HIVE-10037
> Project: Hive
>  Issue Type: Sub-task
>  Components: JDBC
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-10037.1.patch
>
>
> Allow interval expressions to be retrieved from JDBC/beeline



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10014) LLAP : investigate showing LLAP IO usage in explain

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10014:

Attachment: HIVE-10014.patch

Outputs only if LLAP IO is enabled.
[~ashutoshc] you might be more familiar with this code (TaskCompiler, 
GenMapRedUtils, etc.), can you sanity check the patch? :)

> LLAP : investigate showing LLAP IO usage in explain
> ---
>
> Key: HIVE-10014
> URL: https://issues.apache.org/jira/browse/HIVE-10014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10014.patch
>
>
> Not sure if input formats are created during explain, or if it is possible to 
> go deep enough to create them; we should show if LLAP input format will be 
> used for the query based on whether it's vectorized, original format is 
> supported, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10022) DFS in authorization might take too long

2015-03-20 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371883#comment-14371883
 ] 

Pankit Thapar commented on HIVE-10022:
--

[~thejas] Could you please comment on this?


> DFS in authorization might take too long
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>
> I am testing a query like : 
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, 
> we end up calling doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object (or, if the 
> object does not exist, an ancestor of the object) we are trying to authorize. 
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we have a path a/b/c/d that we are trying to authorize.
> In case a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
> also does not exist.
> If the subtree under a/b contains millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects. 
> I do not completely understand why we have to check file permissions 
> on all the objects in a branch of the tree that we are not trying to read 
> from or write to. 
> We could have checked file permission on the ancestor that exists and, if it 
> matches what we expect, then returned true.
> Please confirm whether this is a bug so that I can submit a patch; else let me 
> know what I am missing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9859) Create bitwise left/right shift UDFs

2015-03-20 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371852#comment-14371852
 ] 

Jason Dere commented on HIVE-9859:
--

CCing [~pxiong] to see if he has any ideas on how to avoid the "<<" conflict.

> Create bitwise left/right shift UDFs
> 
>
> Key: HIVE-9859
> URL: https://issues.apache.org/jira/browse/HIVE-9859
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Attachments: HIVE-9859.1.patch, HIVE-9859.2.patch, HIVE-9859.3.patch
>
>
> Signature:
> a << b
> a >> b
> a >>> b
> For example:
> {code}
> select 1 << 4, 8 >> 2, 8 >>> 2;
> OK
> 16   2   2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9859) Create bitwise left/right shift UDFs

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371847#comment-14371847
 ] 

Hive QA commented on HIVE-9859:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705977/HIVE-9859.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7822 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3093/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3093/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3093/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705977 - PreCommit-HIVE-TRUNK-Build

> Create bitwise left/right shift UDFs
> 
>
> Key: HIVE-9859
> URL: https://issues.apache.org/jira/browse/HIVE-9859
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Attachments: HIVE-9859.1.patch, HIVE-9859.2.patch, HIVE-9859.3.patch
>
>
> Signature:
> a << b
> a >> b
> a >>> b
> For example:
> {code}
> select 1 << 4, 8 >> 2, 8 >>> 2;
> OK
> 16   2   2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-20 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.1.patch

> Writing ORC format big table causes OOM - too many fixed sized stream buffers
> -
>
> Key: HIVE-10036
> URL: https://issues.apache.org/jira/browse/HIVE-10036
> Project: Hive
>  Issue Type: Improvement
>Reporter: Selina Zhang
>Assignee: Selina Zhang
> Attachments: HIVE-10036.1.patch
>
>
> ORC writer keeps multiple output streams for each column. Each output stream is 
> allocated a fixed-size ByteBuffer (configurable, default 256K). For a big 
> table, the memory cost is unbearable, especially when HCatalog dynamic 
> partitioning is involved and several hundred files may be open and writing at the 
> same time (the same problem applies to FileSinkOperator). 
> The global ORC memory manager controls the buffer size, but it only kicks in 
> at 5000-row intervals. An enhancement could be made there, but the problem is 
> that reducing the buffer size introduces worse compression and more IOs on the read 
> path. Sacrificing read performance is never a good choice. 
> I changed the fixed-size ByteBuffer to a dynamically growing buffer bounded by 
> the existing configurable buffer size. Most of the streams do not need a 
> large buffer, so performance improved significantly. Compared to 
> Facebook's hive-dwrf, I measured a 2x performance gain with this fix. 
> Solving OOM for ORC completely may need a lot of effort, but this is 
> definitely low-hanging fruit. 
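> As a rough illustration of the growth idea (a hand-written sketch, not the 
> attached patch; class and size names are invented):
> {code}
> // Sketch: a stream buffer that starts small and grows on demand,
> // capped at the configured limit (e.g. the existing 256K default).
> import java.nio.ByteBuffer;
>
> public class GrowableStreamBuffer {
>   private static final int INITIAL_SIZE = 4 * 1024; // assumed starting size
>   private final int maxSize;
>   private ByteBuffer buffer;
>
>   public GrowableStreamBuffer(int maxSize) {
>     this.maxSize = maxSize;
>     this.buffer = ByteBuffer.allocate(INITIAL_SIZE);
>   }
>
>   public void put(byte b) {
>     if (!buffer.hasRemaining()) {
>       grow(buffer.capacity() + 1);
>     }
>     buffer.put(b);
>   }
>
>   // Double the capacity until it covers 'needed', never exceeding maxSize.
>   private void grow(int needed) {
>     if (needed > maxSize) {
>       throw new IllegalStateException("buffer full; caller must flush");
>     }
>     int newCapacity = Math.min(maxSize, Math.max(buffer.capacity() * 2, needed));
>     ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
>     buffer.flip();
>     bigger.put(buffer);
>     buffer = bigger;
>   }
> }
> {code}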



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands

2015-03-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371806#comment-14371806
 ] 

Eugene Koifman commented on HIVE-9675:
--

patch 7 also includes a fix to ErrorMsgs.java, where the regex was mangling the 
message.

> Support START TRANSACTION/COMMIT/ROLLBACK commands
> --
>
> Key: HIVE-9675
> URL: https://issues.apache.org/jira/browse/HIVE-9675
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-9675.6.patch, HIVE-9675.7.patch
>
>
> Hive 0.14 added support for insert/update/delete statements with ACID 
> semantics.  Hive 0.14 only supports auto-commit mode.  We need to add support 
> for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate 
> transaction boundaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10000) 10000 whoooohooo

2015-03-20 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10000:
--
Assignee: Damien Carol

> 10000 whoooohooo
> 
>
> Key: HIVE-10000
> URL: https://issues.apache.org/jira/browse/HIVE-10000
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Damien Carol
>
> {noformat}
> [large ASCII-art banner celebrating issue number 10000]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time

2015-03-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-8746:
---

Assignee: Prasanth Jayachandran  (was: Owen O'Malley)

> ORC timestamp columns are sensitive to daylight savings time
> 
>
> Key: HIVE-8746
> URL: https://issues.apache.org/jira/browse/HIVE-8746
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-8746.1.patch
>
>
> Hive uses Java's Timestamp class to manipulate timestamp columns. 
> Unfortunately the textual parsing in Timestamp is done in local time while the 
> internal storage is in UTC.
> ORC mostly sidesteps this issue by computing the difference between the time 
> and a base time, both in local time, and storing that difference in the file. 
> Reading the file between timezones will mostly work correctly: "2014-01-01 
> 12:34:56" will read correctly in every timezone.
> However, moving between timezones with different daylight saving rules 
> creates trouble. In particular, moving from a computer in PST to UTC will 
> read "2014-06-06 12:34:56" as "2014-06-06 11:34:56".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9675) Support START TRANSACTION/COMMIT/ROLLBACK commands

2015-03-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-9675:
-
Attachment: HIVE-9675.7.patch

fix streaming; fix lock manager to make it reentrant; update tests 
which were trying to update a bucketing column; add 
HiveOperation.ALTERTABLE_EXCHANGEPARTITION - not related to transaction work 
specifically, but changes in this ticket expose this bug.

> Support START TRANSACTION/COMMIT/ROLLBACK commands
> --
>
> Key: HIVE-9675
> URL: https://issues.apache.org/jira/browse/HIVE-9675
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-9675.6.patch, HIVE-9675.7.patch
>
>
> Hive 0.14 added support for insert/update/delete statements with ACID 
> semantics.  Hive 0.14 only supports auto-commit mode.  We need to add support 
> for START TRANSACTION/COMMIT/ROLLBACK commands so that the user can demarcate 
> transaction boundaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10034) Variable substitution broken when a query is contained in double quotes

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371737#comment-14371737
 ] 

Sergey Shelukhin commented on HIVE-10034:
-

{noformat}
$ echo  'create table if not exists b (col int); describe ${a}'
create table if not exists b (col int); describe ${a}
$ echo  "create table if not exists b (col int); describe ${a}"
create table if not exists b (col int); describe 
{noformat}

> Variable substitution broken when a query is contained in double quotes
> ---
>
> Key: HIVE-10034
> URL: https://issues.apache.org/jira/browse/HIVE-10034
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.13.1
> Environment: Ubuntu 12.04.5 LTS
>Reporter: Jakub Kukul
>Priority: Minor
>
> Variable substitution works fine when the SQL passed from the command line is a 
> single-quoted string:
> {code}
> hive --hivevar a=b -e 'create table if not exists b (col int); describe ${a}'
> {code}
> , but when the query string is contained within double quotes:
> {code}
> hive --hivevar a=b -e "create table if not exists b (col int); describe ${a}"
> {code}
> it fails with the following stack:
> {code}
> NoViableAltException(83@[])
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.descStatement(HiveParser.java:16225)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2198)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1392)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1030)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:750)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:1 cannot recognize input near 'describe' 
> '' '' in describe statement
> {code}
> This behaviour is a bit confusing, since CLI accepts double quoted strings in 
> other cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10034) Variable substitution broken when a query is contained in double quotes

2015-03-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10034.
-
Resolution: Not a Problem

> Variable substitution broken when a query is contained in double quotes
> ---
>
> Key: HIVE-10034
> URL: https://issues.apache.org/jira/browse/HIVE-10034
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.13.1
> Environment: Ubuntu 12.04.5 LTS
>Reporter: Jakub Kukul
>Priority: Minor
>
> Variable substitution works fine when the SQL passed from the command line is a 
> single-quoted string:
> {code}
> hive --hivevar a=b -e 'create table if not exists b (col int); describe ${a}'
> {code}
> , but when the query string is contained within double quotes:
> {code}
> hive --hivevar a=b -e "create table if not exists b (col int); describe ${a}"
> {code}
> it fails with the following stack:
> {code}
> NoViableAltException(83@[])
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.descStatement(HiveParser.java:16225)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2198)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1392)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1030)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:750)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:1 cannot recognize input near 'describe' 
> '' '' in describe statement
> {code}
> This behaviour is a bit confusing, since CLI accepts double quoted strings in 
> other cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10034) Variable substitution broken when a query is contained in double quotes

2015-03-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371735#comment-14371735
 ] 

Sergey Shelukhin commented on HIVE-10034:
-

I suspect this is not a Hive problem - $blah is bash syntax for variables, so 
bash is replacing ${a} in the double-quoted string before the command gets to hive.
Can you try to echo the same string with both types of quotes?

> Variable substitution broken when a query is contained in double quotes
> ---
>
> Key: HIVE-10034
> URL: https://issues.apache.org/jira/browse/HIVE-10034
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.13.1
> Environment: Ubuntu 12.04.5 LTS
>Reporter: Jakub Kukul
>Priority: Minor
>
> Variable substitution works fine when the SQL passed from the command line is a 
> single-quoted string:
> {code}
> hive --hivevar a=b -e 'create table if not exists b (col int); describe ${a}'
> {code}
> , but when the query string is contained within double quotes:
> {code}
> hive --hivevar a=b -e "create table if not exists b (col int); describe ${a}"
> {code}
> it fails with the following stack:
> {code}
> NoViableAltException(83@[])
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.descStatement(HiveParser.java:16225)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2198)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1392)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1030)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:750)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:1 cannot recognize input near 'describe' 
> '' '' in describe statement
> {code}
> This behaviour is a bit confusing, since CLI accepts double quoted strings in 
> other cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8953) 0.5.2-SNAPSHOT Dependency

2015-03-20 Thread Olaf Flebbe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olaf Flebbe resolved HIVE-8953.
---
   Resolution: Fixed
Fix Version/s: 1.0.0
 Release Note: Fixed in 1.0

> 0.5.2-SNAPSHOT Dependency
> -
>
> Key: HIVE-8953
> URL: https://issues.apache.org/jira/browse/HIVE-8953
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment: Compiling for Apache BIGTOP. 
>Reporter: Olaf Flebbe
> Fix For: 1.0.0
>
>
> I have the issue that hive shim 0.23 needs tez Version 0.5.2-SNAPSHOT.
> Hm, I have no clue what SNAPSHOT of Apache Tez should be used. There is no 
> 0.5.2-SNAPSHOT in Maven Central Repository.
> Can I use 0.5.2? (This seems to be released.)
> This relates to:  [HIVE-8614] I have the same problem as the reporter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9965) CBO (Calcite Return Path): Improvement in the cost calculation algorithm for Aggregate and Join operators [CBO Branch]

2015-03-20 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-9965.

Resolution: Fixed

Committed to branch. Thanks, Jesus!

> CBO (Calcite Return Path): Improvement in the cost calculation algorithm for 
> Aggregate and Join operators [CBO Branch]
> --
>
> Key: HIVE-9965
> URL: https://issues.apache.org/jira/browse/HIVE-9965
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
> Attachments: HIVE-9965.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10018) Activating SQLStandardAuth results in NPE [hbase-metastore branch]

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371676#comment-14371676
 ] 

Hive QA commented on HIVE-10018:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705961/HIVE-10018.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3092/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3092/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3092/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3092/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'hbase-handler/src/test/results/positive/hbase_timestamp.q.out'
Reverted 'hbase-handler/src/test/queries/positive/hbase_timestamp.q'
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/primitive/TestPrimitiveObjectInspectorUtils.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java'
Reverted 
'ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out'
Reverted 
'ql/src/test/results/clientpositive/tez/vectorization_decimal_date.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/vectorized_casts.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_decimal_date.q.out'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/thirdparty 
itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target 
itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target service/target contrib/target serde/target 
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java.orig
 beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1668102.

At revision 1668102.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705961 - PreCommit-HIVE-TRUNK-Build

> Activating SQLStandardAuth results in NPE [hbase-metastore branch]
> --
>
> Key: HIVE-10018
> URL: https

[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371672#comment-14371672
 ] 

Hive QA commented on HIVE-3454:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705949/HIVE-3454.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7820 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3091/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3091/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3091/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705949 - PreCommit-HIVE-TRUNK-Build

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp()
> Instead, however, a 1970-01-16 timestamp is returned.
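> The symptom is consistent with a seconds-versus-milliseconds mix-up; as a 
> back-of-the-envelope check (plain Java, not Hive's cast code):
> {code}
> // A seconds-since-epoch value read as milliseconds lands in mid-January 1970.
> import java.sql.Timestamp;
>
> public class EpochUnitsDemo {
>   public static void main(String[] args) {
>     long seconds = 1347000000L;                        // a typical unix_timestamp() result
>     System.out.println(new Timestamp(seconds));        // ~1970-01-16 (value treated as ms)
>     System.out.println(new Timestamp(seconds * 1000L)); // ~2012-09-07 (the intended instant)
>   }
> }
> {code}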



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9859) Create bitwise left/right shift UDFs

2015-03-20 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9859:
--
Attachment: HIVE-9859.3.patch

patch #3. fixed show_functions.q.out

> Create bitwise left/right shift UDFs
> 
>
> Key: HIVE-9859
> URL: https://issues.apache.org/jira/browse/HIVE-9859
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Attachments: HIVE-9859.1.patch, HIVE-9859.2.patch, HIVE-9859.3.patch
>
>
> Signature:
> a << b
> a >> b
> a >>> b
> For example:
> {code}
> select 1 << 4, 8 >> 2, 8 >>> 2;
> OK
> 16   2   2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9965) CBO (Calcite Return Path): Improvement in the cost calculation algorithm for Aggregate and Join operators [CBO Branch]

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371590#comment-14371590
 ] 

Jesus Camacho Rodriguez commented on HIVE-9965:
---

I think this one could go in too, even though we don't currently exploit the 
Aggregate cost in the Join selection algorithm.

> CBO (Calcite Return Path): Improvement in the cost calculation algorithm for 
> Aggregate and Join operators [CBO Branch]
> --
>
> Key: HIVE-9965
> URL: https://issues.apache.org/jira/browse/HIVE-9965
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
> Attachments: HIVE-9965.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9825) CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO branch]

2015-03-20 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371587#comment-14371587
 ] 

Jesus Camacho Rodriguez commented on HIVE-9825:
---

[~ashutoshc], can you take a look at it? Thanks

> CBO (Calcite Return Path): Translate PTFs and Windowing to Hive Op [CBO 
> branch]
> ---
>
> Key: HIVE-9825
> URL: https://issues.apache.org/jira/browse/HIVE-9825
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
> Attachments: HIVE-9825.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10019) Configure jenkins precommit jobs to run HMS upgrade tests

2015-03-20 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371583#comment-14371583
 ] 

Brock Noland commented on HIVE-10019:
-

+1

> Configure jenkins precommit jobs to run HMS upgrade tests
> -
>
> Key: HIVE-10019
> URL: https://issues.apache.org/jira/browse/HIVE-10019
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-10019.1.patch
>
>
> This task is created to configure all jenkins precommit jobs for the different 
> branches to run HMS upgrade tests if there are changes to the metastore 
> upgrade script.
> These tests are already created, so this task is only for final jenkins 
> configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10018) Activating SQLStandardAuth results in NPE [hbase-metastore branch]

2015-03-20 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-10018:
--
Attachment: HIVE-10018.patch

Fixes a few missing null checks and one place where I tried to use 
equals on two different enum classes.
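For readers unfamiliar with the pitfall, a minimal illustration (invented enum 
names, not the actual metastore types):

{code}
// Object.equals across two different enum types compiles without warning
// and is always false; == between them would not even compile.
enum Color { RED }
enum Fruit { APPLE }

public class EnumEqualsDemo {
  public static void main(String[] args) {
    Object c = Color.RED;
    System.out.println(c.equals(Fruit.APPLE)); // always false, silently
    // System.out.println(Color.RED == Fruit.APPLE); // compile error: incomparable types
  }
}
{code}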

> Activating SQLStandardAuth results in NPE [hbase-metastore branch]
> --
>
> Key: HIVE-10018
> URL: https://issues.apache.org/jira/browse/HIVE-10018
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-10018.patch
>
>
> Setting the config to run SQLStandardAuth and then doing even simple SQL 
> statements results in an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8817) Create unit test where we insert into an encrypted table and then read from it with pig

2015-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371539#comment-14371539
 ] 

Sergio Peña commented on HIVE-8817:
---

Thanks [~Ferd]

The test passes now. It was a problem with my IDE. I was running the test from 
it, and it was getting that error.

> Create unit test where we insert into an encrypted table and then read from 
> it with pig
> ---
>
> Key: HIVE-8817
> URL: https://issues.apache.org/jira/browse/HIVE-8817
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: encryption-branch
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Fix For: encryption-branch
>
> Attachments: HIVE-8817.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10031) Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor

2015-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371491#comment-14371491
 ] 

Sergio Peña commented on HIVE-10031:


I'm just worried about this part of the code:

{code}
public static void copyTableJobPropertiesToConf(TableDesc tbl, JobConf job) {

  for (Map.Entry<String, String> entry : jobProperties.entrySet()) {
    job.set(entry.getKey(), entry.getValue());
  }
}
{code}

Could this block overwrite variables in the JobConf that hold different values 
than the ones in jobProperties?

> Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor
> --
>
> Key: HIVE-10031
> URL: https://issues.apache.org/jira/browse/HIVE-10031
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-10031-parquet.patch
>
>
> In the {{ParquetRecordReaderWrapper}} constructor, it creates splits, sets 
> projections and filters in conf, creates the task context, and then creates the 
> Parquet record reader. In this procedure, we could improve the logic of conf usage:
> 1. the clone of jobConf is not necessary. This could speed up getRecordReader 
> a little.
> 2. the updated jobConf is not passed to Parquet in one case.
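> As a minimal sketch of the trade-off behind point 1 (not the actual wrapper 
> code; the configuration key is invented):
> {code}
> // Mutating a shared JobConf leaks per-split settings into later reads,
> // which is what a copy-before-set pattern avoids at the cost of the copy.
> import org.apache.hadoop.mapred.JobConf;
>
> public class ConfCloneSketch {
>   static JobConf prepareForSplit(JobConf shared, String columnIds) {
>     JobConf perSplit = new JobConf(shared); // copy: leaves 'shared' untouched
>     perSplit.set("hypothetical.projection.key", columnIds); // invented key
>     return perSplit;
>   }
> }
> {code}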



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-20 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: HIVE-3454.3.patch

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp()
> Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-20 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: (was: HIVE-3454.3.patch)

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp()
> Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10032) Remove HCatalog broken java file from source code

2015-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371465#comment-14371465
 ] 

Sergio Peña commented on HIVE-10032:


Can these broken files be fixed, or is the only solution to remove them?

> Remove HCatalog broken java file from source code
> -
>
> Key: HIVE-10032
> URL: https://issues.apache.org/jira/browse/HIVE-10032
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>Priority: Minor
> Attachments: HIVE-10032.patch
>
>
> Remove all broken hcatalog java files from the java source code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371448#comment-14371448
 ] 

Sergio Peña commented on HIVE-10016:


Looks good [~dongc].

Just a couple of small comments:

- In DataWritableRecordConverter.java
  Could you remove the imports that are not used anymore:
  * import parquet.schema.MessageTypeParser;
  * import org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport;

- In DataWritableReadSupport.java
  I think the 'MessageType tableSchema' local variable is not needed. What if we 
just assign the value to hiveTableSchema and use that variable in the rest of the 
block? Instead of:

    MessageType tableSchema = new MessageType(TABLE_SCHEMA, typeListTable);
    hiveTableSchema = tableSchema;

  could it be:

    hiveTableSchema = new MessageType(TABLE_SCHEMA, typeListTable);

> Remove duplicated Hive table schema parsing in DataWritableReadSupport
> --
>
> Key: HIVE-10016
> URL: https://issues.apache.org/jira/browse/HIVE-10016
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-10016-parquet.patch
>
>
> In {{DataWritableReadSupport.init()}}, the table schema is created and its 
> string format is set in conf. When constructing the 
> {{ParquetRecordReaderWrapper}}, the schema is fetched from conf and parsed 
> several times.
> We could remove this repeated schema parsing and improve the speed of 
> getRecordReader a bit.
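> As a sketch of the direction (not the attached patch; the caching wrapper is 
> invented):
> {code}
> // Parse the schema string a single time and keep the MessageType object,
> // instead of re-parsing the string at every use site.
> import parquet.schema.MessageType;
> import parquet.schema.MessageTypeParser;
>
> public class SchemaCache {
>   private MessageType tableSchema; // parsed once, reused afterwards
>
>   MessageType getTableSchema(String schemaString) {
>     if (tableSchema == null) {
>       tableSchema = MessageTypeParser.parseMessageType(schemaString);
>     }
>     return tableSchema;
>   }
> }
> {code}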



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-20 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: HIVE-3454.3.patch

Made additional changes to vectorized functions and unit tests.

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp()
> Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-20 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: (was: HIVE-3454.3.patch)

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp()
> Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10034) Variable substitution broken when a query is contained in double quotes

2015-03-20 Thread Jakub Kukul (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakub Kukul updated HIVE-10034:
---
Description: 
Variable substitution works fine when the SQL passed from the command line is a 
single-quoted string:
{code}
hive --hivevar a=b -e 'create table if not exists b (col int); describe ${a}'
{code}
, but when the query string is contained within double quotes:
{code}
hive --hivevar a=b -e "create table if not exists b (col int); describe ${a}"
{code}
it fails with the following stack:
{code}
NoViableAltException(83@[])
at 
org.apache.hadoop.hive.ql.parse.HiveParser.descStatement(HiveParser.java:16225)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2198)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1392)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1030)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:750)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:1 cannot recognize input near 'describe' '' 
'' in describe statement
{code}

This behaviour is a bit confusing, since CLI accepts double quoted strings in 
other cases.

  was:
Variable substitution works fine when SQL passed from the command line is a 
single quoted string:
{code}
hive --hivevar a=b -e 'create table if not exists b (col int); describe ${a}'
{code}
, but when the query string is contained within double quotes:
{code}
hive --hivevar a=b -e "create table if not exists b (col int); describe ${a}"
{code}
it fails with the following stack:
{code}
NoViableAltException(83@[])
at 
org.apache.hadoop.hive.ql.parse.HiveParser.descStatement(HiveParser.java:16225)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2198)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1392)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1030)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:417)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:750)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(

[jira] [Commented] (HIVE-10007) Support qualified table name in analyze table compute statistics for columns

2015-03-20 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371279#comment-14371279
 ] 

Chaoyu Tang commented on HIVE-10007:


The test failure of transform_acid.q seems unrelated. It could not be 
reproduced in my local env.

> Support qualified table name in analyze table compute statistics for columns
> 
>
> Key: HIVE-10007
> URL: https://issues.apache.org/jira/browse/HIVE-10007
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, Statistics
>Affects Versions: 1.0.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>  Labels: TODOC1.2
> Attachments: HIVE-10007.1.patch, HIVE-10007.patch
>
>
> Currently "analyze table compute statistics for columns" command can not 
> compute column stats for a table in a different database since it does not 
> support qualified table name. You need switch to that table database in order 
> to compute its column stats. For example, you have to "use psqljira", then 
> "analyze table src compute statistics for columns" for the table src under 
> psqljira.
> This JIRA will provide the support to qualified table name in analyze column 
> stats command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]

2015-03-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371274#comment-14371274
 ] 

Jimmy Xiang commented on HIVE-10006:


Thanks for the clarification. Yes, we should be careful about the thread-local 
cache usage.
I think we do need caching, which is good for performance. I prefer patch v5. 
There could be issues with other input formats; we should fix them as we find 
them.
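For readers following along, the leak pattern under discussion looks roughly 
like this generic sketch (invented names, not the actual RSC classes):

{code}
// An unbounded per-thread cache on a long-lived executor thread retains
// every value ever cached, so entries must be evicted or the map bounded.
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalCacheDemo {
  private static final ThreadLocal<Map<String, Object>> CACHE =
      ThreadLocal.withInitial(HashMap::new);

  static void put(String key, Object work) {
    CACHE.get().put(key, work);   // grows forever on a pooled thread
  }

  static void clearAfterQuery() {
    CACHE.get().clear();          // one mitigation: clear at query boundaries
  }
}
{code}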

> RSC has memory leak while execute multi queries.[Spark Branch]
> --
>
> Key: HIVE-10006
> URL: https://issues.apache.org/jira/browse/HIVE-10006
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Critical
>  Labels: Spark-M5
> Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, 
> HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, 
> HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch
>
>
> While executing queries with RSC, the MapWork/ReduceWork count increases all the 
> time, leading to OOM in the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371014#comment-14371014
 ] 

Hive QA commented on HIVE-10006:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705855/HIVE-10006.7-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7644 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_precision
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.testCombine
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/800/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/800/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-800/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705855 - PreCommit-HIVE-SPARK-Build

> RSC has memory leak while execute multi queries.[Spark Branch]
> --
>
> Key: HIVE-10006
> URL: https://issues.apache.org/jira/browse/HIVE-10006
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Critical
>  Labels: Spark-M5
> Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, 
> HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, 
> HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch
>
>
> While executing queries with RSC, the MapWork/ReduceWork count increases all the 
> time, leading to OOM in the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]

2015-03-20 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371009#comment-14371009
 ] 

Xuefu Zhang commented on HIVE-10006:


Re: "Besides, is this ThreadLocal MapWork/ReduceWork cache a newly introduced 
optimization?"

Yes, it was introduced in HIVE-9127. Looks like we need to be careful about this 
thread-local map, indeed.

> RSC has memory leak while execute multi queries.[Spark Branch]
> --
>
> Key: HIVE-10006
> URL: https://issues.apache.org/jira/browse/HIVE-10006
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Critical
>  Labels: Spark-M5
> Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, 
> HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, 
> HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch
>
>
> While executing queries with RSC, the MapWork/ReduceWork count increases all the 
> time, leading to OOM in the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-20 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371002#comment-14371002
 ] 

Dong Chen commented on HIVE-10016:
--

Thanks for your review, [~Ferd]!

Yes, Parquet has a new instance there. The ReadSupport instance on the Hive side 
is just there to provide some info for the ParquetRecordReaderWrapper creation.

> Remove duplicated Hive table schema parsing in DataWritableReadSupport
> --
>
> Key: HIVE-10016
> URL: https://issues.apache.org/jira/browse/HIVE-10016
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-10016-parquet.patch
>
>
> In {{DataWritableReadSupport.init()}}, the table schema is created and its 
> string format is set in conf. When constructing the 
> {{ParquetRecordReaderWrapper}}, the schema is fetched from conf and parsed 
> several times.
> We could remove this repeated schema parsing and improve the speed of 
> getRecordReader a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10030) The unit test of TestNewInputOutputFormat.testNewInputFormat case got failure with JDK1.8

2015-03-20 Thread Yi Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhou updated HIVE-10030:
---
Summary: The unit test of TestNewInputOutputFormat.testNewInputFormat case 
got failure with JDK1.8  (was: The unit test of testNewInputFormat case got 
failure with JDK1.8)

> The unit test of TestNewInputOutputFormat.testNewInputFormat case got failure 
> with JDK1.8
> -
>
> Key: HIVE-10030
> URL: https://issues.apache.org/jira/browse/HIVE-10030
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Yi Zhou
>
> Running org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.126 sec 
> <<< FAILURE! - in org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
> testNewInputFormat(org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat) 
>  Time elapsed: 4.938 sec  <<< FAILURE!
> junit.framework.ComparisonFailure: expected:<...in}, {1234, hat}], 
> {[mauddib={1, mauddib}, chani={5, chani]}}, 2000-03-12 15:00...> but 
> was:<...in}, {1234, hat}], {[chani={5, chani}, mauddib={1, mauddib]}}, 
> 2000-03-12 15:00...>
> at junit.framework.Assert.assertEquals(Assert.java:100)
> at junit.framework.Assert.assertEquals(Assert.java:107)
> at 
> org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormat(TestNewInputOutputFormat.java:125)
> To reproduce this test case, run "mvn test 
> -Dtest=TestNewInputOutputFormat#testNewInputFormat -Phadoop-2"
>   
>   
>   
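> The expected/actual strings differ only in map entry order, which is 
> unspecified for HashMap and changed under JDK 8; a sketch of an 
> order-insensitive comparison (invented names, not the actual test code):
> {code}
> // Asserting on a map's toString() is fragile because HashMap iteration
> // order is unspecified; sort the keys before comparing instead.
> import java.util.HashMap;
> import java.util.Map;
> import java.util.TreeMap;
>
> public class OrderInsensitiveAssert {
>   static String canonical(Map<String, String> m) {
>     return new TreeMap<>(m).toString(); // deterministic key order
>   }
>
>   public static void main(String[] args) {
>     Map<String, String> m = new HashMap<>();
>     m.put("mauddib", "1");
>     m.put("chani", "5");
>     System.out.println(canonical(m)); // {chani=5, mauddib=1} on any JDK
>   }
> }
> {code}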



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10016) Remove duplicated Hive table schema parsing in DataWritableReadSupport

2015-03-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370962#comment-14370962
 ] 

Ferdinand Xu commented on HIVE-10016:
-

Hi Dong, one question here:
Are we maintaining two copies of DataWritableReadSupport 
in Parquet and Hive? Sorry for my newbie question.

> Remove duplicated Hive table schema parsing in DataWritableReadSupport
> --
>
> Key: HIVE-10016
> URL: https://issues.apache.org/jira/browse/HIVE-10016
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-10016-parquet.patch
>
>
> In {{DataWritableReadSupport.init()}}, the table schema is created and its 
> string format is set in conf. When constructing the 
> {{ParquetRecordReaderWrapper}}, the schema is fetched from conf and parsed 
> several times.
> We could remove this repeated schema parsing and improve the speed of 
> getRecordReader a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10031) Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor

2015-03-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370961#comment-14370961
 ] 

Ferdinand Xu commented on HIVE-10031:
-

Hi Dong,
AFAIK, the class Utilities will copy the table 
configurations into this conf. If we don't do a clone operation on this 
configuration, it will pollute the original configuration when you get the 
parquet input split next time. At the same time, do you have any data about the 
time savings without the clone?

Any thoughts?

> Modify the using of jobConf variable in ParquetRecordReaderWrapper constructor
> --
>
> Key: HIVE-10031
> URL: https://issues.apache.org/jira/browse/HIVE-10031
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-10031-parquet.patch
>
>
> In the {{ParquetRecordReaderWrapper}} constructor, it creates splits, sets 
> projections and filters in conf, creates the task context, and then creates the 
> Parquet record reader. In this procedure, we could improve the logic of conf usage:
> 1. the clone of jobConf is not necessary. This could speed up getRecordReader 
> a little.
> 2. the updated jobConf is not passed to Parquet in one case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10033) The unit test of TestWebHCatE2e.getStatus case got failure with JDK1.8

2015-03-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370957#comment-14370957
 ] 

Yi Zhou commented on HIVE-10033:


To reproduce:
mvn test -Dtest=TestWebHCatE2e#getStatus -Phadoop-2

> The unit test of TestWebHCatE2e.getStatus case got failure with JDK1.8
> --
>
> Key: HIVE-10033
> URL: https://issues.apache.org/jira/browse/HIVE-10033
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Yi Zhou
>
> Running org.apache.hive.hcatalog.templeton.TestWebHCatE2e
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.385 sec <<< 
> FAILURE! - in org.apache.hive.hcatalog.templeton.TestWebHCatE2e
> getStatus(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time elapsed: 
> 0.876 sec  <<< FAILURE!
> junit.framework.ComparisonFailure: GET 
> http://localhost:58283/templeton/v1/status?user.name=johndoe 
> {"version":"v1","status":"ok"} expected:<{"[status":"ok","version":"v1]"}> 
> but was:<{"[version":"v1","status":"ok]"}>
> at junit.framework.Assert.assertEquals(Assert.java:100)
> at 
> org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus(TestWebHCatE2e.java:97)
> Results :
> Failed tests:
>   TestWebHCatE2e.getStatus:97 GET 
> http://localhost:58283/templeton/v1/status?user.name=johndoe 
> {"version":"v1","status":"ok"} expected:<{"[status":"ok","version":"v1]"}> 
> but was:<{"[version":"v1","status":"ok]"}>
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
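
The expected and actual strings differ only in key order, likely because HashMap iteration order changed in JDK 1.8. One hedged way to make such an assertion order-insensitive (using Jackson purely for illustration; not necessarily the eventual fix) is to compare parsed JSON trees instead of raw strings:

{noformat}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative sketch: JSON object equality should ignore key order, so
// compare parsed trees rather than the raw response string.
public class JsonCompareDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode expected = mapper.readTree("{\"status\":\"ok\",\"version\":\"v1\"}");
        JsonNode actual   = mapper.readTree("{\"version\":\"v1\",\"status\":\"ok\"}");
        // ObjectNode.equals compares fields by name, ignoring order.
        System.out.println(expected.equals(actual)); // true
    }
}
{noformat}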



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10030) The unit test of testNewInputFormat case got failure with JDK1.8

2015-03-20 Thread Yi Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhou updated HIVE-10030:
---
Summary: The unit test of testNewInputFormat case got failure with JDK1.8  
(was: The unit test of testNewInputFormat case got failure)

> The unit test of testNewInputFormat case got failure with JDK1.8
> 
>
> Key: HIVE-10030
> URL: https://issues.apache.org/jira/browse/HIVE-10030
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Yi Zhou
>
> Running org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.126 sec 
> <<< FAILURE! - in org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
> testNewInputFormat(org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat) 
>  Time elapsed: 4.938 sec  <<< FAILURE!
> junit.framework.ComparisonFailure: expected:<...in}, {1234, hat}], 
> {[mauddib={1, mauddib}, chani={5, chani]}}, 2000-03-12 15:00...> but 
> was:<...in}, {1234, hat}], {[chani={5, chani}, mauddib={1, mauddib]}}, 
> 2000-03-12 15:00...>
> at junit.framework.Assert.assertEquals(Assert.java:100)
> at junit.framework.Assert.assertEquals(Assert.java:107)
> at 
> org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormat(TestNewInputOutputFormat.java:125)
> To reproduce this test case, run "mvn test 
> -Dtest=TestNewInputOutputFormat#testNewInputFormat -Phadoop-2"
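
The expected and actual values differ only in the iteration order of a map, which suggests the same JDK 1.8 HashMap ordering change as in HIVE-10033. A small sketch of the fragility, and one deterministic alternative (assuming a sorted rendering is acceptable for the test):

{noformat}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: HashMap iteration order is unspecified and changed
// between JDK 1.7 and 1.8, so asserting on a map's toString() is fragile.
public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("mauddib", 1);
        m.put("chani", 5);
        System.out.println(m);                // order depends on the JDK
        System.out.println(new TreeMap<>(m)); // always {chani=5, mauddib=1}
    }
}
{noformat}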



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (HIVE-10032) Remove HCatalog broken java file from source code

2015-03-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370946#comment-14370946
 ] 

Hive QA commented on HIVE-10032:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12705843/HIVE-10032.patch

{color:green}SUCCESS:{color} +1 7819 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3089/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3089/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3089/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12705843 - PreCommit-HIVE-TRUNK-Build

> Remove HCatalog broken java file from source code
> -
>
> Key: HIVE-10032
> URL: https://issues.apache.org/jira/browse/HIVE-10032
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>Priority: Minor
> Attachments: HIVE-10032.patch
>
>
> Remove all broken HCatalog Java files from the source code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10030) The unit test of testNewInputFormat case got failure

2015-03-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370947#comment-14370947
 ] 

Yi Zhou commented on HIVE-10030:


This unit test case was run with JDK 1.8.

> The unit test of testNewInputFormat case got failure
> 
>
> Key: HIVE-10030
> URL: https://issues.apache.org/jira/browse/HIVE-10030
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Yi Zhou
>
> Running org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.126 sec 
> <<< FAILURE! - in org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat
> testNewInputFormat(org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat) 
>  Time elapsed: 4.938 sec  <<< FAILURE!
> junit.framework.ComparisonFailure: expected:<...in}, {1234, hat}], 
> {[mauddib={1, mauddib}, chani={5, chani]}}, 2000-03-12 15:00...> but 
> was:<...in}, {1234, hat}], {[chani={5, chani}, mauddib={1, mauddib]}}, 
> 2000-03-12 15:00...>
> at junit.framework.Assert.assertEquals(Assert.java:100)
> at junit.framework.Assert.assertEquals(Assert.java:107)
> at 
> org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormat(TestNewInputOutputFormat.java:125)
> To reproduce this test case, run "mvn test 
> -Dtest=TestNewInputOutputFormat#testNewInputFormat -Phadoop-2"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

