[GitHub] hive pull request #486: [AXE] Add `toString' interface to abstract operator ...

2018-11-08 Thread czkkkkkk
GitHub user czkk opened a pull request:

https://github.com/apache/hive/pull/486

[AXE] Add `toString' interface to abstract operator desc



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/czkk/hive dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #486


commit 2c4a99fe83073310428ea61d5462245738a84092
Author: zekucai 
Date:   2018-11-08T12:43:48Z

[AXE] Add `toString' interface to abstract operator desc




---


[GitHub] hive pull request #486: [AXE] Add `toString' interface to abstract operator ...

2018-11-08 Thread czkkkkkk
GitHub user czkk closed the pull request at:

https://github.com/apache/hive/pull/486


---


Re: Review Request 69107: HIVE-20512

2018-11-08 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Nov. 8, 2018, 5:16 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Added awaitTermination and shutdownNow after cancelling the thread in close().
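
For readers unfamiliar with the pattern, here is a minimal sketch of a close() that cancels a periodic task and then shuts its executor down. It is not the code from the patch: the class name PeriodicLoggerSketch, the one-second timeout, and the exact ordering of the calls are assumptions made for illustration; only the java.util.concurrent calls themselves are standard.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a handler that logs record/memory usage periodically.
class PeriodicLoggerSketch implements AutoCloseable {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final ScheduledFuture<?> task;

  PeriodicLoggerSketch(Runnable logAction, long periodMillis) {
    // Run the logging action repeatedly on a background thread.
    task = executor.scheduleAtFixedRate(logAction, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    // Cancel the periodic task, interrupting it if it is currently running.
    task.cancel(true);
    // Stop accepting new work, then give in-flight work a bounded time to finish.
    executor.shutdown();
    try {
      if (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
        // Assumed timeout elapsed: force termination.
        executor.shutdownNow();
      }
    } catch (InterruptedException e) {
      executor.shutdownNow();
      // Preserve the interrupt status for the caller.
      Thread.currentThread().interrupt();
    }
  }

  boolean isShutdown() {
    return executor.isShutdown();
  }
}
```

The bounded wait keeps close() from hanging indefinitely if the logging task is stuck, which is presumably why the forced shutdown was added as a fallback.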


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/7/

Changes: https://reviews.apache.org/r/69107/diff/6-7/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



[jira] [Created] (HIVE-20894) Clean Up JDBC HiveQueryResultSet

2018-11-08 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20894:
--

 Summary: Clean Up JDBC HiveQueryResultSet
 Key: HIVE-20894
 URL: https://issues.apache.org/jira/browse/HIVE-20894
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 4.0.0
Reporter: BELUGA BEHR






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20893) BloomK Filter probing method is not thread safe

2018-11-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20893:
-

 Summary: BloomK Filter probing method is not thread safe
 Key: HIVE-20893
 URL: https://issues.apache.org/jira/browse/HIVE-20893
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Reporter: slim bouguerra


As far as I can tell, this is not an issue for Hive yet (most probing seems to 
be done by one thread at a time), but it is an issue for other users such as 
Druid, as described in the following issue: 
[https://github.com/apache/incubator-druid/issues/6546]

The fix proposed by the author of 
[https://github.com/apache/incubator-druid/pull/6584] is to turn a couple of 
local fields into ThreadLocals.

The idea looks good to me and doesn't have any perf drawbacks.
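
To illustrate the general idea only (this is not the BloomKFilter code nor the Druid patch), here is a minimal sketch of a probe method whose scratch buffer lives in a ThreadLocal instead of a shared instance field. The class name, the 1024-bit size, and the way the two bit positions are derived from the hash are all assumptions.

```java
// Hypothetical filter; only the ThreadLocal scratch-buffer pattern is the point here.
class ThreadLocalScratchSketch {
  private static final int NUM_BITS = 1024;

  // One scratch buffer per thread. As a shared instance field, two threads
  // probing concurrently could overwrite each other's positions.
  private static final ThreadLocal<int[]> SCRATCH =
      ThreadLocal.withInitial(() -> new int[2]);

  private final long[] bits = new long[NUM_BITS / 64];

  // Derive two bit positions from a 64-bit hash (illustrative scheme only).
  private static void fillPositions(long hash, int[] out) {
    int h1 = (int) hash;
    int h2 = (int) (hash >>> 32);
    out[0] = Math.floorMod(h1, NUM_BITS);
    out[1] = Math.floorMod(h1 + h2, NUM_BITS);
  }

  // Writes are not synchronized here; this sketch only makes probing safe.
  void add(long hash) {
    int[] pos = SCRATCH.get();
    fillPositions(hash, pos);
    for (int p : pos) {
      bits[p >>> 6] |= 1L << (p & 63);
    }
  }

  boolean mightContain(long hash) {
    int[] pos = SCRATCH.get();
    fillPositions(hash, pos);
    for (int p : pos) {
      if ((bits[p >>> 6] & (1L << (p & 63))) == 0) {
        return false;
      }
    }
    return true;
  }
}
```

Each thread reuses its own buffer, so no allocation is added per probe, which is consistent with the expectation that the change should not cost performance.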

 





[jira] [Created] (HIVE-20891) Call alter_partition in batch when dynamically loading partitions

2018-11-08 Thread Laszlo Pinter (JIRA)
Laszlo Pinter created HIVE-20891:


 Summary: Call alter_partition in batch when dynamically loading partitions
 Key: HIVE-20891
 URL: https://issues.apache.org/jira/browse/HIVE-20891
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 4.0.0
Reporter: Laszlo Pinter
Assignee: Laszlo Pinter


When dynamically loading partitions, setStatsPropAndAlterPartition() is 
called once per partition, resulting in unnecessary calls to the 
metastore client. This logic can be changed to make a single batched call. 
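
The difference between the two call patterns can be sketched as follows. PartitionAlterer is a hypothetical stand-in for the metastore client, not the Hive API, and partitions are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchAlterSketch {
  /** Hypothetical stand-in for a metastore client; each call is one round trip. */
  interface PartitionAlterer {
    void alterPartitions(List<String> partitionNames);
  }

  // Current pattern as described above: one client call per partition.
  static void alterOneByOne(List<String> partitions, PartitionAlterer client) {
    for (String partition : partitions) {
      client.alterPartitions(Collections.singletonList(partition));
    }
  }

  // Proposed pattern: collect all partitions and make a single call.
  static void alterInBatch(List<String> partitions, PartitionAlterer client) {
    if (partitions.isEmpty()) {
      return; // nothing to alter, so skip the round trip entirely
    }
    client.alterPartitions(new ArrayList<>(partitions));
  }
}
```

With N dynamically loaded partitions, the first method makes N client calls and the second makes one.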





[jira] [Created] (HIVE-20895) Cleanup JdbcColumn Class

2018-11-08 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20895:
--

 Summary: Cleanup JdbcColumn Class
 Key: HIVE-20895
 URL: https://issues.apache.org/jira/browse/HIVE-20895
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 3.1.1, 4.0.0
Reporter: BELUGA BEHR








[jira] [Created] (HIVE-20886) Fix NPE: GenericUDFLower

2018-11-08 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-20886:
---

 Summary: Fix NPE: GenericUDFLower
 Key: HIVE-20886
 URL: https://issues.apache.org/jira/browse/HIVE-20886
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan


{noformat}
create table if not exists test1(uuid array);
select lower(uuid) from test1;

Error: Error while compiling statement: FAILED: NullPointerException null 
(state=42000,code=4)
{noformat}





Review Request 69294: HIVE-20826 Enhance HiveSemiJoin rule to convert join + group by on left side to Left Semi Join

2018-11-08 Thread Vineet Garg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69294/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20826
https://issues.apache.org/jira/browse/HIVE-20826


Repository: hive-git


Description
---

See jira


Diffs
-

  
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSemiJoinRule.java 7799090d43 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 11c8f5f02c 
  ql/src/test/queries/clientpositive/semijoin.q 144069bbe6 
  ql/src/test/results/clientpositive/llap/semijoin.q.out 531ef46c78 
  ql/src/test/results/clientpositive/perf/tez/cbo_query14.q.out 9bb4f2e7f2 
  ql/src/test/results/clientpositive/perf/tez/query14.q.out c078c271ec 
  ql/src/test/results/clientpositive/spark/semijoin.q.out a787bce4b4 


Diff: https://reviews.apache.org/r/69294/diff/1/


Testing
---


Thanks,

Vineet Garg



[jira] [Created] (HIVE-20889) Support timestamp-micros in AvroSerDe

2018-11-08 Thread vinisha (JIRA)
vinisha created HIVE-20889:
--

 Summary: Support timestamp-micros in AvroSerDe
 Key: HIVE-20889
 URL: https://issues.apache.org/jira/browse/HIVE-20889
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: vinisha


This change only supports timestamp-millis. Avro 1.8.2 also supports 
timestamp-micros. 
[https://avro.apache.org/docs/1.8.2/spec.html#Timestamp+%28microsecond+precision%29]

timestamp-micros should also be supported in the Hive AvroSerDe because Hive 
timestamps support nanosecond-level precision.

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-TimestampstimestampTimestamps]

One possibility is to support both Avro timestamp-millis and Avro timestamp-micros 
in serialization. The Avro deserializer can map Hive timestamps to timestamp-micros. 
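
As a rough illustration of the arithmetic involved, the sketch below converts between a microseconds-since-epoch long (the representation Avro uses for timestamp-micros) and a nanosecond-precision java.time.Instant. The class and method names are hypothetical and this is not AvroSerDe code; any sub-microsecond digits are truncated on the way out.

```java
import java.time.Instant;

class TimestampMicrosSketch {
  private static final long MICROS_PER_SECOND = 1_000_000L;

  // Epoch microseconds -> Instant. floorDiv/floorMod keep pre-1970 values correct.
  static Instant fromEpochMicros(long micros) {
    long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
    long nanos = Math.floorMod(micros, MICROS_PER_SECOND) * 1_000L;
    return Instant.ofEpochSecond(seconds, nanos);
  }

  // Instant -> epoch microseconds, dropping digits below one microsecond.
  static long toEpochMicros(Instant ts) {
    long wholeSeconds = Math.multiplyExact(ts.getEpochSecond(), MICROS_PER_SECOND);
    return Math.addExact(wholeSeconds, ts.getNano() / 1_000);
  }
}
```

A millisecond representation would lose three more decimal digits than this one, which is the precision gap the issue describes.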





[jira] [Created] (HIVE-20892) Benchmark XXhash for 64 bit hashing function instead of Murmum hash

2018-11-08 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20892:
-

 Summary: Benchmark XXhash for 64 bit hashing function instead of Murmum hash
 Key: HIVE-20892
 URL: https://issues.apache.org/jira/browse/HIVE-20892
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
Assignee: slim bouguerra


https://cyan4973.github.io/xxHash/
FYI, this is used by a lot of other MPP systems ...





[jira] [Created] (HIVE-20888) TxnHandler: sort() called on immutable lists

2018-11-08 Thread Gopal V (JIRA)
Gopal V created HIVE-20888:
--

 Summary: TxnHandler: sort() called on immutable lists
 Key: HIVE-20888
 URL: https://issues.apache.org/jira/browse/HIVE-20888
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


{code}
    } else {
      assert (!rqst.isSetSrcTxnToWriteIdList());
      assert (rqst.isSetTxnIds());
      txnIds = rqst.getTxnIds();
    }

    Collections.sort(txnIds); //easier to read logs and for assumption done in replication flow
{code}

when the input comes from

{code}
  @Override
  public long allocateTableWriteId(long txnId, String dbName, String tableName) throws TException {
    return allocateTableWriteIdsBatch(Collections.singletonList(txnId), dbName, tableName).get(0).getWriteId();
  }
{code}

{code}
java.lang.UnsupportedOperationException: null
	at java.util.AbstractList.set(AbstractList.java:132) ~[?:1.8.0]
	at java.util.AbstractList$ListItr.set(AbstractList.java:426) ~[?:1.8.0]
	at java.util.Collections.sort(Collections.java:170) ~[?:1.8.0]
	at org.apache.hadoop.hive.metastore.txn.TxnHandler.allocateTableWriteIds(TxnHandler.java:1523) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.allocate_table_write_ids(HiveMetaStore.java:7349) ~[hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
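
One possible way to avoid sorting a caller-supplied list in place is to sort a defensive copy, sketched below. This is an assumption about how the problem could be addressed, not the committed fix; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedCopySketch {
  // Returns a sorted, mutable copy and leaves the caller's list untouched,
  // so it does not matter whether the input supports set().
  static List<Long> sortedCopy(List<Long> txnIds) {
    List<Long> copy = new ArrayList<>(txnIds);
    Collections.sort(copy);
    return copy;
  }
}
```

The copy costs one extra allocation per request, but it removes any dependence on which List implementation the caller happened to pass in.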







Re: Review Request 69202: HIVE-20804 Further improvements to group by optimization with constraints

2018-11-08 Thread Jesús Camacho Rodríguez


> On Nov. 7, 2018, 2:09 a.m., Jesús Camacho Rodríguez wrote:
> > ql/src/test/queries/clientpositive/constraints_optimization.q
> > Lines 355 (patched)
> > 
> >
> > Can we add two more tests:
> > - One with column swapping before GroupBy (probably if you use group by 
> > b,c,a and table contains a,b,c, it should work and add the Project in 
> > between the TS and the GroupBy).
> > - One with a join and a group by on one column for other table that is 
> > also the join key of the table where all columns are coming from (as in the 
> > whiteboard).
> 
> Vineet Garg wrote:
> I have added the first test, but the one with join doesn't work
> e.g. 
> -- transitive equivalence on pk column, therefore all other columns should be removed
> EXPLAIN   CBO
> SELECT
>   C_FIRST_NAME
> FROM
>   CUSTOMER
> , STORE_SALES
> WHERE
>   C_CUSTOMER_SK   =   SS_CUSTOMER_SK
> GROUP BY
>   SS_CUSTOMER_SK
> , C_FIRST_NAME
> , C_LAST_NAME
> , C_PREFERRED_CUST_FLAG
> , C_BIRTH_COUNTRY
> , C_LOGIN
> , C_EMAIL_ADDRESS
> ;
> C_CUSTOMER_SK here is a key, so ideally we should remove all columns from the 
> group by except SS_CUSTOMER_SK and C_FIRST_NAME, but getExpressionLineage 
> returns only STORE_SALES as the ref for the SS_CUSTOMER_SK column. 
> I looked at the RelMdExpressionLineage logic for join and it doesn't look 
> like it takes the join condition into account while determining lineage.

You are right, it will need additional logic :( We can create a follow-up for 
that.


- Jesús


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69202/#review210362
---


On Nov. 7, 2018, 7:39 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69202/
> ---
> 
> (Updated Nov. 7, 2018, 7:39 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-20804
> https://issues.apache.org/jira/browse/HIVE-20804
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See Jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 9aa30129b6 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java b7c31bdfca 
>   ql/src/test/queries/clientpositive/constraints_optimization.q 70ab8509c5 
>   ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 96caa4d6dd 
> 
> 
> Diff: https://reviews.apache.org/r/69202/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>



Work Needs Review and Commit

2018-11-08 Thread dam6923
Hello Team,

I've been trying to focus my latest work on making the logging of the
service more concise and more useful.  There's a lot of clutter in the logs
that makes troubleshooting difficult.  I've also submitted a few ideas for
other small improvements.

Can you please assist in reviewing and committing the following?

https://issues.apache.org/jira/browse/HIVE-20831
https://issues.apache.org/jira/browse/HIVE-18902
https://issues.apache.org/jira/browse/HIVE-20255
https://issues.apache.org/jira/browse/HIVE-20239
https://issues.apache.org/jira/browse/HIVE-20223
https://issues.apache.org/jira/browse/HIVE-20161
https://issues.apache.org/jira/browse/HIVE-20160
https://issues.apache.org/jira/browse/HIVE-20484
https://issues.apache.org/jira/browse/HIVE-19846
https://issues.apache.org/jira/browse/HIVE-19403

Thanks!


Re: Work Needs Review and Commit

2018-11-08 Thread Prasanth Jayachandran
Reviewed the patches that weren’t already reviewed. Please ping on the JIRA 
after a green test run so they can be committed.

Thanks
Prasanth




[jira] [Created] (HIVE-20896) CachedStore fail to cache stats in multiple code paths

2018-11-08 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-20896:
-

 Summary: CachedStore fail to cache stats in multiple code paths
 Key: HIVE-20896
 URL: https://issues.apache.org/jira/browse/HIVE-20896
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Daniel Dai
Assignee: Daniel Dai


A bunch of issues were discovered in how CachedStore keeps column statistics up to date:
1. The criterion for partitioned/non-partitioned tables is wrong (table.isSetPartitionKeys() is always true)
2. In update(), partition column stats are removed when populating table basic stats
3. Dirty flags are true right after prewarm(), so the first update() does not do anything
4. Could invoke cacheLock without holding the lock, which results in a freeze in update()


