[jira] [Created] (HIVE-20608) Incorrect handling of sql command args in hive service leading to misleading error messages

2018-09-19 Thread Soumabrata Chakraborty (JIRA)
Soumabrata Chakraborty created HIVE-20608:
-

 Summary: Incorrect handling of sql command args in hive service 
leading to misleading error messages
 Key: HIVE-20608
 URL: https://issues.apache.org/jira/browse/HIVE-20608
 Project: Hive
  Issue Type: Bug
Reporter: Soumabrata Chakraborty
Assignee: Soumabrata Chakraborty






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2018-09-19 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-20607:
---

 Summary: TxnHandler should use PreparedStatement to execute direct 
SQL queries.
 Key: HIVE-20607
 URL: https://issues.apache.org/jira/browse/HIVE-20607
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore, Transactions
Affects Versions: 4.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
 Fix For: 4.0.0


TxnHandler uses direct SQL queries to operate on transaction-related tables in the 
Hive metastore RDBMS.
Most of the methods are direct calls from the Metastore API and currently append 
input string arguments directly into the SQL string.
These arguments should instead be set through a parameterized PreparedStatement.
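
For illustration, a minimal sketch of the pattern being proposed, assuming a plain JDBC Connection; the TXNS/TXN_ID/TXN_STATE names are only illustrative and the code is not taken from TxnHandler itself:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TxnQueryExample {

  // Fragile pattern: the argument is appended directly into the SQL text.
  static ResultSet lookupByConcat(Connection conn, String txnState) throws SQLException {
    String sql = "SELECT TXN_ID FROM TXNS WHERE TXN_STATE = '" + txnState + "'";
    return conn.createStatement().executeQuery(sql);
  }

  // Proposed pattern: the argument is bound through a parameterized PreparedStatement.
  static ResultSet lookupByParameter(Connection conn, String txnState) throws SQLException {
    PreparedStatement ps = conn.prepareStatement("SELECT TXN_ID FROM TXNS WHERE TXN_STATE = ?");
    ps.setString(1, txnState);
    return ps.executeQuery();
  }
}
{code}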



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20606) hive3.1 beeline to dns complaining about ssl on ip

2018-09-19 Thread t oo (JIRA)
t oo created HIVE-20606:
---

 Summary: hive3.1 beeline to dns complaining about ssl on ip
 Key: HIVE-20606
 URL: https://issues.apache.org/jira/browse/HIVE-20606
 Project: Hive
  Issue Type: Bug
  Components: Beeline, HiveServer2
Affects Versions: 3.1.0
Reporter: t oo


Why is Beeline complaining about the IP address when I use the DNS name in the 
connection string? I have a valid cert/JKS for the DNS name. The exact same 
Beeline command worked when running against Hive 2.3.2, but this is Hive 3.1.0.

[ec2-user@ip-10-1-2-3 logs]$ $HIVE_HOME/bin/beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/lib/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/lib/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 3.1.0 by Apache Hive
beeline> !connect 
jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
 userhere passhere
Connecting to 
jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
18/09/20 04:49:06 [main]: WARN jdbc.HiveConnection: Failed to connect to 
mydns:1
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit:
 javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
No subject alternative names matching IP address 10.1.2.3 found 
(state=08S01,code=0)
beeline>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20605) merge master-tez092 branch into master

2018-09-19 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-20605:
---

 Summary: merge master-tez092 branch into master
 Key: HIVE-20605
 URL: https://issues.apache.org/jira/browse/HIVE-20605
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


I got tired of waiting for the Tez 0.92 release (it's been pending for half a year), 
so I created a branch to prevent various patches from conflicting with each other.
This JIRA is to merge them into master after Tez 0.92 is finally released.
The JIRAs here: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%20master-tez092
should then be updated with the corresponding Hive release version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20604) Minor compaction disables ORC column stats

2018-09-19 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20604:
-

 Summary: Minor compaction disables ORC column stats
 Key: HIVE-20604
 URL: https://issues.apache.org/jira/browse/HIVE-20604
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 4.0.0


{noformat}
  @Override
  public org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter
      getRawRecordWriter(Path path, Options options) throws IOException {
    final Path filename = AcidUtils.createFilename(path, options);
    final OrcFile.WriterOptions opts =
        OrcFile.writerOptions(options.getTableProperties(), options.getConfiguration());
    if (!options.isWritingBase()) {
      opts.bufferSize(OrcRecordUpdater.DELTA_BUFFER_SIZE)
          .stripeSize(OrcRecordUpdater.DELTA_STRIPE_SIZE)
          .blockPadding(false)
          .compress(CompressionKind.NONE)
          .rowIndexStride(0);
    }
{noformat}

{{rowIndexStride(0)}} makes {{StripeStatistics.getColumnStatistics()}} return 
objects, but with meaningless values, e.g. min/max for 
{{IntegerColumnStatistics}} set to MIN_LONG/MAX_LONG.

This not only interferes with the ability to infer the minimum ROW_ID for a split, 
it also produces inefficient files.
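
For reference, a hedged sketch of how these stripe-level statistics can be inspected through the standalone org.apache.orc reader API; the file path argument and the choice of column index are placeholders, not part of this issue:

{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.ColumnStatistics;
import org.apache.orc.IntegerColumnStatistics;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.StripeStatistics;

public class StripeStatsProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // args[0]: placeholder path to a delta file written with rowIndexStride(0)
    Reader reader = OrcFile.createReader(new Path(args[0]), OrcFile.readerOptions(conf));
    List<StripeStatistics> stripes = reader.getStripeStatistics();
    for (StripeStatistics stripe : stripes) {
      ColumnStatistics[] cols = stripe.getColumnStatistics();
      // Column 1 is assumed to be an integer column for this illustration.
      IntegerColumnStatistics intStats = (IntegerColumnStatistics) cols[1];
      // With the row index disabled, min/max come back as MIN_LONG/MAX_LONG
      // sentinels instead of real values.
      System.out.println("min=" + intStats.getMinimum() + " max=" + intStats.getMaximum());
    }
  }
}
{code}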



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20603) "Wrong FS" error when inserting to partition after changing table location filesystem

2018-09-19 Thread Jason Dere (JIRA)
Jason Dere created HIVE-20603:
-

 Summary: "Wrong FS" error when inserting to partition after 
changing table location filesystem
 Key: HIVE-20603
 URL: https://issues.apache.org/jira/browse/HIVE-20603
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere


Inserting into an existing partition, after changing a table's location to 
point to a different HDFS filesystem:
{noformat}
query += "CREATE TABLE test_managed_tbl (id int, name string, dept string) PARTITIONED BY (year int);\n"
query += "INSERT INTO test_managed_tbl PARTITION (year=2016) VALUES (8,'Henry','CSE');\n"
query += "ALTER TABLE test_managed_tbl ADD PARTITION (year=2017);\n"
query += "ALTER TABLE test_managed_tbl SET LOCATION 'hdfs://ns2/warehouse/tablespace/managed/hive/test_managed_tbl'"
query += "INSERT INTO test_managed_tbl PARTITION (year=2017) VALUES (9,'Harris','CSE');\n"
{noformat}

Results in the following error:
{noformat}
java.lang.IllegalArgumentException: Wrong FS: hdfs://ns1/warehouse/tablespace/managed/hive/test_managed_tbl/year=2017, expected: hdfs://ns2
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:781)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:240)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1580)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1595)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1734)
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:4141)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1966)
at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:477)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:397)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:210)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2701)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2372)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2048)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1746)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1740)
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

2018-09-19 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
---

(Updated szept. 19, 2018, 11:14 du)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu 
Zhang.


Repository: hive-git


Description (updated)
---

I've modified SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the 
interned string of the path.
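
As a rough sketch of this approach (not the actual patch), assuming Guava's CacheBuilder with softValues() and the Cache.get(key, loader) overload; key and value types are simplified placeholders:

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class SmallTableCacheSketch {

  // Values are held through soft references, so the JVM can reclaim them under memory pressure.
  private final Cache<String, Object> cache = CacheBuilder.newBuilder().softValues().build();

  // The value loader is invoked at most once per absent key; concurrent callers for the same
  // key block on the load, which replaces explicit synchronization on an interned path string.
  public Object getOrLoad(String path, Callable<Object> loader) throws ExecutionException {
    return cache.get(path, loader);
  }
}
{code}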


Diffs (updated)
-

  ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 
cf27e92bafdc63096ec0fa8c3106657bab52f370 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 
3293100af96dc60408c53065fa89143ead98f818 
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/68474/diff/2/

Changes: https://reviews.apache.org/r/68474/diff/1-2/


Testing
---


Thanks,

Antal Sinkovits



Review Request 68772: HIVE-20593

2018-09-19 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68772/
---

Review request for hive and Eugene Koifman.


Bugs: HIVE-20593
https://issues.apache.org/jira/browse/HIVE-20593


Repository: hive-git


Description
---

Load Data for partitioned ACID tables fails with bucketId out of range: -1

The tempTblObj is inherited from the target table. However, the only table property 
that needs to be inherited is the bucketing version; properties like transactional, 
etc. should be ignored.
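
A minimal illustration of that filtering idea, assuming the table property key is "bucketing_version" and using a plain properties map in place of the real table object; this is a sketch, not the patch:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class TempTablePropsSketch {

  // Assumed property key for the bucketing version.
  private static final String BUCKETING_VERSION = "bucketing_version";

  // Copy only the bucketing version from the target table's properties,
  // deliberately dropping transactional and all other table properties.
  static Map<String, String> inheritForTempTable(Map<String, String> targetProps) {
    Map<String, String> tempProps = new HashMap<>();
    if (targetProps.containsKey(BUCKETING_VERSION)) {
      tempProps.put(BUCKETING_VERSION, targetProps.get(BUCKETING_VERSION));
    }
    return tempProps;
  }
}
{code}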


Diffs
-

  
data/files/load_data_job_acid/20180918230307-b382b8c7-271c-4025-be64-4a68f4db32e5_0_0
 PRE-CREATION 
  
data/files/load_data_job_acid/20180918230307-b382b8c7-271c-4025-be64-4a68f4db32e5_1_0
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
8d33cf5b23 
  ql/src/test/queries/clientpositive/load_data_using_job.q b760d9bc7e 
  ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 21fd9334ea 


Diff: https://reviews.apache.org/r/68772/diff/1/


Testing
---


Thanks,

Deepak Jaiswal



[jira] [Created] (HIVE-20602) hive3 crashes after 1min

2018-09-19 Thread t oo (JIRA)
t oo created HIVE-20602:
---

 Summary: hive3 crashes after 1min
 Key: HIVE-20602
 URL: https://issues.apache.org/jira/browse/HIVE-20602
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2, Metastore, Standalone Metastore
Affects Versions: 3.0.0
Reporter: t oo


Running the hiveserver2 process (Hive v3.0.0) on EC2 (not EMR), the process 
starts up and for the first minute everything is OK (I can make a Beeline 
connection and create/repair/select external Hive tables), but then the hiveserver2 
process crashes. If I restart the process, it crashes again after a minute even if I 
do nothing. Checking the logs, I see messages like 'number of connections to 
metastore: 1', 'number of connections to metastore: 2', 'number of connections to 
metastore: 3', then 'could not bind to port 1 port already in use', then the end of 
the logs.

I ran experiments on a few different EC2 instances: if I use Hive v2.3.2 the 
hiveserver2 process never crashes, but if I use Hive v3.0.0 it consistently 
crashes after a minute.

The metastore DB is MySQL RDS, and the Hive metastore process never crashed. I can 
see that the external Hive table DDLs are persisted in MySQL (i.e. the DBS and TBLS 
tables).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68765: HIVE-20595: Add findbugs-exclude.xml to metastore-server

2018-09-19 Thread Alexander Kolbasov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68765/#review208776
---


Ship it!




Ship It!

- Alexander Kolbasov


On Sept. 19, 2018, 9:30 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68765/
> ---
> 
> (Updated Sept. 19, 2018, 9:30 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov, Peter Vary, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20595: Add findbugs-exclude.xml to metastore-server
> 
> 
> Diffs
> -
> 
>   standalone-metastore/metastore-server/findbugs/findbugs-exclude.xml 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68765/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



[jira] [Created] (HIVE-20601) EnvironmentContext null in ALTER_PARTITION event in DbNotificationListener

2018-09-19 Thread Bharathkrishna Guruvayoor Murali (JIRA)
Bharathkrishna Guruvayoor Murali created HIVE-20601:
---

 Summary: EnvironmentContext null in ALTER_PARTITION event in 
DbNotificationListener
 Key: HIVE-20601
 URL: https://issues.apache.org/jira/browse/HIVE-20601
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 3.0.0, 4.0.0
Reporter: Bharathkrishna Guruvayoor Murali
Assignee: Bharathkrishna Guruvayoor Murali


Cause: the EnvironmentContext is not passed here:

[https://github.com/apache/hive/blob/36c33ca066c99dfdb21223a711c0c3f33c85b943/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L726]

 

It would be useful to have the EnvironmentContext passed to 
DbNotificationListener in this case, so the listener can tell whether the alter 
happened due to a stats change.
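
A hedged sketch of what a listener could do once the context is passed through; using StatsSetupConst.STATS_GENERATED as the marker is an assumption about how stats-driven alters are flagged, not the actual DbNotificationListener code:

{code:java}
import java.util.Map;
import org.apache.hadoop.hive.common.StatsSetupConst;
import org.apache.hadoop.hive.metastore.api.EnvironmentContext;

public class AlterPartitionContextCheck {

  // Returns true when the (assumed) stats-generated marker is present in the context.
  static boolean isStatsOnlyAlter(EnvironmentContext context) {
    if (context == null || context.getProperties() == null) {
      return false;
    }
    Map<String, String> props = context.getProperties();
    return props.containsKey(StatsSetupConst.STATS_GENERATED);
  }
}
{code}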



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20600) Metastore connection leak

2018-09-19 Thread Damon Cortesi (JIRA)
Damon Cortesi created HIVE-20600:


 Summary: Metastore connection leak
 Key: HIVE-20600
 URL: https://issues.apache.org/jira/browse/HIVE-20600
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 2.3.3
Reporter: Damon Cortesi
 Attachments: consume_threads.py

Within the execute method of HiveServer2, there appears to be a connection 
leak. With a fairly straightforward series of INSERT statements, the connection 
count in the logs continues to increase over time. Under certain loads, this 
can also consume all underlying threads of the Hive metastore and result in HS2 
becoming unresponsive to new connections.

The log below is the result of some Python code executing a single INSERT 
statement and then looping through a series of 10 more INSERT statements. We 
can see there is one dangling connection left open after each execution, leaving 
us with 12 open connections (11 from the execute statements + 1 from HS2 
startup).

{code}
2018-09-19T17:14:32,108 INFO [main([])]: hive.metastore 
(HiveMetaStoreClient.java:open(481)) - Opened a connection to metastore, 
current connections: 1
 2018-09-19T17:14:48,175 INFO [29049f74-73c4-4f48-9cf7-b4bfe524a85b 
HiveServer2-Handler-Pool: Thread-31([])]: hive.metastore 
(HiveMetaStoreClient.java:open(481)) - Opened a connection to metastore, 
current connections: 2
 2018-09-19T17:15:05,543 INFO [HiveServer2-Background-Pool: Thread-36([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 1
 2018-09-19T17:15:05,548 INFO [HiveServer2-Background-Pool: Thread-36([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 2
 2018-09-19T17:15:05,932 INFO [HiveServer2-Background-Pool: Thread-36([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 1
 2018-09-19T17:15:05,935 INFO [HiveServer2-Background-Pool: Thread-36([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 2
 2018-09-19T17:15:06,123 INFO [HiveServer2-Background-Pool: Thread-36([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 1
 2018-09-19T17:15:06,126 INFO [HiveServer2-Background-Pool: Thread-36([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 2
...
 2018-09-19T17:15:20,626 INFO [29049f74-73c4-4f48-9cf7-b4bfe524a85b 
HiveServer2-Handler-Pool: Thread-31([])]: hive.metastore 
(HiveMetaStoreClient.java:open(481)) - Opened a connection to metastore, 
current connections: 12
 2018-09-19T17:15:21,153 INFO [HiveServer2-Background-Pool: Thread-162([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 11
 2018-09-19T17:15:21,155 INFO [HiveServer2-Background-Pool: Thread-162([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 12
 2018-09-19T17:15:21,306 INFO [HiveServer2-Background-Pool: Thread-162([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 11
 2018-09-19T17:15:21,308 INFO [HiveServer2-Background-Pool: Thread-162([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 12
 2018-09-19T17:15:21,385 INFO [HiveServer2-Background-Pool: Thread-162([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 11
 2018-09-19T17:15:21,387 INFO [HiveServer2-Background-Pool: Thread-162([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 12
 2018-09-19T17:15:21,541 INFO [HiveServer2-Handler-Pool: Thread-31([])]: 
hive.metastore (HiveMetaStoreClient.java:open(481)) - Opened a connection to 
metastore, current connections: 13
 2018-09-19T17:15:21,542 INFO [HiveServer2-Handler-Pool: Thread-31([])]: 
hive.metastore (HiveMetaStoreClient.java:close(564)) - Closed a connection to 
metastore, current connections: 12
{code}

Attached is a simple [impyla|https://github.com/cloudera/impyla] script that 
triggers the condition.
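
The attached script uses impyla (Python); as a rough equivalent, the kind of loop described above could look like the following JDBC sketch, with placeholder host, port, and table name:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InsertLoop {
  public static void main(String[] args) throws Exception {
    // Placeholder HS2 URL and table name.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Single insert, then a loop of 10 more inserts, mirroring the scenario above.
      stmt.execute("INSERT INTO leak_test VALUES (0)");
      for (int i = 1; i <= 10; i++) {
        // Each execute is reported to leave one extra metastore connection open.
        stmt.execute("INSERT INTO leak_test VALUES (" + i + ")");
      }
    }
  }
}
{code}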



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20599) CAST(INTERVAL_DAY_TIME AS STRING) is throwing SemanticException

2018-09-19 Thread Naresh P R (JIRA)
Naresh P R created HIVE-20599:
-

 Summary: CAST(INTERVAL_DAY_TIME AS STRING) is throwing 
SemanticException
 Key: HIVE-20599
 URL: https://issues.apache.org/jira/browse/HIVE-20599
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 3.1.0
Reporter: Naresh P R
Assignee: Naresh P R
 Fix For: 3.1.0


SELECT CAST(from_utc_timestamp(timestamp '2018-05-02 15:30:30', 'PST') - 
from_utc_timestamp(timestamp '1970-01-30 16:00:00', 'PST') AS STRING);

throws the exception below:
{code:java}
Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 
Wrong arguments ''PST'': No matching method for class 
org.apache.hadoop.hive.ql.udf.UDFToString with (interval_day_time). Possible 
choices: _FUNC_(bigint)  _FUNC_(binary)  _FUNC_(boolean)  _FUNC_(date)  
_FUNC_(decimal(38,18))  _FUNC_(double)  _FUNC_(float)  _FUNC_(int)  
_FUNC_(smallint)  _FUNC_(string)  _FUNC_(timestamp)  _FUNC_(tinyint)  
_FUNC_(void) (state=42000,code=4){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 68744: Add Surrogate Keys function to Hive

2018-09-19 Thread Miklos Gergely


> On Sept. 18, 2018, 10:35 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
> > Lines 55 (patched)
> > 
> >
> > do udf.setWriteId(3) and call runAndVerifyConst() again to get coverage 
> > on writeId too.

I don't get this, isn't writeId supposed to be set only once for the function? 
Should I add some mechanism to catch if it was set twice, and throw an 
exception?


- Miklos


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68744/#review208739
---


On Sept. 19, 2018, 9:28 a.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68744/
> ---
> 
> (Updated Sept. 19, 2018, 9:28 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20536
> https://issues.apache.org/jira/browse/HIVE-20536
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add new function that allows the generation of a surrogate key composed of 
> the write id, the task id, and an incremental row id.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3f538b3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 3309b9b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 98448e4 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 8d41e78 
> 
> 
> Diff: https://reviews.apache.org/r/68744/diff/3/
> 
> 
> Testing
> ---
> 
> Added a new junit test for the function.
> Tested it in beeline by adding one row, adding multiple rows, adding mutliple 
> rows to multiple tables via multuple insert (all having their own 
> surrogate_key column)
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>



[jira] [Created] (HIVE-20598) Fix computeSortMergeCPUCost calculation

2018-09-19 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20598:
---

 Summary: Fix computeSortMergeCPUCost calculation
 Key: HIVE-20598
 URL: https://issues.apache.org/jira/browse/HIVE-20598
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


HIVE-10343 made the costs changeable via hiveconf settings; however, there was a 
method in which there was already a local variable named cpuCost. The bottom line 
is that the cost of n-way joins calculated by this method is computed as the 
product of the number of rows...

https://github.com/apache/hive/blob/9c907769a63a6b23c91fdf0b3f3d0aa6387035dc/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveAlgorithmsUtil.java#L83



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20597) full table scan in WHERE clause OR condition

2018-09-19 Thread lishiyang (JIRA)
lishiyang created HIVE-20597:


 Summary: full table scan in WHERE clause OR condition
 Key: HIVE-20597
 URL: https://issues.apache.org/jira/browse/HIVE-20597
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, hpl/sql
Affects Versions: 2.3.2
 Environment: hive server2; hive version is 2.3.2; 
Reporter: lishiyang


*In strict mode*, I run the following HQL on a partitioned table:

select * from tbl where date = '2018-08-08' *or* age > 18

The table tbl is partitioned and its partition key is date; age is just an 
attribute column.

This HQL can be executed in strict mode. When the condition is "date = 
'2018-08-08'" the query scans one partition only, but when the condition is 
"age > 18" the query scans the full table.

It seems that in strict mode Hive checks for the presence of a partition condition 
but ignores the logic of the WHERE clause?

Any help is useful!

Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20596) full

2018-09-19 Thread lishiyang (JIRA)
lishiyang created HIVE-20596:


 Summary: full
 Key: HIVE-20596
 URL: https://issues.apache.org/jira/browse/HIVE-20596
 Project: Hive
  Issue Type: Bug
Reporter: lishiyang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Writing Parquet Timestamp and reading from Hive table

2018-09-19 Thread Srinivas M
Hi

We have a Java application which writes Parquet files, using the Parquet 1.9.0
API to write Timestamp data. Since there are incompatibilities between the
Parquet and Hive representations of Timestamp data, we have tried to work
around this by writing the Parquet Timestamp data as a 12-byte array,
converting the Timestamp fields into the format Hive expects. However, while
setting the field type in the schema, since the Avro schema types do not have
an enumeration for INT96, we set the type to bytes, under the assumption that
Hive would still read the data since it is written in the format Hive expects.
However, when we try to read the data from the Hive table, we run into the
following exception.


Questions:
1. Is there any way we can work around this issue by making Hive read the data
when the timestamp field is set as bytes?
2. Is there any way in which the data type can be set as INT96 in the Parquet
schema?

Exception :

Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be
cast to org.apache.hadoop.hive.serde2.io.TimestampWritable


Schema of the file
==================
file schema: parquet.filecc

C1:  REQUIRED INT32 R:0 D:0
C2:  REQUIRED BINARY O:UTF8 R:0 D:0
C3:  REQUIRED BINARY O:UTF8 R:0 D:0
C4:  REQUIRED BINARY R:0 D:0  ---> Timestamp column
C5:  REQUIRED BINARY R:0 D:0  ---> Timestamp column

---

hive> show create table HiveParquetTimestamp;
OK
CREATE EXTERNAL TABLE `HiveParquetTimestamp`(
  `c1` int,
  `c2` char(4),
  `c3` varchar(8),
  `c4` timestamp,
  `c5` timestamp)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://cdhkrb123.fyre.com:8020/tmp/HiveParquetTimestamp'

-- 
Srinivas
(*-*)
--
You have to grow from the inside out. None can teach you, none can make you
spiritual.
  -Narendra Nath Dutta(Swamy Vivekananda)
--


Review Request 68767: HIVE-20551: Create PreparedStatement query dynamically when IN clause is used

2018-09-19 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68767/
---

Review request for hive, Alexander Kolbasov, Peter Vary, and Vihang 
Karajgaonkar.


Repository: hive-git


Description
---

HIVE-20551: Create PreparedStatement query dynamically when IN clause is used
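
A small sketch of the general technique named above: build one "?" placeholder per value and bind each of them; the query and names are illustrative, not the MetaStoreDirectSql code:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

public class DynamicInClause {

  static PreparedStatement prepareIn(Connection conn, List<Long> ids) throws SQLException {
    // One "?" per value, e.g. "?, ?, ?" for three ids.
    String placeholders = ids.stream().map(id -> "?").collect(Collectors.joining(", "));
    PreparedStatement ps = conn.prepareStatement(
        "SELECT TBL_ID, TBL_NAME FROM TBLS WHERE TBL_ID IN (" + placeholders + ")");
    for (int i = 0; i < ids.size(); i++) {
      ps.setLong(i + 1, ids.get(i)); // JDBC parameters are 1-based
    }
    return ps;
  }
}
{code}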


Diffs
-

  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
 571c789eddfd2b1a27c65c48bdc6dccfafaaf676 


Diff: https://reviews.apache.org/r/68767/diff/1/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68683: Add new configuration to set the size of the global compile lock

2018-09-19 Thread denys kuzmenko via Review Board


> On Sept. 17, 2018, 9:15 a.m., Zoltan Haindrich wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Line 507 (original), 666 (patched)
> > 
> >
> > please don't make this method more visible; use compile("sel") or 
> > something...it should work
> 
> denys kuzmenko wrote:
> it's impossible to mock and test compile lock behaviour. Entry point is 
> Driver.compileAndRespond("query"). I do not want to use PowerMock. Actually I 
> tried and faced many issues with hadoop classes.
> 
> Peter Vary wrote:
> What about @VisibleForTesting annotation? It could show the intention at 
> least...

Added @VisibleForTesting annotation


- denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68683/#review208625
---


On Sept. 19, 2018, 9:37 a.m., denys kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68683/
> ---
> 
> (Updated Sept. 19, 2018, 9:37 a.m.)
> 
> 
> Review request for hive, Zoltan Haindrich, Zoltan Haindrich, Naveen Gangam, 
> and Peter Vary.
> 
> 
> Bugs: HIVE-20535
> https://issues.apache.org/jira/browse/HIVE-20535
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> When removing the compile lock, it is quite risky to remove it entirely.
> 
> It would be good to provide a pool size for the concurrent compilation, so 
> the administrator can limit the load
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8c39de3e77 
>   ql/src/java/org/apache/hadoop/hive/ql/CompileLockManager.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 737debd2ad 
>   ql/src/test/org/apache/hadoop/hive/ql/CompileLockTest.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68683/diff/5/
> 
> 
> Testing
> ---
> 
> Added CompileLockTest
> 
> 
> File Attachments
> 
> 
> HIVE-20535.1.patch
>   
> https://reviews.apache.org/media/uploaded/files/2018/09/13/41f5a84a-70e5-4882-99c1-1cf98c4364e4__HIVE-20535.1.patch
> 
> 
> Thanks,
> 
> denys kuzmenko
> 
>



Re: Review Request 68683: Add new configuration to set the size of the global compile lock

2018-09-19 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68683/
---

(Updated Sept. 19, 2018, 9:37 a.m.)


Review request for hive, Zoltan Haindrich, Zoltan Haindrich, Naveen Gangam, and 
Peter Vary.


Bugs: HIVE-20535
https://issues.apache.org/jira/browse/HIVE-20535


Repository: hive-git


Description
---

Removing the compile lock entirely is quite risky.

It would be good to provide a pool size for concurrent compilation, so the 
administrator can limit the load.
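
As a rough sketch of the general idea (a configurable number of concurrent compilations) using a fair java.util.concurrent.Semaphore; this is not the CompileLockManager from the patch, and the wiring of the pool size to HiveConf is omitted:

{code:java}
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BoundedCompileLock {

  private final Semaphore permits;

  // poolSize = 1 behaves like the old global compile lock;
  // larger values allow that many compilations to run in parallel.
  public BoundedCompileLock(int poolSize) {
    this.permits = new Semaphore(poolSize, /* fair */ true);
  }

  public boolean tryAcquire(long timeout, TimeUnit unit) throws InterruptedException {
    return permits.tryAcquire(timeout, unit);
  }

  public void release() {
    permits.release();
  }
}
{code}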


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8c39de3e77 
  ql/src/java/org/apache/hadoop/hive/ql/CompileLockManager.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 737debd2ad 
  ql/src/test/org/apache/hadoop/hive/ql/CompileLockTest.java PRE-CREATION 


Diff: https://reviews.apache.org/r/68683/diff/5/

Changes: https://reviews.apache.org/r/68683/diff/4-5/


Testing
---

Added CompileLockTest


File Attachments


HIVE-20535.1.patch
  
https://reviews.apache.org/media/uploaded/files/2018/09/13/41f5a84a-70e5-4882-99c1-1cf98c4364e4__HIVE-20535.1.patch


Thanks,

denys kuzmenko



Re: Review Request 68744: Add Surrogate Keys function to Hive

2018-09-19 Thread Miklos Gergely


> On Sept. 18, 2018, 8:58 p.m., Antal Sinkovits wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java
> > Lines 116 (patched)
> > 
> >
> > You have a potential NPE here, if the execution engine is not TEZ. I 
> > think it should support all execution engines (MR, spark, tez) or if its 
> > not possible, fail fast with a more reasonable exception.

For now this function is supported only if TEZ is the execution engine. It may 
change later.


- Miklos


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68744/#review208733
---


On Sept. 19, 2018, 9:28 a.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68744/
> ---
> 
> (Updated Sept. 19, 2018, 9:28 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20536
> https://issues.apache.org/jira/browse/HIVE-20536
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add new function that allows the generation of a surrogate key composed of 
> the write id, the task id, and an incremental row id.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3f538b3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 3309b9b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 98448e4 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 8d41e78 
> 
> 
> Diff: https://reviews.apache.org/r/68744/diff/3/
> 
> 
> Testing
> ---
> 
> Added a new junit test for the function.
> Tested it in beeline by adding one row, adding multiple rows, adding mutliple 
> rows to multiple tables via multuple insert (all having their own 
> surrogate_key column)
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>



Review Request 68765: HIVE-20595: Add findbugs-exclude.xml to metastore-server

2018-09-19 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68765/
---

Review request for hive, Alexander Kolbasov, Peter Vary, and Vihang 
Karajgaonkar.


Repository: hive-git


Description
---

HIVE-20595: Add findbugs-exclude.xml to metastore-server


Diffs
-

  standalone-metastore/metastore-server/findbugs/findbugs-exclude.xml 
PRE-CREATION 


Diff: https://reviews.apache.org/r/68765/diff/1/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68744: Add Surrogate Keys function to Hive

2018-09-19 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68744/
---

(Updated Sept. 19, 2018, 9:28 a.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-20536
https://issues.apache.org/jira/browse/HIVE-20536


Repository: hive-git


Description
---

Add new function that allows the generation of a surrogate key composed of the 
write id, the task id, and an incremental row id.
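
As a rough illustration of the kind of key composition described, with assumed (not actual) bit widths for the three components:

{code:java}
public class SurrogateKeySketch {

  // Illustrative bit widths only; the real UDF's layout may differ.
  private static final int TASK_ID_BITS = 16;
  private static final int ROW_ID_BITS = 24;

  // Pack (writeId, taskId, rowId) into a single 64-bit key:
  // high bits = write id, middle bits = task id, low bits = per-task row counter.
  static long surrogateKey(long writeId, int taskId, long rowId) {
    return (writeId << (TASK_ID_BITS + ROW_ID_BITS))
        | ((long) taskId << ROW_ID_BITS)
        | rowId;
  }
}
{code}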


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3f538b3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 3309b9b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 98448e4 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java 
PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
 PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 8d41e78 


Diff: https://reviews.apache.org/r/68744/diff/3/

Changes: https://reviews.apache.org/r/68744/diff/2-3/


Testing
---

Added a new JUnit test for the function.
Tested it in Beeline by adding one row, adding multiple rows, and adding multiple 
rows to multiple tables via multi-insert (all having their own surrogate_key 
column).


Thanks,

Miklos Gergely



[jira] [Created] (HIVE-20595) Add findbugs-exclude.xml to metastore-server

2018-09-19 Thread Laszlo Pinter (JIRA)
Laszlo Pinter created HIVE-20595:


 Summary: Add findbugs-exclude.xml to metastore-server
 Key: HIVE-20595
 URL: https://issues.apache.org/jira/browse/HIVE-20595
 Project: Hive
  Issue Type: Bug
  Components: Hive, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Laszlo Pinter
Assignee: Laszlo Pinter


The findbugs-exclude.xml is missing from 
standalone-metastore/metastore-server/findbugs. This should be added, otherwise 
the findbugs check will fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20594) insert overwrite may brings duplicated data when hdfs path exists but partition missing in hms

2018-09-19 Thread J.P Feng (JIRA)
J.P Feng created HIVE-20594:
---

 Summary: insert overwrite may brings duplicated data when hdfs 
path exists but partition missing in hms
 Key: HIVE-20594
 URL: https://issues.apache.org/jira/browse/HIVE-20594
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.1
Reporter: J.P Feng


When I insert overwrite a partitioned table whose HDFS path exists but whose 
partition is missing from the HMS, I get duplicated data.

SQL: insert overwrite table hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns 
partition (month = '201808') select * from xxx where month = '201808';

1. There are 10 files in hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns:

    month=201808/01_0

    month=201808/02_0 ... month=201808/09_0

2. hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns is an external table and I 
drop partition (month=201808), or in some other way I drop partition (month=201808) 
but do not remove the data under it.

3. insert overwrite table hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns 
partition (month = '201808') select * from xxx where month = '201808'

This SQL generates 9 mappers and may generate 9 files:

month=201808/01_0 ~ month=201808/08_0

After executing this SQL, the file `month=201808/09_0` still remains, so we get 
duplicated data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20593) Load Data for partitioned ACID tables fails with bucketId out of range: -1

2018-09-19 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-20593:
-

 Summary: Load Data for partitioned ACID tables fails with bucketId 
out of range: -1
 Key: HIVE-20593
 URL: https://issues.apache.org/jira/browse/HIVE-20593
 Project: Hive
  Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


Load Data for ACID tables fails to load ORC files when the load is converted to an 
IAS (insert-as-select) job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20592) hive on spark - got a weird output when count(*) from this script

2018-09-19 Thread Gu Yuchen (JIRA)
Gu Yuchen created HIVE-20592:


 Summary: hive on spark - got a weird output when count(*)  from 
this script
 Key: HIVE-20592
 URL: https://issues.apache.org/jira/browse/HIVE-20592
 Project: Hive
  Issue Type: Bug
 Environment: spark 1.6.1

hive 1.2.2

hadoop 2.7.1
Reporter: Gu Yuchen
 Attachments: jira.png

 

Using hiveContext to execute the script below:

with nt as (select label, score from (select * from (select label, score, 
row_number() over (order by score desc) as position from t1)t_1 join (select 
count(*) as countall from t1)t_2 )ta where position <= countall * 0.4) select 
count(*) as c_positive from nt where label = 1

I got this result:

!jira.png!

It is weird that calling the count() function on the RDD and on the DataFrame 
gives different output, as the attached picture shows.

Can someone help me out? Thanks a lot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)