[jira] [Created] (HIVE-17631) upgrade orc to 1.4.0

2017-09-27 Thread Saijin Huang (JIRA)
Saijin Huang created HIVE-17631:
---

 Summary: upgrade orc to 1.4.0
 Key: HIVE-17631
 URL: https://issues.apache.org/jira/browse/HIVE-17631
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.0.0
 Environment: It seems that ORC 1.4.0 is the latest stable version:
https://orc.apache.org/docs/releases.html
Reporter: Saijin Huang
Assignee: Saijin Huang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17630) RESIGNAL: actual results are inconsistent with expectations in hplsql

2017-09-27 Thread ZhangBing Lin (JIRA)
ZhangBing Lin created HIVE-17630:


 Summary: RESIGNAL: actual results are inconsistent with 
expectations in hplsql
 Key: HIVE-17630
 URL: https://issues.apache.org/jira/browse/HIVE-17630
 Project: Hive
  Issue Type: Bug
  Components: hpl/sql
Reporter: ZhangBing Lin
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17629) CachedStore - wait for prewarm at use time, not init time

2017-09-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17629:
---

 Summary: CachedStore - wait for prewarm at use time, not init time
 Key: HIVE-17629
 URL: https://issues.apache.org/jira/browse/HIVE-17629
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17628) always use fully qualified path for tables/partitions/etc.

2017-09-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17628:
---

 Summary: always use fully qualified path for tables/partitions/etc.
 Key: HIVE-17628
 URL: https://issues.apache.org/jira/browse/HIVE-17628
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


# Different services, or the same service at different times, may have different 
default filesystems, so it doesn't make sense to persist a non-qualified path 
(see the sketch after this list).
# The logic that detects whether we are using the default FS is rather 
questionable anyway: it runs whenever the setting is set, even if it is set to 
the same value as the default FS, and it may in fact be more expensive than just 
making the path qualified, since it iterates through all the properties, 
including the ones added from getConfVarInputStream.
# It also hits HADOOP-13500.
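
A minimal sketch of the qualification step being argued for, assuming the 
standard Hadoop FileSystem API (illustrative only, not the actual Hive code 
path):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyPathExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A location as it might be stored today, with no scheme or authority.
    Path unqualified = new Path("/warehouse/db.db/tbl/part=1");
    // Resolve it against this service's default FS before persisting it, so a
    // different service (or a later default-FS change) still resolves the same
    // physical location.
    FileSystem fs = unqualified.getFileSystem(conf);
    Path qualified = fs.makeQualified(unqualified);
    System.out.println(qualified); // e.g. hdfs://namenode:8020/warehouse/db.db/tbl/part=1
  }
}
{code}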



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17627) Use druid scan query instead of the select query.

2017-09-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17627:
-

 Summary: Use druid scan query instead of the select query.
 Key: HIVE-17627
 URL: https://issues.apache.org/jira/browse/HIVE-17627
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra


The biggest difference between the select query and the scan query is that the 
scan query does not need to retain all rows in memory before they can be 
returned to the client. A select query can cause memory pressure when it has to 
return many rows; a scan query does not have this issue. A scan query can also 
return all rows without issuing follow-up pagination queries, which is extremely 
useful when querying a historical or realtime node directly.
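
For reference, a Druid native scan query is a small JSON spec. The sketch below 
builds one with Jackson; the field names follow the Druid native query format, 
but the data source, interval, and columns are made-up examples:

{code}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class ScanQueryExample {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    ObjectNode scan = mapper.createObjectNode();
    scan.put("queryType", "scan");        // streams result batches; no pagingSpec
    scan.put("dataSource", "wikipedia");  // hypothetical data source
    scan.putArray("intervals").add("2017-01-01/2017-02-01");
    scan.putArray("columns").add("page").add("added");
    scan.put("batchSize", 20480);         // rows per returned batch
    scan.put("resultFormat", "compactedList");
    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(scan));
  }
}
{code}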



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17626) Query reoptimization using cached runtime statistics

2017-09-27 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-17626:


 Summary: Query reoptimization using cached runtime statistics
 Key: HIVE-17626
 URL: https://issues.apache.org/jira/browse/HIVE-17626
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer
Affects Versions: 3.0.0
Reporter: Prasanth Jayachandran


Something similar to "EXPLAIN ANALYZE", where we annotate the explain plan with 
actual and estimated statistics. The runtime stats can be cached at the query 
level, and subsequent executions of the same query can make use of the cached 
statistics from the previous run for better optimization.
Some use cases:
1) re-planning join queries (mapjoin failures can be converted to shuffle joins)
2) better statistics for the table scan operator if dynamic partition pruning is 
involved
3) better estimates for bloom filter initialization (setting expected entries 
during merge)

This can be extended to support a wider set of queries by caching fragments of 
operator plans that scan the same table(s) or match certain operator sequences.
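
A minimal, hypothetical sketch of what a query-level runtime statistics cache 
could look like (the class and field names here are illustrative, not the 
eventual Hive design):

{code}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache of per-operator runtime statistics, keyed by query. */
public class RuntimeStatsCache {

  /** Statistics observed for one operator during a previous run. */
  public static class OperatorStats {
    public final long outputRows;
    public final long outputDataSize;

    public OperatorStats(long outputRows, long outputDataSize) {
      this.outputRows = outputRows;
      this.outputDataSize = outputDataSize;
    }
  }

  // query signature (e.g. normalized query text) -> operator id -> observed stats
  private final Map<String, Map<String, OperatorStats>> cache = new ConcurrentHashMap<>();

  /** Called after a query finishes, with the statistics collected at runtime. */
  public void record(String querySignature, Map<String, OperatorStats> stats) {
    cache.put(querySignature, stats);
  }

  /** Called by the optimizer on a re-run; an empty result means "plan as usual". */
  public Map<String, OperatorStats> lookup(String querySignature) {
    Map<String, OperatorStats> stats = cache.get(querySignature);
    return stats == null ? Collections.<String, OperatorStats>emptyMap() : stats;
  }
}
{code}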



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17625) Replication: update hive.repl.partitions.dump.parallelism to 100

2017-09-27 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-17625:
---

 Summary: Replication: update hive.repl.partitions.dump.parallelism 
to 100
 Key: HIVE-17625
 URL: https://issues.apache.org/jira/browse/HIVE-17625
 Project: Hive
  Issue Type: Bug
  Components: repl
Reporter: Vaibhav Gumashta


Set hive.repl.partitions.dump.parallelism=100



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17624) MapredLocalTask running in separate JVM could throw ClassNotFoundException

2017-09-27 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-17624:
---

 Summary: MapredLocalTask running in separate JVM could throw 
ClassNotFoundException
 Key: HIVE-17624
 URL: https://issues.apache.org/jira/browse/HIVE-17624
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.1.1
Reporter: Aihua Xu
Assignee: Aihua Xu


{noformat}
set hive.auto.convert.join=true;
set hive.auto.convert.join.use.nonstaged=false;

add jar hive-hcatalog-core.jar;

drop table if exists t1;
CREATE TABLE t1 (a string, b string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

LOAD DATA LOCAL INPATH "data/files/sample.json" INTO TABLE t1;
select * from t1 l join t1 r on l.a=r.a;
{noformat}

The join will use a MapJoin, which runs a MapredLocalTask in a separate JVM to 
load the table into a hash map. However, Hive doesn't pass the added jar to that 
JVM's classpath, so the following exception is thrown.

{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception 
java.lang.ClassNotFoundException: 
org.apache.hive.hcatalog.data.JsonSerDejava.lang.RuntimeException: 
java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe
at 
org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:72)
at 
org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:92)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:564)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:127)
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:462)
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:390)
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:370)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:756)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hive.hcatalog.data.JsonSerDe
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at 
org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:69)
... 15 more

at 
org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:586)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:127)
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:462)
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:390)
at 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:370)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:756)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
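
The missing piece described above is that jars registered with {{add jar}} never 
make it onto the child JVM's classpath. As a purely hypothetical illustration of 
the idea behind a fix (this is not Hive's actual launch code), the session-added 
jars would need to be appended to whatever classpath is handed to the spawned 
process:

{code}
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class ChildClasspathExample {
  /** Append session-added jars to the classpath passed to the child JVM. */
  static String childClasspath(String baseClasspath, List<String> addedJars) {
    StringBuilder cp = new StringBuilder(baseClasspath);
    for (String jar : addedJars) {
      cp.append(File.pathSeparatorChar).append(jar);
    }
    return cp.toString();
  }

  public static void main(String[] args) {
    System.out.println(childClasspath(System.getProperty("java.class.path"),
        Arrays.asList("/tmp/hive-hcatalog-core.jar")));
  }
}
{code}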



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17623) Fix Select query, fix Double column serde, and some refactoring

2017-09-27 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-17623:
-

 Summary: Fix Select query, fix Double column serde, and some 
refactoring
 Key: HIVE-17623
 URL: https://issues.apache.org/jira/browse/HIVE-17623
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 3.0.0
Reporter: slim bouguerra


This PR has two fixes.
First, it fixes the limit on the number of results returned by the Select query, 
which used to be capped at 16K rows.
Second, it fixes type inference for the double type newly added to Druid, using 
Jackson polymorphism to infer types and parse results from the Druid nodes.
It also removes duplicate code from the RecordReaders.
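
For reference, Jackson's polymorphic handling works roughly as below; the value 
types here are made-up stand-ins, not the actual Hive or Druid classes:

{code}
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.databind.ObjectMapper;

public class PolymorphicResultExample {

  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")
  @JsonSubTypes({
      @JsonSubTypes.Type(value = LongValue.class, name = "long"),
      @JsonSubTypes.Type(value = DoubleValue.class, name = "double")
  })
  interface ColumnValue {}

  static class LongValue implements ColumnValue {
    public long value;
  }

  static class DoubleValue implements ColumnValue {
    public double value;
  }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // The "type" property selects the concrete class, so double columns are no
    // longer squeezed through a single hard-coded numeric type while parsing.
    ColumnValue v = mapper.readValue("{\"type\":\"double\",\"value\":1.5}", ColumnValue.class);
    System.out.println(v.getClass().getSimpleName()); // DoubleValue
  }
}
{code}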




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 62487: HIVE-17563: CodahaleMetrics.JsonFileReporter is not updating hive.service.metrics.file.location

2017-09-27 Thread Alexander Kolbasov


> On Sept. 25, 2017, 7:05 p.m., Sahil Takiar wrote:
> > Changes look good. Just to confirm, the changes to the unit tests include 
> > an explicit test for the bug reported in HIVE-17563, correct? IOW, there is 
> > a test that first writes the metrics files, updates the metrics in HS2, 
> > re-writes the metrics file, and then validates that the re-written metrics 
> > file contains the updated values?

The tests for both classes do include a test that verifies that the JSON file 
is updated every time the metric is updated, along the lines you described.


- Alexander


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62487/#review186149
---


On Sept. 22, 2017, 11:46 p.m., Alexander Kolbasov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62487/
> ---
> 
> (Updated Sept. 22, 2017, 11:46 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Carl Steinbach, Alan Gates, Sergio Pena, 
> Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-17563
> https://issues.apache.org/jira/browse/HIVE-17563
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-17563: CodahaleMetrics.JsonFileReporter is not updating 
> hive.service.metrics.file.location
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/JsonFileMetricsReporter.java
>  c07517a634e35c936d6ea68e9a2874ac2c59929a 
>   
> common/src/test/org/apache/hadoop/hive/common/metrics/metrics2/TestCodahaleMetrics.java
>  67f81d6c43a7904dc384cfcce6f948f1f802144c 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/metrics/JsonReporter.java
>  b804cdade07079955a65cb431fab078dcecd53b1 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/metrics/TestMetrics.java
>  259a4db43905c47d873776df96210a4e77d07076 
> 
> 
> Diff: https://reviews.apache.org/r/62487/diff/5/
> 
> 
> Testing
> ---
> 
> TestCodahaleMetrics tests the JSON reporter functionality.
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>



[jira] [Created] (HIVE-17622) Implement ANTLR based expressions for rule triggers

2017-09-27 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-17622:


 Summary: Implement ANTLR based expressions for rule triggers
 Key: HIVE-17622
 URL: https://issues.apache.org/jira/browse/HIVE-17622
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


To make rule trigger expressions more expressive, move the manual expression 
parsing introduced in HIVE-17508 to an ANTLR based grammar.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17621) Hive-site settings are ignored during HCatInputFormat split-calculation

2017-09-27 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-17621:
---

 Summary: Hive-site settings are ignored during HCatInputFormat 
split-calculation
 Key: HIVE-17621
 URL: https://issues.apache.org/jira/browse/HIVE-17621
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 2.2.0, 3.0.0
Reporter: Mithun Radhakrishnan
Assignee: Chris Drome


Another one that [~selinazh] and [~cdrome] worked on.

The production {{hive-site.xml}} could well contain settings that differ from 
the defaults in {{HiveConf.java}}. In our case, we introduced a custom ORC 
split-strategy and made it the site-wide default.

We noticed that during {{HCatInputFormat::getSplits()}}, if the user-script did 
not contain the setting, the site-wide default was ignored in favour of the 
{{HiveConf}} default. HCat would not convey hive-site settings to the 
input-format (or anywhere downstream).

The forthcoming patch fixes this problem.
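
A minimal sketch of the kind of fix, under the assumption that it amounts to 
copying the HiveConf (which has hive-site.xml loaded) into the job configuration 
used for split calculation; the method name and placement below are 
illustrative, not the actual patch:

{code}
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;

public class PropagateHiveSiteExample {
  /**
   * Copy settings from the HiveConf (hive-site.xml included) into the job
   * configuration, without clobbering values the user-script already set.
   */
  static void copyHiveSiteSettings(HiveConf hiveConf, Configuration jobConf) {
    for (Map.Entry<String, String> entry : hiveConf) {
      if (jobConf.get(entry.getKey()) == null) {
        jobConf.set(entry.getKey(), entry.getValue());
      }
    }
  }
}
{code}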



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17620) Use the default MR scratch directory (HDFS) in the only case when hive.blobstore.optimizations.enabled=true AND isFinalJob=true

2017-09-27 Thread JIRA
Gergely Hajós created HIVE-17620:


 Summary: Use the default MR scratch directory (HDFS) in the only 
case when hive.blobstore.optimizations.enabled=true AND isFinalJob=true
 Key: HIVE-17620
 URL: https://issues.apache.org/jira/browse/HIVE-17620
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0, 2.3.0, 3.0.0
Reporter: Gergely Hajós
Assignee: Gergely Hajós


Introduced in HIVE-15121. Context::getTempDirForPath tries to use the temporary 
MR directory instead of the blobstore directory in three cases:
{code}
if (!isFinalJob && BlobStorageUtils.areOptimizationsEnabled(conf)) {
{code}

while the only valid case for using the temporary MR dir is when optimizations 
are enabled and the job is not final:
{code}
if (BlobStorageUtils.areOptimizationsEnabled(conf) && !isFinalJob) {
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17619) Exclude avatica-core.jar since avatica.jar is included

2017-09-27 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-17619:
---

 Summary: Exclude avatica-core.jar since avatica.jar is included
 Key: HIVE-17619
 URL: https://issues.apache.org/jira/browse/HIVE-17619
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.1
Reporter: Aihua Xu
Assignee: Aihua Xu


avatica.jar is included in the project, but this jar depends on 
avatica-core.jar, which gets pulled into the project as well.

If avatica-core.jar appears on the classpath in front of avatica.jar, Hive can 
fail with a missing class that is shaded inside avatica.jar.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 62442: HIVE-17569: Compare filtered output files in BeeLine tests

2017-09-27 Thread Marta Kuczora


> On Sept. 22, 2017, 3:50 p.m., Peter Vary wrote:
> > Overall looks good. Just a few nits, which might be a matter of taste 
> > anyway. Feel free to object if you find it unreasonable.
> > 
> > Thanks for the patch!

Thanks a lot Peter for the review. I fixed the issues you raised.


> On Sept. 22, 2017, 3:50 p.m., Peter Vary wrote:
> > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreBeeLineDriver.java
> > Lines 91 (patched)
> > 
> >
> > I think it would be nice to have the default value as a boolean instead 
> > of a string, and we might want to call this method getBooleanPropertyValue. 
> > What do you think?

Yeah, we can do it like that.


> On Sept. 22, 2017, 3:50 p.m., Peter Vary wrote:
> > itests/util/src/main/java/org/apache/hive/beeline/QFile.java
> > Lines 70 (patched)
> > 
> >
> > We might want to use regexps here where we have separators like 
> > DESCRIBE[\s\n]+EXTENDED - just an example, which probably should be changed 
> > to be valid :)

It makes sense, thanks for pointing it out. I fixed the entries.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62442/#review185995
---


On Sept. 20, 2017, 3:11 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62442/
> ---
> 
> (Updated Sept. 20, 2017, 3:11 p.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-17569
> https://issues.apache.org/jira/browse/HIVE-17569
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduce a new property "test.beeline.compare.portable" with the default 
> value false. If this property is set to true, the results of the commands 
> "EXPLAIN", "DESCRIBE EXTENDED" and "DESCRIBE FORMATTED" will be filtered out 
> of the out files before comparing them in BeeLine tests.
> 
> 
> Diffs
> -
> 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreBeeLineDriver.java
>  9dfc253 
>   itests/util/src/main/java/org/apache/hive/beeline/QFile.java e70ac38 
> 
> 
> Diff: https://reviews.apache.org/r/62442/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 62442: HIVE-17569: Compare filtered output files in BeeLine tests

2017-09-27 Thread Marta Kuczora

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62442/
---

(Updated Sept. 27, 2017, 3:11 p.m.)


Review request for hive and Peter Vary.


Changes
---

Fixed the issues found during the review.


Bugs: HIVE-17569
https://issues.apache.org/jira/browse/HIVE-17569


Repository: hive-git


Description
---

Introduce a new property "test.beeline.compare.portable" with the default value 
false. If this property is set to true, the results of the commands 
"EXPLAIN", "DESCRIBE EXTENDED" and "DESCRIBE FORMATTED" will be filtered out 
of the out files before comparing them in BeeLine tests.


Diffs (updated)
-

  
itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreBeeLineDriver.java
 9dfc253 
  itests/util/src/main/java/org/apache/hive/beeline/QFile.java e70ac38 


Diff: https://reviews.apache.org/r/62442/diff/2/

Changes: https://reviews.apache.org/r/62442/diff/1-2/


Testing
---


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-17618) Extend ANALYZE TABLE / DESCRIBE FORMATTED functionality with distribution of selected file-level metadata fields

2017-09-27 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created HIVE-17618:


 Summary: Extend ANALYZE TABLE / DESCRIBE FORMATTED functionality 
with distribution of selected file-level metadata fields
 Key: HIVE-17618
 URL: https://issues.apache.org/jira/browse/HIVE-17618
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Ivanfi


DESCRIBE FORMATTED already shows the number of files:

{noformat}
[...]
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles14
numRows 15653
[...]
{noformat}

It would be useful to break this number down by different file-level metadata 
fields. One such field would be the compression settings used in the table. 
Currently there is no way to check whether the contents of a table are 
compressed, because some files can be compressed while others are not. A 
file-count breakdown could provide this missing information in the following form:

{noformat}
[...]
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles14
breakdown by compression:
Uncompressed:   3
Snappy: 6
Deflate:5
numRows 15653
[...]
{noformat}

Another useful breakdown would be by the writer field of Parquet files, because 
Impala writes Parquet files slightly differently (string fields are not 
annotated with UTF8 by default, timestamps are not adjusted to UTC), and users 
may want to know what kind of Parquet files are in a table but currently have no 
way to query this. An example output for Parquet tables could look like:

{noformat}
[...]
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles14
breakdown by compression:
Uncompressed:   3
Snappy: 6
Deflate:5
breakdown by writer:
parquet-mr: 9
impala: 5
numRows 15653
[...]
{noformat}

Any other file-level metadata that we consider useful to the user could be 
incorporated as well. Since gathering file-level metadata is an expensive 
operation, it should be done when the user issues ANALYZE TABLE ... COMPUTE 
STATISTICS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17617) Rollup of an empty resultset should contain the grouping of the empty grouping set

2017-09-27 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-17617:
---

 Summary: Rollup of an empty resultset should contain the grouping 
of the empty grouping set
 Key: HIVE-17617
 URL: https://issues.apache.org/jira/browse/HIVE-17617
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


running
{code}
drop table if exists tx1;
create table tx1 (a integer,b integer,c integer);

select  sum(c),
grouping(b)
fromtx1
group by rollup (b);
{code}
returns 0 rows; however, according to the standard:

The  is regarded as the shortest such initial sublist. For example, 
“ROLLUP ( (A, B), (C, D) )” is equivalent to 
“GROUPING SETS ( (A, B, C, D), (A, B), () )”.

so I think the totals row (the grouping for {{()}}) should be present; psql 
returns it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Hive Table Displaying NULLS for a FIXED length file

2017-09-27 Thread Kiran
Hello Developers,

I am trying to select data from a Hive table created on top of a fixed-length
file. Below are the different DDLs I tried; nothing worked.


Trial-1) With a .* at the end of the regex, after all the column lengths are
declared, and also without it.
Trial-2) With an output format expression, as in the second DDL below.
Trial-3) Tried with LOAD DATA INPATH and without it as well.

*File Properties:*

35 Columns in total.
Every line ends with a new line '\n'.
All the columns are being read as strings.
Total row count is 25K(+).
Files are under the hdfs dir: /data/source/raw/land/
File name: my_file1.txt

--
*HIVE DDL Trial-1:*

CREATE EXTERNAL TABLE all_files.MY_TABLE_1
(COL1 string,
COL2 string,
COL3 string,
COL4 string,
.
.
.
COL35 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{13})(.{10})(.{10})..so on till
35th col.(.{3}).*" )
STORED AS TEXTFILE
LOCATION 'hdfs:///data/source/raw/land';
---
OK
Time taken 0.165 seconds

hive> select * from all_files.MY_TABLE_1 limit 5;
result: NULLS displayed

-
*HIVE DDL Trial-2:*

CREATE EXTERNAL TABLE all_files.MY_TABLE_1
(COL1 string,
COL2 string,
COL3 string,
COL4 string,
.
.
.
COL35 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(.{13})(.{10})(.{10})..so on till 35th col.(.{3})",
"output.format.string" = "%1$s %2$s %3$s..so on till 35th col...%35$s"
)
STORED AS TEXTFILE
LOCATION 'hdfs:///data/source/raw/land';

OK
Time taken 0.165 seconds

hive> select * from all_files.MY_TABLE_1 limit 5;
result: NULLS displayed




Any help in this regard is much appreciated.

Thanks,
Srikiran.


[jira] [Created] (HIVE-17616) running msck on a table partitioned by a timestamp column reports invalid character

2017-09-27 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-17616:
---

 Summary: running msck on a table partitioned by a timestamp column 
reports invalid character
 Key: HIVE-17616
 URL: https://issues.apache.org/jira/browse/HIVE-17616
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


test:
{code}
create table micky (endtimestamp string, id string, logLevel string, count 
bigint) partitioned by (xstart timestamp) stored as orc;
insert into micky partition (xstart="2017-09-27 13:51:58") values ("2017-09-27 
13:52:58", "5", "WARN", 5);
insert into micky partition (xstart="2017-09-27 13:54:58") values ("2017-09-27 
13:55:58", "6", "INFO", 6);
select * from micky;
create external table mini (endtimestamp string, id string, logLevel string, 
count bigint) partitioned by (xstart timestamp)
 stored as orc location "${system:test.warehouse.dir}/micky";
MSCK REPAIR TABLE mini;
{code}

results in:
{code}
2017-09-27T05:20:03,960  WARN [3bff4423-d4cf-4142-ad6e-476757c9a0b0 main] 
exec.DDLTask: Failed to run metacheck: 
org.apache.hadoop.hive.ql.metadata.HiveException: Repair: Cannot add partition 
mini:xstart=2017-09-27 13%3A51%3A58.0 due to invalid characters in the name
at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:2027) 
[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:436) 
[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
[...]
{code}

present on current master



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17615) Task.executeTask has to be thread safe for parallel execution

2017-09-27 Thread anishek (JIRA)
anishek created HIVE-17615:
--

 Summary: Task.executeTask has to be thread safe for parallel 
execution
 Key: HIVE-17615
 URL: https://issues.apache.org/jira/browse/HIVE-17615
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
 Fix For: 3.0.0


With parallel execution enabled we should make sure that {{Task.executeTask}} 
is thread safe, which is currently not the case because of the hiveHistory 
object.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17614) Notification_sequence initialization using SQL statement which is compatible with 5.1 Version of Mysql

2017-09-27 Thread anishek (JIRA)
anishek created HIVE-17614:
--

 Summary: Notification_sequence initialization using SQL statement 
which is compatible with 5.1 Version of Mysql
 Key: HIVE-17614
 URL: https://issues.apache.org/jira/browse/HIVE-17614
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
 Fix For: 3.0.0


Since a lot of people are still using Hive with MySQL 5.1 as the RDBMS for the 
metastore, it would be helpful if the initialization statement introduced as 
part of HIVE-16896 for MySQL were friendlier to these older versions.

{code}
INSERT INTO `NOTIFICATION_SEQUENCE` (`NNI_ID`, `NEXT_EVENT_ID`)
  SELECT * FROM (SELECT 1 AS `NNI_ID`, 1 AS `NOTIFICATION_SEQUENCE`) a
  WHERE (SELECT COUNT(*) FROM `NOTIFICATION_SEQUENCE`) = 0;
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)