[jira] [Created] (HIVE-20953) Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded

2018-11-20 Thread Ashutosh Bapat (JIRA)
Ashutosh Bapat created HIVE-20953:
-

 Summary: Fix testcase 
TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions
 to not depend upon the order in which objects get loaded
 Key: HIVE-20953
 URL: https://issues.apache.org/jira/browse/HIVE-20953
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat
 Fix For: 4.0.0


The testcase is intended to test REPL LOAD with retry. The test creates a 
partitioned table and a function in the source database and loads those to the 
replica. The first attempt to load a dump is intended to fail while loading one 
of the partitions. Based on the order in which the objects get loaded, if the 
function is queued after the table, it will not be available in replica after 
the load failure. But if it's queued before the table, it will be available in 
replica even after the load failure. The test assumes the latter case, which
may not always hold.
 
Hence, fix the testcase to load the objects in a fixed order. By setting
hive.in.repl.test.files.sorted to true, the objects are ordered by their
directory names. This ordering is available with minimal changes for testing,
hence we use it. With this ordering a function gets loaded before a table, so
the test is changed to not expect the function to be available after the
failed load, but to expect it after the retry.
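
A minimal sketch of the ordering idea, with hypothetical class and method
names (the actual patch wires hive.in.repl.test.files.sorted through the
repl load path):

    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.hadoop.fs.FileStatus;

    public final class ReplLoadOrdering {
      // When the test flag is set, sort bootstrap dump entries by directory
      // name so tables and functions load in a fixed, repeatable order.
      public static FileStatus[] maybeSort(FileStatus[] entries,
                                           boolean sortedForTest) {
        if (sortedForTest) {
          Arrays.sort(entries, Comparator.comparing(f -> f.getPath().getName()));
        }
        return entries;
      }
    }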



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

2018-11-20 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210740
---




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Line 165 (original), 165 (patched)


?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Line 170 (original), 170 (patched)


why are all these tests made non-tests?
or does this do something else?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 533 (patched)


were you going to do "0+validate_acid_sort_order(...)" instead?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 54 (patched)


I'm guessing if compareTo returns 0 that's bad - we should have unique row 
ids



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)


should this return 0?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 80 (patched)


I think comparison should include 'bucketProperty' since we sort on 
'bucketProperty' not just bucketId.
In particular, if you have > 1 statement per txn, we expect that rows from 
2nd stmt follow those from 1st.
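
A hedged sketch of the comparison this implies (field and class names are
illustrative, not the actual UDF code):

    final class AcidSortKey implements Comparable<AcidSortKey> {
      final long writeId;
      final int bucketProperty; // the full encoded property, not just bucketId
      final long rowId;

      AcidSortKey(long writeId, int bucketProperty, long rowId) {
        this.writeId = writeId;
        this.bucketProperty = bucketProperty;
        this.rowId = rowId;
      }

      @Override
      public int compareTo(AcidSortKey o) {
        int c = Long.compare(writeId, o.writeId);
        if (c == 0) {
          c = Integer.compare(bucketProperty, o.bucketProperty);
        }
        if (c == 0) {
          c = Long.compare(rowId, o.rowId);
        }
        // 0 overall would mean a duplicate ROW__ID, which should never happen
        return c;
      }
    }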


- Eugene Koifman


On Nov. 19, 2018, 3:49 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
> 
> (Updated Nov. 19, 2018, 3:49 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
> https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> 8cabf960db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 6e7c78bd17 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 92c74e1d06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[GitHub] hive pull request #490: [HIVE-20952] Cleaning VectorizationContext.java

2018-11-20 Thread b-slim
GitHub user b-slim opened a pull request:

https://github.com/apache/hive/pull/490

[HIVE-20952] Cleaning VectorizationContext.java



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/b-slim/hive HIVE-20952

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/490.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #490


commit cdd533dead7cdb6603ae9c8dff0b64aa23db6dab
Author: Slim Bouguerra 
Date:   2018-11-20T23:52:27Z

Cleaning VectorizationContext.java

Change-Id: Icdcb407ed5c7f5175f54ed023fc2565338d9032c




---


[jira] [Created] (HIVE-20952) Cleaning VectorizationContext.java

2018-11-20 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20952:
-

 Summary: Cleaning VectorizationContext.java
 Key: HIVE-20952
 URL: https://issues.apache.org/jira/browse/HIVE-20952
 Project: Hive
  Issue Type: Improvement
Reporter: slim bouguerra
Assignee: slim bouguerra






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20951) LLAP: Set Xms to 50% always

2018-11-20 Thread Gopal V (JIRA)
Gopal V created HIVE-20951:
--

 Summary: LLAP: Set Xms to 50% always 
 Key: HIVE-20951
 URL: https://issues.apache.org/jira/browse/HIVE-20951
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 3.1.1, 4.0.0
Reporter: Gopal V


The lack of GC pauses is killing LLAP containers whenever a significant
amount of memory is consumed by off-heap structures, which aren't cleaned up
automatically until the GC runs.

There's a java.nio.DirectByteBuffer.Deallocator which runs when the direct
buffers are garbage collected and actually does the cleanup of the underlying
off-heap memory.

The lack of garbage collection activity for several hours while responding to
queries triggers a build-up of these off-heap structures, which ends up
forcing YARN to kill the process instead.

It is better to hit a GC pause occasionally rather than to lose a node every 
few hours.
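
A small sketch of the mechanism described above (sizes and sleep are
illustrative only):

    import java.nio.ByteBuffer;

    public class DirectBufferGcDemo {
      public static void main(String[] args) throws InterruptedException {
        // Reserves 256 MB of off-heap (native) memory.
        ByteBuffer buf = ByteBuffer.allocateDirect(256 * 1024 * 1024);
        buf = null; // unreachable now, but the native memory is still held...
        System.gc(); // ...until a GC runs and java.nio's Cleaner releases it
        Thread.sleep(100); // give the reference-handling thread a moment
      }
    }

With a large Xms and little heap churn that GC may not happen for hours,
which is exactly the build-up described above.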



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Why are TXN IDs not partitioned per database?

2018-11-20 Thread Granville Barnett
Thanks Alan.

On Tue, 20 Nov 2018, 17:23 Alan Gates wrote:
> History.  Originally there were only transaction ids, which were global.
> Write ids for tables came later as a way to limit the amount of information
> each transaction needed to track and to make it easier to replicate table
> changes between Hive instances.
>
> But even if we had put them in from the start, we'd have them span
> databases, otherwise transactions couldn't span databases.  Hive has no
> restrictions on queries spanning databases so we wouldn't want to restrict
> transactions from doing so.
>
> Alan.
>
> On Tue, Nov 20, 2018 at 7:32 AM Granville Barnett <
> granvillebarn...@gmail.com> wrote:
>
> > Hi,
> >
> > Reading the source code of Hive 3.x and I have a question regarding
> > transaction IDs which form the span of a transaction: its begin (TXN ID)
> > and commit ID (NEXT_TXN_ID at time of commit).
> >
> > Why is it that we have a global timeline for transactions rather than a
> > timeline partitioned at the granularity of a database, kind of similar to
> > how write IDs are partitioned per table but at the database scope?
> >
> > E.g.,
> >
> > NEXT_TXN_ID
> > +-------+-----------+
> > | DB    | NTXN_NEXT |
> > +-------+-----------+
> > | test1 | 23        |
> > | test2 | 4         |
> > +-------+-----------+
> >
> > Same question could also be applied to NEXT_LOCK_ID.
> >
> > I am just curious because it seems like partitioning the transaction (and
> > lock IDs) would reduce the granularity of locking in the various
> > transactional methods. For example, openTxn invocations are mutexed with
> > all other openTxn invocations even if they are for transactions running in
> > distinct database domains.  Similarly for openTxn mutexing with respect to
> > commitTxn if there is a write-write conflict, which I would have thought
> > would only be the case if they are applicable to the same database. I'm
> > sure that this would have the side effect of increasing the complexity of
> > other subsystems but I had to ask what the rationale was behind this.
> >
> > (I'm new to Hive, so please forgive me if the answer is obvious.)
> >
> > Regards,
> >
> > Granville
> >
>


[jira] [Created] (HIVE-20950) Transform OUTER join with condition always true into INNER join

2018-11-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-20950:
--

 Summary: Transform OUTER join with condition always true into 
INNER join
 Key: HIVE-20950
 URL: https://issues.apache.org/jira/browse/HIVE-20950
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Reporter: Jesus Camacho Rodriguez


For instance, it may help the join reordering algorithm.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20949) Improve PKFK cardinality estimation in physical planning

2018-11-20 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-20949:
--

 Summary: Improve PKFK cardinality estimation in physical planning
 Key: HIVE-20949
 URL: https://issues.apache.org/jira/browse/HIVE-20949
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Missing case for cartesian product and full outer joins.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Disabling Basic Stats on a Per-Session Basis

2018-11-20 Thread Morio Ramdenbourg
Hi all,

I've been looking for a particular commit that added the ability to disable
basic HMS statistics (hive.stats.autogather) on a per-session basis via the
environment context. The change should have included some way for the
environment context to request disabling this. Does anyone know which commit
this is?

Thanks,
Morio Ramdenbourg


[jira] [Created] (HIVE-20948) Eliminate file rename in compactor

2018-11-20 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20948:
-

 Summary: Eliminate file rename in compactor
 Key: HIVE-20948
 URL: https://issues.apache.org/jira/browse/HIVE-20948
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


Once HIVE-20823 is committed, we should investigate whether it's possible to
have the compactor write directly to base_x_cZ or delta_x_y_cZ.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Why are TXN IDs not partitioned per database?

2018-11-20 Thread Alan Gates
History.  Originally there were only transaction ids, which were global.
Write ids for tables came later as a way to limit the amount of information
each transaction needed to track and to make it easier to replicate table
changes between Hive instances.

But even if we had put them in from the start, we'd have them span
databases, otherwise transactions couldn't span databases.  Hive has no
restrictions on queries spanning databases so we wouldn't want to restrict
transactions from doing so.

Alan.

On Tue, Nov 20, 2018 at 7:32 AM Granville Barnett <
granvillebarn...@gmail.com> wrote:

> Hi,
>
> Reading the source code of Hive 3.x and I have a question regarding
> transaction IDs which form the span of a transaction: its begin (TXN ID)
> and commit ID (NEXT_TXN_ID at time of commit).
>
> Why is it that we have a global timeline for transactions rather than a
> timeline partitioned at the granularity of a database, kind of similar to
> how write IDs are partitioned per table but at the database scope?
>
> E.g.,
>
> NEXT_TXN_ID
> +-------+-----------+
> | DB    | NTXN_NEXT |
> +-------+-----------+
> | test1 | 23        |
> | test2 | 4         |
> +-------+-----------+
>
> Same question could also be applied to NEXT_LOCK_ID.
>
> I am just curious because it seems like partitioning the transaction (and
> lock IDs) would reduce the granularity of locking in the various
> transactional methods. For example, openTxn invocations are mutexed with
> all other openTxn invocations even if they are for transactions running in
> distinct database domains.  Similarly for openTxn mutexing with respect to
> commitTxn if there is a write-write conflict, which I would have thought
> would only be the case if they are applicable to the same database. I'm
> sure that this would have the side effect of increasing the complexity of
> other subsystems but I had to ask what the rationale was behind this.
>
> (I'm new to Hive, so please forgive me if the answer is obvious.)
>
> Regards,
>
> Granville
>


[jira] [Created] (HIVE-20947) Add User Agent String to Hive Client API

2018-11-20 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-20947:
--

 Summary: Add User Agent String to Hive Client API
 Key: HIVE-20947
 URL: https://issues.apache.org/jira/browse/HIVE-20947
 Project: Hive
  Issue Type: New Feature
  Components: Clients, Diagnosability, JDBC, ODBC
Affects Versions: 4.0.0
Reporter: BELUGA BEHR


Allow users to specify a user agent string as part of their JDBC/ODBC 
connection string and print the information in the HS2 logs. This will give
us the opportunity to identify misbehaving clients.

Variable: {{userAgent}}

https://en.wikipedia.org/wiki/User_agent#Format_for_human-operated_web_browsers
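
A hedged sketch of what a connection string using the proposed variable might
look like (userAgent is the proposal here, not an existing Hive JDBC option):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class UserAgentExample {
      public static void main(String[] args) throws Exception {
        // Hypothetical: HS2 would log the userAgent value so misbehaving
        // clients can be identified in the HS2 logs.
        String url =
            "jdbc:hive2://hs2-host:10000/default;userAgent=reporting-app/2.1";
        try (Connection conn = DriverManager.getConnection(url, "hive", "")) {
          // run queries as usual
        }
      }
    }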



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #489: [SequenceFile]: java.lang.ClassCastException: org.ap...

2018-11-20 Thread kautshukla
GitHub user kautshukla opened a pull request:

https://github.com/apache/hive/pull/489

[SequenceFile]: java.lang.ClassCastException: 
org.apache.hadoop.io.ByteWritable cannot be cast to org.apache.hadoop.io.Text

Check the writable instance and then create the Text object.
A sequence file written with key org.apache.hadoop.io.LongWritable and value
org.apache.hadoop.io.BytesWritable throws **java.lang.ClassCastException:**
**org.apache.hadoop.io.ByteWritable** cannot be cast to
**org.apache.hadoop.io.Text.**
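
A minimal sketch of the instance check (class and method names here are
illustrative, not the actual patch):

    import java.util.Arrays;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    final class WritableToText {
      static Text toText(Writable value) {
        if (value instanceof Text) {
          return (Text) value; // the only case where a blind cast was safe
        }
        if (value instanceof BytesWritable) {
          BytesWritable bw = (BytesWritable) value;
          // copy only the valid bytes; the backing array may be padded
          return new Text(Arrays.copyOf(bw.getBytes(), bw.getLength()));
        }
        return new Text(value.toString()); // fallback for other writables
      }
    }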

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kautshukla/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/489.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #489


commit 4b3eb7d8cae26d5ea08fd57fc1199102086da0bf
Author: kautshukla 
Date:   2018-11-20T15:37:33Z

{SequenceFileFormat} Instance check to handle cast exception from 
bytewritable to text




---


Why are TXN IDs not partitioned per database?

2018-11-20 Thread Granville Barnett
Hi,

Reading the source code of Hive 3.x and I have a question regarding
transaction IDs which form the span of a transaction: its begin (TXN ID)
and commit ID (NEXT_TXN_ID at time of commit).

Why is it that we have a global timeline for transactions rather than a
timeline partitioned at the granularity of a database, kind of similar to
how write IDs are partitioned per table but at the database scope?

E.g.,

NEXT_TXN_ID
+-------+-----------+
| DB    | NTXN_NEXT |
+-------+-----------+
| test1 | 23        |
| test2 | 4         |
+-------+-----------+

Same question could also be applied to NEXT_LOCK_ID.

I am just curious because it seems like partitioning the transaction (and
lock IDs) would reduce the granularity of locking in the various
transactional methods. For example, openTxn invocations are mutexed with
all other openTxn invocations even if they are for transactions running in
distinct database domains.  Similarly for openTxn mutexing with respect to
commitTxn if there is a write-write conflict, which I would have thought
would only be the case if they are applicable to the same database. I'm
sure that this would have the side effect of increasing the complexity of
other subsystems but I had to ask what the rationale was behind this.

(I'm new to Hive, so please forgive me if the answer is obvious.)

Regards,

Granville


Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-20 Thread Nandor Kollar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210718
---


Ship it!




Ship It!

- Nandor Kollar


On Nov. 20, 2018, 12:53 p.m., Adam Szita wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69410/
> ---
> 
> (Updated Nov. 20, 2018, 12:53 p.m.)
> 
> 
> Review request for hive, Nandor Kollar and Peter Vary.
> 
> 
> Bugs: HIVE-20330
> https://issues.apache.org/jira/browse/HIVE-20330
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The change in this patch is that we're not just serializing and putting one 
> InputJobInfo into JobConf, but rather always appending to a list (or creating it 
> on the first occurrence) of InputJobInfo instances in it.
> This ensures that if multiple tables serve as inputs in a job, Pig can 
> retrieve information for each of the tables, not just the last one added.
> 
> I've also discovered a bug in InputJobInfo.writeObject() where the 
> ObjectOutputStream was closed by mistake after writing partition information 
> in a compressed manner. Closing the compressed writer inevitably closed the 
> OOS on the context and prevented any other objects from being written into the OOS - I 
> had to fix that because it prevented serializing InputJobInfo instances 
> inside a list.
> 
> 
> Diffs
> -
> 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
> 8e72a1275a5cdcc2d778080fff6bb82198395f5f 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  195eaa367933990e3ef0ef879f34049c65822aee 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
>  8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
>  ad6f3eb9f93338023863c6239d6af0449b20ff9c 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
>  364382d9ccf6eb9fc29689b0eb5f973f422051b4 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
>  ac1dd54be821d32aa008d41514df05a41f16223c 
>   
> hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
> 91aa4fa2693e0b0bd65c1667210af340619f552d 
>   
> hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
>  c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
>   
> hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
>  58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 
> 
> 
> Diff: https://reviews.apache.org/r/69410/diff/1/
> 
> 
> Testing
> ---
> 
> Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
> instances to/from config instances.
> Added (integration-like) unit tests to mock Pig calling HCatLoader for 
> multiple input tables, and checking the reported input sizes.
> 
> 
> Thanks,
> 
> Adam Szita
> 
>



Re: Review Request 69404: HIVE-20932 adding vectorize wrapper for Druid Storage Handler.

2018-11-20 Thread Slim Bouguerra


> On Nov. 20, 2018, 7:57 a.m., Nishant Bangarwa wrote:
> > checkstyle/checkstyle.xml
> > Lines 166 (patched)
> > 
> >
> > please break down this into 2 separate patches.
> > The main benefits of separating this would be easier reviews, and
> > backporting the changes to other branches would be easier and lead to
> > fewer conflicts.

For the review I mentioned the classes that need to be looked at. All the
rest is a checkstyle fix. Not sure why it would be hard to backport, since the
follow-up will be on the master branch anyway and other changes will follow.
I tend to minimize the number of patches to avoid huge wait times on the test
queue. My goal is to remove all the style issues; as you can see, the count
is now 0.
BTW, I noticed that your license header is not the one used by Hive; can you
please fix it?


- Slim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69404/#review210701
---


On Nov. 20, 2018, 2:08 a.m., Slim Bouguerra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69404/
> ---
> 
> (Updated Nov. 20, 2018, 2:08 a.m.)
> 
> 
> Review request for hive, Gopal V and Teddy Choi.
> 
> 
> Bugs: HIVE-20932
> https://issues.apache.org/jira/browse/HIVE-20932
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-20932
> Note for reviewing 
> most important changes are :
> 
> - 
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidVectorizedWrapper.java
> - org.apache.hadoop.hive.druid.serde.DruidSerDe#deserializeAsPrimitive
> - org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat#getRecordReader
> 
> 
> Diffs
> -
> 
>   checkstyle/checkstyle.xml 12e166311b 
>   data/files/datasets/druid_table_alltypesorc/load.hive.sql 5fde266a01 
>   data/scripts/q_test_cleanup.sql 1c59381aa0 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/DruidKafkaUtils.java 
> e0e29a3c6d 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java 
> 7434559532 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandlerUtils.java
>  6dc97d53b7 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/conf/DruidConstants.java 
> 242f7be4dd 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/conf/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidQueryBasedInputFormat.java
>  c1e0e75f98 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidRecordWriter.java 
> 65edc665a3 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidVectorizedWrapper.java
>  PRE-CREATION 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/io/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/AvroBytesDecoder.java
>  3a1dbf7229 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/json/AvroParseSpec.java 
> af71f9a732 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/AvroStreamInputRowParser.java
>  d6e6624669 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/InlineSchemaAvroBytesDecoder.java
>  72d6cbbc1e 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/KafkaSupervisorIOConfig.java
>  c1b3bf8d41 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/TaskReportData.java 
> 9ecba1b18c 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/json/package-info.java 
> PRE-CREATION 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/security/package-info.java
>  PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidQueryRecordReader.java
>  53d74417f8 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidSerDe.java 
> 516faf0814 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/serde/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/DerbyConnectorTestUtility.java
>  bf42a74f0f 
>   druid-handler/src/test/org/apache/hadoop/hive/druid/QTestDruidSerDe.java 
> 099e5b3357 
>   druid-handler/src/test/org/apache/hadoop/hive/druid/QTestDruidSerDe2.java 
> f52e721763 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/TestDruidStorageHandler.java
>  0cb3c237b1 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/TestHiveDruidQueryBasedInputFormat.java
>  513119ea32 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/io/TestHiveDruidSplit.java
>  234c783d25 
>   

Re: Review Request 69404: HIVE-20932 adding vectorize wrapper for Druid Storage Handler.

2018-11-20 Thread Slim Bouguerra


> On Nov. 20, 2018, 7:54 a.m., Nishant Bangarwa wrote:
> > data/files/datasets/druid_table_alltypesorc/load.hive.sql
> > Line 36 (original)
> > 
> >
> > why not drop this table?

It is dropped later; this allows us to add tests against the ORC table to
cross-check results when we're not sure.


> On Nov. 20, 2018, 7:54 a.m., Nishant Bangarwa wrote:
> > druid-handler/src/java/org/apache/hadoop/hive/druid/DruidKafkaUtils.java
> > Line 188 (original), 183 (patched)
> > 
> >
> > why this change ?

More Java-idiomatic. In fact, it would be better to have an enum ...


- Slim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69404/#review210699
---


On Nov. 20, 2018, 2:08 a.m., Slim Bouguerra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69404/
> ---
> 
> (Updated Nov. 20, 2018, 2:08 a.m.)
> 
> 
> Review request for hive, Gopal V and Teddy Choi.
> 
> 
> Bugs: HIVE-20932
> https://issues.apache.org/jira/browse/HIVE-20932
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-20932
> Note for reviewing 
> most important changes are :
> 
> - 
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidVectorizedWrapper.java
> - org.apache.hadoop.hive.druid.serde.DruidSerDe#deserializeAsPrimitive
> - org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat#getRecordReader
> 
> 
> Diffs
> -
> 
>   checkstyle/checkstyle.xml 12e166311b 
>   data/files/datasets/druid_table_alltypesorc/load.hive.sql 5fde266a01 
>   data/scripts/q_test_cleanup.sql 1c59381aa0 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/DruidKafkaUtils.java 
> e0e29a3c6d 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java 
> 7434559532 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandlerUtils.java
>  6dc97d53b7 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/conf/DruidConstants.java 
> 242f7be4dd 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/conf/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidQueryBasedInputFormat.java
>  c1e0e75f98 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidRecordWriter.java 
> 65edc665a3 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/io/DruidVectorizedWrapper.java
>  PRE-CREATION 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/io/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/AvroBytesDecoder.java
>  3a1dbf7229 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/json/AvroParseSpec.java 
> af71f9a732 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/AvroStreamInputRowParser.java
>  d6e6624669 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/InlineSchemaAvroBytesDecoder.java
>  72d6cbbc1e 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/KafkaSupervisorIOConfig.java
>  c1b3bf8d41 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/json/TaskReportData.java 
> 9ecba1b18c 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/json/package-info.java 
> PRE-CREATION 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/security/package-info.java
>  PRE-CREATION 
>   
> druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidQueryRecordReader.java
>  53d74417f8 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidSerDe.java 
> 516faf0814 
>   druid-handler/src/java/org/apache/hadoop/hive/druid/serde/package-info.java 
> PRE-CREATION 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/DerbyConnectorTestUtility.java
>  bf42a74f0f 
>   druid-handler/src/test/org/apache/hadoop/hive/druid/QTestDruidSerDe.java 
> 099e5b3357 
>   druid-handler/src/test/org/apache/hadoop/hive/druid/QTestDruidSerDe2.java 
> f52e721763 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/TestDruidStorageHandler.java
>  0cb3c237b1 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/TestHiveDruidQueryBasedInputFormat.java
>  513119ea32 
>   
> druid-handler/src/test/org/apache/hadoop/hive/druid/io/TestHiveDruidSplit.java
>  234c783d25 
>   druid-handler/src/test/org/apache/hadoop/hive/druid/io/package-info.java 
> PRE-CREATION 
>   druid-handler/src/test/org/apache/hadoop/hive/druid/package-info.java 
> PRE-CREATION 
>   
> 

Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-20 Thread Adam Szita via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/
---

Review request for hive, Nandor Kollar and Peter Vary.


Bugs: HIVE-20330
https://issues.apache.org/jira/browse/HIVE-20330


Repository: hive-git


Description
---

The change in this patch is that we're not just serializing and putting one 
InputJobInfo into JobConf, but rather always appending to a list (or creating it on 
the first occurrence) of InputJobInfo instances in it.
This ensures that if multiple tables serve as inputs in a job, Pig can retrieve 
information for each of the tables, not just the last one added.

I've also discovered a bug in InputJobInfo.writeObject() where the 
ObjectOutputStream was closed by mistake after writing partition information in 
a compressed manner. Closing the compressed writer inevitably closed the OOS on 
the context and prevented any other objects from being written into the OOS - I had to 
fix that because it prevented serializing InputJobInfo instances inside a list.
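
A hedged sketch of the append-instead-of-overwrite idea (the conf key and
helper names are hypothetical, not the actual HCatUtil API):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;

    final class InputJobInfoConf {
      private static final String KEY = "hcat.input.job.info.list"; // hypothetical

      // Append rather than overwrite, so a job reading several tables keeps
      // one serialized InputJobInfo per table instead of only the last one.
      // Base64 payloads contain no commas, so a comma-joined list is safe.
      static void addSerializedInfo(Configuration conf, String base64Info) {
        String existing = conf.get(KEY);
        conf.set(KEY, existing == null ? base64Info : existing + "," + base64Info);
      }

      static List<String> getSerializedInfos(Configuration conf) {
        String raw = conf.get(KEY);
        return raw == null ? new ArrayList<>() : Arrays.asList(raw.split(","));
      }
    }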


Diffs
-

  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
8e72a1275a5cdcc2d778080fff6bb82198395f5f 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
 195eaa367933990e3ef0ef879f34049c65822aee 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
 8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
 ad6f3eb9f93338023863c6239d6af0449b20ff9c 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
 364382d9ccf6eb9fc29689b0eb5f973f422051b4 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
 ac1dd54be821d32aa008d41514df05a41f16223c 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
91aa4fa2693e0b0bd65c1667210af340619f552d 
  
hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
 c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
 58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 


Diff: https://reviews.apache.org/r/69410/diff/1/


Testing
---

Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
instances to/from config instances.
Added (integration-like) unit tests to mock Pig calling HCatLoader for multiple 
input tables, and checking the reported input sizes.


Thanks,

Adam Szita



[jira] [Created] (HIVE-20946) Standardised HMS Audit log formats and complete metadata

2018-11-20 Thread Dharak Kharod (JIRA)
Dharak Kharod created HIVE-20946:


 Summary: Standardised HMS Audit log formats and complete metadata
 Key: HIVE-20946
 URL: https://issues.apache.org/jira/browse/HIVE-20946
 Project: Hive
  Issue Type: Improvement
Reporter: Dharak Kharod
Assignee: Dharak Kharod


Currently the HMS audit log has the following inconsistencies:
 * Inconsistent format across methods
 * Metadata in some methods is not complete
 * Logs for some methods are missing (e.g. grant/revoke privileges, create/drop 
role)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20945) Setting more than 2 confs does not work although whitelist is set in hiveserver2, in Hue or Zeppelin, ...

2018-11-20 Thread sungryoungkim (JIRA)
sungryoungkim created HIVE-20945:


 Summary: Setting more than 2 confs does not work although whitelist 
is set in hiveserver2, in Hue or Zeppelin, ...
 Key: HIVE-20945
 URL: https://issues.apache.org/jira/browse/HIVE-20945
 Project: Hive
  Issue Type: Bug
Reporter: sungryoungkim


I use Hive 2.3.2 with Hue and Zeppelin,

and I tried the statements below.

    set something1=1;
    set something2=3;
    set something3=4;

It works well in beeline, but not in Hue and Zeppelin.

The error message is: Cannot modify set something2 at runtime. It is not in 
list of params that are allowed to be modified at runtime

However, I have already added it to the whitelist.

After debugging the Hive code, I think this code in 
org/apache/hive/service/cli/operation/HiveCommandOperation.java is the 
problem:

    Line 111
    String command = getStatement().trim();
    String[] tokens = statement.split("\\s");
    String commandArgs = command.substring(tokens[0].length()).trim();

Note that tokens is split from statement while commandArgs is taken from 
command. I think it should be:

    String[] tokens = command.split("\\s");

I tested it, and it works very well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)