[jira] [Created] (HIVE-19452) Avoid Deserializing and Serializing Druid query in DruidRecordReaders

2018-05-07 Thread Nishant Bangarwa (JIRA)
Nishant Bangarwa created HIVE-19452:
---

 Summary: Avoid Deserializing and Serializing Druid query in 
DruidRecordReaders
 Key: HIVE-19452
 URL: https://issues.apache.org/jira/browse/HIVE-19452
 Project: Hive
  Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa


The Druid record reader deserializes and re-serializes the Druid query before sending 
it to Druid. 
This round trip can be avoided, which would also let us stop packaging some Druid 
dependencies (e.g. org.antlr) in the druid-handler self-contained jar. 
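The change is conceptually simple: the record reader already receives the query as JSON, and Druid consumes JSON, so the reader can forward the payload as an opaque string instead of materializing `Query` objects (which pulls Druid's expression parser, hence org.antlr, onto the classpath). A minimal sketch of the idea; the method and type names below are placeholders, not the actual Hive `DruidQueryRecordReader` or Druid client APIs:

```java
// Illustrative only: names here are placeholders, not the real Hive/Druid APIs.
public class DruidQueryPassThrough {

    // Before (conceptually): parse the JSON into a Query object with Jackson,
    // then serialize it back to JSON to build the HTTP request body -- a pure
    // round trip that requires Druid's full class hierarchy (including
    // org.antlr) at runtime.
    //
    // After: treat the query as an opaque JSON string end to end.
    static String buildRequestBody(String queryJson) {
        return queryJson; // no deserialize/serialize round trip
    }

    public static void main(String[] args) {
        String query = "{\"queryType\":\"scan\",\"dataSource\":\"t\"}";
        System.out.println(buildRequestBody(query).equals(query)); // true
    }
}
```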




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19451) Druid Query Execution fails with ClassNotFoundException org.antlr.v4.runtime.CharStream

2018-05-07 Thread Nishant Bangarwa (JIRA)
Nishant Bangarwa created HIVE-19451:
---

 Summary: Druid Query Execution fails with ClassNotFoundException 
org.antlr.v4.runtime.CharStream
 Key: HIVE-19451
 URL: https://issues.apache.org/jira/browse/HIVE-19451
 Project: Hive
  Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa


Stack trace - 
{code}
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, 
vertexId=vertex_1524814504173_1344_45_00, diagnostics=[Task failed, 
taskId=task_1524814504173_1344_45_00_29, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_1524814504173_1344_45_00_29_0:java.lang.RuntimeException: 
java.io.IOException: 
org.apache.hive.druid.com.fasterxml.jackson.databind.exc.InvalidDefinitionException:
 Cannot construct instance of 
`org.apache.hive.druid.io.druid.segment.virtual.ExpressionVirtualColumn`, 
problem: org/antlr/v4/runtime/CharStream
 at [Source: 
(String)"{"queryType":"scan","dataSource":{"type":"table","name":"tpcds_real_bin_partitioned_orc_1000.tpcds_denormalized_druid_table_7mcd"},"intervals":{"type":"segments","segments":[{"itvl":"1998-11-30T00:00:00.000Z/1998-12-01T00:00:00.000Z","ver":"2018-05-03T11:35:22.230Z","part":0}]},"virtualColumns":[{"type":"expression","name":"vc","expression":"\"__time\"","outputType":"LONG"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"filter":{"type":"bound","dimension":"i_brand"[truncated
 241 chars]; line: 1, column: 376] (through reference chain: 
org.apache.hive.druid.io.druid.query.scan.ScanQuery["virtualColumns"]->java.util.ArrayList[0])
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: 
org.apache.hive.druid.com.fasterxml.jackson.databind.exc.InvalidDefinitionException:
 Cannot construct instance of 
`org.apache.hive.druid.io.druid.segment.virtual.ExpressionVirtualColumn`, 
problem: org/antlr/v4/runtime/CharStream
 at [Source: 
(String)"{"queryType":"scan","dataSource":{"type":"table","name":"tpcds_real_bin_partitioned_orc_1000.tpcds_denormalized_druid_table_7mcd"},"intervals":{"type":"segments","segments":[{"itvl":"1998-11-30T00:00:00.000Z/1998-12-01T00:00:00.000Z","ver":"2018-05-03T11:35:22.230Z","part":0}]},"virtualColumns":[{"type":"expression","name":"vc","expression":"\"__time\"","outputType":"LONG"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"filter":{"type":"bound","dimension":"i_brand"[truncated
 241 chars]; line: 1, column: 376] (through reference chain: 
org.apache.hive.druid.io.druid.query.scan.ScanQuery["virtualColumns"]->java.util.ArrayList[0])
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:438)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
at 
org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
at 
org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
  

Re: [DISCUSS] Storage-API 2.6.1 release

2018-05-07 Thread Owen O'Malley
Implied in this is that we are putting features into a dot release. I would
argue that since storage-api 2.6.0 was just released last week and has only
been adopted by ORC, we can go ahead and sneak features into the bug fix
release. Does anyone have concerns with that plan?

.. Owen

On Mon, May 7, 2018 at 4:49 PM, Owen O'Malley 
wrote:

> Thanks, Deepak.
>
> The storage-api releases are global, not specific to a single branch.
> Please delete the storage-branch-2.6.1 branch, the changes that you need
> should go on the storage-branch-2.6.
>
> Please don't copy all of the patches from branch-3 into
> storage-branch-2.6. Just cherry-pick the ones that touched storage-api.
>
> % git log --format=oneline rel/storage-release-2.6.0..apache/branch-3
> storage-api/
>
> shows that we have 6 potential patches:
>
> 54651c783bef724713739e59b0d79672a60c6a2c HIVE-18910 : Migrate to Murmur
> hash for shuffle and bucketing
> cdd31fab6eb51e319df03bff880274dc66dd1c39 HIVE-18988: Support bootstrap
> replication of ACID tables
> ea18769f026429ea6ebbbd66858920ebf869a9d6 HIVE-19124 : implement a basic
> major compactor for MM tables
> a39b24660127ace7459ea1598b12d8add1f7b783 HIVE-19226: Extend storage-api
> to print timestamp values in UTC
> bbb8e27bc8b7c7592f45c1e5ffc2ca00b6db6820 HIVE-19226: Extend storage-api
> to print timestamp values in UTC
> 624e464a2cc4fe4dd9395edf8b377fd7323a299e HIVE-19126: CachedStore: Use
> memory estimation to limit cache size during prewarm
>
> so those are the ones to consider.
>
> .. Owen
>
> On Mon, May 7, 2018 at 4:19 PM, Deepak Jaiswal 
> wrote:
>
>> All,
>>
>> Branch 3 contains changes to migrate from the Java hash to Murmur hash
>> (HIVE-18910), which required the addition of a couple of APIs while keeping
>> backward compatibility. Since 2.6.0 is already released, I propose a new
>> branch-3-only release of storage-api.
>> Please let me know your thoughts on this. I am working on the release
>> candidate.
>>
>> Regards,
>> Deepak
>>
>
>


[jira] [Created] (HIVE-19450) OOM due to map join and backup task not invoked

2018-05-07 Thread zhuwei (JIRA)
zhuwei created HIVE-19450:
-

 Summary: OOM due to map join and backup task not invoked
 Key: HIVE-19450
 URL: https://issues.apache.org/jira/browse/HIVE-19450
 Project: Hive
  Issue Type: Bug
Reporter: zhuwei
Assignee: zhuwei


A map join task may cause OOM due to ORC compression; in most cases a backup 
task will be invoked. However, if the size of the hash table is close to the memory 
limit, the task that loads the hash table will NOT fail. OOM will instead happen in 
the next task, which does the local join. The load task has a backup, but the next 
task does not, so in this case the whole query fails.
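One way to frame the fix: if loading the hash table leaves too little headroom for the join that follows, the load task should fail early (so its backup task kicks in) rather than hand a nearly full heap to the join task, which has no backup. The sketch below illustrates that threshold check only; it is not Hive's actual map-join logic, and the safety fraction is an assumed value, not a Hive config:

```java
// Illustrative sketch -- not Hive's map-join code. Fail the hash-table load
// early when the table would leave too little heap headroom for the local
// join, so the backup (shuffle join) task is invoked instead of a later OOM.
public class MapJoinMemoryCheck {

    /**
     * @param hashTableBytes estimated in-memory size of the loaded hash table
     * @param maxHeapBytes   maximum heap available to the task
     * @param safetyFraction fraction of heap that must stay free for the join
     *                       (assumed value; Hive has its own thresholds)
     */
    static boolean shouldAbortLoad(long hashTableBytes, long maxHeapBytes,
                                   double safetyFraction) {
        long headroomNeeded = (long) (maxHeapBytes * safetyFraction);
        return hashTableBytes > maxHeapBytes - headroomNeeded;
    }

    public static void main(String[] args) {
        long maxHeap = 1L << 30; // 1 GiB heap
        // 900 MiB table "loads fine" but starves the join -> abort and back up.
        System.out.println(shouldAbortLoad(900L << 20, maxHeap, 0.25)); // true
        // 500 MiB table leaves plenty of room for the join.
        System.out.println(shouldAbortLoad(500L << 20, maxHeap, 0.25)); // false
    }
}
```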





[jira] [Created] (HIVE-19449) Create minimized uber jar for hive streaming module

2018-05-07 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-19449:


 Summary: Create minimized uber jar for hive streaming module
 Key: HIVE-19449
 URL: https://issues.apache.org/jira/browse/HIVE-19449
 Project: Hive
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 3.0.0, 3.1.0
Reporter: Prasanth Jayachandran


The Hive streaming API depends on several Hive modules (common, serde, ql, orc, 
standalone-metastore, etc.). Users of the API have to include all of these 
dependencies on the classpath for it to work correctly. Provide an uber jar with 
the minimal set of dependencies required to use the new streaming API. 





Re: Review Request 66862: HIVE-19258 add originals support to MM tables (and make the conversion a metadata only operation)

2018-05-07 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66862/
---

(Updated May 8, 2018, 1:10 a.m.)


Review request for hive and Thejas Nair.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 23a9c74a60 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
 7e17d5d888 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 3141a7e981 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 969c591917 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 183515a0ed 
  ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
58f0480059 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 655d10b643 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 2337a350e6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java 
b61a945d94 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
b698c84080 
  ql/src/test/queries/clientpositive/mm_conversions.q 55565a9428 
  ql/src/test/results/clientpositive/llap/mm_conversions.q.out 4754710291 


Diff: https://reviews.apache.org/r/66862/diff/3/

Changes: https://reviews.apache.org/r/66862/diff/2-3/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Created] (HIVE-19448) Vectorization: sysdb test doesn't work after enabling vectorization by default

2018-05-07 Thread Matt McCline (JIRA)
Matt McCline created HIVE-19448:
---

 Summary: Vectorization: sysdb test doesn't work after enabling 
vectorization by default
 Key: HIVE-19448
 URL: https://issues.apache.org/jira/browse/HIVE-19448
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


{noformat}
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
java.lang.Boolean
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaBooleanObjectInspector.getPrimitiveWritableObject(JavaBooleanObjectInspector.java:36)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:434)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:347)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:948){noformat}
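The root of the trace above is a plain ClassCastException: an object inspector that expects a `Boolean` is handed a `String`. A minimal standalone reproduction of that failure mode (this is JVM behavior, not Hive code; `get` stands in for `JavaBooleanObjectInspector.getPrimitiveWritableObject`):

```java
// Minimal reproduction of the failure mode in the stack trace above.
// JavaBooleanObjectInspector casts the raw object to Boolean, so a String
// slipping through the (de)serialization path blows up exactly like this.
public class BooleanCastRepro {

    // Stand-in for JavaBooleanObjectInspector.getPrimitiveWritableObject(o)
    static boolean get(Object o) {
        return (Boolean) o; // throws ClassCastException when o is a String
    }

    public static void main(String[] args) {
        System.out.println(get(Boolean.TRUE)); // fine: prints true
        try {
            get("true"); // serde produced a String instead of a Boolean
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught");
        }
    }
}
```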





Re: [DISCUSS] Storage-API 2.6.1 release

2018-05-07 Thread Deepak Jaiswal
Hi Owen,

Thanks for the help. I will work on this.

Regards,
Deepak

On 5/7/18, 4:49 PM, "Owen O'Malley"  wrote:

Thanks, Deepak.

The storage-api releases are global, not specific to a single branch.
Please delete the storage-branch-2.6.1 branch, the changes that you need
should go on the storage-branch-2.6.

Please don't copy all of the patches from branch-3 into storage-branch-2.6.
Just cherry-pick the ones that touched storage-api.

% git log --format=oneline rel/storage-release-2.6.0..apache/branch-3
storage-api/

shows that we have 6 potential patches:

54651c783bef724713739e59b0d79672a60c6a2c HIVE-18910 : Migrate to Murmur
hash for shuffle and bucketing
cdd31fab6eb51e319df03bff880274dc66dd1c39 HIVE-18988: Support bootstrap
replication of ACID tables
ea18769f026429ea6ebbbd66858920ebf869a9d6 HIVE-19124 : implement a basic
major compactor for MM tables
a39b24660127ace7459ea1598b12d8add1f7b783 HIVE-19226: Extend storage-api to
print timestamp values in UTC
bbb8e27bc8b7c7592f45c1e5ffc2ca00b6db6820 HIVE-19226: Extend storage-api to
print timestamp values in UTC
624e464a2cc4fe4dd9395edf8b377fd7323a299e HIVE-19126: CachedStore: Use
memory estimation to limit cache size during prewarm

so those are the ones to consider.

.. Owen

On Mon, May 7, 2018 at 4:19 PM, Deepak Jaiswal 
wrote:

> All,
>
> Branch 3 contains changes to migrate from the Java hash to Murmur hash
> (HIVE-18910), which required the addition of a couple of APIs while keeping
> backward compatibility. Since 2.6.0 is already released, I propose a new
> branch-3-only release of storage-api.
> Please let me know your thoughts on this. I am working on the release
> candidate.
>
> Regards,
> Deepak
>




Re: [DISCUSS] Storage-API 2.6.1 release

2018-05-07 Thread Owen O'Malley
Thanks, Deepak.

The storage-api releases are global, not specific to a single branch.
Please delete the storage-branch-2.6.1 branch, the changes that you need
should go on the storage-branch-2.6.

Please don't copy all of the patches from branch-3 into storage-branch-2.6.
Just cherry-pick the ones that touched storage-api.

% git log --format=oneline rel/storage-release-2.6.0..apache/branch-3
storage-api/

shows that we have 6 potential patches:

54651c783bef724713739e59b0d79672a60c6a2c HIVE-18910 : Migrate to Murmur
hash for shuffle and bucketing
cdd31fab6eb51e319df03bff880274dc66dd1c39 HIVE-18988: Support bootstrap
replication of ACID tables
ea18769f026429ea6ebbbd66858920ebf869a9d6 HIVE-19124 : implement a basic
major compactor for MM tables
a39b24660127ace7459ea1598b12d8add1f7b783 HIVE-19226: Extend storage-api to
print timestamp values in UTC
bbb8e27bc8b7c7592f45c1e5ffc2ca00b6db6820 HIVE-19226: Extend storage-api to
print timestamp values in UTC
624e464a2cc4fe4dd9395edf8b377fd7323a299e HIVE-19126: CachedStore: Use
memory estimation to limit cache size during prewarm

so those are the ones to consider.

.. Owen

On Mon, May 7, 2018 at 4:19 PM, Deepak Jaiswal 
wrote:

> All,
>
> Branch 3 contains changes to migrate from the Java hash to Murmur hash
> (HIVE-18910), which required the addition of a couple of APIs while keeping
> backward compatibility. Since 2.6.0 is already released, I propose a new
> branch-3-only release of storage-api.
> Please let me know your thoughts on this. I am working on the release
> candidate.
>
> Regards,
> Deepak
>


[DISCUSS] Storage-API 2.6.1 release

2018-05-07 Thread Deepak Jaiswal
All,

Branch 3 contains changes to migrate from the Java hash to Murmur hash (HIVE-18910), 
which required the addition of a couple of APIs while keeping backward compatibility. 
Since 2.6.0 is already released, I propose a new branch-3-only release of 
storage-api.
Please let me know your thoughts on this. I am working on the release candidate.

Regards,
Deepak


[jira] [Created] (HIVE-19447) BucketizedHiveInputFormat doesn't account for ACID

2018-05-07 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19447:
---

 Summary: BucketizedHiveInputFormat doesn't account for ACID
 Key: HIVE-19447
 URL: https://issues.apache.org/jira/browse/HIVE-19447
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


See the TODO added in HIVE-19312.
BucketizedHiveInputFormat doesn't account for MM tables and can apparently be used 
for them, producing incorrect results.

I'm not sure whether it can be used for ACID tables; we need to either fix it for 
ACID (w.r.t. the ACID-related logic in HIF) or perhaps add a negative test where, 
for the same query, it is used for a non-ACID table but not for an ACID table. 
The mm_bhif test has a simple example query (count distinct, iirc).





[jira] [Created] (HIVE-19446) QueryCache: Transaction lists needed for pending cache entries

2018-05-07 Thread Gopal V (JIRA)
Gopal V created HIVE-19446:
--

 Summary: QueryCache: Transaction lists needed for pending cache 
entries
 Key: HIVE-19446
 URL: https://issues.apache.org/jira/browse/HIVE-19446
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


The Hive query cache needs a transactional list even when the entry is in the 
pending state, so that other identical queries with the same transactional state 
can wait for the first query to complete instead of triggering their own instance.
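The coordination described here can be sketched with a map of pending futures keyed by query text plus transaction list: the first arrival installs a pending entry and executes; identical later arrivals wait on that entry's future instead of recomputing. This is an illustration of the pattern only, not Hive's query-cache code; the key shape and result type are placeholders:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of "wait on a pending cache entry" coordination (not Hive's actual
// query-cache implementation). The cache key must include the transactional
// state: two textually identical queries over different transaction snapshots
// must NOT share a result.
public class PendingQueryCache {
    private final ConcurrentHashMap<String, CompletableFuture<String>> cache =
            new ConcurrentHashMap<>();

    /** key = query text + transaction list; execute = the actual query run. */
    public String lookupOrCompute(String key, Supplier<String> execute) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = cache.putIfAbsent(key, mine);
        if (existing != null) {
            // Entry already present (pending or complete): wait for it
            // instead of triggering our own execution.
            return existing.join();
        }
        mine.complete(execute.get()); // first arrival does the work
        return mine.join();
    }

    public static void main(String[] args) {
        PendingQueryCache cache = new PendingQueryCache();
        int[] executions = {0};
        Supplier<String> run = () -> { executions[0]++; return "result"; };
        String key = "q1;txns=[42,43]"; // hypothetical key shape
        System.out.println(cache.lookupOrCompute(key, run)); // result
        System.out.println(cache.lookupOrCompute(key, run)); // result (cached)
        System.out.println(executions[0]);                   // 1
    }
}
```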





[jira] [Created] (HIVE-19445) Graceful handling of "close" in WritableByteChannelAdapter

2018-05-07 Thread Eric Wohlstadter (JIRA)
Eric Wohlstadter created HIVE-19445:
---

 Summary: Graceful handling of "close" in WritableByteChannelAdapter
 Key: HIVE-19445
 URL: https://issues.apache.org/jira/browse/HIVE-19445
 Project: Hive
  Issue Type: Bug
Reporter: Eric Wohlstadter


org.apache.hadoop.hive.llap.WritableByteChannelAdapter
{quote}"I see now that the writeListener could be implemented in such a way as 
to propagate a write error back to the writer (so we can possibly throw an 
exception and fail the current operation rather than just log and ignore the 
error). Plus on close I'm wondering if it is better just to wait for the close 
future to complete so we can check the status."
{quote}
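The suggestion in the quote is to block on the close future and surface its status, rather than log and ignore a failure. A generic sketch of that pattern, with `CompletableFuture` standing in for the actual channel close future; this is not the `WritableByteChannelAdapter` code:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Sketch of the "graceful close" idea: wait for the close future to complete
// and convert a failure into an IOException for the caller, instead of
// swallowing it. CompletableFuture stands in for the real close future.
public class GracefulClose {

    static void close(CompletableFuture<Void> closeFuture) throws IOException {
        try {
            closeFuture.get(); // wait for the close, then check its status
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while closing", e);
        } catch (ExecutionException e) {
            // Propagate the write/close error instead of just logging it.
            throw new IOException("channel close failed", e.getCause());
        }
    }

    public static void main(String[] args) throws IOException {
        close(CompletableFuture.completedFuture(null)); // clean close: no-op
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("write error"));
        try {
            close(failed);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage()); // caught: channel close failed
        }
    }
}
```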





Re: Apache Hive 3.0.0 release preparation

2018-05-07 Thread Deepak Jaiswal
Owen, thanks for bringing this to our attention. It was definitely me who made those 
changes.
The good thing is that the change is backward compatible, and the new API is needed 
to keep vectorized mode performant while using Murmur hash.
We can treat this as a bug fix, and I can work on a new point release.

Regards,
Deepak

On 5/7/18, 1:08 PM, "Owen O'Malley"  wrote:

*Sigh* It looks like branch-3 requires features in storage-api that were
added after storage-api 2.6.0.

You'll need to make another storage-api release or revert those changes.

.. Owen

On Mon, May 7, 2018 at 11:58 AM, Vihang Karajgaonkar 
wrote:

> How do we handle the cases where a patch needs to go in branch-2? I think
> the reason the commits are going in branch-3 is that you don't want to
> have an intermediate version where a patch is missing. E.g.: a patch is fixed
> in Hive 2.4.0 and Hive 3.1.0 but not Hive 3.0.0. Is that normal?
>
> On Mon, May 7, 2018 at 11:34 AM, Vineet Garg 
> wrote:
>
> > Hello all,
> >
> > It’s been more than a month since we have cutoff branch-3. There are
> tests
> > which are still failing consistently(https://issues.
> > apache.org/jira/browse/HIVE-19142).  At this point we should work on
> > fixing tests and stabilize the branch. So please refrain from committing
> > anything but test fixes to branch-3.
> > If you have a patch beside test fix which you would like to get into
> > branch-3 please talk to me before committing.
> >
> > Also if you are assigned a JIRA for fixing test in branch-3 please fix 
it
> > as soon as you can.
> >
> > Thanks,
> > Vineet Garg
> >
> > On Apr 13, 2018, at 12:36 PM, Vihang Karajgaonkar  > > wrote:
> >
> > Hi Vineet,
> >
> > I created a profile on ptest-server so that tests can be run on 
branch-3.
> > It is the same as branch-2 patches. You will need to include branch-3 in
> > the patch name. Eg. HIVE-1234.01-branch-3.patch
> >
> > -Vihang
> >
> >
> >
> > On Mon, Apr 9, 2018 at 4:35 PM, Vineet Garg   > vg...@hortonworks.com>> wrote:
> >
> > I have created an umbrella jira to investigate and fix test failures for
> > hive 3.0.0. LINK : https://issues.apache.org/jira/browse/HIVE-19142.
> > Please link any other existing jira related to test failure with this
> > umbrella jira.
> >
> > Also, how do we run tests on branch-3? Is there some setup to be done?
> >
> > -Vineet
> >
> > On Apr 9, 2018, at 4:26 AM, Zoltan Haindrich  <
> > mailto:zhaindr...@hortonworks.com><
> > mailto:zhaindr...@hortonworks.com>> wrote:
> >
> > Hello
> >
> > A few weeks earlier I've tried to hunt down this problem...
> > so...to my best knowledge the cause of this seems to be the following:
> >
> > * in some cases the "cleanup" after a failed query may somehow leave 
some
> > threads behind...
> > * these threads have reference to the "customized" session classloader -
> > this makes the threads more memory hungry
> > * after a while these threads/classloaders eat up the heap...
> >
> > I've opened HIVE-18522 for this thread issue
> >
> > I think this problem is not new ...and it might have been present 
earlier
> > as well...the only thing what changed is that there were a few more new
> > features which have added new udfs/etc which made the memory cost of a
> > session more heavier..
> > ...and as a sidenote: I'm not convinced that this issue will arise in a
> > proper hs2 setup - as it might be easily connected to the fact that 
these
> > tests are using the cli driver to execute the tests.
> >
> >
> > cheers,
> > Zoltan
> >
> > On 7 Apr 2018 7:15 p.m., Ashutosh Chauhan  ashut...@apache.org> > ashut...@apache.org>> wrote:
> > We need to investigate and find out root cause of these failures. If its
> > determined that its a corner case and fix is non-trivial then we may
> > release note it under known issues. But ideally we should fix these
> > failures.
> > Cutting a branch should make it easier since branch is expected to
> receive
> > lot less commits as compared to master so it should be faster to
> stabilize
> > branch.
> >
> > On Fri, Apr 6, 2018 at 10:49 AM, Eugene Koifman <
> ekoif...@hortonworks.com<
> > mailto:ekoif...@hortonworks.com><
> > mailto:ekoif...@hortonworks.com>>
> > wrote:
> >
> > Cutting the branch before the tests are stabilized would mean we have to
> > fix them in 2 places.
> >
> > On 4/6/18, 10:05 AM, "Thejas Nair" 

Re: Integrating Yetus with Precommit job

2018-05-07 Thread Sahil Takiar
The FindBugs plugin for Yetus is now working. Yetus will give a -1 if it
finds any FindBugs warning in your patch. It gives a 0 for any patch
applied to a module that contains existing FindBugs warnings (e.g. ql has
2318 existing FindBugs issues).

On Mon, Nov 27, 2017 at 8:57 AM, Andrew Sherman 
wrote:

> Thanks, this is going to be useful
>
> On Wed, Nov 22, 2017 at 11:28 AM, Vineet Garg 
> wrote:
>
> > Thanks Adam!
> >
> > > On Nov 22, 2017, at 5:46 AM, Adam Szita  wrote:
> > >
> > > This is now done. Patch is committed and we deployed the new war file
> to
> > > the ptest server.
> > >
> > > Jobs that were waiting in queue at the time of ptest server restart
> have
> > > been retriggered in Jenkins.
> > >
> > > I hope this change will contribute to the overall code quality of Hive
> in
> > > our future patches to come :)
> > >
> > > On 21 November 2017 at 17:39, Adam Szita  wrote:
> > >
> > >> Hi,
> > >>
> > >> In the last days all prerequisites have been resolved for this:
> > >> -ASF headers are fixed
> > >> -checkstyle is upgraded to support Java8
> > >> -proper checkstyle configuration has been introduced to poms that are
> > >> disconnected from Hive's root pom
> > >>
> > >> Thanks Alan for reviewing these.
> > >>
> > >> Therefore we plan to move ahead with this tomorrow around 10:00AM CET,
> > do
> > >> the commit with Peter Vary and replace the war file among ptest
> servers
> > >> Tomcat webapps.
> > >>
> > >> Thanks,
> > >> Adam
> > >>
> > >> On 7 November 2017 at 18:42, Alan Gates  wrote:
> > >>
> > >>> I’ve put some feedback in HIVE-17995.  17996 and 17997 look good.
> I’ll
> > >>> commit them once the tests run.
> > >>>
> > >>> I think you’ll need to do similar patches for storage-api, as it is
> > also
> > >>> not connected to the hive pom anymore.
> > >>>
> > >>> Alan.
> > >>>
> > >>> On Tue, Nov 7, 2017 at 6:17 AM, Adam Szita 
> wrote:
> > >>>
> >  Thanks for all the replies.
> > 
> >  Vihang: Good idea on making everything green before turning this on.
> > For
> >  this purpose I've filed a couple of jiras:
> >  -HIVE-17995  Run
> >  checkstyle on standalone-metastore module with proper configuration
> >  -HIVE-17996  Fix
> > ASF
> >  headers
> >  -HIVE-17997  Add
> > rat
> >  plugin and configuration to standalone metastore pom
> > 
> >  Sahil: there is an umbrella jira (HIVE-13503
> >  ) for test
> > >>> improvements,
> >  the Yetus integration itself is also a subtask of it. I think any
> > >>> further
> >  improvements on what Yetus features we want to enable should go here
> > >>> too.
> > 
> >  Adam
> > 
> > 
> > 
> > 
> > 
> > >>>
> > >>
> > >>
> >
> >
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: precommit-admin job stuck again

2018-05-07 Thread Vihang Karajgaonkar
Meanwhile the committers can submit the job manually on
https://builds.apache.org/job/PreCommit-HIVE-Build/

On Mon, May 7, 2018 at 2:01 PM, Vihang Karajgaonkar 
wrote:

> Looks like precommit-admin job is stuck for some reason which is
> preventing all the precommit jobs for Hive (also for Hadoop, Hbase etc)
>
> https://builds.apache.org/job/PreCommit-Admin/
> https://issues.apache.org/jira/browse/INFRA-16492
>


precommit-admin job stuck again

2018-05-07 Thread Vihang Karajgaonkar
Looks like precommit-admin job is stuck for some reason which is preventing
all the precommit jobs for Hive (also for Hadoop, Hbase etc)

https://builds.apache.org/job/PreCommit-Admin/
https://issues.apache.org/jira/browse/INFRA-16492


[jira] [Created] (HIVE-19444) Create View - Table not found _dummy_table

2018-05-07 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-19444:
--

 Summary: Create View - Table not found _dummy_table
 Key: HIVE-19444
 URL: https://issues.apache.org/jira/browse/HIVE-19444
 Project: Hive
  Issue Type: Bug
  Components: Views
Affects Versions: 1.1.0
Reporter: BELUGA BEHR


{code:sql}
CREATE VIEW view_s1 AS select 1;

-- FAILED: SemanticException 
org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found 
_dummy_table
{code}





Re: Apache Hive 3.0.0 release preparation

2018-05-07 Thread Vihang Karajgaonkar
Okay. Thanks for that information.

On Mon, May 7, 2018 at 12:45 PM, Thejas Nair  wrote:

> Hi Vihang,
> That is expected if you have 2.4.0 release that happens little after
> (or around same time as)  a 3.0 release.
> This is more likely to happen with last minute bug fixes.
>
>
>
> On Mon, May 7, 2018 at 11:58 AM, Vihang Karajgaonkar
>  wrote:
> > How do we handle the cases where a patch needs to go in branch-2? I think
> > the reason the commits are going in branch-3 is that you don't want to
> > have an intermediate version where a patch is missing. E.g.: a patch is fixed
> > in Hive 2.4.0 and Hive 3.1.0 but not Hive 3.0.0. Is that normal?
> >
> > On Mon, May 7, 2018 at 11:34 AM, Vineet Garg 
> wrote:
> >
> >> Hello all,
> >>
> >> It’s been more than a month since we have cutoff branch-3. There are
> tests
> >> which are still failing consistently(https://issues.
> >> apache.org/jira/browse/HIVE-19142).  At this point we should work on
> >> fixing tests and stabilize the branch. So please refrain from committing
> >> anything but test fixes to branch-3.
> >> If you have a patch beside test fix which you would like to get into
> >> branch-3 please talk to me before committing.
> >>
> >> Also if you are assigned a JIRA for fixing test in branch-3 please fix
> it
> >> as soon as you can.
> >>
> >> Thanks,
> >> Vineet Garg
> >>
> >> On Apr 13, 2018, at 12:36 PM, Vihang Karajgaonkar  >> > wrote:
> >>
> >> Hi Vineet,
> >>
> >> I created a profile on ptest-server so that tests can be run on
> branch-3.
> >> It is the same as branch-2 patches. You will need to include branch-3 in
> >> the patch name. Eg. HIVE-1234.01-branch-3.patch
> >>
> >> -Vihang
> >>
> >>
> >>
> >> On Mon, Apr 9, 2018 at 4:35 PM, Vineet Garg   >> vg...@hortonworks.com>> wrote:
> >>
> >> I have created an umbrella jira to investigate and fix test failures for
> >> hive 3.0.0. LINK : https://issues.apache.org/jira/browse/HIVE-19142.
> >> Please link any other existing jira related to test failure with this
> >> umbrella jira.
> >>
> >> Also, how do we run tests on branch-3? Is there some setup to be done?
> >>
> >> -Vineet
> >>
> >> On Apr 9, 2018, at 4:26 AM, Zoltan Haindrich <
> zhaindr...@hortonworks.com<
> >> mailto:zhaindr...@hortonworks.com><
> >> mailto:zhaindr...@hortonworks.com>> wrote:
> >>
> >> Hello
> >>
> >> A few weeks earlier I've tried to hunt down this problem...
> >> so...to my best knowledge the cause of this seems to be the following:
> >>
> >> * in some cases the "cleanup" after a failed query may somehow leave
> some
> >> threads behind...
> >> * these threads have reference to the "customized" session classloader -
> >> this makes the threads more memory hungry
> >> * after a while these threads/classloaders eat up the heap...
> >>
> >> I've opened HIVE-18522 for this thread issue
> >>
> >> I think this problem is not new ...and it might have been present
> earlier
> >> as well...the only thing what changed is that there were a few more new
> >> features which have added new udfs/etc which made the memory cost of a
> >> session more heavier..
> >> ...and as a sidenote: I'm not convinced that this issue will arise in a
> >> proper hs2 setup - as it might be easily connected to the fact that
> these
> >> tests are using the cli driver to execute the tests.
> >>
> >>
> >> cheers,
> >> Zoltan
> >>
> >> On 7 Apr 2018 7:15 p.m., Ashutosh Chauhan  >> ashut...@apache.org> >> ashut...@apache.org>> wrote:
> >> We need to investigate and find out root cause of these failures. If its
> >> determined that its a corner case and fix is non-trivial then we may
> >> release note it under known issues. But ideally we should fix these
> >> failures.
> >> Cutting a branch should make it easier since branch is expected to
> receive
> >> lot less commits as compared to master so it should be faster to
> stabilize
> >> branch.
> >>
> >> On Fri, Apr 6, 2018 at 10:49 AM, Eugene Koifman <
> ekoif...@hortonworks.com<
> >> mailto:ekoif...@hortonworks.com><
> >> mailto:ekoif...@hortonworks.com>>
> >> wrote:
> >>
> >> Cutting the branch before the tests are stabilized would mean we have to
> >> fix them in 2 places.
> >>
> >> On 4/6/18, 10:05 AM, "Thejas Nair" > thejas.n...@gmail.com> >> thejas.n...@gmail.com>> wrote:
> >>
> >>   That needs to be cleaned up. There are far too many right now, its
> >>   just not handful of flaky tests.
> >>
> >>
> >>   On Fri, Apr 6, 2018 at 2:48 AM, Peter Vary   >> pv...@cloudera.com> >> pv...@cloudera.com>> wrote:
> >> Hi Team,
> >>
> >> I am new to the Hive release process and it is not clear to me how
> >> the failing tests are handled. Do we plan to fix the failing tests
> before
> >> release? Or it is accepted to cut a 

Re: Apache Hive 3.0.0 release preparation

2018-05-07 Thread Owen O'Malley
*Sigh* It looks like branch-3 requires features in storage-api that were
added after storage-api 2.6.0.

You'll need to make another storage-api release or revert those changes.

.. Owen

On Mon, May 7, 2018 at 11:58 AM, Vihang Karajgaonkar 
wrote:

> How do we handle the cases where a patch needs to go in branch-2? I think
> the reason the commits are going in branch-3 is that you don't want to
> have an intermediate version where a patch is missing. E.g.: a patch is fixed
> in Hive 2.4.0 and Hive 3.1.0 but not Hive 3.0.0. Is that normal?

Re: May 2018 Hive User Group Meeting

2018-05-07 Thread Sahil Takiar
Hey Everyone,

The meetup is only a day away! Here is a link to all the abstracts we have
compiled thus far. Several of you
have asked about event streaming and recordings. The meetup will be both
streamed live and recorded. We will post the links on this thread and on
the meetup link tomorrow closer to the start of the meetup.

The meetup will be at Cloudera HQ - 395 Page Mill Rd. If you have any
trouble getting into the building, feel free to post on the meetup link.

Meetup Link:
https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar  wrote:

> Hey Everyone,
>
> The agenda for the meetup has been set and I'm excited to say we have lots
> of interesting talks scheduled! Below is final agenda, the full list of
> abstracts will be sent out soon. If you are planning to attend, please RSVP
> on the meetup link so we can get an accurate headcount of attendees (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>
> 6:30 - 7:00 PM Networking and Refreshments
> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>
>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>- ORC Column Level Encryption - Owen O’Malley
>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>- Materialized Views in Hive - Jesus Camacho Rodriguez
>
> 8:30 PM - 9:00 PM Hive Metastore Panel
>
>- Moderator: Vihang Karajgaonkar
>- Participants:
>   - Daniel Dai - Hive Metastore Caching
>   - Alan Gates - Hive Metastore Separation
>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>   Metadata
>
> The Metastore panel will consist of a short presentation by each panelist
> followed by a Q&A session driven by the moderator.
>
> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar 
> wrote:
>
>> We still have a few slots open for lightning talks, so if anyone is
>> interested in giving a presentation don't hesitate to reach out!
>>
>> If you are planning to attend the meetup, please RSVP on the Meetup link (
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so
>> that we can get an accurate headcount for food.
>>
>> Thanks!
>>
>> --Sahil
>>
>> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm happy to announce that the Hive community is organizing a Hive user
>>> group meeting in the Bay Area next month. The details can be found at
>>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>>
>>> The format of this meetup will be slightly different from previous ones.
>>> There will be one hour dedicated to lightning talks, followed by a group
>>> discussion on the future of the Hive Metastore.
>>>
>>> We are inviting talk proposals from Hive users as well as developers at
>>> this time. Please contact either myself (takiar.sa...@gmail.com),
>>> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
>>> pv...@cloudera.com) with proposals. We currently have 5 openings.
>>>
>>> Please let me know if you have any questions or suggestions.
>>>
>>> Thanks,
>>> Sahil
>>>
>>
>>
>>
>> --
>> Sahil Takiar
>> Software Engineer
>> takiar.sa...@gmail.com | (510) 673-0309
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Apache Hive 3.0.0 release preparation

2018-05-07 Thread Thejas Nair
Hi Vihang,
That is expected if you have a 2.4.0 release that happens a little after
(or around the same time as) a 3.0 release.
This is more likely to happen with last-minute bug fixes.



On Mon, May 7, 2018 at 11:58 AM, Vihang Karajgaonkar wrote:
> How do we handle the cases where a patch needs to go into branch-2? I think
> the reason the commits are going into branch-3 is that you don't want to
> have an intermediate version where a patch is missing. E.g.: a patch is fixed
> in Hive 2.4.0 and Hive 3.1.0 but not Hive 3.0.0. Is that normal?

[jira] [Created] (HIVE-19443) Issue with Druid timestamp with timezone handling

2018-05-07 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19443:
-

 Summary: Issue with Druid timestamp with timezone handling
 Key: HIVE-19443
 URL: https://issues.apache.org/jira/browse/HIVE-19443
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Reporter: slim bouguerra
 Attachments: test_resutls.out, test_timestamp.q

As you can see in the attached file [^test_resutls.out], when switching the
current timezone to UTC, the insert of values from a Hive table into a Druid
table misses some rows.

You can use this to reproduce it.

[^test_timestamp.q]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Hive 3.0.0 release preparation

2018-05-07 Thread Vihang Karajgaonkar
How do we handle the cases where a patch needs to go into branch-2? I think
the reason the commits are going into branch-3 is that you don't want to
have an intermediate version where a patch is missing. E.g.: a patch is fixed
in Hive 2.4.0 and Hive 3.1.0 but not Hive 3.0.0. Is that normal?

On Mon, May 7, 2018 at 11:34 AM, Vineet Garg  wrote:

> Hello all,
>
> It’s been more than a month since we have cutoff branch-3. There are tests
> which are still failing consistently
> (https://issues.apache.org/jira/browse/HIVE-19142). At this point we should work on
> fixing tests and stabilize the branch. So please refrain from committing
> anything but test fixes to branch-3.
> If you have a patch beside test fix which you would like to get into
> branch-3 please talk to me before committing.
>
> Also if you are assigned a JIRA for fixing test in branch-3 please fix it
> as soon as you can.
>
> Thanks,
> Vineet Garg
>

[jira] [Created] (HIVE-19442) convert Hive stats to deltas

2018-05-07 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19442:
---

 Summary: convert Hive stats to deltas
 Key: HIVE-19442
 URL: https://issues.apache.org/jira/browse/HIVE-19442
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


This would allow for
1) Accurate stats after partial operations like inserts.
2) Better ACID integration.

The idea is for partition stats and table stats to be written as deltas, with a
flag that indicates this is a delta (i.e. "this insert wrote 500 rows").
A flag like this would also allow us to avoid converting old stats.
Stats can be merged after the query if appropriate locking is present and the
table is not transactional, or by the compactor, based on ACID watermarks, when
the table is transactional.
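The proposed merge rule can be sketched as follows. All class and field names here are invented for illustration; this is not the actual Hive metastore API: absolute stats replace the base, while stats carrying the delta flag accumulate onto it.

```java
// Illustrative sketch only: names are invented, not actual Hive metastore classes.
public class DeltaStatsSketch {
    static final class Stats {
        final long rowCount;
        final boolean isDelta; // the proposed flag: "this insert wrote N rows"
        Stats(long rowCount, boolean isDelta) {
            this.rowCount = rowCount;
            this.isDelta = isDelta;
        }
    }

    // Absolute stats simply replace the base; delta stats accumulate on top of it.
    static Stats merge(Stats base, Stats update) {
        if (!update.isDelta) {
            return update;
        }
        return new Stats(base.rowCount + update.rowCount, false);
    }

    public static void main(String[] args) {
        Stats base = new Stats(10_000, false);   // full stats from a prior analyze
        Stats insert = new Stats(500, true);     // "this insert wrote 500 rows"
        System.out.println(merge(base, insert).rowCount); // 10500
    }
}
```

Under this rule old absolute stats never need rewriting: they are only folded together with later deltas at merge time.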





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Hive 3.0.0 release preparation

2018-05-07 Thread Vineet Garg
Hello all,

It’s been more than a month since we cut branch-3. There are tests which are
still failing consistently (https://issues.apache.org/jira/browse/HIVE-19142).
At this point we should work on fixing tests and stabilizing the branch. So
please refrain from committing anything but test fixes to branch-3.
If you have a patch besides a test fix which you would like to get into
branch-3, please talk to me before committing.

Also, if you are assigned a JIRA for fixing a test in branch-3, please fix it
as soon as you can.

Thanks,
Vineet Garg

On Apr 13, 2018, at 12:36 PM, Vihang Karajgaonkar wrote:

Hi Vineet,

I created a profile on ptest-server so that tests can be run on branch-3.
It is the same as branch-2 patches. You will need to include branch-3 in
the patch name. Eg. HIVE-1234.01-branch-3.patch

-Vihang



On Mon, Apr 9, 2018 at 4:35 PM, Vineet Garg wrote:

I have created an umbrella jira to investigate and fix test failures for
hive 3.0.0. LINK : https://issues.apache.org/jira/browse/HIVE-19142.
Please link any other existing jira related to test failure with this
umbrella jira.

Also, how do we run tests on branch-3? Is there some setup to be done?

-Vineet

On Apr 9, 2018, at 4:26 AM, Zoltan Haindrich <zhaindr...@hortonworks.com> wrote:

Hello

A few weeks earlier I've tried to hunt down this problem...
so...to the best of my knowledge, the cause of this seems to be the following:

* in some cases the "cleanup" after a failed query may somehow leave some
threads behind...
* these threads have reference to the "customized" session classloader -
this makes the threads more memory hungry
* after a while these threads/classloaders eat up the heap...

I've opened HIVE-18522 for this thread issue

I think this problem is not new ...and it might have been present earlier
as well...the only thing that changed is that a few more new features
added new udfs/etc, which made the memory cost of a session heavier...
...and as a sidenote: I'm not convinced that this issue will arise in a
proper hs2 setup - as it might be easily connected to the fact that these
tests are using the cli driver to execute the tests.


cheers,
Zoltan
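The retention chain described above can be reproduced in isolation. The sketch below (class and thread names invented for illustration) shows how a thread left behind by cleanup keeps its context classloader, and therefore every class that loader loaded, strongly reachable:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LeakedThreadDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a per-session "customized" classloader.
        URLClassLoader sessionLoader =
                new URLClassLoader(new URL[0], LeakedThreadDemo.class.getClassLoader());

        // A thread that cleanup forgot to stop: it stays alive after the session ends...
        Thread leftover = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // exits when interrupted
            }
        }, "leftover-session-thread");
        leftover.setContextClassLoader(sessionLoader);
        leftover.setDaemon(true);
        leftover.start();

        // ...and while it is alive, the live Thread object strongly references
        // the session classloader, so neither the loader nor the classes it
        // loaded can be garbage collected -- leaked sessions pile up on the heap.
        System.out.println(leftover.getContextClassLoader() == sessionLoader); // true
        leftover.interrupt();
    }
}
```

This is why the fix has to interrupt or join the leftover threads, not just drop the session's reference to its classloader.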

On 7 Apr 2018 7:15 p.m., Ashutosh Chauhan <ashut...@apache.org> wrote:
We need to investigate and find out root cause of these failures. If its
determined that its a corner case and fix is non-trivial then we may
release note it under known issues. But ideally we should fix these
failures.
Cutting a branch should make it easier since branch is expected to receive
lot less commits as compared to master so it should be faster to stabilize
branch.

On Fri, Apr 6, 2018 at 10:49 AM, Eugene Koifman <ekoif...@hortonworks.com> wrote:

Cutting the branch before the tests are stabilized would mean we have to
fix them in 2 places.

On 4/6/18, 10:05 AM, "Thejas Nair" <thejas.n...@gmail.com> wrote:

  That needs to be cleaned up. There are far too many right now, its
  just not handful of flaky tests.


  On Fri, Apr 6, 2018 at 2:48 AM, Peter Vary <pv...@cloudera.com> wrote:
Hi Team,

I am new to the Hive release process and it is not clear to me how
the failing tests are handled. Do we plan to fix the failing tests before
release? Or it is accepted to cut a new major release with known test
issues.

Thanks,
Peter

On Apr 5, 2018, at 8:25 PM, Vineet Garg <vg...@hortonworks.com> wrote:

Hello,

I plan to cut off branch for Hive 3.0.0 on Monday (9 April) since
bunch of folks have big patches pending.

Regards,
Vineet G

On Apr 2, 2018, at 3:14 PM, Vineet Garg <vg...@hortonworks.com> wrote:

Hello,

We have enough votes to prepare a release candidate for Hive
3.0.0. I am going to cutoff a branch in a day or two. I’ll send an email as
soon as I have the branch ready.
Meanwhile there are approximately 69 JIRAs which are currently
opened with fix version 3.0.0. I’ll appreciate if their respective owners
would update the JIRA if it is a blocker. Otherwise I’ll update them to
defer the fix version to next release.

Regards,
Vineet G












[jira] [Created] (HIVE-19441) Add support for float aggregator and use LLAP test Driver

2018-05-07 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19441:
-

 Summary: Add support for float aggregator and use LLAP test Driver
 Key: HIVE-19441
 URL: https://issues.apache.org/jira/browse/HIVE-19441
 Project: Hive
  Issue Type: Bug
Reporter: slim bouguerra
Assignee: slim bouguerra


Adding support for the float aggregator.

Use LLAP as the test driver to reduce the execution time of tests from about
2 hours to 15 min.

Although this patch unveils an issue with timezones, it may be fixed by
[~jcamachorodriguez]'s upcoming set of patches.

 

Before

{code}

[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ 
hive-it-qfile ---
[INFO] Compiling 21 source files to 
/Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6,654.117 s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:51 h
[INFO] Finished at: 2018-05-04T12:43:19-07:00
[INFO] 

{code}

After

{code}

INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ 
hive-it-qfile ---
[INFO] Compiling 22 source files to 
/Users/sbouguerra/Hdev/hive/itests/qtest/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hive-it-qfile ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 907.167 
s - in org.apache.hadoop.hive.cli.TestMiniDruidCliDriver
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 15:31 min
[INFO] Finished at: 2018-05-04T13:15:11-07:00
[INFO] 

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-07 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review202561
---



Added a new version of the patch.
Adding the result as "Unknown rows affected" for return value -1 from Beeline.
Fixing test failures, and modifying tests to accommodate the change.
Further changes in this version are:
  - Using the waitForOperationToComplete method itself in
HiveStatement#getUpdateCount, because in executeAsync mode it fails otherwise.
  - I converted the while loop to a do-while in
HiveStatement#waitForOperationToComplete, because otherwise in some cases the
response is never initialized.
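The while-to-do-while point can be shown in isolation: with a pre-test while loop, an operation that is already complete on entry skips the loop body, leaving the response variable unassigned. A minimal sketch, with invented types rather than the actual HiveStatement internals:

```java
public class DoWhilePollSketch {
    static final class Response {
        final boolean complete;
        Response(boolean complete) { this.complete = complete; }
    }

    interface Operation {
        Response poll();
    }

    // do-while guarantees at least one poll, so `response` is always assigned
    // even when the operation completed before we started waiting. With a plain
    // while (!response.complete) { ... } there would be no valid initial value
    // for `response` at all.
    static Response waitForCompletion(Operation op) {
        Response response;
        do {
            response = op.poll();
        } while (!response.complete);
        return response;
    }

    public static void main(String[] args) {
        // Operation that is already complete on the first poll.
        Response r = waitForCompletion(() -> new Response(true));
        System.out.println(r.complete); // true
    }
}
```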

- Bharathkrishna Guruvayoor Murali


On May 7, 2018, 5:58 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated May 7, 2018, 5:58 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   beeline/src/main/resources/BeeLine.properties 
> c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> b217259553be472863cd33bb2259aa700e6c3528 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 9f4e6f2e53b43839fefe1d2522a75a95d393544f 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> 01a5b4c9c328cb034a613a1539cea2584e122fb4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
> 18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
>   ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
> 28f376f8c4c2151383286e754447d1349050ef4e 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
> 96819f4e1c446f6de423f99c7697d548ff5dbe06 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-07 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/
---

(Updated May 7, 2018, 5:58 p.m.)


Review request for hive, Sahil Takiar and Vihang Karajgaonkar.


Bugs: HIVE-14388
https://issues.apache.org/jira/browse/HIVE-14388


Repository: hive-git


Description
---

Currently, when you run an insert command in Beeline, it returns a message
saying "No rows affected ..".
A better and more intuitive message would be "xxx rows inserted (26.068 seconds)".

Added the numRows parameter as part of QueryState.
Adding the numRows to the response as well, to display in Beeline.

Getting the count in FileSinkOperator and setting it in statsMap, when it
operates only on table-specific rows for the particular operation (so that we
count only the insert-to-table rows and avoid counting non-table-specific
file-sink operations happening during query execution).
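The counting approach described above can be sketched as an operator that accumulates its emitted row count in a stats map, which the driver would later surface to the client. All names below are invented for illustration; the actual patch modifies FileSinkOperator and QueryState:

```java
import java.util.HashMap;
import java.util.Map;

public class RowCountSketch {
    // Invented counter key; not Hive's actual stats-map key.
    static final String ROWS_INSERTED = "RECORDS_OUT";

    // Stand-in for a file-sink operator that counts each row it emits.
    static final class SinkOperator {
        final Map<String, Long> statsMap = new HashMap<>();

        void process(Object row) {
            // ... write the row to the table's file sink, then count it
            statsMap.merge(ROWS_INSERTED, 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        SinkOperator sink = new SinkOperator();
        for (int i = 0; i < 500; i++) {
            sink.process(new Object());
        }
        long numRows = sink.statsMap.getOrDefault(ROWS_INSERTED, 0L);
        // The driver would surface this to Beeline as "500 rows inserted (... seconds)".
        System.out.println(numRows + " rows inserted");
    }
}
```

Keying the count to a single table-specific sink is what avoids over-counting rows written by intermediate file sinks during the same query.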


Diffs (updated)
-

  beeline/src/main/resources/BeeLine.properties 
c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
b217259553be472863cd33bb2259aa700e6c3528 
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
06542cee02e5dc4696f2621bb45cc4f24c67dfda 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
9f4e6f2e53b43839fefe1d2522a75a95d393544f 
  ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
cf9c2273159c0d779ea90ad029613678fb0967a6 
  ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
01a5b4c9c328cb034a613a1539cea2584e122fb4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
fcdc9967f12a454a9d3f31031e2261f264479118 
  ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
  ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
28f376f8c4c2151383286e754447d1349050ef4e 
  ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
96819f4e1c446f6de423f99c7697d548ff5dbe06 
  ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
  service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
  service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
b2b62c71492b844f4439367364c5c81aa62f3908 
  
service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
 15e8220eb3eb12b72c7b64029410dced33bc0d72 
  service-rpc/src/gen/thrift/gen-php/Types.php 
abb7c1ff3a2c8b72dc97689758266b675880e32b 
  service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
  service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
60183dae9e9927bd09a9676e49eeb4aea2401737 
  service/src/java/org/apache/hive/service/cli/CLIService.java 
c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
  service/src/java/org/apache/hive/service/cli/OperationStatus.java 
52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 
3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
c64c99120ad21ee98af81ec6659a2722e3e1d1c7 


Diff: https://reviews.apache.org/r/66290/diff/6/

Changes: https://reviews.apache.org/r/66290/diff/5-6/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



[jira] [Created] (HIVE-19440) Make StorageBasedAuthorizer work with information schema

2018-05-07 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-19440:
-

 Summary: Make StorageBasedAuthorizer work with information schema
 Key: HIVE-19440
 URL: https://issues.apache.org/jira/browse/HIVE-19440
 Project: Hive
  Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai


With HIVE-19161, the Hive information schema works with an external authorizer 
(such as Ranger). However, we also need to make StorageBasedAuthorizer 
synchronization work, as it is also widely used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19439) MapWork shouldn't be reused when Spark task fails during initialization

2018-05-07 Thread Rui Li (JIRA)
Rui Li created HIVE-19439:
-

 Summary: MapWork shouldn't be reused when Spark task fails during 
initialization
 Key: HIVE-19439
 URL: https://issues.apache.org/jira/browse/HIVE-19439
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li


Issue identified in HIVE-19388. When a Spark task fails while initializing the 
map operator, the task is retried with the same MapWork retrieved from the 
cache. This is problematic because that MapWork may be partially initialized, 
e.g. some operators are already in INIT state.
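
One way to address the reuse problem described above is to evict the cached plan on failure, so a retried task deserializes a fresh copy instead of reusing a half-initialized one. The sketch below is hypothetical and not Hive's actual plan cache; the class and method names are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical plan cache: getOrLoad returns the cached entry when present,
// and invalidate() (called from the task's failure path, before the retry is
// scheduled) removes it so the retry rebuilds the plan from scratch.
public class PlanCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();

    public T getOrLoad(String key, Supplier<T> loader) {
        return cache.computeIfAbsent(key, k -> loader.get());
    }

    // Evict a possibly partially-initialized entry after a task failure.
    public void invalidate(String key) {
        cache.remove(key);
    }
}
```

The trade-off is one extra deserialization per failed attempt, which is cheap compared to retrying against an operator tree stuck in INIT state.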



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19438) Test failure: org.apache.hadoop.hive.llap.daemon.services.impl.TestLlapWebServices.testContextRootUrlRewrite

2018-05-07 Thread Pravin Dsilva (JIRA)
Pravin Dsilva created HIVE-19438:


 Summary: Test failure: 
org.apache.hadoop.hive.llap.daemon.services.impl.TestLlapWebServices.testContextRootUrlRewrite
 Key: HIVE-19438
 URL: https://issues.apache.org/jira/browse/HIVE-19438
 Project: Hive
  Issue Type: Bug
  Components: llap, Tests
Reporter: Pravin Dsilva


*Error Message*

{code:java}
expected:<200> but was:<500>
{code}


*Stacktrace*

{code:java}
java.lang.AssertionError: expected:<200> but was:<500>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:555)
 at org.junit.Assert.assertEquals(Assert.java:542)
 at org.apache.hadoop.hive.llap.daemon.services.impl.TestLlapWebServices.getURLResponseAsString(TestLlapWebServices.java:59)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #343: HIVE-19435: Incremental replication cause data loss ...

2018-05-07 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/343

HIVE-19435: Incremental replication cause data loss if a table is dropped 
followed by create and insert-into with different partition type.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-19435

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/343.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #343


commit 5d1d9dccbd8298ee8b593283315a6ad4f0f33c6e
Author: Sankar Hariappan 
Date:   2018-05-07T08:41:05Z

HIVE-19435: Incremental replication cause data loss if a table is dropped 
followed by create and insert-into with different partition type.




---


[jira] [Created] (HIVE-19437) HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled

2018-05-07 Thread rr (JIRA)
rr created HIVE-19437:
-

 Summary: HiveServer2 Drops connection to Metastore when 
hiverserver2 webui is enabled
 Key: HIVE-19437
 URL: https://issues.apache.org/jira/browse/HIVE-19437
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, SQL, Web UI
Affects Versions: 2.1.1
Reporter: rr


 

When SSL is enabled for the HiveServer2 web UI on port 10002, HiveServer2 is 
unable to start up. It keeps connecting to the Hive metastore, then drops the 
connection and retries again. The HiveServer2 pid is present, but the process 
is not actually up, as it keeps dropping the metastore connection.

The logs show the following:

 

2018-05-07T04:45:52,980 INFO [main] sqlstd.SQLStdHiveAccessController: Created 
SQLStdHiveAccessController for session context : HiveAuthzSessionContext 
[sessionString=9f65e1ba-8810-47ee-a370-238606f02479, clientType=HIVESERVER2]
2018-05-07T04:45:52,980 WARN [main] session.SessionState: METASTORE_FILTER_HOOK 
will be ignored, since hive.security.authorization.manager is set to instance 
of HiveAuthorizerFactory.

2018-05-07T04:45:52,981 INFO [main] hive.metastore: Mestastore configuration 
hive.metastore.filter.hook changed from 
org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook

2018-05-07T04:45:52,981 INFO [main] hive.metastore: Closed a connection to 
metastore, current connections: 0

2018-05-07T04:45:52,982 INFO [main] hive.metastore: Trying to connect to 
metastore with URI thrift://localhost:9083

2018-05-07T04:45:52,982 INFO [main] hive.metastore: Opened a connection to 
metastore, current connections: 1

2018-05-07T04:45:52,985 INFO [main] hive.metastore: Connected to metastore.

2018-05-07T04:45:52,986 INFO [main] service.CompositeService: Operation log 
root directory is created: /var/hive/hs2log/tmp

2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
Background operation thread pool size: 100

2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
Background operation thread wait queue size: 100

2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
Background operation thread keepalive time: 10 seconds

2018-05-07T04:45:52,988 INFO [main] hive.metastore: Closed a connection to 
metastore, current connections: 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19436) NullPointerException while getting block info

2018-05-07 Thread Amruth S (JIRA)
Amruth S created HIVE-19436:
---

 Summary: NullPointerException while getting block info
 Key: HIVE-19436
 URL: https://issues.apache.org/jira/browse/HIVE-19436
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.2
Reporter: Amruth S
Assignee: Amruth S


From Hive 2.3.2, there are cases where the block info object comes out to be 
null (src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java).

It comes up in this code path:

 
{code:java}
if (blockInfos.size() > 0) {
  InputSplit[] inputSplits = getInputSplits();
  FileSplit fS = null;
  BlockInfo bI = blockInfos.get(0);
...
{code}
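
A defensive variant of the excerpt above would guard both the empty list and a null first element before dereferencing it. This is a hypothetical sketch: `BlockInfo` here is a stand-in, not Hive's actual class, and the method name is invented for illustration.

```java
import java.util.List;

// Hypothetical guard: return 0 when nothing has been spilled, and fail with a
// descriptive message (instead of a bare NPE) when the first entry is null.
public class BlockInfoGuard {
    public static class BlockInfo {
        public final long length;
        public BlockInfo(long length) { this.length = length; }
    }

    public static long firstBlockLength(List<BlockInfo> blockInfos) {
        if (blockInfos == null || blockInfos.isEmpty()) {
            return 0L; // nothing spilled yet
        }
        BlockInfo bI = blockInfos.get(0);
        if (bI == null) {
            throw new IllegalStateException("block info missing for first split");
        }
        return bI.length;
    }
}
```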
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19435) Data lost when Incremental REPL LOAD with Drop partitioned table followed by create/insert non-partitioned table with same name.

2018-05-07 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-19435:
---

 Summary: Data lost when Incremental REPL LOAD with Drop 
partitioned table followed by create/insert non-partitioned table with same 
name.
 Key: HIVE-19435
 URL: https://issues.apache.org/jira/browse/HIVE-19435
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, repl
Affects Versions: 3.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
 Fix For: 3.0.0, 3.1.0


Hive replication uses Hadoop distcp to copy files from the primary to the 
replica warehouse. If the HDFS block size differs across clusters, it causes 
file copy failures.
{code:java}
2018-04-09 14:32:06,690 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: 
Failure in copying 
hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 to 
hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
java.io.IOException: File copy failed: 
hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 --> 
hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
 at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:299)
 at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:266)
 at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 to 
hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
 at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
 at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:296)
 ... 10 more
Caused by: java.io.IOException: Check-sum mismatch between 
hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 and 
hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/.distcp.tmp.attempt_1522833620762_4416_m_00_0.
 Source and target differ in block-size. Use -pb to preserve block-sizes during 
copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By 
skipping checksums, one runs the risk of masking data-corruption during 
file-transfer.)
 at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:212)
 at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:130)
 at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
 at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
 ... 11 more
{code}
Distcp fails because the CM path for the file doesn't point to the source file 
system. So the fully qualified CM root URI needs to be included for the files 
listed in the dump.

Also, REPL LOAD returns success even if the distcp jobs failed.

CopyUtils.doCopyRetry doesn't throw an error even if the copy fails after the 
maximum number of attempts.

So, we need to do three things:
 # If the copy of multiple files fails for some reason, retry with the same 
set of files, but switch to the CM path whenever the original source file is 
missing or has been modified (based on checksum). Let distcp skip the files 
that were already copied properly; FileUtil.copy always overwrites files.
 # If the source path has moved to the CM path, delete the incorrectly copied 
files.
 # If the copy still fails after the maximum number of attempts, throw an 
error.
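
The retry contract in the list above can be sketched as follows. This is a hypothetical illustration, not Hive's actual CopyUtils; the class name, method signature, and the use of a predicate to stand in for a single copy attempt are all assumptions.

```java
import java.util.function.Predicate;

// Hypothetical retry loop: fall back to the CM path once a copy attempt on
// the original source fails, and throw after the maximum number of attempts
// instead of silently returning success.
public class RetryCopy {
    public static String copyWithRetry(String srcPath, String cmPath,
                                       int maxAttempts, Predicate<String> copyOnce) {
        String current = srcPath;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (copyOnce.test(current)) {
                return current; // the path that finally succeeded
            }
            current = cmPath; // source missing/modified: switch to CM path
        }
        throw new IllegalStateException("copy failed after " + maxAttempts + " attempts");
    }
}
```

The key difference from the buggy behavior described above is the final `throw`: exhausting the attempts surfaces a failure to REPL LOAD instead of letting it report success.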

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)