[jira] [Created] (HIVE-16865) Handle replication bootstrap of large databases

2017-06-08 Thread anishek (JIRA)
anishek created HIVE-16865:
--

 Summary: Handle replication bootstrap of large databases
 Key: HIVE-16865
 URL: https://issues.apache.org/jira/browse/HIVE-16865
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: anishek
 Fix For: 3.0.0


For larger databases, make sure that we can handle replication bootstrap.

* Assume a large database can have close to a million tables, or a few tables 
with a few hundred thousand partitions.

* For function replication: if the primary warehouse has a large number of 
custom functions defined, and the same binary (jar) file incorporates most of 
these functions, the replica warehouse might have trouble loading all of them. 
The same jar from the primary is copied over once per function, so each 
function has its own local copy of the jar, and loading all these jars might 
lead to excessive memory usage.
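One way to avoid the per-function jar copies described above is to key local copies by a checksum of the jar's contents, so functions built from the same binary share one file. The following Java sketch assumes that approach; `SharedJarCache` and its methods are hypothetical and not part of Hive's replication code, and the sketch only tracks paths rather than actually copying files.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Hive's replication code): keeps one local copy per distinct jar,
// keyed by a SHA-256 checksum of its contents, so functions sharing a jar share a path.
final class SharedJarCache {
    // checksum of jar contents -> local path of the single copy kept on the replica
    private final Map<String, String> localPathByChecksum = new HashMap<>();
    private final String localDir;

    SharedJarCache(String localDir) {
        this.localDir = localDir;
    }

    // Returns the local path for this jar, registering it only the first time its contents
    // are seen. A real implementation would also write the bytes to that path.
    String localJarFor(byte[] jarBytes) {
        String checksum = sha256Hex(jarBytes);
        return localPathByChecksum.computeIfAbsent(checksum, c -> localDir + "/" + c + ".jar");
    }

    int distinctJars() {
        return localPathByChecksum.size();
    }

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Under this assumption, a thousand functions defined in one jar would resolve to a single local path instead of a thousand copies.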

 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] hive pull request #190: HIVE-16644: Hook Change Manager to Insert Overwrite

2017-06-08 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/190


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (HIVE-16864) add validation to stream position search in LLAP IO

2017-06-08 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-16864:
---

 Summary: add validation to stream position search in LLAP IO
 Key: HIVE-16864
 URL: https://issues.apache.org/jira/browse/HIVE-16864
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth Jayachandran
Assignee: Sergey Shelukhin


There's a TODO in this method to add validity checks. We have seen issues 
before where, in the absence of such checks, incorrect ranges lead to obscure 
errors after this method returns a bad result; we are seeing one now as well.
Adding the checks.
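The kind of validity check this issue asks for can be illustrated with a binary search over sorted, half-open ranges that fails fast with a descriptive message instead of returning a bad index. This is a hypothetical sketch: `RangeSearch` and the parallel `starts`/`ends` array layout are assumptions, not the LLAP IO data structures.

```java
import java.util.Arrays;

// Hypothetical sketch (not the LLAP IO code): find the range containing a stream position,
// with explicit checks so invalid input fails here rather than causing obscure errors later.
final class RangeSearch {
    // Range i is the half-open interval [starts[i], ends[i]); ranges are expected to be
    // sorted by start and non-overlapping.
    static int findRange(long[] starts, long[] ends, long position) {
        if (starts.length != ends.length) {
            throw new IllegalArgumentException("starts and ends have different lengths");
        }
        int lo = 0;
        int hi = starts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (position < starts[mid]) {
                hi = mid - 1;
            } else if (position >= ends[mid]) {
                lo = mid + 1;
            } else {
                checkNeighbors(starts, ends, mid);
                return mid;
            }
        }
        throw new IllegalStateException("Position " + position + " is not in any range: "
                + describe(starts, ends));
    }

    // The found range must not overlap its neighbors; overlap means the input is invalid.
    private static void checkNeighbors(long[] starts, long[] ends, int idx) {
        boolean overlapsPrev = idx > 0 && ends[idx - 1] > starts[idx];
        boolean overlapsNext = idx + 1 < starts.length && ends[idx] > starts[idx + 1];
        if (overlapsPrev || overlapsNext) {
            throw new IllegalStateException("Overlapping ranges near index " + idx + ": "
                    + describe(starts, ends));
        }
    }

    private static String describe(long[] starts, long[] ends) {
        return "starts=" + Arrays.toString(starts) + ", ends=" + Arrays.toString(ends);
    }
}
```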





Re: Review Request 59885: HIVE-16844: Fix Connection leak in ObjectStore when new Conf object is used

2017-06-08 Thread Anthony Hsu via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59885/#review177432
---


Ship it!




Looks good to me.

- Anthony Hsu


On June 7, 2017, 4:29 p.m., Sunitha Beeram wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59885/
> ---
> 
> (Updated June 7, 2017, 4:29 p.m.)
> 
> 
> Review request for hive, Carl Steinbach, Anthony Hsu, and Ratandeep Ratti.
> 
> 
> Bugs: HIVE-16844
> https://issues.apache.org/jira/browse/HIVE-16844
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16844: Fix Connection leak in ObjectStore when new Conf object is used
> 
> 
> Diffs
> -
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 4676e15942d72b0db56bedf0ff30aa60964c28d8 
> 
> 
> Diff: https://reviews.apache.org/r/59885/diff/1/
> 
> 
> Testing
> ---
> 
> Unit tests can't be provided for this functionality, but the problem is 
> reproducible. One way to simulate it is to set pmf=null in 
> ObjectStore::setConf: you will notice leaked connections. With the fix, the 
> leak does not occur.
> 
> 
> Thanks,
> 
> Sunitha Beeram
> 
>
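The testing notes describe leaked connections when ObjectStore::setConf receives a new Conf object. The following minimal Java sketch shows the general remedy, assuming the leak comes from replacing a pooled factory without closing it first. `PooledFactory` and `Store` are hypothetical stand-ins, not Hive's ObjectStore or the JDO classes; the actual patch is in the diff linked above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a PersistenceManagerFactory that owns a connection pool.
// The static counter lets the sketch observe whether pools are leaked.
final class PooledFactory implements AutoCloseable {
    static int openPools = 0;
    private boolean closed = false;

    PooledFactory() {
        openPools++;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            openPools--;
        }
    }
}

// Hypothetical store: when setConf() sees a different configuration, the previous factory
// is closed before a new one replaces it, so its pooled connections are released.
final class Store {
    private Map<String, String> currentConf;
    private PooledFactory factory;

    void setConf(Map<String, String> conf) {
        if (factory != null && Objects.equals(conf, currentConf)) {
            return; // unchanged configuration: keep reusing the existing factory
        }
        if (factory != null) {
            factory.close(); // without this call, every new conf would strand the old pool
        }
        currentConf = new HashMap<>(conf);
        factory = new PooledFactory();
    }

    void shutdown() {
        if (factory != null) {
            factory.close();
            factory = null;
        }
    }
}
```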



Re: Review Request 59885: HIVE-16844: Fix Connection leak in ObjectStore when new Conf object is used

2017-06-08 Thread Anthony Hsu via Review Board


> On June 7, 2017, 8:45 p.m., Anthony Hsu wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 302 (original), 304 (patched)
> > 
> >
> > Do we need to close the PersistenceManager as well?
> 
> Sunitha Beeram wrote:
> Good point, but the call to shutdown() on line 301 closes pm.

Ah, yes, thanks for pointing that out.


- Anthony


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59885/#review177222
---





[jira] [Created] (HIVE-16863) Vectorization: See if instanceof checks added to VectorAssignRow in HIVE-16589 can be removed...

2017-06-08 Thread Matt McCline (JIRA)
Matt McCline created HIVE-16863:
---

 Summary: Vectorization: See if instanceof checks added to 
VectorAssignRow in HIVE-16589 can be removed...
 Key: HIVE-16863
 URL: https://issues.apache.org/jira/browse/HIVE-16863
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Teddy Choi
Priority: Critical


(Wait for HIVE-16589 commit).

During HIVE-16589, when VectorUDFAdaptor was converted to use VectorAssignRow 
instead of adding Complex Type variants to setOutputCol, we added instanceof 
checks because data from VectorUDFAdaptor is sometimes a Byte object instead of 
a ByteWritable, etc.

In code review, Jason suggested we might be able to make use of the output 
object inspectors. Is there a way to add (perhaps) a second method, or methods, 
that take advantage of the object inspectors and avoid the instanceof checks?
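The trade-off in question can be sketched in plain Java: checking each value's runtime class with instanceof, versus choosing an accessor once from a description of the column's representation, which is roughly the role an object inspector plays. `ByteBox` is a hypothetical stand-in for a writable such as ByteWritable, and neither method reflects the actual VectorAssignRow code.

```java
import java.util.function.ToLongFunction;

// Hypothetical stand-in for a writable wrapper such as ByteWritable.
final class ByteBox {
    private final byte value;

    ByteBox(byte value) {
        this.value = value;
    }

    byte get() {
        return value;
    }
}

final class ByteAssign {
    // Per-value style: every row pays for instanceof checks to discover the representation.
    static long toLongPerValue(Object o) {
        if (o instanceof ByteBox) {
            return ((ByteBox) o).get();
        }
        if (o instanceof Byte) {
            return (Byte) o;
        }
        throw new IllegalArgumentException("Unexpected type: " + o.getClass().getName());
    }

    // Inspector style: the representation is known up front, so the extractor is chosen
    // once per column and then applied to every row without instanceof checks.
    static ToLongFunction<Object> extractorFor(boolean writableRepresentation) {
        if (writableRepresentation) {
            return o -> ((ByteBox) o).get();
        }
        return o -> (Byte) o;
    }
}
```

The second form moves the type decision out of the per-row path; whether the output object inspectors carry enough information to do this inside VectorAssignRow is the open question of this issue.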





[jira] [Created] (HIVE-16862) Implement a similar feature like "hive.tez.dynamic.semijoin.reduction" in hive on spark

2017-06-08 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created HIVE-16862:
---

 Summary: Implement a similar feature like 
"hive.tez.dynamic.semijoin.reduction" in hive on spark
 Key: HIVE-16862
 URL: https://issues.apache.org/jira/browse/HIVE-16862
 Project: Hive
  Issue Type: Bug
Reporter: liyunzhang_intel


Currently, if "hive.tez.dynamic.semijoin.reduction" is enabled (the default 
value is true) when running Hive on Spark, the following script fails:
{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;

-- multiple sources, single key
select count(*) from srcpart join srcpart_date on (srcpart.ds = 
srcpart_date.ds) join srcpart_hour on (srcpart.hr = srcpart_hour.hr);
{code}
For the reason it fails, see HIVE-16780. Currently we simply disable 
"hive.tez.dynamic.semijoin.reduction" when running Hive on Spark so that the 
test passes. Later we can implement a similar feature to what Hive on Tez does.
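For context on the proposed feature: as we understand it, dynamic semijoin reduction on Tez summarizes the join keys of the smaller input (min/max values plus a Bloom filter) and uses that summary to discard non-matching rows of the larger input before the join. The sketch below shows only that idea. `SemijoinReduction` is hypothetical, and it substitutes an exact `HashSet` for the Bloom filter to stay short.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch (not Hive's implementation): summarize the small side's join keys,
// then drop big-side rows that cannot match before the join runs.
final class SemijoinReduction {
    static final class KeySummary {
        final long min;
        final long max;
        final Set<Long> keys; // exact stand-in for a Bloom filter: no false positives

        KeySummary(long min, long max, Set<Long> keys) {
            this.min = min;
            this.max = max;
            this.keys = keys;
        }

        boolean mightContain(long key) {
            return key >= min && key <= max && keys.contains(key);
        }
    }

    static KeySummary summarize(List<Long> smallSideKeys) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        Set<Long> keys = new HashSet<>();
        for (long k : smallSideKeys) {
            min = Math.min(min, k);
            max = Math.max(max, k);
            keys.add(k);
        }
        return new KeySummary(min, max, keys);
    }

    // Conceptually applied while scanning the big side, ahead of the shuffle and join.
    static List<Long> prefilter(List<Long> bigSideKeys, KeySummary summary) {
        List<Long> kept = new ArrayList<>();
        for (long k : bigSideKeys) {
            if (summary.mightContain(k)) {
                kept.add(k);
            }
        }
        return kept;
    }
}
```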





[jira] [Created] (HIVE-16861) MapredParquetOutputFormat - Save Some Array Allocations

2017-06-08 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-16861:
--

 Summary: MapredParquetOutputFormat - Save Some Array Allocations
 Key: HIVE-16861
 URL: https://issues.apache.org/jira/browse/HIVE-16861
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.1.1, 3.0.0
Reporter: BELUGA BEHR
Priority: Trivial


Remove superfluous array allocations from {{MapredParquetOutputFormat}}
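The issue does not list the specific allocations. As a generic illustration only, here is a hypothetical helper showing two common ways to save allocations when parsing a comma-separated column property; it is not the code in `MapredParquetOutputFormat`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Generic illustration only (hypothetical helper, not MapredParquetOutputFormat's code).
final class ColumnProps {
    static List<String> parseColumnNames(String property) {
        if (property == null || property.isEmpty()) {
            return Collections.emptyList(); // shared immutable instance, no new allocation
        }
        // String.split already allocates an array; Arrays.asList wraps it without a second copy
        return Arrays.asList(property.split(","));
    }
}
```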





[jira] [Created] (HIVE-16860) HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 breaks at runtime.

2017-06-08 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-16860:
---

 Summary: HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 
breaks at runtime.
 Key: HIVE-16860
 URL: https://issues.apache.org/jira/browse/HIVE-16860
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 0.13.0, 0.14.0
Reporter: Chris Drome
Assignee: Jason Dere
 Fix For: 0.14.0


The signature of HostUtil.getTaskLogUrl changed between Hadoop-2.3 and 
Hadoop-2.4.

Code in 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java is 
written against the Hadoop-2.3 method and fails to compile against Hadoop-2.4.
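A common way to make shim code tolerate a method whose signature differs across library versions is to resolve it by reflection at runtime instead of linking against one signature at compile time. The sketch below assumes that approach and is not the committed fix; `ShimReflection`, `OldLogUrlUtil` and `NewLogUrlUtil` are hypothetical, and the stub signatures are illustrative rather than Hadoop's actual `HostUtil` API.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper (not Hive's shim code): resolve a method at runtime by trying each
// known signature, so one compiled artifact works against either API generation.
final class ShimReflection {
    static Method findMethod(Class<?> cls, String name, Class<?>[]... candidateSignatures) {
        for (Class<?>[] params : candidateSignatures) {
            try {
                return cls.getMethod(name, params);
            } catch (NoSuchMethodException e) {
                // not present in this library version; try the next known signature
            }
        }
        throw new IllegalStateException("No known signature of " + name + " on " + cls.getName());
    }

    static Object invokeStatic(Method method, Object... args) {
        try {
            return method.invoke(null, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Shim call failed: " + method, e);
        }
    }
}

// Hypothetical stand-ins for an older and a newer version of the same utility class.
final class OldLogUrlUtil {
    public static String getTaskLogUrl(String host, String port, String attemptId) {
        return "http://" + host + ":" + port + "/tasklog?attemptid=" + attemptId;
    }
}

final class NewLogUrlUtil {
    public static String getTaskLogUrl(String scheme, String host, String port, String attemptId) {
        return scheme + "://" + host + ":" + port + "/tasklog?attemptid=" + attemptId;
    }
}
```

In real shim code the resolved `Method` would typically be looked up once and cached, so the reflection cost is paid only at startup.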





[jira] [Created] (HIVE-16859) CM uri encoding

2017-06-08 Thread anishek (JIRA)
anishek created HIVE-16859:
--

 Summary: CM uri encoding
 Key: HIVE-16859
 URL: https://issues.apache.org/jira/browse/HIVE-16859
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: anishek


Currently, for Hive replication, the CM root URI is configured via 
"hive.repl.cmrootdir". This configuration needs to have the same value on both 
the primary and the replica Hive warehouses.

The CM URI should be encoded so that the CM root of the source is part of the 
URI itself, i.e. cmfs URIs should have the following form:

{code}
cmfs:hdfs://[authority]/[actual_location]#[checksum_of_file]_[encoded_cm_root_on_primary]
{code}

This lets any target replica warehouse determine the root location of the 
source CM root. Since filesystem configurations can differ between the primary 
and replica warehouses, additional configurations might be required to create 
{{FileSystem}} objects that talk to the respective filesystems. If we want to 
support that, we can add a configuration on the replica warehouse stating the 
primary CM root location, along with other filesystem-related configurations; 
in that case this bug might be irrelevant.
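The proposed URI layout can be sketched as an encode/decode pair. The issue does not specify how the CM root is encoded, so this Java sketch assumes URL-safe Base64 without padding and a checksum that contains no underscore (for example, a hex digest); `CmUriCodec` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical codec for the proposed form  cmfs:<file-uri>#<checksum>_<encoded-cm-root>
// Assumptions: the checksum contains no '_' or '#', and the CM root is encoded with URL-safe
// Base64 without padding, whose alphabet never contains '#'.
final class CmUriCodec {
    private static final String PREFIX = "cmfs:";

    static final class Parts {
        final String fileUri;
        final String checksum;
        final String cmRoot;

        Parts(String fileUri, String checksum, String cmRoot) {
            this.fileUri = fileUri;
            this.checksum = checksum;
            this.cmRoot = cmRoot;
        }
    }

    static String encode(String fileUri, String checksum, String cmRoot) {
        if (checksum.indexOf('_') >= 0 || checksum.indexOf('#') >= 0) {
            throw new IllegalArgumentException("checksum must not contain '_' or '#'");
        }
        String encodedRoot = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(cmRoot.getBytes(StandardCharsets.UTF_8));
        return PREFIX + fileUri + "#" + checksum + "_" + encodedRoot;
    }

    static Parts decode(String cmUri) {
        if (!cmUri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a cmfs URI: " + cmUri);
        }
        String rest = cmUri.substring(PREFIX.length());
        int hash = rest.lastIndexOf('#');       // the fragment itself never contains '#'
        int sep = rest.indexOf('_', hash + 1);  // the checksum never contains '_'
        if (hash < 0 || sep < 0) {
            throw new IllegalArgumentException("missing checksum or CM root: " + cmUri);
        }
        String cmRoot = new String(Base64.getUrlDecoder().decode(rest.substring(sep + 1)),
                StandardCharsets.UTF_8);
        return new Parts(rest.substring(0, hash), rest.substring(hash + 1, sep), cmRoot);
    }
}
```

Splitting on the last '#' and then on the first '_' after it means underscores in the file path (as in `000000_0`) do not confuse the parser.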



