[jira] [Created] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread Naresh P R (Jira)
Naresh P R created HIVE-24036:
-

 Summary: Kryo Exception while serializing plan for getSplits UDF 
call
 Key: HIVE-24036
 URL: https://issues.apache.org/jira/browse/HIVE-24036
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R


{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.apache.hadoop.hive.llap.LlapOutputFormat
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.apache.hadoop.hive.llap.LlapOutputFormat
Serialization trace:
outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
  at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
  at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
  at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24035) Add Jenkinsfile for branch-2.3

2020-08-12 Thread Chao Sun (Jira)
Chao Sun created HIVE-24035:
---

 Summary: Add Jenkinsfile for branch-2.3
 Key: HIVE-24035
 URL: https://issues.apache.org/jira/browse/HIVE-24035
 Project: Hive
  Issue Type: Test
Reporter: Chao Sun
Assignee: Chao Sun


To enable precommit tests for GitHub PRs, we need to have a Jenkinsfile in the 
repo. This has already been done for master and branch-2. This issue adds the 
same for branch-2.3.





Re: Apply for permission to edit Hive Wikipages

2020-08-12 Thread Liquan Pei
My Apache Confluence ID is liquanpei.

On Wed, Aug 12, 2020 at 10:43 AM Liquan Pei wrote:

> Hi,
>
> I am applying for permission to edit Hive wiki pages. We
> recently succeeded in making TiDB (https://github.com/pingcap/tidb) a Hive
> metastore backend. TiDB is a scalable database that can help solve the [...] We
> would like to share step-by-step instructions on how to use TiDB as a Hive
> metastore with the community.
>
> Best,
> Liquan
>
> --
> Liquan Pei
> Senior Database Engineer, PingCAP
>


-- 
Liquan Pei
Software Engineer, Confluent Inc


Apply for permission to edit Hive Wikipages

2020-08-12 Thread Liquan Pei
Hi,

I am applying for permission to edit Hive wiki pages. We
recently succeeded in making TiDB (https://github.com/pingcap/tidb) a Hive
metastore backend. TiDB is a scalable database that can help solve the [...] We
would like to share step-by-step instructions on how to use TiDB as a Hive
metastore with the community.

Best,
Liquan

-- 
Liquan Pei
Senior Database Engineer, PingCAP


[jira] [Created] (HIVE-24034) Add getTable to HS2 local cache

2020-08-12 Thread Soumyakanti Das (Jira)
Soumyakanti Das created HIVE-24034:
--

 Summary: Add getTable to HS2 local cache
 Key: HIVE-24034
 URL: https://issues.apache.org/jira/browse/HIVE-24034
 Project: Hive
  Issue Type: New Feature
Reporter: Soumyakanti Das


getTable is called from many other APIs. Although its latency is not that high 
(from tests with TPCDS in DWX), it would be good to cache it to improve query 
compilation latency even further.
However, this is not as straightforward as caching other APIs such as 
listPartitionsByExpr or getAggrColStatsFor: those APIs include the tableID in 
their cache keys, but getTable cannot do this (without getting the table 
first). 
The problem arises when a table is dropped and recreated with the same name: 
the old and the new table then have the same dbName, tblName, and writeIdList 
in the GetTableRequest object, which is used as the key in the cache. It would 
be good to somehow utilize the id field of the request as well.
More investigation is required for this feature.
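
The drop-and-recreate problem described above can be illustrated with a small, 
self-contained toy model. This is only a sketch in Python, not Hive code: the 
classes ToyMetastore, CachingClient and GetTableKey are hypothetical stand-ins, 
and the assumption that the recreated table is requested with the same 
writeIdList is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GetTableKey:
    """Hypothetical cache key with only the three fields named in the issue."""
    db_name: str
    tbl_name: str
    write_id_list: str


@dataclass(frozen=True)
class Table:
    table_id: int
    db_name: str
    tbl_name: str


class ToyMetastore:
    """Toy stand-in for the metastore: every create gets a fresh table id."""

    def __init__(self):
        self._next_id = 1
        self._tables = {}

    def create_table(self, db, tbl):
        table = Table(self._next_id, db, tbl)
        self._next_id += 1
        self._tables[(db, tbl)] = table
        return table

    def drop_table(self, db, tbl):
        del self._tables[(db, tbl)]

    def get_table(self, db, tbl):
        return self._tables[(db, tbl)]


class CachingClient:
    """Caches get_table results under a key that lacks the table id."""

    def __init__(self, metastore):
        self._metastore = metastore
        self._cache = {}

    def get_table(self, db, tbl, write_id_list):
        key = GetTableKey(db, tbl, write_id_list)
        if key not in self._cache:
            self._cache[key] = self._metastore.get_table(db, tbl)
        return self._cache[key]


def demo():
    """Return (first id, id served from cache after recreate, real id)."""
    metastore = ToyMetastore()
    client = CachingClient(metastore)
    metastore.create_table("db", "t")
    first = client.get_table("db", "t", "w1")
    # Drop and recreate under the same name; the cache key does not change.
    metastore.drop_table("db", "t")
    metastore.create_table("db", "t")
    cached = client.get_table("db", "t", "w1")
    actual = metastore.get_table("db", "t")
    return first.table_id, cached.table_id, actual.table_id
```

In this toy model the cache keeps serving the dropped table after the 
recreate, because nothing in the key distinguishes the two tables. Adding the 
table id to the key would separate them, but the id is exactly what the caller 
does not know before the first getTable call.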



Related:
[HIVE-23949|https://issues.apache.org/jira/browse/HIVE-23949]

[HIVE-24025|https://issues.apache.org/jira/browse/HIVE-24025]





[jira] [Created] (HIVE-24033) full outer join returns wrong number of results if hive.optimize.joinreducededuplication is enabled

2020-08-12 Thread Sebastian Klemke (Jira)
Sebastian Klemke created HIVE-24033:
---

 Summary: full outer join returns wrong number of results if 
hive.optimize.joinreducededuplication is enabled
 Key: HIVE-24033
 URL: https://issues.apache.org/jira/browse/HIVE-24033
 Project: Hive
  Issue Type: Bug
Reporter: Sebastian Klemke


We encountered a Hive query that returns incorrect results when joining two 
CTEs on a group-by value. The input tables `id_table` and
`reference_table` are unfortunately too large to share, and we have not been 
able to reproduce the issue on smaller tables.

{code}
WITH ids AS (
  SELECT
    record.id AS id
  FROM
    `id_table`
    LATERAL VIEW explode(records) r AS record
  WHERE
    record.id = '5ef0bad74d325f72f0360c19'
  LIMIT 1
),

refs AS (
  SELECT
    reference['id'] AS referenceId
  FROM
    `reference_table`
  WHERE
    partition_date = '2020-06-24'
    AND type = '1b0e9eb5c492d1859815410253dd79b5'
    AND reference['id'] = '5ef0bad74d325f72f0360c19'
  GROUP BY
    reference['id']
)

SELECT
  l.id AS id
  , r.referenceId AS referenceId
FROM
  ids l
  FULL OUTER JOIN
  refs r
  ON
  l.id = r.referenceId
{code}

This returns 2 rows, because the join condition fails to match the two sides: 

{code}
OK
5ef0bad74d325f72f0360c19    NULL
NULL    5ef0bad74d325f72f0360c19
{code}

Instead, a single row should be returned. The correct behavior can be achieved 
by either:

 * calling lower() on the GROUP BY expression in refs (this doesn't change the 
string contents)
 * setting hive.optimize.joinreducededuplication=false
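
For reference, the expected FULL OUTER JOIN semantics can be written down as a 
small Python sketch. This is not how Hive executes the join; full_outer_join is 
a hypothetical helper that only spells out why one row is expected when both 
sides contain the same key.

```python
def full_outer_join(left_keys, right_keys):
    """Reference FULL OUTER JOIN of two key lists on equality.

    None stands for SQL NULL and never matches anything.
    """
    rows = []
    matched_right = set()
    for left in left_keys:
        found = False
        for index, right in enumerate(right_keys):
            if left is not None and left == right:
                rows.append((left, right))
                matched_right.add(index)
                found = True
        if not found:
            # Unmatched left row is padded with NULL on the right.
            rows.append((left, None))
    for index, right in enumerate(right_keys):
        if index not in matched_right:
            # Unmatched right row is padded with NULL on the left.
            rows.append((None, right))
    return rows


key = "5ef0bad74d325f72f0360c19"
expected = full_outer_join([key], [key])
```

With the same key on both sides, this reference implementation yields a single 
row with both columns filled. The two NULL-padded rows reported above are what 
it would produce only if the keys did not compare equal.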






[jira] [Created] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-12 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-24032:
--

 Summary: Remove hadoop shims dependency and use FileSystem Api 
directly from standalone metastore
 Key: HIVE-24032
 URL: https://issues.apache.org/jira/browse/HIVE-24032
 Project: Hive
  Issue Type: Task
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-24031) Infinite planning time on syntactically big queries

2020-08-12 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-24031:
--

 Summary: Infinite planning time on syntactically big queries
 Key: HIVE-24031
 URL: https://issues.apache.org/jira/browse/HIVE-24031
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
 Fix For: 4.0.0


Syntactically big queries (~1 million tokens), such as the query shown below, 
lead to very long (seemingly infinite) planning times.

{code:sql}
select posexplode(array('item1', 'item2', ..., 'item1M'));
{code}
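
A query of this shape can be generated with a few lines of Python. This is a 
minimal sketch: build_posexplode_query is a hypothetical helper, and the number 
of items needed to reach roughly one million tokens is an assumption to be 
tuned, since the issue does not state the exact item count.

```python
def build_posexplode_query(n_items: int) -> str:
    """Build 'select posexplode(array('item1', ..., 'itemN'));' as a string."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    items = ", ".join(f"'item{i}'" for i in range(1, n_items + 1))
    return f"select posexplode(array({items}));"


if __name__ == "__main__":
    # Assumed size; adjust to reach the token count of interest.
    query = build_posexplode_query(1_000_000)
    with open("big_query.sql", "w") as handle:
        handle.write(query + "\n")
```

Starting with a small n_items and increasing it step by step should make it 
easier to see how planning time grows with query size.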





Re: Triggering tests for Hive PR or patch

2020-08-12 Thread Chao Sun
Thanks Zoltán! I'll take a look at the Jenkinsfile and see if we can add
one for branch-2.3.

On Tue, Aug 11, 2020 at 10:25 PM Zoltán Haindrich wrote:

> Hey Chao!
>
> Attaching a patch file will not work. However, you could try adding a
> Jenkinsfile for branch-2.3 as well (like David did for branch-2). You
> should expect a much longer execution time, because the longer CLI tests
> will not be split, so please make sure to increase the
> timeout to around 12 hours.
> There is a chance that you will bump into some already known/fixed
> issues (I think I remember a few of them), so ping me in case you need help.
>
> cheers,
> Zoltan
>
> On August 12, 2020 6:33:36 AM GMT+02:00, Chao Sun wrote:
>>
>> Ping. Does anyone know about this?
>>
>> Chao
>>
>> On Thu, Aug 6, 2020 at 5:17 PM Chao Sun wrote:
>>
>>> Hi,
>>>
>>> Does anyone know if GitHub PRs work for branches other than master? If
>>> so, what is the way to trigger tests? If not, does attaching a patch ending
>>> with branch-2.3.patch still work?
>>>
>>> Thanks,
>>> Chao
>>>
> --
> Zoltán Haindrich
>