[jira] [Commented] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-04 Thread David Phillips (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783706#comment-16783706
 ] 

David Phillips commented on HIVE-21376:
---

I believe that v2 will have a similar incompatible change between 3.0 and 3.1 
for {{TIMESTAMP}} due to the time value coming from {{java.sql.Timestamp}} 
changing from local to UTC.
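
As an illustration of the local-versus-UTC difference described above (this is not Hive code; the class and method names are made up for the example), the following sketch compares the epoch seconds obtained from {{java.sql.Timestamp}}, which interprets a wall-clock value in the JVM default time zone, with the epoch seconds obtained by treating the same wall-clock value as UTC. The chosen date and time zone are assumptions picked only to make the gap visible.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.TimeZone;

// Hypothetical demo class; not part of Hive.
final class TimestampEpochDemo {
    // Epoch seconds via java.sql.Timestamp: the wall-clock string is
    // interpreted in the JVM default time zone, which is set temporarily here.
    static long epochSecondsViaSqlTimestamp(String wallClock, TimeZone jvmZone) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(jvmZone);
            return Math.floorDiv(Timestamp.valueOf(wallClock).getTime(), 1000L);
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    // Epoch seconds when the same wall-clock value is treated as UTC.
    static long epochSecondsViaUtc(LocalDateTime wallClock) {
        return wallClock.toEpochSecond(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        long utc = epochSecondsViaUtc(LocalDateTime.of(2019, 3, 4, 0, 0));
        long local = epochSecondsViaSqlTimestamp(
                "2019-03-04 00:00:00", TimeZone.getTimeZone("America/Los_Angeles"));
        // Any hash derived from the epoch value differs whenever this gap is non-zero.
        System.out.println("difference in seconds: " + (local - utc));
    }
}
```

If the JVM default time zone is UTC the two values agree, which is one reason a difference like this can go unnoticed in testing.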

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: David Phillips
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.
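
A minimal sketch of the {{DATE}} part of the description, using only the JDK and none of the Hive classes: it compares a days-since-epoch hash with {{java.time.LocalDate.hashCode()}} for the same date and maps each to a bucket. The class and method names are hypothetical, the bucket count of 32 is arbitrary, and the bucket formula (mask off the sign bit, then take the remainder) is an assumption about how a hash is typically mapped to a bucket, not a quote of Hive's code.

```java
import java.time.LocalDate;

// Hypothetical demo class; not part of Hive.
final class DateHashDemo {
    // Old-style hash as described in the issue: days since 1970-01-01.
    static int daysSinceEpochHash(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // New-style hash as described in the issue: whatever LocalDate.hashCode() returns.
    static int localDateHash(LocalDate date) {
        return date.hashCode();
    }

    // Assumed bucket mapping: non-negative hash modulo the bucket count.
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 3, 4);
        int oldHash = daysSinceEpochHash(date);
        int newHash = localDateHash(date);
        System.out.println("old hash = " + oldHash + ", bucket = " + bucket(oldHash, 32));
        System.out.println("new hash = " + newHash + ", bucket = " + bucket(newHash, 32));
    }
}
```

A regression test of the kind the description asks for could pin the hash of a few fixed values per type, so that any change in the computation fails the build.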



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe

2019-01-26 Thread David Phillips (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Phillips updated HIVE-15773:
--
Attachment: HIVE-15773.3.patch
Status: Patch Available  (was: Open)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.2.patch, HIVE-15773.3.patch, HIVE-15773.patch
>
>
> {{HashMap}} is used without synchronization for the caches, which makes the code 
> unsafe for use in a multi-threaded environment such as Presto (or Spark?). 
> The simple fix is to switch them to {{ConcurrentHashMap}}.
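
The kind of change described above might look like the following sketch. It is not the attached patch: {{InspectorCache}} and its factory parameter are hypothetical names, and the sketch assumes the cached values are interchangeable, so that building one twice under a race and discarding the loser is harmless.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache wrapper; not the actual HCatRecordObjectInspectorFactory code.
final class InspectorCache<K, V> {
    // ConcurrentHashMap allows concurrent reads and writes without external locking.
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> factory;

    InspectorCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        V existing = cache.get(key);
        if (existing != null) {
            return existing;
        }
        // Build outside the map so a factory that recurses into get() is safe.
        V created = factory.apply(key);
        // If another thread won the race, keep its value so all callers share one instance.
        V raced = cache.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }

    int size() {
        return cache.size();
    }
}
```

The sketch uses {{get}} followed by {{putIfAbsent}} rather than {{computeIfAbsent}} because the mapping function passed to {{ConcurrentHashMap.computeIfAbsent}} must not modify the same map, which could happen if building a value for a nested type looked up other entries in the cache.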





[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe

2019-01-26 Thread David Phillips (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Phillips updated HIVE-15773:
--
Status: Open  (was: Patch Available)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.2.patch, HIVE-15773.patch
>
>
> {{HashMap}} is used without synchronization for the caches, which makes the code 
> unsafe for use in a multi-threaded environment such as Presto (or Spark?). 
> The simple fix is to switch them to {{ConcurrentHashMap}}.





[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe

2019-01-26 Thread David Phillips (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Phillips updated HIVE-15773:
--
Attachment: HIVE-15773.2.patch
Status: Patch Available  (was: Open)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.2.patch, HIVE-15773.patch
>
>
> {{HashMap}} is used without synchronization for the caches, which makes the code 
> unsafe for use in a multi-threaded environment such as Presto (or Spark?). 
> The simple fix is to switch them to {{ConcurrentHashMap}}.





[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe

2019-01-26 Thread David Phillips (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Phillips updated HIVE-15773:
--
Status: Open  (was: Patch Available)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.patch
>
>
> {{HashMap}} is used without synchronization for the caches, which makes the code 
> unsafe for use in a multi-threaded environment such as Presto (or Spark?). 
> The simple fix is to switch them to {{ConcurrentHashMap}}.





[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe

2019-01-26 Thread David Phillips (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Phillips updated HIVE-15773:
--
Attachment: HIVE-15773.patch
Status: Patch Available  (was: Open)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.patch
>
>
> {{HashMap}} is used without synchronization for the caches, which makes the code 
> unsafe for use in a multi-threaded environment such as Presto (or Spark?). 
> The simple fix is to switch them to {{ConcurrentHashMap}}.





[jira] [Resolved] (HIVE-178) SELECT without FROM should assume a one-row table with no columns.

2016-09-15 Thread David Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Phillips resolved HIVE-178.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

{code}
hive> select 1+1;
OK
2
Time taken: 1.104 seconds, Fetched: 1 row(s)
{code}

> SELECT without FROM should assume a one-row table with no columns.
> --
>
> Key: HIVE-178
> URL: https://issues.apache.org/jira/browse/HIVE-178
> Project: Hive
> Issue Type: Wish
> Components: Query Processor, Testing Infrastructure
> Reporter: Adam Kramer
> Priority: Minor
> Labels: SQL
> Fix For: 0.13.0
>
>
> SELECT 1+1;
> should just return '2', but instead hive fails because no table is listed.
> SELECT 1+1 FROM (empty table);
> should also just return '2', but instead hive "succeeds" because there is "no 
> possible output," so it produces no output.
> So, currently we have to run 
> SELECT 1+1 FROM (silly one-row dummy table);
> ...which runs a whole mapreduce step to ignore a column of data that is 
> useless anyway. This is much easier due to local mode, but still, it would be 
> nice to be able to SELECT without specifying a table and to get one row of 
> output in moments instead of waiting for even a local-mode job to launch, 
> complete, and return.
> This is especially useful for testing UDFs.
> Relatedly, an optimization by which Hive can tell that data from a table 
> isn't even USED would be useful, because it means that the data needn't be 
> queried...the only relevant info from the table would be the number of rows 
> it has, which is available for free from the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)