[jira] [Commented] (HIVE-21376) Incompatible change in Hive bucket computation
[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783706#comment-16783706 ]

David Phillips commented on HIVE-21376:
---

I believe that v2 will have a similar incompatible change between 3.0 and 3.1 for {{TIMESTAMP}} due to the time value coming from {{java.sql.Timestamp}} changing from local to UTC.

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: David Phillips
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses {{daysSinceEpoch}} as the hash code. It is now computed using {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses {{TimestampWritableV2}}. They ostensibly use the same hash code computation, but there are two important differences:
> # {{TimestampWritable}} rounds the number of milliseconds into the seconds portion of the computation, but {{TimestampWritableV2}} does not.
> # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, which returns it relative to the JVM time zone, not UTC. {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify whether this actually causes data to be read or written incorrectly (there may be code above this library method which makes things work correctly). However, if my understanding is correct, this means Hive 3.1 is both forwards and backwards incompatible with bucketed tables using either of these data types. It also indicates that Hive needs tests to verify that the hash code does not change between releases.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
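To make the differences described in HIVE-21376 concrete, here is a minimal, self-contained Java sketch. It does not use the Hive classes: {{BucketHashSketch}} and its method names are hypothetical stand-ins. The assumption, taken from the issue description and not verified against the Hive source, is that the old {{DATE}} hash is the number of days since epoch while the new one is {{LocalDate.hashCode()}}. The timestamp part only illustrates how one wall-clock value maps to different epoch seconds depending on whether it is interpreted in UTC or in a local zone (a fixed -08:00 offset is assumed for the example). The millisecond rounding difference is not modelled.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class BucketHashSketch {
    // Hash the description attributes to DateWritable: days since epoch.
    public static int daysSinceEpochHash(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // Hash the description attributes to DateWritableV2: LocalDate.hashCode().
    public static int localDateHash(LocalDate date) {
        return date.hashCode();
    }

    // Epoch seconds for a wall-clock value interpreted in the given zone.
    public static long epochSeconds(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toEpochSecond();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 3, 4);
        System.out.println("days since epoch:     " + daysSinceEpochHash(date));
        System.out.println("LocalDate.hashCode(): " + localDateHash(date));

        LocalDateTime ts = LocalDateTime.of(2019, 3, 4, 12, 0, 0);
        ZoneId local = ZoneOffset.ofHours(-8); // assumed JVM zone, for illustration only
        System.out.println("epoch seconds (UTC):   " + epochSeconds(ts, ZoneOffset.UTC));
        System.out.println("epoch seconds (local): " + epochSeconds(ts, local));
    }
}
```

If the description is accurate, any input for which these values differ would hash differently before and after the change, which is what would make existing bucketed data unreadable by bucket.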
[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe
[ https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-15773:
--
Attachment: HIVE-15773.3.patch
Status: Patch Available (was: Open)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.2.patch, HIVE-15773.3.patch, HIVE-15773.patch
>
>
> {{HashMap}} used without synchronization for the caches, which makes the code unsafe for use in a multi-threaded environment such as Presto (or Spark?).
> The simple fix is to switch them to {{ConcurrentHashMap}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
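The fix proposed in HIVE-15773 can be sketched roughly as follows. This is not the attached patch and not the actual {{HCatRecordObjectInspectorFactory}} code: {{InspectorCacheSketch}}, its type parameters, and the factory function are hypothetical names used only to show a cache backed by {{ConcurrentHashMap}} instead of an unsynchronized {{HashMap}}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class InspectorCacheSketch<K, V> {
    // ConcurrentHashMap allows concurrent reads and writes without external locking.
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> factory; // hypothetical: builds the value for a key

    public InspectorCacheSketch(Function<K, V> factory) {
        this.factory = factory;
    }

    public V get(K key) {
        V existing = cache.get(key);
        if (existing != null) {
            return existing;
        }
        // Build outside the map so the factory may itself call get() for nested keys.
        V created = factory.apply(key);
        V raced = cache.putIfAbsent(key, created);
        // If another thread inserted first, return its value so all callers agree.
        return raced != null ? raced : created;
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        InspectorCacheSketch<String, Object> cache =
                new InspectorCacheSketch<>(typeName -> new Object());
        Object first = cache.get("struct<a:int>");
        Object second = cache.get("struct<a:int>");
        System.out.println("same instance: " + (first == second));
    }
}
```

The sketch uses {{get}} followed by {{putIfAbsent}} rather than {{computeIfAbsent}}, because a factory that recursively populates the same map from inside {{computeIfAbsent}} is not permitted by {{ConcurrentHashMap}}. The trade-off is that the factory may run more than once for the same key under contention, although every caller still receives the same cached instance.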
[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe
[ https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-15773:
--
Status: Open (was: Patch Available)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.2.patch, HIVE-15773.patch
>
>
> {{HashMap}} used without synchronization for the caches, which makes the code unsafe for use in a multi-threaded environment such as Presto (or Spark?).
> The simple fix is to switch them to {{ConcurrentHashMap}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe
[ https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-15773:
--
Attachment: HIVE-15773.2.patch
Status: Patch Available (was: Open)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.2.patch, HIVE-15773.patch
>
>
> {{HashMap}} used without synchronization for the caches, which makes the code unsafe for use in a multi-threaded environment such as Presto (or Spark?).
> The simple fix is to switch them to {{ConcurrentHashMap}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe
[ https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-15773:
--
Status: Open (was: Patch Available)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.patch
>
>
> {{HashMap}} used without synchronization for the caches, which makes the code unsafe for use in a multi-threaded environment such as Presto (or Spark?).
> The simple fix is to switch them to {{ConcurrentHashMap}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HIVE-15773) HCatRecordObjectInspectorFactory is not thread safe
[ https://issues.apache.org/jira/browse/HIVE-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-15773:
--
Attachment: HIVE-15773.patch
Status: Patch Available (was: Open)

> HCatRecordObjectInspectorFactory is not thread safe
> ---
>
> Key: HIVE-15773
> URL: https://issues.apache.org/jira/browse/HIVE-15773
> Project: Hive
> Issue Type: Bug
> Reporter: David Phillips
> Priority: Major
> Attachments: HIVE-15773.patch
>
>
> {{HashMap}} used without synchronization for the caches, which makes the code unsafe for use in a multi-threaded environment such as Presto (or Spark?).
> The simple fix is to switch them to {{ConcurrentHashMap}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (HIVE-178) SELECT without FROM should assume a one-row table with no columns.
[ https://issues.apache.org/jira/browse/HIVE-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips resolved HIVE-178.
-
Resolution: Fixed
Fix Version/s: 0.13.0

{code}
hive> select 1+1;
OK
2
Time taken: 1.104 seconds, Fetched: 1 row(s)
{code}

> SELECT without FROM should assume a one-row table with no columns.
> --
>
> Key: HIVE-178
> URL: https://issues.apache.org/jira/browse/HIVE-178
> Project: Hive
> Issue Type: Wish
> Components: Query Processor, Testing Infrastructure
> Reporter: Adam Kramer
> Priority: Minor
> Labels: SQL
> Fix For: 0.13.0
>
>
> SELECT 1+1;
> should just return '2', but instead hive fails because no table is listed.
> SELECT 1+1 FROM (empty table);
> should also just return '2', but instead hive "succeeds" because there is "no possible output," so it produces no output.
> So, currently we have to run
> SELECT 1+1 FROM (silly one-row dummy table);
> ...which runs a whole mapreduce step to ignore a column of data that is useless anyway. This is much easier due to local mode, but still, it would be nice to be able to SELECT without specifying a table and to get one row of output in moments instead of waiting for even a local-mode job to launch, complete, and return.
> This is especially useful for testing UDFs.
> Relatedly, an optimization by which Hive can tell that data from a table isn't even USED would be useful, because it means that the data needn't be queried...the only relevant info from the table would be the number of rows it has, which is available for free from the metastore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)