[jira] [Created] (HIVE-23180) Remove unused variables from tez build dag
Mustafa Iman created HIVE-23180:

Summary: Remove unused variables from tez build dag
Key: HIVE-23180
URL: https://issues.apache.org/jira/browse/HIVE-23180
Project: Hive
Issue Type: Improvement
Reporter: Mustafa Iman
Assignee: Mustafa Iman
Attachments: HIVE-23180.patch

This is a simple refactoring of the TezTask build-DAG functionality. Unused options are removed from function calls, some variables are given meaningful names, and an unnecessary filesystem creation is removed.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23179) Show create table is not showing SerDe Properties in unicode
Naresh P R created HIVE-23179:

Summary: Show create table is not showing SerDe Properties in unicode
Key: HIVE-23179
URL: https://issues.apache.org/jira/browse/HIVE-23179
Project: Hive
Issue Type: Bug
Reporter: Naresh P R
Assignee: Naresh P R

Tables with special-character delimiters are not displayed correctly in SHOW CREATE TABLE output. For example:

create external table test(age int, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
stored as textfile;

SHOW CREATE TABLE output:

+----------------------------------------------------------------+
|                         createtab_stmt                         |
+----------------------------------------------------------------+
| CREATE EXTERNAL TABLE `test`(                                  |
|   `age` int,                                                   |
|   `name` string)                                               |
| ROW FORMAT SERDE                                               |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'         |
| WITH SERDEPROPERTIES (                                         |
|   'field.delim'='',                                            |
|   'serialization.format'='')                                   |
| STORED AS INPUTFORMAT                                          |
|   'org.apache.hadoop.mapred.TextInputFormat'                   |
| OUTPUTFORMAT                                                   |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                                       |
|   'hdfs://abcd:8020/warehouse/tablespace/external/hive/testca' |
| TBLPROPERTIES (                                                |
|   'bucketing_version'='2',                                     |
|   'discover.partitions'='true',                                |
|   'transient_lastDdlTime'='1577162310')                        |
+----------------------------------------------------------------+

Some client consoles cannot display ^A (Ctrl+A) properly. It would be better to show the value as a Unicode escape, as DESC FORMATTED already does:

| Storage Desc Params: | NULL                 | NULL   |
|                      | field.delim          | \u0001 |
|                      | serialization.format | \u0001 |
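To illustrate the behaviour requested in HIVE-23179, here is a minimal sketch of escaping non-printable characters before a SerDe property value is printed. The class and method names are hypothetical and are not taken from the Hive code base; the actual fix may be implemented differently.

```java
// Hypothetical helper, not part of Hive: renders control and non-ASCII
// characters as backslash-u escapes so that any console can display them.
final class SerdePropertyEscaper {
    private SerdePropertyEscaper() {}

    static String escapeNonPrintable(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7e) {
                // Four lowercase hex digits, matching the style of DESC FORMATTED.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String delimiter = String.valueOf((char) 0x01); // Ctrl+A
        // Prints: 'field.delim'='\u0001'
        System.out.println("'field.delim'='" + escapeNonPrintable(delimiter) + "'");
    }
}
```

Printable ASCII passes through unchanged, so ordinary delimiters such as a comma would still be shown as-is.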
[jira] [Created] (HIVE-23178) Add Tez Total Order Partitioner
Roohi Syeda created HIVE-23178:

Summary: Add Tez Total Order Partitioner
Key: HIVE-23178
URL: https://issues.apache.org/jira/browse/HIVE-23178
Project: Hive
Issue Type: Bug
Reporter: Roohi Syeda
Re: Many ANTLR Tokens
Hello Gang,

I've been investigating this issue.

This should no longer be an issue with ANTLR4 (ANTLR3 stopped seeing development circa 2014). However, ANTLR4 is not fully backwards compatible with ANTLR3. In particular, ANTLR4 changes how it approaches "rewrite rule" operations. Hive's ANTLR3 grammar uses these operations heavily, so it is quite a lift to get this upgrade done. Not to mention, as I work on fixing some of these things, we may want to backport to the Hive 3.x branches.

https://issues.apache.org/jira/browse/HIVE-23177

I also looked at writing a tool that breaks the Java file that ANTLR3 produces into smaller pieces, but this would require creating another Maven module in Hive just for this purpose. It would be a custom Maven plugin that reads in the generated source code and chops it up a bit to make the compiler happy. This is possible, but it adds quite a bit of overhead to the project (yet another Maven module to manage).

We can also just remove the duplicate token names. I understand that the current design grants flexibility, but SQL is a pretty tight standard at this point and I don't see Hive leveraging that flexibility in any meaningful way. This would be the path of least resistance.

Thoughts?

Thanks.

On Thu, Apr 9, 2020 at 6:36 PM David Mollitor wrote:

> Hello Gang,
>
> I am investigating HIVE-23172 and I am having a problem addressing this
> because I am getting the following error from compiling the grammar:
>
> hive-parser: Compilation failure
> [ERROR]
> /home/apache/hive/hive/parser/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java:[40,38]
> code too large
>
> I traced it down to the fact that there are too many tokens defined. In
> HiveParser.java, it has the following:
>
> public static final String[] tokenNames = new String[] { ... };
>
> That list is so long, it's breaking Java compilation. Someone else came
> across this a while ago: HIVE-15577.
>
> I observed that the parser defines two tokens for most elements, for
> example:
>
> KW_TRUNCATE / TOK_TRUNCATETABLE
>
> What is the value of having both? Can we consolidate this down to one and
> conserve some space? I would propose just using TOK_TRUNCATE and get rid
> of the KW version.
>
> Does anyone have any insight into why things are set up the way they are?
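For context on the "code too large" error quoted above: javac reports it when a single method, including a class's static initializer, would compile to more than 64 KB of bytecode, and a very long array initializer is compiled into exactly such a method. The sketch below shows the general idea behind "chopping up" generated code so that each piece stays under the limit. The class and helper names are hypothetical, only the two token names come from the thread, and this is not what ANTLR3 or any existing Hive tooling generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: build a large constant table from several small
// methods instead of one giant array initializer in the static initializer.
final class ChunkedTokenNames {
    private ChunkedTokenNames() {}

    // Populated once at class-initialization time from the chunk helpers below.
    static final String[] TOKEN_NAMES = buildTokenNames();

    private static String[] buildTokenNames() {
        List<String> names = new ArrayList<>();
        addChunk0(names);
        addChunk1(names);
        return names.toArray(new String[0]);
    }

    // Each helper would hold a slice of the full list; because the 64 KB limit
    // applies per method, no single method grows too large.
    private static void addChunk0(List<String> names) {
        names.add("KW_TRUNCATE");
    }

    private static void addChunk1(List<String> names) {
        names.add("TOK_TRUNCATETABLE");
    }

    public static void main(String[] args) {
        System.out.println(TOKEN_NAMES.length + " token names loaded");
    }
}
```

A real generated parser has thousands of entries, so a splitting tool would have to emit many such helpers, which is part of why removing the duplicate token names looks like the cheaper option.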
[jira] [Created] (HIVE-23177) Upgrade to ANTLR4
David Mollitor created HIVE-23177:

Summary: Upgrade to ANTLR4
Key: HIVE-23177
URL: https://issues.apache.org/jira/browse/HIVE-23177
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor

Upgrade Hive to ANTLR4. ANTLR3 lost support many moons ago.

This is going to be a big lift. Many of the Hive rules use the "rule rewrite" feature, which no longer exists in ANTLR4, so they must be completely re-implemented:

https://stackoverflow.com/questions/14565794/antlr-4-tree-inject-rewrite-operator
[jira] [Created] (HIVE-23176) Remove REGEX Column Feature
David Mollitor created HIVE-23176:

Summary: Remove REGEX Column Feature
Key: HIVE-23176
URL: https://issues.apache.org/jira/browse/HIVE-23176
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor

Remove the Hive feature: REGEX Column.

Hive has this interesting feature for using a REGEX to SELECT multiple columns. This needs to go. It is not SQL standard and, as currently implemented, it is impossible to determine whether a column identifier is a REGEX or the actual name of a column. If a column name is enclosed in backticks, then any UTF-8 character is valid in the column name.

[https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]
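The ambiguity described in HIVE-23176 can be sketched in a few lines. The example assumes the backtick-quoted identifier is interpreted as a Java regular expression, and the table columns and class name are made up for illustration; it is not Hive's actual resolution logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration: the same quoted identifier selects different
// columns depending on whether it is read as a regex or as a literal name.
final class RegexColumnAmbiguity {
    private RegexColumnAmbiguity() {}

    // Interpretation 1: the identifier is a regex matched against each column name.
    static List<String> selectAsRegex(String identifier, List<String> columns) {
        Pattern pattern = Pattern.compile(identifier);
        return columns.stream()
                .filter(column -> pattern.matcher(column).matches())
                .collect(Collectors.toList());
    }

    // Interpretation 2: the identifier is the exact name of one column.
    static List<String> selectAsName(String identifier, List<String> columns) {
        return columns.stream()
                .filter(identifier::equals)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A made-up table whose last column has an unusual but quotable name.
        List<String> columns = Arrays.asList("key", "value", "ds", "hr", "(ds|hr)?+.+");
        String identifier = "(ds|hr)?+.+";

        System.out.println(selectAsRegex(identifier, columns)); // [key, value, (ds|hr)?+.+]
        System.out.println(selectAsName(identifier, columns));  // [(ds|hr)?+.+]
    }
}
```

Both results are defensible readings of the same SELECT list, which is the core of the argument for removing the feature.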