[jira] [Created] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on
Harsh J created HIVE-15908: -- Summary: OperationLog's LogFile writer should have autoFlush turned on Key: HIVE-15908 URL: https://issues.apache.org/jira/browse/HIVE-15908 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Harsh J Assignee: Harsh J Priority: Minor HS2 offers an API to fetch Operation Log results from the maintained OperationLog file. The reader inside the OperationLog$LogFile class reads line-by-line on its input stream, for any lines available from the OS's file input perspective. The writer inside the same class uses PrintStream to write to the file in parallel. However, the PrintStream constructor used leaves PrintStream's {{autoFlush}} feature in an OFF state. This causes the BufferedWriter used by PrintStream to accumulate 8k worth of bytes in memory before flushing the writes to disk, causing slowness in the logs streamed back to the client. Ideally, every line should be flushed in its entirety as it is written, for a smoother experience. I suggest changing the line inside {{OperationLog$LogFile}} that appears as below: {code} out = new PrintStream(new FileOutputStream(file)); {code} Into: {code} out = new PrintStream(new FileOutputStream(file), true); {code} This will cause it to use the described autoFlush feature of PrintStream and make for a better reader-log-results-streaming experience: https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
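The difference between the two constructors can be sketched in isolation (a standalone demo, not Hive's OperationLog code; the 8k figure is PrintStream's default internal buffer size):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class AutoFlushDemo {
    // Returns {bytes visible from the buffered stream, bytes visible with autoFlush}
    // after one println() on each, before either stream is closed.
    static long[] demo() throws IOException {
        Path buffered = Files.createTempFile("buffered", ".log");
        Path flushed = Files.createTempFile("flushed", ".log");

        // One-arg constructor: autoFlush off; println() output sits in the internal
        // 8k buffer, invisible to a concurrent reader such as OperationLog's.
        PrintStream slow = new PrintStream(new FileOutputStream(buffered.toFile()));
        // Two-arg constructor (the proposed change): autoFlush on; println()
        // flushes each line straight to disk.
        PrintStream fast = new PrintStream(new FileOutputStream(flushed.toFile()), true);

        slow.println("operation log line");
        fast.println("operation log line");

        long[] sizes = { Files.size(buffered), Files.size(flushed) };
        slow.close();
        fast.close();
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = demo();
        System.out.println("buffered bytes visible: " + sizes[0]);
        System.out.println("autoFlush bytes visible: " + sizes[1]);
    }
}
```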
[jira] [Created] (HIVE-14593) Non-canonical integer partition columns do not work with IN operations
Harsh J created HIVE-14593: -- Summary: Non-canonical integer partition columns do not work with IN operations Key: HIVE-14593 URL: https://issues.apache.org/jira/browse/HIVE-14593 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Harsh J The below use-case no longer works (tested on a PostgreSQL-backed HMS using JDO): {code} CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT); ALTER TABLE foo ADD PARTITION (b='07', c='08'); LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', c='08'); -- Does not work if you provide a string IN variable: SELECT a, c FROM foo WHERE b IN ('07'); (No rows selected) -- Works if you provide it in integer forms: SELECT a, c FROM foo WHERE b IN (07); (1 row(s) selected) SELECT a, c FROM foo WHERE b IN (7); (1 row(s) selected) {code} This worked fine prior to HIVE-8099. The change in HIVE-8099 induces a double conversion on the partition column input, such that GenericUDFIn now receives b's value as the canonical integer 7 (converted to the column type), as opposed to the non-canonical value 07 stored as-is in the DB. Subsequently, GenericUDFIn converts b's value again to match its arguments' types, turning the int 7 into the string "7". Then "7" is compared against "07", which naturally never matches. As a regression, this breaks anyone upgrading from pre-1.0 to 1.0 or higher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
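The conversion chain described above can be mirrored in a standalone sketch (a hypothetical plain-Java stand-in for illustration, not Hive's actual GenericUDFIn code):

```java
public class NonCanonicalPartitionDemo {

    // Mirrors the double conversion described in the report.
    static boolean matchesAfterDoubleConversion(String storedValue, String queryLiteral) {
        // Step 1 (post-HIVE-8099): the stored partition value is canonicalized
        // to the column type, so the stored "07" becomes the int 7.
        int canonical = Integer.parseInt(storedValue);
        // Step 2: the IN evaluation converts the column value back to the
        // argument's type (string), so 7 becomes "7".
        String converted = String.valueOf(canonical);
        // Step 3: "7".equals("07") never matches.
        return converted.equals(queryLiteral);
    }

    public static void main(String[] args) {
        System.out.println(matchesAfterDoubleConversion("07", "07")); // false: the regression
        System.out.println(matchesAfterDoubleConversion("07", "7"));  // true: canonical literal works
    }
}
```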
[jira] [Created] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()
Harsh J created HIVE-13704: -- Summary: Don't call DistCp.execute() instead of DistCp.run() Key: HIVE-13704 URL: https://issues.apache.org/jira/browse/HIVE-13704 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0, 1.3.0 Reporter: Harsh J Priority: Critical HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} method runs added logic that drives the state of {{SimpleCopyListing}} which runs in the driver, and of {{CopyCommitter}} which runs in the job runtime. When Hive ends up running DistCp for copy work (between non-matching filesystems, or between encrypted/non-encrypted zones, for sizes above a configured value), this state not being set causes wrong paths to appear on the target (subdirs named after the file, instead of just the file). Hive should call DistCp's Tool {{run}} method rather than the {{execute}} method directly, so as not to skip the target-exists flag that the {{setTargetPathExists}} call sets: https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
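The failure mode above can be sketched with a plain-Java stand-in for the Tool pattern (hypothetical names, not Hadoop's actual DistCp class): run() performs driver-side setup before delegating, so calling execute() directly silently skips that setup.

```java
public class RunVsExecuteDemo {

    static class FakeDistCp {
        // State that the copy listing / committer logic relies on, set only by run().
        boolean targetPathExists = false;

        // run() performs the driver-side setup before delegating to execute().
        int run() {
            setTargetPathExists();
            return execute();
        }

        void setTargetPathExists() { targetPathExists = true; }

        // Calling execute() alone skips the setup, leaving the flag at its default,
        // which in real DistCp leads to wrong target paths rather than an error.
        int execute() { return targetPathExists ? 0 : 1; }
    }

    public static void main(String[] args) {
        System.out.println("via run():     exit=" + new FakeDistCp().run());     // exit=0
        System.out.println("via execute(): exit=" + new FakeDistCp().execute()); // exit=1
    }
}
```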
[jira] [Created] (HIVE-13275) Add a toString method to BytesRefArrayWritable
Harsh J created HIVE-13275: -- Summary: Add a toString method to BytesRefArrayWritable Key: HIVE-13275 URL: https://issues.apache.org/jira/browse/HIVE-13275 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 1.1.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: HIVE-13275.000.patch RCFileInputFormat cannot be used externally for Hadoop Streaming today because Streaming generally relies on the K/V pairs being able to emit text representations (via toString()). Since BytesRefArrayWritable has no toString() method, using RCFileInputFormat prints default object representations, which are not useful. Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an array), so it's important to output them in a valid/parseable manner, as opposed to choosing a simple joining delimiter over the string representations of the inner elements. I propose adding a standardised CSV formatting of the array data, such that users of Streaming can then parse the results in their own script. Since we have OpenCSV as a dependency already, we can make use of it for this purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
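A minimal sketch of the CSV quoting being proposed (plain Java rather than OpenCSV, and not the attached patch), showing why quoting beats a naive joining delimiter when column values themselves contain the delimiter:

```java
import java.util.Arrays;
import java.util.List;

public class CsvRowDemo {

    // Quote each column per the common CSV convention: wrap fields containing
    // delimiters, quotes, or newlines in double quotes, doubling inner quotes.
    static String toCsv(List<String> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sb.append(',');
            String col = columns.get(i);
            if (col.contains(",") || col.contains("\"") || col.contains("\n")) {
                sb.append('"').append(col.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(col);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A row whose second column contains the delimiter stays parseable.
        System.out.println(toCsv(Arrays.asList("a", "b,c", "d\"e"))); // a,"b,c","d""e"
    }
}
```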
[jira] [Created] (HIVE-11325) Infinite loop in HiveHFileOutputFormat
Harsh J created HIVE-11325: -- Summary: Infinite loop in HiveHFileOutputFormat Key: HIVE-11325 URL: https://issues.apache.org/jira/browse/HIVE-11325 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 1.0.0 Reporter: Harsh J No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run regularly in Hive builds, but here's the gist of the issue: The condition at https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164 indicates that we will loop until we find a file whose last path component (the name) is equal to the column family name. In execution, however, the iteration enters an actual infinite loop because the file we end up considering as the srcDir name is actually the region file, whose name will never match the family name. This is an example of the IPC calls the listing loop of a 100%-progress task gets stuck in: {code} 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Call - cdh54.vm/172.16.29.132:8020: getListing {src: /user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c startAfter: needLocation: false} 2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive sending #510346 2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got value #510346 2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getListing took 0ms 2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Response - cdh54.vm/172.16.29.132:8020: getListing {dirList { partialListing { fileType: IS_FILE path: length: 863 permission { perm: 4600 } 
owner: hive group: hive modification_time: 1437454718130 access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }} {code} The path we are getting out of the listing results is {{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_00_0/family/97112ac1c09548ae87bd85af072d2e8c}}, but instead of checking the path's parent {{family}}, we loop infinitely over its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} because it does not match {{family}}. It therefore stays in the infinite loop until the MR framework kills it due to an idle-task timeout (and then, since the subsequent task attempts fail outright, the job fails). While a {{getPath().getParent()}} call will resolve that, is the infinite loop even necessary? Especially given that we throw exceptions if there are no entries or more than one entry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
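The parent-vs-filename distinction above can be shown with a small standalone sketch (using java.nio.file.Path as a stand-in for Hadoop's Path, and not the actual HiveHFileOutputFormat code):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FamilyDirCheckDemo {

    // The suggested fix direction: compare the listed file's PARENT directory
    // name to the column family, not the file's own (hashed) name.
    static boolean isUnderFamilyDir(Path regionFile, String family) {
        Path parent = regionFile.getParent();
        return parent != null && parent.getFileName().toString().equals(family);
    }

    public static void main(String[] args) {
        Path p = Paths.get(
            "/warehouse/_temporary/attempt_x/family/97112ac1c09548ae87bd85af072d2e8c");
        // Parent check matches:
        System.out.println(isUnderFamilyDir(p, "family"));                    // true
        // The check the loop effectively performs never matches the hash:
        System.out.println(p.getFileName().toString().equals("family"));      // false
    }
}
```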
[jira] [Created] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons
Harsh J created HIVE-9870: - Summary: Add JvmPauseMonitor threads to HMS and HS2 daemons Key: HIVE-9870 URL: https://issues.apache.org/jira/browse/HIVE-9870 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.1 Reporter: Harsh J Assignee: Harsh J Priority: Minor hadoop-common carries a nifty thread that prints GC or non-GC pauses within the JVM if they exceed a specific threshold. This has been immeasurably useful in supporting several clusters, identifying GC or other forms of process pauses as the root cause of some event being investigated. The HMS and HS2 daemons are good targets for running similar threads within them. It can be loaded in an if-available style. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
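The detection idea is simple enough to sketch standalone (hadoop-common's JvmPauseMonitor follows this general pattern; the class and thresholds below are illustrative, not its actual code): sleep a fixed interval and flag iterations where the observed wall-clock gap overshoots by more than a threshold, indicating a GC or host-level stall.

```java
public class PauseMonitorDemo {

    // Returns the number of iterations where the sleep overshot by more than
    // warnThresholdMs; a real monitor would log these with GC-counter deltas.
    static int runLoop(long sleepMs, long warnThresholdMs, int iterations)
            throws InterruptedException {
        int pauses = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMs);
            long extraMs = (System.nanoTime() - start) / 1_000_000 - sleepMs;
            if (extraMs > warnThresholdMs) {
                pauses++;
                System.out.println("Detected pause of ~" + extraMs
                        + "ms (GC or host stall?)");
            }
        }
        return pauses;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pauses detected: " + runLoop(100, 400, 3));
    }
}
```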
[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281532#comment-14281532 ] Harsh J commented on HIVE-7224: --- HIVE-7448 is resolved in 0.13 via HIVE-3611. Can this patch go in? I have no review authority, but the change appears good to me. (Except the unwanted changes such as whitespace fixes, which should ideally just be done across all the source files in a direct, single commit rather than by polluting random commits with such changes -- but that's just IMHO). Set incremental printing to true by default in Beeline -- Key: HIVE-7224 URL: https://issues.apache.org/jira/browse/HIVE-7224 Project: Hive Issue Type: Bug Components: Clients, JDBC Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-7224.1.patch See HIVE-7221. By default beeline tries to buffer the entire output relation before printing it on stdout. This can cause OOM when the output relation is large. However, beeline has the option of incremental prints. We should keep that as the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7164) Support non-string partition types in HCatalog
[ https://issues.apache.org/jira/browse/HIVE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228730#comment-14228730 ] Harsh J commented on HIVE-7164: --- The Scan Filter on the MR+HCatalog page at https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ScanFilter could be an ideal place, although it doesn't cover use of the same filtering technique via Pig or so at the same time. Support non-string partition types in HCatalog -- Key: HIVE-7164 URL: https://issues.apache.org/jira/browse/HIVE-7164 Project: Hive Issue Type: Bug Components: HCatalog Reporter: bharath v Currently querying hive tables with non-string partition columns using HCat gives us the following error. Error: Filtering is supported only on partition keys of type string Related discussion here : https://www.mail-archive.com/dev@hive.apache.org/msg18011.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7164) Support non-string partition types in HCatalog
[ https://issues.apache.org/jira/browse/HIVE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-7164. --- Resolution: Duplicate Resolved via HIVE-2702 for HMS. You will need to set property {{hive.metastore.integral.jdo.pushdown}} to {{true}} on the HMS' hive-site.xml to enable this ability, however. It is false by default. Support non-string partition types in HCatalog -- Key: HIVE-7164 URL: https://issues.apache.org/jira/browse/HIVE-7164 Project: Hive Issue Type: Bug Components: HCatalog Reporter: bharath v Currently querying hive tables with non-string partition columns using HCat gives us the following error. Error: Filtering is supported only on partition keys of type string Related discussion here : https://www.mail-archive.com/dev@hive.apache.org/msg18011.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207308#comment-14207308 ] Harsh J commented on HIVE-6468: --- The issue is not resolved. For the issue to be resolved, either Hive should update its Thrift dependency to 0.9.2 (or higher) to pull in the THRIFT-2660 fix, or Hive should RTC the alternative patch created here. If the former is done, the config will be neither introduced nor needed. HS2 out of memory error when curl sends a get request - Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Attachments: HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8606) [hs2] Do not unnecessarily call setPermission on staging directories
Harsh J created HIVE-8606: - Summary: [hs2] Do not unnecessarily call setPermission on staging directories Key: HIVE-8606 URL: https://issues.apache.org/jira/browse/HIVE-8606 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.1 Reporter: Harsh J Assignee: Harsh J Priority: Minor HS2 has made setPermission mandatory within its CLIService#setupStagingDir method as a result of HIVE-6602. This causes HS2 to fail to start if the owner of the staging directory is not the same user as it, even though the directory is already 777. This is because only owners and superusers of a directory can change its permission, not group or others. Failure appears as: {code} Caused by: org.apache.hive.service.ServiceException: Error setting stage directories at org.apache.hive.service.cli.CLIService.start(CLIService.java:132) at org.apache.hive.service.CompositeService.start(CompositeService.java:70) ... 8 more Caused by: org.apache.hadoop.security.AccessControlException: Permission denied {code} We should only call setPermission if the existing permission is unsatisfactory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
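The proposed guard is tiny; here is a sketch using plain ints as a stand-in for Hadoop's FsPermission (hypothetical, not the actual CLIService code): only attempt the chmod when the current mode differs from the desired one, so a non-owner HS2 user never hits AccessControlException on an already-777 directory.

```java
public class StagingDirPermDemo {

    // True only when a setPermission call is actually required; callers would
    // invoke fs.setPermission(path, desired) solely in that case.
    static boolean needsChmod(int currentMode, int desiredMode) {
        return currentMode != desiredMode;
    }

    public static void main(String[] args) {
        System.out.println(needsChmod(0777, 0777)); // false: already satisfactory, skip the call
        System.out.println(needsChmod(0755, 0777)); // true: fix it up
    }
}
```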
[jira] [Resolved] (HIVE-8606) [hs2] Do not unnecessarily call setPermission on staging directories
[ https://issues.apache.org/jira/browse/HIVE-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-8606. --- Resolution: Duplicate In checking trunk, this was resolved overall via HIVE-6847. Sorry for the noise! [hs2] Do not unnecessarily call setPermission on staging directories Key: HIVE-8606 URL: https://issues.apache.org/jira/browse/HIVE-8606 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.13.1 Reporter: Harsh J Assignee: Harsh J Priority: Minor HS2 has made setPermission mandatory within its CLIService#setupStagingDir method as a result of HIVE-6602. This causes HS2 to fail to start if the owner of the staging directory is not the same user as it, even though the directory is already 777. This is because only owners and superusers of a directory can change its permission, not group or others. Failure appears as: {code} Caused by: org.apache.hive.service.ServiceException: Error setting stage directories at org.apache.hive.service.cli.CLIService.start(CLIService.java:132) at org.apache.hive.service.CompositeService.start(CompositeService.java:70) ... 8 more Caused by: org.apache.hadoop.security.AccessControlException: Permission denied {code} We should only call the setPermission if it is unsatisfactory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it
[ https://issues.apache.org/jira/browse/HIVE-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-7178: -- Description: The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); Alias works for just one column: EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) was: The following SQL doesn't work: EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' The following SQL works (without alias in grouping set): EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); Alias works for just one column: EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it -- Key: HIVE-7178 URL: https://issues.apache.org/jira/browse/HIVE-7178 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.0 Reporter: Yibing Shi The following SQL doesn't 
work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); Alias works for just one column: EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it
[ https://issues.apache.org/jira/browse/HIVE-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-7178: -- Description: The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); {code} Alias works for just one column: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) was: The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); Alias works for just one column: EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it -- Key: HIVE-7178 URL: https://issues.apache.org/jira/browse/HIVE-7178 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.0 
Reporter: Yibing Shi The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); {code} Alias works for just one column: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it
[ https://issues.apache.org/jira/browse/HIVE-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-7178. --- Resolution: Fixed Same as HIVE-6950. Resolving this one as dupe as the other one has comments. Table alias cannot be used in GROUPING SETS clause if there are more than one column in it -- Key: HIVE-7178 URL: https://issues.apache.org/jira/browse/HIVE-7178 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.0 Reporter: Yibing Shi The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); {code} Alias works for just one column: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it
[ https://issues.apache.org/jira/browse/HIVE-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-7178. --- Resolution: Duplicate Table alias cannot be used in GROUPING SETS clause if there are more than one column in it -- Key: HIVE-7178 URL: https://issues.apache.org/jira/browse/HIVE-7178 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.0 Reporter: Yibing Shi The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); {code} Alias works for just one column: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there are more than one column in it
[ https://issues.apache.org/jira/browse/HIVE-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HIVE-7178: --- Table alias cannot be used in GROUPING SETS clause if there are more than one column in it -- Key: HIVE-7178 URL: https://issues.apache.org/jira/browse/HIVE-7178 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.0 Reporter: Yibing Shi The following SQL doesn't work: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a), (alias.b, alias.a) ); FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' line 16:0 extraneous input ')' expecting EOF near 'EOF' {code} The following SQL works (without alias in grouping set): {code} EXPLAIN SELECT a, b, c, COUNT(DISTINCT d) FROM table_name GROUP BY a, b, c GROUPING SETS( (a), (b, a) ); {code} Alias works for just one column: {code} EXPLAIN SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) FROM table_name alias GROUP BY alias.a, alias.b, alias.c GROUPING SETS( (alias.a) ); {code} Using alias in GROUPING SETS could be very useful if multiple tables are involved in the SELECT (via JOIN) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6950) Parsing Error in GROUPING SETS
[ https://issues.apache.org/jira/browse/HIVE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170251#comment-14170251 ] Harsh J commented on HIVE-6950: --- Some more failing examples are present on the HIVE-7178 JIRA that was marked dupe of this. Parsing Error in GROUPING SETS -- Key: HIVE-6950 URL: https://issues.apache.org/jira/browse/HIVE-6950 Project: Hive Issue Type: Bug Reporter: Rohit Agarwal The following query: {code} SELECT tab1.a, tab1.b, SUM(tab1.c) FROM tab1 GROUP BY tab1.a, tab1.b GROUPING SETS ((tab1.a, tab1.b)) {code} results in the following error: {code} ParseException line 7:22 missing ) at ',' near 'EOF' line 7:31 extraneous input ')' expecting EOF near 'EOF' {code} Changing the query to: {code} SELECT tab1.a, tab1.b, SUM(tab1.c) FROM tab1 GROUP BY tab1.a, tab1.b GROUPING SETS ((a, tab1.b)) {code} makes it work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6758) Beeline doesn't work with -e option when started in background
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168765#comment-14168765 ] Harsh J commented on HIVE-6758: --- bq. Please feel free to assign this JIRA to yourself if you'd like to working on this. Unfortunately I'm not certain where the issue lies in Beeline itself that may be triggering this. It could be a JLine bug or in our method of usage of JLine, but thats as far as I can tell at the moment. If Hive CLI uses JLine too, then the difference in how we prompt in Beeline vs. CLI may explain the bug. Beeline doesn't work with -e option when started in background -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Johndee Burks Assignee: Xuefu Zhang In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6758) Beeline doesn't work with -e option when started in background
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168768#comment-14168768 ] Harsh J commented on HIVE-6758: --- Alternatively a flag that disables jline for non-prompted execution may resolve this too. It could be a better way to solve this. Beeline doesn't work with -e option when started in background -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Johndee Burks Assignee: Xuefu Zhang In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request
[ https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095083#comment-14095083 ] Harsh J commented on HIVE-6468: --- It would also help to add a check for a payload length detected as < 0 (i.e. a negative number), aside from an upper message-size cap. It also seems Thrift is doing this incorrectly, so we should rather fix it there and consume the fix, than duplicate its code at our end. I found this JIRA after I'd filed THRIFT-2660. HS2 out of memory error when curl sends a get request - Key: HIVE-6468 URL: https://issues.apache.org/jira/browse/HIVE-6468 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Environment: Centos 6.3, hive 12, hadoop-2.2 Reporter: Abin Shahab Assignee: Navis Attachments: HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt We see an out of memory error when we run simple beeline calls. (The hive.server2.transport.mode is binary) curl localhost:1 Exception in thread pool-2-thread-8 java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181) at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.2#6252)
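A hypothetical standalone sketch of such a length guard (the 100MB cap, class, and method names are assumptions for illustration, not Thrift's or Hive's actual code). The OOM arises because bytes of a stray plain-text request get interpreted as a huge frame length that is then allocated:

```java
public class FrameLengthCheckDemo {

    static final int MAX_MESSAGE_BYTES = 100 * 1024 * 1024; // assumed upper cap

    // Reject negative or absurdly large payload lengths before allocating
    // a buffer of that size.
    static void checkFrameLength(int length) {
        if (length < 0 || length > MAX_MESSAGE_BYTES) {
            throw new IllegalArgumentException("Invalid frame length: " + length);
        }
    }

    public static void main(String[] args) {
        // The ASCII bytes "GET " read as a big-endian int frame length:
        int bogus = ('G' << 24) | ('E' << 16) | ('T' << 8) | ' ';
        try {
            checkFrameLength(bogus);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```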
[jira] [Commented] (HIVE-7075) JsonSerde raises NullPointerException when object key is not lower case
[ https://issues.apache.org/jira/browse/HIVE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006721#comment-14006721 ] Harsh J commented on HIVE-7075: --- Thanks for clarifying, Navis (and for the test), and Yibing! JsonSerde raises NullPointerException when object key is not lower case --- Key: HIVE-7075 URL: https://issues.apache.org/jira/browse/HIVE-7075 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Yibing Shi Attachments: HIVE-7075.1.patch.txt, HIVE-7075.2.patch.txt We have noticed that the JsonSerde produces a NullPointerException if a JSON object has a key value that is not lower case. For example, assume we have the file one.json:
{"empId": 123, "name": "John"}
{"empId": 456, "name": "Jane"}
hive> CREATE TABLE emps (empId INT, name STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> LOAD DATA LOCAL INPATH 'one.json' INTO TABLE emps;
hive> SELECT * FROM emps;
Failed with exception java.io.IOException:java.lang.NullPointerException
Notice, it seems to work if the keys are lower case. Assume we have the file 'two.json':
{"empid": 123, "name": "John"}
{"empid": 456, "name": "Jane"}
hive> DROP TABLE emps;
hive> CREATE TABLE emps (empId INT, name STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> LOAD DATA LOCAL INPATH 'two.json' INTO TABLE emps;
hive> SELECT * FROM emps;
OK
123	John
456	Jane
-- This message was sent by Atlassian JIRA (v6.2#6252)
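One plausible fix direction, consistent with the observation that Hive lower-cases column names internally, is to normalize incoming JSON keys to lower case before matching them against the table schema. Below is a minimal standalone sketch of that normalization — not the actual HIVE-7075 patch, which lives inside the SerDe itself; the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class JsonKeyNormalizer {
    /**
     * Hive lower-cases column names internally, so a SerDe that looks up
     * struct fields by name must normalize incoming JSON object keys the
     * same way, or mixed-case keys like "empId" will never match.
     */
    static Map<String, Object> lowerCaseKeys(Map<String, Object> json) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : json.entrySet()) {
            out.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return out;
    }
}
```

After normalization, a lookup for the schema column `empid` finds the value that arrived under the JSON key `empId`, which is exactly the case that produced the NullPointerException above.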
[jira] [Commented] (HIVE-7075) JsonSerde raises NullPointerException when object key is not lower case
[ https://issues.apache.org/jira/browse/HIVE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004582#comment-14004582 ] Harsh J commented on HIVE-7075: --- Can a test case be added as well, in addition to the fix, so this does not regress in the future? bq. Cannot sure it's right to use lower cased field names. Hive explicitly appears to make them lower case though? JsonSerde raises NullPointerException when object key is not lower case --- Key: HIVE-7075 URL: https://issues.apache.org/jira/browse/HIVE-7075 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Yibing Shi Attachments: HIVE-7075.1.patch.txt We have noticed that the JsonSerde produces a NullPointerException if a JSON object has a key value that is not lower case. For example, assume we have the file one.json:
{"empId": 123, "name": "John"}
{"empId": 456, "name": "Jane"}
hive> CREATE TABLE emps (empId INT, name STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> LOAD DATA LOCAL INPATH 'one.json' INTO TABLE emps;
hive> SELECT * FROM emps;
Failed with exception java.io.IOException:java.lang.NullPointerException
Notice, it seems to work if the keys are lower case. Assume we have the file 'two.json':
{"empid": 123, "name": "John"}
{"empid": 456, "name": "Jane"}
hive> DROP TABLE emps;
hive> CREATE TABLE emps (empId INT, name STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> LOAD DATA LOCAL INPATH 'two.json' INTO TABLE emps;
hive> SELECT * FROM emps;
OK
123	John
456	Jane
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6758) Beeline doesn't work with -e option when started in background
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954967#comment-13954967 ] Harsh J commented on HIVE-6758: --- Here's one way to fix it: https://issuetracker.springsource.com/browse/STS-2552?focusedCommentId=66702page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-66702 Beeline doesn't work with -e option when started in background -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Johndee Burks Assignee: Xuefu Zhang In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6758) Beeline doesn't work with -e option when started in background
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954968#comment-13954968 ] Harsh J commented on HIVE-6758: --- Workaround (tested, works):
{code}
export HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
beeline …
{code}
Beeline doesn't work with -e option when started in background -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Johndee Burks Assignee: Xuefu Zhang In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6758) Beeline doesn't work with -e option when started in background
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13951794#comment-13951794 ] Harsh J commented on HIVE-6758: --- Beeline is running into one of SIGTTOU or SIGTTIN signals from the TTY. Beeline doesn't work with -e option when started in background -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Johndee Burks Assignee: Xuefu Zhang In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6758) Beeline only works in interactive mode
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-6758: -- Environment: (was: CDH4.5) Beeline only works in interactive mode -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0, 0.12.0 Reporter: Johndee Burks In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6758) Beeline only works in interactive mode
[ https://issues.apache.org/jira/browse/HIVE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-6758: -- Affects Version/s: (was: 0.12.0) Beeline only works in interactive mode -- Key: HIVE-6758 URL: https://issues.apache.org/jira/browse/HIVE-6758 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Johndee Burks In hive CLI you could easily integrate its use into a script and back ground the process like this: hive -e some query Beeline does not run when you do the same even with the -f switch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-2539) Enable passing username/password via JDBC
[ https://issues.apache.org/jira/browse/HIVE-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947632#comment-13947632 ] Harsh J commented on HIVE-2539: --- This should no longer be an issue with the new jdbc:hive2 drivers. Enable passing username/password via JDBC - Key: HIVE-2539 URL: https://issues.apache.org/jira/browse/HIVE-2539 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1 Reporter: Sriram Krishnan Assignee: chunqing xie Labels: patch Attachments: HIVE-2539.PATCH Changing the username and/or the password seems to have no effect (also confirmed here: https://cwiki.apache.org/Hive/hivejdbcinterface.html). Connection con = DriverManager.getConnection(jdbc:hive://localhost:1/default, , ); Would be beneficial to pass the username/password via JDBC - and also for the server to honor the username password being passed (may be dependent of that being fixed first). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-3576) Regression: ALTER TABLE DROP IF EXISTS PARTITION throws a SemanticException if Partition is not found
[ https://issues.apache.org/jira/browse/HIVE-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-3576. --- Resolution: Cannot Reproduce Fix Version/s: 0.11.0 Unable to reproduce in 0.11+. Resolving.
{code}
hive> create table str_part_table (a string) partitioned by (dt string);
OK
Time taken: 16.108 seconds
hive> create table int_part_table (a string) partitioned by (dt int);
OK
Time taken: 0.705 seconds
hive> alter table str_part_table drop if exists partition (dt=2007);
OK
Time taken: 1.38 seconds
hive> alter table int_part_table drop if exists partition (dt=2007);
OK
Time taken: 3.494 seconds
hive> alter table str_part_table drop if exists partition (dt='2007');
OK
Time taken: 0.091 seconds
hive> alter table int_part_table drop if exists partition (dt='2007');
OK
Time taken: 0.101 seconds
{code}
Regression: ALTER TABLE DROP IF EXISTS PARTITION throws a SemanticException if Partition is not found - Key: HIVE-3576 URL: https://issues.apache.org/jira/browse/HIVE-3576 Project: Hive Issue Type: Bug Components: Metastore, Query Processor Affects Versions: 0.9.0 Reporter: Harsh J Fix For: 0.11.0 Doing a simple {{ALTER TABLE testtable DROP IF EXISTS PARTITION(dt=NONEXISTENTPARTITION)}} fails with a SemanticException of the 10006 kind (INVALID_PARTITION). This does not respect the {{hive.exec.drop.ignorenonexistent}} condition either, since there are no if-check-wraps around this area, when fetching partitions from the store. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-2615. --- Resolution: Duplicate Fix Version/s: 0.12.0 This has been fixed via HIVE-4172. Resolving as duplicate. CTAS with literal NULL creates VOID type Key: HIVE-2615 URL: https://issues.apache.org/jira/browse/HIVE-2615 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0 Reporter: David Phillips Assignee: Zhuoluo (Clark) Yang Fix For: 0.12.0 Attachments: HIVE-2615.1.patch Create the table with a column that always contains NULL:
{quote}
hive> create table bad as select 1 x, null z from dual;
{quote}
Because there's no type, Hive gives it the VOID type:
{quote}
hive> describe bad;
OK
x	int
z	void
{quote}
This seems weird, because AFAIK, there is no normal way to create a column of type VOID. The problem is that the table can't be queried:
{quote}
hive> select * from bad;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: Internal error: no LazyObject for VOID
{quote}
Worse, even if you don't select that field, the query fails at runtime:
{quote}
hive> select x from bad;
...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
{quote}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-4413) Parse Exception : character '@' not supported while granting privileges to user in a Secure Cluster through hive client.
[ https://issues.apache.org/jira/browse/HIVE-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-4413. --- Resolution: Duplicate HIVE-3807 should resolve this (the specific need of @ in secure clusters) Parse Exception : character '@' not supported while granting privileges to user in a Secure Cluster through hive client. Key: HIVE-4413 URL: https://issues.apache.org/jira/browse/HIVE-4413 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.10.0 Reporter: Navin Madathil Labels: cli, hive While running through the hive CLI, the hive grant command throws a ParseException: character '@' not supported. But in a secure cluster (Kerberos) the username is appended with the realm name, separated by the character '@'. Without giving the full username the permissions are not granted to the intended user. grant all on table tablename to user user@REALM -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-5783: -- Release Note: Added support for 'STORED AS PARQUET' and for setting parquet as the default storage engine. (was: adds stored as parquet and setting parquet as the default storage engine.) Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Labels: Parquet Fix For: 0.13.0 Attachments: HIVE-5783.noprefix.patch, HIVE-5783.noprefix.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf
Harsh J created HIVE-6437: - Summary: DefaultHiveAuthorizationProvider should not initialize a new HiveConf Key: HIVE-6437 URL: https://issues.apache.org/jira/browse/HIVE-6437 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial During a HS2 connection, every SessionState initializes a new DefaultHiveAuthorizationProvider object (on stock configs). In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that may prove too expensive and is unnecessary, since SessionState itself sends in a fully applied HiveConf in the first place. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf
[ https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-6437: -- Assignee: (was: Harsh J) This appears to be done because {{Hive#get(…)}} expects a HiveConf parameter and not a Configuration one. I don't see {{Hive#get(…)}} particularly relying on any HiveConf-specific method, but since the change is deeper than what I envisioned earlier, I'll leave it to the more knowledgeable folks to decide here. DefaultHiveAuthorizationProvider should not initialize a new HiveConf - Key: HIVE-6437 URL: https://issues.apache.org/jira/browse/HIVE-6437 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0 Reporter: Harsh J Priority: Trivial During a HS2 connection, every SessionState initializes a new DefaultHiveAuthorizationProvider object (on stock configs). In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that may prove too expensive and is unnecessary, since SessionState itself sends in a fully applied HiveConf in the first place. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
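The shape of the change being discussed — reuse the fully-applied configuration the session already holds instead of constructing a fresh one via the expensive `new HiveConf(…)` — can be sketched with plain Java. `java.util.Properties` stands in for HiveConf here so the sketch stays self-contained; the class name is illustrative, not Hive's.

```java
import java.util.Properties;

class AuthorizationProviderSketch {
    private final Properties conf;

    /**
     * Accepts the session's already-built configuration rather than
     * rebuilding one per connection, which is the fix direction this
     * report proposes for DefaultHiveAuthorizationProvider.
     */
    AuthorizationProviderSketch(Properties sessionConf) {
        this.conf = sessionConf; // reuse, don't rebuild
    }

    String get(String key) {
        return conf.getProperty(key);
    }
}
```

The constructor-injection pattern also means any overrides the session applied (e.g. per-connection settings) are visible to the provider, which a freshly constructed config would silently drop.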
[jira] [Updated] (HIVE-3472) Build An Analytical SQL Engine for MapReduce
[ https://issues.apache.org/jira/browse/HIVE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3472: -- Affects Version/s: (was: 0.13.0) (was: 0.12.0) (was: 0.11.0) Build An Analytical SQL Engine for MapReduce Key: HIVE-3472 URL: https://issues.apache.org/jira/browse/HIVE-3472 Project: Hive Issue Type: New Feature Affects Versions: 0.10.0 Reporter: Shane Huang Attachments: SQL-design.pdf While there are continuous efforts in extending Hive’s SQL support (e.g., see some recent examples such as HIVE-2005 and HIVE-2810), many widely used SQL constructs are still not supported in HiveQL, such as selecting from multiple tables, subquery in WHERE clauses, etc. We propose to build a SQL-92 full compatible engine (for MapReduce based analytical query processing) as an extension to Hive. The SQL frontend will co-exist with the HiveQL frontend; consequently, one can mix SQL and HiveQL statements in their queries (switching between HiveQL mode and SQL-92 mode using a “hive.ql.mode” parameter before each query statement). This way useful Hive extensions are still accessible to users. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5524) Unwanted delay in getting Hive metastore connection with METASTORE_CLIENT_CONNECT_RETRY_DELAY/
[ https://issues.apache.org/jira/browse/HIVE-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-5524: -- Affects Version/s: (was: 0.11.0) Status: Patch Available (was: Open) Marking as Patch Available for review. Unwanted delay in getting Hive metastore connection with METASTORE_CLIENT_CONNECT_RETRY_DELAY/ -- Key: HIVE-5524 URL: https://issues.apache.org/jira/browse/HIVE-5524 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Rajesh Balamohan Attachments: HIVE-5524.patch Reference: http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
for (URI store : metastoreUris) {
  ...
  if (isConnected) {
    break;
  }
}
// Wait before launching the next round of connection retries.
if (retryDelaySeconds > 0) {
  try {
    LOG.info("Waiting " + retryDelaySeconds + " seconds before next connection attempt.");
    Thread.sleep(retryDelaySeconds * 1000);
  } catch (InterruptedException ignore) {}
}
By default hive.metastore.client.connect.retry.delay is set to 1 second. If it is set to 10 seconds, this code will wait for 10 seconds even if a successful connection is made in the first attempt itself. This can be avoided by changing the condition to: if (!isConnected && retryDelaySeconds > 0) { -- This message was sent by Atlassian JIRA (v6.1.4#6159)
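The proposed guard is easy to isolate. The sketch below models only the delay decision — names are illustrative, not the real HiveMetaStoreClient fields — and shows that the delay is skipped once a connection has already succeeded:

```java
class RetryDelayDemo {
    /**
     * Returns how long the client should sleep before the next round of
     * connection retries, per the guard proposed in this report:
     * no delay once connected, and no delay when none is configured.
     */
    static long delayMillis(boolean isConnected, long retryDelaySeconds) {
        if (!isConnected && retryDelaySeconds > 0) {
            return retryDelaySeconds * 1000;
        }
        return 0; // connected (or no delay configured): proceed immediately
    }
}
```

With the original `if (retryDelaySeconds > 0)` guard, a first-attempt success would still incur the full configured sleep; the added `!isConnected` check removes that unconditional wait.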
[jira] [Commented] (HIVE-5747) Hcat alter table add parttition: add skip header/row feature
[ https://issues.apache.org/jira/browse/HIVE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813889#comment-13813889 ] Harsh J commented on HIVE-5747: --- Moved from HADOOP to HIVE as request is unrelated to HADOOP. Please file issues under the right project. Hcat alter table add parttition: add skip header/row feature Key: HIVE-5747 URL: https://issues.apache.org/jira/browse/HIVE-5747 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.10.0 Reporter: Rekha Joshi Priority: Minor Creating hcatalog table using creating tables and alter table add partition is most used approach.However at times the incoming files can come with header row/column names. In such cases it would be good feature to be able skip header/rows. Suggestions below: hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -skip header hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -skip [n] hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -DskipRow=1 -- can choose with bounded array (rows) for selecting rows for table hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -rows[2:] // from first row till all hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -rows[2:100] // from first row till 100 rows Correct place for this feature in hive or hcat?or with -D can be handled in hcat? Thanks Rekha -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Moved] (HIVE-5747) Hcat alter table add parttition: add skip header/row feature
[ https://issues.apache.org/jira/browse/HIVE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J moved HADOOP-10084 to HIVE-5747: Component/s: (was: conf) HCatalog Affects Version/s: (was: 0.5.0) 0.10.0 Key: HIVE-5747 (was: HADOOP-10084) Project: Hive (was: Hadoop Common) Hcat alter table add parttition: add skip header/row feature Key: HIVE-5747 URL: https://issues.apache.org/jira/browse/HIVE-5747 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.10.0 Reporter: Rekha Joshi Priority: Minor Creating hcatalog table using creating tables and alter table add partition is most used approach.However at times the incoming files can come with header row/column names. In such cases it would be good feature to be able skip header/rows. Suggestions below: hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -skip header hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -skip [n] hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -DskipRow=1 -- can choose with bounded array (rows) for selecting rows for table hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -rows[2:] // from first row till all hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -rows[2:100] // from first row till 100 rows Correct place for this feature in hive or hcat?or with -D can be handled in hcat? Thanks Rekha -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5747) Hcat alter table add parttition: add skip header/row feature
[ https://issues.apache.org/jira/browse/HIVE-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813890#comment-13813890 ] Harsh J commented on HIVE-5747: --- P.s. Doesn't the {{alter table add partition}} clause just alter metadata? Adding a skip option to that may not make sense. Perhaps you mean to add it generally to a {{load data into table}} or a {{insert into}} clause? Hcat alter table add parttition: add skip header/row feature Key: HIVE-5747 URL: https://issues.apache.org/jira/browse/HIVE-5747 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 0.10.0 Reporter: Rekha Joshi Priority: Minor Creating hcatalog table using creating tables and alter table add partition is most used approach.However at times the incoming files can come with header row/column names. In such cases it would be good feature to be able skip header/rows. Suggestions below: hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -skip header hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -skip [n] hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -DskipRow=1 -- can choose with bounded array (rows) for selecting rows for table hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -rows[2:] // from first row till all hcat alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data' -rows[2:100] // from first row till 100 rows Correct place for this feature in hive or hcat?or with -D can be handled in hcat? Thanks Rekha -- This message was sent by Atlassian JIRA (v6.1#6144)
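For reference, the requested behaviour amounts to dropping the first n lines of an input file before rows reach the table. A later Hive release appears to have added a `skip.header.line.count` table property for roughly this purpose; the method below is only a standalone illustration of the idea, not Hive code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class HeaderSkipper {
    /**
     * Reads all lines from the input, discarding the first {@code skip}
     * lines (e.g. a header row of column names) before collecting rows.
     */
    static List<String> readSkippingHeader(BufferedReader in, int skip) throws IOException {
        List<String> rows = new ArrayList<>();
        String line;
        int seen = 0;
        while ((line = in.readLine()) != null) {
            if (seen++ < skip) {
                continue; // discard header rows
            }
            rows.add(line);
        }
        return rows;
    }
}
```

Note that this is a read-side transformation, which supports the point made in the comment above: a pure metadata operation like `alter table add partition` has no natural place to apply it.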
[jira] [Commented] (HIVE-5454) HCatalog runs a partition listing with an empty filter
[ https://issues.apache.org/jira/browse/HIVE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787535#comment-13787535 ] Harsh J commented on HIVE-5454: --- The removal of deprecated methods seems to have caught several usages of it within HCatalog. I'll re-up a new patch that updates all old refs. HCatalog runs a partition listing with an empty filter -- Key: HIVE-5454 URL: https://issues.apache.org/jira/browse/HIVE-5454 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Harsh J Attachments: D13317.1.patch This is a HCATALOG-527 caused regression, wherein the HCatLoader's way of calling HCatInputFormat causes it to do 2x partition lookups - once without the filter, and then again with the filter. For tables with large number partitions (10, say), the non-filter lookup proves fatal both to the client (Read timed out errors from ThriftMetaStoreClient cause the server doesn't respond) and to the server (too much data loaded into the cache, OOME, or slowdown). The fix would be to use a single call that also passes a partition filter information, as was in the case of HCatalog 0.4 sources before HCATALOG-527. (HCatalog-release-wise, this affects all 0.5.x users) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5454) HCatalog runs a partition listing with an empty filter
[ https://issues.apache.org/jira/browse/HIVE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787751#comment-13787751 ] Harsh J commented on HIVE-5454: --- Looked into the single test failure and the test seems unrelated to my changes. Oddly https://builds.apache.org/job/PreCommit-HIVE-Build/1061/testReport/org.apache.hive.hcatalog.listener/TestNotificationListener/ reports the test passed. HCatalog runs a partition listing with an empty filter -- Key: HIVE-5454 URL: https://issues.apache.org/jira/browse/HIVE-5454 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Harsh J Attachments: D13317.1.patch, D13317.2.patch, D13317.3.patch This is a HCATALOG-527 caused regression, wherein the HCatLoader's way of calling HCatInputFormat causes it to do 2x partition lookups - once without the filter, and then again with the filter. For tables with large number partitions (10, say), the non-filter lookup proves fatal both to the client (Read timed out errors from ThriftMetaStoreClient cause the server doesn't respond) and to the server (too much data loaded into the cache, OOME, or slowdown). The fix would be to use a single call that also passes a partition filter information, as was in the case of HCatalog 0.4 sources before HCATALOG-527. (HCatalog-release-wise, this affects all 0.5.x users) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5454) HCatalog runs a partition listing with an empty filter
Harsh J created HIVE-5454: - Summary: HCatalog runs a partition listing with an empty filter Key: HIVE-5454 URL: https://issues.apache.org/jira/browse/HIVE-5454 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Harsh J This is a HCATALOG-527 caused regression, wherein the HCatLoader's way of calling HCatInputFormat causes it to do 2x partition lookups - once without the filter, and then again with the filter. For tables with large number partitions (10, say), the non-filter lookup proves fatal both to the client (Read timed out errors from ThriftMetaStoreClient cause the server doesn't respond) and to the server (too much data loaded into the cache, OOME, or slowdown). The fix would be to use a single call that also passes a partition filter information, as was in the case of HCatalog 0.4 sources before HCATALOG-527. (HCatalog-release-wise, this affects all 0.5.x users) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5454) HCatalog runs a partition listing with an empty filter
[ https://issues.apache.org/jira/browse/HIVE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787243#comment-13787243 ] Harsh J commented on HIVE-5454: --- I'll submit a patch for this shortly. HCatalog runs a partition listing with an empty filter -- Key: HIVE-5454 URL: https://issues.apache.org/jira/browse/HIVE-5454 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Harsh J This is a HCATALOG-527 caused regression, wherein the HCatLoader's way of calling HCatInputFormat causes it to do 2x partition lookups - once without the filter, and then again with the filter. For tables with large number partitions (10, say), the non-filter lookup proves fatal both to the client (Read timed out errors from ThriftMetaStoreClient cause the server doesn't respond) and to the server (too much data loaded into the cache, OOME, or slowdown). The fix would be to use a single call that also passes a partition filter information, as was in the case of HCatalog 0.4 sources before HCATALOG-527. (HCatalog-release-wise, this affects all 0.5.x users) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5454) HCatalog runs a partition listing with an empty filter
[ https://issues.apache.org/jira/browse/HIVE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-5454: -- Release Note: Deprecated the HCatInputFormat#setFilter(…) chain API call in favor of a new, filter-passing, HCatInputFormat#setInput(…) method. Status: Patch Available (was: Open) Review opened at https://reviews.facebook.net/D13317 HCatalog runs a partition listing with an empty filter -- Key: HIVE-5454 URL: https://issues.apache.org/jira/browse/HIVE-5454 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Harsh J Attachments: D13317.1.patch This is a HCATALOG-527 caused regression, wherein the HCatLoader's way of calling HCatInputFormat causes it to do 2x partition lookups - once without the filter, and then again with the filter. For tables with large number partitions (10, say), the non-filter lookup proves fatal both to the client (Read timed out errors from ThriftMetaStoreClient cause the server doesn't respond) and to the server (too much data loaded into the cache, OOME, or slowdown). The fix would be to use a single call that also passes a partition filter information, as was in the case of HCatalog 0.4 sources before HCATALOG-527. (HCatalog-release-wise, this affects all 0.5.x users) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-1083) allow sub-directories for an external table/partition
[ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772565#comment-13772565 ] Harsh J commented on HIVE-1083: --- Hi, Can you confirm you used MR2, and that the config toggle specified on MAPREDUCE-1501 was turned on? allow sub-directories for an external table/partition - Key: HIVE-1083 URL: https://issues.apache.org/jira/browse/HIVE-1083 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Namit Jain Assignee: Zheng Shao Labels: inputformat Sometimes users want to define an external table/partition based on all files (recursively) inside a directory. Currently most of the Hadoop InputFormat classes do not support that. We should extract all files recursively in the directory, and add them to the input path of the job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
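The feature being requested here — expanding a partition directory to every file underneath it, recursively, before the job's input paths are set — is straightforward to sketch. `java.nio` stands in for Hadoop's FileSystem API so the sketch stays self-contained; the real toggle lives in the MR input layer via the MAPREDUCE-1501 config referenced above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class RecursiveInputLister {
    /**
     * Walks the external table/partition directory and returns every
     * regular file under it, at any depth — the expansion that flat
     * InputFormats historically did not perform on their own.
     */
    static List<Path> listFilesRecursively(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(Files::isRegularFile).forEach(files::add);
        }
        return files;
    }
}
```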
[jira] [Commented] (HIVE-5208) Provide an easier way to capture DEBUG logging
[ https://issues.apache.org/jira/browse/HIVE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759766#comment-13759766 ] Harsh J commented on HIVE-5208: --- Certainly, but that's not as intuitive as a flag. Provide an easier way to capture DEBUG logging -- Key: HIVE-5208 URL: https://issues.apache.org/jira/browse/HIVE-5208 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Harsh J Priority: Minor Capturing debug logging for troubleshooting is painful in Hive today: 1. It doesn't log anywhere by default. 2. We need to add a long -hiveconf hive.root.logger=DEBUG,console to the Hive CLI just to enable the debug flag, or set an equivalent env-var appropriately. I suggest we make this simpler via either one of the below: 1. Provide a wrapped binary, hive-debug, so folks can simply run the hive-debug command to re-run their query and capture the output. This could also write to a pre-designated $PWD file. 2. Provide a simpler switch, such as -verbose, that automatically toggles the flag instead, much like what Beeline does today already. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5208) Provide an easier way to capture DEBUG logging
Harsh J created HIVE-5208: - Summary: Provide an easier way to capture DEBUG logging Key: HIVE-5208 URL: https://issues.apache.org/jira/browse/HIVE-5208 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.11.0 Reporter: Harsh J Priority: Minor Capturing debug logging for troubleshooting is painful in Hive today: 1. It doesn't log anywhere by default. 2. We need to add a long -hiveconf hive.root.logger=DEBUG,console to the Hive CLI just to enable the debug flag, or set an equivalent env-var appropriately. I suggest we make this simpler via either one of the below: 1. Provide a wrapped binary, hive-debug, so folks can simply run the hive-debug command to re-run their query and capture the output. This could also write to a pre-designated $PWD file. 2. Provide a simpler switch, such as -verbose, that automatically toggles the flag instead, much like what Beeline does today already. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4338) Create Table fails after upgrade from 0.9 to 0.10
[ https://issues.apache.org/jira/browse/HIVE-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721498#comment-13721498 ] Harsh J commented on HIVE-4338: --- I don't think this is a bug. The error will pop up if you either don't upgrade your schema, or attempt to use a 0.10 client against a 0.9 db schema. I think we should just mark this closed. Create Table fails after upgrade from 0.9 to 0.10 - Key: HIVE-4338 URL: https://issues.apache.org/jira/browse/HIVE-4338 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Environment: Ubuntu 3.2.0-23-generic #36-Ubuntu on AMD Reporter: Geula Vainappel I ran apt-get upgrade on a relatively old cdh installation. Many things were upgraded, among them hadoop, hdfs and hive (from 0.9 to 0.10). After the upgrade, CREATE TABLE started failing. I rebooted the machine, and it is still not working. The error I am receiving is: hive> create table ttt(line string); FAILED: Error in metadata: MetaException(message:javax.jdo.JDODataStoreException: Error(s) were found while auto-creating/validating the datastore for classes. The errors are printed in the log, and are attached to this exception. NestedThrowables: java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 'IS_STOREDASSUBDIRECTORIES' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4571) Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe
Harsh J created HIVE-4571: - Summary: Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe Key: HIVE-4571 URL: https://issues.apache.org/jira/browse/HIVE-4571 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Harsh J Priority: Minor HIVE-337 added support for complex data structures, and also oddly added a limit on the number of separator characters used to make that happen. When using an Avro-based table that has more than 8-10 levels of nesting in records, this limit gets hit and such tables can't be queried. We either need to remove such a limit or raise it to a high-enough value to support such nested data structures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4571) Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe
[ https://issues.apache.org/jira/browse/HIVE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659431#comment-13659431 ] Harsh J commented on HIVE-4571: --- A sample change would be:
{code}
diff --git a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
index 0036a8e..252ea6b 100644
--- a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
+++ b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java
@@ -211,7 +211,7 @@ public class LazySimpleSerDe implements SerDe {
     // Read the separators: We use 8 levels of separators by default, but we
     // should change this when we allow users to specify more than 10 levels
     // of separators through DDL.
-    serdeParams.separators = new byte[8];
+    serdeParams.separators = new byte[32];
     serdeParams.separators[0] = getByte(tbl.getProperty(Constants.FIELD_DELIM,
         tbl.getProperty(Constants.SERIALIZATION_FORMAT)), DefaultSeparators[0]);
     serdeParams.separators[1] = getByte(tbl
{code}
Reinvestigate HIVE-337 induced limit on number of separator characters in LazySerDe --- Key: HIVE-4571 URL: https://issues.apache.org/jira/browse/HIVE-4571 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Harsh J Priority: Minor HIVE-337 added support for complex data structures, and also oddly added a limit on the number of separator characters used to make that happen. When using an Avro-based table that has more than 8-10 levels of nesting in records, this limit gets hit and such tables can't be queried. We either need to remove such a limit or raise it to a high-enough value to support such nested data structures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1083) allow sub-directories for an external table/partition
[ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558674#comment-13558674 ] Harsh J commented on HIVE-1083: --- Given that MAPREDUCE-1501 is in MR2 today, and Hive can make use of it, should we close this out now? allow sub-directories for an external table/partition - Key: HIVE-1083 URL: https://issues.apache.org/jira/browse/HIVE-1083 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Namit Jain Assignee: Zheng Shao Labels: inputformat Sometimes users want to define an external table/partition based on all files (recursively) inside a directory. Currently most of the Hadoop InputFormat classes do not support that. We should extract all files recursively in the directory, and add them to the input path of the job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3449) Speed up ant builds with the ant uptodate task
[ https://issues.apache.org/jira/browse/HIVE-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530809#comment-13530809 ] Harsh J commented on HIVE-3449: --- Thanks Carl! Speed up ant builds with the ant uptodate task -- Key: HIVE-3449 URL: https://issues.apache.org/jira/browse/HIVE-3449 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Given that the hive build is an enormously long build (~6 hrs), it might be very helpful if there were some checkpointing capabilities available to resume a build from the failed point and not have to restart everything on a single test failure. One possible way to do this would be to use the ant uptodate task to check that a set of target files is more up-to-date than a given set of source files, and execute a target only if that is true. By default this capability could be off, but it can be activated with the -Dresume=true argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13510258#comment-13510258 ] Harsh J commented on HIVE-3745: --- My colleague, [~esteban], pointed out that the SQL92 standard (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt, 8.2 comparison predicate) corroborates this thought, and that Hive does indeed have a bug. Hive does improper = based string comparisons for strings with trailing whitespaces - Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Compared to other systems such as DB2, MySQL, etc., which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is still whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should consider them, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
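[Editor's note] The SQL92 PAD SPACE behaviour discussed above can be sketched in a few lines: both operands are compared as if right-padded to equal length, which is equivalent to ignoring trailing spaces. This is an illustrative sketch only, not Hive code; the class and method names are made up:

```java
// Sketch of SQL92 "PAD SPACE" comparison semantics: trailing spaces
// are disregarded when comparing two character strings with '='.
public class PadSpaceComparison {
    // Strip trailing spaces only; leading whitespace stays significant.
    private static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    // '=' under PAD SPACE: "abc " and "abc" compare equal.
    public static boolean padSpaceEquals(String a, String b) {
        return stripTrailingSpaces(a).equals(stripTrailingSpaces(b));
    }

    public static void main(String[] args) {
        System.out.println(padSpaceEquals("abc ", "abc"));  // true: trailing space ignored
        System.out.println(padSpaceEquals(" abc", "abc"));  // false: leading space still counts
    }
}
```

Note that {{LIKE}} would bypass this path and keep trailing spaces significant, which matches the distinction drawn in the issue description.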
[jira] [Commented] (HIVE-2250) DESCRIBE EXTENDED table_name shows inconsistent compression information.
[ https://issues.apache.org/jira/browse/HIVE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507271#comment-13507271 ] Harsh J commented on HIVE-2250: --- If we don't really make use of the IS_COMPRESSED attribute of a table, should we just get rid of it (or at least not print it in the {{describe extended/formatted}} output, where it causes great confusion as it is invariably {{No}})? DESCRIBE EXTENDED table_name shows inconsistent compression information. -- Key: HIVE-2250 URL: https://issues.apache.org/jira/browse/HIVE-2250 Project: Hive Issue Type: Bug Components: CLI, Diagnosability Affects Versions: 0.7.0 Environment: RHEL, Full Cloudera stack Reporter: Travis Powell Assignee: subramanian raghunathan Priority: Critical Attachments: HIVE-2250.patch Commands executed in this order:
user@node # hive
hive> SET hive.exec.compress.output=true;
hive> SET io.seqfile.compression.type=BLOCK;
hive> CREATE TABLE table_name ( [...] ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS SEQUENCEFILE;
hive> CREATE TABLE staging_table ( [...] ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH 'file:///root/input/' OVERWRITE INTO TABLE staging_table;
hive> INSERT OVERWRITE TABLE table_name SELECT * FROM staging_table;
(Map reduce job to change to sequence file...)
hive DESCRIBE EXTENDED table_name; Detailed Table Information Table(tableName:table_name, dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, type:string, comment:null), FieldSchema(name:remote_address, type:string, comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), FieldSchema(name:canister_session_id, type:bigint, comment:null), FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, type:string, comment:null), FieldSchema(name:tltvid, type:string, comment:null), FieldSchema(name:canister_server, type:string, comment:null), FieldSchema(name:session_timestamp, type:string, comment:null), FieldSchema(name:session_duration, type:string, comment:null), FieldSchema(name:hit_count, type:bigint, comment:null), FieldSchema(name:http_user_agent, type:string, comment:null), FieldSchema(name:extractid, type:bigint, comment:null), FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim= *** SEE ABOVE: Compression is set to FALSE, even though contents of table is compressed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504479#comment-13504479 ] Harsh J commented on HIVE-2266: --- bq. Hadoop loads native compression libraries. I believe that they are platform dependent hence I do not assume that they always have same compression ratio. Please correct me if I am wrong here. Compression is based on standard algorithms, which are platform independent. The native code is platform-dependent because of the library references it has. Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266-2.patch, HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3745) Hive does improper = based string comparisons with trailing whitespaces
Harsh J created HIVE-3745: - Summary: Hive does improper = based string comparisons with trailing whitespaces Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Compared to other systems such as DB2, MySQL, etc., which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is still whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should consider them, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3745: -- Summary: Hive does improper = based string comparisons for strings with trailing whitespaces (was: Hive does improper = based string comparisons with trailing whitespaces) Hive does improper = based string comparisons for strings with trailing whitespaces - Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Compared to other systems such as DB2, MySQL, etc., which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is still whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should consider them, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3653) Failure in a counter poller run should not be considered as a job failure
Harsh J created HIVE-3653: - Summary: Failure in a counter poller run should not be considered as a job failure Key: HIVE-3653 URL: https://issues.apache.org/jira/browse/HIVE-3653 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.7.1 Reporter: Harsh J A client had a simple transient failure in polling the JT for job status (which it does every HIVECOUNTERSPULLINTERVAL for each currently running job):
{code}
java.io.IOException: Call to HOST/IP:PORT failed on local exception: java.io.IOException: Connection reset by peer
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
    at org.apache.hadoop.ipc.Client.call(Client.java:1110)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1053)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1065)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:351)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:686)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:131)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:310)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:317)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:490)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{code}
This led to Hive thinking the running job itself had failed, and it failed the query run, although the running job progressed to completion in the background. We should not let transient IOExceptions in counter polling cause query termination; we should instead just retry. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
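[Editor's note] The retry behaviour proposed above could be wrapped around the poll call roughly as follows. This is a hypothetical sketch, not Hive's actual ExecDriver code; the {{Poll}} interface, method names, and attempt count are made up for illustration:

```java
import java.io.IOException;

// Hypothetical sketch: retry a job-status poll a few times before
// treating the failure as fatal, so that a single reset connection
// does not kill an otherwise healthy query.
public class RetryingPoller {
    public interface Poll<T> {
        T run() throws IOException;
    }

    public static <T> T pollWithRetries(Poll<T> poll, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return poll.run();
            } catch (IOException e) {
                last = e; // assume transient; try again
            }
        }
        throw last; // give up only after exhausting all attempts
    }

    public static void main(String[] args) throws IOException {
        // Simulated poll that fails twice with a reset, then succeeds.
        final int[] calls = {0};
        String status = pollWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection reset by peer");
            }
            return "RUNNING";
        }, 5);
        System.out.println(status + " after " + calls[0] + " attempts");
    }
}
```

A bounded retry with a small sleep between attempts would likely be preferable in practice, so a genuinely dead JT still fails the query promptly.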
[jira] [Commented] (HIVE-3636) Catch the NPE when using ^D to exit from CLI
[ https://issues.apache.org/jira/browse/HIVE-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13485894#comment-13485894 ] Harsh J commented on HIVE-3636: --- This is no longer a problem on Hive trunk/recent releases. It was resolved (in a different manner) quite a while ago after a CLI refactor, I think. Catch the NPE when using ^D to exit from CLI Key: HIVE-3636 URL: https://issues.apache.org/jira/browse/HIVE-3636 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.9.0 Reporter: Alexander Alten-Lorenz Assignee: Alexander Alten-Lorenz Fix For: 0.10.0 Attachments: HIVE-3636.patch The exit patch is just a quick hack to catch the NPE in order to allow ^D to exit hive without a stacktrace. Originally created by Frank Fejes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3576) Regression: ALTER TABLE DROP IF EXISTS PARTITION throws a SemanticException if Partition is not found
[ https://issues.apache.org/jira/browse/HIVE-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13484004#comment-13484004 ] Harsh J commented on HIVE-3576: --- This also only happens on tables partitioned by a string type key, not int types, for example:
{code}
hive> create table str_part_table (a string) partitioned by (dt string);
OK
Time taken: 0.054 seconds
hive> create table int_part_table (a string) partitioned by (dt int);
OK
Time taken: 0.031 seconds
hive> alter table str_part_table drop if exists partition (dt=2007);
FAILED: SemanticException [Error 10006]: Partition not found dt = 2007
hive> alter table int_part_table drop if exists partition (dt=2007);
OK
Time taken: 0.091 seconds
hive> alter table str_part_table drop if exists partition (dt='2007');
OK
Time taken: 0.06 seconds
hive> alter table int_part_table drop if exists partition (dt='2007');
OK
Time taken: 0.065 seconds
{code}
Regression: ALTER TABLE DROP IF EXISTS PARTITION throws a SemanticException if Partition is not found - Key: HIVE-3576 URL: https://issues.apache.org/jira/browse/HIVE-3576 Project: Hive Issue Type: Bug Components: Metastore, Query Processor Affects Versions: 0.9.0 Reporter: Harsh J Doing a simple {{ALTER TABLE testtable DROP IF EXISTS PARTITION(dt=NONEXISTENTPARTITION)}} fails with a SemanticException of the 10006 kind (INVALID_PARTITION). This does not respect the {{hive.exec.drop.ignorenonexistent}} condition either, since there are no if-check-wraps around this area, when fetching partitions from the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3619) Hive JDBC driver should return a proper update-count of rows affected by query
Harsh J created HIVE-3619: - Summary: Hive JDBC driver should return a proper update-count of rows affected by query Key: HIVE-3619 URL: https://issues.apache.org/jira/browse/HIVE-3619 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Harsh J Priority: Minor HiveStatement.java currently has an explicit 0 return:
{code}
public int getUpdateCount() throws SQLException {
  return 0;
}
{code}
Ideally we ought to emit the exact number of rows affected by the query statement itself. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3595) Hive should adapt new FsShell commands since Hadoop 2 has changed FsShell argument structures
Harsh J created HIVE-3595: - Summary: Hive should adapt new FsShell commands since Hadoop 2 has changed FsShell argument structures Key: HIVE-3595 URL: https://issues.apache.org/jira/browse/HIVE-3595 Project: Hive Issue Type: Improvement Components: Shims Affects Versions: 0.9.0 Reporter: Harsh J Priority: Minor A simple example is that hive calls -rmr in the FsShell class, which in Hadoop 2 is rm -r. This helps avoid printing an unnecessary Deprecated warning in Hive when the Hadoop23 (or hadoop-2) shim is in use. We should wrap the logic and call the right commands of hadoop-2 to avoid this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3595) Hive should adapt new FsShell commands since Hadoop 2 has changed FsShell argument structures
[ https://issues.apache.org/jira/browse/HIVE-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478842#comment-13478842 ] Harsh J commented on HIVE-3595: --- A quick scan suggests we just need to fix the one instance of rmr in use at Hive#replaceFiles:
{code}
// point of no return -- delete oldPath
if (oldPath != null) {
  try {
    FileSystem fs2 = oldPath.getFileSystem(conf);
    if (fs2.exists(oldPath)) {
      // use FsShell to move data to .Trash first rather than delete permanently
      FsShell fshell = new FsShell();
      fshell.setConf(conf);
      fshell.run(new String[]{"-rmr", oldPath.toString()});
    }
  } catch (Exception e) {
    // swallow the exception
    LOG.warn("Directory " + oldPath.toString() + " canot be removed.");
  }
}
{code}
If we can wrap that call so that -rmr is used only for pre-0.23 Hadoop versions, this can be closed. For 0.23 and higher the logic ought to use -rm -r. Hive should adapt new FsShell commands since Hadoop 2 has changed FsShell argument structures - Key: HIVE-3595 URL: https://issues.apache.org/jira/browse/HIVE-3595 Project: Hive Issue Type: Improvement Components: Shims Affects Versions: 0.9.0 Reporter: Harsh J Priority: Minor A simple example is that hive calls -rmr in the FsShell class, which in Hadoop 2 is rm -r. This helps avoid printing an unnecessary Deprecated warning in Hive when the Hadoop23 (or hadoop-2) shim is in use. We should wrap the logic and call the right commands of hadoop-2 to avoid this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
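[Editor's note] The version switch proposed above could look roughly like this. It is a hypothetical sketch, not the actual Hive shim API; the class name, helper name, and version-string check are made up for illustration:

```java
// Hypothetical sketch: pick the delete-to-trash FsShell arguments based
// on the Hadoop version string, so Hadoop 0.23/2 users do not see the
// "-rmr is deprecated" warning while older versions keep working.
public class FsShellArgs {
    // "0.20.x" and "1.x" predate the new shell; "0.23.x", "2.x" and
    // later use "-rm -r" instead of the deprecated "-rmr".
    public static String[] removeRecursivelyArgs(String hadoopVersion, String path) {
        boolean modernShell = !hadoopVersion.startsWith("0.20")
            && !hadoopVersion.startsWith("1.");
        return modernShell
            ? new String[] {"-rm", "-r", path}
            : new String[] {"-rmr", path};
    }

    public static void main(String[] args) {
        // prints "-rm -r /tmp/old" then "-rmr /tmp/old"
        System.out.println(String.join(" ", removeRecursivelyArgs("2.0.0", "/tmp/old")));
        System.out.println(String.join(" ", removeRecursivelyArgs("1.2.1", "/tmp/old")));
    }
}
```

In real code this decision would more naturally live in the shim layer (Hadoop20Shims vs Hadoop23Shims) than in a version-string comparison.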
[jira] [Created] (HIVE-3576) Regression: ALTER TABLE DROP IF EXISTS PARTITION throws a SemanticException if Partition is not found
Harsh J created HIVE-3576: - Summary: Regression: ALTER TABLE DROP IF EXISTS PARTITION throws a SemanticException if Partition is not found Key: HIVE-3576 URL: https://issues.apache.org/jira/browse/HIVE-3576 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0 Reporter: Harsh J Doing a simple {{ALTER TABLE testtable DROP IF EXISTS PARTITION(dt=NONEXISTENTPARTITION)}} fails with a SemanticException of the 10006 kind (INVALID_PARTITION). This does not respect the {{hive.exec.drop.ignorenonexistent}} condition either, since there are no if-check-wraps around this area, when fetching partitions from the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3543) A hive-builtins snapshot is required on the classpath of generated eclipse files
[ https://issues.apache.org/jira/browse/HIVE-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HIVE-3543. --- Resolution: Invalid Doing this causes HIVE-2673. Resolving. Sorry for noise. A hive-builtins snapshot is required on the classpath of generated eclipse files Key: HIVE-3543 URL: https://issues.apache.org/jira/browse/HIVE-3543 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Harsh J We shouldn't rely on the presence of a jar of our own project pre-existing when generating the eclipse files, like so:
{code}
<classpathentry kind="lib" path="build/builtins/hive-builtins-0.10.0-SNAPSHOT.jar"/>
{code}
Does the src on the classpath for builtins not suffice instead? This one's presence makes one have to run {{ant jar eclipse-files}} instead of the simple {{ant compile eclipse-files}}. If not source paths, let's at least consider adding a classes/ directory instead of expecting a jar. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3543) A hive-builtins snapshot is required on the classpath of generated eclipse files
[ https://issues.apache.org/jira/browse/HIVE-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470963#comment-13470963 ] Harsh J commented on HIVE-3543: --- Removing the entry still lets me compile the whole project in the IDE without a compilation error. I think we ought to remove it unless it serves any purpose over the src/ classpath entry for the same thing. A hive-builtins snapshot is required on the classpath of generated eclipse files Key: HIVE-3543 URL: https://issues.apache.org/jira/browse/HIVE-3543 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Harsh J We shouldn't rely on the presence of a jar of our own project pre-existing when generating the eclipse files, like so:
{code}
<classpathentry kind="lib" path="build/builtins/hive-builtins-0.10.0-SNAPSHOT.jar"/>
{code}
Does the src on the classpath for builtins not suffice instead? This one's presence makes one have to run {{ant jar eclipse-files}} instead of the simple {{ant compile eclipse-files}}. If not source paths, let's at least consider adding a classes/ directory instead of expecting a jar. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3449) Speed up ant builds with the ant uptodate task
[ https://issues.apache.org/jira/browse/HIVE-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471088#comment-13471088 ] Harsh J commented on HIVE-3449: --- Here's a quick speed hack I did: Delete the {{checkmodified="${ivy.checkmodified}"}} lines in the ivy/settings.xml file. This made Hive stop trying to resolve everything under the sun every single time, dropping the {{clean package}} target's build time from over 10 minutes to 1 minute. Speed up ant builds with the ant uptodate task -- Key: HIVE-3449 URL: https://issues.apache.org/jira/browse/HIVE-3449 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Given that the hive build is an enormously long build (~6 hrs), it might be very helpful if there were some checkpointing capabilities available, to be able to resume a build from the failed point and not have to restart everything on a single test failure. One possible way to do this would be to use the ant uptodate task to check whether a set of target files is more up-to-date than a given set of source files, and execute a target only if that is true. By default this capability could be off, but it could be activated with the -Dresume=true argument. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
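For reference, Ivy resolvers carry the attribute in question roughly as follows; the snippet below is an illustrative sketch of the setting's shape (the resolver name and the `root` property are assumptions, not copied from Hive's actual ivy settings):

```xml
<!-- Illustrative ivysettings resolver: with checkmodified="true", Ivy
     re-checks every already-resolved dependency for upstream changes on
     each build. Removing the attribute (it defaults to false) skips
     those checks, which is where the speed-up comes from. -->
<resolvers>
  <ibiblio name="maven2"
           root="${repo.maven.org}"
           m2compatible="true"
           checkmodified="${ivy.checkmodified}"/>
</resolvers>
```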
[jira] [Created] (HIVE-3543) A hive-builtins snapshot is required on the classpath of generated eclipse files
Harsh J created HIVE-3543: - Summary: A hive-builtins snapshot is required on the classpath of generated eclipse files Key: HIVE-3543 URL: https://issues.apache.org/jira/browse/HIVE-3543 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Harsh J We shouldn't rely on the presence of a jar from our own project pre-existing when generating the eclipse files, like so: {code} <classpathentry kind="lib" path="build/builtins/hive-builtins-0.10.0-SNAPSHOT.jar"/> {code} Does the src entry on the classpath for builtins not suffice instead? This entry's presence makes one have to run {{ant jar eclipse-files}} instead of the simpler {{ant compile eclipse-files}}. If not source paths, let's at least consider adding a classes/ directory instead of expecting a jar. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
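To make the suggestion concrete, Eclipse .classpath entries of roughly the following shape could replace the jar dependency; this is an illustrative sketch and the exact paths are assumptions, not taken from Hive's build:

```xml
<!-- Illustrative .classpath entries: point Eclipse at the builtins
     sources, or at its compiled classes/ output directory, so that no
     pre-built snapshot jar is required before generating eclipse files. -->
<classpathentry kind="src" path="builtins/src"/>
<!-- or, if compiled output is preferred over source paths: -->
<classpathentry kind="lib" path="build/builtins/classes"/>
```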
[jira] [Commented] (HIVE-3463) Add CASCADING to MySQL's InnoDB schema
[ https://issues.apache.org/jira/browse/HIVE-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466647#comment-13466647 ] Harsh J commented on HIVE-3463: --- DN (DataNucleus) does have cascading support: http://www.datanucleus.org/products/datanucleus/jdo/orm/cascading.html Add CASCADING to MySQL's InnoDB schema -- Key: HIVE-3463 URL: https://issues.apache.org/jira/browse/HIVE-3463 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.9.0 Reporter: Alexander Alten-Lorenz Assignee: Alexander Alten-Lorenz Cascading could help to clean up the tables when a FK is deleted. http://dev.mysql.com/doc/refman/5.5/en/innodb-foreign-key-constraints.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443778#comment-13443778 ] Harsh J commented on HIVE-2247: --- Can someone mark a Fix Version for this JIRA please? It is unclear. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.10.patch.txt, HIVE-2247.11.patch.txt, HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt, HIVE-2247.9.patch.txt, HIVE-2247.9.patch.txt We need an ALTER TABLE ... RENAME PARTITION function that is similar to ALTER TABLE ... RENAME. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
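For context, the rename-partition syntax Hive eventually adopted looks roughly like the following; the table name and partition spec below are hypothetical, for illustration only:

```sql
-- Illustrative: rename a partition in place, analogous to
-- ALTER TABLE ... RENAME for the table itself.
ALTER TABLE page_views PARTITION (dt='2011-01-01')
RENAME TO PARTITION (dt='2011-01-02');
```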
[jira] [Created] (HIVE-3414) Exception cast issue in HiveMetaStore.java
Harsh J created HIVE-3414: - Summary: Exception cast issue in HiveMetaStore.java Key: HIVE-3414 URL: https://issues.apache.org/jira/browse/HIVE-3414 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.8.1 Reporter: Harsh J Priority: Trivial (This is from reading the 0.8 code) There is a faulty way of checking for exception types in HiveMetaStore.java, under the HMSHandler.rename_partition method: {code} 1914 } catch (Exception e) { 1915 assert(e instanceof RuntimeException); 1916 throw (RuntimeException)e; 1917 } {code} When a genuine exception occurs while processing the alter_partition method, this leads to: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.metastore.api.InvalidOperationException cannot be cast to java.lang.RuntimeException at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:1916) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partition(HiveMetaStore.java:1884) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partition(HiveMetaStoreClient.java:818) at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:427) at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1464) ... 18 more {code} Why do we cast here instead of re-throwing in a wrapped fashion? On trunk, similar statements now exist only in the createDefaultDB and get_database methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
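A wrapped re-throw avoids the ClassCastException entirely; the sketch below is illustrative (the helper name and surrounding class are assumptions, not Hive's actual code):

```java
// Illustrative alternative to `assert + cast`: wrap checked exceptions
// instead of blindly casting them to RuntimeException.
public class RethrowSketch {
    static RuntimeException asRuntime(Exception e) {
        if (e instanceof RuntimeException) {
            // Already unchecked: returning the same instance is safe to throw.
            return (RuntimeException) e;
        }
        // A checked exception (e.g. InvalidOperationException) gets wrapped
        // with the original kept as the cause, rather than mis-cast.
        return new RuntimeException(e);
    }

    public static void main(String[] args) {
        Exception checked = new Exception("invalid operation");
        RuntimeException wrapped = asRuntime(checked);
        System.out.println(wrapped.getCause() == checked); // prints true
    }
}
```

The other option is to declare the checked exception types on rename_partition's throws clause and re-throw them directly, which would preserve the Thrift-visible exception types for clients.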
[jira] [Commented] (HIVE-3277) Enable Metastore audit logging for non-secure connections
[ https://issues.apache.org/jira/browse/HIVE-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442241#comment-13442241 ] Harsh J commented on HIVE-3277: --- Hi Carl, We do get some audit logging in non-secure mode, but I think it lacks information. Isn't this already done via HIVE-2797? Enable Metastore audit logging for non-secure connections - Key: HIVE-3277 URL: https://issues.apache.org/jira/browse/HIVE-3277 Project: Hive Issue Type: Improvement Components: Logging, Metastore, Security Reporter: Carl Steinbach -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3195) Typo in dynamic partitioning code bits, says genereated instead of generated in some places.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414641#comment-13414641 ] Harsh J commented on HIVE-3195: --- Ha, thanks Edward :D (Will remember to check P-A next time) Typo in dynamic partitioning code bits, says genereated instead of generated in some places. Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch, HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3195) Typo in dynamic partitioning code bits, says genereated instead of generated in some places.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401229#comment-13401229 ] Harsh J commented on HIVE-3195: --- I know that, Edward. I didn't find any references, so I thought I'd look at it later and cancelled the patch (although it is funny that I would have to fix logs in tests for a typo fix, heh). {code} ➜ hive git:(typoutilities) grep genereated -R . ➜ hive git:(typoutilities) {code} In any case, what am I missing? Typo in dynamic partitioning code bits, says genereated instead of generated in some places. Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch, HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3195) Typo in ql's Utilities, says genereated instead of generated.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3195: -- Attachment: HIVE-3195.patch Carl - Sorry, I missed that, though I did see it in the logs 2x, the other being this one. Fixed in this patch. Typo in ql's Utilities, says genereated instead of generated. - Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch, HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3195) Typo in dynamic partitioning code bits, says genereated instead of generated in some places.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3195: -- Summary: Typo in dynamic partitioning code bits, says genereated instead of generated in some places. (was: Typo in ql's Utilities, says genereated instead of generated.) Typo in dynamic partitioning code bits, says genereated instead of generated in some places. Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch, HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3195) Typo in dynamic partitioning code bits, says genereated instead of generated in some places.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3195: -- Status: Patch Available (was: Open) Typo in dynamic partitioning code bits, says genereated instead of generated in some places. Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch, HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3195) Typo in dynamic partitioning code bits, says genereated instead of generated in some places.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3195: -- Status: Open (was: Patch Available) Canceling patch to investigate Edward's comment. Typo in dynamic partitioning code bits, says genereated instead of generated in some places. Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch, HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3195) Typo in ql's Utilities, says genereated instead of generated.
Harsh J created HIVE-3195: - Summary: Typo in ql's Utilities, says genereated instead of generated. Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3195) Typo in ql's Utilities, says genereated instead of generated.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3195: -- Status: Patch Available (was: Open) Typo in ql's Utilities, says genereated instead of generated. - Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3195) Typo in ql's Utilities, says genereated instead of generated.
[ https://issues.apache.org/jira/browse/HIVE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3195: -- Attachment: HIVE-3195.patch Typo in ql's Utilities, says genereated instead of generated. - Key: HIVE-3195 URL: https://issues.apache.org/jira/browse/HIVE-3195 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.7.1 Reporter: Harsh J Priority: Trivial Labels: typo Attachments: HIVE-3195.patch Typo: {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java @@ -1860,7 +1860,7 @@ public final class Utilities { FileStatus[] status = Utilities.getFileStatusRecurse(loadPath, numDPCols, if (status.length == 0) { -LOG.warn("No partition is genereated by dynamic partitioning"); +LOG.warn("No partition is generated by dynamic partitioning"); return null; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira