[jira] [Created] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled
Amareshwari Sriramadasu created HIVE-14264: -- Summary: ArrayIndexOutOfBoundsException when cbo is enabled Key: HIVE-14264 URL: https://issues.apache.org/jira/browse/HIVE-14264 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 2.1.0 Reporter: Amareshwari Sriramadasu

We have noticed an ArrayIndexOutOfBoundsException for queries with an IS NOT NULL filter. The exception goes away when hive.cbo.enable=false. Here is a stacktrace from our production environment:

{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72]
at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72]
at org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147) ~[hive-service-2.1.2-inm.jar:2.1.2-inm]
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13862) org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilter falls back to ORM
Amareshwari Sriramadasu created HIVE-13862: -- Summary: org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilter falls back to ORM Key: HIVE-13862 URL: https://issues.apache.org/jira/browse/HIVE-13862 Project: Hive Issue Type: Bug Components: Metastore Reporter: Amareshwari Sriramadasu Assignee: Rajat Khandelwal Fix For: 2.1.0

We are seeing the following exception, and the calls fall back to ORM, which makes them costly:

{noformat}
WARN org.apache.hadoop.hive.metastore.ObjectStore - Direct SQL failed, falling back to ORM
java.lang.ClassCastException: org.datanucleus.store.rdbms.query.ForwardQueryResult cannot be cast to java.lang.Number
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlInt(MetaStoreDirectSql.java:892) ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:855) ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilter(MetaStoreDirectSql.java:405) ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:2763) ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:2755) ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2606) ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore.getNumPartitionsByFilterInternal(ObjectStore.java:2770) [hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore.getNumPartitionsByFilter(ObjectStore.java:2746) [hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
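Not the actual Hive patch (that lives in the JIRA above), but a minimal sketch of the failure mode and one defensive fix: the DataNucleus backend can hand the direct-SQL path a one-element query-result list (such as ForwardQueryResult) rather than a bare Number, so a blind cast throws the ClassCastException seen in the trace. The class and method names below are hypothetical.

```java
import java.util.List;

// Hypothetical sketch (not MetaStoreDirectSql): unwrap a scalar SQL
// result defensively. The backend may return a plain Number, or a
// one-element List-like result (e.g. ForwardQueryResult) wrapping it.
public class SqlScalar {
    public static int extractSqlInt(Object obj) {
        if (obj instanceof List) {
            // Unwrap list-shaped results before attempting the cast.
            List<?> rows = (List<?>) obj;
            obj = rows.isEmpty() ? 0 : rows.get(0);
        }
        if (obj instanceof Number) {
            return ((Number) obj).intValue();
        }
        throw new ClassCastException(
            "expected a numeric result, got " + obj.getClass().getName());
    }
}
```

Keeping the cast strict after unwrapping preserves the loud failure for genuinely non-numeric results instead of silently falling back to ORM.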
[jira] [Created] (HIVE-11482) Add retrying thrift client for HiveServer2
Amareshwari Sriramadasu created HIVE-11482: -- Summary: Add retrying thrift client for HiveServer2 Key: HIVE-11482 URL: https://issues.apache.org/jira/browse/HIVE-11482 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Amareshwari Sriramadasu

Similar to https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java, this improvement request is to add a retrying thrift client for HiveServer2 that retries upon thrift exceptions. Here are a few commits done on a forked branch that can be picked:
https://github.com/InMobi/hive/commit/7fb957fb9c2b6000d37c53294e256460010cb6b7
https://github.com/InMobi/hive/commit/11e4b330f051c3f58927a276d562446761c9cd6d
https://github.com/InMobi/hive/commit/241386fd870373a9253dca0bcbdd4ea7e665406c

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
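RetryingMetaStoreClient wraps the client interface in a java.lang.reflect.Proxy whose InvocationHandler re-issues failed calls. A minimal sketch of that dynamic-proxy retry pattern follows; the class name is hypothetical, and real code would retry only on TException and re-open the underlying thrift transport between attempts rather than retrying every failure.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical sketch of the retrying-proxy pattern: every call on the
// returned proxy is forwarded to the delegate and re-invoked on failure,
// up to maxRetries additional attempts.
public class RetryingClient implements InvocationHandler {
    private final Object delegate;
    private final int maxRetries;

    private RetryingClient(Object delegate, int maxRetries) {
        this.delegate = delegate;
        this.maxRetries = maxRetries;
    }

    @SuppressWarnings("unchecked")
    public static <T> T wrap(Class<T> iface, T delegate, int maxRetries) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
            new Class<?>[] { iface }, new RetryingClient(delegate, maxRetries));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Throwable last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                // A real HiveServer2 client would retry only on thrift
                // exceptions and reconnect the transport here.
                last = e.getCause();
            }
        }
        throw last;
    }
}
```

The proxy approach keeps retry logic out of every call site: callers hold the same interface type and are unaware that calls may be re-issued.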
[jira] [Created] (HIVE-11483) Add encoding and decoding for query string config
Amareshwari Sriramadasu created HIVE-11483: -- Summary: Add encoding and decoding for query string config Key: HIVE-11483 URL: https://issues.apache.org/jira/browse/HIVE-11483 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu

We have seen some queries in production where some of the literals passed in the query contain control characters, which result in an exception when the query string is set in the job XML. Proposing a solution: encode the query string in the configuration and provide getters that return the decoded string. Here is a commit in a forked repo:
https://github.com/InMobi/hive/commit/2faf5761191fa3103a0d779fde584d494ed75bf5

Suggestions are welcome on the solution.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
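The linked commit is not reproduced here; as one assumed approach, Base64-encoding the query string before it is written to the configuration keeps control characters out of the serialized job XML, and the getter decodes it back. QueryStringCodec is a hypothetical name for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the proposal: store the query string
// Base64-encoded in the job configuration so control characters survive
// XML serialization, and decode it again in the getter.
public class QueryStringCodec {
    public static String encode(String query) {
        return Base64.getEncoder()
            .encodeToString(query.getBytes(StandardCharsets.UTF_8));
    }

    public static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
            StandardCharsets.UTF_8);
    }
}
```

The encoded form uses only the Base64 alphabet, so any literal, including one with embedded control characters, round-trips through the XML safely.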
[jira] [Created] (HIVE-11486) Hive should log exceptions for better debuggability with full trace
Amareshwari Sriramadasu created HIVE-11486: -- Summary: Hive should log exceptions for better debuggability with full trace Key: HIVE-11486 URL: https://issues.apache.org/jira/browse/HIVE-11486 Project: Hive Issue Type: Improvement Components: Diagnosability Reporter: Amareshwari Sriramadasu

For example:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2638
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java#L315

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
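The point of the request, illustrated: call sites that log only e.getMessage() drop the stack trace and the "Caused by" chain, while passing the throwable itself to the logger (e.g. LOG.error("task failed", e)) preserves them. A minimal contrast outside Hive's code, using plain string building to show exactly what each style records:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Not Hive code: contrasts what reaches the log in the two styles.
public class TraceLogging {
    // Mirrors LOG.error(e.getMessage()): the message only, no trace.
    public static String messageOnly(Throwable e) {
        return String.valueOf(e.getMessage());
    }

    // Mirrors LOG.error("...", e): the full stack trace, including
    // every "Caused by" link in the chain.
    public static String withTrace(Throwable e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }
}
```

With only the message, a production failure like the ones in the linked lines cannot be traced back to its root cause.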
[jira] [Created] (HIVE-11485) Session close should not close async SQL operations
Amareshwari Sriramadasu created HIVE-11485: -- Summary: Session close should not close async SQL operations Key: HIVE-11485 URL: https://issues.apache.org/jira/browse/HIVE-11485 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Amareshwari Sriramadasu

Right now, closing a session on HiveServer closes all of its operations. But running queries are actually available across sessions and are not tied to a session (except the launch, which requires configuration and resources), and their status can be fetched from any session. Yet closing the session on which an operation was launched closes that operation as well. So, we should avoid closing all operations upon closing a session.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11487) Add getNumPartitionsByFilter api in metastore api
Amareshwari Sriramadasu created HIVE-11487: -- Summary: Add getNumPartitionsByFilter api in metastore api Key: HIVE-11487 URL: https://issues.apache.org/jira/browse/HIVE-11487 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Amareshwari Sriramadasu

Adding an API for getting the number of partitions matching a filter will be more optimal when we are only interested in the count. getAllPartitions constructs all the partition objects, which can be time consuming and is not required. Here is a commit we pushed in a forked repo in our organization:
https://github.com/inmobi/hive/commit/68b3534d3e6c4d978132043cec668798ed53e444

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
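A toy model (not metastore code) of why a count API is cheaper: with only a get-partitions call, the caller materializes every Partition object just to take the list's size, whereas a dedicated count can be pushed down by the metastore backend as a single SELECT COUNT(*) and skip object construction entirely. All names below are illustrative.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative model of the two call shapes, not the metastore API.
public class PartitionCounts {
    public static class Partition {
        final String values;
        Partition(String values) { this.values = values; }
    }

    // Existing shape: build every matching Partition object, even when
    // the caller only wants the count.
    public static List<Partition> getPartitionsByFilter(
            List<String> partitionValues, Predicate<String> filter) {
        return partitionValues.stream()
            .filter(filter)
            .map(Partition::new)
            .collect(Collectors.toList());
    }

    // Proposed shape: return only the count; a real backend can answer
    // this with SELECT COUNT(*) and never instantiate a Partition.
    public static int getNumPartitionsByFilter(
            List<String> partitionValues, Predicate<String> filter) {
        return (int) partitionValues.stream().filter(filter).count();
    }
}
```

For tables with tens of thousands of partitions, the difference between materializing every object and returning one integer is substantial.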
[jira] [Created] (HIVE-11484) Fix ObjectInspector for Char and VarChar
Amareshwari Sriramadasu created HIVE-11484: -- Summary: Fix ObjectInspector for Char and VarChar Key: HIVE-11484 URL: https://issues.apache.org/jira/browse/HIVE-11484 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Amareshwari Sriramadasu

The creation of HiveChar and Varchar is not happening through the ObjectInspector. Here is the fix we pushed internally:
https://github.com/InMobi/hive/commit/fe95c7850e7130448209141155f28b25d3504216

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10435) Make HiveSession implementation pluggable through configuration
Amareshwari Sriramadasu created HIVE-10435: -- Summary: Make HiveSession implementation pluggable through configuration Key: HIVE-10435 URL: https://issues.apache.org/jira/browse/HIVE-10435 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Amareshwari Sriramadasu

SessionManager in CLIService creates and keeps track of HiveSession instances. Right now, it creates HiveSessionImpl, which is one implementation of HiveSession. This improvement request is to make the implementation pluggable through configuration so that other implementations can be passed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
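The usual pattern for such a plugin point is to read an implementation class name from configuration and instantiate it reflectively, validating it against the expected interface. A generic sketch under that assumption; PluggableFactory is a hypothetical name, not the proposed Hive API.

```java
// Hypothetical sketch of a configuration-driven plugin point: resolve
// a class by name, check it implements the expected interface, and
// instantiate it via its no-arg constructor.
public class PluggableFactory {
    public static <T> T newInstance(String className, Class<T> iface)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        if (!iface.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(
                className + " does not implement " + iface.getName());
        }
        return iface.cast(clazz.getDeclaredConstructor().newInstance());
    }
}
```

SessionManager could then read the configured class name (defaulting to HiveSessionImpl) and construct whatever HiveSession implementation is supplied.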
[jira] [Commented] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock
[ https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270779#comment-14270779 ] Amareshwari Sriramadasu commented on HIVE-9324: ---

After doing some code walkthrough, here is what I found. On JoinOperator, whenever any key has more values than BLOCKSIZE (hardcoded to 25000), it spills the values to a file on disk, and the spill uses SequenceFile format. Here is the table description for the spill (from org.apache.hadoop.hive.ql.exec.JoinUtil.java):

{noformat}
TableDesc tblDesc = new TableDesc(
    SequenceFileInputFormat.class, HiveSequenceFileOutputFormat.class,
    Utilities.makeProperties(
        org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT, "" + Utilities.ctrlaCode,
        org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMNS, colNames.toString(),
        org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMN_TYPES, colTypes.toString(),
        serdeConstants.SERIALIZATION_LIB, LazyBinarySerDe.class.getName()));
spillTableDesc[tag] = tblDesc;
{noformat}

From the exception:

{noformat}
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
... 13 more
{noformat}

I see that the value in the SequenceFile is RCFile$KeyBuffer; I don't know why. I also couldn't figure out why the read went wrong.
Following is the code snippet from SequenceFile.java for the exception we are hitting:

{noformat}
2417    public synchronized Object next(Object key) throws IOException {
2418      if (key != null && key.getClass() != getKeyClass()) {
2419        throw new IOException("wrong key class: " + key.getClass().getName()
2420                              + " is not " + keyClass);
2421      }
2422
2423      if (!blockCompressed) {
2424        outBuf.reset();
2425
2426        keyLength = next(outBuf);
2427        if (keyLength < 0)
2428          return null;
2429
2430        valBuffer.reset(outBuf.getData(), outBuf.getLength());
2431
2432        key = deserializeKey(key);
2433        valBuffer.mark(0);
2434        if (valBuffer.getPosition() != keyLength)
2435          throw new IOException(key + " read " + valBuffer.getPosition()
2436                                + " bytes, should read " + keyLength);
{noformat}

Reduce side joins failing with IOException from RowContainer.nextBlock -- Key: HIVE-9324 URL: https://issues.apache.org/jira/browse/HIVE-9324 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Amareshwari Sriramadasu

We are seeing some reduce side join mapreduce jobs failing with following exception:

{noformat}
2014-12-14 16:58:51,296 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2014-12-14 16:58:51,334 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException:
[jira] [Created] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock
Amareshwari Sriramadasu created HIVE-9324: - Summary: Reduce side joins failing with IOException from RowContainer.nextBlock Key: HIVE-9324 URL: https://issues.apache.org/jira/browse/HIVE-9324 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Amareshwari Sriramadasu We are seeing some reduce side join mapreduce jobs failing with following exception : {noformat} 2014-12-14 16:58:51,296 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264 java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435) at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) 2014-12-14 16:58:51,334 FATAL ExecReducer: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264 at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264 at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230) ... 12 more Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435) at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360) ... 
13 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock
[ https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270588#comment-14270588 ] Amareshwari Sriramadasu commented on HIVE-9324: --- More task log : {noformat} 2014-12-14 16:58:03,905 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __REDUCE_PLAN__ 2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.log.PerfLogger: PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities 2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.exec.Utilities: Deserializing ReduceWork via kryo 2014-12-14 16:58:04,987 INFO org.apache.hadoop.hive.ql.log.PerfLogger: /PERFLOG method=deserializePlan start=1418576283945 end=1418576284987 duration=1042 from=org.apache.hadoop.hive.ql.exec.Utilities 2014-12-14 16:58:04,988 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: __REDUCE_PLAN__ 2014-12-14 16:58:05,327 INFO ExecReducer: JOINId =0 Children FSId =1 Children \Children ParentId = 0 null\Parent \FS \Children Parent\Parent \JOIN 2014-12-14 16:58:05,327 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initializing Self 0 JOIN 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: JOIN struct_col23:string,_col65:double,_col99:double,_col237:double,_col240:double,_col250:string,_col367:int totalsz = 7 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Operator 0 JOIN initialized 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initializing children of 0 JOIN 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 1 FS 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 1 FS 2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 1 FS initialized 2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 1 FS 
2014-12-14 16:58:05,395 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initialization Done 0 JOIN 2014-12-14 16:58:05,401 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 242598168 2014-12-14 16:58:05,406 INFO ExecReducer: ExecReducer: processing 10 rows: used memory = 242759392 2014-12-14 16:58:05,437 INFO ExecReducer: ExecReducer: processing 100 rows: used memory = 242759392 2014-12-14 16:58:05,657 INFO ExecReducer: ExecReducer: processing 1000 rows: used memory = 243653240 2014-12-14 16:58:06,976 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 247197944 2014-12-14 16:58:07,646 INFO ExecReducer: ExecReducer: processing 10 rows: used memory = 277801256 2014-12-14 16:58:11,511 INFO ExecReducer: ExecReducer: processing 100 rows: used memory = 283150744 2014-12-14 16:58:14,993 INFO ExecReducer: ExecReducer: processing 200 rows: used memory = 293036992 2014-12-14 16:58:18,497 INFO ExecReducer: ExecReducer: processing 300 rows: used memory = 311449488 2014-12-14 16:58:20,815 INFO ExecReducer: ExecReducer: processing 400 rows: used memory = 285251752 2014-12-14 16:58:26,460 INFO ExecReducer: ExecReducer: processing 500 rows: used memory = 328223864 2014-12-14 16:58:29,412 INFO ExecReducer: ExecReducer: processing 600 rows: used memory = 263175576 2014-12-14 16:58:31,331 INFO ExecReducer: ExecReducer: processing 700 rows: used memory = 282021320 2014-12-14 16:58:35,099 INFO ExecReducer: ExecReducer: processing 800 rows: used memory = 299301184 2014-12-14 16:58:37,981 INFO ExecReducer: ExecReducer: processing 900 rows: used memory = 306925648 2014-12-14 16:58:40,506 INFO ExecReducer: ExecReducer: processing 1000 rows: used memory = 307407920 2014-12-14 16:58:42,242 INFO ExecReducer: ExecReducer: processing 1100 rows: used memory = 304664048 2014-12-14 16:58:46,142 INFO ExecReducer: ExecReducer: processing 1200 rows: used memory = 298347024 2014-12-14 16:58:48,549 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 1000 
rows for join key [003b9de7876541c2bcce9029ff0d3873] 2014-12-14 16:58:48,622 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 2000 rows for join key [003b9de7876541c2bcce9029ff0d3873] 2014-12-14 16:58:48,677 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 4000 rows for join key [003b9de7876541c2bcce9029ff0d3873] 2014-12-14 16:58:48,679 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://data-grill300-null.arshad.ev1.inmobi.com:8020/tmp/hive-dataqa/hive_2014-12-14_16-49-14_996_1630664550753106415-32/_tmp.-mr-10002/00_0 2014-12-14 16:58:48,680 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
[jira] [Comment Edited] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock
[ https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270588#comment-14270588 ] Amareshwari Sriramadasu edited comment on HIVE-9324 at 1/9/15 5:54 AM: --- More task log : {noformat} 2014-12-14 16:58:03,905 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __REDUCE_PLAN__ 2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.log.PerfLogger: PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities 2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.exec.Utilities: Deserializing ReduceWork via kryo 2014-12-14 16:58:04,987 INFO org.apache.hadoop.hive.ql.log.PerfLogger: /PERFLOG method=deserializePlan start=1418576283945 end=1418576284987 duration=1042 from=org.apache.hadoop.hive.ql.exec.Utilities 2014-12-14 16:58:04,988 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: __REDUCE_PLAN__ 2014-12-14 16:58:05,327 INFO ExecReducer: JOINId =0 Children FSId =1 Children \Children ParentId = 0 null\Parent \FS \Children Parent\Parent \JOIN 2014-12-14 16:58:05,327 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initializing Self 0 JOIN 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: JOIN struct_col23:string,_col65:double,_col99:double,_col237:double,_col240:double,_col250:string,_col367:int totalsz = 7 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Operator 0 JOIN initialized 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initializing children of 0 JOIN 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 1 FS 2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 1 FS 2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 1 FS initialized 2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initialization Done 1 FS 2014-12-14 16:58:05,395 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initialization Done 0 JOIN 2014-12-14 16:58:05,401 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 242598168 2014-12-14 16:58:05,406 INFO ExecReducer: ExecReducer: processing 10 rows: used memory = 242759392 2014-12-14 16:58:05,437 INFO ExecReducer: ExecReducer: processing 100 rows: used memory = 242759392 2014-12-14 16:58:05,657 INFO ExecReducer: ExecReducer: processing 1000 rows: used memory = 243653240 2014-12-14 16:58:06,976 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 247197944 2014-12-14 16:58:07,646 INFO ExecReducer: ExecReducer: processing 10 rows: used memory = 277801256 2014-12-14 16:58:11,511 INFO ExecReducer: ExecReducer: processing 100 rows: used memory = 283150744 2014-12-14 16:58:14,993 INFO ExecReducer: ExecReducer: processing 200 rows: used memory = 293036992 2014-12-14 16:58:18,497 INFO ExecReducer: ExecReducer: processing 300 rows: used memory = 311449488 2014-12-14 16:58:20,815 INFO ExecReducer: ExecReducer: processing 400 rows: used memory = 285251752 2014-12-14 16:58:26,460 INFO ExecReducer: ExecReducer: processing 500 rows: used memory = 328223864 2014-12-14 16:58:29,412 INFO ExecReducer: ExecReducer: processing 600 rows: used memory = 263175576 2014-12-14 16:58:31,331 INFO ExecReducer: ExecReducer: processing 700 rows: used memory = 282021320 2014-12-14 16:58:35,099 INFO ExecReducer: ExecReducer: processing 800 rows: used memory = 299301184 2014-12-14 16:58:37,981 INFO ExecReducer: ExecReducer: processing 900 rows: used memory = 306925648 2014-12-14 16:58:40,506 INFO ExecReducer: ExecReducer: processing 1000 rows: used memory = 307407920 2014-12-14 16:58:42,242 INFO ExecReducer: ExecReducer: processing 1100 rows: used memory = 304664048 2014-12-14 16:58:46,142 INFO ExecReducer: ExecReducer: processing 1200 rows: used memory = 298347024 2014-12-14 16:58:48,549 INFO 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 1000 rows for join key [003b9de7876541c2bcce9029ff0d3873] 2014-12-14 16:58:48,622 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 2000 rows for join key [003b9de7876541c2bcce9029ff0d3873] 2014-12-14 16:58:48,677 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 4000 rows for join key [003b9de7876541c2bcce9029ff0d3873] 2014-12-14 16:58:48,679 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://test-machine:8020/tmp/hive-dataqa/hive_2014-12-14_16-49-14_996_1630664550753106415-32/_tmp.-mr-10002/00_0 2014-12-14 16:58:48,680 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-4115: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) This effort has been incubated into apache here - http://incubator.apache.org/projects/lens.html Introduce cube abstraction in hive -- Key: HIVE-4115 URL: https://issues.apache.org/jira/browse/HIVE-4115 Project: Hive Issue Type: New Feature Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: HIVE-4115.D10689.1.patch, HIVE-4115.D10689.2.patch, HIVE-4115.D10689.3.patch, HIVE-4115.D10689.4.patch, cube-design-2.pdf, cube-design.docx We would like to define a cube abstraction so that user can query at cube layer and do not know anything about storage and rollups. Will describe the model more in following comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7892: -- Resolution: Fixed Fix Version/s: 0.14.0 Release Note: Maps thrift's set type to hive's array type. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Satish!

Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Fix For: 0.14.0 Attachments: HIVE-7892.1.patch, HIVE-7892.patch.txt

Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However, the thrift Set type doesn't seem to be working. Here is an example thrift struct:

{noformat}
namespace java sample.thrift

struct setrow {
  1: required set<i32> ids,
  2: required string name,
}
{noformat}

A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows:

{noformat}
hive> describe settable;
OK
ids                     struct<>                from deserializer
name                    string                  from deserializer
{noformat}

Issuing a select query on the set column throws a SemanticException:

{noformat}
hive> select ids from settable;
FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133617#comment-14133617 ] Amareshwari Sriramadasu commented on HIVE-7936: --- +1 Patch looks fine. Can you update the test output here? Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, complex.seq Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7936: -- Resolution: Fixed Release Note: Support Thrift union type as hive union type Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Suma! Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, complex.seq Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133946#comment-14133946 ] Amareshwari Sriramadasu commented on HIVE-7936: --- Sorry.. missed commenting binary file changes. Merging it now. Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, complex.seq Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133946#comment-14133946 ] Amareshwari Sriramadasu edited comment on HIVE-7936 at 9/15/14 2:37 PM: Sorry.. missed committing binary file changes. Merged it now in a different commit was (Author: amareshwari): Sorry.. missed commenting binary file changes. Merging it now. Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, complex.seq Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7694: -- Release Note: SMB join on tables differing by number of sorted by columns with same join prefix (was: I just committed this. Thanks Suma!) SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
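The "Index: 1, Size: 1" failure above can be reproduced in miniature. This is a sketch with hypothetical names, not the actual AbstractSMBJoinProc code: an unguarded check walks one table's sort columns and indexes the other table's shorter list at the same position, while a guarded variant in the spirit of the fix only compares the common join-key prefix.

```java
import java.util.Arrays;
import java.util.List;

public class SmbSortColCheck {
    // Failure mode: iterate over the longer sort-column list and read the
    // same index from the shorter one. With T1 sorted by (a, b, c) and
    // T2 sorted by (a), i = 1 throws IndexOutOfBoundsException: Index: 1, Size: 1.
    static boolean unguardedCheck(List<String> sortColsA, List<String> sortColsB) {
        for (int i = 0; i < sortColsA.size(); i++) {
            if (!sortColsA.get(i).equals(sortColsB.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Guarded variant: only the join-key prefix must match on both sides;
    // extra trailing sort columns on one table are allowed.
    static boolean guardedCheck(List<String> sortColsA, List<String> sortColsB,
                                int joinKeyCount) {
        if (joinKeyCount > sortColsA.size() || joinKeyCount > sortColsB.size()) {
            return false;
        }
        for (int i = 0; i < joinKeyCount; i++) {
            if (!sortColsA.get(i).equals(sortColsB.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```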
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7694: -- Resolution: Fixed Release Note: I just committed this. Thanks Suma! Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128116#comment-14128116 ] Amareshwari Sriramadasu commented on HIVE-7694: --- I just committed this. Thanks Suma! SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at 
org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128138#comment-14128138 ] Amareshwari Sriramadasu commented on HIVE-7892: --- Code changes look fine. Can you update the test output for convert_enum_to_string.q and upload the patch? Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However, the Thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct<> from deserializer name string from deserializer {noformat} Issuing a select query on the set column throws a SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128313#comment-14128313 ] Amareshwari Sriramadasu commented on HIVE-7936: --- The code changes look fine. I put a few comments on the review board. Since the patch involves a binary file change, I think Jenkins won't be able to apply the patch. Can you run the tests on a local machine and update the result here? Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.patch, complex.seq Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-2390) Expand support for union types
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-2390: -- Resolution: Fixed Release Note: Adds UnionType support in LazyBinarySerde Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Suma! Expand support for union types -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
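For readers unfamiliar with how a serde typically encodes a union, here is a minimal sketch of a tag-plus-value layout: one byte selects the active branch, followed by that branch's encoding. The class name, method names, and byte layout are assumptions for illustration, not the actual LazyBinarySerDe wire format added by this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UnionTagSketch {
    // Serialize a hypothetical union<int, string>: a one-byte tag selecting
    // the branch, followed by the value of that branch only.
    static byte[] writeUnion(byte tag, Integer intValue, String strValue) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeByte(tag);
            if (tag == 0) {
                out.writeInt(intValue);   // branch 0: 4-byte int
            } else {
                out.writeUTF(strValue);   // branch 1: length-prefixed string
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A reader dispatches on the tag before decoding the rest of the bytes;
    // "Unrecognized type: UNION" above means no such branch existed in the serializer.
    static byte readTag(byte[] encoded) {
        return encoded[0];
    }
}
```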
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126785#comment-14126785 ] Amareshwari Sriramadasu commented on HIVE-7694: --- +1 Code changes look fine to me. [~suma.shivaprasad], Can you rebase the patch? Also run tests once again as the last test build was having test failures. Make sure failed tests are not failing on your local machine before submitting again SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2390) Expand support for union types
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125210#comment-14125210 ] Amareshwari Sriramadasu commented on HIVE-2390: --- +1 Changes look fine to me. [~suma.shivaprasad], the test failure seems unrelated to me. Can you look into it and confirm? Expand support for union types -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7694: -- Assignee: Suma Shivaprasad SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at 
org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7629: -- Assignee: Suma Shivaprasad Problem in SMB Joins between two Parquet tables --- Key: HIVE-7629 URL: https://issues.apache.org/jira/browse/HIVE-7629 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7629.1.patch, HIVE-7629.patch The issue is clearly seen when two bucketed and sorted parquet tables with different number of columns are involved in the join . The following exception is seen {noformat} Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5733: -- Status: Open (was: Patch Available) There are problems with uploaded patch, canceling Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Assignee: Amareshwari Sriramadasu Attachments: HIVE-5733.1.patch Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5733: -- Status: Patch Available (was: Open) Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Assignee: Amareshwari Sriramadasu Attachments: HIVE-5733.1.patch Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5733: -- Attachment: HIVE-5733.1.patch Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Assignee: Amareshwari Sriramadasu Attachments: HIVE-5733.1.patch Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5733: -- Attachment: HIVE-5733.1.patch As per the documentation of maven-shade-plugin, the plugin replaces the project's main artifact with the shaded artifact. Giving a different classifier like nodeps was also not helpful. As per the doc, the shaded artifact can be given a different name. Doc - http://maven.apache.org/plugins/maven-shade-plugin/examples/attached-artifact.html Attaching the patch, which generates hive-exec-<version>.jar with no dependencies and hive-exec-<version>-withdep.jar as the shaded jar. Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Assignee: Amareshwari Sriramadasu Attachments: HIVE-5733.1.patch Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
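The attached-artifact approach described in the comment above corresponds to the shade plugin's `shadedArtifactAttached` and `shadedClassifierName` options: the unshaded jar stays the main artifact and the shaded jar is attached under a classifier. This fragment is a sketch, not the actual HIVE-5733 patch; the classifier name is chosen to match the jar names mentioned in the comment.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <!-- Keep the dependency-free jar as the main artifact and attach
             the shaded jar as hive-exec-<version>-withdep.jar -->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>withdep</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Consumers would then depend on the plain `hive-exec` artifact by default, and opt into the shaded jar with `<classifier>withdep</classifier>`.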
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5733: -- Attachment: (was: HIVE-5733.1.patch) Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Assignee: Amareshwari Sriramadasu Attachments: HIVE-5733.1.patch Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu reassigned HIVE-5733: - Assignee: Amareshwari Sriramadasu Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Assignee: Amareshwari Sriramadasu Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997276#comment-13997276 ] Amareshwari Sriramadasu commented on HIVE-5733: --- +1 This is much required. I agree it has become difficult to depend on the hive-exec jar, because the ql module shades all the dependencies. I will try to put up a patch. Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] shades all the dependencies (= the jar contains all Hive's dependencies). As other projects that depend on Hive might use slightly different versions of those dependencies, it can easily happen that Hive's shaded versions are used instead, which leads to very time-consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible to publish a {{hive-exec}} jar that is built without shading any dependencies? For example, [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] has a nodeps classifier that provides the artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved HIVE-6953. --- Resolution: Duplicate The test failures were because of HIVE-6877. After applying the patch from HIVE-6877 to branch-0.13, all the tests pass. All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: TEST-org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.xml, TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, TEST-org.apache.hadoop.hive.ql.txn.compactor.TestWorker.xml, nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with the following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else, please help me solve this. Do I have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979312#comment-13979312 ] Amareshwari Sriramadasu commented on HIVE-6953: --- Also, there are some tests failing randomly because they fail to create a path in /user/hive/warehouse. For example, org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat failed with the following errors {noformat} log4j:ERROR Could not read configuration file from URL [file:/hive-path/ql/target/tmp/conf/hive-log4j.properties]. java.io.FileNotFoundException: /hive-path/ql/target/tmp/conf/hive-log4j.properties (No such file or directory) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/text_symlink_text is not a directory or unable to create one) {noformat} So, most probably something is getting cleaned up. But I could not find where to start. Can someone help me find the root cause? Where can I start looking? All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: TEST-org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.xml, TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, TEST-org.apache.hadoop.hive.ql.txn.compactor.TestWorker.xml, nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with the following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else help me solve this. Do i have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979312#comment-13979312 ] Amareshwari Sriramadasu edited comment on HIVE-6953 at 4/24/14 5:47 AM: Also, there are some tests failing randomly because they fail to create path in /user/hive/warehouse For ex: org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat failed with following errors {noformat} log4j:ERROR Could not read configuration file from URL [file:/hive-path/ql/target/tmp/conf/hive-log4j.properties]. java.io.FileNotFoundException: /hive-path/ql/target/tmp/conf/hive-log4j.properties (No such file or directory) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/text_symlink_text is not a directory or unable to create one) {noformat} So, most probably something is getting cleaned-up. But I could not find where to start. Can someone help me to find the root cause? Where can i start to look at it. was (Author: amareshwari): Also, there are some tests failing randomly because they fail to create path in /user/hive/warehouse For ex: org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat failed with following errors {noformat} log4j:ERROR Could not read configuration file from URL [file:/hive-path/ql/target/tmp/conf/hive-log4j.properties]. java.io.FileNotFoundException: /hive-path/ql/target/tmp/conf/hive-log4j.properties (No such file or directory) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/text_symlink_text is not a directory or unable to create one) {noformat} So, most probably something is getting cleaned-up. But I could find where to start. Can someone help me to find the root cause? Where can i start to look at it. 
All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: TEST-org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.xml, TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, TEST-org.apache.hadoop.hive.ql.txn.compactor.TestWorker.xml, nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else help me solve this. Do i have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
Amareshwari Sriramadasu created HIVE-6953: - Summary: All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with the following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else, please help me solve this. Do I have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976740#comment-13976740 ] Amareshwari Sriramadasu commented on HIVE-6953: --- There are no failures in trunk; all tests pass. [~rhbutani], do you think anything is missing in branch-0.13? Looking at the commits, I couldn't figure it out. All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with the following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else, please help me solve this. Do I have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-6953: -- Attachment: nohup.out.gz The nohup test output All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else help me solve this. Do i have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977847#comment-13977847 ] Amareshwari Sriramadasu commented on HIVE-6953: --- Thanks [~alangates] and [~rhbutani] for trying. The tests pass when I run them individually; when all the tests are run together, they fail. Here is what I have done : {noformat} git clone https://github.com/apache/hive apache-hive git checkout branch-0.13 nohup mvn clean install -Phadoop-1 {noformat} Attaching the nohup output for reference. bq. Is there anything in your logs indicating it tried to create the tables and failed? Will check and update. bq. Are you doing anything in your build to turn off the hive.in.test config value? No. I'm thinking the test db or conf is getting cleaned up by some other means when all the tests are run together. All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with the following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else help me solve this. Do i have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977849#comment-13977849 ] Amareshwari Sriramadasu commented on HIVE-6953: --- The machine on which I'm running is a Linux machine. Same thing happens on Mac as well. {noformat} uname -a Linux hostname 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux $ java -version java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) {noformat} All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else help me solve this. Do i have to set anything? 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
[ https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-6953: -- Attachment: TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml Attaching TestInitiator.xml All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist -- Key: HIVE-6953 URL: https://issues.apache.org/jira/browse/HIVE-6953 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Amareshwari Sriramadasu Assignee: Alan Gates Attachments: TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, nohup.out.gz When I'm running all tests through the command 'mvn clean install -Phadoop-1', all CompactorTest classes TestInitiator, TestWorker, TestCleaner fail with following exception : {noformat} org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) {noformat} This is happening on branch-0.13. Has anyone faced this problem? [~owen.omalley] or someone else help me solve this. Do i have to set anything? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Status: Patch Available (was: Open) format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Attachment: HIVE-5370.patch Attaching the patch with the following changes: * Added the format as a second argument. * Also takes care of null values being formatted: the current code throws an NPE for a null value; fixed it to return null when formatting null. Review request for the same: https://reviews.apache.org/r/18182/ format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user-specified format as an optional input. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
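As a minimal sketch of the behavior the patch describes (the two-argument call is the proposed form; the format-string syntax is an assumption based on java.text.DecimalFormat patterns, so treat these calls as illustrative rather than the patch's exact API):

```sql
-- Existing behavior: second argument is the number of decimal places,
-- output follows the fixed #,###,###.## pattern
SELECT format_number(1234567.891, 2);   -- '1,234,567.89'

-- Proposed: second argument may also be a user-specified format pattern
-- (hypothetical usage illustrating the patch's intent)
SELECT format_number(1234567.891, '##.##');

-- With the patch, formatting a NULL value returns NULL instead of throwing an NPE
SELECT format_number(CAST(NULL AS DOUBLE), 2);
```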
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Status: Open (was: Patch Available) Earlier patch missed delete of the file. format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Status: Patch Available (was: Open) format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch, HIVE-5370.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Attachment: HIVE-5370.patch Corrected the patch format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch, HIVE-5370.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.
[ https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-6410: -- Status: Open (was: Patch Available) Resubmitting for tests to run Allow output serializations separators to be set for HDFS path as well. --- Key: HIVE-6410 URL: https://issues.apache.org/jira/browse/HIVE-6410 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: HIVE-6410.patch HIVE-3682 adds functionality for users to set serialization constants for 'insert overwrite local directory'. The same functionality should be available for hdfs path as well. The workaround suggested is to create a table with required format and insert into the table, which enforces the users to know the schema of the result and create the table ahead. Though that works, it is good to have the functionality for loading into directory as well. I'm planning to add the same functionality in 'insert overwrite directory' in this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.
[ https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-6410: -- Fix Version/s: 0.13.0 Status: Patch Available (was: Open) Allow output serializations separators to be set for HDFS path as well. --- Key: HIVE-6410 URL: https://issues.apache.org/jira/browse/HIVE-6410 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.13.0 Attachments: HIVE-6410.patch HIVE-3682 adds functionality for users to set serialization constants for 'insert overwrite local directory'. The same functionality should be available for hdfs path as well. The workaround suggested is to create a table with required format and insert into the table, which enforces the users to know the schema of the result and create the table ahead. Though that works, it is good to have the functionality for loading into directory as well. I'm planning to add the same functionality in 'insert overwrite directory' in this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.
[ https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901100#comment-13901100 ] Amareshwari Sriramadasu commented on HIVE-6410: --- [~xuefu.w...@kodak.com], I had the patch ready, so I uploaded it. I don't mind either of them being closed as a duplicate of the other, as long as the code gets in. Allow output serializations separators to be set for HDFS path as well. --- Key: HIVE-6410 URL: https://issues.apache.org/jira/browse/HIVE-6410 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: HIVE-6410.patch HIVE-3682 adds functionality for users to set serialization constants for 'insert overwrite local directory'. The same functionality should be available for an HDFS path as well. The suggested workaround is to create a table with the required format and insert into the table, which forces users to know the schema of the result and create the table ahead of time. Though that works, it is good to have the functionality for loading into a directory as well. I'm planning to add the same functionality to 'insert overwrite directory' in this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.
[ https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-6410: -- Status: Patch Available (was: Open) Allow output serializations separators to be set for HDFS path as well. --- Key: HIVE-6410 URL: https://issues.apache.org/jira/browse/HIVE-6410 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: HIVE-6410.patch HIVE-3682 adds functionality for users to set serialization constants for 'insert overwrite local directory'. The same functionality should be available for hdfs path as well. The workaround suggested is to create a table with required format and insert into the table, which enforces the users to know the schema of the result and create the table ahead. Though that works, it is good to have the functionality for loading into directory as well. I'm planning to add the same functionality in 'insert overwrite directory' in this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.
[ https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-6410: -- Attachment: HIVE-6410.patch Attaching the patch with the fix. The changes include : * Grammar changes to accept table format and row format for 'insert overwrite directory'. * Fixed existing code to accept serde properties as well. * Added insert_overwrite_directory.q Review board request - https://reviews.apache.org/r/18060/ Allow output serializations separators to be set for HDFS path as well. --- Key: HIVE-6410 URL: https://issues.apache.org/jira/browse/HIVE-6410 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: HIVE-6410.patch HIVE-3682 adds functionality for users to set serialization constants for 'insert overwrite local directory'. The same functionality should be available for an HDFS path as well. The suggested workaround is to create a table with the required format and insert into the table, which forces users to know the schema of the result and create the table ahead of time. Though that works, it is good to have the functionality for loading into a directory as well. I'm planning to add the same functionality to 'insert overwrite directory' in this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
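A minimal sketch of the kind of statement the grammar change enables; the directory path, separators, and source table here are illustrative assumptions, not taken from the patch:

```sql
-- HIVE-3682 allowed a row format only for LOCAL DIRECTORY;
-- this change extends it to HDFS directories as well
INSERT OVERWRITE DIRECTORY '/tmp/hive_output'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
SELECT key, value FROM src;
```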
[jira] [Created] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.
Amareshwari Sriramadasu created HIVE-6410: - Summary: Allow output serializations separators to be set for HDFS path as well. Key: HIVE-6410 URL: https://issues.apache.org/jira/browse/HIVE-6410 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu HIVE-3682 adds functionality for users to set serialization constants for 'insert overwrite local directory'. The same functionality should be available for hdfs path as well. The workaround suggested is to create a table with required format and insert into the table, which enforces the users to know the schema of the result and create the table ahead. Though that works, it is good to have the functionality for loading into directory as well. I'm planning to add the same functionality in 'insert overwrite directory' in this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
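If the proposed change lands, usage might look like the following sketch, mirroring what HIVE-3682 already allows for LOCAL DIRECTORY (the directory path and delimiters here are illustrative assumptions, not taken from the patch):

```sql
-- Hypothetical usage of the syntax proposed in HIVE-6410: write query
-- results straight to an HDFS directory with user-chosen delimiters,
-- with no need to pre-create a staging table that matches the schema.
INSERT OVERWRITE DIRECTORY '/user/hive/output/report'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ':'
  COLLECTION ITEMS TERMINATED BY ','
SELECT key, value FROM src;
```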
[jira] [Commented] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice
[ https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898805#comment-13898805 ] Amareshwari Sriramadasu commented on HIVE-3682: --- Though above suggestion of creating a table and insert overwrite table works, it enforces the user to know schema of the output and create the table ahead. When queries are automated, it is difficult to always create the table ahead. I have created the issue HIVE-6410 for adding the functionality in this issue for INSERT OVERWRITE DIRECTORY as well. when output hive table to file,users should could have a separator of their own choice -- Key: HIVE-3682 URL: https://issues.apache.org/jira/browse/HIVE-3682 Project: Hive Issue Type: New Feature Components: CLI Affects Versions: 0.8.1 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux java version 1.6.0_25 hadoop-0.20.2-cdh3u0 hive-0.8.1 Reporter: caofangkun Assignee: Sushanth Sowmyan Fix For: 0.11.0 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch By default,when output hive table to file ,columns of the Hive table are separated by ^A character (that is \001). But indeed users should have the right to set a seperator of their own choice. 
Usage Example:
create table for_test (key string, value string);
load data local inpath './in1.txt' into table for_test;
select * from for_test;
UT-01: default field separator is \001, line separator is \n:
insert overwrite local directory './test-01' select * from src ;
create table array_table (a array&lt;string&gt;, b array&lt;string&gt;) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';
load data local inpath '../hive/examples/files/arraytest.txt' overwrite into table table2;
CREATE TABLE map_table (foo STRING, bar MAP&lt;STRING, STRING&gt;) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' STORED AS TEXTFILE;
UT-02: field separator defined as ':':
insert overwrite local directory './test-02' row format delimited FIELDS TERMINATED BY ':' select * from src ;
UT-03: the line separator is NOT ALLOWED to be defined as another separator:
insert overwrite local directory './test-03' row format delimited FIELDS TERMINATED BY ':' select * from src ;
UT-04: define map separators:
insert overwrite local directory './test-04' row format delimited FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' select * from src; -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6404) Fix typo in serde constants for collection delimitor
Amareshwari Sriramadasu created HIVE-6404: - Summary: Fix typo in serde constants for collection delimitor Key: HIVE-6404 URL: https://issues.apache.org/jira/browse/HIVE-6404 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Amareshwari Sriramadasu Priority: Trivial The collection delimiter is defined with a typo in serdeConstants: {noformat} public static final String COLLECTION_DELIM = "colelction.delim"; {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
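Because the constant's value carries the typo, any client that sets this serde property by its string key has to use the same misspelling; a hypothetical illustration (the table name is an assumption):

```sql
-- Hypothetical: setting the collection-item delimiter through serde
-- properties means using the misspelled key that serdeConstants defines.
-- Fixing the constant would break anyone who spells it this way today.
ALTER TABLE some_table SET SERDEPROPERTIES ('colelction.delim' = ',');
```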
[jira] [Created] (HIVE-6390) Read timeout on metastore client leads to out of sequence responses
Amareshwari Sriramadasu created HIVE-6390: - Summary: Read timeout on metastore client leads to out of sequence responses Key: HIVE-6390 URL: https://issues.apache.org/jira/browse/HIVE-6390 Project: Hive Issue Type: Bug Components: Metastore Reporter: Amareshwari Sriramadasu When client application gets Read timeout on hive metastore client, the subsequent calls to metastore fail with out of sequence response. Then, the only way to get out of this is to restart the client application. here are the exceptions: {noformat} 2014-02-04 08:42:04,132 ERROR hive.log (MetaStoreUtils) - Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912) {noformat} And subsequent calls to metastore get the following: {noformat} 2014-02-04 08:43:14,273 ERROR hive.log (MetaStoreUtils) - Got exception: org.apache.thrift.TApplicationException get_tables failed: out of sequence response org.apache.thrift.TApplicationException: get_tables failed: out of sequence response at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-6390) Read timeout on metastore client leads to out of sequence responses
[ https://issues.apache.org/jira/browse/HIVE-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved HIVE-6390. --- Resolution: Not A Problem Seems not problem with Hive metastore client, and the exceptions are handled in RetryingMetaStoreClient. The client application was getting continuous 'out of sequence response' from hive server connection. Read timeout on metastore client leads to out of sequence responses --- Key: HIVE-6390 URL: https://issues.apache.org/jira/browse/HIVE-6390 Project: Hive Issue Type: Bug Components: Metastore Reporter: Amareshwari Sriramadasu When client application gets Read timeout on hive metastore client, the subsequent calls to metastore fail with out of sequence response. Then, the only way to get out of this is to restart the client application. here are the exceptions: {noformat} 2014-02-04 08:42:04,132 ERROR hive.log (MetaStoreUtils) - Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912) {noformat} And subsequent calls to metastore get the following: 
{noformat} 2014-02-04 08:43:14,273 ERROR hive.log (MetaStoreUtils) - Got exception: org.apache.thrift.TApplicationException get_tables failed: out of sequence response org.apache.thrift.TApplicationException: get_tables failed: out of sequence response at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently
[ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828528#comment-13828528 ] Amareshwari Sriramadasu commented on HIVE-4956: --- I agree with the concerns above that this is deviating from SQL. But it gives a lot of performance improvement in distributed systems. How about changing the separator to '+' instead of ',', as part of Hive QL? The query will look like the following : {noformat} select t.x, t.y from T1+T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] {noformat} If the proposal is fine, I can upload the patch. Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently - Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But overtime, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query table on partition p1, it will be a union query across two tables T1 and T2. Especially with aggregations like avg, it becomes a costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow the from clause as a comma separated list of tables with an alias, and the alias will be used in the full query; partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only on picking up the input paths and the whole operator tree does not change.
If this sounds like a good usecase, I can put up the changes required to support the same. -- This message was sent by Atlassian JIRA (v6.1#6144)
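As a sketch of the difference, assuming the '+' separator suggested in the comment above (table, column, and partition names are illustrative):

```sql
-- Today: a query spanning T1 (partitioned by p1) and T2 (partitioned by
-- p1 and p2) needs an explicit UNION ALL subquery, which defeats
-- map-side aggregation and similar optimizations.
SELECT t.x, AVG(t.y)
FROM (SELECT x, y FROM T1 WHERE p1 = 'x'
      UNION ALL
      SELECT x, y FROM T2 WHERE p1 = 'x') t
GROUP BY t.x;

-- Proposed: both tables named in the FROM clause under one alias;
-- partition pruning would run separately against each underlying table,
-- while the rest of the operator tree stays unchanged.
SELECT t.x, AVG(t.y)
FROM T1+T2 t
WHERE t.p1 = 'x'
GROUP BY t.x;
```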
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Fix Version/s: 0.13.0 Status: Patch Available (was: Open) format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument
[ https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-5370: -- Status: Open (was: Patch Available) Looking into test failures format_number udf should take user specifed format as argument -- Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Fix For: 0.13.0 Attachments: D13185.1.patch, D13185.2.patch Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5370) format_number udf should take user specifed format as argument
Amareshwari Sriramadasu created HIVE-5370: - Summary: format_number udf should take user specifed format as argument Key: HIVE-5370 URL: https://issues.apache.org/jira/browse/HIVE-5370 Project: Hive Issue Type: Improvement Components: UDF Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Minor Currently, format_number udf formats the number to #,###,###.##, but it should also take a user specified format as optional input. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
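For illustration, the existing two-argument form takes a decimal-place count, while the proposal would also accept a format string; the exact pattern syntax below is an assumption, not defined by this issue:

```sql
-- Current behavior: the second argument is the number of decimal places,
-- applied on top of the fixed #,###,###.## grouping.
SELECT format_number(1234567.891, 2);           -- e.g. 1,234,567.89

-- Proposed (hypothetical): an optional user-specified format pattern.
SELECT format_number(1234567.891, '#,###.#');
```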
[jira] [Created] (HIVE-5326) Operators && and || do not work
Amareshwari Sriramadasu created HIVE-5326: - Summary: Operators && and || do not work Key: HIVE-5326 URL: https://issues.apache.org/jira/browse/HIVE-5326 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Though the documentation https://cwiki.apache.org/Hive/languagemanual-udf.html says they are the same as AND and OR, they do not even get parsed. User gets a parse error when they are used. hive> select key from src where key=a || key =b; FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in expression specification hive> select key from src where key=a && key =b; FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in expression specification -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
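A minimal sketch of the workaround, assuming string-typed columns: the documented SQL keywords parse fine while the C-style forms do not:

```sql
-- Works: the SQL keywords the wiki equates with || and &&.
SELECT key FROM src WHERE key = 'a' OR key = 'b';
SELECT key FROM src WHERE key = 'a' AND value = 'b';

-- Fails to parse, per this report:
-- SELECT key FROM src WHERE key = 'a' || key = 'b';
```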
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739435#comment-13739435 ] Amareshwari Sriramadasu commented on HIVE-4569: --- I think it makes sense to have two apis as JDBC drivers can call one with sync and other users interested in async can call async api. Though the documentation of execute() has to be changed to say that it is executed synchronously. GetQueryPlan api in Hive Server2 Key: HIVE-4569 URL: https://issues.apache.org/jira/browse/HIVE-4569 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Amareshwari Sriramadasu Assignee: Jaideep Dhok Attachments: git-4569.patch, HIVE-4569.D10887.1.patch, HIVE-4569.D11469.1.patch, HIVE-4569.D12231.1.patch, HIVE-4569.D12237.1.patch It would nice to have GetQueryPlan as thrift api. I do not see GetQueryPlan api available in HiveServer2, though the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API contains, not sure why it was not added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5060) JDBC driver assumes executeStatement is synchronous
[ https://issues.apache.org/jira/browse/HIVE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739437#comment-13739437 ] Amareshwari Sriramadasu commented on HIVE-5060: --- @Henry, HIVE-4569 adds another api to call execute asynchronously. After that, current code of jdbc driver should just work. If we have a synchronous api, the clients such as jdbc can fetch results after the execute immediately without bombarding the server with so many get-status calls. So, i definitely see the need for two apis. JDBC driver assumes executeStatement is synchronous --- Key: HIVE-5060 URL: https://issues.apache.org/jira/browse/HIVE-5060 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.11.0 Reporter: Henry Robinson Fix For: 0.11.1, 0.12.0 Attachments: 0001-HIVE-5060-JDBC-driver-assumes-executeStatement-is-sy.patch The JDBC driver seems to assume that {{ExecuteStatement}} is a synchronous call when performing updates via {{executeUpdate}}, where the following comment on the RPC in the Thrift file indicates otherwise: {code} // ExecuteStatement() // // Execute a statement. // The returned OperationHandle can be used to check on the // status of the statement, and to fetch results once the // statement has finished executing. {code} I understand that Hive's implementation of {{ExecuteStatement}} is blocking (see https://issues.apache.org/jira/browse/HIVE-4569), but presumably other implementations of the HiveServer2 API (and I'm talking specifically about Impala here, but others might have a similar concern) should be free to return a pollable {{OperationHandle}} per the specification. The JDBC driver's {{executeUpdate}} is as follows: {code} public int executeUpdate(String sql) throws SQLException { execute(sql); return 0; } {code} {{execute(sql)}} discards the {{OperationHandle}} that it gets from the server after determining whether there are results to be fetched. 
This is problematic for us, because Impala will cancel queries that are running when a session executes, but there's no easy way to be sure that an {{INSERT}} statement has completed before terminating a session on the client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently
Amareshwari Sriramadasu created HIVE-4956: - Summary: Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But overtime, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query table on partition p1, it will be a union query across two table T1 and T2. Especially with aggregations like avg, it becomes costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y, from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow from clause as a comma separated list of tables with an alias and alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only on picking up the input paths and whole operator tree does not change. If this sounds a good usecase, I can put up the changes required to support the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently
[ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723712#comment-13723712 ] Amareshwari Sriramadasu commented on HIVE-4956: --- The same usecase can be applied for tables stored at different rollups like daily rollups and hourly rollups. Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently - Key: HIVE-4956 URL: https://issues.apache.org/jira/browse/HIVE-4956 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu We have a usecase where the table storage partitioning changes over time. For ex: we can have a table T1 which is partitioned by p1. But overtime, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query table on partition p1, it will be a union query across two table T1 and T2. Especially with aggregations like avg, it becomes costly union query because we cannot make use of mapside aggregations and other optimizations. The proposal is to support queries of the following format : select t.x, t.y, from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on. Here we allow from clause as a comma separated list of tables with an alias and alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only on picking up the input paths and whole operator tree does not change. If this sounds a good usecase, I can put up the changes required to support the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4710) ant maven-build -Dmvn.publish.repo=local fails
Amareshwari Sriramadasu created HIVE-4710: - Summary: ant maven-build -Dmvn.publish.repo=local fails Key: HIVE-4710 URL: https://issues.apache.org/jira/browse/HIVE-4710 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Amareshwari Sriramadasu ant maven-build fails with following error : /home/amareshwaris/hive/build.xml:121: The following error occurred while executing this line: /home/amareshwaris/hive/build.xml:123: The following error occurred while executing this line: Target make-pom does not exist in the project hcatalog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-4115: -- Status: Patch Available (was: Open) Code is ready for review and checkin. Changing the status Introduce cube abstraction in hive -- Key: HIVE-4115 URL: https://issues.apache.org/jira/browse/HIVE-4115 Project: Hive Issue Type: New Feature Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: cube-design-2.pdf, cube-design.docx, HIVE-4115.D10689.1.patch, HIVE-4115.D10689.2.patch, HIVE-4115.D10689.3.patch, HIVE-4115.D10689.4.patch We would like to define a cube abstraction so that user can query at cube layer and do not know anything about storage and rollups. Will describe the model more in following comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing
[ https://issues.apache.org/jira/browse/HIVE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13664858#comment-13664858 ] Amareshwari Sriramadasu commented on HIVE-4570: --- bq. Current API GetOperationState is not enough since it returns only a state enum. Instead of changing that we can add new API GetOperationProgress() which will return both OperationState and OperationProgress. Sounds good. +1. For default implementation of getProgress(), you can return 1, if task is successful and 0, otherwise. More information to user on GetOperationStatus in Hive Server2 when query is still executing Key: HIVE-4570 URL: https://issues.apache.org/jira/browse/HIVE-4570 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu Assignee: Jaideep Dhok Currently in Hive Server2, when the query is still executing only the status is set as STILL_EXECUTING. This issue is to give more information to the user such as progress and running job handles, if possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-4569: -- Assignee: Jaideep Dhok GetQueryPlan api in Hive Server2 Key: HIVE-4569 URL: https://issues.apache.org/jira/browse/HIVE-4569 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu Assignee: Jaideep Dhok It would nice to have GetQueryPlan as thrift api. I do not see GetQueryPlan api available in HiveServer2, though the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API contains, not sure why it was not added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4569) GetQueryPlan api in Hive Server2
Amareshwari Sriramadasu created HIVE-4569: - Summary: GetQueryPlan api in Hive Server2 Key: HIVE-4569 URL: https://issues.apache.org/jira/browse/HIVE-4569 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu It would nice to have GetQueryPlan as thrift api. I do not see GetQueryPlan api available in HiveServer2, though the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API contains, not sure why it was not added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing
Amareshwari Sriramadasu created HIVE-4570: - Summary: More information to user on GetOperationStatus in Hive Server2 when query is still executing Key: HIVE-4570 URL: https://issues.apache.org/jira/browse/HIVE-4570 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu Currently in Hive Server2, when the query is still executing only the status is set as STILL_EXECUTING. This issue is to give more information to the user such as progress and running job handles, if possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652819#comment-13652819 ] Amareshwari Sriramadasu commented on HIVE-4115: --- The branch HIVE-4115 is ready for review. Also, created the phabricator entry. Changes include : * ql/src/java/org/apache/hadoop/hive/ql/cube/metadata/ has classes for Cube Metastore and CubeMetastoreClient.java has the api to create cube, fact and dimension tables. * ql/src/java/org/apache/hadoop/hive/ql/cube/parse/ has code for validating the cube ql and converting the cube ql to HQL involving final storage tables * ql/src/java/org/apache/hadoop/hive/ql/cube/processors/CubeDriver.java is the entry point for the cube query. If a query starts with 'cube', it will be processed by CubeDriver. Will add Cube DDL in a followup jira. Introduce cube abstraction in hive -- Key: HIVE-4115 URL: https://issues.apache.org/jira/browse/HIVE-4115 Project: Hive Issue Type: New Feature Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: cube-design-2.pdf, cube-design.docx, HIVE-4115.D10689.1.patch We would like to define a cube abstraction so that user can query at cube layer and do not know anything about storage and rollups. Will describe the model more in following comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-4115: -- Attachment: cube-design-2.pdf Attaching the updated design doc Introduce cube abstraction in hive -- Key: HIVE-4115 URL: https://issues.apache.org/jira/browse/HIVE-4115 Project: Hive Issue Type: New Feature Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: cube-design-2.pdf, cube-design.docx We would like to define a cube abstraction so that user can query at cube layer and do not know anything about storage and rollups. Will describe the model more in following comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4409) Prevent incompatible column type changes
[ https://issues.apache.org/jira/browse/HIVE-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642605#comment-13642605 ] Amareshwari Sriramadasu commented on HIVE-4409: --- Looks like the commit checked in /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java.orig as well. [~namitjain], Do you want to remove it? Prevent incompatible column type changes Key: HIVE-4409 URL: https://issues.apache.org/jira/browse/HIVE-4409 Project: Hive Issue Type: Improvement Components: CLI, Metastore Affects Versions: 0.10.0 Reporter: Dilip Joseph Assignee: Dilip Joseph Priority: Minor Fix For: 0.12.0 Attachments: hive.4409.1.patch, HIVE-4409.D10539.1.patch, HIVE-4409.D10539.2.patch If a user changes the type of an existing column of a partitioned table to an incompatible type, subsequent accesses of old partitions will result in a ClassCastException (see example below). We should prevent the user from making incompatible type changes. This feature will be controlled by a new config parameter. 
Example:

CREATE TABLE test_table123 (a INT, b MAP<STRING, STRING>) PARTITIONED BY (ds STRING) STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE test_table123 PARTITION(ds='foo1') SELECT 1, MAP('a1', 'b1') FROM src LIMIT 1;
SELECT * FROM test_table123 WHERE ds='foo1';
SET hive.metastore.disallow.invalid.col.type.changes=true;
ALTER TABLE test_table123 REPLACE COLUMNS (a INT, b STRING);
SELECT * FROM test_table123 WHERE ds='foo1';

The last SELECT fails with the following exception:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1406)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:790)
	at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:124)
	at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_class_cast(TestCliDriver.java:108)
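The guard proposed in HIVE-4409 amounts to a column-by-column compatibility check at ALTER TABLE time: reject a type change when old data could not be read as the new type. The sketch below is illustrative only, not Hive's actual MetaStoreUtils code; the function names and the exact list of safe widenings are assumptions.

```python
# Illustrative sketch of the kind of check that
# hive.metastore.disallow.invalid.col.type.changes enables.
# This is NOT Hive's actual MetaStoreUtils implementation.

# Widenings assumed safe because old serialized values can still be
# read as the new type (simplified, hypothetical list).
SAFE_WIDENINGS = {
    ("tinyint", "smallint"), ("tinyint", "int"), ("tinyint", "bigint"),
    ("smallint", "int"), ("smallint", "bigint"),
    ("int", "bigint"),
    ("float", "double"),
}

def is_compatible_change(old_type: str, new_type: str) -> bool:
    """Return True if changing a column from old_type to new_type is safe."""
    if old_type == new_type:
        return True
    # Primitives can be rendered as strings, but complex types cannot --
    # exactly the map<string,string> -> string case that fails above.
    if new_type == "string" and not old_type.startswith(("map<", "array<", "struct<")):
        return True
    return (old_type, new_type) in SAFE_WIDENINGS

def validate_replace_columns(old_cols, new_cols):
    """Raise if any (name, type) pair changes incompatibly."""
    for (name, old_t), (_, new_t) in zip(old_cols, new_cols):
        if not is_compatible_change(old_t, new_t):
            raise ValueError(
                f"Incompatible type change for column {name}: {old_t} -> {new_t}")
```

Under this sketch, the ALTER TABLE in the example above would be rejected up front, before any old partition becomes unreadable.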
[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638891#comment-13638891 ]

Amareshwari Sriramadasu commented on HIVE-4018:
-----------------------------------------------

[~namit], can you look at the latest patch on phabricator? I'm hoping this can get into the hive 0.11 branch.

MapJoin failing with Distributed Cache error
--------------------------------------------
                 Key: HIVE-4018
                 URL: https://issues.apache.org/jira/browse/HIVE-4018
             Project: Hive
          Issue Type: Bug
          Components: SQL
    Affects Versions: 0.11.0
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu
             Fix For: 0.11.0
         Attachments: HIVE-4018.patch, hive.4018.test.2.patch, HIVE-4018-test.patch

When I'm running a star join query after HIVE-3784, it fails with the following error:

2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load Distributed Cache Error
2013-02-13 08:36:04,585 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)
[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-4018:
------------------------------------------
    Attachment: HIVE-4018-2.txt

Attaching the latest patch.
[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616156#comment-13616156 ]

Amareshwari Sriramadasu commented on HIVE-4018:
-----------------------------------------------

After updating the patch to trunk, the test fails with an NPE again. Will see what the cause is and update.
[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616047#comment-13616047 ]

Amareshwari Sriramadasu commented on HIVE-4018:
-----------------------------------------------

[~namitjain], can you please look at the latest patch on phabricator?
[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-4115:
------------------------------------------
    Attachment: cube-design.docx

Attaching the first-cut design document for adding the cube abstraction in Hive. Pushed the code (being developed) to the branch HIVE-4115; will be developing on the branch going forward.
[jira] [Created] (HIVE-4115) Introduce cube abstraction in hive
Amareshwari Sriramadasu created HIVE-4115:
------------------------------------------
             Summary: Introduce cube abstraction in hive
                 Key: HIVE-4115
                 URL: https://issues.apache.org/jira/browse/HIVE-4115
             Project: Hive
          Issue Type: New Feature
            Reporter: Amareshwari Sriramadasu
            Assignee: Amareshwari Sriramadasu

We would like to define a cube abstraction so that users can query at the cube layer without knowing anything about storage and rollups. The model is described in more detail in the comments below.
[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593296#comment-13593296 ]

Amareshwari Sriramadasu commented on HIVE-4115:
-----------------------------------------------

Logical model:

*Cube*:
* A cube is a set of dimensions and measures in a particular subject.
* A measure is a quantity that you are interested in measuring.
* A dimension is an attribute, or set of attributes, by which you can divide measures into sub-categories.

*Fact tables*:
* A cube will have fact tables associated with it.
* A fact table has a subset of the cube's measures and dimensions.
* Fact tables can be rolled up along any dimension and along time.

*Dimensions*:
* A cube dimension can refer to a dimension table.
* A cube dimension can have a hierarchy of elements.

*Dimension tables*:
* A table with a list of columns.
* The table can have references to other dimension tables.
* Dimension tables can be shared across cubes.

*Storage*:
* A fact or dimension table can have storages associated with it.

Storage model:
A physical table will be created in the hive metastore for each fact, per storage, per rollup.
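The logical model above can be summarized as a small data structure. The sketch below is illustrative only; the class and field names are assumptions, not the schema the HIVE-4115 branch actually used.

```python
# Minimal sketch of the cube logical model described above.
# Names are illustrative, not Hive's actual metadata classes.
from dataclasses import dataclass, field

@dataclass
class DimensionTable:
    name: str
    columns: list                                    # column names
    references: dict = field(default_factory=dict)   # column -> referenced table name

@dataclass
class FactTable:
    name: str
    measures: list      # subset of the cube's measures
    dimensions: list    # subset of the cube's dimensions
    rollups: list       # e.g. ["hourly", "daily"], plus any dimension rollups

@dataclass
class Cube:
    name: str
    measures: list
    dimensions: list    # each dimension may refer to a DimensionTable
    facts: list = field(default_factory=list)

def physical_table_names(storage: str, fact: FactTable) -> list:
    """One physical metastore table per fact, per storage, per rollup."""
    return [f"{storage}_{fact.name}_{rollup}" for rollup in fact.rollups]
```

This directly mirrors the storage model statement: a fact rolled up hourly and daily on one cluster yields two physical tables there.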
[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593316#comment-13593316 ]

Amareshwari Sriramadasu commented on HIVE-4115:
-----------------------------------------------

Illustrating the above model with an example:

* Define a SALES_CUBE cube with measures Sales and Discount, and dimensions CustomerID, Location and Transaction-time.
* Dimensions:
** CustomerID is a simple dimension which refers to the customer table on column ID. CustomerTable schema: ID, Age, Gender.
** Location is a hierarchical dimension with the hierarchy Zipcode, CityID, StateID, CountryID, RegionID:
*** Zipcode refers to ZipTable on column code. ZipTable schema: code, street-name, cityID, stateID.
*** CityID refers to CityTable on column ID. CityTable schema: ID, name, stateID.
*** StateID refers to StateTable on column ID. StateTable schema: ID, name, capital, countryID.
*** CountryID refers to CountryTable on column ID. CountryTable schema: ID, name, capital, Region.
*** Region is an inline dimension with values 'APAC', 'EMEA', 'USA'.
** Transaction-time is a simple dimension with a timestamp field.
* Facts: SALES_CUBE can have the following fact tables:
## RawFact with columns Sales, Discount, CustomerId, ZipCode, Transaction-time.
## CountryFact with columns Sales, Discount, CountryID.

Physical storage tables:

In the example described above, say that RawFact is rolled up hourly on cluster C1, and daily and monthly on cluster C2; CountryFact is rolled up daily, monthly, quarterly and yearly on cluster C2. Also, the customer table is available in HBase cluster H1, and all the location tables are available in HDFS cluster C2. The physical tables would be:

* C1_Rawfact_hourly - schema: Sales, Discount, CustomerId, ZipCode, Transaction-time. Partitioned by dt and state.
* C2_Rawfact_daily - schema: Sales, Discount, CustomerId, ZipCode, Transaction-time. Partitioned by dt and state.
* C2_Rawfact_monthly - schema: Sales, Discount, CustomerId, ZipCode, Transaction-time. Partitioned by dt and state.
* C2_CountryFact_daily - schema: Sales, Discount, CountryID. Partitioned by dt.
* C2_CountryFact_monthly - schema: Sales, Discount, CountryID. Partitioned by dt.
* C2_CountryFact_quarterly - schema: Sales, Discount, CountryID. Partitioned by dt.
* C2_CountryFact_yearly - schema: Sales, Discount, CountryID. Partitioned by dt.
* H1_CustomerTable - schema: ID, Age, Gender.
* C2_ZipTable - schema: code, street-name, cityID, stateID.
* C2_CityTable - schema: ID, name, stateID.
* C2_StateTable - schema: ID, name, capital, countryID.
* C2_CountryTable - schema: ID, name, capital, Region.

If a user queries the cube with a query like the following:

* SELECT sales FROM SALES_CUBE WHERE region = 'APAC' AND time_range_in(09/01/2012, 12/31/2012) // Q4-2012

the cube abstraction would be smart enough to figure out which table to query to produce the result. In this case the query translates to:

* SELECT sales FROM C2_CountryFact_quarterly JOIN C2_CountryTable ON C2_CountryFact_quarterly.CountryID = C2_CountryTable.ID WHERE dt = 'Q4-2012' AND C2_CountryTable.region = 'APAC';
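The table-selection step in the example above — rewriting a cube query against the cheapest physical table that can still answer it — can be sketched as follows. This is an illustrative sketch using the example's table names; the selection rule (coarsest usable rollup wins) is an assumption for illustration, not the resolver that was actually implemented.

```python
# Illustrative sketch of resolving a cube query to one physical fact table.
# Metadata mirrors the example tables above; granularity is an ordinal
# (0=hourly, 1=daily, 2=monthly, 3=quarterly, 4=yearly).
CANDIDATES = {
    "C1_Rawfact_hourly":        ({"Sales", "Discount", "CustomerId", "ZipCode"}, 0),
    "C2_Rawfact_daily":         ({"Sales", "Discount", "CustomerId", "ZipCode"}, 1),
    "C2_Rawfact_monthly":       ({"Sales", "Discount", "CustomerId", "ZipCode"}, 2),
    "C2_CountryFact_daily":     ({"Sales", "Discount", "CountryID"}, 1),
    "C2_CountryFact_monthly":   ({"Sales", "Discount", "CountryID"}, 2),
    "C2_CountryFact_quarterly": ({"Sales", "Discount", "CountryID"}, 3),
    "C2_CountryFact_yearly":    ({"Sales", "Discount", "CountryID"}, 4),
}

def resolve_fact_table(needed_columns: set, max_granularity: int) -> str:
    """Pick the coarsest table that has every needed column and whose
    rollup is no coarser than the query's time range allows."""
    usable = [
        (granularity, name)
        for name, (columns, granularity) in CANDIDATES.items()
        if needed_columns <= columns and granularity <= max_granularity
    ]
    if not usable:
        raise LookupError("no physical table can answer this query")
    # Coarser rollups scan less data, so prefer the highest granularity.
    return max(usable)[1]
```

For the Q4-2012 region query above (needs Sales and CountryID, quarter-aligned range), this sketch picks C2_CountryFact_quarterly, matching the rewritten query in the comment.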
[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive
[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593337#comment-13593337 ]

Amareshwari Sriramadasu commented on HIVE-4115:
-----------------------------------------------

bq. In the example described above say that RawFact is rolled hourly in Cluster c1, is rolled daily and monthly on Cluster C2; CountryFact is rolled daily, monthly, quarterly and yearly on Cluster C2; Also, Customer table is available in HBase cluster H1; All the location tables are available in HDFS cluster C2.

Forgot to mention that, along with the time-based rollups, RawFact is also rolled up on the state dimension.
[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-4018:
------------------------------------------
    Status: Patch Available  (was: Open)

Updated the phabricator entry with the comments incorporated. Now AbstractMapJoinKey.readExternal uses MapJoinOperator's static variable, and writeExternal uses HashTableSinkOperator's static variable.
[jira] [Resolved] (HIVE-3655) Use ChainMapper and ChainReducer for queries having [Map+][RMap*] pattern
[ https://issues.apache.org/jira/browse/HIVE-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved HIVE-3655.
-------------------------------------------
    Resolution: Won't Fix

Use ChainMapper and ChainReducer for queries having [Map+][RMap*] pattern
-------------------------------------------------------------------------
                 Key: HIVE-3655
                 URL: https://issues.apache.org/jira/browse/HIVE-3655
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Amareshwari Sriramadasu

While breaking the query plan into multiple map-reduce tasks, Hive should consider the pattern [Map+][ReduceMap*] and generate a single map-reduce job for such patterns using ChainMapper and ChainReducer.
[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job
[ https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588233#comment-13588233 ]

Amareshwari Sriramadasu commented on HIVE-3952:
-----------------------------------------------

Tried out the patch. When we run a query like:

INSERT OVERWRITE DIRECTORY /dir SELECT ...

it fails with this exception:
{noformat}
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.MoveTask cannot be cast to org.apache.hadoop.hive.ql.exec.MapRedTask
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.mayBeMergeMapJoinTaskWithMapReduceTask(CommonJoinResolver.java:291)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:535)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:701)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
	at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:113)
	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8138)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8470)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:259)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:898)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{noformat}

merge map-job followed by map-reduce job
----------------------------------------
                 Key: HIVE-3952
                 URL: https://issues.apache.org/jira/browse/HIVE-3952
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Vinod Kumar Vavilapalli
         Attachments: HIVE-3952-20130226.txt

Consider a query like:

select count(*)
FROM (
  select idOne, idTwo, value
  FROM bigTable
  JOIN smallTableOne on (bigTable.idOne = smallTableOne.idOne)
) firstjoin
JOIN smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);

where smallTableOne and smallTableTwo are smaller than hive.auto.convert.join.noconditionaltask.size and hive.auto.convert.join.noconditionaltask is set to true. The joins are collapsed into mapjoins, and this leads to a map-only job (for the map-joins) followed by a map-reduce job (for the group by). Ideally, the map-only job should be merged with the following map-reduce job.
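The ClassCastException above comes from the dispatcher assuming every child task in the plan graph is a MapRedTask, while an INSERT OVERWRITE DIRECTORY plan also contains a MoveTask. Guarding the downcast with a type check is the usual shape of such a fix; the sketch below is illustrative Python with stand-in classes, not the actual CommonJoinResolver patch.

```python
# Illustrative sketch of guarding a downcast while walking a task graph.
# Task, MapRedTask and MoveTask are stand-ins for Hive's Java classes.

class Task:
    def __init__(self, children=None):
        self.children = children or []

class MapRedTask(Task):
    pass

class MoveTask(Task):
    pass

def mergeable_mapred_children(task: Task) -> list:
    """Return only the children that really are MapRedTasks.
    The reported bug is equivalent to casting every child to
    MapRedTask, which fails when a MoveTask (from INSERT
    OVERWRITE DIRECTORY) appears in the graph."""
    return [child for child in task.children if isinstance(child, MapRedTask)]
```

With the guard, a plan whose children are a MapRedTask and a MoveTask yields one merge candidate instead of throwing.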
[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586930#comment-13586930 ]

Amareshwari Sriramadasu commented on HIVE-4018:
-----------------------------------------------

Phabricator request - https://reviews.facebook.net/D8913
[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-4018: -- Status: Patch Available (was: Open) MapJoin failing with Distributed Cache error Key: HIVE-4018 URL: https://issues.apache.org/jira/browse/HIVE-4018
[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-4018: -- Attachment: HIVE-4018.patch Here is a patch which fixes the issue, with a testcase added. MapJoin failing with Distributed Cache error Key: HIVE-4018 URL: https://issues.apache.org/jira/browse/HIVE-4018
[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581110#comment-13581110 ] Amareshwari Sriramadasu commented on HIVE-4018: --- bq. this is an existing bug. Do you really think we should fix this now? I mean, it is a pretty big and fundamental change. I would say this should be fixed, because earlier we were able to run the same multi-join query using 2 MR jobs, with a mapjoin hint passed in a nested structure for each join as described in HIVE-3652. Now there is no way to do a mapjoin for this multiway join, as the same query fails with this error after the changes of HIVE-3784. It becomes more troublesome because there are no more mapjoin hints, and we will have to explicitly turn off autojoin for such queries. MapJoin failing with Distributed Cache error Key: HIVE-4018 URL: https://issues.apache.org/jira/browse/HIVE-4018
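The nested mapjoin-hint style referred to in the comment above can be sketched as follows. Table and column names here are hypothetical illustrations, not taken from the failing query in this thread:

```sql
-- Hypothetical star join with an explicit MAPJOIN hint (pre-auto-conversion style).
-- The hint asks Hive to load the listed small dimension tables into memory
-- and perform the joins on the map side.
SELECT /*+ MAPJOIN(d_product, d_store) */
       f.amount, d_product.name, d_store.region
FROM fact_sales f
JOIN d_product ON f.product_id = d_product.id
JOIN d_store   ON f.store_id   = d_store.id;
```

With hints no longer honored after the HIVE-3784 changes, the workaround the comment alludes to is disabling automatic join conversion (hive.auto.convert.join=false) for such queries.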
[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error
[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581149#comment-13581149 ] Amareshwari Sriramadasu commented on HIVE-4018: --- bq. I agree from your point of view. Do you think you would be able to help on this? Sure, will give it a try. MapJoin failing with Distributed Cache error Key: HIVE-4018 URL: https://issues.apache.org/jira/browse/HIVE-4018
[jira] [Updated] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-3652: -- Attachment: HIVE-3652-tests.patch Attaching a test with .q and .out files, which launches two MR jobs for star join queries. Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Vikram Dixit K Fix For: 0.11.0 Attachments: HIVE-3652-tests.patch Currently, if we join one fact table with multiple dimension tables, it results in multiple mapreduce jobs, one for each join with a dimension table, because the join is on a different key for each dimension. Usually all the dimension tables are small and can fit into memory, so a map-side join can be used to join them with the fact table. In this issue I want to look at optimizing such a query to generate a single mapreduce job, so that the mapper loads the dimension tables into memory and joins them with the fact table on the different keys as well.
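The pattern described in the issue can be sketched as a fact table joined with two dimensions on different keys (table and column names are hypothetical):

```sql
-- Hypothetical star-schema query: each join uses a different join key,
-- so without the proposed optimization Hive plans one MapReduce job per join.
SELECT f.amount, d_product.name, d_store.region
FROM fact_sales f
JOIN d_product ON f.product_id = d_product.id   -- join key 1
JOIN d_store   ON f.store_id   = d_store.id;    -- join key 2
```

The optimization discussed here is to load both small dimension tables into the mappers' memory and perform both joins in a single map-only job.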
[jira] [Commented] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577403#comment-13577403 ] Amareshwari Sriramadasu commented on HIVE-3652: --- Seems I figured it out. hive.auto.convert.join.noconditionaltask.size is not the number of rows. When I changed the hive.auto.convert.join.noconditionaltask.size value in the attached tests, it launched one MR job. Will upload the patch again to add tests. Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652
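As the comment notes, hive.auto.convert.join.noconditionaltask.size is a size threshold (the combined size of the small tables, in bytes), not a row count. A minimal session sketch, with an illustrative value rather than the one used in the attached tests:

```sql
-- Enable automatic map-join conversion.
SET hive.auto.convert.join = true;
-- Convert directly, without the conditional backup task.
SET hive.auto.convert.join.noconditionaltask = true;
-- Threshold is in BYTES (sum of the small-table sizes), not rows;
-- 10000000 (~10 MB) is an illustrative value.
SET hive.auto.convert.join.noconditionaltask.size = 10000000;
```

If the combined size of the n-1 small tables in a join stays under this threshold, Hive can merge the joins into a single map-join job, which is why raising the value made the star join queries run as one MR job.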
[jira] [Updated] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-3652: -- Attachment: HIVE-3652-tests.patch Attaching the tests again. With hive.auto.convert.join.noconditionaltask.size increased, it launches a single MR job for the queries. Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652
[jira] [Created] (HIVE-4018) MapJoin failing with Distributed Cache error
Amareshwari Sriramadasu created HIVE-4018: - Summary: MapJoin failing with Distributed Cache error Key: HIVE-4018 URL: https://issues.apache.org/jira/browse/HIVE-4018 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.11.0 Reporter: Amareshwari Sriramadasu Fix For: 0.11.0
[jira] [Resolved] (HIVE-3652) Join optimization for star schema
[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved HIVE-3652. --- Resolution: Duplicate Join optimization for star schema - Key: HIVE-3652 URL: https://issues.apache.org/jira/browse/HIVE-3652