[jira] [Created] (HIVE-14264) ArrayIndexOutOfBoundsException when cbo is enabled

2016-07-18 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-14264:
--

 Summary: ArrayIndexOutOfBoundsException when cbo is enabled 
 Key: HIVE-14264
 URL: https://issues.apache.org/jira/browse/HIVE-14264
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 2.1.0
Reporter: Amareshwari Sriramadasu


We have noticed an ArrayIndexOutOfBoundsException for queries with an IS NOT NULL 
filter. The exception goes away when hive.cbo.enable=false.

Here is a stacktrace from our production environment:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:418) ~[na:1.8.0_72]
at java.util.ArrayList.set(ArrayList.java:446) ~[na:1.8.0_72]
at 
org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:173)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:274) 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10764)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:234)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
 ~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:436) 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:328) 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1156) 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1143) 
~[hive-exec-2.1.2-inm.jar:2.1.2-inm]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:147)
 ~[hive-service-2.1.2-inm.jar:2.1.2-inm]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13862) org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilter falls back to ORM

2016-05-26 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-13862:
--

 Summary: 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilter
 falls back to ORM 
 Key: HIVE-13862
 URL: https://issues.apache.org/jira/browse/HIVE-13862
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Amareshwari Sriramadasu
Assignee: Rajat Khandelwal
 Fix For: 2.1.0


We are seeing the following exception, and calls fall back to ORM, which makes 
them costly:

{noformat}
 WARN  org.apache.hadoop.hive.metastore.ObjectStore - Direct SQL failed, 
falling back to ORM
java.lang.ClassCastException: 
org.datanucleus.store.rdbms.query.ForwardQueryResult cannot be cast to 
java.lang.Number
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlInt(MetaStoreDirectSql.java:892)
 ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:855)
 ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getNumPartitionsViaSqlFilter(MetaStoreDirectSql.java:405)
 ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:2763)
 ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:2755)
 ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2606)
 ~[hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.ObjectStore.getNumPartitionsByFilterInternal(ObjectStore.java:2770)
 [hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]
at 
org.apache.hadoop.hive.metastore.ObjectStore.getNumPartitionsByFilter(ObjectStore.java:2746)
 [hive-exec-2.1.2-inm-SNAPSHOT.jar:2.1.2-inm-SNAPSHOT]

{noformat}
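The cast failure suggests the datanucleus query hands back a result object wrapping the row rather than a bare Number. A hedged sketch of a fix along those lines is below; the class and method names mirror the report, but this is illustrative, not the actual patch:

```java
import java.util.List;

// Hedged sketch: MetaStoreDirectSql.extractSqlInt expects a Number, but the
// query here returns a result collection (ForwardQueryResult) wrapping the row.
// Unwrapping a single-element result before casting avoids the
// ClassCastException. Illustrative only, not the actual Hive fix.
class SqlIntExtractor {
  static int extractSqlInt(Object field) {
    Object value = field;
    if (value instanceof List && ((List<?>) value).size() == 1) {
      value = ((List<?>) value).get(0);  // unwrap a single-row query result
    }
    if (!(value instanceof Number)) {
      throw new ClassCastException(
          value.getClass().getName() + " cannot be cast to java.lang.Number");
    }
    return ((Number) value).intValue();
  }
}
```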





[jira] [Created] (HIVE-11482) Add retrying thrift client for HiveServer2

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11482:
--

 Summary: Add retrying thrift client for HiveServer2
 Key: HIVE-11482
 URL: https://issues.apache.org/jira/browse/HIVE-11482
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu


Similar to 
https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java, 
this improvement request is to add a retrying thrift client for HiveServer2 that 
retries upon thrift exceptions.

Here are a few commits done on a forked branch that can be picked - 
https://github.com/InMobi/hive/commit/7fb957fb9c2b6000d37c53294e256460010cb6b7
https://github.com/InMobi/hive/commit/11e4b330f051c3f58927a276d562446761c9cd6d
https://github.com/InMobi/hive/commit/241386fd870373a9253dca0bcbdd4ea7e665406c
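A retrying client in the spirit of RetryingMetaStoreClient can be sketched with a dynamic proxy. The class name, retry count, and retry-on-any-failure policy below are illustrative assumptions, not the contents of the linked commits:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Toy service interface for the usage example below.
interface Flaky { int call(); }

// Hedged sketch of a retrying client: a dynamic proxy that re-invokes the
// delegate a fixed number of times when a call fails, similar in spirit to
// RetryingMetaStoreClient. Names and policy are illustrative.
class RetryingClient implements InvocationHandler {
  private static final int MAX_RETRIES = 3;  // illustrative policy
  private final Object delegate;

  private RetryingClient(Object delegate) { this.delegate = delegate; }

  @SuppressWarnings("unchecked")
  static <T> T wrap(Class<T> iface, T delegate) {
    return (T) Proxy.newProxyInstance(iface.getClassLoader(),
        new Class<?>[] { iface }, new RetryingClient(delegate));
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    Throwable last = null;
    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
      try {
        return method.invoke(delegate, args);
      } catch (InvocationTargetException e) {
        last = e.getCause();  // retry on the underlying (e.g. transport) failure
      }
    }
    throw last;
  }
}
```

A caller would wrap a concrete client once, e.g. `Flaky c = RetryingClient.wrap(Flaky.class, realClient);`, and every interface call then retries transparently.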





[jira] [Created] (HIVE-11483) Add encoding and decoding for query string config

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11483:
--

 Summary: Add encoding and decoding for query string config
 Key: HIVE-11483
 URL: https://issues.apache.org/jira/browse/HIVE-11483
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu


We have seen some queries in production where some of the literals passed in 
the query contain control characters, which results in an exception when the 
query string is set in the job XML.

Proposing a solution to encode the query string in the configuration and provide 
getters that return the decoded string.

Here is a commit in a forked repo : 
https://github.com/InMobi/hive/commit/2faf5761191fa3103a0d779fde584d494ed75bf5

Suggestions are welcome on the solution.
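The proposal can be sketched as a small codec: Base64 keeps the stored value free of control characters, and the getter decodes it. The class name is illustrative, not Hive's actual API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hedged sketch of the idea: store the query string Base64-encoded so control
// characters in literals survive serialization into the job XML, and decode it
// back in the getter. Illustrative names, not the linked commit.
class QueryStringCodec {
  static String encode(String query) {
    return Base64.getEncoder().encodeToString(query.getBytes(StandardCharsets.UTF_8));
  }

  static String decode(String encoded) {
    return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
  }
}
```

Base64 output uses only `A-Za-z0-9+/=`, so the encoded value is always safe to embed in XML attribute/element content.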





[jira] [Created] (HIVE-11486) Hive should log exceptions for better debuggability with full trace

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11486:
--

 Summary: Hive should log exceptions for better debuggability with 
full trace
 Key: HIVE-11486
 URL: https://issues.apache.org/jira/browse/HIVE-11486
 Project: Hive
  Issue Type: Improvement
  Components: Diagnosability
Reporter: Amareshwari Sriramadasu


For example:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2638
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java#L315
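The linked call sites log only the exception message, which drops the stack frames. A minimal illustration of the difference (plain JDK helpers, not Hive's logging setup):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Minimal illustration of the request: the exception message alone drops the
// stack frames needed for debugging, while printing the Throwable keeps the
// full trace. (Hive would pass the Throwable to its logger instead.)
class TraceLogging {
  static String messageOnly(Throwable t) {
    return String.valueOf(t.getMessage());  // what LOG.error(e.getMessage()) records
  }

  static String fullTrace(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();                   // what LOG.error(msg, e) records
  }
}
```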





[jira] [Created] (HIVE-11485) Session close should not close async SQL operations

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11485:
--

 Summary: Session close should not close async SQL operations
 Key: HIVE-11485
 URL: https://issues.apache.org/jira/browse/HIVE-11485
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu


Right now, session close on HiveServer2 closes all operations. But running 
queries are actually available across sessions and are not tied to a session 
(except the launch, which requires configuration and resources), and their 
status can be fetched across sessions.

However, closing the session on which an operation was launched closes all its 
operations as well.

So, we should avoid closing all operations upon closing a session.





[jira] [Created] (HIVE-11487) Add getNumPartitionsByFilter api in metastore api

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11487:
--

 Summary: Add getNumPartitionsByFilter api in metastore api
 Key: HIVE-11487
 URL: https://issues.apache.org/jira/browse/HIVE-11487
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Amareshwari Sriramadasu


Adding an api for getting the number of partitions matching a filter is more 
optimal when we are only interested in the count. getAllPartitions constructs 
all the partition objects, which can be time consuming and is not required.

Here is a commit we pushed in a forked repo in our organization - 
https://github.com/inmobi/hive/commit/68b3534d3e6c4d978132043cec668798ed53e444.
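The reasoning can be sketched generically: counting matches avoids materializing one object per partition. This is an illustrative stand-in, not the metastore API:

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative stand-in for the motivation (not the metastore API): when only
// the count is needed, counting matches avoids building a result object per
// partition, which a getPartitionsByFilter-style call would do.
class PartitionCounts {
  static <T> int countByFilter(List<T> partitions, Predicate<T> filter) {
    int n = 0;
    for (T p : partitions) {
      if (filter.test(p)) n++;  // no per-partition object is materialized
    }
    return n;
  }
}
```

In the real metastore the same idea goes further: a direct `SELECT COUNT(*)` avoids even fetching the rows, not just building the objects.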





[jira] [Created] (HIVE-11484) Fix ObjectInspector for Char and VarChar

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11484:
--

 Summary: Fix ObjectInspector for Char and VarChar
 Key: HIVE-11484
 URL: https://issues.apache.org/jira/browse/HIVE-11484
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Amareshwari Sriramadasu


The creation of HiveChar and HiveVarchar is not happening through the 
ObjectInspector.

Here is a fix we pushed internally : 
https://github.com/InMobi/hive/commit/fe95c7850e7130448209141155f28b25d3504216





[jira] [Created] (HIVE-10435) Make HiveSession implementation pluggable through configuration

2015-04-22 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-10435:
--

 Summary: Make HiveSession implementation pluggable through 
configuration
 Key: HIVE-10435
 URL: https://issues.apache.org/jira/browse/HIVE-10435
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu


SessionManager in CLIService creates and keeps track of HiveSession objects. 
Right now, it creates HiveSessionImpl, which is one implementation of 
HiveSession. This improvement request is to make the implementation pluggable 
through a configuration so that other implementations can be passed.
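A minimal sketch of the idea, assuming a hypothetical config key (`hive.session.impl.classname`) and a toy Session interface standing in for HiveSession:

```java
import java.util.Properties;

// Minimal sketch of "pluggable through configuration": read the implementation
// class name from a config key and instantiate it reflectively. The key name
// and the Session interface are hypothetical stand-ins for
// HiveSession/HiveSessionImpl.
class SessionFactory {
  static final String IMPL_KEY = "hive.session.impl.classname";  // hypothetical

  interface Session { String user(); }

  public static class DefaultSession implements Session {
    public DefaultSession() {}
    public String user() { return "anonymous"; }
  }

  static Session create(Properties conf) {
    // Fall back to the default implementation when no override is configured.
    String cls = conf.getProperty(IMPL_KEY, DefaultSession.class.getName());
    try {
      return (Session) Class.forName(cls).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate session class " + cls, e);
    }
  }
}
```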







[jira] [Commented] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock

2015-01-09 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270779#comment-14270779
 ] 

Amareshwari Sriramadasu commented on HIVE-9324:
---

After doing some code walkthrough, here is what I found.

In JoinOperator, whenever any key has more values than BLOCKSIZE (hardcoded to 
25000), it spills the values to a file on disk, and the spill uses SequenceFile 
format.

Here is the table description for the spill (from 
org.apache.hadoop.hive.ql.exec.JoinUtil.java):
{noformat}
  TableDesc tblDesc = new TableDesc(
      SequenceFileInputFormat.class, HiveSequenceFileOutputFormat.class,
      Utilities.makeProperties(
          org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT, ""
              + Utilities.ctrlaCode,
          org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMNS, colNames
              .toString(),
          org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMN_TYPES,
          colTypes.toString(),
          serdeConstants.SERIALIZATION_LIB, LazyBinarySerDe.class.getName()));
  spillTableDesc[tag] = tblDesc;
{noformat}
From the exception:
{noformat}
Caused by: java.io.IOException: 
org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 
27264
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
... 13 more
{noformat}

I see that the value in the SequenceFile is RCFile$KeyBuffer; I don't know why. 
I also couldn't figure out why the reading went wrong.

Following is the code snippet from SequenceFile.java for the exception we are 
hitting:
{noformat}
2417 public synchronized Object next(Object key) throws IOException {
2418   if (key != null && key.getClass() != getKeyClass()) {
2419     throw new IOException("wrong key class: " + key.getClass().getName()
2420       + " is not " + keyClass);
2421   }
2422 
2423   if (!blockCompressed) {
2424     outBuf.reset();
2425 
2426     keyLength = next(outBuf);
2427     if (keyLength < 0)
2428       return null;
2429 
2430     valBuffer.reset(outBuf.getData(), outBuf.getLength());
2431 
2432     key = deserializeKey(key);
2433     valBuffer.mark(0);
2434     if (valBuffer.getPosition() != keyLength)
2435       throw new IOException(key + " read " + valBuffer.getPosition()
2436         + " bytes, should read " + keyLength);
{noformat}

 Reduce side joins failing with IOException from RowContainer.nextBlock
 --

 Key: HIVE-9324
 URL: https://issues.apache.org/jira/browse/HIVE-9324
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Amareshwari Sriramadasu

 We are seeing some reduce side join mapreduce jobs failing with following 
 exception :
 {noformat}
 2014-12-14 16:58:51,296 ERROR 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer: 
 org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should 
 read 27264
 java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 
 read 1 bytes, should read 27264
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
   at 
 org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
   at org.apache.hadoop.mapred.Child.main(Child.java:262)
 2014-12-14 16:58:51,334 FATAL ExecReducer: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 

[jira] [Created] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock

2015-01-08 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-9324:
-

 Summary: Reduce side joins failing with IOException from 
RowContainer.nextBlock
 Key: HIVE-9324
 URL: https://issues.apache.org/jira/browse/HIVE-9324
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Amareshwari Sriramadasu


We are seeing some reduce side join mapreduce jobs failing with the following 
exception:

{noformat}
2014-12-14 16:58:51,296 ERROR 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer: 
org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 
27264
java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 
1 bytes, should read 27264
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2014-12-14 16:58:51,334 FATAL ExecReducer: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 
27264
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 
1 bytes, should read 27264
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
... 12 more
Caused by: java.io.IOException: 
org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 
27264
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
at 
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
... 13 more

{noformat}





[jira] [Commented] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock

2015-01-08 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270588#comment-14270588
 ] 

Amareshwari Sriramadasu commented on HIVE-9324:
---

More task log :

{noformat}
2014-12-14 16:58:03,905 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: 
Ignoring retrieval request: __REDUCE_PLAN__
2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG 
method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.exec.Utilities: 
Deserializing ReduceWork via kryo
2014-12-14 16:58:04,987 INFO org.apache.hadoop.hive.ql.log.PerfLogger: 
</PERFLOG method=deserializePlan start=1418576283945 end=1418576284987 
duration=1042 from=org.apache.hadoop.hive.ql.exec.Utilities>
2014-12-14 16:58:04,988 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: 
Ignoring cache key: __REDUCE_PLAN__
2014-12-14 16:58:05,327 INFO ExecReducer: 
<JOIN>Id =0
  <Children>
<FS>Id =1
  <Children>
  <\Children>
  <Parent>Id = 0 null<\Parent>
<\FS>
  <\Children>
  <Parent><\Parent>
<\JOIN>
2014-12-14 16:58:05,327 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Initializing Self 0 JOIN
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
JOIN 
struct<_col23:string,_col65:double,_col99:double,_col237:double,_col240:double,_col250:string,_col367:int>
 totalsz = 7
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Operator 0 JOIN initialized
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Initializing children of 0 JOIN
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initializing child 1 FS
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initializing Self 1 FS
2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Operator 1 FS initialized
2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initialization Done 1 FS
2014-12-14 16:58:05,395 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Initialization Done 0 JOIN
2014-12-14 16:58:05,401 INFO ExecReducer: ExecReducer: processing 1 rows: used 
memory = 242598168
2014-12-14 16:58:05,406 INFO ExecReducer: ExecReducer: processing 10 rows: used 
memory = 242759392
2014-12-14 16:58:05,437 INFO ExecReducer: ExecReducer: processing 100 rows: 
used memory = 242759392
2014-12-14 16:58:05,657 INFO ExecReducer: ExecReducer: processing 1000 rows: 
used memory = 243653240
2014-12-14 16:58:06,976 INFO ExecReducer: ExecReducer: processing 1 rows: 
used memory = 247197944
2014-12-14 16:58:07,646 INFO ExecReducer: ExecReducer: processing 10 rows: 
used memory = 277801256
2014-12-14 16:58:11,511 INFO ExecReducer: ExecReducer: processing 100 rows: 
used memory = 283150744
2014-12-14 16:58:14,993 INFO ExecReducer: ExecReducer: processing 200 rows: 
used memory = 293036992
2014-12-14 16:58:18,497 INFO ExecReducer: ExecReducer: processing 300 rows: 
used memory = 311449488
2014-12-14 16:58:20,815 INFO ExecReducer: ExecReducer: processing 400 rows: 
used memory = 285251752
2014-12-14 16:58:26,460 INFO ExecReducer: ExecReducer: processing 500 rows: 
used memory = 328223864
2014-12-14 16:58:29,412 INFO ExecReducer: ExecReducer: processing 600 rows: 
used memory = 263175576
2014-12-14 16:58:31,331 INFO ExecReducer: ExecReducer: processing 700 rows: 
used memory = 282021320
2014-12-14 16:58:35,099 INFO ExecReducer: ExecReducer: processing 800 rows: 
used memory = 299301184
2014-12-14 16:58:37,981 INFO ExecReducer: ExecReducer: processing 900 rows: 
used memory = 306925648
2014-12-14 16:58:40,506 INFO ExecReducer: ExecReducer: processing 1000 
rows: used memory = 307407920
2014-12-14 16:58:42,242 INFO ExecReducer: ExecReducer: processing 1100 
rows: used memory = 304664048
2014-12-14 16:58:46,142 INFO ExecReducer: ExecReducer: processing 1200 
rows: used memory = 298347024
2014-12-14 16:58:48,549 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 1000 rows for join key [003b9de7876541c2bcce9029ff0d3873]
2014-12-14 16:58:48,622 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 2000 rows for join key [003b9de7876541c2bcce9029ff0d3873]
2014-12-14 16:58:48,677 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 4000 rows for join key [003b9de7876541c2bcce9029ff0d3873]
2014-12-14 16:58:48,679 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Final Path: FS 
hdfs://data-grill300-null.arshad.ev1.inmobi.com:8020/tmp/hive-dataqa/hive_2014-12-14_16-49-14_996_1630664550753106415-32/_tmp.-mr-10002/00_0
2014-12-14 16:58:48,680 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Writing to temp file: FS 

[jira] [Comment Edited] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock

2015-01-08 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270588#comment-14270588
 ] 

Amareshwari Sriramadasu edited comment on HIVE-9324 at 1/9/15 5:54 AM:
---

More task log :

{noformat}
2014-12-14 16:58:03,905 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: 
Ignoring retrieval request: __REDUCE_PLAN__
2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG 
method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2014-12-14 16:58:03,945 INFO org.apache.hadoop.hive.ql.exec.Utilities: 
Deserializing ReduceWork via kryo
2014-12-14 16:58:04,987 INFO org.apache.hadoop.hive.ql.log.PerfLogger: 
</PERFLOG method=deserializePlan start=1418576283945 end=1418576284987 
duration=1042 from=org.apache.hadoop.hive.ql.exec.Utilities>
2014-12-14 16:58:04,988 INFO org.apache.hadoop.hive.ql.exec.mr.ObjectCache: 
Ignoring cache key: __REDUCE_PLAN__
2014-12-14 16:58:05,327 INFO ExecReducer: 
<JOIN>Id =0
  <Children>
<FS>Id =1
  <Children>
  <\Children>
  <Parent>Id = 0 null<\Parent>
<\FS>
  <\Children>
  <Parent><\Parent>
<\JOIN>
2014-12-14 16:58:05,327 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Initializing Self 0 JOIN
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
JOIN 
struct<_col23:string,_col65:double,_col99:double,_col237:double,_col240:double,_col250:string,_col367:int>
 totalsz = 7
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Operator 0 JOIN initialized
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Initializing children of 0 JOIN
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initializing child 1 FS
2014-12-14 16:58:05,377 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initializing Self 1 FS
2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Operator 1 FS initialized
2014-12-14 16:58:05,394 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Initialization Done 1 FS
2014-12-14 16:58:05,395 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 
Initialization Done 0 JOIN
2014-12-14 16:58:05,401 INFO ExecReducer: ExecReducer: processing 1 rows: used 
memory = 242598168
2014-12-14 16:58:05,406 INFO ExecReducer: ExecReducer: processing 10 rows: used 
memory = 242759392
2014-12-14 16:58:05,437 INFO ExecReducer: ExecReducer: processing 100 rows: 
used memory = 242759392
2014-12-14 16:58:05,657 INFO ExecReducer: ExecReducer: processing 1000 rows: 
used memory = 243653240
2014-12-14 16:58:06,976 INFO ExecReducer: ExecReducer: processing 1 rows: 
used memory = 247197944
2014-12-14 16:58:07,646 INFO ExecReducer: ExecReducer: processing 10 rows: 
used memory = 277801256
2014-12-14 16:58:11,511 INFO ExecReducer: ExecReducer: processing 100 rows: 
used memory = 283150744
2014-12-14 16:58:14,993 INFO ExecReducer: ExecReducer: processing 200 rows: 
used memory = 293036992
2014-12-14 16:58:18,497 INFO ExecReducer: ExecReducer: processing 300 rows: 
used memory = 311449488
2014-12-14 16:58:20,815 INFO ExecReducer: ExecReducer: processing 400 rows: 
used memory = 285251752
2014-12-14 16:58:26,460 INFO ExecReducer: ExecReducer: processing 500 rows: 
used memory = 328223864
2014-12-14 16:58:29,412 INFO ExecReducer: ExecReducer: processing 600 rows: 
used memory = 263175576
2014-12-14 16:58:31,331 INFO ExecReducer: ExecReducer: processing 700 rows: 
used memory = 282021320
2014-12-14 16:58:35,099 INFO ExecReducer: ExecReducer: processing 800 rows: 
used memory = 299301184
2014-12-14 16:58:37,981 INFO ExecReducer: ExecReducer: processing 900 rows: 
used memory = 306925648
2014-12-14 16:58:40,506 INFO ExecReducer: ExecReducer: processing 1000 
rows: used memory = 307407920
2014-12-14 16:58:42,242 INFO ExecReducer: ExecReducer: processing 1100 
rows: used memory = 304664048
2014-12-14 16:58:46,142 INFO ExecReducer: ExecReducer: processing 1200 
rows: used memory = 298347024
2014-12-14 16:58:48,549 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 1000 rows for join key [003b9de7876541c2bcce9029ff0d3873]
2014-12-14 16:58:48,622 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 2000 rows for join key [003b9de7876541c2bcce9029ff0d3873]
2014-12-14 16:58:48,677 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: 
table 0 has 4000 rows for join key [003b9de7876541c2bcce9029ff0d3873]
2014-12-14 16:58:48,679 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Final Path: FS 
hdfs://test-machine:8020/tmp/hive-dataqa/hive_2014-12-14_16-49-14_996_1630664550753106415-32/_tmp.-mr-10002/00_0
2014-12-14 16:58:48,680 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
Writing to temp file: FS 

[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive

2014-11-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4115:
--
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

This effort has been incubated into Apache here - 
http://incubator.apache.org/projects/lens.html

 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-4115.D10689.1.patch, HIVE-4115.D10689.2.patch, 
 HIVE-4115.D10689.3.patch, HIVE-4115.D10689.4.patch, cube-design-2.pdf, 
 cube-design.docx


 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.





[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive

2014-09-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7892:
--
   Resolution: Fixed
Fix Version/s: 0.14.0
 Release Note: Maps thrift's set type to hive's array type.
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Satish !

 Thrift Set type not working with Hive
 -

 Key: HIVE-7892
 URL: https://issues.apache.org/jira/browse/HIVE-7892
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Satish Mittal
Assignee: Satish Mittal
 Fix For: 0.14.0

 Attachments: HIVE-7892.1.patch, HIVE-7892.patch.txt


 Thrift supports List, Map and Struct complex types, which get mapped to 
 Array, Map and Struct complex types in Hive respectively. However thrift Set 
 type doesn't seem to be working. 
 Here is an example thrift struct:
 {noformat}
 namespace java sample.thrift
 struct setrow {
 1: required set<i32> ids,
 2: required string name,
 }
 {noformat}
 A Hive table is created with ROW FORMAT SERDE 
 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH 
 SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 
 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol').
 Describing the table shows:
 {noformat}
 hive> describe settable; 
 OK
 ids        struct<>        from deserializer   
 name       string          from deserializer
 {noformat}
 Issuing a select query on set column throws SemanticException:
 {noformat}
 hive> select ids from settable;
 FAILED: SemanticException java.lang.IllegalArgumentException: Error: name 
 expected at the position 7 of 'struct<>' but '>' is found.
 {noformat}
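The fix (per the release note above) maps Thrift's set type to Hive's array type. A minimal, hypothetical sketch of that mapping idea — this is not Hive's actual deserializer code, just an illustration that a deserialized Java Set can be exposed as a List, which is how Hive backs an array:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of mapping a Thrift set field to a Hive array.
public class SetAsArray {
    // Expose a Thrift set as a Hive array (Java List); the copy
    // preserves the set's iteration order.
    public static <T> List<T> asHiveArray(Set<T> thriftSet) {
        return new ArrayList<>(thriftSet);
    }

    public static void main(String[] args) {
        Set<Integer> ids = new LinkedHashSet<>();
        ids.add(1);
        ids.add(2);
        ids.add(3);
        List<Integer> hiveArray = asHiveArray(ids);
        System.out.println(hiveArray); // prints [1, 2, 3]
    }
}
```

With this mapping, the `ids` column above would describe as `array<int>` instead of the unparsable `struct<>`.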





[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types

2014-09-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133617#comment-14133617
 ] 

Amareshwari Sriramadasu commented on HIVE-7936:
---

+1 Patch looks fine.

Can you update the test output here?

 Support for handling Thrift Union types 
 

 Key: HIVE-7936
 URL: https://issues.apache.org/jira/browse/HIVE-7936
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, 
 complex.seq


 Currently hive does not support thrift unions through ThriftDeserializer. 
 Need to add support for the same





[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types

2014-09-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7936:
--
  Resolution: Fixed
Release Note: Support Thrift union type as hive union type
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Suma!

 Support for handling Thrift Union types 
 

 Key: HIVE-7936
 URL: https://issues.apache.org/jira/browse/HIVE-7936
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, 
 complex.seq


 Currently hive does not support thrift unions through ThriftDeserializer. 
 Need to add support for the same





[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types

2014-09-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133946#comment-14133946
 ] 

Amareshwari Sriramadasu commented on HIVE-7936:
---

Sorry.. missed commenting binary file changes. Merging it now.

 Support for handling Thrift Union types 
 

 Key: HIVE-7936
 URL: https://issues.apache.org/jira/browse/HIVE-7936
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, 
 complex.seq


 Currently hive does not support thrift unions through ThriftDeserializer. 
 Need to add support for the same





[jira] [Comment Edited] (HIVE-7936) Support for handling Thrift Union types

2014-09-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133946#comment-14133946
 ] 

Amareshwari Sriramadasu edited comment on HIVE-7936 at 9/15/14 2:37 PM:


Sorry.. missed committing binary file changes. Merged it now in a different 
commit


was (Author: amareshwari):
Sorry.. missed commenting binary file changes. Merging it now.

 Support for handling Thrift Union types 
 

 Key: HIVE-7936
 URL: https://issues.apache.org/jira/browse/HIVE-7936
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, 
 complex.seq


 Currently hive does not support thrift unions through ThriftDeserializer. 
 Need to add support for the same





[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails

2014-09-10 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7694:
--
Release Note: SMB join on tables differing by number of sorted by columns 
with same join prefix  (was: I just committed this. Thanks Suma!)

 SMB join on tables differing by number of sorted by columns with same join 
 prefix fails
 ---

 Key: HIVE-7694
 URL: https://issues.apache.org/jira/browse/HIVE-7694
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch


 For example, if two tables T1 sorted by (a, b, c) and clustered by (a), and T2 sorted by 
 (a) and clustered by (a), are joined, the following exception is seen:
 {noformat}
 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
 1, Size: 1
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
 {noformat}
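The exception above comes from indexing the shorter table's sort-column list with positions taken from the longer one. A hedged sketch of the failure mode and a guarded prefix check — the names here are hypothetical, not Hive's actual AbstractSMBJoinProc code:

```java
import java.util.List;

// Illustration of the HIVE-7694 failure mode:
// T1 sorted by (a, b, c), T2 sorted by (a).
public class SmbPrefixCheck {
    // Hypothetical guarded check: the join is SMB-eligible only if the
    // join keys form a common prefix of BOTH tables' sort columns,
    // bounded by the shorter list instead of indexing past it.
    public static boolean joinKeysArePrefix(List<String> joinKeys,
                                            List<String> sortCols1,
                                            List<String> sortCols2) {
        if (joinKeys.size() > sortCols1.size() || joinKeys.size() > sortCols2.size()) {
            return false; // an unguarded loop would index past the shorter list here
        }
        for (int i = 0; i < joinKeys.size(); i++) {
            if (!joinKeys.get(i).equals(sortCols1.get(i))
                    || !joinKeys.get(i).equals(sortCols2.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> t1 = List.of("a", "b", "c");
        List<String> t2 = List.of("a");
        // Unguarded access reproduces the exception class seen above:
        try {
            t2.get(1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        // Guarded check succeeds on the common prefix (a):
        System.out.println(joinKeysArePrefix(List.of("a"), t1, t2)); // prints true
    }
}
```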





[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails

2014-09-10 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7694:
--
  Resolution: Fixed
Release Note: I just committed this. Thanks Suma!
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 SMB join on tables differing by number of sorted by columns with same join 
 prefix fails
 ---

 Key: HIVE-7694
 URL: https://issues.apache.org/jira/browse/HIVE-7694
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch


 For example, if two tables T1 sorted by (a, b, c) and clustered by (a), and T2 sorted by 
 (a) and clustered by (a), are joined, the following exception is seen:
 {noformat}
 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
 1, Size: 1
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
 {noformat}





[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails

2014-09-10 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128116#comment-14128116
 ] 

Amareshwari Sriramadasu commented on HIVE-7694:
---

I just committed this. Thanks Suma!

 SMB join on tables differing by number of sorted by columns with same join 
 prefix fails
 ---

 Key: HIVE-7694
 URL: https://issues.apache.org/jira/browse/HIVE-7694
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch


 For example, if two tables T1 sorted by (a, b, c) and clustered by (a), and T2 sorted by 
 (a) and clustered by (a), are joined, the following exception is seen:
 {noformat}
 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
 1, Size: 1
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
 {noformat}





[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive

2014-09-10 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128138#comment-14128138
 ] 

Amareshwari Sriramadasu commented on HIVE-7892:
---

Code changes look fine. Can you update the test output for 
convert_enum_to_string.q and upload the patch?

 Thrift Set type not working with Hive
 -

 Key: HIVE-7892
 URL: https://issues.apache.org/jira/browse/HIVE-7892
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Satish Mittal
Assignee: Satish Mittal
 Attachments: HIVE-7892.patch.txt


 Thrift supports List, Map and Struct complex types, which get mapped to 
 Array, Map and Struct complex types in Hive respectively. However thrift Set 
 type doesn't seem to be working. 
 Here is an example thrift struct:
 {noformat}
 namespace java sample.thrift
 struct setrow {
 1: required set<i32> ids,
 2: required string name,
 }
 {noformat}
 A Hive table is created with ROW FORMAT SERDE 
 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH 
 SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 
 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol').
 Describing the table shows:
 {noformat}
 hive> describe settable; 
 OK
 ids        struct<>        from deserializer   
 name       string          from deserializer
 {noformat}
 Issuing a select query on set column throws SemanticException:
 {noformat}
 hive> select ids from settable;
 FAILED: SemanticException java.lang.IllegalArgumentException: Error: name 
 expected at the position 7 of 'struct<>' but '>' is found.
 {noformat}





[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types

2014-09-10 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128313#comment-14128313
 ] 

Amareshwari Sriramadasu commented on HIVE-7936:
---

The code changes look fine. Put a few comments on the review board. 
Since the patch involves a binary file change, I think Jenkins won't be able to 
apply the patch. Can you run the tests on a local machine and update the results 
here?

 Support for handling Thrift Union types 
 

 Key: HIVE-7936
 URL: https://issues.apache.org/jira/browse/HIVE-7936
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7936.1.patch, HIVE-7936.patch, complex.seq


 Currently hive does not support thrift unions through ThriftDeserializer. 
 Need to add support for the same





[jira] [Updated] (HIVE-2390) Expand support for union types

2014-09-09 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-2390:
--
  Resolution: Fixed
Release Note: Adds UnionType support in LazyBinarySerde
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Suma!

 Expand support for union types
 --

 Key: HIVE-2390
 URL: https://issues.apache.org/jira/browse/HIVE-2390
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Jakob Homan
Assignee: Suma Shivaprasad
  Labels: uniontype
 Fix For: 0.14.0

 Attachments: HIVE-2390.1.patch, HIVE-2390.patch


 When the union type was introduced, full support for it wasn't provided.  For 
 instance, when working with a union that gets passed to LazyBinarySerde: 
 {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION
   at 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468)
   at 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230)
   at 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184)
 {noformat}
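Serde support for a union generally comes down to writing a tag that identifies the active branch, followed by the value itself. A minimal sketch of that tag-plus-value encoding — a hypothetical helper to illustrate the approach, not LazyBinarySerDe's actual wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical tag + value encoding for a union of {0: int, ...}.
public class UnionSketch {
    public static byte[] writeIntBranch(int value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(0);    // tag: branch 0 (int) is the active member
        out.writeInt(value); // then the value for that branch
        return bos.toByteArray();
    }

    public static int readIntBranch(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        byte tag = in.readByte(); // read the tag first to pick the branch
        if (tag != 0) {
            throw new IOException("unexpected union tag: " + tag);
        }
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = writeIntBranch(42);
        System.out.println(readIntBranch(encoded)); // prints 42
    }
}
```

A serde that recognizes the UNION category would dispatch on this tag instead of hitting the "Unrecognized type" branch above.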





[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails

2014-09-09 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126785#comment-14126785
 ] 

Amareshwari Sriramadasu commented on HIVE-7694:
---

+1 Code changes look fine to me. 
[~suma.shivaprasad], can you rebase the patch? Also, please run the tests once again, as the 
last test build had failures.
Make sure the failed tests pass on your local machine before submitting 
again.

 SMB join on tables differing by number of sorted by columns with same join 
 prefix fails
 ---

 Key: HIVE-7694
 URL: https://issues.apache.org/jira/browse/HIVE-7694
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7694.1.patch, HIVE-7694.patch


 For example, if two tables T1 sorted by (a, b, c) and clustered by (a), and T2 sorted by 
 (a) and clustered by (a), are joined, the following exception is seen:
 {noformat}
 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
 1, Size: 1
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
 {noformat}





[jira] [Commented] (HIVE-2390) Expand support for union types

2014-09-07 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125210#comment-14125210
 ] 

Amareshwari Sriramadasu commented on HIVE-2390:
---

+1 Changes look fine to me.

[~suma.shivaprasad], the test failure seems unrelated to me. Can you look into it and 
confirm?

 Expand support for union types
 --

 Key: HIVE-2390
 URL: https://issues.apache.org/jira/browse/HIVE-2390
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Jakob Homan
Assignee: Suma Shivaprasad
  Labels: uniontype
 Fix For: 0.14.0

 Attachments: HIVE-2390.1.patch, HIVE-2390.patch


 When the union type was introduced, full support for it wasn't provided.  For 
 instance, when working with a union that gets passed to LazyBinarySerde: 
 {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION
   at 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468)
   at 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230)
   at 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184)
 {noformat}





[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails

2014-08-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7694:
--

Assignee: Suma Shivaprasad

 SMB join on tables differing by number of sorted by columns with same join 
 prefix fails
 ---

 Key: HIVE-7694
 URL: https://issues.apache.org/jira/browse/HIVE-7694
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
 Fix For: 0.14.0

 Attachments: HIVE-7694.1.patch, HIVE-7694.patch


 For example, if two tables T1 sorted by (a, b, c) and clustered by (a), and T2 sorted by 
 (a) and clustered by (a), are joined, the following exception is seen:
 {noformat}
 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
 1, Size: 1
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
 at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109)
 at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-20 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-7629:
--

Assignee: Suma Shivaprasad

 Problem in SMB Joins between two Parquet tables
 ---

 Key: HIVE-7629
 URL: https://issues.apache.org/jira/browse/HIVE-7629
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad
  Labels: Parquet
 Fix For: 0.14.0

 Attachments: HIVE-7629.1.patch, HIVE-7629.patch


 The issue is clearly seen when two bucketed and sorted Parquet tables with 
 different numbers of columns are involved in the join. The following 
 exception is seen:
 {noformat}
 Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79)
 at 
 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66)
 at 
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65)
 {noformat}





[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-07-02 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Status: Open  (was: Patch Available)

There are problems with the uploaded patch; canceling.

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-5733.1.patch


 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (i.e. the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use 
 slightly different versions of the dependencies, it can easily happen that 
 Hive's shaded version is used instead, which leads to very time-consuming 
 debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a classifier nodeps that represents the artifact without any 
 dependencies.





[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-16 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Status: Patch Available  (was: Open)

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-5733.1.patch


 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (i.e. the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use 
 slightly different versions of the dependencies, it can easily happen that 
 Hive's shaded version is used instead, which leads to very time-consuming 
 debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a classifier nodeps that represents the artifact without any 
 dependencies.





[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-16 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Attachment: HIVE-5733.1.patch

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-5733.1.patch


 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (i.e. the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use 
 slightly different versions of the dependencies, it can easily happen that 
 Hive's shaded version is used instead, which leads to very time-consuming 
 debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a classifier nodeps that represents the artifact without any 
 dependencies.





[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-16 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Attachment: HIVE-5733.1.patch

As per the maven-shade-plugin documentation, the plugin replaces the 
project's main artifact with the shaded artifact, and giving the unshaded 
artifact a different classifier like nodep was also not helpful. Per the same 
doc, the shaded artifact can instead be attached under its own name: 
http://maven.apache.org/plugins/maven-shade-plugin/examples/attached-artifact.html

Attaching a patch which generates hive-exec-version.jar with no 
dependencies and hive-exec-version-withdep.jar as the shaded jar.
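The attached-artifact approach from the linked shade-plugin doc can be sketched as follows; the {{withdep}} classifier matches the jar name described above, but the exact configuration used in the patch is an assumption:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Attach the shaded jar under its own classifier instead of
             replacing the main, dependency-free artifact. -->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>withdep</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this configuration the build installs both hive-exec-version.jar (unshaded) and hive-exec-version-withdep.jar (shaded) into the repository.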



 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-5733.1.patch


 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (= the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use slightly 
 different versions of those dependencies, it can easily happen that 
 Hive's shaded versions are used instead, which leads to very time-consuming 
 debugging (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example, 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a nodeps classifier that represents the artifact without any 
 dependencies.





[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-16 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Attachment: (was: HIVE-5733.1.patch)

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-5733.1.patch


 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (= the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use slightly 
 different versions of those dependencies, it can easily happen that 
 Hive's shaded versions are used instead, which leads to very time-consuming 
 debugging (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example, 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a nodeps classifier that represents the artifact without any 
 dependencies.





[jira] [Assigned] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reassigned HIVE-5733:
-

Assignee: Amareshwari Sriramadasu

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu

 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (= the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use slightly 
 different versions of those dependencies, it can easily happen that 
 Hive's shaded versions are used instead, which leads to very time-consuming 
 debugging (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example, 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a nodeps classifier that represents the artifact without any 
 dependencies.





[jira] [Commented] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997276#comment-13997276
 ] 

Amareshwari Sriramadasu commented on HIVE-5733:
---

+1 This is much required.
I agree it has become difficult to depend on the hive-exec jar because the ql 
module shades all the dependencies.

I will try to put up a patch.

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho

 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (= the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use slightly 
 different versions of those dependencies, it can easily happen that 
 Hive's shaded versions are used instead, which leads to very time-consuming 
 debugging (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example, 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a nodeps classifier that represents the artifact without any 
 dependencies.





[jira] [Resolved] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-28 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved HIVE-6953.
---

Resolution: Duplicate

The test failures were because of HIVE-6877. After applying the patch from 
HIVE-6877 onto branch-0.13, all the tests pass.

 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.xml, 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestWorker.xml, nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-23 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979312#comment-13979312
 ] 

Amareshwari Sriramadasu commented on HIVE-6953:
---

Also, there are some tests failing randomly because they fail to create a path 
in /user/hive/warehouse.

For example, org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat failed with 
the following errors:
{noformat}
log4j:ERROR Could not read configuration file from URL 
[file:/hive-path/ql/target/tmp/conf/hive-log4j.properties].
java.io.FileNotFoundException: 
/hive-path/ql/target/tmp/conf/hive-log4j.properties (No such file or 
directory)


FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:file:/user/hive/warehouse/text_symlink_text is not a 
directory or unable to create one)
{noformat}

So, most probably something is getting cleaned up, but I could not find where 
to start.

Can someone help me find the root cause? Where should I start looking?

 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.xml, 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestWorker.xml, nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Comment Edited] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-23 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979312#comment-13979312
 ] 

Amareshwari Sriramadasu edited comment on HIVE-6953 at 4/24/14 5:47 AM:


Also, there are some tests failing randomly because they fail to create a path 
in /user/hive/warehouse.

For example, org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat failed with 
the following errors:
{noformat}
log4j:ERROR Could not read configuration file from URL 
[file:/hive-path/ql/target/tmp/conf/hive-log4j.properties].
java.io.FileNotFoundException: 
/hive-path/ql/target/tmp/conf/hive-log4j.properties (No such file or 
directory)


FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:file:/user/hive/warehouse/text_symlink_text is not a 
directory or unable to create one)
{noformat}

So, most probably something is getting cleaned up, but I could not find where 
to start.

Can someone help me find the root cause? Where should I start looking?


was (Author: amareshwari):
Also, there are some tests failing randomly because they fail to create path in 
/user/hive/warehouse

For ex: org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat failed with 
following errors
{noformat}
log4j:ERROR Could not read configuration file from URL 
[file:/hive-path/ql/target/tmp/conf/hive-log4j.properties].
java.io.FileNotFoundException: 
/hive-path/ql/target/tmp/conf/hive-log4j.properties (No such file or 
directory)


FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:file:/user/hive/warehouse/text_symlink_text is not a 
directory or unable to create one)
{noformat}

So, most probably something is getting cleaned-up. But I could find where to 
start.

Can someone help me to find the root cause? Where can i start to look at it.  

 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestCleaner.xml, 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestWorker.xml, nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Created] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-22 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-6953:
-

 Summary: All CompactorTest failing with Table/View 'NEXT_TXN_ID' 
does not exist
 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu


When I run all tests with the command 'mvn clean install -Phadoop-1', all 
CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with the 
following exception:

{noformat}
org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_TXN_ID' 
does not exist.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)


Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
 Source)

{noformat}

This is happening on branch-0.13. Has anyone faced this problem?

Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?








[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976740#comment-13976740
 ] 

Amareshwari Sriramadasu commented on HIVE-6953:
---

There are no failures in trunk; all tests pass. [~rhbutani], do you think 
anything is missing in branch-0.13? Looking at the commits, I couldn't figure it out.


 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu

 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Updated] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-6953:
--

Attachment: nohup.out.gz

The nohup test output

 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977847#comment-13977847
 ] 

Amareshwari Sriramadasu commented on HIVE-6953:
---

Thanks [~alangates] and [~rhbutani] for trying. The tests pass when I run 
them individually; when all the tests are run together, they fail.

Here is what I did:
{noformat}
git clone https://github.com/apache/hive apache-hive
git checkout branch-0.13
nohup mvn clean install -Phadoop-1 
{noformat}

Attaching nohup output for reference.

bq. Is there anything in your logs indicating it tried to create the tables and 
failed?
Will check and update.

bq. Are you doing anything in your build to turn off the hive.in.test config 
value?
No. 

I'm thinking the test db or conf is getting cleaned up by some other means 
when all the tests are run together.



 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Commented] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977849#comment-13977849
 ] 

Amareshwari Sriramadasu commented on HIVE-6953:
---

I'm running on a Linux machine; the same thing happens on a Mac as well.

{noformat}
uname -a
Linux hostname 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 
x86_64 GNU/Linux

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
{noformat}

 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Updated] (HIVE-6953) All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist

2014-04-22 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-6953:
--

Attachment: TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml

Attaching TestInitiator.xml

 All CompactorTest failing with Table/View 'NEXT_TXN_ID' does not exist
 --

 Key: HIVE-6953
 URL: https://issues.apache.org/jira/browse/HIVE-6953
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Amareshwari Sriramadasu
Assignee: Alan Gates
 Attachments: 
 TEST-org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.xml, nohup.out.gz


 When I run all tests with the command 'mvn clean install -Phadoop-1', 
 all CompactorTest classes (TestInitiator, TestWorker, TestCleaner) fail with 
 the following exception:
 {noformat}
 org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from 
 transaction database java.sql.SQLSyntaxErrorException: Table/View 
 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
 Source)
 at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
 Source)
 at 
 org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
 Source)
 
 Caused by: java.sql.SQLException: Table/View 'NEXT_TXN_ID' does not exist.
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
 at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
  Source)
 {noformat}
 This is happening on branch-0.13. Has anyone faced this problem?
 Could [~owen.omalley] or someone else help me solve this? Do I have to set anything?





[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Status: Patch Available  (was: Open)

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch


 Currently, the format_number udf formats the number as #,###,###.##, but it 
 should also take a user-specified format as an optional input.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Attachment: HIVE-5370.patch

Attaching the patch with the following changes:

* Added the format as a second argument.
* Also takes care of null being formatted: the current code throws an NPE for a 
null value; fixed it to return null when formatting null.

Review request for the same: https://reviews.apache.org/r/18182/
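The null-handling fix can be illustrated with a small standalone sketch. This is not Hive's actual GenericUDF code; the helper name formatNumber and the use of java.text.DecimalFormat are illustrative assumptions:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class FormatNumberSketch {
    // Null-safe formatting: return null for a null input instead of
    // throwing a NullPointerException, as the patch description says.
    static String formatNumber(Double value, String pattern) {
        if (value == null) {
            return null;
        }
        DecimalFormat df =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return df.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234567.891, "#,###,###.##")); // 1,234,567.89
        System.out.println(formatNumber(null, "#,###,###.##"));        // null
    }
}
```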

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch


 Currently, the format_number udf formats the number as #,###,###.##, but it 
 should also take a user-specified format as an optional input.





[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Status: Open  (was: Patch Available)

The earlier patch missed deleting a file.

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch


 Currently, the format_number udf formats the number as #,###,###.##, but it 
 should also take a user-specified format as an optional input.





[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Status: Patch Available  (was: Open)

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch, 
 HIVE-5370.patch


 Currently, the format_number udf formats the number as #,###,###.##, but it 
 should also take a user-specified format as an optional input.





[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Attachment: HIVE-5370.patch

Corrected the patch

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch, HIVE-5370.patch, 
 HIVE-5370.patch


 Currently, format_number udf formats the number to #,###,###.##, but it 
 should also take a user specified format as optional input.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-6410:
--

Status: Open  (was: Patch Available)

Resubmitting for tests to run

 Allow output serializations separators to be set for HDFS path as well.
 ---

 Key: HIVE-6410
 URL: https://issues.apache.org/jira/browse/HIVE-6410
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-6410.patch


 HIVE-3682 adds functionality for users to set serialization constants for 
 'insert overwrite local directory'. The same functionality should be 
 available for an HDFS path as well. The suggested workaround is to create a 
 table with the required format and insert into it, which forces users to 
 know the schema of the result and create the table ahead of time. Though 
 that works, it would be good to have the functionality for loading into a 
 directory as well.
 I'm planning to add the same functionality for 'insert overwrite directory' in 
 this jira.
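A sketch of what the proposed syntax might look like, mirroring the existing LOCAL form (the HDFS variant below is the proposal, not current behavior):

```sql
-- Works today (HIVE-3682): custom separators for a local directory
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT key, value FROM src;

-- Proposed here: the same ROW FORMAT clause for an HDFS path
INSERT OVERWRITE DIRECTORY '/user/hive/out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT key, value FROM src;
```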



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.

2014-02-17 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-6410:
--

Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

 Allow output serializations separators to be set for HDFS path as well.
 ---

 Key: HIVE-6410
 URL: https://issues.apache.org/jira/browse/HIVE-6410
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.13.0

 Attachments: HIVE-6410.patch


 HIVE-3682 adds functionality for users to set serialization constants for 
 'insert overwrite local directory'. The same functionality should be 
 available for an HDFS path as well. The suggested workaround is to create a 
 table with the required format and insert into it, which forces users to 
 know the schema of the result and create the table ahead of time. Though 
 that works, it would be good to have the functionality for loading into a 
 directory as well.
 I'm planning to add the same functionality for 'insert overwrite directory' in 
 this jira.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.

2014-02-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901100#comment-13901100
 ] 

Amareshwari Sriramadasu commented on HIVE-6410:
---

[~xuefu.w...@kodak.com], had the patch ready, so I uploaded it. I don't mind 
either of them being closed as a duplicate of the other, as long as the code 
gets in.

 Allow output serializations separators to be set for HDFS path as well.
 ---

 Key: HIVE-6410
 URL: https://issues.apache.org/jira/browse/HIVE-6410
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-6410.patch


 HIVE-3682 adds functionality for users to set serialization constants for 
 'insert overwrite local directory'. The same functionality should be 
 available for an HDFS path as well. The suggested workaround is to create a 
 table with the required format and insert into it, which forces users to 
 know the schema of the result and create the table ahead of time. Though 
 that works, it would be good to have the functionality for loading into a 
 directory as well.
 I'm planning to add the same functionality for 'insert overwrite directory' in 
 this jira.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.

2014-02-12 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-6410:
--

Status: Patch Available  (was: Open)

 Allow output serializations separators to be set for HDFS path as well.
 ---

 Key: HIVE-6410
 URL: https://issues.apache.org/jira/browse/HIVE-6410
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-6410.patch


 HIVE-3682 adds functionality for users to set serialization constants for 
 'insert overwrite local directory'. The same functionality should be 
 available for an HDFS path as well. The suggested workaround is to create a 
 table with the required format and insert into it, which forces users to 
 know the schema of the result and create the table ahead of time. Though 
 that works, it would be good to have the functionality for loading into a 
 directory as well.
 I'm planning to add the same functionality for 'insert overwrite directory' in 
 this jira.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.

2014-02-12 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-6410:
--

Attachment: HIVE-6410.patch

Attaching the patch with the fix. The changes include:
* Grammar changes for accepting table format and row format for 'insert 
overwrite directory'. 
* Fixed existing code to accept serde properties as well.
* Added insert_overwrite_directory.q 

Review board request - https://reviews.apache.org/r/18060/

 Allow output serializations separators to be set for HDFS path as well.
 ---

 Key: HIVE-6410
 URL: https://issues.apache.org/jira/browse/HIVE-6410
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-6410.patch


 HIVE-3682 adds functionality for users to set serialization constants for 
 'insert overwrite local directory'. The same functionality should be 
 available for an HDFS path as well. The suggested workaround is to create a 
 table with the required format and insert into it, which forces users to 
 know the schema of the result and create the table ahead of time. Though 
 that works, it would be good to have the functionality for loading into a 
 directory as well.
 I'm planning to add the same functionality for 'insert overwrite directory' in 
 this jira.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6410) Allow output serializations separators to be set for HDFS path as well.

2014-02-11 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-6410:
-

 Summary: Allow output serializations separators to be set for HDFS 
path as well.
 Key: HIVE-6410
 URL: https://issues.apache.org/jira/browse/HIVE-6410
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu


HIVE-3682 adds functionality for users to set serialization constants for 
'insert overwrite local directory'. The same functionality should be available 
for an HDFS path as well. The suggested workaround is to create a table with 
the required format and insert into it, which forces users to know the schema 
of the result and create the table ahead of time. Though that works, it would 
be good to have the functionality for loading into a directory as well.

I'm planning to add the same functionality for 'insert overwrite directory' in 
this jira.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice

2014-02-11 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898805#comment-13898805
 ] 

Amareshwari Sriramadasu commented on HIVE-3682:
---

Though the above suggestion of creating a table and using insert overwrite 
table works, it forces the user to know the schema of the output and create 
the table ahead of time. When queries are automated, it is difficult to always 
create the table ahead. I have created HIVE-6410 for adding the functionality 
from this issue for INSERT OVERWRITE DIRECTORY as well.

 when output hive table to file,users should could have a separator of their 
 own choice
 --

 Key: HIVE-3682
 URL: https://issues.apache.org/jira/browse/HIVE-3682
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun
Assignee: Sushanth Sowmyan
 Fix For: 0.11.0

 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
 HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, 
 HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch


 By default, when outputting a hive table to a file, columns of the Hive table 
 are separated by the ^A character (that is, \001).
 But indeed users should have the right to set a separator of their own choice.
 Usage Example:
 create table for_test (key string, value string);
 load data local inpath './in1.txt' into table for_test
 select * from for_test;
 UT-01: default separator is \001, line separator is \n
 insert overwrite local directory './test-01' 
 select * from src ;
 create table array_table (a array&lt;string&gt;, b array&lt;string&gt;)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ',';
 load data local inpath '../hive/examples/files/arraytest.txt' overwrite into 
 table table2;
 CREATE TABLE map_table (foo STRING, bar MAP&lt;STRING, STRING&gt;)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 STORED AS TEXTFILE;
 UT-02:defined field separator as ':'
 insert overwrite local directory './test-02' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-03: line separator is NOT ALLOWED to be defined as another separator 
 insert overwrite local directory './test-03' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-04: define map separators 
 insert overwrite local directory './test-04' 
 row format delimited 
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 select * from src;



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6404) Fix typo in serde constants for collection delimitor

2014-02-10 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-6404:
-

 Summary: Fix typo in serde constants for collection delimitor
 Key: HIVE-6404
 URL: https://issues.apache.org/jira/browse/HIVE-6404
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Amareshwari Sriramadasu
Priority: Trivial


The collection delimiter is defined with a typo in serdeConstants:

{noformat}
  public static final String COLLECTION_DELIM = "colelction.delim";
{noformat}
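One practical consequence (a hedged sketch; exact property handling may vary by serde): anything that sets the collection delimiter through this constant's string value has to use the misspelled key for it to take effect:

```sql
-- The serde property key that matches COLLECTION_DELIM is the
-- misspelled one, so this is what users end up having to write
ALTER TABLE array_table SET SERDEPROPERTIES ('colelction.delim' = ',');
```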



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6390) Read timeout on metastore client leads to out of sequence responses

2014-02-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-6390:
-

 Summary: Read timeout on metastore client leads to out of sequence 
responses
 Key: HIVE-6390
 URL: https://issues.apache.org/jira/browse/HIVE-6390
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Amareshwari Sriramadasu


When the client application gets a read timeout on the Hive metastore client, 
subsequent calls to the metastore fail with an 'out of sequence response' 
error. The only way to recover is to restart the client application.

Here are the exceptions:
{noformat}
2014-02-04 08:42:04,132 ERROR hive.log (MetaStoreUtils) - Got exception: 
org.apache.thrift.transport.TTransportException 
java.net.SocketTimeoutException: Read timed out
org.apache.thrift.transport.TTransportException: 
java.net.SocketTimeoutException: Read timed out
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912)
{noformat}

And subsequent calls to metastore get the following:
{noformat}
2014-02-04 08:43:14,273 ERROR hive.log (MetaStoreUtils) - Got exception: 
org.apache.thrift.TApplicationException get_tables failed: out of sequence 
response
org.apache.thrift.TApplicationException: get_tables failed: out of sequence 
response
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912)
{noformat}
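One mitigation that can reduce the chance of hitting this (an assumption on my part, not part of the report): raise the metastore client's socket timeout so long-running calls are less likely to time out mid-response.

```sql
-- Metastore client socket timeout, in seconds
SET hive.metastore.client.socket.timeout=600;
```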



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HIVE-6390) Read timeout on metastore client leads to out of sequence responses

2014-02-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved HIVE-6390.
---

Resolution: Not A Problem

This does not seem to be a problem with the Hive metastore client; the 
exceptions are handled in RetryingMetaStoreClient. The client application was 
getting a continuous 'out of sequence response' from the HiveServer connection.

 Read timeout on metastore client leads to out of sequence responses
 ---

 Key: HIVE-6390
 URL: https://issues.apache.org/jira/browse/HIVE-6390
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Amareshwari Sriramadasu

 When the client application gets a read timeout on the Hive metastore client, 
 subsequent calls to the metastore fail with an 'out of sequence response' 
 error. The only way to recover is to restart the client application.
 Here are the exceptions:
 {noformat}
 2014-02-04 08:42:04,132 ERROR hive.log (MetaStoreUtils) - Got exception: 
 org.apache.thrift.transport.TTransportException 
 java.net.SocketTimeoutException: Read timed out
 org.apache.thrift.transport.TTransportException: 
 java.net.SocketTimeoutException: Read timed out
 at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 at 
 org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912)
 {noformat}
 And subsequent calls to metastore get the following:
 {noformat}
 2014-02-04 08:43:14,273 ERROR hive.log (MetaStoreUtils) - Got exception: 
 org.apache.thrift.TApplicationException get_tables failed: out of sequence 
 response
 org.apache.thrift.TApplicationException: get_tables failed: out of sequence 
 response
 at 
 org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables(ThriftHiveMetastore.java:887)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables(ThriftHiveMetastore.java:873)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:912)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-11-20 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13828528#comment-13828528
 ] 

Amareshwari Sriramadasu commented on HIVE-4956:
---

I agree with the concerns above that this is deviating from SQL. But it gives a 
lot of performance improvement in distributed systems. How about changing the 
separator to '+' instead of ',' as part of Hive QL? 

The query will look like the following :
{noformat}
select t.x, t.y,  from T1+T2 t where t.p1='x' OR t.p1='y' ... 
[groupby-clause] [having-clause] [orderby-clause]
{noformat}

If the proposal is fine, I can upload the patch.

 Allow multiple tables in from clause if all them have the same schema, but 
 can be partitioned differently
 -

 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We have a usecase where the table storage partitioning changes over time.
 For ex:
  we can have a table T1 which is partitioned by p1. But over time, we want to 
 partition the table on p1 and p2 as well. The new table can be T2. So, if we 
 have to query the table on partition p1, it will be a union query across two 
 tables, T1 and T2. Especially with aggregations like avg, it becomes a costly 
 union query because we cannot make use of map-side aggregations and other 
 optimizations.
 The proposal is to support queries of the following format :
 select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
 [groupby-clause] [having-clause] [orderby-clause] and so on.
 Here we allow the from clause as a comma-separated list of tables with an 
 alias; the alias will be used in the full query, and partition pruning will 
 happen on the actual tables to pick up the right paths. This will work 
 because the difference is only in picking up the input paths and the whole 
 operator tree does not change. If this sounds like a good usecase, I can put 
 up the changes required to support the same.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2013-09-30 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch


 Currently, format_number udf formats the number to #,###,###.##, but it 
 should also take a user specified format as optional input.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5370) format_number udf should take user specifed format as argument

2013-09-30 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5370:
--

Status: Open  (was: Patch Available)

Looking into test failures

 format_number udf should take user specifed format as argument
 --

 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13185.1.patch, D13185.2.patch


 Currently, format_number udf formats the number to #,###,###.##, but it 
 should also take a user specified format as optional input.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5370) format_number udf should take user specifed format as argument

2013-09-26 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-5370:
-

 Summary: format_number udf should take user specifed format as 
argument
 Key: HIVE-5370
 URL: https://issues.apache.org/jira/browse/HIVE-5370
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Minor


Currently, format_number udf formats the number to #,###,###.##, but it should 
also take a user specified format as optional input.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5326) Operators && and || do not work

2013-09-19 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-5326:
-

 Summary: Operators && and || do not work
 Key: HIVE-5326
 URL: https://issues.apache.org/jira/browse/HIVE-5326
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu


Though the documentation https://cwiki.apache.org/Hive/languagemanual-udf.html 
says they are the same as AND and OR, they do not even get parsed. Users get a 
parse error when they are used. 

hive> select key from src where key=a || key =b;
FAILED: Parse Error: line 1:33 cannot recognize input near '|' 'key' '=' in 
expression specification

hive> select key from src where key=a && key =b;
FAILED: Parse Error: line 1:33 cannot recognize input near '&' 'key' '=' in 
expression specification
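The documented keyword forms do parse; a quick sketch of the working equivalents:

```sql
-- AND / OR are the forms the grammar accepts
SELECT key FROM src WHERE key = 'a' OR  key = 'b';
SELECT key FROM src WHERE key = 'a' AND key = 'b';
```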

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2

2013-08-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739435#comment-13739435
 ] 

Amareshwari Sriramadasu commented on HIVE-4569:
---

I think it makes sense to have two APIs: JDBC drivers can call the synchronous 
one, and users interested in async behavior can call the async API. Though the 
documentation of execute() has to be changed to say that it executes 
synchronously.

 GetQueryPlan api in Hive Server2
 

 Key: HIVE-4569
 URL: https://issues.apache.org/jira/browse/HIVE-4569
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu
Assignee: Jaideep Dhok
 Attachments: git-4569.patch, HIVE-4569.D10887.1.patch, 
 HIVE-4569.D11469.1.patch, HIVE-4569.D12231.1.patch, HIVE-4569.D12237.1.patch


 It would be nice to have GetQueryPlan as a thrift api. I do not see a 
 GetQueryPlan api available in HiveServer2, though the wiki 
 https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API 
 contains it; not sure why it was not added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5060) JDBC driver assumes executeStatement is synchronous

2013-08-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739437#comment-13739437
 ] 

Amareshwari Sriramadasu commented on HIVE-5060:
---

@Henry, HIVE-4569 adds another api to call execute asynchronously. After that, 
the current JDBC driver code should just work.
If we have a synchronous api, clients such as JDBC can fetch results 
immediately after the execute without bombarding the server with so many 
get-status calls. So I definitely see the need for two apis.

 JDBC driver assumes executeStatement is synchronous
 ---

 Key: HIVE-5060
 URL: https://issues.apache.org/jira/browse/HIVE-5060
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.11.0
Reporter: Henry Robinson
 Fix For: 0.11.1, 0.12.0

 Attachments: 
 0001-HIVE-5060-JDBC-driver-assumes-executeStatement-is-sy.patch


 The JDBC driver seems to assume that {{ExecuteStatement}} is a synchronous 
 call when performing updates via {{executeUpdate}}, where the following 
 comment on the RPC in the Thrift file indicates otherwise:
 {code}
 // ExecuteStatement()
 //
 // Execute a statement.
 // The returned OperationHandle can be used to check on the
 // status of the statement, and to fetch results once the
 // statement has finished executing.
 {code}
 I understand that Hive's implementation of {{ExecuteStatement}} is blocking 
 (see https://issues.apache.org/jira/browse/HIVE-4569), but presumably other 
 implementations of the HiveServer2 API (and I'm talking specifically about 
 Impala here, but others might have a similar concern) should be free to 
 return a pollable {{OperationHandle}} per the specification.
 The JDBC driver's {{executeUpdate}} is as follows:
 {code}
 public int executeUpdate(String sql) throws SQLException {
 execute(sql);
 return 0;
   }
 {code}
 {{execute(sql)}} discards the {{OperationHandle}} that it gets from the 
 server after determining whether there are results to be fetched.
 This is problematic for us, because Impala will cancel queries that are 
 still running when a session exits, but there's no easy way to be sure that 
 an {{INSERT}} statement has completed before terminating a session on the 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-07-30 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4956:
-

 Summary: Allow multiple tables in from clause if all them have the 
same schema, but can be partitioned differently
 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu


We have a usecase where the table storage partitioning changes over time.

For ex:
 we can have a table T1 which is partitioned by p1. But over time, we want to 
partition the table on p1 and p2 as well. The new table can be T2. So, if we 
have to query the table on partition p1, it will be a union query across two 
tables, T1 and T2. Especially with aggregations like avg, it becomes a costly 
union query because we cannot make use of map-side aggregations and other 
optimizations.

The proposal is to support queries of the following format :

select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
[groupby-clause] [having-clause] [orderby-clause] and so on.

Here we allow the from clause as a comma-separated list of tables with an 
alias; the alias will be used in the full query, and partition pruning will 
happen on the actual tables to pick up the right paths. This will work because 
the difference is only in picking up the input paths and the whole operator 
tree does not change. If this sounds like a good usecase, I can put up the 
changes required to support the same.
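For comparison, today's workaround would be an explicit UNION ALL subquery, which defeats map-side aggregation (a sketch; table and column names are illustrative):

```sql
-- Workaround today: union the differently partitioned tables
-- explicitly, then aggregate over the combined result
SELECT t.x, avg(t.y)
FROM (
  SELECT x, y, p1 FROM T1
  UNION ALL
  SELECT x, y, p1 FROM T2
) t
WHERE t.p1 = 'x' OR t.p1 = 'y'
GROUP BY t.x;
```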





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-07-30 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723712#comment-13723712
 ] 

Amareshwari Sriramadasu commented on HIVE-4956:
---

The same usecase can be applied for tables stored at different rollups like 
daily rollups and hourly rollups.

 Allow multiple tables in from clause if all them have the same schema, but 
 can be partitioned differently
 -

 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We have a usecase where the table storage partitioning changes over time.
 For ex:
  we can have a table T1 which is partitioned by p1. But over time, we want to 
 partition the table on p1 and p2 as well. The new table can be T2. So, if we 
 have to query the table on partition p1, it will be a union query across two 
 tables, T1 and T2. Especially with aggregations like avg, it becomes a costly 
 union query because we cannot make use of map-side aggregations and other 
 optimizations.
 The proposal is to support queries of the following format :
 select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
 [groupby-clause] [having-clause] [orderby-clause] and so on.
 Here we allow the from clause as a comma-separated list of tables with an 
 alias; the alias will be used in the full query, and partition pruning will 
 happen on the actual tables to pick up the right paths. This will work 
 because the difference is only in picking up the input paths and the whole 
 operator tree does not change. If this sounds like a good usecase, I can put 
 up the changes required to support the same.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4710) ant maven-build -Dmvn.publish.repo=local fails

2013-06-11 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4710:
-

 Summary: ant maven-build -Dmvn.publish.repo=local fails
 Key: HIVE-4710
 URL: https://issues.apache.org/jira/browse/HIVE-4710
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Amareshwari Sriramadasu


ant maven-build fails with the following error:

/home/amareshwaris/hive/build.xml:121: The following error occurred while 
executing this line:
/home/amareshwaris/hive/build.xml:123: The following error occurred while 
executing this line:
Target make-pom does not exist in the project hcatalog. 




[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive

2013-06-10 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4115:
--

Status: Patch Available  (was: Open)

Code is ready for review and check-in. Changing the status.

 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: cube-design-2.pdf, cube-design.docx, 
 HIVE-4115.D10689.1.patch, HIVE-4115.D10689.2.patch, HIVE-4115.D10689.3.patch, 
 HIVE-4115.D10689.4.patch


 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.



[jira] [Commented] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing

2013-05-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13664858#comment-13664858
 ] 

Amareshwari Sriramadasu commented on HIVE-4570:
---

bq. Current API GetOperationState is not enough since it returns only a state 
enum. Instead of changing that we can add new API GetOperationProgress() which 
will return both OperationState and OperationProgress.

Sounds good. +1.

For the default implementation of getProgress(), you can return 1 if the task 
is successful and 0 otherwise.
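A tiny sketch of the suggested default (names like `get_operation_progress` and the state values are illustrative, not the actual HiveServer2 Thrift API): the proposed call returns both the state and a progress fraction, with progress defaulting to 1 for a finished task and 0 otherwise.

```python
# Sketch of the proposed GetOperationProgress(): return both state and a
# progress fraction. The default derives progress from task success as
# suggested above (1 if successful, 0 otherwise). Names are hypothetical.

from enum import Enum

class OperationState(Enum):
    RUNNING = 1
    FINISHED = 2
    ERROR = 3

def get_operation_progress(state):
    """Default implementation: progress is 1 for a finished task, else 0."""
    progress = 1.0 if state == OperationState.FINISHED else 0.0
    return state, progress

state, progress = get_operation_progress(OperationState.RUNNING)
```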

 More information to user on GetOperationStatus in Hive Server2 when query is 
 still executing
 

 Key: HIVE-4570
 URL: https://issues.apache.org/jira/browse/HIVE-4570
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Jaideep Dhok

 Currently in Hive Server2, when the query is still executing only the status 
 is set as STILL_EXECUTING. 
 This issue is to give more information to the user such as progress and 
 running job handles, if possible.



[jira] [Updated] (HIVE-4569) GetQueryPlan api in Hive Server2

2013-05-19 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4569:
--

Assignee: Jaideep Dhok

 GetQueryPlan api in Hive Server2
 

 Key: HIVE-4569
 URL: https://issues.apache.org/jira/browse/HIVE-4569
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Jaideep Dhok

 It would be nice to have GetQueryPlan as a thrift api. I do not see a 
 GetQueryPlan api available in HiveServer2, though the wiki 
 https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API 
 lists it; not sure why it was not added.



[jira] [Created] (HIVE-4569) GetQueryPlan api in Hive Server2

2013-05-16 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4569:
-

 Summary: GetQueryPlan api in Hive Server2
 Key: HIVE-4569
 URL: https://issues.apache.org/jira/browse/HIVE-4569
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu


It would be nice to have GetQueryPlan as a thrift api. I do not see a 
GetQueryPlan api available in HiveServer2, though the wiki 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API 
lists it; not sure why it was not added.



[jira] [Created] (HIVE-4570) More information to user on GetOperationStatus in Hive Server2 when query is still executing

2013-05-16 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4570:
-

 Summary: More information to user on GetOperationStatus in Hive 
Server2 when query is still executing
 Key: HIVE-4570
 URL: https://issues.apache.org/jira/browse/HIVE-4570
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu


Currently in Hive Server2, when the query is still executing only the status is 
set as STILL_EXECUTING. 

This issue is to give more information to the user such as progress and running 
job handles, if possible.



[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-05-09 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652819#comment-13652819
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---

The branch HIVE-4115 is ready for review. Also created the phabricator entry.

Changes include :
* ql/src/java/org/apache/hadoop/hive/ql/cube/metadata/ has classes for the 
Cube Metastore, and CubeMetastoreClient.java has the api to create cube, fact 
and dimension tables.
* ql/src/java/org/apache/hadoop/hive/ql/cube/parse/ has code for validating 
the cube ql and converting the cube ql to HQL involving the final storage 
tables.
* ql/src/java/org/apache/hadoop/hive/ql/cube/processors/CubeDriver.java is the 
entry point for a cube query. If the query starts with 'cube', it will be 
processed by CubeDriver.

Will add Cube DDL in a follow-up jira.
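The dispatch rule above can be sketched as follows (illustrative function and driver names, not Hive's API):

```python
# Sketch: route a query to the cube driver if it starts with the 'cube'
# keyword, otherwise to the regular driver. Names are illustrative.

def pick_driver(query):
    """Return the name of the driver that should process the query."""
    return "CubeDriver" if query.lstrip().lower().startswith("cube") else "Driver"

print(pick_driver("CUBE SELECT sales FROM sales_cube"))  # CubeDriver
print(pick_driver("SELECT * FROM src"))                  # Driver
```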

 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: cube-design-2.pdf, cube-design.docx, 
 HIVE-4115.D10689.1.patch


 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.



[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive

2013-05-08 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4115:
--

Attachment: cube-design-2.pdf

Attaching the updated design doc

 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: cube-design-2.pdf, cube-design.docx


 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.



[jira] [Commented] (HIVE-4409) Prevent incompatible column type changes

2013-04-25 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642605#comment-13642605
 ] 

Amareshwari Sriramadasu commented on HIVE-4409:
---

Looks like the commit checked in 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java.orig
 as well. 

[~namitjain], Do you want to remove it?

 Prevent incompatible column type changes
 

 Key: HIVE-4409
 URL: https://issues.apache.org/jira/browse/HIVE-4409
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Metastore
Affects Versions: 0.10.0
Reporter: Dilip Joseph
Assignee: Dilip Joseph
Priority: Minor
 Fix For: 0.12.0

 Attachments: hive.4409.1.patch, HIVE-4409.D10539.1.patch, 
 HIVE-4409.D10539.2.patch


 If a user changes the type of an existing column of a partitioned table to an 
 incompatible type, subsequent accesses of old partitions will result in a 
 ClassCastException (see example below).  We should prevent the user from 
 making incompatible type changes.  This feature will be controlled by a new 
 config parameter.
 Example:
 CREATE TABLE test_table123 (a INT, b MAP<STRING, STRING>) PARTITIONED BY 
 (ds STRING) STORED AS SEQUENCEFILE;
 INSERT OVERWRITE TABLE test_table123 PARTITION(ds='foo1') SELECT 1, 
 MAP('a1', 'b1') FROM src LIMIT 1;
 SELECT * from test_table123 WHERE ds='foo1';
 SET hive.metastore.disallow.invalid.col.type.changes=true;
 ALTER TABLE test_table123 REPLACE COLUMNS (a INT, b STRING);
 SELECT * from test_table123 WHERE ds='foo1';
 The last SELECT fails with the following exception:
 Failed with exception java.io.IOException:java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyMapObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
 java.io.IOException: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyMapObjectInspector 
 cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1406)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:790)
   at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:124)
   at 
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_class_cast(TestCliDriver.java:108)
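A sketch of the kind of check such a config parameter could enable (the compatibility table and function names here are illustrative, not Hive's actual rules): an ALTER that changes a column to a type old partition data cannot be read as is rejected up front instead of failing later with a ClassCastException.

```python
# Sketch: reject incompatible column type changes at ALTER time when the
# (hypothetical) disallow flag is on. The compatibility table below is
# illustrative only.

COMPATIBLE = {
    ("int", "bigint"), ("int", "string"), ("float", "double"),
}

def check_type_change(old_type, new_type, disallow_invalid=True):
    """Return True if the change is allowed; raise if it is disallowed."""
    if old_type == new_type or (old_type, new_type) in COMPATIBLE:
        return True
    if disallow_invalid:
        raise ValueError(
            f"Incompatible column type change: {old_type} -> {new_type}")
    return True  # allowed, but old partitions may fail at read time

check_type_change("int", "string")  # compatible, passes
try:
    check_type_change("map<string,string>", "string")  # the failing case above
except ValueError as e:
    print(e)
```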



[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-04-23 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638891#comment-13638891
 ] 

Amareshwari Sriramadasu commented on HIVE-4018:
---

[~namit], can you look at the latest patch on phabricator?

I'm hoping this can get into hive 0.11 branch.

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm running a star join query after HIVE-3784, it fails with the 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)



[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-04-23 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4018:
--

Attachment: HIVE-4018-2.txt

Attaching the latest patch.

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018-2.txt, HIVE-4018.patch, 
 hive.4018.test.2.patch, HIVE-4018-test.patch





[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-28 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616156#comment-13616156
 ] 

Amareshwari Sriramadasu commented on HIVE-4018:
---

After updating the patch to trunk, the test fails with an NPE again. Will see 
what the cause is and update.

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch





[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-27 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616047#comment-13616047
 ] 

Amareshwari Sriramadasu commented on HIVE-4018:
---

[~namitjain], Can you please look at the latest patch on phabricator ?

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch





[jira] [Updated] (HIVE-4115) Introduce cube abstraction in hive

2013-03-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4115:
--

Attachment: cube-design.docx

Attaching the first cut design document for adding cube abstraction in hive.

Pushed the code (being developed) to the branch HIVE-4115. Will be developing 
on the branch going forward.



 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: cube-design.docx


 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.



[jira] [Created] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4115:
-

 Summary: Introduce cube abstraction in hive
 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu


We would like to define a cube abstraction so that users can query at the cube 
layer without knowing anything about storage and rollups. 

Will describe the model more in following comments.




[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593296#comment-13593296
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---

Logical model :
-
*Cube* :
* A cube is a set of dimensions and measures in a particular subject.
* A measure is a quantity that you are interested in measuring.
* A dimension is an attribute, or set of attributes, by which you can divide 
measures into sub-categories.

*Fact Tables* :
* A cube will have fact tables associated with it.
* A fact table would have a subset of the measures and dimensions.
* Fact tables can be rolled up at any dimension and time.

*Dimensions* :
* A cube dimension can refer to a dimension table.
* A cube dimension can have a hierarchy of elements.

*Dimension tables* :
* A table with a list of columns.
* The table can have references to other dimension tables.
* Dimension tables can be shared across cubes.

*Storage*:
* A fact or dimension table can have storages associated with it.

Storage Model :
-
A physical table will be created in hive metastore for each fact, per storage 
per rollup.
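The logical model above could be sketched with a few data classes (all class and field names are illustrative, not the proposed metadata API):

```python
# Sketch of the logical model: a cube has measures and dimensions; fact
# tables carry a subset of each; a dimension may refer to a dimension table
# or carry a hierarchy. Names are illustrative only.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Dimension:
    name: str
    references: Optional[str] = None        # referred dimension table, if any
    hierarchy: List[str] = field(default_factory=list)

@dataclass
class FactTable:
    name: str
    measures: List[str]
    dimensions: List[str]

@dataclass
class Cube:
    name: str
    measures: List[str]
    dimensions: List[Dimension]
    facts: List[FactTable] = field(default_factory=list)

cube = Cube(
    name="SALES_CUBE",
    measures=["sales", "discount"],
    dimensions=[Dimension("customer_id", references="customer"),
                Dimension("location", hierarchy=["zipcode", "city", "state",
                                                 "country", "region"])],
)
cube.facts.append(FactTable("raw_fact",
                            measures=["sales", "discount"],
                            dimensions=["customer_id", "zipcode"]))
```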


 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.



[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593316#comment-13593316
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---


Illustrating above model with an example :
* Define a SALES_CUBE cube with measures : Sales, Discount and Dimensions: 
CustomerID, Location, Transaction-time 

* Dimensions:
** CustomerID is a simple dimension which refers to the customer table on 
column ID. CustomerTable has the schema : ID, Age, Gender
** Location is hierarchical dimension with the hierarchy : Zipcode, CityID, 
StateID, CountryID, RegionID
*** Zipcode refers to ZipTable on column code. ZipTable schema : code, 
street-name, cityID, stateID
*** CityID refers to cityTable on column ID. CityTable schema : ID, name, 
stateID
*** stateID refers to stateTable on column ID. StateTable schema : ID, name, 
capital, countryID
*** countryID refers to countryTable on column ID. CountryTable schema : ID, 
name, capital, Region
*** Region is an inline dimension with values 'APAC', 'EMEA', 'USA'
** Transaction-time is a simple dimension with a timestamp field.

* Facts : SALES_CUBE can have the following fact tables :
## RawFact with columns Sales, Discount, CustomerId, ZipCode, Transaction-time
## CountryFact with columns Sales, Discount, CountryID


Physical storage tables :

In the example described above, say that RawFact is rolled up hourly on 
cluster C1, and daily and monthly on cluster C2; CountryFact is rolled up 
daily, monthly, quarterly and yearly on cluster C2. Also, the Customer table 
is available in HBase cluster H1, and all the location tables are available 
in HDFS cluster C2.

The physical tables would be :
* C1_Rawfact_hourly - schema : Sales, Discount, CustomerId, ZipCode, 
Transaction-time Partitioned by dt and state.
* C2_Rawfact_daily - schema : Sales, Discount, CustomerId, ZipCode, 
Transaction-time Partitioned by dt and state.
* C2_Rawfact_monthly - schema : Sales, Discount, CustomerId, ZipCode, 
Transaction-time Partitioned by dt and state.
* C2_CountryFact_daily - Schema : Sales, Discount, CountryID Partitioned by dt
* C2_CountryFact_monthly - Schema : Sales, Discount, CountryID Partitioned by 
dt
* C2_CountryFact_quarterly - Schema : Sales, Discount, CountryID Partitioned 
by dt
* C2_CountryFact_yearly - Schema : Sales, Discount, CountryID Partitioned by 
dt
* H1_CustomerTable - schema :  ID, Age, Gender
* C2_ZipTable - schema : code, street-name, cityID, stateID
* C2_CityTable - schema : ID, name, stateID
* C2_StateTable -schema : ID, name, capital, countryID
* C2_CountryTable -schema : ID, name, capital, Region


If a user queries the data on the cube with a query like the following :
* Select sales from SALES_CUBE where region = 'APAC' and 
time_range_in('09/01/2012', '12/31/2012')  // Q4-2012.

The cube abstraction would be smart enough to figure out which table to go 
to and give the result. In this case the query translates to :

* Select sales from C2_CountryFact_quarterly join C2_countryTable on 
C2_CountryFact_quarterly.CountryID = C2_countryTable.ID where dt = 'Q4-2012' 
and C2_countryTable.region = 'APAC';
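The table-selection step can be sketched as follows (the granularity heuristic is a simplification of whatever the real resolver would do; the physical table names are the hypothetical ones from the example):

```python
# Sketch: given the requested time range (in days) and the rollups
# available for a fact, pick the coarsest rollup that fits within the
# range. The heuristic and table names are illustrative only.

ROLLUPS = {  # fact -> {granularity: physical table}
    "country_fact": {
        "daily": "C2_CountryFact_daily",
        "monthly": "C2_CountryFact_monthly",
        "quarterly": "C2_CountryFact_quarterly",
        "yearly": "C2_CountryFact_yearly",
    },
}
GRANULARITY_DAYS = {"daily": 1, "monthly": 30, "quarterly": 90, "yearly": 365}

def pick_rollup(fact, range_days):
    """Coarsest available rollup whose granularity does not exceed the range."""
    candidates = [(days, g) for g, days in GRANULARITY_DAYS.items()
                  if g in ROLLUPS[fact] and days <= range_days]
    _, best = max(candidates)
    return ROLLUPS[fact][best]

print(pick_rollup("country_fact", 92))  # C2_CountryFact_quarterly
```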



 Introduce cube abstraction in hive
 --

 Key: HIVE-4115
 URL: https://issues.apache.org/jira/browse/HIVE-4115
 Project: Hive
  Issue Type: New Feature
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We would like to define a cube abstraction so that user can query at cube 
 layer and do not know anything about storage and rollups. 
 Will describe the model more in following comments.



[jira] [Commented] (HIVE-4115) Introduce cube abstraction in hive

2013-03-05 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593337#comment-13593337
 ] 

Amareshwari Sriramadasu commented on HIVE-4115:
---

bq. In the example described above say that RawFact is rolled hourly in Cluster 
c1, is rolled daily and monthly on Cluster C2; CountryFact is rolled daily, 
monthly, quarterly and yearly on Cluster C2; Also, Customer table is available 
in HBase cluster H1; All the location tables are available in HDFS cluster C2.

Forgot to mention that, along with the time-based rollups, RawFact is also 
rolled up at the State dimension.



[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-04 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4018:
--

Status: Patch Available  (was: Open)

Updated the Phabricator entry with comments incorporated. 

Now, AbstractMapJoinKey.readExternal uses MapJoinOperator's static variable, 
and writeExternal uses HashTableSinkOperator's static variable.

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm running a star join query after HIVE-3784, it fails with the 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)



[jira] [Resolved] (HIVE-3655) Use ChainMapper and ChainReducer for queries having [Map+][RMap*] pattern

2013-03-03 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved HIVE-3655.
---

Resolution: Won't Fix

 Use ChainMapper and ChainReducer for queries having [Map+][RMap*] pattern
 -

 Key: HIVE-3655
 URL: https://issues.apache.org/jira/browse/HIVE-3655
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu

 While breaking the query plan into multiple map-reduce tasks, Hive should 
 consider the pattern [Map+][ReduceMap*] and generate a single map-reduce job 
 for such patterns using ChainMapper and ChainReducer.



[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job

2013-02-27 Thread Amareshwari Sriramadasu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588233#comment-13588233 ]

Amareshwari Sriramadasu commented on HIVE-3952:
---

Tried out the patch. When we run a query like the following:

INSERT OVERWRITE DIRECTORY /dir
Select 

it fails with the following exception:

{noformat}
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.MoveTask cannot be 
cast to org.apache.hadoop.hive.ql.exec.MapRedTask
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.mayBeMergeMapJoinTaskWithMapReduceTask(CommonJoinResolver.java:291)
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:535)
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:701)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:113)
at 
org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8138)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8470)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:259)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:898)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{noformat}



 merge map-job followed by map-reduce job
 

 Key: HIVE-3952
 URL: https://issues.apache.org/jira/browse/HIVE-3952
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Vinod Kumar Vavilapalli
 Attachments: HIVE-3952-20130226.txt


 Consider a query like:
 select count(*) FROM
 ( select idOne, idTwo, value FROM
   bigTable
   JOIN
   smallTableOne on (bigTable.idOne = smallTableOne.idOne)
 ) firstjoin
 JOIN
 smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo);
 where smallTableOne and smallTableTwo are smaller than 
 hive.auto.convert.join.noconditionaltask.size and 
 hive.auto.convert.join.noconditionaltask is set to true.
 The joins are collapsed into mapjoins, and this leads to a map-only job 
 (for the map-joins) followed by a map-reduce job (for the group by).
 Ideally, the map-only job should be merged with the following map-reduce job.



[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-02-26 Thread Amareshwari Sriramadasu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586930#comment-13586930 ]

Amareshwari Sriramadasu commented on HIVE-4018:
---

Phabricator request - https://reviews.facebook.net/D8913



[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-02-25 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4018:
--

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-02-25 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-4018:
--

Attachment: HIVE-4018.patch

Here is a patch which fixes the issue, with a testcase added.



[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-02-19 Thread Amareshwari Sriramadasu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581110#comment-13581110 ]

Amareshwari Sriramadasu commented on HIVE-4018:
---

bq.  this is a existing bug. Do you really think we should fix this now ? I 
mean, it is a pretty big and fundamental change.

I would say this should be fixed, because earlier we were able to run the same 
multi-join query using 2 MR jobs with a mapjoin hint passed in a nested 
structure for each join, as described in HIVE-3652. Now there is no way to do 
a mapjoin for this multiway join, as the same query fails with this error 
after the changes in HIVE-3784. 
It is even more troublesome because there are no mapjoin hints anymore, so one 
has to explicitly turn off auto-join conversion for such queries.




[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-02-19 Thread Amareshwari Sriramadasu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581149#comment-13581149 ]

Amareshwari Sriramadasu commented on HIVE-4018:
---

bq. I agree from your point of view. Do you think you would be able to help on 
this ?

Sure. Will give it a try.



[jira] [Updated] (HIVE-3652) Join optimization for star schema

2013-02-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-3652:
--

Attachment: HIVE-3652-tests.patch

Attaching a test with .q and .out files, which launches two MR jobs for star 
join queries.

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0

 Attachments: HIVE-3652-tests.patch


 Currently, if we join one fact table with multiple dimension tables, it 
 results in multiple mapreduce jobs, one per join with a dimension table, 
 because the join is on different keys for each dimension. 
 Usually all the dimension tables are small and can fit into memory, so a 
 map-side join can be used to join them with the fact table.
 In this issue I want to look at optimizing such queries to generate a single 
 mapreduce job, so that the mapper loads the dimension tables into memory and 
 joins with the fact table on different keys as well.
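The mechanics described in this issue can be sketched outside Hive: each small dimension table becomes an in-memory hash map, and fact rows stream through all of them in a single pass, joining on a different key per dimension with no shuffle. All table contents and column names below are made up for illustration.

```python
# Illustrative sketch of a multi-key map-side star join (not Hive source):
# small dimension tables are held as dicts, and the "mapper" joins each fact
# row against all of them in one streaming pass.
country = {1: "India", 2: "Japan"}             # CountryID -> name
customer = {10: ("25", "F"), 11: ("40", "M")}  # ID -> (Age, Gender)

fact_rows = [
    # (Sales, Discount, CountryID, CustomerID)
    (100.0, 5.0, 1, 10),
    (250.0, 0.0, 2, 11),
]

def map_side_star_join(rows):
    """Join the fact rows with every dimension in a single pass."""
    out = []
    for sales, discount, country_id, cust_id in rows:
        # Each lookup uses a different join key, yet no reduce step is needed.
        out.append((sales, discount, country[country_id], *customer[cust_id]))
    return out
```

This is why a single map-only job suffices as long as every dimension table fits in the mapper's memory.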



[jira] [Commented] (HIVE-3652) Join optimization for star schema

2013-02-13 Thread Amareshwari Sriramadasu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577403#comment-13577403 ]

Amareshwari Sriramadasu commented on HIVE-3652:
---

Seems I figured it out. hive.auto.convert.join.noconditionaltask.size is not a 
number of rows. When I changed the 
hive.auto.convert.join.noconditionaltask.size value in the attached tests, it 
launches one MR job. Will upload the patch again to add tests.
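To make the units concrete, the check behaves roughly like the toy sketch below. This is not Hive source; the 10,000,000-byte value is believed to be the stock default of hive.auto.convert.join.noconditionaltask.size, and the function name is invented for illustration.

```python
# Toy illustration (not Hive source): no-conditional-task mapjoin conversion
# is gated on the small tables' combined size in BYTES, never on row counts.
DEFAULT_THRESHOLD_BYTES = 10_000_000  # assumed stock default

def fits_noconditional_mapjoin(small_table_bytes, threshold=DEFAULT_THRESHOLD_BYTES):
    """Return True if all small tables together fit under the byte threshold."""
    return sum(small_table_bytes) <= threshold
```

So a test table with few rows can still exceed the limit if its on-disk size is large, which is what made the attached tests launch two jobs initially.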



[jira] [Updated] (HIVE-3652) Join optimization for star schema

2013-02-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-3652:
--

Attachment: HIVE-3652-tests.patch

Attaching the tests again. With hive.auto.convert.join.noconditionaltask.size 
increased, it launches a single MR job for the queries.



[jira] [Created] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-02-13 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4018:
-

 Summary: MapJoin failing with Distributed Cache error
 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
 Fix For: 0.11.0


When I'm running a star join query after HIVE-3784, it fails with the 
following error:

2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
Load Distributed Cache Error
2013-02-13 08:36:04,585 FATAL ExecMapper: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)




[jira] [Resolved] (HIVE-3652) Join optimization for star schema

2013-02-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved HIVE-3652.
---

Resolution: Duplicate


