[jira] [Created] (HIVE-10809) HCat FileOutputCommitterContainer leaves behind empty _SCRATCH directories

2015-05-22 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10809:
---

 Summary: HCat FileOutputCommitterContainer leaves behind empty 
_SCRATCH directories
 Key: HIVE-10809
 URL: https://issues.apache.org/jira/browse/HIVE-10809
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang


When a static partition is added through HCatStorer or HCatWriter:

{code}
JoinedData = LOAD '/user/selinaz/data/part-r-0' USING JsonLoader();
STORE JoinedData INTO 'selina.joined_events_e' USING 
org.apache.hive.hcatalog.pig.HCatStorer('author=selina');
{code}

The table directory looks like
{noformat}
drwx--   - selinaz users  0 2015-05-22 21:19 
/user/selinaz/joined_events_e/_SCRATCH0.9157208938193798
drwx--   - selinaz users  0 2015-05-22 21:19 
/user/selinaz/joined_events_e/author=selina
{noformat}
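
For illustration only, a minimal sketch of the kind of cleanup that could run once the partition data has been moved into place: remove the leftover _SCRATCH directory when it is empty. The class and method names below are hypothetical, not the actual FileOutputCommitterContainer code.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not the actual HCatalog fix.
public class ScratchDirCleanup {
  /** Deletes the directory only when it exists and has no children. */
  public static void deleteIfEmpty(Configuration conf, Path scratchDir) throws Exception {
    FileSystem fs = scratchDir.getFileSystem(conf);
    if (fs.exists(scratchDir)) {
      FileStatus[] children = fs.listStatus(scratchDir);
      if (children == null || children.length == 0) {
        // non-recursive delete: only succeeds for an empty directory
        fs.delete(scratchDir, false);
      }
    }
  }
}
{code}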





[jira] [Created] (HIVE-10729) Query failed when join on an element in complex type (tez map join only)

2015-05-15 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10729:
---

 Summary: Query failed when join on an element in complex type (tez 
map join only)
 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
 Environment: Steps to reproduce:
{code:sql}
hive> set hive.auto.convert.join;
hive.auto.convert.join=true
hive> desc foo;
a   array<int>
hive> select * from foo;
[1,2]
hive> desc src_int;
key int
value   string
hive> select * from src_int where key=2;
2  val_2
hive> select * from foo join src on src.key = foo.a[1];
{code}
The query fails with the following stack trace:
{code}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
[Ljava.lang.Object;
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
... 23 more
{code}

A similar error occurs when joining on a map key:
{code:sql}
hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM src LIMIT 1;
hive> select * from src join test where src.value=test.b[2];
{code}


Reporter: Selina Zhang
Assignee: Selina Zhang








[jira] [Created] (HIVE-10596) Further optimize metadata only query

2015-05-04 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10596:
---

 Summary: Further optimize metadata only query 
 Key: HIVE-10596
 URL: https://issues.apache.org/jira/browse/HIVE-10596
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Selina Zhang
Assignee: Selina Zhang


Currently, with the NullScan optimizer, we skip the full table scan for metadata-only
queries. However, a further step can be taken: write the metadata info to an HDFS temp
file and replace the original table info with the temp table info. That way the number
of mappers is not calculated from the original files. Users like to run metadata-only
queries against fact tables; calculating splits and allocating resources for this type
of query seems unnecessary.

We have had this patch running for a while; will upload it soon.
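
As a rough sketch of the approach (not the actual patch; class and method names are illustrative), the planner could write a tiny placeholder file to HDFS and point the metadata-only scan at it, so split calculation never touches the original table's files:
{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: the real patch would substitute this path for the
// table's location in the scan's path-to-partition info.
public class MetadataOnlyPlaceholder {
  /** Writes a one-row placeholder file and returns its path. */
  public static Path create(Configuration conf, Path scratchDir) throws Exception {
    FileSystem fs = scratchDir.getFileSystem(conf);
    Path placeholder = new Path(scratchDir, "metadata_only_placeholder");
    try (FSDataOutputStream out = fs.create(placeholder, true)) {
      out.write("1\n".getBytes(StandardCharsets.UTF_8));
    }
    return placeholder;
  }
}
{code}
The mapper count would then be derived from this single tiny file instead of the fact table's files.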





[jira] [Created] (HIVE-10308) Vectorization execution throws java.lang.IllegalArgumentException: Unsupported complex type: MAP

2015-04-10 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10308:
---

 Summary: Vectorization execution throws 
java.lang.IllegalArgumentException: Unsupported complex type: MAP
 Key: HIVE-10308
 URL: https://issues.apache.org/jira/browse/HIVE-10308
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.1, 0.14.0, 1.2.0, 1.1.0
Reporter: Selina Zhang
Assignee: Selina Zhang


Steps to reproduce:

CREATE TABLE test_orc (a INT, b MAP<INT, STRING>) STORED AS ORC;
INSERT OVERWRITE TABLE test_orc SELECT 1, MAP(1, "one", 2, "two") FROM src LIMIT 1;
CREATE TABLE test (key INT);
INSERT OVERWRITE TABLE test SELECT 1 FROM src LIMIT 1;

set hive.vectorized.execution.enabled=true;
set hive.auto.convert.join=false;

select l.key from test l left outer join test_orc r on (l.key = r.a) where r.a
is not null;

Stack trace:

Caused by: java.lang.IllegalArgumentException: Unsupported complex type: MAP
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:456)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1191)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:58)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:198)






[jira] [Created] (HIVE-10089) RCFile: lateral view explode caused ConcurrentModificationException

2015-03-25 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10089:
---

 Summary: RCFile: lateral view explode caused 
ConcurrentModificationException
 Key: HIVE-10089
 URL: https://issues.apache.org/jira/browse/HIVE-10089
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang


CREATE TABLE test_table123 (a INT, b MAP<STRING, STRING>) STORED AS RCFILE;

INSERT OVERWRITE TABLE test_table123 SELECT 1, MAP("a1", "b1", "c1", "d1") FROM
src LIMIT 1;

The following query will lead to ConcurrentModificationException

SELECT * FROM (SELECT b FROM test_table123) t1 LATERAL VIEW explode(b) x AS b,c 
LIMIT 1;
Failed with exception 
java.io.IOException:java.util.ConcurrentModificationException





[jira] [Created] (HIVE-10088) RCFile: Lateral view with explode throws ConcurrentModificationException

2015-03-25 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10088:
---

 Summary: RCFile: Lateral view with explode throws 
ConcurrentModificationException
 Key: HIVE-10088
 URL: https://issues.apache.org/jira/browse/HIVE-10088
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang


CREATE TABLE test_table123 (a INT, b MAP<STRING, STRING>) STORED AS RCFILE;
INSERT OVERWRITE TABLE test_table123 SELECT 1, MAP("a1", "b1", "c1", "d1") FROM
src LIMIT 1;
hive> select * from test_table123;
1	{"a1":"b1","c1":"d1"}
The following query will lead to ConcurrentModificationException
SELECT * FROM (SELECT b FROM test_table123) t1 LATERAL VIEW explode(b) x AS b,c 
LIMIT 1;
Failed with exception 
java.io.IOException:java.util.ConcurrentModificationException





[jira] [Created] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-03-20 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-10036:
---

 Summary: Writing ORC format big table causes OOM - too many fixed 
sized stream buffers
 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang


The ORC writer keeps multiple output streams for each column. Each output stream is
allocated a fixed-size ByteBuffer (configurable, defaulting to 256K). For a big table,
the memory cost is unbearable, especially when HCatalog dynamic partitioning is
involved: several hundred files may be open and being written at the same time (the
same problem exists for FileSinkOperator).

The global ORC memory manager controls the buffer size, but it only kicks in at
5000-row intervals. An enhancement could be made there, but the problem is that
reducing the buffer size leads to worse compression and more IOs on the read path.
Sacrificing read performance is never a good choice.

I changed the fixed-size ByteBuffer to a dynamically growing buffer, bounded by the
existing configurable buffer size. Most of the streams do not need a large buffer, so
performance improved significantly. Compared to Facebook's hive-dwrf, I measured a 2x
performance gain with this fix.

Completely solving OOM for ORC may need a lot of effort, but this is definitely
low-hanging fruit.
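
A minimal sketch of the buffering idea (illustrative names, not the actual ORC writer change): each stream starts small and grows on demand, never past the existing configurable buffer size (256K by default):
{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch only; the real change lives in the ORC writer's output streams.
public class GrowableStreamBuffer {
  private static final int INITIAL_SIZE = 4 * 1024;
  private final int maxSize;   // the existing configurable buffer size, e.g. 256K
  private ByteBuffer buffer;

  public GrowableStreamBuffer(int maxSize) {
    this.maxSize = maxSize;
    this.buffer = ByteBuffer.allocate(Math.min(INITIAL_SIZE, maxSize));
  }

  public void write(byte[] data, int offset, int length) {
    if (buffer.remaining() < length) {
      int needed = buffer.position() + length;
      if (needed > maxSize) {
        // a real stream would flush/spill here; the sketch just refuses to grow past the cap
        throw new IllegalStateException("write would exceed the configured buffer cap");
      }
      // grow geometrically until the pending write fits, but never beyond the cap
      int newCapacity = Math.min(Math.max(buffer.capacity() * 2, needed), maxSize);
      ByteBuffer grown = ByteBuffer.allocate(newCapacity);
      buffer.flip();
      grown.put(buffer);
      buffer = grown;
    }
    buffer.put(data, offset, length);
  }
}
{code}
Most streams stay at the small initial size, which is where the memory saving comes from.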







[jira] [Created] (HIVE-9573) Lazy load partitions for SELECT LIMIT type query

2015-02-03 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-9573:
--

 Summary: Lazy load partitions for SELECT LIMIT type query
 Key: HIVE-9573
 URL: https://issues.apache.org/jira/browse/HIVE-9573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Selina Zhang
Assignee: Selina Zhang


Some tools, such as HUE, use

SELECT * FROM table LIMIT 100;

to grab sample content from a table. For tables with many partitions, this causes a
large number of partition objects to be loaded, which slows down HS2 and even causes
OOM from time to time.

My solution is to lazily load partition objects in FetchOperator for this type of
query. We can skip retrieving the full partition objects and fetch only the partition
names in PartitionPruner, then lazily load the partition objects when they are needed
at execution time (for local jobs only).

I have a patch ready, but I want to hear more suggestions. Thanks!
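
A minimal sketch of the lazy-loading idea under discussion (a generic helper, not the actual FetchOperator change): keep only partition names up front and fetch the full partition object the first time it is needed; the loader below stands in for the metastore call.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: P stands in for the metastore Partition object.
public class LazyPartitionFetcher<P> {
  private final Function<String, P> loader;       // e.g. partition name -> metastore lookup
  private final Map<String, P> loaded = new HashMap<>();

  public LazyPartitionFetcher(Function<String, P> loader) {
    this.loader = loader;
  }

  /** Loads the partition object on first access and caches it afterwards. */
  public P get(String partitionName) {
    return loaded.computeIfAbsent(partitionName, loader);
  }
}
{code}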






[jira] [Commented] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition

2015-01-26 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292449#comment-14292449
 ] 

Selina Zhang commented on HIVE-9447:


The unit test failures seem unrelated to this patch.

 Metastore: inefficient Oracle query for removing unused column descriptors 
 when add/drop table/partition
 

 Key: HIVE-9447
 URL: https://issues.apache.org/jira/browse/HIVE-9447
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-9447.1.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Metastore needs to remove unused column descriptors when partitions or tables are
 dropped/added. To query for the unused column descriptor, the current implementation
 uses DataNucleus' range function, which is basically equivalent to LIMIT syntax.
 However, Oracle does not support LIMIT, so the query is converted to
 {quote}
 SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
 NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
 A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
 WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;
 {quote}
 Given that CD_ID is not very selective, this query may have to access a large number
 of rows (depending on how many partitions the table has; millions of rows in our
 case). The Metastore may become unresponsive because of this.
 Since the Metastore only needs to know whether the specific CD_ID is referenced in
 the SDS table, and does not need to access the whole row, we can use
 {quote}
 select count(1) from SDS where SDS.CD_ID = ?
 {quote}
 CD_ID is an indexed column, so the above query will do an index range scan, which is
 faster.
 For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist.
 However, the new query does not hurt them either.





[jira] [Created] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition

2015-01-22 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-9447:
--

 Summary: Metastore: inefficient Oracle query for removing unused 
column descriptors when add/drop table/partition
 Key: HIVE-9447
 URL: https://issues.apache.org/jira/browse/HIVE-9447
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Selina Zhang
Assignee: Selina Zhang


Metastore needs to remove unused column descriptors when partitions or tables are
dropped/added. To query for the unused column descriptor, the current implementation
uses DataNucleus' range function, which is basically equivalent to LIMIT syntax.
However, Oracle does not support LIMIT, so the query is converted to
{quote}
SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;
{quote}
Given that CD_ID is not very selective, this query may have to access a large number
of rows (depending on how many partitions the table has; millions of rows in our
case). The Metastore may become unresponsive because of this.

Since the Metastore only needs to know whether the specific CD_ID is referenced in the
SDS table, and does not need to access the whole row, we can use
{quote}
select count(1) from SDS where SDS.CD_ID = ?
{quote}
CD_ID is an indexed column, so the above query will do an index range scan, which is
faster.

For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist.
However, the new query does not hurt them either.
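
For illustration, a hedged JDBC sketch of the existence check described above (the helper name is hypothetical; the real change is in the Metastore's direct-SQL/DataNucleus layer):
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not the actual patch.
public class ColumnDescriptorUsage {
  /** Returns true when at least one SDS row still references the column descriptor. */
  public static boolean isReferenced(Connection conn, long cdId) throws SQLException {
    try (PreparedStatement ps =
             conn.prepareStatement("select count(1) from SDS where SDS.CD_ID = ?")) {
      ps.setLong(1, cdId);
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next() && rs.getLong(1) > 0;
      }
    }
  }
}
{code}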









[jira] [Work started] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition

2015-01-22 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-9447 started by Selina Zhang.
--
 Metastore: inefficient Oracle query for removing unused column descriptors 
 when add/drop table/partition
 

 Key: HIVE-9447
 URL: https://issues.apache.org/jira/browse/HIVE-9447
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Selina Zhang
Assignee: Selina Zhang
   Original Estimate: 3h
  Remaining Estimate: 3h

 Metastore needs to remove unused column descriptors when partitions or tables are
 dropped/added. To query for the unused column descriptor, the current implementation
 uses DataNucleus' range function, which is basically equivalent to LIMIT syntax.
 However, Oracle does not support LIMIT, so the query is converted to
 {quote}
 SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
 NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
 A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
 WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;
 {quote}
 Given that CD_ID is not very selective, this query may have to access a large number
 of rows (depending on how many partitions the table has; millions of rows in our
 case). The Metastore may become unresponsive because of this.
 Since the Metastore only needs to know whether the specific CD_ID is referenced in
 the SDS table, and does not need to access the whole row, we can use
 {quote}
 select count(1) from SDS where SDS.CD_ID = ?
 {quote}
 CD_ID is an indexed column, so the above query will do an index range scan, which is
 faster.
 For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist.
 However, the new query does not hurt them either.





[jira] [Updated] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition

2015-01-22 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-9447:
---
Affects Version/s: 0.14.0
   Status: Patch Available  (was: In Progress)

 Metastore: inefficient Oracle query for removing unused column descriptors 
 when add/drop table/partition
 

 Key: HIVE-9447
 URL: https://issues.apache.org/jira/browse/HIVE-9447
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-9447.1.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Metastore needs to remove unused column descriptors when partitions or tables are
 dropped/added. To query for the unused column descriptor, the current implementation
 uses DataNucleus' range function, which is basically equivalent to LIMIT syntax.
 However, Oracle does not support LIMIT, so the query is converted to
 {quote}
 SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
 NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
 A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
 WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;
 {quote}
 Given that CD_ID is not very selective, this query may have to access a large number
 of rows (depending on how many partitions the table has; millions of rows in our
 case). The Metastore may become unresponsive because of this.
 Since the Metastore only needs to know whether the specific CD_ID is referenced in
 the SDS table, and does not need to access the whole row, we can use
 {quote}
 select count(1) from SDS where SDS.CD_ID = ?
 {quote}
 CD_ID is an indexed column, so the above query will do an index range scan, which is
 faster.
 For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist.
 However, the new query does not hurt them either.





[jira] [Updated] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition

2015-01-22 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-9447:
---
Attachment: HIVE-9447.1.patch

 Metastore: inefficient Oracle query for removing unused column descriptors 
 when add/drop table/partition
 

 Key: HIVE-9447
 URL: https://issues.apache.org/jira/browse/HIVE-9447
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-9447.1.patch

   Original Estimate: 3h
  Remaining Estimate: 3h

 Metastore needs to remove unused column descriptors when partitions or tables are
 dropped/added. To query for the unused column descriptor, the current implementation
 uses DataNucleus' range function, which is basically equivalent to LIMIT syntax.
 However, Oracle does not support LIMIT, so the query is converted to
 {quote}
 SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
 NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
 A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
 WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;
 {quote}
 Given that CD_ID is not very selective, this query may have to access a large number
 of rows (depending on how many partitions the table has; millions of rows in our
 case). The Metastore may become unresponsive because of this.
 Since the Metastore only needs to know whether the specific CD_ID is referenced in
 the SDS table, and does not need to access the whole row, we can use
 {quote}
 select count(1) from SDS where SDS.CD_ID = ?
 {quote}
 CD_ID is an indexed column, so the above query will do an index range scan, which is
 faster.
 For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist.
 However, the new query does not hurt them either.





[jira] [Created] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-8293:
--

 Summary: Metastore direct SQL failed for Oracle because ORA-01722: 
invalid number
 Key: HIVE-8293
 URL: https://issues.apache.org/jira/browse/HIVE-8293
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang


The direct SQL route of retrieving partition objects through filters fails for
Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the
PARTITION_KEY_VALS table to decimal before evaluating the condition.

Here is the stack trace:
{quote}
2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when 
TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL 
as decimal(21,0)) else null end) > ?)).
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
{quote}





[jira] [Commented] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152087#comment-14152087
 ] 

Selina Zhang commented on HIVE-8293:


It is easy to reproduce:
{code}
hive> create table a (col string) partitioned by (dt string);
hive> create table b (col string) partitioned by (idx int);
hive> alter table a add partition(dt='20140808');
hive> alter table b add partition(idx=50);
hive> select * from b where idx > 10;
{code}

 Metastore direct SQL failed for Oracle because ORA-01722: invalid number
 

 Key: HIVE-8293
 URL: https://issues.apache.org/jira/browse/HIVE-8293
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang

 The direct SQL route of retrieving partition objects through filters fails for
 Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the
 PARTITION_KEY_VALS table to decimal before evaluating the condition.
 Here is the stack trace:
 {quote}
 2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
 PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
 join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
 join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
 PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when 
 TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL 
 as decimal(21,0)) else null end) > ?)).
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 NestedThrowablesStackTrace:
 

[jira] [Work started] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8293 started by Selina Zhang.
--
 Metastore direct SQL failed for Oracle because ORA-01722: invalid number
 

 Key: HIVE-8293
 URL: https://issues.apache.org/jira/browse/HIVE-8293
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-8293.1.patch


 The direct SQL route of retrieving partition objects through filters fails for
 Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the
 PARTITION_KEY_VALS table to decimal before evaluating the condition.
 Here is the stack trace:
 {quote}
 2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
 PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
 join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
 join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
 PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when 
 TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL 
 as decimal(21,0)) else null end) > ?)).
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 NestedThrowablesStackTrace:
 java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
 {quote}





[jira] [Updated] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8293:
---
Attachment: HIVE-8293.1.patch

Use a workaround similar to DERBY-6358: add the projected column to the case-when.
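
For illustration only, a sketch of the shape of that rewrite (shown as Java string constants, since the filter SQL is generated in MetaStoreDirectSql); the exact guard columns in the committed patch may differ:
{code:java}
// Hypothetical illustration of the rewrite described above, not the actual patch.
public class DecimalCastGuardExample {
  // Before: Oracle may evaluate the decimal cast on PART_KEY_VAL rows that do not
  // belong to the filtered partition key, raising ORA-01722 for non-numeric values.
  static final String BEFORE =
      "(case when TBLS.TBL_NAME = ? and DBS.NAME = ? "
      + "then cast(FILTER0.PART_KEY_VAL as decimal(21,0)) else null end) > ?";

  // After: the projected/joined FILTER0 columns are repeated inside the guard, so the
  // cast is only attempted for rows of the intended partition key.
  static final String AFTER =
      "(case when TBLS.TBL_NAME = ? and DBS.NAME = ? "
      + "and FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 "
      + "then cast(FILTER0.PART_KEY_VAL as decimal(21,0)) else null end) > ?";
}
{code}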

 Metastore direct SQL failed for Oracle because ORA-01722: invalid number
 

 Key: HIVE-8293
 URL: https://issues.apache.org/jira/browse/HIVE-8293
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-8293.1.patch


 The direct SQL route of retrieving partition objects through filters fails for
 Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the
 PARTITION_KEY_VALS table to decimal before evaluating the condition.
 Here is the stack trace:
 {quote}
 2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
 PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
 join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
 join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
 PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when 
 TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL 
 as decimal(21,0)) else null end) > ?)).
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 NestedThrowablesStackTrace:
 java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
 {quote}





[jira] [Updated] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8293:
---
Status: Patch Available  (was: In Progress)

 Metastore direct SQL failed for Oracle because ORA-01722: invalid number
 

 Key: HIVE-8293
 URL: https://issues.apache.org/jira/browse/HIVE-8293
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-8293.1.patch


 The direct SQL route of retrieving partition objects through filters fails for
 Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the
 PARTITION_KEY_VALS table to decimal before evaluating the condition.
 Here is the stack trace:
 {quote}
 2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
 PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
 join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
 join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
 PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when 
 TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL 
 as decimal(21,0)) else null end) > ?)).
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 NestedThrowablesStackTrace:
 java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
 {quote}





[jira] [Created] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-8295:
--

 Summary: Add batch retrieve partition objects for metastore direct 
sql 
 Key: HIVE-8295
 URL: https://issues.apache.org/jira/browse/HIVE-8295
 Project: Hive
  Issue Type: Bug
 Environment: Currently in MetaStoreDirectSql, partition objects are constructed
by fetching the partition ids first. However, if the number of partition ids that
match the filter is larger than 1000, direct SQL will fail with the following stack
trace:

{quote}
2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms + 
13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID  
and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
where ((FILTER2.PART_KEY_VAL = ?))]
2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, 
SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, 
SERDES.SLIB from PARTITIONS  left outer join SDS on 
PARTITIONS.SD_ID = SDS.SD_ID   left outer join SERDES on 
SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
(136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
) order by PART_NAME asc.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in 

[jira] [Updated] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8295:
---
Description: 
Currently in MetaStoreDirectSql, partition objects are constructed by fetching the
partition ids first. However, if the number of partition ids that match the filter is
larger than 1000, direct SQL will fail with the following stack trace:

{quote}
2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms + 
13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID  
and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
where ((FILTER2.PART_KEY_VAL = ?))]
2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, 
SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, 
SERDES.SLIB from PARTITIONS  left outer join SDS on 
PARTITIONS.SD_ID = SDS.SD_ID   left outer join SERDES on 
SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
(136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
) order by PART_NAME asc.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a 
list is 1000
{quote}

Retrieving partition objects in batches for direct SQL will solve this Oracle-specific
problem.
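
A minimal sketch of the batching idea (an illustrative helper, not the actual patch): split the PART_ID list into chunks below Oracle's 1000-expression limit and issue one IN-list query per chunk:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only.
public class PartitionIdBatcher {
  private static final int MAX_IN_LIST = 1000;   // ORA-01795 limit

  /** Splits the id list into sublists of at most MAX_IN_LIST elements. */
  public static List<List<Long>> toBatches(List<Long> partitionIds) {
    List<List<Long>> batches = new ArrayList<>();
    for (int i = 0; i < partitionIds.size(); i += MAX_IN_LIST) {
      batches.add(partitionIds.subList(i, Math.min(i + MAX_IN_LIST, partitionIds.size())));
    }
    return batches;
  }
}
{code}
The results of the per-batch queries are then concatenated to build the final partition list.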

[jira] [Updated] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8295:
---
Environment: (was: Currently in MetaStoreDirectSql, partition objects are
constructed by fetching the partition ids first. However, if the number of partition
ids that match the filter is larger than 1000, direct SQL will fail with the following
stack trace:

{quote}
2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms + 
13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID  
and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
where ((FILTER2.PART_KEY_VAL = ?))]
2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, 
SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, 
SERDES.SLIB from PARTITIONS  left outer join SDS on 
PARTITIONS.SD_ID = SDS.SD_ID   left outer join SERDES on 
SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
(136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
) order by PART_NAME asc.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a 
list is 1000
{quote}

Retrieving partition objects in batches for direct SQL will solve this Oracle-specific 
problem. 

[jira] [Updated] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8295:
---
Description: 
Currently, MetastoreDirectSql constructs partition objects by fetching the 
partition ids first. However, if more than 1000 partition ids match the filter, 
direct SQL will fail with the following stack trace:

{code}
2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms + 
13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID  
and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
where ((FILTER2.PART_KEY_VAL = ?))]
2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, 
SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, 
SERDES.SLIB from PARTITIONS  left outer join SDS on 
PARTITIONS.SD_ID = SDS.SD_ID   left outer join SERDES on 
SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
(136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
) order by PART_NAME asc.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a 
list is 1000
{code}

Retrieving partition objects in batches for direct SQL will solve this Oracle-specific 
problem. 

[jira] [Updated] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8293:
---
Description: 
The direct SQL route for retrieving partition objects through filters fails on 
Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the 
PARTITION_KEY_VALS table to decimal before evaluating the condition. 

Here is the stack trace:
{code}
2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when 
TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL 
as decimal(21,0)) else null end)  ?)).
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at 
org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
{code}

  was:
The direct SQL route for retrieving partition objects through filters fails on 
Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the 
PARTITION_KEY_VALS table to decimal before evaluating the condition. 

Here is the stack trace:
{quote}
2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore 
(ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back 
to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select 
PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ?   inner 
join DBS on TBLS.DB_ID = DBS.DB_ID  and DBS.NAME = ? inner 
join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = 
PARTITIONS.PART_ID and 

[jira] [Work started] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8295 started by Selina Zhang.
--
 Add batch retrieve partition objects for metastore direct sql 
 --

 Key: HIVE-8295
 URL: https://issues.apache.org/jira/browse/HIVE-8295
 Project: Hive
  Issue Type: Bug
Reporter: Selina Zhang
Assignee: Selina Zhang

 Currently, MetastoreDirectSql constructs partition objects by fetching the 
 partition ids first. However, if more than 1000 partition ids match the filter, 
 direct SQL will fail with the following stack trace:
 {code}
 2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
 (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms 
 + 13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
 inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
 TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID
   and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
 FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
 where ((FILTER2.PART_KEY_VAL = ?))]
 2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
 PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
 SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, 
 SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, 
 SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS  
 left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID   left outer 
 join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
 (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
 ) order by PART_NAME asc.
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
 at 
 

[jira] [Updated] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8295:
---
Attachment: HIVE-8295.1.patch

In this patch, I did not add a configuration parameter to control the batch size, 
since 1000 is a hard limit and not configurable. 
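For illustration only, here is a minimal sketch of the batching idea (not the 
MetaStoreDirectSql code from the patch; the class and method names are made up): 
split the matching partition ids into chunks of at most 1000 before building each 
PART_ID in (...) clause, so the generated SQL stays under the ORA-01795 limit.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the batching idea; not taken from HIVE-8295.1.patch.
public class PartitionIdBatcher {

    // Oracle allows at most 1000 expressions in an IN list (ORA-01795).
    private static final int MAX_IN_LIST = 1000;

    // Split the matching partition ids into consecutive chunks of at most 1000.
    public static List<List<Long>> toBatches(List<Long> partitionIds) {
        List<List<Long>> batches = new ArrayList<List<Long>>();
        for (int start = 0; start < partitionIds.size(); start += MAX_IN_LIST) {
            int end = Math.min(start + MAX_IN_LIST, partitionIds.size());
            batches.add(partitionIds.subList(start, end));
        }
        return batches;
    }

    // Build one "PART_ID in (id, id, ...)" fragment for a single batch.
    public static String toInClause(List<Long> batch) {
        StringBuilder sb = new StringBuilder("PART_ID in (");
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(batch.get(i));
        }
        return sb.append(')').toString();
    }
}
{code}

Running the partition-object query once per batch and concatenating the results 
keeps the original behavior while every IN list stays at or below 1000 entries.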


 Add batch retrieve partition objects for metastore direct sql 
 --

 Key: HIVE-8295
 URL: https://issues.apache.org/jira/browse/HIVE-8295
 Project: Hive
  Issue Type: Bug
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-8295.1.patch


 Currently, MetastoreDirectSql constructs partition objects by fetching the 
 partition ids first. However, if more than 1000 partition ids match the filter, 
 direct SQL will fail with the following stack trace:
 {code}
 2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
 (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms 
 + 13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
 inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
 TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID
   and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
 FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
 where ((FILTER2.PART_KEY_VAL = ?))]
 2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
 PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
 SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, 
 SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, 
 SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS  
 left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID   left outer 
 join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
 (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
 ) order by PART_NAME asc.
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 

[jira] [Updated] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql

2014-09-29 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-8295:
---
Status: Patch Available  (was: In Progress)

 Add batch retrieve partition objects for metastore direct sql 
 --

 Key: HIVE-8295
 URL: https://issues.apache.org/jira/browse/HIVE-8295
 Project: Hive
  Issue Type: Bug
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-8295.1.patch


 Currently, MetastoreDirectSql constructs partition objects by fetching the 
 partition ids first. However, if more than 1000 partition ids match the filter, 
 direct SQL will fail with the following stack trace:
 {code}
 2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql 
 (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms 
 + 13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS  
 inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and 
 TBLS.TBL_NAME = ?   inner join DBS on TBLS.DB_ID = DBS.DB_ID
   and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on 
 FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 
 where ((FILTER2.PART_KEY_VAL = ?))]
 2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, 
 PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, 
 SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, 
 SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, 
 SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS  
 left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID   left outer 
 join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in 
 (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217...
 ) order by PART_NAME asc.
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
 at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
 at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
 at 
 

[jira] [Commented] (HIVE-7368) datanucleus sometimes returns an empty result instead of an error or data

2014-09-09 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127931#comment-14127931
 ] 

Selina Zhang commented on HIVE-7368:


Hi Sush,

My feeling about the root cause is probably not the same as yours. I just want to 
offer another possibility; please correct me if I am wrong. 

Existing databases (tables) were reported as non-existent: this happens because the 
connection to the db (MySQL/Oracle) is bounced back when the connection pool is 
small and the thread waiting time is too short. Currently this internal error is 
mistakenly cast to NoSuchObjectException; we have to fix the misleading error 
message. 

Parallel execution: the metastore usually holds connections for a very long time 
because many drop/add/alter operations involve HDFS operations, and sometimes the 
table stats are also collected during that window. Connections to the db are not 
shared by the metastore clients, so the best practice for parallelism is to 
increase the size of the connection pool (DBCP, for example). The db load is not 
heavy at all, so we can use the concurrency of the existing RDBMS. DirectSQL 
get_database will hold a connection for much less time than ORM get_database, so 
the connection shortage problem may not be as obvious. 

I think datanucleus.connectionPool.testSQL=SELECT 1 is the validation query DBCP 
uses to check the underlying connection to the RDBMS. With it set, DBCP guarantees 
that each connection borrowed from the pool is valid. 
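Purely as an illustration of the two settings discussed above (the pool-size 
property name and both values are examples, not taken from any patch; in practice 
they would normally live in hive-site.xml), a minimal sketch:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical example of the connection-pool settings discussed above.
public class MetastorePoolSettings {
    public static void apply(Configuration conf) {
        // Validation query so DBCP checks a connection before handing it out.
        conf.set("datanucleus.connectionPool.testSQL", "SELECT 1");
        // Assumed DataNucleus/DBCP property name: enlarge the pool so concurrent
        // metastore clients are less likely to be bounced while long drop/add/alter
        // operations hold connections.
        conf.set("datanucleus.connectionPool.maxPoolSize", "50");
    }
}
{code}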

Thanks,


 datanucleus sometimes returns an empty result instead of an error or data
 -

 Key: HIVE-7368
 URL: https://issues.apache.org/jira/browse/HIVE-7368
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Sushanth Sowmyan

 I investigated a scenario wherein a user needed to use a large number of 
 concurrent hive clients doing simple DDL tasks, while not using a standalone 
 metastore server. Say, for eg., each of them doing drop table if exists 
 tmp_blah_${i};
 This would consistently fail stating that it could not create a db, which is 
 a funny error to have when trying to drop a db if exists. On digging in, it 
 turned out that the error was a mistaken report, coming instead from an 
 attempt by the embedded metastore attempting to create a default db when it 
 did not exist. The funny thing being that the default db did exist, and the 
 getDatabase call would return empty, rather than returning an error or 
 returning a result. We could disable hive.metastore.checkForDefaultDb and the 
 number of these reports would drastically fall, but that only moved the 
 problem, and we'd get phantom reports from time to time of various other 
 databases that existed that were being reported as non-existent.
 On digging further, parallelism seemed to be an important factor in whether 
 or not hive was able to perform getDatabases without error. With about 20 
 simultaneous processes, there seemed to be no errors whatsoever. At about 40 
 simultaneous processes, at least 1 would consistently fail. At about 200, 
 about 15-20 would consistently fail, in addition to taking a long time to run.
 I wrote a sample JDBC ping (actually a get_database mimic) utility to see 
 whether the issue was with connecting from that machine to the database 
 server, and this had no errors whatsoever up to 400 simultaneous processes. 
 The mysql server in question was configured to serve up to 650 connections, 
 and it seemed to be serving responses quickly and did not seem overloaded. We 
 also disabled connection pooling in case that was exacerbating a connection 
 availability issue with that many concurrent processes, each with an embedded 
 metastore. That, especially in conjunction with disabling schema checking, 
 and specifying a datanucleus.connectionPool.testSQL=SELECT 1 did a fair 
 amount for performance in this scenario, but the errors (or rather, the 
 null-result-successes when there shouldn't have been one) continued.
 On checking through hive again, if we modified hive to have datanucleus 
 simply return a connection, with which we did a direct sql get database, 
 there would not be any error, but if we tried to use jdo on datanucleus to 
 construct a db object, we would get an empty result, so the issue seems to 
 crop up in the jdo mapping.
 One of the biggest issues with this investigation, for me, was the difficulty 
 of reproducibility. When trying to reproduce in a lab, we were unable to 
 create a similar enough environment that caused the issue. Even in the 
 client's environment, moving from RHEL5 to RHEL6 made the issue go away.
 Thus, we still have work to do on determining the underlying issue, I'm 
 logging this issue to 

[jira] [Commented] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-29 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115538#comment-14115538
 ] 

Selina Zhang commented on HIVE-7803:


The test failures do not seem related to this patch; I saw the same failures for 
HIVE-7890. 

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical
 Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch


 One of our users reports intermittent failures due to attempt directories in the 
 input paths. We found that, with speculative execution turned on, two mappers 
 tried to commit their task at the same time using the same committed task path, 
 which corrupted the output directory. 
 The original Pig script:
 {code}
 STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();
 {code}
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were pointed to some HCat code that 
 does not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 {code}
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
 entry : baseDynamicCommitters.entrySet()) {
 org.apache.hadoop.mapred.TaskAttemptContext currContext = 
 dynamicContexts.get(entry.getKey());
 OutputCommitter baseOutputCommitter = entry.getValue();
 if (baseOutputCommitter.needsTaskCommit(currContext)) {
 baseOutputCommitter.commitTask(currContext);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-28 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-7803:
---

Attachment: HIVE-7803.2.patch

You are right, I forgot to set it to true in 
FileOutputCommitterContainer.needsTaskCommit(). I have updated the patch, and in 
it I also removed some lines we no longer need. 
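As a hedged illustration only (this is not the actual HIVE-7803 patch, and the 
wrapper class shown here is made up), returning true from needsTaskCommit() lets 
the framework decide which single successful attempt gets committed, instead of 
every attempt committing itself from the record writer's close():

{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical sketch: a committer wrapper that always asks the framework to drive
// the task commit, delegating the actual work to the wrapped base committer.
public class NeedsTaskCommitSketch extends OutputCommitter {
    private final OutputCommitter base;

    public NeedsTaskCommitSketch(OutputCommitter base) {
        this.base = base;
    }

    @Override
    public boolean needsTaskCommit(TaskAttemptContext ctx) throws IOException {
        return true; // defer the commit decision to the framework
    }

    @Override
    public void commitTask(TaskAttemptContext ctx) throws IOException {
        base.commitTask(ctx);
    }

    @Override
    public void abortTask(TaskAttemptContext ctx) throws IOException {
        base.abortTask(ctx); // clean up the attempt directory of killed/failed attempts
    }

    @Override
    public void setupJob(JobContext ctx) throws IOException {
        base.setupJob(ctx);
    }

    @Override
    public void setupTask(TaskAttemptContext ctx) throws IOException {
        base.setupTask(ctx);
    }
}
{code}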

Thanks for reviewing this!

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical
 Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch


 One of our users reports intermittent failures due to attempt directories in the 
 input paths. We found that, with speculative execution turned on, two mappers 
 tried to commit their task at the same time using the same committed task path, 
 which corrupted the output directory. 
 The original Pig script:
 {code}
 STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();
 {code}
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were pointed to some HCat code that 
 does not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 {code}
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
 entry : baseDynamicCommitters.entrySet()) {
 org.apache.hadoop.mapred.TaskAttemptContext currContext = 
 dynamicContexts.get(entry.getKey());
 OutputCommitter baseOutputCommitter = entry.getValue();
 if (baseOutputCommitter.needsTaskCommit(currContext)) {
 baseOutputCommitter.commitTask(currContext);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work stopped] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-28 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7803 stopped by Selina Zhang.

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical
 Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch


 One of our users reports intermittent failures due to attempt directories in the 
 input paths. We found that, with speculative execution turned on, two mappers 
 tried to commit their task at the same time using the same committed task path, 
 which corrupted the output directory. 
 The original Pig script:
 {code}
 STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();
 {code}
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were pointed to some HCat code that 
 does not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 {code}
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
 entry : baseDynamicCommitters.entrySet()) {
 org.apache.hadoop.mapred.TaskAttemptContext currContext = 
 dynamicContexts.get(entry.getKey());
 OutputCommitter baseOutputCommitter = entry.getValue();
 if (baseOutputCommitter.needsTaskCommit(currContext)) {
 baseOutputCommitter.commitTask(currContext);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-28 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7803 started by Selina Zhang.

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical
 Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch


 One of our users reports intermittent failures due to attempt directories in the 
 input paths. We found that, with speculative execution turned on, two mappers 
 tried to commit their task at the same time using the same committed task path, 
 which corrupted the output directory. 
 The original Pig script:
 {code}
 STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();
 {code}
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were pointed to some HCat code that 
 does not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 {code}
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
 entry : baseDynamicCommitters.entrySet()) {
 org.apache.hadoop.mapred.TaskAttemptContext currContext = 
 dynamicContexts.get(entry.getKey());
 OutputCommitter baseOutputCommitter = entry.getValue();
 if (baseOutputCommitter.needsTaskCommit(currContext)) {
 baseOutputCommitter.commitTask(currContext);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-28 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-7803:
---

Status: Patch Available  (was: In Progress)

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical
 Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch


 One of our users reports intermittent failures due to attempt directories in the 
 input paths. We found that, with speculative execution turned on, two mappers 
 tried to commit their task at the same time using the same committed task path, 
 which corrupted the output directory. 
 The original Pig script:
 {code}
 STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();
 {code}
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were pointed to some HCat code that 
 does not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 {code}
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
 entry : baseDynamicCommitters.entrySet()) {
 org.apache.hadoop.mapred.TaskAttemptContext currContext = 
 dynamicContexts.get(entry.getKey());
 OutputCommitter baseOutputCommitter = entry.getValue();
 if (baseOutputCommitter.needsTaskCommit(currContext)) {
 baseOutputCommitter.commitTask(currContext);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-20 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-7803:
--

 Summary: Enable Hadoop speculative execution may cause corrupt 
output directory (dynamic partition)
 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 

Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical


One of our users reports intermittent failures due to attempt directories in the 
input paths. We found that, with speculative execution turned on, two mappers 
tried to commit their task at the same time using the same committed task path, 
which corrupted the output directory. 

The original Pig script:
(STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
USING org.apache.hcatalog.pig.HCatStorer();)

Two mappers
attempt_1405021984947_5394024_m_000523_0: KILLED
attempt_1405021984947_5394024_m_000523_1: SUCCEEDED

attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.

As a result, it created corrupt directory as 
  
/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
containing 
   part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
and 
   attempt_1405021984947_5394024_m_000523_1/part-m-00523

Namenode Audit log
==
1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
cmd=create 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
 dst=null  perm=user:group:rw-r-

2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
cmd=create 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
 dst=null  perm=user:group:rw-r-

3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
cmd=rename 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
perm=user:group:rwxr-x---

4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
cmd=rename 
src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
perm=user:group:rwxr-x---

After consulting our Hadoop core team, we were pointed to some HCat code that does 
not participate in the two-phase commit protocol, for example in 
FileRecordWriterContainer.close():

for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
entry : baseDynamicCommitters.entrySet()) {
org.apache.hadoop.mapred.TaskAttemptContext currContext =
dynamicContexts.get(entry.getKey());
OutputCommitter baseOutputCommitter = entry.getValue();
if (baseOutputCommitter.needsTaskCommit(currContext)) {
baseOutputCommitter.commitTask(currContext);
}
}





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-20 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7803 started by Selina Zhang.

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical

 One of our users reports intermittent failures due to attempt directories in the 
 input paths. We found that, with speculative execution turned on, two mappers 
 tried to commit their task at the same time using the same committed task path, 
 which corrupted the output directory. 
 The original Pig script:
 (STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();)
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were told that some HCat code does 
 not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
     entry : baseDynamicCommitters.entrySet()) {
   org.apache.hadoop.mapred.TaskAttemptContext currContext =
       dynamicContexts.get(entry.getKey());
   OutputCommitter baseOutputCommitter = entry.getValue();
   if (baseOutputCommitter.needsTaskCommit(currContext)) {
     baseOutputCommitter.commitTask(currContext);
   }
 }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)

2014-08-20 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-7803:
---

Attachment: HIVE-7803.1.patch

Just realized the dynamic partition creation part was refactored into 
DynamicPartitionFileWriterContainer.java.

Moved the commit logic for each generated dynamic partition directory to 
FileOutputCommitterContainer.java. Also added cleanup code to remove the 
attempt directories when the task is failed or abandoned. 
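
For illustration only (this is not the attached patch), here is a minimal sketch of the 
abort-side counterpart to the FileRecordWriterContainer.close() loop shown in the issue 
description. It reuses the same map names; the wrapper method itself is hypothetical:

{code}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.mapred.OutputCommitter;
import org.apache.hadoop.mapred.TaskAttemptContext;

public class DynamicPartitionCleanupSketch {
  /**
   * Abort every per-partition base committer so a failed or killed attempt
   * removes its own attempt directory instead of leaving it behind for a
   * speculative sibling attempt to collide with.
   */
  static void abortDynamicPartitionWriters(
      Map<String, OutputCommitter> baseDynamicCommitters,
      Map<String, TaskAttemptContext> dynamicContexts) throws IOException {
    for (Map.Entry<String, OutputCommitter> entry : baseDynamicCommitters.entrySet()) {
      TaskAttemptContext currContext = dynamicContexts.get(entry.getKey());
      // abortTask discards the attempt's _temporary output rather than promoting it
      entry.getValue().abortTask(currContext);
    }
  }
}
{code}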

 Enable Hadoop speculative execution may cause corrupt output directory 
 (dynamic partition)
 --

 Key: HIVE-7803
 URL: https://issues.apache.org/jira/browse/HIVE-7803
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
 Environment: 
Reporter: Selina Zhang
Assignee: Selina Zhang
Priority: Critical
 Attachments: HIVE-7803.1.patch


 One of our users reports they see intermittent failures due to attempt 
 directories in the input paths. We found that with speculative execution turned 
 on, two mappers tried to commit the task at the same time using the same 
 committed task path, which causes a corrupt output directory. 
 The original Pig script:
 (STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME'
 USING org.apache.hcatalog.pig.HCatStorer();)
 Two mappers
 attempt_1405021984947_5394024_m_000523_0: KILLED
 attempt_1405021984947_5394024_m_000523_1: SUCCEEDED
 attempt_1405021984947_5394024_m_000523_0 was killed right after the commit.
 As a result, it created corrupt directory as 
   
 /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/
 containing 
part-m-00523 (from attempt_1405021984947_5394024_m_000523_0)
 and 
attempt_1405021984947_5394024_m_000523_1/part-m-00523
 Namenode Audit log
 ==
 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523
  dst=null  perm=user:group:rw-r-
 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=create 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523
  dst=null  perm=user:group:rw-r-
 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2  
 cmd=rename 
 src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1
 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523
 perm=user:group:rwxr-x---
 After consulting our Hadoop core team, we were told that some HCat code does 
 not participate in the two-phase commit protocol, for example in 
 FileRecordWriterContainer.close():
 for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter>
     entry : baseDynamicCommitters.entrySet()) {
   org.apache.hadoop.mapred.TaskAttemptContext currContext =
       dynamicContexts.get(entry.getKey());
   OutputCommitter baseOutputCommitter = entry.getValue();
   if (baseOutputCommitter.needsTaskCommit(currContext)) {
     baseOutputCommitter.commitTask(currContext);
   }
 }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050571#comment-14050571
 ] 

Selina Zhang commented on HIVE-7090:


Just wondering how transactions would work with temp tables if we plan to add transaction 
management on top of them, since the metadata is stored on the client side. 

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see sql scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left over tables in 
 meta-store or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050726#comment-14050726
 ] 

Selina Zhang commented on HIVE-7090:


I mean, if we want to support ROLLBACK/COMMIT in the next release, how do we 
roll back the changes for a temp table? It seems the server has no clue where 
the data location is. 

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see sql scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left over tables in 
 meta-store or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050903#comment-14050903
 ] 

Selina Zhang commented on HIVE-7090:


Thanks for the explanation! It will work. Somehow I had the wrong impression that ACID 
needs to store additional information along with the data location. 

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see sql scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left over tables in 
 meta-store or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7195) Improve Metastore performance

2014-06-13 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031289#comment-14031289
 ] 

Selina Zhang commented on HIVE-7195:


[~sershe] I added the link for the drop table JIRA. For the idea of a stateless 
iterator, the Metastore has to be database-aware: Oracle does not support LIMIT 
and instead uses the ROWNUM pseudo-column. 
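
As a concrete illustration of that database-awareness (a sketch only, not ObjectStore 
code; the table and column names follow the metastore schema used elsewhere in this 
thread, and the dialect flag and page size are assumptions):

{code}
public class PartitionPageQuerySketch {
  public static void main(String[] args) {
    // Illustrative only: the same "first N partition names" page needs a
    // LIMIT clause on MySQL/Postgres but a ROWNUM predicate on Oracle.
    boolean isOracle = args.length > 0 && "oracle".equalsIgnoreCase(args[0]);
    int pageSize = 1000;
    String base = "select PARTITIONS.PART_NAME from PARTITIONS"
        + " inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID"
        + " where TBLS.TBL_NAME = ?";
    // Oracle has no LIMIT clause; the equivalent page is expressed with ROWNUM.
    String queryText = isOracle
        ? "select * from (" + base + ") where ROWNUM <= " + pageSize
        : base + " LIMIT " + pageSize;
    System.out.println(queryText);
  }
}
{code}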

 Improve Metastore performance
 -

 Key: HIVE-7195
 URL: https://issues.apache.org/jira/browse/HIVE-7195
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Priority: Critical

 Even with direct SQL, which significantly improves MS performance, some 
 operations take a considerable amount of time, when there are many partitions 
 on table. Specifically I believe the issue:
 * When a client gets all partitions we do not send them an iterator, we 
 create a collection of all data and then pass the object over the network in 
 total
 * Operations which require looking up data on the NN can still be slow since 
 there is no cache of information and it's done in a serial fashion
 * Perhaps a tangent, but our client timeout is quite dumb. The client will 
 timeout and the server has no idea the client is gone. We should use 
 deadlines, i.e. pass the timeout to the server so it can calculate that the 
 client has expired.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5664) Drop cascade database fails when the db has any tables with indexes

2014-05-12 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995390#comment-13995390
 ] 

Selina Zhang commented on HIVE-5664:


We had the same issue. In case we cannot move the drop database cascade to the server 
side for now, a simple fix is for the client to request the table name/index name list 
again after each drop request. It is not a perfect solution, but it is simple and 
more general. 
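
A rough sketch of that client-side workaround (illustrative only, not an attached patch; 
it uses public HiveMetaStoreClient calls such as the dropTable/dropDatabase seen in the 
stack trace below, and the database name is just the one from the example):

{code}
import java.util.List;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class DropDatabaseCascadeSketch {
  public static void main(String[] args) throws Exception {
    String dbName = "db2";  // illustrative database name from the example above
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    // Re-fetch the table list after every drop: dropping a table can also drop
    // its index tables, so a stale list would try to drop them twice and fail.
    List<String> remaining = client.getAllTables(dbName);
    while (!remaining.isEmpty()) {
      client.dropTable(dbName, remaining.get(0), true /* deleteData */, true /* ignoreUnknownTab */);
      remaining = client.getAllTables(dbName);
    }
    client.dropDatabase(dbName);
  }
}
{code}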

 Drop cascade database fails when the db has any tables with indexes
 ---

 Key: HIVE-5664
 URL: https://issues.apache.org/jira/browse/HIVE-5664
 Project: Hive
  Issue Type: Bug
  Components: Indexing, Metastore
Affects Versions: 0.10.0, 0.11.0, 0.12.0
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: 0.14.0

 Attachments: HIVE-5664.1.patch.txt


 {code}
 CREATE DATABASE db2; 
 USE db2; 
 CREATE TABLE tab1 (id int, name string); 
 CREATE INDEX idx1 ON TABLE tab1(id) as 'COMPACT' with DEFERRED REBUILD IN 
 TABLE tab1_indx; 
 DROP DATABASE db2 CASCADE;
 {code}
 Last DDL fails with the following error:
 {code}
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. Database does not exist: db2
 Hive.log has following exception
 2013-10-27 20:46:16,629 ERROR exec.DDLTask (DDLTask.java:execute(434)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: db2
 at 
 org.apache.hadoop.hive.ql.exec.DDLTask.dropDatabase(DDLTask.java:3473)
 at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:231)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1441)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1219)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1047)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:915)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
 at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 Caused by: NoSuchObjectException(message:db2.tab1_indx table not found)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1376)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
 at com.sun.proxy.$Proxy7.get_table(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:890)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:660)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:652)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:546)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
 at com.sun.proxy.$Proxy8.dropDatabase(Unknown Source)
 at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:284)
 at 
 org.apache.hadoop.hive.ql.exec.DDLTask.dropDatabase(DDLTask.java:3470)
 ... 18 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7016) Hive returns wrong results when execute UDF on top of DISTINCT column

2014-05-05 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-7016:
--

 Summary: Hive returns wrong results when execute UDF on top of 
DISTINCT column
 Key: HIVE-7016
 URL: https://issues.apache.org/jira/browse/HIVE-7016
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.1
Reporter: Selina Zhang


The following query returns a wrong result:
select hash(distinct value) from table;

This kind of query should be identified as a syntax error. However, Hive ignores 
the DISTINCT and returns a result. 





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6980) Drop table by using direct sql

2014-05-01 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986839#comment-13986839
 ] 

Selina Zhang commented on HIVE-6980:


[~sershe] I have done the test based on the 0.12 version. With delete cascade, dropping a 
table with 5000 partitions (Oracle flavor) takes less than 4 seconds (all 
partition data paths are subdirectories of the table path). Will do more tests with 
different alter table/partition combinations. Thanks!

 Drop table by using direct sql
 --

 Key: HIVE-6980
 URL: https://issues.apache.org/jira/browse/HIVE-6980
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang

 Dropping table which has lots of partitions is slow. Even after applying the 
 patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
 The fixes come with two parts:
 1. use directSQL to query the partitions protect mode;
 the current implementation needs to transfer the Partition object to client 
 and check the protect mode for each partition. I'd like to move this part of 
 logic to metastore. The check will be done by direct sql (if direct sql is 
 disabled, execute the same logic in the ObjectStore);
 2. use directSQL to drop partitions for table;
 there maybe two solutions here:
 1. add DELETE CASCADE in the schema. In this way we only need to delete 
 entries from partitions table use direct sql. May need to change 
 datanucleus.deletionPolicy = DataNucleus. 
 2. clean up the dependent tables by issue DELETE statement. This also needs 
 to turn on datanucleus.query.sql.allowAll
 Both of above solutions should be able to fix the problem. The DELETE CASCADE 
 has to change schemas and prepare upgrade scripts. The second solutions added 
 maintenance cost if new tables added in the future releases.
 Please advice. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-04-30 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986234#comment-13986234
 ] 

Selina Zhang commented on HIVE-6765:


[~adrian-wang] We came across the same problem. Just wondering why this 
ASTNodeOrigin serialization problem came up suddenly; it is not the first day 
we have joined with a view. The root cause may be something else? Thanks!

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Fix For: 0.13.0

 Attachments: HIVE-6765.patch.1


 when a view contains a UDF, and the view comes into a JOIN operation, Hive 
 will encounter a bug with stack trace like
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6980) Drop table by using direct sql

2014-04-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983605#comment-13983605
 ] 

Selina Zhang commented on HIVE-6980:


[~ashutoshc] Thanks! I have checked HIVE-6809. That patch would not help the 
drop table use case. The main problem HIVE-6809 is trying to solve is optimizing 
the deletion of partition data: instead of deleting the partition paths one 
by one, the new patch tries to delete the paths in bulk. However, in the drop table 
case this optimization has already been done (the method isSubdirectory() checks whether 
the partition path is a subdirectory of the table path). 

What makes dropping a table very slow is how we drop the partitions. 
1. On the client side, DDLTask asks the server for each Partition object in order to 
check whether the partition is protected. So if we have 100K partitions, 100K 
Partition objects will be constructed and passed from the server to the 
client (in batches, so no OOM here). 
2. After the check, the client sends the drop table request to the server, and 
HiveMetaStore begins the fun again. It retrieves each Partition object again 
and checks the data path, then sends the partition names to the ObjectStore and asks 
for deletion (in batch mode).
3. The ObjectStore gets a list of partition names. With the fix of HIVE-6252, 
DataNucleus does the real work: it cleans the related metadata tables 
for each partition. I turned on the query log for MySQL; for dropping a 
table with 6 partitions, 72 DELETEs, 32 SELECT COUNTs and 17 SELECT 
DISTINCTs were issued.



 Drop table by using direct sql
 --

 Key: HIVE-6980
 URL: https://issues.apache.org/jira/browse/HIVE-6980
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang

 Dropping table which has lots of partitions is slow. Even after applying the 
 patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
 The fixes come with two parts:
 1. use directSQL to query the partitions protect mode;
 the current implementation needs to transfer the Partition object to client 
 and check the protect mode for each partition. I'd like to move this part of 
 logic to metastore. The check will be done by direct sql (if direct sql is 
 disabled, execute the same logic in the ObjectStore);
 2. use directSQL to drop partitions for table;
 there maybe two solutions here:
 1. add DELETE CASCADE in the schema. In this way we only need to delete 
 entries from partitions table use direct sql. May need to change 
 datanucleus.deletionPolicy = DataNucleus. 
 2. clean up the dependent tables by issue DELETE statement. This also needs 
 to turn on datanucleus.query.sql.allowAll
 Both of above solutions should be able to fix the problem. The DELETE CASCADE 
 has to change schemas and prepare upgrade scripts. The second solutions added 
 maintenance cost if new tables added in the future releases.
 Please advice. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6980) Drop table by using direct sql

2014-04-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983695#comment-13983695
 ] 

Selina Zhang commented on HIVE-6980:


[~sershe]
Thanks for the comments! 

Sorry, I linked to the wrong JIRA. It should be HIVE-6256. :)

1. Protection check:
The workflows of drop table and drop partitions are different. Drop partitions 
needs the Partition objects, so the protection check is free there. In drop 
table, we do not need to check partition by partition. The following statement 
returns, in one shot, all the partitions that have protect mode on:
{code}
String queryText = "select PARTITIONS.PART_NAME, "
    + " PARTITION_PARAMS.PARAM_VALUE from PARTITIONS "
    + " inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID "
    + " inner join DBS on TBLS.DB_ID = DBS.DB_ID "
    + " inner join PARTITION_PARAMS on PARTITIONS.PART_ID = PARTITION_PARAMS.PART_ID "
    + " where TBLS.TBL_NAME = ? and DBS.NAME = ? and PARAM_KEY = '"
    + ProtectMode.PARAMETER_NAME + "'";
{code}

2. Constructing the Partition object itself is expensive. Even with direct SQL, 
you need to query a bunch of tables and loop-join to build the Partition objects. 

3. I have the same concern as you in HIVE-6397 (hope I did not type it wrong this 
time :)). But after researching this page (section Deletion, using Foreign Keys 
(RDBMS)), I think turning on delete cascade is a good idea. It will save trips 
for DN. 
http://www.datanucleus.org/products/datanucleus/jdo/orm/cascading.html

Really appreciate your previous work on the metastore. It gets better and better!

 Drop table by using direct sql
 --

 Key: HIVE-6980
 URL: https://issues.apache.org/jira/browse/HIVE-6980
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang

 Dropping table which has lots of partitions is slow. Even after applying the 
 patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 
 The fixes come with two parts:
 1. use directSQL to query the partitions protect mode;
 the current implementation needs to transfer the Partition object to client 
 and check the protect mode for each partition. I'd like to move this part of 
 logic to metastore. The check will be done by direct sql (if direct sql is 
 disabled, execute the same logic in the ObjectStore);
 2. use directSQL to drop partitions for table;
 there maybe two solutions here:
 1. add DELETE CASCADE in the schema. In this way we only need to delete 
 entries from partitions table use direct sql. May need to change 
 datanucleus.deletionPolicy = DataNucleus. 
 2. clean up the dependent tables by issue DELETE statement. This also needs 
 to turn on datanucleus.query.sql.allowAll
 Both of above solutions should be able to fix the problem. The DELETE CASCADE 
 has to change schemas and prepare upgrade scripts. The second solutions added 
 maintenance cost if new tables added in the future releases.
 Please advice. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6980) Drop table by using direct sql

2014-04-27 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-6980:
--

 Summary: Drop table by using direct sql
 Key: HIVE-6980
 URL: https://issues.apache.org/jira/browse/HIVE-6980
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang


Dropping a table which has lots of partitions is slow. Even after applying the 
patch of HIVE-6265, the drop table still takes hours (100K+ partitions). 

The fix comes in two parts:
1. Use direct SQL to query the partitions' protect mode;
the current implementation needs to transfer the Partition objects to the client and 
check the protect mode for each partition. I'd like to move this part of the logic 
to the metastore. The check will be done by direct SQL (if direct SQL is disabled, 
execute the same logic in the ObjectStore).

2. Use direct SQL to drop partitions for the table;
there may be two solutions here:
1. Add DELETE CASCADE in the schema. In this way we only need to delete 
entries from the partitions table using direct SQL. May need to change 
datanucleus.deletionPolicy = DataNucleus. 
2. Clean up the dependent tables by issuing DELETE statements. This also needs 
datanucleus.query.sql.allowAll to be turned on.

Both of the above solutions should be able to fix the problem. The DELETE CASCADE 
approach has to change schemas and prepare upgrade scripts. The second solution adds 
maintenance cost if new tables are added in future releases.

Please advise. 
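
To make solution 2 concrete, a minimal sketch of the kind of bulk DELETE statements it 
would issue (illustrative only; the table and column names follow the metastore schema 
used in the protect-mode query elsewhere in this thread, and the real list of dependent 
tables is longer):

{code}
import java.util.Arrays;
import java.util.List;

public class BulkDropPartitionsSketch {
  /**
   * Build the direct-SQL statements solution 2 would issue: one DELETE per
   * dependent table, then one for the partition rows themselves, instead of
   * per-partition cascades through DataNucleus. Illustrative only; the real
   * metastore schema has more dependent tables than the single one shown here.
   */
  static List<String> buildDeleteStatements(List<Long> partIds) {
    // e.g. [101, 102, 103] -> "(101, 102, 103)"
    String idList = partIds.toString().replace('[', '(').replace(']', ')');
    return Arrays.asList(
        "delete from PARTITION_PARAMS where PART_ID in " + idList,
        "delete from PARTITIONS where PART_ID in " + idList);
  }
}
{code}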





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13951150#comment-13951150
 ] 

Selina Zhang commented on HIVE-6492:


[~leftylev] Thanks for the reminder! We could put: 
"This controls how many partitions can be scanned for each partitioned table. 
The default value -1 means no limit." 
What do you think? 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-03-25 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.7.parch.txt

Uploaded new patch based on the review from Ashutosh.

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-6492) limit partition number involved in a table scan

2014-03-25 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang reassigned HIVE-6492:
--

Assignee: Ashutosh Chauhan  (was: Selina Zhang)

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-25 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947410#comment-13947410
 ] 

Selina Zhang commented on HIVE-6492:


Thanks, Ashutosh! Yes, it intentionally limits partitions per table scan only. 
It is based on the assumption that most queries only involve one instance of a 
table, and it is more like a supplement to strict mode. 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt, HIVE-6492.7.parch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-20 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942548#comment-13942548
 ] 

Selina Zhang commented on HIVE-6492:


Review request is here:
https://reviews.apache.org/r/19373/

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-03-11 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.6.patch.txt

[~hagleitn] - I added a flag to TableScanOperator to mark that a particular 
table scan has been optimized as metadata-only, and moved the logic to the end 
of the analyzer because at that point the physical optimizers have finished. In 
this way both MR and Tez work, and metadata-only queries are also let through. I 
hesitated to touch TableScanOperator but there seems to be no better way. What do you 
think? Thanks!


 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt, HIVE-6492.6.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-07 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924459#comment-13924459
 ] 

Selina Zhang commented on HIVE-6492:


[~hagleitn] Thank you for the suggestions!

I will work on suggestion 2 and move the code to the driver, because 
currently I am working on a patch to shorten the execution time of 
metadata-only queries (which is important for BI tools). I prefer leaving 
metadata-only queries out of this limitation. What do you think?

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.4.patch.txt

Gunther, thanks for your comments!

Removed the logic for the simple fetch query: the query is let through if it is a fetch 
operator (no MapReduce job launched). 

However, I still need to put the logic right after the physical optimizers, 
because only then do I have the information on whether the query is a metadata-only 
query. 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.5.patch.txt

Thank you, Gunther! I like your patch, though it does not really take care of the 
metadata-only query case. But I agree that putting it in SemanticAnalyzer is better. 

I just renamed the suggested patch and re-submitted it. 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920237#comment-13920237
 ] 

Selina Zhang commented on HIVE-6492:


In the new test case limit_partition_2.q:
select distinct hr from srcpart;
should be let through because hr is the partition key. With the new patch, it is 
blocked:
FAILED: SemanticException Number of partitions scanned (=4) on table srcpart 
exceeds limit (=1). This is controlled by hive.limit.query.max.table.partition.

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-03-04 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920358#comment-13920358
 ] 

Selina Zhang commented on HIVE-6492:


The test case in limit_partition_3.q should also pass:

set hive.compute.query.using.stats=true;
set hive.limit.query.max.table.partition=1;
select count(*) from part;

since it does not need a table scan. 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt, HIVE-6492.4.patch.txt, HIVE-6492.4.patch_suggestion, 
 HIVE-6492.5.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-03-03 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.3.patch.txt

The test failure seems unrelated to this patch. Re-submitting the patch. 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt, 
 HIVE-6492.3.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916310#comment-13916310
 ] 

Selina Zhang commented on HIVE-6492:


Strict mode disables types of queries that we cannot afford to disable. We need to:
1. enable queries on small tables without partition filters;
2. allow select * from table issued from Tableau, because that is a must for 
Tableau to connect to Hive Server directly through the ODBC driver;
3. enable aggregation on partition keys without partition limits. 
Thanks for reviewing the changes!

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916765#comment-13916765
 ] 

Selina Zhang commented on HIVE-6492:


The original patch actually includes two tasks:
1. limit the partition number when a table scan happens;
2. a hack to identify queries from Tableau and handle them specially.
As we discussed, the second task is just a hack and probably not helpful 
to commit to trunk, so I created a new patch which only contains the first 
task. 
The reason for introducing this configuration variable is that we want to limit the 
number of partitions read by a table scan. As for metadata-only queries, since 
HIVE-1003 added an optimization for this type of query, the table scan is no longer a 
problem for them. 



 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-26 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913795#comment-13913795
 ] 

Selina Zhang commented on HIVE-6492:


It is not a rare case for a table to have 1000+ partitions. To keep people from issuing 
a query without knowing how many partitions will be scanned, introducing one 
more configuration variable hive.limit.query.max.table.partition enables the 
system admin to protect the grid. 

The default value is set to -1, which means no limit. 

This variable will be ignored in the following cases:
1. Simple fetch query with limit: 
select * from table limit n;
2. Metadata-only query: 
select distinct partition_key from partition_table;

There is one special case: sometimes BI tools such as Tableau (connected 
through the ODBC driver) will issue 
   select * from table
at the initial stage to figure out the table metadata. This will not hurt the grid 
because Tableau cancels the query after it receives one or two rows. To keep 
Tableau working, code is added to mark the query client type (such 
as CLIDriver and JDBC) and only let ODBC-sourced queries through. 
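
For reference, a minimal sketch of the check this variable drives (illustrative only; 
the helper and its arguments are not from the committed patch, though the error text 
matches the SemanticException quoted elsewhere in this thread):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.parse.SemanticException;

public class PartitionLimitCheckSketch {
  /**
   * Illustrative compile-time check (not the committed code): after partition
   * pruning, fail the query if one table scan would read more partitions than
   * hive.limit.query.max.table.partition allows.
   */
  static void checkScanLimit(Configuration conf, String tableName, int partitionsScanned)
      throws SemanticException {
    int limit = conf.getInt("hive.limit.query.max.table.partition", -1);
    if (limit > -1 && partitionsScanned > limit) {
      throw new SemanticException("Number of partitions scanned (=" + partitionsScanned
          + ") on table " + tableName + " exceeds limit (=" + limit
          + "). This is controlled by hive.limit.query.max.table.partition.");
    }
  }
}
{code}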




 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-02-26 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.1.patch.txt

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6492.1.patch.txt

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6492) limit partition number involved in a table scan

2014-02-24 Thread Selina Zhang (JIRA)
Selina Zhang created HIVE-6492:
--

 Summary: limit partition number involved in a table scan
 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0


To protect the cluster, a new configure variable 
hive.limit.query.max.table.partition is added to hive configuration to
limit the table partitions involved in a table scan. 

The default value will be set to -1 which means there is no limit by default. 

This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HIVE-6492) limit partition number involved in a table scan

2014-02-24 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang resolved HIVE-6492.


Resolution: Fixed

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Reopened] (HIVE-6492) limit partition number involved in a table scan

2014-02-24 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang reopened HIVE-6492:



Will submit a patch for it. 

 limit partition number involved in a table scan
 ---

 Key: HIVE-6492
 URL: https://issues.apache.org/jira/browse/HIVE-6492
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Selina Zhang
 Fix For: 0.13.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 To protect the cluster, a new configure variable 
 hive.limit.query.max.table.partition is added to hive configuration to
 limit the table partitions involved in a table scan. 
 The default value will be set to -1 which means there is no limit by default. 
 This variable will not affect metadata only query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)