[jira] [Created] (HIVE-10421) DROP TABLE with qualified table name ignores database name when checking partitions
Jason Dere created HIVE-10421: - Summary: DROP TABLE with qualified table name ignores database name when checking partitions Key: HIVE-10421 URL: https://issues.apache.org/jira/browse/HIVE-10421 Project: Hive Issue Type: Bug Reporter: Jason Dere Hive was only recently changed to allow drop table dbname.tabname. However DDLTask.dropTable() is still using an older version of Hive.getPartitionNames(), which only took in a single string for the table name, rather than the database and table names. As a result Hive is filling in the current database name as the dbname during the listPartitions call to the MetaStore. It also appears that on the Hive Metastore side, in the non-auth path there is no validation to check that the dbname.tablename actually exists - this call simply returns back an empty list of partitions, which causes the table to be dropped without checking any of the partition information. I will open a separate issue for this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
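The failure mode described above can be sketched in isolation. Below is a minimal, self-contained Java simulation — none of these names are Hive's actual classes or methods — of an API that accepts only a table-name string and silently resolves it against the current database, so the qualifier in dbname.tabname is lost:

```java
// Sketch of the bug pattern (hypothetical names, not Hive code): an old-style
// lookup that takes only a table name substitutes the current database, while
// a qualified lookup keeps the caller's database.
public class QualifiedNameDemo {
    static final String CURRENT_DB = "default";

    /** Old-style lookup: single table-name string, current db assumed. */
    static String resolveOld(String tableName) {
        return CURRENT_DB + "." + tableName;
    }

    /** Qualified lookup: db and table passed separately. */
    static String resolveQualified(String dbName, String tableName) {
        return dbName + "." + tableName;
    }

    public static void main(String[] args) {
        String[] parts = "otherdb.tab".split("\\.");
        // Bug pattern: only parts[1] is forwarded, so the wrong db is used.
        System.out.println(resolveOld(parts[1]));                 // prints "default.tab"
        System.out.println(resolveQualified(parts[0], parts[1])); // prints "otherdb.tab"
    }
}
```

Passing the database and table names as separate arguments, as the newer Hive.getPartitionNames() overload does, avoids the silent substitution.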
[jira] [Created] (HIVE-10422) HiveMetaStoreClient.listPartitionNames() does not return error for non-existent table
Jason Dere created HIVE-10422: - Summary: HiveMetaStoreClient.listPartitionNames() does not return error for non-existent table Key: HIVE-10422 URL: https://issues.apache.org/jira/browse/HIVE-10422 Project: Hive Issue Type: Bug Components: API Reporter: Jason Dere In the non-auth case, calling HiveMetaStoreClient.getPartitionNames() on a non-existent table returns an empty list, rather than NoSuchObjectException. It looks like currently all of the checking for valid table is being done at the SemanticAnalyzer level, and no such checking done in the API/metastore level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32549: HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32549/#review81047 --- ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java https://reviews.apache.org/r/32549/#comment131267 typo in comment (operatos) - Gunther Hagleitner On April 20, 2015, 6:42 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32549/ --- (Updated April 20, 2015, 6:42 p.m.) Review request for hive, Gunther Hagleitner and Vikram Dixit Kumaraswamy. Repository: hive-git Description --- In q.test environment with src table, execute the following query:
{code}
CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
      UNION all
      select s2.key as key, s2.value as value from src s2) unionsrc
INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value;
select * from DEST1;
select * from DEST2;
{code}
DEST1 and DEST2 should both have 310 rows.
However, DEST2 only has 1 row: tst1 500 1 Diffs - common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Vertex.java b45c782 itests/src/test/resources/testconfiguration.properties 0a5d839 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java 90616ad ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 4dcdf91 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 0990894 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWorkWalker.java 08fd61e ql/src/test/queries/clientpositive/explainuser_2.q 03264ca ql/src/test/queries/clientpositive/tez_union_multiinsert.q PRE-CREATION ql/src/test/results/clientpositive/tez/explainuser_2.q.out ea6b558 ql/src/test/results/clientpositive/tez/tez_union_multiinsert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32549/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Created] (HIVE-10424) LLAP: Factor known capacity into scheduling decisions
Siddharth Seth created HIVE-10424: - Summary: LLAP: Factor known capacity into scheduling decisions Key: HIVE-10424 URL: https://issues.apache.org/jira/browse/HIVE-10424 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10425) LLAP: Control number of threads used to communicate with a single LLAP instance
Siddharth Seth created HIVE-10425: - Summary: LLAP: Control number of threads used to communicate with a single LLAP instance Key: HIVE-10425 URL: https://issues.apache.org/jira/browse/HIVE-10425 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10420) Black-list for table-properties in replicated-tables.
Mithun Radhakrishnan created HIVE-10420: --- Summary: Black-list for table-properties in replicated-tables. Key: HIVE-10420 URL: https://issues.apache.org/jira/browse/HIVE-10420 Project: Hive Issue Type: Bug Components: HCatalog, Metastore Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan (Not essential for the 1.2 release, although this'll be good to have.) When table-schema changes are propagated between 2 HiveMetastore/HCatalog instances (using {{HCatTable.diff()}} and {{HCatTable.resolve()}}), some table properties are replicated identically, even though those properties might be specific to the source table (or source metastore). For instance:
# Last update/DDL time
# JMS message coordinates
# Whether or not the table is external (ideally)
We should run the replicated properties through a black-list filter and have these removed when generating diffs or replicating tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh
Eugene Koifman created HIVE-10423: - Summary: HIVE-7948 breaks deploy_e2e_artifacts.sh Key: HIVE-10423 URL: https://issues.apache.org/jira/browse/HIVE-10423 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Aswathy Chellammal Sreekumar HIVE-7948 added a step to download an ml-1m.zip file and unzip it. This only works if you call deploy_e2e_artifacts.sh once. If you call it again (which is very common in dev) it blocks and asks for additional input from the user because the target files already exist. This needs to be changed similarly to what we discussed for HIVE-9272, i.e. place artifacts not under source control in testdist/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10419) can't do query on partitioned view with analytical function in strictmode
Hector Lagos created HIVE-10419: --- Summary: can't do query on partitioned view with analytical function in strictmode Key: HIVE-10419 URL: https://issues.apache.org/jira/browse/HIVE-10419 Project: Hive Issue Type: Bug Components: Hive, Views Affects Versions: 0.13.0 Environment: Cloudera 5.3.x. Reporter: Hector Lagos Hey Guys, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect
Chinna Rao Lalam created HIVE-10415: --- Summary: hive.start.cleanup.scratchdir configuration is not taking effect Key: HIVE-10415 URL: https://issues.apache.org/jira/browse/HIVE-10415 Project: Hive Issue Type: Bug Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 1.2.0 This configuration hive.start.cleanup.scratchdir is not taking effect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
Jesus Camacho Rodriguez created HIVE-10416: -- Summary: CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem:
{noformat}
select cbo_t3.c_int, c, count(*)
from (select key as a, c_int+1 as b, sum(c_int) as c
      from cbo_t1
      where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0)
      group by c_float, cbo_t1.c_int, key
      order by a) cbo_t1
join (select key as p, c_int+1 as q, sum(c_int) as r
      from cbo_t2
      where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0)
      group by c_float, cbo_t2.c_int, key
      order by q/10 desc, r asc) cbo_t2
on cbo_t1.a=p
join cbo_t3 on cbo_t1.a=key
where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0)
group by cbo_t3.c_int, c
order by cbo_t3.c_int+c desc, c;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10414) Hive query does not run on data of file size 825MB
Olalekan Elesin created HIVE-10414: -- Summary: Hive query does not run on data of file size 825MB Key: HIVE-10414 URL: https://issues.apache.org/jira/browse/HIVE-10414 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0 Reporter: Olalekan Elesin I'm currently running Hive 1.0.0 on a single node Hadoop cluster. I have created a table in Hive but anytime I run a query that involves MapReduce, Hive hangs and doesn't run the query. The file size is about 835MB. Please help. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10417) Parallel Order By return wrong results for partitioned tables
Nemon Lou created HIVE-10417: Summary: Parallel Order By return wrong results for partitioned tables Key: HIVE-10417 URL: https://issues.apache.org/jira/browse/HIVE-10417 Project: Hive Issue Type: Bug Affects Versions: 1.0.0, 0.13.1, 0.14.0 Reporter: Nemon Lou Following is the script that reproduces this bug.
set hive.optimize.sampling.orderby=true;
set mapreduce.job.reduces=10;
select * from src order by key desc limit 10;
+----------+------------+
| src.key  | src.value  |
+----------+------------+
| 98       | val_98     |
| 98       | val_98     |
| 97       | val_97     |
| 97       | val_97     |
| 96       | val_96     |
| 95       | val_95     |
| 95       | val_95     |
| 92       | val_92     |
| 90       | val_90     |
| 90       | val_90     |
+----------+------------+
10 rows selected (47.916 seconds)
reset;
create table src_orc_p (key string, value string) partitioned by (kp string) stored as orc tblproperties('orc.compress'='SNAPPY');
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;
set hive.exec.max.dynamic.partitions=1;
insert into table src_orc_p partition(kp) select *,substring(key,1) from src distribute by substring(key,1);
set mapreduce.job.reduces=10;
set hive.optimize.sampling.orderby=true;
select * from src_orc_p order by key desc limit 10;
+----------------+------------------+-----------------+
| src_orc_p.key  | src_orc_p.value  | src_orc_p.kend  |
+----------------+------------------+-----------------+
| 0              | val_0            | 0               |
| 0              | val_0            | 0               |
| 0              | val_0            | 0               |
+----------------+------------------+-----------------+
3 rows selected (39.861 seconds)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33251/#review80969 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java https://reviews.apache.org/r/33251/#comment131153 1. For clarity, it might be good to put this in a separate private method. 2. Does it work if we just synchronize on mapJoinTables[pos]? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java https://reviews.apache.org/r/33251/#comment131156 Method naming, see below. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java https://reviews.apache.org/r/33251/#comment131154 Using thread-local makes me a little nervous, but let's discuss this offline. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java https://reviews.apache.org/r/33251/#comment131155 The method name gives no indication of the side effect of setting a thread-local value. We'd better put this outside of this method. In addition, the method name is also a little confusing in that it suggests cleanup happens for sure, when in fact it is conditional. - Xuefu Zhang On April 21, 2015, 1:37 a.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33251/ --- (Updated April 21, 2015, 1:37 a.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-10302 https://issues.apache.org/jira/browse/HIVE-10302 Repository: hive-git Description --- Cache the small-table container so that mapjoin tasks can reuse it if the task is executed on the same Spark executor. The cache is released right before the next job after the mapjoin job is done.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 97b3471 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 Diff: https://reviews.apache.org/r/33251/diff/ Testing --- Ran several queries in live cluster. ptest pending. Thanks, Jimmy Xiang
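The caching idea in the description above can be sketched with a per-thread slot. This is a rough, self-contained illustration with hypothetical names — not the patch itself (the real container holds mapjoin hash tables and is released between jobs):

```java
// Sketch (assumed names, not the HIVE-10302 patch): keep the small-table
// container in a per-thread slot so tasks running on the same executor
// thread reuse it, and clear it before the next job starts.
public class SmallTableCacheDemo {
    static final ThreadLocal<Object> cachedContainer = new ThreadLocal<>();

    static Object loadOrReuse() {
        Object c = cachedContainer.get();
        if (c == null) {
            c = new Object();          // stands in for building the hash table
            cachedContainer.set(c);
        }
        return c;
    }

    /** Called between jobs so a stale container is not reused. */
    static void release() {
        cachedContainer.remove();
    }

    public static void main(String[] args) {
        Object first = loadOrReuse();
        Object second = loadOrReuse();           // same thread: container reused
        System.out.println(first == second);     // prints "true"
        release();
        System.out.println(first == loadOrReuse()); // prints "false": rebuilt
    }
}
```

The thread-local approach is what makes Xuefu's review comments above relevant: the reuse is invisible at the call site, so the loading method's name should make the side effect explicit.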
[jira] [Created] (HIVE-10418) It is impossible to avoid the deprecated AggregationBuffer when implementing a GenericUDAFEvaluator
Daniel Mescheder created HIVE-10418: --- Summary: It is impossible to avoid the deprecated AggregationBuffer when implementing a GenericUDAFEvaluator Key: HIVE-10418 URL: https://issues.apache.org/jira/browse/HIVE-10418 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.1 Reporter: Daniel Mescheder To create a custom UDAF I derived from GenericUDAFEvaluator (in Scala). The public interface of this class uses the AggregationBuffer class. The Scala compiler complains because the interface of my class makes heavy use of a deprecated type (AggregationBuffer) - however there is no way to use the suggested AbstractAggregationBuffer due to the interface of the parent class. Expected behaviour: As long as AggregationBuffer is still an unavoidable part of the public interface it should not be marked deprecated. If it remains deprecated, GenericUDAFEvaluator methods should take AbstractAggregationBuffer arguments instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
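The API problem is easy to reproduce in miniature. In this self-contained sketch — deliberately simplified stand-ins, not the real GenericUDAFEvaluator — the parent class's abstract method returns a @Deprecated type, so every subclass is forced to reference that type and inherits the deprecation warning:

```java
// Self-contained sketch (not the actual Hive classes) of the API problem:
// when a parent class's abstract method uses a @Deprecated type, subclasses
// cannot avoid referencing that type, because it is part of the signature.
public class DeprecatedBufferDemo {
    abstract static class Evaluator {
        @Deprecated
        interface AggregationBuffer { }
        // Every override must mention the deprecated type.
        abstract AggregationBuffer getNewAggregationBuffer();
    }

    static class MyEvaluator extends Evaluator {
        // Using the deprecated type here is unavoidable.
        static class Buf implements AggregationBuffer {
            long sum;
        }
        @Override
        AggregationBuffer getNewAggregationBuffer() { return new Buf(); }
    }

    public static void main(String[] args) {
        Evaluator.AggregationBuffer b = new MyEvaluator().getNewAggregationBuffer();
        System.out.println(b instanceof MyEvaluator.Buf); // prints "true"
    }
}
```

This is why the reporter's two proposed fixes both target the signature itself: either un-deprecate AggregationBuffer, or change the parent methods to use the replacement type.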
[jira] [Created] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
Sushanth Sowmyan created HIVE-10426: --- Summary: Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Creating a new jira to continue discussions of what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10428) NPE in RegexSerDe using HCat
Jason Dere created HIVE-10428: - Summary: NPE in RegexSerDe using HCat Key: HIVE-10428 URL: https://issues.apache.org/jira/browse/HIVE-10428 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Jason Dere Assignee: Jason Dere When HCatalog reads a table that uses org.apache.hadoop.hive.serde2.RegexSerDe, it throws an exception:
{noformat}
15/04/21 14:07:31 INFO security.TokenCache: Got dt for hdfs://hdpsecahdfs; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hdpsecahdfs, Ident: (HDFS_DELEGATION_TOKEN token 1478 for haha)
15/04/21 14:07:31 INFO mapred.FileInputFormat: Total input paths to process : 1
Splits len : 1
SplitInfo : [hdpseca03.seca.hwxsup.com, hdpseca04.seca.hwxsup.com, hdpseca05.seca.hwxsup.com]
15/04/21 14:07:31 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.RegexSerDe with properties {name=casetest.regex_table, numFiles=1, columns.types=string,string, serialization.format=1, columns=id,name, rawDataSize=0, numRows=0, output.format.string=%1$s %2$s, serialization.lib=org.apache.hadoop.hive.serde2.RegexSerDe, COLUMN_STATS_ACCURATE=true, totalSize=25, serialization.null.format=\N, input.regex=([^ ]*) ([^ ]*), transient_lastDdlTime=1429590172}
15/04/21 14:07:31 WARN serde2.RegexSerDe: output.format.string has been deprecated
Exception in thread "main" java.lang.NullPointerException
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
	at com.google.common.base.Splitter.split(Splitter.java:371)
	at org.apache.hadoop.hive.serde2.RegexSerDe.initialize(RegexSerDe.java:155)
	at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:49)
	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:518)
	at org.apache.hive.hcatalog.mapreduce.InternalUtil.initializeDeserializer(InternalUtil.java:156)
	at org.apache.hive.hcatalog.mapreduce.HCatRecordReader.createDeserializer(HCatRecordReader.java:127)
	at org.apache.hive.hcatalog.mapreduce.HCatRecordReader.initialize(HCatRecordReader.java:92)
	at HCatalogSQLMR.main(HCatalogSQLMR.java:81)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10430) HIVE-9937 broke hadoop-1 build
Prasanth Jayachandran created HIVE-10430: Summary: HIVE-9937 broke hadoop-1 build Key: HIVE-10430 URL: https://issues.apache.org/jira/browse/HIVE-10430 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran TestLazySimpleFast uses Text.copyBytes() that is not present in hadoop-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10432) Need to add more e2e like tests between HiveServer2 and JDBC using wiremock or equivalent
Hari Sankar Sivarama Subramaniyan created HIVE-10432: Summary: Need to add more e2e like tests between HiveServer2 and JDBC using wiremock or equivalent Key: HIVE-10432 URL: https://issues.apache.org/jira/browse/HIVE-10432 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan The current unit tests use ThriftCLIService to test client-server interaction. We will need to mock HS2 to facilitate writing test cases where we can parse the HTTP request/response. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10434) Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch]
Chao Sun created HIVE-10434: --- Summary: Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch] Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Improvement Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Currently in HoS, SparkClientImpl first launches a remote Driver process, and then waits for it to connect back to the HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task once we find out that the process has failed, and set the promise as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
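The proposed fix can be sketched with a CompletableFuture standing in for the connection promise (assumed names throughout; the actual patch touches SparkClientImpl and RpcServer): completing the promise exceptionally as soon as the child process is known to be dead makes the waiter fail immediately instead of blocking for the full timeout.

```java
// Sketch of "fail the promise on process death" (hypothetical names, not
// the HIVE-10434 patch). The waiter returns as soon as the promise is
// completed exceptionally, rather than after the 60-second timeout.
public class CancelOnFailureDemo {
    /** Returns the outcome of waiting on the pending connection promise. */
    static String awaitConnection(java.util.concurrent.CompletableFuture<String> promise) {
        try {
            // Without the fix this call would block for the full timeout period.
            return promise.get(60, java.util.concurrent.TimeUnit.SECONDS);
        } catch (java.util.concurrent.ExecutionException e) {
            return "failed fast: " + e.getCause().getMessage();
        } catch (Exception e) {
            return "timed out";
        }
    }

    public static void main(String[] args) {
        java.util.concurrent.CompletableFuture<String> clientConnected =
            new java.util.concurrent.CompletableFuture<>();
        // Simulated monitor: the driver process exited with a non-zero code,
        // so fail the promise immediately instead of letting it time out.
        int exitCode = 1;
        if (exitCode != 0) {
            clientConnected.completeExceptionally(
                new RuntimeException("driver exited with code " + exitCode));
        }
        System.out.println(awaitConnection(clientConnected));
        // prints "failed fast: driver exited with code 1"
    }
}
```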
RE: hive contributor meetup in bay area
Hi Thejas, could you post the slides in advance on the wiki https://cwiki.apache.org/confluence/display/Hive/Presentations if you have them? -Original Message- From: Thejas Nair [mailto:thejas.n...@gmail.com] Sent: Wednesday, April 22, 2015 9:35 AM To: dev Subject: Re: hive contributor meetup in bay area I have also created a webex link for those who are unable to attend in person - http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP yes ONLY if you are planning to attend in person. On Tue, Apr 21, 2015 at 4:49 PM, Thejas Nair thejas.n...@gmail.com wrote: FYI, there is a contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP on the meetup page if you would like to attend. Thanks, Thejas
Preparation for Hive-1.2 release
Hi Folks, Per my mail 3 weeks back, we should start getting ready to release 1.2 as a rollup. And as per my proposal to manage this release, I'd like to start off the process of forking 1.2, and making trunk 1.3. I've set up a cwiki page for people to land development patches that are almost done, to signal their desire that this be included in 1.2 : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status A rough timeline I see for this process would be to fork this Friday (24th Apr), and then start rolling out RC0 by, say, Wednesday next week. This would mean that I would request that if you want your jira included in 1.2, it be close to completion, or have a patch available for review. By mid next week, also, I expect to freeze the wiki inclusion list for features, and keep it open only for bugfixes discovered during testing the various RCs. Please feel free to edit that jira with your requests, or, if you don't have edit privileges, if you reply to this mail, I can add it in. (Also, if you don't have wiki edit privileges, you should probably ask for it. :p) Thanks! -Sushanth
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
On April 22, 2015, 12:38 a.m., Marcelo Vanzin wrote: spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, line 172 https://reviews.apache.org/r/33422/diff/1/?file=938965#file938965line172 This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. Oh yes. I forgot that case. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- On April 22, 2015, 12:30 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 12:30 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this.. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Re: hive contributor meetup in bay area
I have also created a webex link for those who are unable to attend in person - http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP yes ONLY if you are planning to attend in person. On Tue, Apr 21, 2015 at 4:49 PM, Thejas Nair thejas.n...@gmail.com wrote: FYI, there is contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP in the meetup page if you would like to attend. Thanks, Thejas
Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/#review81115 --- metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131359 I don't think this comment is applicable. From http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html - The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in section 17.4 of The Java™ Language Specification. get has the memory effects of reading a volatile variable. set has the memory effects of writing (assigning) a volatile variable. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131364 I don't think we really need the locks at the level of the candidate list; it can be made a finer-grained lock by using ConcurrentLinkedQueue or something similar. The only mutable part of AggrColStatsCached is already stored in a volatile member. Can you please open a follow-up jira to explore that? metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131360 match has to be null if this exception is thrown. (unnecessary also is unintuitive.) metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131361 can we just skip these instead of adding them as potential candidates? metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131362 as discussed offline, there is potential for improving the performance here by avoiding two loops. That can be done in a follow-up jira. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131358 spawnCleaner() or startCleaner() might be a better name.
metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131363 we should give this thread a name (for ease of debugging). metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131367 this is not being used anywhere, can be removed. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131368 the name of this class is too similar to the outer class. I feel it would be better to name it just AggrColStats or so. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131366 the time is already being updated from findBestMatch, so this isn't necessary. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131365 let's remove these unused classes. - Thejas Nair On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/ --- (Updated April 20, 2015, 6:44 p.m.) Review request for hive. Bugs: HIVE-10382 https://issues.apache.org/jira/browse/HIVE-10382 Repository: hive-git Description --- Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2
Re: hive contributor meetup in bay area
I don't have the slides yet, I will ask for them. From: Xu, Cheng A cheng.a...@intel.com Sent: Tuesday, April 21, 2015 7:49 PM To: dev@hive.apache.org Subject: RE: hive contributor meetup in bay area Hi Thejas, could you post the slides in advance on the wiki https://cwiki.apache.org/confluence/display/Hive/Presentations if you have? -Original Message- From: Thejas Nair [mailto:thejas.n...@gmail.com] Sent: Wednesday, April 22, 2015 9:35 AM To: dev Subject: Re: hive contributor meetup in bay area I have also created a webex link for those who are unable to attend in person - http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP yes ONLY if you are planning to attend in person. On Tue, Apr 21, 2015 at 4:49 PM, Thejas Nair thejas.n...@gmail.com wrote: FYI, there is contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP in the meetup page if you would like to attend. Thanks, Thejas
Re: Reading RC file using Mapreduce
Rakesh, you might get a quicker response if you send this question to u...@hive.apache.org (instead of dev@hive.apache.org) and give more details about what you have already tried. -- Lefty On Tue, Apr 21, 2015 at 6:00 AM, Rakesh Sharma raksha...@expedia.com wrote: Hi hive dev team, Any quick help in this regard will be really appreciated. We are kind of stuck with this. Thanks and Regards, Rakesh. From: Rakesh Sharma raksha...@expedia.com Date: Tuesday, April 21, 2015 at 12:26 PM To: dev@hive.apache.org Subject: Reading RC file using Mapreduce Hi, I need to read an RC file in my map reduce using the newer API. I was trying to use RCFileMapReduceInputFormat, but it seems it has a bug and rather than returning a single record, it returns the whole file. Maybe I am missing something trivial. Could you please suggest what I can use to read records from an RC file? Any pointers or some sample code will be of great help. Thanks in advance. Regards, Rakesh.
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 1:25 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this.. Diffs (updated) - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/#review81119 --- metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131370 this is not being set to false, which means the cleaner would run only once. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131371 For tracking how the cache is performing, it would be useful to have an INFO-level message about how many entries there were, how many were removed due to expiry, and whether eviction based on LRU is going to be triggered. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131372 evicting one LRU node at a time is expensive. I think we should just reduce the TTL to 0.9*TTL, 0.8*TTL, etc. and call this function again. Can be done in a follow-up jira. Ideally, in the long term, we should think of using both the frequency of use and the cost of re-computing the stats while deciding which ones to evict. - Thejas Nair On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/ --- (Updated April 20, 2015, 6:44 p.m.) Review request for hive. Bugs: HIVE-10382 https://issues.apache.org/jira/browse/HIVE-10382 Repository: hive-git Description --- Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e Diff: https://reviews.apache.org/r/33367/diff/ Testing --- Thanks, Vaibhav Gumashta
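The TTL-reduction eviction strategy suggested in the review comments above could be sketched roughly as follows. This is a hypothetical, self-contained illustration (the class and method names are invented, not the actual AggregateStatsCache code): instead of removing one LRU node at a time, the sweep repeatedly shrinks the effective TTL (0.9*TTL, 0.8*TTL, ...) and drops everything older than the cutoff until the cache is back under capacity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of TTL-step eviction, not the real Hive metastore code.
class TtlSweepCache {
    private final Map<String, Long> lastAccessNanos = new HashMap<>();
    private final long ttlNanos;
    private final int maxEntries;

    TtlSweepCache(long ttlMillis, int maxEntries) {
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        this.maxEntries = maxEntries;
    }

    // Record an access; trigger a sweep when over capacity.
    void touch(String key, long nowNanos) {
        lastAccessNanos.put(key, nowNanos);
        if (lastAccessNanos.size() > maxEntries) {
            evict(nowNanos);
        }
    }

    private void evict(long nowNanos) {
        long effectiveTtl = ttlNanos;
        // Shrink the TTL in 10% steps and sweep entries older than the cutoff,
        // rather than evicting one least-recently-used node at a time.
        while (lastAccessNanos.size() > maxEntries && effectiveTtl > 0) {
            final long cutoff = nowNanos - effectiveTtl;
            lastAccessNanos.values().removeIf(t -> t < cutoff);
            effectiveTtl = (long) (effectiveTtl * 0.9);
        }
    }

    int size() {
        return lastAccessNanos.size();
    }
}
```

As the comment notes, this is coarse: a single sweep may evict more entries than strictly needed, trading precision for a cheap bulk operation.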
Re: Preparation for Hive-1.2 release
You might want to allow extra time for the transition to git, unless it goes very smoothly. Right now commits aren't possible. And for those that don't know, here's how to get wiki edit privileges: About This Wiki https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit . -- Lefty On Tue, Apr 21, 2015 at 11:33 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi Folks, Per my mail 3 weeks back, we should start getting ready to release 1.2 as a rollup. And as per my proposal to manage this release, I'd like to start off the process of forking 1.2, and making trunk 1.3. I've set up a cwiki page for people to land development patches that are almost done, to signal their desire that this be included in 1.2 : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status A rough timeline I see for this process would be to fork this Friday (24th Apr), and then start rolling out RC0 by, say, Wednesday next week. This would mean that I would request that if you want your jira included in 1.2, it be close to completion, or have a patch available for review. By mid next week, also, I expect to freeze the wiki inclusion list for features, and keep it open only for bugfixes discovered during testing the various RCs. Please feel free to edit that jira with your requests, or, if you don't have edit privileges, if you reply to this mail, I can add it in. (Also, if you don't have wiki edit privileges, you should probably ask for it. :p) Thanks! -Sushanth
[jira] [Created] (HIVE-10433) Cancel connection when remote driver process exited with error code [Spark Branch]
Chao Sun created HIVE-10433: --- Summary: Cancel connection when remote driver process exited with error code [Spark Branch] Key: HIVE-10433 URL: https://issues.apache.org/jira/browse/HIVE-10433 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Chao Sun Currently in HoS, after starting a remote process in SparkClientImpl, it will wait for the process to connect back. However, there are cases where the process may fail and exit with an error code, and thus no connection is attempted. In this situation, the HS2 process will still wait for the connection and eventually time out. What makes it worse, the user may need to wait for two timeout periods, one for SparkSetReducerParallelism, and another for the actual Spark job. We should cancel the timeout task and mark the promise as failed once we know that the process has failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
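The fix described here — cancelling the pending timeout task and failing the connection promise as soon as the child process is known to have died — might look roughly like the following self-contained sketch. The names (ConnectionWatchdog, awaitClient, onProcessFailed) are hypothetical; the real logic lives in SparkClientImpl and RpcServer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the actual Hive-on-Spark code.
class ConnectionWatchdog {
    // Promise completed when the remote driver connects back to HS2.
    final CompletableFuture<String> clientPromise = new CompletableFuture<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> timeoutTask;

    // Arm the timeout: if the client never connects, fail the promise.
    void awaitClient(long timeoutMs) {
        timeoutTask = scheduler.schedule(
            () -> clientPromise.completeExceptionally(
                new TimeoutException("client did not connect back")),
            timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Called when the remote driver process exits with a non-zero code:
    // cancel the timeout and fail the promise immediately, so the caller
    // does not sit out the full timeout period.
    void onProcessFailed(int exitCode) {
        timeoutTask.cancel(false);
        clientPromise.completeExceptionally(
            new RuntimeException("child process exited with code " + exitCode));
        scheduler.shutdown();
    }
}
```

The key point is that whichever event fires first wins: `completeExceptionally` is a no-op if the promise is already settled, so the timeout path and the process-exit path can race safely.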
Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
[jira] [Created] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
Prasanth Jayachandran created HIVE-10429: Summary: LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Executors in LLAP can be interrupted by the user (kill) or by the system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10431) HIVE-9555 broke hadoop-1 build
Prasanth Jayachandran created HIVE-10431: Summary: HIVE-9555 broke hadoop-1 build Key: HIVE-10431 URL: https://issues.apache.org/jira/browse/HIVE-10431 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Sergey Shelukhin In HIVE-9555, RecordReaderUtils uses a direct ByteBuffer read from FSDataInputStream, which is not present in hadoop-1. This breaks hadoop-1 compilation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
hive contributor meetup in bay area
FYI, there is a contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP on the meetup page if you would like to attend. Thanks, Thejas
Can anyone review HIVE-10275?
Thank you
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java https://reviews.apache.org/r/33422/#comment131349 This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java https://reviews.apache.org/r/33422/#comment131351 While the only current call site reflects the error message, this method seems more generic than that. Maybe pass the error message as a parameter to the method? - Marcelo Vanzin On April 22, 2015, 12:30 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 12:30 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Reading RC file using Mapreduce
Hi, I need to read an RC file in my map reduce job using the newer API. I was trying to use RCFileMapReduceInputFormat, but it seems it has a bug and rather than returning a single record, it returns the whole file. Maybe I am missing something trivial. Could you please suggest what I can use to read records from an RC file? Any pointers or some sample code will be of great help. Thanks in advance. Regards, Rakesh.
Re: Reading RC file using Mapreduce
Hi hive dev team, Any quick help in this regard will be really appreciated. We are kind of stuck with this. Thanks and Regards, Rakesh. From: Rakesh Sharma raksha...@expedia.com Date: Tuesday, April 21, 2015 at 12:26 PM To: dev@hive.apache.org Subject: Reading RC file using Mapreduce Hi, I need to read an RC file in my map reduce job using the newer API. I was trying to use RCFileMapReduceInputFormat, but it seems it has a bug and rather than returning a single record, it returns the whole file. Maybe I am missing something trivial. Could you please suggest what I can use to read records from an RC file? Any pointers or some sample code will be of great help. Thanks in advance. Regards, Rakesh.
[jira] [Created] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument
Alexander Behm created HIVE-10427: - Summary: collect_list() and collect_set() should accept struct types as argument Key: HIVE-10427 URL: https://issues.apache.org/jira/browse/HIVE-10427 Project: Hive Issue Type: Wish Components: UDF Reporter: Alexander Behm The collect_list() and collect_set() functions currently only accept scalar argument types. It would be very useful if these functions could also accept struct argument types for creating nested data from flat data. For example, suppose I wanted to create a nested customers/orders table from two flat tables, customers and orders. Then it'd be very convenient to write something like this: {code} insert into table nested_customers_orders select c.*, collect_list(named_struct('oid', o.oid, 'order_date', o.date, ...)) from customers c inner join orders o on (c.cid = o.cid) group by c.cid {code} Thank you for your consideration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)