[jira] [Created] (HIVE-10421) DROP TABLE with qualified table name ignores database name when checking partitions
Jason Dere created HIVE-10421: - Summary: DROP TABLE with qualified table name ignores database name when checking partitions Key: HIVE-10421 URL: https://issues.apache.org/jira/browse/HIVE-10421 Project: Hive Issue Type: Bug Reporter: Jason Dere Hive was only recently changed to allow drop table dbname.tabname. However DDLTask.dropTable() is still using an older version of Hive.getPartitionNames(), which only took in a single string for the table name, rather than the database and table names. As a result Hive is filling in the current database name as the dbname during the listPartitions call to the MetaStore. It also appears that on the Hive Metastore side, in the non-auth path there is no validation to check that the dbname.tablename actually exists - this call simply returns back an empty list of partitions, which causes the table to be dropped without checking any of the partition information. I will open a separate issue for this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
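The failure mode described above can be sketched in isolation. Below is a minimal, self-contained Java simulation — none of these names are Hive's actual classes or methods — of an API that accepts only a table-name string and silently resolves it against the current database, so the qualifier in dbname.tabname is lost:

```java
// Sketch of the bug pattern (hypothetical names, not Hive code): an old-style
// lookup that takes only a table name substitutes the current database, while
// a qualified lookup keeps the caller's database.
public class QualifiedNameDemo {
    static final String CURRENT_DB = "default";

    /** Old-style lookup: single table-name string, current db assumed. */
    static String resolveOld(String tableName) {
        return CURRENT_DB + "." + tableName;
    }

    /** Qualified lookup: db and table passed separately. */
    static String resolveQualified(String dbName, String tableName) {
        return dbName + "." + tableName;
    }

    public static void main(String[] args) {
        String[] parts = "otherdb.tab".split("\\.");
        // Bug pattern: only parts[1] is forwarded, so the wrong db is used.
        System.out.println(resolveOld(parts[1]));                 // prints "default.tab"
        System.out.println(resolveQualified(parts[0], parts[1])); // prints "otherdb.tab"
    }
}
```

Passing the database and table names as separate arguments, as the newer Hive.getPartitionNames() overload does, avoids the silent substitution.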
[jira] [Created] (HIVE-10422) HiveMetaStoreClient.listPartitionNames() does not return error for non-existent table
Jason Dere created HIVE-10422: - Summary: HiveMetaStoreClient.listPartitionNames() does not return error for non-existent table Key: HIVE-10422 URL: https://issues.apache.org/jira/browse/HIVE-10422 Project: Hive Issue Type: Bug Components: API Reporter: Jason Dere In the non-auth case, calling HiveMetaStoreClient.getPartitionNames() on a non-existent table returns an empty list, rather than NoSuchObjectException. It looks like currently all of the checking for valid table is being done at the SemanticAnalyzer level, and no such checking done in the API/metastore level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 32549: HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32549/#review81047 --- ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java https://reviews.apache.org/r/32549/#comment131267 typo in comment (operatos) - Gunther Hagleitner On April 20, 2015, 6:42 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32549/ --- (Updated April 20, 2015, 6:42 p.m.) Review request for hive, Gunther Hagleitner and Vikram Dixit Kumaraswamy. Repository: hive-git Description --- In q.test environment with src table, execute the following query:
{code}
CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
      UNION all
      select s2.key as key, s2.value as value from src s2) unionsrc
INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value;
select * from DEST1;
select * from DEST2;
{code}
DEST1 and DEST2 should both have 310 rows.
However, DEST2 only has 1 row: tst1 500 1 Diffs - common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Vertex.java b45c782 itests/src/test/resources/testconfiguration.properties 0a5d839 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java 90616ad ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 4dcdf91 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 0990894 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWorkWalker.java 08fd61e ql/src/test/queries/clientpositive/explainuser_2.q 03264ca ql/src/test/queries/clientpositive/tez_union_multiinsert.q PRE-CREATION ql/src/test/results/clientpositive/tez/explainuser_2.q.out ea6b558 ql/src/test/results/clientpositive/tez/tez_union_multiinsert.q.out PRE-CREATION Diff: https://reviews.apache.org/r/32549/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Created] (HIVE-10424) LLAP: Factor known capacity into scheduling decisions
Siddharth Seth created HIVE-10424: - Summary: LLAP: Factor known capacity into scheduling decisions Key: HIVE-10424 URL: https://issues.apache.org/jira/browse/HIVE-10424 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10425) LLAP: Control number of threads used to communicate with a single LLAP instance
Siddharth Seth created HIVE-10425: - Summary: LLAP: Control number of threads used to communicate with a single LLAP instance Key: HIVE-10425 URL: https://issues.apache.org/jira/browse/HIVE-10425 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10420) Black-list for table-properties in replicated-tables.
Mithun Radhakrishnan created HIVE-10420: --- Summary: Black-list for table-properties in replicated-tables. Key: HIVE-10420 URL: https://issues.apache.org/jira/browse/HIVE-10420 Project: Hive Issue Type: Bug Components: HCatalog, Metastore Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan (Not essential for the 1.2 release, although this'll be good to have.) When table-schema changes are propagated between 2 HiveMetastore/HCatalog instances (using {{HCatTable.diff()}} and {{HCatTable.resolve()}}), some table properties are replicated identically, even though those properties might be specific to the source table (or source metastore). For instance:
# Last update/DDL time
# JMS message coordinates
# Whether or not the table is external (ideally)
We should run the replicated properties through a black-list filter and have these removed when generating diffs or replicating tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh
Eugene Koifman created HIVE-10423: - Summary: HIVE-7948 breaks deploy_e2e_artifacts.sh Key: HIVE-10423 URL: https://issues.apache.org/jira/browse/HIVE-10423 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Aswathy Chellammal Sreekumar HIVE-7948 added a step to download an ml-1m.zip file and unzip it. This only works if you call deploy_e2e_artifacts.sh once. If you call it again (which is very common in dev) it blocks and asks for additional input from the user because the target files already exist. This needs to be changed similarly to what we discussed for HIVE-9272, i.e. place artifacts not under source control in testdist/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10419) can't do query on partitioned view with analytical function in strictmode
Hector Lagos created HIVE-10419: --- Summary: can't do query on partitioned view with analytical function in strictmode Key: HIVE-10419 URL: https://issues.apache.org/jira/browse/HIVE-10419 Project: Hive Issue Type: Bug Components: Hive, Views Affects Versions: 0.13.0 Environment: Cloudera 5.3.x. Reporter: Hector Lagos Hey Guys, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10415) hive.start.cleanup.scratchdir configuration is not taking effect
Chinna Rao Lalam created HIVE-10415: --- Summary: hive.start.cleanup.scratchdir configuration is not taking effect Key: HIVE-10415 URL: https://issues.apache.org/jira/browse/HIVE-10415 Project: Hive Issue Type: Bug Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 1.2.0 This configuration hive.start.cleanup.scratchdir is not taking effect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
Jesus Camacho Rodriguez created HIVE-10416: -- Summary: CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem:
{noformat}
select cbo_t3.c_int, c, count(*)
from (select key as a, c_int+1 as b, sum(c_int) as c
      from cbo_t1
      where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0)
      group by c_float, cbo_t1.c_int, key
      order by a) cbo_t1
join (select key as p, c_int+1 as q, sum(c_int) as r
      from cbo_t2
      where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0)
      group by c_float, cbo_t2.c_int, key
      order by q/10 desc, r asc) cbo_t2
on cbo_t1.a=p
join cbo_t3 on cbo_t1.a=key
where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0)
group by cbo_t3.c_int, c
order by cbo_t3.c_int+c desc, c;
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10414) Hive query does not run on data of file size 825MB
Olalekan Elesin created HIVE-10414: -- Summary: Hive query does not run on data of file size 825MB Key: HIVE-10414 URL: https://issues.apache.org/jira/browse/HIVE-10414 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0 Reporter: Olalekan Elesin I'm currently running Hive 1.0.0 on a single node Hadoop cluster. I have created a table in Hive but anytime I run a query that involves MapReduce, Hive hangs and doesn't run the query. The file size is about 835MB. Please help. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10417) Parallel Order By return wrong results for partitioned tables
Nemon Lou created HIVE-10417: Summary: Parallel Order By return wrong results for partitioned tables Key: HIVE-10417 URL: https://issues.apache.org/jira/browse/HIVE-10417 Project: Hive Issue Type: Bug Affects Versions: 1.0.0, 0.13.1, 0.14.0 Reporter: Nemon Lou Following is the script that reproduces this bug.
set hive.optimize.sampling.orderby=true;
set mapreduce.job.reduces=10;
select * from src order by key desc limit 10;
+----------+------------+
| src.key  | src.value  |
+----------+------------+
| 98       | val_98     |
| 98       | val_98     |
| 97       | val_97     |
| 97       | val_97     |
| 96       | val_96     |
| 95       | val_95     |
| 95       | val_95     |
| 92       | val_92     |
| 90       | val_90     |
| 90       | val_90     |
+----------+------------+
10 rows selected (47.916 seconds)
reset;
create table src_orc_p (key string, value string) partitioned by (kp string) stored as orc tblproperties('orc.compress'='SNAPPY');
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;
set hive.exec.max.dynamic.partitions=1;
insert into table src_orc_p partition(kp) select *,substring(key,1) from src distribute by substring(key,1);
set mapreduce.job.reduces=10;
set hive.optimize.sampling.orderby=true;
select * from src_orc_p order by key desc limit 10;
+----------------+------------------+-----------------+
| src_orc_p.key  | src_orc_p.value  | src_orc_p.kend  |
+----------------+------------------+-----------------+
| 0              | val_0            | 0               |
| 0              | val_0            | 0               |
| 0              | val_0            | 0               |
+----------------+------------------+-----------------+
3 rows selected (39.861 seconds)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33251/#review80969 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java https://reviews.apache.org/r/33251/#comment131153 1. For clarity, it might be good to put this in a separate private method. 2. Does it work if we just synchronize on mapJoinTables[pos]? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java https://reviews.apache.org/r/33251/#comment131156 Method naming, see below. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java https://reviews.apache.org/r/33251/#comment131154 Using thread-local makes me a little nervous, but let's discuss this offline. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java https://reviews.apache.org/r/33251/#comment131155 The method name gives no indication of the side effect of setting a thread-local value. We'd better put this outside of this method. In addition, the method name is also a little confusing in that it suggests cleanup happens for sure, when in fact it is conditional. - Xuefu Zhang On April 21, 2015, 1:37 a.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33251/ --- (Updated April 21, 2015, 1:37 a.m.) Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-10302 https://issues.apache.org/jira/browse/HIVE-10302 Repository: hive-git Description --- Cache the small-table container so that mapjoin tasks can reuse it if the task is executed on the same Spark executor. The cache is released right before the next job after the mapjoin job is done.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 97b3471 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 Diff: https://reviews.apache.org/r/33251/diff/ Testing --- Ran several queries in live cluster. ptest pending. Thanks, Jimmy Xiang
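The caching idea in the description above can be sketched with a per-thread slot. This is a rough, self-contained illustration with hypothetical names — not the patch itself (the real container holds mapjoin hash tables and is released between jobs):

```java
// Sketch (assumed names, not the HIVE-10302 patch): keep the small-table
// container in a per-thread slot so tasks running on the same executor
// thread reuse it, and clear it before the next job starts.
public class SmallTableCacheDemo {
    static final ThreadLocal<Object> cachedContainer = new ThreadLocal<>();

    static Object loadOrReuse() {
        Object c = cachedContainer.get();
        if (c == null) {
            c = new Object();          // stands in for building the hash table
            cachedContainer.set(c);
        }
        return c;
    }

    /** Called between jobs so a stale container is not reused. */
    static void release() {
        cachedContainer.remove();
    }

    public static void main(String[] args) {
        Object first = loadOrReuse();
        Object second = loadOrReuse();           // same thread: container reused
        System.out.println(first == second);     // prints "true"
        release();
        System.out.println(first == loadOrReuse()); // prints "false": rebuilt
    }
}
```

The thread-local approach is what makes Xuefu's review comments above relevant: the reuse is invisible at the call site, so the loading method's name should make the side effect explicit.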
[jira] [Created] (HIVE-10418) It is impossible to avoid the deprecated AggregationBuffer when implementing a GenericUDAFEvaluator
Daniel Mescheder created HIVE-10418: --- Summary: It is impossible to avoid the deprecated AggregationBuffer when implementing a GenericUDAFEvaluator Key: HIVE-10418 URL: https://issues.apache.org/jira/browse/HIVE-10418 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.1 Reporter: Daniel Mescheder To create a custom UDAF I derived from GenericUDAFEvaluator (in Scala). The public interface of this class uses the AggregationBuffer class. The Scala compiler complains because the interface of my class makes heavy use of a deprecated type (AggregationBuffer) - however there is no way to use the suggested AbstractAggregationBuffer due to the interface of the parent class. Expected behaviour: As long as AggregationBuffer is still an unavoidable part of the public interface it should not be marked deprecated. If it remains deprecated, GenericUDAFEvaluator methods should take AbstractAggregationBuffer arguments instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
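The API problem is easy to reproduce in miniature. In this self-contained sketch — deliberately simplified stand-ins, not the real GenericUDAFEvaluator — the parent class's abstract method returns a @Deprecated type, so every subclass is forced to reference that type and inherits the deprecation warning:

```java
// Self-contained sketch (not the actual Hive classes) of the API problem:
// when a parent class's abstract method uses a @Deprecated type, subclasses
// cannot avoid referencing that type, because it is part of the signature.
public class DeprecatedBufferDemo {
    abstract static class Evaluator {
        @Deprecated
        interface AggregationBuffer { }
        // Every override must mention the deprecated type.
        abstract AggregationBuffer getNewAggregationBuffer();
    }

    static class MyEvaluator extends Evaluator {
        // Using the deprecated type here is unavoidable.
        static class Buf implements AggregationBuffer {
            long sum;
        }
        @Override
        AggregationBuffer getNewAggregationBuffer() { return new Buf(); }
    }

    public static void main(String[] args) {
        Evaluator.AggregationBuffer b = new MyEvaluator().getNewAggregationBuffer();
        System.out.println(b instanceof MyEvaluator.Buf); // prints "true"
    }
}
```

This is why the reporter's two proposed fixes both target the signature itself: either un-deprecate AggregationBuffer, or change the parent methods to use the replacement type.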
[jira] [Created] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
Sushanth Sowmyan created HIVE-10426: --- Summary: Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Creating a new jira to continue discussions of what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10428) NPE in RegexSerDe using HCat
Jason Dere created HIVE-10428: - Summary: NPE in RegexSerDe using HCat Key: HIVE-10428 URL: https://issues.apache.org/jira/browse/HIVE-10428 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Jason Dere Assignee: Jason Dere When HCatalog reads a table that uses org.apache.hadoop.hive.serde2.RegexSerDe, it throws an exception:
{noformat}
15/04/21 14:07:31 INFO security.TokenCache: Got dt for hdfs://hdpsecahdfs; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hdpsecahdfs, Ident: (HDFS_DELEGATION_TOKEN token 1478 for haha)
15/04/21 14:07:31 INFO mapred.FileInputFormat: Total input paths to process : 1
Splits len : 1
SplitInfo : [hdpseca03.seca.hwxsup.com, hdpseca04.seca.hwxsup.com, hdpseca05.seca.hwxsup.com]
15/04/21 14:07:31 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.RegexSerDe with properties {name=casetest.regex_table, numFiles=1, columns.types=string,string, serialization.format=1, columns=id,name, rawDataSize=0, numRows=0, output.format.string=%1$s %2$s, serialization.lib=org.apache.hadoop.hive.serde2.RegexSerDe, COLUMN_STATS_ACCURATE=true, totalSize=25, serialization.null.format=\N, input.regex=([^ ]*) ([^ ]*), transient_lastDdlTime=1429590172}
15/04/21 14:07:31 WARN serde2.RegexSerDe: output.format.string has been deprecated
Exception in thread "main" java.lang.NullPointerException
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
	at com.google.common.base.Splitter.split(Splitter.java:371)
	at org.apache.hadoop.hive.serde2.RegexSerDe.initialize(RegexSerDe.java:155)
	at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:49)
	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:518)
	at org.apache.hive.hcatalog.mapreduce.InternalUtil.initializeDeserializer(InternalUtil.java:156)
	at org.apache.hive.hcatalog.mapreduce.HCatRecordReader.createDeserializer(HCatRecordReader.java:127)
	at org.apache.hive.hcatalog.mapreduce.HCatRecordReader.initialize(HCatRecordReader.java:92)
	at HCatalogSQLMR.main(HCatalogSQLMR.java:81)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10430) HIVE-9937 broke hadoop-1 build
Prasanth Jayachandran created HIVE-10430: Summary: HIVE-9937 broke hadoop-1 build Key: HIVE-10430 URL: https://issues.apache.org/jira/browse/HIVE-10430 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran TestLazySimpleFast uses Text.copyBytes() that is not present in hadoop-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10432) Need to add more e2e like tests between HiveServer2 and JDBC using wiremock or equivalent
Hari Sankar Sivarama Subramaniyan created HIVE-10432: Summary: Need to add more e2e like tests between HiveServer2 and JDBC using wiremock or equivalent Key: HIVE-10432 URL: https://issues.apache.org/jira/browse/HIVE-10432 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan The current unit tests use ThriftCLIService to test client-server interaction. We will need to mock HS2 to facilitate writing test cases where we can parse the HTTP request/response. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10434) Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch]
Chao Sun created HIVE-10434: --- Summary: Cancel connection to HS2 when remote Spark driver process has failed [Spark Branch] Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Improvement Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Currently in HoS, SparkClientImpl first launches a remote Driver process, and then waits for it to connect back to the HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task once we find out that the process has failed, and set the promise as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
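The proposed fix can be sketched with a CompletableFuture standing in for the connection promise (assumed names throughout; the actual patch touches SparkClientImpl and RpcServer): completing the promise exceptionally as soon as the child process is known to be dead makes the waiter fail immediately instead of blocking for the full timeout.

```java
// Sketch of "fail the promise on process death" (hypothetical names, not
// the HIVE-10434 patch). The waiter returns as soon as the promise is
// completed exceptionally, rather than after the 60-second timeout.
public class CancelOnFailureDemo {
    /** Returns the outcome of waiting on the pending connection promise. */
    static String awaitConnection(java.util.concurrent.CompletableFuture<String> promise) {
        try {
            // Without the fix this call would block for the full timeout period.
            return promise.get(60, java.util.concurrent.TimeUnit.SECONDS);
        } catch (java.util.concurrent.ExecutionException e) {
            return "failed fast: " + e.getCause().getMessage();
        } catch (Exception e) {
            return "timed out";
        }
    }

    public static void main(String[] args) {
        java.util.concurrent.CompletableFuture<String> clientConnected =
            new java.util.concurrent.CompletableFuture<>();
        // Simulated monitor: the driver process exited with a non-zero code,
        // so fail the promise immediately instead of letting it time out.
        int exitCode = 1;
        if (exitCode != 0) {
            clientConnected.completeExceptionally(
                new RuntimeException("driver exited with code " + exitCode));
        }
        System.out.println(awaitConnection(clientConnected));
        // prints "failed fast: driver exited with code 1"
    }
}
```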
RE: hive contributor meetup in bay area
Hi Thejas, could you post the slides in advance on the wiki https://cwiki.apache.org/confluence/display/Hive/Presentations if you have them? -Original Message- From: Thejas Nair [mailto:thejas.n...@gmail.com] Sent: Wednesday, April 22, 2015 9:35 AM To: dev Subject: Re: hive contributor meetup in bay area I have also created a webex link for those who are unable to attend in person - http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP yes ONLY if you are planning to attend in person. On Tue, Apr 21, 2015 at 4:49 PM, Thejas Nair thejas.n...@gmail.com wrote: FYI, there is a contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP on the meetup page if you would like to attend. Thanks, Thejas
Preparation for Hive-1.2 release
Hi Folks, Per my mail 3 weeks back, we should start getting ready to release 1.2 as a rollup. And as per my proposal to manage this release, I'd like to start off the process of forking 1.2, and making trunk 1.3. I've set up a cwiki page for people to land development patches that are almost done, to signal their desire that this be included in 1.2 : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status A rough timeline I see for this process would be to fork this Friday (24th Apr), and then start rolling out RC0 by, say, Wednesday next week. This would mean that I would request that if you want your jira included in 1.2, it be close to completion, or have a patch available for review. By mid next week, also, I expect to freeze the wiki inclusion list for features, and keep it open only for bugfixes discovered during testing the various RCs. Please feel free to edit that jira with your requests, or, if you don't have edit privileges, if you reply to this mail, I can add it in. (Also, if you don't have wiki edit privileges, you should probably ask for it. :p) Thanks! -Sushanth
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
On April 22, 2015, 12:38 a.m., Marcelo Vanzin wrote: spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java, line 172 https://reviews.apache.org/r/33422/diff/1/?file=938965#file938965line172 This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. Oh yes. I forgot that case. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- On April 22, 2015, 12:30 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 12:30 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this.. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Re: hive contributor meetup in bay area
I have also created a webex link for those who are unable to attend in person - http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP yes ONLY if you are planning to attend in person. On Tue, Apr 21, 2015 at 4:49 PM, Thejas Nair thejas.n...@gmail.com wrote: FYI, there is contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP in the meetup page if you would like to attend. Thanks, Thejas
Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/#review81115 --- metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131359 I don't think this comment is applicable. From http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html - The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in section 17.4 of The Java™ Language Specification. get has the memory effects of reading a volatile variable. set has the memory effects of writing (assigning) a volatile variable. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131364 I don't think we really need the locks at the level of the candidate list; it can be made a finer-grained lock by using ConcurrentLinkedQueue or something similar. The only mutable part of AggrColStatsCached is already stored in a volatile member. Can you please open a follow-up jira to explore that? metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131360 match has to be null if this exception is thrown. (unnecessary also is unintuitive.) metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131361 can we just skip these instead of adding them as potential candidates? metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131362 as discussed offline, there is potential for improving the performance here by avoiding two loops. That can be done in a follow-up jira. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131358 spawnCleaner() or startCleaner() might be a better name.
metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131363 we should give this thread a name (for ease of debugging). metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131367 this is not being used anywhere, can be removed. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131368 the name of this class is too similar to the outer class. I feel it would be better to name it just AggrColStats or so. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131366 the time is already being updated from findBestMatch, so this isn't necessary. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131365 let's remove these unused classes. - Thejas Nair On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/ --- (Updated April 20, 2015, 6:44 p.m.) Review request for hive. Bugs: HIVE-10382 https://issues.apache.org/jira/browse/HIVE-10382 Repository: hive-git Description --- Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2
Re: hive contributor meetup in bay area
I don't have the slides yet, I will ask for them. From: Xu, Cheng A cheng.a...@intel.com Sent: Tuesday, April 21, 2015 7:49 PM To: dev@hive.apache.org Subject: RE: hive contributor meetup in bay area Hi Thejas, could you post the slides in advance on the wiki https://cwiki.apache.org/confluence/display/Hive/Presentations if you have? -Original Message- From: Thejas Nair [mailto:thejas.n...@gmail.com] Sent: Wednesday, April 22, 2015 9:35 AM To: dev Subject: Re: hive contributor meetup in bay area I have also created a webex link for those who are unable to attend in person - http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP yes ONLY if you are planning to attend in person. On Tue, Apr 21, 2015 at 4:49 PM, Thejas Nair thejas.n...@gmail.com wrote: FYI, there is contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP in the meetup page if you would like to attend. Thanks, Thejas
Re: Reading RC file using Mapreduce
Rakesh, you might get a quicker response if you send this question to u...@hive.apache.org (instead of dev@hive.apache.org) and give more details about what you have already tried. -- Lefty On Tue, Apr 21, 2015 at 6:00 AM, Rakesh Sharma raksha...@expedia.com wrote: Hi hive dev team, Any quick help in this regard will be really appreciated. We are kind of stuck with this. Thanks and Regards, Rakesh. From: Rakesh Sharma raksha...@expedia.com Date: Tuesday, April 21, 2015 at 12:26 PM To: dev@hive.apache.org Subject: Reading RC file using Mapreduce Hi, I need to read an RC file in my map reduce using the newer API. I was trying to use RCFileMapReduceInputFormat, but it seems it has a bug and rather than returning a single record, it returns the whole file. Maybe I am missing something trivial. Could you please suggest what I can use to read records from an RC file? Any pointers or some sample code will be of great help. Thanks in advance. Regards, Rakesh.
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 1:25 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method cancelClient to the RpcServer class - not sure whether there's an easier way to do this.. Diffs (updated) - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/#review81119 --- metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131370 this is not being set to false, which means the cleaner would run only once. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131371 For tracking how the cache is performing, it would be useful to have an INFO-level message about how many entries there were, how many were removed due to expiry, and whether eviction based on LRU is going to be triggered. metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java https://reviews.apache.org/r/33367/#comment131372 evicting one LRU node at a time is expensive. I think we should just reduce the TTL to 0.9*TTL, 0.8*TTL, etc. and call this function again. Can be done in a follow-up jira. Ideally, in the long term, we should think of using both the frequency of use and the cost of re-computing the stats while deciding which ones to evict. - Thejas Nair On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33367/ --- (Updated April 20, 2015, 6:44 p.m.) Review request for hive. Bugs: HIVE-10382 https://issues.apache.org/jira/browse/HIVE-10382 Repository: hive-git Description --- Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e Diff: https://reviews.apache.org/r/33367/diff/ Testing --- Thanks, Vaibhav Gumashta
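The TTL-reduction eviction strategy suggested in the review comments above could be sketched roughly as follows. This is a hypothetical, self-contained illustration (the class and method names are invented, not the actual AggregateStatsCache code): instead of removing one LRU node at a time, the sweep repeatedly shrinks the effective TTL (0.9*TTL, 0.8*TTL, ...) and drops everything older than the cutoff until the cache is back under capacity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of TTL-step eviction, not the real Hive metastore code.
class TtlSweepCache {
    private final Map<String, Long> lastAccessNanos = new HashMap<>();
    private final long ttlNanos;
    private final int maxEntries;

    TtlSweepCache(long ttlMillis, int maxEntries) {
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        this.maxEntries = maxEntries;
    }

    // Record an access; trigger a sweep when over capacity.
    void touch(String key, long nowNanos) {
        lastAccessNanos.put(key, nowNanos);
        if (lastAccessNanos.size() > maxEntries) {
            evict(nowNanos);
        }
    }

    private void evict(long nowNanos) {
        long effectiveTtl = ttlNanos;
        // Shrink the TTL in 10% steps and sweep entries older than the cutoff,
        // rather than evicting one least-recently-used node at a time.
        while (lastAccessNanos.size() > maxEntries && effectiveTtl > 0) {
            final long cutoff = nowNanos - effectiveTtl;
            lastAccessNanos.values().removeIf(t -> t < cutoff);
            effectiveTtl = (long) (effectiveTtl * 0.9);
        }
    }

    int size() {
        return lastAccessNanos.size();
    }
}
```

As the comment notes, this is coarse: a single sweep may evict more entries than strictly needed, trading precision for a cheap bulk operation.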
Re: Preparation for Hive-1.2 release
You might want to allow extra time for the transition to git, unless it goes very smoothly. Right now commits aren't possible. And for those that don't know, here's how to get wiki edit privileges: About This Wiki https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit . -- Lefty On Tue, Apr 21, 2015 at 11:33 PM, Sushanth Sowmyan khorg...@gmail.com wrote: Hi Folks, Per my mail 3 weeks back, we should start getting ready to release 1.2 as a rollup. And as per my proposal to manage this release, I'd like to start off the process of forking 1.2, and making trunk 1.3. I've set up a cwiki page for people to land development patches that are almost done, to signal their desire that this be included in 1.2 : https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status A rough timeline I see for this process would be to fork this Friday (24th Apr), and then start rolling out RC0 by, say, Wednesday next week. This would mean that I would request that if you want your jira included in 1.2, it be close to completion, or have a patch available for review. By mid next week, also, I expect to freeze the wiki inclusion list for features, and keep it open only for bugfixes discovered during testing the various RCs. Please feel free to edit that jira with your requests, or, if you don't have edit privileges, if you reply to this mail, I can add it in. (Also, if you don't have wiki edit privileges, you should probably ask for it. :p) Thanks! -Sushanth
[jira] [Created] (HIVE-10433) Cancel connection when remote driver process exited with error code [Spark Branch]
Chao Sun created HIVE-10433: --- Summary: Cancel connection when remote driver process exited with error code [Spark Branch] Key: HIVE-10433 URL: https://issues.apache.org/jira/browse/HIVE-10433 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Chao Sun Currently in HoS, after starting a remote process in SparkClientImpl, it will wait for the process to connect back. However, there are cases where the process may fail and exit with an error code, and thus no connection is attempted. In this situation, the HS2 process will still wait for the connection and eventually time out. What makes it worse, the user may need to wait for two timeout periods, one for SparkSetReducerParallelism, and another for the actual Spark job. We should cancel the timeout task and mark the promise as failed once we know that the process has failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
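The fix described here — cancelling the pending timeout task and failing the connection promise as soon as the child process is known to have died — might look roughly like the following self-contained sketch. The names (ConnectionWatchdog, awaitClient, onProcessFailed) are hypothetical; the real logic lives in SparkClientImpl and RpcServer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the actual Hive-on-Spark code.
class ConnectionWatchdog {
    // Promise completed when the remote driver connects back to HS2.
    final CompletableFuture<String> clientPromise = new CompletableFuture<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> timeoutTask;

    // Arm the timeout: if the client never connects, fail the promise.
    void awaitClient(long timeoutMs) {
        timeoutTask = scheduler.schedule(
            () -> clientPromise.completeExceptionally(
                new TimeoutException("client did not connect back")),
            timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Called when the remote driver process exits with a non-zero code:
    // cancel the timeout and fail the promise immediately, so the caller
    // does not sit out the full timeout period.
    void onProcessFailed(int exitCode) {
        timeoutTask.cancel(false);
        clientPromise.completeExceptionally(
            new RuntimeException("child process exited with code " + exitCode));
        scheduler.shutdown();
    }
}
```

The key point is that whichever event fires first wins: `completeExceptionally` is a no-op if the promise is already settled, so the timeout path and the process-exit path can race safely.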
Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
[jira] [Created] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
Prasanth Jayachandran created HIVE-10429: Summary: LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Executors in LLAP can be interrupted by the user (kill) or by the system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10431) HIVE-9555 broke hadoop-1 build
Prasanth Jayachandran created HIVE-10431: Summary: HIVE-9555 broke hadoop-1 build Key: HIVE-10431 URL: https://issues.apache.org/jira/browse/HIVE-10431 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Sergey Shelukhin In HIVE-9555, RecordReaderUtils uses a direct ByteBuffer read from FSDataInputStream, which is not present in hadoop-1. This breaks hadoop-1 compilation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
hive contributor meetup in bay area
FYI, there is a contributor meetup being hosted tomorrow evening at the Hortonworks office in Santa Clara, CA http://www.meetup.com/Hive-Contributors-Group/events/221610423/ Please RSVP on the meetup page if you would like to attend. Thanks, Thejas
Can anyone review HIVE-10275?
Thank you
Re: Review Request 33422: HIVE-10434 - Cancel connection when remote Spark driver process has failed [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/#review81103 --- spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java https://reviews.apache.org/r/33422/#comment131349 This will throw an exception if the child process exits with a non-zero status after the RSC connects back to HS2. I don't think you want that. spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java https://reviews.apache.org/r/33422/#comment131351 While the only current call site reflects the error message, this method seems more generic than that. Maybe pass the error message as a parameter to the method? - Marcelo Vanzin On April 22, 2015, 12:30 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33422/ --- (Updated April 22, 2015, 12:30 a.m.) Review request for hive and Marcelo Vanzin. Bugs: HIVE-10434 https://issues.apache.org/jira/browse/HIVE-10434 Repository: hive-git Description --- This patch cancels the connection from HS2 to the remote process once the latter has failed and exited with an error code, to avoid a potentially long timeout. It adds a new public method, cancelClient, to the RpcServer class - not sure whether there's an easier way to do this. Diffs - spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 71e432d spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java 32d4c46 Diff: https://reviews.apache.org/r/33422/diff/ Testing --- Tested on my own cluster, and it worked. Thanks, Chao Sun
Reading RC file using Mapreduce
Hi, I need to read an RC file in my map reduce job using the newer API. I was trying to use RCFileMapReduceInputFormat, but it seems it has a bug and rather than returning a single record, it returns the whole file. Maybe I am missing something trivial. Could you please suggest what I can use to read records from an RC file? Any pointers or some sample code will be of great help. Thanks in advance. Regards, Rakesh.
Re: Reading RC file using Mapreduce
Hi hive dev team, Any quick help in this regard will be really appreciated. We are kind of stuck with this. Thanks and Regards, Rakesh. From: Rakesh Sharma raksha...@expedia.com Date: Tuesday, April 21, 2015 at 12:26 PM To: dev@hive.apache.org Subject: Reading RC file using Mapreduce Hi, I need to read an RC file in my map reduce job using the newer API. I was trying to use RCFileMapReduceInputFormat, but it seems it has a bug and rather than returning a single record, it returns the whole file. Maybe I am missing something trivial. Could you please suggest what I can use to read records from an RC file? Any pointers or some sample code will be of great help. Thanks in advance. Regards, Rakesh.
[jira] [Created] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument
Alexander Behm created HIVE-10427: - Summary: collect_list() and collect_set() should accept struct types as argument Key: HIVE-10427 URL: https://issues.apache.org/jira/browse/HIVE-10427 Project: Hive Issue Type: Wish Components: UDF Reporter: Alexander Behm The collect_list() and collect_set() functions currently only accept scalar argument types. It would be very useful if these functions could also accept struct argument types for creating nested data from flat data. For example, suppose I wanted to create a nested customers/orders table from two flat tables, customers and orders. Then it'd be very convenient to write something like this: {code} insert into table nested_customers_orders select c.*, collect_list(named_struct('oid', o.oid, 'order_date', o.date, ...)) from customers c inner join orders o on (c.cid = o.cid) group by c.cid {code} Thank you for your consideration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)