date:20170822

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-22 Thread panfei

Hi Gopal, Thanks for all the information and suggestions. The Hive version is 2.0.1, and it uses Hive-on-MR as the execution engine. I think I should create an intermediate table which includes all the dimensions (including the several kinds of ids), and then use spark-sql to calculate the distinct
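The intermediate-table idea in the snippet above can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite (via Python's `sqlite3`) stands in for Hive and spark-sql, the `events` and `events_dedup` tables and the `region` dimension are hypothetical, and only the two id column names are taken from the thread.

```python
import sqlite3

# SQLite is a stand-in here; the real tables would live in Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw table: one dimension plus two kinds of user ids.
CREATE TABLE events (region TEXT, monthly_user_id TEXT, weekly_user_id TEXT);
INSERT INTO events VALUES
  ('east', 'u1', 'u1'),
  ('east', 'u1', NULL),
  ('east', 'u2', 'u2'),
  ('west', 'u3', NULL),
  ('west', 'u3', NULL);

-- Intermediate table: one row per distinct (dimension, ids) combination,
-- so the later distinct counts scan fewer rows than the raw table.
CREATE TABLE events_dedup AS
SELECT DISTINCT region, monthly_user_id, weekly_user_id FROM events;
""")

# Distinct counts computed from the intermediate table instead of the raw one.
rows = conn.execute("""
SELECT region,
       COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
       COUNT(DISTINCT weekly_user_id)  AS weekly_active_users
FROM events_dedup
GROUP BY region
ORDER BY region
""").fetchall()

dedup_row_count = conn.execute("SELECT COUNT(*) FROM events_dedup").fetchone()[0]
```

Whether this pays off on a real cluster depends on how much the deduplication step shrinks the data, which the sketch does not measure.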

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-22 Thread Gopal Vijayaraghavan

> COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
> COUNT(DISTINCT weekly_user_id) AS weekly_active_users,
> …
> GROUPING_ID() AS gid,
> COUNT(1) AS dummy

There are two things which prevent Hive from optimizing multiple count distincts. Another aggregate like a count(1) or a Grouping sets

Fwd: How to optimize multiple count( distinct col) in Hive SQL

2017-08-22 Thread panfei

-- Forwarded message --
From: panfei
Date: 2017-08-23 12:26 GMT+08:00
Subject: Fwd: How to optimize multiple count( distinct col) in Hive SQL
To: hive-...@hadoop.apache.org

-- Forwarded message --
From: panfei
Date:

Re: ORC Transaction Table - Spark

2017-08-22 Thread Eugene Koifman

Could you do a recursive “ls” on the table or partition that you are trying to read? Most likely you have files that don’t follow the expected naming convention.

Eugene

From: Aviral Agarwal
Reply-To: "user@hive.apache.org"
Date: Tuesday, August 22, 2017
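As a rough illustration of the check suggested above, the sketch below scans a list of file paths (for example, collected from a recursive listing) and flags those whose parent directory or file name does not match an assumed layout. The patterns (`base_N`, `delta_N_M`, `bucket_N`) and the function name `find_unexpected` are assumptions made for this example, not something stated in the thread; the exact convention depends on the Hive version, and files that predate a conversion to a transactional table may legitimately look different.

```python
import re

# Assumed naming patterns for a transactional ORC table; adjust per Hive version.
DIR_PATTERN = re.compile(r"^(base_\d+|(delete_)?delta_\d+_\d+(_\d+)?)$")
FILE_PATTERN = re.compile(r"^bucket_\d+(_flush_length)?$")


def find_unexpected(paths):
    """Return the file paths whose parent directory or file name
    does not match the assumed patterns above."""
    flagged = []
    for path in paths:
        parts = path.rstrip("/").split("/")
        if len(parts) < 2:
            # No parent directory to check, so treat it as unexpected.
            flagged.append(path)
            continue
        parent, name = parts[-2], parts[-1]
        if not (DIR_PATTERN.match(parent) and FILE_PATTERN.match(name)):
            flagged.append(path)
    return flagged


# Hypothetical listing of files under a table directory.
listing = [
    "/warehouse/t/delta_0000001_0000001/bucket_00000",
    "/warehouse/t/base_0000005/bucket_00001",
    "/warehouse/t/000000_0",
    "/warehouse/t/delta_0000002_0000002/part-00000",
]
suspicious = find_unexpected(listing)
```

Anything returned in `suspicious` is only a candidate for closer inspection, not proof of a problem.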

Re: Aug. 2017 Hive User Group Meeting

2017-08-22 Thread dan young

Dooh..thanx!

On Tue, Aug 22, 2017, 11:11 AM Alan Gates wrote:
> The address is at the top of the text description, even though it isn’t in
> the location field:
>
> 5470 Great America Parkway, Santa Clara, CA
>
> Alan.
>
> On Mon, Aug 21, 2017 at 5:50 PM, dan young

Re: Hive on Spark

2017-08-22 Thread Vihang Karajgaonkar

Xuefu is planning to give a talk on Hive-on-Spark @Uber at the user meetup this week. We can check if we can share the presentation on this list for folks who can't attend the meetup. https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/

On Mon, Aug 21, 2017 at 11:44 PM, peter zhang

ORC Transaction Table - Spark

2017-08-22 Thread Aviral Agarwal

Hi, I am trying to read a Hive ORC transaction table through Spark, but I am getting the following error:

Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
at

Re: Hive index + Tez engine = no performance gain?!

2017-08-22 Thread Gopal Vijayaraghavan

TL;DR - A Materialized view is a much more useful construct than trying to get limited indexes to work. That is a pretty lively project which has been going on for a while with Druid+LLAP: https://issues.apache.org/jira/browse/HIVE-14486

> This seems out of the blue but my initial benchmarks

Hive on Spark

2017-08-22 Thread peter zhang

Hi All, Has anybody used Hive on Spark in a production environment? How do its stability and performance compare with Spark SQL? I hope somebody can share their experience. Thanks in advance!

