Re: Hive LLAP Macro and Window Function

2018-06-27 Thread Gopal Vijayaraghavan
> When LLAP Execution Mode is set to 'only' you can't have a macro and window function in the same select statement.
The "only" part isn't enforced for the simple select query, but is enforced for the complex one (the PTF one).
> select col_1, col_2 from macro_bug where otrim(col_1) is not

Re: HiveServer2 performance references?

2018-10-15 Thread Gopal Vijayaraghavan
Hi,
> I was looking at HiveServer2 performance going through Knox in KNOX-1524 and found that HTTP mode is significantly slower.
The HTTP mode does re-auth for every row before HIVE-20621 was fixed – Knox should be doing cookie-auth to prevent ActiveDirectory/LDAP from throttling this. I

Re: hive 3.1 mapjoin with complex predicate produce incorrect results

2018-12-22 Thread Gopal Vijayaraghavan
Hi,
> Subject: Re: hive 3.1 mapjoin with complex predicate produce incorrect results
...
> |                         0 if(_col0 is null, 44, _col0) (type: int) |
> |                         1 _col0 (type: int)        |
That rewrite is pretty neat, but I feel like the IF expression nesting is

Re: Out Of Memory Error

2019-01-10 Thread Gopal Vijayaraghavan
> ,row_number() over ( PARTITION BY A.dt,A.year, A.month,
> A.bouncer,A.visitor_type,A.device_type order by A.total_page_view_time desc )
> as rank from content_pages_agg_by_month A
The row_number() window function is a streaming function, so this should not consume a significant

Re: [feature request] auto-increment field in Hive

2018-09-15 Thread Gopal Vijayaraghavan
Hi,
> It doesn't help if you need concurrent threads writing to a table but we are just using the row_number analytic and a max value subquery to generate sequences on our star schema warehouse.
Yup, you're right, the row_number doesn't help with concurrent writes - it doesn't even scale

Re: UDFClassLoader isolation leaking

2018-09-13 Thread Gopal Vijayaraghavan
Hi,
> Hopefully someone can tell me if this is a bug, expected behavior, or something I'm causing myself :)
I don't think this is expected behaviour, but where the bug is, is what I'm looking into.
> We have a custom StorageHandler that we're updating from Hive 1.2.1 to Hive 3.0.0. Most

Re: out of memory using Union operator and array column type

2019-03-11 Thread Gopal Vijayaraghavan
> I'll try the simplest query I can reduce it to with loads of memory and see if that gets anywhere. Other pointers are much appreciated.
Looks like something I'm testing right now (to make the memory setting cost-based). https://issues.apache.org/jira/browse/HIVE-21399 A less

Re: Hive Order By Question

2019-02-06 Thread Gopal Vijayaraghavan
> I am running an older version of Hive on MR. Does it have it too?
Hard to tell without an explain. AFAIK, this was fixed in Aug 2013 - how old is your build?
Cheers, Gopal

Re: Hive Order By Question

2019-02-06 Thread Gopal Vijayaraghavan
Hi,
That looks like the TopN hash optimization didn't kick in; that must be a settings issue in the install.
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
|

Re: Hive Order By Question

2019-02-06 Thread Gopal Vijayaraghavan
> I expect the maps to do some sorting and limiting in parallel. That way the reducer load would be small. I don't think it does that. Can you tell me why?
They do. Which version are you running, is it Tez and do you have an explain for the plan?
Cheers, Gopal
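The map-side "sorting and limiting in parallel" discussed in this thread can be illustrated with a minimal sketch. This is not Hive's implementation: the function names are hypothetical, and it only shows the general idea that each map task can keep its own top N keys so that a single reducer merges a few short sorted lists instead of sorting every row.

```python
import heapq
import itertools
from typing import Iterable, List


def map_side_top_n(rows: Iterable[str], n: int) -> List[str]:
    # Hypothetical mapper step: keep only the n smallest keys of this
    # split, returned in ascending order (ORDER BY key LIMIT n).
    return heapq.nsmallest(n, rows)


def reduce_top_n(partials: Iterable[List[str]], n: int) -> List[str]:
    # Hypothetical reducer step: lazily merge the already-sorted
    # partial lists and stop after the first n keys.
    merged = heapq.merge(*partials)
    return list(itertools.islice(merged, n))


def order_by_limit(splits: Iterable[Iterable[str]], n: int) -> List[str]:
    # Run the mapper step on every split, then the single reducer step.
    partials = [map_side_top_n(split, n) for split in splits]
    return reduce_top_n(partials, n)
```

With this shape the reducer receives at most N keys per map task, which is why a plan without the map-side limit sends far more data to the reducer.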

Re: Predicate Push Down Vs On Clause

2019-04-28 Thread Gopal Vijayaraghavan
> Yes both of these are valid ways of filtering data before join in Hive.
This has several implementation specifics attached to it. If you're looking at Hive 1.1 or before, it might not work the same way as Vineet mentioned. In older versions Calcite rewrites aren't triggered, which prevented

Re: Hive on Tez vs Impala

2019-04-22 Thread Gopal Vijayaraghavan
> I wish the Hive team to keep things more backward-compatible as well. Hive is such an enormous system with a wide-spread impact so any backward-incompatible change could cause an uproar in the community.
The incompatibilities were not avoidable in a set of situations - a lot of those

Re: Hive on Tez vs Impala

2019-04-15 Thread Gopal Vijayaraghavan
Hi,
>> However, we have built Tez on CDH and it runs just fine.
Down that path you'll also need to deploy a slightly newer version of Hive as well, because Hive 1.1 is a bit ancient & has known bugs with the tez planner code. You effectively end up building the hortonworks/hive-release

Re: Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NoSuchMethodError

2019-07-19 Thread Gopal Vijayaraghavan
Hi,
> java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I (state=,code=0)
Are you rolling your own Hadoop install? https://issues.apache.org/jira/browse/HADOOP-14683
Cheers, Gopal

re: Gather Partition Locations

2019-11-11 Thread Gopal Vijayaraghavan
Hi,
> I have a question about how to get the location for a bunch of partitions.
...
> But in an enterprise environment I'm pretty sure this approach would not be the best because the RDS (mysql or derby) is maybe not reachable or I don't have the permission to it.
That was the reason Hive
