Role of AM in TEZ as execution engine

2017-08-24 Thread Raghuraman Murugaiyan
Hi All, I am new to Tez. Could you please help me understand how Tez works, how we can control the number of mappers, and whether we can execute without increasing the AM memory (tez.am.resource.memory.mb) and the Tez container size? We are dealing with a huge volume of data, 4.5 TB per

Re: ORC Transaction Table - Spark

2017-08-24 Thread Gopal Vijayaraghavan
> Or, is this an artifact of an incompatibility between ORC files written by
> the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
> 3. Is there a difference in the ORC file format spec. at play here?
Nope, we're still defaulting to hive-0.12 format ORC files in Hive-2.x. We

RE: ORC Transaction Table - Spark

2017-08-24 Thread Larson, Kurt
Just some clarifying points please.
1. Is this the general case for all file formats?
2. Or, is this an artifact of an incompatibility between ORC files written by the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
3. Is there a difference in the ORC file

Re: LEFT JOIN and WHERE CLAUSE - How to handle

2017-08-24 Thread Ramasubramanian Narayanan
Hi, My bad, you are right. I intended to give the query as below (TXN left joining CURRENCY, not the other way around):
Query: Select ROW_NUM,CCY_CD,TXN_DT,CNTRY_DESC from TXN LEFT JOIN CURRENCY on (CURRENCY.CCY_CD = TXN.CCY_CD) where TXN_DT between EFF_ST_DT and EFF_END_DT;
Thanks and
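A minimal sketch of why the placement of the date filter matters in a query like the one above. It assumes, hypothetically, that CNTRY_DESC, EFF_ST_DT and EFF_END_DT live in CURRENCY and that the intent is to keep TXN rows with no matching currency; the sample rows are invented. It runs against an in-memory SQLite database, so it illustrates standard SQL join semantics only. It does not check whether a given Hive release accepts a range predicate inside ON.

```python
import sqlite3

# Hypothetical schema and sample data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TXN (ROW_NUM INTEGER, CCY_CD TEXT, TXN_DT TEXT);
CREATE TABLE CURRENCY (CCY_CD TEXT, CNTRY_DESC TEXT, EFF_ST_DT TEXT, EFF_END_DT TEXT);
INSERT INTO TXN VALUES (1, 'USD', '2017-08-01'), (2, 'XXX', '2017-08-02');
INSERT INTO CURRENCY VALUES ('USD', 'United States', '2017-01-01', '2017-12-31');
""")

# Filter in WHERE: for unmatched TXN rows the CURRENCY columns are NULL,
# the BETWEEN evaluates to NULL, and the row is dropped.
where_rows = conn.execute("""
SELECT TXN.ROW_NUM, TXN.CCY_CD, TXN.TXN_DT, CURRENCY.CNTRY_DESC
FROM TXN LEFT JOIN CURRENCY ON (CURRENCY.CCY_CD = TXN.CCY_CD)
WHERE TXN.TXN_DT BETWEEN CURRENCY.EFF_ST_DT AND CURRENCY.EFF_END_DT
ORDER BY TXN.ROW_NUM
""").fetchall()

# Filter in ON: the date range becomes part of the match condition,
# so unmatched TXN rows survive with NULL in the CURRENCY columns.
on_rows = conn.execute("""
SELECT TXN.ROW_NUM, TXN.CCY_CD, TXN.TXN_DT, CURRENCY.CNTRY_DESC
FROM TXN LEFT JOIN CURRENCY
  ON (CURRENCY.CCY_CD = TXN.CCY_CD
      AND TXN.TXN_DT BETWEEN CURRENCY.EFF_ST_DT AND CURRENCY.EFF_END_DT)
ORDER BY TXN.ROW_NUM
""").fetchall()

print(where_rows)
print(on_rows)
```

With the filter in WHERE the query behaves like an inner join for this data; with the filter in ON, the transaction with the unknown currency code is kept.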

Re: ORC Transaction Table - Spark

2017-08-24 Thread Furcy Pin
As far as I know, Spark can't read Hive's transactional tables yet: https://issues.apache.org/jira/browse/SPARK-16996
On Thu, Aug 24, 2017 at 4:34 AM, Aviral Agarwal wrote:
> So, there is no way possible right now for Spark to read Hive 2.x data ?
>
> On Thu, Aug 24,

Re: One column into multiple column.

2017-08-24 Thread Furcy Pin
Hello, You can use the split(string, pattern) UDF, which returns an array. You can compute this array in a subquery, and then assign a[0], a[1], a[2] ... to each column. The split UDF will only be called once per row.
On Thu, Aug 24, 2017 at 12:11 AM, Deepak Khandelwal <
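The split-then-index approach described in this reply can be sketched as follows. The HiveQL string shows the subquery shape; the table name `src`, the column name `raw`, the output names `col1` to `col3`, and the comma delimiter are all hypothetical. The Python helper `split_row` is likewise only an illustration that mirrors the same idea locally: split once, then pick elements by position, with missing positions filled with None.

```python
import re

# HiveQL shape of the approach (src, raw, col1..col3 are hypothetical names).
HIVE_QUERY = """
SELECT a[0] AS col1, a[1] AS col2, a[2] AS col3
FROM (SELECT split(raw, ',') AS a FROM src) t
"""

def split_row(raw, pattern, width):
    """Split the string once, then read each target column by index."""
    a = re.split(pattern, raw)  # the split happens a single time per row
    # Positions beyond the end of the array are filled with None.
    return tuple(a[i] if i < len(a) else None for i in range(width))

rows = ["x,y,z", "p,q"]
result = [split_row(r, ",", 3) for r in rows]
print(result)
```

Doing the split in the inner query and the indexing in the outer one keeps the expression written once, which matches the point above that the UDF is evaluated once per row.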