Hi All,
I am new to Tez. Could you please help me understand how Tez works, how we can control the number of mappers, and whether we can do that instead of
increasing the AM memory (tez.am.resource.memory.mb) and the Tez container
size? We are dealing with a huge volume of data, of size 4.5 TB per
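For context, under Tez the mapper count is usually driven by the input-split grouping settings rather than by AM memory; a minimal sketch of the relevant session settings (the values below are illustrative placeholders, not recommendations):

```sql
-- Illustrative Hive session settings; tune the values for your cluster.
-- Tez derives the number of mappers from grouped input splits:
SET tez.grouping.min-size=134217728;   -- lower bound per grouped split (~128 MB)
SET tez.grouping.max-size=1073741824;  -- upper bound per grouped split (~1 GB)
-- Container size is a separate knob from the mapper count:
SET hive.tez.container.size=4096;      -- memory in MB per Tez task container
```

Smaller grouping sizes yield more (smaller) mappers; larger sizes yield fewer.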
> Or, is this an artifact of an incompatibility between ORC files written by
> the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
> 3. Is there a difference in the ORC file format spec. at play here?
Nope, we're still defaulting to hive-0.12 format ORC files in Hive 2.x.
We
Just some clarifying points, please.
1. Is this the general case for all file formats?
2. Or, is this an artifact of an incompatibility between ORC files
written by the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
3. Is there a difference in the ORC file
Hi,
My bad, you are right; I intended to give the query as below (TXN left
joining CURRENCY, not the other way around):
Query:
Select ROW_NUM, CCY_CD, TXN_DT, CNTRY_DESC
from TXN LEFT JOIN CURRENCY on (CURRENCY.CCY_CD = TXN.CCY_CD)
where TXN_DT between EFF_ST_DT and EFF_END_DT;
Thanks and
As far as I know, Spark can't read Hive's transactional tables yet:
https://issues.apache.org/jira/browse/SPARK-16996
On Thu, Aug 24, 2017 at 4:34 AM, Aviral Agarwal
wrote:
> So, there is no way right now for Spark to read Hive 2.x data?
>
> On Thu, Aug 24,
Hello,
You can use the split(string, pattern) UDF, which returns an array.
You can compute this array in a subquery, and then assign a[0], a[1], a[2],
... to each column.
That way, the split UDF will only be called once per row.
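A minimal sketch of that approach, assuming a hypothetical table raw_logs with a comma-delimited string column named line (table and column names are made up for illustration):

```sql
-- Compute the split array once in a subquery, then project
-- individual elements of it as columns in the outer query.
SELECT parts[0] AS user_id,
       parts[1] AS event_ts,
       parts[2] AS event_type
FROM (
  SELECT split(line, ',') AS parts
  FROM raw_logs
) t;
```

Because the array is materialized in the subquery, split runs once per row instead of once per extracted column.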
On Thu, Aug 24, 2017 at 12:11 AM, Deepak Khandelwal <