RE: PARTITION error because different columns size

2016-12-13 Thread Joaquin Alzola
Hi Suresh

I chose the * and not the specific fields because I have 520 columns.
The data that I tested with was only a small test set.

I suppose then that I need to select the 520 fields. ☹
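
(A possible alternative, assuming dynamic partitioning behaves the same way through
the Spark thrift server, is to let Hive take the trailing partition columns from the
select itself, so the 520 data columns never have to be listed out. A sketch:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE employee_orc PARTITION (country, office)
SELECT * FROM employee WHERE country='USA' AND office='HQ-TX';

Since select * returns the partition columns last, they line up with the dynamic
PARTITION (country, office) spec.)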



From: Suresh Kumar Sethuramaswamy [mailto:rock...@gmail.com]
Sent: 13 December 2016 14:19
To: user@hive.apache.org
Subject: Re: PARTITION error because different columns size

Hi Joaquin

In Hive, when you run 'select * from employee' it is going to return the 
partition columns as well at the end, whereas you don't want those to be 
inserted into your ORC table, so your insert query should look like

  INSERT INTO TABLE employee_orc PARTITION (country='USA', office='HQ-TX') 
  select eid, name, salary, destination from employee where country='USA' and office='HQ-TX';


 Remember, a partition in Hive is a physical folder name.
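
For instance, with the default warehouse location, the USA/HQ-TX partition of 
employee_orc ends up as a directory like (path illustrative):

  /user/hive/warehouse/employee_orc/country=USA/office=HQ-TX/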

Regards
Suresh



PARTITION error because different columns size

2016-12-13 Thread Joaquin Alzola
Hi List

I changed to Spark 2.0.2 and Hive 2.0.1.
I have the below tables, but the INSERT INTO TABLE employee_orc PARTITION 
(country='USA', office='HQ-TX') select * from employee where country='USA' and 
office='HQ-TX';
is giving me --> Cannot insert into table `default`.`employee_orc` because the 
number of columns are different: need 4 columns, but query has 6 columns.;

When doing the select it is adding the partition columns as well (4 data columns 
+ 2 partition columns = the 6 columns in the error).

CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
PARTITIONED BY(country string, office string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS employee_orc ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
PARTITIONED BY(country string, office string)
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

0: jdbc:hive2://localhost:1> LOAD DATA LOCAL INPATH '/mnt/sample.txt.gz' 
INTO TABLE employee PARTITION (country='USA', office='HQ-TX');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.685 seconds)
0: jdbc:hive2://localhost:1> select * from employee;
+-------+--------------+---------+--------------------+----------+---------+--+
|  eid  |     name     | salary  |    destination     | country  | office  |
+-------+--------------+---------+--------------------+----------+---------+--+
| 1201  | Gopal        | 45000   | Technical manager  | USA      | HQ-TX   |
| 1202  | Manisha      | 45000   | Proof reader       | USA      | HQ-TX   |
| 1203  | Masthanvali  | 4       | Technical writer   | USA      | HQ-TX   |
| 1204  | Kiran        | 4       | Hr Admin           | USA      | HQ-TX   |
| 1205  | Kranthi      | 3       | Op Admin           | USA      | HQ-TX   |
+-------+--------------+---------+--------------------+----------+---------+--+
5 rows selected (0.358 seconds)
0: jdbc:hive2://localhost:1> INSERT INTO TABLE employee_orc PARTITION 
(country='USA', office='HQ-TX') select * from employee where country='USA' and 
office='HQ-TX';
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table 
`default`.`employee_orc` because the number of columns are different: need 4 
columns, but query has 6 columns.; (state=,code=0)


0: jdbc:hive2://localhost:1> describe employee_orc;
+--------------------------+------------+----------+--+
| col_name                 | data_type  | comment  |
+--------------------------+------------+----------+--+
| eid                      | int        | NULL     |
| name                     | string     | NULL     |
| salary                   | string     | NULL     |
| destination              | string     | NULL     |
| country                  | string     | NULL     |
| office                   | string     | NULL     |
| # Partition Information  |            |          |
| # col_name               | data_type  | comment  |
| country                  | string     | NULL     |
| office                   | string     | NULL     |
+--------------------------+------------+----------+--+

0: jdbc:hive2://localhost:1>  describe employee;
+--------------------------+------------+----------+--+
| col_name                 | data_type  | comment  |
+--------------------------+------------+----------+--+
| eid                      | int        | NULL     |
| name                     | string     | NULL     |
| salary                   | string     | NULL     |
| destination              | string     | NULL     |
| country                  | string     | NULL     |
| office                   | string     | NULL     |
| # Partition Information  |            |          |
| # col_name               | data_type  | comment  |
| country                  | string     | NULL     |
| office                   | string     | NULL     |
+--------------------------+------------+----------+--+
10 rows selected (0.045 seconds)


RE: Hive Stored Textfile to Stored ORC taking long time

2016-12-09 Thread Joaquin Alzola
Hi Jan

So you just load the .gz file into the table STORED AS TEXTFILE and then just use an 
INSERT to pass it from the TEXTFILE table to the ORC table?


From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 09 December 2016 22:29
To: user@hive.apache.org
Subject: RE: Hive Stored Textfile to Stored ORC taking long time

I have this problem as well. It takes forever to insert into the ORC table. I have 
the original table's text files gzipped, and 4 nodes with 64 GB and 16 cores each.

From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: pátek 9. prosince 2016 12:34
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Hive Stored Textfile to Stored ORC taking long time

Hi Jorn

Yes, I will do that test: same file size but with fewer columns.

I created a table with simple columns (all strings), nothing nested, and I do not 
do any transformations. I attach both tables' schemas.

By default hive.vectorized.execution.enabled is set to false.
I have not enabled it.

Just as an example, it took about an hour:
0: jdbc:hive2://localhost:1> insert into table ret_rec_cdrs_orc PARTITION 
(country='DE',year='2016',month='12') select * from ret_rec_cdrs where 
country='DE' and year='2016' and month='12';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (3837.457 seconds)
0: jdbc:hive2://localhost:1> select count(*) from ret_rec_cdrs where 
country='DE' and year='2016' and month='12';
+----------+--+
|   _c0    |
+----------+--+
| 3900155  |
+----------+--+
1 row selected (24.722 seconds)
0: jdbc:hive2://localhost:1> select count(*) from ret_rec_cdrs_orc where 
country='DE' and year='2016' and month='12';
+----------+--+
|   _c0    |
+----------+--+
| 3900155  |
+----------+--+
1 row selected (82.071 seconds)

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 10:22
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

Ok.
No, do not split into smaller files; this is done automatically. Your behavior 
looks strange: for that file size I would expect it to take below one minute.
Maybe you hit a bug in the Hive-on-Spark engine. You could try a file with 
fewer columns, but the same size. I assume that this is a Hive table with simple 
columns (nothing deeply nested) and that you do not do any transformations.
What is the CTAS query?
Do you enable vectorization in Hive?

If you just need a simple mapping from CSV to orc you can use any framework 
(mr, tez, spark etc), because performance does not differ so much in these 
cases, especially for the small amount of data you process.

On 9 Dec 2016, at 11:02, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
Hi Jorn

The file is about 1.5GB with 1.5 million records and about 550 fields in each 
row.

ORC is compressed as ZLIB.

I am using a standalone solution before expanding it, so everything is on the 
same node.
Hive 2.0.1 --> Spark 1.6.3 --> HDFS 2.6.5

The configuration is pretty much standard; I have not changed anything much.

It cannot be a network issue because all the apps are on the same node.

Since I am doing all of this translation at the Hive level (from textfile to 
ORC) I wanted to know if I could do it quicker at the Spark or HDFS level 
(doing the file conversion some other way), not at the top of the “stack”.

We take the files once every day, so if I put them in textfile and then to ORC 
it will take me almost half a day just to display the data.

It is basically a time-consuming task, and I want to do it much quicker. A better 
solution of course would be to put smaller files with FLUME, but this I will do 
in the future.

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 09:48
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

How large is the file? Might IO be an issue? How many disks do you have on the 
single node?

Do you compress the ORC (Snappy)?

What is the Hadoop distribution? Configuration baseline? Hive version?

Not sure if I understood your setup, but might network be an issue?

On 9 Dec 2016, at 02:08, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
HI List

The transformation from a textfile table to a stored ORC table takes quite a long 
time.

Steps follow:


1.Create one normal table using textFile format

2.Load the data normally into this table

3.Create one table with the schema of the expected results of your normal hive 
table using stored as orcfile

4.Insert overwrite query to copy the data from textFile table to orcfile table

I have about 1.5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so I am running a standalone 
server for now, but with 25G and 14 cores on 

RE: Hive Stored Textfile to Stored ORC taking long time

2016-12-09 Thread Joaquin Alzola
Hi Jorn

Yes, I will do that test: same file size but with fewer columns.

I created a table with simple columns (all strings), nothing nested, and I do not 
do any transformations. I attach both tables' schemas.

By default hive.vectorized.execution.enabled is set to false.
I have not enabled it.
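
Should it be worth testing, vectorization can be switched on per session; note that 
it mainly helps when reading ORC, so it may or may not change this particular 
text-to-ORC insert:

SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;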

Just as an example, it took about an hour:
0: jdbc:hive2://localhost:1> insert into table ret_rec_cdrs_orc PARTITION 
(country='DE',year='2016',month='12') select * from ret_rec_cdrs where 
country='DE' and year='2016' and month='12';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (3837.457 seconds)
0: jdbc:hive2://localhost:1> select count(*) from ret_rec_cdrs where 
country='DE' and year='2016' and month='12';
+----------+--+
|   _c0    |
+----------+--+
| 3900155  |
+----------+--+
1 row selected (24.722 seconds)
0: jdbc:hive2://localhost:1> select count(*) from ret_rec_cdrs_orc where 
country='DE' and year='2016' and month='12';
+----------+--+
|   _c0    |
+----------+--+
| 3900155  |
+----------+--+
1 row selected (82.071 seconds)
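
For what it is worth, the CTAS that Jörn asks about below would look roughly like 
this; CTAS cannot create a partitioned table, so it produces an unpartitioned copy 
and is only a sketch for comparing timings (the target table name is made up):

CREATE TABLE ret_rec_cdrs_orc_ctas
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB")
AS SELECT * FROM ret_rec_cdrs
WHERE country='DE' AND year='2016' AND month='12';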

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 10:22
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

Ok.
No, do not split into smaller files; this is done automatically. Your behavior 
looks strange: for that file size I would expect it to take below one minute.
Maybe you hit a bug in the Hive-on-Spark engine. You could try a file with 
fewer columns, but the same size. I assume that this is a Hive table with simple 
columns (nothing deeply nested) and that you do not do any transformations.
What is the CTAS query?
Do you enable vectorization in Hive?

If you just need a simple mapping from CSV to orc you can use any framework 
(mr, tez, spark etc), because performance does not differ so much in these 
cases, especially for the small amount of data you process.

On 9 Dec 2016, at 11:02, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
Hi Jorn

The file is about 1.5GB with 1.5 million records and about 550 fields in each 
row.

ORC is compressed as ZLIB.

I am using a standalone solution before expanding it, so everything is on the 
same node.
Hive 2.0.1 --> Spark 1.6.3 --> HDFS 2.6.5

The configuration is pretty much standard; I have not changed anything much.

It cannot be a network issue because all the apps are on the same node.

Since I am doing all of this translation at the Hive level (from textfile to 
ORC) I wanted to know if I could do it quicker at the Spark or HDFS level 
(doing the file conversion some other way), not at the top of the “stack”.

We take the files once every day, so if I put them in textfile and then to ORC 
it will take me almost half a day just to display the data.

It is basically a time-consuming task, and I want to do it much quicker. A better 
solution of course would be to put smaller files with FLUME, but this I will do 
in the future.

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 09:48
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

How large is the file? Might IO be an issue? How many disks do you have on the 
single node?

Do you compress the ORC (Snappy)?

What is the Hadoop distribution? Configuration baseline? Hive version?

Not sure if I understood your setup, but might network be an issue?

On 9 Dec 2016, at 02:08, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
HI List

The transformation from a textfile table to a stored ORC table takes quite a long 
time.

Steps follow:


1.Create one normal table using textFile format

2.Load the data normally into this table

3.Create one table with the schema of the expected results of your normal hive 
table using stored as orcfile

4.Insert overwrite query to copy the data from textFile table to orcfile table

I have about 1.5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so I am running a standalone 
server for now, but with 25G and 14 cores on that worker.

BR

Joaquin
CREATE TABLE IF NOT EXISTS RET_rec_cdrs (
CDR_ID String,
CDR_SUB_ID String,
TIME_STAMP String,
ServiceKey String,
CallingPartyNumber St

RE: Hive Stored Textfile to Stored ORC taking long time

2016-12-09 Thread Joaquin Alzola
Hi Jorn

The file is about 1.5GB with 1.5 million records and about 550 fields in each 
row.

ORC is compressed as ZLIB.

I am using a standalone solution before expanding it, so everything is on the 
same node.
Hive 2.0.1 --> Spark 1.6.3 --> HDFS 2.6.5

The configuration is pretty much standard; I have not changed anything much.

It cannot be a network issue because all the apps are on the same node.

Since I am doing all of this translation at the Hive level (from textfile to 
ORC) I wanted to know if I could do it quicker at the Spark or HDFS level 
(doing the file conversion some other way), not at the top of the “stack”.

We take the files once every day, so if I put them in textfile and then to ORC 
it will take me almost half a day just to display the data.

It is basically a time-consuming task, and I want to do it much quicker. A better 
solution of course would be to put smaller files with FLUME, but this I will do 
in the future.

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 09:48
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

How large is the file? Might IO be an issue? How many disks do you have on the 
single node?

Do you compress the ORC (Snappy)?

What is the Hadoop distribution? Configuration baseline? Hive version?

Not sure if I understood your setup, but might network be an issue?

On 9 Dec 2016, at 02:08, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
HI List

The transformation from a textfile table to a stored ORC table takes quite a long 
time.

Steps follow:


1.Create one normal table using textFile format

2.Load the data normally into this table

3.Create one table with the schema of the expected results of your normal hive 
table using stored as orcfile

4.Insert overwrite query to copy the data from textFile table to orcfile table

I have about 1.5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so I am running a standalone 
server for now, but with 25G and 14 cores on that worker.

BR

Joaquin


RE: Hive Stored Textfile to Stored ORC taking long time

2016-12-09 Thread Joaquin Alzola
HI Gopal,

Hive version 2.0.1 with spark 1.6.3

The text file was loaded into Hive as plain text, then I created the ORC table and 
then did the INSERT into the ORC table.

Would it be faster to load it into the STORED AS TEXTFILE table as gzip already?
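
One thing that may matter here: a single .gz text file is not splittable, so the 
whole conversion tends to run as a single task whatever the engine. Loading the 
input as several smaller files (the paths below are only illustrative) at least 
lets the insert parallelise across files:

LOAD DATA LOCAL INPATH '/mnt/cdrs/part-0001.txt.gz' INTO TABLE ret_rec_cdrs PARTITION (country='DE', year='2016', month='12');
LOAD DATA LOCAL INPATH '/mnt/cdrs/part-0002.txt.gz' INTO TABLE ret_rec_cdrs PARTITION (country='DE', year='2016', month='12');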



From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal 
Vijayaraghavan
Sent: 09 December 2016 04:17
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time



> I have spark with only one worker (same for HDFS) so running now a standalone 
> server but with 25G and 14 cores on that worker.

Which version of Hive was this?

And was the input text file compressed with something like gzip?

Cheers,
Gopal



RE: Hive Stored Textfile to Stored ORC taking long time

2016-12-09 Thread Joaquin Alzola
Did you do anything to mitigate this issue? Like putting it directly on 
HDFS? Or going through Spark instead of going through Hive?

From: Qiuzhuang Lian [mailto:qiuzhuang.l...@gmail.com]
Sent: 09 December 2016 04:02
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

Yes, we did run into this issue too, typically if the text Hive table exceeds 
100 million rows when converting the text table into an ORC table.

On Fri, Dec 9, 2016 at 9:08 AM, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
HI List

The transformation from a textfile table to a stored ORC table takes quite a long 
time.

Steps follow:


1.Create one normal table using textFile format

2.Load the data normally into this table

3.Create one table with the schema of the expected results of your normal hive 
table using stored as orcfile

4.Insert overwrite query to copy the data from textFile table to orcfile table

I have about 1.5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so I am running a standalone 
server for now, but with 25G and 14 cores on that worker.

BR

Joaquin


Hive Stored Textfile to Stored ORC taking long time

2016-12-08 Thread Joaquin Alzola
HI List

The transformation from a textfile table to a stored ORC table takes quite a long 
time.

Steps follow:


1.Create one normal table using textFile format

2.Load the data normally into this table

3.Create one table with the schema of the expected results of your normal hive 
table using stored as orcfile

4.Insert overwrite query to copy the data from textFile table to orcfile table
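
In HiveQL the four steps look roughly like this, using the small employee tables 
from the PARTITION thread earlier in this digest as a stand-in for the 550-column 
CDR table:

-- 1. Text-format staging table
CREATE TABLE IF NOT EXISTS employee (eid int, name String, salary String, destination String)
PARTITIONED BY (country string, office string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- 2. Load the raw file into it
LOAD DATA LOCAL INPATH '/mnt/sample.txt.gz'
INTO TABLE employee PARTITION (country='USA', office='HQ-TX');

-- 3. ORC table with the same data columns
CREATE TABLE IF NOT EXISTS employee_orc (eid int, name String, salary String, destination String)
PARTITIONED BY (country string, office string)
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

-- 4. Copy the data across (this is the slow step being discussed)
INSERT OVERWRITE TABLE employee_orc PARTITION (country='USA', office='HQ-TX')
SELECT eid, name, salary, destination FROM employee
WHERE country='USA' AND office='HQ-TX';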

I have about 1.5 million records with about 550 fields in each row.

Doing step 4 takes about 30 minutes (moving from one format to the other).

I have Spark with only one worker (same for HDFS), so I am running a standalone 
server for now, but with 25G and 14 cores on that worker.

BR

Joaquin


RE: ORC and Table partition

2016-12-08 Thread Joaquin Alzola
Thanks Jan

insert into table ret_mms_cdrs_orc PARTITION 
(country='TALK',year='2016',month='12') select * from ret_mms_cdrs where 
country='TALK' and year='2016' and month='12';

I was missing the PARTITION clause.
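
If more than one country/year/month has to be converted, the dynamic-partition 
form (assuming dynamic partitioning is enabled) loads every partition present in 
the text table in a single statement:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE ret_mms_cdrs_orc PARTITION (country, year, month)
SELECT * FROM ret_mms_cdrs;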

From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 08 December 2016 14:20
To: Joaquin Alzola <joaquin.alz...@lebara.com>
Subject: RE: ORC and Table partition

create partitioned ORC table first and then insert into it from text table

insert into table test.partitions PARTITION (part_col = 20161212)
select
a,
b
from test.source;
From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: čtvrtek 8. prosince 2016 15:08
To: Brotanek, Jan 
<jan.brota...@adastragrp.com<mailto:jan.brota...@adastragrp.com>>
Subject: RE: ORC and Table partition

Sorry, by mistake I reply only to you.
Just send another email to the list.

From: Joaquin Alzola
Sent: 08 December 2016 14:07
To: 'Brotanek, Jan' 
<jan.brota...@adastragrp.com<mailto:jan.brota...@adastragrp.com>>
Subject: RE: ORC and Table partition

I am asking because I have a partitioned table, but stored as textfile:
Table: RET_mms_cdrs
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

And need to move it to an ORC stored file:
Table: RET_mms_cdrs_orc
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

But when doing:
INSERT INTO TABLE RET_mms_cdrs_orc SELECT * FROM RET_mms_cdrs

0: jdbc:hive2://localhost:1> select count(*) from RET_mms_cdrs;
+-------+--+
|  _c0  |
+-------+--+
| 4554  |
+-------+--+

0: jdbc:hive2://localhost:1> select count(*) from RET_mms_cdrs_orc;
+------+--+
| _c0  |
+------+--+
| 0    |
+------+--+

So it is not passing the data from one table to the other ORC table.
And I think this is the only way to add ORC files into Hive.

From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 08 December 2016 13:51
To: Joaquin Alzola <joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>>
Subject: RE: ORC and Table partition

Sure.

create table if not exists CEOSK.CEO_CUST_MKIB2
(
DAY STRING,
SITE DECIMAL(5,0),
VAL0 DECIMAL(13,2),
VAL1 DECIMAL(13,2),
VAL2 DECIMAL(13,2),
VAL3 DECIMAL(13,2),
VAL4 DECIMAL(13,2),
VAL5 DECIMAL(13,2),
VAL6 DECIMAL(13,2),
VAL7 DECIMAL(13,2),
VAL8 DECIMAL(13,2),
VAL9 DECIMAL(13,2)
)
PARTITIONED BY (part_col string)
STORED AS ORC;

zlib compression type is default

From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: čtvrtek 8. prosince 2016 14:49
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: ORC and Table partition

Hi Guys

Can the ORC files and the table partitions coexist on the same table?

Such as 

)
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

BR

Joaquin


RE: ORC and Table partition

2016-12-08 Thread Joaquin Alzola
I am asking because I have a partitioned table, but stored as textfile:
Table: RET_mms_cdrs
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

And need to move it to an ORC stored file:
Table: RET_mms_cdrs_orc
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

But when doing:
INSERT INTO TABLE RET_mms_cdrs_orc SELECT * FROM RET_mms_cdrs

0: jdbc:hive2://localhost:1> select count(*) from RET_mms_cdrs;
+-------+--+
|  _c0  |
+-------+--+
| 4554  |
+-------+--+

0: jdbc:hive2://localhost:1> select count(*) from RET_mms_cdrs_orc;
+------+--+
| _c0  |
+------+--+
| 0    |
+------+--+

So it is not passing the data from one table to the other ORC table.
And I think this is the only way to add ORC files into Hive.

From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 08 December 2016 13:51
To: Joaquin Alzola <joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>>
Subject: RE: ORC and Table partition

Sure.

create table if not exists CEOSK.CEO_CUST_MKIB2
(
DAY STRING,
SITE DECIMAL(5,0),
VAL0 DECIMAL(13,2),
VAL1 DECIMAL(13,2),
VAL2 DECIMAL(13,2),
VAL3 DECIMAL(13,2),
VAL4 DECIMAL(13,2),
VAL5 DECIMAL(13,2),
VAL6 DECIMAL(13,2),
VAL7 DECIMAL(13,2),
VAL8 DECIMAL(13,2),
VAL9 DECIMAL(13,2)
)
PARTITIONED BY (part_col string)
STORED AS ORC;

zlib compression type is default

From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: čtvrtek 8. prosince 2016 14:49
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: ORC and Table partition

Hi Guys

Can the ORC files and the table partitions coexist on the same table?

Such as 

)
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

BR

Joaquin


ORC and Table partition

2016-12-08 Thread Joaquin Alzola
Hi Guys

Can the ORC files and the table partitions coexist on the same table?

Such as 

)
COMMENT 'Retail MMS CDRs'
PARTITIONED BY(country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

BR

Joaquin


RE: When Hive on Spark will support Spark 2.0?

2016-12-07 Thread Joaquin Alzola
The version that will support Spark 2.0 is Hive 2.2.

I do not know yet when this is going to be released.

-Original Message-
From: baipeng [mailto:b...@meitu.com]
Sent: 07 December 2016 08:04
To: user@hive.apache.org
Subject: When Hive on Spark will support Spark 2.0?

Does anyone know when Hive will release a version that supports Spark 2.0? Right 
now Hive 2.1.0 only supports Spark 1.6.


RE: Hive on Spark not working

2016-11-29 Thread Joaquin Alzola
Being unable to integrate Hive with a separate Spark, I just started the thrift 
server directly on Spark.
Now it is working as expected.

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: 29 November 2016 11:12
To: user <user@hive.apache.org>
Subject: Re: Hive on Spark not working

Hive on Spark engine only works with Spark 1.3.1.


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 29 November 2016 at 07:56, Furcy Pin 
<furcy@flaminem.com<mailto:furcy@flaminem.com>> wrote:
ClassNotFoundException generally means that jars are missing from your class 
path.

You probably need to link the spark jar to $HIVE_HOME/lib
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive
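
For completeness, the settings that Getting Started page has in mind are roughly 
the following, whether put in hive-site.xml or per session; the values are the 
ones already quoted in this thread, and the exact Spark jar that has to be visible 
to Hive is version dependent, so treat this as a sketch:

SET hive.execution.engine=spark;
SET spark.master=spark://172.16.173.31:7077;
SET spark.home=/mnt/spark;  -- or export SPARK_HOME before starting HiveServer2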

On Tue, Nov 29, 2016 at 2:03 AM, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
Hi Guys

No matter what I do, when I execute “select count(*) from employee” I get 
the following output in the logs.
It is quite funny because if I put hive.execution.engine=mr the output is 
correct. If I put hive.execution.engine=spark then I get the errors below.
If I do the search directly through spark-shell it works great:
+-------+
|    _c0|
+-------+
|1005635|
+-------+
So there has to be a problem from hive to spark.

It seems the RPC(??) connection is not set up…. Can somebody guide me on what 
to look for?
spark.master=spark://172.16.173.31:7077<http://172.16.173.31:7077>
hive.execution.engine=spark
spark.executor.extraClassPath
/mnt/spark/lib/spark-1.6.2-yarn-shuffle.jar:/mnt/hive/lib/hive-exec-2.0.1.jar

Hive2.0.1--> Spark 1.6.2 –> Hadoop – 2.6.5 --> Scala 2.10

2016-11-29T00:35:11,099 WARN  [RPC-Handler-2]: rpc.RpcDispatcher 
(RpcDispatcher.java:handleError(142)) - Received error 
message:io.netty.handler.codec.DecoderException: 
java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230)
at 
io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Nati

RE: Hive on Spark not working

2016-11-29 Thread Joaquin Alzola
Hi Mich

I read in some older posts that you made it work with the configuration 
I have as well:
Hive2.0.1--> Spark 1.6.2 –> Hadoop – 2.6.5 --> Scala 2.10
Or did you only make it work with Hive 1.2.1 --> Spark 1.3.1 --> etc ….?

BR

Joaquin

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: 29 November 2016 11:12
To: user <user@hive.apache.org>
Subject: Re: Hive on Spark not working

Hive on Spark engine only works with Spark 1.3.1.


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 29 November 2016 at 07:56, Furcy Pin 
<furcy@flaminem.com<mailto:furcy@flaminem.com>> wrote:
ClassNotFoundException generally means that jars are missing from your class 
path.

You probably need to link the spark jar to $HIVE_HOME/lib
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive

On Tue, Nov 29, 2016 at 2:03 AM, Joaquin Alzola 
<joaquin.alz...@lebara.com<mailto:joaquin.alz...@lebara.com>> wrote:
Hi Guys

No matter what I do, when I execute “select count(*) from employee” I get 
the following output in the logs.
It is quite funny because if I put hive.execution.engine=mr the output is 
correct. If I put hive.execution.engine=spark then I get the errors below.
If I do the search directly through spark-shell it works great:
+-------+
|    _c0|
+-------+
|1005635|
+-------+
So there has to be a problem from hive to spark.

It seems the RPC(??) connection is not set up…. Can somebody guide me on what 
to look for?
spark.master=spark://172.16.173.31:7077<http://172.16.173.31:7077>
hive.execution.engine=spark
spark.executor.extraClassPath
/mnt/spark/lib/spark-1.6.2-yarn-shuffle.jar:/mnt/hive/lib/hive-exec-2.0.1.jar

Hive2.0.1--> Spark 1.6.2 –> Hadoop – 2.6.5 --> Scala 2.10

2016-11-29T00:35:11,099 WARN  [RPC-Handler-2]: rpc.RpcDispatcher 
(RpcDispatcher.java:handleError(142)) - Received error 
message:io.netty.handler.codec.DecoderException: 
java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230)
at 
io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)

Hive on Spark not working

2016-11-28 Thread Joaquin Alzola
Hi Guys

No matter what I do, when I execute "select count(*) from employee" I get 
the following output in the logs.
It is quite funny because if I put hive.execution.engine=mr the output is 
correct. If I put hive.execution.engine=spark then I get the errors below.
If I do the search directly through spark-shell it works great:
+-------+
|    _c0|
+-------+
|1005635|
+-------+
So there has to be a problem from hive to spark.

It seems the RPC(??) connection is not set up. Can somebody guide me on what 
to look for?
spark.master=spark://172.16.173.31:7077
hive.execution.engine=spark
spark.executor.extraClassPath
/mnt/spark/lib/spark-1.6.2-yarn-shuffle.jar:/mnt/hive/lib/hive-exec-2.0.1.jar

Hive2.0.1--> Spark 1.6.2 -> Hadoop - 2.6.5 --> Scala 2.10

2016-11-29T00:35:11,099 WARN  [RPC-Handler-2]: rpc.RpcDispatcher 
(RpcDispatcher.java:handleError(142)) - Received error 
message:io.netty.handler.codec.DecoderException: 
java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230)
at 
io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
at 
org.apache.hive.spark.client.rpc.KryoMessageCodec.decode(KryoMessageCodec.java:97)
at 
io.netty.handler.codec.ByteToMessageCodec$1.decode(ByteToMessageCodec.java:42)
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:327)
... 15 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.spark.client.Job
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 39 more

Working Hive--> Spark --> HDFS

2016-11-23 Thread Joaquin Alzola
Hi Guys

Can somebody tell me a working version of Hive on Spark on HDFS (HoSoHDFS)?

So far I have tested:
Hive 1.2.1 --> Spark 1.6.3 --> Hadoop 2.6

Hive 2.1 --> Spark 2.0.2 --> Hadoop 2.7

And both of them give me various exceptions.
I have to say the first one creates the job in HDFS and finishes it successfully 
but gives back an error on Spark.

BR

Joaquin



Hive 2.2 binaries

2016-11-14 Thread Joaquin Alzola
Hi Guys

I found out that I am having an inconsistency when running Hive 2.1 with 
Spark 2.0.1:

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/JavaSparkListener

Found out this JIRA: https://issues.apache.org/jira/browse/HIVE-14029

When are the Hive 2.2 binaries coming out?

BR

Joaquin


Hive count(*) exception

2016-11-10 Thread Joaquin Alzola
HI Guys

When I type this simple query: select count(*) from employee;

I get the following exception:

2016-11-11T00:13:14,605 ERROR [HiveServer2-Background-Pool: Thread-123]: 
spark.SparkTask (:()) - Failed to execute spark task, with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark 
client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:64)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:89)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
   at 
org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at 
org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at 
org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:203)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:65)
at 
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
... 22 more

Hive 2.1
Spark 2.0.1
Hadoop 2.7.3

Queries with simple comparisons such as >, <, = work fine.

BR

Joaquin


Hive Exception

2016-11-08 Thread Joaquin Alzola
Hi List

I am getting the following error ONLY when using count(*) or order by:

0: jdbc:hive2://localhost:1> insert into employee 
(eid,name,salary,destination) values ('1206','Joaquin','38000','Engineer');
Query ID = joaquin_20161107030932_a118a171-9862-4c59-8e29-edf0e6eeb194
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Failed to execute spark task, with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark 
client.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=1)
0: jdbc:hive2://localhost:1>

Any ideas on how to fix this issue?

BR

Joaquin