RE: PARTITION error because different columns size
Hi Suresh,

I chose * rather than the specific fields because I have 520 columns. The data I tested with was only a testing ground. I suppose, then, that I need to select all 520 fields. ☹

From: Suresh Kumar Sethuramaswamy [mailto:rock...@gmail.com]
Sent: 13 December 2016 14:19
To: user@hive.apache.org
Subject: Re: PARTITION error because different columns size

Hi Joaquin,

In Hive, when you run 'select * from employee' it returns the partition columns at the end as well, whereas you don't want those inserted into your ORC table, so your insert query should look like:

INSERT INTO TABLE employee_orc PARTITION (country='USA', office='HQ-TX')
SELECT eid, salary FROM employee WHERE country='USA' AND office='HQ-TX';

Remember that a partition in Hive is a physical folder name.

Regards,
Suresh

On Tue, Dec 13, 2016 at 6:37 AM Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

Hi List,

I changed to Spark 2.0.2 and Hive 2.0.1. I have the tables below, but

INSERT INTO TABLE employee_orc PARTITION (country='USA', office='HQ-TX') SELECT * FROM employee WHERE country='USA' AND office='HQ-TX';

is giving me:

Cannot insert into table `default`.`employee_orc` because the number of columns are different: need 4 columns, but query has 6 columns.

When doing a SELECT *, the partition columns are returned as extra columns.
CREATE TABLE IF NOT EXISTS employee (
  eid int,
  name String,
  salary String,
  destination String)
COMMENT 'Employee details'
PARTITIONED BY (country string, office string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS employee_orc (
  eid int,
  name String,
  salary String,
  destination String)
COMMENT 'Employee details'
PARTITIONED BY (country string, office string)
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

0: jdbc:hive2://localhost:1> LOAD DATA LOCAL INPATH '/mnt/sample.txt.gz' INTO TABLE employee PARTITION (country='USA', office='HQ-TX');
No rows selected (0.685 seconds)

0: jdbc:hive2://localhost:1> select * from employee;
| eid  | name        | salary | destination       | country | office |
| 1201 | Gopal       | 45000  | Technical manager | USA     | HQ-TX  |
| 1202 | Manisha     | 45000  | Proof reader      | USA     | HQ-TX  |
| 1203 | Masthanvali | 4      | Technical writer  | USA     | HQ-TX  |
| 1204 | Kiran       | 4      | Hr Admin          | USA     | HQ-TX  |
| 1205 | Kranthi     | 3      | Op Admin          | USA     | HQ-TX  |
5 rows selected (0.358 seconds)

0: jdbc:hive2://localhost:1> INSERT INTO TABLE employee_orc PARTITION (country='USA', office='HQ-TX') select * from employee where country='USA' and office='HQ-TX';
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table `default`.`employee_orc` because the number of columns are different: need 4 columns, but query has 6 columns.; (state=,code=0)

0: jdbc:hive2://localhost:1> describe employee_orc;
| col_name                | data_type | comment |
| eid                     | int       | NULL    |
| name                    | string    | NULL    |
| salary                  | string    | NULL    |
| destination             | string    | NULL    |
| country                 | string    | NULL    |
| office                  | string    | NULL    |
| # Partition Information |           |         |
| # col_name              | data_type | comment |
| country                 | string    | NULL    |
| office                  | string    | NULL    |

0: jdbc:hive2://localhost:1> describe employee;
| col_name                | data_type | comment |
| eid                     | int       | NULL    |
| name                    | string    | NULL    |
| salary                  | string    | NULL    |
| destination             | string    | NULL    |
| country                 | string    | NULL    |
| office                  | string    | NULL    |
| # Partition Information |           |         |
| # col_name              | data_type | comment |
| country                 | string    | NULL    |
| office                  | string    | NULL    |
10 rows selected (0.045 seconds)
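A possible way around hand-listing all 520 columns is a dynamic-partition insert, since SELECT * returns the partition columns last and Hive can map them onto the partition spec. A minimal sketch against the employee schema above (whether this behaves identically through the Spark thrift server used in this thread is an assumption, not verified here):

```sql
-- Option 1: list the non-partition columns explicitly (static partition).
INSERT INTO TABLE employee_orc PARTITION (country='USA', office='HQ-TX')
SELECT eid, name, salary, destination
FROM employee
WHERE country='USA' AND office='HQ-TX';

-- Option 2 (sketch): dynamic partitioning lets SELECT * work, because the
-- trailing columns (country, office) feed the partition spec.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE employee_orc PARTITION (country, office)
SELECT * FROM employee;
```

With the dynamic variant, Hive writes one directory per distinct (country, office) pair in the source data, so no per-partition column list is needed.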
RE: Hive Stored Textfile to Stored ORC taking long time
Hi Jan,

So you just load the .gz file into the TEXTFILE table and then use an INSERT to move it from the TEXTFILE table to the ORC table?

From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 09 December 2016 22:29
To: user@hive.apache.org
Subject: RE: Hive Stored Textfile to Stored ORC taking long time

I have this problem as well: it takes forever to insert into the ORC table. My original table's text files are gzipped. I have 4 nodes, each with 64 GB and 16 cores.

From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: 09 December 2016 12:34
To: user@hive.apache.org
Subject: RE: Hive Stored Textfile to Stored ORC taking long time

Hi Jörn,

Yes, I will run that test: the same file size but with fewer columns. I created a table with simple columns (all strings), nothing nested, and I do not do any transformations. Both table schemas are attached. hive.vectorized.execution.enabled is at its default of false; I have not enabled it.

Just an example of an insert that took about an hour:

0: jdbc:hive2://localhost:1> insert into table ret_rec_cdrs_orc PARTITION (country='DE',year='2016',month='12') select * from ret_rec_cdrs where country='DE' and year='2016' and month='12';
No rows selected (3837.457 seconds)

0: jdbc:hive2://localhost:1> select count(*) from ret_rec_cdrs where country='DE' and year='2016' and month='12';
| _c0     |
| 3900155 |
1 row selected (24.722 seconds)

0: jdbc:hive2://localhost:1> select count(*) from ret_rec_cdrs_orc where country='DE' and year='2016' and month='12';
| _c0     |
| 3900155 |
1 row selected (82.071 seconds)

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 10:22
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

OK. No, do not split it into smaller files; that is done automatically. Your behaviour looks strange: for that file size I would expect it to take under one minute. Maybe you hit a bug in the Hive-on-Spark engine. You could try a file with fewer columns but the same size. I assume this is a Hive table with simple columns (nothing deeply nested) and that you do not do any transformations. What is the CTAS query? Do you enable vectorization in Hive? If you just need a simple mapping from CSV to ORC you can use any framework (MR, Tez, Spark, etc.), because performance does not differ much in these cases, especially for the small amount of data you process.

On 9 Dec 2016, at 11:02, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

Hi Jörn,

The file is about 1.5 GB, with 1.5 million records and about 550 fields in each row. The ORC is compressed with ZLIB. I am using a standalone setup before expanding it, so everything is on the same node: Hive 2.0.1 --> Spark 1.6.3 --> HDFS 2.6.5. The configuration is mostly the defaults; I have not changed much. It cannot be a network issue because all the apps are on the same node.

Since I am doing this conversion (from textfile to ORC) at the Hive layer, I wanted to know whether I could do it quicker at the Spark or HDFS level (doing the file conversion some other way) rather than at the top of the "stack". We take the files once a day, so if I load them as textfile and then convert to ORC, it takes almost half a day just to be able to display the data. It is basically a time-consuming task, and I want to do it much quicker. A better solution would of course be to land smaller files with Flume, but that I will do in the future.

From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 09 December 2016 09:48
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

How large is the file? Might I/O be an issue? How many disks do you have on the only node? Do you compress the ORC (Snappy?)? What is the Hadoop distribution? Configuration baseline? Hive version? Not sure I understood your setup, but might network be an issue?

On 9 Dec 2016, at 02:08, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

Hi List,

The transformation from a textfile table to a stored-ORC table takes quite a long time. The steps are:

1. Create one normal table using the TEXTFILE format.
2. Load the data normally into this table.
3. Create one table with the schema of the expected results of your normal Hive table, stored as ORC.
4. Run an INSERT OVERWRITE query to copy the data from the TEXTFILE table to the ORC table.

I have about 1.5 million records with about 550 fields in each row. Step 4 takes about 30 minutes (moving from one format to the other). I have Spark with only one worker (same for HDFS), so I am running a standalone server, but with 25 GB and 14 cores on that worker.
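The four-step text-to-ORC flow described in this thread can be sketched in HiveQL roughly as follows. The table names, the toy two-column schema, and the file path below are illustrative assumptions, not the poster's actual 550-column CDR schema:

```sql
-- Step 1: landing table in plain text.
CREATE TABLE IF NOT EXISTS cdrs_text (cdr_id STRING, payload STRING)
PARTITIONED BY (country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- Step 2: load the raw (possibly gzipped) file into one partition.
LOAD DATA LOCAL INPATH '/mnt/cdrs.txt.gz'
INTO TABLE cdrs_text PARTITION (country='DE', year='2016', month='12');

-- Step 3: same schema, stored as ORC with ZLIB compression.
CREATE TABLE IF NOT EXISTS cdrs_orc (cdr_id STRING, payload STRING)
PARTITIONED BY (country STRING, year STRING, month STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");

-- Step 4: rewrite the partition's data in ORC format.
INSERT OVERWRITE TABLE cdrs_orc PARTITION (country='DE', year='2016', month='12')
SELECT cdr_id, payload
FROM cdrs_text
WHERE country='DE' AND year='2016' AND month='12';
```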
RE: Hive Stored Textfile to Stored ORC taking long time
Hi Gopal,

Hive version 2.0.1 with Spark 1.6.3. The text file was loaded into Hive as plain text, then I created the ORC table and ran the INSERT into the ORC table. Would it be faster to load it into the TEXTFILE table already gzipped?

From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
Sent: 09 December 2016 04:17
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

> I have spark with only one worker (same for HDFS) so running now a standalone
> server but with 25G and 14 cores on that worker.

Which version of Hive was this? And was the input text file compressed with something like gzip?

Cheers,
Gopal

This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
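Gopal's gzip question matters because gzip is not a splittable codec: a single .gz file is read end to end by one task, which can dominate the conversion time on a single-worker setup like this one. One possible mitigation (a sketch of the idea only; the real paths such as /mnt/sample.txt.gz and the chunk size are assumptions) is to split the raw text into several smaller .gz files before loading, so more readers can work in parallel. Demonstrated here on a toy file in a temp directory:

```shell
# Sketch: split one .gz into several smaller .gz chunks so they can be
# read in parallel. Chunk size (-l 2 here for the toy data) and paths
# are assumptions to adapt to the real data.
TMP=$(mktemp -d)
printf '1|a\n2|b\n3|c\n4|d\n' | gzip > "$TMP/sample.txt.gz"

# Decompress, split into fixed-size line chunks, recompress each chunk.
zcat "$TMP/sample.txt.gz" | split -l 2 - "$TMP/part_"
gzip "$TMP"/part_*
ls "$TMP"
```

Each resulting part_*.gz can then be loaded into the TEXTFILE table (LOAD DATA accepts a directory), giving Hive several independent input splits instead of one.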
RE: Hive Stored Textfile to Stored ORC taking long time
Did you do anything to mitigate this issue? Like putting it directly on HDFS, or going through Spark instead of going through Hive?

From: Qiuzhuang Lian [mailto:qiuzhuang.l...@gmail.com]
Sent: 09 December 2016 04:02
To: user@hive.apache.org
Subject: Re: Hive Stored Textfile to Stored ORC taking long time

Yes, we ran into this issue too, typically when the text Hive table exceeds 100 million rows when converting a text table into an ORC table.

On Fri, Dec 9, 2016 at 9:08 AM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

Hi List,

The transformation from a textfile table to a stored-ORC table takes quite a long time. The steps are:

1. Create one normal table using the TEXTFILE format.
2. Load the data normally into this table.
3. Create one table with the schema of the expected results of your normal Hive table, stored as ORC.
4. Run an INSERT OVERWRITE query to copy the data from the TEXTFILE table to the ORC table.

I have about 1.5 million records with about 550 fields in each row. Step 4 takes about 30 minutes (moving from one format to the other). I have Spark with only one worker (same for HDFS), so I am running a standalone server, but with 25 GB and 14 cores on that worker.

BR
Joaquin
RE: ORC and Table partition
Thanks Jan,

insert into table ret_mms_cdrs_orc PARTITION (country='TALK', year='2016', month='12')
select * from ret_mms_cdrs where country='TALK' and year='2016' and month='12';

I was missing the PARTITION clause.

From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 08 December 2016 14:20
To: Joaquin Alzola <joaquin.alz...@lebara.com>
Subject: RE: ORC and Table partition

Create the partitioned ORC table first and then insert into it from the text table:

insert into table test.partitions PARTITION (part_col = 20161212) select a, b from test.source;

From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: Thursday 8 December 2016 15:08
To: Brotanek, Jan <jan.brota...@adastragrp.com>
Subject: RE: ORC and Table partition

Sorry, by mistake I replied only to you. I have just sent another email to the list.

From: Joaquin Alzola
Sent: 08 December 2016 14:07
To: 'Brotanek, Jan' <jan.brota...@adastragrp.com>
Subject: RE: ORC and Table partition

Asking because I have a partitioned table, but as textfile:

Table: RET_mms_cdrs
COMMENT 'Retail MMS CDRs'
PARTITIONED BY (country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

and I need to move it to an ORC-stored table:

Table: RET_mms_cdrs_orc
COMMENT 'Retail MMS CDRs'
PARTITIONED BY (country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

But when doing:

INSERT INTO TABLE RET_mms_cdrs_orc SELECT * FROM RET_mms_cdrs

0: jdbc:hive2://localhost:1> select count(*) from RET_mms_cdrs;
| _c0  |
| 4554 |

0: jdbc:hive2://localhost:1> select count(*) from RET_mms_cdrs_orc;
| _c0 |
| 0   |

So it is not passing the data from one table to the other ORC table, and I think this is the only way to add ORC files into Hive.

From: Brotanek, Jan [mailto:jan.brota...@adastragrp.com]
Sent: 08 December 2016 13:51
To: Joaquin Alzola <joaquin.alz...@lebara.com>
Subject: RE: ORC and Table partition

Sure:

create table if not exists CEOSK.CEO_CUST_MKIB2 (
  DAY STRING,
  SITE DECIMAL(5,0),
  VAL0 DECIMAL(13,2),
  VAL1 DECIMAL(13,2),
  VAL2 DECIMAL(13,2),
  VAL3 DECIMAL(13,2),
  VAL4 DECIMAL(13,2),
  VAL5 DECIMAL(13,2),
  VAL6 DECIMAL(13,2),
  VAL7 DECIMAL(13,2),
  VAL8 DECIMAL(13,2),
  VAL9 DECIMAL(13,2)
)
PARTITIONED BY (part_col string)
STORED AS ORC;

The zlib compression type is the default.

From: Joaquin Alzola [mailto:joaquin.alz...@lebara.com]
Sent: Thursday 8 December 2016 14:49
To: user@hive.apache.org
Subject: ORC and Table partition

Hi Guys,

Can ORC files and table partitions coexist on the same table? Such as:

)
COMMENT 'Retail MMS CDRs'
PARTITIONED BY (country STRING, year STRING, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

BR
Joaquin
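The fix in this thread, writing into an explicit partition of an ORC table, can be sketched end to end. This is a minimal illustration, not the poster's schema: the `_demo` table names and the single `cdr_id` column are assumptions. Note that ROW FORMAT DELIMITED has no effect on an ORC table, since ORC defines its own on-disk layout:

```sql
-- Partitioned ORC table: PARTITIONED BY and STORED AS ORC coexist fine.
CREATE TABLE IF NOT EXISTS ret_mms_cdrs_orc_demo (cdr_id STRING)
PARTITIONED BY (country STRING, year STRING, month STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");

-- Writing into a partitioned table needs either a static PARTITION
-- clause (as here) or dynamic partitioning enabled; otherwise Hive has
-- no target partition for the rows.
INSERT INTO TABLE ret_mms_cdrs_orc_demo PARTITION (country='TALK', year='2016', month='12')
SELECT cdr_id
FROM ret_mms_cdrs_demo
WHERE country='TALK' AND year='2016' AND month='12';
```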
RE: When Hive on Spark will support Spark 2.0?
The version that will support Spark 2.0 is Hive 2.2. It is not yet known when that will be released.

-----Original Message-----
From: baipeng [mailto:b...@meitu.com]
Sent: 07 December 2016 08:04
To: user@hive.apache.org
Subject: When Hive on Spark will support Spark 2.0?

Does anyone know when Hive will release a version that supports Spark 2.0? Currently Hive 2.1.0 only supports Spark 1.6.
RE: Hive on Spark not working
Being unable to integrate Hive with Spark separately, I just started the Thrift server directly on Spark. Now it is working as expected.

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: 29 November 2016 11:12
To: user <user@hive.apache.org>
Subject: Re: Hive on Spark not working

Hive on Spark engine only works with Spark 1.3.1.

Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 29 November 2016 at 07:56, Furcy Pin <furcy@flaminem.com> wrote:

ClassNotFoundException generally means that jars are missing from your classpath. You probably need to link the Spark jar into $HIVE_HOME/lib:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive

On Tue, Nov 29, 2016 at 2:03 AM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

Hi Guys

No matter what I do, when I execute "select count(*) from employee" I get the following output in the logs. It is quite funny, because if I set hive.execution.engine=mr the result is correct, and if I run the query directly through spark-shell it works great:

+---+
|_c0|
+---+
|1005635|
+---+

But if I set hive.execution.engine=spark I get the errors below. So there has to be a problem between Hive and Spark. It seems the RPC(?) connection is not set up. Can somebody guide me on what to look for?

spark.master=spark://172.16.173.31:7077
hive.execution.engine=spark
spark.executor.extraClassPath=/mnt/spark/lib/spark-1.6.2-yarn-shuffle.jar:/mnt/hive/lib/hive-exec-2.0.1.jar

Hive 2.0.1 --> Spark 1.6.2 --> Hadoop 2.6.5 --> Scala 2.10

2016-11-29T00:35:11,099 WARN [RPC-Handler-2]: rpc.RpcDispatcher (RpcDispatcher.java:handleError(142)) - Received error message: io.netty.handler.codec.DecoderException: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230)
    at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
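Furcy's suggestion of linking the Spark jar into Hive's lib directory can be sketched as follows. This is only a sketch: the installation roots are taken from the classpath shown above, but the exact assembly jar name (version and Hadoop suffix) is an assumption and will differ per installation.

```shell
# Assumed installation roots (from the classpath in this thread).
HIVE_HOME=/mnt/hive
SPARK_HOME=/mnt/spark

# Make Spark's classes visible to HiveServer2 / the Hive CLI, so Hive
# can build and submit Spark jobs (jar name below is an assumption):
ln -s "$SPARK_HOME/lib/spark-assembly-1.6.2-hadoop2.6.0.jar" "$HIVE_HOME/lib/"
```

Note that the NoClassDefFoundError above is for org.apache.hive.spark.client.Job, a Hive class, being decoded on the Spark side. That suggests the remote driver cannot see hive-exec; pointing spark.driver.extraClassPath at hive-exec-2.0.1.jar as well (only the executor side is configured above) may also be needed.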
RE: Hive on Spark not working
Hi Mich

I read in an older post that you made it work with the configuration I have as well: Hive 2.0.1 --> Spark 1.6.2 --> Hadoop 2.6.5 --> Scala 2.10. Or did you only make it work with Hive 1.2.1 --> Spark 1.3.1 --> etc.?

BR
Joaquin

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: 29 November 2016 11:12
To: user <user@hive.apache.org>
Subject: Re: Hive on Spark not working

Hive on Spark engine only works with Spark 1.3.1.
Hive on Spark not working
Hi Guys

No matter what I do, when I execute "select count(*) from employee" I get the following output in the logs. It is quite funny, because if I set hive.execution.engine=mr the result is correct, and if I run the query directly through spark-shell it works great:

+---+
|_c0|
+---+
|1005635|
+---+

But if I set hive.execution.engine=spark I get the errors below. So there has to be a problem between Hive and Spark. It seems the RPC(?) connection is not set up. Can somebody guide me on what to look for?

spark.master=spark://172.16.173.31:7077
hive.execution.engine=spark
spark.executor.extraClassPath=/mnt/spark/lib/spark-1.6.2-yarn-shuffle.jar:/mnt/hive/lib/hive-exec-2.0.1.jar

Hive 2.0.1 --> Spark 1.6.2 --> Hadoop 2.6.5 --> Scala 2.10

2016-11-29T00:35:11,099 WARN [RPC-Handler-2]: rpc.RpcDispatcher (RpcDispatcher.java:handleError(142)) - Received error message: io.netty.handler.codec.DecoderException: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230)
    at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.hive.spark.client.rpc.KryoMessageCodec.decode(KryoMessageCodec.java:97)
    at io.netty.handler.codec.ByteToMessageCodec$1.decode(ByteToMessageCodec.java:42)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:327)
    ... 15 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.spark.client.Job
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 39 more

This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
Working Hive --> Spark --> HDFS
Hi Guys

Can somebody tell me a working version combination of Hive on Spark on HDFS? So far I have tested:

Hive 1.2.1 --> Spark 1.6.3 --> Hadoop 2.6
Hive 2.1 --> Spark 2.0.2 --> Hadoop 2.7

Both of them give me various exceptions. I have to say the first one creates the job in HDFS and finishes it successfully, but gives back an error on Spark.

BR
Joaquin
Hive 2.2 binaries
Hi Guys

I found out that I am hitting an incompatibility when running Hive 2.1 with Spark 2.0.1:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener

I found this JIRA: https://issues.apache.org/jira/browse/HIVE-14029

When are the Hive 2.2 binaries coming out?

BR
Joaquin
Hive count(*) exception
Hi Guys

When I run this simple query:

select count(*) from employee;

I get the following exception:

2016-11-11T00:13:14,605 ERROR [HiveServer2-Background-Pool: Thread-123]: spark.SparkTask (:()) - Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:64)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:89)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
    at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:203)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:65)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
    ... 22 more

Hive 2.1
Spark 2.0.1
Hadoop 2.7.3

Queries using comparison operators (>, <, =) work fine.

BR
Joaquin
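The NoClassDefFoundError for org.apache.spark.SparkConf means HiveServer2 itself cannot see the Spark jars when HiveSparkClientFactory tries to build the Spark client. One hedged sketch of a fix (the path below is an assumption) is to export SPARK_HOME in hive-env.sh so Hive's startup scripts can pick up Spark's jars:

```shell
# In $HIVE_HOME/conf/hive-env.sh (path is an assumption -- adjust):
export SPARK_HOME=/mnt/spark
# With Spark 1.x layouts, Hive's launch script adds the
# spark-assembly-*.jar under $SPARK_HOME/lib to its classpath, which
# provides org.apache.spark.SparkConf. Spark 2.x no longer ships an
# assembly jar, which is related to the Hive 2.1 / Spark 2.0
# incompatibility tracked in HIVE-14029 above.
```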
Hive Exception
Hi List

I am getting the following error only when using count(*), order by, or insert:

0: jdbc:hive2://localhost:1> insert into employee (eid,name,salary,destination) values ('1206','Joaquin','38000','Engineer');
Query ID = joaquin_20161107030932_a118a171-9862-4c59-8e29-edf0e6eeb194
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=1)

Any ideas on how to fix this issue?

BR
Joaquin
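Worth noting why only count(*), order by, and insert fail while plain selects succeed: a simple "select ... from t" (depending on hive.fetch.task.conversion, possibly with filters) is answered by a fetch task that reads HDFS directly and never launches an execution-engine job, so a broken Spark setup only surfaces on queries that need a real job. As a temporary per-session workaround (a sketch, not a fix for the Spark client itself), one can fall back to MapReduce:

```sql
-- Temporary workaround, per session: use the MapReduce engine until
-- the Spark client issue is resolved.
set hive.execution.engine=mr;
select count(*) from employee;
```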