[jira] [Commented] (HIVE-12779) Buffer underflow when inserting data to table

2016-05-31 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307573#comment-15307573
 ] 

Oleksiy Sayankin commented on HIVE-12779:
-----------------------------------------

Alina has found a workaround for this issue.

*ROOT-CAUSE:*

Consider the method

{code}
protected int require (int required) throws KryoException
{code}

from the class com.esotericsoftware.kryo.io.Input, where the exception happens.

{code}
int remaining = limit - position;
if (remaining >= required) return remaining;
if (required > capacity) throw new KryoException("Buffer too small: capacity: "
        + capacity + ", required: " + required);

int count;
// Try to fill the buffer.
if (remaining > 0) {
    count = fill(buffer, limit, capacity - limit);
    if (count == -1) throw new KryoException("Buffer underflow.");
{code}

We can see that the exception ("Buffer underflow.") is thrown when count == -1.
So let us look at the method fill(byte[] buffer, int offset, int count) in
detail:
{code}
if (inputStream == null) return -1;
try {
    return inputStream.read(buffer, offset, count);
} catch (IOException ex) {
    throw new KryoException(ex);
}
{code}

It returns -1 either when inputStream == null or from inputStream.read(buffer,
offset, count). We know for certain that inputStream cannot be null here
because of the constructor:

{code}
public Input (InputStream inputStream) {
    this(4096);
    if (inputStream == null) throw new IllegalArgumentException("inputStream cannot be null.");
    this.inputStream = inputStream;
}
{code}

From the Java docs we know that inputStream.read(buffer, offset, count) returns
-1 when no byte is available because the stream is at end of file. Hence we
suspect an error in HDFS that causes -1 to be returned here. Skipping the use
of the file system as query plan storage and sending the plan directly via RPC
fixes the issue.
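
To make the failure mode concrete, here is a minimal standalone sketch (our own
illustration, not Hive code; it uses the plain com.esotericsoftware.kryo
artifact rather than Hive's shaded org.apache.hive.com.esotericsoftware copy).
An Input whose underlying stream hits end-of-file before the requested bytes
arrive fails with exactly this message:

{code}
import java.io.ByteArrayInputStream;

import com.esotericsoftware.kryo.io.Input;

public class BufferUnderflowDemo {
    public static void main(String[] args) {
        // The stream holds only 2 bytes, but readLong() makes require() ask
        // for 8. fill() returns -1 once the stream is exhausted, so require()
        // throws KryoException: "Buffer underflow." -- the same error a
        // truncated plan file read from HDFS produces.
        Input input = new Input(new ByteArrayInputStream(new byte[] {1, 2}));
        input.readLong();
    }
}
{code}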


*SOLUTION:*

Use 

{code}
  <property>
    <name>hive.rpc.query.plan</name>
    <value>true</value>
  </property>
{code}

in hive-site.xml as a workaround. This property defines whether to send the
query plan via a local resource or via RPC.
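
If editing hive-site.xml is not convenient, the same flag can usually be
enabled per session as well (a sketch, assuming hive.rpc.query.plan is not on
the restricted-configuration list in your deployment):

{code}
set hive.rpc.query.plan=true;
{code}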

> Buffer underflow when inserting data to table
> ----------------------------------------------
>
> Key: HIVE-12779
> URL: https://issues.apache.org/jira/browse/HIVE-12779
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, SQL
> Environment: CDH 5.4.9
>Reporter: Ming Hsuan Tu
>Assignee: Alan Gates
>
> I face a buffer underflow problem when inserting data to a table from Hive 
> 1.1.0.
> The block size is 128 MB and the data size is only 10 MB, but it gives me 891 
> mappers.
> Task with the most failures(4):
> -----
> Task ID:
>   task_1451989578563_0001_m_08
> URL:
>   
> http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1451989578563_0001&tipid=task_1451989578563_0001_m_08
> -----
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: Failed to load plan: 
> hdfs://tpe-nn-3-1:8020/tmp/hive/alec.tu/af798488-dbf5-45da-8adb-e4f2ddde1242/hive_2016-01-05_18-34-26_864_3947114301988950007-1/-mr-10004/bb86c923-0dca-43cd-aa5d-ef575d764e06/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Buffer underflow.
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:450)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:296)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:234)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:701)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Buffer 
> underflow.
> at 
> org.apache.hive.com.esotericsoftware.kryo.io.Input.require(Input.java:181)
> at 
> org.apache.hive.com.esotericsoftware.kryo.io.Input.readBoolean(Input.java:783)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.UnsafeCacheFields$UnsafeBooleanField.read(UnsafeCacheFields.java:120)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1069)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:960)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:974)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:416)
> ... 12 more
> Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> Thank you.

[jira] [Commented] (HIVE-12779) Buffer underflow when inserting data to table

2016-05-23 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296226#comment-15296226
 ] 

Oleksiy Sayankin commented on HIVE-12779:
-----------------------------------------

I have a patch for this issue. Alan Gates, could you assign this issue to me?



[jira] [Commented] (HIVE-12779) Buffer underflow when inserting data to table

2016-05-23 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296115#comment-15296115
 ] 

Oleksiy Sayankin commented on HIVE-12779:
-----------------------------------------

Steps to reproduce:

*STEP 1. Create table temp.test with over 1,000,000 rows*

Create a test file test.csv with 1,000,000 lines of random test data whose
column types match the table (a generator sketch follows below).
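
A minimal sketch of such a generator (our own illustration; GenTestCsv is a
hypothetical helper name, and any well-formed values will do, since the failure
is in plan deserialization rather than in the data itself):

{code}
import java.io.PrintWriter;

public class GenTestCsv {
    public static void main(String[] args) throws Exception {
        try (PrintWriter out = new PrintWriter("test.csv")) {
            String ts = "2016-01-05 18:34:26";
            // One CSV row per iteration, matching the 25-column layout of
            // temp.test: INT, BIGINT x2, BOOLEAN x2, DECIMAL x2, DOUBLE x2,
            // FLOAT x2, INT x2, SMALLINT x2, STRING x2, TIMESTAMP x2,
            // TINYINT x2, CHAR(22) x2, VARCHAR(22) x2.
            for (int i = 1; i <= 1_000_000; i++) {
                out.printf("%d,%d,%d,true,false,1.23456,2.34567,1.5,2.5,1.5,2.5,"
                        + "%d,%d,1,2,s1,s2,%s,%s,1,2,c1,c2,v1,v2%n",
                        i, i, i, i, i, ts, ts);
            }
        }
    }
}
{code}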

*STEP 2. Create folders in HDFS*

{code}
sudo -u mapr hadoop fs -mkdir /temp
sudo -u mapr hadoop fs -mkdir /temp/step5
sudo -u mapr hadoop fs -mkdir /temp/step6
{code}

*STEP 3. Create the test table and upload data*

{code}
CREATE TABLE temp.test 
(id INT, a1 BIGINT, a2 BIGINT, b1 BOOLEAN, b2 BOOLEAN,
c1 DECIMAL(10,5), c2 DECIMAL(10,5), d1 DOUBLE, d2 DOUBLE, e1 FLOAT,
e2 FLOAT, f1 INT, f2 INT, g1 SMALLINT, g2 SMALLINT,
h1 STRING, h2 STRING, i1 TIMESTAMP, i2 TIMESTAMP, j1 TINYINT,
j2 TINYINT, k1 CHAR(22), k2 CHAR(22), l1 VARCHAR(22), l2 VARCHAR(22))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
{code}

{code}
hadoop fs -put ~/test.csv /user/hive/warehouse/temp.db/test;
{code}

*STEP 4. Create tables temp.step5 and temp.step6 using the queries below*

{code}
CREATE TABLE temp.step5
{code}

{code}
Create External Table if not exists temp.step5 (
trans_num decimal(10,5),
store_num bigint,
quantity double,
net_price double,
weight double,
operating_company string,
banner string,
product_id_hormel_rev3 int,
exposed_flag int,
tm_dim_key_week bigint,
experian_id bigint,
units int,
cents int,
baseline_units double,
baseline_cents double,
coupon int,
feature int,
display int,
totl_prc_reduc int,
feature_or_display int,
any_promo int,
coupon_occasions int,
feature_occasions int,
display_occasions int,
price_reduction_occasions int,
feature_or_display_occasions int,
any_promo_occasions int,
coupon_dollars double,
coupon_units int,
period int,
estimated_hh_income string,
state string,
county_code string,
latitude string,
longitude string,
cape_age_pop_pct_0_17 string,
cape_age_pop_pct_65_99_plus string,
cape_age_pop_pct_18_99_plus string,
cape_age_pop_median_age string,
cape_ethnic_pop_pct_white_only string,
cape_ethnic_pop_pct_black_only string,
cape_ethnic_pop_pct_asian_only string,
cape_ethnic_pop_pct_hispanic string,
cape_child_hh_pct_with_persons_lt18 string,
cape_child_hh_pct_marr_couple_famwith_persons_lt18 string,
cape_typ_hh_pct_married_couple_family string,
cape_tenancy_occhu_pct_owner_occupied string,
cape_tenancy_occhu_pct_renter_occupied string,
cape_hhsize_hh_average_household_size string,
cape_density_persons_per_hh_for_pop_in_hh string,
cape_homval_oohu_median_home_value string,
cape_hustr_hu_pct_mobile_home string,
cape_built_hu_median_housing_unit_age string,
cape_lang_hh_pct_spanish_speaking string,
cape_educ_pop25_plus_median_education_attained string,
cape_inc_hh_median_family_household_income string,
cape_educ_ispsa_decile string,
cape_inc_family_inc_state_decile string,
census_rural_urban_county_size_code string,
core_based_statistical_areas string,
person_1_birth_year_and_month string,
person_1_combined_age string,
person_1_gender string,
person_1_marital_status string,
recipient_reliability_code string,
household_composition string,
person_1_person_type string,
homeowner_combined_homeowner string,
homeowner_probability_model string,
dwelling_type string,
length_of_residence string,
dwelling_unit_size string,
number_of_children_in_living_unit string,
number_of_adults_in_living_unit string,
number_of_person_in_living_unit string,
mail_responder string,
mosaic_hh string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/temp/step5';
{code}

{code}
CREATE TABLE temp.step6
{code}

{code}
Create External Table if not exists temp.step6
(
experian_id bigint,  
exposed_flag int,   
banner STRING,
Product_0_Quantity_PRE STRING,
Product_1_Quantity_PRE STRING,   
Product_2_Quantity_PRE STRING,   
Product_3_Quantity_PRE STRING,   
Product_4_Quantity_PRE STRING,   
Product_5_Quantity_PRE STRING,   
Product_6_Quantity_PRE STRING,   
Product_7_Quantity_PRE STRING,  
Product_8_Quantity_PRE STRING,  
Product_0_Quantity_POS STRING,

Product_1_Quantity_POS STRING,   
Product_2_Quantity_POS STRING,   
Product_3_Quantity_POS STRING,   
Product_4_Quantity_POS STRING,   
Product_5_Quantity_POS STRING,   
Product_6_Quantity_POS STRING,   
Product_7_Quantity_POS STRING,   
Product_8_Quantity_POS STRING,

Product_0_Net_Price_PRE STRING ,  
Product_1_Net_Price_PRE STRING ,  
Product_2_Net_Price_PRE STRING ,  
Product_3_Net_Price_PRE STRING ,
{code}

[jira] [Commented] (HIVE-12779) Buffer underflow when inserting data to table

2016-03-21 Thread Alina Abramova (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204052#comment-15204052
 ] 

Alina Abramova commented on HIVE-12779:
---------------------------------------

I investigated this issue and found that the bug appears randomly, and for me
only when concurrently running Hive jobs access the same table. I ran two
queries in two Beeline sessions, and roughly one attempt in thirty failed with
this exception. After looking into known issues with Kryo's serializer, I found
that this exception appears in a few cases:
1) Kryo is used in a multithreaded environment. But in Hive, Kryo is created as
a thread-local variable (see the sketch after this list), so I think this case
is excluded.
2) Two Input instances use the same buffer, for example:
{code}
Input inp1 = new Input(buff);
Input inp2 = new Input(buff);
{code}
My suspicion falls on cases where Hive gets the stream to read from the
FileSystem, i.e. two threads using the same path to read a plan; but in theory
this is excluded, because two jobs cannot share one plan at one path.
3) An output stream was not closed after serialization, but I did not find any
such case in Hive's code after HIVE-8688.
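
For reference, here is a minimal sketch of the thread-local pattern from
case 1 (our simplification under the assumption that Hive's actual thread-local
Kryo in Utilities also registers custom serializers; KryoHolder is a
hypothetical name):

{code}
import com.esotericsoftware.kryo.Kryo;

public class KryoHolder {
    // Each thread lazily gets its own Kryo instance, so concurrent queries
    // never share serializer state or buffers.
    private static final ThreadLocal<Kryo> KRYO = ThreadLocal.withInitial(Kryo::new);

    public static Kryo get() {
        return KRYO.get();
    }
}
{code}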

Maybe the reporter could give more information about the query, to help find a
100% reproducible case.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)