Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-26 Thread satyajit vegesna
Hi Eugene,

One more observation: in the NameNode logs, when I run select count(*) on the
individual tables, I still see the same error as before:
org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from
192.168.120.133:47114 Call#4 Retry#0: java.io.FileNotFoundException: File
does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_002_002_/bucket_00094_flush_length

But I do get the table counts, and they match the source data well, so I
believe the problem is with joining these tables together.

Are there any specific logs you would like me to check?

Regards,
Satyajit.
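For reference, whether such side files are actually present can be checked directly on HDFS. A minimal sketch, using the table path from the log above (the delta directory names on your cluster will differ):

```shell
# List delta directories and any *_flush_length side files for the table
# referenced in the FileNotFoundException above.
hdfs dfs -ls -R /user/hive/warehouse/mls_public_record_association_snapshot_orc \
  | grep -E 'delta_|flush_length'
```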


Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-26 Thread satyajit vegesna
Hi Eugene,

The following query ran to the end and then threw the exception below:

select count(*) from mls_public_record_association_snapshot_orc pra
  left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id
  left outer join attribute a on a.id = ml.standard_status

MapReduce Total cumulative CPU time: 0 days 1 hours 0 minutes 53 seconds
760 msec
Ended Job = job_1477494091659_0024
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type
[org.apache.logging.slf4j.Log4jLoggerFactory]
2016-10-26 16:09:01 Starting to launch local task to process map join; maximum
memory = 514850816
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID:
  Stage-9

Logs:

FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 300  Reduce: 121   Cumulative CPU: 3654.02 sec   HDFS
Read: 1771032233 HDFS Write: 1917532703 SUCCESS
Total MapReduce CPU Time Spent: 0 days 1 hours 0 minutes 54 seconds 20 msec
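As an aside (a sketch, not a confirmed fix for this thread): "return code 2 from MapredLocalTask" means the client-side local task that builds the map-join hash table died. A common hedge is to disable automatic map-join conversion, at the cost of a slower reduce-side join; these are standard Hive settings, and whether they apply here depends on the actual root cause:

```sql
-- Sketch: fall back to reduce-side joins instead of the local-task map join.
set hive.auto.convert.join=false;
-- Or keep map joins but tune the local task's limits:
set hive.mapjoin.localtask.max.memory.usage=0.90;
set hive.mapjoin.smalltable.filesize=25000000;
```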

Explain Plan:

STAGE DEPENDENCIES:
  Stage-8 is a root stage , consists of Stage-1
  Stage-1
  Stage-9 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-9
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-8
Conditional Operator

  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: pra
Statistics: Num rows: 99241216 Data size: 9924121600 Basic
stats: COMPLETE Column stats: NONE
Select Operator
  expressions: primary_listing_id (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 99241216 Data size: 9924121600 Basic
stats: COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 99241216 Data size: 9924121600 Basic
stats: COMPLETE Column stats: NONE
  TableScan
alias: ml
Statistics: Num rows: 201432950 Data size: 20949026816 Basic
stats: COMPLETE Column stats: NONE
Select Operator
  expressions: id (type: string), standard_status (type: int)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 201432950 Data size: 20949026816 Basic
stats: COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 201432950 Data size: 20949026816
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: int)
  Reduce Operator Tree:
Join Operator
  condition map:
   Left Outer Join0 to 1
  keys:
0 _col0 (type: string)
1 _col0 (type: string)
  outputColumnNames: _col2
  Statistics: Num rows: 221576249 Data size: 23043929997 Basic
stats: COMPLETE Column stats: NONE
  File Output Operator
compressed: false
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-9   -- this is the stage that fails (the Map Reduce Local Work)
Map Reduce Local Work
  Alias -> Map Local Tables:
$hdt$_2:a
  Fetch Operator
limit: -1
  Alias -> Map Local Operator Tree:
$hdt$_2:a
  TableScan
alias: a
Statistics: Num rows: 12830 Data size: 51322 Basic stats:
COMPLETE Column stats: NONE
Select Operator
  expressions: id (type: int)
  outputColumnNames: _col0
  Statistics: Num rows: 12830 Data size: 51322 Basic stats:
COMPLETE Column stats: NONE
  HashTable Sink Operator
keys:
  0 _col2 (type: int)
  1 _col0 (type: int)

  Stage: Stage-3
Map Reduce
  Map Operator Tree:
  TableScan
Map Join Operator
  condition map:
   Left Outer Join0 to 1
  keys:
0 _col2 (type: int)
1 _col0 (type: int)
  Statistics: Num rows: 243733879 Data size: 25348323546 Basic
stats: COMPLETE Column stats: NONE
  Group By Operator
aggregations: 

Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-26 Thread Eugene Koifman
If you can run this, then it’s safe to ignore the “00094_flush_length” messages
and the issue is somewhere else:

select count(*) from mls_public_record_association_snapshot_orc pra
  left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id
  left outer join attribute a on a.id = ml.standard_status

Eugene


Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-26 Thread satyajit vegesna
PFB the EXPLAIN PLAN from Hive. The query has issues at Stage-14; I have also
run another query that likewise fails in the Map Reduce Local Work stage.
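A note on where to look for the failure (a hedged sketch, not from the thread itself): the MapredLocalTask runs on the client machine, not in a YARN container, so its stack trace typically lands in the client-side hive.log. The location depends on your log4j configuration; /tmp/$USER/hive.log is the stock default:

```shell
# Sketch: pull the most recent local-task errors from the client-side log.
tail -n 500 /tmp/$USER/hive.log | grep -A 20 -E 'MapredLocalTask|Exception'
```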

STAGE DEPENDENCIES:
  Stage-13 is a root stage , consists of Stage-1
  Stage-1
  Stage-10 depends on stages: Stage-1 , consists of Stage-15, Stage-2
  Stage-15 has a backup stage: Stage-2
  Stage-9 depends on stages: Stage-15
  Stage-14 depends on stages: Stage-2, Stage-9
  Stage-8 depends on stages: Stage-14
  Stage-0 depends on stages: Stage-8
  Stage-4 depends on stages: Stage-0
  Stage-2

STAGE PLANS:
  Stage: Stage-13
Conditional Operator

  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: acl
Statistics: Num rows: 1689960 Data size: 1419567234 Basic
stats: COMPLETE Column stats: NONE
Filter Operator
  predicate: listing_url is not null (type: boolean)
  Statistics: Num rows: 1689960 Data size: 1419567234 Basic
stats: COMPLETE Column stats: NONE
  Select Operator
expressions: httpstatuscode (type: string), accesstime
(type: timestamp), useragent (type: string), page_id (type: string),
listing_url (type: string), requesttime (type: string), upstreamheadertime
(type: string), upstreamresponsetime (type: string), httpverb (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6, _col7, _col8
Statistics: Num rows: 1689960 Data size: 1419567234 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col4 (type: string)
  sort order: +
  Map-reduce partition columns: _col4 (type: string)
  Statistics: Num rows: 1689960 Data size: 1419567234 Basic
stats: COMPLETE Column stats: NONE
  value expressions: _col0 (type: string), _col1 (type:
timestamp), _col2 (type: string), _col3 (type: string), _col5 (type:
string), _col6 (type: string), _col7 (type: string), _col8 (type: string)
  TableScan
alias: pra
Statistics: Num rows: 19232793 Data size: 9924121600 Basic
stats: COMPLETE Column stats: NONE
Filter Operator
  predicate: url is not null (type: boolean)
  Statistics: Num rows: 19232793 Data size: 9924121600 Basic
stats: COMPLETE Column stats: NONE
  Select Operator
expressions: id (type: string), primary_listing_id (type:
string), public_record_id (type: string), url_type (type: string), url
(type: string), created_at (type: bigint), updated_at (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6
Statistics: Num rows: 19232793 Data size: 9924121600 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col4 (type: string)
  sort order: +
  Map-reduce partition columns: _col4 (type: string)
  Statistics: Num rows: 19232793 Data size: 9924121600
Basic stats: COMPLETE Column stats: NONE
  value expressions: _col0 (type: string), _col1 (type:
string), _col2 (type: string), _col3 (type: string), _col5 (type: bigint),
_col6 (type: bigint)
  Reduce Operator Tree:
Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 _col4 (type: string)
1 _col4 (type: string)
  outputColumnNames: _col0, _col1, _col2, _col3, _col5, _col6,
_col7, _col8, _col10, _col11, _col12, _col13, _col14, _col15, _col16
  Statistics: Num rows: 21156072 Data size: 10916533996 Basic
stats: COMPLETE Column stats: NONE
  File Output Operator
compressed: false
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-10
Conditional Operator

  Stage: Stage-15
Map Reduce Local Work
  Alias -> Map Local Tables:
$hdt$_2:ml
  Fetch Operator
limit: -1
  Alias -> Map Local Operator Tree:
$hdt$_2:ml
  TableScan
alias: ml
Statistics: Num rows: 168943764 Data size: 20949026816 Basic
stats: COMPLETE Column stats: NONE
Select Operator
  expressions: id (type: string), mls_id (type: int),
standard_status (type: int), updated_at (type: bigint), created_at (type:
bigint)
  outputColumnNames: _col0, _col1, _col2, _col3, _col4
  Statistics: Num rows: 168943764 Data size: 20949026816 Basic
stats: COMPLETE Column stats: NONE
  HashTable Sink Operator
keys:
  0 _col11 (type: string)
  1 _col0 (type: string)

  Stage: Stage-9
Map 

Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-26 Thread satyajit vegesna
Hi Eugene,

PFB the query; in the original message the transactional table was highlighted
in green and the Parquet tables in yellow:

INSERT INTO access_logs.crawlstats_dpp PARTITION(day="2016-10-23")
select
  pra.url as prUrl,
  pra.url_type as urlType,
  CAST(pra.created_at AS timestamp) as prCreated,
  CAST(pra.updated_at AS timestamp) as prUpdated,
  CAST(ml.created_at AS timestamp) as mlCreated,
  CAST(ml.updated_at AS timestamp) as mlUpdated,
  a.name as status,
  pra.public_record_id as prId,
  acl.accesstime as crawledon,
  pra.id as propId,
  pra.primary_listing_id as listingId,
  datediff(CAST(acl.accesstime AS timestamp), CAST(ml.created_at AS timestamp)) as mlcreateage,
  datediff(CAST(acl.accesstime AS timestamp), CAST(ml.updated_at AS timestamp)) as mlupdateage,
  datediff(CAST(acl.accesstime AS timestamp), CAST(pra.created_at AS timestamp)) as prcreateage,
  datediff(CAST(acl.accesstime AS timestamp), CAST(pra.updated_at AS timestamp)) as prupdateage,
  (case when (pra.public_record_id is not null and TRIM(pra.public_record_id) <> '')
        then (case when (pra.primary_listing_id is null or TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)
        else (case when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id) <> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,
  acl.httpstatuscode,
  acl.httpverb,
  acl.requesttime,
  acl.upstreamheadertime,
  acl.upstreamresponsetime,
  acl.page_id,
  useragent AS user_agent,
  substring(split(pra.url,'/')[0], 0, length(split(pra.url,'/')[0])-3) as city,
  substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1, 2) as state,
  ml.mls_id
FROM access_logs.loadbalancer_accesslogs acl
  inner join mls_public_record_association_snapshot_orc pra on acl.listing_url = pra.url
  left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id
  left outer join attribute a on a.id = ml.standard_status
WHERE acl.accesstimedate="2016-10-23";


Any clue, or something that you would want me to focus on to debug the
issue.

Regards,
Satyajit.




Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.

2016-10-25 Thread Eugene Koifman
Which of your tables are transactional?  Can you provide the DDL?

I don’t think the “File does not exist” error is causing your queries to fail;
it’s an INFO-level message. There should be some other error.
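A hedged sketch of how that DDL can be pulled (the table names are taken from the query in this thread; transactional tables carry TBLPROPERTIES ('transactional'='true') in the output):

```sql
SHOW CREATE TABLE mls_public_record_association_snapshot_orc;
SHOW CREATE TABLE mls_listing_snapshot_orc;
SHOW CREATE TABLE attribute;
```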

Eugene


From: satyajit vegesna 
>
Reply-To: "user@hive.apache.org" 
>
Date: Tuesday, October 25, 2016 at 5:46 PM
To: "user@hive.apache.org" 
>, 
"d...@hive.apache.org" 
>
Subject: Error with flush_length File in Orc, in hive 2.1.0 and mr execution 
engine.

Hi All,

I am using Hive 2.1.0 and Hadoop 2.7.2, but when I try running queries, even a
simple insert like the one below,

set mapreduce.job.queuename=default;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=400;
set hive.exec.max.dynamic.partitions=2000;
set mapreduce.map.memory.mb=5120;
set mapreduce.reduce.memory.mb=5120;
set mapred.tasktracker.map.tasks.maximum=30;
set mapred.tasktracker.reduce.tasks.maximum=20;
set mapred.reduce.child.java.opts=-Xmx2048m;
set mapred.map.child.java.opts=-Xmx2048m;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=false;
set hive.compactor.worker.threads=1;
set mapreduce.job.queuename=default;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO access_logs.crawlstats_dpp PARTITION(day="2016-10-23")
select
  pra.url as prUrl,
  pra.url_type as urlType,
  CAST(pra.created_at AS timestamp) as prCreated,
  CAST(pra.updated_at AS timestamp) as prUpdated,
  CAST(ml.created_at AS timestamp) as mlCreated,
  CAST(ml.updated_at AS timestamp) as mlUpdated,
  a.name as status,
  pra.public_record_id as prId,
  acl.accesstime as crawledon,
  pra.id as propId,
  pra.primary_listing_id as listingId,
  datediff(CAST(acl.accesstime AS timestamp), CAST(ml.created_at AS timestamp)) as mlcreateage,
  datediff(CAST(acl.accesstime AS timestamp), CAST(ml.updated_at AS timestamp)) as mlupdateage,
  datediff(CAST(acl.accesstime AS timestamp), CAST(pra.created_at AS timestamp)) as prcreateage,
  datediff(CAST(acl.accesstime AS timestamp), CAST(pra.updated_at AS timestamp)) as prupdateage,
  (case when (pra.public_record_id is not null and TRIM(pra.public_record_id) <> '')
        then (case when (pra.primary_listing_id is null or TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)
        else (case when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id) <> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,
  acl.httpstatuscode,
  acl.httpverb,
  acl.requesttime,
  acl.upstreamheadertime,
  acl.upstreamresponsetime,
  acl.page_id,
  useragent AS user_agent,
  substring(split(pra.url,'/')[0], 0, length(split(pra.url,'/')[0])-3) as city,
  substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1, 2) as state,
  ml.mls_id
FROM access_logs.loadbalancer_accesslogs acl
  inner join mls_public_record_association_snapshot_orc pra on acl.listing_url = pra.url
  left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id
  left outer join attribute a on a.id = ml.standard_status
WHERE acl.accesstimedate="2016-10-23";

I finally end up getting the error below:

2016-10-25 17:40:18,725 Stage-2 map = 100%,  reduce = 52%, Cumulative CPU 
1478.96 sec
2016-10-25 17:40:19,761 Stage-2 map = 100%,  reduce = 62%, Cumulative CPU 
1636.58 sec
2016-10-25 17:40:20,794 Stage-2 map = 100%,  reduce = 64%, Cumulative CPU 
1764.97 sec
2016-10-25 17:40:21,820 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 
1879.61 sec
2016-10-25 17:40:22,842 Stage-2 map = 100%,  reduce = 80%, Cumulative CPU 
2051.38 sec
2016-10-25 17:40:23,872 Stage-2 map = 100%,  reduce = 90%, Cumulative CPU 
2151.49 sec
2016-10-25 17:40:24,907 Stage-2 map = 100%,  reduce = 93%, Cumulative CPU 
2179.67 sec
2016-10-25 17:40:25,944 Stage-2 map = 100%,  reduce = 94%, Cumulative CPU 
2187.86 sec
2016-10-25 17:40:29,062 Stage-2 map = 100%,  reduce = 95%, Cumulative CPU 
2205.22 sec
2016-10-25 17:40:30,107 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 
2241.25 sec
MapReduce Total cumulative CPU time: 37 minutes 21 seconds 250 msec
Ended Job = job_1477437520637_0009
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/opt/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type