Dear all,

We have a Hive query that does an INSERT OVERWRITE from one main Hive table into another, moving about 24 million rows every day.
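For context, the statement is shaped roughly like the sketch below. The table and column names (and the GROUP BY, which I include only because the job does have a reduce phase) are illustrative placeholders, not our actual schema; the point is the daily date window we filter on:

    -- Hypothetical sketch only (placeholder names, illustrative GROUP BY);
    -- the real query differs, but the shape and the date window are the same idea.
    INSERT OVERWRITE TABLE target_summary
    SELECT
      e.user_id,
      e.event_type,
      COUNT(*) AS event_count
    FROM main_events e
    WHERE e.dt >= '2016-01-01'   -- the N-day window mentioned below;
      AND e.dt <= '2016-01-07'   -- narrowing it lets the job finish
    GROUP BY e.user_id, e.event_type;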
This query had been working fine for a long time, but it has recently started to hang at the reduce step: it simply gets stuck after all map tasks complete. We checked the logs, which show that the containers are being released. The exception below starts appearing in the logs once the reduce tasks start, and keeps recurring. The job completes fine if we cut the number of rows processed by reducing the number of days of data being processed (i.e., shrinking the date window in the sketch above).

2016-01-08 19:33:33,091 INFO [IPC Server handler 28 on 43451] org.apache.tez.dag.app.dag.impl.TaskImpl: TaskAttempt:attempt_1442077641322_71853_1_06_000001_0 sent events: (682-684)
2016-01-08 19:33:33,119 INFO [Socket Reader #1 for port 43451] org.apache.hadoop.ipc.Server: Socket Reader #1 for port 43451: readAndProcess from client 39.0.8.17 threw exception [java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:2558)
        at org.apache.hadoop.ipc.Server.access$2800(Server.java:130)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1459)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
2016-01-08 19:33:33,124 INFO [IPC Server handler 24 on 43451] org.apache.tez.dag.app.dag.impl.TaskImpl: TaskAttempt:attempt_1442077641322_71853_1_06_000000_0 sent events: (682-684)
2016-01-08 19:33:33,125 INFO [AsyncDispatcher event handler] org.apache.tez.dag.history.HistoryEventHandler: [HISTORY][DAG:dag_1442077641322_71853_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 4, taskAttemptId=attempt_1442077641322_71853_1_07_000750_0, startTime=1452303118171, finishTime=1452303213123, timeTaken=94952, status=SUCCEEDED, diagnostics=, counters=Counters: 28, org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1, File System Counters, FILE: BYTES_READ=56, FILE: BYTES_WRITTEN=554004, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=503489499, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=29, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0, org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=6593, GC_TIME_MILLIS=317, CPU_MILLISECONDS=-765630, PHYSICAL_MEMORY_BYTES=684494848, VIRTUAL_MEMORY_BYTES=1374089216, COMMITTED_HEAP_BYTES=801112064, INPUT_RECORDS_PROCESSED=9068754, OUTPUT_RECORDS=6593, OUTPUT_BYTES=540750, OUTPUT_BYTES_WITH_OVERHEAD=553940, OUTPUT_BYTES_PHYSICAL=553948, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, org.apache.hadoop.hive.ql.exec.FilterOperator$Counter, FILTERED=28489474, PASSED=22646, org.apache.hadoop.hive.ql.exec.MapOperator$Counter, DESERIALIZE_ERRORS=0

Please let me know if more details are required; any help with this issue would be appreciated.

Thank you,
Suresh