Hi,
Thanks for the reply. It looks like the error ultimately comes down to running out of
direct memory. Here is the full error message:
Error: DATA_READ ERROR: Error processing input: , line=6026, char=7284350.
Content parsed: [ ]
Failure while reading file s3a://<bucket/file>.gz. Happened at or shortly
before byte position 929686.
Fragment 1:171
[Error Id: ce3d41af-5ee2-448a-97ee-206b601acd25 on <host>:31010]
(com.univocity.parsers.common.TextParsingException) Error processing input: ,
line=6026, char=7284350. Content parsed: [ ]
org.apache.drill.exec.store.easy.text.compliant.TextReader.handleException():480
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseNext():389
org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.next():196
org.apache.drill.exec.physical.impl.ScanBatch.next():191
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Failure
allocating buffer.
io.netty.buffer.PooledByteBufAllocatorL.allocate():64
org.apache.drill.exec.memory.AllocationManager.<init>():80
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():239
org.apache.drill.exec.memory.BaseAllocator.buffer():221
org.apache.drill.exec.memory.BaseAllocator.buffer():191
org.apache.drill.exec.vector.UInt4Vector.reAlloc():217
org.apache.drill.exec.store.easy.text.compliant.RepeatedVarCharOutput.expandVarCharOffsets():212
org.apache.drill.exec.store.easy.text.compliant.RepeatedVarCharOutput.endField():255
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseField():325
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseRecord():141
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseNext():370
org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.next():196
org.apache.drill.exec.physical.impl.ScanBatch.next():191
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
Caused By (java.lang.OutOfMemoryError) Direct buffer memory
java.nio.Bits.reserveMemory():693
java.nio.DirectByteBuffer.<init>():123
java.nio.ByteBuffer.allocateDirect():311
io.netty.buffer.PoolArena$DirectArena.newChunk():437
io.netty.buffer.PoolArena.allocateNormal():179
io.netty.buffer.PoolArena.allocate():168
io.netty.buffer.PoolArena.allocate():98
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.newDirectBufferL():165
io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.directBuffer():195
io.netty.buffer.PooledByteBufAllocatorL.allocate():62
org.apache.drill.exec.memory.AllocationManager.<init>():80
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():239
org.apache.drill.exec.memory.BaseAllocator.buffer():221
org.apache.drill.exec.memory.BaseAllocator.buffer():191
org.apache.drill.exec.vector.UInt4Vector.reAlloc():217
org.apache.drill.exec.store.easy.text.compliant.RepeatedVarCharOutput.expandVarCharOffsets():212
org.apache.drill.exec.store.easy.text.compliant.RepeatedVarCharOutput.endField():255
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseField():325
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseRecord():141
org.apache.drill.exec.store.easy.text.compliant.TextReader.parseNext():370
org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.next():196
org.apache.drill.exec.physical.impl.ScanBatch.next():191
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():91
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
I have MAX_DIRECT_MEMORY set to 128G and MAX_HEAP_MEMORY set to 32G, which I
assume applies per node. I have also set planner.memory.max_query_memory_per_node
to a very high value. However, the web console reports the Maximum Direct
Memory as 8,589,934,592 bytes (8 GB), which is far lower than what I configured.
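For reference, this is roughly what I have been running from sqlline to check what
each drillbit actually sees (just a quick sketch; I am assuming the sys.memory and
sys.options system tables report the effective per-node values):
-- per-drillbit heap and direct memory limits (the *_max columns are what I
-- compare against my settings)
select * from sys.memory;
-- confirm that the per-query memory option actually took effect
select * from sys.options where name = 'planner.memory.max_query_memory_per_node';
If direct_max there also comes back around 8 GB on every node, I am guessing my
MAX_DIRECT_MEMORY override simply is not being picked up by the drillbits.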
Tanmay
On Thursday, 16 June 2016 2:22 PM, Jinfeng Ni <[email protected]> wrote:
Sounds like the data at line 2026 is either not well formatted, or it hit a
bug in Drill.
Can you try the following?
1) Turn on verbose error messages and run the query again, to see if
the verbose error message tells us more:
alter session set `exec.errors.verbose` = true;
2) If possible, check line 2026 in your input file to see if there is
anything suspicious.
On Thu, Jun 16, 2016 at 1:49 PM, Tanmay Solanki
<[email protected]> wrote:
> Hello,
> I am currently running Apache Drill on a 20-node cluster and am running into
> some errors that I was hoping you might be able to help me with.
>
> I am attempting to run the following query to create a Parquet table in a new
> S3 bucket from another table that is in TSV format:
> create table s3_output.tmp.`<output file>` as select
> columns[0], columns[1], columns[2], columns[3], columns[4], columns[5],
> columns[6], columns[7], columns[8], columns[9],
> columns[10], columns[11], columns[12], columns[13], columns[14], columns[15],
> columns[16], columns[17], columns[18], columns[19],
> columns[20], columns[21], columns[22], columns[23], columns[24], columns[25],
> columns[26], columns[27], columns[28], columns[29],
> columns[30], columns[31], columns[32], columns[33], columns[34], columns[35],
> columns[36], columns[37], columns[38], columns[39],
> columns[40], columns[41], columns[42], columns[43], columns[44], columns[45],
> columns[46], columns[47], columns[48], columns[49],
> columns[50], columns[51], columns[52], columns[53], columns[54], columns[55],
> columns[56], columns[57], columns[58], columns[59],
> columns[60], columns[61], columns[62], columns[63], columns[64], columns[65],
> columns[66], columns[67], columns[68], columns[69],
> columns[70], columns[71], columns[72], columns[73], columns[74], columns[75],
> columns[76], columns[77], columns[78], columns[79],
> columns[80], columns[81], columns[82], columns[83], columns[84], columns[85],
> columns[86], columns[87], columns[88], columns[89],
> columns[90], columns[91], columns[92], columns[93], columns[94], columns[95],
> columns[96], columns[97], columns[98], columns[99],
> columns[100], columns[101], columns[102], columns[103], columns[104],
> columns[105], columns[106], columns[107], columns[108], columns[109],
> columns[110], columns[111], columns[112], columns[113], columns[114],
> columns[115], columns[116], columns[117], columns[118], columns[119],
> columns[120], columns[121], columns[122], columns[123], columns[124],
> columns[125], columns[126], columns[127], columns[128], columns[129],
> columns[130], columns[131], columns[132], columns[133], columns[134],
> columns[135], columns[136], columns[137], columns[138], columns[139],
> columns[140], columns[141], columns[142], columns[143], columns[144],
> columns[145], columns[146], columns[147], columns[148], columns[149],
> columns[150], columns[151], columns[152], columns[153], columns[154],
> columns[155], columns[156], columns[157], columns[158], columns[159],
> columns[160], columns[161], columns[162], columns[163], columns[164],
> columns[165], columns[166], columns[167], columns[168], columns[169],
> columns[170], columns[171], columns[172], columns[173] from s3input.`<input
> path>*.gz`;
> This is the error output I get while running this query.
> Error: DATA_READ ERROR: Error processing input: , line=2026, char=2449781.
> Content parsed: [ ]
>
> Failure while reading file s3a://<input bucket/file>.gz. Happened at or
> shortly before byte position 329719.
> Fragment 1:19
>
> [Error Id: fe289e19-c7b7-4739-9960-c15b8a62af3b on <node 6>:31010]
> (state=,code=0)
> Do you have any idea how I can go about trying to solve this issue?
> Thanks for any help!
> Tanmay Solanki