How to verify predicate pushdown

2017-10-26 Thread PROJJWAL SAHA
Hello,

One question,
How to verify whether predicate pushdown is happening ?

I have one parquet file generated using a CTAS command. I have executed
REFRESH TABLE METADATA. I am firing a simple query with a WHERE clause. In the
physical plan for the scan operation, I see the rowcount as the total number of
rows in the data. Should this value get lowered in the case of predicate
pushdown ? Is it necessary to sort the predicate column for this to take
effect ? Any pointers ?
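
For reference, this is roughly the shape of what I am running (the workspace,
table path and column name below are placeholders, not the real ones):

REFRESH TABLE METADATA dfs.tmp.`my_ctas_table`;

EXPLAIN PLAN FOR
SELECT * FROM dfs.tmp.`my_ctas_table` WHERE some_col = 'some_value';

I am then looking at the Scan(groupscan=[...]) entry of the plan and its rowcount.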

Regards,
Projjwal


Re: Benchmark numbers using Drill

2017-10-20 Thread PROJJWAL SAHA
Thanks for this very useful info..

On 19 Oct 2017 11:28 pm, "Saurabh Mahapatra" <saurabhmahapatr...@gmail.com>
wrote:

> I do not think you will get such information about benchmarks from
> customers on production workloads. But from the customers I have worked
> with who have taken Drill to production, here is some information that may
> be of use to you:
>
> 1. The trend universally has been to use beefier machines for in-memory
> query engines. We see 256GB RAM and 32 cores as the most frequent
> configuration. On the network side, it is 2x10GbE.
>
> 2. The most commonly sized dedicated cluster for starting out with Drill in
> production has been around 16-20 nodes with the above configuration. I have
> several customers who have deployed this on 200+ nodes as well but in those
> scenarios, it is a service among many.
>
> 3. The concurrency we see in the above settings is a function of the size
> of the dataset and the complexity of the customer query. In general,
> Little's law holds. The smaller the chunk of work to be processed, the
> faster the throughput. Our understanding of this changes further
> with the new releases of Drill, where spill-to-disk features will make it
> more of a pessimistic execution engine. The use of queues can also
> change this understanding.
>
> 4. From my company side, we do have TPCH and TPCDS benchmarks that I do
> share with customers. But such benchmarks are flawed because they come from
> the world of traditional warehousing where the competition was among
> general purpose query engines. For example, our tests show that at higher
> and higher data scale, Drill beats Impala on these benchmarks. The same is
> touted by the Hive LLAP folks as well. But they do not necessarily imply
> that it is the best tool choice for the production environment. It is one
> reason why I am resistant to getting into the war of the query engines, in
> which every query engine beats the others under a given set of primed
> conditions.
>
> 5. It is an absolute must that you understand the query patterns that the
> system will have to withstand, with the data characteristics specific to
> your use case. I would only trust that. Big data systems are going to be
> application specific and will require tuning. Which also means that you
> have to revisit the kinds of analytics you would like your end users to
> have. Which again raises the question: what kinds of analytics truly
> generate value for the BI user?
>
> Best,
> Saurabh
>
> On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Is there any public performance benchmark that users have achieved using
> > Drill in production scenarios ? It would be useful if someone can pass me
> > any links for customer user stories.
> >
> > Regards
> >
>


Benchmark numbers using Drill

2017-10-18 Thread PROJJWAL SAHA
Hi,

Are there any public performance benchmarks that users have achieved using
Drill in production scenarios ? It would be useful if someone could pass me
links to any customer user stories.

Regards


Re: Exception while reading parquet data

2017-10-16 Thread PROJJWAL SAHA
here is the link for the parquet data.
https://drive.google.com/file/d/0BzZhvMHOeao1S2Rud2xDS1NyS00/view?usp=sharing

Setting store.parquet.reader.pagereader.bufferedread=false did not solve
the issue.
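
For reference, the setting was applied from the sqlline session before
re-running the query, along these lines (the second, commented statement is the
async-reader toggle suggested earlier in the thread, shown only for completeness):

ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;
-- earlier suggestion, if needed:
-- ALTER SESSION SET `store.parquet.reader.pagereader.async` = false;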

I am using Drill 1.11. The parquet data is fetched from Oracle Storage
Cloud Service using swift driver.

Here is the error on the drill command prompt -
Error: DATA_READ ERROR: Exception occurred while reading from disk.

File:
/data1GBparquet/storereturns/part-0-7ce26fde-f342-4aae-a727-71b8b7a60e63.parquet
Column:  sr_return_time_sk
Row Group Start:  417866
Fragment 0:0

On Sun, Oct 15, 2017 at 8:59 PM, Kunal Khatua <kkha...@mapr.com> wrote:

> You could try uploading to Google Drive (since you have a Gmail account)
> and share the link .
>
> Did Parth's suggestion of
> store.parquet.reader.pagereader.bufferedread=false
> resolve the issue?
>
> Also share the details of the hardware setup... #nodes, Hadoop version,
> etc.
>
>
> -----Original Message-----
> From: PROJJWAL SAHA [mailto:proj.s...@gmail.com]
> Sent: Sunday, October 15, 2017 8:07 AM
> To: user@drill.apache.org
> Subject: Re: Exception while reading parquet data
>
> Is there any place where I can upload the 12MB parquet data. I am not able
> to send the file through mail to the user group.
>
> On Thu, Oct 12, 2017 at 10:58 PM, Parth Chandra <par...@apache.org> wrote:
>
> > Seems like a bug in BufferedDirectBufInputStream.  Is it possible to
> > share a minimal data file that triggers this?
> >
> > You can also try turning off the buffering reader.
> >store.parquet.reader.pagereader.bufferedread=false
> >
> > With async reader on and buffering off, you might not see any
> > degradation in performance in most cases.
> >
> >
> >
> > On Thu, Oct 12, 2017 at 2:08 AM, PROJJWAL SAHA <proj.s...@gmail.com>
> > wrote:
> >
> > > hi,
> > >
> > > disabling the sync parquet reader doesn't solve the problem. I am getting a
> > > similar exception. I don't see any issue with the parquet file, since
> > > the same file works when loaded into Alluxio.
> > >
> > > 2017-10-12 04:19:50,502
> > > [2620da63-4efb-47e2-5e2c-29b48c0194c0:frag:0:0] ERROR
> > > o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from stream
> > > part-0-7ce26fde-f342-4aae-a727-71b8b7a60e63.parquet. Error was :
> > > null
> > > 2017-10-12 04:19:50,506
> > > [2620da63-4efb-47e2-5e2c-29b48c0194c0:frag:0:0] ERROR
> > > o.a.d.exec.physical.impl.ScanBatch - SYSTEM ERROR:
> > > IndexOutOfBoundsException
> > >
> > >
> > > [Error Id: 3b7c4587-c1b8-4e79-bdaa-b2aa1516275b ]
> > > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> > > IndexOutOfBoundsException
> > >
> > >
> > > [Error Id: 3b7c4587-c1b8-4e79-bdaa-b2aa1516275b ]
> > > at org.apache.drill.common.exceptions.UserException$
> > > Builder.build(UserException.java:550)
> > > ~[drill-common-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.physical.impl.ScanBatch.next(
> > > ScanBatch.java:249)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > > AbstractRecordBatch.java:119)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > > AbstractRecordBatch.java:109)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractSingleRecordBatch.
> > > innerNext(AbstractSingleRecordBatch.java:51)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > > AbstractRecordBatch.java:162)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > > AbstractRecordBatch.java:119)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > > AbstractRecordBatch.java:109)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.record.AbstractSingleRecordBatch.
> > > innerNext(AbstractSingleRecordBatch.java:51)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > at org.apache.drill.exec.physical.impl.svremover.
> > > RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> >

Re: Exception while reading parquet data

2017-10-15 Thread PROJJWAL SAHA
Is there any place where I can upload the 12MB parquet data. I am not able
to send the file through mail to the user group.

On Thu, Oct 12, 2017 at 10:58 PM, Parth Chandra <par...@apache.org> wrote:

> Seems like a bug in BufferedDirectBufInputStream.  Is it possible to share
> a minimal data file that triggers this?
>
> You can also try turning off the buffering reader.
>store.parquet.reader.pagereader.bufferedread=false
>
> With async reader on and buffering off, you might not see any degradation
> in performance in most cases.
>
>
>
> On Thu, Oct 12, 2017 at 2:08 AM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > hi,
> >
> > disabling the sync parquet reader doesn't solve the problem. I am getting a
> > similar exception.
> > I don't see any issue with the parquet file, since the same file works
> > when loaded into Alluxio.
> >
> > 2017-10-12 04:19:50,502
> > [2620da63-4efb-47e2-5e2c-29b48c0194c0:frag:0:0] ERROR
> > o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from stream
> > part-0-7ce26fde-f342-4aae-a727-71b8b7a60e63.parquet. Error was :
> > null
> > 2017-10-12 04:19:50,506
> > [2620da63-4efb-47e2-5e2c-29b48c0194c0:frag:0:0] ERROR
> > o.a.d.exec.physical.impl.ScanBatch - SYSTEM ERROR:
> > IndexOutOfBoundsException
> >
> >
> > [Error Id: 3b7c4587-c1b8-4e79-bdaa-b2aa1516275b ]
> > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> > IndexOutOfBoundsException
> >
> >
> > [Error Id: 3b7c4587-c1b8-4e79-bdaa-b2aa1516275b ]
> > at org.apache.drill.common.exceptions.UserException$
> > Builder.build(UserException.java:550)
> > ~[drill-common-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.physical.impl.ScanBatch.next(
> > ScanBatch.java:249)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:119)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:109)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractSingleRecordBatch.
> > innerNext(AbstractSingleRecordBatch.java:51)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:162)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:119)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:109)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractSingleRecordBatch.
> > innerNext(AbstractSingleRecordBatch.java:51)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.physical.impl.svremover.
> > RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:162)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:119)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:109)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractSingleRecordBatch.
> > innerNext(AbstractSingleRecordBatch.java:51)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.physical.impl.project.
> > ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:162)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:119)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:109)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.physical.impl.aggregate.
> > HashAggBatch.buildSchema(HashAggBatch.java:111)
> > [drill-java-exec-1.11.0.jar:1.11.0]
> > at org.apache.drill.exec.record.AbstractRecordBatch.next(
> > AbstractRecordBatch.java:142)
> > [drill-java-exec-1.11.0.jar:

Re: Exception while reading parquet data

2017-10-12 Thread PROJJWAL SAHA
ED], 17004301}, ColumnMetaData{UNCOMPRESSED
[sr_dummycol] BINARY  [RLE, PLAIN, BIT_PACKED], 18570072}]}]}
at 
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleException(ParquetRecordReader.java:272)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:299)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:180)
[drill-java-exec-1.11.0.jar:1.11.0]
... 60 common frames omitted
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.getNextBlock(BufferedDirectBufInputStream.java:185)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.readInternal(BufferedDirectBufInputStream.java:212)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.read(BufferedDirectBufInputStream.java:277)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.util.filereader.DirectBufInputStream.getNext(DirectBufInputStream.java:111)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.PageReader.readPage(PageReader.java:216)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.PageReader.nextInternal(PageReader.java:283)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:307)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:69)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFieldsSerial(BatchReader.java:63)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFields(BatchReader.java:56)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader$FixedWidthReader.readRecords(BatchReader.java:143)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:42)
~[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:297)
~[drill-java-exec-1.11.0.jar:1.11.0]
... 61 common frames omitted
Caused by: java.lang.IndexOutOfBoundsException: null
at java.nio.Buffer.checkBounds(Buffer.java:567) ~[na:1.8.0_121]
at java.nio.ByteBuffer.put(ByteBuffer.java:827) ~[na:1.8.0_121]
at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:379) 
~[na:1.8.0_121]
at 
org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(CompatibilityUtil.java:110)
~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.getNextBlock(BufferedDirectBufInputStream.java:182)
~[drill-java-exec-1.11.0.jar:1.11.0]
... 73 common frames omitted
2017-10-12 04:19:50,506
[2620da63-4efb-47e2-5e2c-29b48c0194c0:frag:0:0] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2620da63-4efb-47e2-5e2c-29b48c0194c0:0:0: State change requested
RUNNING --> FAILED
2017-10-12 04:19:50,507
[2620da63-4efb-47e2-5e2c-29b48c0194c0:frag:0:0] INFO
o.a.d.e.w.fragment.FragmentExecutor -
2620da63-4efb-47e2-5e2c-29b48c0194c0:0:0: State change requested
FAILED --> FINISHED
2017-10-12 04:19:50,533 [BitServer-2] WARN
o.a.drill.exec.work.foreman.Foreman - Dropping request to move to
COMPLETED state as query is already at FAILED state (which is
terminal).
2017-10-12 04:19:50,533 [BitServer-2] WARN
o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel
fragment. 2620da63-4efb-47e2-5e2c-29b48c0194c0:0:0 does not exist.



On Thu, Oct 12, 2017 at 1:49 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:

> Sure, I can try disabling the sync parquet reader.
> Will this, however, impact the performance of queries on parquet data ?
>
> On Thu, Oct 12, 2017 at 9:39 AM, Kunal Khatua <kkha...@mapr.com> wrote:
>
>> If this resolves the issue, could you share some additional details, such
>> as the metadata of the Parquet files, the OS, etc.? Details describing the
>> setup are also very helpful in identifying what could be the cause of the
>> error.
>>
>> We had observed some similar DATA_READ errors in the early iterations of
>> the Async Parquet reader, but those have been resolved. I'm presuming
>> you're already on the latest (i.e. Apache Drill 1.11.0)
>>
>> -----Original Message-----
>> From: Arjun kr [mailto:arjun...@outlook.com]
>> Sent: Wednesday, October 11, 2017 6:52 PM
>> To: user@d

Re: Exception while reading parquet data

2017-10-12 Thread PROJJWAL SAHA
Sure, I can try disabling the sync parquet reader.
Will this, however, impact the performance of queries on parquet data ?

On Thu, Oct 12, 2017 at 9:39 AM, Kunal Khatua <kkha...@mapr.com> wrote:

> If this resolves the issue, could you share some additional details, such
> as the metadata of the Parquet files, the OS, etc.? Details describing the
> setup are also very helpful in identifying what could be the cause of the
> error.
>
> We had observed some similar DATA_READ errors in the early iterations of
> the Async Parquet reader, but those have been resolved. I'm presuming
> you're already on the latest (i.e. Apache Drill 1.11.0)
>
> -----Original Message-----
> From: Arjun kr [mailto:arjun...@outlook.com]
> Sent: Wednesday, October 11, 2017 6:52 PM
> To: user@drill.apache.org
> Subject: Re: Exception while reading parquet data
>
>
> Can you try disabling the async parquet reader to see if the problem gets resolved?
>
>
> alter session set `store.parquet.reader.pagereader.async`=false;
>
> Thanks,
>
> Arjun
>
>
> 
> From: PROJJWAL SAHA <proj.s...@gmail.com>
> Sent: Wednesday, October 11, 2017 2:20 PM
> To: user@drill.apache.org
> Subject: Exception while reading parquet data
>
> I get below exception when querying parquet data on Oracle Storage Cloud
> service.
> Any pointers on what does this point to ?
>
> Regards,
> Projjwal
>
>
> ERROR o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from
> stream part-6-25a9ae4b-fd9e-4770-b17e-9a29b270a4c2.parquet. Error was
> : null
> 2017-10-09 09:42:18,516 [scan-2] INFO  o.a.d.e.s.p.c.AsyncPageReader -
> User Error Occurred: Exception occurred while reading from disk.
> (java.lang.IndexOutOfBoundsException)
> org.apache.drill.common.exceptions.UserException: DATA_READ ERROR:
> Exception occurred while reading from disk.
>
> File:
> /data25GB/storereturns/part-6-25a9ae4b-fd9e-4770-b17e-
> 9a29b270a4c2.parquet
> Column:  sr_return_time_sk
> Row Group Start:  479751
>
> [Error Id: 10680bb8-d1d6-43a1-b5e0-ef15bd8a9406 ] at
> org.apache.drill.common.exceptions.UserException$
> Builder.build(UserException.java:550)
> ~[drill-common-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.
> handleAndThrowException(AsyncPageReader.java:185)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.store.parquet.columnreaders.
> AsyncPageReader.access$700(AsyncPageReader.java:82)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$
> AsyncPageReaderTask.call(AsyncPageReader.java:461)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$
> AsyncPageReaderTask.call(AsyncPageReader.java:381)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_121] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> [na:1.8.0_121]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] Caused by:
> java.io.IOException: java.lang.IndexOutOfBoundsException
> at
> org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.
> getNextBlock(BufferedDirectBufInputStream.java:185)
> ~[drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.
> readInternal(BufferedDirectBufInputStream.java:212)
> ~[drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.read(
> BufferedDirectBufInputStream.java:277)
> ~[drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.util.filereader.DirectBufInputStream.getNext(
> DirectBufInputStream.java:111)
> ~[drill-java-exec-1.11.0.jar:1.11.0]
> at
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$
> AsyncPageReaderTask.call(AsyncPageReader.java:421)
> [drill-java-exec-1.11.0.jar:1.11.0]
> ... 5 common frames omitted
> Caused by: java.lang.IndexOutOfBoundsException: null at
> java.nio.Buffer.checkBounds(Buffer.java:567) ~[na:1.8.0_121] at
> java.nio.ByteBuffer.put(ByteBuffer.java:827) ~[na:1.8.0_121] at
> java.nio.DirectByteBuffer.put(DirectByteBuffer.java:379) ~[na:1.8.0_121]
> at
> org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(
> CompatibilityUtil.java:110)
> ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> at
> org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.
> getNextBlock(BufferedDirectBufInputStream.java:182)
>

Exception while reading parquet data

2017-10-11 Thread PROJJWAL SAHA
I get below exception when querying parquet data on Oracle Storage Cloud
service.
Any pointers on what does this point to ?

Regards,
Projjwal


ERROR o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from stream
part-6-25a9ae4b-fd9e-4770-b17e-9a29b270a4c2.parquet. Error was : null
2017-10-09 09:42:18,516 [scan-2] INFO  o.a.d.e.s.p.c.AsyncPageReader - User
Error Occurred: Exception occurred while reading from disk.
(java.lang.IndexOutOfBoundsException)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR:
Exception occurred while reading from disk.

File:
/data25GB/storereturns/part-6-25a9ae4b-fd9e-4770-b17e-9a29b270a4c2.parquet
Column:  sr_return_time_sk
Row Group Start:  479751

[Error Id: 10680bb8-d1d6-43a1-b5e0-ef15bd8a9406 ]
at
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
~[drill-common-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:185)
[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$700(AsyncPageReader.java:82)
[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:461)
[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:381)
[drill-java-exec-1.11.0.jar:1.11.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_121]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException
at
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.getNextBlock(BufferedDirectBufInputStream.java:185)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.readInternal(BufferedDirectBufInputStream.java:212)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.read(BufferedDirectBufInputStream.java:277)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.util.filereader.DirectBufInputStream.getNext(DirectBufInputStream.java:111)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:421)
[drill-java-exec-1.11.0.jar:1.11.0]
... 5 common frames omitted
Caused by: java.lang.IndexOutOfBoundsException: null
at java.nio.Buffer.checkBounds(Buffer.java:567) ~[na:1.8.0_121]
at java.nio.ByteBuffer.put(ByteBuffer.java:827) ~[na:1.8.0_121]
at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:379) ~[na:1.8.0_121]
at
org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(CompatibilityUtil.java:110)
~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at
org.apache.drill.exec.util.filereader.BufferedDirectBufInputStream.getNextBlock(BufferedDirectBufInputStream.java:182)
~[drill-java-exec-1.11.0.jar:1.11.0]
... 9 common frames omitted
2017-10-09 09:42:20,533 [26248359-2fc8-d177-c3a6-507f6857e0ea:frag:2:3]
INFO  o.a.d.e.w.fragment.FragmentExecutor -
26248359-2fc8-d177-c3a6-507f6857e0ea:2:3: State change requested
AWAITING_ALLOCATION --> RUNNING
2017-10-09 09:42:20,533 [26248359-2fc8-d177-c3a6-507f6857e0ea:frag:2:3]
INFO  o.a.d.e.w.f.FragmentStatusReporter -
26248359-2fc8-d177-c3a6-507f6857e0ea:2:3: State to report: RUNNING
2017-10-09 09:42:20,534 [26248359-2fc8-d177-c3a6-507f6857e0ea:frag:2:3]
INFO  o.a.d.e.w.fragment.FragmentExecutor -
26248359-2fc8-d177-c3a6-507f6857e0ea:2:3: State change requested RUNNING
--> CANCELLATION_REQUESTED
2017-10-09 09:42:20,534 [26248359-2fc8-d177-c3a6-507f6857e0ea:frag:2:3]
INFO  o.a.d.e.w.f.FragmentStatusReporter -
26248359-2fc8-d177-c3a6-507f6857e0ea:2:3: State to report:
CANCELLATION_REQUESTED


Re: Exception when querying parquet data

2017-10-11 Thread PROJJWAL SAHA
I am using Oracle Storage Cloud Service.
I did not run refresh table metadata. Even refresh table metadata fails
with this message.
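
For reference, the refresh command I am referring to is of this form (using the
same table path as in the failing count(*) query):

REFRESH TABLE METADATA `data25Goct6/websales`;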

I think it depends on the external network. On another day, the same query
works fine.

Regards,
Projjwal

On Mon, Oct 9, 2017 at 8:23 PM, Padma Penumarthy <ppenumar...@mapr.com>
wrote:

> which cloud service is this ?
> It is not able to read parquet metadata. Did you run refresh table
> metadata to
> generate parquet metadata ?
> Can you manually check if there is parquet metadata file
> (.drill.parquet_metadata)
> in the directory you used in the query i.e. `data25Goct6/websales` ?
>
> Thanks
> Padma
>
>
> On Oct 9, 2017, at 5:50 AM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
>
> Hello all,
>
> I am getting the below exception when querying parquet data stored in
> the storage cloud service. What does this exception point to ?
> The query on the same parquet files works when they are stored in
> Alluxio, which means the data is fine.
> I am using Drill 1.11.
>
> Any help is appreciated !
>
> Regards,
> Projjwal
>
> 2017-10-09 08:11:10,221 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
> INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 262498a1-4fc1-608e-a7bc-ab2c6ddc09c9: select count(*) from
> `data25Goct6/websales`
> 2017-10-09 08:11:38,117 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0 ms, numFiles: 1
> 2017-10-09 08:11:58,362 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0 ms, numFiles: 1
> 2017-10-09 08:15:28,459 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
> INFO  o.a.d.exec.store.parquet.Metadata - Took 105962 ms to get file
> statuses
> 2017-10-09 08:16:00,651 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
> ERROR o.a.d.exec.store.parquet.Metadata - Waited for 27187ms, but
> tasks for 'Fetch parquet metadata' are not complete. Total runnable
> size 29, parallelism 16.
> 2017-10-09 08:16:00,652 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
> INFO  o.a.d.exec.store.parquet.Metadata - User Error Occurred: Waited
> for 27187ms, but tasks for 'Fetch parquet metadata' are not complete.
> Total runnable size 29, parallelism
> 16.org.apache.drill.common.exceptions.UserException: RESOURCE ERROR:
> Waited for 27187ms, but tasks for 'Fetch parquet metadata' are not
> complete. Total runnable size 29, parallelism 16.
>
>
> [Error Id: d9b6ee72-2e81-49ae-846c-61a14931b7ab ]
> at org.apache.drill.common.exceptions.UserException$
> Builder.build(UserException.java:550)
> ~[drill-common-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:151)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(
> Metadata.java:293)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(
> Metadata.java:270)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(
> Metadata.java:255)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(
> Metadata.java:117)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.init(
> ParquetGroupScan.java:730)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.<
> init>(ParquetGroupScan.java:226)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.<
> init>(ParquetGroupScan.java:186)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(
> ParquetFormatPlugin.java:170)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(
> ParquetFormatPlugin.java:66)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(
> FileSystemPlugin.java:144)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(
> AbstractStoragePlugin.java:100)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.planner.logical.DrillTable.
> getGroupScan(DrillTable.java:85)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.
> onMatch(DrillPushProjIntoScan.java:63)
> [drill-java-exec-1.11.0.jar:1.11.0]
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.
> onMatch(VolcanoRuleCall.jav

Exception when querying parquet data

2017-10-09 Thread PROJJWAL SAHA
Hello all,

I am getting the below exception when querying parquet data stored in
the storage cloud service. What does this exception point to ?
The query on the same parquet files works when they are stored in
Alluxio, which means the data is fine.
I am using Drill 1.11.

Any help is appreciated !

Regards,
Projjwal

2017-10-09 08:11:10,221 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
262498a1-4fc1-608e-a7bc-ab2c6ddc09c9: select count(*) from
`data25Goct6/websales`
2017-10-09 08:11:38,117 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
took 0 ms, numFiles: 1
2017-10-09 08:11:58,362 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
took 0 ms, numFiles: 1
2017-10-09 08:15:28,459 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
INFO  o.a.d.exec.store.parquet.Metadata - Took 105962 ms to get file
statuses
2017-10-09 08:16:00,651 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
ERROR o.a.d.exec.store.parquet.Metadata - Waited for 27187ms, but
tasks for 'Fetch parquet metadata' are not complete. Total runnable
size 29, parallelism 16.
2017-10-09 08:16:00,652 [262498a1-4fc1-608e-a7bc-ab2c6ddc09c9:foreman]
INFO  o.a.d.exec.store.parquet.Metadata - User Error Occurred: Waited
for 27187ms, but tasks for 'Fetch parquet metadata' are not complete.
Total runnable size 29, parallelism
16.org.apache.drill.common.exceptions.UserException: RESOURCE ERROR:
Waited for 27187ms, but tasks for 'Fetch parquet metadata' are not
complete. Total runnable size 29, parallelism 16.


[Error Id: d9b6ee72-2e81-49ae-846c-61a14931b7ab ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
~[drill-common-1.11.0.jar:1.11.0]
at org.apache.drill.exec.store.TimedRunnable.run(TimedRunnable.java:151)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetFileMetadata_v3(Metadata.java:293)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:270)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:255)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.Metadata.getParquetTableMetadata(Metadata.java:117)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.init(ParquetGroupScan.java:730)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:226)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:186)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:170)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:66)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:144)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.store.AbstractStoragePlugin.getPhysicalScan(AbstractStoragePlugin.java:100)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:85)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:63)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811)
[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:310)
[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:401)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:242)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:292)
[drill-java-exec-1.11.0.jar:1.11.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:169)
[drill-java-exec-1.11.0.jar:1.11.0]
at 

Re: Enable debugging for 3rd party storage plugin with eclipse

2017-03-23 Thread PROJJWAL SAHA
All,

I am able to connect from eclipse debug view to remote drill sqlline using
the instructions here.-
http://www.confusedcoders.com/bigdata/apache-drill/debugging-apache-drill-sqlline-query-with-java-remote-debugging

However, I am not sure how to use the 3rd party plugin source code in
eclipse and put it in the classpath of drill to enable me to debug the code
at runtime.

Please help me here.

Regards,
Projjwal

On Sun, Mar 19, 2017 at 10:43 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:

> Hi all,
>
> I am trying to debug a 3rd party storage plugin and I need to enable debug
> with my eclipse IDE. Can someone pls guide me on the steps to enable
> debugging for eclipse - any documentation / link would also help. Also are
> the steps same if I would want to debug drill codebase ?
>
> Regards,
> Projjwal
>


Enable debugging for 3rd party storage plugin with eclipse

2017-03-19 Thread PROJJWAL SAHA
Hi all,

I am trying to debug a 3rd party storage plugin and I need to enable debug
with my eclipse IDE. Can someone pls guide me on the steps to enable
debugging for eclipse - any documentation / link would also help. Also are
the steps same if I would want to debug drill codebase ?

Regards,
Projjwal


Re: Display of query result using command line

2017-03-15 Thread PROJJWAL SAHA
The select count(*) query works fine for me - it returns the exact number
of rows, close to 3 million.

The way I am executing queries is - I am opening a PuTTY session to the drill
node, firing the query, and getting the result printed on the PuTTY console.

On removing extract headers from the tsv and then firing the select * query,
the query completed on the console but it printed close to 2 million rows -
I feel that the entire process of printing such a high number of rows is
giving me inconsistent results.

Thoughts ?
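
One thing I plan to try is not printing the full result set on the console at
all, e.g. limiting the rows or writing the result out with CTAS instead (a
rough sketch; the paths and workspace below are placeholders):

SELECT * FROM dfs.`/path/to/data_tsv` LIMIT 1000;

CREATE TABLE dfs.tmp.`query_result` AS
SELECT * FROM dfs.`/path/to/data_tsv`;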

Regards,
Projjwal



On Thu, Mar 16, 2017 at 12:04 AM, Khurram Faraaz <kfar...@mapr.com> wrote:

> Three million rows is too many rows for sqlline to print.
>
> Try doing a COUNT(*) and see if that query returns the correct count on
> that table.
>
>
> Thanks,
>
> Khurram
>
> ____
> From: PROJJWAL SAHA <proj.s...@gmail.com>
> Sent: Wednesday, March 15, 2017 7:41:00 PM
> To: user@drill.apache.org
> Subject: Display of query result using command line
>
> All,
>
> I am using drillconf from command line to display a query result like
> select * from xxx
> having 3 million rows. The screen display scrolls fast to display the
> result, however, it stops after some time with this exception -
>
> java.lang.NegativeArraySizeException
> at
> org.apache.drill.exec.vector.VarCharVector$Accessor.get(
> VarCharVector.java:440)
> at
> org.apache.drill.exec.vector.accessor.VarCharAccessor.
> getBytes(VarCharAccessor.java:128)
> at
> org.apache.drill.exec.vector.accessor.VarCharAccessor.
> getString(VarCharAccessor.java:149)
> at
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getString(
> BoundCheckingAccessor.java:124)
> at
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getString(
> TypeConvertingSqlAccessor.java:649)
> at
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getString(
> AvaticaDrillSqlAccessor.java:94)
> at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:352)
> at
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(
> DrillResultSetImpl.java:434)
> at sqlline.Rows$Row.<init>(Rows.java:157)
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
> at
> sqlline.TableOutputFormat$ResizingRowsProvider.next(
> TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1593)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:746)
> at sqlline.SqlLine.begin(SqlLine.java:621)
> at sqlline.SqlLine.start(SqlLine.java:375)
> at sqlline.SqlLine.main(SqlLine.java:268)
>
>
> The query shows completed in the profile.
>
> Any reason/suggestions on this ?
>
>
> Regards,
>
> Projjwal
>


Display of query result using command line

2017-03-15 Thread PROJJWAL SAHA
All,

I am using drill-conf from the command line to display a query result like
select * from xxx
having 3 million rows. The screen display scrolls fast to display the
result, however, it stops after some time with this exception -

java.lang.NegativeArraySizeException
at
org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:440)
at
org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:128)
at
org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:149)
at
org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getString(BoundCheckingAccessor.java:124)
at
org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getString(TypeConvertingSqlAccessor.java:649)
at
org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getString(AvaticaDrillSqlAccessor.java:94)
at org.apache.calcite.avatica.AvaticaSite.get(AvaticaSite.java:352)
at
org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:434)
at sqlline.Rows$Row.<init>(Rows.java:157)
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
at
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
at sqlline.SqlLine.print(SqlLine.java:1593)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:746)
at sqlline.SqlLine.begin(SqlLine.java:621)
at sqlline.SqlLine.start(SqlLine.java:375)
at sqlline.SqlLine.main(SqlLine.java:268)


The query shows completed in the profile.

Any reason/suggestions on this ?


Regards,

Projjwal


Query on .gz.parquet files

2017-03-09 Thread PROJJWAL SAHA
All,

One question:
I am querying .gz.parquet files.
select * from xxx returns data like
+-+
| current |
+-+
|
{"vendor_id":"VTS","pickup_datetime":"ACj75+tEAAAvfSUA","payment_type":"CSH","fare_amount":12.0,"mta_tax":0.5,"tip_amount":0.0,"tolls_amount":5.33,"total_amount":18.33,"ratecodeid":1.0,"dropoff_datetime":"AEhTi5NFAAAvfSUA","passenger_count":1,"trip_distance":2.93,"extra":0.5,"pickup_geocode":{"Latitude":40.743677,"Longitude":-73.953802},"dropoff_geocode":{"Latitude":40.740917,"Longitude":-73.989298},"PRIMARY_KEY":"8589934600","pickup_geocode_geo_city":"Long
Island
City","pickup_geocode_geo_country":"US","pickup_geocode_geo_postcode":"11109","pickup_geocode_geo_region":"New
York","pickup_geocode_geo_subregion":"Queens
County","pickup_geocode_geo_regionid":"5128638","pickup_geocode_geo_subregionid":"5133268","dropoff_geocode_geo_city":"New
York
City","dropoff_geocode_geo_country":"US","dropoff_geocode_geo_postcode":"10007","dropoff_geocode_geo_region":"New
York","dropoff_geocode_geo_regionid":"5128638"} |.

It doesn't return in a tabular format with headers at the top.

Also, select count(*) works fine,
whereas select count(vendor_id) doesn't work - it returns 0.

It looks like the header names are not detected.

I have tried adding extractheaders: true for parquet,
and also tried adding the extension gz.parquet - it doesn't work.

I also have defaultInputFormat set to parquet for the workspace.

Any suggestions ?
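
In case it is relevant, the data seems to come back as a single map column
named `current`; a guess at addressing the nested fields directly would be
something like the following (the file path is a placeholder, and I have not
verified this against the actual schema):

SELECT t.`current`.vendor_id   AS vendor_id,
       t.`current`.fare_amount AS fare_amount
FROM dfs.`/path/to/file.gz.parquet` t
LIMIT 10;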

Regards,
Projjwal


Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-07 Thread PROJJWAL SAHA
Hi Kunal,

Good catch! Thanks for this pointer.
I enabled logging at the org.apache.drill level and I found -

2017-03-08 01:39:01,822 [274058fa-79df-9c74-3219-3fcb83a04605:foreman] INFO
 o.a.drill.exec.work.foreman.Foreman - Query text for query id
274058fa-79df-9c74-3219-3fcb83a04605: explain plan for select * from
dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv` where
ORDER_ID='41' and CUSTOMER_ID='568'
*2017-03-08 01:39:01,823 [274058fa-79df-9c74-3219-3fcb83a04605:foreman]
DEBUG o.a.d.e.s.h.HBaseStoragePluginConfig - Initializing HBase
StoragePlugin configuration with zookeeper quorum 'localhost', port '2181'.*
*2017-03-08 01:39:16,038 [274058fa-79df-9c74-3219-3fcb83a04605:foreman]
DEBUG o.a.drill.exec.store.SchemaFactory - Took 14214 ms to register
schemas.*


I am not sure why the HBase storage plugin comes into play, as it is disabled.
I then disabled all the other active plugins that I had and just kept the
dfs plugin.

The planning time is now reduced to 0.9 secs,
and the query time for the 1GB partitioned tsv data is now 3.63 secs.

Is that reasonable behaviour ?

Regards,
Projjwal

On Wed, Mar 8, 2017 at 12:11 AM, Kunal Khatua <kkha...@mapr.com> wrote:

>
> Looking at the 1st two lines of the log shows that the bulk of time was
> lost before the query even went into the real planning stage of the query:
>
>
> 2017-03-07 06:27:28,074 [274166de-f543-3fa7-ef9e-8e9e87d5d6a0:foreman]
> INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id
> 274166de-f543-3fa7-ef9e-8e9e87d5d6a0: select columns[0] from
> dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv` where
> columns[0] ='41' and columns[3] ='568'
> 2017-03-07 06:28:00,775 [274166de-f543-3fa7-ef9e-8e9e87d5d6a0:foreman]
> INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0 ms, numFiles: 1
>
>
> More than 30 secs is unaccounted for. Can you turn on the root logger to
> be at the debug level and retry the explain plan?
>
>
> Kunal Khatua
>
>
> 
> From: rahul challapalli <challapallira...@gmail.com>
> Sent: Tuesday, March 7, 2017 5:24:43 AM
> To: user
> Subject: Re: Minimise query plan time for dfs plugin for local file system
> on tsv file
>
> I did not get a chance to review the log file.
>
> However the next thing I would try is to make your cluster a single node
> cluster first and then run the same explain plan query separately on each
> individual file.
>
>
>
> On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" <proj.s...@gmail.com> wrote:
>
> > Hi Rahul,
> >
> > thanks for your suggestions. However, I am still not able to see any
> > reduction in query planning time
> > by explicit column names, removing extract headers and using
> columns[index]
> >
> > As I said, I ran explain plan and it's taking 30+ secs for me.
> > My data is 1 GB tsv split into 20 files of 5 MB each.
> > Each 5MB file has close to 50k records
> > Its a 5 node cluster, and width per node is 4
> > Therefore, total number of minor fragments for one major fragment is 20
> > I have copied the source directory in all the drillbit nodes
> >
> > Can you tell me a reasonable time estimate within which I can expect Drill to
> > return results for a query in the above-described scenario?
> > Query is - select columns[0] from dfs.root.`/scratch/localdisk/
> drill/testdata/Cust_1G_20_tsv`
> > where columns[0] ='41' and columns[3] ='568'
> >
> > attached is the json profile
> > and the drillbit.log
> >
> > I also have the tracing enabled.
> > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
> > org.apache.drill.exec.work.foreman.Foreman
> > however, I see the duration of various steps is in the order of ms in the
> logs.
> > I am not sure where the planning time of the order of 30 secs is consumed.
> >
> > Please help
> >
> > Regards,
> > Projjwal
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Mar 6, 2017 at 11:23 PM, rahul challapalli <
> > challapallira...@gmail.com> wrote:
> >
> >> You can try the below things. For each of the below check the planning
> >> time
> >> individually
> >>
> >> 1. Run explain plan for a simple "select * from `
> >> /scratch/localdisk/drill/testdata/Cust_1G_tsv`"
> >> 2. Replace the '*' in your query with explicit column names
> >> 3. Remove the extract header from your storage plugin configuration and
> >> from your data files? Rewrite your query to use, columns[0_based_index]
> >> instead of explicit column names
> >>
> >> Also how many columns do you have in your text files and 

Fwd: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread PROJJWAL SAHA
All, please help me with suggestions on what areas I can look into to see why
the query planning time is taking so long for files which are local to the
drill machines. I have the same directory structure copied on all the 5
nodes of the cluster. I am accessing the source files using the out-of-the-box
dfs storage plugin.

Query planning time is approx 30 secs
Query execution time is apprx 1.5 secs

Regards,
Projjwal

-- Forwarded message --
From: PROJJWAL SAHA <proj.s...@gmail.com>
Date: Fri, Mar 3, 2017 at 5:06 PM
Subject: Minimise query plan time for dfs plugin for local file system on
tsv file
To: user@drill.apache.org


Hello all,

I am querying select * from dfs.xxx where yyy (filter condition)

I am using the dfs storage plugin that comes out of the box with Drill, on a 1GB
file local to the drill cluster.
The 1GB file is split into 10 files of 100 MB each.
As expected, I see 11 minor and 2 major fragments.
The drill cluster is a 5 node cluster with 4 cores and 32 GB each.

One observation is that the query plan time is more than 30 seconds. I ran
the explain plan query to validate this.
The query execution time is 2 secs.
The total time taken is 32 secs.

I wanted to understand how I can minimise the query plan time. Suggestions ?
Is the time taken described above expected ?
Attached is result from explain plan query
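
For completeness, the planner options are at the Drill defaults; one way to
double-check them is below (a sketch; column names in sys.options may vary
slightly by version):

SELECT name, num_val, string_val, bool_val
FROM sys.options
WHERE name LIKE 'planner.%';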

Regards,
Projjwal
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(*=[$0])
00-02UnionExchange
01-01  Project(T2¦¦*=[$0])
01-02SelectionVectorRemover
01-03  Filter(condition=[AND(=($1, '41'), =($2, '568'))])
01-04Project(T2¦¦*=[$0], ORDER_ID=[$1], CUSTOMER_ID=[$2])
01-05  Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/scratch/localdisk/drill/testdata/Cust_1G_tsv, numFiles=10, 
columns=[`*`], files=[file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/4.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/5.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/10.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/2.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/3.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/1.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/7.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/6.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/8.tsv, 
file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/9.tsv]]])
 | {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "fs-scan",
"@id" : 65541,
"userName" : "optitest",
"files" : [ "file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/4.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/5.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/10.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/2.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/3.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/1.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/7.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/6.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/8.tsv", 
"file:/scratch/localdisk/drill/testdata/Cust_1G_tsv/9.tsv" ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "config" : null,
  "workspaces" : {
"root" : {
  "location" : "/",
  "writable" : true,
  "defaultInputFormat" : null
},
"tpch9m" : {
  "location" : "/user/hive/warehouse/tpch9m.db",
  "writable" : true,
  "defaultInputFormat" : null
},
"taxi1m" : {
  "location" : "/user/hive/warehouse/taxi.db/taxi_enriched_sukhdeep_1m",
  "writable" : true,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : "/tmp",
  "writable" : true,
  "defaultInputFormat" : null
}
  },
  "formats" : {
"psv" : {
  "type" : "text",
  "extensions" : [ "tbl" ],
  "delimiter" : "|"
},
"csv" : {
  "

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-05 Thread PROJJWAL SAHA
The files are copied to the same location in all the nodes of the cluster.
And all the nodes have equal access to the files.
The files are not located on a single shared file system.

On Fri, Mar 3, 2017 at 7:12 PM, John Omernik <j...@omernik.com> wrote:

> Can you help me understand what "local to the cluster" means in the context
> of a 5 node cluster? In the plan, the files are all file:// Are the files
> replicated to each node? is it a common shared filesystem?  Do all 5 nodes
> have equal access to the 10 files? I wonder if using a local FS in a
> distributed cluster is having some effect on the planning...
>
> On Fri, Mar 3, 2017 at 6:08 AM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
>
> > I did not change the default values used by drill.
> > Are you talking of changing planner.memory_limit
> > and planner.memory.max_query_memory_per_node ?
> > If there is any other debug work that I can do, pls suggest
> >
> > Regards
> >
> > On Fri, Mar 3, 2017 at 5:14 PM, Nitin Pawar <nitinpawar...@gmail.com>
> > wrote:
> >
> > > how much memory have you set for planner ?
> > >
> > > On Fri, Mar 3, 2017 at 5:06 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am querying select * from dfs.xxx where yyy (filter condition)
> > > >
> > > > I am using dfs storage plugin that comes out of the box from drill
> on a
> > > > 1GB file, local to the drill cluster.
> > > > The 1GB file is split into 10 files of 100 MB each.
> > > > As expected, I see 11 minor and 2 major fragments.
> > > > The drill cluster is 5 nodes cluster with 4 cores, 32 GB  each.
> > > >
> > > > One observation is that the query plan time is more than 30 seconds.
> I
> > > ran
> > > > the explain plan query to validate this.
> > > > The query execution time is 2 secs.
> > > > total time taken is 32secs
> > > >
> > > > I wanted to understand how I can minimise the query plan time.
> > > Suggestions
> > > > ?
> > > > Is the time taken described above expected ?
> > > > Attached is result from explain plan query
> > > >
> > > > Regards,
> > > > Projjwal
> > > >
> > > >
> > >
> > >
> > > --
> > > Nitin Pawar
> > >
> >
>


Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-03 Thread PROJJWAL SAHA
I did not change the default values used by Drill.
Are you talking about changing planner.memory_limit
and planner.memory.max_query_memory_per_node ?
If there is any other debug work that I can do, pls suggest.

Regards

On Fri, Mar 3, 2017 at 5:14 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> how much memory have you set for planner ?
>
> On Fri, Mar 3, 2017 at 5:06 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
>
> > Hello all,
> >
> > I am querying select * from dfs.xxx where yyy (filter condition)
> >
> > I am using dfs storage plugin that comes out of the box from drill on a
> > 1GB file, local to the drill cluster.
> > The 1GB file is split into 10 files of 100 MB each.
> > As expected, I see 11 minor and 2 major fragments.
> > The drill cluster is 5 nodes cluster with 4 cores, 32 GB  each.
> >
> > One observation is that the query plan time is more than 30 seconds. I
> ran
> > the explain plan query to validate this.
> > The query execution time is 2 secs.
> > total time taken is 32secs
> >
> > I wanted to understand how I can minimise the query plan time.
> Suggestions
> > ?
> > Is the time taken described above expected ?
> > Attached is result from explain plan query
> >
> > Regards,
> > Projjwal
> >
> >
>
>
> --
> Nitin Pawar
>


Distribution of workload across nodes in a cluster

2017-02-22 Thread PROJJWAL SAHA
Hello,

I am doing a select * query on a csv file of 1 GB with a 5 node drill
cluster. The csv file is stored in another storage cluster within the
enterprise.

In the query profile, I see one major fragment and within the major
fragment, I see only 1 minor fragment. The hostname for the minor fragment
corresponds to one of the nodes of the cluster.

I think, therefore, that not all the resources of the cluster are being utilized.
Are there any configuration parameters that can be tweaked to achieve more
effective workload distribution across cluster machines ?
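
For example, are these the kind of knobs to look at (a sketch with example
values only; I have not confirmed they apply to a single-file csv scan):

ALTER SESSION SET `planner.slice_target` = 10000;       -- lower value => more minor fragments
ALTER SESSION SET `planner.width.max_per_node` = 4;     -- cap on per-node parallelism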

Let me know what you think.

Regards,
Projjwal


Re: Query on performance using Drill and Amazon s3.

2017-02-21 Thread PROJJWAL SAHA
Thanks Nitin for the metrics you provided and the suggestions.

On Tue, Feb 21, 2017 at 2:23 PM, Nitin Pawar <nitinpawar...@gmail.com>
wrote:

> instead of doing select * in the first go,
> can you do query like select count(1)
>
> when your data is in csv files then yes all the data is transferred to the
> drill node and then query is executed on top of it.
> We had noticed the performance on csv was significantly more compared to
> parquet files, so we moved our data to parquet from csv and have not seen
> any issues on then.
>
> we did a test run on 125M records, size was 8 GB in parquet, and it took
> roughly 30 seconds or so.
>
> I would suggest two things
> 1) Which AWS region your S3 bucket is hosted  and which region your ec2
> servers are hosted?
> 2) If answer to above question is two different regions then you might want
> to move them into a single region.
>
> In either case, from AWS console you can figure out how much network
> throughput you are getting if that is the bottleneck
> Also drill machines would need CPU so along with 32GB memory if you have 8
> cores that would be desirable
>
> On Tue, Feb 21, 2017 at 2:17 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > Hi Nitin,
> >
> > I am executing the SQL query on a drillbit node using drill-conf .
> >  We have configured a 5 node drill cluster external to Amazon with 32GB
> > RAM. From one of the nodes, we are using drill-conf utility to fire the
> SQL
> > query.
> >
> > One observation I had is
> > select * from `xxx.tsv`
> > select * from `xxx.tsv` where yyy = 'zzz'
> >
> > Both these queries are taking almost the same time for 1 GB data with
> > 100 rows. So if the network for data transfer is the major time
> taking
> > component compared with the query execution time,  I think that the
> entire
> > data is first transferred to drill cluster and then the query is executed
> > on the drill cluster ?
> >
> > Regards,
> > Projjwal
> >
> > On Mon, Feb 20, 2017 at 6:18 PM, Nitin Pawar <nitinpawar...@gmail.com>
> > wrote:
> >
> > > how are you doing select * .. using drill UI or sqlline?
> > > where are you running it from ?
> > > is the drill hosted in aws or on your local machine?
> > >
> > > I think majority of the time is spent on displaying the result set
> > instead
> > > of querying the file if the drill server is on aws.
> > > If the drill server is local then it might be your network which might
> > take
> > > a lot of time based on s3 bucket location and where your drill server
> is
> > >
> > > On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am using 1GB data in the form of .tsv file, stored in Amazon S3
> using
> > > > Drill 1.8. I am using default configurations of Drill using S3
> storage
> > > > plugin coming out of the box. The drill bits are configured on a 5
> node
> > > > cluster with 32GB RAM and 4VCPU.
> > > >
> > > > I see that select * from xxx; query takes 23 mins to fetch 1,040,000
> > > rows.
> > > >
> > > > Is this the expected behaviour ?
> > > > I am looking for any quick tuning that can improve the performance or
> > any
> > > > other suggestions.
> > > >
> > > > Attaching is the JSON profile for this query.
> > > >
> > > > Regards,
> > > > Projjwal
> > > >
> > >
> > >
> > >
> > > --
> > > Nitin Pawar
> > >
> >
>
>
>
> --
> Nitin Pawar
>


Re: Query on performance using Drill and Amazon s3.

2017-02-21 Thread PROJJWAL SAHA
Hi Nitin,

I am executing the SQL query on a drillbit node using drill-conf.
We have configured a 5 node drill cluster external to Amazon with 32GB
RAM. From one of the nodes, we are using drill-conf utility to fire the SQL
query.

One observation I had is
select * from `xxx.tsv`
select * from `xxx.tsv` where yyy = 'zzz'

Both these queries are taking almost the same time for 1 GB data with
100 rows. So, if the network data transfer is the major time-taking
component compared with the query execution time, I think that the entire
data is first transferred to the drill cluster and then the query is executed
on the drill cluster ?

Regards,
Projjwal

On Mon, Feb 20, 2017 at 6:18 PM, Nitin Pawar <nitinpawar...@gmail.com>
wrote:

> how are you doing select * .. using drill UI or sqlline?
> where are you running it from ?
> is the drill hosted in aws or on your local machine?
>
> I think majority of the time is spent on displaying the result set instead
> of querying the file if the drill server is on aws.
> If the drill server is local then it might be your network which might take
> a lot of time based on s3 bucket location and where your drill server is
>
> On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > Hello all,
> >
> > I am using 1GB data in the form of .tsv file, stored in Amazon S3 using
> > Drill 1.8. I am using default configurations of Drill using S3 storage
> > plugin coming out of the box. The drill bits are configured on a 5 node
> > cluster with 32GB RAM and 4VCPU.
> >
> > I see that select * from xxx; query takes 23 mins to fetch 1,040,000
> rows.
> >
> > Is this the expected behaviour ?
> > I am looking for any quick tuning that can improve the performance or any
> > other suggestions.
> >
> > Attaching is the JSON profile for this query.
> >
> > Regards,
> > Projjwal
> >
>
>
>
> --
> Nitin Pawar
>


Query on performance using Drill and Amazon s3.

2017-02-20 Thread PROJJWAL SAHA
Hello all,

I am using 1GB data in the form of .tsv file, stored in Amazon S3 using
Drill 1.8. I am using default configurations of Drill using S3 storage
plugin coming out of the box. The drill bits are configured on a 5 node
cluster with 32GB RAM and 4VCPU.

I see that select * from xxx; query takes 23 mins to fetch 1,040,000 rows.

Is this the expected behaviour ?
I am looking for any quick tuning that can improve the performance or any
other suggestions.
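
One thing I may also try is converting the tsv to parquet and querying that
instead (a sketch; the table names and workspace are placeholders, and the
target workspace must be writable):

ALTER SESSION SET `store.format` = 'parquet';

CREATE TABLE dfs.tmp.`xxx_parquet` AS
SELECT * FROM `xxx.tsv`;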

Attached is the JSON profile for this query.

Regards,
Projjwal
{
"id": {
"part1": 2834241350655354400,
"part2": -4719640768589854000
},
"type": 1,
"start": 1487585409966,
"end": 1487586748105,
"query": "select * from `xxx`",
"plan": "00-00Screen : rowType = RecordType(ANY *): rowcount = 
1.0704562E7, cumulative cost = {1.17750182E7 rows, 1.17750182E7 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 187\n00-01  Project(*=[$0]) : rowType = 
RecordType(ANY *): rowcount = 1.0704562E7, cumulative cost = {1.0704562E7 rows, 
1.0704562E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 186\n00-02
Scan(groupscan=[EasyGroupScan [selectionRoot=s3a://xxx.tsv, numFiles=1, 
columns=[`*`], files=[s3a://xxx.tsv]]]) : rowType = (DrillRecordRow[*]): 
rowcount = 1.0704562E7, cumulative cost = {1.0704562E7 rows, 1.0704562E7 cpu, 
0.0 io, 0.0 network, 0.0 memory}, id = 185\n",
"foreman": {
"address": "xxx",
"userPort": 31010,
"controlPort": 31011,
"dataPort": 31012
},
"state": 2,
"totalFragments": 1,
"finishedFragments": 0,
"fragmentProfile": [
{
"majorFragmentId": 0,
"minorFragmentProfile": [
{
"state": 3,
"minorFragmentId": 0,
"operatorProfile": [
{
"inputProfile": [
{
"records": 104,
"batches": 129,
"schemas": 1
}
],
"operatorId": 2,
"operatorType": 28,
"setupNanos": 0,
"processNanos": 50858446809,
"peakLocalMemoryAllocated": 15646720,
"waitNanos": 1257947908700
},
{
"inputProfile": [
{
"records": 104,
"batches": 129,
"schemas": 1
}
],
"operatorId": 1,
"operatorType": 10,
"setupNanos": 3929932,
"processNanos": 26307751,
"peakLocalMemoryAllocated": 9142272,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 104,
"batches": 129,
"schemas": 1
}
],
"operatorId": 0,
"operatorType": 13,
"setupNanos": 0,
"processNanos": 38391526,
"peakLocalMemoryAllocated": 9142272,
"metric": [
{
"metricId": 0,
"longValue": 1095420252
}
],
"waitNanos": 19474468
}
],
"startTime": 1487585439164,
"endTime": 1487586748101,
"memoryUsed": 0,
"maxMemoryUsed": 21979712,
"endpoint": {
"address": "xxx",
"userPort": 31010,
"controlPort": 31011,
"dataPort": 31012
},
"lastUpdate": 1487586748102,
"lastProgress": 1487586748102
}
]
}
],
"user": "anonymous"
}