Re: [HANGOUT] Topics for 2017-02-21

2017-02-20 Thread Charles Givre
I was also wondering what is the status of the ElasticSearch plugin?
Thanks,
— C

> On Feb 20, 2017, at 14:49, Shankar Mane  wrote:
> 
> any plans to support Hive 2.x line?
> 
> On 21-Feb-2017 12:40 AM, "Paul Rogers"  wrote:
> 
>> Hi All,
>> 
>> Our bi-weekly hangout is tomorrow (2017-02-21, 10 AM PT). Please respond
>> with suggested topics. We will also ask for additional topics at the
>> beginning of the hangout.
>> 
>> One topic I’d like to suggest: how we can make Drill even more stable than
>> it already is? Suggestions for focus areas? Tests? Particular JIRA tickets?
>> Other ideas?
>> 
>> Thanks,
>> 
>> - Paul



Re: [HANGOUT] Topics for 2017-02-21

2017-02-20 Thread Shankar Mane
any plans to support Hive 2.x line?

On 21-Feb-2017 12:40 AM, "Paul Rogers"  wrote:

> Hi All,
>
> Our bi-weekly hangout is tomorrow (2017-02-21, 10 AM PT). Please respond
> with suggested topics. We will also ask for additional topics at the
> beginning of the hangout.
>
> One topic I’d like to suggest: how we can make Drill even more stable than
> it already is? Suggestions for focus areas? Tests? Particular JIRA tickets?
> Other ideas?
>
> Thanks,
>
> - Paul


RE: Query on performance using Drill and Amazon s3.

2017-02-20 Thread Chetan Kothari
My question is generic.

What I am asking is: does Drill execute the query on the target data store and
fetch only the result, or does it fetch the data and then execute the query?

 

Regards

Chetan

 

-Original Message-
From: Nitin Pawar [mailto:nitinpawar...@gmail.com] 
Sent: Monday, February 20, 2017 8:14 PM
To: user@drill.apache.org
Subject: RE: Query on performance using Drill and Amazon s3.

 

Hi chetan,

 

Projjwal has the issue. Me too asked the same question

 

On Feb 20, 2017 7:56 PM, "Chetan Kothari" <chetan.koth...@oracle.com> wrote:

> Hi Nitin
>
> Where does the query execute?
>
> Does Drill execute query on AWS and fetch results to be displayed?
>
> Regards
>
> Chetan
>
> -Original Message-
> From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
> Sent: Monday, February 20, 2017 6:19 PM
> To: user@drill.apache.org
> Subject: Re: Query on performance using Drill and Amazon s3.
>
> how are you doing select * .. using drill UI or sqlline?
>
> where are you running it from ?
>
> is the drill hosted in aws or on your local machine?
>
> I think majority of the time is spent on displaying the result set
> instead of querying the file if the drill server is on aws.
>
> If the drill server is local then it might be your network which might
> take a lot of time based on s3 bucket location and where your drill
> server is
>
> On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
>
> > Hello all,
> >
> > I am using 1GB data in the form of .tsv file, stored in Amazon S3
> > using Drill 1.8. I am using default configurations of Drill using S3
> > storage plugin coming out of the box. The drill bits are configured on
> > a 5 node cluster with 32GB RAM and 4VCPU.
> >
> > I see that select * from xxx; query takes 23 mins to fetch 1,040,000 rows.
> >
> > Is this the expected behaviour ?
> > I am looking for any quick tuning that can improve the performance or
> > any other suggestions.
> >
> > Attaching is the JSON profile for this query.
> >
> > Regards,
> > Projjwal
>
> --
>
> Nitin Pawar


RE: Query on performance using Drill and Amazon s3.

2017-02-20 Thread Shankar Mane
1. How much memory have you configured for Drill?
2. What is the network bandwidth between S3 and your cluster?

On 20-Feb-2017 8:14 PM, "Nitin Pawar"  wrote:

> Hi chetan,
>
> Projjwal has the issue. Me too asked the same question
>
> On Feb 20, 2017 7:56 PM, "Chetan Kothari" 
> wrote:
>
> > Hi Nitin
> >
> >
> >
> > Where does the query execute?
> >
> > Does Drill execute query on AWS and fetch results to be displayed?
> >
> >
> >
> > Regards
> >
> > Chetan
> >
> >
> >
> > -Original Message-
> > From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
> > Sent: Monday, February 20, 2017 6:19 PM
> > To: user@drill.apache.org
> > Subject: Re: Query on performance using Drill and Amazon s3.
> >
> >
> >
> > how are you doing select * .. using drill UI or sqlline?
> >
> > where are you running it from ?
> >
> > is the drill hosted in aws or on your local machine?
> >
> >
> >
> > I think majority of the time is spent on displaying the result set instead
> > of querying the file if the drill server is on aws.
> >
> > If the drill server is local then it might be your network which might
> > take a lot of time based on s3 bucket location and where your drill server
> > is
> >
> > On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > I am using 1GB data in the form of .tsv file, stored in Amazon S3
> > > using Drill 1.8. I am using default configurations of Drill using S3
> > > storage plugin coming out of the box. The drill bits are configured on
> > > a 5 node cluster with 32GB RAM and 4VCPU.
> > >
> > > I see that select * from xxx; query takes 23 mins to fetch 1,040,000 rows.
> > >
> > > Is this the expected behaviour ?
> > > I am looking for any quick tuning that can improve the performance or
> > > any other suggestions.
> > >
> > > Attaching is the JSON profile for this query.
> > >
> > > Regards,
> > > Projjwal
> >
> > --
> >
> > Nitin Pawar
> >
>


RE: Query on performance using Drill and Amazon s3.

2017-02-20 Thread Nitin Pawar
Hi chetan,

Projjwal has the issue. Me too asked the same question

On Feb 20, 2017 7:56 PM, "Chetan Kothari"  wrote:

> Hi Nitin
>
>
>
> Where does the query execute?
>
> Does Drill execute query on AWS and fetch results to be displayed?
>
>
>
> Regards
>
> Chetan
>
>
>
> -Original Message-
> From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
> Sent: Monday, February 20, 2017 6:19 PM
> To: user@drill.apache.org
> Subject: Re: Query on performance using Drill and Amazon s3.
>
>
>
> how are you doing select * .. using drill UI or sqlline?
>
> where are you running it from ?
>
> is the drill hosted in aws or on your local machine?
>
>
>
> I think majority of the time is spent on displaying the result set instead
> of querying the file if the drill server is on aws.
>
> If the drill server is local then it might be your network which might
> take a lot of time based on s3 bucket location and where your drill server
> is
>
>
>
> On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:
>
> > Hello all,
> >
> > I am using 1GB data in the form of .tsv file, stored in Amazon S3
> > using Drill 1.8. I am using default configurations of Drill using S3
> > storage plugin coming out of the box. The drill bits are configured on
> > a 5 node cluster with 32GB RAM and 4VCPU.
> >
> > I see that select * from xxx; query takes 23 mins to fetch 1,040,000 rows.
> >
> > Is this the expected behaviour ?
> > I am looking for any quick tuning that can improve the performance or
> > any other suggestions.
> >
> > Attaching is the JSON profile for this query.
> >
> > Regards,
> > Projjwal
>
> --
>
> Nitin Pawar
>


RE: Query on performance using Drill and Amazon s3.

2017-02-20 Thread Chetan Kothari
Hi Nitin

 

Where does the query execute?

Does Drill execute query on AWS and fetch results to be displayed?

 

Regards

Chetan

 

-Original Message-
From: Nitin Pawar [mailto:nitinpawar...@gmail.com] 
Sent: Monday, February 20, 2017 6:19 PM
To: user@drill.apache.org
Subject: Re: Query on performance using Drill and Amazon s3.

 

how are you doing select * .. using drill UI or sqlline?

where are you running it from ?

is the drill hosted in aws or on your local machine?

 

I think majority of the time is spent on displaying the result set instead of 
querying the file if the drill server is on aws.

If the drill server is local then it might be your network which might take a 
lot of time based on s3 bucket location and where your drill server is

 

On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote:

> Hello all,
>
> I am using 1GB data in the form of .tsv file, stored in Amazon S3
> using Drill 1.8. I am using default configurations of Drill using S3
> storage plugin coming out of the box. The drill bits are configured on
> a 5 node cluster with 32GB RAM and 4VCPU.
>
> I see that select * from xxx; query takes 23 mins to fetch 1,040,000 rows.
>
> Is this the expected behaviour ?
> I am looking for any quick tuning that can improve the performance or
> any other suggestions.
>
> Attaching is the JSON profile for this query.
>
> Regards,
> Projjwal

--

Nitin Pawar


Google Interactive Charts

2017-02-20 Thread Sanjiv Kumar C
-- 
Thanks & Regards
  * Sanjiv Kumar*


Re: Query on performance using Drill and Amazon s3.

2017-02-20 Thread Nitin Pawar
How are you running the select * query: from the Drill web UI or from sqlline?
Where are you running it from?
Is Drill hosted in AWS or on your local machine?

I think the majority of the time is spent displaying the result set rather
than querying the file, if the Drill server is in AWS.
If the Drill server is local, then the network may be what takes the time,
depending on the S3 bucket location and where your Drill server is.

On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA  wrote:

> Hello all,
>
> I am using 1GB data in the form of .tsv file, stored in Amazon S3 using
> Drill 1.8. I am using default configurations of Drill using S3 storage
> plugin coming out of the box. The drill bits are configured on a 5 node
> cluster with 32GB RAM and 4VCPU.
>
> I see that select * from xxx; query takes 23 mins to fetch 1,040,000 rows.
>
> Is this the expected behaviour ?
> I am looking for any quick tuning that can improve the performance or any
> other suggestions.
>
> Attaching is the JSON profile for this query.
>
> Regards,
> Projjwal
>



-- 
Nitin Pawar


Query on performance using Drill and Amazon s3.

2017-02-20 Thread PROJJWAL SAHA
Hello all,

I am querying 1 GB of data in a .tsv file stored in Amazon S3, using
Drill 1.8. I am using Drill's default configuration with the out-of-the-box
S3 storage plugin. The drillbits are configured on a 5-node cluster with
32 GB RAM and 4 vCPUs.

A select * from xxx; query takes 23 minutes to fetch 1,040,000 rows.

Is this the expected behaviour?
I am looking for any quick tuning that can improve performance, or any
other suggestions.

Attached is the JSON profile for this query.

Regards,
Projjwal
{
  "id": {
    "part1": 2834241350655354400,
    "part2": -4719640768589854000
  },
  "type": 1,
  "start": 1487585409966,
  "end": 1487586748105,
  "query": "select * from `xxx`",
  "plan": "00-00Screen : rowType = RecordType(ANY *): rowcount = 
1.0704562E7, cumulative cost = {1.17750182E7 rows, 1.17750182E7 cpu, 0.0 io, 
0.0 network, 0.0 memory}, id = 187\n00-01  Project(*=[$0]) : rowType = 
RecordType(ANY *): rowcount = 1.0704562E7, cumulative cost = {1.0704562E7 rows, 
1.0704562E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 186\n00-02
Scan(groupscan=[EasyGroupScan [selectionRoot=s3a://xxx.tsv, numFiles=1, 
columns=[`*`], files=[s3a://xxx.tsv]]]) : rowType = (DrillRecordRow[*]): 
rowcount = 1.0704562E7, cumulative cost = {1.0704562E7 rows, 1.0704562E7 cpu, 
0.0 io, 0.0 network, 0.0 memory}, id = 185\n",
  "foreman": {
    "address": "xxx",
    "userPort": 31010,
    "controlPort": 31011,
    "dataPort": 31012
  },
  "state": 2,
  "totalFragments": 1,
  "finishedFragments": 0,
  "fragmentProfile": [
    {
      "majorFragmentId": 0,
      "minorFragmentProfile": [
        {
          "state": 3,
          "minorFragmentId": 0,
          "operatorProfile": [
            {
              "inputProfile": [
                {
                  "records": 104,
                  "batches": 129,
                  "schemas": 1
                }
              ],
              "operatorId": 2,
              "operatorType": 28,
              "setupNanos": 0,
              "processNanos": 50858446809,
              "peakLocalMemoryAllocated": 15646720,
              "waitNanos": 1257947908700
            },
            {
              "inputProfile": [
                {
                  "records": 104,
                  "batches": 129,
                  "schemas": 1
                }
              ],
              "operatorId": 1,
              "operatorType": 10,
              "setupNanos": 3929932,
              "processNanos": 26307751,
              "peakLocalMemoryAllocated": 9142272,
              "waitNanos": 0
            },
            {
              "inputProfile": [
                {
                  "records": 104,
                  "batches": 129,
                  "schemas": 1
                }
              ],
              "operatorId": 0,
              "operatorType": 13,
              "setupNanos": 0,
              "processNanos": 38391526,
              "peakLocalMemoryAllocated": 9142272,
              "metric": [
                {
                  "metricId": 0,
                  "longValue": 1095420252
                }
              ],
              "waitNanos": 19474468
            }
          ],
          "startTime": 1487585439164,
          "endTime": 1487586748101,
          "memoryUsed": 0,
          "maxMemoryUsed": 21979712,
          "endpoint": {
            "address": "xxx",
            "userPort": 31010,
            "controlPort": 31011,
            "dataPort": 31012
          },
          "lastUpdate": 1487586748102,
          "lastProgress": 1487586748102
        }
      ]
    }
  ],
  "user": "anonymous"
}
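
One rough way to read the profile above is to compare each operator's wait time with the fragment's wall-clock time. The sketch below copies the timings from the profile and does that arithmetic. The operator names are taken from the plan string (00-02 Scan, 00-01 Project, 00-00 Screen). The summarize helper and the busy_s / wait_s / wait_share field names are made up for illustration. It also assumes that waitNanos is time an operator spent blocked rather than computing, which for a scan would mostly mean waiting on reads from storage.

```python
# Timings copied from the JSON profile above; operator times are in nanoseconds.
# Operator names come from the "plan" string: 00-02 Scan, 00-01 Project, 00-00 Screen.
OPERATORS = [
    {"id": 2, "name": "Scan", "setup": 0, "process": 50858446809, "wait": 1257947908700},
    {"id": 1, "name": "Project", "setup": 3929932, "process": 26307751, "wait": 0},
    {"id": 0, "name": "Screen", "setup": 0, "process": 38391526, "wait": 19474468},
]

# startTime / endTime of the single minor fragment, in epoch milliseconds.
FRAGMENT_START_MS = 1487585439164
FRAGMENT_END_MS = 1487586748101

NANOS_PER_SECOND = 1_000_000_000


def summarize(operators, start_ms, end_ms):
    """Return (fragment wall time in seconds, per-operator summary rows).

    Hypothetical helper for illustration only. Assumes waitNanos is time
    the operator spent blocked, not computing.
    """
    wall_s = (end_ms - start_ms) / 1000.0
    rows = []
    for op in operators:
        busy_s = (op["setup"] + op["process"]) / NANOS_PER_SECOND
        wait_s = op["wait"] / NANOS_PER_SECOND
        rows.append({
            "name": op["name"],
            "busy_s": busy_s,
            "wait_s": wait_s,
            "wait_share": wait_s / wall_s,  # fraction of fragment wall time
        })
    return wall_s, rows


wall_seconds, summary = summarize(OPERATORS, FRAGMENT_START_MS, FRAGMENT_END_MS)
slowest = max(summary, key=lambda row: row["wait_s"])

if __name__ == "__main__":
    print(f"fragment wall time: {wall_seconds:.0f} s")
    for row in summary:
        print(f"{row['name']:8s} busy {row['busy_s']:9.3f} s  "
              f"wait {row['wait_s']:9.3f} s  ({row['wait_share']:.1%} of wall time)")
    print(f"largest wait: {slowest['name']}")
```

With these numbers the fragment runs for about 1309 seconds and the Scan operator is waiting for roughly 96% of that, while Project and Screen together take well under a second. If the assumption about waitNanos holds, that would suggest the time goes into reading the file from S3 rather than into processing or returning rows.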