Re: Version upgrade for Presto Integration to 0.186

2017-11-03 Thread Liang Chen
+1

Can you raise a PR for this?

Regards
Liang


bhavya411 wrote
> Hi All,
> 
> Presto 0.186 has a lot of improvements that will increase performance
> and improve reliability. Some of the major issues and improvements are
> listed below.
> 
> - Fix excessive GC overhead caused by map-to-map casts.
> - Fix an issue that may cause queries containing expensive functions,
>   such as regular expressions, to continue using CPU resources even
>   after they are killed.
> - Fix a performance issue caused by redundant casts.
> - Fix a leak in the running-query counter for failed queries. The
>   counter would increment but never decrement for queries that failed
>   before starting.
> - Reduce memory usage when building data of VARCHAR or VARBINARY types.
> - Estimate memory usage for GROUP BY more precisely to avoid
>   out-of-memory errors.
> - Add spill-to-disk support for joins.
> 
> Currently the Presto version we are using in CarbonData is 0.166; I
> would like to suggest upgrading it to 0.186. Please let me know what
> the group thinks about it.
> 
> 
> Regards
> 
> Bhavya





--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-11-03 Thread Naresh P R
Hi,

I was going through the design document and need some clarifications.

1) Can we support creating agg tables while creating the main table?

2) I need some help understanding "Support rollup table for timeseries
data":

i) Do we have any provision to specify the timezone of the timestamp column?
ii) Assuming the granularity goes down to the hour level, the 4 agg tables
below will be created:

*agg table 1:* order_time, year, month, day, hour, country, sex,
sum(quantity), max(quantity), count(user_id), sum(price), avg(price)
group by order_time, year, month, day, hour, country, sex
-- here year, month, day & hour are derived from the order_time timestamp column

*agg table 2:* order_time, year, month, day, country, sex,
sum(quantity), max(quantity), count(user_id), sum(price), avg(price)
group by order_time, year, month, day, country, sex

*agg table 3:* order_time, year, month, country, sex,
sum(quantity), max(quantity), count(user_id), sum(price), avg(price)
group by order_time, year, month, country, sex

*agg table 4:* order_time, year, country, sex,
sum(quantity), max(quantity), count(user_id), sum(price), avg(price)
group by order_time, year, country, sex

Please correct me if my understanding is wrong & provide more insight if
possible.

3) All operations like load, segment LCM and IUD on agg tables should be
restricted except select & compaction, right?

4) For new loads into the parent table, we create new segments in the agg
table. Can we have a segment-to-segment mapping between the parent and agg
tables, so that operations like delete segments, update and delete are not
restricted on the parent table? Or can we have a table-level flag that
decides whether to restrict delete segments, update and delete operations
on the parent table when agg tables exist?

5) I assume filters will also be pushed down to agg tables if all filter
conditions match agg table columns, e.g.:

CREATE TABLE agg_sales TBLPROPERTIES (parent_table="xx") AS SELECT c1,
c2, count(distinct c3) FROM source GROUP BY c1, c2

SELECT c1, c2, count(distinct c3) FROM source where c1 = 'a' and c2 = 'x'
GROUP BY c1, c2 -- this will use the agg table

SELECT c1, c2, count(distinct c3) FROM source where c1 = 'a' and c2 = 'x'
and c3 = 'y' GROUP BY c1, c2 -- this will not use the agg table

Will the query below use the agg table?

SELECT c1, c2, count(distinct c3) as x FROM source GROUP BY c1, c2 having
x > 10

Will the having clause be converted to a filter on the count(distinct c3)
column of the agg table, or will Spark handle the having clause?

6) Instead of the user identifying & creating agg tables, can we try to use
any ML or statistics to understand queries/tables and suggest aggregates,
e.g., based on column stats?

Regards,
Naresh P R

On Fri, Oct 13, 2017 at 8:03 PM, Jacky Li  wrote:

> Hi community,
>
> In traditional data warehouses, a pre-aggregate table or cube is a common
> technique to improve OLAP query performance. To take carbondata's OLAP
> support to the next level, I'd like to propose pre-aggregate table support
> in carbondata.
>
> Please refer to CARBONDATA-1516 and the design document attached to the
> JIRA ticket (https://issues.apache.org/jira/browse/CARBONDATA-1516).
>
> This design is still in the initial phase; the proposed usage and SQL
> syntax are subject to change. Please provide your comments to improve this
> feature. Any suggestion on the design from the community is welcome.
>
> Regards,
> Jacky Li


Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-03 Thread Naresh P R
Hi yixu2001,

From the Hadoop code, I could see that IOException("Delegation Token can be
issued only with kerberos or web authentication") is thrown only if the
authentication method is set to "SIMPLE".

// From the HDFS NameNode code (simplified): a delegation token is handed out
// only when security is disabled or the connection was authenticated via
// Kerberos, Kerberos-SSL, or a certificate.
private boolean isAllowedDelegationTokenOp() throws IOException {
    AuthenticationMethod authMethod = this.getConnectionAuthenticationMethod();
    return !UserGroupInformation.isSecurityEnabled()
        || authMethod == AuthenticationMethod.KERBEROS
        || authMethod == AuthenticationMethod.KERBEROS_SSL
        || authMethod == AuthenticationMethod.CERTIFICATE;
}

Token<DelegationTokenIdentifier> getDelegationToken(Text renewer)
        throws IOException {
    if (!this.isAllowedDelegationTokenOp()) {
        throw new IOException("Delegation Token can be issued only "
            + "with kerberos or web authentication");
    }
    // ...
}

Can you try executing the queries after copying core-site.xml from the
hadoop-conf folder to the spark2 conf folder and to the spark-submit classpath?

From the provided logs, I could see that
carbondata_2.11-1.1.1-bdd-hadoop2.7.2.jar is 9607344 bytes; please
make sure it contains only CarbonData classes.

I could see Carbon explicitly calling
"TokenCache.obtainTokensForNamenodes", which
is throwing this exception.
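
For reference, a minimal sketch of the kind of call involved (the class name,
input path and job setup below are hypothetical, not CarbonData's actual code);
the NameNode rejects the request when the client-side configuration resolves to
simple authentication against a kerberized HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.security.TokenCache;

public class TokenFetchSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical input path; for CarbonData this would be the table store location.
        Path[] inputPaths = { new Path("hdfs://nameservice1/user/hive/warehouse/t1") };

        // The Configuration must carry hadoop.security.authentication=kerberos,
        // normally picked up from core-site.xml on the classpath. If it resolves
        // to "simple" while HDFS is kerberized, the NameNode throws
        // "Delegation Token can be issued only with kerberos or web authentication".
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Asks the NameNode(s) backing the paths for HDFS delegation tokens and
        // stores them in the job credentials for the containers to use.
        TokenCache.obtainTokensForNamenodes(job.getCredentials(), inputPaths, conf);
    }
}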

If the above-mentioned steps didn't work, you can raise a JIRA to
investigate this further.

Regards,
Naresh P R

On Fri, Nov 3, 2017 at 3:10 PM, yixu2001  wrote:

> prnaresh.naresh, dev
>
> The carbon jar I used does not include hadoop classes & core-site.xml.
> The attachment includes the jar list used while submitting the
> spark job; please confirm it.
> --
> yixu2001
>
>
> *From:* Naresh P R 
> *Date:* 2017-11-03 16:07
> *To:* yixu2001 
> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
> web authentication" will occur in yarn cluster
> Hi yixu2001,
>
> Are you using a carbon shaded jar with hadoop classes & core-site.xml
> included in the carbon jar?
>
> If so, can you try to use the individual carbondata component jars while
> submitting the spark job?
>
> As per my understanding, this happens if the client core-site.xml has
> hadoop.security.authentication=simple & hdfs is kerberized.
>
> You can also enable verbose mode to see the hadoop jars used in the error
> trace while querying carbon tables.
>
> Also, I am not sure whether CarbonData has been tested on an HDP kerberos
> cluster.
> ---
> Regards,
> Naresh P R
>
>
> On Fri, Nov 3, 2017 at 8:36 AM, yixu2001  wrote:
>
>> Naresh P R:
>>  As attachments cannot be uploaded to the mailing list, I have
>> added the attachments to this mail for you; please check them.
>>  Our platform is installed with HDP 2.4, but spark 2.1 is
>> not included in HDP 2.4; we are using a separately installed
>> Apache release of spark 2.1.
>> --
>> yixu2001
>>
>>
>> *From:* Naresh P R 
>> *Date:* 2017-11-02 22:02
>> *To:* dev 
>> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
>> web authentication" will occur in yarn cluster
>> Hi yixu,
>>
>> I am not able to see any attachment in your previous mail.
>> ---
>> Regards,
>> Naresh P R
>>
>> On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:
>>
>>> dev
>>>  Please refer to the attachment "cluster carbon error2.txt"
>>> for the log trace.
>>> In this log, I try 2 query statements:
>>> select * from e_carbon.prod_inst_his   -- prod_inst_his is
>>> a hive table; it succeeds.
>>> select * from e_carbon.prod_inst_his_c -- prod_inst_his_c is
>>> a carbon table; it fails.
>>>
>>> I pass the principal in my start script; please refer to the
>>> attachment "testCluster.sh".
>>>
>>> I have set hive.server2.enable.doAs = false in the above test
>>> and I have printed it in the log.
>>> --
>>> yixu2001
>>>
>>>
>>> *From:* Naresh P R 
>>> *Date:* 2017-11-01 19:40
>>> *To:* dev 
>>> *Subject:* Re: Delegation Token can be issued only with kerberos or web
>>> authentication" will occur in yarn cluster
>>> Hi,
>>>
>>> Ideally kerberos authentication should work with carbon tables. Can you
>>> share the log trace with us so we can analyze further?
>>>
>>> How are you passing the principal in yarn cluster mode?
>>>
>>> Can you try setting hive.server2.enable.doAs = false and running the
>>> query on the carbon table?
>>> 
>>> Regards,
>>> Naresh P R
>>>
>>> On Wed, Nov 1, 2017 at 3:33 PM, yixu2001  wrote:
>>>
>>> > dev
>>> >  I submit a spark application in yarn cluster mode to a cluster with
>>> > kerberos. In this application, it successfully queries a hive table,
>>> > but when it tries to query a carbon table, it fails with the message
>>> > "Delegation Token can be issued only with kerberos or web
>>> > authentication".
>>> >
>>> > If I submit this application in yarn client mode, both the hive table
>>> > and the carbon table queries succeed.
>>> >
>>> > And if I submit this application in yarn cluster mode on another
>>> > cluster without kerberos, both the hive table and the carbon table
>>> > queries succeed.
>>> >
>>> >
>>> > yixu2001
>>> >
>>>
>>>
>>
>


Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-03 Thread yixu2001
prnaresh.naresh, dev

 The carbon jar I used does not include hadoop classes & core-site.xml.
The attachment includes the jar list used while submitting the spark job;
please confirm it.


yixu2001
 
From: Naresh P R
Date: 2017-11-03 16:07
To: yixu2001
Subject: Re: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi yixu2001,

Are you using a carbon shaded jar with hadoop classes & core-site.xml included
in the carbon jar?

If so, can you try to use the individual carbondata component jars while
submitting the spark job?

As per my understanding, this happens if the client core-site.xml has
hadoop.security.authentication=simple & hdfs is kerberized.

You can also enable verbose mode to see the hadoop jars used in the error
trace while querying carbon tables.

Also, I am not sure whether CarbonData has been tested on an HDP kerberos cluster.
---
Regards,
Naresh P R


On Fri, Nov 3, 2017 at 8:36 AM, yixu2001  wrote:
Naresh P R:
 As attachments cannot be uploaded to the mailing list, I have added the
attachments to this mail for you; please check them.
 Our platform is installed with HDP 2.4, but spark 2.1 is not included in HDP
2.4; we are using a separately installed Apache release of spark 2.1.


yixu2001
 
From: Naresh P R
Date: 2017-11-02 22:02
To: dev
Subject: Re: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi yixu,

I am not able to see any attachment in your previous mail.
---
Regards,
Naresh P R

On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:
dev
 Please refer to the attachment "cluster carbon error2.txt" for the log trace.
In this log, I try 2 query statements:
select * from e_carbon.prod_inst_his   -- prod_inst_his is a hive table;
it succeeds.
select * from e_carbon.prod_inst_his_c -- prod_inst_his_c is a carbon table;
it fails.

I pass the principal in my start script; please refer to the attachment
"testCluster.sh".

I have set hive.server2.enable.doAs = false in the above test and I have
printed it in the log.


yixu2001
 
From: Naresh P R
Date: 2017-11-01 19:40
To: dev
Subject: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi,
 
Ideally kerberos authentication should work with carbon tables. Can you
share the log trace with us so we can analyze further?

How are you passing the principal in yarn cluster mode?

Can you try setting hive.server2.enable.doAs = false and running the query
on the carbon table?

Regards,
Naresh P R
 
On Wed, Nov 1, 2017 at 3:33 PM, yixu2001  wrote:
 
> dev
>  I submit a spark application in yarn cluster mode to a cluster with
> kerberos. In this application, it successfully queries a hive table, but
> when it tries to query a carbon table, it fails with the message "Delegation
> Token can be issued only with kerberos or web authentication".
>
> If I submit this application in yarn client mode, both the hive table and
> the carbon table queries succeed.
>
> And if I submit this application in yarn cluster mode on another cluster
> without kerberos, both the hive table and the carbon table queries succeed.
>
>
> yixu2001
>




Re: Version upgrade for Presto Integration to 0.186

2017-11-03 Thread Sandeep Purohit
+1. The SPI is backward incompatible, so if you use the SPI for the Presto
integration, make sure you change the following things (a rough sketch
follows the list):


   - Remove owner from ConnectorTableMetadata.
   - Replace the generic getServices() method in Plugin with specific
   methods such as getConnectorFactories(), getTypes(), etc. Dependencies
   like TypeManager are now provided directly rather than being injected
   into Plugin.
   - Add first-class support for functions in the SPI. This replaces the
   old FunctionFactory interface. Plugins can return a list of classes from
   the getFunctions() method:
  - Scalar functions are methods or classes annotated with
  @ScalarFunction.
  - Aggregation functions are methods or classes annotated with
  @AggregationFunction.
  - Window functions are an implementation of WindowFunction. Most
  implementations should be a subclass of RankingWindowFunction or
  ValueWindowFunction.
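
A minimal sketch of the new-style registration, assuming the 0.18x SPI
package layout (the plugin and function names below are made up for
illustration, not the actual CarbonData connector code):

import com.facebook.presto.spi.Plugin;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import com.facebook.presto.spi.type.StandardTypes;
import com.google.common.collect.ImmutableSet;
import io.airlift.slice.Slice;

import java.util.Set;

public class ExamplePlugin implements Plugin
{
    // Replaces the old generic getServices(): functions (and connectors, via
    // getConnectorFactories()) are now exposed through dedicated methods.
    @Override
    public Set<Class<?>> getFunctions()
    {
        return ImmutableSet.<Class<?>>builder()
                .add(ExampleFunctions.class)
                .build();
    }
}

// Scalar functions are static methods annotated with @ScalarFunction; this
// replaces the old FunctionFactory interface.
final class ExampleFunctions
{
    private ExampleFunctions() {}

    @ScalarFunction("str_length_example")
    @SqlType(StandardTypes.BIGINT)
    public static long strLengthExample(@SqlType(StandardTypes.VARCHAR) Slice input)
    {
        // Returns the byte length of the UTF-8 slice backing the VARCHAR value.
        return input.length();
    }
}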

-Sandeep


On Thu, Nov 2, 2017 at 6:53 PM, Raghunandan S <
carbondatacontributi...@gmail.com> wrote:

> Any backward incompatibilities introduced?
> +1 for the upgrade
> On Thu, 2 Nov 2017 at 12:18 PM, Bhavya Aggarwal 
> wrote:
>
> > Hi All,
> >
> > Presto 0.186 has a lot of improvements that will increase performance
> > and improve reliability. Some of the major issues and improvements are
> > listed below.
> >
> > - Fix excessive GC overhead caused by map-to-map casts.
> > - Fix an issue that may cause queries containing expensive functions,
> >   such as regular expressions, to continue using CPU resources even
> >   after they are killed.
> > - Fix a performance issue caused by redundant casts.
> > - Fix a leak in the running-query counter for failed queries. The
> >   counter would increment but never decrement for queries that failed
> >   before starting.
> > - Reduce memory usage when building data of VARCHAR or VARBINARY types.
> > - Estimate memory usage for GROUP BY more precisely to avoid
> >   out-of-memory errors.
> > - Add spill-to-disk support for joins.
> >
> > Currently the Presto version we are using in CarbonData is 0.166; I
> > would like to suggest upgrading it to 0.186. Please let me know what
> > the group thinks about it.
> >
> >
> > Regards
> >
> > Bhavya
> >
>


Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-03 Thread yixu2001
Naresh P R:
 As attachments cannot be uploaded to the mailing list, I have added the
attachments to this mail for you; please check them.
 Our platform is installed with HDP 2.4, but spark 2.1 is not included in HDP
2.4; we are using a separately installed Apache release of spark 2.1.


yixu2001
 
From: Naresh P R
Date: 2017-11-02 22:02
To: dev
Subject: Re: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi yixu,

I am not able to see any attachment in your previous mail.
---
Regards,
Naresh P R

On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:
dev
 Please refer to the attachment "cluster carbon error2.txt" for the log trace.
In this log, I try 2 query statements:
select * from e_carbon.prod_inst_his   -- prod_inst_his is a hive table;
it succeeds.
select * from e_carbon.prod_inst_his_c -- prod_inst_his_c is a carbon table;
it fails.

I pass the principal in my start script; please refer to the attachment
"testCluster.sh".

I have set hive.server2.enable.doAs = false in the above test and I have
printed it in the log.


yixu2001
 
From: Naresh P R
Date: 2017-11-01 19:40
To: dev
Subject: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi,
 
Ideally kerberos authentication should work with carbon tables. Can you
share the log trace with us so we can analyze further?

How are you passing the principal in yarn cluster mode?

Can you try setting hive.server2.enable.doAs = false and running the query
on the carbon table?

Regards,
Naresh P R
 
On Wed, Nov 1, 2017 at 3:33 PM, yixu2001  wrote:
 
> dev
>  I submit a spark application in yarn cluster mode to a cluster with
> kerberos. In this application, it successfully queries a hive table, but
> when it tries to query a carbon table, it fails with the message "Delegation
> Token can be issued only with kerberos or web authentication".
>
> If I submit this application in yarn client mode, both the hive table and
> the carbon table queries succeed.
>
> And if I submit this application in yarn cluster mode on another cluster
> without kerberos, both the hive table and the carbon table queries succeed.
>
>
> yixu2001
>