[jira] [Created] (CARBONDATA-423) Added Example to Load Data to carbon Table using case class

2016-11-17 Thread Sangeeta Gulia (JIRA)
Sangeeta Gulia created CARBONDATA-423:
-

 Summary: Added Example to Load Data to carbon Table using case 
class
 Key: CARBONDATA-423
 URL: https://issues.apache.org/jira/browse/CARBONDATA-423
 Project: CarbonData
  Issue Type: Improvement
Reporter: Sangeeta Gulia
Priority: Trivial
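
A minimal sketch of what such an example might look like, assuming Spark with the
CarbonData datasource available; the case class, table name, and write options here
are illustrative assumptions, not the actual example added by this JIRA:

    import org.apache.spark.sql.{SQLContext, SaveMode}

    object CaseClassLoadExample {
      // Illustrative only: the fields and table name are assumptions.
      case class People(id: Int, name: String, occupation: String)

      def loadWithCaseClass(sqlContext: SQLContext): Unit = {
        import sqlContext.implicits._

        // Build a DataFrame from case-class instances ...
        val df = sqlContext.sparkContext
          .parallelize(Seq(People(1, "sangeeta", "engineer"), People(2, "pallavi", "consultant")))
          .toDF()

        // ... and write it out as a Carbon table.
        df.write
          .format("carbondata")                // assumes the CarbonData Spark datasource is on the classpath
          .option("tableName", "people_table") // hypothetical table name
          .mode(SaveMode.Overwrite)
          .save()
      }
    }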








Re: Please vote and advise on building thrift files

2016-11-17 Thread sujith chacko
+1 for first approach.

On Nov 17, 2016 9:58 AM, "金铸"  wrote:

> +1 for proposal 1
>
>
> On 2016/11/17 12:13, 邢冰 wrote:
>
>> +1 for proposal 1
>>
>> thx
>>
>>
>>
>>
>> Sent from NetEase Mail Master
>> On 11/17/2016 12:09, Ravindra Pesala wrote:
>> +1 for proposal 1
>>
>> On 17 November 2016 at 08:23, Xiaoqiao He  wrote:
>>
>> +1 for proposal 1.
>>>
>>> On Thu, Nov 17, 2016 at 10:31 AM, ZhuWilliam 
>>> wrote:
>>>
>>> +1 for proposal 1 .

 Auto-generated code should not be added to the project. Also, most of the
 time, people who dive into CarbonData may not touch the format code.



 --
 View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2957.html
 Sent from the Apache CarbonData Mailing List archive mailing list
 archive
 at Nabble.com.


>>
>> --
>> Thanks & Regards,
>> Ravi
>>
>
>
>
>
>


Reply: Please vote and advise on building thrift files

2016-11-17 Thread Sea
+1 for first proposal too.




-- Original Message --
From: "Jean-Baptiste Onofré";
Sent: November 18, 2016 (Friday), 11:11 AM
To: "dev";

Subject: Re: Please vote and advise on building thrift files



+1 for first proposal too.

Regards
JB

On Nov 18, 2016, at 04:05, hseagle wrote:
>+ vote for proposal 1
>
>
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2989.html
>Sent from the Apache CarbonData Mailing List archive mailing list
>archive at Nabble.com.

Re: Please vote and advise on building thrift files

2016-11-17 Thread Jean-Baptiste Onofré
+1 for first proposal too.

Regards
JB

On Nov 18, 2016, at 04:05, hseagle wrote:
>+ vote for proposal 1
>
>
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2989.html
>Sent from the Apache CarbonData Mailing List archive mailing list
>archive at Nabble.com.


Re: Please vote and advise on building thrift files

2016-11-17 Thread hseagle
+ vote for proposal 1



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2989.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


Re: Please vote and advise on building thrift files

2016-11-17 Thread Aniket Adnaik
+1 for Proposal 1

Best Regards,
Aniket

On Wed, Nov 16, 2016 at 10:34 AM, Liang Chen 
wrote:

> Hi all
>
> Please vote the below proposals or advise other better proposal on building
> thrift files.
>
>
> 
> ---
>
> CarbonData is a file format, and it introduces Apache Thrift to support
> multiple languages: any language can read the file format written with
> Thrift.
>
>
>
> Following are 2 proposals for building the thrift files
>
>
>
> *Proposal 1:* provide the dependency in the Maven pom file.
>
> People (IPMC, PPMC, and others) who want to verify a release need to install
> Thrift and build with the command "mvn clean -Pbuild-with-format
> install".
>
> General users do not need to install Thrift and can build with the command "mvn
> clean install", because the dependency is provided in the Maven pom file (the
> compiled format jar is already uploaded to the snapshot repository for dev, and
> will go to the release repository once released).
>
>
>
> pros: uses a jar dependency, easy to manage.
>
> cons: only release verification requires installing Thrift.
>
>
>
> *Proposal 2:* generate the Thrift Java code and add it to the codebase so it
> can be built without installing Thrift.
>
>
>
> People (IPMC, PPMC, and others) who want to verify a release do not need to
> install Thrift and can build with the command "mvn clean install".
>
>
> pros: no Thrift installation is needed to verify a release.
>
> cons: introduces around 10k lines of generated Java code into the codebase; for
> every additional language (like C++), more code needs to be introduced.
>
>
> Regards
> Liang
>


RE: Please vote and advise on building thrift files

2016-11-17 Thread Jihong Ma
+1 for proposal 1.

Jihong

-Original Message-
From: Anurag Srivastava [mailto:anu...@knoldus.com] 
Sent: Thursday, November 17, 2016 2:32 AM
To: dev@carbondata.incubator.apache.org
Subject: Re: Please vote and advise on building thrift files

+1 for proposal 1

On Thu, Nov 17, 2016 at 3:56 PM, Kumar Vishal 
wrote:

> +1 for proposal 1
>
> -Regards
> Kumar Vishal
>
> On Nov 17, 2016 09:58, "金铸"  wrote:
>
> > +1 for proposal 1
> >
> >
> > On 2016/11/17 12:13, 邢冰 wrote:
> >
> >> +1 for proposal 1
> >>
> >> thx
> >>
> >>
> >>
> >>
> >> Sent from NetEase Mail Master
> >> On 11/17/2016 12:09, Ravindra Pesala wrote:
> >> +1 for proposal 1
> >>
> >> On 17 November 2016 at 08:23, Xiaoqiao He  wrote:
> >>
> >> +1 for proposal 1.
> >>>
> >>> On Thu, Nov 17, 2016 at 10:31 AM, ZhuWilliam 
> >>> wrote:
> >>>
> >>> +1 for proposal 1 .
> 
>  Auto-generated code should not be added to the project. Also, most of the
>  time, people who dive into CarbonData may not touch the format code.
> 
> 
> 
>  --
>  View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2957.html
>  Sent from the Apache CarbonData Mailing List archive mailing list
>  archive
>  at Nabble.com.
> 
> 
> >>
> >> --
> >> Thanks & Regards,
> >> Ravi
> >>
> >
> >
> >
> >
> >
>



-- 
*Thanks & Regards*


*Anurag Srivastava**Software Consultant*
*Knoldus Software LLP*

*India - US - Canada*
* Twitter  | FB
 | LinkedIn
*


Reply: [apache/incubator-carbondata] [CARBONDATA-374] Support smallint (#328)

2016-11-17 Thread Sea
Carbon only supports bigint, double, decimal, and string now; if a user uses
smallint, it will throw an exception.
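
For context, a rough sketch of the kind of type mapping the PR has to extend;
this is illustrative Scala, not CarbonData's actual parser code:

    // Illustrative only: map a SQL type keyword to an internal data type,
    // with SMALLINT added next to the types already supported.
    sealed trait CarbonDataType
    case object ShortType   extends CarbonDataType // backs SMALLINT
    case object BigIntType  extends CarbonDataType
    case object DoubleType  extends CarbonDataType
    case object DecimalType extends CarbonDataType
    case object StringType  extends CarbonDataType

    def parseDataType(sqlType: String): CarbonDataType = sqlType.trim.toLowerCase match {
      case "smallint" | "short" => ShortType   // new case: previously this fell through and threw
      case "bigint" | "long"    => BigIntType
      case "double"             => DoubleType
      case "decimal"            => DecimalType
      case "string"             => StringType
      case other                => throw new IllegalArgumentException(s"Unsupported data type: $other")
    }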




-- Original Message --
From: "Hexiaoqiao";
Sent: November 18, 2016 (Friday), 0:22
To: "apache/incubator-carbondata";
Cc: "Sea" <261810...@qq.com>; "Mention";
Subject: Re: [apache/incubator-carbondata] [CARBONDATA-374] Support smallint (#328)




@cenyuhai
 Do you think we need to keep both SMALLINT and SHORT? It may cause confusion,
since some users have already used SHORT and could then hit incompatibility
issues after an upgrade.
 Please correct me if I am wrong.
 
--
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

[GitHub] incubator-carbondata pull request #329: [CARBONDATA-388] Remove useless file...

2016-11-17 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/incubator-carbondata/pull/329

[CARBONDATA-388] Remove useless file CarbonFileFolderComparator.java

Remove useless file CarbonFileFolderComparator.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/incubator-carbondata CARBONDATA-388

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #329


commit a2286ba1037e9a677712e13e2e92365844e97a88
Author: cenyuhai <261810...@qq.com>
Date:   2016-11-17T16:16:26Z

remove useless file CarbonFileFolderComparator.java






[GitHub] incubator-carbondata pull request #328: [CARBONDATA-374] Support smallint

2016-11-17 Thread cenyuhai
GitHub user cenyuhai opened a pull request:

https://github.com/apache/incubator-carbondata/pull/328

[CARBONDATA-374] Support smallint

Why raise this pull request?
Support smallint.

How to test?
Test with TestLoadDataWithHiveSyntax

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cenyuhai/incubator-carbondata CARBONDATA-374

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #328


commit cde380c6aa1f2338ec71a22f857aef0cd011ce41
Author: cenyuhai <261810...@qq.com>
Date:   2016-11-17T15:49:41Z

support smallint






[GitHub] incubator-carbondata pull request #293: [CARBONDATA-374] Support smallint ty...

2016-11-17 Thread cenyuhai
Github user cenyuhai closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/293




[GitHub] incubator-carbondata pull request #327: [CARBONDATA-421]Time Stamp Filter is...

2016-11-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/327




[jira] [Created] (CARBONDATA-422) [Bad Records]Select query failed with "NullPointerException" after data-load with options as MAXCOLUMN and BAD_RECORDS_ACTION

2016-11-17 Thread SOURYAKANTA DWIVEDY (JIRA)
SOURYAKANTA DWIVEDY created CARBONDATA-422:
--

 Summary: [Bad Records]Select query failed with 
"NullPointerException" after data-load with options as MAXCOLUMN and 
BAD_RECORDS_ACTION
 Key: CARBONDATA-422
 URL: https://issues.apache.org/jira/browse/CARBONDATA-422
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Affects Versions: 0.1.1-incubating
 Environment: 3 node Cluster
Reporter: SOURYAKANTA DWIVEDY
Priority: Minor


Description: Select query failed with "NullPointerException" after data load
with the MAXCOLUMNS and BAD_RECORDS_ACTION options.
Steps:
1. Create a table.
2. Load data into the table with the BAD_RECORDS_ACTION option [create table
columns - 9, CSV columns - 10, header - 9].
3. Do a select * query; it will pass.
4. Then load data into the table with the BAD_RECORDS_ACTION and MAXCOLUMNS
options [create table columns - 9, CSV columns - 10, header - 9, MAXCOLUMNS - 9].
5. Do a select * query; it will fail with "NullPointerException".

Log :- 
---
0: jdbc:hive2://ha-cluster/default> create table emp3(ID int,Name string,DOJ 
timestamp,Designation string,Salary double,Dept string,DOB timestamp,Addr 
string,Gender string) STORED BY 'org.apache.carbondata.format';
+-+--+
| result |
+-+--+
+-+--+
No rows selected (0.589 seconds)
0: jdbc:hive2://ha-cluster/default> LOAD DATA inpath 
'hdfs://hacluster/chetan/emp11.csv' into table emp3 options('DELIMITER'=',', 
'QUOTECHAR'='"','FILEHEADER'='ID,Name,DOJ,Designation,Salary,Dept,DOB,Addr,Gender',
 'BAD_RECORDS_ACTION'='FORCE');
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (2.415 seconds)
0: jdbc:hive2://ha-cluster/default> select * from emp3;
+---+---+---+--+--+---+---++-+--+
| id | name | doj | designation | salary | dept | dob | addr | gender |
+---+---+---+--+--+---+---++-+--+
| NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | AAA | NULL | Trainee | 1.0 | IT | NULL | Pune | Male |
| 2 | BBB | NULL | SE | 3.0 | NW | NULL | Bangalore | Female |
| 3 | CCC | NULL | SSE | 4.0 | DATA | NULL | Mumbai | Female |
| 4 | DDD | NULL | TL | 6.0 | OPER | NULL | Delhi | Male |
| 5 | EEE | NULL | STL | 8.0 | MAIN | NULL | Chennai | Female |
| 6 | FFF | NULL | Trainee | 1.0 | IT | NULL | Pune | Male |
| 7 | GGG | NULL | SE | 3.0 | NW | NULL | Bangalore | Female |
| 8 | HHH | NULL | SSE | 4.0 | DATA | NULL | Mumbai | Female |
| 9 | III | NULL | TL | 6.0 | OPER | NULL | Delhi | Male |
| 10 | JJJ | NULL | STL | 8.0 | MAIN | NULL | Chennai | Female |
| NULL | Name | NULL | Designation | NULL | Dept | NULL | Addr | Gender |
+---+---+---+--+--+---+---++-+--+
12 rows selected (0.418 seconds)
0: jdbc:hive2://ha-cluster/default> LOAD DATA inpath 
'hdfs://hacluster/chetan/emp11.csv' into table emp3 options('DELIMITER'=',', 
'QUOTECHAR'='"','FILEHEADER'='ID,Name,DOJ,Designation,Salary,Dept,DOB,Addr,Gender','MAXCOLUMNS'='9',
 'BAD_RECORDS_ACTION'='FORCE');
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (1.424 seconds)
0: jdbc:hive2://ha-cluster/default> select * from emp3;
Error: java.io.IOException: java.lang.NullPointerException (state=,code=0)
0: jdbc:hive2://ha-cluster/default>







[jira] [Created] (CARBONDATA-421) Timestamp data type filter issue with format other than "-"

2016-11-17 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-421:
---

 Summary: Timestamp data type filter issue with format other than 
"-"
 Key: CARBONDATA-421
 URL: https://issues.apache.org/jira/browse/CARBONDATA-421
 Project: CarbonData
  Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal


Problem: When the timestamp format uses "/" (for example yyyy/mm/dd) instead of
"-", the filter query does not work. In the filter only the "-" format is
allowed, so the user has to give the filter value with "-", but because the data
was loaded with "/", the filter does not match and returns 0 results.
Solution: In the filter we are taking the default format, but we need to take
the format used during data load when converting the filter values to
surrogate keys.
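
A hedged sketch of the fix's idea, namely parsing the filter literal with the
format the data was loaded with instead of a hard-coded default; method and
parameter names are illustrative, not CarbonData's actual code:

    import java.text.SimpleDateFormat

    // Illustrative only: convert a timestamp filter literal to millis using the
    // load-time format (e.g. "yyyy/MM/dd") rather than the default "yyyy-MM-dd".
    def filterValueToMillis(filterValue: String, loadTimestampFormat: String): Long = {
      val parser = new SimpleDateFormat(loadTimestampFormat)
      parser.setLenient(false)
      // the resulting value would then be converted to a surrogate key as usual
      parser.parse(filterValue).getTime
    }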





[GitHub] incubator-carbondata pull request #327: [CARBONDATA-421]Time Stamp Filter is...

2016-11-17 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/327

[CARBONDATA-421] Time Stamp Filter issue with other than yyyy-mm-dd format

Problem: When the timestamp format uses "/" (for example yyyy/mm/dd) instead of
"-", the filter query does not work. In the filter only the "-" format is
allowed, so the user has to give the filter value with "-", but because the data
was loaded with "/", the filter does not match and returns 0 results.
Solution: In the filter we are taking the default format, but we need to take
the format used during data load when converting the filter values to
surrogate keys.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
TimeStampFilterIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #327


commit f6f0d4caecd7e3fd326de309e3bf43b7095d1e9c
Author: kumarvishal 
Date:   2016-11-17T12:59:49Z

Time Stamp Filter issue with other than yyyy-mm-dd format






[GitHub] incubator-carbondata pull request #326: [Carbondata-371] Write unit test for...

2016-11-17 Thread harmeetsingh0013
GitHub user harmeetsingh0013 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/326

[Carbondata-371] Write unit test for ColumnDictionaryInfo

Unit test cases for class file 
org.apache.carbondata.core.cache.dictionary.ColumnDictionaryInfo.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/harmeetsingh0013/incubator-carbondata 
CARBONDATA-371

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/326.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #326


commit 251572db18809cb584cda4ee166523b6683b7ba1
Author: harmeetsingh0013 
Date:   2016-11-16T11:49:03Z

Add unit test cases for ColumnDictionaryInfo class

commit 6e7a84216fba0052479a33bcd9dd858627decbd0
Author: harmeetsingh0013 
Date:   2016-11-17T07:32:05Z

Add unit test cases for ColumnDictionaryInfo class

commit 5b7a57bb98c8be8de47d4f14e44405ee5b9ce84e
Author: harmeetsingh0013 
Date:   2016-11-17T10:53:50Z

Add unit test cases for ColumnDictionaryInfo class






[GitHub] incubator-carbondata pull request #325: [CARBONDATA-418]Fixed data loading p...

2016-11-17 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/325

[CARBONDATA-418]Fixed data loading performance issue

Problem: In CarbonCSVBasedSeqGenStep, the dimension column ids string is
converted from a String into a String array for every row. Because the split
function is called for each row, it creates new String objects per row, which
impacts data loading performance.
Solution: Create an instance variable and, in the process-row method, read the
column ids when the first row is passed to this step.
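
A minimal sketch of the pattern described above (illustrative names, not the
actual CarbonCSVBasedSeqGenStep code): split the column-id string once, when
the first row arrives, and reuse the array for every later row.

    // Illustrative only: cache the split result in an instance variable instead
    // of calling split() for every row.
    class SeqGenStepSketch(dimensionColumnIdString: String) {

      // populated once, when the first row is processed
      private var dimensionColumnIds: Array[String] = _

      def processRow(row: Array[Any]): Unit = {
        if (dimensionColumnIds == null) {
          // before the fix, this split ran per row and created throwaway String objects
          dimensionColumnIds = dimensionColumnIdString.split(",")
        }
        // ... use dimensionColumnIds to generate keys for this row ...
      }
    }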



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
DataLoadingPerformanceIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #325


commit 024e9be4b63bbd3b5ada639d79b7733e743ca32e
Author: kumarvishal 
Date:   2016-11-16T13:02:25Z

Fixed data loading performance issue






Re: Please vote and advise on building thrift files

2016-11-17 Thread Anurag Srivastava
+1 for proposal 1

On Thu, Nov 17, 2016 at 3:56 PM, Kumar Vishal 
wrote:

> +1 for proposal 1
>
> -Regards
> Kumar Vishal
>
> On Nov 17, 2016 09:58, "金铸"  wrote:
>
> > +1 for proposal 1
> >
> >
> > On 2016/11/17 12:13, 邢冰 wrote:
> >
> >> +1 for proposal 1
> >>
> >> thx
> >>
> >>
> >>
> >>
> >> Sent from NetEase Mail Master
> >> On 11/17/2016 12:09, Ravindra Pesala wrote:
> >> +1 for proposal 1
> >>
> >> On 17 November 2016 at 08:23, Xiaoqiao He  wrote:
> >>
> >> +1 for proposal 1.
> >>>
> >>> On Thu, Nov 17, 2016 at 10:31 AM, ZhuWilliam 
> >>> wrote:
> >>>
> >>> +1 for proposal 1 .
> 
>  Auto-generated code should not be added to the project. Also, most of the
>  time, people who dive into CarbonData may not touch the format code.
> 
> 
> 
>  --
>  View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2957.html
>  Sent from the Apache CarbonData Mailing List archive mailing list
>  archive
>  at Nabble.com.
> 
> 
> >>
> >> --
> >> Thanks & Regards,
> >> Ravi
> >>
> >
> >
> >
> >
> >
>



-- 
*Thanks & Regards*


*Anurag Srivastava**Software Consultant*
*Knoldus Software LLP*

*India - US - Canada*
* Twitter  | FB
 | LinkedIn
*


Re: Please vote and advise on building thrift files

2016-11-17 Thread Kumar Vishal
+1 for proposal 1

-Regards
Kumar Vishal

On Nov 17, 2016 09:58, "金铸"  wrote:

> +1 for proposal 1
>
>
> On 2016/11/17 12:13, 邢冰 wrote:
>
>> +1 for proposal 1
>>
>> thx
>>
>>
>>
>>
>> Sent from NetEase Mail Master
>> On 11/17/2016 12:09, Ravindra Pesala wrote:
>> +1 for proposal 1
>>
>> On 17 November 2016 at 08:23, Xiaoqiao He  wrote:
>>
>> +1 for proposal 1.
>>>
>>> On Thu, Nov 17, 2016 at 10:31 AM, ZhuWilliam 
>>> wrote:
>>>
>>> +1 for proposal 1 .

 Auto-generated code should not be added to the project. Also, most of the
 time, people who dive into CarbonData may not touch the format code.



 --
 View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-advise-on-building-thrift-files-tp2952p2957.html
 Sent from the Apache CarbonData Mailing List archive mailing list
 archive
 at Nabble.com.


>>
>> --
>> Thanks & Regards,
>> Ravi
>>
>
>
>
>
>


Re: GC problem and performance refine problem

2016-11-17 Thread An Lan
Hi Kumar Vishal,

The attachment for the last email is at the bottom, pasted as a URL. The log
is too big to send, so I put it on a shared drive and pasted the link.
2016-11-17 16:59 GMT+08:00 An Lan :

> Hi Kumar Vishal,
>
>
>
> I redo some experiment with a detail driver log. The logs are in the
> accessory. And every log run the same query for 10 times.
>
>
>
> 1.   “driverlog1” is under same condition as previous I did.
>
> 2.   “driverlog2”: in this experiment, before load data, I shuffle
> the csv file to make the rows with same attribute will distribute uniform
> between all csv files. i.e. if we use the query filter on the origin csv
> file, the result quantity is almost same for every csv file. And every csv
> file is one block. So every partition for the loader of carbondata has
> uniform data. That method works well:
>
> Time cost of first experiment: 103543, 49363, 125430, 22138,
> 21626, 27905, 22086, 21332, 23501, 16715
>
> Time cost of second experiment:62466, 38709, 24700, 18767, 18325, 15887,
> 16014, 15452, 17440, 15563
>
> 3.   “driverlog3”: First, use the shuffled csv data as 2th
> experiment. Second, modify the create table sql:
>
>
>
> *OLD CREATE TABLE:*
>
> CREATE TABLE IF NOT EXISTS Table1
>
> (* h Int, g Int, d String, f Int, e Int,*
>
> a Int, b Int, …(extra near 300 columns)
>
> STORED BY 'org.apache.carbondata.format'TBLPROPERTIES(
>
> "NO_INVERTED_INDEX”=“a”,"NO_INVERTED_INDEX”=“b”,
>
> …(extra near 300 columns)
>
> "DICTIONARY_INCLUDE”=“a”,
>
> "DICTIONARY_INCLUDE”=“b”,
>
> …(extra near 300 columns)
>
> )
>
>
>
> *NEW CREATE TABLE:*
>
> CREATE TABLE IF NOT EXISTS Table1
>
> (* h Int, g Int, f Int, e Int, d String*
>
> a Int, b Int, …(extra near 300 columns)
>
> STORED BY 'org.apache.carbondata.format' TBLPROPERTIES(
>
> DICTIONARY_INCLUDE"="h, g, f , e , d"
>
> )
>
> The h,g,f,e,d are the columns used in filter. And their distinct value
> increase.
>
> Time cost of third experiment: 57088, 37105, 41588, 43540, 38143, 35581,
> 34040, 35886, 34610, 32589
>
> The time cost of third experiment is longer than others. It is because
> that I use the “h=66 and g = 67 and f = 12 and e != 5 and d >’’ and d <
> ‘’” as the filter condition. So considering with the mdk build order,
> the result rows will be continuous in the file. So there is some blocket
> matching the filter condition totally, while others matching zero row. But
> the scheduler on driver did not filter out the zero row matched blockets.
> It seems the min/max filter does not work correctly on driver side when
> getSplit from InputFormat. There are some case on executor(blocket size is
> 120k):
>
> +--+++--
> --+---+---+---+
>
> |   task_id|load_blocks_time|load_dictionary_time|scan_blocks_ti
> me|scan_blocks_num|result_size|total_executor_time|
>
> +--+++--
> --+---+---+---+
>
> |603288369228186_62|426 |  2 | 28
> | 3 | 0 |   470 |
>
> +--+++--
> --+---+---+---+
>
>
>
> 16/11/16 21:36:25 INFO impl.FilterScanner: pool-30-thread-1 [INDEX] index
> filter : 113609/12
>
> +---+++-
> ---+---+---+---+
>
> |task_id|load_blocks_time|load_dictionary_time|scan_blocks_ti
> me|scan_blocks_num|result_size|total_executor_time|
>
> +---+++-
> ---+---+---+---+
>
> |603288369228186_428|380 |  4 |
> 661 | 3 |113609 | 13634 |
>
> +---+++-
> ---+---+---+---+
>
> In first case result_size is zero, but the blocket is still put in one
> task. That make some task do nothing, but others suffer a heavy work.
>
> In second case, example only one blocket has data(4 or 6 blockets for one
> task). I add the log about invert index filter ratio like “[INDEX] index
> filter : /”
>
> So, how could I make the min/max filter work correctly in driver side?
>
>
>
> Another question: use multi-line in TBLPROPERTIES dose not work correctly
> like “"DICTIONARY_INCLUDE”=“a”, "DICTIONARY_INCLUDE”=“b”” in the old
> create table sql. Only the column in the last DICTIONARY_INCLUDE declare is
> add into the dimension. The new one works correctly. But the old way did
> not throw any exception.
>
>
>
> Further, I think from the 2th experiment balancing the data is important.
> So I will change the blocket size to 2k rows if the min/max filter could
> work on driver side. I have not changed the int type to double 

Re: GC problem and performance refine problem

2016-11-17 Thread An Lan
Hi Kumar Vishal,



I redid some experiments with detailed driver logs. The logs are in the
attachment, and every log runs the same query 10 times.



1.   “driverlog1” is under the same conditions as my previous run.

2.   “driverlog2”: in this experiment, before loading data, I shuffled the
csv files so that rows with the same attribute are distributed uniformly
across all csv files, i.e. if we apply the query filter to the original csv
files, the result count is almost the same for every csv file. Every csv
file is one block, so every partition of the CarbonData loader has
uniform data. That method works well:

Time cost of first experiment: 103543, 49363, 125430, 22138, 21626,
27905, 22086, 21332, 23501, 16715

Time cost of second experiment:62466, 38709, 24700, 18767, 18325, 15887,
16014, 15452, 17440, 15563

3.   “driverlog3”: first, use the shuffled csv data as in the 2nd experiment.
Second, modify the create table SQL:



*OLD CREATE TABLE:*

CREATE TABLE IF NOT EXISTS Table1

(* h Int, g Int, d String, f Int, e Int,*

a Int, b Int, …(extra near 300 columns)

STORED BY 'org.apache.carbondata.format' TBLPROPERTIES(

"NO_INVERTED_INDEX”=“a”,"NO_INVERTED_INDEX”=“b”,

…(extra near 300 columns)

"DICTIONARY_INCLUDE”=“a”,

"DICTIONARY_INCLUDE”=“b”,

…(extra near 300 columns)

)



*NEW CREATE TABLE:*

CREATE TABLE IF NOT EXISTS Table1

(* h Int, g Int, f Int, e Int, d String*

a Int, b Int, …(extra near 300 columns)

STORED BY 'org.apache.carbondata.format' TBLPROPERTIES(

"DICTIONARY_INCLUDE"="h, g, f, e, d"

)

The columns h, g, f, e, d are the ones used in the filter, and their distinct
value counts increase.

Time cost of third experiment: 57088, 37105, 41588, 43540, 38143, 35581,
34040, 35886, 34610, 32589

The time cost of the third experiment is longer than the others. It is because
I use “h=66 and g = 67 and f = 12 and e != 5 and d >’’ and d < ‘’” as the
filter condition. So, considering the MDK build order, the result rows will be
contiguous in the file: some blocklets match the filter condition completely,
while others match zero rows. But the scheduler on the driver did not filter
out the blocklets that matched zero rows. It seems the min/max filter does not
work correctly on the driver side when getSplit is called on the InputFormat.
Here are some cases on the executor (blocklet size is 120k):

+---------------------+------------------+----------------------+------------------+-----------------+-------------+---------------------+
| task_id             | load_blocks_time | load_dictionary_time | scan_blocks_time | scan_blocks_num | result_size | total_executor_time |
+---------------------+------------------+----------------------+------------------+-----------------+-------------+---------------------+
| 603288369228186_62  | 426              | 2                    | 28               | 3               | 0           | 470                 |
+---------------------+------------------+----------------------+------------------+-----------------+-------------+---------------------+



16/11/16 21:36:25 INFO impl.FilterScanner: pool-30-thread-1 [INDEX] index
filter : 113609/12

+---------------------+------------------+----------------------+------------------+-----------------+-------------+---------------------+
| task_id             | load_blocks_time | load_dictionary_time | scan_blocks_time | scan_blocks_num | result_size | total_executor_time |
+---------------------+------------------+----------------------+------------------+-----------------+-------------+---------------------+
| 603288369228186_428 | 380              | 4                    | 661              | 3               | 113609      | 13634               |
+---------------------+------------------+----------------------+------------------+-----------------+-------------+---------------------+

In the first case result_size is zero, but the blocklet is still put into a
task. That makes some tasks do nothing while others carry a heavy workload.

In the second case, for example, only one blocklet has data (4 or 6 blocklets
per task). I added a log about the inverted index filter ratio, like “[INDEX]
index filter : /”.

So, how can I make the min/max filter work correctly on the driver side?
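
For reference, the driver-side pruning being asked about reduces to a min/max
check per blocklet; a simplified sketch under assumed names, not CarbonData's
actual getSplits code:

    // Illustrative only: a blocklet can be skipped for an equality filter
    // (column = value) when the value lies outside the blocklet's [min, max] range.
    final case class BlockletMinMax(min: Long, max: Long)

    def canPrune(stats: BlockletMinMax, filterValue: Long): Boolean =
      filterValue < stats.min || filterValue > stats.max

    // getSplits would then keep only the blocklets where !canPrune(stats, filterValue)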



Another question: using multiple DICTIONARY_INCLUDE entries in TBLPROPERTIES,
like “"DICTIONARY_INCLUDE”=“a”, "DICTIONARY_INCLUDE”=“b”” in the old create
table SQL, does not work correctly. Only the column in the last
DICTIONARY_INCLUDE declaration is added as a dimension. The new style works
correctly, but the old way did not throw any exception.



Further, from the 2nd experiment I think balancing the data is important.
So I will change the blocklet size to 2k rows if the min/max filter can
work on the driver side. I have not changed the int type to double type for
measures; I will do that later.

2016-11-17 16:34 GMT+08:00 An Lan :

> Hi Kumar Vishal,
>
>
>
> I redo some experiment with a detail driver log. The logs are in the
> accessory. And every log run the same query for 10 times.
>
>
>
> 1.   “driverlog1” is under same condition as previous I did.
>
> 2.   “driverlog2”: in this experiment, before load data, I shuffle
> the csv file to make th

[GitHub] incubator-carbondata pull request #259: [WIP]Fix constants and method names

2016-11-17 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/259




[GitHub] incubator-carbondata pull request #324: [CARBONDATA-420] Remove unused param...

2016-11-17 Thread HuangxiaoxiaBonnie
GitHub user HuangxiaoxiaBonnie opened a pull request:

https://github.com/apache/incubator-carbondata/pull/324

[CARBONDATA-420] Remove unused parameter in config template file

## Why raise this PR?
To remove an unused parameter from the config template file: carbon.max.file.size
has already been removed.
## How to test?
Not related to functionality.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HuangxiaoxiaBonnie/incubator-carbondata 
parameter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #324


commit cf27fd911a2dd03f7bde2243072a42aa5dc68a55
Author: HuangxiaoxiaBonnie <450829...@qq.com>
Date:   2016-11-17T08:17:35Z

Rm unused parameter






[jira] [Created] (CARBONDATA-420) Remove unused parameter in config template file

2016-11-17 Thread HuangxiaoxiaBonnie (JIRA)
HuangxiaoxiaBonnie created CARBONDATA-420:
-

 Summary: Remove unused parameter in config template file
 Key: CARBONDATA-420
 URL: https://issues.apache.org/jira/browse/CARBONDATA-420
 Project: CarbonData
  Issue Type: Improvement
Reporter: HuangxiaoxiaBonnie
Priority: Minor


carbon.max.file.size is unused now and needs to be removed.


