【DISCUSS】add more index for sort columns

2017-03-14 Thread bill.zhou
Hi all, Carbon will add a min/max index for sort columns, which is used to speed up filter queries. So can we add more indexes for the sort columns to make filtering faster? This is an idea I got from another database design. For example, this is one student, and the column: score in the student
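To illustrate why a min/max index on a sorted column speeds up filter queries, here is a minimal sketch. It is not CarbonData's implementation: the `Block` class and the `prune_blocks` and `filter_equals` functions are hypothetical names introduced only for this example, and each block is assumed to hold its values in sorted order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical data block; values are assumed to be sorted ascending."""
    values: List[int]

    @property
    def min_value(self) -> int:
        return self.values[0]

    @property
    def max_value(self) -> int:
        return self.values[-1]


def prune_blocks(blocks: List[Block], target: int) -> List[Block]:
    """Keep only blocks whose [min, max] range can contain the target."""
    return [b for b in blocks if b.min_value <= target <= b.max_value]


def filter_equals(blocks: List[Block], target: int) -> List[int]:
    """Scan only the blocks that survive min/max pruning."""
    hits: List[int] = []
    for block in prune_blocks(blocks, target):
        hits.extend(v for v in block.values if v == target)
    return hits


# Illustrative data: three blocks of a sorted "score" column.
blocks = [Block([10, 20, 35]), Block([40, 55, 60]), Block([61, 80, 99])]
matches = filter_equals(blocks, 55)
```

With this data, a filter on `score = 55` only needs to scan the middle block; the other two are skipped based on their min/max values alone.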

CarbonDictionaryDecoder should support codegen

2017-03-10 Thread bill.zhou
Hi all, Carbon scan now supports codegen, but CarbonDictionaryDecoder doesn't support codegen. I think it should. For example, today I did one test and the query plan is as shown on the left below; if CarbonDictionaryDecoder supported codegen, the plan would change to the one on the right. I

Re: Improving Non-dictionary storage & performance.

2017-03-07 Thread bill.zhou
Hi Jacky, I think this is not easy for the user to control if Carbon is running online. For one table, two different loads may have different cardinality for the same column, but the user cannot give different dictionary columns for one table. Regards Jacky Li wrote > Hi Ravindra, > > Another

Re: [DISCUSS] For the dimension default should be no dictionary

2017-03-02 Thread bill.zhou
Hi all, I will summarize this discussion. 1. To keep CarbonData compatible with older versions, keep DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE; the default is no dictionary. I do not suggest changing these properties to table_dictionary. 2. I suggest keeping the sort_column properties in the same style for

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread bill.zhou
sort columns except C6. > > DICTIONARY_EXCLUDE= 'C2' > DICTIONARY_INCLUDE='ALL' > In the above case all sort columns(C1,C2,C3,C4,C5) are dictionary columns > except C2, here C2 is no-dictionary column. > > Above mentioned are just my idea of how to simplify DDL to handle a

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-26 Thread bill.zhou
Disadvantages : >> 1. Store size will increase drastically. >> 2. IO will increase so query performance will come down. >> 3. Aggregation queries performance will suffer. >> >> >> >> Regards, >> Ravindra. >> >> On 26 February 2017

[DISCUSS] For the dimension default should be no dictionary

2017-02-26 Thread bill.zhou
Hi all, currently, when creating a CarbonData table, if a dimension is not added to the dictionary_exclude property, the dimension is treated as a dictionary column by default. I think the default should be no dictionary. For example, when I did a POC for one customer, it had 300 columns and 200

Re: [Discussion] Please vote and comment for carbon data file format change

2016-12-10 Thread bill.zhou
+1, this modification will help in all scenarios. Kumar Vishal wrote > Hello All, > > Improving carbon first time query performance > > Reason: > 1. As file system cache is cleared file reading will make it slower to > read > and cache > 2. In first time query carbon will have to read the

compile error from spark project: scala.reflect.internal.MissingRequirementError: object scala.runtime in compiler mirror not found

2016-11-29 Thread bill.zhou
Hi all, today I fetched the latest code from the master branch and then compiled the project. Compiling the spark project gives the following issue. Does anyone know this issue?

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread bill.zhou
+1 Regards Bill Venkata Gollamudi wrote > Hi All, > > CarbonData 0.2.0 has been a good work and stable release with lot of > defects fixed and with number of performance improvements. >

Re: As planed, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-10 Thread bill.zhou
+1 Regards bill.zhou Liang Chen wrote > Hi all > > In 0.2.0 version of CarbonData, there are major performance improvements > like blocklets distribution, support BZIP2 compressed files, and so on > added to enhance the CarbonData performance significantly. Along wit

Re: In load data, CSV row contains invalid quote char and results are invalid

2016-10-29 Thread bill.zhou
Hi Singh, the quotechar in the CSV should be in pairs, like:
name, description, salary, age, dob
tammy,$my name$,$90$,22,19/10/2019
tammy1,$delhi$,$32345%*$*,22,19/10/2019
tammy2,$banglore$,$543$,$44$,19/10/2019
tammy3,$city$,$343$,$22$,12/10/2019
tammy4,$punjab$,$23423$,$55$,19/10/2019
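The pairing rule can be checked before loading. The sketch below is only an illustration, not part of CarbonData: `has_paired_quotes` is a hypothetical helper, `$` is assumed to be the quote character as in the rows above, and escaped quote characters are not handled. The third row is an invented bad row added to show a failure.

```python
def has_paired_quotes(line: str, quotechar: str = "$") -> bool:
    """Return True when the quote character occurs an even number of times.

    Simplifying assumption: quote characters are never escaped, so an
    even count means every opening quote has a closing one.
    """
    return line.count(quotechar) % 2 == 0


rows = [
    "tammy,$my name$,$90$,22,19/10/2019",
    "tammy2,$banglore$,$543$,$44$,19/10/2019",
    "tammy9,$unclosed,$90$,22,19/10/2019",  # hypothetical row with an unpaired quote
]

# Collect rows that would need fixing before the load.
unpaired = [row for row in rows if not has_paired_quotes(row)]
```

Here `unpaired` contains only the last row, which opens a quote in the description field without closing it.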

Discussion how to create the CarbonData table with good performance

2016-10-15 Thread bill.zhou
Discussion: how to create a CarbonData table with good performance. Suggestions for creating a Carbon table: Recently we used CarbonData to do performance testing in the telecommunications field and summarized some suggestions for creating CarbonData tables. We have tables which range from 10 thousand

Re: [Discussion] Support Date/Time format for Timestamp columns to be defined at column level

2016-09-29 Thread bill.zhou
+1, I agree with Vimal's opinion. If the user wants another format, he can use a function to convert. Regards Bill

Re: [discussion]When table properties is repeated it only set the last one

2016-09-29 Thread bill.zhou
+1 -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/discussion-When-table-properties-is-repeated-it-only-set-the-last-one-tp1539p1559.html Sent from the Apache CarbonData Mailing List archive at Nabble.com.

Re: Open discussion and Vote: What kind of JIRA issue events need send mail to dev@carbondata.incubator.apache.org

2016-08-18 Thread bill.zhou
Option 2, and it would be better to add the Issue closed event. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Open-discussion-and-Vote-What-kind-of-JIRA-issue-events-need-send-mail-to-dev-carbondata-incubator-ag-tp321p325.html