[jira] [Issue Comment Deleted] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
    Comment: was deleted

(was: Introducing Apache CarbonData : A new hadoop-native file format for faster data analysis, O'Reilly Open Source Convention: OSCON, May 16 - 19, 2016: https://www.youtube.com/watch?v=VEckmJuU47g)

> CSDK: Provide C++ interface for SDK
> ---
>
>                 Key: CARBONDATA-2951
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
>             Project: CarbonData
>          Issue Type: Task
>          Components: other
>    Affects Versions: 1.5.0
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Critical
>             Fix For: NONE
>
>
> Some users write their projects in C++, so they cannot call the CarbonData
> interface or integrate CarbonData into those projects. We therefore plan to
> provide a C++ interface for reading and writing CarbonData, which will make
> integration more convenient for them.
> We plan to design and develop it as follows:
> 1. Provide CarbonReader for the SDK; it can read carbon data in C++
> ## features/interfaces
> 1.1. create CarbonReader
> 1.2. hasNext()
> 1.3. readNextRow()
> 1.4. close()
> 1.5. support OBS (AK/SK/Endpoint)
> 1.6. support batch read (withBatch, readNextBatchRow)
> 1.7. support vector read (default) and carbonrecordreader (withRowRecordReader)
> 1.8. projection
>
> ## supported data types:
> String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
> Array: supported in carbonrecordreader, not supported in vectorreader
> byte: supported in Java RowUtil, not in the C++ carbon reader
>
> ## Schema and data
> Create table tbl_email_form_to_for_XX(
> Event_Time Timestamp,
> Ingestion_Time Timestamp,
> From_Email String,
> To_Email String,
> From_To_type String,
> Event_ID String
> ) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
> ETL: 6 columns taken from an 18-column table
>
> example data:
> from_email_36550_phillip.al...@enron.com
> to_email_36550_stagecoachm...@hotmail.com from_to
> <29528303.107585557.JavaMail.evans@thyme> 153801549700
> 975514920
> 2. Performance should reach X million records/s/node
> 3. Provide CarbonWriter for the SDK; it can write carbon data in C++
> ## features/interfaces
> 3.1. create CarbonWriter, including create schema (withCsvInput), set outputPath, and build
> 3.2. write()
> 3.3. close()
> 3.4. support OBS (AK/SK/Endpoint) (withHadoopConf)
> 3.5. writtenBy
> 3.6. support withTableProperty, withLoadOption, taskNo,
> uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize,
> localDictionaryThreshold, enableLocalDictionary in C++ SDK (PR2899 to be
> reviewed)
>
> ## Data types:
> Carbon needs to support the base data types, including string, float,
> double, int, long, date, timestamp, bool, array.
> Other types can be converted:
> char array => carbon string
> Enum => carbon string
> set and list => carbon array
> ## performance
> Write performance is not a requirement for now
>
> 4. read schema function
> readSchema
> getVersionDetails => TODO
> 5. support carbonproperties
> 5.1. addProperty
> 5.2. getProperty
>
> 6. TODO:
> 6.1. getVersionDetails => to be reviewed
> 6.2. update SDK/CSDK reader doc => to be reviewed
> 6.3. support byte (write and read)
> 6.4. support long string columns
> 6.5. support sortBy => to be reviewed
> 6.6. support withCsvInput(Schema schema); create schema (Java)
> 6.7. optimize the write doc => to be reviewed
> /**
>  * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
>  */
> public static CarbonWriterBuilder builder() {
>   return new CarbonWriterBuilder();
> }

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
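The reader interface listed in item 1 of the description (create, hasNext(), readNextRow(), close(), batch read, projection) could be shaped roughly as follows. This is only a sketch under loud assumptions: `SketchReader`, `Row`, and `readAll` are hypothetical names invented for illustration, the rows live in memory rather than in CarbonData files, and nothing here is the actual CSDK API or its signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: every value is kept as a string for simplicity.
using Row = std::vector<std::string>;

// Hypothetical stand-in for the planned C++ CarbonReader. It iterates over
// in-memory rows; it is NOT the real CSDK class.
class SketchReader {
public:
    SketchReader(std::vector<std::string> columns, std::vector<Row> rows)
        : columns_(std::move(columns)), rows_(std::move(rows)) {
        // By default every column is projected, in table order.
        for (std::size_t i = 0; i < columns_.size(); ++i) projection_.push_back(i);
    }

    // Item 1.8: restrict the output to the named columns, in the given order.
    void projection(const std::vector<std::string>& names) {
        std::vector<std::size_t> picked;
        for (const auto& name : names) {
            std::size_t i = 0;
            while (i < columns_.size() && columns_[i] != name) ++i;
            if (i == columns_.size()) throw std::invalid_argument("unknown column: " + name);
            picked.push_back(i);
        }
        projection_ = picked;
    }

    // Item 1.2: true while unread rows remain and the reader is open.
    bool hasNext() const { return !closed_ && next_ < rows_.size(); }

    // Item 1.3: return the next row, reduced to the projected columns.
    Row readNextRow() {
        if (!hasNext()) throw std::out_of_range("no more rows");
        const Row& full = rows_[next_++];
        Row out;
        for (std::size_t i : projection_) out.push_back(full.at(i));
        return out;
    }

    // Item 1.6: return up to batchSize rows in one call.
    std::vector<Row> readNextBatchRow(std::size_t batchSize) {
        std::vector<Row> batch;
        while (batch.size() < batchSize && hasNext()) batch.push_back(readNextRow());
        return batch;
    }

    // Item 1.4: after close(), hasNext() always reports false.
    void close() { closed_ = true; }

private:
    std::vector<std::string> columns_;
    std::vector<Row> rows_;
    std::vector<std::size_t> projection_;
    std::size_t next_ = 0;
    bool closed_ = false;
};

// Typical caller loop: drain the reader row by row, then close it.
std::vector<Row> readAll(SketchReader& reader) {
    std::vector<Row> result;
    while (reader.hasNext()) result.push_back(reader.readNextRow());
    reader.close();
    return result;
}
```

A caller would construct the reader, optionally narrow it with `projection({"Event_ID", "From_Email"})`, and then either loop on `hasNext()`/`readNextRow()` or pull fixed-size chunks with `readNextBatchRow()`. A real implementation would additionally need the storage configuration from item 1.5 (AK/SK/Endpoint), which this sketch leaves out.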
[jira] [Issue Comment Deleted] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
    Comment: was deleted

(was: Apache Carbondata: An Indexed Columnar File Format for Interactive Query by Jacky Li/Jihong Ma, Spark summit EAST 2017: https://www.youtube.com/watch?v=lhsAg2H_GXc)
[jira] [Issue Comment Deleted] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
    Comment: was deleted

(was: Apache Carbondata: An indexed columnar file format for interactive query with Spark SQL: https://www.youtube.com/watch?v=yya8-GzRW5M)
[jira] [Issue Comment Deleted] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
    Comment: was deleted

(was: https://www.computerhope.com/issues/ch001002.htm)
[jira] [Issue Comment Deleted] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:
    Comment: was deleted

(was: https://r2---sn-npoeenek.googlevideo.com/videoplayback?lmt=1521057331061550=1812.178=youtube=25=yes=142.93.137.161=VLv0W5_UNcqagAeN3q7oBA=254556679094EFD17DAC3DAD278E66478407BC49.4E38F21849411EDBDCDBF5990ED176A375B2F06A=cms1=o-AC5nDRva0FaTCRRdBU5bhUeOEws4bx8zmbynLQo0P895=22=video%2Fmp4=1542786997=dur,ei,expire,id,ip,ipbits,itag,lmt,mime,mip,mm,mn,ms,mv,pl,ratebypass,requiressl,source=2=yes=WEB=0_id=lhsAg2H_GXc=Apache+Carbondata-+An+Indexed+Columnar+File+Format+for+Interactive+Query+by+Jacky+Li-Jihong+Ma_counter=1=sn-5hnel77l=23763603_id=7ca3e5fd8923a3ee_redirect=yes=116.66.184.191=34=sn-npoeenek=ltu=1542765298=m)