[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-2951:

Description:
Some users write their projects in C++, and today they cannot call the CarbonData interface or integrate CarbonData into those projects. We therefore plan to provide a C++ interface so that C++ users can integrate CarbonData, including reading and writing CarbonData. This will be more convenient for them. We plan to design and develop it as follows:

1. Provide CarbonReader for SDK; it can read carbon data in C++
   ##features/interfaces
   1.1. create CarbonReader
   1.2. hasNext()
   1.3. readNextRow()
   1.4. close()
   1.5. support OBS (AK/SK/Endpoint)
   1.6. support batch read (withBatch, readNextBatchRow)
   1.7. support vector read (default) and carbonrecordreader (withRowRecordReader)
   1.8. projection

   ##supported data types:
   String, Long, Varchar(string), Short, Int, Date(int), Timestamp(long), Boolean, Decimal(string), Float
   Array: supported in carbonrecordreader, not supported in vectorreader
   byte: supported in Java RowUtil, not in the C++ carbon reader

   ## Schema and data
   Create table tbl_email_form_to_for_XX(
     Event_Time Timestamp,
     Ingestion_Time Timestamp,
     From_Email String,
     To_Email String,
     From_To_type String,
     Event_ID String
   )
   using carbon options(path 'obs://X/tbl_email_form_to_for_XX')

   ETL 6 columns from an 18-column table
   example data:
   from_email_36550_phillip.al...@enron.com to_email_36550_stagecoachm...@hotmail.com from_to <29528303.107585557.JavaMail.evans@thyme> 153801549700 975514920

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for SDK; it can write carbon data in C++
   ##features/interfaces
   3.1. create CarbonWriter, including creating the schema (withCsvInput), setting outputPath, and build
   3.2. write()
   3.3. close()
   3.4. support OBS (AK/SK/Endpoint) (withHadoopConf)
   3.5. writtenBy
   3.6. support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold, enableLocalDictionary in C++ SDK (PR2899, to be reviewed)

   ##Data types:
   Carbon needs to support the basic data types, including string, float, double, int, long, date, timestamp, bool, array.
   Other types can be converted:
   char array => carbon string
   Enum => carbon string
   set and list => carbon array

   ##performance
   Writing performance is not required now

4. read schema functions
   readSchema
   getVersionDetails => TODO

5. support carbonproperties
   5.1. addProperty
   5.2. getProperty

6. TODO:
   6.1. getVersionDetails => to be reviewed
   6.2. update SDK/CSDK reader doc => to be reviewed
   6.3. support byte (write, read)
   6.4. support long string columns
   6.5. support sortBy => to be reviewed
   6.6. support withCsvInput(Schema schema); create schema (Java)
   6.7. optimize the write doc => to be reviewed

   /**
    * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
    */
   public static CarbonWriterBuilder builder() {
     return new CarbonWriterBuilder();
   }

was:
Some users write their projects in C++, and today they cannot call the CarbonData interface or integrate CarbonData into those projects. We therefore plan to provide a C++ interface so that C++ users can integrate CarbonData, including reading and writing CarbonData. This will be more convenient for them. We plan to design and develop it as follows:

1. Provide CarbonReader for SDK; it can read carbon data in C++
   ##features/interfaces
   1.1. create CarbonReader
   1.2. hasNext()
   1.3. readNextRow()
   1.4. close()
   1.5. support OBS (AK/SK/Endpoint)
   1.6. support batch read (withBatch, readNextBatchRow)
   1.7. support vector read (default) and carbonrecordreader (withRowRecordReader)
   1.8. projection

   ##supported data types:
   String, Long, Varchar(string), Short, Int, Date(int), Timestamp(long), Boolean, Decimal(string), Float
   Array: supported in carbonrecordreader, not supported in vectorreader
   byte: supported in Java RowUtil, not in the C++ carbon reader

   ## Schema and data
   Create table tbl_email_form_to_for_XX(
     Event_Time Timestamp,
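
The reader items in the description (hasNext, readNextRow, readNextBatchRow, close) follow a cursor-style iteration pattern. The sketch below illustrates only that calling pattern. MockCarbonReader, Row and countRows are hypothetical names for an in-memory stand-in; they are not the CSDK classes, and the real signatures are not given in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type for illustration: one string per projected column.
using Row = std::vector<std::string>;

// Hypothetical in-memory stand-in for the planned reader. It only mimics the
// method names listed in the issue; it does not touch CarbonData files.
class MockCarbonReader {
public:
    explicit MockCarbonReader(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // Item 1.2: true while the reader is open and unread rows remain.
    bool hasNext() const { return !closed_ && pos_ < rows_.size(); }

    // Item 1.3: return the next row and advance the cursor.
    // Precondition: hasNext() is true.
    Row readNextRow() {
        assert(hasNext());
        return rows_[pos_++];
    }

    // Item 1.6: return up to batchSize rows in a single call.
    std::vector<Row> readNextBatchRow(std::size_t batchSize) {
        std::vector<Row> batch;
        while (batch.size() < batchSize && hasNext()) {
            batch.push_back(readNextRow());
        }
        return batch;
    }

    // Item 1.4: after close(), hasNext() always reports false.
    void close() { closed_ = true; }

private:
    std::vector<Row> rows_;
    std::size_t pos_ = 0;
    bool closed_ = false;
};

// Typical row-at-a-time loop: drain the reader, then close it.
std::size_t countRows(MockCarbonReader& reader) {
    std::size_t n = 0;
    while (reader.hasNext()) {
        reader.readNextRow();
        ++n;
    }
    reader.close();
    return n;
}
```

A caller would either loop on hasNext()/readNextRow() as in countRows, or call readNextBatchRow repeatedly until it returns an empty batch; in both cases close() comes last.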
[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala updated CARBONDATA-2951:

    Fix Version/s: (was: 1.5.0)
                   NONE

> CSDK: Provide C++ interface for SDK
> ---
>
>                 Key: CARBONDATA-2951
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
>             Project: CarbonData
>          Issue Type: Task
>          Components: other
>    Affects Versions: 1.5.0
>            Reporter: xubo245
>            Assignee: xubo245
>            Priority: Critical
>             Fix For: NONE
>
>
> CSDK: Provide C++ interface for SDK
> 1. Provide CarbonReader for SDK, it can read carbon data in C++ language
> 2. Provide CarbonWriter for SDK, it can write carbon data in C++ language

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)