[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

2018-11-14 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-2951:

Description: 
Some users write their projects in C++ and therefore cannot call the 
CarbonData interface or integrate CarbonData into those projects. We plan to 
provide a C++ interface so that C++ users can integrate CarbonData, including 
reading and writing CarbonData. This will be more convenient for them.

We plan to design and develop it as follows:

1. Provide CarbonReader for the SDK, so that carbon data can be read from C++
##features/interfaces
1.1.create CarbonReader
1.2.hasNext()
1.3.readNextRow()
1.4.close()
1.5.support OBS(AK/SK/Endpoint)
1.6 support batch read (withBatch, readNextBatchRow)
1.7 support vector read (default) and carbonrecordreader (withRowRecordReader)
1.8 projection
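
The read loop implied by hasNext(), readNextRow() and close() can be sketched as below. This is only an illustration under stated assumptions: InMemoryRowReader, Row and countRows are hypothetical names, the rows live in memory, and none of this is the CSDK's actual class or signature. Only the three method names come from the list above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row reader; NOT the CarbonData CSDK class.
// Only the method names hasNext(), readNextRow() and close() come from
// the feature list.
using Row = std::vector<std::string>;

class InMemoryRowReader {
public:
    explicit InMemoryRowReader(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // True while the reader is open and unread rows remain.
    bool hasNext() const { return open_ && next_ < rows_.size(); }

    // Returns the current row and advances the cursor.
    // Callers are expected to check hasNext() first.
    Row readNextRow() { return rows_.at(next_++); }

    // Marks the reader as closed; afterwards hasNext() reports false.
    void close() { open_ = false; }

private:
    std::vector<Row> rows_;
    std::size_t next_ = 0;
    bool open_ = true;
};

// Drains a reader with the hasNext()/readNextRow()/close() loop and
// returns the number of rows seen.
std::size_t countRows(InMemoryRowReader& reader) {
    std::size_t count = 0;
    while (reader.hasNext()) {
        Row row = reader.readNextRow();
        (void)row;  // a real caller would process the row here
        ++count;
    }
    reader.close();
    return count;
}
```

A caller would build the reader, pass it to countRows, and not use it again after close().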

##supported data types:
 String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
 Array: supported in carbonrecordreader, not supported in vectorreader
 byte: supported in Java RowUtil, not in the C++ carbon reader
 
## Schema and data
 Create table tbl_email_form_to_for_XX( 
Event_Time Timestamp,
Ingestion_Time Timestamp,
From_Email String,
To_Email String,
From_To_type String,
Event_ID String
) using carbon options(path ‘obs://X/tbl_email_form_to_for_XX’)
ETL 6 columns from an 18-column table

example data:
from_email_36550_phillip.al...@enron.com
to_email_36550_stagecoachm...@hotmail.com   from_to 
<29528303.107585557.JavaMail.evans@thyme>   153801549700
975514920

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for the SDK, so that carbon data can be written from C++
##features/interfaces
3.1.create CarbonWriter, including creating the schema (withCsvInput), setting 
outputPath, and build
3.2.write()
3.3.close()
3.4.support OBS(AK/SK/Endpoint)(withHadoopConf)
3.5.writtenBy
3.6.support withTableProperty, withLoadOption, taskNo, 
uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, 
localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)

##Data types:
 Carbon needs to support the base data types, including string, float, 
double, int, long, date, timestamp, bool, array.
 Other types can be converted:
 char array => carbon string
 enum => carbon string
  set and list => carbon array
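
The conversions listed above can be done on the application side with the standard library alone. The sketch below is an assumption about how a caller might prepare values before handing them to a writer. Direction, fromCharArray, toString and toArray are hypothetical names, std::string and std::vector merely stand in for carbon string and carbon array, and no CarbonData call is involved.

```cpp
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical application-side enum, used only for illustration.
enum class Direction { FromTo, ToFrom };

// char array => string: copy up to the first NUL, or the whole buffer
// when no terminator is present.
template <std::size_t N>
std::string fromCharArray(const char (&buf)[N]) {
    std::size_t len = 0;
    while (len < N && buf[len] != '\0') ++len;
    return std::string(buf, len);
}

// enum => string: map each enumerator to a fixed name.
std::string toString(Direction d) {
    switch (d) {
        case Direction::FromTo: return "from_to";
        case Direction::ToFrom: return "to_from";
    }
    return "unknown";
}

// set or list => array: copy the elements into a contiguous vector,
// keeping the container's iteration order.
template <typename Container>
std::vector<typename Container::value_type> toArray(const Container& c) {
    return std::vector<typename Container::value_type>(c.begin(), c.end());
}
```

Note that a std::set iterates in sorted order, so the resulting array is sorted, while a std::list keeps insertion order.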

##performance
Writing Performance is not required now

4. read schema function
readSchema
getVersionDetails  =>TODO

5. support carbonproperties
5.1 addProperty
5.2 getProperty
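
As a rough picture of what addProperty and getProperty provide, a key/value store is enough. The class below is a hypothetical stand-in, not the CSDK's carbonproperties binding: PropertyStore, the default-value parameter and the overwrite-on-duplicate behavior are all assumptions made for this sketch.

```cpp
#include <map>
#include <string>

// Hypothetical key/value store; NOT the CarbonData properties class.
// Only the method names addProperty and getProperty come from the list.
class PropertyStore {
public:
    // Adds a property, overwriting any earlier value for the same key.
    void addProperty(const std::string& key, const std::string& value) {
        props_[key] = value;
    }

    // Returns the stored value, or defaultValue when the key is absent.
    std::string getProperty(const std::string& key,
                            const std::string& defaultValue = "") const {
        auto it = props_.find(key);
        return it == props_.end() ? defaultValue : it->second;
    }

private:
    std::map<std::string, std::string> props_;
};
```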

6.TODO:
6.1.getVersionDetails => to be reviewed
6.2.update the SDK/CSDK reader doc => to be reviewed
6.3.support byte (write and read)
6.4.support long string columns
6.5.support sortBy => to be reviewed
6.6.support withCsvInput(Schema schema); create schema (Java)
6.7.optimize the write doc => to be reviewed
/**
 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
 */
public static CarbonWriterBuilder builder() {
  return new CarbonWriterBuilder();
}


[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

2018-11-13 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-2951:

Description: 
Some users write their projects in C++ and therefore cannot call the 
CarbonData interface or integrate CarbonData into those projects. We plan to 
provide a C++ interface so that C++ users can integrate CarbonData, including 
reading and writing CarbonData. This will be more convenient for them.

We plan to design and develop it as follows:

1. Provide CarbonReader for the SDK, so that carbon data can be read from C++
##features/interfaces
1.1.create CarbonReader
1.2.hasNext()
1.3.readNextRow()
1.4.close()
1.5.support OBS(AK/SK/Endpoint)
1.6 support batch read (withBatch, readNextBatchRow)
1.7 support vector read (default) and carbonrecordreader (withRowRecordReader)
1.8 projection

##supported data types:
 String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
 Array: supported in carbonrecordreader, not supported in vectorreader
 byte: supported in Java RowUtil, not in the C++ carbon reader
 
## Schema and data
 Create table tbl_email_form_to_for_XX( 
Event_Time Timestamp,
Ingestion_Time Timestamp,
From_Email String,
To_Email String,
From_To_type String,
Event_ID String
) using carbon options(path ‘obs://X/tbl_email_form_to_for_XX’)
ETL 6 columns from an 18-column table

example data:
from_email_36550_phillip.al...@enron.com
to_email_36550_stagecoachm...@hotmail.com   from_to 
<29528303.107585557.JavaMail.evans@thyme>   153801549700
975514920

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for the SDK, so that carbon data can be written from C++
##features/interfaces
3.1.create CarbonWriter, including creating the schema (withCsvInput), setting 
outputPath, and build
3.2.write()
3.3.close()
3.4.support OBS(AK/SK/Endpoint)(withHadoopConf)
3.5.writtenBy
3.6.support withTableProperty, withLoadOption, taskNo, 
uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, 
localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)

##Data types:
 Carbon needs to support the base data types, including string, float, 
double, int, long, date, timestamp, bool, array.
 Other types can be converted:
 char array => carbon string
 enum => carbon string
  set and list => carbon array

##performance
Writing Performance is not required now

4. read schema function
readSchema
getVersionDetails  =>TODO

5. support carbonproperties
5.1 addProperty
5.2 getProperty

6.TODO:
6.1.getVersionDetails => JIRA
6.2.update the SDK/CSDK reader doc
6.3.support byte (write and read)
6.4.support long string columns
6.5.support sortBy
6.6.support withCsvInput(Schema schema); create schema (Java)
6.7.optimize the write doc
/**
 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
 */
public static CarbonWriterBuilder builder() {
  return new CarbonWriterBuilder();
}


[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

2018-11-13 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-2951:

Description: 
CSDK: Provide C++ interface for SDK
1. Provide CarbonReader for the SDK, so that carbon data can be read from C++
##features/interfaces
1.1.create CarbonReader
1.2.hasNext()
1.3.readNextRow()
1.4.close()
1.5.support OBS(AK/SK/Endpoint)
1.6 support batch read (withBatch, readNextBatchRow)
1.7 support vector read (default) and carbonrecordreader (withRowRecordReader)
1.8 projection

##supported data types:
 String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
 Array: supported in carbonrecordreader, not supported in vectorreader
 byte: supported in Java RowUtil, not in the C++ carbon reader
 
## Schema and data
 Create table tbl_email_form_to_for_XX( 
Event_Time Timestamp,
Ingestion_Time Timestamp,
From_Email String,
To_Email String,
From_To_type String,
Event_ID String
) using carbon options(path ‘obs://X/tbl_email_form_to_for_XX’)
ETL 6 columns from an 18-column table

example data:
from_email_36550_phillip.al...@enron.com
to_email_36550_stagecoachm...@hotmail.com   from_to 
<29528303.107585557.JavaMail.evans@thyme>   153801549700
975514920

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for the SDK, so that carbon data can be written from C++
##features/interfaces
3.1.create CarbonWriter, including creating the schema (withCsvInput), setting 
outputPath, and build
3.2.write()
3.3.close()
3.4.support OBS(AK/SK/Endpoint)(withHadoopConf)
3.5.writtenBy
3.6.support withTableProperty, withLoadOption, taskNo, 
uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, 
localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)

##Data types:
 Carbon needs to support the base data types, including string, float, 
double, int, long, date, timestamp, bool, array.
 Other types can be converted:
 char array => carbon string
 enum => carbon string
  set and list => carbon array

##performance
Writing Performance is not required now

4. read schema function
readSchema
getVersionDetails  =>TODO

5. support carbonproperties
5.1 addProperty
5.2 getProperty

6.TODO:
6.1.getVersionDetails
6.2.update the SDK/CSDK reader doc
6.3.support byte (write and read)
6.4.support long string columns
6.5.support sortBy
6.6.support withCsvInput(Schema schema); create schema (Java)
6.7.optimize the write doc
/**
 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
 */
public static CarbonWriterBuilder builder() {
  return new CarbonWriterBuilder();
}


[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

2018-11-13 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-2951:

Description: 
CSDK: Provide C++ interface for SDK
1. Provide CarbonReader for the SDK, so that carbon data can be read from C++
##features/interfaces
1.1.create CarbonReader
1.2.hasNext()
1.3.readNextRow()
1.4.close()
1.5.support OBS(AK/SK/Endpoint)
1.6 support batch read (withBatch, readNextBatchRow)
1.7 support vector read (default) and carbonrecordreader (withRowRecordReader)
1.8 projection

##supported data types:
 String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
 Array: supported in carbonrecordreader, not supported in vectorreader
 byte: supported in Java RowUtil, not in the C++ carbon reader
 
## Schema and data
 Create table tbl_email_form_to_for_XX( 
Event_Time Timestamp,
Ingestion_Time Timestamp,
From_Email String,
To_Email String,
From_To_type String,
Event_ID String
) using carbon options(path ‘obs://X/tbl_email_form_to_for_XX’)
ETL 6 columns from an 18-column table

example data:
from_email_36550_phillip.al...@enron.com
to_email_36550_stagecoachm...@hotmail.com   from_to 
<29528303.107585557.JavaMail.evans@thyme>   153801549700
975514920

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for the SDK, so that carbon data can be written from C++
##features/interfaces
3.1.create CarbonWriter, including creating the schema (withCsvInput), setting 
outputPath, and build
3.2.write()
3.3.close()
3.4.support OBS(AK/SK/Endpoint)(withHadoopConf)
3.5.writtenBy
3.6.support withTableProperty, withLoadOption, taskNo, 
uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, 
localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)

##Data types:
 Carbon needs to support the base data types, including string, float, 
double, int, long, date, timestamp, bool, array.
 Other types can be converted:
 char array => carbon string
 enum => carbon string
  set and list => carbon array

##performance
Writing Performance is not required now

4. read schema function
readSchema
getVersionDetails  =>TODO

5. support carbonproperties
5.1 addProperty
5.2 getProperty

6.TODO:
6.1.getVersionDetails
6.2.update the SDK/CSDK reader doc
6.3.support byte (write and read)
6.4.support long string columns
6.5.support sortBy
6.6.support withCsvInput(Schema schema); create schema (Java)
6.7.optimize the write doc
/**
 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
 */
public static CarbonWriterBuilder builder() {
  return new CarbonWriterBuilder();
}

  was:
CSDK: Provide C++ interface for SDK
1.Provide CarbonReader for SDK, it can read carbon data in C++ language
2.Provide CarbonWriter for SDK, it can write carbon data in C++ language


> CSDK: Provide C++ interface for SDK
> ---
>
> Key: CARBONDATA-2951
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
> Project: CarbonData
>  Issue Type: Task
>  Components: other
>Affects Versions: 1.5.0
>Reporter: xubo245
>Assignee: xubo245
>Priority: Critical
> Fix For: NONE
>
>

[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

2018-11-13 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-2951:

Description: 
Some users write their projects in C++ and therefore cannot call the 
CarbonData interface or integrate CarbonData into those projects. We plan to 
provide a C++ interface so that C++ users can integrate CarbonData, including 
reading and writing CarbonData. This will be more convenient for them.

We plan to design and develop it as follows:

1. Provide CarbonReader for the SDK, so that carbon data can be read from C++
##features/interfaces
1.1.create CarbonReader
1.2.hasNext()
1.3.readNextRow()
1.4.close()
1.5.support OBS(AK/SK/Endpoint)
1.6 support batch read (withBatch, readNextBatchRow)
1.7 support vector read (default) and carbonrecordreader (withRowRecordReader)
1.8 projection

##supported data types:
 String, Long, Varchar(string), Short, Int, Date(int), timestamp(long), boolean, Decimal(string), Float
 Array: supported in carbonrecordreader, not supported in vectorreader
 byte: supported in Java RowUtil, not in the C++ carbon reader
 
## Schema and data
 Create table tbl_email_form_to_for_XX( 
Event_Time Timestamp,
Ingestion_Time Timestamp,
From_Email String,
To_Email String,
From_To_type String,
Event_ID String
) using carbon options(path ‘obs://X/tbl_email_form_to_for_XX’)
ETL 6 columns from an 18-column table

example data:
from_email_36550_phillip.al...@enron.com
to_email_36550_stagecoachm...@hotmail.com   from_to 
<29528303.107585557.JavaMail.evans@thyme>   153801549700
975514920

2. The performance should reach X million records/s/node

3. Provide CarbonWriter for the SDK, so that carbon data can be written from C++
##features/interfaces
3.1.create CarbonWriter, including creating the schema (withCsvInput), setting 
outputPath, and build
3.2.write()
3.3.close()
3.4.support OBS(AK/SK/Endpoint)(withHadoopConf)
3.5.writtenBy
3.6.support withTableProperty, withLoadOption, taskNo, 
uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, 
localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)

##Data types:
 Carbon needs to support the base data types, including string, float, 
double, int, long, date, timestamp, bool, array.
 Other types can be converted:
 char array => carbon string
 enum => carbon string
  set and list => carbon array

##performance
Writing Performance is not required now

4. read schema function
readSchema
getVersionDetails  =>TODO

5. support carbonproperties
5.1 addProperty
5.2 getProperty

6.TODO:
6.1.getVersionDetails
6.2.update the SDK/CSDK reader doc
6.3.support byte (write and read)
6.4.support long string columns
6.5.support sortBy
6.6.support withCsvInput(Schema schema); create schema (Java)
6.7.optimize the write doc
/**
 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
 */
public static CarbonWriterBuilder builder() {
  return new CarbonWriterBuilder();
}


[jira] [Updated] (CARBONDATA-2951) CSDK: Provide C++ interface for SDK

2018-10-12 Thread Ravindra Pesala (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala updated CARBONDATA-2951:

Fix Version/s: (was: 1.5.0)
   NONE

> CSDK: Provide C++ interface for SDK
> ---
>
> Key: CARBONDATA-2951
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2951
> Project: CarbonData
>  Issue Type: Task
>  Components: other
>Affects Versions: 1.5.0
>Reporter: xubo245
>Assignee: xubo245
>Priority: Critical
> Fix For: NONE
>
>
> CSDK: Provide C++ interface for SDK
> 1.Provide CarbonReader for SDK, it can read carbon data in C++ language
> 2.Provide CarbonWriter for SDK, it can write carbon data in C++ language



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)