[GitHub] [carbondata] CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data Type

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data 
Type
URL: https://github.com/apache/carbondata/pull/3173#issuecomment-482565909
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11126/
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data Type

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data 
Type
URL: https://github.com/apache/carbondata/pull/3173#issuecomment-482561779
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3097/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3177: [WIP] Distributed index server

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3177: [WIP] Distributed index server
URL: https://github.com/apache/carbondata/pull/3177#issuecomment-482545856
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3095/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data Type

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data 
Type
URL: https://github.com/apache/carbondata/pull/3173#issuecomment-482545340
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2866/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data Type

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3173: [CARBONDATA-3351] Support Binary Data 
Type
URL: https://github.com/apache/carbondata/pull/3173#issuecomment-482542755
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2865/
   




[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary data through Spark.


Approach and Detail:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column (a sketch of this write path follows below).
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.
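As a hedged illustration of 1.1, a minimal Scala sketch of how the proposed SDK write path could look once byte[] is accepted directly. The BINARY field type, paths, and column names are assumptions taken from this proposal, not the current API:

import java.nio.file.{Files, Paths}
import org.apache.carbondata.core.metadata.datatype.DataTypes
import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema}

// Schema with a proposed BINARY column; DataTypes.BINARY is what this
// proposal would add.
val fields = Array(new Field("id", DataTypes.INT), new Field("image", DataTypes.BINARY))
val writer = CarbonWriter.builder()
  .outputPath("/tmp/binary_table")
  .withCsvInput(new Schema(fields))
  .build()
// Read a binary file as byte[] and pass it through unchanged,
// instead of converting every value to a string.
val bytes = Files.readAllBytes(Paths.get("/tmp/sample.jpg"))
writer.write(Array[AnyRef]("1", bytes))
writer.close()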


2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> The Carbon DataSource does not support dictionary-include columns.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support DESC FORMATTED.
=> The Carbon DataSource does not support ALTER TABLE ADD COLUMN SQL.
=> TODO: ALTER TABLE for the binary data type in CarbonSession.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary (TODO).
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future (TODO).
2.11 Support S3.
 
CarbonSession: impact analysis


3. Support reading the binary data type through the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[] (see the reader sketch after this list).
3.2 Support projection of a binary column.
3.3 Support S3.
3.4 No need to support filters.
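A hedged sketch of reading the binary column back through the SDK reader for 3.1/3.2; the table path, temporary table name, and projection list are assumptions:

import org.apache.carbondata.sdk.file.CarbonReader

val reader = CarbonReader.builder("/tmp/binary_table", "_temp")
  .projection(Array("id", "image"))
  .build[AnyRef]()
while (reader.hasNext) {
  val row = reader.readNextRow.asInstanceOf[Array[AnyRef]]
  val image = row(1).asInstanceOf[Array[Byte]] // binary comes back as byte[]
  println(s"id=${row(0)}, bytes=${image.length}")
}
reader.close()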

4. Support writing binary data through Spark (Carbon file format / CarbonSession; POC?).
4.1 Convert the binary data to a string and store it in CSV, encoded as Hex or Base64.
4.2 Spark loads the CSV, converts the string back to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary (a round-trip sketch follows this list).
4.3 Support INSERT (string => binary; a configurable encode/decode algorithm, Hex by default, which the user can change to Base64 or others; is that OK?), UPDATE, and DELETE for binary.
4.4 Do not support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.
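A self-contained sketch of the Hex round trip described in 4.1/4.2, using only the standard library (the encode/decode choice stays configurable per 4.3):

// Encode bytes as a hex string for the CSV, then decode the string back.
def toHex(bytes: Array[Byte]): String =
  bytes.map(b => f"${b & 0xff}%02x").mkString
def fromHex(s: String): Array[Byte] =
  s.grouped(2).map(h => Integer.parseInt(h, 16).toByte).toArray

val original = "binary payload".getBytes("UTF-8")
assert(fromHex(toHex(original)).sameElements(original))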

Formal? How do we support writing binary data read from images via SQL?
Using Spark core code is OK.


 
Mailing list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html


  was:
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is 

[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 
1. Support writing the binary data type through the Carbon Java SDK:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large. => PR2814

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession.
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column (a usage sketch follows this list); the table properties sort_columns, dictionary, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> The Carbon DataSource does not support dictionary-include columns.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support DESC.
=> The Carbon DataSource does not support ALTER TABLE ADD COLUMN SQL.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support S3.
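A minimal sketch of the intended Spark DataSource usage for 2.1/2.2, assuming a SparkSession named spark, the USING carbon syntax, and illustrative table/column names; binary support in CREATE TABLE is exactly what this proposal adds, so treat it as a sketch rather than final syntax:

// Create a table with a binary column and read the column back as byte[].
spark.sql("CREATE TABLE binary_demo (id INT, image BINARY) USING carbon")
spark.sql("INSERT INTO binary_demo SELECT 1, cast('abc' AS BINARY)")
spark.sql("SELECT id, image FROM binary_demo").collect().foreach { row =>
  val bytes = row.getAs[Array[Byte]]("image") // returned as byte[]
  println(s"id=${row.getInt(0)}, size=${bytes.length}")
}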

  was:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.


> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> 1.Supporting write binary data type by Carbon Java SDK:
> 1.1 Java SDK needs support write data with specific data types, like int, 
> double, byte[ ] data type, no need to convert all data type to string array. 
> User read binary file as byte[], then SDK writes byte[] into binary column.   
>  
> 1.2 CarbonData compress binary column because now the compressor is table 
> level.
> =>TODO, support configuration for compress, default is no compress because 
> binary usually is already compressed, like jpg format image. So no need to 
> uncompress for binary column. 1.5.4 will support column level compression, 
> after that, we 

[GitHub] [carbondata] CarbonDataQA commented on issue #3177: [WIP] Distributed index server

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3177: [WIP] Distributed index server
URL: https://github.com/apache/carbondata/pull/3177#issuecomment-482534409
 
 
   Build Failed  with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11124/
   




[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary data through Spark.


Approach and Detail:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.


2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> The Carbon DataSource does not support dictionary-include columns.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
=> The Carbon DataSource does not support ALTER TABLE ADD COLUMN SQL.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.
 
CarbonSession: impact analysis


3. Support reading the binary data type through the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
3.2 Support projection of a binary column.
3.3 Support S3.
3.4 No need to support filters.

4. Support writing binary data through Spark (Carbon file format / CarbonSession; POC?).
4.1 Convert the binary data to a string and store it in CSV, encoded as Hex or Base64.
4.2 Spark loads the CSV, converts the string back to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary.
4.3 Support INSERT (string => binary; a configurable encode/decode algorithm, Hex by default, which the user can change to Base64 or others; is that OK?), UPDATE, and DELETE for binary.
4.4 Do not support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.

Formal? How do we support writing binary data read from images via SQL?
Using Spark core code is OK.


 
Mailing list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html



  was:
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios. 

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary data through Spark.


Approach and Detail:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.


2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> The Carbon DataSource does not support dictionary-include columns.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.
 
CarbonSession: impact analysis


3. Support reading the binary data type through the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
3.2 Support projection of a binary column.
3.3 Support S3.
3.4 No need to support filters.

4. Support writing binary data through Spark (Carbon file format / CarbonSession; POC?).
4.1 Convert the binary data to a string and store it in CSV, encoded as Hex or Base64.
4.2 Spark loads the CSV, converts the string back to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary.
4.3 Support INSERT (string => binary; a configurable encode/decode algorithm, Hex by default, which the user can change to Base64 or others; is that OK?), UPDATE, and DELETE for binary.
4.4 Do not support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.

Formal? How do we support writing binary data read from images via SQL?
Using Spark core code is OK.


 
Mailing list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html



  was:
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 

[jira] [Comment Edited] (CARBONDATA-3001) Propose configurable page size in MB (via carbon property)

2019-04-12 Thread Ajantha Bhat (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816145#comment-16816145
 ] 

Ajantha Bhat edited comment on CARBONDATA-3001 at 4/12/19 10:10 AM:


Future scope:
 # Child tables support? Check about inheritance.
 # Does the store size increase?
 # CLI tool or total log summary.
 # Impact of many pages on page-wise creation of tools.


was (Author: ajantha_bhat):
1. Child tables support? Check about inheritance.
2. Does the store size increase?
3. CLI tool or total log summary.
4. Impact of many pages on page-wise creation of tools.

> Propose configurable page size in MB (via carbon property)
> --
>
> Key: CARBONDATA-3001
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3001
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Minor
> Attachments: Propose configurable page size in MB (via carbon 
> property).pdf
>
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> For better in-memory processing of carbondata pages, I am proposing 
> configurable page size in MB (via carbon property).
> please find the attachment for more details.
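As a hedged illustration only: such a carbon property would presumably be set through CarbonProperties. The property key below is hypothetical; the real key is defined in the PR and the attached document.

import org.apache.carbondata.core.util.CarbonProperties

// Hypothetical property key for illustration; see the PR for the real name.
CarbonProperties.getInstance().addProperty("carbon.page.size.in.mb", "1")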



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-3001) Propose configurable page size in MB (via carbon property)

2019-04-12 Thread Ajantha Bhat (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816145#comment-16816145
 ] 

Ajantha Bhat commented on CARBONDATA-3001:
--

1. Child tables support? Check about inheritance.
2. Does the store size increase?
3. CLI tool or total log summary.
4. Impact of many pages on page-wise creation of tools.

> Propose configurable page size in MB (via carbon property)
> --
>
> Key: CARBONDATA-3001
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3001
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Minor
> Attachments: Propose configurable page size in MB (via carbon 
> property).pdf
>
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> For better in-memory processing of carbondata pages, I am proposing 
> configurable page size in MB (via carbon property).
> please find the attachment for more details.





[GitHub] [carbondata] CarbonDataQA commented on issue #3177: [WIP] Distributed index server

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3177: [WIP] Distributed index server
URL: https://github.com/apache/carbondata/pull/3177#issuecomment-482509856
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2864/
   




[GitHub] [carbondata] kunal642 opened a new pull request #3177: [WIP] Distributed index server

2019-04-12 Thread GitBox
kunal642 opened a new pull request #3177: [WIP] Distributed index server
URL: https://github.com/apache/carbondata/pull/3177
 
 
   Be sure to complete all of the following checklist items to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3176: [CARBONDATA-3353 ]Fixed MinMax Based Pruning for Measure column in case of Legacy store

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3176: [CARBONDATA-3353 ]Fixed  MinMax Based 
Pruning for Measure column in case of Legacy store
URL: https://github.com/apache/carbondata/pull/3176#issuecomment-482496314
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3094/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3176: [CARBONDATA-3353 ]Fixed MinMax Based Pruning for Measure column in case of Legacy store

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3176: [CARBONDATA-3353 ]Fixed  MinMax Based 
Pruning for Measure column in case of Legacy store
URL: https://github.com/apache/carbondata/pull/3176#issuecomment-482488934
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11123/
   




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3176: [CARBONDATA-3353 ]Fixed MinMax Based Pruning for Measure column in case of Legacy store

2019-04-12 Thread GitBox
Indhumathi27 commented on a change in pull request #3176: [CARBONDATA-3353 
]Fixed  MinMax Based Pruning for Measure column in case of Legacy store
URL: https://github.com/apache/carbondata/pull/3176#discussion_r274808907
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeLessThanFilterExecuterImpl.java
 ##
 @@ -120,9 +120,13 @@ private void ifDefaultValueMatchesFilter() {
     boolean isScanRequired = false;
     if (isMeasurePresentInCurrentBlock[0] || isDimensionPresentInCurrentBlock[0]) {
       if (isMeasurePresentInCurrentBlock[0]) {
-        minValue = blockMinValue[measureChunkIndex[0]];
-        isScanRequired =
-            isScanRequired(minValue, msrFilterRangeValues, msrColEvalutorInfoList.get(0).getType());
+        if (isMinMaxSet[measureChunkIndex[0]]) {
+          minValue = blockMinValue[measureChunkIndex[0]];
+          isScanRequired = isScanRequired(minValue, msrFilterRangeValues,
+              msrColEvalutorInfoList.get(0).getType());
+        } else {
+          isScanRequired = true;
+        }
 
 Review comment:
   No




[GitHub] [carbondata] qiuchenjian commented on a change in pull request #3176: [CARBONDATA-3353 ]Fixed MinMax Based Pruning for Measure column in case of Legacy store

2019-04-12 Thread GitBox
qiuchenjian commented on a change in pull request #3176: [CARBONDATA-3353 
]Fixed  MinMax Based Pruning for Measure column in case of Legacy store
URL: https://github.com/apache/carbondata/pull/3176#discussion_r274795583
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeLessThanFilterExecuterImpl.java
 ##
 @@ -120,9 +120,13 @@ private void ifDefaultValueMatchesFilter() {
     boolean isScanRequired = false;
     if (isMeasurePresentInCurrentBlock[0] || isDimensionPresentInCurrentBlock[0]) {
       if (isMeasurePresentInCurrentBlock[0]) {
-        minValue = blockMinValue[measureChunkIndex[0]];
-        isScanRequired =
-            isScanRequired(minValue, msrFilterRangeValues, msrColEvalutorInfoList.get(0).getType());
+        if (isMinMaxSet[measureChunkIndex[0]]) {
+          minValue = blockMinValue[measureChunkIndex[0]];
+          isScanRequired = isScanRequired(minValue, msrFilterRangeValues,
+              msrColEvalutorInfoList.get(0).getType());
+        } else {
+          isScanRequired = true;
+        }
 
 Review comment:
   Does RangeValueFilterExecuterImpl have this issue?




[GitHub] [carbondata] CarbonDataQA commented on issue #3176: [CARBONDATA-3353 ]Fixed MinMax Based Pruning for Measure column in case of Legacy store

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3176: [CARBONDATA-3353 ]Fixed  MinMax Based 
Pruning for Measure column in case of Legacy store
URL: https://github.com/apache/carbondata/pull/3176#issuecomment-482469550
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2863/
   




[jira] [Created] (CARBONDATA-3354) how to use filters in datamaps

2019-04-12 Thread suyash yadav (JIRA)
suyash yadav created CARBONDATA-3354:


 Summary: how to use filters in datamaps
 Key: CARBONDATA-3354
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3354
 Project: CarbonData
  Issue Type: Task
  Components: core
Affects Versions: 1.5.2
 Environment: apache carbon data 1.5.x
Reporter: suyash yadav
 Fix For: NONE


Hi Team,

 

We are doing a POC on Apache CarbonData to verify whether this database can 
handle the amount of data we are collecting from network devices.

 

We are stuck on a few of our datamap-related activities and have the queries below: 

 
 # How do we use time-based filters while creating a datamap? We tried a 
time-based condition while creating a datamap, but it didn't work.
 # How do we create a timeseries datamap on a column that holds epoch-time 
values? Our query is as follows (see the sketch after this list): carbon.sql("CREATE DATAMAP test ON TABLE 
carbon_RT_test USING 'timeseries' DMPROPERTIES 
('event_time'='endMs','minute_granularity'='1',) AS SELECT sum(inOctets) FROM 
carbon_RT_test GROUP BY inIfId")
 # In the query above, endMs holds an epoch-time value.
 # We got the error: "Timeseries event time is only supported on 
Timestamp column"
 # We also need to know whether a time granularity other than 1 is possible; 
for example, in the query above, can we use minute_granularity='5'?
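A hedged sketch of what the reported error message implies should work, reusing the table and metric names from the question: the event_time column must be of TIMESTAMP type, so an epoch column such as endMs would first have to be converted into a timestamp column (assumed here as event_ts), and the AS SELECT should group by that time column:

carbon.sql(
  """CREATE DATAMAP test ON TABLE carbon_RT_test
    |USING 'timeseries'
    |DMPROPERTIES ('event_time'='event_ts', 'minute_granularity'='1')
    |AS SELECT event_ts, SUM(inOctets) FROM carbon_RT_test
    |GROUP BY event_ts""".stripMargin)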





[GitHub] [carbondata] Indhumathi27 opened a new pull request #3176: [CARBONDATA-3353 ]Fix MinMax Pruning for Measure column in case of Legacy store

2019-04-12 Thread GitBox
Indhumathi27 opened a new pull request #3176: [CARBONDATA-3353 ]Fix  MinMax 
Pruning for Measure column in case of Legacy store
URL: https://github.com/apache/carbondata/pull/3176
 
 
   **Why is this PR needed?**
   
   **Problem:**
   For a table created and loaded with a legacy store and having a measure column, 
min was written as max (and vice versa) while building the page min/max, so the 
blocklet-level min/max is wrong. With the current version, a query with a filter 
on a measure column skips some blocks during measure filter pruning and returns 
wrong results.
   
   **Solution:**
   Skip min/max-based pruning for measure columns in the case of a legacy store, as sketched below.
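   A minimal sketch of the pruning decision after this fix (method and parameter names are illustrative, not the actual class members):
   
   // Prune with min/max only when the store recorded it reliably;
   // for a legacy store, fall back to scanning the blocklet.
   def shouldScan(minMaxSet: Boolean, blockMin: Long, filterValue: Long): Boolean =
     if (minMaxSet) blockMin < filterValue // a less-than filter can prune on min
     else true // legacy store: min/max unreliable, always scan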

   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
 Manually tested
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[jira] [Updated] (CARBONDATA-3353) Fix MinMax Pruning for Measure column in case of Legacy store

2019-04-12 Thread Indhumathi Muthumurugesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3353:
-
Description: For tables created and loaded in a legacy store with a measure 
column, when we query the measure column with the current version, the query 
returns wrong results  (was: For tables created and loaded in a legacy store, 
when we query a measure column with the current version, the query returns 
wrong results)

> Fix MinMax Pruning for Measure column in case of Legacy store
> -
>
> Key: CARBONDATA-3353
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3353
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> For tables created and loaded in legacy store with measure column, when we 
> query measure column with current version, query returns wrong results





[jira] [Updated] (CARBONDATA-3353) Fix MinMax Pruning for Measure column in case of Legacy store

2019-04-12 Thread Indhumathi Muthumurugesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3353:
-
Description: For tables created and loaded in a legacy store, when we query a 
measure column with the current version, the query returns wrong results

> Fix MinMax Pruning for Measure column in case of Legacy store
> -
>
> Key: CARBONDATA-3353
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3353
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> For tables created and loaded in legacy store, when we query measure column 
> with current version, query returns wrong results





[jira] [Created] (CARBONDATA-3353) Fix MinMax Pruning for Measure column in case of Legacy store

2019-04-12 Thread Indhumathi Muthumurugesh (JIRA)
Indhumathi Muthumurugesh created CARBONDATA-3353:


 Summary: Fix MinMax Pruning for Measure column in case of Legacy 
store
 Key: CARBONDATA-3353
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3353
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3352) Avro, JSON writer of SDK support binary.

2019-04-12 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3352:
---

 Summary: Avro, JSON writer of SDK support binary.
 Key: CARBONDATA-3352
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3352
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


The Avro and JSON writers of the SDK should support binary.





[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.

  was:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.


> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> 1.Supporting write binary data type by Carbon Java SDK [Formal]:
>   1.1 Java SDK needs support write data with specific data types, 
> like int, 

[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 

1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
>   1.Supporting write binary data type by Carbon Java SDK [Formal]:
>   1.1 Java SDK needs support write data with specific data types, 
> like int, double, byte[ ] data type, no need to convert all data type to 
> string array. User read binary file as byte[], then SDK writes byte[] into 
> binary column.  
>   1.2 CarbonData compress binary column because now the compressor is 
> table level.
>   =>TODO, support configuration for compress, default is no 
> compress because binary usually is already compressed, like jpg format image. 
> So no need to uncompress for binary column. 1.5.4 will support column level 
> compression, after that, we can implement no compress for binary. We can talk 
> with community.
>   1.3 CarbonData stores binary as dimension.
>   1.4 Support configure page size for binary data type because binary 
> data usually is big, such as 200k. Otherwise it will be very big for one 
> blocklet (32000 rows).
> TODO: 1.5 Avro, JSON convert need consider
>   
>   2. Supporting read and manage binary data type by Spark Carbon file 
> format(carbon DataSource) and CarbonSession.[Formal]
>   2.1 Supporting read binary data type from non-transaction table, 
> read binary column and return as byte[]
>   2.2 Support create table with binary column, table property doesn’t 
> support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
> column
>   => Evaluate COLUMN_META_CACHE for binary
> => carbon.column.compressor for all columns
>   2.3 Support CTAS for binary=> transaction/non-transaction
>   2.4 Support external table for binary
>   2.5 Support projection for binary column
>   2.6 Support show table, desc, ALTER TABLE for binary data type
>   2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
>   2.8 Support compaction for binary
>   2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
>  no need min max datamap for binary, support mv and pre-aggregate in the 
> future
>   2.10 CSDK / python SDK support binary in the future.
>   2.11 Support S3




[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary data through Spark.


Approach and Detail:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.


2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.
 
CarbonSession: impact analysis


3. Support reading the binary data type through the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
3.2 Support projection of a binary column.
3.3 Support S3.
3.4 No need to support filters.

4. Support writing binary data through Spark (Carbon file format / CarbonSession; POC?).
4.1 Convert the binary data to a string and store it in CSV, encoded as Hex or Base64.
4.2 Spark loads the CSV, converts the string back to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary.
4.3 Support INSERT (string => binary; a configurable encode/decode algorithm, Hex by default, which the user can change to Base64 or others; is that OK?), UPDATE, and DELETE for binary.
4.4 Do not support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.

Formal? How do we support writing binary data read from images via SQL?
Using Spark core code is OK.


 
Mailing list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html



  was:
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. The majority of 

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK. [Formal]
2. Support reading the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary data through Spark.


Approach and Detail:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.
1.6

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.
 
CarbonSession: impact analysis


3. Support reading the binary data type through the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
3.2 Support projection of a binary column.
3.3 Support S3.
3.4 No need to support filters.

4. Support writing binary data through Spark (Carbon file format / CarbonSession; POC?).
4.1 Convert the binary data to a string and store it in CSV, encoded as Hex or Base64.
4.2 Spark loads the CSV, converts the string back to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary.
4.3 Support INSERT (string => binary; a configurable encode/decode algorithm, Hex by default, which the user can change to Base64 or others; is that OK?), UPDATE, and DELETE for binary.
4.4 Do not support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.

Formal? How do we support writing binary data read from images via SQL?
Using Spark core code is OK.


 
Mailing list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html



  was:
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset has lots 

[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Issue Type: Sub-task  (was: Task)
Parent: CARBONDATA-3336

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>






[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK. [Formal]
2. Support reading the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary data through Spark.


Approach and Detail:
1. Support writing the binary data type through the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], and the SDK writes the byte[] into the binary column.
1.2 CarbonData currently compresses the binary column because the compressor works at table level.
=> TODO: support a compression configuration, with no compression as the default, because binary data is usually already compressed (for example, JPEG images), so there is no need to compress the binary column again. Version 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support a configurable page size for the binary data type, because binary values are usually large (for example, 200 KB); otherwise a single blocklet (32,000 rows) becomes very large.
TODO: 1.5 Avro and JSON conversion need to be considered.
1.6

2. Support reading and managing the binary data type through the Spark Carbon file format (Carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column.
=> Evaluate COLUMN_META_CACHE for binary.
=> carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support SHOW TABLE, DESC, and ALTER TABLE for the binary data type.
2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support compaction for binary.
2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; no min/max datamap is needed for binary; support MV and pre-aggregate in the future.
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3.
 
CarbonSession: impact analysis


3. Support reading the binary data type through the Carbon SDK.
3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[].
3.2 Support projection of a binary column.
3.3 Support S3.
3.4 No need to support filters.

4. Support writing binary data through Spark (Carbon file format / CarbonSession; POC?).
4.1 Convert the binary data to a string and store it in CSV, encoded as Hex or Base64.
4.2 Spark loads the CSV, converts the string back to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary.
4.3 Support INSERT (string => binary; a configurable encode/decode algorithm, Hex by default, which the user can change to Base64 or others; is that OK?), UPDATE, and DELETE for binary.
4.4 Do not support streaming tables.
=> Refer to Hive and the Spark 2.4 image DataSource.

Formal? How do we support writing binary data read from images via SQL?
Using Spark core code is OK.


 
Mailing list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html



  was:
CarbonData supports the binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so CarbonData should support it. Downloading data from S3 is slow when 

[GitHub] [carbondata] CarbonDataQA commented on issue #3173: [WIP][CARBONDATA-3336] Support Binary Data Type

2019-04-12 Thread GitBox
CarbonDataQA commented on issue #3173:  [WIP][CARBONDATA-3336] Support Binary 
Data Type
URL: https://github.com/apache/carbondata/pull/3173#issuecomment-482449111
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3093/
   

