[jira] [Created] (CARBONDATA-3470) Upgrade arrow version

2019-07-15 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3470:
---

 Summary: Upgrade arrow version
 Key: CARBONDATA-3470
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3470
 Project: CarbonData
  Issue Type: Improvement
Reporter: xubo245
Assignee: xubo245


Upgrade arrow version



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (CARBONDATA-3443) Update hive guide with Read from hive

2019-07-05 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3443.
-
Resolution: Fixed

> Update hive guide with Read from hive
> -
>
> Key: CARBONDATA-3443
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3443
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: dhatchayani
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-3461) Carbon SDK support filter values set.

2019-07-03 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3461:
---

 Summary: Carbon SDK support filter values set.
 Key: CARBONDATA-3461
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3461
 Project: CarbonData
  Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245


Carbon SDK support filter values set.





[jira] [Closed] (CARBONDATA-3398) Implement Show Cache for IndexServer and MV

2019-06-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 closed CARBONDATA-3398.
---

> Implement Show Cache for IndexServer and MV
> ---
>
> Key: CARBONDATA-3398
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3398
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Kunal Kapoor
>Assignee: Kunal Kapoor
>Priority: Major
> Fix For: 1.6.0
>
>  Time Spent: 25h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3412) For Non-transactional tables empty results are displayed with index server enabled

2019-06-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3412.
-
Resolution: Fixed

> For Non-transactional tables empty results are displayed with index server 
> enabled
> --
>
> Key: CARBONDATA-3412
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3412
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Kunal Kapoor
>Assignee: Kunal Kapoor
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3446) Support read schema of complex data type from carbon file folder path

2019-06-19 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3446:
---

 Summary: Support read schema of complex data type from carbon file 
folder path 
 Key: CARBONDATA-3446
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3446
 Project: CarbonData
  Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245


Support read schema of complex data type from carbon file folder path 





[jira] [Resolved] (CARBONDATA-3425) Add Documentation for MV datamap

2019-06-19 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3425.
-
Resolution: Fixed

> Add Documentation for MV datamap
> 
>
> Key: CARBONDATA-3425
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3425
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
>  Time Spent: 8h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3415) Merge index is not working for partition table. Merge index for partition table is taking significantly longer time than normal table.

2019-06-17 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3415.
-
Resolution: Fixed

> Merge index is not working for partition table. Merge index for partition 
> table is taking significantly longer time than normal table.
> --
>
> Key: CARBONDATA-3415
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3415
> Project: CarbonData
>  Issue Type: Bug
>Reporter: dhatchayani
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Issues:
> (1) Merge index is not working on partition table
> (2) Time taken for merge index is significantly more than the normal carbon 
> table





[jira] [Resolved] (CARBONDATA-3434) Fix Data Mismatch between MainTable and MV DataMap table during compaction

2019-06-17 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3434.
-
Resolution: Fixed

> Fix Data Mismatch between MainTable and MV DataMap table during compaction
> --
>
> Key: CARBONDATA-3434
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3434
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3258) Add more test case for mv datamap

2019-06-11 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3258.
-
Resolution: Fixed

> Add more test case for mv datamap
> -
>
> Key: CARBONDATA-3258
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3258
> Project: CarbonData
>  Issue Type: Test
>  Components: data-query
>Reporter: Chenjian Qiu
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Add more test case for mv datamap





[jira] [Resolved] (CARBONDATA-3414) when Insert into partition table fails exception doesn't print reason.

2019-06-11 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3414.
-
Resolution: Fixed

> when Insert into partition table fails exception doesn't print reason.
> --
>
> Key: CARBONDATA-3414
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3414
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Priority: Minor
>
> problem: When an insert into a partition table fails, the exception doesn't 
> print the reason.
> cause: The exception was caught, but the error message was not taken from 
> that exception.
> solution: Throw the exception directly.
>  
> Steps to reproduce: 
>  # Open multiple Spark beeline sessions (say 10)
>  # Create a carbon table with a partition
>  # Run insert overwrite into the carbon table from all 10 beeline sessions 
> concurrently
>  # Some insert overwrites will succeed and some will fail due to 
> non-availability of the lock even after retry
>  # For the failed insert SQL, the exception is just "DataLoadFailure: "; no 
> error reason is printed.
> The valid error reason for the failure needs to be printed.
>  
>  
>  
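The cause and solution above amount to a standard exception-propagation pattern. The sketch below is plain Java with hypothetical class and method names, not CarbonData's actual load code; it only illustrates why the original cause must travel with the rethrown exception.

```java
// Illustrative sketch of the fix for the issue above: keep the original
// failure cause instead of replacing it with a bare "DataLoadFailure: "
// message. Names are hypothetical, not CarbonData's actual classes.
public class LoadErrorPropagation {

    static class DataLoadException extends RuntimeException {
        DataLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Before the fix: the caught exception's message and cause are dropped.
    static RuntimeException wrapLossy(Exception caught) {
        return new DataLoadException("DataLoadFailure: ", null);
    }

    // After the fix: the real reason (e.g. a lock failure) is propagated.
    static RuntimeException wrapWithCause(Exception caught) {
        return new DataLoadException("DataLoadFailure: " + caught.getMessage(), caught);
    }
}
```

With the second form, the beeline client sees the lock-acquisition failure text instead of an empty "DataLoadFailure: ".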





[jira] [Resolved] (CARBONDATA-3411) ClearDatamaps logs an exception in SDK

2019-06-11 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3411.
-
Resolution: Fixed

> ClearDatamaps logs an exception in SDK
> --
>
> Key: CARBONDATA-3411
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3411
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Priority: Minor
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> problem: In the SDK, when datamaps are cleared, the below exception is logged:
> java.io.IOException: File does not exist: 
> /home/root1/Documents/ab/workspace/carbonFile/carbondata/store/sdk/testWriteFiles/771604793030370/Metadata/schema
>  at 
> org.apache.carbondata.core.metadata.schema.SchemaReader.readCarbonTableFromStore(SchemaReader.java:60)
>  at 
> org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildFromTablePath(CarbonTable.java:272)
>  at 
> org.apache.carbondata.core.datamap.DataMapStoreManager.getCarbonTable(DataMapStoreManager.java:566)
>  at 
> org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:514)
>  at 
> org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:504)
>  at 
> org.apache.carbondata.sdk.file.CarbonReaderBuilder.getSplits(CarbonReaderBuilder.java:419)
>  at 
> org.apache.carbondata.sdk.file.CarbonReaderTest.testGetSplits(CarbonReaderTest.java:2605)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at junit.framework.TestCase.runTest(TestCase.java:176)
>  at junit.framework.TestCase.runBare(TestCase.java:141)
>  at junit.framework.TestResult$1.protect(TestResult.java:122)
>  at junit.framework.TestResult.runProtected(TestResult.java:142)
>  at junit.framework.TestResult.run(TestResult.java:125)
>  at junit.framework.TestCase.run(TestCase.java:129)
>  at junit.framework.TestSuite.runTest(TestSuite.java:255)
>  at junit.framework.TestSuite.run(TestSuite.java:250)
>  at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
>  at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>  at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>  at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> cause: A CarbonTable is required only for launching a job, and the SDK does 
> not need to launch a job, so there is no need to build a CarbonTable.
> solution: Build the CarbonTable only when a job needs to be launched.
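The solution can be sketched as lazy initialization: the expensive table build (the schema read that produced the "File does not exist" error) runs only on the job-launch path, never on the SDK path. Names below are illustrative, not CarbonData's real API.

```java
import java.util.function.Supplier;

// Sketch of the fix: defer the expensive CarbonTable build until a job is
// actually launched, so SDK reads (which launch no job) never trigger the
// Metadata/schema read. Names are illustrative, not CarbonData's real API.
public class LazyTableHolder {
    private final Supplier<String> builder; // stands in for the schema read
    private String table;
    int buildCount = 0; // exposed only to make the example observable

    LazyTableHolder(Supplier<String> builder) {
        this.builder = builder;
    }

    // SDK path: clearing datamaps must not touch table metadata.
    void clearDataMaps() {
        // intentionally no table build here
    }

    // Job-launch path: build the table once, on first use.
    String tableForJob() {
        if (table == null) {
            buildCount++;
            table = builder.get();
        }
        return table;
    }
}
```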





[jira] [Created] (CARBONDATA-3424) There are improper exception when query with avg(substr(binary data type)).

2019-06-10 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3424:
---

 Summary: There are improper exception when query with 
avg(substr(binary data type)).
 Key: CARBONDATA-3424
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3424
 Project: CarbonData
  Issue Type: Bug
Reporter: xubo245
Assignee: xubo245


Code:
{code:java}
CREATE TABLE uniqdata (CUST_ID int,CUST_NAME binary,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('table_blocksize'='2000');
LOAD DATA inpath 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',' 
,'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
{code}

A select query using the average function on a substring of the binary column is executed:

{code:java}
select 
max(substr(CUST_NAME,1,2)),min(substr(CUST_NAME,1,2)),avg(substr(CUST_NAME,1,2)),count(substr(CUST_NAME,1,2)),sum(substr(CUST_NAME,1,2)),variance(substr(CUST_NAME,1,2))
 from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 
=1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458 or Double_COLUMN1 = 
1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

select 
max(substring(CUST_NAME,1,2)),min(substring(CUST_NAME,1,2)),avg(substring(CUST_NAME,1,2)),count(substring(CUST_NAME,1,2)),sum(substring(CUST_NAME,1,2)),variance(substring(CUST_NAME,1,2))
 from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 
=1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458 or Double_COLUMN1 = 
1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;
{code}

Improper exception:
{code:java}
"Invalid call to name on unresolved object, tree: 
unresolvedalias(avg(substring(CUST_NAME#73, 1, 2)), None)" did not contain 
"cannot resolve 'avg(substring(uniqdata.`CUST_NAME`, 1, 2))' due to data type 
mismatch: function average requires numeric types, not BinaryType"
ScalaTestFailureLocation: 
org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27
 at (TestBinaryDataType.scala:1410)
org.scalatest.exceptions.TestFailedException: "Invalid call to name on 
unresolved object, tree: unresolvedalias(avg(substring(CUST_NAME#73, 1, 2)), 
None)" did not contain "cannot resolve 'avg(substring(uniqdata.`CUST_NAME`, 1, 
2))' due to data type mismatch: function average requires numeric types, not 
BinaryType"
at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at 
org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27.apply$mcV$sp(TestBinaryDataType.scala:1410)
at 
org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27.apply(TestBinaryDataType.scala:1352)
at 
org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27.apply(TestBinaryDataType.scala:1352)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at 
org.apache.spark.sql.test.util.CarbonFunSuite.withFixture(CarbonFunSuite.scala:41)
at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.Su

[jira] [Created] (CARBONDATA-3423) Validate dictionary for binary data type

2019-06-10 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3423:
---

 Summary: Validate dictionary for binary data type
 Key: CARBONDATA-3423
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3423
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245








[jira] [Commented] (CARBONDATA-3410) Add UDF, Hex/Base64 SQL functions for binary

2019-05-31 Thread xubo245 (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852972#comment-16852972
 ] 

xubo245 commented on CARBONDATA-3410:
-

CREATE TABLE uniqdata (CUST_ID int,CUST_NAME binary,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('table_blocksize'='2000');
LOAD DATA inpath 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',' 
,'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

A select query using the average function on a substring of the binary column is executed:

select 
max(substr(CUST_NAME,1,2)),min(substr(CUST_NAME,1,2)),avg(substr(CUST_NAME,1,2)),count(substr(CUST_NAME,1,2)),sum(substr(CUST_NAME,1,2)),variance(substr(CUST_NAME,1,2))
 from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 
=1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458 or Double_COLUMN1 = 
1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

select 
max(substring(CUST_NAME,1,2)),min(substring(CUST_NAME,1,2)),avg(substring(CUST_NAME,1,2)),count(substring(CUST_NAME,1,2)),sum(substring(CUST_NAME,1,2)),variance(substring(CUST_NAME,1,2))
 from uniqdata where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 
=1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458 or Double_COLUMN1 = 
1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

> Add UDF, Hex/Base64 SQL functions for binary
> 
>
> Key: CARBONDATA-3410
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3410
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> Add UDF, Hex/Base64 SQL functions for binary
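What hex and base64 functions over a binary column produce can be illustrated with plain JDK encoding. This is a conceptual sketch of the output format only, not the CarbonData UDF implementation this issue tracks.

```java
import java.util.Base64;

// Conceptual sketch of what hex/base64 functions over a binary column
// return: a text-safe rendering of raw bytes. Plain JDK code, not the
// CarbonData UDF implementation.
public class BinaryEncodings {

    // hex(bin): two hex digits per byte
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // base64(bin): 4 output characters per 3 input bytes
    static String base64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }
}
```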





[jira] [Created] (CARBONDATA-3410) Add UDF, Hex/Base64 SQL functions for binary

2019-05-31 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3410:
---

 Summary: Add UDF, Hex/Base64 SQL functions for binary
 Key: CARBONDATA-3410
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3410
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


Add UDF, Hex/Base64 SQL functions for binary





[jira] [Resolved] (CARBONDATA-3351) Support Binary Data Type

2019-05-31 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3351.
-
Resolution: Fixed

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>  Time Spent: 35h
>  Remaining Estimate: 0h
>
> Background:
> Binary is a basic data type and is widely used in various scenarios, so it 
> is better to support the binary data type in CarbonData. Downloading data 
> from S3 is slow when a dataset contains lots of small binary values. The 
> majority of application scenarios involve storing small binary values in 
> CarbonData, which avoids the small-files problem, speeds up S3 access, and 
> decreases the cost of accessing OBS by reducing the number of S3 API calls. 
> Storing structured and unstructured (binary) data together in CarbonData 
> also makes them easier to manage. 
> Goals:
> 1. Supporting write binary data type by Carbon Java SDK.
> 2. Supporting read binary data type by Spark Carbon file format(carbon 
> datasource) and CarbonSession.
> 3. Supporting read binary data type by Carbon SDK
> 4. Supporting write binary by spark
> Approach and Detail:
>   1.Supporting write binary data type by Carbon Java SDK [Formal]:
>   1.1 Java SDK needs support write data with specific data types, 
> like int, double, byte[ ] data type, no need to convert all data type to 
> string array. User read binary file as byte[], then SDK writes byte[] into 
> binary column.=>Done
>   1.2 CarbonData compress binary column because now the compressor is 
> table level.=>Done
>   1.3 CarbonData stores binary as dimension. => Done
>   1.4 Support configure page size for binary data type because binary 
> data usually is big, such as 200k. Otherwise it will be very big for one 
> blocklet (32000 rows). =>Done
>   1.5 Avro, JSON convert need consider
>   •   AVRO fixed and variable length binary can be supported
>   => Avro don't support binary data type => No 
> need
>Support read binary from JSON  => done.
>   1.6 Binary data type as a child column in Struct, Map
>   => support it in the future, but the priority is not very 
> high; not in 1.5.4
>   1.7 Verify the maximum size of binary value supported 
> => snappy only supports about 1.71 GB; the max data size should be 
> 2 GB, but this needs confirmation
>   
>   2. Supporting read and manage binary data type by Spark Carbon file 
> format(carbon DataSource) and CarbonSession.[Formal]
>   2.1 Supporting read binary data type from non-transaction table, 
> read binary column and return as byte[] =>Done
>   2.2 Support create table with binary column, table property doesn’t 
> support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
> column => Done
>=> CARBON Datasource don't support dictionary include column
>=>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
> compress is for all columns(table level)
>   2.3 Support CTAS for binary=> transaction/non-transaction,  
> Carbon/Hive/Parquet => Done 
>   2.4 Support external table for binary=> Done
>   2.5 Support projection for binary column=> Done
>   2.6 Support desc formatted=> Done
>=> Carbon Datasource don't support  ALTER TABLE add 
> columns sql
>support  ALTER TABLE for(add column, rename, drop column) 
> binary data type in carbon session=> Done
>Don't support change the data type for binary by alter 
> table => Done
>   2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
>   2.8 Support compaction for binary=> Done
>   2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
>  no need min max datamap for binary, support mv and pre-aggregate in the 
> future=> TODO
>   2.10 CSDK / python SDK support binary in the future.=> TODO
>   2.11 Support S3=> Done
> 2.12 support UDF, hex, base64, cast:.=> TODO
>select hex(bin) from carbon_table..=> TODO
> 
> 2.15 support filte
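The page-size concern in item 1.4 above can be checked with simple arithmetic. The sketch below is plain Java under the description's own assumption of ~200 KB per binary value and 32,000 rows per blocklet; the class and method names are illustrative only.

```java
// Rough arithmetic behind item 1.4: with the default 32,000 rows per
// blocklet and an assumed ~200 KB per binary value, one blocklet would
// hold about 6.5 GB, which is why a configurable page size is needed
// for binary columns. Names here are illustrative, not CarbonData code.
public class PageSizeMath {

    // total bytes a blocklet would hold
    static long blockletBytes(long rows, long bytesPerValue) {
        return rows * bytesPerValue;
    }

    // rows that fit within a target page size
    static long rowsForTarget(long targetBytes, long bytesPerValue) {
        return Math.max(1, targetBytes / bytesPerValue);
    }
}
```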

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-05-31 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version   Changes                                     Owner   Date
0.1       Init doc for supporting binary data type    Xubo    2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it is 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset contains lots of small binary values. The majority of 
application scenarios involve storing small binary values in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and decreases the cost of 
accessing OBS by reducing the number of S3 API calls. Storing structured and 
unstructured (binary) data together in CarbonData also makes them easier to 
manage. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.=>Done
1.2 CarbonData compress binary column because now the compressor is 
table level.=>Done
=> TODO: support configuration for compress and no-compress; 
default to no compress, because binary data is usually already compressed 
(for example JPG images), so there is no need to compress the binary column. 
1.5.4 will support column-level compression; after that, we can implement 
no-compress for binary. We can discuss this with the community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro, JSON convert need consider
•   AVRO fixed and variable length binary can be supported
=> Avro don't support binary data type => No 
need
 Support read binary from JSON  => done.
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of binary value supported 
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, 
but this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support BUCKETCOLUMNS for binary => Done
2.8 Support compaction for binary=> Done
2.9 datamap
Support bloomfilter,mv and pre-aggregate
Don’t support lucene, timeseries datamap,  no need min max 
datamap for binary
=>Done
2.10 CSDK / python SDK support binary in the future.=> TODO, python 
sdk already merge to pycarbon
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO
  
2.13 support configurable decode for query, support base64 and Hex 
decode.=> Done
2.15 How big data size binary data type can support for writing and 
reading?=> TODO
2.16 

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-05-31 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version   Changes                                     Owner   Date
0.1       Init doc for supporting binary data type    Xubo    2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it is 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset contains lots of small binary values. The majority of 
application scenarios involve storing small binary values in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and decreases the cost of 
accessing OBS by reducing the number of S3 API calls. Storing structured and 
unstructured (binary) data together in CarbonData also makes them easier to 
manage. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.=>Done
1.2 CarbonData compress binary column because now the compressor is 
table level.=>Done
=> TODO: support configuration for compress and no-compress; 
default to no compress, because binary data is usually already compressed 
(for example JPG images), so there is no need to compress the binary column. 
1.5.4 will support column-level compression; after that, we can implement 
no-compress for binary. We can discuss this with the community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro, JSON convert need consider
•   AVRO fixed and variable length binary can be supported
=> Avro don't support binary data type => No 
need
 Support read binary from JSON  => done.
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of binary value supported 
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, 
but this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
2.8 Support compaction for binary=> Done
2.9 datamap
Support bloomfilter,mv and pre-aggregate
Don’t support lucene, timeseries datamap,  no need min max 
datamap for binary
=>Done
2.10 CSDK / python SDK support binary in the future.=> TODO, python 
sdk already merge to pycarbon
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO
  
2.13 support configurable decode for query, support base64 and Hex 
decode.=> Done
2.15 How big data size binary data type can support for writing and 
reading?=> TOD

[jira] [Created] (CARBONDATA-3408) CarbonSession partition support binary data type

2019-05-31 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3408:
---

 Summary: CarbonSession partition support binary data type
 Key: CARBONDATA-3408
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3408
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


CarbonSession partition support binary data type



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-3366) Support SDK reader to read blocklet level split

2019-05-20 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3366.
-
Resolution: Fixed

> Support SDK reader to read blocklet level split
> ---
>
> Key: CARBONDATA-3366
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3366
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> To provide more flexibility in SDK reader, blocklet level read support for 
> carbondata files from SDK reader is required.
> With this, SDK reader can be used in distributed environment or in 
> multithread environment by creating carbon readers in each worker at split 
> level (blocklet split)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-05-09 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                   Owner  Date
0.1      Init doc for supporting binary data type  Xubo   2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it's 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset has lots of small binary files. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and reduces the cost of 
accessing OBS by reducing the number of S3 API calls. It also becomes easier to 
manage structured and unstructured (binary) data by storing them together in 
CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1. Supporting write of the binary data type by the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, 
like int, double and byte[], with no need to convert every data type to a string 
array. Users read a binary file as byte[], then the SDK writes the byte[] into 
the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is 
currently table level. => Done
=> TODO: support a configuration to enable or disable compression, 
with no compression as the default, because binary data is usually already 
compressed (e.g. JPG images), so there is no need to compress the binary column. 
1.5.4 will support column-level compression; after that, we can implement 
no-compression for binary. We can discuss it with the community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro and JSON conversion need to be considered
•   AVRO fixed- and variable-length binary can be supported
=> Avro doesn't support the binary data type => No need
 Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of the binary value supported
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, but 
this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource doesn't support dictionary include columns
   => Support carbon.column.compressor = snappy, zstd or gzip for binary; 
compression applies to all columns (table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource doesn't support ALTER TABLE add columns SQL
   Support ALTER TABLE (add column, rename, drop column) for the binary 
data type in CarbonSession => Done
   Changing the data type of a binary column via ALTER TABLE is not supported 
=> Done
2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
2.8 Support compaction for binary=> Done
2.9 datamap: support bloomfilter, mv and pre-aggregate; don’t support 
lucene or timeseries datamap; no need for a min/max datamap for binary => Done
2.10 CSDK / Python SDK support binary in the future => TODO
2.11 Support S3 => Done
2.12 Support UDFs hex, base64, cast => TODO
   select hex(bin) from carbon_table => TODO
2.13 Support configurable decode for query, supporting base64 and Hex 
decode => Done
2.15 How large a binary value can be supported for writing and 
reading? => TODO
2.16 support f
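The motivation in item 1.4 for a configurable page size can be checked with quick arithmetic: at the default 32,000 rows per blocklet, even the 200 KB average value size used as the example above inflates a single blocklet to several gigabytes:

```python
rows_per_blocklet = 32_000          # default CarbonData blocklet row count
avg_binary_size = 200 * 1024        # 200 KB per value, as in the example above

blocklet_bytes = rows_per_blocklet * avg_binary_size
print(blocklet_bytes / 2**30)       # ~6.1 GiB for a single blocklet
```

A blocklet that large defeats the purpose of blocklet-level pruning and I/O, which is why the page size needs to shrink when binary values are big.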

[jira] [Created] (CARBONDATA-3374) Optimize documentation and fix some spell errors.

2019-05-07 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3374:
---

 Summary: Optimize documentation and fix some spell errors.
 Key: CARBONDATA-3374
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3374
 Project: CarbonData
  Issue Type: Improvement
Reporter: xubo245
Assignee: xubo245


Optimize documentation and fix some spell errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-3363) SDK supports read data from carbondata filelist

2019-04-29 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3363:
---

 Summary: SDK supports read data from carbondata filelist
 Key: CARBONDATA-3363
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3363
 Project: CarbonData
  Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245


SDK supports read data from carbondata filelist



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                   Owner  Date
0.1      Init doc for supporting binary data type  Xubo   2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it's 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset has lots of small binary files. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and reduces the cost of 
accessing OBS by reducing the number of S3 API calls. It also becomes easier to 
manage structured and unstructured (binary) data by storing them together in 
CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1. Supporting write of the binary data type by the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, 
like int, double and byte[], with no need to convert every data type to a string 
array. Users read a binary file as byte[], then the SDK writes the byte[] into 
the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is 
currently table level. => Done
=> TODO: support a configuration for compression, defaulting to no 
compression, because binary data is usually already compressed (e.g. JPG 
images), so there is no need to compress the binary column. 1.5.4 will support 
column-level compression; after that, we can implement no-compression for 
binary. We can discuss it with the community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro and JSON conversion need to be considered
•   AVRO fixed- and variable-length binary can be supported
=> Avro doesn't support the binary data type => No need
 Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of the binary value supported
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, but 
this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
2.8 Support compaction for binary=> Done
2.9 datamap
Support bloomfilter,mv and pre-aggregate
Don’t support lucene, timeseries datamap,  no need min max 
datamap for binary
=>Done
2.10 CSDK / python SDK support binary in the future.=> TODO
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO
  
2.13 support configurable decode for query, support base64 and Hex 
decode.=> Done
2.14 Proper Error message for not supported features like SI=> TODO
2.15 How big data size binary data type can support for wri

[jira] [Updated] (CARBONDATA-3358) Support configurable decode for loading binary data, support base64 and Hex decode.

2019-04-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3358:

Description: 
Support configurable decode for loading binary data, support base64 and Hex 
decode.
1. support configurable decode for loading
2. test datamap
3. test datamap and configurable decode

  was:Support configurable decode for loading binary data, support base64 and 
Hex decode.


> Support configurable decode for loading binary data, support base64 and Hex 
> decode.
> ---
>
> Key: CARBONDATA-3358
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3358
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> Support configurable decode for loading binary data, support base64 and Hex 
> decode.
> 1. support configurable decode for loading
> 2. test datamap
> 3. test datamap and configurable decode



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
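The configurable decode that CARBONDATA-3358 describes can be sketched outside CarbonData as a small dispatcher: the loader receives a string-encoded binary cell plus a decoder name and restores the raw bytes. Function and option names here are illustrative, not CarbonData's actual API:

```python
import base64
import binascii

def decode_binary(value: str, decoder: str) -> bytes:
    """Decode a string-encoded binary cell; decoder is 'base64' or 'hex'.
    (Hypothetical helper illustrating the configurable-decode idea.)"""
    if decoder == "base64":
        return base64.b64decode(value)
    if decoder == "hex":
        return binascii.unhexlify(value)
    raise ValueError(f"unsupported binary decoder: {decoder}")

print(decode_binary("SGVsbG8=", "base64"))  # b'Hello'
print(decode_binary("48656c6c6f", "hex"))   # b'Hello'
```

Making the decoder a per-load option (rather than hard-coding one encoding) is what lets the same binary column accept base64 input from one pipeline and hex input from another.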


[jira] [Updated] (CARBONDATA-3358) Support configurable decode for loading binary data, support base64 and Hex decode.

2019-04-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3358:

Description: 
Support configurable decode for loading binary data, support base64 and Hex 
decode.
1. support configurable decode for loading
2. test datamap: mv, preaggregate, timeseries, bloomfilter, lucene
3. test datamap and configurable decode

  was:
Support configurable decode for loading binary data, support base64 and Hex 
decode.
1. support configurable decode for loading
2. test datamap
3. test datamap and configurable decode


> Support configurable decode for loading binary data, support base64 and Hex 
> decode.
> ---
>
> Key: CARBONDATA-3358
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3358
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> Support configurable decode for loading binary data, support base64 and Hex 
> decode.
> 1. support configurable decode for loading
> 2. test datamap: mv, preaggregate, timeseries, bloomfilter, lucene
> 3. test datamap and configurable decode



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                   Owner  Date
0.1      Init doc for supporting binary data type  Xubo   2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it's 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset has lots of small binary files. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and reduces the cost of 
accessing OBS by reducing the number of S3 API calls. It also becomes easier to 
manage structured and unstructured (binary) data by storing them together in 
CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1. Supporting write of the binary data type by the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, 
like int, double and byte[], with no need to convert every data type to a string 
array. Users read a binary file as byte[], then the SDK writes the byte[] into 
the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is 
currently table level. => Done
=> TODO: support a configuration for compression, defaulting to no 
compression, because binary data is usually already compressed (e.g. JPG 
images), so there is no need to compress the binary column. 1.5.4 will support 
column-level compression; after that, we can implement no-compression for 
binary. We can discuss it with the community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro and JSON conversion need to be considered
•   AVRO fixed- and variable-length binary can be supported
=> Avro doesn't support the binary data type => No need
 Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of the binary value supported
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, but 
this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
2.8 Support compaction for binary=> Done
2.9 datamap
Support bloomfilter,mv and pre-aggregate
Don’t support lucene, timeseries datamap,  no need min max 
datamap for binary
=>Done
2.10 CSDK / python SDK support binary in the future.=> TODO
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO
  
2.13 support configurable decode for query, support base64 and Hex 
decode.=> Done
2.14 Proper Error message for not supported features like SI=> TODO
2.15 How big data size binary data type can support for wri

[jira] [Created] (CARBONDATA-3358) Support configurable decode for loading binary data, support base64 and Hex decode.

2019-04-24 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3358:
---

 Summary: Support configurable decode for loading binary data, 
support base64 and Hex decode.
 Key: CARBONDATA-3358
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3358
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


Support configurable decode for loading binary data, support base64 and Hex 
decode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                   Owner  Date
0.1      Init doc for supporting binary data type  Xubo   2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it's 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset has lots of small binary files. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and reduces the cost of 
accessing OBS by reducing the number of S3 API calls. It also becomes easier to 
manage structured and unstructured (binary) data by storing them together in 
CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1. Supporting write of the binary data type by the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, 
like int, double and byte[], with no need to convert every data type to a string 
array. Users read a binary file as byte[], then the SDK writes the byte[] into 
the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is 
currently table level. => Done
=> TODO: support a configuration for compression, defaulting to no 
compression, because binary data is usually already compressed (e.g. JPG 
images), so there is no need to compress the binary column. 1.5.4 will support 
column-level compression; after that, we can implement no-compression for 
binary. We can discuss it with the community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro and JSON conversion need to be considered
•   AVRO fixed- and variable-length binary can be supported
=> Avro doesn't support the binary data type => No need
 Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of the binary value supported
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, but 
this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
2.8 Support compaction for binary=> Done
2.9 datamap
Support bloomfilter,mv and pre-aggregate
Don’t support lucene, timeseries datamap,  no need min max 
datamap for binary
=>Done
2.10 CSDK / python SDK support binary in the future.=> TODO
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO
  
2.13 support configurable decode for query, support base64 and Hex 
decode.=> TODO
2.14 Proper Error message for not supported features like SI=> TODO
2.15 How big data size binary data type can support for wri

[jira] [Commented] (CARBONDATA-3336) Support Binary Data Type

2019-04-23 Thread xubo245 (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823822#comment-16823822
 ] 

xubo245 commented on CARBONDATA-3336:
-

Data smaller than 2 GB can be stored in CarbonData files, while files larger 
than 2 GB must be stored separately?

Also, when writing files smaller than 2 GB into Carbon, does the SDK support 
streaming writes?


> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> CarbonData supports binary data type
> Version  Changes                                   Owner  Date
> 0.1      Init doc for supporting binary data type  Xubo   2019-4-10
> Background:
> Binary is a basic data type and is widely used in various scenarios, so it's 
> better to support the binary data type in CarbonData. Downloading data from S3 
> is slow when a dataset has lots of small binary files. The majority of 
> application scenarios involve storing small binary data in CarbonData, which 
> avoids the small-files problem, speeds up S3 access, and reduces the cost of 
> accessing OBS by reducing the number of S3 API calls. It also becomes easier 
> to manage structured and unstructured (binary) data by storing them together 
> in CarbonData. 
> Goals:
> 1. Supporting write binary data type by Carbon Java SDK.
> 2. Supporting read binary data type by Spark Carbon file format(carbon 
> datasource) and CarbonSession.
> 3. Supporting read binary data type by Carbon SDK
> 4. Supporting write binary by spark
> Approach and Detail:
>   1.Supporting write binary data type by Carbon Java SDK [Formal]:
>   1.1 Java SDK needs support write data with specific data types, 
> like int, double, byte[ ] data type, no need to convert all data type to 
> string array. User read binary file as byte[], then SDK writes byte[] into 
> binary column.=>Done
>   1.2 CarbonData compress binary column because now the compressor is 
> table level.=>Done
>   =>TODO, support configuration for compress, default is no 
> compress because binary usually is already compressed, like jpg format image. 
> So no need to uncompress for binary column. 1.5.4 will support column level 
> compression, after that, we can implement no compress for binary. We can talk 
> with community.
>   1.3 CarbonData stores binary as dimension. => Done
>   1.4 Support configure page size for binary data type because binary 
> data usually is big, such as 200k. Otherwise it will be very big for one 
> blocklet (32000 rows). =>Done
>   1.5 Avro and JSON conversion need to be considered
>   •   AVRO fixed- and variable-length binary can be supported
>   => Avro doesn't support the binary data type => No need
>   Support reading binary from JSON => Done
>   1.6 Binary data type as a child column in Struct, Map
>   => support it in the future, but the priority is not very 
> high; not in 1.5.4
>   1.7 Verify the maximum size of the binary value supported
> => snappy only supports about 1.71 GB; the max data size should be 2 GB, 
> but this needs confirmation
>   
>   2. Supporting read and manage binary data type by Spark Carbon file 
> format(carbon DataSource) and CarbonSession.[Formal]
>   2.1 Supporting read binary data type from non-transaction table, 
> read binary column and return as byte[] =>Done
>   2.2 Support create table with binary column, table property doesn’t 
> support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
> column => Done
>=> CARBON Datasource don't support dictionary include column
>=>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
> compress is for all columns(table level)
>   2.3 Support CTAS for binary=> transaction/non-transaction,  
> Carbon/Hive/Parquet => Done 
>   2.4 Support external table for binary=> Done
>   2.5 Support projection for binary column=> Done
>   2.6 Support desc formatted=> Done
>=> Carbon Datasource don't support  ALTER TABLE add 
> columns sql
>support  ALTER TABLE for(add column, rename, drop column) 
> binary data type in carbon session=> Done
>Don't support change th

[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-23 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 
Background:
Binary is a basic data type and is widely used in various scenarios, so it's 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset has lots of small binary files. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and reduces the cost of 
accessing OBS by reducing the number of S3 API calls. It also becomes easier to 
manage structured and unstructured (binary) data by storing them together in 
CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1. Supporting write of the binary data type by the Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, 
like int, double and byte[], with no need to convert every data type to a string 
array. Users read a binary file as byte[], then the SDK writes the byte[] into 
the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is 
currently table level. => Done
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro and JSON conversion need to be considered
•   AVRO fixed- and variable-length binary can be supported
=> Avro doesn't support the binary data type => No need
 Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum size of the binary value supported
=> snappy only supports about 1.71 GB; the max data size should be 2 GB, but 
this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
2.8 Support compaction for binary=> Done
2.9 datamap? Don’t support bloomfilter, lucene or timeseries datamap; 
no need for a min/max datamap for binary; support mv and pre-aggregate in the 
future => TODO
2.10 CSDK / python SDK support binary in the future.=> TODO
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO

2.15 support filter for binary => Done
2.16 select CAST(s AS BINARY) from carbon_table. => Done

3. Supporting read binary data type by Carbon SDK
3.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]=> Done
3.2 Supporting projection for binary column=> Done
3.3 Supporting S3=> Done
3.4 No need to support filter => to be discussed, not in this PR

4. Supporting write binary by spark (carbon file format / 
carbonsession, POC??)
4.1 Convert binary to String and storage in CSV=> Done
4.2 Spark load CSV and convert string to byte[], and storage in 
CarbonData. read binary column a
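The CSV round trip in 4.1/4.2 — serialize the bytes to a string so they survive the CSV format, then restore byte[] on load — can be sketched generically (Python stdlib, purely illustrative of the approach rather than CarbonData code):

```python
import base64
import csv
import io

original = bytes([0x00, 0xFF, 0x10, 0x7F])   # sample binary cell

# 4.1: encode the binary value as a string so it survives the CSV format.
buf = io.StringIO()
csv.writer(buf).writerow(["row1", base64.b64encode(original).decode("ascii")])

# 4.2: on load, decode the string column back into raw bytes (byte[]).
row = next(csv.reader(io.StringIO(buf.getvalue())))
restored = base64.b64decode(row[1])

assert restored == original   # lossless round trip
```

Base64 (or hex) is needed here because raw bytes can contain delimiters, quotes, and newlines that would corrupt the CSV; the string encoding makes the binary column safe to stage in text form.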

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-22 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                   Owner  Date
0.1      Init doc for supporting binary data type  Xubo   2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it's 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset has lots of small binary files. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and reduces the cost of 
accessing OBS by reducing the number of S3 API calls. It also becomes easier to 
manage structured and unstructured (binary) data by storing them together in 
CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
        1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], instead of converting every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
        1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done
                => TODO: support a compression configuration, defaulting to no compression, because binary data is usually already compressed (e.g. JPEG images), so there is no need to compress the binary column again. 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
        1.3 CarbonData stores binary as a dimension. => Done
        1.4 Support configuring the page size for the binary data type, because binary values are usually large (e.g. 200 KB); otherwise a single blocklet (32,000 rows) becomes very big. => Done
        1.5 Avro and JSON conversion needs consideration:
                • Avro fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need
                Support reading binary from JSON => Done
        1.6 Binary data type as a child column in Struct and Map => support in the future, but the priority is not very high; not in 1.5.4
        1.7 Verify the maximum size of a supported binary value => Snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation
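The no-compression default proposed in 1.2 can be motivated with a small stdlib-only Java experiment: deflating already-high-entropy bytes (standing in for JPEG-like data) yields almost no size reduction, while repetitive bytes shrink dramatically. This is an illustrative sketch, not CarbonData code, and uses zlib via `java.util.zip.Deflater` rather than CarbonData's actual compressors:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressDemo {
    // Deflate the whole input in one shot and return the compressed size in bytes.
    public static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64]; // large enough for incompressible input
        int n = deflater.deflate(buf);
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[200 * 1024];   // all zeros: compresses extremely well
        byte[] highEntropy = new byte[200 * 1024];  // random bytes: effectively incompressible
        new Random(42).nextBytes(highEntropy);
        System.out.println(deflatedSize(repetitive));  // a few hundred bytes
        System.out.println(deflatedSize(highEntropy)); // roughly the original 204800 bytes
    }
}
```

Since most stored binary values (images, etc.) look like the high-entropy case, recompressing them burns CPU for near-zero savings, which is the rationale for a per-column no-compress option.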


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
        2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[] => Done
        2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column => Done
                => The CARBON datasource doesn't support dictionary-include columns
                => Support carbon.column.compressor = snappy, zstd, or gzip for binary; compression applies to all columns (table level)
        2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
        2.4 Support external tables for binary => Done
        2.5 Support projection for binary columns => Done
        2.6 Support desc formatted => Done
                => The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
                Support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession => Done
                Don't support changing the data type of a binary column via ALTER TABLE => Done
        2.7 Don’t support PARTITION or BUCKETCOLUMNS for binary => Done
        2.8 Support compaction for binary => Done
        2.9 DataMap: don’t support bloomfilter, lucene, or timeseries datamaps; no need for a min/max datamap for binary; support mv and pre-aggregate in the future => TODO
        2.10 CSDK / Python SDK will support binary in the future => TODO
        2.11 Support S3 => Done
        2.12 Support UDFs: hex, base64, cast => TODO
                select hex(bin) from carbon_table => TODO
  
        2.13 Support configurable decoding for queries; support base64 and hex decode => TODO
        2.14 Proper error messages for unsupported features like mv/SI/bloom/streaming => TODO
        2.15 How large a binary value can be supported for write and read
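As a back-of-the-envelope check of why the size questions in 1.4, 1.7, and 2.15 matter: with the document's example of 200 KB per binary value, a default 32,000-row blocklet would hold about 6.1 GiB, far past the roughly 2 GB limit noted in 1.7, hence the configurable page size. A stdlib-only Java sketch of the arithmetic:

```java
public class BlockletSizeDemo {
    public static void main(String[] args) {
        long rowsPerBlocklet = 32_000L;       // default blocklet row count from the doc
        long avgBinarySize = 200L * 1024;     // 200 KB per binary value (doc's example)
        long blockletBytes = rowsPerBlocklet * avgBinarySize;
        System.out.println(blockletBytes);                          // 6553600000 bytes
        System.out.println(blockletBytes / (1024.0 * 1024 * 1024)); // ~6.1 GiB
        // Well above the ~2 GB practical limit noted in 1.7,
        // which is why 1.4 makes the page size configurable for binary columns.
    }
}
```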

[jira] [Updated] (CARBONDATA-3356) There are some exception when carbonData DataSource read SDK files with varchar

2019-04-20 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3356:

Description: 
There are some exception when  carbonData DataSource read SDK files with 
varchar 

## write data:
{code:java}
  public void testReadSchemaFromDataFileArrayString() {
String path = "./testWriteFiles";
try {
  FileUtils.deleteDirectory(new File(path));

  Field[] fields = new Field[11];
  fields[0] = new Field("stringField", DataTypes.STRING);
  fields[1] = new Field("shortField", DataTypes.SHORT);
  fields[2] = new Field("intField", DataTypes.INT);
  fields[3] = new Field("longField", DataTypes.LONG);
  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
  fields[6] = new Field("dateField", DataTypes.DATE);
  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
  Map<String, String> map = new HashMap<>();
  map.put("complex_delimiter_level_1", "#");
  CarbonWriter writer = CarbonWriter.builder()
  .outputPath(path)
  .withLoadOptions(map)
  .withCsvInput(new Schema(fields))
  .writtenBy("CarbonReaderTest")
  .build();

  for (int i = 0; i < 10; i++) {
String[] row2 = new String[]{
"robot" + (i % 10),
String.valueOf(i % 1),
String.valueOf(i),
String.valueOf(Long.MAX_VALUE - i),
String.valueOf((double) i / 2),
String.valueOf(true),
"2019-03-02",
"2019-02-12 03:03:34",
"12.345",
"varchar",
"Hello#World#From#Carbon"
};
writer.write(row2);
  }
  writer.close();

  File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
@Override
public boolean accept(File dir, String name) {
  if (name == null) {
return false;
  }
  return name.endsWith("carbondata");
}
  });
  if (dataFiles == null || dataFiles.length < 1) {
throw new RuntimeException("Carbon data file not exists.");
  }
  Schema schema = CarbonSchemaReader
  .readSchema(dataFiles[0].getAbsolutePath())
  .asOriginOrder();
  // Transform the schema
  String[] strings = new String[schema.getFields().length];
  for (int i = 0; i < schema.getFields().length; i++) {
strings[i] = (schema.getFields())[i].getFieldName();
  }

  // Read data
  CarbonReader reader = CarbonReader
  .builder(path, "_temp")
  .projection(strings)
  .build();

  int i = 0;
  while (reader.hasNext()) {
Object[] row = (Object[]) reader.readNextRow();
assert (row[0].equals("robot" + i));
assert (row[2].equals(i));
assert (row[6].equals(17957));
Object[] arr = (Object[]) row[10];
assert (arr[0].equals("Hello"));
assert (arr[3].equals("Carbon"));
i++;
  }
  reader.close();
//  FileUtils.deleteDirectory(new File(path));
} catch (Throwable e) {
  e.printStackTrace();
  Assert.fail(e.getMessage());
}
  }

{code}

## read data

{code:java}
test("Test read image carbon with spark carbon file format, generate by 
sdk, CTAS") {
sql("DROP TABLE IF EXISTS binaryCarbon")
sql("DROP TABLE IF EXISTS binaryCarbon3")
if (SparkUtil.isSparkVersionEqualTo("2.1")) {
sql(s"CREATE TABLE binaryCarbon USING CARBON OPTIONS(PATH 
'$writerPath')")
sql(s"CREATE TABLE binaryCarbon3 USING CARBON OPTIONS(PATH 
'$outputPath')" + " AS SELECT * FROM binaryCarbon")
} else {
//sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION 
'$writerPath'")
sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION 
'/Users/xubo/Desktop/xubo/git/carbondata3/store/sdk/testWriteFiles'")
sql("SELECT COUNT(*) FROM binaryCarbon").show()
}
}
{code}

## exception:

{code:java}
java.io.IOException: All common columns present in the files doesn't have same 
datatype. Unsupported operation on nonTransactional table. Check logs.
at 
org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.updateColumns(AbstractQueryExecutor.java:2

[jira] [Commented] (CARBONDATA-3356) There are some exception when carbonData DataSource read SDK files with varchar

2019-04-20 Thread xubo245 (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822376#comment-16822376
 ] 

xubo245 commented on CARBONDATA-3356:
-

DataType varchar is not supported.(line 1, pos 68)

> There are some exception when  carbonData DataSource read SDK files with 
> varchar 
> -
>
> Key: CARBONDATA-3356
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3356
> Project: CarbonData
>  Issue Type: Bug
>    Reporter: xubo245
>Priority: Major
>
> There are some exception when  carbonData DataSource read SDK files with 
> varchar 

[jira] [Created] (CARBONDATA-3356) There are some exception when carbonData DataSource read SDK files with varchar

2019-04-19 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3356:
---

 Summary: There are some exception when  carbonData DataSource read 
SDK files with varchar 
 Key: CARBONDATA-3356
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3356
 Project: CarbonData
  Issue Type: Bug
Reporter: xubo245



[jira] [Commented] (CARBONDATA-3336) Support Binary Data Type

2019-04-18 Thread xubo245 (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821005#comment-16821005
 ] 

xubo245 commented on CARBONDATA-3336:
-

Array:org.apache.carbondata.processing.loading.parser.impl.RowParserImpl#parseRow

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-16 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                   Owner  Date
0.1      Init doc for supporting binary data type  Xubo   2019-4-10

Background :
Binary is a basic data type widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary files. Most application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and lowers the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.  
1.2 CarbonData compress binary column because now the compressor is 
table level.
=>TODO, support configuration for compress, default is no 
compress because binary usually is already compressed, like jpg format image. 
So no need to uncompress for binary column. 1.5.4 will support column level 
compression, after that, we can implement no compress for binary. We can talk 
with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows).
  TODO: 1.5 Avro, JSON convert need consider


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column
=> Evaluate COLUMN_META_CACHE for binary
   => CARBON Datasource don't support dictionary include column
   => carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support desc formatted
   => The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
   => TODO: ALTER TABLE for the binary data type in CarbonSession
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS  for binary  
2.8 Support compaction for binary(TODO)
2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
 no need min max datamap for binary, support mv and pre-aggregate in the future
2.10 CSDK / python SDK support binary in the future.(TODO)
2.11 Support S3
TODO:
2.12 support UDF, hex, base64, cast:
   select hex(bin) from carbon_table.
   select CAST(s AS BINARY) from carbon_table.
CarbonSession: impact analysis


3. Supporting read binary data type by Carbon SDK
3.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
3.2 Supporting projection for binary column
3.3 Supporting S3
3.4 no need to support filter.

4. Supporting write binary by spark (carbon file format / 
carbonsession, POC??)
4.1 Convert binary to String and storage in CSV
4.2 Spark load CSV and convert string to byte[], and storage in 
CarbonData. read binary column and return as byte[]
4.3 Supporting insert into (string => binary),  TODO: update, 
delete for binary
4.4 Don’t support stream table.
=> refer hive and Spark2.4 image DataSource
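The convert-to-string approach in 4.1/4.2 above can be sketched with a stdlib-only Java round trip: Base64-encode the binary value into a CSV-safe string on write, then decode it back to byte[] on load. This is an illustration of the encoding scheme, not CarbonData's loader code, and the class and method names are made up for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CsvBinaryDemo {
    // 4.1: encode binary into a CSV-safe string (no commas, quotes, or newlines).
    public static String toCsvField(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    // 4.2: decode the CSV string back into byte[] before storing it in the binary column.
    public static byte[] fromCsvField(String field) {
        return Base64.getDecoder().decode(field);
    }

    public static void main(String[] args) {
        byte[] image = "fake-jpeg-bytes".getBytes(StandardCharsets.UTF_8);
        String csvField = toCsvField(image);
        byte[] roundTripped = fromCsvField(csvField);
        System.out.println(new String(roundTripped, StandardCharsets.UTF_8)); // fake-jpeg-bytes
    }
}
```

Base64 inflates the payload by about 33%, which is a reasonable trade for the CSV staging step since the bytes are restored exactly on load.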


 
mail list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html


  was:
CarbonData supports binary data type



Version Changes Owner   Date
0.1 Init doc for Supporting binary data typeXubo2019-4-10

Background :
Binary is basic data type

[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-16 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 
1.Supporting write binary data type by Carbon Java SDK:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], instead of converting every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column.
1.2 CarbonData compress binary column because now the compressor is table level.
=>TODO, support configuration for compress, default is no compress because 
binary usually is already compressed, like jpg format image. So no need to 
uncompress for binary column. 1.5.4 will support column level compression, 
after that, we can implement no compress for binary. We can talk with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary data 
usually is big, such as 200k. Otherwise it will be very big for one blocklet 
(32000 rows). =>PR2814

2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.
2.1 Supporting read binary data type from non-transaction table, read binary 
column and return as byte[]
2.2 Support create table with binary column, table property doesn’t support 
sort_columns, dictionary, RANGE_COLUMN for binary column
=> Evaluate COLUMN_META_CACHE for binary
=> CARBON Datasource don't support dictionary include column
=> carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support desc
=> The Carbon datasource doesn't support ALTER TABLE ADD COLUMN via SQL
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
2.8 Support S3

3. Supporting read binary data type by Carbon SDK
3.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
3.2 Supporting projection for binary column
3.3 Supporting S3
3.4 no need to support filter.

4. Supporting write binary by spark (carbon file format / 
carbonsession, POC??)
4.1 Convert binary to String and storage in CSV
4.2 Spark load CSV and convert string to byte[], and storage in 
CarbonData. read binary column and return as byte[]
4.3 Supporting insert into (string => binary),  TODO: update, 
delete for binary
4.4 Don’t support stream table.
=> refer hive and Spark2.4 image DataSource


  was:
1.Supporting write binary data type by Carbon Java SDK:
1.1 Java SDK needs support write data with specific data types, like int, 
double, byte[ ] data type, no need to convert all data type to string array. 
User read binary file as byte[], then SDK writes byte[] into binary column. 
 
1.2 CarbonData compress binary column because now the compressor is table level.
=>TODO, support configuration for compress, default is no compress because 
binary usually is already compressed, like jpg format image. So no need to 
uncompress for binary column. 1.5.4 will support column level compression, 
after that, we can implement no compress for binary. We can talk with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary data 
usually is big, such as 200k. Otherwise it will be very big for one blocklet 
(32000 rows). =>PR2814

2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.
2.1 Supporting read binary data type from non-transaction table, read binary 
column and return as byte[]
2.2 Support create table with binary column, table property doesn’t support 
sort_columns, dictionary, RANGE_COLUMN for binary column
=> Evaluate COLUMN_META_CACHE for binary
=> CARBON Datasource don't support dictionary include column
=> carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support desc
=> Carbon Datasource doesn't support ALTER TABLE ADD COLUMN via SQL
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
2.8 Support S3


> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> 1.Supporting write binary data type by Carbon Java SDK:
> 1.1 Java SDK needs support write data with specific d

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-15 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version | Changes                                  | Owner | Date
0.1     | Init doc for supporting binary data type | Xubo  | 2019-4-10

Background:
Binary is a basic data type widely used in many scenarios, so CarbonData should 
support it. Downloading data from S3 is slow when a dataset contains many small 
binary objects. Most application scenarios store small binary values in 
CarbonData, which avoids the small-files problem, speeds up S3 access, and 
lowers the cost of accessing OBS by reducing the number of S3 API calls. It also 
becomes easier to manage structured and unstructured (binary) data together by 
storing both in CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1. Supporting write of the binary data type by the Carbon Java SDK [Formal]:
        1.1 The Java SDK needs to support writing data with specific data 
types, such as int, double, and byte[], without converting every data type to a 
string array. The user reads a binary file as byte[], then the SDK writes the 
byte[] into the binary column.  
        1.2 CarbonData compresses the binary column, because the compressor is 
currently table level.
            => TODO: support a configuration for compression; the default 
should be no compression, because binary data is usually already compressed 
(e.g. JPG images), so compressing the binary column again brings little 
benefit. 1.5.4 will support column-level compression; after that, we can 
implement no-compression for binary. We can discuss this with the community.
        1.3 CarbonData stores binary as a dimension.
        1.4 Support configuring the page size for the binary data type, because 
binary values are usually large (such as 200 KB); otherwise one blocklet 
(32000 rows) becomes very large.
        TODO: 1.5 Avro and JSON conversion needs consideration
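To make the blocklet-size concern in 1.4 concrete, here is a quick 
back-of-the-envelope sketch (the 32000-row blocklet and 200 KB value size are 
the figures quoted above; everything else is illustrative):

```python
# Rough size of one blocklet when every row of a binary column
# holds an average-sized value.
ROWS_PER_BLOCKLET = 32000        # blocklet row count quoted above
AVG_BINARY_BYTES = 200 * 1024    # example binary value size (200 KB)

blocklet_bytes = ROWS_PER_BLOCKLET * AVG_BINARY_BYTES
print(f"{blocklet_bytes / 1024 ** 3:.1f} GiB per blocklet")  # ~6.1 GiB
```

Hence a configurable (smaller) page size keeps individual blocklets at a 
manageable size when the column holds large binary values.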


2. Supporting reading and managing the binary data type via the Spark Carbon 
file format (carbon DataSource) and CarbonSession. [Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
        2.2 Support creating a table with a binary column; table properties 
don’t support sort_columns, dictionary, COLUMN_META_CACHE, or RANGE_COLUMN for 
a binary column
            => Evaluate COLUMN_META_CACHE for binary
            => Carbon Datasource doesn't support dictionary-include columns
            => carbon.column.compressor applies to all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
        2.6 Support desc formatted
            => Carbon Datasource doesn't support ALTER TABLE ADD COLUMN via SQL
            => TODO: ALTER TABLE for the binary data type in CarbonSession
        2.7 Don’t support PARTITION, filter, or BUCKETCOLUMNS for binary  
2.8 Support compaction for binary(TODO)
        2.9 DataMap? Don’t support bloomfilter, lucene, or timeseries datamaps; 
no need for a min/max datamap for binary; support MV and pre-aggregate in the 
future
2.10 CSDK / python SDK support binary in the future.(TODO)
2.11 Support S3
TODO:
2.12 support UDF, hex, base64, cast:
   select hex(bin) from carbon_table.
   select CAST(s AS BINARY) from carbon_table.
CarbonSession: impact analysis


3. Supporting read binary data type by Carbon SDK
3.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
3.2 Supporting projection for binary column
3.3 Supporting S3
3.4 no need to support filter.

4. Supporting write of binary by Spark (carbon file format / CarbonSession, 
POC??)
        4.1 Convert binary to String and store it in CSV, encoded as Hex or 
Base64
        4.2 Spark loads the CSV and converts the string to binary, then stores 
it in CarbonData; CarbonData internally decodes Hex to binary.
        4.3 Support insert (string => binary; make the encode/decode algorithm 
configurable, default Hex, user can change to Base64 or others, is that ok?), 
update, and delete for binary
        4.4 Don’t support stream table.
        => refer to Hive and the Spark 2.4 image DataSource

Formal? How to support writing binary data read from images via SQL?
Using Spark core code is OK.  
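The CSV staging flow described in 4.1-4.3 can be illustrated with 
standard-library codecs alone (a minimal sketch of the idea, not CarbonData's 
actual loader; the row id and payload bytes are made up):

```python
import base64
import binascii
import csv
import io

# 4.1: encode raw bytes as text so they survive a CSV file.
payload = bytes([0x00, 0xFF, 0x10, 0x80])      # stand-in for image bytes
hex_text = binascii.hexlify(payload).decode("ascii")
b64_text = base64.b64encode(payload).decode("ascii")

# Write one CSV row holding the hex-encoded value.
buf = io.StringIO()
csv.writer(buf).writerow(["img_001", hex_text])

# 4.2: on load, decode the text column back to byte[] before storing.
row = next(csv.reader(io.StringIO(buf.getvalue())))
restored = binascii.unhexlify(row[1])
assert restored == payload                     # lossless round trip

# Base64 adds ~33% overhead versus hex's 100%, which is one reason 4.3
# proposes making the codec configurable.
assert base64.b64decode(b64_text) == payload
```

Either codec round-trips losslessly; the choice only affects the size of the 
intermediate CSV.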


 
mail list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-bi

[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version | Changes                                  | Owner | Date
0.1     | Init doc for supporting binary data type | Xubo  | 2019-4-10

Background:
Binary is a basic data type widely used in many scenarios, so CarbonData should 
support it. Downloading data from S3 is slow when a dataset contains many small 
binary objects. Most application scenarios store small binary values in 
CarbonData, which avoids the small-files problem, speeds up S3 access, and 
lowers the cost of accessing OBS by reducing the number of S3 API calls. It also 
becomes easier to manage structured and unstructured (binary) data together by 
storing both in CarbonData. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.  
1.2 CarbonData compress binary column because now the compressor is 
table level.
=>TODO, support configuration for compress, default is no 
compress because binary usually is already compressed, like jpg format image. 
So no need to uncompress for binary column. 1.5.4 will support column level 
compression, after that, we can implement no compress for binary. We can talk 
with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows).
  TODO: 1.5 Avro, JSON convert need consider


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column
=> Evaluate COLUMN_META_CACHE for binary
   => CARBON Datasource don't support dictionary include column
   => carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support desc formatted
            => Carbon Datasource doesn't support ALTER TABLE ADD COLUMN via SQL
   =>TODO: ALTER TABLE for binary data type in carbon session
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS  for binary  
2.8 Support compaction for binary(TODO)
2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
 no need min max datamap for binary, support mv and pre-aggregate in the future
2.10 CSDK / python SDK support binary in the future.(TODO)
2.11 Support S3
 
CarbonSession: impact analysis


3. Supporting read binary data type by Carbon SDK
3.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
3.2 Supporting projection for binary column
3.3 Supporting S3
3.4 no need to support filter.

4. Supporting write binary by spark (carbon file format / 
carbonsession, POC??)
4.1 Convert binary to String and storage in CSV, encode as Hex, 
Base64
4.2 Spark load CSV and convert string to binary, and storage in 
CarbonData. CarbonData internal will decode Hex to binary.
4.3 Supporting insert (string => binary, configuration for 
encode/decode algorithm, default is Hex, user can change to base64 or others, 
is it ok?), update, delete for binary
4.4 Don’t support stream table.
=> refer hive and Spark2.4 image DataSource

Formal? How to support write into binary read from images in SQL?
Use spark core code is ok.  


 
mail list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html


  was:
CarbonData supports binary data type



Version Changes Owner   Date
0.1 Init doc for Supporting binary data typeXubo201

[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 
1.Supporting write binary data type by Carbon Java SDK:
1.1 Java SDK needs support write data with specific data types, like int, 
double, byte[ ] data type, no need to convert all data type to string array. 
User read binary file as byte[], then SDK writes byte[] into binary column. 
 
1.2 CarbonData compress binary column because now the compressor is table level.
=>TODO, support configuration for compress, default is no compress because 
binary usually is already compressed, like jpg format image. So no need to 
uncompress for binary column. 1.5.4 will support column level compression, 
after that, we can implement no compress for binary. We can talk with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary data 
usually is big, such as 200k. Otherwise it will be very big for one blocklet 
(32000 rows). =>PR2814

2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.
2.1 Supporting read binary data type from non-transaction table, read binary 
column and return as byte[]
2.2 Support create table with binary column, table property doesn’t support 
sort_columns, dictionary, RANGE_COLUMN for binary column
=> Evaluate COLUMN_META_CACHE for binary
=> CARBON Datasource don't support dictionary include column
=> carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support desc
=> Carbon Datasource doesn't support ALTER TABLE ADD COLUMN via SQL
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
2.8 Support S3

  was:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.  
1.2 CarbonData compress binary column because now the compressor is 
table level.
=>TODO, support configuration for compress, default is no 
compress because binary usually is already compressed, like jpg format image. 
So no need to uncompress for binary column. 1.5.4 will support column level 
compression, after that, we can implement no compress for binary. We can talk 
with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). 

2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column
=> Evaluate COLUMN_META_CACHE for binary
=> carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support show table, desc, ALTER TABLE for binary data type
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
2.8 Support compaction for binary
2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
 no need min max datamap for binary, support mv and pre-aggregate in the future
2.10 CSDK / python SDK support binary in the future.
2.11 Support S3


> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>      Issue Type: Sub-task
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> 1.Supporting write binary data type by Carbon Java SDK:
> 1.1 Java SDK needs support write data with specific data types, like int, 
> double, byte[ ] data type, no need to convert all data type to string array. 
> User read binary file as byte[], then SDK writes byte[] into binary column.   
>  
> 1.2 CarbonData compress binary column because now the compressor is table 
> level.
> =>TODO, support configuration for compress, default is no compress because 
> binary usually is already compressed, like jpg format image. So no need to



[jira] [Created] (CARBONDATA-3352) Avro, JSON writer of SDK support binary.

2019-04-12 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3352:
---

 Summary: Avro, JSON writer of SDK support binary.
 Key: CARBONDATA-3352
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3352
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


Avro, JSON writer of SDK support binary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Description: 
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.  
1.2 CarbonData compress binary column because now the compressor is 
table level.
=>TODO, support configuration for compress, default is no 
compress because binary usually is already compressed, like jpg format image. 
So no need to uncompress for binary column. 1.5.4 will support column level 
compression, after that, we can implement no compress for binary. We can talk 
with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). 

2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column
=> Evaluate COLUMN_META_CACHE for binary
=> carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support show table, desc, ALTER TABLE for binary data type
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
2.8 Support compaction for binary
2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
 no need min max datamap for binary, support mv and pre-aggregate in the future
2.10 CSDK / python SDK support binary in the future.
2.11 Support S3

  was:

1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.  
1.2 CarbonData compress binary column because now the compressor is 
table level.
=>TODO, support configuration for compress, default is no 
compress because binary usually is already compressed, like jpg format image. 
So no need to uncompress for binary column. 1.5.4 will support column level 
compression, after that, we can implement no compress for binary. We can talk 
with community.
1.3 CarbonData stores binary as dimension.
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows).
  TODO: 1.5 Avro, JSON convert need consider


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[]
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column
=> Evaluate COLUMN_META_CACHE for binary
=> carbon.column.compressor for all columns
2.3 Support CTAS for binary=> transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary column
2.6 Support show table, desc, ALTER TABLE for binary data type
2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary   
2.8 Support compaction for binary
2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
 no need min max datamap for binary, support mv and pre-aggregate in the future
2.10 CSDK / python SDK support binary in the future.
2.11 Support S3


> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> 1.Supporting write binary data type by Carbon Java SDK [Formal]:
>   1.1 Java SDK


[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version | Changes                                  | Owner | Date
0.1     | Init doc for supporting binary data type | Xubo  | 2019-4-10

Background:
Binary is a basic data type that is widely used in various scenarios, so it is 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset contains lots of small binary data. The majority of 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-files problem, speeds up S3 access, and decreases the cost of 
accessing OBS by reducing the number of S3 API calls. It also makes it easier 
to manage structured and unstructured (binary) data by storing them together 
in CarbonData.

Goals:
1. Supporting writing the binary data type with the Carbon Java SDK.
2. Supporting reading the binary data type with the Spark Carbon file format 
(carbon datasource) and CarbonSession.
3. Supporting reading the binary data type with the Carbon SDK.
4. Supporting writing binary with Spark.


Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, 
like int, double, and byte[], with no need to convert every data type to a 
string array. The user reads a binary file as byte[], then the SDK writes the 
byte[] into a binary column.
1.2 CarbonData compresses the binary column because the compressor is 
currently table level.
=> TODO: support a configuration option for compression; the default is 
no compression, because binary data is usually already compressed (e.g. JPG 
images), so there is no need to compress a binary column again. 1.5.4 will 
support column-level compression; after that we can implement no-compression 
for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support configuring the page size for the binary data type, because 
binary values are usually big (e.g. 200 KB); otherwise a single blocklet 
(32000 rows) becomes very large.
  TODO: 1.5 Avro and JSON conversion need to be considered
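To make the page-size concern in 1.4 concrete, here is a back-of-the-envelope estimate (our own arithmetic for illustration; the class name is made up, only the 32000-rows-per-blocklet default and the 200 KB figure come from the text above):

```java
// Rough estimate of one blocklet's size when every row carries a ~200 KB
// binary value and the blocklet holds the default 32000 rows. The result
// (~6.1 GiB) shows why a configurable page size is needed for binary columns.
public class BlockletSizeEstimate {
    public static void main(String[] args) {
        long rowsPerBlocklet = 32_000L;        // default rows per blocklet
        long avgBinaryBytes = 200L * 1024;     // ~200 KB per binary value
        long blockletBytes = rowsPerBlocklet * avgBinaryBytes;
        double gib = blockletBytes / (1024.0 * 1024 * 1024);
        // prints: One blocklet of binary data: 6553600000 bytes (~6.1 GiB)
        System.out.printf("One blocklet of binary data: %d bytes (~%.1f GiB)%n",
                blockletBytes, gib);
    }
}
```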


2. Supporting read and manage binary data type by the Spark Carbon file 
format (carbon DataSource) and CarbonSession. [Formal]
2.1 Supporting read of the binary data type from non-transactional tables: 
read the binary column and return it as byte[]
2.2 Support create table with a binary column; the table properties 
sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not 
supported for binary columns
=> Evaluate COLUMN_META_CACHE for binary
=> carbon.column.compressor applies to all columns
2.3 Support CTAS for binary => transaction/non-transaction
2.4 Support external table for binary
2.5 Support projection for binary columns
2.6 Support show table, desc, ALTER TABLE for the binary data type
2.7 Don't support PARTITION, filter, BUCKETCOLUMNS for binary
2.8 Support compaction for binary
2.9 Datamaps: don't support bloomfilter, lucene, or timeseries datamaps; 
no need for a min/max datamap for binary; support mv and pre-aggregate in the 
future
2.10 CSDK / Python SDK will support binary in the future.
2.11 Support S3
 
CarbonSession: impact analysis


3. Supporting read binary data type by Carbon SDK
3.1 Supporting read of the binary data type from non-transactional tables: 
read the binary column and return it as byte[]
3.2 Supporting projection for binary columns
3.3 Supporting S3
3.4 No need to support filter.

4. Supporting write binary by Spark (carbon file format / 
carbonsession, POC??)
4.1 Convert binary to String and store it in CSV, encoded as Hex or 
Base64.
4.2 Spark loads the CSV, converts the string to binary, and stores it in 
CarbonData; CarbonData internally decodes the Hex back to binary.
4.3 Supporting insert (string => binary; a configuration option for the 
encode/decode algorithm, default Hex, user can change to Base64 or others; 
is that ok?), update, and delete for binary.
4.4 Don't support stream table.
=> refer to Hive and the Spark 2.4 image DataSource
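The encode/decode step in 4.1/4.2 can be sketched with plain JDK facilities; the helper and class names below are ours for illustration, and CarbonData's actual codec hooks may differ:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Sketch of storing binary as a Hex or Base64 string in CSV and decoding it
// back to byte[] on load, as proposed in 4.1/4.2.
public class BinaryCsvCodec {
    // Encode each byte as two lowercase hex digits (the proposal's default).
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Decode a hex string produced by toHex back into the original bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    public static void main(String[] args) {
        byte[] original = "binary payload".getBytes(StandardCharsets.UTF_8);

        // Default algorithm: Hex
        byte[] viaHex = fromHex(toHex(original));

        // Alternative algorithm: Base64 (JDK built-in)
        byte[] viaB64 = Base64.getDecoder()
                .decode(Base64.getEncoder().encodeToString(original));

        // prints true: both round trips recover the original bytes
        System.out.println(Arrays.equals(original, viaHex)
                && Arrays.equals(original, viaB64));
    }
}
```

Base64 is about 33% larger than the raw bytes while Hex doubles the size, which is one consideration when choosing the default algorithm.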

Formal? How do we support writing binary read from images in SQL?
Using Spark core code is ok.


 
mail list: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html




[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:


[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type

2019-04-12 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3351:

Issue Type: Sub-task  (was: Task)
Parent: CARBONDATA-3336

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-11 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:


[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-09 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Attachment: (was: CarbonData support binary data type.pdf)

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support Binary Data Type:
> 1. Support write and read binary data type by CarbonData Java SDK
> 2. Support  read binary data type by Spark Carbon File Format





[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-09 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Attachment: CarbonData support binary data type v0.1.pdf

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support Binary Data Type:
> 1. Support write and read binary data type by CarbonData Java SDK
> 2. Support  read binary data type by Spark Carbon File Format





[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-09 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Attachment: CarbonData support binary data type V0.1.pdf

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support Binary Data Type:
> 1. Support write and read binary data type by CarbonData Java SDK
> 2. Support  read binary data type by Spark Carbon File Format





[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-09 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Attachment: (was: CarbonData support binary data type v0.1.pdf)

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support Binary Data Type:
> 1. Support write and read binary data type by CarbonData Java SDK
> 2. Support  read binary data type by Spark Carbon File Format





[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-04-09 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Attachment: CarbonData support binary data type.pdf

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
> Attachments: CarbonData support binary data type.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support Binary Data Type:
> 1. Support write and read binary data type by CarbonData Java SDK
> 2. Support  read binary data type by Spark Carbon File Format





[jira] [Created] (CARBONDATA-3351) Support Binary Data Type

2019-04-09 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3351:
---

 Summary: Support Binary Data Type
 Key: CARBONDATA-3351
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
 Project: CarbonData
  Issue Type: Task
Reporter: xubo245
Assignee: xubo245








[jira] [Created] (CARBONDATA-3342) It throws IllegalArgumentException when using filter

2019-04-04 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3342:
---

 Summary: It throws IllegalArgumentException when using filter
 Key: CARBONDATA-3342
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3342
 Project: CarbonData
  Issue Type: Bug
Reporter: xubo245
Assignee: xubo245



{code:java}
  public void testReadWithFilterOfNonTransactional2() throws IOException, 
InterruptedException {
String path = "./testWriteFiles";
FileUtils.deleteDirectory(new File(path));
DataMapStoreManager.getInstance()
.clearDataMaps(AbsoluteTableIdentifier.from(path));
Field[] fields = new Field[2];
fields[0] = new Field("name", DataTypes.STRING);
fields[1] = new Field("age", DataTypes.INT);

TestUtil.writeFilesAndVerify(200, new Schema(fields), path);

ColumnExpression columnExpression = new ColumnExpression("age", 
DataTypes.INT);

EqualToExpression equalToExpression = new 
EqualToExpression(columnExpression,
new LiteralExpression("-11", DataTypes.INT));
CarbonReader reader = CarbonReader
.builder(path, "_temp")
.projection(new String[]{"name", "age"})
.filter(equalToExpression)
.build();

int i = 0;
while (reader.hasNext()) {
  Object[] row = (Object[]) reader.readNextRow();
  // Default sort column is applied for dimensions. So, need  to validate 
accordingly
  assert (((String) row[0]).contains("robot"));
  assert (1 == (int) (row[1]));
  i++;
}
Assert.assertEquals(i, 1);

reader.close();

FileUtils.deleteDirectory(new File(path));
  }

{code}

Exception:


{code:java}
2019-04-04 18:15:23 INFO  CarbonLRUCache:163 - Removed entry from InMemory lru 
cache :: 
/Users/xubo/Desktop/xubo/git/carbondata2/store/sdk/testWriteFiles/63862773138004_batchno0-0-null-63862150454623.carbonindex

java.lang.IllegalArgumentException: no reader

at 
org.apache.carbondata.sdk.file.CarbonReader.&lt;init&gt;(CarbonReader.java:60)
at 
org.apache.carbondata.sdk.file.CarbonReaderBuilder.build(CarbonReaderBuilder.java:222)
at 
org.apache.carbondata.sdk.file.CarbonReaderTest.testReadWithFilterOfNonTransactional2(CarbonReaderTest.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)


{code}







[jira] [Created] (CARBONDATA-3336) Support Binary Data Type

2019-03-31 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3336:
---

 Summary: Support Binary Data Type
 Key: CARBONDATA-3336
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
 Project: CarbonData
  Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245


Support Binary Data Type:
1. Support write and read binary data type by CarbonData Java SDK
2. Support  read binary data type by Spark Carbon File Format





[jira] [Updated] (CARBONDATA-3271) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3271:

Summary: WIP  (was: CarboData provide python SDK)

> WIP
> ---
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>






[jira] [Closed] (CARBONDATA-3283) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 closed CARBONDATA-3283.
---
Resolution: Incomplete

> WIP
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> WIP





[jira] [Closed] (CARBONDATA-3254) [WIP]

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 closed CARBONDATA-3254.
---
Resolution: Incomplete

> [WIP] 
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>






[jira] [Closed] (CARBONDATA-3255) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 closed CARBONDATA-3255.
---
Resolution: Incomplete

> WIP
> ---
>
> Key: CARBONDATA-3255
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3255
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3283) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3283:

Summary: WIP  (was: Support write data with different data type)

> WIP
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData support AI 





[jira] [Updated] (CARBONDATA-3283) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3283:

Description: WIP  (was: CarbonData support AI )

> WIP
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> WIP





[jira] [Updated] (CARBONDATA-3255) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3255:

Description: (was: Support binary data type)
Summary: WIP  (was: Support binary data type)

> WIP
> ---
>
> Key: CARBONDATA-3255
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3255
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3254) [WIP]

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Summary: [WIP]   (was: [WIP] CarbonData supports deep learning framework to 
write and read image/voice data)

> [WIP] 
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>






[jira] [Closed] (CARBONDATA-3271) WIP

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 closed CARBONDATA-3271.
---
Resolution: Incomplete

> WIP
> ---
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3254) [WIP] CarbonData supports deep learning framework to write and read image/voice data

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: (was: CarbonData supports deep learning framework to write 
and read image/voice data

* Supports write and read image in CarbonData
* Provide Carbon python SDK for read, which can be used for deep learning 
framework tensorflow/MXNet or others
* Support write data by different data type
* Support read data by file or file lists)

> [WIP] CarbonData supports deep learning framework to write and read 
> image/voice data
> 
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>






[jira] [Updated] (CARBONDATA-3271) CarbonData provides Python SDK

2019-02-01 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3271:

Description: (was: Many users use python to install their project. It's 
not easy for them to use carbon by Java/Scala/C++. And Spark also provide 
python SDK for users. So it's better to provide python SDK for CarbonData

For pyspark, they used py4j for python invoke java code:
![image](http://i.imgur.com/YlI8AqEl.png)

https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf

Please refer:
# https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
# https://issues.apache.org/jira/browse/SPARK-3789

)

> CarbonData provides Python SDK
> 
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3254) [WIP] CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Summary: [WIP] CarbonData supports deep learning framework to write and 
read image/voice data  (was: CarbonData supports deep learning framework to 
write and read image/voice data)

> [WIP] CarbonData supports deep learning framework to write and read 
> image/voice data
> 
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> CarbonData supports deep learning framework to write and read image/voice data
> * Supports writing and reading images in CarbonData
> * Provides a Carbon Python SDK for reading, which can be used with deep 
> learning frameworks such as TensorFlow, MXNet, and others
> * Supports writing data of different data types
> * Supports reading data by file or file list





[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: 
CarbonData supports deep learning framework to write and read image/voice data

* Supports writing and reading images in CarbonData
* Provides a Carbon Python SDK for reading
* 

  was:
CarbonData supports deep learning framework to write and read image/voice data

* Supports read image in CarbonData
* Support write image in CarbonData



> CarbonData supports deep learning framework to write and read image/voice data
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports deep learning framework to write and read image/voice data
> * Supports writing and reading images in CarbonData
> * Provides a Carbon Python SDK for reading
> * 





[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: 
CarbonData supports deep learning framework to write and read image/voice data

* Supports writing and reading images in CarbonData
* Provides a Carbon Python SDK for reading, which can be used with deep learning 
frameworks such as TensorFlow, MXNet, and others
* Supports writing data of different data types
* Supports reading data by file or file list

  was:
CarbonData supports deep learning framework to write and read image/voice data

* Supports write and read image in CarbonData
* Provide Carbon python SDK for read, which can be used for deep learning 
framework tensorflow/MXNet or others
* Support write data by different data type


> CarbonData supports deep learning framework to write and read image/voice data
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports deep learning framework to write and read image/voice data
> * Supports writing and reading images in CarbonData
> * Provides a Carbon Python SDK for reading, which can be used with deep 
> learning frameworks such as TensorFlow, MXNet, and others
> * Supports writing data of different data types
> * Supports reading data by file or file list





[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: 
CarbonData supports deep learning framework to write and read image/voice data

* Supports writing and reading images in CarbonData
* Provides a Carbon Python SDK for reading, which can be used with TensorFlow/MXNet
* 

  was:
CarbonData supports deep learning framework to write and read image/voice data

* Supports write and read image in CarbonData
* Provide Carbon python SDK for read
* 


> CarbonData supports deep learning framework to write and read image/voice data
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports deep learning framework to write and read image/voice data
> * Supports writing and reading images in CarbonData
> * Provides a Carbon Python SDK for reading, which can be used with TensorFlow/MXNet
> * 





[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: 
CarbonData supports deep learning framework to write and read image/voice data

* Supports writing and reading images in CarbonData
* Provides a Carbon Python SDK for reading, which can be used with deep learning 
frameworks such as TensorFlow, MXNet, and others
* Supports writing data of different data types

  was:
CarbonData supports deep learning framework to write and read image/voice data

* Supports write and read image in CarbonData
* Provide Carbon python SDK for read, which can be used for tensor flow/MXnet
* 


> CarbonData supports deep learning framework to write and read image/voice data
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports deep learning framework to write and read image/voice data
> * Supports writing and reading images in CarbonData
> * Provides a Carbon Python SDK for reading, which can be used with deep 
> learning frameworks such as TensorFlow, MXNet, and others
> * Supports writing data of different data types





[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: 
CarbonData supports deep learning framework to write and read image/voice data

* Supports reading images in CarbonData
* Supports writing images in CarbonData


  was:
CarbonData support AI 
Support write and read image in CarbonData




> CarbonData supports deep learning framework to write and read image/voice data
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports deep learning framework to write and read image/voice data
> * Supports reading images in CarbonData
> * Supports writing images in CarbonData





[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Summary: CarbonData supports deep learning framework to write and read 
image/voice data  (was: CarbonData support AI)

> CarbonData supports deep learning framework to write and read image/voice data
> --
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports AI
> Supports writing and reading images in CarbonData





[jira] [Updated] (CARBONDATA-3283) Support write data with different data type

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3283:

Issue Type: Sub-task  (was: New Feature)
Parent: CARBONDATA-3254

> Support write data with different data type
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
>  Issue Type: Sub-task
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports AI





[jira] [Assigned] (CARBONDATA-3271) CarbonData provides Python SDK

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 reassigned CARBONDATA-3271:
---

Assignee: xubo245

> CarbonData provides Python SDK
> 
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Many users build their projects in Python, and it's not easy for them to use 
> Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK 
> for its users, so it's better for CarbonData to provide one as well.
> For PySpark, py4j is used to invoke Java code from Python:
> ![image](http://i.imgur.com/YlI8AqEl.png)
> https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf
> Please refer:
> # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
> # https://issues.apache.org/jira/browse/SPARK-3789
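The py4j pattern referenced above can be sketched in a few lines. This is a hypothetical illustration, not CarbonData code: it assumes the `py4j` package is installed and a JVM-side `GatewayServer` is running on py4j's default port, so the whole call is guarded and degrades gracefully.

```python
# Sketch of the py4j pattern PySpark uses: Python drives a JVM over a socket
# gateway. Hypothetical illustration -- requires `pip install py4j` and a JVM
# running py4j's GatewayServer, so the call is guarded.
def java_random_int(upper=10):
    try:
        from py4j.java_gateway import JavaGateway
        gateway = JavaGateway()               # connect to GatewayServer (default port 25333)
        rnd = gateway.jvm.java.util.Random()  # instantiate a Java object from Python
        return rnd.nextInt(upper)             # the method call crosses the socket to the JVM
    except Exception:                         # py4j missing, or no gateway running
        return None

print(java_random_int())
```

PySpark wraps exactly this mechanism inside SparkContext; a CarbonData Python SDK could do the same around the Java SDK classes.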





[jira] [Updated] (CARBONDATA-3271) CarbonData provides Python SDK

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3271:

Issue Type: Sub-task  (was: New Feature)
Parent: CARBONDATA-3254

> CarbonData provides Python SDK
> 
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Many users build their projects in Python, and it's not easy for them to use 
> Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK 
> for its users, so it's better for CarbonData to provide one as well.
> For PySpark, py4j is used to invoke Java code from Python:
> ![image](http://i.imgur.com/YlI8AqEl.png)
> https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf
> Please refer:
> # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
> # https://issues.apache.org/jira/browse/SPARK-3789





[jira] [Updated] (CARBONDATA-3254) CarbonData support AI

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3254:

Description: 
CarbonData supports AI
Supports writing and reading images in CarbonData



  was:
Support write and read image in CarbonData



Summary: CarbonData support AI  (was: Support write and read image in 
CarbonData)

> CarbonData support AI
> -
>
> Key: CARBONDATA-3254
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports AI
> Supports writing and reading images in CarbonData





[jira] [Updated] (CARBONDATA-3283) Support write data with different data type

2019-01-30 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3283:

Summary: Support write data with different data type  (was: CarbonData 
support AI )

> Support write data with different data type
> ---
>
> Key: CARBONDATA-3283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
> Project: CarbonData
>  Issue Type: New Feature
>    Reporter: xubo245
>    Assignee: xubo245
>Priority: Major
>
> CarbonData supports AI





[jira] [Created] (CARBONDATA-3283) CarbonData support AI

2019-01-30 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3283:
---

 Summary: CarbonData support AI 
 Key: CARBONDATA-3283
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3283
 Project: CarbonData
  Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245


CarbonData supports AI





[jira] [Created] (CARBONDATA-3280) SDK batch read failed

2019-01-28 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3280:
---

 Summary: SDK batch read failed
 Key: CARBONDATA-3280
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3280
 Project: CarbonData
  Issue Type: Bug
Reporter: xubo245
Assignee: xubo245


SDK batch read failed





[jira] [Created] (CARBONDATA-3275) There are 4 errors in CI after PR 3094 merged

2019-01-25 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3275:
---

 Summary: There are 4 errors in CI after PR 3094 merged
 Key: CARBONDATA-3275
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3275
 Project: CarbonData
  Issue Type: Bug
Reporter: xubo245
Assignee: xubo245


There are 4 errors in CI after PR 3094 merged





[jira] [Updated] (CARBONDATA-3271) CarbonData provides Python SDK

2019-01-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3271:

Description: 
Many users build their projects in Python, and it's not easy for them to use 
Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK for 
its users, so it's better for CarbonData to provide one as well.

For PySpark, py4j is used to invoke Java code from Python:
![image](http://i.imgur.com/YlI8AqEl.png)

https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf

Please refer:
# https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
# https://issues.apache.org/jira/browse/SPARK-3789



  was:
Many users use python to install their project. It's not easy for them to use 
carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's 
better to provide python SDK for CarbonData

For pyspark, they used py4j for python invoke java code:
![image](http://i.imgur.com/YlI8AqEl.png)

https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf

Please refer:
# [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
# [2] https://issues.apache.org/jira/browse/SPARK-3789




> CarbonData provides Python SDK
> 
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: New Feature
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Priority: Major
>
> Many users build their projects in Python, and it's not easy for them to use 
> Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK 
> for its users, so it's better for CarbonData to provide one as well.
> For PySpark, py4j is used to invoke Java code from Python:
> ![image](http://i.imgur.com/YlI8AqEl.png)
> https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf
> Please refer:
> # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
> # https://issues.apache.org/jira/browse/SPARK-3789





[jira] [Updated] (CARBONDATA-3271) CarbonData provides Python SDK

2019-01-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3271:

Description: 
Many users build their projects in Python, and it's not easy for them to use 
Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK for 
its users, so it's better for CarbonData to provide one as well.

For PySpark, py4j is used to invoke Java code from Python:
![](http://i.imgur.com/YlI8AqEl.png)

https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf

Please refer:
# [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
# [2] https://issues.apache.org/jira/browse/SPARK-3789



  was:
Many users use python to install their project. It's not easy for them to use 
carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's 
better to provide python SDK for CarbonData



> CarbonData provides Python SDK
> 
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: New Feature
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Priority: Major
>
> Many users build their projects in Python, and it's not easy for them to use 
> Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK 
> for its users, so it's better for CarbonData to provide one as well.
> For PySpark, py4j is used to invoke Java code from Python:
> ![](http://i.imgur.com/YlI8AqEl.png)
> https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf
> Please refer:
> # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
> # [2] https://issues.apache.org/jira/browse/SPARK-3789





[jira] [Updated] (CARBONDATA-3271) CarbonData provides Python SDK

2019-01-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3271:

Description: 
Many users build their projects in Python, and it's not easy for them to use 
Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK for 
its users, so it's better for CarbonData to provide one as well.

For PySpark, py4j is used to invoke Java code from Python:
![image](http://i.imgur.com/YlI8AqEl.png)

https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf

Please refer:
# [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
# [2] https://issues.apache.org/jira/browse/SPARK-3789



  was:
Many users use python to install their project. It's not easy for them to use 
carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's 
better to provide python SDK for CarbonData

For pyspark, they used py4j for python invoke java code:
![](http://i.imgur.com/YlI8AqEl.png)

https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf

Please refer:
# [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
# [2] https://issues.apache.org/jira/browse/SPARK-3789




> CarbonData provides Python SDK
> 
>
> Key: CARBONDATA-3271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
> Project: CarbonData
>  Issue Type: New Feature
>Affects Versions: 1.5.1
>    Reporter: xubo245
>Priority: Major
>
> Many users build their projects in Python, and it's not easy for them to use 
> Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK 
> for its users, so it's better for CarbonData to provide one as well.
> For PySpark, py4j is used to invoke Java code from Python:
> ![image](http://i.imgur.com/YlI8AqEl.png)
> https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf
> Please refer:
> # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
> # [2] https://issues.apache.org/jira/browse/SPARK-3789





[jira] [Created] (CARBONDATA-3271) CarbonData provides Python SDK

2019-01-24 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3271:
---

 Summary: CarbonData provides Python SDK
 Key: CARBONDATA-3271
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3271
 Project: CarbonData
  Issue Type: New Feature
Affects Versions: 1.5.1
Reporter: xubo245


Many users build their projects in Python, and it's not easy for them to use 
Carbon through the Java/Scala/C++ APIs. Spark already provides a Python SDK for 
its users, so it's better for CarbonData to provide one as well.






[jira] [Closed] (CARBONDATA-3252) Remove unused import and optimize the import order

2019-01-24 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 closed CARBONDATA-3252.
---
Resolution: Fixed

> Remove  unused import and optimize the import order
> ---
>
> Key: CARBONDATA-3252
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3252
> Project: CarbonData
>  Issue Type: Bug
>    Reporter: xubo245
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Remove unused import and fix some spelling errors:
> * org.apache.carbondata.spark.testsuite.badrecordloger.BadRecordLoggerTest: 
> remove CarbonLoadOptionConstants in line 27
> * org.apache.carbondata.spark.testsuite.directdictionary.TimestampNoDictionaryColumnTestCase: 
> remove lines 23 and 26





[jira] [Created] (CARBONDATA-3255) Support binary data type

2019-01-17 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3255:
---

 Summary: Support binary data type
 Key: CARBONDATA-3255
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3255
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


Support binary data type





[jira] [Created] (CARBONDATA-3254) Support write and read image in CarbonData

2019-01-16 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3254:
---

 Summary: Support write and read image in CarbonData
 Key: CARBONDATA-3254
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3254
 Project: CarbonData
  Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245


Support write and read image in CarbonData







[jira] [Updated] (CARBONDATA-3252) Remove unused import and optimize the import order

2019-01-16 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3252:

Summary: Remove  unused import and optimize the import order  (was: Remove  
unused import and fix some spell error)

> Remove  unused import and optimize the import order
> ---
>
> Key: CARBONDATA-3252
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3252
> Project: CarbonData
>  Issue Type: Bug
>    Reporter: xubo245
>Priority: Major
>
> Remove unused import and fix some spelling errors:
> * org.apache.carbondata.spark.testsuite.badrecordloger.BadRecordLoggerTest: 
> remove CarbonLoadOptionConstants in line 27
> * org.apache.carbondata.spark.testsuite.directdictionary.TimestampNoDictionaryColumnTestCase: 
> remove lines 23 and 26





[jira] [Created] (CARBONDATA-3252) Remove unused import and fix some spelling errors

2019-01-16 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3252:
---

 Summary: Remove unused import and fix some spelling errors
 Key: CARBONDATA-3252
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3252
 Project: CarbonData
  Issue Type: Bug
Reporter: xubo245


Remove unused import and fix some spelling errors:

* org.apache.carbondata.spark.testsuite.badrecordloger.BadRecordLoggerTest: 
remove CarbonLoadOptionConstants in line 27
* org.apache.carbondata.spark.testsuite.directdictionary.TimestampNoDictionaryColumnTestCase: 
remove lines 23 and 26





[jira] [Created] (CARBONDATA-3251) Fix spark-2.1 UT errors

2019-01-14 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3251:
---

 Summary: Fix spark-2.1 UT errors
 Key: CARBONDATA-3251
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3251
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 1.5.1
Reporter: xubo245
Assignee: xubo245


Fix spark-2.1 UT errors





[jira] [Updated] (CARBONDATA-3250) Optimize hive

2019-01-14 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3250:

Description: 
Optimize hive, including hive doc and code
1. running command

{code:java}
-DskipTests -Pspark-2.1 -Phadoop-2.7.2 clean package
{code}

warning:
{code:java}
[WARNING] The requested profile "hadoop-2.7.2" could not be activated because 
it does not exist.

{code}

  was:Optimize hive, including hive doc and code


> Optimize hive
> -
>
> Key: CARBONDATA-3250
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3250
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 1.5.1
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> Optimize hive, including hive doc and code
> 1. running command
> {code:java}
> -DskipTests -Pspark-2.1 -Phadoop-2.7.2 clean package
> {code}
> warning:
> {code:java}
> [WARNING] The requested profile "hadoop-2.7.2" could not be activated because 
> it does not exist.
> {code}





[jira] [Created] (CARBONDATA-3250) Optimize hive

2019-01-14 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3250:
---

 Summary: Optimize hive
 Key: CARBONDATA-3250
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3250
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 1.5.1
Reporter: xubo245
Assignee: xubo245


Optimize hive, including hive doc and code





[jira] [Updated] (CARBONDATA-3249) SQL and SDK float value is different

2019-01-14 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3249:

Description: 
SQL and the SDK return different float values.

Code: it's from 
https://github.com/xubo245/carbondata/commit/537c7265cc4bd755c073501773a523722709338a

{code:java}
  test("test float") {
    val path = FileFactory.getPath(warehouse + "/sdk1").toString
    FileFactory.deleteAllFilesOfDir(new File(warehouse + "/sdk1"))
    sql("drop table if exists carbon_float")
    val fields: Array[Field] = new Array[Field](1)
    fields(0) = new Field("b", DataTypes.FLOAT)

    try {
      val builder = CarbonWriter.builder()
      val writer = builder.outputPath(path)
        .uniqueIdentifier(System.nanoTime()).withBlockSize(2)
        .withCsvInput(new Schema(fields))
        .writtenBy("SparkCarbonDataSourceTest")
        .build()

      // write a single row whose float column cannot be represented exactly
      var i = 0
      while (i < 1) {
        val array = Array[String]("2147483648.1")
        writer.write(array)
        i += 1
      }
      writer.close()

      // read the value back through the SDK reader
      val reader = CarbonReader.builder(path, "_temp").build
      i = 0
      var floatValueSDK: Float = 0
      while (i < 20 && reader.hasNext) {
        val row = reader.readNextRow.asInstanceOf[Array[AnyRef]]
        println("SDK float value is: " + row(0))
        floatValueSDK = row(0).asInstanceOf[Float]
        i += 1
      }
      reader.close()

      // read the same value back through SQL and compare with the SDK value
      sql("create table carbon_float(floatField float) stored as carbondata")
      sql("insert into carbon_float values('2147483648.1')")
      val df = sql("select * from carbon_float").collect()
      println("CarbonSession float value is: " + df(0))
      assert(df(0).equals(floatValueSDK))
    } catch {
      case ex: Exception => throw new RuntimeException(ex)
    } finally {
      sql("drop table if exists carbon_float")
      FileFactory.deleteAllFilesOfDir(new File(warehouse + "/sdk1"))
    }
  }

{code}


Exception:

{code:java}
SDK float value is: 2.14748365E9
2019-01-14 18:15:24 AUDIT audit:72 - {"time":"January 14, 2019 2:15:24 AM PST","username":"xubo","opName":"CREATE TABLE","opId":"26423231368673","opStatus":"START"}
2019-01-14 18:15:24 AUDIT audit:93 - {"time":"January 14, 2019 2:15:24 AM PST","username":"xubo","opName":"CREATE TABLE","opId":"26423231368673","opStatus":"SUCCESS","opTime":"604 ms","table":"default.carbon_float","extraInfo":{"bad_record_path":"","local_dictionary_enable":"true","external":"false","sort_columns":"","comment":""}}
2019-01-14 18:15:24 AUDIT audit:72 - {"time":"January 14, 2019 2:15:24 AM PST","username":"xubo","opName":"INSERT INTO","opId":"26424137339770","opStatus":"START"}
2019-01-14 18:15:26 AUDIT audit:93 - {"time":"January 14, 2019 2:15:26 AM PST","username":"xubo","opName":"INSERT INTO","opId":"26424137339770","opStatus":"SUCCESS","opTime":"1479 ms","table":"default.carbon_float","extraInfo":{"SegmentId":"0","DataSize":"408.0B","IndexSize":"254.0B"}}
CarbonSession float value is: [2.1474836481E9]
2019-01-14 18:15:26 AUDIT audit:72 - {"time":"January 14, 2019 2:15:26 AM PST","username":"xubo","opName":"DROP TABLE","opId":"26425973212561","opStatus":"START"}
2019-01-14 18:15:27 AUDIT audit:93 - {"time":"January 14, 2019 2:15:27 AM PST","username":"xubo","opName":"DROP TABLE","opId":"26425973212561","opStatus":"SUCCESS","opTime":"393 ms","table":"default.carbon_float","extraInfo":{}}

org.scalatest.exceptions.TestFailedException: df.apply(0).equals(floatValueSDK) was false
java.lang.RuntimeException: org.scalatest.exceptions.TestFailedE
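The value mismatch in the log comes down to 32-bit versus 64-bit IEEE-754 precision: 2147483648.1 rounds to exactly 2^31 when stored as a 32-bit float (the SDK's FLOAT path), while a 64-bit double keeps the fractional part (the SQL path). A minimal Python sketch, independent of CarbonData, reproduces the two renderings:

```python
import struct

value = 2147483648.1  # the literal written through both the SDK and SQL paths

# Round-trip through 32-bit IEEE-754, as a FLOAT column stores it
as_float32 = struct.unpack('f', struct.pack('f', value))[0]

print(as_float32)  # 2147483648.0 -> rendered as 2.14748365E9 by the SDK
print(value)       # 2147483648.1 -> rendered as 2.1474836481E9 by SQL
```

So even before the `equals` comparison, the two read paths cannot return bit-identical values for this literal unless both sides agree on a single precision.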
