[jira] [Created] (CARBONDATA-3470) Upgrade arrow version
xubo245 created CARBONDATA-3470:

Summary: Upgrade arrow version
Key: CARBONDATA-3470
URL: https://issues.apache.org/jira/browse/CARBONDATA-3470
Project: CarbonData
Issue Type: Improvement
Reporter: xubo245
Assignee: xubo245

Upgrade arrow version

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (CARBONDATA-3443) Update hive guide with Read from hive
[ https://issues.apache.org/jira/browse/CARBONDATA-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3443.
Resolution: Fixed

> Update hive guide with Read from hive
> Key: CARBONDATA-3443
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3443
> Project: CarbonData
> Issue Type: Improvement
> Reporter: dhatchayani
> Priority: Minor
> Time Spent: 5h 10m
> Remaining Estimate: 0h
[jira] [Created] (CARBONDATA-3461) Carbon SDK support filter values set.
xubo245 created CARBONDATA-3461:

Summary: Carbon SDK support filter values set.
Key: CARBONDATA-3461
URL: https://issues.apache.org/jira/browse/CARBONDATA-3461
Project: CarbonData
Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245

Carbon SDK support filter values set.
[jira] [Closed] (CARBONDATA-3398) Implement Show Cache for IndexServer and MV
[ https://issues.apache.org/jira/browse/CARBONDATA-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 closed CARBONDATA-3398.

> Implement Show Cache for IndexServer and MV
> Key: CARBONDATA-3398
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3398
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Kunal Kapoor
> Assignee: Kunal Kapoor
> Priority: Major
> Fix For: 1.6.0
> Time Spent: 25h 10m
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-3412) For Non-transactional tables empty results are displayed with index server enabled
[ https://issues.apache.org/jira/browse/CARBONDATA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3412.
Resolution: Fixed

> For Non-transactional tables empty results are displayed with index server enabled
> Key: CARBONDATA-3412
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3412
> Project: CarbonData
> Issue Type: Bug
> Reporter: Kunal Kapoor
> Assignee: Kunal Kapoor
> Priority: Major
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Created] (CARBONDATA-3446) Support read schema of complex data type from carbon file folder path
xubo245 created CARBONDATA-3446:

Summary: Support read schema of complex data type from carbon file folder path
Key: CARBONDATA-3446
URL: https://issues.apache.org/jira/browse/CARBONDATA-3446
Project: CarbonData
Issue Type: New Feature
Reporter: xubo245
Assignee: xubo245

Support read schema of complex data type from carbon file folder path
[jira] [Resolved] (CARBONDATA-3425) Add Documentation for MV datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3425.
Resolution: Fixed

> Add Documentation for MV datamap
> Key: CARBONDATA-3425
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3425
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Indhumathi Muthumurugesh
> Priority: Minor
> Time Spent: 8h
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-3415) Merge index is not working for partition table. Merge index for partition table is taking significantly longer time than normal table.
[ https://issues.apache.org/jira/browse/CARBONDATA-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3415.
Resolution: Fixed

> Merge index is not working for partition table. Merge index for partition table is taking significantly longer time than normal table.
> Key: CARBONDATA-3415
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3415
> Project: CarbonData
> Issue Type: Bug
> Reporter: dhatchayani
> Priority: Major
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> Issues:
> (1) Merge index is not working on partition tables.
> (2) The time taken for merge index is significantly longer than for a normal carbon table.
[jira] [Resolved] (CARBONDATA-3434) Fix Data Mismatch between MainTable and MV DataMap table during compaction
[ https://issues.apache.org/jira/browse/CARBONDATA-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3434.
Resolution: Fixed

> Fix Data Mismatch between MainTable and MV DataMap table during compaction
> Key: CARBONDATA-3434
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3434
> Project: CarbonData
> Issue Type: Bug
> Reporter: Indhumathi Muthumurugesh
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-3258) Add more test case for mv datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3258.
Resolution: Fixed

> Add more test case for mv datamap
> Key: CARBONDATA-3258
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3258
> Project: CarbonData
> Issue Type: Test
> Components: data-query
> Reporter: Chenjian Qiu
> Priority: Major
> Time Spent: 11h 20m
> Remaining Estimate: 0h
>
> Add more test cases for the mv datamap.
[jira] [Resolved] (CARBONDATA-3414) When insert into a partition table fails, the exception doesn't print the reason.
[ https://issues.apache.org/jira/browse/CARBONDATA-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3414.
Resolution: Fixed

> When insert into a partition table fails, the exception doesn't print the reason.
> Key: CARBONDATA-3414
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3414
> Project: CarbonData
> Issue Type: Bug
> Reporter: Ajantha Bhat
> Priority: Minor
>
> Problem: when an insert into a partition table fails, the exception doesn't print the reason.
> Cause: the exception was caught, but the error message was not taken from that exception.
> Solution: throw the exception directly.
>
> Steps to reproduce:
> 1. Open multiple spark beeline sessions (say 10).
> 2. Create a carbon table with a partition.
> 3. Insert overwrite into the carbon table from all 10 beeline sessions concurrently.
> 4. Some insert overwrites succeed and some fail due to non-availability of the lock, even after retry.
> 5. For the failed insert SQL, the exception is just "DataLoadFailure: "; no error reason is printed.
> The valid error reason for the failure needs to be printed.
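The solution above, rethrowing instead of substituting a generic message, can be sketched in plain Java. This is a minimal illustration with hypothetical class and method names, not CarbonData's actual load path:

```java
// Hypothetical sketch of the fix: propagate the original failure reason
// instead of swallowing it behind a generic "DataLoadFailure:" message.
public class LoadErrorPropagation {

    static class DataLoadException extends RuntimeException {
        DataLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Before the fix: the caught exception's message is discarded,
    // so the caller only ever sees "DataLoadFailure: ".
    static void loadBefore() {
        try {
            acquireSegmentLock();
        } catch (Exception e) {
            throw new DataLoadException("DataLoadFailure: ", null);
        }
    }

    // After the fix: the original exception travels up as the cause,
    // and its message becomes part of what the caller sees.
    static void loadAfter() {
        try {
            acquireSegmentLock();
        } catch (Exception e) {
            throw new DataLoadException("DataLoadFailure: " + e.getMessage(), e);
        }
    }

    // Stand-in for the lock acquisition that fails under concurrency.
    static void acquireSegmentLock() {
        throw new IllegalStateException("Table is locked by another insert overwrite");
    }

    public static void main(String[] args) {
        try {
            loadAfter();
        } catch (DataLoadException e) {
            // The valid error reason is now part of the message.
            System.out.println(e.getMessage());
        }
    }
}
```

Chaining the cause also preserves the original stack trace, which matters when the failure happens deep inside concurrent load retries.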
[jira] [Resolved] (CARBONDATA-3411) ClearDatamaps logs an exception in SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3411.
Resolution: Fixed

> ClearDatamaps logs an exception in SDK
> Key: CARBONDATA-3411
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3411
> Project: CarbonData
> Issue Type: Bug
> Reporter: Ajantha Bhat
> Priority: Minor
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>
> Problem: in the SDK, when datamaps are cleared, the exception below is logged:
> java.io.IOException: File does not exist: /home/root1/Documents/ab/workspace/carbonFile/carbondata/store/sdk/testWriteFiles/771604793030370/Metadata/schema
>   at org.apache.carbondata.core.metadata.schema.SchemaReader.readCarbonTableFromStore(SchemaReader.java:60)
>   at org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildFromTablePath(CarbonTable.java:272)
>   at org.apache.carbondata.core.datamap.DataMapStoreManager.getCarbonTable(DataMapStoreManager.java:566)
>   at org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:514)
>   at org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:504)
>   at org.apache.carbondata.sdk.file.CarbonReaderBuilder.getSplits(CarbonReaderBuilder.java:419)
>   at org.apache.carbondata.sdk.file.CarbonReaderTest.testGetSplits(CarbonReaderTest.java:2605)
>   ... (JUnit and IDE runner frames omitted)
> Cause: a CarbonTable is required only for launching a job; the SDK does not launch jobs, so there is no need to build a CarbonTable here.
> Solution: build the CarbonTable only when a job needs to be launched.
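The cause/solution above amounts to lazy initialization: build the expensive CarbonTable only on the code path that actually launches a job. A stdlib-only sketch, with illustrative names rather than CarbonData's API:

```java
import java.util.function.Supplier;

// Illustrative sketch: defer building an expensive object (like a table
// schema read from disk) until the code path that launches a job needs it.
public class LazyTable {

    private final Supplier<String> tableBuilder;
    private String table; // built on demand, cached afterwards

    LazyTable(Supplier<String> tableBuilder) {
        this.tableBuilder = tableBuilder;
    }

    // Clearing caches must not force a table build; an SDK reader that
    // never launches a job should never pay the schema-read cost.
    void clearDataMaps() {
        // operate on cached datamap entries only; no table construction here
    }

    // Only job launch actually needs the table, so build it lazily.
    String getTableForJobLaunch() {
        if (table == null) {
            table = tableBuilder.get();
        }
        return table;
    }

    public static void main(String[] args) {
        int[] builds = {0};
        LazyTable t = new LazyTable(() -> { builds[0]++; return "carbonTable"; });
        t.clearDataMaps();
        System.out.println(builds[0]); // 0 - clearing triggered no build
        t.getTableForJobLaunch();
        t.getTableForJobLaunch();      // reuses the cached instance
        System.out.println(builds[0]); // 1 - built exactly once
    }
}
```

With this shape, the IOException from the missing schema file can only occur when a job launch genuinely requires the table, not as a side effect of clearing datamaps.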
[jira] [Created] (CARBONDATA-3424) Improper exception when querying with avg(substr(binary data type))
xubo245 created CARBONDATA-3424:

Summary: Improper exception when querying with avg(substr(binary data type))
Key: CARBONDATA-3424
URL: https://issues.apache.org/jira/browse/CARBONDATA-3424
Project: CarbonData
Issue Type: Bug
Reporter: xubo245
Assignee: xubo245

Code:
{code:java}
CREATE TABLE uniqdata (CUST_ID int, CUST_NAME binary, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp,
  BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),
  Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int)
STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='2000');

LOAD DATA inpath 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata
OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE',
  'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
{code}

A select query with the average function over a substring of the binary column is then executed:

{code:java}
select max(substr(CUST_NAME,1,2)), min(substr(CUST_NAME,1,2)), avg(substr(CUST_NAME,1,2)), count(substr(CUST_NAME,1,2)),
  sum(substr(CUST_NAME,1,2)), variance(substr(CUST_NAME,1,2))
from uniqdata
where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 = 1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458
  or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

select max(substring(CUST_NAME,1,2)), min(substring(CUST_NAME,1,2)), avg(substring(CUST_NAME,1,2)), count(substring(CUST_NAME,1,2)),
  sum(substring(CUST_NAME,1,2)), variance(substring(CUST_NAME,1,2))
from uniqdata
where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 = 1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458
  or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;
{code}

Improper exception:

{code:java}
"Invalid call to name on unresolved object, tree: unresolvedalias(avg(substring(CUST_NAME#73, 1, 2)), None)"
did not contain
"cannot resolve 'avg(substring(uniqdata.`CUST_NAME`, 1, 2))' due to data type mismatch: function average requires numeric types, not BinaryType"

ScalaTestFailureLocation: org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27 at (TestBinaryDataType.scala:1410)
org.scalatest.exceptions.TestFailedException: "Invalid call to name on unresolved object, tree: unresolvedalias(avg(substring(CUST_NAME#73, 1, 2)), None)" did not contain "cannot resolve 'avg(substring(uniqdata.`CUST_NAME`, 1, 2))' due to data type mismatch: function average requires numeric types, not BinaryType"
  at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
  at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
  at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
  at org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27.apply$mcV$sp(TestBinaryDataType.scala:1410)
  at org.apache.carbondata.integration.spark.testsuite.binary.TestBinaryDataType$$anonfun$27.apply(TestBinaryDataType.scala:1352)
  ... (remaining ScalaTest framework frames omitted)
{code}
[jira] [Created] (CARBONDATA-3423) Validate dictionary for binary data type
xubo245 created CARBONDATA-3423:

Summary: Validate dictionary for binary data type
Key: CARBONDATA-3423
URL: https://issues.apache.org/jira/browse/CARBONDATA-3423
Project: CarbonData
Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245
[jira] [Commented] (CARBONDATA-3410) Add UDF, Hex/Base64 SQL functions for binary
[ https://issues.apache.org/jira/browse/CARBONDATA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852972#comment-16852972 ]

xubo245 commented on CARBONDATA-3410:

CREATE TABLE uniqdata (CUST_ID int, CUST_NAME binary, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp,
  BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),
  Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int)
STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='2000');

LOAD DATA inpath 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata
OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE',
  'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

A select query with the average function over a substring of the binary column is executed:

select max(substr(CUST_NAME,1,2)), min(substr(CUST_NAME,1,2)), avg(substr(CUST_NAME,1,2)), count(substr(CUST_NAME,1,2)),
  sum(substr(CUST_NAME,1,2)), variance(substr(CUST_NAME,1,2))
from uniqdata
where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 = 1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458
  or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

select max(substring(CUST_NAME,1,2)), min(substring(CUST_NAME,1,2)), avg(substring(CUST_NAME,1,2)), count(substring(CUST_NAME,1,2)),
  sum(substring(CUST_NAME,1,2)), variance(substring(CUST_NAME,1,2))
from uniqdata
where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 = 1233720368578 or DECIMAL_COLUMN1 = 12345678901.123458
  or Double_COLUMN1 = 1.12345674897976E10 or INTEGER_COLUMN1 IS NULL limit 10;

> Add UDF, Hex/Base64 SQL functions for binary
> Key: CARBONDATA-3410
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3410
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
>
> Add UDF, Hex/Base64 SQL functions for binary
[jira] [Created] (CARBONDATA-3410) Add UDF, Hex/Base64 SQL functions for binary
xubo245 created CARBONDATA-3410:

Summary: Add UDF, Hex/Base64 SQL functions for binary
Key: CARBONDATA-3410
URL: https://issues.apache.org/jira/browse/CARBONDATA-3410
Project: CarbonData
Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245

Add UDF, Hex/Base64 SQL functions for binary
[jira] [Resolved] (CARBONDATA-3351) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3351.
Resolution: Fixed

> Support Binary Data Type
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
> Time Spent: 35h
> Remaining Estimate: 0h
>
> Background:
> Binary is a basic data type and is widely used in various scenarios, so it is better to support the binary data type in CarbonData. Downloading data from S3 is slow when a dataset has lots of small binary files. The majority of application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.
>
> Goals:
> 1. Support writing the binary data type through the Carbon Java SDK.
> 2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
> 3. Support reading the binary data type through the Carbon SDK.
> 4. Support writing binary through Spark.
>
> Approach and detail:
> 1. Supporting write binary data type by Carbon Java SDK [Formal]:
> 1.1 The Java SDK needs to support writing data with specific data types, like int, double and byte[], with no need to convert every type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
> 1.2 CarbonData compresses the binary column, because the compressor is currently table level. => Done
> 1.3 CarbonData stores binary as a dimension. => Done
> 1.4 Support a configurable page size for the binary data type, because binary data is usually large (such as 200 KB); otherwise one blocklet (32000 rows) becomes very big. => Done
> 1.5 Avro and JSON conversion need consideration. AVRO fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need. Support reading binary from JSON => Done
> 1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4.
> 1.7 Verify the maximum size of binary value supported => snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation.
>
> 2. Supporting read and manage binary data type by Spark Carbon file format (carbon datasource) and CarbonSession [Formal]:
> 2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[]. => Done
> 2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE and RANGE_COLUMN are not supported for binary columns. => Done. The Carbon datasource doesn't support dictionary-include columns. carbon.column.compressor = snappy/zstd/gzip is supported for binary; compression applies to all columns (table level).
> 2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
> 2.4 Support external tables for binary => Done
> 2.5 Support projection for binary columns => Done
> 2.6 Support desc formatted => Done. The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL; ALTER TABLE (add column, rename, drop column) for the binary data type is supported in CarbonSession => Done. Changing the data type of a binary column via ALTER TABLE is not supported => Done
> 2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
> 2.8 Support compaction for binary => Done
> 2.9 Datamaps: don't support bloomfilter, lucene or timeseries datamaps; no need for a min/max datamap for binary; support mv and pre-aggregate in the future => TODO
> 2.10 CSDK / Python SDK will support binary in the future => TODO
> 2.11 Support S3 => Done
> 2.12 Support UDFs: hex, base64, cast (select hex(bin) from carbon_table) => TODO
> 2.15 Support filters for binary => Done
> 2.16 select CAST(s AS BINARY) from carbon_table => Done
>
> 3. Supporting read binary data type by Carbon SDK:
> 3.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[]. => Done
> 3.2 Support projection for binary columns => Done
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 updated CARBONDATA-3336:
Description:

CarbonData supports binary data type.

Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10

Background:
Binary is a basic data type and is widely used in various scenarios, so it is better to support the binary data type in CarbonData. Downloading data from S3 is slow when a dataset has lots of small binary files. The majority of application scenarios store small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.

Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary through Spark.

Approach and detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, like int, double and byte[], with no need to convert every type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
1.2 CarbonData compresses the binary column, because the compressor is currently table level. => Done. => TODO: support configuration for compress and no-compress, defaulting to no compress, because binary data is usually already compressed (like jpg images), so there is no need to compress the binary column again. 1.5.4 will support column-level compression; after that, we can implement no-compress for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support a configurable page size for the binary data type, because binary data is usually large (such as 200 KB); otherwise one blocklet (32000 rows) becomes very big. => Done
1.5 Avro and JSON conversion need consideration. AVRO fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need. Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4.
1.7 Verify the maximum size of binary value supported => snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation.

2. Supporting read and manage binary data type by Spark Carbon file format (carbon datasource) and CarbonSession [Formal]:
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[]. => Done
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE and RANGE_COLUMN are not supported for binary columns. => Done. The Carbon datasource doesn't support dictionary-include columns. carbon.column.compressor = snappy/zstd/gzip is supported for binary; compression applies to all columns (table level).
2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
2.4 Support external tables for binary => Done
2.5 Support projection for binary columns => Done
2.6 Support desc formatted => Done. The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL; ALTER TABLE (add column, rename, drop column) for the binary data type is supported in CarbonSession => Done. Changing the data type of a binary column via ALTER TABLE is not supported => Done
2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
2.8 Support compaction for binary => Done
2.9 Datamaps: support bloomfilter, mv and pre-aggregate; don't support lucene or timeseries datamaps; no need for a min/max datamap for binary => Done
2.10 CSDK / Python SDK will support binary in the future => TODO; the Python SDK has already been merged into pycarbon.
2.11 Support S3 => Done
2.12 Support UDFs: hex, base64, cast (select hex(bin) from carbon_table) => TODO
2.13 Support configurable decode for queries, supporting base64 and hex decode => Done
2.15 How large a binary value can be supported for writing and reading? => TODO
2.16 Support filters for binary => Done
2.17 select CAST(s AS BINARY) from carbon_table => Done
[jira] [Created] (CARBONDATA-3408) CarbonSession partition support binary data type
xubo245 created CARBONDATA-3408:

Summary: CarbonSession partition support binary data type
Key: CARBONDATA-3408
URL: https://issues.apache.org/jira/browse/CARBONDATA-3408
Project: CarbonData
Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245

CarbonSession partition support binary data type
[jira] [Resolved] (CARBONDATA-3366) Support SDK reader to read blocklet level split
[ https://issues.apache.org/jira/browse/CARBONDATA-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xubo245 resolved CARBONDATA-3366.
Resolution: Fixed

> Support SDK reader to read blocklet level split
> Key: CARBONDATA-3366
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3366
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ajantha Bhat
> Priority: Major
> Time Spent: 11h 20m
> Remaining Estimate: 0h
>
> To provide more flexibility in the SDK reader, blocklet-level read support for carbondata files from the SDK reader is required.
> With this, the SDK reader can be used in a distributed or multithreaded environment by creating carbon readers in each worker at split (blocklet) level.
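The distributed/multithreaded usage described above, one reader per worker at blocklet-split level, can be sketched with plain Java concurrency primitives. The Split and reader stand-ins below are illustrative, not the CarbonData SDK's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: fan blocklet-level splits out to a worker pool,
// with one reader invocation per split, mirroring how an SDK reader
// could be used in a multithreaded environment.
public class SplitLevelRead {

    // Stand-in for a blocklet split: a contiguous range of row ids.
    static class Split {
        final int startRow, endRow;
        Split(int startRow, int endRow) { this.startRow = startRow; this.endRow = endRow; }
    }

    // Stand-in reader: "reads" the rows of its split and returns a count.
    static int readSplit(Split split) {
        int rows = 0;
        for (int row = split.startRow; row < split.endRow; row++) {
            rows++; // a real reader would materialize the row here
        }
        return rows;
    }

    // Submit one read task per split and total the rows read.
    static int readAll(List<Split> splits, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Split s : splits) {
                results.add(pool.submit(() -> readSplit(s)));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // One blocklet holds up to 32000 rows, so two splits cover 50000 rows.
        List<Split> splits = List.of(new Split(0, 32000), new Split(32000, 50000));
        System.out.println(readAll(splits, 2)); // 50000
    }
}
```

Because each split is independent, the same pattern scales from threads in one process to workers in a cluster: distribute the split descriptions, build a reader per split, and merge the results.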
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336:
Description:
CarbonData supports binary data type
Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10
Background:
Binary is a basic data type that is widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. The majority of application scenarios involve storing small binary values in CarbonData, which avoids the small-binary-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.
Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary through Spark.
Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, like int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done
=> TODO: support configuration for compress and no compress, defaulting to no compress, because binary data is usually already compressed (e.g. JPG images), so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that we can implement no-compress for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support a configurable page size for the binary data type, because binary values are usually big (such as 200 KB); otherwise one blocklet (32,000 rows) becomes very large. => Done
1.5 Avro and JSON conversion need consideration:
• Avro fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need
• Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4
1.7 Verify the maximum supported size of a binary value => Snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation
2. Supporting read and manage binary data type by Spark Carbon file format (carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[] => Done
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for binary columns => Done
=> The Carbon datasource doesn't support dictionary-include columns
=> Support carbon.column.compressor = snappy, zstd, gzip for binary; compression applies to all columns (table level)
2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
2.4 Support external tables for binary => Done
2.5 Support projection for binary columns => Done
2.6 Support DESC FORMATTED => Done
=> The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
Support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession => Done
Changing the data type of a binary column via ALTER TABLE is not supported => Done
2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
2.8 Support compaction for binary => Done
2.9 Datamaps: support bloomfilter, MV, and pre-aggregate; don't support lucene or timeseries datamaps; no need for a min/max datamap for binary => Done
2.10 CSDK / Python SDK will support binary in the future => TODO
2.11 Support S3 => Done
2.12 Support UDFs: hex, base64, cast => TODO; select hex(bin) from carbon_table => TODO
2.13 Support configurable decode for query; support base64 and hex decode => Done
2.15 How big a binary value can be supported for writing and reading? => TODO
2.16 Support filters for binary => Done
2.17 select CAST(s AS BINARY) from carbon_table => Done
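Item 1.4's motivation can be checked with quick arithmetic: at the default 32,000 rows per blocklet, 200 KB binary values would put several gigabytes in a single blocklet. A small sketch of that estimate (the class name is illustrative):

```java
// Back-of-envelope check of why binary columns need a configurable page
// size: default rows-per-blocklet times a typical binary value size.
public class BlockletSizeEstimate {
    public static long blockletBytes(long rowsPerBlocklet, long bytesPerValue) {
        return rowsPerBlocklet * bytesPerValue;
    }

    public static void main(String[] args) {
        // 32,000 rows x 200 KB per value
        long bytes = blockletBytes(32_000, 200 * 1024L);
        System.out.println(bytes / (1024.0 * 1024 * 1024) + " GB"); // ~6.1 GB
    }
}
```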
[jira] [Created] (CARBONDATA-3374) Optimize documentation and fix some spell errors.
xubo245 created CARBONDATA-3374: --- Summary: Optimize documentation and fix some spell errors. Key: CARBONDATA-3374 URL: https://issues.apache.org/jira/browse/CARBONDATA-3374 Project: CarbonData Issue Type: Improvement Reporter: xubo245 Assignee: xubo245 Optimize documentation and fix some spell errors.
[jira] [Created] (CARBONDATA-3363) SDK supports read data from carbondata filelist
xubo245 created CARBONDATA-3363: --- Summary: SDK supports read data from carbondata filelist Key: CARBONDATA-3363 URL: https://issues.apache.org/jira/browse/CARBONDATA-3363 Project: CarbonData Issue Type: New Feature Reporter: xubo245 Assignee: xubo245 SDK supports read data from carbondata filelist
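An application handing a file list to the SDK reader would typically first filter a directory listing down to the .carbondata files, much as the SDK test code elsewhere in this thread does with a FilenameFilter. A minimal sketch — the class and method names are illustrative, not the SDK's API:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: keep only the .carbondata data files from a directory listing
// before passing the list to a reader. Index and other files are skipped.
public class CarbonFileList {
    public static List<String> carbonDataFiles(List<String> names) {
        return names.stream()
            .filter(n -> n != null && n.endsWith("carbondata"))
            .collect(Collectors.toList());
    }
}
```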
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336:
Description:
CarbonData supports binary data type
Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10
Background:
Binary is a basic data type that is widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. The majority of application scenarios involve storing small binary values in CarbonData, which avoids the small-binary-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.
Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary through Spark.
Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, like int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done
=> TODO: support configuration for compress, defaulting to no compress, because binary data is usually already compressed (e.g. JPG images), so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that we can implement no-compress for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support a configurable page size for the binary data type, because binary values are usually big (such as 200 KB); otherwise one blocklet (32,000 rows) becomes very large. => Done
1.5 Avro and JSON conversion need consideration:
• Avro fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need
• Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4
1.7 Verify the maximum supported size of a binary value => Snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation
2. Supporting read and manage binary data type by Spark Carbon file format (carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[] => Done
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for binary columns => Done
=> The Carbon datasource doesn't support dictionary-include columns
=> Support carbon.column.compressor = snappy, zstd, gzip for binary; compression applies to all columns (table level)
2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
2.4 Support external tables for binary => Done
2.5 Support projection for binary columns => Done
2.6 Support DESC FORMATTED => Done
=> The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
Support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession => Done
Changing the data type of a binary column via ALTER TABLE is not supported => Done
2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
2.8 Support compaction for binary => Done
2.9 Datamaps: support bloomfilter, MV, and pre-aggregate; don't support lucene or timeseries datamaps; no need for a min/max datamap for binary => Done
2.10 CSDK / Python SDK will support binary in the future => TODO
2.11 Support S3 => Done
2.12 Support UDFs: hex, base64, cast => TODO; select hex(bin) from carbon_table => TODO
2.13 Support configurable decode for query; support base64 and hex decode => Done
2.14 Provide a proper error message for unsupported features like SI => TODO
2.15 How big a binary value can be supported for writing and reading? => TODO
2.16 Support filters for binary => Done
2.17
[jira] [Updated] (CARBONDATA-3358) Support configurable decode for loading binary data, support base64 and Hex decode.
[ https://issues.apache.org/jira/browse/CARBONDATA-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3358:
Description:
Support configurable decode for loading binary data; support base64 and hex decode.
1. Support configurable decode for loading.
2. Test datamaps.
3. Test datamaps together with configurable decode.
was: Support configurable decode for loading binary data, support base64 and Hex decode.
> Support configurable decode for loading binary data, support base64 and Hex decode.
> Key: CARBONDATA-3358
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3358
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
>
> Support configurable decode for loading binary data; support base64 and hex decode.
> 1. Support configurable decode for loading.
> 2. Test datamaps.
> 3. Test datamaps together with configurable decode.
[jira] [Updated] (CARBONDATA-3358) Support configurable decode for loading binary data, support base64 and Hex decode.
[ https://issues.apache.org/jira/browse/CARBONDATA-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3358:
Description:
Support configurable decode for loading binary data; support base64 and hex decode.
1. Support configurable decode for loading.
2. Test datamaps: mv, preaggregate, timeseries, bloomfilter, lucene.
3. Test datamaps together with configurable decode.
was:
Support configurable decode for loading binary data, support base64 and Hex decode.
1. support configurable decode for loading
2. test datamap
3. test datamap and configurable decode
> Support configurable decode for loading binary data, support base64 and Hex decode.
> Key: CARBONDATA-3358
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3358
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
>
> Support configurable decode for loading binary data; support base64 and hex decode.
> 1. Support configurable decode for loading.
> 2. Test datamaps: mv, preaggregate, timeseries, bloomfilter, lucene.
> 3. Test datamaps together with configurable decode.
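The configurable decode described in CARBONDATA-3358 boils down to selecting a string-to-bytes decoder at load time. A sketch of that idea using only the JDK — the class name and option values here are illustrative, not CarbonData's actual configuration keys:

```java
import java.util.Base64;

// Sketch: decode an incoming string into the bytes of a binary column,
// with the decode mode ("base64" or "hex") chosen by a load option.
public class BinaryDecoder {
    public static byte[] decode(String value, String mode) {
        switch (mode) {
            case "base64":
                return Base64.getDecoder().decode(value);
            case "hex":
                int len = value.length();
                byte[] out = new byte[len / 2];
                for (int i = 0; i < len; i += 2) {
                    // parse each two-character hex pair into one byte
                    out[i / 2] = (byte) Integer.parseInt(value.substring(i, i + 2), 16);
                }
                return out;
            default:
                throw new IllegalArgumentException("unknown decode mode: " + mode);
        }
    }
}
```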
[jira] [Created] (CARBONDATA-3358) Support configurable decode for loading binary data, support base64 and Hex decode.
xubo245 created CARBONDATA-3358: --- Summary: Support configurable decode for loading binary data, support base64 and Hex decode. Key: CARBONDATA-3358 URL: https://issues.apache.org/jira/browse/CARBONDATA-3358 Project: CarbonData Issue Type: Sub-task Reporter: xubo245 Assignee: xubo245 Support configurable decode for loading binary data, support base64 and Hex decode.
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336:
Description:
CarbonData supports binary data type
Version | Changes | Owner | Date
0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10
Background:
Binary is a basic data type that is widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. The majority of application scenarios involve storing small binary values in CarbonData, which avoids the small-binary-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.
Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary through Spark.
Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, like int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done
=> TODO: support configuration for compress, defaulting to no compress, because binary data is usually already compressed (e.g. JPG images), so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that we can implement no-compress for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support a configurable page size for the binary data type, because binary values are usually big (such as 200 KB); otherwise one blocklet (32,000 rows) becomes very large. => Done
1.5 Avro and JSON conversion need consideration:
• Avro fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need
• Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4
1.7 Verify the maximum supported size of a binary value => Snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation
2. Supporting read and manage binary data type by Spark Carbon file format (carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[] => Done
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for binary columns => Done
=> The Carbon datasource doesn't support dictionary-include columns
=> Support carbon.column.compressor = snappy, zstd, gzip for binary; compression applies to all columns (table level)
2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
2.4 Support external tables for binary => Done
2.5 Support projection for binary columns => Done
2.6 Support DESC FORMATTED => Done
=> The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
Support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession => Done
Changing the data type of a binary column via ALTER TABLE is not supported => Done
2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
2.8 Support compaction for binary => Done
2.9 Datamaps: support bloomfilter, MV, and pre-aggregate; don't support lucene or timeseries datamaps; no need for a min/max datamap for binary => Done
2.10 CSDK / Python SDK will support binary in the future => TODO
2.11 Support S3 => Done
2.12 Support UDFs: hex, base64, cast => TODO; select hex(bin) from carbon_table => TODO
2.13 Support configurable decode for query; support base64 and hex decode => TODO
2.14 Provide a proper error message for unsupported features like SI => TODO
2.15 How big a binary value can be supported for writing and reading? => TODO
2.16 Support filters for binary => Done
2.17
[jira] [Commented] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823822#comment-16823822 ] xubo245 commented on CARBONDATA-3336:
- Data smaller than 2 GB can be stored in CarbonData files, while files larger than 2 GB must be stored separately? Also, when writing files smaller than 2 GB into Carbon, does the SDK support streaming writes?
> Support Binary Data Type
> Key: CARBONDATA-3336
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3336
> Project: CarbonData
> Issue Type: New Feature
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
> Attachments: CarbonData support binary data type V0.1.pdf
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> CarbonData supports binary data type
> Version | Changes | Owner | Date
> 0.1 | Init doc for supporting binary data type | Xubo | 2019-4-10
> Background:
> Binary is a basic data type that is widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. The majority of application scenarios involve storing small binary values in CarbonData, which avoids the small-binary-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.
> Goals:
> 1. Support writing the binary data type through the Carbon Java SDK.
> 2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
> 3. Support reading the binary data type through the Carbon SDK.
> 4. Support writing binary through Spark.
> Approach and Detail:
> 1. Supporting write binary data type by Carbon Java SDK [Formal]:
> 1.1 The Java SDK needs to support writing data with specific data types, like int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
> 1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done
> => TODO: support configuration for compress, defaulting to no compress, because binary data is usually already compressed (e.g. JPG images), so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that we can implement no-compress for binary. We can discuss this with the community.
> 1.3 CarbonData stores binary as a dimension. => Done
> 1.4 Support a configurable page size for the binary data type, because binary values are usually big (such as 200 KB); otherwise one blocklet (32,000 rows) becomes very large. => Done
> 1.5 Avro and JSON conversion need consideration:
> • Avro fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need
> • Support reading binary from JSON => Done
> 1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4
> 1.7 Verify the maximum supported size of a binary value => Snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation
> 2. Supporting read and manage binary data type by Spark Carbon file format (carbon DataSource) and CarbonSession. [Formal]
> 2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[] => Done
> 2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for binary columns => Done
> => The Carbon datasource doesn't support dictionary-include columns
> => Support carbon.column.compressor = snappy, zstd, gzip for binary; compression applies to all columns (table level)
> 2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
> 2.4 Support external tables for binary => Done
> 2.5 Support projection for binary columns => Done
> 2.6 Support DESC FORMATTED => Done
> => The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
> Support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession => Done
> Changing the data type of a binary column via ALTER TABLE is not supported => Done
> 2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
> 2.8 Support compaction for binary => Done
> 2.9 Datamaps? Don't support bloomfilter, lucene, or timeseries datamaps; no need for a min/max datamap for binary; support mv and
[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3351:
Description:
Background:
Binary is a basic data type that is widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. The majority of application scenarios involve storing small binary values in CarbonData, which avoids the small-binary-files problem, speeds up S3 access, and decreases the cost of accessing OBS by reducing the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage.
Goals:
1. Support writing the binary data type through the Carbon Java SDK.
2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession.
3. Support reading the binary data type through the Carbon SDK.
4. Support writing binary through Spark.
Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, like int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support a configurable page size for the binary data type, because binary values are usually big (such as 200 KB); otherwise one blocklet (32,000 rows) becomes very large. => Done
1.5 Avro and JSON conversion need consideration:
• Avro fixed- and variable-length binary could be supported => Avro doesn't support the binary data type => no need
• Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct and Map => support it in the future; the priority is not very high, not in 1.5.4
1.7 Verify the maximum supported size of a binary value => Snappy only supports about 1.71 GB; the maximum data size should be 2 GB, but this needs confirmation
2. Supporting read and manage binary data type by Spark Carbon file format (carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[] => Done
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for binary columns => Done
=> The Carbon datasource doesn't support dictionary-include columns
=> Support carbon.column.compressor = snappy, zstd, gzip for binary; compression applies to all columns (table level)
2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet => Done
2.4 Support external tables for binary => Done
2.5 Support projection for binary columns => Done
2.6 Support DESC FORMATTED => Done
=> The Carbon datasource doesn't support ALTER TABLE ADD COLUMNS SQL
Support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession => Done
Changing the data type of a binary column via ALTER TABLE is not supported => Done
2.7 PARTITION and BUCKETCOLUMNS are not supported for binary => Done
2.8 Support compaction for binary => Done
2.9 Datamaps? Don't support bloomfilter, lucene, or timeseries datamaps; no need for a min/max datamap for binary; support mv and pre-aggregate in the future => TODO
2.10 CSDK / Python SDK will support binary in the future => TODO
2.11 Support S3 => Done
2.12 Support UDFs: hex, base64, cast => TODO; select hex(bin) from carbon_table => TODO
2.15 Support filters for binary => Done
2.16 select CAST(s AS BINARY) from carbon_table => Done
3. Supporting read binary data type by Carbon SDK
3.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[] => Done
3.2 Support projection for binary columns => Done
3.3 Support S3 => Done
3.4 No need to support filters => to be discussed, not in this PR
4. Supporting write binary by Spark (carbon file format / CarbonSession, POC?)
4.1 Convert binary to String and store it in CSV => Done
4.2 Spark loads the CSV, converts the string to byte[], and stores it in CarbonData; read the binary column and return it as byte[] => Done
4.3 Support insert into/update/delete for the binary data type => Done
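Steps 4.1 and 4.2 above round-trip binary values through a CSV-safe string. A minimal sketch of that encode/decode cycle using Base64 — the class and method names are illustrative; the actual load pipeline has its own converters:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Sketch: binary -> CSV-safe string before loading, string -> binary
// when the CSV is read back, verifying the bytes survive unchanged.
public class CsvBinaryRoundTrip {
    public static String toCsvField(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    public static byte[] fromCsvField(String field) {
        return Base64.getDecoder().decode(field);
    }

    public static void main(String[] args) {
        byte[] original = "jpg-bytes".getBytes(StandardCharsets.UTF_8);
        String csv = toCsvField(original);
        System.out.println(Arrays.equals(original, fromCsvField(csv))); // prints true
    }
}
```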
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336:

Description:
CarbonData supports binary data type

Version  Changes  Owner  Date
0.1  Init doc for supporting binary data type  Xubo  2019-4-10

Background:
Binary is a basic data type widely used in various scenarios, so it is better to support it in CarbonData. Downloading data from S3 will be slow when a dataset has lots of small binary values. The majority of application scenarios involve storing small binary data in CarbonData, which can avoid the small-files problem, speed up S3 access performance, and decrease the cost of accessing OBS by reducing the number of S3 API calls. It will also be easier to manage structured data and unstructured data (binary) by storing them together in CarbonData.

Goals:
1. Support writing the binary data type via the Carbon Java SDK.
2. Support reading the binary data type via the Spark Carbon file format (carbon datasource) and CarbonSession.
3. Support reading the binary data type via the Carbon SDK.
4. Support writing binary via Spark.

Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, like int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is currently table level. => Done => TODO: support a configuration option for compression; the default should be no compression because binary data is usually already compressed (e.g. JPG images), so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community.
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support configuring the page size for the binary data type because binary values are usually big, such as 200 KB; otherwise one blocklet (32000 rows) would be very large. => Done
1.5 Avro and JSON conversion need consideration: Avro fixed- and variable-length binary => Avro doesn't support the binary data type => no need. Support reading binary from JSON => done.
1.6 Binary data type as a child column in Struct and Map => support in the future, but the priority is not very high; not in 1.5.4.
1.7 Verify the maximum size of a binary value supported => snappy only supports about 1.71 GB; the max data size should be 2 GB, but this needs confirmation.

2. Supporting read and manage binary data type by Spark Carbon file format (carbon DataSource) and CarbonSession [Formal]:
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[]. => Done
2.2 Support creating a table with a binary column; table properties don't support sort_columns, dictionary, COLUMN_META_CACHE, or RANGE_COLUMN for a binary column. => Done => The Carbon datasource doesn't support dictionary-include columns => support carbon.column.compressor = snappy, zstd, gzip for binary; compression applies to all columns (table level).
2.3 Support CTAS for binary => transactional/non-transactional, Carbon/Hive/Parquet. => Done
2.4 Support external tables for binary. => Done
2.5 Support projection for the binary column. => Done
2.6 Support desc formatted. => Done => The Carbon datasource doesn't support ALTER TABLE add columns SQL; support ALTER TABLE (add column, rename, drop column) for the binary data type in CarbonSession. => Done. Changing the data type of a binary column via ALTER TABLE is not supported. => Done
2.7 Don't support PARTITION or BUCKETCOLUMNS for binary. => Done
2.8 Support compaction for binary. => Done
2.9 Datamap? Don't support bloomfilter, lucene, or timeseries datamaps; no need for a min/max datamap for binary; support MV and pre-aggregate in the future. => TODO
2.10 CSDK / Python SDK support for binary in the future. => TODO
2.11 Support S3. => Done
2.12 Support UDFs: hex, base64, cast, e.g. select hex(bin) from carbon_table. => TODO
2.13 Support configurable decode for queries, supporting Base64 and Hex decode. => TODO
2.14 Proper error messages for unsupported features like MV/SI/bloom/streaming. => TODO
2.15 How large a binary value can be supported for writing and reading? => TODO
2.16 Support filter for binary. => Done
2.17 select CAST(s AS
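Items 2.12-2.13 above propose hex/Base64 UDFs and configurable decode for binary columns. A minimal, self-contained sketch of the Hex round-trip (the class and method names are illustrative, not CarbonData APIs):

```java
// Hypothetical helpers illustrating the Hex encode/decode that items
// 2.12/2.13 propose for binary columns; these are NOT CarbonData APIs.
public class BinaryHexCodec {

  // Encode raw bytes as a hex string (what a `select hex(bin)` UDF would return).
  public static String encodeHex(byte[] data) {
    StringBuilder sb = new StringBuilder(data.length * 2);
    for (byte b : data) {
      sb.append(String.format("%02X", b));
    }
    return sb.toString();
  }

  // Decode a hex string back into the original bytes.
  public static byte[] decodeHex(String hex) {
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] original = {(byte) 0xFF, 0x00, 0x41};
    String hex = encodeHex(original);
    System.out.println(hex); // FF0041
    assert java.util.Arrays.equals(decodeHex(hex), original);
  }
}
```

A configurable decode (item 2.13) would simply dispatch between a codec like this and a Base64 one at query time.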
[jira] [Updated] (CARBONDATA-3356) There are some exception when carbonData DataSource read SDK files with varchar
[ https://issues.apache.org/jira/browse/CARBONDATA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3356:

Description: There are some exceptions when the CarbonData DataSource reads SDK files with varchar.

## write data:
{code:java}
public void testReadSchemaFromDataFileArrayString() {
  String path = "./testWriteFiles";
  try {
    FileUtils.deleteDirectory(new File(path));
    Field[] fields = new Field[11];
    fields[0] = new Field("stringField", DataTypes.STRING);
    fields[1] = new Field("shortField", DataTypes.SHORT);
    fields[2] = new Field("intField", DataTypes.INT);
    fields[3] = new Field("longField", DataTypes.LONG);
    fields[4] = new Field("doubleField", DataTypes.DOUBLE);
    fields[5] = new Field("boolField", DataTypes.BOOLEAN);
    fields[6] = new Field("dateField", DataTypes.DATE);
    fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
    fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
    fields[9] = new Field("varcharField", DataTypes.VARCHAR);
    fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING));
    Map map = new HashMap<>();
    map.put("complex_delimiter_level_1", "#");
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath(path)
        .withLoadOptions(map)
        .withCsvInput(new Schema(fields))
        .writtenBy("CarbonReaderTest")
        .build();
    for (int i = 0; i < 10; i++) {
      String[] row2 = new String[]{
          "robot" + (i % 10),
          String.valueOf(i % 1),
          String.valueOf(i),
          String.valueOf(Long.MAX_VALUE - i),
          String.valueOf((double) i / 2),
          String.valueOf(true),
          "2019-03-02",
          "2019-02-12 03:03:34",
          "12.345",
          "varchar",
          "Hello#World#From#Carbon"
      };
      writer.write(row2);
    }
    writer.close();
    File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
      @Override
      public boolean accept(File dir, String name) {
        if (name == null) {
          return false;
        }
        return name.endsWith("carbondata");
      }
    });
    if (dataFiles == null || dataFiles.length < 1) {
      throw new RuntimeException("Carbon data file not exists.");
    }
    Schema schema = CarbonSchemaReader
        .readSchema(dataFiles[0].getAbsolutePath())
        .asOriginOrder();
    // Transform the schema
    String[] strings = new String[schema.getFields().length];
    for (int i = 0; i < schema.getFields().length; i++) {
      strings[i] = (schema.getFields())[i].getFieldName();
    }
    // Read data
    CarbonReader reader = CarbonReader
        .builder(path, "_temp")
        .projection(strings)
        .build();
    int i = 0;
    while (reader.hasNext()) {
      Object[] row = (Object[]) reader.readNextRow();
      assert (row[0].equals("robot" + i));
      assert (row[2].equals(i));
      assert (row[6].equals(17957));
      Object[] arr = (Object[]) row[10];
      assert (arr[0].equals("Hello"));
      assert (arr[3].equals("Carbon"));
      i++;
    }
    reader.close();
    // FileUtils.deleteDirectory(new File(path));
  } catch (Throwable e) {
    e.printStackTrace();
    Assert.fail(e.getMessage());
  }
}
{code}

## read data:
{code:java}
test("Test read image carbon with spark carbon file format, generate by sdk, CTAS") {
  sql("DROP TABLE IF EXISTS binaryCarbon")
  sql("DROP TABLE IF EXISTS binaryCarbon3")
  if (SparkUtil.isSparkVersionEqualTo("2.1")) {
    sql(s"CREATE TABLE binaryCarbon USING CARBON OPTIONS(PATH '$writerPath')")
    sql(s"CREATE TABLE binaryCarbon3 USING CARBON OPTIONS(PATH '$outputPath')" +
        " AS SELECT * FROM binaryCarbon")
  } else {
    //sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION '$writerPath'")
    sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION '/Users/xubo/Desktop/xubo/git/carbondata3/store/sdk/testWriteFiles'")
    sql("SELECT COUNT(*) FROM binaryCarbon").show()
  }
}
{code}

## exception:
{code:java}
java.io.IOException: All common columns present in the files doesn't have same datatype. Unsupported operation on nonTransactional table. Check logs.
  at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.updateColumns(AbstractQueryExecutor.java:290)
  at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:234)
  at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138)
  at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:407)
{code}
[jira] [Commented] (CARBONDATA-3356) There are some exception when carbonData DataSource read SDK files with varchar
[ https://issues.apache.org/jira/browse/CARBONDATA-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822376#comment-16822376 ] xubo245 commented on CARBONDATA-3356: - DataType varchar is not supported.(line 1, pos 68) > There are some exception when carbonData DataSource read SDK files with > varchar > - > > Key: CARBONDATA-3356 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3356 > Project: CarbonData > Issue Type: Bug >Reporter: xubo245 >Priority: Major > > There are some exception when carbonData DataSource read SDK files with > varchar > ## write data: > {code:java} > public void testReadSchemaFromDataFileArrayString() { > String path = "./testWriteFiles"; > try { > FileUtils.deleteDirectory(new File(path)); > Field[] fields = new Field[11]; > fields[0] = new Field("stringField", DataTypes.STRING); > fields[1] = new Field("shortField", DataTypes.SHORT); > fields[2] = new Field("intField", DataTypes.INT); > fields[3] = new Field("longField", DataTypes.LONG); > fields[4] = new Field("doubleField", DataTypes.DOUBLE); > fields[5] = new Field("boolField", DataTypes.BOOLEAN); > fields[6] = new Field("dateField", DataTypes.DATE); > fields[7] = new Field("timeField", DataTypes.TIMESTAMP); > fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, > 2)); > fields[9] = new Field("varcharField", DataTypes.VARCHAR); > fields[10] = new Field("arrayField", > DataTypes.createArrayType(DataTypes.STRING)); > Map map = new HashMap<>(); > map.put("complex_delimiter_level_1", "#"); > CarbonWriter writer = CarbonWriter.builder() > .outputPath(path) > .withLoadOptions(map) > .withCsvInput(new Schema(fields)) > .writtenBy("CarbonReaderTest") > .build(); > for (int i = 0; i < 10; i++) { > String[] row2 = new String[]{ > "robot" + (i % 10), > String.valueOf(i % 1), > String.valueOf(i), > String.valueOf(Long.MAX_VALUE - i), > String.valueOf((double) i / 2), > String.valueOf(true), > "2019-03-02", > "2019-02-12 03:03:34", > "12.345", > 
"varchar", > "Hello#World#From#Carbon" > }; > writer.write(row2); > } > writer.close(); > File[] dataFiles = new File(path).listFiles(new FilenameFilter() { > @Override > public boolean accept(File dir, String name) { > if (name == null) { > return false; > } > return name.endsWith("carbondata"); > } > }); > if (dataFiles == null || dataFiles.length < 1) { > throw new RuntimeException("Carbon data file not exists."); > } > Schema schema = CarbonSchemaReader > .readSchema(dataFiles[0].getAbsolutePath()) > .asOriginOrder(); > // Transform the schema > String[] strings = new String[schema.getFields().length]; > for (int i = 0; i < schema.getFields().length; i++) { > strings[i] = (schema.getFields())[i].getFieldName(); > } > // Read data > CarbonReader reader = CarbonReader > .builder(path, "_temp") > .projection(strings) > .build(); > int i = 0; > while (reader.hasNext()) { > Object[] row = (Object[]) reader.readNextRow(); > assert (row[0].equals("robot" + i)); > assert (row[2].equals(i)); > assert (row[6].equals(17957)); > Object[] arr = (Object[]) row[10]; > assert (arr[0].equals("Hello")); > assert (arr[3].equals("Carbon")); > i++; > } > reader.close(); > // FileUtils.deleteDirectory(new File(path)); > } catch (Throwable e) { > e.printStackTrace(); > Assert.fail(e.getMessage()); > } > } > {code} > ## read data > {code:java} > test("Test read image carbon with spark carbon file format, generate by > sdk, CTAS") { > sql("DROP TABLE IF EXISTS binaryCarbon") > sql("DROP TABLE IF EXISTS binaryCarbon3") > if (SparkUtil.isSparkVersionEqualTo("2.1")) { > sql(s"CREATE TABLE binaryCarbon USING CARBON OPTIONS(PATH > '$writerPath')") > sql(s"CREATE TABLE binaryCarbon3 USING CARBON OPTIONS(PATH > '$outputPath')" + " AS SELECT * FROM binaryCarbon") > } else { > //sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION > '$writerPath'") > sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION > '/Users/xubo/Desktop/xubo/git/carbondata3/store/sdk/testWriteFiles'") > sql("SELECT 
COUNT(*) FROM binaryCarbon").show() > } > } >
[jira] [Created] (CARBONDATA-3356) There are some exception when carbonData DataSource read SDK files with varchar
xubo245 created CARBONDATA-3356: --- Summary: There are some exception when carbonData DataSource read SDK files with varchar Key: CARBONDATA-3356 URL: https://issues.apache.org/jira/browse/CARBONDATA-3356 Project: CarbonData Issue Type: Bug Reporter: xubo245

There are some exceptions when the CarbonData DataSource reads SDK files with varchar.

## write data:
{code:java}
public void testReadSchemaFromDataFileArrayString() {
  String path = "./testWriteFiles";
  try {
    FileUtils.deleteDirectory(new File(path));
    Field[] fields = new Field[11];
    fields[0] = new Field("stringField", DataTypes.STRING);
    fields[1] = new Field("shortField", DataTypes.SHORT);
    fields[2] = new Field("intField", DataTypes.INT);
    fields[3] = new Field("longField", DataTypes.LONG);
    fields[4] = new Field("doubleField", DataTypes.DOUBLE);
    fields[5] = new Field("boolField", DataTypes.BOOLEAN);
    fields[6] = new Field("dateField", DataTypes.DATE);
    fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
    fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2));
    fields[9] = new Field("varcharField", DataTypes.VARCHAR);
    fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING));
    Map map = new HashMap<>();
    map.put("complex_delimiter_level_1", "#");
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath(path)
        .withLoadOptions(map)
        .withCsvInput(new Schema(fields))
        .writtenBy("CarbonReaderTest")
        .build();
    for (int i = 0; i < 10; i++) {
      String[] row2 = new String[]{
          "robot" + (i % 10),
          String.valueOf(i % 1),
          String.valueOf(i),
          String.valueOf(Long.MAX_VALUE - i),
          String.valueOf((double) i / 2),
          String.valueOf(true),
          "2019-03-02",
          "2019-02-12 03:03:34",
          "12.345",
          "varchar",
          "Hello#World#From#Carbon"
      };
      writer.write(row2);
    }
    writer.close();
    File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
      @Override
      public boolean accept(File dir, String name) {
        if (name == null) {
          return false;
        }
        return name.endsWith("carbondata");
      }
    });
    if (dataFiles == null || dataFiles.length < 1) {
      throw new RuntimeException("Carbon data file not exists.");
    }
    Schema schema = CarbonSchemaReader
        .readSchema(dataFiles[0].getAbsolutePath())
        .asOriginOrder();
    // Transform the schema
    String[] strings = new String[schema.getFields().length];
    for (int i = 0; i < schema.getFields().length; i++) {
      strings[i] = (schema.getFields())[i].getFieldName();
    }
    // Read data
    CarbonReader reader = CarbonReader
        .builder(path, "_temp")
        .projection(strings)
        .build();
    int i = 0;
    while (reader.hasNext()) {
      Object[] row = (Object[]) reader.readNextRow();
      assert (row[0].equals("robot" + i));
      assert (row[2].equals(i));
      assert (row[6].equals(17957));
      Object[] arr = (Object[]) row[10];
      assert (arr[0].equals("Hello"));
      assert (arr[3].equals("Carbon"));
      i++;
    }
    reader.close();
    // FileUtils.deleteDirectory(new File(path));
  } catch (Throwable e) {
    e.printStackTrace();
    Assert.fail(e.getMessage());
  }
}
{code}

## read data:
{code:java}
test("Test read image carbon with spark carbon file format, generate by sdk, CTAS") {
  sql("DROP TABLE IF EXISTS binaryCarbon")
  sql("DROP TABLE IF EXISTS binaryCarbon3")
  if (SparkUtil.isSparkVersionEqualTo("2.1")) {
    sql(s"CREATE TABLE binaryCarbon USING CARBON OPTIONS(PATH '$writerPath')")
    sql(s"CREATE TABLE binaryCarbon3 USING CARBON OPTIONS(PATH '$outputPath')" +
        " AS SELECT * FROM binaryCarbon")
  } else {
    //sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION '$writerPath'")
    sql(s"CREATE TABLE binaryCarbon USING CARBON LOCATION '/Users/xubo/Desktop/xubo/git/carbondata3/store/sdk/testWriteFiles'")
    sql("SELECT COUNT(*) FROM binaryCarbon").show()
  }
}
{code}

## exception:
{code:java}
java.io.IOException: All common columns present in the files doesn't have same datatype. Unsupported operation on nonTransactional table. Check logs.
  at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.updateColumns(AbstractQueryExecutor.java:290)
  at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:234) at
[jira] [Commented] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821005#comment-16821005 ] xubo245 commented on CARBONDATA-3336: - Array:org.apache.carbondata.processing.loading.parser.impl.RowParserImpl#parseRow > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Attachments: CarbonData support binary data type V0.1.pdf > > Time Spent: 4h 10m > Remaining Estimate: 0h > > CarbonData supports binary data type > Version Changes Owner Date > 0.1 Init doc for Supporting binary data typeXubo2019-4-10 > Background : > Binary is basic data type and widely used in various scenarios. So it’s > better to support binary data type in CarbonData. Download data from S3 will > be slow when dataset has lots of small binary data. The majority of > application scenarios are related to storage small binary data type into > CarbonData, which can avoid small binary files problem and speed up S3 access > performance, also can decrease cost of accessing OBS by decreasing the number > of calling S3 API. It also will easier to manage structure data and > Unstructured data(binary) by storing them into CarbonData. > Goals: > 1. Supporting write binary data type by Carbon Java SDK. > 2. Supporting read binary data type by Spark Carbon file format(carbon > datasource) and CarbonSession. > 3. Supporting read binary data type by Carbon SDK > 4. Supporting write binary by spark > Approach and Detail: > 1.Supporting write binary data type by Carbon Java SDK [Formal]: > 1.1 Java SDK needs support write data with specific data types, > like int, double, byte[ ] data type, no need to convert all data type to > string array. User read binary file as byte[], then SDK writes byte[] into > binary column. 
> 1.2 CarbonData compress binary column because now the compressor is > table level. > =>TODO, support configuration for compress, default is no > compress because binary usually is already compressed, like jpg format image. > So no need to uncompress for binary column. 1.5.4 will support column level > compression, after that, we can implement no compress for binary. We can talk > with community. > 1.3 CarbonData stores binary as dimension. > 1.4 Support configure page size for binary data type because binary > data usually is big, such as 200k. Otherwise it will be very big for one > blocklet (32000 rows). > TODO: 1.5 Avro, JSON convert need consider > > 2. Supporting read and manage binary data type by Spark Carbon file > format(carbon DataSource) and CarbonSession.[Formal] > 2.1 Supporting read binary data type from non-transaction table, > read binary column and return as byte[] > 2.2 Support create table with binary column, table property doesn’t > support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary > column > => Evaluate COLUMN_META_CACHE for binary >=> CARBON Datasource don't support dictionary include column >=> carbon.column.compressor for all columns > 2.3 Support CTAS for binary=> transaction/non-transaction > 2.4 Support external table for binary > 2.5 Support projection for binary column > 2.6 Support desc formatted >=> Carbon Datasource don't support ALTER TABLE add > calumny sql >=>TODO: ALTER TABLE for binary data type in carbon session > 2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary > 2.8 Support compaction for binary(TODO) > 2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, > no need min max datamap for binary, support mv and pre-aggregate in the > future > 2.10 CSDK / python SDK support binary in the future.(TODO) > 2.11 Support S3 > TODO: > 2.12 support UDF, hex, base64, cast: >select hex(bin) from carbon_table. >select CAST(s AS BINARY) from carbon_table. 
> CarbonSession: impact analysis > > 3. Supporting read binary data type by Carbon SDK > 3.1 Supporting read binary data type from non-transaction table, > read binary column and return as byte[] > 3.2 Supporting projection for binary column > 3.3 Supporting S3 > 3.4 no need to support filter. > 4. Supporting write binary by spark (carbon file format / > carbonsession, POC??) > 4.1 Convert binary to String and storage in CSV > 4.2 Spark load CSV and
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Description: CarbonData supports binary data type Version Changes Owner Date 0.1 Init doc for Supporting binary data type Xubo 2019-4-10 Background : Binary is a basic data type widely used in various scenarios, so it’s better to support the binary data type in CarbonData. Downloading data from S3 will be slow when a dataset has lots of small binary data. The majority of application scenarios involve storing small binary data in CarbonData, which can avoid the small binary files problem, speed up S3 access performance, and decrease the cost of accessing OBS by reducing the number of S3 API calls. It will also be easier to manage structured data and unstructured data (binary) by storing them in CarbonData. Goals: 1. Supporting write binary data type by Carbon Java SDK. 2. Supporting read binary data type by Spark Carbon file format(carbon datasource) and CarbonSession. 3. Supporting read binary data type by Carbon SDK 4. Supporting write binary by spark Approach and Detail: 1.Supporting write binary data type by Carbon Java SDK [Formal]: 1.1 Java SDK needs to support writing data with specific data types, like int, double, byte[ ] data type, with no need to convert all data types to string arrays. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. 1.2 CarbonData compresses the binary column because now the compressor is table level. =>TODO, support configuration for compression; default is no compression because binary is usually already compressed, like JPG format images, so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that, we can implement no compression for binary. We can talk with the community. 1.3 CarbonData stores binary as a dimension. 1.4 Support configuring the page size for the binary data type because binary data is usually big, such as 200k. 
Otherwise it will be very big for one blocklet (32000 rows). TODO: 1.5 Avro, JSON conversion needs consideration 2. Supporting read and manage binary data type by Spark Carbon file format(carbon DataSource) and CarbonSession.[Formal] 2.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 2.2 Support create table with binary column, table property doesn’t support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary column => Evaluate COLUMN_META_CACHE for binary => CARBON Datasource don't support dictionary include column => carbon.column.compressor for all columns 2.3 Support CTAS for binary=> transaction/non-transaction 2.4 Support external table for binary 2.5 Support projection for binary column 2.6 Support desc formatted => Carbon Datasource doesn't support ALTER TABLE add column SQL =>TODO: ALTER TABLE for binary data type in carbon session 2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary 2.8 Support compaction for binary(TODO) 2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, no need min max datamap for binary, support mv and pre-aggregate in the future 2.10 CSDK / python SDK support binary in the future.(TODO) 2.11 Support S3 TODO: 2.12 support UDF, hex, base64, cast: select hex(bin) from carbon_table. select CAST(s AS BINARY) from carbon_table. CarbonSession: impact analysis 3. Supporting read binary data type by Carbon SDK 3.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 3.2 Supporting projection for binary column 3.3 Supporting S3 3.4 no need to support filter. 4. Supporting write binary by spark (carbon file format / carbonsession, POC??) 4.1 Convert binary to String and storage in CSV 4.2 Spark load CSV and convert string to byte[], and storage in CarbonData. read binary column and return as byte[] 4.3 Supporting insert into (string => binary), TODO: update, delete for binary 4.4 Don’t support stream table. 
=> refer hive and Spark 2.4 image DataSource mail list: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html was: CarbonData supports binary data type Version Changes Owner Date 0.1 Init doc for Supporting binary data type Xubo 2019-4-10 Background : Binary is a basic data type and widely used in various
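Item 1.4's page-size concern above is simple arithmetic: at the default 32000 rows per blocklet, 200 KB binary values produce a multi-gigabyte blocklet. A quick illustrative check (the helper name is hypothetical, not a CarbonData API):

```java
// Why item 1.4 configures page size for binary columns: with the default
// 32000 rows per blocklet, even modest binary values blow up the blocklet.
public class BlockletSizeCheck {

  // Total uncompressed bytes one blocklet would hold for a binary column.
  public static long blockletBytes(long bytesPerValue, long rowsPerBlocklet) {
    return bytesPerValue * rowsPerBlocklet;
  }

  public static void main(String[] args) {
    long size = blockletBytes(200L * 1024, 32000); // 200 KB per binary value
    System.out.println(size / (1024 * 1024) + " MB"); // 6250 MB, roughly 6.1 GB
  }
}
```

Capping the page size for binary columns keeps each page (and hence blocklet) to a manageable size.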
[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3351: Description: 1.Supporting write binary data type by Carbon Java SDK: 1.1 Java SDK needs support write data with specific data types, like int, double, byte[ ] data type, no need to convert all data type to string array. User read binary file as byte[], then SDK writes byte[] into binary column. 1.2 CarbonData compress binary column because now the compressor is table level. =>TODO, support configuration for compress, default is no compress because binary usually is already compressed, like jpg format image. So no need to uncompress for binary column. 1.5.4 will support column level compression, after that, we can implement no compress for binary. We can talk with community. 1.3 CarbonData stores binary as dimension. 1.4 Support configure page size for binary data type because binary data usually is big, such as 200k. Otherwise it will be very big for one blocklet (32000 rows). =>PR2814 2. Supporting read and manage binary data type by Spark Carbon file format(carbon DataSource) and CarbonSession. 2.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 2.2 Support create table with binary column, table property doesn’t support sort_columns, dictionary, RANGE_COLUMN for binary column => Evaluate COLUMN_META_CACHE for binary => CARBON Datasource don't support dictionary include column => carbon.column.compressor for all columns 2.3 Support CTAS for binary=> transaction/non-transaction 2.4 Support external table for binary 2.5 Support projection for binary column 2.6 Support desc => Carbon Datasource don't support ALTER TABLE add column by sql 2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary 2.8 Support S3 3. 
Supporting read binary data type by Carbon SDK 3.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 3.2 Supporting projection for binary column 3.3 Supporting S3 3.4 no need to support filter. 4. Supporting write binary by spark (carbon file format / carbonsession, POC??) 4.1 Convert binary to String and storage in CSV 4.2 Spark load CSV and convert string to byte[], and storage in CarbonData. read binary column and return as byte[] 4.3 Supporting insert into (string => binary), TODO: update, delete for binary 4.4 Don’t support stream table. => refer hive and Spark2.4 image DataSource was: 1.Supporting write binary data type by Carbon Java SDK: 1.1 Java SDK needs support write data with specific data types, like int, double, byte[ ] data type, no need to convert all data type to string array. User read binary file as byte[], then SDK writes byte[] into binary column. 1.2 CarbonData compress binary column because now the compressor is table level. =>TODO, support configuration for compress, default is no compress because binary usually is already compressed, like jpg format image. So no need to uncompress for binary column. 1.5.4 will support column level compression, after that, we can implement no compress for binary. We can talk with community. 1.3 CarbonData stores binary as dimension. 1.4 Support configure page size for binary data type because binary data usually is big, such as 200k. Otherwise it will be very big for one blocklet (32000 rows). =>PR2814 2. Supporting read and manage binary data type by Spark Carbon file format(carbon DataSource) and CarbonSession. 
2.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 2.2 Support create table with binary column, table property doesn’t support sort_columns, dictionary, RANGE_COLUMN for binary column => Evaluate COLUMN_META_CACHE for binary => CARBON Datasource don't support dictionary include column => carbon.column.compressor for all columns 2.3 Support CTAS for binary=> transaction/non-transaction 2.4 Support external table for binary 2.5 Support projection for binary column 2.6 Support desc => Carbon Datasource don't support ALTER TABLE add column by sql 2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary 2.8 Support S3 > Support Binary Data Type > > > Key: CARBONDATA-3351 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3351 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > 1.Supporting write binary data type by Carbon Java SDK: > 1.1 Java SDK needs support write data with specific data types, like int, > double, byte[ ] data type, no need to convert all data type to string
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Description: CarbonData supports binary data type Version Changes Owner Date 0.1 Init doc for Supporting binary data type Xubo 2019-4-10 Background : Binary is a basic data type widely used in various scenarios, so it’s better to support the binary data type in CarbonData. Downloading data from S3 will be slow when a dataset has lots of small binary data. The majority of application scenarios involve storing small binary data in CarbonData, which can avoid the small binary files problem, speed up S3 access performance, and decrease the cost of accessing OBS by reducing the number of S3 API calls. It will also be easier to manage structured data and unstructured data (binary) by storing them in CarbonData. Goals: 1. Supporting write binary data type by Carbon Java SDK. 2. Supporting read binary data type by Spark Carbon file format(carbon datasource) and CarbonSession. 3. Supporting read binary data type by Carbon SDK 4. Supporting write binary by spark Approach and Detail: 1.Supporting write binary data type by Carbon Java SDK [Formal]: 1.1 Java SDK needs to support writing data with specific data types, like int, double, byte[ ] data type, with no need to convert all data types to string arrays. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column. 1.2 CarbonData compresses the binary column because now the compressor is table level. =>TODO, support configuration for compression; default is no compression because binary is usually already compressed, like JPG format images, so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that, we can implement no compression for binary. We can talk with the community. 1.3 CarbonData stores binary as a dimension. 1.4 Support configuring the page size for the binary data type because binary data is usually big, such as 200k. 
Otherwise it will be very big for one blocklet (32000 rows). TODO: 1.5 Avro, JSON convert need consider 2. Supporting read and manage binary data type by Spark Carbon file format(carbon DataSource) and CarbonSession.[Formal] 2.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 2.2 Support create table with binary column, table property doesn’t support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary column => Evaluate COLUMN_META_CACHE for binary => CARBON Datasource don't support dictionary include column => carbon.column.compressor for all columns 2.3 Support CTAS for binary=> transaction/non-transaction 2.4 Support external table for binary 2.5 Support projection for binary column 2.6 Support desc formatted => Carbon Datasource don't support ALTER TABLE add calumny sql =>TODO: ALTER TABLE for binary data type in carbon session 2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary 2.8 Support compaction for binary(TODO) 2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, no need min max datamap for binary, support mv and pre-aggregate in the future 2.10 CSDK / python SDK support binary in the future.(TODO) 2.11 Support S3 TODO: 2.12 support UDF, hex, base64, cast: select hex(bin) from carbon_table. select CAST(s AS BINARY) from carbon_table. CarbonSession: impact analysis 3. Supporting read binary data type by Carbon SDK 3.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 3.2 Supporting projection for binary column 3.3 Supporting S3 3.4 no need to support filter. 4. Supporting write binary by spark (carbon file format / carbonsession, POC??) 4.1 Convert binary to String and storage in CSV, encode as Hex, Base64 4.2 Spark load CSV and convert string to binary, and storage in CarbonData. CarbonData internal will decode Hex to binary. 
4.3 Supporting insert (string => binary, configuration for encode/decode algorithm, default is Hex, user can change to base64 or others, is it ok?), update, delete for binary 4.4 Don’t support stream table. => refer hive and Spark2.4 image DataSource Formal? How to support write into binary read from images in SQL? Use spark core code is ok. mail list: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html
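Sections 4.1 and 4.2 above propose staging binary values as Hex or Base64 text in CSV before the load, then decoding on the way into the binary column. A minimal sketch of that round trip (plain Python standard library; this illustrates the encoding convention only, not the CarbonData load path):

```python
import base64
import binascii

# Raw binary payload (e.g. the start of a jpg image) to be staged in CSV.
payload = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# 4.1: encode the bytes as text so they survive a CSV column.
hex_text = binascii.hexlify(payload).decode("ascii")   # default: Hex
b64_text = base64.b64encode(payload).decode("ascii")   # alternative: Base64

# 4.2: on load, decode the text column back to bytes before
# writing it into the binary column.
assert binascii.unhexlify(hex_text) == payload
assert base64.b64decode(b64_text) == payload

# Size overhead of the staging format: Hex doubles the payload,
# Base64 adds roughly a third.
print(len(payload), len(hex_text), len(b64_text))
```

This is also why the proposal makes the encode/decode algorithm configurable: Hex is simpler to eyeball, Base64 is more compact for large values.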
[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3351:
Description:
1. Supporting write of the binary data type by the Carbon Java SDK:
1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], with no need to convert every data type to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into the binary column.
1.2 CarbonData compresses the binary column because the compressor is currently table level. => TODO: support a compression configuration; the default should be no compression, because binary data is usually already compressed (e.g. jpg images), so there is no need to compress the binary column. 1.5.4 will support column-level compression; after that, we can implement no-compress for binary. We can discuss with the community.
1.3 CarbonData stores binary as a dimension.
1.4 Support configuring the page size for the binary data type, because binary values are usually large (such as 200 KB); otherwise one blocklet (32,000 rows) becomes very big. => PR2814
2. Supporting read and management of the binary data type by the Spark Carbon file format (carbon DataSource) and CarbonSession:
2.1 Support reading the binary data type from a non-transactional table; read the binary column and return it as byte[].
2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, and RANGE_COLUMN are not supported for a binary column. => Evaluate COLUMN_META_CACHE for binary. => The Carbon datasource doesn't support dictionary-include columns. => carbon.column.compressor applies to all columns.
2.3 Support CTAS for binary => transactional/non-transactional.
2.4 Support external tables for binary.
2.5 Support projection of a binary column.
2.6 Support desc. => The Carbon datasource doesn't support ALTER TABLE ADD COLUMN by SQL.
2.7 Don't support PARTITION, filter, or BUCKETCOLUMNS for binary.
2.8 Support S3.
> Support Binary Data Type
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
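Item 1.4 above argues that the default blocklet row count is too large once binary values enter the picture. The back-of-the-envelope arithmetic, using the 200 KB example size from the text:

```python
# One blocklet holds 32,000 rows. With binary values averaging
# 200 KB each, a single blocklet would carry several GiB of data,
# which motivates a configurable page size for binary columns.
ROWS_PER_BLOCKLET = 32_000
AVG_BINARY_BYTES = 200 * 1024  # 200 KB, the example from the proposal

blocklet_bytes = ROWS_PER_BLOCKLET * AVG_BINARY_BYTES
print(blocklet_bytes)          # 6,553,600,000 bytes
print(blocklet_bytes / 2**30)  # about 6.1 GiB per blocklet
```

Compare that with non-binary columns, where a row contributes tens of bytes and 32,000 rows per blocklet is a reasonable default.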
[jira] [Created] (CARBONDATA-3352) Avro, JSON writer of SDK support binary.
xubo245 created CARBONDATA-3352: --- Summary: Avro, JSON writer of SDK support binary. Key: CARBONDATA-3352 URL: https://issues.apache.org/jira/browse/CARBONDATA-3352 Project: CarbonData Issue Type: Sub-task Reporter: xubo245 Assignee: xubo245 Avro, JSON writer of SDK support binary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
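CARBONDATA-3352 asks the SDK's Avro and JSON writers to handle binary. JSON has no native binary type, so a writer must pick a text encoding convention; a sketch of the Base64 convention with the Python standard library (the `image` field name is illustrative, not part of any CarbonData schema):

```python
import base64
import json

payload = b"\x00\x01\x02\xff"

# Write side: JSON cannot carry raw bytes, so the writer stores
# the binary column Base64-encoded inside an ordinary string field.
record = {"id": 1, "image": base64.b64encode(payload).decode("ascii")}
doc = json.dumps(record)

# Read side: the reader applies the inverse convention to recover
# the original bytes for the binary column.
restored = base64.b64decode(json.loads(doc)["image"])
assert restored == payload
```

Avro is simpler in this respect, since its `bytes` logical type carries binary directly; only JSON needs an agreed text encoding.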
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Description: CarbonData supports binary data type Version Changes Owner Date 0.1 Init doc for Supporting binary data typeXubo2019-4-10 Background : Binary is basic data type and widely used in various scenarios. So it’s better to support binary data type in CarbonData. Download data from S3 will be slow when dataset has lots of small binary data. The majority of application scenarios are related to storage small binary data type into CarbonData, which can avoid small binary files problem and speed up S3 access performance, also can decrease cost of accessing OBS by decreasing the number of calling S3 API. It also will easier to manage structure data and Unstructured data(binary) by storing them into CarbonData. Goals: 1. Supporting write binary data type by Carbon Java SDK. 2. Supporting read binary data type by Spark Carbon file format(carbon datasource) and CarbonSession. 3. Supporting read binary data type by Carbon SDK 4. Supporting write binary by spark Approach and Detail: 1.Supporting write binary data type by Carbon Java SDK [Formal]: 1.1 Java SDK needs support write data with specific data types, like int, double, byte[ ] data type, no need to convert all data type to string array. User read binary file as byte[], then SDK writes byte[] into binary column. 1.2 CarbonData compress binary column because now the compressor is table level. =>TODO, support configuration for compress, default is no compress because binary usually is already compressed, like jpg format image. So no need to uncompress for binary column. 1.5.4 will support column level compression, after that, we can implement no compress for binary. We can talk with community. 1.3 CarbonData stores binary as dimension. 1.4 Support configure page size for binary data type because binary data usually is big, such as 200k. 
Otherwise it will be very big for one blocklet (32000 rows). TODO: 1.5 Avro, JSON convert need consider 2. Supporting read and manage binary data type by Spark Carbon file format(carbon DataSource) and CarbonSession.[Formal] 2.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 2.2 Support create table with binary column, table property doesn’t support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary column => Evaluate COLUMN_META_CACHE for binary => carbon.column.compressor for all columns 2.3 Support CTAS for binary=> transaction/non-transaction 2.4 Support external table for binary 2.5 Support projection for binary column 2.6 Support show table, desc, ALTER TABLE for binary data type 2.7 Don’t support PARTITION, filter, BUCKETCOLUMNS for binary 2.8 Support compaction for binary 2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, no need min max datamap for binary, support mv and pre-aggregate in the future 2.10 CSDK / python SDK support binary in the future. 2.11 Support S3 CarbonSession: impact analysis 3. Supporting read binary data type by Carbon SDK 3.1 Supporting read binary data type from non-transaction table, read binary column and return as byte[] 3.2 Supporting projection for binary column 3.3 Supporting S3 3.4 no need to support filter. 4. Supporting write binary by spark (carbon file format / carbonsession, POC??) 4.1 Convert binary to String and storage in CSV, encode as Hex, Base64 4.2 Spark load CSV and convert string to binary, and storage in CarbonData. CarbonData internal will decode Hex to binary. 4.3 Supporting insert (string => binary, configuration for encode/decode algorithm, default is Hex, user can change to base64 or others, is it ok?), update, delete for binary 4.4 Don’t support stream table. => refer hive and Spark2.4 image DataSource Formal? How to support write into binary read from images in SQL? Use spark core code is ok. 
mail list: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html was: CarbonData supports binary data type Version Changes Owner Date 0.1 Init doc for Supporting binary data typeXubo2019-4-10 Background : Binary is basic data type and widely used in various scenarios. So it’s better to support binary data type in CarbonData. Download data from S3 will be slow when dataset has lots of small binary data. The majority of
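The CSV staging path in items 4.1/4.2 above (binary encoded as Hex or Base64 text, decoded back on load) can be sketched with plain JDK classes. This only illustrates the encoding round trip; it is not CarbonData's loader code, and the class and method names are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryCsvEncoding {

    // Encode bytes as a hex string (the proposed default encoding for CSV staging).
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Decode a hex string back to bytes (what the loader would do internally).
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] image = "fake-jpg-bytes".getBytes(StandardCharsets.UTF_8);

        // Hex is the proposed default; Base64 is the configurable alternative.
        String hexCell = toHex(image);
        String b64Cell = Base64.getEncoder().encodeToString(image);

        if (!Arrays.equals(image, fromHex(hexCell))
                || !Arrays.equals(image, Base64.getDecoder().decode(b64Cell))) {
            throw new AssertionError("round trip failed");
        }
        System.out.println("hex cell: " + hexCell);
    }
}
```

Either encoding keeps the CSV purely textual, so a standard Spark CSV load works unchanged and only the final string-to-binary conversion is format-specific.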
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Description: CarbonData supports binary data type. Version 0.1: Init doc for supporting binary data type, Owner: Xubo, Date: 2019-4-10. Background: Binary is a basic data type that is widely used in many scenarios, so it is worth supporting in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios involve storing small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by cutting the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage. Goals: 1. Support writing the binary data type through the Carbon Java SDK. [Formal] 2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession. [Formal] 3. Support reading the binary data type through the Carbon SDK. 4. Support writing binary through Spark. Approach and Detail: 1. Supporting write of the binary data type by the Carbon Java SDK [Formal]: 1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], without converting every value to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into a binary column. 1.2 CarbonData currently compresses the binary column because the compressor is table level. => TODO: support a compression configuration with no compression as the default, because binary data is usually already compressed (e.g. JPG images), so the binary column does not need to be compressed again. 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community. 1.3 CarbonData stores binary as a dimension. 1.4 Support configuring the page size for the binary data type because binary values are usually large, such as 200 KB. 
Otherwise a single blocklet (32000 rows) would become very large. TODO: 1.5 Avro and JSON conversion need to be considered. 1.6 2. Supporting read and management of the binary data type by the Spark Carbon file format (carbon DataSource) and CarbonSession [Formal]: 2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[]. 2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column. => Evaluate COLUMN_META_CACHE for binary. => carbon.column.compressor applies to all columns. 2.3 Support CTAS for binary => transactional/non-transactional. 2.4 Support external tables for binary. 2.5 Support projection of a binary column. 2.6 Support SHOW TABLES, DESC, and ALTER TABLE for the binary data type. 2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary. 2.8 Support compaction for binary. 2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; a min/max datamap is not needed for binary; support MV and pre-aggregate in the future. 2.10 CSDK / Python SDK will support binary in the future. 2.11 Support S3. CarbonSession: impact analysis. 3. Supporting read of the binary data type by the Carbon SDK: 3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[]. 3.2 Support projection of a binary column. 3.3 Support S3. 3.4 No need to support filters. 4. Supporting write of binary by Spark (carbon file format / CarbonSession, POC??): 4.1 Convert binary to String and store it in CSV, encoded as Hex or Base64. 4.2 Spark loads the CSV, converts the string to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary. 4.3 Support insert (string => binary, with a configurable encode/decode algorithm; the default is Hex, and the user can change it to Base64 or others, is that OK?), update, and delete for binary. 4.4 Do not support stream tables. => Refer to Hive and the Spark 2.4 image DataSource. Formal? How to support writing binary read from images in SQL? Using Spark core code is acceptable. 
mail list: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discuss-CarbonData-supports-binary-data-type-td76828.html was: CarbonData supports binary data type Version Changes Owner Date 0.1 Init doc for Supporting binary data typeXubo2019-4-10 Background : Binary is basic data type and widely used in various scenarios. So it’s better to support binary data type in CarbonData. Download data from S3 will be slow when dataset has lots
[jira] [Updated] (CARBONDATA-3351) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3351: Issue Type: Sub-task (was: Task) Parent: CARBONDATA-3336 > Support Binary Data Type > > > Key: CARBONDATA-3351 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3351 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Description: CarbonData supports binary data type. Version 0.1: Init doc for supporting binary data type, Owner: Xubo, Date: 2019-4-10. Background: Binary is a basic data type that is widely used in many scenarios, so it is worth supporting in CarbonData. Downloading data from S3 is slow when a dataset contains many small binary objects. Most application scenarios involve storing small binary values in CarbonData, which avoids the small-files problem, speeds up S3 access, and reduces the cost of accessing OBS by cutting the number of S3 API calls. Storing structured and unstructured (binary) data together in CarbonData also makes them easier to manage. Goals: 1. Support writing the binary data type through the Carbon Java SDK. [Formal] 2. Support reading the binary data type through the Spark Carbon file format (carbon datasource) and CarbonSession. [Formal] 3. Support reading the binary data type through the Carbon SDK. 4. Support writing binary through Spark. Approach and Detail: 1. Supporting write of the binary data type by the Carbon Java SDK [Formal]: 1.1 The Java SDK needs to support writing data with specific data types, such as int, double, and byte[], without converting every value to a string array. The user reads a binary file as byte[], then the SDK writes the byte[] into a binary column. 1.2 CarbonData currently compresses the binary column because the compressor is table level. => TODO: support a compression configuration with no compression as the default, because binary data is usually already compressed (e.g. JPG images), so the binary column does not need to be compressed again. 1.5.4 will support column-level compression; after that we can implement no-compression for binary. We can discuss this with the community. 1.3 CarbonData stores binary as a dimension. 1.4 Support configuring the page size for the binary data type because binary values are usually large, such as 200 KB. 
Otherwise a single blocklet (32000 rows) would become very large. TODO: 1.5 Avro and JSON conversion need to be considered. 1.6 2. Supporting read and management of the binary data type by the Spark Carbon file format (carbon DataSource) and CarbonSession [Formal]: 2.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[]. 2.2 Support creating a table with a binary column; the table properties sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported for a binary column. => Evaluate COLUMN_META_CACHE for binary. => carbon.column.compressor applies to all columns. 2.3 Support CTAS for binary => transactional/non-transactional. 2.4 Support external tables for binary. 2.5 Support projection of a binary column. 2.6 Support SHOW TABLES, DESC, and ALTER TABLE for the binary data type. 2.7 Do not support PARTITION, filter, or BUCKETCOLUMNS for binary. 2.8 Support compaction for binary. 2.9 Datamaps? Do not support the bloomfilter, lucene, or timeseries datamaps; a min/max datamap is not needed for binary; support MV and pre-aggregate in the future. 2.10 CSDK / Python SDK will support binary in the future. 2.11 Support S3. CarbonSession: impact analysis. 3. Supporting read of the binary data type by the Carbon SDK: 3.1 Support reading the binary data type from a non-transactional table: read the binary column and return it as byte[]. 3.2 Support projection of a binary column. 3.3 Support S3. 3.4 No need to support filters. 4. Supporting write of binary by Spark (carbon file format / CarbonSession, POC??): 4.1 Convert binary to String and store it in CSV, encoded as Hex or Base64. 4.2 Spark loads the CSV, converts the string to binary, and stores it in CarbonData; CarbonData internally decodes the Hex to binary. 4.3 Support insert (string => binary, with a configurable encode/decode algorithm; the default is Hex, and the user can change it to Base64 or others, is that OK?), update, and delete for binary. 4.4 Do not support stream tables. => Refer to Hive and the Spark 2.4 image DataSource. Formal? How to support writing binary read from images in SQL? Using Spark core code is acceptable. 
was: Support Binary Data Type: 1. Support write and read binary data type by CarbonData Java SDK 2. Support read binary data type by Spark Carbon File Format > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Attachment: (was: CarbonData support binary data type.pdf) > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Attachments: CarbonData support binary data type V0.1.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Support Binary Data Type: > 1. Support write and read binary data type by CarbonData Java SDK > 2. Support read binary data type by Spark Carbon File Format -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Attachment: CarbonData support binary data type v0.1.pdf > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Attachments: CarbonData support binary data type V0.1.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Support Binary Data Type: > 1. Support write and read binary data type by CarbonData Java SDK > 2. Support read binary data type by Spark Carbon File Format -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Attachment: CarbonData support binary data type V0.1.pdf > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Attachments: CarbonData support binary data type V0.1.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Support Binary Data Type: > 1. Support write and read binary data type by CarbonData Java SDK > 2. Support read binary data type by Spark Carbon File Format -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Attachment: (was: CarbonData support binary data type v0.1.pdf) > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Attachments: CarbonData support binary data type V0.1.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Support Binary Data Type: > 1. Support write and read binary data type by CarbonData Java SDK > 2. Support read binary data type by Spark Carbon File Format -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type
[ https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3336: Attachment: CarbonData support binary data type.pdf > Support Binary Data Type > > > Key: CARBONDATA-3336 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Attachments: CarbonData support binary data type.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Support Binary Data Type: > 1. Support write and read binary data type by CarbonData Java SDK > 2. Support read binary data type by Spark Carbon File Format -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3351) Support Binary Data Type
xubo245 created CARBONDATA-3351: --- Summary: Support Binary Data Type Key: CARBONDATA-3351 URL: https://issues.apache.org/jira/browse/CARBONDATA-3351 Project: CarbonData Issue Type: Task Reporter: xubo245 Assignee: xubo245 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3342) It throws IllegalArgumentException when using filter
xubo245 created CARBONDATA-3342: --- Summary: It throws IllegalArgumentException when using filter Key: CARBONDATA-3342 URL: https://issues.apache.org/jira/browse/CARBONDATA-3342 Project: CarbonData Issue Type: Bug Reporter: xubo245 Assignee: xubo245
{code:java}
public void testReadWithFilterOfNonTransactional2() throws IOException, InterruptedException {
  String path = "./testWriteFiles";
  FileUtils.deleteDirectory(new File(path));
  DataMapStoreManager.getInstance()
      .clearDataMaps(AbsoluteTableIdentifier.from(path));

  Field[] fields = new Field[2];
  fields[0] = new Field("name", DataTypes.STRING);
  fields[1] = new Field("age", DataTypes.INT);
  TestUtil.writeFilesAndVerify(200, new Schema(fields), path);

  ColumnExpression columnExpression = new ColumnExpression("age", DataTypes.INT);
  EqualToExpression equalToExpression = new EqualToExpression(columnExpression,
      new LiteralExpression("-11", DataTypes.INT));

  CarbonReader reader = CarbonReader
      .builder(path, "_temp")
      .projection(new String[]{"name", "age"})
      .filter(equalToExpression)
      .build();

  int i = 0;
  while (reader.hasNext()) {
    Object[] row = (Object[]) reader.readNextRow();
    // Default sort column is applied for dimensions. So, need to validate accordingly
    assert (((String) row[0]).contains("robot"));
    assert (1 == (int) (row[1]));
    i++;
  }
  Assert.assertEquals(i, 1);
  reader.close();
  FileUtils.deleteDirectory(new File(path));
}
{code}
Exception:
{code:java}
2019-04-04 18:15:23 INFO CarbonLRUCache:163 - Removed entry from InMemory lru cache :: /Users/xubo/Desktop/xubo/git/carbondata2/store/sdk/testWriteFiles/63862773138004_batchno0-0-null-63862150454623.carbonindex
java.lang.IllegalArgumentException: no reader
    at org.apache.carbondata.sdk.file.CarbonReader.<init>(CarbonReader.java:60)
    at org.apache.carbondata.sdk.file.CarbonReaderBuilder.build(CarbonReaderBuilder.java:222)
    at org.apache.carbondata.sdk.file.CarbonReaderTest.testReadWithFilterOfNonTransactional2(CarbonReaderTest.java:221)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3336) Support Binary Data Type
xubo245 created CARBONDATA-3336: --- Summary: Support Binary Data Type Key: CARBONDATA-3336 URL: https://issues.apache.org/jira/browse/CARBONDATA-3336 Project: CarbonData Issue Type: New Feature Reporter: xubo245 Assignee: xubo245 Support Binary Data Type: 1. Support write and read binary data type by CarbonData Java SDK 2. Support read binary data type by Spark Carbon File Format -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3271) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3271: Summary: WIP (was: CarboData provide python SDK) > WIP > --- > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-3283) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 closed CARBONDATA-3283. --- Resolution: Incomplete > WIP > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > WIP -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-3254) [WIP]
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 closed CARBONDATA-3254. --- Resolution: Incomplete > [WIP] > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-3255) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 closed CARBONDATA-3255. --- Resolution: Incomplete > WIP > --- > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3283) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3283: Summary: WIP (was: Support write data with different data type) > WIP > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData support AI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3283) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3283: Description: WIP (was: CarbonData support AI ) > WIP > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > WIP -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3255) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3255: Description: (was: Support binary data type) Summary: WIP (was: Support binary data type) > WIP > --- > > Key: CARBONDATA-3255 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) [WIP]
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Summary: [WIP] (was: [WIP] CarbonData supports deep learning framework to write and read image/voice data) > [WIP] > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-3271) WIP
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 closed CARBONDATA-3271. --- Resolution: Incomplete > WIP > --- > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) [WIP] CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: (was: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read, which can be used for deep learning framework tensorflow/MXNet or others * Support write data by different data type * Support read data by file or file lists) > [WIP] CarbonData supports deep learning framework to write and read > image/voice data > > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3271) CarboData provide python SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3271: Description: (was: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData For pyspark, they used py4j for python invoke java code: ![image](http://i.imgur.com/YlI8AqEl.png) https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf Please refer: # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals # https://issues.apache.org/jira/browse/SPARK-3789 ) > CarboData provide python SDK > > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) [WIP] CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Summary: [WIP] CarbonData supports deep learning framework to write and read image/voice data (was: CarbonData supports deep learning framework to write and read image/voice data) > [WIP] CarbonData supports deep learning framework to write and read > image/voice data > > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData supports deep learning framework to write and read image/voice data > * Supports write and read image in CarbonData > * Provide Carbon python SDK for read, which can be used for deep learning > framework tensorflow/MXNet or others > * Support write data by different data type > * Support read data by file or file lists -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read * was: CarbonData supports deep learning framework to write and read image/voice data * Supports read image in CarbonData * Support write image in CarbonData > CarbonData supports deep learning framework to write and read image/voice data > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData supports deep learning framework to write and read image/voice data > * Supports write and read image in CarbonData > * Provide Carbon python SDK for read > * -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read, which can be used for deep learning framework tensorflow/MXNet or others * Support write data by different data type * Support read data by file or file lists was: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read, which can be used for deep learning framework tensorflow/MXNet or others * Support write data by different data type > CarbonData supports deep learning framework to write and read image/voice data > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData supports deep learning framework to write and read image/voice data > * Supports write and read image in CarbonData > * Provide Carbon python SDK for read, which can be used for deep learning > framework tensorflow/MXNet or others > * Support write data by different data type > * Support read data by file or file lists -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read, which can be used for tensor flow/MXnet * was: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read * > CarbonData supports deep learning framework to write and read image/voice data > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData supports deep learning framework to write and read image/voice data > * Supports write and read image in CarbonData > * Provide Carbon python SDK for read, which can be used for tensor flow/MXnet > * -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read, which can be used for deep learning framework tensorflow/MXNet or others * Support write data by different data type was: CarbonData supports deep learning framework to write and read image/voice data * Supports write and read image in CarbonData * Provide Carbon python SDK for read, which can be used for tensor flow/MXnet * > CarbonData supports deep learning framework to write and read image/voice data > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData supports deep learning framework to write and read image/voice data > * Supports write and read image in CarbonData > * Provide Carbon python SDK for read, which can be used for deep learning > framework tensorflow/MXNet or others > * Support write data by different data type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: CarbonData supports deep learning framework to write and read image/voice data * Supports read image in CarbonData * Support write image in CarbonData was: CarbonData support AI Support write and read image in CarbonData > CarbonData supports deep learning framework to write and read image/voice data > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData supports deep learning framework to write and read image/voice data > * Supports read image in CarbonData > * Support write image in CarbonData -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData supports deep learning framework to write and read image/voice data
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Summary: CarbonData supports deep learning framework to write and read image/voice data (was: CarbonData support AI) > CarbonData supports deep learning framework to write and read image/voice data > -- > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData support AI > Support write and read image in CarbonData -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3283) Support write data with different data type
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3283: Issue Type: Sub-task (was: New Feature) Parent: CARBONDATA-3254 > Support write data with different data type > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: Sub-task >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData support AI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-3271) CarboData provide python SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 reassigned CARBONDATA-3271: --- Assignee: xubo245 > CarboData provide python SDK > > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Many users use python to install their project. It's not easy for them to use > carbon by Java/Scala/C++. And Spark also provide python SDK for users. So > it's better to provide python SDK for CarbonData > For pyspark, they used py4j for python invoke java code: > ![image](http://i.imgur.com/YlI8AqEl.png) > https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf > Please refer: > # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals > # https://issues.apache.org/jira/browse/SPARK-3789 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3271) CarboData provide python SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3271: Issue Type: Sub-task (was: New Feature) Parent: CARBONDATA-3254 > CarboData provide python SDK > > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.1 >Reporter: xubo245 >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Many users use python to install their project. It's not easy for them to use > carbon by Java/Scala/C++. And Spark also provide python SDK for users. So > it's better to provide python SDK for CarbonData > For pyspark, they used py4j for python invoke java code: > ![image](http://i.imgur.com/YlI8AqEl.png) > https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf > Please refer: > # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals > # https://issues.apache.org/jira/browse/SPARK-3789 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3254) CarbonData support AI
[ https://issues.apache.org/jira/browse/CARBONDATA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3254: Description: CarbonData support AI Support write and read image in CarbonData was: Support write and read image in CarbonData Summary: CarbonData support AI (was: Support write and read image in CarbonData) > CarbonData support AI > - > > Key: CARBONDATA-3254 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData support AI > Support write and read image in CarbonData -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3283) Support write data with different data type
[ https://issues.apache.org/jira/browse/CARBONDATA-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3283: Summary: Support write data with different data type (was: CarbonData support AI ) > Support write data with different data type > --- > > Key: CARBONDATA-3283 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 > Project: CarbonData > Issue Type: New Feature >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > CarbonData support AI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3283) CarbonData support AI
xubo245 created CARBONDATA-3283: --- Summary: CarbonData support AI Key: CARBONDATA-3283 URL: https://issues.apache.org/jira/browse/CARBONDATA-3283 Project: CarbonData Issue Type: New Feature Reporter: xubo245 Assignee: xubo245 CarbonData support AI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3280) SDK batch read failed
xubo245 created CARBONDATA-3280: --- Summary: SDK batch read failed Key: CARBONDATA-3280 URL: https://issues.apache.org/jira/browse/CARBONDATA-3280 Project: CarbonData Issue Type: Bug Reporter: xubo245 Assignee: xubo245 SDK batch read failed -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3275) There are 4 errors in CI after PR 3094 merged
xubo245 created CARBONDATA-3275: --- Summary: There are 4 errors in CI after PR 3094 merged Key: CARBONDATA-3275 URL: https://issues.apache.org/jira/browse/CARBONDATA-3275 Project: CarbonData Issue Type: Bug Reporter: xubo245 Assignee: xubo245 There are 4 errors in CI after PR 3094 merged -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3271) CarboData provide python SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3271: Description: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData For pyspark, they used py4j for python invoke java code: ![image](http://i.imgur.com/YlI8AqEl.png) https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf Please refer: # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals # https://issues.apache.org/jira/browse/SPARK-3789 was: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData For pyspark, they used py4j for python invoke java code: ![image](http://i.imgur.com/YlI8AqEl.png) https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf Please refer: # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals # [2] https://issues.apache.org/jira/browse/SPARK-3789 > CarboData provide python SDK > > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: New Feature >Affects Versions: 1.5.1 >Reporter: xubo245 >Priority: Major > > Many users use python to install their project. It's not easy for them to use > carbon by Java/Scala/C++. And Spark also provide python SDK for users. So > it's better to provide python SDK for CarbonData > For pyspark, they used py4j for python invoke java code: > ![image](http://i.imgur.com/YlI8AqEl.png) > https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf > Please refer: > # https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals > # https://issues.apache.org/jira/browse/SPARK-3789 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3271) CarboData provide python SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3271: Description: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData For pyspark, they used py4j for python invoke java code: ![](http://i.imgur.com/YlI8AqEl.png) https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf Please refer: # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals # [2] https://issues.apache.org/jira/browse/SPARK-3789 was: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData > CarboData provide python SDK > > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: New Feature >Affects Versions: 1.5.1 >Reporter: xubo245 >Priority: Major > > Many users use python to install their project. It's not easy for them to use > carbon by Java/Scala/C++. And Spark also provide python SDK for users. So > it's better to provide python SDK for CarbonData > For pyspark, they used py4j for python invoke java code: > ![](http://i.imgur.com/YlI8AqEl.png) > https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf > Please refer: > # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals > # [2] https://issues.apache.org/jira/browse/SPARK-3789 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3271) CarboData provide python SDK
[ https://issues.apache.org/jira/browse/CARBONDATA-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3271: Description: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData For pyspark, they used py4j for python invoke java code: ![image](http://i.imgur.com/YlI8AqEl.png) https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf Please refer: # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals # [2] https://issues.apache.org/jira/browse/SPARK-3789 was: Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData For pyspark, they used py4j for python invoke java code: ![](http://i.imgur.com/YlI8AqEl.png) https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf Please refer: # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals # [2] https://issues.apache.org/jira/browse/SPARK-3789 > CarboData provide python SDK > > > Key: CARBONDATA-3271 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 > Project: CarbonData > Issue Type: New Feature >Affects Versions: 1.5.1 >Reporter: xubo245 >Priority: Major > > Many users use python to install their project. It's not easy for them to use > carbon by Java/Scala/C++. And Spark also provide python SDK for users. So > it's better to provide python SDK for CarbonData > For pyspark, they used py4j for python invoke java code: > ![image](http://i.imgur.com/YlI8AqEl.png) > https://issues.apache.org/jira/secure/attachment/12752618/PyGraphX_design_doc.pdf > Please refer: > # [1] https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals > # [2] https://issues.apache.org/jira/browse/SPARK-3789 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3271) CarboData provide python SDK
xubo245 created CARBONDATA-3271: --- Summary: CarboData provide python SDK Key: CARBONDATA-3271 URL: https://issues.apache.org/jira/browse/CARBONDATA-3271 Project: CarbonData Issue Type: New Feature Affects Versions: 1.5.1 Reporter: xubo245 Many users use python to install their project. It's not easy for them to use carbon by Java/Scala/C++. And Spark also provide python SDK for users. So it's better to provide python SDK for CarbonData -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-3252) Remove unused import and optimize the import order
[ https://issues.apache.org/jira/browse/CARBONDATA-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 closed CARBONDATA-3252. --- Resolution: Fixed > Remove unused import and optimize the import order > --- > > Key: CARBONDATA-3252 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3252 > Project: CarbonData > Issue Type: Bug >Reporter: xubo245 >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Remove unused import and fix some spell error > * org.apache.carbondata.spark.testsuite.badrecordloger.BadRecordLoggerTest: > > remove CarbonLoadOptionConstants in line 27 > * > org.apache.carbondata.spark.testsuite.directdictionary.TimestampNoDictionaryColumnTestCase: > remove line 23 and 26 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3255) Support binary data type
xubo245 created CARBONDATA-3255: --- Summary: Support binary data type Key: CARBONDATA-3255 URL: https://issues.apache.org/jira/browse/CARBONDATA-3255 Project: CarbonData Issue Type: Sub-task Reporter: xubo245 Assignee: xubo245 Support binary data type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3254) Support write and read image in CarbonData
xubo245 created CARBONDATA-3254: --- Summary: Support write and read image in CarbonData Key: CARBONDATA-3254 URL: https://issues.apache.org/jira/browse/CARBONDATA-3254 Project: CarbonData Issue Type: New Feature Reporter: xubo245 Assignee: xubo245 Support write and read image in CarbonData -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3252) Remove unused import and optimize the import order
[ https://issues.apache.org/jira/browse/CARBONDATA-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3252: Summary: Remove unused import and optimize the import order (was: Remove unused import and fix some spell error) > Remove unused import and optimize the import order > --- > > Key: CARBONDATA-3252 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3252 > Project: CarbonData > Issue Type: Bug >Reporter: xubo245 >Priority: Major > > Remove unused import and fix some spell error > * org.apache.carbondata.spark.testsuite.badrecordloger.BadRecordLoggerTest: > > remove CarbonLoadOptionConstants in line 27 > * > org.apache.carbondata.spark.testsuite.directdictionary.TimestampNoDictionaryColumnTestCase: > remove line 23 and 26 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3252) Remove unused import and fix some spell error
xubo245 created CARBONDATA-3252: --- Summary: Remove unused import and fix some spell error Key: CARBONDATA-3252 URL: https://issues.apache.org/jira/browse/CARBONDATA-3252 Project: CarbonData Issue Type: Bug Reporter: xubo245 Remove unused import and fix some spell error * org.apache.carbondata.spark.testsuite.badrecordloger.BadRecordLoggerTest: remove CarbonLoadOptionConstants in line 27 * org.apache.carbondata.spark.testsuite.directdictionary.TimestampNoDictionaryColumnTestCase: remove line 23 and 26 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3251) Fix spark-2.1 UT errors
xubo245 created CARBONDATA-3251: --- Summary: Fix spark-2.1 UT errors Key: CARBONDATA-3251 URL: https://issues.apache.org/jira/browse/CARBONDATA-3251 Project: CarbonData Issue Type: Bug Affects Versions: 1.5.1 Reporter: xubo245 Assignee: xubo245 Fix spark-2.1 UT errors -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3250) Optimize hive
[ https://issues.apache.org/jira/browse/CARBONDATA-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3250: Description: Optimize hive, including hive doc and code 1. running command {code:java} -DskipTests -Pspark-2.1 -Phadoop-2.7.2 clean package {code} warning: {code:java} [WARNING] The requested profile "hadoop-2.7.2" could not be activated because it does not exist. {code} was:Optimize hive, including hive doc and code > Optimize hive > - > > Key: CARBONDATA-3250 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3250 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 1.5.1 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > > Optimize hive, including hive doc and code > 1. running command > {code:java} > -DskipTests -Pspark-2.1 -Phadoop-2.7.2 clean package > {code} > warning: > {code:java} > [WARNING] The requested profile "hadoop-2.7.2" could not be activated because > it does not exist. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
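[Editorial note on the warning above: Maven prints "The requested profile ... could not be activated because it does not exist" when the id passed with `-P` is not declared in any reachable POM; it then continues the build rather than failing. A quick way to sanity-check which profile ids a POM actually declares is to parse it; the sketch below uses a small hypothetical POM fragment rather than CarbonData's real pom.xml, and on a real checkout `mvn help:all-profiles` prints the authoritative list.]

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal POM fragment for illustration only; the real
# CarbonData pom.xml declares a different (larger) set of profiles.
pom = """<project>
  <profiles>
    <profile><id>spark-2.1</id></profile>
    <profile><id>hadoop-2.7.2</id></profile>
  </profiles>
</project>"""

root = ET.fromstring(pom)
# Collect every declared <profile><id> value.
profile_ids = [p.findtext("id") for p in root.iter("profile")]
print(profile_ids)  # ['spark-2.1', 'hadoop-2.7.2']
```

If `-Phadoop-2.7.2` triggers the warning, that id is simply absent from the declared list, so either the profile must be added to the POM or the flag dropped.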
[jira] [Created] (CARBONDATA-3250) Optimize hive
xubo245 created CARBONDATA-3250: --- Summary: Optimize hive Key: CARBONDATA-3250 URL: https://issues.apache.org/jira/browse/CARBONDATA-3250 Project: CarbonData Issue Type: Improvement Affects Versions: 1.5.1 Reporter: xubo245 Assignee: xubo245 Optimize hive, including hive doc and code -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3249) SQL and SDK float value is different
[ https://issues.apache.org/jira/browse/CARBONDATA-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xubo245 updated CARBONDATA-3249: Description: SQL and SDK float value is different Code: it's from https://github.com/xubo245/carbondata/commit/537c7265cc4bd755c073501773a523722709338a
{code:java}
test("test float") {
  val path = FileFactory.getPath(warehouse + "/sdk1").toString
  FileFactory.deleteAllFilesOfDir(new File(warehouse + "/sdk1"))
  sql("drop table if exists carbon_float")
  var fields: Array[Field] = new Array[Field](1)
  // same column name, but name as boolean type
  fields(0) = new Field("b", DataTypes.FLOAT)
  try {
    val builder = CarbonWriter.builder()
    val writer = builder.outputPath(path)
      .uniqueIdentifier(System.nanoTime()).withBlockSize(2)
      .withCsvInput(new Schema(fields)).writtenBy("SparkCarbonDataSourceTest").build()
    var i = 0
    while (i < 1) {
      val array = Array[String]("2147483648.1")
      writer.write(array)
      i += 1
    }
    writer.close()
    val reader = CarbonReader.builder(path, "_temp").build
    i = 0
    var floatValueSDK: Float = 0
    while (i < 20 && reader.hasNext) {
      val row = reader.readNextRow.asInstanceOf[Array[AnyRef]]
      println("SDK float value is: " + row(0))
      floatValueSDK = row(0).asInstanceOf[Float]
      i += 1
    }
    reader.close()
    sql("create table carbon_float(floatField float) stored as carbondata")
    sql("insert into carbon_float values('2147483648.1')")
    val df = sql("Select * from carbon_float").collect()
    println("CarbonSession float value is: " + df(0))
    assert(df(0).equals(floatValueSDK))
  } catch {
    case ex: Exception => throw new RuntimeException(ex)
  } finally {
    sql("drop table if exists carbon_float")
    FileFactory.deleteAllFilesOfDir(new File(warehouse + "/sdk1"))
  }
}
{code}
Exception:
{code:java}
SDK float value is: 2.14748365E9
2019-01-14 18:15:24 AUDIT audit:72 - {"time":"January 14, 2019 2:15:24 AM PST","username":"xubo","opName":"CREATE TABLE","opId":"26423231368673","opStatus":"START"}
2019-01-14 18:15:24 AUDIT audit:93 - {"time":"January 14, 2019 2:15:24 AM PST","username":"xubo","opName":"CREATE TABLE","opId":"26423231368673","opStatus":"SUCCESS","opTime":"604 ms","table":"default.carbon_float","extraInfo":{"bad_record_path":"","local_dictionary_enable":"true","external":"false","sort_columns":"","comment":""}}
2019-01-14 18:15:24 AUDIT audit:72 - {"time":"January 14, 2019 2:15:24 AM PST","username":"xubo","opName":"INSERT INTO","opId":"26424137339770","opStatus":"START"}
2019-01-14 18:15:26 AUDIT audit:93 - {"time":"January 14, 2019 2:15:26 AM PST","username":"xubo","opName":"INSERT INTO","opId":"26424137339770","opStatus":"SUCCESS","opTime":"1479 ms","table":"default.carbon_float","extraInfo":{"SegmentId":"0","DataSize":"408.0B","IndexSize":"254.0B"}}
CarbonSession float value is: [2.1474836481E9]
2019-01-14 18:15:26 AUDIT audit:72 - {"time":"January 14, 2019 2:15:26 AM PST","username":"xubo","opName":"DROP TABLE","opId":"26425973212561","opStatus":"START"}
2019-01-14 18:15:27 AUDIT audit:93 - {"time":"January 14, 2019 2:15:27 AM PST","username":"xubo","opName":"DROP TABLE","opId":"26425973212561","opStatus":"SUCCESS","opTime":"393 ms","table":"default.carbon_float","extraInfo":{}}
org.scalatest.exceptions.TestFailedException: df.apply(0).equals(floatValueSDK) was false
java.lang.RuntimeException: org.scalatest.exceptions.TestFailedException: df.apply(0).equals(floatValueSDK) was false
  at org.apache.carbondata.spark.testsuite.datetype.FloatTest$$anonfun$1.apply$mcV$sp(FloatTest.scala:20)
  at org.apache.carbondata.spark.testsuite.datetype.FloatTest$$anonfun$1.apply(FloatTest.scala:12)
  at org.apache.carbondata.spark.testsuite.datetype.FloatTest$$anonfun$1.apply(FloatTest.scala:12)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
{code}
was:SQL and SDK float value is different > SQL and SDK float value is different > > > Key: CARBONDATA-3249 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3249 > Project: CarbonData > Issue Type: Bug >Reporter: xubo245 >Priority: Major > > SQL and SDK float value is different > Code: it's from >
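[Editorial note on CARBONDATA-3249: the reported values, SDK `2.14748365E9` versus SQL `2.1474836481E9`, are consistent with ordinary IEEE-754 precision loss: a 32-bit float carries only about 7 significant decimal digits, so `2147483648.1` cannot survive a round-trip through single precision, while a 64-bit double keeps the fractional part. A minimal sketch, independent of CarbonData, using Python's `struct` module:]

```python
import struct

value = 2147483648.1  # the literal used in the failing test; a 64-bit double here

# Round-trip through IEEE-754 single precision, as a 32-bit FLOAT column would store it.
as_float32 = struct.unpack("<f", struct.pack("<f", value))[0]

print(as_float32)           # 2147483648.0 -- the .1 is lost beyond ~7 significant digits
print(value)                # 2147483648.1 -- the double keeps it
print(as_float32 == value)  # False
```

This suggests the two read paths disagree on the column's width (float vs double), not on the stored bytes themselves.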