[GitHub] carbondata pull request #2409: [CARBONDATA-2608] Document update about Json ...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2409

[CARBONDATA-2608] Document update about Json Writer with examples.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? Yes, updated.
- [ ] Testing done. NA
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata master_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2409.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2409

commit d8010d3b269d29c69e61d3b367166bde6378dcdf
Author: ajantha-bhat
Date: 2018-06-25T09:25:41Z

    [CARBONDATA-2608] Document update about Json Writer with examples.

---
[jira] [Created] (CARBONDATA-2654) Optimize output for explaining query with datamap
xuchuanyin created CARBONDATA-2654:
--
       Summary: Optimize output for explaining query with datamap
           Key: CARBONDATA-2654
           URL: https://issues.apache.org/jira/browse/CARBONDATA-2654
       Project: CarbonData
    Issue Type: Sub-task
      Reporter: xuchuanyin
      Assignee: xuchuanyin
       Fix For: 1.4.1

Currently, if we have multiple datamaps and a query hits all of them, the CarbonData explain command prints only the first datamap; all the other datamaps are not shown. We need to show information for all the datamaps used in the query.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
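The desired behavior above can be sketched in a few lines. This is purely illustrative; the class and method names below are hypothetical and are not CarbonData's actual explain classes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only (hypothetical names, not CarbonData's explain code):
// render every hit datamap in the EXPLAIN output instead of stopping at the first.
public class ExplainDataMapsSketch {

    // Join all hit datamap names; the reported bug corresponds to
    // printing only hitDataMaps.get(0).
    public static String explainLine(List<String> hitDataMaps) {
        return "Hit datamaps: " + String.join(", ", hitDataMaps);
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("bloomdatamap", "bloomdatamap2", "bloomdatamap3");
        System.out.println(explainLine(hits));
        // prints: Hit datamaps: bloomdatamap, bloomdatamap2, bloomdatamap3
    }
}
```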
[GitHub] carbondata issue #2391: [HOTFIX][CARBONDATA-2625] Optimize the performance o...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2391 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5446/ ---
[GitHub] carbondata issue #2391: [HOTFIX][CARBONDATA-2625] Optimize the performance o...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5371/ ---
[GitHub] carbondata issue #2391: [HOTFIX][CARBONDATA-2625] Optimize the performance o...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6542/ ---
[GitHub] carbondata issue #2391: [HOTFIX][CARBONDATA-2625] Optimize the performance o...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2391 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5445/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/2384 @ravipesala : The PR is ready and the build is successful. Please check. ---
[GitHub] carbondata issue #2391: [HOTFIX][CARBONDATA-2625] Optimize the performance o...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6541/ ---
[GitHub] carbondata issue #2391: [HOTFIX][CARBONDATA-2625] Optimize the performance o...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2391 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5370/ ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5369/ ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2408 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5444/ ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6540/ ---
[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2401 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5443/ ---
[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2401 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5368/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5442/ ---
[GitHub] carbondata issue #2401: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2401 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6539/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5367/ ---
[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2402 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5366/ ---
[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2402 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5441/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6538/ ---
[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2402 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6537/ ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2396 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5364/ ---
[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2402 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5440/ ---
[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5365/ ---
[GitHub] carbondata issue #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dictionary ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2402 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6536/ ---
[GitHub] carbondata issue #2178: show table in presto doesn't initialized load carbon...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2178 Can one of the admins verify this patch? ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5362/ ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2396 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5439/ ---
[jira] [Resolved] (CARBONDATA-2627) remove dependecy of tech.allegro.schema.json2avro
[ https://issues.apache.org/jira/browse/CARBONDATA-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-2627.
-----------------------------------------
        Resolution: Fixed
    Fix Version/s: 1.4.1

> remove dependecy of tech.allegro.schema.json2avro
> -------------------------------------------------
>
>                 Key: CARBONDATA-2627
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2627
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: Babulal
>            Priority: Minor
>             Fix For: 1.4.1
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently tech.allegro.schema.json2avro is used as the JSON-to-Avro converter, but it is not formally supported by Avro, and many features (for example, the byte data type) do not work in the converter. The code below can be used instead:
>
> def jsonToAvro(json: String, schemaStr: String): GenericRecord = {
>   var input: InputStream = null
>   try {
>     val schema = new org.apache.avro.Schema.Parser().parse(schemaStr)
>     val reader = new GenericDatumReader[GenericRecord](schema)
>     input = new ByteArrayInputStream(json.getBytes())
>     val din = new DataInputStream(input)
>     val decoder = DecoderFactory.get().jsonDecoder(schema, din)
>     reader.read(null, decoder)
>   } finally {
>     try {
>       if (input != null) input.close()
>     } catch {
>       case e: Exception => e.printStackTrace()
>     }
>   }
> }

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2398: [CARBONDATA-2627] removed the dependency of t...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2398 ---
[jira] [Resolved] (CARBONDATA-2630) Alter table set Table comment is throwing exception in spark-2.2 cluster
[ https://issues.apache.org/jira/browse/CARBONDATA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-2630.
-----------------------------------------
        Resolution: Fixed
    Fix Version/s: 1.4.1

> Alter table set Table comment is throwing exception in spark-2.2 cluster
> ------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2630
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2630
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Rahul Kumar
>            Assignee: Rahul Kumar
>            Priority: Minor
>             Fix For: 1.4.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Precondition:* started a spark-2.2 cluster and launched beeline
>
> *Test steps:*
> *1.* Create Table sample_comment5 (id int,dim1 string,name string,tech string,measure int,amount int,dim2 string,M1 int,dim3 string,M2 int,dim4 string,dim5 string,M3 int,dim6 string,dim7 string,M4 int,dim8 string,dim9 string,M5 int,dim10 string,dim11 string,dim12 string,M6 int,dim13 string,dim14 string,dim15 string,M7 int,dim16 string,dim17 string,dim18 string,dim19 string) CoMMent "@" STORED BY 'org.apache.carbondata.format';
> *2.* alter table sample_comment5 SET TBLPROPERTIES(comment="malathi");
>
> *Expected output:* the comment should have been updated
> *Actual output:* Error: java.lang.RuntimeException: Alter table properties operation failed: org.apache.spark.sql.SparkSession cannot be cast to org.apache.spark.sql.CarbonSession (state=,code=0)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2392: [CARBONDATA-2630] fix for exception thrown by...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2392 ---
[GitHub] carbondata issue #2398: [CARBONDATA-2627] removed the dependency of tech.all...
Github user rahulforallp commented on the issue: https://github.com/apache/carbondata/pull/2398 done ---
[GitHub] carbondata pull request #2398: [CARBONDATA-2627] removed the dependency of t...
Github user rahulforallp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2398#discussion_r197852117

--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/TestUtil.java ---
@@ -17,20 +17,58 @@
 package org.apache.carbondata.sdk.file;

+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
 import java.io.File;
 import java.io.FileFilter;
 import java.io.IOException;
+import java.io.InputStream;

 import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
 import org.apache.carbondata.core.datastore.impl.FileFactory;
 import org.apache.carbondata.core.util.CarbonProperties;
 import org.apache.carbondata.core.util.path.CarbonTablePath;

+import org.apache.avro.file.DataFileWriter;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.io.DecoderFactory;
+import org.apache.avro.io.Encoder;
+import org.apache.avro.io.JsonDecoder;
 import org.junit.Assert;

 public class TestUtil {

+  public static GenericData.Record jsonToAvro(String json, String avroSchema) throws IOException {
+    InputStream input = null;
+    DataFileWriter writer = null;
+    Encoder encoder = null;
+    ByteArrayOutputStream output = null;
+    try {
--- End diff --

The test cases are in two different packages, so we should write the util class separately.

---
[GitHub] carbondata pull request #2398: [CARBONDATA-2627] removed the dependency of t...
Github user rahulforallp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2398#discussion_r197852069

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
@@ -2301,3 +2292,29 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("select * from sdkOutputTable"),
       Seq(Row(Timestamp.valueOf("1970-01-02 16:00:00"), Row(Timestamp.valueOf("1970-01-02 16:00:00")
   }
 }
+
+
+object avroUtil {
+
+  def jsonToAvro(json: String, avroSchema: String): GenericRecord = {
+    var input: InputStream = null
+    var writer: DataFileWriter[GenericRecord] = null
+    var encoder: Encoder = null
+    var output: ByteArrayOutputStream = null
+    try {
+      val schema = new org.apache.avro.Schema.Parser().parse(avroSchema)
+      val reader = new GenericDatumReader[GenericRecord](schema)
+      input = new ByteArrayInputStream(json.getBytes())
--- End diff --

The test cases are in two different packages, so we should write the util class separately.

---
[GitHub] carbondata issue #2392: [CARBONDATA-2630] fix for exception thrown by Alter ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2392 LGTM ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6531/ ---
[GitHub] carbondata issue #2398: [CARBONDATA-2627] removed the dependency of tech.all...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2398 LGTM. Please handle @ajantha-bhat's comments and I will merge it. ---
[GitHub] carbondata issue #2406: [CARBONDATA-2640][CARBONDATA-2642] Added configurabl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2406 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6533/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5360/ ---
[GitHub] carbondata issue #2400: [HOTFIX] Removed BatchedDataSourceScanExec class and...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2400 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5438/ ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user Indhumathi27 commented on the issue: https://github.com/apache/carbondata/pull/2396 retest this please ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2396 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5361/ ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user Indhumathi27 commented on the issue: https://github.com/apache/carbondata/pull/2396 retest this please ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2396 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6532/ ---
[GitHub] carbondata pull request #2398: [CARBONDATA-2627] removed the dependency of t...
Github user rahulforallp commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2398#discussion_r197826528

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala ---
@@ -1460,8 +1459,13 @@
   test("Read sdk writer Avro output Array Type with Default value") {
-    buildAvroTestDataSingleFileArrayDefaultType()
-    assert(new File(writerPath).exists())
+    // The Avro 1.8.x parser does not handle default values; this will be fixed in 1.9.x.
+    // So for now this will throw an exception. After upgrading Avro we can change this test case.
--- End diff --

The community knows about this issue; they said it will be fixed in the 2.x version.

---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6528/ ---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197821950

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/v3/CarbonFactDataWriterImplV3.java ---
@@ -110,44 +135,51 @@ public CarbonFactDataWriterImplV3(CarbonFactDataHandlerModel model) {
   */
  @Override
  public void writeTablePage(TablePage tablePage) throws CarbonDataWriterException, IOException {
-    // condition for writting all the pages
-    if (!tablePage.isLastPage()) {
-      boolean isAdded = false;
-      // check if size more than blocklet size then write the page to file
-      if (blockletDataHolder.getSize() + tablePage.getEncodedTablePage().getEncodedSize() >=
-          blockletSizeThreshold) {
-        // if blocklet size exceeds threshold, write blocklet data
-        if (blockletDataHolder.getEncodedTablePages().size() == 0) {
-          isAdded = true;
-          addPageData(tablePage);
-        }
+    try {
--- End diff --

Don't reformat the code if the code is not changed.

---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197821658

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/v3/CarbonFactDataWriterImplV3.java ---
@@ -76,7 +79,29 @@ public CarbonFactDataWriterImplV3(CarbonFactDataHandlerModel model) {
   blockletSizeThreshold = fileSizeInBytes;
   LOGGER.info("Blocklet size configure for table is: " + blockletSizeThreshold);
 }
-blockletDataHolder = new BlockletDataHolder();
+int numberOfCores = CarbonProperties.getInstance().getNumberOfCores();
--- End diff --

Please remove the unused code.

---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197820697

--- Diff: core/src/main/java/org/apache/carbondata/core/localdictionary/PageLevelDictionary.java ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.localdictionary;
+
+import java.io.IOException;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.ColumnType;
+import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.datastore.page.ColumnPage;
+import org.apache.carbondata.core.datastore.page.encoding.ColumnPageEncoder;
+import org.apache.carbondata.core.datastore.page.encoding.compress.DirectCompressCodec;
+import org.apache.carbondata.core.datastore.page.statistics.DummyStatsCollector;
+import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException;
+import org.apache.carbondata.core.localdictionary.generator.LocalDictionaryGenerator;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.format.LocalDictionaryChunk;
+
+/**
+ * Class to maintain page level dictionary. It will store all unique dictionary values
+ * used in a page. This is required while writing blocklet level dictionary in carbondata
+ * file
+ */
+public class PageLevelDictionary {
+
+  /**
+   * dictionary generator to generate dictionary values for page data
+   */
+  private LocalDictionaryGenerator localDictionaryGenerator;
+
+  /**
+   * set of dictionary surrogate key in this page
+   */
+  private BitSet usedDictionaryValues;
+
+  private int maxDictValue;
+
+  private String columnName;
+
+  public PageLevelDictionary(LocalDictionaryGenerator localDictionaryGenerator, String columnName) {
+    this.localDictionaryGenerator = localDictionaryGenerator;
+    this.usedDictionaryValues = new BitSet();
+    this.columnName = columnName;
+  }
+
+  /**
+   * Below method will be used to get the dictionary value
+   *
+   * @param data column data
+   * @return dictionary value
+   * @throws DictionaryThresholdReachedException when threshold crossed for column
+   */
+  public int getDictionaryValue(byte[] data) throws DictionaryThresholdReachedException {
+    int dictionaryValue = localDictionaryGenerator.generateDictionary(data);
+    this.usedDictionaryValues.set(dictionaryValue);
+    if (maxDictValue < dictionaryValue) {
+      maxDictValue = dictionaryValue;
+    }
+    return dictionaryValue;
+  }
+
+  /**
+   * Method to merge the dictionary value across pages
+   *
+   * @param pageLevelDictionary other page level dictionary
+   */
+  public void mergerDictionaryValues(PageLevelDictionary pageLevelDictionary) {
+    usedDictionaryValues.and(pageLevelDictionary.usedDictionaryValues);
+  }
+
+  /**
+   * Below method will be used to get the local dictionary chunk for writing
+   * @TODO Support for numeric data type dictionary exclude columns
+   * @return encoded local dictionary chunk
+   * @throws MemoryException in case of problem in encoding
+   * @throws IOException in case of problem in encoding
+   */
+  public LocalDictionaryChunk getLocalDictionaryChunkForBlocklet()
+      throws MemoryException, IOException {
+    // TODO support for actual data type dictionary ColumnSPEC
+    TableSpec.ColumnSpec spec = TableSpec.ColumnSpec
+        .newInstance(columnName, DataTypes.BYTE_ARRAY, ColumnType.PLAIN_VALUE);
+    ColumnPage dictionaryColumnPage = ColumnPage.newPage(spec, DataTypes.BYTE_ARRAY, maxDictValue);
+    // TODO support data type specific stats collector for numeric data types
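The page-level tracking idea in the diff above can be boiled down to a few lines. This is a minimal stand-in sketch, not the PR's actual class: the names are simplified, String keys replace byte[] data, and the shared LocalDictionaryGenerator is folded into a plain map; the union-based merge is an assumption about the intended semantics.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in (simplified from the PR's PageLevelDictionary; names and
// types here are illustrative) showing how a page can track which local
// dictionary surrogate keys it actually used via a BitSet.
public class PageDictionarySketch {

    // simplified generator: the real code delegates to a shared LocalDictionaryGenerator
    private final Map<String, Integer> dictionary = new HashMap<>();
    private final BitSet usedDictionaryValues = new BitSet();
    private int lastAssignedValue = 0;
    private int maxDictValue = 0;

    // assign (or look up) a surrogate key and mark it used in this page
    public int getDictionaryValue(String data) {
        int value = dictionary.computeIfAbsent(data, k -> ++lastAssignedValue);
        usedDictionaryValues.set(value);
        if (maxDictValue < value) {
            maxDictValue = value;
        }
        return value;
    }

    // merge usage across pages; a union (BitSet.or) is assumed here so that the
    // blocklet-level dictionary covers every value any page used
    public void mergeDictionaryValues(PageDictionarySketch other) {
        usedDictionaryValues.or(other.usedDictionaryValues);
    }

    public int usedValueCount() {
        return usedDictionaryValues.cardinality();
    }
}
```

The BitSet keeps the per-page footprint small: one bit per surrogate key rather than a copy of the dictionary entries.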
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197817666

--- Diff: core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.localdictionary.dictionaryholder;
+
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper;
+import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException;
+
+/**
+ * Map based dictionary holder class, it will use map to hold
+ * the dictionary key and its value
+ */
+public class MapBasedDictionaryStore implements DictionaryStore {
+
+  /**
+   * use to assign dictionary value to new key
+   */
+  private int lastAssignValue;
+
+  /**
+   * to maintain dictionary key value
+   */
+  private final Map dictionary;
+
+  /**
+   * maintaining array for reverse lookup
+   * otherwise iterating everytime in map for reverse lookup will slow down the performance
+   * It will only maintain the reference
+   */
+  private byte[][] referenceDictionaryArray;
--- End diff --

Better directly use a `DictionaryByteArrayWrapper` array here.

---
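The reverse-lookup array being discussed can be sketched in miniature. This is an assumption-laden simplification of the PR's MapBasedDictionaryStore (String keys instead of byte[] wrappers, no threshold handling, no concurrency); the point it illustrates is that value-to-key lookup becomes an O(1) array index instead of a scan over the map.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (illustrative only) of a map-based dictionary store that
// keeps a reverse-lookup array alongside the key -> value map.
public class MapDictionaryStoreSketch {

    private final Map<String, Integer> dictionary = new HashMap<>();
    private String[] referenceArray = new String[8];
    private int lastAssignedValue = 0;

    // assign a new dictionary value for an unseen key, else return the existing one
    public int generateDictionary(String key) {
        Integer value = dictionary.get(key);
        if (value == null) {
            value = ++lastAssignedValue;
            if (value >= referenceArray.length) {
                // grow the reverse-lookup array; it only holds references
                String[] grown = new String[referenceArray.length * 2];
                System.arraycopy(referenceArray, 0, grown, 0, referenceArray.length);
                referenceArray = grown;
            }
            referenceArray[value] = key;
            dictionary.put(key, value);
        }
        return value;
    }

    // reverse lookup without iterating the map
    public String getDictionaryKeyBasedOnValue(int value) {
        return referenceArray[value];
    }
}
```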
[GitHub] carbondata pull request #2404: [CARBONDATA-2634][BloomDataMap] Add datamap p...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2404#discussion_r197816298

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/TestDataMapCommand.scala ---
@@ -204,6 +204,37 @@ class TestDataMapCommand extends QueryTest with BeforeAndAfterAll {
   }
 }

+  test("test show datamap: show datamap property related information") {
+    val tableName = "datamapshowtest"
+    val datamapName = "bloomdatamap"
+    val datamapName2 = "bloomdatamap2"
+    val datamapName3 = "bloomdatamap3"
+    sql(s"drop table if exists $tableName")
+    sql(s"create table $tableName (a string, b string, c string) stored by 'carbondata'")
--- End diff --

No, we should not print the child select query to the user. If DM properties are not given for preagg, then you can show null, as we do in describe formatted for the comment and the rest, which is also the behavior of Hive.

---
[GitHub] carbondata issue #2406: [CARBONDATA-2640][CARBONDATA-2642] Added configurabl...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2406 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5437/ ---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197813935

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java ---
@@ -623,5 +640,83 @@ public DataMapWriterListener getDataMapWriterlistener() {
   return dataMapWriterlistener;
 }

+  public Map getColumnLocalDictGenMap() {
+    return columnLocalDictGenMap;
+  }
+
+  /**
+   * This method prepares a map which will have column and local dictionary generator mapping for
+   * all the local dictionary columns.
+   * @param carbonTable
+   * @param wrapperColumnSchema
+   * @param carbonFactDataHandlerModel
+   */
+  public static void setLocalDictToModel(CarbonTable carbonTable,
--- End diff --

Keep as `private`.

---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197813606

--- Diff: core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java ---
@@ -172,71 +172,71 @@
   IndexHeader indexheaderResult = getIndexHeader(columnCardinality, columnSchemaList, 0, 0L);
   assertEquals(indexHeader, indexheaderResult);
 }
-
-  @Test public void testConvertFileFooter() throws Exception {
-    int[] cardinality = { 1, 2, 3, 4, 5 };
-
-    org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema colSchema =
-        new org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema();
-    org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema colSchema1 =
-        new org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema();
-    List columnSchemaList = new ArrayList<>();
-    columnSchemaList.add(colSchema);
-    columnSchemaList.add(colSchema1);
-
-    SegmentProperties segmentProperties = new SegmentProperties(columnSchemaList, cardinality);
-
-    final EncodedColumnPage measure = new EncodedColumnPage(new DataChunk2(), new byte[]{0,1},
-        PrimitivePageStatsCollector.newInstance(
-            org.apache.carbondata.core.metadata.datatype.DataTypes.BYTE));
-    new MockUp() {
-      @SuppressWarnings("unused") @Mock
-      public EncodedColumnPage getMeasure(int measureIndex) {
-        return measure;
-      }
-    };
-
-    new MockUp() {
-      @SuppressWarnings("unused") @Mock
-      public byte[] serializeStartKey() {
-        return new byte[]{1, 2};
-      }
-
-      @SuppressWarnings("unused") @Mock
-      public byte[] serializeEndKey() {
-        return new byte[]{1, 2};
-      }
-    };
-
-    TablePageKey key = new TablePageKey(3, segmentProperties, false);
-    EncodedTablePage encodedTablePage =
-        EncodedTablePage.newInstance(3, new EncodedColumnPage[0], new EncodedColumnPage[0], key);
-
-    List encodedTablePageList = new ArrayList<>();
-    encodedTablePageList.add(encodedTablePage);
-
-    BlockletInfo3 blockletInfoColumnar1 = new BlockletInfo3();
-
-    List blockletInfoColumnarList = new ArrayList<>();
-    blockletInfoColumnarList.add(blockletInfoColumnar1);
-
-    byte[] byteMaxArr = "1".getBytes();
-    byte[] byteMinArr = "2".getBytes();
-
-    BlockletIndex index = getBlockletIndex(encodedTablePageList, segmentProperties.getMeasures());
-    List indexList = new ArrayList<>();
-    indexList.add(index);
-
-    BlockletMinMaxIndex blockletMinMaxIndex = new BlockletMinMaxIndex();
-    blockletMinMaxIndex.addToMax_values(ByteBuffer.wrap(byteMaxArr));
-    blockletMinMaxIndex.addToMin_values(ByteBuffer.wrap(byteMinArr));
-    FileFooter3 footer = convertFileFooterVersion3(blockletInfoColumnarList, indexList, cardinality, 2);
-    assertEquals(footer.getBlocklet_index_list(), indexList);
-  }
+//
--- End diff --

Remove it if it is not required.

---
[GitHub] carbondata issue #2334: [CARBONDATA-2515][CARBONDATA-2516] fixed Timestamp g...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2334 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5358/ ---
[GitHub] carbondata issue #2406: [CARBONDATA-2640][CARBONDATA-2642] Added configurabl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2406 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6529/ ---
[GitHub] carbondata pull request #2404: [CARBONDATA-2634][BloomDataMap] Add datamap p...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2404#discussion_r197810030

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/TestDataMapCommand.scala ---
@@ -204,6 +204,37 @@ class TestDataMapCommand extends QueryTest with BeforeAndAfterAll {
     }
   }

+  test("test show datamap: show datamap property related information") {
+    val tableName = "datamapshowtest"
+    val datamapName = "bloomdatamap"
+    val datamapName2 = "bloomdatamap2"
+    val datamapName3 = "bloomdatamap3"
+    sql(s"drop table if exists $tableName")
+    sql(s"create table $tableName (a string, b string, c string) stored by 'carbondata'")
--- End diff --

It seems that the preaggregate datamap does not support user-specified DMProperties. However, it has DMProperties internally. In the current commit, it will show:
```
+-----------+------------+--------------------------------+-------------------+
|DataMapName|ClassName   |Associated Table                |DataMap Properties |
+-----------+------------+--------------------------------+-------------------+
|datamap1   |preaggregate|default.datamapshowtest_datamap1|'CHILD_SELECT QUERY'='c2VsZWN0IGNvdW50KGEpIGZyb20gZGF0YW1hcHNob3d0ZXN0', 'QUERYTYPE'='AGGREGATION', '_internal.deferred.rebuild'='false'|
+-----------+------------+--------------------------------+-------------------+
```
Do you think it's OK?

---
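For reference, the 'CHILD_SELECT QUERY' value in the output above is Base64-encoded. A small self-contained sketch using the JDK's `java.util.Base64` (the encoded string is taken verbatim from the output above) that decodes it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeChildQuery {
    public static void main(String[] args) {
        // value of 'CHILD_SELECT QUERY' as shown by SHOW DATAMAP above
        String encoded = "c2VsZWN0IGNvdW50KGEpIGZyb20gZGF0YW1hcHNob3d0ZXN0";
        String decoded = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        System.out.println(decoded); // select count(a) from datamapshowtest
    }
}
```

The decoded value is the aggregation query the preaggregate datamap was built from, which explains why the property exists internally even though it is not user-specified.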
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197807784

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.blocklet;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder;
+import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage;
+import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
+import org.apache.carbondata.core.localdictionary.PageLevelDictionary;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.format.LocalDictionaryChunk;
+
+/**
+ * Maintains the list of encoded pages of a column in a blocklet,
+ * and the encoded dictionary values only if the column is encoded using local
+ * dictionary.
+ * Handles the fallback if not all the pages in the blocklet are
+ * encoded with local dictionary.
+ */
+public class BlockletEncodedColumnPage {
+
+  /**
+   * LOGGER
+   */
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName());
+
+  /**
+   * list of encoded pages of a column in a blocklet
+   */
+  private List<EncodedColumnPage> encodedColumnPageList;
+
+  /**
+   * fallback executor service
+   */
+  private ExecutorService fallbackExecutorService;
+
+  /**
+   * to check whether pages are local dictionary encoded or not
+   */
+  private boolean isLocalDictEncoded;
+
+  /**
+   * page level dictionary, only when column is encoded with local dictionary
+   */
+  private PageLevelDictionary pageLevelDictionary;
+
+  /**
+   * fallback future task queue
+   */
+  private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue;
+
+  BlockletEncodedColumnPage(ExecutorService fallbackExecutorService,
+      EncodedColumnPage encodedColumnPage) {
+    this.encodedColumnPageList = new ArrayList<>();
+    this.fallbackExecutorService = fallbackExecutorService;
+    this.encodedColumnPageList.add(encodedColumnPage);
+    // if dimension page is local dictionary enabled and encoded with local dictionary
+    if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage
+        .isLocalDictGeneratedPage()) {
+      this.isLocalDictEncoded = true;
+      // get first page dictionary
+      this.pageLevelDictionary = encodedColumnPage.getPageDictionary();
+    }
+  }
+
+  /**
+   * Below method will be used to add a column page of a column
+   *
+   * @param encodedColumnPage encoded column page
+   * @throws ExecutionException failure in fallback
+   * @throws InterruptedException failure during fallback
+   */
+  void addEncodedColumnColumnPage(EncodedColumnPage encodedColumnPage)
+      throws ExecutionException, InterruptedException {
+    // if local dictionary is false or column is encoded with local dictionary then
+    // add a page
+    if (!isLocalDictEncoded || encodedColumnPage.isLocalDictGeneratedPage()) {
+      this.encodedColumnPageList.add(encodedColumnPage);
+      // merge page level dictionary values
+      if (null != this.pageLevelDictionary) {
+        pageLevelDictionary.mergerDictionaryValues(encodedColumnPage.getPageDictionary());
+      }
+    } else {
+      // if older pages were encoded with dictionary and new pages are without
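The add-or-fallback decision quoted in the diff above can be sketched with simplified stand-in classes (hypothetical names, not the actual CarbonData types): pages keep accumulating while their local-dictionary status is compatible, and a dictionary-to-plain mismatch marks the blocklet for fallback.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for BlockletEncodedColumnPage: tracks whether the
// blocklet's pages are local-dictionary encoded and falls back when a
// non-dictionary page arrives after dictionary-encoded pages.
class BlockletPages {
    private final List<Boolean> pages = new ArrayList<>(); // true = dict-encoded page
    private boolean isLocalDictEncoded;
    private boolean fellBack;

    void addPage(boolean localDictGenerated) {
        if (pages.isEmpty()) {
            // first page decides the initial encoding of the blocklet
            isLocalDictEncoded = localDictGenerated;
            pages.add(localDictGenerated);
        } else if (!isLocalDictEncoded || localDictGenerated) {
            // compatible: either nothing is dict-encoded, or the new page is too
            pages.add(localDictGenerated);
        } else {
            // older pages were dictionary-encoded, the new page is not: fall back
            fellBack = true;
            isLocalDictEncoded = false;
            pages.add(localDictGenerated);
        }
    }

    boolean fellBack() { return fellBack; }
}

public class FallbackDemo {
    public static void main(String[] args) {
        BlockletPages col = new BlockletPages();
        col.addPage(true);
        col.addPage(true);
        System.out.println(col.fellBack()); // false
        col.addPage(false);                 // mismatch triggers fallback
        System.out.println(col.fellBack()); // true
    }
}
```

In the real class the fallback is submitted to an executor service and merged dictionaries are re-encoded; this sketch only shows the branching logic the reviewers are discussing.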
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2396 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6527/ ---
[GitHub] carbondata issue #2400: [HOTFIX] Removed BatchedDataSourceScanExec class and...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2400 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5357/ ---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197804388

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.blocklet;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder;
+import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage;
+import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
+import org.apache.carbondata.core.localdictionary.PageLevelDictionary;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.format.LocalDictionaryChunk;
+
+/**
+ * Maintains the list of encoded pages of a column in a blocklet,
+ * and the encoded dictionary values only if the column is encoded using local
+ * dictionary.
+ * Handles the fallback if not all the pages in the blocklet are
+ * encoded with local dictionary.
+ */
+public class BlockletEncodedColumnPage {
+
+  /**
+   * LOGGER
+   */
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName());
+
+  /**
+   * list of encoded pages of a column in a blocklet
+   */
+  private List<EncodedColumnPage> encodedColumnPageList;
+
+  /**
+   * fallback executor service
+   */
+  private ExecutorService fallbackExecutorService;
+
+  /**
+   * to check whether pages are local dictionary encoded or not
+   */
+  private boolean isLocalDictEncoded;
+
+  /**
+   * page level dictionary, only when column is encoded with local dictionary
+   */
+  private PageLevelDictionary pageLevelDictionary;
+
+  /**
+   * fallback future task queue
+   */
+  private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue;
+
+  BlockletEncodedColumnPage(ExecutorService fallbackExecutorService,
+      EncodedColumnPage encodedColumnPage) {
--- End diff --

Don't add `encodedColumnPage` from the constructor, use `addEncodedColumnColumnPage`

---
[GitHub] carbondata pull request #2402: [CARBONDATA-2587][CARBONDATA-2588] Local Dict...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2402#discussion_r197803602

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.blocklet;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.page.FallbackColumnPageEncoder;
+import org.apache.carbondata.core.datastore.page.FallbackEncodedColumnPage;
+import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
+import org.apache.carbondata.core.localdictionary.PageLevelDictionary;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.format.LocalDictionaryChunk;
+
+/**
+ * Maintains the list of encoded pages of a column in a blocklet,
+ * and the encoded dictionary values only if the column is encoded using local
+ * dictionary.
+ * Handles the fallback if not all the pages in the blocklet are
+ * encoded with local dictionary.
+ */
+public class BlockletEncodedColumnPage {
+
+  /**
+   * LOGGER
+   */
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(BlockletEncodedColumnPage.class.getName());
+
+  /**
+   * list of encoded pages of a column in a blocklet
+   */
+  private List<EncodedColumnPage> encodedColumnPageList;
+
+  /**
+   * fallback executor service
+   */
+  private ExecutorService fallbackExecutorService;
+
+  /**
+   * to check whether pages are local dictionary encoded or not
+   */
+  private boolean isLocalDictEncoded;
+
+  /**
+   * page level dictionary, only when column is encoded with local dictionary
+   */
+  private PageLevelDictionary pageLevelDictionary;
+
+  /**
+   * fallback future task queue
+   */
+  private ArrayDeque<Future<FallbackEncodedColumnPage>> fallbackFutureQueue;
+
+  BlockletEncodedColumnPage(ExecutorService fallbackExecutorService,
+      EncodedColumnPage encodedColumnPage) {
+    this.encodedColumnPageList = new ArrayList<>();
+    this.fallbackExecutorService = fallbackExecutorService;
+    this.encodedColumnPageList.add(encodedColumnPage);
+    // if dimension page is local dictionary enabled and encoded with local dictionary
+    if (encodedColumnPage.isLocalDictionaryEnabled() && encodedColumnPage
--- End diff --

Just keeping `this.isLocalDictEncoded = encodedColumnPage.isLocalDictGeneratedPage()` should be OK

---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2396 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5436/ ---
[GitHub] carbondata issue #2334: [CARBONDATA-2515][CARBONDATA-2516] fixed Timestamp g...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2334 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6526/ ---
[GitHub] carbondata issue #2400: [HOTFIX] Removed BatchedDataSourceScanExec class and...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2400 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6525/ ---
[GitHub] carbondata issue #2406: [CARBONDATA-2640][CARBONDATA-2642] Added configurabl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2406 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5356/ ---
[GitHub] carbondata issue #2407: [CARBONDATA-2646][DataLoad]change the log level whil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2407 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5355/ ---
[GitHub] carbondata pull request #2384: [CARBONDATA-2608] SDK Support JSON data loadi...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2384#discussion_r197781951

--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonReaderBuilder.java ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.annotations.InterfaceStability;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.processing.loading.jsoninput.JsonInputFormat;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.s3a.Constants;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptID;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
+
+@InterfaceAudience.User
+@InterfaceStability.Evolving
+public class JsonReaderBuilder {
--- End diff --

OK. Removed this file and integrated it with the existing builder itself.

---
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2397 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5435/ ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5354/ ---
[GitHub] carbondata issue #2407: [CARBONDATA-2646][DataLoad]change the log level whil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2407 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6523/ ---
[GitHub] carbondata issue #2406: [CARBONDATA-2640][CARBONDATA-2642] Added configurabl...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2406 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6524/ ---
[GitHub] carbondata issue #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrec...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2408 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6522/ ---
[GitHub] carbondata issue #2394: [CARBONDATA- 2243] Added test case for database and ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2394 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5353/ ---
[GitHub] carbondata issue #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfil...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2403 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5434/ ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2396 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5352/ ---
[GitHub] carbondata issue #2396: [CARBONDATA-2606] [Complex DataType Enhancements] Pr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2396 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6521/ ---
[GitHub] carbondata issue #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2403 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6520/ ---
[jira] [Updated] (CARBONDATA-2638) Implement driver min max caching for specified columns and segregate block and blocklet cache
[ https://issues.apache.org/jira/browse/CARBONDATA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Gupta updated CARBONDATA-2638:
-------------------------------------
    Attachment: Driver_Block_Cache.docx

> Implement driver min max caching for specified columns and segregate block and blocklet cache
> ---------------------------------------------------------------------------------------------
>
>          Key: CARBONDATA-2638
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-2638
>      Project: CarbonData
>   Issue Type: New Feature
>     Reporter: Manish Gupta
>     Assignee: Manish Gupta
>     Priority: Major
>  Attachments: Driver_Block_Cache.docx
>
> *Background*
> The current implementation of Blocklet dataMap caching in the driver caches the min and max values of all the columns in the schema by default.
> *Problem*
> The problem with this implementation is that as the number of loads increases, the memory required to hold the min and max values also increases considerably. In most scenarios there is a single driver, and the memory configured for the driver is less than that of the executors. With a continuous increase in memory requirements, the driver can even go out of memory, which makes the situation worse.
> *Solution*
> 1. Cache only the required columns in the driver
> 2. Segregate block and blocklet level cache
> For more details please check the attached document

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2638) Implement driver min max caching for specified columns and segregate block and blocklet cache
[ https://issues.apache.org/jira/browse/CARBONDATA-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Gupta updated CARBONDATA-2638:
-------------------------------------
    Attachment: (was: Driver_Block_Cache.docx)

> Implement driver min max caching for specified columns and segregate block and blocklet cache
> ---------------------------------------------------------------------------------------------
>
>          Key: CARBONDATA-2638
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-2638
>      Project: CarbonData
>   Issue Type: New Feature
>     Reporter: Manish Gupta
>     Assignee: Manish Gupta
>     Priority: Major
>  Attachments: Driver_Block_Cache.docx
>
> *Background*
> The current implementation of Blocklet dataMap caching in the driver caches the min and max values of all the columns in the schema by default.
> *Problem*
> The problem with this implementation is that as the number of loads increases, the memory required to hold the min and max values also increases considerably. In most scenarios there is a single driver, and the memory configured for the driver is less than that of the executors. With a continuous increase in memory requirements, the driver can even go out of memory, which makes the situation worse.
> *Solution*
> 1. Cache only the required columns in the driver
> 2. Segregate block and blocklet level cache
> For more details please check the attached document

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
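The proposal in the issue above (cache min/max values only for configured columns) can be sketched as follows. This is an illustrative, hypothetical structure, not the actual CarbonData datamap cache: the driver keeps min/max entries only for a configured subset of columns, so cache size grows with the number of cached columns rather than with the full schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical driver-side cache that keeps min/max only for configured columns.
class MinMaxCache {
    private final Set<String> cachedColumns;                    // e.g. from a table property
    private final Map<String, long[]> minMax = new HashMap<>(); // column -> {min, max}

    MinMaxCache(Set<String> cachedColumns) {
        this.cachedColumns = cachedColumns;
    }

    void update(String column, long value) {
        if (!cachedColumns.contains(column)) {
            return; // non-cached columns consume no driver memory
        }
        long[] mm = minMax.computeIfAbsent(column,
            c -> new long[]{Long.MAX_VALUE, Long.MIN_VALUE});
        mm[0] = Math.min(mm[0], value);
        mm[1] = Math.max(mm[1], value);
    }

    long[] get(String column) { return minMax.get(column); }
    int size() { return minMax.size(); }
}

public class MinMaxCacheDemo {
    public static void main(String[] args) {
        MinMaxCache cache = new MinMaxCache(Set.of("a")); // cache only column "a"
        cache.update("a", 5);
        cache.update("a", -3);
        cache.update("b", 100); // ignored, "b" is not configured for caching
        System.out.println(cache.size());                              // 1
        System.out.println(cache.get("a")[0] + " " + cache.get("a")[1]); // -3 5
    }
}
```

Pruning with such a cache is then only possible on the cached columns; filters on other columns must fall back to scanning, which is the trade-off the design document weighs.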
[GitHub] carbondata issue #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2403 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5351/ ---
[GitHub] carbondata issue #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfil...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2403 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5433/ ---
[GitHub] carbondata issue #2403: [CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfil...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2403 it depends on PR #2408 ---
[GitHub] carbondata issue #2047: [CARBONDATA-2240] Refactored TestPreaggregateExpress...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2047 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5350/ ---
[GitHub] carbondata pull request #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in i...
GitHub user xuchuanyin reopened a pull request: https://github.com/apache/carbondata/pull/2408 [CARBONDATA-2653][BloomDataMap] Fix bugs in incorrect blocklet number in bloomfilter The last bloomfilter index file has already been written onBlockletEnd, no need to write again, otherwise an extra blocklet number will be generated in the bloom index file. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? `NO` - How it is tested? Please attach test report. `Tested in local machine` - Is it a performance related change? Please attach the performance test report. `NO` - Any additional information to help reviewers in testing this change. `NA` - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NA` You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0625_bloom_dm_incorrect_number Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2408.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2408 commit 8d142f0a0b766cabac53543cc57045d49f7472a1 Author: xuchuanyin Date: 2018-06-25T09:18:49Z Fix bugs in incorrect blocklet number in bloomfilter The last bloomfilter index file has already been written onBlockletEnd, no need to write again, otherwise an extra blocklet number will be generated in the bloom index file. ---
[GitHub] carbondata pull request #2408: [CARBONDATA-2632][BloomDataMap] Fix bugs in i...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2408 [CARBONDATA-2632][BloomDataMap] Fix bugs in incorrect blocklet number in bloomfilter The last bloomfilter index file has already been written onBlockletEnd, no need to write again, otherwise an extra blocklet number will be generated in the bloom index file. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? `NO` - How it is tested? Please attach test report. `Tested in local machine` - Is it a performance related change? Please attach the performance test report. `NO` - Any additional information to help reviewers in testing this change. `NA` - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NA` You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0625_bloom_dm_incorrect_number Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2408.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2408 commit 8d142f0a0b766cabac53543cc57045d49f7472a1 Author: xuchuanyin Date: 2018-06-25T09:18:49Z Fix bugs in incorrect blocklet number in bloomfilter The last bloomfilter index file has already been written onBlockletEnd, no need to write again, otherwise an extra blocklet number will be generated in the bloom index file. ---
[GitHub] carbondata pull request #2408: [CARBONDATA-2653][BloomDataMap] Fix bugs in i...
Github user xuchuanyin closed the pull request at: https://github.com/apache/carbondata/pull/2408 ---
[GitHub] carbondata issue #2047: [CARBONDATA-2240] Refactored TestPreaggregateExpress...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2047 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6518/ ---
[GitHub] carbondata issue #2095: [CARBONDATA-2273] Added sdv test cases for boolean f...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2095 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6519/ ---
[jira] [Updated] (CARBONDATA-2653) Fix bugs in incorrect blocklet number in bloomfilter
[ https://issues.apache.org/jira/browse/CARBONDATA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin updated CARBONDATA-2653:
-----------------------------------
    Summary: Fix bugs in incorrect blocklet number in bloomfilter  (was: Fix bugs incorrect blocklet number in bloomfilter)

> Fix bugs in incorrect blocklet number in bloomfilter
> ----------------------------------------------------
>
>          Key: CARBONDATA-2653
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-2653
>      Project: CarbonData
>   Issue Type: Sub-task
>     Reporter: xuchuanyin
>     Assignee: xuchuanyin
>     Priority: Major
>
> An incorrect blocklet number can be observed during bloomfilter pruning. This is because the bloomfilter writer writes an extra blocklet before it finishes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
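The bug described above is a double-flush problem: the last blocklet's bloomfilter is written in onBlockletEnd and then again when the writer finishes, producing an extra blocklet entry in the index file. A hedged sketch (hypothetical writer, not the actual datamap writer API) of guarding the final flush so it is idempotent:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index writer: flushing the current blocklet is tracked so that
// finish() does not emit an extra blocklet entry when onBlockletEnd already did.
class BloomIndexWriter {
    private final List<Integer> writtenBlockletIds = new ArrayList<>();
    private int currentBlockletId = 0;
    private boolean currentFlushed = true; // nothing pending initially

    void onBlockletStart() {
        currentFlushed = false;
    }

    void onBlockletEnd() {
        writtenBlockletIds.add(currentBlockletId++);
        currentFlushed = true; // remember that this blocklet is already on disk
    }

    void finish() {
        // guard: only flush if the last blocklet was not already written
        if (!currentFlushed) {
            onBlockletEnd();
        }
    }

    List<Integer> written() { return writtenBlockletIds; }
}

public class BloomWriterDemo {
    public static void main(String[] args) {
        BloomIndexWriter w = new BloomIndexWriter();
        w.onBlockletStart(); w.onBlockletEnd();
        w.onBlockletStart(); w.onBlockletEnd();
        w.finish(); // already flushed: no extra blocklet
        System.out.println(w.written().size()); // 2
    }
}
```

Without the `currentFlushed` guard, `finish()` would unconditionally emit one more entry, which is the "extra blocklet number" symptom the issue reports.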
[GitHub] carbondata issue #2397: [CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2397 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5349/ ---
[GitHub] carbondata pull request #2407: [CARBONDATA-2646][DataLoad]change the log lev...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/2407 [CARBONDATA-2646][DataLoad]change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks. change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NO - [ ] Document update required? NA - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. Test in environment and check the log displayed - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata bugfix_dts2018062011034 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2407.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2407 commit eef16725b04c339dcf6ed948e6f08ba83ad5e025 Author: ndwangsen Date: 2018-06-25T08:50:18Z Change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' ---
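The change above can be illustrated with a tiny sketch. CarbonData uses WARN/ERROR levels; the sketch below uses the JDK's java.util.logging equivalents (WARNING/SEVERE, a stand-in rather than the actual CarbonData logger) to show the idea: conditions that are expected during a load with 'sort_column_bounds' are operational noise, not failures, so they belong at the lower severity.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoadLogLevel {
    private static final Logger LOGGER = Logger.getLogger(LoadLogLevel.class.getName());

    // Pick the log level based on whether the condition is an expected part of
    // the load (e.g. a record falling outside the configured sort column bounds)
    // or a genuine failure.
    static Level levelFor(boolean expectedCondition) {
        return expectedCondition ? Level.WARNING : Level.SEVERE;
    }

    public static void main(String[] args) {
        LOGGER.log(levelFor(true), "record outside configured sort column bounds");
        System.out.println(levelFor(true)); // WARNING
    }
}
```

Keeping expected conditions at WARN keeps the ERROR level meaningful for operators scanning logs for real failures, which is the motivation stated in the PR description.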
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197724441

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -67,17 +69,38 @@
       int noDictionaryColumnIndex;
       int complexTypeColumnIndex;
    +  int noDictionaryComplexColumnIndex = 0;
    +  int complexTypeComplexColumnIndex = 0;
    +  boolean isDimensionExists;
    +  private int[] surrogateResult;
    +  private byte[][] noDictionaryKeys;
    +  private byte[][] complexTypeKeyArray;
    +  protected Map comlexDimensionInfoMap;
    +  /**
    +   * Field of this Map is the parent Column and associated child columns.
    +   * Final Projection shuld be a merged list consist of only parents.
    +   */
    +  public Map> mergedComplexDimensionColumns;
    --- End diff --

    it should be at method level
---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197724373

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -102,6 +123,9 @@
       public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
         dictionaryColumnIndex = 0;
         noDictionaryColumnIndex = 0;
         complexTypeColumnIndex = 0;
    +    mergedComplexDimensionDataMap = new HashMap<>();
    --- End diff --

    Why is it created for each row? Why not reuse it?
---
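The reuse the reviewer asks about is a common pattern: allocate the map once per collector and `clear()` it for each row, rather than constructing a new `HashMap` per row. A minimal hypothetical sketch follows; the class and field names loosely mirror the diff but are assumptions, not CarbonData's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowCollector {
    // Allocated once per collector instance instead of once per row.
    private final Map<Integer, List<byte[]>> mergedComplexDimensionDataMap =
        new HashMap<>();

    void collectRow(byte[][] complexTypeKeyArray) {
        // clear() drops the previous row's entries but keeps the backing
        // table, avoiding a fresh allocation on every row.
        mergedComplexDimensionDataMap.clear();
        for (int ordinal = 0; ordinal < complexTypeKeyArray.length; ordinal++) {
            mergedComplexDimensionDataMap
                .computeIfAbsent(ordinal, k -> new ArrayList<>())
                .add(complexTypeKeyArray[ordinal]);
        }
    }

    int bufferedColumns() {
        return mergedComplexDimensionDataMap.size();
    }
}
```

For hot per-row paths like a result collector this avoids allocating (and later garbage-collecting) one map per row, which is presumably the reviewer's concern.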
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197723962

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -67,17 +69,38 @@
       int noDictionaryColumnIndex;
       int complexTypeColumnIndex;
    +  int noDictionaryComplexColumnIndex = 0;
    +  int complexTypeComplexColumnIndex = 0;
    +  boolean isDimensionExists;
    +  private int[] surrogateResult;
    +  private byte[][] noDictionaryKeys;
    +  private byte[][] complexTypeKeyArray;
    +  protected Map comlexDimensionInfoMap;
    +  /**
    +   * Field of this Map is the parent Column and associated child columns.
    +   * Final Projection shuld be a merged list consist of only parents.
    +   */
    +  public Map> mergedComplexDimensionColumns;
    +
    +  /**
    +   * Fields of this Map of Parent Ordinal with the List is the Child Column Dimension and
    +   * the corresponding data buffer of that column.
    +   */
    +  public Map> mergedComplexDimensionDataMap;
    --- End diff --

    it should be private
---
[GitHub] carbondata pull request #2396: [CARBONDATA-2606] [Complex DataType Enhanceme...
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2396#discussion_r197723498

    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
    @@ -67,17 +69,38 @@
       int noDictionaryColumnIndex;
       int complexTypeColumnIndex;
    +  int noDictionaryComplexColumnIndex = 0;
    --- End diff --

    Why can it not be moved?
---
[GitHub] carbondata pull request #2334: [CARBONDATA-2515][CARBONDATA-2516] fixed Time...
Github user sv71294 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2334#discussion_r197722635

    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java ---
    @@ -171,134 +170,89 @@ else if (colType.equals(DecimalType.createDecimalType(carbondataColumnHandle.get
       * @return
       */
      static Expression parseFilterExpression(TupleDomain originalConstraint) {
    -   ImmutableList.Builder filters = ImmutableList.builder();
        Domain domain;
    +   Expression finalFilters = null;
    --- End diff --

    Sure, adding comments in the code.
---
[GitHub] carbondata issue #2406: [CARBONDATA-2640][CARBONDATA-2642] Added configurabl...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2406 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5432/ ---
[GitHub] carbondata pull request #2334: [CARBONDATA-2515][CARBONDATA-2516] fixed Time...
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2334#discussion_r197721852

    --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/PrestoFilterUtil.java ---
    @@ -171,134 +170,89 @@ else if (colType.equals(DecimalType.createDecimalType(carbondataColumnHandle.get
       * @return
       */
      static Expression parseFilterExpression(TupleDomain originalConstraint) {
    -   ImmutableList.Builder filters = ImmutableList.builder();
        Domain domain;
    +   Expression finalFilters = null;
    --- End diff --

    1. I mean, please put this detailed explanation inside the code in your PR.
    2. colExpression is just the column's name and data type; why would it change?
---
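The `finalFilters` accumulator in the diff suggests folding each per-column filter into a single expression tree instead of collecting filters in an `ImmutableList`. The following is a self-contained sketch of that folding pattern with hypothetical `Expr` types; CarbonData's real `Expression` class hierarchy is not reproduced here:

```java
// Hypothetical minimal expression types, for illustration only.
interface Expr {
    String show();
}

final class ColumnFilter implements Expr {
    private final String col;
    private final Object value;

    ColumnFilter(String col, Object value) {
        this.col = col;
        this.value = value;
    }

    public String show() {
        return col + " = " + value;
    }
}

final class And implements Expr {
    private final Expr left, right;

    And(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    public String show() {
        return "(" + left.show() + " AND " + right.show() + ")";
    }
}

public class FilterFold {
    // Fold each new per-column filter into the running expression,
    // mirroring the `finalFilters` accumulator in the diff: the first
    // filter becomes the tree, later ones are AND-ed onto it.
    static Expr combine(Expr finalFilters, Expr next) {
        return finalFilters == null ? next : new And(finalFilters, next);
    }

    public static void main(String[] args) {
        Expr f = null;
        f = combine(f, new ColumnFilter("country", "US"));
        f = combine(f, new ColumnFilter("year", 2018));
        System.out.println(f.show());  // prints "(country = US AND year = 2018)"
    }
}
```

Building the tree directly avoids a second pass that would otherwise be needed to reduce the list of filters into one expression before handing it to the query engine.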