[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2876 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1173/ ---
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2877 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1172/ ---
[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2876 LGTM ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563795 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java --- @@ -305,7 +301,7 @@ public void setBlockDataType(DataType blockDataType) { } @Override public CarbonColumnVector getDictionaryVector() { -return dictionaryVector; +return null; --- End diff -- VectorizedCarbonRecordReader is handled in the same way for getCurrentKey(). ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563694 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java --- @@ -145,9 +158,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO externalTableSegments.add(seg); } } - // do block filtering and get split - List splits = - getSplits(job, filter, externalTableSegments, null, partitionInfo, null); + List splits = new ArrayList<>(); + if (isSDK) { +for (CarbonFile carbonFile : getAllCarbonDataFiles(carbonTable.getTablePath())) { --- End diff -- ok ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563660 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java --- @@ -1737,4 +1738,89 @@ public void testReadNextRowWithProjectionAndRowUtil() { } } + @Test + public void testVectorReader() { +String path = "./testWriteFiles"; +try { + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[12]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("shortField", DataTypes.SHORT); + fields[2] = new Field("intField", DataTypes.INT); + fields[3] = new Field("longField", DataTypes.LONG); + fields[4] = new Field("doubleField", DataTypes.DOUBLE); + fields[5] = new Field("boolField", DataTypes.BOOLEAN); + fields[6] = new Field("dateField", DataTypes.DATE); + fields[7] = new Field("timeField", DataTypes.TIMESTAMP); + fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2)); + fields[9] = new Field("varcharField", DataTypes.VARCHAR); + fields[10] = new Field("byteField", DataTypes.BYTE); + fields[11] = new Field("floatField", DataTypes.FLOAT); + Map map = new HashMap<>(); + map.put("complex_delimiter_level_1", "#"); + CarbonWriter writer = CarbonWriter.builder() + .outputPath(path) + .withLoadOptions(map) + .withCsvInput(new Schema(fields)) + .writtenBy("CarbonReaderTest") + .build(); + + for (int i = 0; i < 10; i++) { +String[] row2 = new String[]{ +"robot" + (i % 10), +String.valueOf(i % 1), +String.valueOf(i), +String.valueOf(Long.MAX_VALUE - i), +String.valueOf((double) i / 2), +String.valueOf(true), +"2019-03-02", +"2019-02-12 03:03:34", +"12.345", +"varchar", +String.valueOf(i), +"1.23" +}; +writer.write(row2); + } + writer.close(); + + // Read data + CarbonReader reader = CarbonReader + .builder(path, "_temp") + .withVectorReader(true) + .build(); + + int i = 0; + while (reader.hasNext()) { +Object[] data = (Object[]) reader.readNextRow(); + +assert (RowUtil.getString(data, 0).equals("robot" + i)); +assertEquals(RowUtil.getShort(data, 4), i); +assertEquals(RowUtil.getInt(data, 5), i); +assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i); +assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2); +assert (RowUtil.getByte(data, 8).equals(new Byte("1"))); +assertEquals(RowUtil.getInt(data, 1), 17957); +assertEquals(RowUtil.getLong(data, 2), 154992081400L); +assert (RowUtil.getDecimal(data, 9).equals("12.35")); +assert (RowUtil.getString(data, 3).equals("varchar")); +assertEquals(RowUtil.getByte(data, 10), new Byte(String.valueOf(i))); +assertEquals(RowUtil.getFloat(data, 11), new Float("1.23")); +i++; + } + reader.close(); --- End diff -- done ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563709 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -138,6 +138,19 @@ public CarbonInputSplit(String segmentId, Path path, long start, long length, St version = CarbonProperties.getInstance().getFormatVersion(); } + public CarbonInputSplit(String segmentId, Path path, long start, long length, --- End diff -- ok ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563684 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java --- @@ -145,9 +158,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO externalTableSegments.add(seg); } } - // do block filtering and get split - List splits = - getSplits(job, filter, externalTableSegments, null, partitionInfo, null); + List splits = new ArrayList<>(); + if (isSDK) { --- End diff -- changed ---
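A minimal sketch of the idea behind the `isSDK` branch, assuming each `.carbondata` file under the table path simply becomes one whole-file split. `DataFileSplit` and `SdkSplitPlanner` are hypothetical stand-ins, not CarbonData's `CarbonInputSplit` or `getAllCarbonDataFiles`, and the local `java.nio` listing is an assumption for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical per-file split (not CarbonInputSplit): covers [start, start + length).
final class DataFileSplit {
  final Path path;
  final long start;
  final long length;

  DataFileSplit(Path path, long start, long length) {
    this.path = path;
    this.start = start;
    this.length = length;
  }
}

final class SdkSplitPlanner {
  // Recursively collects regular files ending in ".carbondata", in a stable order.
  static List<Path> listCarbonDataFiles(Path tablePath) throws IOException {
    try (Stream<Path> walk = Files.walk(tablePath)) {
      return walk.filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".carbondata"))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // One split per data file, spanning the whole file; no block filtering is applied.
  static List<DataFileSplit> planSplits(Path tablePath) throws IOException {
    List<DataFileSplit> splits = new ArrayList<>();
    for (Path file : listCarbonDataFiles(tablePath)) {
      splits.add(new DataFileSplit(file, 0L, Files.size(file)));
    }
    return splits;
  }
}
```

Skipping `getSplits` filtering this way trades pruning for simplicity, which seems reasonable for SDK readers that usually scan every file anyway.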
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563650 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop.util; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.metadata.datatype.DecimalType; +import org.apache.carbondata.core.metadata.datatype.StructField; +import org.apache.carbondata.core.scan.executor.QueryExecutor; +import org.apache.carbondata.core.scan.executor.QueryExecutorFactory; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.hadoop.AbstractRecordReader; +import org.apache.carbondata.hadoop.CarbonInputSplit; + +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.log4j.Logger; + +/** + * A specialized RecordReader that reads into CarbonColumnarBatches directly using the + * carbondata column APIs and fills the data directly into columns. 
+ */ +public class CarbonVectorizedRecordReader extends AbstractRecordReader { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName()); + + private CarbonColumnarBatch carbonColumnarBatch; + + private QueryExecutor queryExecutor; + + private int batchIdx = 0; + + private int numBatched = 0; + + private AbstractDetailQueryResultIterator iterator; + + private QueryModel queryModel; + + public CarbonVectorizedRecordReader(QueryModel queryModel) { +this.queryModel = queryModel; + } + + @Override public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) + throws IOException, InterruptedException { +List splitList; +if (inputSplit instanceof CarbonInputSplit) { + splitList = new ArrayList<>(1); + splitList.add((CarbonInputSplit) inputSplit); +} else { + throw new RuntimeException("unsupported input split type: " + inputSplit); +} +List tableBlockInfoList = CarbonInputSplit.createBlocks(splitList); +queryModel.setTableBlockInfos(tableBlockInfoList); +queryModel.setVectorReader(true); +try { + queryExecutor = + QueryExecutorFactory.getQueryExecutor(queryModel, taskAttemptContext.getConfiguration()); + iterator = (AbstractDetailQueryResultIterator) queryExecutor.execute(queryModel); +} catch (QueryExecutionException e) { + LOGGER.error(e); + throw new InterruptedException(e.getMessage()); +} catch (Exception e) { + LOGGER.error(e); + throw e; +} + } + + @Override public boolean nextKeyValue() throws IOException, InterruptedException { +initBatch(); --- End diff --
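The quoted `nextKeyValue()` body is cut off, but the `batchIdx` and `numBatched` fields suggest a row-at-a-time view over columnar batches. A minimal sketch of that pattern, assuming one row is handed out per `nextKeyValue()` call and a new batch is pulled when the current one is exhausted. `ColumnarBatch` and `BatchRowReader` are hypothetical and this is not CarbonData's implementation.

```java
import java.util.Iterator;

// Hypothetical columnar batch: one int[] per column, numRows valid rows in each.
final class ColumnarBatch {
  final int[][] columns;
  final int numRows;

  ColumnarBatch(int[][] columns, int numRows) {
    this.columns = columns;
    this.numRows = numRows;
  }
}

// Row-at-a-time reader over a stream of columnar batches.
final class BatchRowReader {
  private final Iterator<ColumnarBatch> batches;
  private ColumnarBatch current;
  private int batchIdx = 0;   // rows already returned from the current batch
  private int numBatched = 0; // rows available in the current batch

  BatchRowReader(Iterator<ColumnarBatch> batches) {
    this.batches = batches;
  }

  // Advances to the next row, refilling (and skipping empty batches) as needed.
  boolean nextKeyValue() {
    while (batchIdx >= numBatched) {
      if (!batches.hasNext()) {
        return false;
      }
      current = batches.next();
      numBatched = current.numRows;
      batchIdx = 0;
    }
    batchIdx++;
    return true;
  }

  // Materialises the current row by reading every column at the same row index.
  int[] getCurrentValue() {
    int row = batchIdx - 1;
    int[] out = new int[current.columns.length];
    for (int c = 0; c < out.length; c++) {
      out[c] = current.columns[c][row];
    }
    return out;
  }
}
```

Filling whole columns per batch and only slicing rows on demand is what makes the vectorized path cheaper than decoding row by row.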
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229563643 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop.util; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.metadata.datatype.DecimalType; +import org.apache.carbondata.core.metadata.datatype.StructField; +import org.apache.carbondata.core.scan.executor.QueryExecutor; +import org.apache.carbondata.core.scan.executor.QueryExecutorFactory; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.hadoop.AbstractRecordReader; +import org.apache.carbondata.hadoop.CarbonInputSplit; + +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.log4j.Logger; + +/** + * A specialized RecordReader that reads into CarbonColumnarBatches directly using the + * carbondata column APIs and fills the data directly into columns. 
+ */ +public class CarbonVectorizedRecordReader extends AbstractRecordReader { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName()); + + private CarbonColumnarBatch carbonColumnarBatch; + + private QueryExecutor queryExecutor; + + private int batchIdx = 0; + + private int numBatched = 0; + + private AbstractDetailQueryResultIterator iterator; + + private QueryModel queryModel; + + public CarbonVectorizedRecordReader(QueryModel queryModel) { +this.queryModel = queryModel; + } + + @Override public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) + throws IOException, InterruptedException { +List splitList; +if (inputSplit instanceof CarbonInputSplit) { + splitList = new ArrayList<>(1); + splitList.add((CarbonInputSplit) inputSplit); +} else { + throw new RuntimeException("unsupported input split type: " + inputSplit); +} +List tableBlockInfoList = CarbonInputSplit.createBlocks(splitList); +queryModel.setTableBlockInfos(tableBlockInfoList); +queryModel.setVectorReader(true); +try { + queryExecutor = + QueryExecutorFactory.getQueryExecutor(queryModel, taskAttemptContext.getConfiguration()); + iterator = (AbstractDetailQueryResultIterator) queryExecutor.execute(queryModel); +} catch (QueryExecutionException e) { + LOGGER.error(e); + throw new InterruptedException(e.getMessage()); +} catch (Exception e) { + LOGGER.error(e); + throw e; +} + } + + @Override public boolean nextKeyValue() throws IOException, InterruptedException { +initBatch(); +if
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2807 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1384/ ---
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2807 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9435/ ---
[jira] [Created] (CARBONDATA-3063) Support set carbon property in CSDK
xubo245 created CARBONDATA-3063: --- Summary: Support set carbon property in CSDK Key: CARBONDATA-3063 URL: https://issues.apache.org/jira/browse/CARBONDATA-3063 Project: CarbonData Issue Type: Sub-task Affects Versions: 1.5.1 Reporter: xubo245 Assignee: xubo245 Fix For: 1.5.1 When users write or read CarbonData through the CSDK, they may need to change or add carbon properties to avoid problems such as OOM, so the CSDK should support setting carbon properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1383/ ---
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9434/ ---
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2807 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1171/ ---
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1170/ ---
[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2869 please fix CI error @kunal642 ---
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2807 rebase ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/2804 CI passed, please check again. ---
[GitHub] carbondata pull request #2871: [CARBONDATA-3051] Fix bugs in unclosed stream...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2871#discussion_r229535208 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -169,7 +170,7 @@ public CarbonReaderBuilder withHadoopConf(String key, String value) { reader.initialize(split, attempt); readers.add(reader); } catch (Exception e) { - reader.close(); + CarbonUtil.closeStreams(readers.toArray(new RecordReader[0])); --- End diff -- `CarbonUtil.closeStreams` will loop over the readers and close each one. Calling it saves a few lines of code (LOC). ---
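A minimal sketch of the cleanup pattern under discussion, assuming readers are `java.io.Closeable` and that a failed close should not stop the remaining ones from being closed. `ReaderUtil`, `closeAll`, and `openAll` are hypothetical helpers, not `CarbonUtil.closeStreams` itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

final class ReaderUtil {
  // Closes every non-null resource, continuing past failures so that one bad
  // close() does not leak the rest (a real helper would log the exception).
  static void closeAll(Closeable... resources) {
    for (Closeable c : resources) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException e) {
        // swallow and keep closing the remaining resources
      }
    }
  }

  // Opens n readers; if opening one fails, closes those already opened and rethrows.
  static List<Closeable> openAll(int n, IntFunction<Closeable> opener) {
    List<Closeable> opened = new ArrayList<>();
    try {
      for (int i = 0; i < n; i++) {
        opened.add(opener.apply(i));
      }
      return opened;
    } catch (RuntimeException e) {
      closeAll(opened.toArray(new Closeable[0]));
      throw e;
    }
  }
}
```

Note that in this sketch only readers that opened successfully are in the list, so a reader that failed half-way through its own initialization would still need to clean up after itself.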
[GitHub] carbondata pull request #2874: [CARBONDATA-3053][Cli] Fix bugs for carbon-cl...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2874#discussion_r229535025 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java --- @@ -500,4 +503,7 @@ private double computePercentage(byte[] data, byte[] min, byte[] max, ColumnSche } } + public void close() throws IOException { +this.fileReader.finish(); --- End diff -- this is the `close` method for `fileReader` ---
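The new `close()` delegates to `fileReader.finish()`. A minimal sketch of why that shape is useful, assuming the wrapper also implemented `java.io.Closeable` (the diff does not show whether `DataFile` does): try-with-resources then calls `finish()` even when reading throws. `FinishableReader` and `DataFileHandle` are hypothetical stand-ins for the CLI's `FileReader` and `DataFile`.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a reader that exposes finish() rather than close().
interface FinishableReader {
  void finish() throws IOException;
}

// Adapts finish() to Closeable so callers can rely on try-with-resources.
final class DataFileHandle implements Closeable {
  private final FinishableReader fileReader;

  DataFileHandle(FinishableReader fileReader) {
    this.fileReader = fileReader;
  }

  @Override
  public void close() throws IOException {
    fileReader.finish();
  }
}
```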
[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2883 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9433/ ---
[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2883 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1382/ ---
[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2883 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1169/ ---
[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2883 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1381/ ---
[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2883 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9432/ ---
[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2868 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1380/ ---
[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2883 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1168/ ---
[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2876 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1378/ ---
[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2876 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9429/ ---
[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2868 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1167/ ---
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2877 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9430/ ---
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2877 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1379/ ---
[jira] [Resolved] (CARBONDATA-3000) Provide C++ interface for writing carbon data
[ https://issues.apache.org/jira/browse/CARBONDATA-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3000. -- Resolution: Fixed Fix Version/s: 1.5.1 > Provide C++ interface for writing carbon data > - > > Key: CARBONDATA-3000 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3000 > Project: CarbonData > Issue Type: Sub-task >Affects Versions: 1.5.0 >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Fix For: 1.5.1 > > Time Spent: 8.5h > Remaining Estimate: 0h > > Provide C++ interface for writing carbon data -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2837 ---
[GitHub] carbondata pull request #2883: [CARBONDATA-3062] Fix Compatibility issue wit...
GitHub user Indhumathi27 opened a pull request: https://github.com/apache/carbondata/pull/2883 [CARBONDATA-3062] Fix Compatibility issue with cache_level as blocklet **Why this PR for?** In a hybrid store, a table can have both block and blocklet schemas. Scenario: in a hybrid store, some loads come from a legacy store that does not contain blocklet information, so they default to cache_level BLOCK, while other loads come from the latest store, which contains the BLOCKLET information and uses cache_level BLOCKLET. For such scenarios we need separate task and footer schemas. For all loads, with or without blocklet info, maintaining the two variables adds no extra cost. - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Indhumathi27/carbondata column_comp Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2883.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2883 ---
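A minimal sketch of the hybrid-store reasoning, assuming the effective cache level is decided per segment: legacy segments without blocklet info fall back to BLOCK even when the table requests BLOCKLET, so one table can end up needing both schemas. `CacheLevel`, `SegmentMeta`, and `SchemaSelector` are hypothetical names, not CarbonData classes.

```java
import java.util.List;

enum CacheLevel { BLOCK, BLOCKLET }

// Hypothetical segment metadata: legacy-store loads lack blocklet information.
final class SegmentMeta {
  final String id;
  final boolean hasBlockletInfo;

  SegmentMeta(String id, boolean hasBlockletInfo) {
    this.id = id;
    this.hasBlockletInfo = hasBlockletInfo;
  }
}

final class SchemaSelector {
  // A BLOCKLET request can only be honoured when the segment carries blocklet info.
  static CacheLevel effectiveLevel(CacheLevel tableLevel, SegmentMeta segment) {
    if (tableLevel == CacheLevel.BLOCKLET && !segment.hasBlockletInfo) {
      return CacheLevel.BLOCK;
    }
    return tableLevel;
  }

  // True when segments resolve to different levels, i.e. both schemas are needed.
  static boolean isHybrid(CacheLevel tableLevel, List<SegmentMeta> segments) {
    boolean sawBlock = false;
    boolean sawBlocklet = false;
    for (SegmentMeta s : segments) {
      if (effectiveLevel(tableLevel, s) == CacheLevel.BLOCK) {
        sawBlock = true;
      } else {
        sawBlocklet = true;
      }
    }
    return sawBlock && sawBlocklet;
  }
}
```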
[jira] [Created] (CARBONDATA-3062) Fix Compatibility issue with cache_level as blocklet
Indhumathi Muthumurugesh created CARBONDATA-3062: Summary: Fix Compatibility issue with cache_level as blocklet Key: CARBONDATA-3062 URL: https://issues.apache.org/jira/browse/CARBONDATA-3062 Project: CarbonData Issue Type: Improvement Reporter: Indhumathi Muthumurugesh Assignee: Indhumathi Muthumurugesh -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2877 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1166/ ---
[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2868#discussion_r229363551 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java --- @@ -141,7 +141,12 @@ public boolean renameTo(String changetoName) { } public boolean delete() { -return file.delete(); +try { + return deleteFile(file.getAbsolutePath(), FileFactory.getFileType(file.getAbsolutePath())); +} catch (IOException e) { + LOGGER.error("Exception occurred:" + e.getMessage()); --- End diff -- ok ---
[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...
Github user manishgupta88 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2868#discussion_r229363490 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java --- @@ -62,6 +62,11 @@ boolean renameForce(String changetoName); + /** + * This method will delete the files recursively from file system + * + * @return --- End diff -- ok ---
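A minimal sketch of a recursive delete with a boolean result, as the javadoc describes, assuming a local filesystem reached through `java.nio.file`. `RecursiveDelete` is a hypothetical helper, not `LocalCarbonFile.delete()` or `deleteFile`, and it says nothing about object stores such as S3, which lack real directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RecursiveDelete {
  // Deletes root and everything beneath it. A parent path is a prefix of its
  // children, so reverse lexicographic order visits children before parents.
  // Returns false instead of throwing, mirroring a boolean delete() contract.
  static boolean deleteRecursively(Path root) {
    if (!Files.exists(root)) {
      return false;
    }
    try (Stream<Path> walk = Files.walk(root)) {
      List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      for (Path p : paths) {
        Files.delete(p);
      }
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```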
[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2868 > If the table is on S3, will it behave correctly since it does not have "folder" concept? I have not changed any existing behavior, so it should work fine ---
[jira] [Created] (CARBONDATA-3061) Add validation for supported format version and Encoding type to throw proper exception to the user while reading a file
Manish Gupta created CARBONDATA-3061: Summary: Add validation for supported format version and Encoding type to throw proper exception to the user while reading a file Key: CARBONDATA-3061 URL: https://issues.apache.org/jira/browse/CARBONDATA-3061 Project: CarbonData Issue Type: Improvement Reporter: Manish Gupta Assignee: Manish Gupta This JIRA is raised to handle forward compatibility. With this PR, if a data file is read using a lower version (>= 1.5.1), a proper exception is thrown when the columnar format version or any encoding type is not supported for reading in that version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
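A minimal sketch of the forward-compatibility check, assuming the reader knows the highest columnar format version it supports and the set of encodings it can decode. Versions are plain ints and encodings plain strings here because an older reader cannot name encodings it has never seen. `FormatValidator` and `UnsupportedFormatException` are hypothetical, not CarbonData's `ColumnarFormatVersion` or `Encoding` types.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnsupportedFormatException extends RuntimeException {
  UnsupportedFormatException(String message) {
    super(message);
  }
}

// Hypothetical reader-side validator that fails fast with a descriptive message
// instead of letting decoding break later with an obscure error.
final class FormatValidator {
  private final int maxSupportedVersion;
  private final Set<String> supportedEncodings;

  FormatValidator(int maxSupportedVersion, Set<String> supportedEncodings) {
    this.maxSupportedVersion = maxSupportedVersion;
    this.supportedEncodings = Collections.unmodifiableSet(new HashSet<>(supportedEncodings));
  }

  void validate(int fileVersion, List<String> encodings) {
    if (fileVersion > maxSupportedVersion) {
      throw new UnsupportedFormatException("Columnar format version " + fileVersion
          + " is not supported; this reader supports up to " + maxSupportedVersion);
    }
    for (String encoding : encodings) {
      if (!supportedEncodings.contains(encoding)) {
        throw new UnsupportedFormatException("Encoding " + encoding + " is not supported for read");
      }
    }
  }
}
```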
[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2876 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1165/ ---
[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2861 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9428/ ---
[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2869 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9427/ ---
[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2861 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1164/ ---
[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2869 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1163/ ---
[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2882 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1162/ ---
[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2882 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1377/ ---
[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2861 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1376/ ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2804 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9425/ ---
[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2869 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1375/ ---
[GitHub] carbondata issue #2873: [WIP] Fix partition load issue when custom location ...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2873 LGTM ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2804 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1161/ ---
[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2837 LGTM ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2804 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1374/ ---
[GitHub] carbondata issue #2870: [HOTFIX-compatibility] Handle Lazy loading with inve...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2870 LGTM ---
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1160/ ---
[GitHub] carbondata issue #2867: [HOTFIX] Fixed data loading failure with safe column...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2867 LGTM ---
[GitHub] carbondata issue #2873: [WIP] Fix partition load issue when custom location ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2873 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9423/ ---
[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2879 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1159/ ---
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9424/ ---
[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2882 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9421/ ---
[GitHub] carbondata issue #2873: [WIP] Fix partition load issue when custom location ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2873 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1372/ ---
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1373/ ---
[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2879 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9422/ ---
[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2879 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1371/ ---
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r229300837 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java --- @@ -314,23 +312,26 @@ private void printColumnStats(String columnName) throws IOException, MemoryExcep minPercent = String.format("%.1f", blocklet.getColumnChunk().getMinPercentage() * 100); maxPercent = String.format("%.1f", blocklet.getColumnChunk().getMaxPercentage() * 100); DataFile.ColumnChunk columnChunk = blocklet.columnChunk; - if (columnChunk.column.isDimensionColumn() && DataTypeUtil + if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || blocklet + .getColumnChunk().column.getColumnName().contains(".val") || blocklet --- End diff -- this will be for no-dictionary complex columns; for complex columns min/max can be shown as NA, that will be ok, right? ---
[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...
Github user NamanRastogi commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2850#discussion_r229299196 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String tablePath) { return builder(tablePath, tableName); } + /** + * Return a new list of {@link CarbonReader} objects + * --- End diff -- Done! ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229298501 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java --- @@ -1737,4 +1738,89 @@ public void testReadNextRowWithProjectionAndRowUtil() { } } + @Test + public void testVectorReader() { +String path = "./testWriteFiles"; +try { + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[12]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("shortField", DataTypes.SHORT); + fields[2] = new Field("intField", DataTypes.INT); + fields[3] = new Field("longField", DataTypes.LONG); + fields[4] = new Field("doubleField", DataTypes.DOUBLE); + fields[5] = new Field("boolField", DataTypes.BOOLEAN); + fields[6] = new Field("dateField", DataTypes.DATE); + fields[7] = new Field("timeField", DataTypes.TIMESTAMP); + fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2)); + fields[9] = new Field("varcharField", DataTypes.VARCHAR); + fields[10] = new Field("byteField", DataTypes.BYTE); + fields[11] = new Field("floatField", DataTypes.FLOAT); + Map map = new HashMap<>(); + map.put("complex_delimiter_level_1", "#"); + CarbonWriter writer = CarbonWriter.builder() + .outputPath(path) + .withLoadOptions(map) + .withCsvInput(new Schema(fields)) + .writtenBy("CarbonReaderTest") + .build(); + + for (int i = 0; i < 10; i++) { +String[] row2 = new String[]{ +"robot" + (i % 10), +String.valueOf(i % 1), +String.valueOf(i), +String.valueOf(Long.MAX_VALUE - i), +String.valueOf((double) i / 2), +String.valueOf(true), +"2019-03-02", +"2019-02-12 03:03:34", +"12.345", +"varchar", +String.valueOf(i), +"1.23" +}; +writer.write(row2); + } + writer.close(); + + // Read data + CarbonReader reader = CarbonReader + .builder(path, "_temp") + .withVectorReader(true) + .build(); + + int i = 0; + while (reader.hasNext()) { +Object[] data = (Object[]) reader.readNextRow(); + +assert 
(RowUtil.getString(data, 0).equals("robot" + i)); +assertEquals(RowUtil.getShort(data, 4), i); +assertEquals(RowUtil.getInt(data, 5), i); +assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i); +assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2); +assert (RowUtil.getByte(data, 8).equals(new Byte("1"))); +assertEquals(RowUtil.getInt(data, 1), 17957); +assertEquals(RowUtil.getLong(data, 2), 154992081400L); +assert (RowUtil.getDecimal(data, 9).equals("12.35")); +assert (RowUtil.getString(data, 3).equals("varchar")); +assertEquals(RowUtil.getByte(data, 10), new Byte(String.valueOf(i))); +assertEquals(RowUtil.getFloat(data, 11), new Float("1.23")); +i++; + } + reader.close(); --- End diff -- Add validation for total number of rows read. ---
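In the test above, the counter `i` is incremented once per row and 10 rows are written, so an `assertEquals(10, i)` after the loop would likely be the simplest way to address this comment. The sketch below shows the same check in isolation; `RowSource` is a hypothetical stand-in that models only the `hasNext()`/`readNextRow()` loop, not the real `CarbonReader` API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the reader in testVectorReader: only the
// hasNext()/readNextRow() loop is modelled, not the CarbonReader API.
interface RowSource {
  boolean hasNext();
  Object readNextRow();
}

class RowCountCheck {
  /** Drains the source and returns how many rows were read. */
  static int countRows(RowSource reader) {
    int count = 0;
    while (reader.hasNext()) {
      reader.readNextRow();
      count++;
    }
    return count;
  }

  /** Fails if the number of rows read differs from the number written. */
  static void assertRowCount(RowSource reader, int expected) {
    int actual = countRows(reader);
    if (actual != expected) {
      throw new AssertionError("expected " + expected + " rows but read " + actual);
    }
  }

  /** Builds an in-memory RowSource over a list, for demonstration only. */
  static RowSource fromList(List<Object[]> rows) {
    Iterator<Object[]> it = rows.iterator();
    return new RowSource() {
      public boolean hasNext() { return it.hasNext(); }
      public Object readNextRow() { return it.next(); }
    };
  }
}
```

Checking the count after the loop catches a reader that silently stops early, which the per-row assertions alone would not detect.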
[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2882#discussion_r229297954 --- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java --- @@ -443,7 +444,8 @@ void computePercentage(byte[] shardMin, byte[] shardMax) { * @return result */ private double computePercentage(byte[] data, byte[] min, byte[] max, ColumnSchema column) { - if (column.getDataType() == DataTypes.STRING) { + if (column.getDataType() == DataTypes.STRING || column.getDataType() == DataTypes.BOOLEAN + || column.hasEncoding(Encoding.DICTIONARY)) { --- End diff -- yes, but min/max will be surrogate keys, right? Showing min and max as dictionary values is not useful. ---
[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2861#discussion_r229294323 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala --- @@ -175,6 +172,11 @@ with Serializable { dataSchema: StructType, context: TaskAttemptContext): OutputWriter = { val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration) +val appName = context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME) +if (null != appName) { --- End diff -- actually the appname will always be set, since Spark always sets the appname; this check was added for one of the test cases, I will remove it ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229288967 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java --- @@ -169,7 +169,7 @@ public void fillRow(int rowId, CarbonColumnVector vector, int vectorRow) { length)) { vector.putNull(vectorRow); --- End diff -- added check ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229289006 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java --- @@ -178,6 +178,25 @@ public boolean delete() { return carbonFiles; } + @Override public List listFiles(Boolean recursive, CarbonFileFilter fileFilter) --- End diff -- fixed ---
[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...
Github user kevinjmh commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2879#discussion_r229287888 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java --- @@ -308,7 +312,7 @@ private void processBatch(CarbonRowBatch batch, CarbonFactHandler dataHandler, i } writeCounter[iteratorIndex] += batch.getSize(); } catch (Exception e) { - throw new CarbonDataLoadingException("unable to generate the mdkey", e); + throw new CarbonDataLoadingException(e); --- End diff -- KeyGenException extends Exception, so it has to be wrapped in CarbonDataLoadingException (a RuntimeException) and rethrown. ---
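The point here is Java's checked-exception rule: a method whose signature does not declare a checked exception must catch it and rethrow it inside an unchecked one. The sketch uses hypothetical stand-ins (`CheckedKeyGenError` for `KeyGenException`, `LoadingError` for `CarbonDataLoadingException`); it is not the CarbonData code.

```java
// Hypothetical stand-ins: a checked exception (like KeyGenException, which
// extends Exception) and an unchecked wrapper (like CarbonDataLoadingException,
// a RuntimeException).
class CheckedKeyGenError extends Exception {
  CheckedKeyGenError(String msg) { super(msg); }
}

class LoadingError extends RuntimeException {
  LoadingError(Throwable cause) { super(cause); }
}

class WrapDemo {
  // Declares the checked exception, so callers must handle or declare it.
  static void generateKey(boolean fail) throws CheckedKeyGenError {
    if (fail) {
      throw new CheckedKeyGenError("key generation failed");
    }
  }

  // A method whose signature does not declare the checked exception has to
  // catch it and rethrow it wrapped in an unchecked exception.
  static void processBatch(boolean fail) {
    try {
      generateKey(fail);
    } catch (CheckedKeyGenError e) {
      // Passing the cause keeps the original message and stack trace.
      throw new LoadingError(e);
    }
  }
}
```

If everything inside the `try` throws only unchecked exceptions, the wrap adds nothing, which is the case where removing the try-catch, as suggested elsewhere in this review, would be enough.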
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2804 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9420/ ---
[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2869 Can you change the PR title to be more specific? ---
[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2861#discussion_r229285778 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala --- @@ -175,6 +172,11 @@ with Serializable { dataSchema: StructType, context: TaskAttemptContext): OutputWriter = { val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration) +val appName = context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME) +if (null != appName) { --- End diff -- If there is no appName, I think we should construct one; the appName should always be written into the file ---
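A minimal sketch of the "construct one if missing" suggestion, assuming a plain `Map` in place of the Hadoop `Configuration` (whose `get(String)` likewise returns null for an unset key). The fallback name `UnknownCarbonApp` and the class `AppNameResolver` are hypothetical; the real property key is passed in rather than guessed.

```java
import java.util.HashMap;
import java.util.Map;

class AppNameResolver {
  // Hypothetical fallback used when no application name is configured.
  static final String DEFAULT_APP_NAME = "UnknownCarbonApp";

  /**
   * Returns the configured writer application name, or a constructed
   * fallback, so that a non-null name is always available to write into
   * the file.
   */
  static String resolve(Map<String, String> conf, String key) {
    String appName = conf.get(key);
    if (appName == null || appName.trim().isEmpty()) {
      appName = DEFAULT_APP_NAME;
    }
    return appName;
  }
}
```

Resolving the name in one place means the writer never has to branch on `null != appName`, which is the check the reply says is only needed for a test case.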
[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2861#discussion_r229285529 --- Diff: integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala --- @@ -121,15 +121,13 @@ class SparkCarbonFileFormat extends FileFormat dataSchema: StructType): OutputWriterFactory = { val conf = job.getConfiguration - +conf + .set(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, --- End diff -- move to previous line ---
[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2861#discussion_r229285477 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -37,6 +37,11 @@ abstract class CarbonRDD[T: ClassTag]( @transient private val ss: SparkSession, @transient private var deps: Seq[Dependency[_]]) extends RDD[T](ss.sparkContext, deps) { + @transient val sparkAppName: String = ss.sparkContext.appName + CarbonProperties.getInstance() +.addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, + sparkAppName) --- End diff -- move to previous line ---
[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2861 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1367/ ---
[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...
Github user kevinjmh commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2879#discussion_r229285339 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java --- @@ -212,7 +212,11 @@ private void finish(CarbonFactHandler dataHandler, int iteratorIndex) { try { processingComplete(dataHandler); } catch (CarbonDataLoadingException e) { - exception = new CarbonDataWriterException(e.getMessage(), e); + // only assign when exception is null + // else it will erase original root cause + if (null == exception) { --- End diff -- Not for the statistics; it is better to read the whole method. It has two stages, finishing the handler and closing the handler, and the exception could be assigned in either stage. ---
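A minimal sketch of the "only assign when null" pattern across two stages, assuming `Runnable`s stand in for the finish and close steps (`FirstFailureKeeper` is hypothetical). Both stages are always attempted, and the first failure is the one rethrown. Attaching later failures via `Throwable.addSuppressed` goes beyond what the PR does; it is an optional addition so the close-stage error is not silently dropped.

```java
class FirstFailureKeeper {
  private RuntimeException exception;

  // Records a failure only if none has been recorded yet, so a later failure
  // (e.g. while closing) cannot overwrite the original root cause.
  private void record(RuntimeException e) {
    if (exception == null) {
      exception = e;
    } else if (e != exception) {
      // Optional: keep the later failure visible without replacing the root cause.
      exception.addSuppressed(e);
    }
  }

  // Stage 1 (finish) and stage 2 (close) may each fail; both are attempted.
  void finishThenClose(Runnable finish, Runnable close) {
    try {
      finish.run();
    } catch (RuntimeException e) {
      record(e);
    }
    try {
      close.run();
    } catch (RuntimeException e) {
      record(e);
    }
    if (exception != null) {
      throw exception;
    }
  }
}
```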
[GitHub] carbondata pull request #2842: [CARBONDATA-3032] Remove carbon.blocklet.size...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2842#discussion_r229285157 --- Diff: docs/sdk-guide.md --- @@ -24,7 +24,8 @@ CarbonData provides SDK to facilitate # SDK Writer -In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader. +In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader. +If you want to use SDK, it needs other carbon jar or you can use carbondata-sdk.jar. --- End diff -- `it needs other carbon jar` This sentence is not very clear ---
[GitHub] carbondata pull request #2836: [CARBONDATA-3027] Increase unsafe working mem...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2836#discussion_r229284826 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1234,7 +1234,7 @@ @CarbonProperty public static final String UNSAFE_WORKING_MEMORY_IN_MB = "carbon.unsafe.working.memory.in.mb"; - public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "512"; + public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "1024"; --- End diff -- You can change the configuration in your application, there is no need to change the default value of this parameter ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2804 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1158/ ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...
Github user ajantha-bhat commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r229284011 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop.util; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.metadata.datatype.DecimalType; +import org.apache.carbondata.core.metadata.datatype.StructField; +import org.apache.carbondata.core.scan.executor.QueryExecutor; +import org.apache.carbondata.core.scan.executor.QueryExecutorFactory; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.model.ProjectionDimension; +import org.apache.carbondata.core.scan.model.ProjectionMeasure; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch; +import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.hadoop.AbstractRecordReader; +import org.apache.carbondata.hadoop.CarbonInputSplit; + +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.log4j.Logger; + +/** + * A specialized RecordReader that reads into CarbonColumnarBatches directly using the + * carbondata column APIs and fills the data directly into columns. 
+ */ +public class CarbonVectorizedRecordReader extends AbstractRecordReader { + + private static final Logger LOGGER = + LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName()); + + private CarbonColumnarBatch carbonColumnarBatch; + + private QueryExecutor queryExecutor; + + private int batchIdx = 0; + + private int numBatched = 0; + + private AbstractDetailQueryResultIterator iterator; + + private QueryModel queryModel; + + public CarbonVectorizedRecordReader(QueryModel queryModel) { +this.queryModel = queryModel; + } + + @Override public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) + throws IOException, InterruptedException { +List splitList; +if (inputSplit instanceof CarbonInputSplit) { + splitList = new ArrayList<>(1); + splitList.add((CarbonInputSplit) inputSplit); +} else { + throw new RuntimeException("unsupported input split type: " + inputSplit); +} +List tableBlockInfoList = CarbonInputSplit.createBlocks(splitList); +queryModel.setTableBlockInfos(tableBlockInfoList); +queryModel.setVectorReader(true); +try { + queryExecutor = + QueryExecutorFactory.getQueryExecutor(queryModel, taskAttemptContext.getConfiguration()); + iterator = (AbstractDetailQueryResultIterator) queryExecutor.execute(queryModel); +} catch (QueryExecutionException e) { + LOGGER.error(e); + throw new InterruptedException(e.getMessage()); +} catch (Exception e) { + LOGGER.error(e); + throw e; +} + } + + @Override public boolean nextKeyValue() throws IOException, InterruptedException { +initBatch(); +if
[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2868#discussion_r229283191 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java --- @@ -141,7 +141,12 @@ public boolean renameTo(String changetoName) { } public boolean delete() { -return file.delete(); +try { + return deleteFile(file.getAbsolutePath(), FileFactory.getFileType(file.getAbsolutePath())); +} catch (IOException e) { + LOGGER.error("Exception occurred:" + e.getMessage()); --- End diff -- include the exception in the error log ---
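Concatenating only `e.getMessage()` drops the exception type and stack trace. With log4j 1.x, passing the exception as the second argument to `error(Object, Throwable)` keeps them. The runnable sketch below swaps in `java.util.logging` so it needs no external jar; `DeleteWithLogging` and its `Deleter` interface are hypothetical stand-ins for the `delete()` method in the diff.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class DeleteWithLogging {
  static final Logger LOGGER = Logger.getLogger(DeleteWithLogging.class.getName());

  // Hypothetical stand-in for the actual file deletion call.
  interface Deleter {
    boolean delete(String path) throws IOException;
  }

  static boolean delete(String path, Deleter deleter) {
    try {
      return deleter.delete(path);
    } catch (IOException e) {
      // Passing the exception itself records its type and stack trace,
      // not just its message text.
      LOGGER.log(Level.SEVERE, "Exception occurred while deleting " + path, e);
      return false;
    }
  }
}
```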
[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2868#discussion_r229283056 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java --- @@ -62,6 +62,11 @@ boolean renameForce(String changetoName); + /** + * This method will delete the files recursively from file system + * + * @return --- End diff -- complete the comment ---
[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2868 If the table is on S3, will it behave correctly, since S3 does not have a "folder" concept? ---
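To illustrate the concern: in an object store the key space is flat, so a "recursive directory delete" has to become "delete every key under a prefix". The sketch below models this with a hypothetical in-memory sorted map (`FlatKeyStore`). It does not use any S3 or CarbonData API and says nothing about how CarbonData's S3 file implementation actually behaves.

```java
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.TreeMap;

class FlatKeyStore {
  // Hypothetical in-memory model of an object store: a flat, sorted key space
  // with no real directories; "t1/Fact/x" is just a key containing slashes.
  final NavigableMap<String, byte[]> objects = new TreeMap<>();

  // A "recursive directory delete" becomes: delete every key under the prefix.
  // The trailing '/' keeps "t1" from also matching "t10/...".
  int deletePrefix(String dir) {
    String prefix = dir.endsWith("/") ? dir : dir + "/";
    int deleted = 0;
    // Keys sharing the prefix are contiguous in sorted order, so start at
    // the prefix and stop at the first key that no longer matches.
    Iterator<String> it = objects.tailMap(prefix, true).keySet().iterator();
    while (it.hasNext()) {
      String key = it.next();
      if (!key.startsWith(prefix)) {
        break;
      }
      it.remove(); // removes from the backing map
      deleted++;
    }
    return deleted;
  }
}
```

A delete that only removes the "folder" entry itself, without enumerating the keys under it, would leave the data files behind in such a store.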
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2816 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1157/ ---
[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2882 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1368/ ---
[GitHub] carbondata pull request #2871: [CARBONDATA-3051] Fix bugs in unclosed stream...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2871#discussion_r229282082 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -169,7 +170,7 @@ public CarbonReaderBuilder withHadoopConf(String key, String value) { reader.initialize(split, attempt); readers.add(reader); } catch (Exception e) { - reader.close(); + CarbonUtil.closeStreams(readers.toArray(new RecordReader[0])); --- End diff -- Why not loop and close each one in the `readers`? ---
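A sketch of the loop-and-close approach, assuming the readers can be treated as `java.io.Closeable` (Hadoop's `mapreduce.RecordReader` implements `Closeable`). Every reader is closed even if an earlier `close()` fails, and the first failure is rethrown with later ones attached as suppressed. `CloseAll` is a hypothetical helper, not `CarbonUtil.closeStreams`. In the snippet shown, the reader whose `initialize()` failed is added to `readers` only after `initialize()` succeeds, so it may need to be closed separately.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CloseAll {
  // Closes every reader even if some close() calls fail; the first failure is
  // rethrown after the loop, with later failures attached as suppressed.
  static void closeEach(List<? extends Closeable> readers) throws IOException {
    IOException first = null;
    for (Closeable reader : readers) {
      try {
        reader.close();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```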
[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2861 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1156/ ---
[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2861 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9419/ ---
[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2879#discussion_r229280016 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java --- @@ -212,7 +212,11 @@ private void finish(CarbonFactHandler dataHandler, int iteratorIndex) { try { processingComplete(dataHandler); } catch (CarbonDataLoadingException e) { - exception = new CarbonDataWriterException(e.getMessage(), e); + // only assign when exception is null + // else it will erase original root cause + if (null == exception) { --- End diff -- Why should we keep this exception? If we only want to do some statistics, we can add that code in the finally block and just throw the exception in the catch block ---
[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2879#discussion_r229280235 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java --- @@ -308,7 +312,7 @@ private void processBatch(CarbonRowBatch batch, CarbonFactHandler dataHandler, i } writeCounter[iteratorIndex] += batch.getSize(); } catch (Exception e) { - throw new CarbonDataLoadingException("unable to generate the mdkey", e); + throw new CarbonDataLoadingException(e); --- End diff -- I think there is no need to wrap the exception here, just remove the try-catch code. ---
[GitHub] carbondata pull request #2881: [HOTFIX] Remove unuse javax.servlet jar from ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2881 ---
[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2879#discussion_r229280564 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java --- @@ -259,7 +259,7 @@ public void processRow(CarbonRow row, CarbonFactHandler dataHandler) throws KeyG readCounter++; dataHandler.addDataToStore(row); } catch (Exception e) { - throw new CarbonDataLoadingException("unable to generate the mdkey", e); --- End diff -- no need to wrap the exception, just remove the try-catch code ---
[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2879#discussion_r229280426 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java --- @@ -141,7 +141,9 @@ private void finish(String tableName, CarbonFactHandler dataHandler) { try { processingComplete(dataHandler); } catch (Exception e) { - exception = new CarbonDataWriterException(e.getMessage(), e); + if (null == exception) { --- End diff -- no need to keep the exception here. you can do the statistics in finally code block ---
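A minimal sketch of the reviewer's alternative: let the exception propagate unchanged and record statistics in a `finally` block, which runs on both the success and failure paths. `FinallyStatsDemo` and its counter are hypothetical stand-ins for the real step and its load statistics. One trade-off: with the two-stage finish/close structure described earlier in this thread, an exception thrown from the `finally` block itself would replace the one already propagating.

```java
class FinallyStatsDemo {
  // Hypothetical counter standing in for the load statistics.
  static int statisticsRecorded = 0;

  // Any failure from processingComplete propagates unchanged, while the
  // statistics are still recorded.
  static void finish(Runnable processingComplete) {
    try {
      processingComplete.run();
    } finally {
      // Runs on both the success and the failure path.
      statisticsRecorded++;
    }
  }
}
```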