Re: Unable to perform compaction
Hi Liang, Sorry for the late reply. *For auto compaction:* I've set my default compaction threshold, and for force compaction I've used the *ALTER* query, but in both cases it shows me an error. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-perform-compaction-tp2099p2349.html Sent from the Apache CarbonData Mailing List archive at Nabble.com.
[jira] [Created] (CARBONDATA-338) Remove the method arguments as they are never used inside the method
Shivansh created CARBONDATA-338:
-----------------------------------

             Summary: Remove the method arguments as they are never used inside the method
                 Key: CARBONDATA-338
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-338
             Project: CarbonData
          Issue Type: Improvement
          Components: core
            Reporter: Shivansh

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284][WIP] Abstracting in...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061025 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
--- End diff --

I understand InMemoryBTreeIndex is a segment-level index.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284][WIP] Abstracting in...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061184 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import 
org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+    this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+    return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
+      throws IOException {
+
+    List result = new LinkedList();
+
+    FilterExpressionProcessor filterExpressionProcessor = new FilterExpressionProcessor();
+
+    AbsoluteTableIdentifier absoluteTableIdentifier = null;
+        //CarbonInputFormatUtil.getAbsoluteTableIdentifier(job.getConfiguration());
+
+    //for this segment fetch blocks matching filter in BTree
+    List dataRefNodes = null;
+    try {
+      dataRefNodes = getDataBlocksOfSegment(job, filterExpressionProcessor, absoluteTableIdentifier,
+          filter, segment.getId());
+    } catch (IndexBuilderException e) {
+      throw new IOException(e.getMessage());
+    }
+    for (DataRefNode dataRefNode : dataRefNodes) {
+      BlockBTreeLeafNode leafNode = (BlockBTreeLeafNode) dataRefNode;
+      TableBlockInfo tableBlockInfo = leafNode.getTableBlockInfo();
+      result.add(new CarbonInputSplit(segme
[GitHub] incubator-carbondata pull request #258: [CARBONDATA-338] Removed the unused ...
GitHub user shiv4nsh opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/258

[CARBONDATA-338] Removed the unused value inside the method

Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes).
- [ ] Replace `` in the title with the actual Jira issue number, if there is one.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
- [ ] Testing done. Please provide details on:
  - Whether new unit test cases have been added or why no new tests are required?
  - What manual testing you have done?
  - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shiv4nsh/incubator-carbondata improvement/CARBONDATA-338

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/258.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #258

commit 97cdfdc6bd4fc112253437628683d8fbdaab8c6f
Author: Knoldus
Date: 2016-10-26T08:01:35Z

    Removed the unused value inside the method
[GitHub] incubator-carbondata pull request #250: CARBONDATA-330: Fix compiler warning...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/250
[GitHub] incubator-carbondata pull request #258: [CARBONDATA-338] Removed the unused ...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/258
[GitHub] incubator-carbondata pull request #257: [CARBONDATA-337] Inverted Index Spel...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/257
[jira] [Created] (CARBONDATA-339) Align storePath name in generateGlobalDictionary() of GlobalDictionaryUtil.scala
Liang Chen created CARBONDATA-339:
-----------------------------------

             Summary: Align storePath name in generateGlobalDictionary() of GlobalDictionaryUtil.scala
                 Key: CARBONDATA-339
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-339
             Project: CarbonData
          Issue Type: Bug
            Reporter: Liang Chen
            Assignee: Liang Chen
            Priority: Trivial

Align the storePath name in generateGlobalDictionary() of GlobalDictionaryUtil.scala: change all occurrences of "hdfsLocation" to "storePath". It can support any path, not only HDFS paths, so the name needs to change.
B-Tree LRU cache (New Feature)
Hi All, Please find the problem and proposed solution below.

*B-Tree LRU Cache:*

Problem: CarbonData maintains two levels of B-Tree cache, one at the driver level and another at the executor level. CarbonData currently has a mechanism to invalidate the segment and block caches for invalid table segments, but there is no eviction policy for unused cached objects. So once the available memory is fully utilized, the system cannot process any new requests.

*Solution:* The caches maintained at the driver and executor levels will contain objects that are currently not in use, so the system should have the following mechanism:
1. Set a maximum memory limit up to which objects can be held in memory.
2. When the configured memory limit is reached, identify cached objects that are currently not in use, so that the required memory can be freed without impacting existing processing.
3. Evict only until the required amount of memory has been freed.

For details please refer to the attachments.

Regards,
Shahid
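The three points above can be sketched as a size-bounded LRU map. This is an illustrative sketch only, not the proposed CarbonData API: the names `LruCache`, `put`, `release`, and `usedBytes` are hypothetical, and sizes are tracked as caller-supplied byte counts.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Size-bounded LRU cache that never evicts entries still in use. */
class LruCache<K, V> {

  /** Cached value plus its size and a pin count. */
  static final class CacheEntry<V> {
    final V value;
    final long sizeBytes;
    int refCount;                       // > 0 means currently in use, not evictable
    CacheEntry(V value, long sizeBytes) {
      this.value = value;
      this.sizeBytes = sizeBytes;
    }
  }

  private final long maxMemoryBytes;    // point 1: configured upper memory limit
  private long usedBytes;
  // accessOrder = true keeps iteration order least-recently-used first
  private final LinkedHashMap<K, CacheEntry<V>> map = new LinkedHashMap<>(16, 0.75f, true);

  LruCache(long maxMemoryBytes) {
    this.maxMemoryBytes = maxMemoryBytes;
  }

  /** Returns the cached value and pins it so it cannot be evicted while in use. */
  synchronized V get(K key) {
    CacheEntry<V> e = map.get(key);
    if (e == null) {
      return null;
    }
    e.refCount++;
    return e.value;
  }

  /** Unpins a previously fetched value, making it evictable again. */
  synchronized void release(K key) {
    CacheEntry<V> e = map.get(key);
    if (e != null && e.refCount > 0) {
      e.refCount--;
    }
  }

  /** Caches a value, evicting unused entries if needed; false if memory cannot be freed. */
  synchronized boolean put(K key, V value, long sizeBytes) {
    evictUntil(maxMemoryBytes - sizeBytes);   // point 2: free memory only on demand
    if (usedBytes + sizeBytes > maxMemoryBytes) {
      return false;                           // all remaining entries are pinned
    }
    map.put(key, new CacheEntry<>(value, sizeBytes));
    usedBytes += sizeBytes;
    return true;
  }

  // Point 3: evict unpinned entries, oldest first, only until enough memory is free.
  private void evictUntil(long targetBytes) {
    Iterator<Map.Entry<K, CacheEntry<V>>> it = map.entrySet().iterator();
    while (usedBytes > targetBytes && it.hasNext()) {
      CacheEntry<V> e = it.next().getValue();
      if (e.refCount == 0) {
        usedBytes -= e.sizeBytes;
        it.remove();
      }
    }
  }

  synchronized long usedBytes() {
    return usedBytes;
  }
}
```

In this sketch a caller would `get()` a cached B-Tree before a query and `release()` it afterwards, so eviction never touches objects with in-flight readers.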
[GitHub] incubator-carbondata pull request #259: Fix constants and method names
GitHub user Zhangshunyu opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/259

Fix constants and method names

## Why raise this PR?
To rename some constants and methods whose names are unclear. For example, it is hard to tell what the parameter 'carbon.number.of.cores' is used for: cores for what? Likewise, it is hard to tell what the method 'getNumberOfCores' returns: cores for query or for load? etc.

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Zhangshunyu/incubator-carbondata constants

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/259.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #259

commit 8a3c1b4758a93d7e5b7c1d983f9a9309995f4c79
Author: Zhangshunyu
Date: 2016-10-26T13:53:21Z

    Fix constans
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/251#discussion_r85157225 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java --- @@ -0,0 +1,360 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.carbondata.processing.newflow.steps.writer; + +import java.io.File; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.carbon.CarbonTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.metadata.CarbonMetadata; +import org.apache.carbondata.core.carbon.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.carbon.path.CarbonStorePath; +import org.apache.carbondata.core.carbon.path.CarbonTablePath; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.IgnoreDictionary; +import org.apache.carbondata.core.keygenerator.KeyGenerator; +import org.apache.carbondata.core.util.CarbonProperties; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.processing.datatypes.GenericDataType; +import org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep; +import org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration; +import org.apache.carbondata.processing.newflow.DataField; +import org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants; +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; +import org.apache.carbondata.processing.newflow.row.CarbonRow; +import org.apache.carbondata.processing.newflow.row.CarbonRowBatch; +import org.apache.carbondata.processing.store.CarbonDataFileAttributes; +import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel; +import 
org.apache.carbondata.processing.store.CarbonFactHandler; +import org.apache.carbondata.processing.store.CarbonFactHandlerFactory; +import org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException; +import org.apache.carbondata.processing.util.CarbonDataProcessorUtil; + +/** + * It reads data from sorted files which are generated in previous sort step. + * And it writes data to carbondata file. It also generates mdk key while writing to carbondata file + */ +public class DataWriterProcessorStepImpl extends AbstractDataLoadProcessorStep { + + private static final LogService LOGGER = + LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName()); + + private String storeLocation; + + private boolean[] isUseInvertedIndex; + + private int[] dimLens; + + private int dimensionCount; + + private List wrapperColumnSchema; + + private int[] colCardinality; + + private SegmentProperties segmentProperties; + + private KeyGenerator keyGenerator; + + private CarbonFactHandler dataHandler; + + private Map complexIndexMap; + + private int noDictionaryCount; + + private int complexDimensionCount; + + private int measureCount; + + private long readCounter; + + private long writeCounter; + + private int measureIndex = IgnoreDictionary.MEASURES_INDEX_IN_ROW.getIndex(); + + private int noDimByteArrayIndex = IgnoreDictionary.BYTE_ARRAY_INDEX_IN_ROW.getIndex(); + + private int dimsArrayIndex = IgnoreDictionary.DIMENSION_INDEX_I
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/251#discussion_r85159146

--- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactHandlerFactory.java ---
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.store;
+
+/**
+ * Factory class for CarbonFactHandler.
+ */
+public final class CarbonFactHandlerFactory {
+
+  /**
+   * Creating fact handler to write data.
+   * @param model
+   * @param handlerType
+   * @return
+   */
+  public static CarbonFactHandler createCarbonFactHandler(CarbonFactDataHandlerModel model,
--- End diff --

One doubt: in `CarbonFactDataHandlerColumnar.addDataToStore`, why is a semaphore needed?
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/251#discussion_r85159483 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -304,4 +311,92 @@ public static String getLocalDataFolderLocation(String databaseName, String tabl return ArrayUtils .toPrimitive(noDictionaryMapping.toArray(new Boolean[noDictionaryMapping.size()])); } + + /** + * Preparing the boolean [] to map whether the dimension use inverted index or not. + */ + public static boolean[] getIsUseInvertedIndex(DataField[] fields) { +List isUseInvertedIndexList = new ArrayList(); +for (DataField field : fields) { + if (field.getColumn().isUseInvertedIndnex() && field.getColumn().isDimesion()) { +isUseInvertedIndexList.add(true); + } else if(field.getColumn().isDimesion()){ +isUseInvertedIndexList.add(false); + } +} +return ArrayUtils +.toPrimitive(isUseInvertedIndexList.toArray(new Boolean[isUseInvertedIndexList.size()])); + } + + private static String getComplexTypeString(DataField[] dataFields) { +StringBuilder dimString = new StringBuilder(); +for (int i = 0; i < dataFields.length; i++) { + DataField dataField = dataFields[i]; + if (dataField.getColumn().getDataType().equals(DataType.ARRAY) || dataField.getColumn() + .getDataType().equals(DataType.STRUCT)) { +addAllComplexTypeChildren((CarbonDimension) dataField.getColumn(), dimString, ""); +dimString.append(CarbonCommonConstants.SEMICOLON_SPC_CHARACTER); + } +} +return dimString.toString(); + } + + /** + * This method will return all the child dimensions under complex dimension + * + */ + private static void addAllComplexTypeChildren(CarbonDimension dimension, StringBuilder dimString, + String parent) { +dimString.append( +dimension.getColName() + CarbonCommonConstants.COLON_SPC_CHARACTER + dimension.getDataType() --- End diff -- change `+` to append --- If your project is set up for it, you can reply to this email and have your 
reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
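The review comment above asks to change `+` to `append`. A minimal illustration of the suggested change (the values stand in for `dimension.getColName()`, `CarbonCommonConstants.COLON_SPC_CHARACTER`, and `dimension.getDataType()` from the quoted diff, and are illustrative only):

```java
/** Illustrates the suggested change: chained append() instead of '+' inside append(). */
public class AppendExample {

  static String buildDimString() {
    StringBuilder dimString = new StringBuilder();
    // Before: dimString.append(colName + colon + dataType) builds an intermediate
    // String for each '+'; chaining append() writes straight into the buffer.
    dimString.append("price").append(":").append("DOUBLE");
    return dimString.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildDimString()); // prints "price:DOUBLE"
  }
}
```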
[GitHub] incubator-carbondata pull request #260: Add one FAQ in Readme
GitHub user bill1208 opened a pull request: https://github.com/apache/incubator-carbondata/pull/260 Add one FQA in Readme Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - What manual testing you have done? - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/bill1208/incubator-carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #260 commit 4ba5bf16383bbca15d7273457c083b3c65137d34 Author: å¨å¹¿æ Date: 2016-10-15T16:10:04Z Create test1 commit cedf7c30dd3a9fec7b9a92ff9d4fac180da73b74 Author: å¨å¹¿æ Date: 2016-10-15T16:10:33Z Create test2 commit d7dfa6ee53b89bee65768333ff1a9e622a35eab5 Author: å¨å¹¿æ Date: 2016-10-15T16:11:18Z Create parameter01 commit 47b4c09e643a41fbb910bf826d8e129844795a46 Author: å¨å¹¿æ Date: 2016-10-15T16:11:51Z Create parameter02 commit f6e778ad3681321b08fbe9d4e160a2933916d245 Author: å¨å¹¿æ Date: 2016-10-15T16:13:01Z Update test1 commit 15f3a9988a3d44efe3e640c6ec295b5fef6cfa64 Author: å¨å¹¿æ 
Date: 2016-10-16T11:58:15Z Delete parameter01 commit d53a0db63014c85d248dfc3647fb3a96a81e1b72 Author: å¨å¹¿æ Date: 2016-10-16T11:58:24Z Delete parameter02 commit a963ff4552855605ec851e9b8c9750a9b45a6f49 Author: å¨å¹¿æ Date: 2016-10-16T11:58:33Z Delete test1 commit aa256de5ed4e4efbbb654bf0c9e81cc92db7ced5 Author: å¨å¹¿æ Date: 2016-10-16T11:58:42Z Delete test2 commit 0becfb6ce038fecb220c5ce270843bddfa9acfc1 Author: å¨å¹¿æ Date: 2016-10-15T16:10:04Z Create test1 commit 351bc337402d62da0252ade3c6b060fbff1e0631 Author: å¨å¹¿æ Date: 2016-10-15T16:10:33Z Create test2 commit 8aca24399b04e2fb695c91a0ff801a3d5814904a Author: å¨å¹¿æ Date: 2016-10-15T16:11:18Z Create parameter01 commit 6d7cf10a986680309cb271e059cf71a190892157 Author: å¨å¹¿æ Date: 2016-10-15T16:11:51Z Create parameter02 commit e0eb15f70689cbbcce4e267f6f7f104f0d349874 Author: å¨å¹¿æ Date: 2016-10-15T16:13:01Z Update test1 commit 4a194152d56f917ca972d93396c62efb76ae9c93 Author: å¨å¹¿æ Date: 2016-10-16T11:58:15Z Delete parameter01 commit eb035a2ee99bb84c63736a00d4da44598fe55477 Author: å¨å¹¿æ Date: 2016-10-16T11:58:24Z Delete parameter02 commit f6eb83092c27c26eec79ceb98c76c0db1e0035fc Author: å¨å¹¿æ Date: 2016-10-16T11:58:33Z Delete test1 commit 1eca56cfdb3300f043fec8759431ab2adc3f65ff Author: å¨å¹¿æ Date: 2016-10-16T11:58:42Z Delete test2 commit 94b1d55abaf1659ff15cc54232fb2da5877604e4 Author: bill1208 Date: 2016-10-18T17:04:42Z add the doc suggestion to create the carbondata table commit 914512e9b6069c1138af41de1956cac59d86853f Author: bill1208 Date: 2016-10-18T17:05:17Z merge branch 'master' of https://github.com/bill1208/incubator-carbondata 2016-10-19 01:10:35:ãI need merger the code from githup bill1208 to my local master commit 514f09c0e56a38a087becfb992ed264d2f5450a1 Author: bill1208 Date: 2016-10-18T17:19:44Z add the doc suggestion to create the carbondata table commit bb2048e01562aeb0103cfa3bdfa390911d478d4f Author: bill1208 Date: 2016-10-22T17:13:58Z add the suggestion file commit 
1e8a72ce3b1c14c3269243a51e452a890b39208b Author: bill1208 Date: 2016-10-22T17:18:55Z modify the ruby commit be031df64b82b4ea900c3a5773a96383ba767a47 Author: bill1208 Date: 2016-10-22T17:21:19Z finish all the ruby formatted commit 6b3eab2dd9ed06512daaee62eee1854d3b72a7da Author: bill1208 Date: 2016-10-22T17:23:26Z modify the last sentense commit 254d2d20ce85a7b71482435674a241a838d38470 Author: å¨å¹¿æ Date: 2016-10-22T17:29:00Z Delete Suggestion-To-Create-Carbon-Table.md commit 298171c8304f5f
Podling Report Reminder - November 2016
Dear podling,

This email was sent by an automated system on behalf of the Apache Incubator PMC. It is an initial reminder to give you plenty of time to prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 November 2016, 10:30 am PDT. The report for your podling will form a part of the Incubator PMC report. The Incubator PMC requires your report to be submitted 2 weeks before the board meeting, to allow sufficient time for review and submission (Wed, November 02).

Please submit your report with sufficient time to allow the Incubator PMC, and subsequently board members to review and digest. Again, the very latest you should submit your report is 2 weeks prior to the board meeting.

Thanks,
The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of the project or necessarily of its field
* A list of the three most important issues to address in the move towards graduation
* Any issues that the Incubator PMC or ASF Board might wish/need to be aware of
* How the community has developed since the last report
* How the project has developed since the last report

This should be appended to the Incubator Wiki page at: http://wiki.apache.org/incubator/November2016

Note: This is manually populated. You may need to wait a little before this page is created from a template.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on the Incubator wiki page. Signing off reports shows that you are following the project - projects that are not signed may raise alarms for the Incubator PMC.

Incubator PMC
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85248921 --- Diff: processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java --- @@ -37,7 +37,7 @@ private int surrogateKey = -1; @Before public void setUp() throws Exception { -TimeStampDirectDictionaryGenerator generator = TimeStampDirectDictionaryGenerator.instance; +TimeStampDirectDictionaryGenerator generator = new TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- This is a test file, so I think the TimeStampDirectDictionaryGenerator should be constructed with 'CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT' for testing. Please check again. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85249559 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal<SimpleDateFormat> threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); --- End diff -- done
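The hunk above drops the static singleton in favour of per-instance state because `java.text.SimpleDateFormat` is not thread-safe, and a `ThreadLocal` is the standard way to keep one formatter per thread. Below is a minimal sketch of that pattern; the class and method names (`PerThreadFormatter`, `parseMillis`) are illustrative, not CarbonData's actual API:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public final class PerThreadFormatter {
  private final String pattern;
  // SimpleDateFormat is mutable and not thread-safe, so cache one instance
  // per thread instead of sharing a single static formatter across threads.
  private final ThreadLocal<SimpleDateFormat> cached;

  public PerThreadFormatter(String pattern) {
    this.pattern = pattern;
    this.cached = ThreadLocal.withInitial(() -> new SimpleDateFormat(this.pattern));
  }

  /** Returns epoch millis, or -1 as a bad-record sentinel on parse failure. */
  public long parseMillis(String value) {
    try {
      return cached.get().parse(value).getTime();
    } catch (ParseException e) {
      return -1L;
    }
  }
}
```

Because each instance carries its own format string, two tables loading with different timestamp formats no longer interfere, which is the point of removing the shared singleton.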
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85250472 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -651,6 +654,7 @@ public void setDefault() { columnSchemaDetails = ""; columnsDataTypeString=""; tableOption = ""; +dateFormat = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT; --- End diff -- ok
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85255469 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +474,36 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(","); + for (String dateFormat : dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0], dateFormatSplits[1]); +// TODO verify the dateFormatSplits is valid or not + } +} +directDictionaryGenerators = +new DirectDictionaryGenerator[meta.getDimensionColumnIds().length]; +for (int i = 0; i < meta.getDimensionColumnIds().length; i++) { + ColumnSchemaDetails columnSchemaDetails = columnSchemaDetailsWrapper.get( + meta.getDimensionColumnIds()[i]); + if (columnSchemaDetails.isDirectDictionary()) { +if (dateformatsHashMap.containsKey(columnSchemaDetails.getColumnName())) { + directDictionaryGenerators[i] = + DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator( + columnSchemaDetails.getColumnType(), + dateformatsHashMap.get(columnSchemaDetails.getColumnName())); +} else { + String dateFormat = CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, + CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); + directDictionaryGenerators[i] = + DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator( + columnSchemaDetails.getColumnType(), dateFormat); --- End diff -- OK
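The option parsing in the hunk above splits on ',' and then on ':' with limit 2, and carries a TODO about validating the entries. A self-contained sketch of that parsing follows; the class and method names (`DateFormatOption`, `parse`) are illustrative, not the PR's actual API. The limit-2 split is the detail worth keeping: it stops colons inside a pattern such as `HH:mm:ss` from being cut apart.

```java
import java.util.HashMap;
import java.util.Map;

public final class DateFormatOption {
  /**
   * Parses an option such as "col1:yyyy-MM-dd,col2:yyyy/MM/dd HH:mm:ss"
   * into a column-name -> format-pattern map. The ':' split uses limit 2
   * so a pattern containing ':' (e.g. HH:mm:ss) survives intact.
   */
  public static Map<String, String> parse(String option) {
    Map<String, String> formats = new HashMap<>();
    if (option == null || option.trim().isEmpty()) {
      return formats;
    }
    for (String entry : option.split(",")) {
      String[] parts = entry.split(":", 2);
      // The validation the diff's TODO asks for: skip malformed entries.
      if (parts.length == 2 && !parts[0].trim().isEmpty() && !parts[1].trim().isEmpty()) {
        formats.put(parts[0].trim(), parts[1].trim());
      }
    }
    return formats;
  }
}
```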
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85256460 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -111,7 +110,7 @@ /** * timeFormat */ - protected SimpleDateFormat timeFormat; + protected String dateFormat; --- End diff -- ok
[GitHub] incubator-carbondata pull request #261: fix issue carbondata-339
GitHub user hseagle opened a pull request: https://github.com/apache/incubator-carbondata/pull/261 fix issue carbondata-339 Fixes JIRA issue CARBONDATA-339: replace hdfsLocation with storePath in the function generateGlobalDictionary. https://issues.apache.org/jira/browse/CARBONDATA-339 You can merge this pull request into a Git repository by running: $ git pull https://github.com/hseagle/incubator-carbondata carbondata-339 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/261.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #261 commit 64d4d6daaf6e8adede6cfffe94221d20f365631c Author: hseagle Date: 2016-10-27T02:55:53Z fix issue carbondata-339
[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] [WIP] Use CarbonInp...
GitHub user jackylk opened a pull request: https://github.com/apache/incubator-carbondata/pull/262 [CARBONDATA-308] [WIP] Use CarbonInputFormat in CarbonScanRDD compute Use CarbonInputFormat in the CarbonScanRDD compute function. 1. On the driver side only getSplit is required, which needs just the filter condition rather than a full QueryModel object, so creation of the QueryModel is moved from the driver side to the executor side. 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead of using QueryExecutor directly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata scanrdd Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/262.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #262 commit ef4a889db9b86653c273794c9a810a9cd9683437 Author: jackylk Date: 2016-10-22T18:43:53Z use CarbonInputFormat in executor commit a5c17f523c7127b538cc2d384cbff4fa454a007a Author: jackylk Date: 2016-10-27T04:01:36Z modify getPartition
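Point 1 of the PR description follows the usual InputFormat contract: the driver does only cheap split planning from the filter, and each executor builds its heavy reader state (the QueryModel analogue) when its partition actually runs. A dependency-free sketch of that split of responsibilities is below; all names (`LazyReaderPlan`, `getSplits`, `createRecordReader`) are illustrative stand-ins, not CarbonData's or Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class LazyReaderPlan {
  /** Driver side: cheap split planning only; no reader state is built here. */
  public static List<int[]> getSplits(int totalRows, int splitSize) {
    List<int[]> splits = new ArrayList<>();
    for (int start = 0; start < totalRows; start += splitSize) {
      splits.add(new int[] { start, Math.min(start + splitSize, totalRows) });
    }
    return splits;
  }

  /** Executor side: the expensive per-split reader is created only when the split runs. */
  public static Iterator<Integer> createRecordReader(int[] split) {
    List<Integer> rows = new ArrayList<>();
    for (int r = split[0]; r < split[1]; r++) {
      rows.add(r); // stand-in for reading rows of the split
    }
    return rows.iterator();
  }
}
```

The design benefit is the one the PR claims: the driver never pays the cost of constructing per-partition reader objects, and serialized task closures stay small.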
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/251#discussion_r85267443 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactHandlerFactory.java --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.processing.store; + +/** + * Factory class for CarbonFactHandler. + */ +public final class CarbonFactHandlerFactory { + + /** + * Creating fact handler to write data. + * @param model + * @param handlerType + * @return + */ + public static CarbonFactHandler createCarbonFactHandler(CarbonFactDataHandlerModel model, --- End diff -- Yes, I don't see the advantage of using a semaphore here, because we are already using a fixed thread pool to control the threads. I will discuss with the team and confirm whether it is needed.
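The reviewer's point above — that an extra semaphore is redundant once a fixed thread pool already caps concurrency — can be demonstrated directly: a pool of N threads never has more than N tasks running at once. A small self-contained check (illustrative code, not CarbonData's):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public final class PoolBound {
  /** Runs `tasks` jobs on a fixed pool and reports the peak observed concurrency. */
  public static int peakConcurrency(int poolSize, int tasks) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    for (int i = 0; i < tasks; i++) {
      pool.submit(() -> {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // record the highest overlap seen
        try {
          Thread.sleep(20); // hold the slot briefly so overlap is observable
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        running.decrementAndGet();
      });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.SECONDS);
    return peak.get();
  }
}
```

`peakConcurrency(3, 20)` can never exceed 3, which is exactly the bound a semaphore of 3 permits would enforce, so stacking the two mechanisms adds complexity without adding control.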
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/251#discussion_r85267495 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java --- @@ -0,0 +1,360 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.carbondata.processing.newflow.steps.writer; + +import java.io.File; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.carbon.CarbonTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.metadata.CarbonMetadata; +import org.apache.carbondata.core.carbon.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.carbon.path.CarbonStorePath; +import org.apache.carbondata.core.carbon.path.CarbonTablePath; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.IgnoreDictionary; +import org.apache.carbondata.core.keygenerator.KeyGenerator; +import org.apache.carbondata.core.util.CarbonProperties; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.processing.datatypes.GenericDataType; +import org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep; +import org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration; +import org.apache.carbondata.processing.newflow.DataField; +import org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants; +import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException; +import org.apache.carbondata.processing.newflow.row.CarbonRow; +import org.apache.carbondata.processing.newflow.row.CarbonRowBatch; +import org.apache.carbondata.processing.store.CarbonDataFileAttributes; +import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel; +import org.apache.carbondata.processing.store.CarbonFactHandler; +import org.apache.carbondata.processing.store.CarbonFactHandlerFactory; +import org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException; +import org.apache.carbondata.processing.util.CarbonDataProcessorUtil; + +/** + * It reads data from sorted files which are generated in previous sort step. + * And it writes data to carbondata file. It also generates mdk key while writing to carbondata file + */ +public class DataWriterProcessorStepImpl extends AbstractDataLoadProcessorStep { + + private static final LogService LOGGER = + LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName()); + + private String storeLocation; + + private boolean[] isUseInvertedIndex; + + private int[] dimLens; + + private int dimensionCount; + + private List<ColumnSchema> wrapperColumnSchema; + + private int[] colCardinality; + + private SegmentProperties segmentProperties; + + private KeyGenerator keyGenerator; + + private CarbonFactHandler dataHandler; + + private Map complexIndexMap; + + private int noDictionaryCount; + + private int complexDimensionCount; + + private int measureCount; + + private long readCounter; + + private long writeCounter; + + private int measureIndex = IgnoreDictionary.MEASURES_INDEX_IN_ROW.getIndex(); + + private int noDimByteArrayIndex = IgnoreDictionary.BYTE_ARRAY_INDEX_IN_ROW.getIndex(); + + private int dimsArrayIndex = IgnoreDictionary.DIMENSION_INDE
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/251#discussion_r85270229 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -304,4 +311,92 @@ public static String getLocalDataFolderLocation(String databaseName, String tabl return ArrayUtils .toPrimitive(noDictionaryMapping.toArray(new Boolean[noDictionaryMapping.size()])); } + + /** + * Preparing the boolean [] to map whether the dimension use inverted index or not. + */ + public static boolean[] getIsUseInvertedIndex(DataField[] fields) { +List<Boolean> isUseInvertedIndexList = new ArrayList<Boolean>(); +for (DataField field : fields) { + if (field.getColumn().isUseInvertedIndnex() && field.getColumn().isDimesion()) { +isUseInvertedIndexList.add(true); + } else if (field.getColumn().isDimesion()) { +isUseInvertedIndexList.add(false); + } +} +return ArrayUtils +.toPrimitive(isUseInvertedIndexList.toArray(new Boolean[isUseInvertedIndexList.size()])); + } + + private static String getComplexTypeString(DataField[] dataFields) { +StringBuilder dimString = new StringBuilder(); +for (int i = 0; i < dataFields.length; i++) { + DataField dataField = dataFields[i]; + if (dataField.getColumn().getDataType().equals(DataType.ARRAY) || dataField.getColumn() + .getDataType().equals(DataType.STRUCT)) { +addAllComplexTypeChildren((CarbonDimension) dataField.getColumn(), dimString, ""); +dimString.append(CarbonCommonConstants.SEMICOLON_SPC_CHARACTER); + } +} +return dimString.toString(); + } + + /** + * This method will return all the child dimensions under complex dimension + * + */ + private static void addAllComplexTypeChildren(CarbonDimension dimension, StringBuilder dimString, + String parent) { +dimString.append( +dimension.getColName() + CarbonCommonConstants.COLON_SPC_CHARACTER + dimension.getDataType() --- End diff -- ok
[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/251#discussion_r85270264 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java ---