[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638192#comment-15638192 ]

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/297

> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
> Issue Type: Improvement
> Components: hadoop-integration
> Affects Versions: 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and the user API to achieve
> the following goals:
> Goal 1: Users can choose where to store index data. It can be stored in the
> processing framework's memory space (for example, in Spark driver memory) or in
> another service outside of the processing framework (for example, an
> independent database service that can be shared across clients).
> Goal 2: Developers can add more indices of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key, which CarbonData currently supports,
> developers are free to add other indexing technologies to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637276#comment-15637276 ]

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/208
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614544#comment-15614544 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480694 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

ok
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614540#comment-15614540 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480562 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
+    List blocks = index.filter(job, filterResolver);
--- End diff --

You are right, I will modify it.
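The three steps listed in the getSplits comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: DemoBlock, DemoMinMaxIndex and DemoIndexedSegment are hypothetical stand-ins for Block, Index and IndexedSegment, the filter is reduced to a single integer key, and a simple min/max range check takes the place of the B+ tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data block, described by the min/max of its key.
class DemoBlock {
  final String path;
  final int minKey;
  final int maxKey;

  DemoBlock(String path, int minKey, int maxKey) {
    this.path = path;
    this.minKey = minKey;
    this.maxKey = maxKey;
  }
}

// Hypothetical index: keeps only the blocks whose key range can contain the value.
class DemoMinMaxIndex {
  private final List<DemoBlock> blocks;

  DemoMinMaxIndex(List<DemoBlock> blocks) {
    this.blocks = blocks;
  }

  List<DemoBlock> filter(int key) {
    List<DemoBlock> kept = new ArrayList<>();
    for (DemoBlock block : blocks) {
      if (block.minKey <= key && key <= block.maxKey) {
        kept.add(block);
      }
    }
    return kept;
  }
}

// Hypothetical segment that follows the three steps from the review diff.
class DemoIndexedSegment {
  private final DemoMinMaxIndex index;

  DemoIndexedSegment(DemoMinMaxIndex index) {
    // Step 1: obtain the index (here it is simply injected).
    this.index = index;
  }

  List<String> getSplits(int key) {
    List<String> splits = new ArrayList<>();
    // Step 2: prune blocks with the index.
    for (DemoBlock block : index.filter(key)) {
      // Step 3: create one split per surviving block.
      splits.add("split:" + block.path);
    }
    return splits;
  }
}
```

With blocks covering keys 0..9 and 10..19, a lookup for key 15 only produces a split for the second block, which is the file pruning the class comment refers to.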
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614252#comment-15614252 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470293 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
+    List blocks = index.filter(job, filterResolver);
--- End diff --

If the loader implements caching internally, then we can keep it as `IndexLoader` only.
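One way to read the suggestion above is that the loader keeps the name `IndexLoader` and hides a cache behind `load`, so callers such as getSplits do not rebuild the index on every call. The sketch below is a minimal illustration of that idea and not the implementation in the pull request. SketchIndex, SketchIndexLoader, CountingLoader and CachingLoader are hypothetical names, and the cache is keyed by a plain segment name rather than the Hadoop Configuration that the real `load` takes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the Index type in the pull request.
interface SketchIndex {
  List<String> filter(String suffix);
}

// Hypothetical, simplified stand-in for IndexLoader.
interface SketchIndexLoader {
  SketchIndex load(String segmentName);
}

// Builds a trivial index and counts how often a build really happens.
class CountingLoader implements SketchIndexLoader {
  final AtomicInteger builds = new AtomicInteger();

  @Override
  public SketchIndex load(String segmentName) {
    builds.incrementAndGet();
    List<String> blocks = Arrays.asList(segmentName + "/part-0", segmentName + "/part-1");
    // The "index" keeps the blocks whose path ends with the given suffix.
    return suffix -> {
      List<String> kept = new ArrayList<>();
      for (String block : blocks) {
        if (block.endsWith(suffix)) {
          kept.add(block);
        }
      }
      return kept;
    };
  }
}

// Same interface as any other loader, but each segment's index is built once.
class CachingLoader implements SketchIndexLoader {
  private final SketchIndexLoader delegate;
  private final Map<String, SketchIndex> cache = new ConcurrentHashMap<>();

  CachingLoader(SketchIndexLoader delegate) {
    this.delegate = delegate;
  }

  @Override
  public SketchIndex load(String segmentName) {
    // computeIfAbsent calls the delegate only on the first request for a key.
    return cache.computeIfAbsent(segmentName, delegate::load);
  }
}
```

Because the cache sits behind the loader interface, the calling code stays unchanged whether or not caching is enabled. Eviction and invalidation when a segment changes are left out of this sketch.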
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614074#comment-15614074 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

Please use internal.CarbonInputSplit.
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613866#comment-15613866 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457078 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.api; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.internal.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.segment.SegmentManager; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.scan.expression.Expression; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; + +/** + * Input format of CarbonData file. 
+ * @param
+ */
+public class CarbonTableInputFormat extends FileInputFormat{
+
+  private static final String FILTER_PREDICATE =
+      "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
--- End diff --

accept
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612029#comment-15612029 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85346673 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
--- End diff --

Is it required to load the index every time? I guess we are just creating the instance of the index here, so why not use a factory here?
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611953#comment-15611953 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85340545 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java --- @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.impl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+    this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+    return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
--- End diff --

It seems the method's return type is incompatible.
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611994#comment-15611994 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85343636 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java --- @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.impl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { + + private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class); + private Segment segment; + + InMemoryBTreeIndex(Segment segment) { --- End diff -- I guess we are supposed to pass the list of valid segments here. > Abstracting Index and Segment interface > --- > > Key: CARBONDATA-284 > URL: https://issues.apache.org/jira/browse/CARBONDATA-284 > Project: CarbonData > Issue Type: Improvement > Components: hadoop-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.3.0-incubating > > > This issue is intended to abstract developer API and user API to achieve > following goals: > Goal 1: User can choose the place to store Index data, it can be stored in > processing framework's memory space (like in spark driver memory) or in > another service outside of the processing framework (like using a > independent database service, which can be shared across client) > Goal 2: Developer can add more index of his choice to CarbonData files. > Besides B+ tree on multi-dimensional
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611939#comment-15611939 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85339106 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/CarbonFormat.java --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal; + +public enum CarbonFormat { + COLUMNR --- End diff -- typo : COLUMNAR > Abstracting Index and Segment interface > --- > > Key: CARBONDATA-284 > URL: https://issues.apache.org/jira/browse/CARBONDATA-284 > Project: CarbonData > Issue Type: Improvement > Components: hadoop-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.3.0-incubating > > > This issue is intended to abstract developer API and user API to achieve > following goals: > Goal 1: User can choose the place to store Index data, it can be stored in > processing framework's memory space (like in spark driver memory) or in > another service outside of the processing framework (like using a > independent database service, which can be shared across client) > Goal 2: Developer can add more index of his choice to CarbonData files. > Besides B+ tree on multi-dimensional key which current CarbonData supports, > developers are free to add other indexing technology to make certain > workload faster. These new indices should be added in a pluggable way. > This Jira has been discussed in maillist: > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611920#comment-15611920 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85337928 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.api; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.internal.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.segment.SegmentManager; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.scan.expression.Expression; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; + +/** + * Input format of CarbonData file. + * @param + */ +public class CarbonTableInputFormat extends FileInputFormat{ + + private static final String FILTER_PREDICATE = + "mapreduce.input.carboninputformat.filter.predicate"; + + private SegmentManager segmentManager; + + public CarbonTableInputFormat(SegmentManager segmentManager) { +this.segmentManager = segmentManager; + } + + @Override + public RecordReader createRecordReader(InputSplit split, + TaskAttemptContext context) throws IOException, InterruptedException { +switch (((CarbonInputSplit)split).formatType()) { --- End diff -- Why don't you take the formatType from the job conf? It is better not to touch the InputSplit, as it comes from outside.
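The review suggestion above can be illustrated with a minimal sketch, assuming the format type is stored in the job configuration under a dedicated key. To keep the example self-contained, `java.util.Properties` stands in for Hadoop's `Configuration`; the key name, the `ROW` constant, and the `FormatFromConf` class are all hypothetical and do not appear in the pull request.

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for the Hadoop job Configuration.
class FormatFromConf {
  // Assumed key, modelled on FILTER_PREDICATE in the diff; not taken from the PR.
  static final String FORMAT_TYPE = "mapreduce.input.carboninputformat.format.type";

  enum CarbonFormat { COLUMNAR, ROW }  // ROW is assumed for illustration

  /** Reads the format from the job conf, defaulting to COLUMNAR when unset. */
  static CarbonFormat formatType(Properties jobConf) {
    String name = jobConf.getProperty(FORMAT_TYPE, CarbonFormat.COLUMNAR.name());
    return CarbonFormat.valueOf(name);
  }

  public static void main(String[] args) {
    Properties jobConf = new Properties();
    jobConf.setProperty(FORMAT_TYPE, "ROW");
    // The reader is chosen from the conf; the InputSplit is never inspected.
    switch (formatType(jobConf)) {
      case COLUMNAR:
        System.out.println("columnar record reader");
        break;
      case ROW:
        System.out.println("row record reader");
        break;
    }
  }
}
```

With this shape, `createRecordReader` would not need to cast the incoming split to `CarbonInputSplit` at all, which is the point of the comment.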
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607805#comment-15607805 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061184 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import 
org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { + + private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class); + private Segment segment; + + InMemoryBTreeIndex(Segment segment) { +this.segment = segment; + } + + @Override + public String getName() { +return null; + } + + @Override + public List filter(JobContext job, FilterResolverIntf filter) + throws IOException { + +List result = new LinkedList(); + +FilterExpressionProcessor filterExpressionProcessor = new FilterExpressionProcessor(); + +AbsoluteTableIdentifier absoluteTableIdentifier = null; + //CarbonInputFormatUtil.getAbsoluteTableIdentifier(job.getConfiguration()); + +//for this segment fetch blocks matching filter in BTree +List dataRefNodes = null; +try { + dataRefNodes = getDataBlocksOfSegment(job, filterExpressionProcessor, absoluteTableIdentifier, + filter, segment.getId()); +} catch (IndexBuilderException e) { + throw new IOException(e.getMessage()); +}
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558273#comment-15558273 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r82505613 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormatBase.java --- @@ -0,0 +1,69 @@ +/* --- End diff -- Yes, it is like that only > Abstracting Index and Segment interface > --- > > Key: CARBONDATA-284 > URL: https://issues.apache.org/jira/browse/CARBONDATA-284 > Project: CarbonData > Issue Type: Improvement > Components: hadoop-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > This issue is intended to abstract developer API and user API to achieve > following goals: > Goal 1: User can choose the place to store Index data, it can be stored in > processing framework's memory space (like in spark driver memory) or in > another service outside of the processing framework (like using a > independent database service, which can be shared across client) > Goal 2: Developer can add more index of his choice to CarbonData files. > Besides B+ tree on multi-dimensional key which current CarbonData supports, > developers are free to add other indexing technology to make certain > workload faster. These new indices should be added in a pluggable way. > This Jira has been discussed in maillist: > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558270#comment-15558270 ] ASF GitHub Bot commented on CARBONDATA-284: --- Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r82505582 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/StreamingSegment.java --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment; + +import java.io.IOException; +import java.util.List; + +import org.apache.carbondata.hadoop.api.CarbonInputFormatBase; +import org.apache.carbondata.scan.model.QueryModel; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +public class StreamingSegment extends Segment { --- End diff -- If I understand the comment correctly, the answer is that all segments are handled uniformly in `CarbonInputFormatBase`; however, the internal read implementation of this segment is different from that of IndexedSegment: it uses the row input format to read. Is this the question?
> Abstracting Index and Segment interface > --- > > Key: CARBONDATA-284 > URL: https://issues.apache.org/jira/browse/CARBONDATA-284 > Project: CarbonData > Issue Type: Improvement > Components: hadoop-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > This issue is intended to abstract developer API and user API to achieve > following goals: > Goal 1: User can choose the place to store Index data, it can be stored in > processing framework's memory space (like in spark driver memory) or in > another service outside of the processing framework (like using a > independent database service, which can be shared across client) > Goal 2: Developer can add more index of his choice to CarbonData files. > Besides B+ tree on multi-dimensional key which current CarbonData supports, > developers are free to add other indexing technology to make certain > workload faster. These new indices should be added in a pluggable way. > This Jira has been discussed in maillist: > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
[ https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542797#comment-15542797 ]
ASF GitHub Bot commented on CARBONDATA-284:
---
GitHub user jackylk opened a pull request: https://github.com/apache/incubator-carbondata/pull/208

[CARBONDATA-284][WIP] Abstracting index and segment interface

This PR adds a new User API and Dev API to the carbon-hadoop module:

### User API
- `CarbonColumnarInputFormat/OutputFormat`: uses the current `CarbonInputFormat` as its internal implementation.
- `CarbonRowInputFormat/OutputFormat`: still needs to be implemented.
- `CarbonOutputCommitter`: used for managing segment commits.

They are based on `CarbonInputFormatBase/OutputFormatBase`.

### Dev API
- Segment: an abstract class that represents a single load of data. It is used by CarbonInputFormatBase to get all InputSplits matching a QueryModel, and by CarbonOutputCommitter to prepare for reading. Example implementations are `IndexedSegment` and `StreamingSegment`.
- SegmentManager: an interface for managing segments. The current implementation is `ZkSegmentManager`, which needs to be mapped to the existing logic.
- Index: an interface used by `IndexedSegment` to filter InputSplits. The current implementation is `InMemoryBTreeIndex`, which loads the index into the driver's memory.

`CarbonInputFormatUtil` is modified so that it can also be used by `CarbonColumnarInputFormat`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata index-interface
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/208.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #208
commit 398d2ec3e6706c615918a734a90f9dc4111067d8 Author: jackylk Date: 2016-10-03T16:01:48Z add User API
commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2 Author: jackylk Date: 2016-10-03T16:02:04Z add Developer API
commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05 Author: jackylk Date: 2016-10-03T16:02:49Z refactory existing code
commit 430e7710b88725b587c1f3542d4d66ab02958cbc Author: jackylk Date: 2016-10-03T16:27:10Z change Index interface
> Abstracting Index and Segment interface > --- > > Key: CARBONDATA-284 > URL: https://issues.apache.org/jira/browse/CARBONDATA-284 > Project: CarbonData > Issue Type: Improvement > Components: hadoop-integration >Affects Versions: 0.1.0-incubating >Reporter: Jacky Li > Fix For: 0.2.0-incubating > > > This issue is intended to abstract developer API and user API to achieve > following goals: > Goal 1: User can choose the place to store Index data, it can be stored in > processing framework's memory space (like in spark driver memory) or in > another service outside of the processing framework (like using a > independent database service, which can be shared across client) > Goal 2: Developer can add more index of his choice to CarbonData files. > Besides B+ tree on multi-dimensional key which current CarbonData supports, > developers are free to add other indexing technology to make certain > workload faster. These new indices should be added in a pluggable way.
> This Jira has been discussed in maillist: > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
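The relationship between the Dev API types described in the pull request above (Segment, SegmentManager, Index) can be sketched roughly as follows. This is only an illustration under stated assumptions: the type names and the methods `getName()`, `filter()`, and `getId()` appear in the diffs, but every signature here is simplified and assumed, a split is modelled as a plain `String`, and `LinearScanIndex`, `FixedSegmentManager`, `getSplits()`, and `getValidSegments()` are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the Dev API; signatures are simplified assumptions.
class DevApiSketch {
  public static void main(String[] args) {
    Segment segment = new IndexedSegment(
        "0", Arrays.asList("part-0-a", "part-0-b", "part-1-a"), new LinearScanIndex());
    SegmentManager manager = new FixedSegmentManager(Arrays.asList(segment));
    for (Segment s : manager.getValidSegments()) {
      System.out.println(s.getId() + " -> " + s.getSplits(name -> name.endsWith("-a")));
    }
  }
}

/** Pluggable index: narrows the candidate splits of one segment. */
interface Index {
  String getName();
  List<String> filter(List<String> splits, Predicate<String> predicate);
}

/** Assumed trivial Index that tests every split; a real index would prune. */
class LinearScanIndex implements Index {
  @Override public String getName() { return "linear-scan"; }
  @Override public List<String> filter(List<String> splits, Predicate<String> predicate) {
    List<String> result = new ArrayList<>();
    for (String split : splits) {
      if (predicate.test(split)) {
        result.add(split);
      }
    }
    return result;
  }
}

/** A single load of data. */
abstract class Segment {
  private final String id;
  Segment(String id) { this.id = id; }
  String getId() { return id; }
  abstract List<String> getSplits(Predicate<String> predicate);
}

/** Segment whose splits are selected through an Index. */
class IndexedSegment extends Segment {
  private final List<String> splits;
  private final Index index;
  IndexedSegment(String id, List<String> splits, Index index) {
    super(id);
    this.splits = splits;
    this.index = index;
  }
  @Override List<String> getSplits(Predicate<String> predicate) {
    return index.filter(splits, predicate);
  }
}

/** Manages segments; where they are stored is up to the implementation. */
interface SegmentManager {
  List<Segment> getValidSegments();
}

/** Assumed in-memory manager holding a fixed list of segments. */
class FixedSegmentManager implements SegmentManager {
  private final List<Segment> segments;
  FixedSegmentManager(List<Segment> segments) { this.segments = segments; }
  @Override public List<Segment> getValidSegments() { return segments; }
}
```

Running `DevApiSketch` prints `0 -> [part-0-a, part-1-a]`. Because the input format only talks to `Segment` and `Index`, a different index technology or a segment store outside the processing framework could be swapped in without changing the caller, which is what Goals 1 and 2 of the issue ask for.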