[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638192#comment-15638192
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/297


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
> Issue Type: Improvement
> Components: hadoop-integration
> Affects Versions: 0.1.0-incubating
> Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and the user API to
> achieve the following goals:
> Goal 1: Users can choose where to store index data. It can be stored in the
> processing framework's memory space (for example, in Spark driver memory) or
> in another service outside the processing framework (for example, an
> independent database service that can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on a multi-dimensional key, which CarbonData currently
> supports, developers are free to add other indexing technologies to make
> certain workloads faster. These new indexes should be added in a pluggable way.
> This JIRA has been discussed on the mailing list:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html
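
To make the two goals more concrete, here is a minimal sketch of what a pluggable index could look like. It is an illustration only and not the API from the pull request: every type name below (BlockSketch, IndexSketch, MinMaxIndexSketch, IndexRegistrySketch) is hypothetical, and the min/max pruning is just one example of an alternative indexing technique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a data block with a key range; not CarbonData's Block class.
final class BlockSketch {
  final String path;
  final int minKey;
  final int maxKey;

  BlockSketch(String path, int minKey, int maxKey) {
    this.path = path;
    this.minKey = minKey;
    this.maxKey = maxKey;
  }
}

// Hypothetical developer API: each indexing technique supplies one implementation.
interface IndexSketch {
  String getName();

  // Returns the blocks that may contain the given key.
  List<BlockSketch> filter(int key);
}

// One possible plug-in: prune blocks by their min/max key range.
final class MinMaxIndexSketch implements IndexSketch {
  private final List<BlockSketch> blocks;

  MinMaxIndexSketch(List<BlockSketch> blocks) {
    this.blocks = new ArrayList<>(blocks);
  }

  @Override
  public String getName() {
    return "minmax";
  }

  @Override
  public List<BlockSketch> filter(int key) {
    List<BlockSketch> hits = new ArrayList<>();
    for (BlockSketch block : blocks) {
      if (key >= block.minKey && key <= block.maxKey) {
        hits.add(block);
      }
    }
    return hits;
  }
}

// Hypothetical user API: indexes are registered and looked up by name.
final class IndexRegistrySketch {
  private final Map<String, IndexSketch> indexes = new HashMap<>();

  void register(IndexSketch index) {
    indexes.put(index.getName(), index);
  }

  IndexSketch get(String name) {
    IndexSketch index = indexes.get(name);
    if (index == null) {
      throw new IllegalArgumentException("No index registered as: " + name);
    }
    return index;
  }
}
```

In this sketch the registry holds the index in local memory; an implementation backed by an external service would satisfy the same interface, which is the point of Goal 1.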



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637276#comment-15637276
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/208




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614544#comment-15614544
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480694
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

ok




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614540#comment-15614540
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480562
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
+    List blocks = index.filter(job, filterResolver);
--- End diff --

You are right, I will modify it.
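
The three steps listed in the getSplits comment of the diff above (load the index, filter by index, create splits from the filtered blocks) can be sketched without any Hadoop dependency as follows. This is a simplified illustration under stated assumptions, not the code from the pull request: all type names ending in "Sketch" are hypothetical, and a plain String stands in for the filter resolver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a block returned by index pruning.
final class PrunedBlockSketch {
  final String file;
  final long offset;
  final long length;

  PrunedBlockSketch(String file, long offset, long length) {
    this.file = file;
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical stand-in for an input split.
final class SplitSketch {
  final String file;
  final long start;
  final long length;

  SplitSketch(String file, long start, long length) {
    this.file = file;
    this.start = start;
    this.length = length;
  }
}

// Hypothetical index: returns only the blocks that can match the predicate.
interface PruningIndexSketch {
  List<PrunedBlockSketch> filter(String predicate);
}

// Hypothetical loader: creates the index or fetches it from wherever it is stored.
interface PruningIndexLoaderSketch {
  PruningIndexSketch load();
}

final class IndexedSegmentSketch {
  private final PruningIndexLoaderSketch loader;

  IndexedSegmentSketch(PruningIndexLoaderSketch loader) {
    this.loader = loader;
  }

  List<SplitSketch> getSplits(String predicate) {
    // 1. create the index or get it from the loader
    PruningIndexSketch index = loader.load();
    // 2. filter by index to get the surviving blocks
    List<PrunedBlockSketch> blocks = index.filter(predicate);
    // 3. create one split per surviving block
    List<SplitSketch> output = new ArrayList<>(blocks.size());
    for (PrunedBlockSketch block : blocks) {
      output.add(new SplitSketch(block.file, block.offset, block.length));
    }
    return output;
  }
}
```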




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614252#comment-15614252
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470293
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
--- End diff --

If the loader implements caching internally, then we can keep it as `IndexLoader` only.
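
A loader that caches internally, as suggested in the comment above, might look roughly like the sketch below. It is an assumption about one way to do it, not the project's IndexLoader: the class names are hypothetical, and the cache is keyed by segment name purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a loaded index.
final class CachedIndexSketch {
  final String segmentName;

  CachedIndexSketch(String segmentName) {
    this.segmentName = segmentName;
  }
}

// Hypothetical loader that builds each index at most once per segment name.
final class CachingIndexLoaderSketch {
  private final Map<String, CachedIndexSketch> cache = new ConcurrentHashMap<>();
  private final AtomicInteger builds = new AtomicInteger();

  CachedIndexSketch load(String segmentName) {
    // computeIfAbsent runs the builder only when the key is missing,
    // so repeated calls for the same segment reuse the cached instance.
    return cache.computeIfAbsent(segmentName, name -> {
      builds.incrementAndGet();
      return new CachedIndexSketch(name);
    });
  }

  // Number of times an index was actually built (for illustration and testing).
  int buildCount() {
    return builds.get();
  }
}
```

With this shape the caller keeps calling load() on every getSplits invocation, and the decision about whether to rebuild or reuse stays inside the loader.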




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614074#comment-15614074
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

Please use internal.CarbonInputSplit.




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613866#comment-15613866
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457078
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -0,0 +1,169 @@
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param 
+ */
+public class CarbonTableInputFormat extends FileInputFormat {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
--- End diff --

accept




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612029#comment-15612029
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85346673
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
--- End diff --

Is it required to load the index every time?
I guess we are just creating an instance of the index here, so why not use a
factory here?




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611953#comment-15611953
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85340545
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,215 @@
+
+package org.apache.carbondata.hadoop.internal.index.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+    this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+    return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
--- End diff --

It seems the method return type is incompatible.



[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611994#comment-15611994
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85343636
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,215 @@
+
+package org.apache.carbondata.hadoop.internal.index.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
--- End diff --

I guess we are supposed to pass the list of valid segments here.



[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611939#comment-15611939
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85339106
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/CarbonFormat.java ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal;
+
+public enum CarbonFormat {
+  COLUMNR
--- End diff --

Typo: this should be COLUMNAR.
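
To illustrate why the spelling matters, here is a minimal sketch of the enum with the constant corrected. Only the `COLUMNAR` constant is implied by the diff above; the `ROW` constant and the `parse` helper are assumptions added for illustration and are not taken from the pull request.

```java
import java.util.Locale;

// Sketch only: COLUMNAR is the corrected constant from the review comment;
// ROW and parse() are hypothetical additions for illustration.
enum CarbonFormat {
  COLUMNAR,
  ROW;

  // Resolve a format name case-insensitively. A misspelled name such as
  // "COLUMNR" is rejected with an IllegalArgumentException by valueOf.
  static CarbonFormat parse(String name) {
    return CarbonFormat.valueOf(name.trim().toUpperCase(Locale.ROOT));
  }
}

public class CarbonFormatDemo {
  public static void main(String[] args) {
    System.out.println(CarbonFormat.parse("columnar"));
  }
}
```

Any caller that looks the constant up by name would break on the misspelled version, so fixing it before the enum becomes part of a public API is the cheaper option.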




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611920#comment-15611920
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85337928
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param 
+ */
+public class CarbonTableInputFormat extends FileInputFormat {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
+this.segmentManager = segmentManager;
+  }
+
+  @Override
+  public RecordReader createRecordReader(InputSplit split,
+  TaskAttemptContext context) throws IOException, InterruptedException {
+switch (((CarbonInputSplit)split).formatType()) {
--- End diff --

Why don't you take the formatType from the job conf? It is better not to touch 
the InputSplit, as it comes from outside.
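
A rough sketch of this suggestion: resolve the format type from the job configuration instead of casting the incoming split. To keep the example self-contained, a plain `Map<String, String>` stands in for Hadoop's `Configuration`; the key `carbon.input.format.type`, the `FormatType` enum, and the `COLUMNAR` default are all hypothetical and do not come from the pull request.

```java
import java.util.HashMap;
import java.util.Map;

public class FormatFromConf {
  // Hypothetical format types; the diff only shows that a format type exists.
  enum FormatType { COLUMNAR, ROW }

  // Hypothetical configuration key, not defined by CarbonData.
  static final String FORMAT_TYPE_KEY = "carbon.input.format.type";

  // Read the format type from the job configuration (a Map stands in for
  // Hadoop's Configuration here), falling back to COLUMNAR when unset.
  static FormatType formatType(Map<String, String> jobConf) {
    String value = jobConf.getOrDefault(FORMAT_TYPE_KEY, FormatType.COLUMNAR.name());
    return FormatType.valueOf(value);
  }

  // The reader is chosen from the configuration alone, so the split is
  // never cast or inspected.
  static String chooseReader(Map<String, String> jobConf) {
    switch (formatType(jobConf)) {
      case ROW:
        return "row reader";
      case COLUMNAR:
      default:
        return "columnar reader";
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(FORMAT_TYPE_KEY, "ROW");
    System.out.println(chooseReader(conf));
  }
}
```

With this shape, `createRecordReader` would not need to assume that every split is a `CarbonInputSplit`, which is the concern raised in the comment.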




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607805#comment-15607805
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061184
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java ---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.index.memory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
+  throws IOException {
+
+List result = new LinkedList();
+
+FilterExpressionProcessor filterExpressionProcessor = new FilterExpressionProcessor();
+
+AbsoluteTableIdentifier absoluteTableIdentifier = null;
+//CarbonInputFormatUtil.getAbsoluteTableIdentifier(job.getConfiguration());
+
+//for this segment fetch blocks matching filter in BTree
+List dataRefNodes = null;
+try {
+  dataRefNodes = getDataBlocksOfSegment(job, filterExpressionProcessor, absoluteTableIdentifier,
+  filter, segment.getId());
+} catch (IndexBuilderException e) {
+  throw new IOException(e.getMessage());
+}
 

[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558273#comment-15558273
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r82505613
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormatBase.java ---
@@ -0,0 +1,69 @@
+/*
--- End diff --

Yes, that is how it is.


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> This issue is intended to abstract developer API and user API to achieve 
> following goals:
> Goal 1: User can choose the place to store Index data, it can be stored in
> processing framework's memory space (like in spark driver memory) or in
> another service outside of the processing framework (like using a
> independent database service, which can be shared across client)
> Goal 2: Developer can add more index of his choice to CarbonData files.
> Besides B+ tree on multi-dimensional key which current CarbonData supports,
> developers are free to add other indexing technology to make certain
> workload faster. These new indices should be added in a pluggable way.
> This Jira has been discussed in maillist: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558270#comment-15558270
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r82505582
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/StreamingSegment.java ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.api.CarbonInputFormatBase;
+import org.apache.carbondata.scan.model.QueryModel;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+public class StreamingSegment extends Segment {
--- End diff --

If I understand the comment correctly, the answer is that all segments are 
handled uniformly in `CarbonInputFormatBase`; however, the internal read 
implementation of this segment differs from that of `IndexedSegment`: it uses 
the Row input format to read. Is this the question?
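
The design described in this reply can be sketched roughly as follows: the caller treats every segment through the same abstract type, while each subclass decides how its data is read. The class names `Segment`, `IndexedSegment`, and `StreamingSegment` come from the pull request; the `readerKind` method, the string-based plan, and `SegmentDispatchDemo` are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the PR's abstract Segment class.
abstract class Segment {
  private final String id;

  Segment(String id) {
    this.id = id;
  }

  String getId() {
    return id;
  }

  // Hypothetical hook: each segment type names the input format it reads with.
  abstract String readerKind();
}

class IndexedSegment extends Segment {
  IndexedSegment(String id) {
    super(id);
  }

  @Override
  String readerKind() {
    return "columnar";
  }
}

class StreamingSegment extends Segment {
  StreamingSegment(String id) {
    super(id);
  }

  @Override
  String readerKind() {
    return "row";
  }
}

public class SegmentDispatchDemo {
  // Handles all segments uniformly; no instanceof checks are needed here.
  static List<String> plan(List<Segment> segments) {
    List<String> plan = new ArrayList<>();
    for (Segment segment : segments) {
      plan.add(segment.getId() + ":" + segment.readerKind());
    }
    return plan;
  }

  public static void main(String[] args) {
    List<Segment> segments =
        Arrays.<Segment>asList(new IndexedSegment("0"), new StreamingSegment("1"));
    System.out.println(plan(segments));
  }
}
```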




[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542797#comment-15542797
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/208

[CARBONDATA-284][WIP] Abstracting index and segment interface

This PR adds a new User API and Dev API for the carbon-hadoop module:

### User API
- `CarbonColumnarInputFormat/OutputFormat`: uses the current 
`CarbonInputFormat` as its internal implementation. 
- `CarbonRowInputFormat/OutputFormat`: still needs to be implemented
- `CarbonOutputCommitter`: used for managing segment commit

They are based on `CarbonInputFormatBase/OutputFormatBase`.

### Dev API
- Segment: an abstract class that represents a single load of data. It is used 
by CarbonInputFormatBase to get all InputSplits matching the QueryModel, and by 
CarbonOutputCommitter to prepare for reading. Implementation examples are 
`IndexedSegment` and `StreamingSegment`.
- SegmentManager: an interface to manage segments. The current implementation 
is `ZkSegmentManager`, which needs to be mapped to the existing logic.
- Index: an interface that is used by `IndexedSegment` to filter 
InputSplits. The current implementation is `InMemoryBTreeIndex`, which loads 
the index into the driver's memory.

`CarbonInputFormatUtil` is modified so that it can also be used by 
`CarbonColumnarInputFormat`.
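
As a reading aid, the relationship between the Dev API pieces listed above might look roughly like the sketch below. It is not the code in this PR: blocks are plain strings rather than `InputSplit`s, the filter is a `Predicate` rather than a resolved filter expression, and `InMemoryListIndex`, `getValidSegments`, and `DevApiSketch` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified Index: prunes the blocks of one segment with a filter.
interface Index {
  String getName();

  List<String> filter(Predicate<String> filter);
}

// Toy in-memory index that only holds block names; it does not build a B-tree.
class InMemoryListIndex implements Index {
  private final List<String> blocks;

  InMemoryListIndex(List<String> blocks) {
    this.blocks = blocks;
  }

  @Override
  public String getName() {
    return "in-memory-list";
  }

  @Override
  public List<String> filter(Predicate<String> filter) {
    List<String> result = new ArrayList<>();
    for (String block : blocks) {
      if (filter.test(block)) {
        result.add(block);
      }
    }
    return result;
  }
}

// A segment that delegates split pruning to its index.
class IndexedSegment {
  private final Index index;

  IndexedSegment(Index index) {
    this.index = index;
  }

  List<String> getSplits(Predicate<String> filter) {
    return index.filter(filter);
  }
}

// Hypothetical shape of a segment manager; the PR's method names are not shown.
interface SegmentManager {
  List<IndexedSegment> getValidSegments();
}

public class DevApiSketch {
  // Collects the splits of every valid segment, as an input format would.
  static List<String> getSplits(SegmentManager manager, Predicate<String> filter) {
    List<String> splits = new ArrayList<>();
    for (IndexedSegment segment : manager.getValidSegments()) {
      splits.addAll(segment.getSplits(filter));
    }
    return splits;
  }

  public static void main(String[] args) {
    IndexedSegment s0 = new IndexedSegment(new InMemoryListIndex(Arrays.asList("a-1", "b-1")));
    IndexedSegment s1 = new IndexedSegment(new InMemoryListIndex(Arrays.asList("a-2")));
    SegmentManager manager = () -> Arrays.asList(s0, s1);
    System.out.println(getSplits(manager, block -> block.startsWith("a")));
  }
}
```

Because the input format only talks to `SegmentManager` and `Index`, an index kept in an external service could be swapped in without changing the split-planning code, which is the first goal stated in the issue.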

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata index-interface

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #208


commit 398d2ec3e6706c615918a734a90f9dc4111067d8
Author: jackylk 
Date:   2016-10-03T16:01:48Z

add User API

commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2
Author: jackylk 
Date:   2016-10-03T16:02:04Z

add Developer API

commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05
Author: jackylk 
Date:   2016-10-03T16:02:49Z

refactory existing code

commit 430e7710b88725b587c1f3542d4d66ab02958cbc
Author: jackylk 
Date:   2016-10-03T16:27:10Z

change Index interface



