[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2589 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6159/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2589 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6501/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2589 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1// ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699726 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/BlockScanUnit.java --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.store; + +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.hadoop.CarbonInputSplit; + +/** + * It contains a block to scan, and a destination worker who should scan it + */ +@InterfaceAudience.Internal +public class BlockScanUnit implements ScanUnit { + + // the data block to scan + private CarbonInputSplit inputSplit; + + // the worker who should scan this unit + private Schedulable schedulable; --- End diff -- fixed ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699730 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/ScanUnit.java --- @@ -15,26 +15,27 @@ * limitations under the License. */ -package org.apache.carbondata.store.impl.rpc; +package org.apache.carbondata.sdk.store; -import org.apache.carbondata.common.annotations.InterfaceAudience; -import org.apache.carbondata.store.impl.rpc.model.BaseResponse; -import org.apache.carbondata.store.impl.rpc.model.LoadDataRequest; -import org.apache.carbondata.store.impl.rpc.model.QueryResponse; -import org.apache.carbondata.store.impl.rpc.model.Scan; -import org.apache.carbondata.store.impl.rpc.model.ShutdownRequest; -import org.apache.carbondata.store.impl.rpc.model.ShutdownResponse; - -import org.apache.hadoop.ipc.VersionedProtocol; - -@InterfaceAudience.Internal -public interface StoreService extends VersionedProtocol { - - long versionID = 1L; +import java.io.Serializable; - BaseResponse loadData(LoadDataRequest request); - - QueryResponse query(Scan scan); +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.core.metadata.schema.table.Writable; - ShutdownResponse shutdown(ShutdownRequest request); +/** + * An unit for the scanner in Carbon Store + */ +@InterfaceAudience.User +@InterfaceStability.Unstable +public interface ScanUnit extends Serializable, Writable { --- End diff -- fixed ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699719 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -444,4 +444,16 @@ public void setFormat(FileFormat fileFormat) { public Blocklet makeBlocklet() { return new Blocklet(getPath().getName(), blockletId); } + + public String[] preferredLocations() { --- End diff -- fixed ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699358 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/ScanUnit.java --- @@ -15,26 +15,27 @@ * limitations under the License. */ -package org.apache.carbondata.store.impl.rpc; +package org.apache.carbondata.sdk.store; -import org.apache.carbondata.common.annotations.InterfaceAudience; -import org.apache.carbondata.store.impl.rpc.model.BaseResponse; -import org.apache.carbondata.store.impl.rpc.model.LoadDataRequest; -import org.apache.carbondata.store.impl.rpc.model.QueryResponse; -import org.apache.carbondata.store.impl.rpc.model.Scan; -import org.apache.carbondata.store.impl.rpc.model.ShutdownRequest; -import org.apache.carbondata.store.impl.rpc.model.ShutdownResponse; - -import org.apache.hadoop.ipc.VersionedProtocol; - -@InterfaceAudience.Internal -public interface StoreService extends VersionedProtocol { - - long versionID = 1L; +import java.io.Serializable; - BaseResponse loadData(LoadDataRequest request); - - QueryResponse query(Scan scan); +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.core.metadata.schema.table.Writable; - ShutdownResponse shutdown(ShutdownRequest request); +/** + * An unit for the scanner in Carbon Store + */ +@InterfaceAudience.User +@InterfaceStability.Unstable +public interface ScanUnit extends Serializable, Writable { --- End diff -- can remove Generics ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699345 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/BlockScanUnit.java --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.store; + +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.hadoop.CarbonInputSplit; + +/** + * It contains a block to scan, and a destination worker who should scan it + */ +@InterfaceAudience.Internal +public class BlockScanUnit implements ScanUnit { + + // the data block to scan + private CarbonInputSplit inputSplit; + + // the worker who should scan this unit + private Schedulable schedulable; --- End diff -- Add this in Writable interface else it will be null after deserialization ---
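The reviewer's point above is that any field of a Writable that is not written in write() and read back in readFields() silently comes back null after deserialization. A minimal stdlib-only sketch of that contract, using simplified stand-in classes (String fields in place of CarbonInputSplit and Schedulable, and a hypothetical roundTrip helper — not the actual CarbonData sources):

```java
import java.io.*;

// Simplified stand-in for BlockScanUnit: only fields serialized in write()
// and mirrored in readFields() survive the round trip.
public class BlockScanUnitSketch {
    String split;   // stands in for CarbonInputSplit
    String worker;  // stands in for Schedulable

    // Writable-style contract: serialize every field we want to keep.
    void write(DataOutput out) throws IOException {
        out.writeUTF(split);
        out.writeUTF(worker);   // omitting this line would lose the worker
    }

    void readFields(DataInput in) throws IOException {
        split = in.readUTF();
        worker = in.readUTF();  // must mirror write() field-for-field
    }

    // Serialize to a byte buffer and deserialize into a fresh instance,
    // mimicking what an RPC framework does on the receiving side.
    static BlockScanUnitSketch roundTrip(BlockScanUnitSketch u) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        u.write(new DataOutputStream(buf));
        BlockScanUnitSketch copy = new BlockScanUnitSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        BlockScanUnitSketch unit = new BlockScanUnitSketch();
        unit.split = "part-0-0_batchno0-0-0.carbondata";
        unit.worker = "worker-1:10020";
        BlockScanUnitSketch copy = roundTrip(unit);
        // both fields survive because both are covered by write()/readFields()
        System.out.println(copy.split + " -> " + copy.worker);
    }
}
```

If the schedulable field were left out of write()/readFields(), the copy's worker would be null on the receiving side, which is exactly the bug being flagged.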
[jira] [Created] (CARBONDATA-2827) Refactor Segment Status Manager Interface
Ravindra Pesala created CARBONDATA-2827: --- Summary: Refactor Segment Status Manager Interface Key: CARBONDATA-2827 URL: https://issues.apache.org/jira/browse/CARBONDATA-2827 Project: CarbonData Issue Type: Improvement Reporter: Ravindra Pesala Attachments: Segment Status Management interface design_V1.docx Carbon uses a tablestatus file to record the status and details of each segment during every load. The tablestatus file enables Carbon to support concurrent loads and reads without data inconsistency or corruption, so it is a very important feature of CarbonData and we should have clean interfaces to maintain it. Currently, tablestatus updates are scattered across multiple places with no clean interface, so I propose refactoring the current SegmentStatusManager interface and bringing all tablestatus operations into a single interface. The new interface allows table status to be kept in any other storage, such as a DB. This is needed for S3-type object stores, as these are eventually consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
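The proposal above — one interface owning all tablestatus operations so the backing store can be swapped (file, DB, etc.) — could be sketched as follows. All names here are illustrative assumptions, not taken from the attached design document:

```java
import java.util.*;

// Hypothetical sketch of a unified segment-status interface: callers never
// touch the tablestatus file directly, so a DB-backed implementation (useful
// for eventually-consistent stores like S3) can be swapped in transparently.
public class SegmentStatusSketch {

    interface SegmentStore {
        String beginSegment();                 // allocate a new segment id
        void commitSegment(String segmentId);  // mark the load as successful
        List<String> listCommittedSegments();  // what readers are allowed to see
    }

    // In-memory reference implementation; a file- or DB-backed variant would
    // implement the same interface without changing any caller.
    static class InMemorySegmentStore implements SegmentStore {
        private int next = 0;
        private final Set<String> committed = new LinkedHashSet<>();

        public String beginSegment() { return String.valueOf(next++); }
        public void commitSegment(String id) { committed.add(id); }
        public List<String> listCommittedSegments() { return new ArrayList<>(committed); }
    }

    public static void main(String[] args) {
        SegmentStore store = new InMemorySegmentStore();
        String first = store.beginSegment();
        store.beginSegment();          // second load in progress, never committed
        store.commitSegment(first);
        // readers only see committed segments, so concurrent loads stay invisible
        System.out.println(store.listCommittedSegments());
    }
}
```

The design value is in the seam: concurrent-load bookkeeping lives behind one interface instead of being scattered across call sites.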
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699308 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -444,4 +444,16 @@ public void setFormat(FileFormat fileFormat) { public Blocklet makeBlocklet() { return new Blocklet(getPath().getName(), blockletId); } + + public String[] preferredLocations() { --- End diff -- The superclass field FileSplit.file is not serializable (refer HADOOP-13519), so Java serialization may return an empty result here ---
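The failure mode behind HADOOP-13519 can be reproduced with plain JDK classes: when a Serializable subclass extends a non-Serializable base, Java serialization re-creates the base via its no-arg constructor, so base-class fields silently reset. A sketch with simplified stand-ins (not the real FileSplit/CarbonInputSplit):

```java
import java.io.*;

// FileSplitLike plays the role of FileSplit (Writable, not Serializable);
// CarbonSplitLike plays the role of a Serializable subclass. After a Java
// serialization round trip, the subclass field survives but the base field
// reverts to whatever the no-arg constructor sets -- here, null.
public class SplitSerializationSketch {

    static class FileSplitLike {             // not Serializable, like FileSplit
        String file;
        FileSplitLike() { }                   // invoked during deserialization
        FileSplitLike(String file) { this.file = file; }
    }

    static class CarbonSplitLike extends FileSplitLike implements Serializable {
        int blockletId;
        CarbonSplitLike(String file, int blockletId) {
            super(file);
            this.blockletId = blockletId;
        }
    }

    static CarbonSplitLike roundTrip(CarbonSplitLike s) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new ObjectOutputStream(buf).writeObject(s);
        return (CarbonSplitLike) new ObjectInputStream(
            new ByteArrayInputStream(buf.toByteArray())).readObject();
    }

    public static void main(String[] args) throws Exception {
        CarbonSplitLike copy = roundTrip(new CarbonSplitLike("/data/part-0", 3));
        // subclass field survives, superclass field is lost:
        System.out.println(copy.blockletId + " " + copy.file); // prints "3 null"
    }
}
```

This is why preferredLocations() computed from the superclass path may come back empty after Java serialization, and why Writable-based serialization is needed for split transport.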
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207699000 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/descriptor/ScanDescriptor.java --- @@ -15,23 +15,33 @@ * limitations under the License. */ -package org.apache.carbondata.store.api.descriptor; +package org.apache.carbondata.sdk.store.descriptor; +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; import java.util.Objects; +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.util.ObjectSerializationUtil; -public class SelectDescriptor { +import org.apache.hadoop.io.Writable; + +@InterfaceAudience.User +@InterfaceStability.Evolving +public class ScanDescriptor implements Writable { private TableIdentifier table; private String[] projection; private Expression filter; private long limit; --- End diff -- ok ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207698994 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/ScannerImpl.java --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.sdk.store; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.row.CarbonRow; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit; +import org.apache.carbondata.hadoop.api.CarbonInputFormat; +import org.apache.carbondata.sdk.store.conf.StoreConf; +import org.apache.carbondata.sdk.store.descriptor.ScanDescriptor; +import org.apache.carbondata.sdk.store.descriptor.TableIdentifier; +import org.apache.carbondata.sdk.store.exception.CarbonException; +import org.apache.carbondata.sdk.store.service.DataService; +import org.apache.carbondata.sdk.store.service.PruneService; +import org.apache.carbondata.sdk.store.service.ServiceFactory; +import org.apache.carbondata.sdk.store.service.model.PruneRequest; +import org.apache.carbondata.sdk.store.service.model.PruneResponse; +import org.apache.carbondata.sdk.store.service.model.ScanRequest; +import org.apache.carbondata.sdk.store.service.model.ScanResponse; + +import org.apache.hadoop.conf.Configuration; + +class ScannerImpl implements Scanner { + private static final LogService LOGGER = + LogServiceFactory.getLogService(ScannerImpl.class.getCanonicalName()); + + private PruneService pruneService; + private TableInfo tableInfo; + + ScannerImpl(StoreConf conf, TableInfo tableInfo) throws IOException { +this.pruneService = ServiceFactory.createPruneService( +conf.masterHost(), conf.registryServicePort()); +this.tableInfo = tableInfo; + } + + /** + * Trigger a 
RPC to Carbon Master to do pruning + * @param table table identifier + * @param filterExpression expression of filter predicate given by user + * @return list of ScanUnit + * @throws CarbonException if any error occurs + */ + @Override + public List prune(TableIdentifier table, Expression filterExpression) + throws CarbonException { +try { + Configuration configuration = new Configuration(); + CarbonInputFormat.setTableName(configuration, table.getTableName()); --- End diff -- ok ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207501460 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/descriptor/ScanDescriptor.java --- @@ -15,23 +15,33 @@ * limitations under the License. */ -package org.apache.carbondata.store.api.descriptor; +package org.apache.carbondata.sdk.store.descriptor; +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; import java.util.Objects; +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.util.ObjectSerializationUtil; -public class SelectDescriptor { +import org.apache.hadoop.io.Writable; + +@InterfaceAudience.User +@InterfaceStability.Evolving +public class ScanDescriptor implements Writable { private TableIdentifier table; private String[] projection; private Expression filter; private long limit; --- End diff -- Must be Long.MAX_VALUE ---
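The reviewer's point is that a Java long field defaults to 0, so a descriptor built without an explicit limit would behave as "return zero rows". A tiny illustrative sketch (hypothetical class, not the real ScanDescriptor):

```java
// Initializing the limit to Long.MAX_VALUE makes "no limit" the default,
// instead of the implicit 0 that would truncate every scan.
public class LimitDefaultSketch {
    static class ScanDescriptorLike {
        long limit = Long.MAX_VALUE;  // reviewer's fix; implicit default is 0
    }

    public static void main(String[] args) {
        ScanDescriptorLike scan = new ScanDescriptorLike();
        // a scanner would cap the rows it returns at the limit:
        long rows = Math.min(1_000_000L, scan.limit);
        System.out.println(rows); // prints "1000000", not 0
    }
}
```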
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207431095 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/ScannerImpl.java --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.sdk.store; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.row.CarbonRow; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit; +import org.apache.carbondata.hadoop.api.CarbonInputFormat; +import org.apache.carbondata.sdk.store.conf.StoreConf; +import org.apache.carbondata.sdk.store.descriptor.ScanDescriptor; +import org.apache.carbondata.sdk.store.descriptor.TableIdentifier; +import org.apache.carbondata.sdk.store.exception.CarbonException; +import org.apache.carbondata.sdk.store.service.DataService; +import org.apache.carbondata.sdk.store.service.PruneService; +import org.apache.carbondata.sdk.store.service.ServiceFactory; +import org.apache.carbondata.sdk.store.service.model.PruneRequest; +import org.apache.carbondata.sdk.store.service.model.PruneResponse; +import org.apache.carbondata.sdk.store.service.model.ScanRequest; +import org.apache.carbondata.sdk.store.service.model.ScanResponse; + +import org.apache.hadoop.conf.Configuration; + +class ScannerImpl implements Scanner { + private static final LogService LOGGER = + LogServiceFactory.getLogService(ScannerImpl.class.getCanonicalName()); + + private PruneService pruneService; + private TableInfo tableInfo; + + ScannerImpl(StoreConf conf, TableInfo tableInfo) throws IOException { +this.pruneService = ServiceFactory.createPruneService( +conf.masterHost(), conf.registryServicePort()); --- End diff -- must be prune service port ---
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207431252 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/service/StoreService.java --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.sdk.store.service; + +import java.util.List; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.core.datastore.row.CarbonRow; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.sdk.store.descriptor.LoadDescriptor; +import org.apache.carbondata.sdk.store.descriptor.ScanDescriptor; +import org.apache.carbondata.sdk.store.descriptor.TableDescriptor; +import org.apache.carbondata.sdk.store.descriptor.TableIdentifier; +import org.apache.carbondata.sdk.store.exception.CarbonException; + +import org.apache.hadoop.ipc.VersionedProtocol; + +@InterfaceAudience.Internal +public interface StoreService extends VersionedProtocol { + long versionID = 1L; + + void createTable(TableDescriptor descriptor) throws CarbonException; + + void dropTable(TableIdentifier table) throws CarbonException; + + CarbonTable getTable(TableIdentifier table) throws CarbonException; --- End diff -- hadoop RPC need response object to be a org.apache.hadoop.io.serializer.WritableSerialization ---
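The constraint raised above — Hadoop RPC responses must be serializable via WritableSerialization — exists because the receiving side instantiates the response class reflectively through a no-arg constructor and then calls readFields(), so a plain domain object like CarbonTable cannot be reconstructed. A stdlib-only sketch of that mechanism (WritableSerialization itself is a Hadoop class; TableResponse and decode are illustrative assumptions):

```java
import java.io.*;

// Mimics the Writable deserialization contract used by Hadoop RPC:
// reflective no-arg construction followed by readFields().
public class RpcResponseSketch {

    public static class TableResponse {
        String tableName;
        public TableResponse() { }            // required for reflective creation
        void write(DataOutput out) throws IOException { out.writeUTF(tableName); }
        void readFields(DataInput in) throws IOException { tableName = in.readUTF(); }
    }

    // The deserializing side of the framework: it only knows the class,
    // never a populated instance.
    static TableResponse decode(byte[] bytes) throws Exception {
        TableResponse r = TableResponse.class.getDeclaredConstructor().newInstance();
        r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return r;
    }

    public static void main(String[] args) throws Exception {
        TableResponse resp = new TableResponse();
        resp.tableName = "t1";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        resp.write(new DataOutputStream(buf));
        System.out.println(decode(buf.toByteArray()).tableName); // prints "t1"
    }
}
```

Hence getTable() would need to return a Writable response model wrapping the table metadata rather than CarbonTable itself.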
[GitHub] carbondata pull request #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ajithme commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2589#discussion_r207433215 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/store/ScannerImpl.java --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.sdk.store; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import java.util.Random; +import java.util.stream.Collectors; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.row.CarbonRow; +import org.apache.carbondata.core.metadata.schema.table.TableInfo; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.CarbonMultiBlockSplit; +import org.apache.carbondata.hadoop.api.CarbonInputFormat; +import org.apache.carbondata.sdk.store.conf.StoreConf; +import org.apache.carbondata.sdk.store.descriptor.ScanDescriptor; +import org.apache.carbondata.sdk.store.descriptor.TableIdentifier; +import org.apache.carbondata.sdk.store.exception.CarbonException; +import org.apache.carbondata.sdk.store.service.DataService; +import org.apache.carbondata.sdk.store.service.PruneService; +import org.apache.carbondata.sdk.store.service.ServiceFactory; +import org.apache.carbondata.sdk.store.service.model.PruneRequest; +import org.apache.carbondata.sdk.store.service.model.PruneResponse; +import org.apache.carbondata.sdk.store.service.model.ScanRequest; +import org.apache.carbondata.sdk.store.service.model.ScanResponse; + +import org.apache.hadoop.conf.Configuration; + +class ScannerImpl implements Scanner { + private static final LogService LOGGER = + LogServiceFactory.getLogService(ScannerImpl.class.getCanonicalName()); + + private PruneService pruneService; + private TableInfo tableInfo; + + ScannerImpl(StoreConf conf, TableInfo tableInfo) throws IOException { +this.pruneService = ServiceFactory.createPruneService( +conf.masterHost(), conf.registryServicePort()); +this.tableInfo = tableInfo; + } + + /** + * Trigger a 
RPC to Carbon Master to do pruning + * @param table table identifier + * @param filterExpression expression of filter predicate given by user + * @return list of ScanUnit + * @throws CarbonException if any error occurs + */ + @Override + public List prune(TableIdentifier table, Expression filterExpression) + throws CarbonException { +try { + Configuration configuration = new Configuration(); + CarbonInputFormat.setTableName(configuration, table.getTableName()); --- End diff -- can use CarbonInputFormat.setTableInfo(configuration, tableInfo); else org.apache.carbondata.hadoop.api.CarbonInputFormat#getAbsoluteTableIdentifier will have empty path ---
[jira] [Created] (CARBONDATA-2826) SELECT support using distributed carbon store
Ajith S created CARBONDATA-2826: --- Summary: SELECT support using distributed carbon store Key: CARBONDATA-2826 URL: https://issues.apache.org/jira/browse/CARBONDATA-2826 Project: CarbonData Issue Type: Sub-task Reporter: Ajith S Assignee: Ajith S Change the Carbon code to support scanning (table SELECT using Spark) through the distributed CarbonStore API
[jira] [Created] (CARBONDATA-2825) Store Service Interface
Ajith S created CARBONDATA-2825: --- Summary: Store Service Interface Key: CARBONDATA-2825 URL: https://issues.apache.org/jira/browse/CARBONDATA-2825 Project: CarbonData Issue Type: Sub-task Reporter: Ajith S Assignee: Jacky Li This JIRA targets providing the interfaces from the distributed CarbonStore perspective
[jira] [Created] (CARBONDATA-2824) Distributed CarbonStore
Ajith S created CARBONDATA-2824: --- Summary: Distributed CarbonStore Key: CARBONDATA-2824 URL: https://issues.apache.org/jira/browse/CARBONDATA-2824 Project: CarbonData Issue Type: New Feature Reporter: Ajith S Assignee: Ajith S Currently the CarbonStore is tightly coupled with the FileSystem interface and runs inside the application's JVM, as in Spark. We can instead make CarbonStore run as a separate service that can be accessed via network/RPC. So, as a follow-up of CARBONDATA-2688 (CarbonStore Java API and REST API), we can make the CarbonStore distributed. This has several advantages:
1. A distributed CarbonStore can support parallel scanning, i.e. multiple tasks can scan data in parallel, which may have a higher parallelism factor than the compute layer
2. A distributed CarbonStore can provide an index service to multiple apps (Spark/Flink/Presto), so that the index is shared to save resources
3. A distributed CarbonStore's resource consumption is isolated from the application and is easily scalable to support higher workloads
4. As a future improvement, a distributed CarbonStore can implement a query cache, since it has independent resources
Distributed CarbonStore will have 2 main deployment parts: a cluster of remote CarbonStore services, and an SDK which acts as a client for communication with the store.
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2576 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6500/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2576 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7776/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2576 retest this please ---
[jira] [Resolved] (CARBONDATA-2815) Add documentation for memory spill and rebuild datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-2815. Resolution: Fixed Fix Version/s: 1.4.1 1.5.0 > Add documentation for memory spill and rebuild datamap > -- > > Key: CARBONDATA-2815 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2815 > Project: CarbonData > Issue Type: Improvement >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Fix For: 1.5.0, 1.4.1 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2604: [CARBONDATA-2815][Doc] Add documentation for ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2604 ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2576 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6158/ ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2607 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6157/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2576 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6156/ ---
[GitHub] carbondata issue #2606: [CARBONDATA-2817]Thread Leak in Update and in No sor...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2606 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6155/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2576 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6154/ ---
[GitHub] carbondata issue #2606: [CARBONDATA-2817]Thread Leak in Update and in No sor...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2606 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6153/ ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2603 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6152/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2576 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6151/ ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2590 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6150/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2576 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6149/ ---
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Block rebuilding for bloo...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2594 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6148/ ---
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Block rebuilding for bloo...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2594 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6147/ ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2603 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6146/ ---
[GitHub] carbondata issue #2537: [CARBONDATA-2768][CarbonStore] Fix error in tests fo...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2537 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6145/ ---
[GitHub] carbondata issue #2537: [CARBONDATA-2768][CarbonStore] Fix error in tests fo...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2537 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6144/ ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2590 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6143/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2589 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6142/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2589 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6141/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2589 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6140/ ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2601 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6139/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2589 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6138/ ---
[jira] [Created] (CARBONDATA-2823) Alter table set local dictionary include after bloom creation and merge index on old V3 store fails throwing incorrect error
Chetan Bhat created CARBONDATA-2823:
---
Summary: Alter table set local dictionary include after bloom creation and merge index on old V3 store fails throwing incorrect error
Key: CARBONDATA-2823
URL: https://issues.apache.org/jira/browse/CARBONDATA-2823
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.4.1
Environment: Spark 2.1
Reporter: Chetan Bhat

Steps:

In old version V3 store create table and load data.

CREATE TABLE uniqdata_load (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format';

LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata_load OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

In 1.4.1 version refresh the table of old V3 store.

refresh table uniqdata_load;

Create bloom filter and merge index.

CREATE DATAMAP dm_uniqdata1_tmstmp ON TABLE uniqdata_load USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'DOJ', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');

Alter table set local dictionary include.

alter table uniqdata_load set tblproperties('local_dictionary_include'='CUST_NAME');

Issue: Alter table set local dictionary include fails with incorrect error.

0: jdbc:hive2://10.18.98.101:22550/default> alter table uniqdata_load set tblproperties('local_dictionary_include'='CUST_NAME');
*Error: org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: streaming is not supported for index datamap (state=,code=0)*

Expected: Operation should be success. If the operation is unsupported it should throw correct error message.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2606: [CARBONDATA-2817]Thread Leak in Update and in No sor...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2606 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6137/ ---
[GitHub] carbondata issue #2605: [CARBONDATA-2585] Fix local dictionary for both tabl...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2605 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6136/ ---
[GitHub] carbondata issue #2604: [CARBONDATA-2815][Doc] Add documentation for spillin...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2604 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6135/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2576 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6499/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2576 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7775/ ---
[GitHub] carbondata issue #2605: [CARBONDATA-2585] Fix local dictionary for both tabl...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2605 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6134/ ---
[GitHub] carbondata issue #2607: [CARBONDATA-2818] Presto Upgrade to 0.206
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2607 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7774/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2576 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7773/ ---
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2603 ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2603 LGTM ---
[GitHub] carbondata issue #2606: [CARBONDATA-2817]Thread Leak in Update and in No sor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2606 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6496/ ---
[GitHub] carbondata issue #2606: [CARBONDATA-2817]Thread Leak in Update and in No sor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2606 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7772/ ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2603 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6493/ ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2603 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7769/ ---
[GitHub] carbondata issue #2604: [CARBONDATA-2815][Doc] Add documentation for spillin...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/2604 LGTM ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/2590 Lgtm ---
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207519570
--- Diff: integration/presto/presto-integration-technical-note.md ---
@@ -0,0 +1,253 @@
+# Presto Integration Technical Note
+Presto Integration with Carbon data include the below steps:
+
+* Setting up Presto Cluster
+
+* Setting up cluster to use carbondata as a catalog along with other catalogs provided by presto.
+
+In this technical note we will first learn about the above two points and after that we will see how we can do performance tuning with Presto.
+
+## **Let us begin with the first step of Presto Cluster Setup:**
+
+* ### Installing Presto
+
+  1. Download the 0.187 version of Presto using:
+     `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz`
+
+  2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`.
+
+  3. Download the Presto CLI for the coordinator and name it presto.
+
+  ```
+  wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+  mv presto-cli-0.187-executable.jar presto
+  chmod +x presto
+  ```
+
+### Create Configuration Files
+
+  1. Create `etc` folder in presto-server-0.187 directory.
+  2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
+  3. Install uuid to generate a node.id.
+
+  ```
+  sudo apt-get install uuid
+  uuid
+  ```
+
+# Contents of your node.properties file
+  ```
+  node.environment=production
+  node.id=
+  node.data-dir=/home/ubuntu/data
+  ```
+
+# Contents of your jvm.config file
+  ```
+  -server
+  -Xmx16G
+  -XX:+UseG1GC
+  -XX:G1HeapRegionSize=32M
+  -XX:+UseGCOverheadLimit
+  -XX:+ExplicitGCInvokesConcurrent
+  -XX:+HeapDumpOnOutOfMemoryError
+  -XX:OnOutOfMemoryError=kill -9 %p
+  ```
+
+# Contents of your log.properties file
+  ```
+  com.facebook.presto=INFO
+  ```
+
+The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.
+
+### Coordinator Configurations
+
+# Contents of your config.properties
+  ```
+  coordinator=true
+  node-scheduler.include-coordinator=false
+  http-server.http.port=8086
+  query.max-memory=50GB
+  query.max-memory-per-node=2GB
+  discovery-server.enabled=true
+  discovery.uri=:8086
+  ```
+The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers.
+
+**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`.
+
+Also relation between below two configuration-properties should be like:
+If, `query.max-memory-per-node=30GB`
+Then, `query.max-memory=<30GB * number of nodes>`.
+
+### Worker Configurations
+
+# Contents of your config.properties
+  ```
+  coordinator=false
+  http-server.http.port=8086
+  query.max-memory=50GB
+  query.max-memory-per-node=2GB
+  discovery.uri=:8086
+  ```
+
+**Note**: `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id` (generated by uuid command).
+
+### **With this we are ready with the Presto Cluster setup but to integrate with carbon data further steps are required which are as follows:**
+
+### Catalog Configurations
+
+1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator.
+
+# Configuring Carbondata in Presto
+1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+
+### Add Plugins
+
+1. Create a directory named `carbondata` in plugin directory of presto.
+2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+
+### Start Presto Server on all nodes
+
+```
+./presto-server-0.187/bin/launcher start
+```
+To run it as a background process.
+
+```
+./presto-server-0.187/bin/launcher run
+```
+To run it in foreground.
+
+### Start Presto CLI
+```
+./presto
+```
+To connect to carbondata catalog use the following command:
+
+```
+./presto --server :8086 --catalog carbondata --schema
+```
---
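The catalog step in the quoted note says to create `etc/catalog/carbondata.properties` and "set the required properties" without listing them. A minimal sketch of that file, assuming the property names used by the CarbonData Presto connector of this generation (`connector.name` and `carbondata-store`; the store path is a placeholder — verify both against your connector version):

```properties
# Loads the plugin from the plugin/carbondata directory created above
connector.name=carbondata
# HDFS location of the CarbonData store (assumed property name and example path)
carbondata-store=hdfs://namenode:8020/user/hive/warehouse/carbon.store
```

The same file must be present on every node (coordinator and workers), since each Presto node loads its catalogs independently at startup.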
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207517977
--- Diff: integration/presto/presto-integration-technical-note.md --- @@ -0,0 +1,253 @@ (diff context identical to the first such comment above; omitted as a verbatim duplicate)
---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user chetandb commented on the issue: https://github.com/apache/carbondata/pull/2603 LGTM ---
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207516087
--- Diff: docs/configuration-parameters.md ---
@@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.enableMinMax | true | Min max is feature added to enhance query performance. To disable this feature, set it false. |
 | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executor to be active. Minimum value is 5 sec and maximum value is 15 sec. |
 | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 min and the maximum value is 1.0. |
-| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. |
+| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. |
 * **Global Dictionary Configurations**
--- End diff --
This issue is handled in a different PR #2576
---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2590 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7767/ ---
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user sgururajshetty commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207516006
--- Diff: docs/configuration-parameters.md ---
@@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonD
 | carbon.enableMinMax | true | Min max is feature added to enhance query performance. To disable this feature, set it false. |
 | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executor to be active. Minimum value is 5 sec and maximum value is 15 sec. |
 | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 min and the maximum value is 1.0. |
-| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. |
+| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. |
 * **Global Dictionary Configurations**
--- End diff --
The minimum value need not be mentioned now
---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2590 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6491/ ---
[jira] [Created] (CARBONDATA-2821) For non-lazy index datamap (index datamap that not specified as deferred rebuild), rebuilding is not skipped
Chetan Bhat created CARBONDATA-2821:
---
Summary: For non-lazy index datamap (index datamap that not specified as deferred rebuild), rebuilding is not skipped
Key: CARBONDATA-2821
URL: https://issues.apache.org/jira/browse/CARBONDATA-2821
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.4.1
Environment: Spark 2.1, Spark 2.2
Reporter: Chetan Bhat
Assignee: xuchuanyin

Steps: User creates a datamap on a table. User loads the data. User tries to rebuild the datamap.

Actual Issue: For non-lazy index datamap (index datamap that not specified as deferred rebuild), rebuilding is not skipped. As a result the rebuild datamap fails and throws error.

Expected: For non-lazy index datamap (index datamap that not specified as deferred rebuild), rebuilding can be skipped.
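The lazy vs. non-lazy distinction the issue describes shows up directly in the DDL: only datamaps created with `WITH DEFERRED REBUILD` are meant to be indexed via an explicit `REBUILD DATAMAP`. A sketch of both cases, with made-up table, column, and datamap names for illustration:

```sql
-- Lazy index datamap: created with deferred rebuild, so data is indexed
-- only when REBUILD DATAMAP is issued explicitly.
CREATE DATAMAP dm_lazy ON TABLE sales
  USING 'bloomfilter'
  WITH DEFERRED REBUILD
  DMPROPERTIES ('INDEX_COLUMNS' = 'city');

REBUILD DATAMAP dm_lazy ON TABLE sales;  -- valid for lazy datamaps

-- Non-lazy index datamap: kept up to date automatically on each load.
-- Per this issue, an explicit REBUILD on it should be skipped or blocked
-- with a clear message rather than failing with an error.
CREATE DATAMAP dm_auto ON TABLE sales
  USING 'bloomfilter'
  DMPROPERTIES ('INDEX_COLUMNS' = 'city');
```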
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Block rebuilding for bloo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2594 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7765/ ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2590 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6132/ ---
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Block rebuilding for bloo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2594 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6489/ ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2576 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6131/ ---
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207481959
--- Diff: integration/presto/presto-integration-technical-note.md --- @@ -0,0 +1,253 @@ (diff context identical to the first such comment above; omitted as a verbatim duplicate)
---
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207479703 --- Diff: integration/presto/presto-integration-technical-note.md --- @@ -0,0 +1,253 @@ + + +# Presto Integration Technical Note +Presto Integration with Carbon data include the below steps: + +* Setting up Presto Cluster + +* Setting up cluster to use carbondata as a catalog along with other catalogs provided by presto. + +In this technical note we will first learn about the above two points and after that we will see how we can do performance tuning with Presto. + +## **Let us begin with the first step of Presto Cluster Setup:** + + +* ### Installing Presto + + 1. Download the 0.187 version of Presto using: + `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz` + + 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`. + + 3. Download the Presto CLI for the coordinator and name it presto. + + ``` +wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar + +mv presto-cli-0.187-executable.jar presto + +chmod +x presto + ``` + +### Create Configuration Files + + 1. Create `etc` folder in presto-server-0.187 directory. + 2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files. + 3. Install uuid to generate a node.id. + + ``` + sudo apt-get install uuid + + uuid + ``` + + +# Contents of your node.properties file + + ``` + node.environment=production + node.id= + node.data-dir=/home/ubuntu/data + ``` + +# Contents of your jvm.config file + + ``` + -server + -Xmx16G + -XX:+UseG1GC + -XX:G1HeapRegionSize=32M + -XX:+UseGCOverheadLimit + -XX:+ExplicitGCInvokesConcurrent + -XX:+HeapDumpOnOutOfMemoryError + -XX:OnOutOfMemoryError=kill -9 %p + ``` + +# Contents of your log.properties file + ``` + com.facebook.presto=INFO + ``` + + The default minimum level is `INFO`. 
There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`. + +### Coordinator Configurations + +# Contents of your config.properties + ``` + coordinator=true + node-scheduler.include-coordinator=false + http-server.http.port=8086 + query.max-memory=50GB + query.max-memory-per-node=2GB + discovery-server.enabled=true + discovery.uri=:8086 + ``` +The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that the node is the coordinator and tells the coordinator not to do any of the computation work itself and to use the workers. + +**Note**: We recommend setting `query.max-memory-per-node` to half of the JVM config max memory, though if your workload is highly concurrent, you may want to use a lower value for `query.max-memory-per-node`. + +Also relation between below two configuration-properties should be like: +If, `query.max-memory-per-node=30GB` +Then, `query.max-memory=<30GB * number of nodes>`. + +### Worker Configurations + +# Contents of your config.properties + + ``` + coordinator=false + http-server.http.port=8086 + query.max-memory=50GB + query.max-memory-per-node=2GB + discovery.uri=:8086 + ``` + +**Note**: `jvm.config` and `node.properties` files are same for all the nodes (worker + coordinator). All the nodes should have different `node.id`.(generated by uuid command). + +### **With this we are ready with the Presto Cluster setup but to integrate with carbon data further steps are required which are as follows:** + +### Catalog Configurations + +1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator. + +# Configuring Carbondata in Presto +1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes. + +### Add Plugins + +1. Create a directory named `carbondata` in plugin directory of presto. +2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes. 
+ +### Start Presto Server on all nodes + +To run it as a background process:

```
./presto-server-0.187/bin/launcher start
```

To run it in the foreground:

```
./presto-server-0.187/bin/launcher run
```

### Start Presto CLI
```
./presto
```
To connect to the carbondata catalog, use the following command:

```
./presto --server :8086 --catalog carbondata --schema 
```
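The catalog step above says to "set the required properties" in `carbondata.properties` without listing them. A minimal sketch of that file follows; the property names reflect the 0.187-era integration, and the store path is an assumption to be replaced with your actual CarbonData store location — verify both against the integration guide for your release:

```
connector.name=carbondata
carbondata-store=hdfs://<namenode-host>:<port>/user/hive/carbon.store
```

The same file must be present in `etc/catalog` on every node, coordinator included.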
[jira] [Created] (CARBONDATA-2820) Block rebuilding for preagg, bloom and lucene datamap
xuchuanyin created CARBONDATA-2820: -- Summary: Block rebuilding for preagg, bloom and lucene datamap Key: CARBONDATA-2820 URL: https://issues.apache.org/jira/browse/CARBONDATA-2820 Project: CarbonData Issue Type: Improvement Reporter: xuchuanyin Assignee: xuchuanyin Currently we will block rebuilding these datamaps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2819) cannot drop preagg datamap on table if the table has other index datamaps
lianganping created CARBONDATA-2819: --- Summary: cannot drop preagg datamap on table if the table has other index datamaps Key: CARBONDATA-2819 URL: https://issues.apache.org/jira/browse/CARBONDATA-2819 Project: CarbonData Issue Type: Improvement Affects Versions: 1.4.1 Reporter: lianganping 1. create table student_test(id int,name string,class_number int,male int,female int) stored by 'carbondata'; 2. create datamap dm1_preaggr_student_test ON TABLE student_test USING 'preaggregate' as select class_number,sum(male) from student_test group by class_number; 3. create datamap dm_lucene_student_test on table student_test using 'lucene' dmproperties('index_columns' = 'name'); 4. drop datamap dm1_preaggr_student_test on table student_test; and you will get this error: Error: org.apache.carbondata.common.exceptions.sql.NoSuchDataMapException: Datamap with name dm1_preaggr_student_test does not exist (state=,code=0) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207475132 --- Diff: integration/presto/presto-integration-in-carbondata.md --- @@ -0,0 +1,134 @@ + + +# PRESTO INTEGRATION IN CARBONDATA + +1. [Document Purpose](#document-purpose) +1. [Purpose](#purpose) +1. [Scope](#scope) +1. [Definitions and Acronyms](#definitions-and-acronyms) +1. [Requirements addressed](#requirements-addressed) +1. [Design Considerations](#design-considerations) +1. [Row Iterator Implementation](#row-iterator-implementation) +1. [ColumnarReaders or StreamReaders approach](#columnarreaders-or-streamreaders-approach) +1. [Module Structure](#module-structure) +1. [Detailed design](#detailed-design) +1. [Modules](#modules) +1. [Functions Developed](#functions-developed) +1. [Integration Tests](#integration-tests) +1. [Tools and languages used](#tools-and-languages-used) +1. [References](#references) + +## Document Purpose + + * _Purpose_ + The purpose of this document is to outline the technical design of the Presto Integration in CarbonData. + + Its main purpose is to - + * Provide the link between the Functional Requirement and the detailed Technical Design documents. + * Detail the functionality which will be provided by each component or group of components and show how the various components interact in the design. + + This document is not intended to address installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides on the CarbonData wiki page. As is true with any high-level design, this document will be updated and refined based on changing requirements. + * _Scope_ + Presto integration with CarbonData will allow execution of CarbonData queries on the Presto CLI. CarbonData can easily be added as a Data Source among the multiple heterogeneous data sources for Presto. + * _Definitions and Acronyms_ + **CarbonData:** CarbonData is a fully indexed columnar and Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. In customer benchmarks, CarbonData has proven to manage petabytes of data running on extraordinarily low-cost hardware, and answers queries around 10 times faster than the current open-source solutions (column-oriented SQL-on-Hadoop data stores). + + **Presto:** Presto is a distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. + +## Requirements addressed +This integration of Presto mainly serves two purposes: + * Support of Apache CarbonData as a Data Source in Presto. + * Execution of Apache CarbonData queries on Presto. + +## Design Considerations +The following are the design considerations for the Presto integration with CarbonData. + + Row Iterator Implementation + + Presto provides a way to iterate over records through a RecordSetProvider, which creates a RecordCursor; so we have to extend these classes to create a CarbondataRecordSetProvider and CarbondataRecordCursor to read data from the Carbondata core module. The CarbondataRecordCursor utilizes the DictionaryBasedResultCollector class of the Core module to read data row by row. This approach has two drawbacks. + * Presto converts this row data into columnar data again. Since carbondata itself stores data in columnar format, we are adding an extra column-to-row-to-column conversion instead of using the columns directly. + * The cursor reads the data row by row instead of in batches, which is costly; as we already store the data in pages (batches), we could read those batches directly. + + ColumnarReaders or StreamReaders approach + + In this design we create StreamReaders that can read data from a Carbondata column based on its DataType and directly convert it into a Presto Block. This approach saves us the row-by-row processing and reduces the transformation and conversion of data. With this approach we can achieve the fastest read from Presto, creating a Presto Page by extending the PageSourceProvider and PageSource classes. This design is discussed in detail in the next sections of this document. + +## Module Structure + + +![module structure](../presto/images/module-structure.jpg?raw=true) + + + +## Detailed design + Modules + +Based on the above functionality, the Presto integration is implemented as the following module: + +1. **Presto** + +Integration of Presto with CarbonData includes an implementation of Presto's connector API. --- End diff -- done ---
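The row-versus-columnar trade-off discussed in the design note above can be illustrated with a self-contained sketch. Plain Java arrays stand in for CarbonData column pages and Presto Blocks, and the class and method names are illustrative only, not CarbonData or Presto APIs:

```java
import java.util.Arrays;

// Contrasts the two read paths: a row cursor that must re-extract each
// column value per row, versus handing over the stored column page directly.
public class ColumnarVsRowRead {

    // Approach 1 (RecordCursor style): data arrives row by row and the
    // engine rebuilds the column afterwards -- one extraction per row.
    static int[] readViaRowCursor(int[][] rows, int columnIndex) {
        int[] column = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            column[i] = rows[i][columnIndex];
        }
        return column;
    }

    // Approach 2 (StreamReader style): the columnar store already holds
    // the batch we want, so no per-row work is needed.
    static int[] readViaColumnPage(int[] columnPage) {
        return columnPage;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 10}, {2, 20}, {3, 30}};   // row-oriented view
        int[] columnPage = {10, 20, 30};              // same data, columnar
        int[] viaCursor = readViaRowCursor(rows, 1);
        int[] viaPage = readViaColumnPage(columnPage);
        System.out.println(Arrays.equals(viaCursor, viaPage)); // prints "true"
    }
}
```

Both paths yield the same column values; the second simply skips the per-row extraction and the later row-to-column reassembly, which is the saving the StreamReaders approach exploits.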
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207474334 --- Diff: integration/presto/presto-integration-in-carbondata.md --- @@ -0,0 +1,134 @@ + + +# PRESTO INTEGRATION IN CARBONDATA + +1. [Document Purpose](#document-purpose) +1. [Purpose](#purpose) +1. [Scope](#scope) +1. [Definitions and Acronyms](#definitions-and-acronyms) +1. [Requirements addressed](#requirements-addressed) +1. [Design Considerations](#design-considerations) +1. [Row Iterator Implementation](#row-iterator-implementation) +1. [ColumnarReaders or StreamReaders approach](#columnarreaders-or-streamreaders-approach) +1. [Module Structure](#module-structure) +1. [Detailed design](#detailed-design) +1. [Modules](#modules) +1. [Functions Developed](#functions-developed) +1. [Integration Tests](#integration-tests) +1. [Tools and languages used](#tools-and-languages-used) +1. [References](#references) + +## Document Purpose + + * _Purpose_ + The purpose of this document is to outline the technical design of the Presto Integration in CarbonData. + + Its main purpose is to - + * Provide the link between the Functional Requirement and the detailed Technical Design documents. + * Detail the functionality which will be provided by each component or group of components and show how the various components interact in the design. + + This document is not intended to address installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides provided on CarbonData wiki page. As is true with any high level design, this document will be updated and refined based on changing requirements. + * _Scope_ + Presto Integration with CarbonData will allow execution of CarbonData queries on the Presto CLI. CarbonData can be added easily as a Data Source among the multiple heterogeneous data sources for Presto. --- End diff -- done. ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2603 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7763/ ---
[jira] [Created] (CARBONDATA-2818) Migrate Presto Integration from 0.187 to 0.206
Bhavya Aggarwal created CARBONDATA-2818: --- Summary: Migrate Presto Integration from 0.187 to 0.206 Key: CARBONDATA-2818 URL: https://issues.apache.org/jira/browse/CARBONDATA-2818 Project: CarbonData Issue Type: Improvement Affects Versions: 1.4.2 Reporter: Bhavya Aggarwal Assignee: Bhavya Aggarwal Presto Integration Module migration to 0.206 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2603 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6487/ ---
[GitHub] carbondata issue #2537: [CARBONDATA-2768][CarbonStore] Fix error in tests fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2537 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7762/ ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2590 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6130/ ---
[GitHub] carbondata issue #2537: [CARBONDATA-2768][CarbonStore] Fix error in tests fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2537 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6486/ ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2603 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7760/ ---
[GitHub] carbondata pull request #2568: [Presto-integration-Technical-note] created d...
Github user vandana7 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2568#discussion_r207462575 --- Diff: integration/presto/presto-integration-in-carbondata.md --- @@ -0,0 +1,134 @@ + + +# PRESTO INTEGRATION IN CARBONDATA + +1. [Document Purpose](#document-purpose) +1. [Purpose](#purpose) +1. [Scope](#scope) +1. [Definitions and Acronyms](#definitions-and-acronyms) +1. [Requirements addressed](#requirements-addressed) +1. [Design Considerations](#design-considerations) +1. [Row Iterator Implementation](#row-iterator-implementation) +1. [ColumnarReaders or StreamReaders approach](#columnarreaders-or-streamreaders-approach) +1. [Module Structure](#module-structure) +1. [Detailed design](#detailed-design) +1. [Modules](#modules) +1. [Functions Developed](#functions-developed) +1. [Integration Tests](#integration-tests) +1. [Tools and languages used](#tools-and-languages-used) +1. [References](#references) + +## Document Purpose + + * _Purpose_ + The purpose of this document is to outline the technical design of the Presto Integration in CarbonData. + + Its main purpose is to - + * Provide the link between the Functional Requirement and the detailed Technical Design documents. + * Detail the functionality which will be provided by each component or group of components and show how the various components interact in the design. + + This document is not intended to address installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides provided on CarbonData wiki page. As is true with any high level design, this document will be updated and refined based on changing requirements. --- End diff -- To make it clearer, I have linked the installation and configuration guide for integrating CarbonData with Presto from this document. Anyone who wants to know about installation and configuration can easily visit that document page. ---
[GitHub] carbondata pull request #2594: [CARBONDATA-2809][DataMap] Skip rebuilding fo...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2594#discussion_r207462265 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapRebuildCommand.scala --- @@ -48,7 +50,17 @@ case class CarbonDataMapRebuildCommand( )(sparkSession) } val provider = DataMapManager.get().getDataMapProvider(table, schema, sparkSession) -provider.rebuild() +// for non-lazy index datamap, the data of datamap will be generated immediately after +// the datamap is created or the main table is loaded, so there is no need to +// rebuild this datamap. +if (!schema.isLazy && provider.isInstanceOf[IndexDataMapProvider]) { --- End diff -- OK. ---
[jira] [Resolved] (CARBONDATA-2804) Incorrect error message when bloom filter or preaggregate datamap tried to be created on older V1-V2 version stores
[ https://issues.apache.org/jira/browse/CARBONDATA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin resolved CARBONDATA-2804. Resolution: Fixed Assignee: wangsen Fix Version/s: 1.4.1 > Incorrect error message when bloom filter or preaggregate datamap tried to be > created on older V1-V2 version stores > --- > > Key: CARBONDATA-2804 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2804 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.4.1 > Environment: Spark 2.1 >Reporter: Chetan Bhat >Assignee: wangsen >Priority: Minor > Fix For: 1.4.1 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Steps : > User creates a table with V1 version store and loads data to the table. > create table brinjal (imei string,AMSize string,channelsId > string,ActiveCountry string, Activecity string,gamePointId > double,deviceInformationId double,productionDate Timestamp,deliveryDate > timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('table_blocksize'='1'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO > TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= > '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= > 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge'); > In 1.4.1 version user refreshes the table with V1 store and tries to create a > bloom filter datamap. > CREATE DATAMAP dm_brinjal ON TABLE brinjal2 USING 'bloomfilter' DMPROPERTIES > ('INDEX_COLUMNS' = 'AMSize', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > create datamap brinjal_agg on table brinjal2 using 'preaggregate' as select > AMSize, avg(gamePointId) from brinjal group by gamePointId, AMSize; > Issue : Bloom filter or preaggregate datamap fails with incorrect error > message. 
> 0: jdbc:hive2://10.18.98.101:22550/default> CREATE DATAMAP dm_brinjal ON > TABLE brinjal2 USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'AMSize', > 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > Error: java.io.IOException: org.apache.thrift.protocol.TProtocolException: > Required field 'version' was not found in serialized data! Struct: > org.apache.carbondata.format.FileHeader$FileHeaderStandardScheme@4d5aa8b2 > (state=,code=0) > 0: jdbc:hive2://10.18.98.101:22550/default> create datamap brinjal_agg on > table brinjal2 using 'preaggregate' as select AMSize, avg(gamePointId) from > brinjal group by gamePointId, AMSize; > Error: java.io.IOException: org.apache.thrift.protocol.TProtocolException: > Required field 'version' was not found in serialized data! Struct: > org.apache.carbondata.format.FileHeader$FileHeaderStandardScheme@55d8323c > (state=,code=0) > Expected : Correct error message should be displayed when bloom filter or > preaggregate datamap creation is blocked/fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user chetandb commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207462001 --- Diff: docs/configuration-parameters.md --- @@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonD | carbon.enableMinMax | true | Min max is feature added to enhance query performance. To disable this feature, set it false. | | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executor to be active. Minimum value is 5 sec and maximum value is 15 sec. | | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 min and the maximum value is 1.0. | -| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | +| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | * **Global Dictionary Configurations** --- End diff -- In the Local Dictionary section the following updates need to be done. 1) Remove the line "44ad8fb40… Updated documentation on Local Dictionary Supoort |" on page 7 in the Local Dictionary Configuration section of the open-source PDF. 2) Change the description for "Local dictionary threshold" from "The maximum cardinality for local dictionary generation (maximum - 10)" to "The maximum cardinality for local dictionary generation (maximum value is 100000 and minimum value is 1000. If the 'local_dictionary_threshold' value is set below 1000 or above 100000, then it takes the default value, 10000)". ---
[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2601 ---
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user chetandb commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207460915 --- Diff: docs/configuration-parameters.md --- @@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonD | carbon.enableMinMax | true | Min max is feature added to enhance query performance. To disable this feature, set it false. | | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (unit in seconds) the scheduler can wait for executor to be active. Minimum value is 5 sec and maximum value is 15 sec. | | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 min and the maximum value is 1.0. | -| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | +| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. | * **Global Dictionary Configurations** --- End diff -- In the S3 section: 1. There should not be any space in the parameter; it should be carbon.storelocation. 2. "Concurrent queries are not supported" should be changed to "Only concurrent put operations (data management operations like load, insert, update) are supported." 3. The line "Another way of setting the authentication parameters is as follows" should be removed. ---
[GitHub] carbondata pull request #2594: [CARBONDATA-2809][DataMap] Skip rebuilding fo...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2594#discussion_r207460748 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapRebuildCommand.scala --- @@ -48,7 +50,17 @@ case class CarbonDataMapRebuildCommand( )(sparkSession) } val provider = DataMapManager.get().getDataMapProvider(table, schema, sparkSession) -provider.rebuild() +// for non-lazy index datamap, the data of datamap will be generated immediately after +// the datamap is created or the main table is loaded, so there is no need to +// rebuild this datamap. +if (!schema.isLazy && provider.isInstanceOf[IndexDataMapProvider]) { --- End diff -- Right now rebuild call on pre-aggregate DM ithrows "NoSuchDataMapException". Please handle to give correct message as pre-aggregate also rebuild is not required. ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2601 LGTM ---
[GitHub] carbondata pull request #2603: [Documentation] Editorial review comment fixe...
Github user chetandb commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2603#discussion_r207459790 --- Diff: docs/sdk-guide.md --- @@ -351,7 +351,7 @@ public CarbonWriter buildWriterForCSVInput() throws IOException, InvalidLoadOpti * @throws IOException * @throws InvalidLoadOptionException */ -public CarbonWriter buildWriterForAvroInput() throws IOException, InvalidLoadOptionException; +public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema schema) throws IOException, InvalidLoadOptionException; ``` --- End diff -- The TestSdkJson example code needs to be corrected: testJsonSdkWriter should be static and IOException should be handled, as below.

import java.io.IOException;

import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.CarbonWriterBuilder;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class TestSdkJson {

    public static void main(String[] args) throws InvalidLoadOptionException, IOException {
        testJsonSdkWriter();
    }

    // static, and declares IOException so callers must handle it
    public static void testJsonSdkWriter() throws InvalidLoadOptionException, IOException {
        String path = "./target/testJsonSdkWriter";

        Field[] fields = new Field[2];
        fields[0] = new Field("name", DataTypes.STRING);
        fields[1] = new Field("age", DataTypes.INT);
        Schema carbonSchema = new Schema(fields);

        CarbonWriterBuilder builder = CarbonWriter.builder().outputPath(path);

        // initialize json writer with carbon schema
        CarbonWriter writer = builder.buildWriterForJsonInput(carbonSchema);
        // one row of json data as a String
        String jsonRow = "{\"name\":\"abcd\", \"age\":10}";
        int rows = 5;
        for (int i = 0; i < rows; i++) {
            writer.write(jsonRow);
        }
        writer.close();
    }
} ---
[GitHub] carbondata pull request #2598: [CARBONDATA-2811][BloomDataMap] Add query tes...
Github user kevinjmh commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2598#discussion_r207458031 --- Diff: integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala --- @@ -219,6 +220,62 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll with sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable") } + test("test using search mode to query tabel with bloom datamap") { +sql( + s""" + | CREATE TABLE $normalTable(id INT, name STRING, city STRING, age INT, + | s1 STRING, s2 STRING, s3 STRING, s4 STRING, s5 STRING, s6 STRING, s7 STRING, s8 STRING) + | STORED BY 'carbondata' TBLPROPERTIES('table_blocksize'='128') + | """.stripMargin) +sql( + s""" + | CREATE TABLE $bloomDMSampleTable(id INT, name STRING, city STRING, age INT, + | s1 STRING, s2 STRING, s3 STRING, s4 STRING, s5 STRING, s6 STRING, s7 STRING, s8 STRING) + | STORED BY 'carbondata' TBLPROPERTIES('table_blocksize'='128') + | """.stripMargin) + +// load two segments +(1 to 2).foreach { i => + sql( +s""" + | LOAD DATA LOCAL INPATH '$bigFile' INTO TABLE $normalTable + | OPTIONS('header'='false') + """.stripMargin) + sql( +s""" + | LOAD DATA LOCAL INPATH '$bigFile' INTO TABLE $bloomDMSampleTable + | OPTIONS('header'='false') + """.stripMargin) +} + +sql( + s""" + | CREATE DATAMAP $dataMapName ON TABLE $bloomDMSampleTable + | USING 'bloomfilter' + | DMProperties('INDEX_COLUMNS'='city,id', 'BLOOM_SIZE'='64') + """.stripMargin) + +checkExistence(sql(s"SHOW DATAMAP ON TABLE $bloomDMSampleTable"), true, dataMapName) + +// get answer before search mode is enable +val expectedAnswer1 = sql(s"select * from $normalTable where id = 1").collect() +val expectedAnswer2 = sql(s"select * from $normalTable where city in ('city_999')").collect() + +carbonSession.startSearchMode() +assert(carbonSession.isSearchModeEnabled) + +checkAnswer( --- End diff -- Question also for `LuceneFineGrainDataMapWithSearchModeSuite` If we use EXPLAIN 
command, it won't run in Search Mode. When we debug this test case, we can see that the query is pruned on the Master side of search mode using the `getSplit` method of CarbonTableInputFormat, which finally uses the datamap to prune. So that should be confirmed in another test case with the same table schema and data, and this test case should be taken as an extended test only for the Search Mode feature. This test case also does not care about whether the datamap is created before or after data load. ---
[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2601 LGTM ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2590 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6484/ ---
[jira] [Updated] (CARBONDATA-2816) MV Datamap - With the hive metastore disabled, MV is not working as expected.
[ https://issues.apache.org/jira/browse/CARBONDATA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2816: -- Description: When the hive metastore is disabled(spark.carbon.hive.schema.store=false), then the below issues are seen. CARBONDATA-2534 CARBONDATA-2539 CARBONDATA-2576 was: When the hive metastore is disabled(spark.carbon.hive.schema.store=false), then the below issues are seen. CARBONDATA-2540 CARBONDATA-2539 CARBONDATA-2576 > MV Datamap - With the hive metastore disabled, MV is not working as expected. > - > > Key: CARBONDATA-2816 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2816 > Project: CarbonData > Issue Type: Bug > Components: data-query >Reporter: Prasanna Ravichandran >Priority: Minor > Labels: MV > > When the hive metastore is disabled(spark.carbon.hive.schema.store=false), > then the below issues are seen. > CARBONDATA-2534 > CARBONDATA-2539 > CARBONDATA-2576 > -- This message was sent by Atlassian JIRA (v7.6.3#76005)