[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-11-02 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r230312203
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1521,4 +1521,294 @@ public boolean accept(File dir, String name) {
   e.printStackTrace();
 }
   }
+
+   @Test
--- End diff --

yes


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-31 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229713578
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java ---
@@ -100,4 +100,19 @@ public int getSize() {
 counter++;
 return row;
   }
+
+  /**
+   * read next batch
+   *
+   * @return rows
+   */
+  public List<Object[]> nextBatch() {
+if (!hasNext()) {
+  throw new NoSuchElementException();
+}
+List<Object[]> row;
+row = rows.subList(counter, rows.size());
--- End diff --

ok, I return rows directly


---
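
For context, a minimal sketch of the parameterless nextBatch() along the lines discussed in this thread. It assumes the RowBatch fields shown in the diff (List<Object[]> rows and int counter) and is not necessarily the merged code:

    public List<Object[]> nextBatch() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      // Return everything from the current position to the end of this batch
      // and advance the counter past it, so hasNext() becomes false.
      List<Object[]> remaining = rows.subList(counter, rows.size());
      counter = rows.size();
      return remaining;
    }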


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-31 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229629760
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java ---
@@ -100,4 +100,19 @@ public int getSize() {
 counter++;
 return row;
   }
+
+  /**
+   * read next batch
+   *
+   * @return rows
+   */
+  public List<Object[]> nextBatch() {
+if (!hasNext()) {
+  throw new NoSuchElementException();
+}
+List<Object[]> row;
+row = rows.subList(counter, rows.size());
--- End diff --

Where is the batch reading happening now?

You should just have to return rows, I think.



---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229162173
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1737,4 +1739,95 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 }
   }
 
+  @Test
+  public void testReadNextBatchRow() {
+String path = "./carbondata";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+"Hello#World#From#Carbon",
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withBatch(3)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] batch = reader.readNextBatchRow();
+
+for (int j = 0; j < batch.length; j++) {
--- End diff --

ok, done


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161840
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +93,20 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next batch row objects
+   */
+  public Object[] readNextBatchRow() throws Exception {
+validateReader();
+int batch = Integer.parseInt(CarbonProperties.getInstance()
--- End diff --

No need for batch here; I removed it.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161378
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +93,20 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next batch row objects
+   */
+  public Object[] readNextBatchRow() throws Exception {
+validateReader();
+int batch = Integer.parseInt(CarbonProperties.getInstance()
--- End diff --

ok, I added a default batch size


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161334
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
  */
 bool readFromS3(JNIEnv *env, char *argv[]) {
 printf("\nRead data from S3:\n");
+struct timeval start, build, read;
+gettimeofday(&start, NULL);
+
+CarbonReader reader;
+
+char *args[3];
+// "your access key"
+args[0] = argv[1];
+// "your secret key"
+args[1] = argv[2];
+// "your endPoint"
+args[2] = argv[3];
+
+reader.builder(env, "s3a://sdk/WriterOutput/carbondata", "test");
+reader.withHadoopConf("fs.s3a.access.key", argv[1]);
+reader.withHadoopConf("fs.s3a.secret.key", argv[2]);
+reader.withHadoopConf("fs.s3a.endpoint", argv[3]);
+reader.build();
+
+gettimeofday(&build, NULL);
+int time = 1000000 * (build.tv_sec - start.tv_sec) + build.tv_usec - start.tv_usec;
+double buildTime = time / 1000000.0;
+printf("build time: %lf s\n", time / 1000000.0);
+
+CarbonRow carbonRow(env);
+int i = 0;
+while (reader.hasNext()) {
+jobject row = reader.readNextRow();
+i++;
+carbonRow.setCarbonRow(row);
+
+printf("%s\t", carbonRow.getString(0));
+printf("%d\t", carbonRow.getInt(1));
+printf("%ld\t", carbonRow.getLong(2));
+printf("%s\t", carbonRow.getVarchar(3));
+jobjectArray arr = carbonRow.getArray(4);
+jsize length = env->GetArrayLength(arr);
+int j = 0;
+for (j = 0; j < length; j++) {
+jobject element = env->GetObjectArrayElement(arr, j);
+char *str = (char *) env->GetStringUTFChars((jstring) element, 
JNI_FALSE);
+printf("%s\t", str);
+}
+env->DeleteLocalRef(arr);
+printf("%d\t", carbonRow.getShort(5));
+printf("%d\t", carbonRow.getInt(6));
+printf("%ld\t", carbonRow.getLong(7));
+printf("%lf\t", carbonRow.getDouble(8));
+bool bool1 = carbonRow.getBoolean(9);
+if (bool1) {
+printf("true\t");
+} else {
+printf("false\t");
+}
+printf("%s\t", carbonRow.getDecimal(10));
+printf("%f\t", carbonRow.getFloat(11));
+printf("\n");
+env->DeleteLocalRef(row);
+}
+gettimeofday(&read, NULL);
+time = 1000000 * (read.tv_sec - start.tv_sec) + read.tv_usec - start.tv_usec;
+printf("total lines is %d: build time: %lf, read time is %lf s, average speed is %lf records/s\n",
+   i, buildTime, time / 1000000.0, i / (time / 1000000.0));
+
+reader.close();
+}
+
+/**
+ * read data from S3
+ * parameter is ak sk endpoint
+ *
+ * @param env jni env
+ * @param argv argument vector
+ * @return
+ */
+bool readFromS3ForBigData(JNIEnv *env, char **argv) {
--- End diff --

removed this test case


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229161289
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
  */
 bool readFromS3(JNIEnv *env, char *argv[]) {
 printf("\nRead data from S3:\n");
+struct timeval start, build, read;
--- End diff --

ok, optimized, please check.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r229155198
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java ---
@@ -100,4 +100,25 @@ public int getSize() {
 counter++;
 return row;
   }
+
+  /**
+   * read next batch
+   *
+   * @param batch batch size
+   * @return rows
+   */
+  public List<Object[]> nextBatch(int batch) {
+if (!hasNext()) {
+  throw new NoSuchElementException();
+}
+List<Object[]> row;
+if (counter + batch > rows.size()) {
+  row = rows.subList(counter, rows.size());
+  counter = counter + row.size();
--- End diff --

counter != row.size() before the change from readNextBatchRow(batch) to 
readNextBatchRow(): previously the batch sizes passed to withBatch(batch) and 
readNextBatchRow(batch) could differ. After the change they are the same, so 
counter = row.size().


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228935418
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1737,4 +1739,95 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 }
   }
 
+  @Test
+  public void testReadNextBatchRow() {
+String path = "./carbondata";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+"Hello#World#From#Carbon",
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withBatch(3)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] batch = reader.readNextBatchRow();
+
+for (int j = 0; j < batch.length; j++) {
--- End diff --

Need to validate whether the batch size is the same as we set, or smaller than 
that (for the last batch). It must not be greater than the batch size we set.


---
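
The check being asked for could look roughly like this inside the test's read loop (a sketch only; maxBatch mirrors the withBatch(3) setting in the test above, and Assert is the usual JUnit class):

    int maxBatch = 3;
    while (reader.hasNext()) {
      Object[] batch = reader.readNextBatchRow();
      // Never larger than the configured batch size;
      // only the final batch may come back smaller.
      Assert.assertTrue(batch.length <= maxBatch);
    }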


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228932987
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +93,20 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next batch row objects
+   */
+  public Object[] readNextBatchRow() throws Exception {
+validateReader();
+int batch = Integer.parseInt(CarbonProperties.getInstance()
--- End diff --

What if this property is not set? We get an NPE here.


---
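
One way to avoid that NPE is to read the property with an explicit default. A sketch, where the property key and default value are illustrative rather than the exact constants used in the PR:

    // Sketch: fall back to a default when the property is absent.
    // "carbon.detail.batch.size" and "100" are illustrative values here.
    String batchSize = CarbonProperties.getInstance()
        .getProperty("carbon.detail.batch.size", "100");
    int batch = Integer.parseInt(batchSize);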


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228932584
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
  */
 bool readFromS3(JNIEnv *env, char *argv[]) {
 printf("\nRead data from S3:\n");
+struct timeval start, build, read;
+gettimeofday(&start, NULL);
+
+CarbonReader reader;
+
+char *args[3];
+// "your access key"
+args[0] = argv[1];
+// "your secret key"
+args[1] = argv[2];
+// "your endPoint"
+args[2] = argv[3];
+
+reader.builder(env, "s3a://sdk/WriterOutput/carbondata", "test");
+reader.withHadoopConf("fs.s3a.access.key", argv[1]);
+reader.withHadoopConf("fs.s3a.secret.key", argv[2]);
+reader.withHadoopConf("fs.s3a.endpoint", argv[3]);
+reader.build();
+
+gettimeofday(&build, NULL);
+int time = 1000000 * (build.tv_sec - start.tv_sec) + build.tv_usec - start.tv_usec;
+double buildTime = time / 1000000.0;
+printf("build time: %lf s\n", time / 1000000.0);
+
+CarbonRow carbonRow(env);
+int i = 0;
+while (reader.hasNext()) {
+jobject row = reader.readNextRow();
+i++;
+carbonRow.setCarbonRow(row);
+
+printf("%s\t", carbonRow.getString(0));
+printf("%d\t", carbonRow.getInt(1));
+printf("%ld\t", carbonRow.getLong(2));
+printf("%s\t", carbonRow.getVarchar(3));
+jobjectArray arr = carbonRow.getArray(4);
+jsize length = env->GetArrayLength(arr);
+int j = 0;
+for (j = 0; j < length; j++) {
+jobject element = env->GetObjectArrayElement(arr, j);
+char *str = (char *) env->GetStringUTFChars((jstring) element, 
JNI_FALSE);
+printf("%s\t", str);
+}
+env->DeleteLocalRef(arr);
+printf("%d\t", carbonRow.getShort(5));
+printf("%d\t", carbonRow.getInt(6));
+printf("%ld\t", carbonRow.getLong(7));
+printf("%lf\t", carbonRow.getDouble(8));
+bool bool1 = carbonRow.getBoolean(9);
+if (bool1) {
+printf("true\t");
+} else {
+printf("false\t");
+}
+printf("%s\t", carbonRow.getDecimal(10));
+printf("%f\t", carbonRow.getFloat(11));
+printf("\n");
+env->DeleteLocalRef(row);
+}
+gettimeofday(&read, NULL);
+time = 1000000 * (read.tv_sec - start.tv_sec) + read.tv_usec - start.tv_usec;
+printf("total lines is %d: build time: %lf, read time is %lf s, average speed is %lf records/s\n",
+   i, buildTime, time / 1000000.0, i / (time / 1000000.0));
+
+reader.close();
+}
+
+/**
+ * read data from S3
+ * parameter is ak sk endpoint
+ *
+ * @param env jni env
+ * @param argv argument vector
+ * @return
+ */
+bool readFromS3ForBigData(JNIEnv *env, char **argv) {
--- End diff --

same as above


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228932392
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -220,6 +393,86 @@ bool tryCatchException(JNIEnv *env) {
  */
 bool readFromS3(JNIEnv *env, char *argv[]) {
 printf("\nRead data from S3:\n");
+struct timeval start, build, read;
--- End diff --

As we discussed in the previous PR, separate S3 test cases are not required. 
The only thing we do in an S3 test case is call the withHadoopConf API; users 
can call that themselves if they need to run in an S3 environment.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-29 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228926855
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java ---
@@ -100,4 +100,25 @@ public int getSize() {
 counter++;
 return row;
   }
+
+  /**
+   * read next batch
+   *
+   * @param batch batch size
+   * @return rows
+   */
+  public List<Object[]> nextBatch(int batch) {
+if (!hasNext()) {
+  throw new NoSuchElementException();
+}
+List<Object[]> row;
+if (counter + batch > rows.size()) {
+  row = rows.subList(counter, rows.size());
+  counter = counter + row.size();
--- End diff --

Isn't it counter = row.size()?

Because we are copying rows.size() - counter rows, so it is like counter = 
counter + (row.size() - counter)?


---
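
To make the counter arithmetic concrete, a worked example with illustrative numbers:

    // A RowBatch holding 10 rows, 8 already consumed, asked for a batch of 3.
    int rowsSize = 10, counter = 8, batch = 3;
    if (counter + batch > rowsSize) {
      int returned = rowsSize - counter; // subList(8, 10) returns 2 rows
      counter += returned;               // counter == 10 == rowsSize
      // counter += batch would overshoot to 11, past the end of the batch
    }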


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228480901
  
--- Diff: store/CSDK/main.cpp ---
@@ -99,21 +102,187 @@ bool readFromLocalWithoutProjection(JNIEnv *env) {
 printf("%s\t", carbonRow.getDecimal(10));
 printf("%f\t", carbonRow.getFloat(11));
 printf("\n");
+env->DeleteLocalRef(row);
+env->DeleteLocalRef(array1);
 }
 
 carbonReaderClass.close();
 }
 
+/**
+ * test next Row Performance
+ *
+ * @param env  jni env
+ * @return
+ */
+bool testNextRowPerformance(JNIEnv *env, char *path, int printNum, char 
*argv[], int argc) {
--- End diff --

Example code must be independent of the data: it should work for small data as 
well as big data. Also, as you know, we cannot keep a huge-data test case in 
automation, since the PR builder would take too long. So we avoid huge-data 
test cases; that is a DFX scenario.



---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228479559
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java
 ---
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import 
org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

**You can have reader benchmarking code,** 

but it should just be reader code that takes a path and prints the rows and 
the time taken. This example is extending data (writing data); that should not 
be there.

Also, a separate S3 example is not required for everything: the current 
examples only set the conf and it works. So don't add multiple examples either.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228477978
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderExampleForBigData.java
 ---
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import 
org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read example for big data
+ */
+public class SDKReaderExampleForBigData {
+  public static void main(String[] args) throws InterruptedException, 
InvalidLoadOptionException, IOException {
+System.out.println("start to read data");
+String path = "../../../../Downloads/carbon-data-big";
+if (args.length > 0) {
+  path = args[0];
+}
+double num = 1000000000.0;
+String originPath = "../../../../Downloads/carbon-data";
+String newPath = "../../../../Downloads/carbon-data-big";
+boolean writeNewData = false;
+if (writeNewData) {
+  extendData(originPath, newPath);
+}
+
+Configuration conf = new Configuration();
+if (args.length > 3) {
+  conf.set("fs.s3a.access.key", args[1]);
+  conf.set("fs.s3a.secret.key", args[2]);
+  conf.set("fs.s3a.endpoint", args[3]);
+}
+readNextBatchRow(path, num, conf, 10, 10);
+readNextRow(path, num, conf, 10);
+  }
+
+  public static void readNextRow(String path, double num, Configuration 
conf, int printNum) {
+System.out.println("readNextRow");
+try {
+  // Read data
+  Long startTime = System.nanoTime();
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withHadoopConf(conf)
+  .build();
+
+  Long startReadTime = System.nanoTime();
+  System.out.println("build time is " + (startReadTime - startTime) / 
num);
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+i++;
+if (i % printNum == 0) {
+  Long point = System.nanoTime();
+  System.out.print(i + ": time is " + (point - startReadTime) / num
+  + " s, speed is " + (i / ((point - startReadTime) / num)) + 
" records/s \t");
+  for (int j = 0; j < data.length; j++) {
+System.out.print(data[j] + "\t\t");
+  }
+  System.out.println();
+}
+  }
+  Long endReadTime = System.nanoTime();
+  System.out.println("total lines is " + i + ", build time is " + 
(startReadTime - startTime) / num
+  + " s, \ttotal read time is " + (endReadTime - startReadTime) / 
num
+  + " s, \taverage speed is " + (i / ((endReadTime - 
startReadTime) / num))
+  + " records/s.");
+  reader.close();
+} catch (Throwable e) {
+  e.printStackTrace();
+}
+  }
+
+  /**
+   * read next batch row
+   *
+   * @param path data path
+   * @param num  number for time
+   * @param conf configuration
+   * @param batchbatch size
+   * @param printNum print number for each batch
+   */
+  public static void readNextBatchRow(String path, double num, 
Configuration conf, int batch, int printNum) {
+System.out.println("readNextBatchRow");
+try {
+  // Read data
+  Long startTime = System.nanoTime();
+  CarbonReader reader = 

[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228469216
  
--- Diff: store/CSDK/main.cpp ---
@@ -99,21 +102,187 @@ bool readFromLocalWithoutProjection(JNIEnv *env) {
 printf("%s\t", carbonRow.getDecimal(10));
 printf("%f\t", carbonRow.getFloat(11));
 printf("\n");
+env->DeleteLocalRef(row);
+env->DeleteLocalRef(array1);
 }
 
 carbonReaderClass.close();
 }
 
+/**
+ * test next Row Performance
+ *
+ * @param env  jni env
+ * @return
+ */
+bool testNextRowPerformance(JNIEnv *env, char *path, int printNum, char 
*argv[], int argc) {
--- End diff --

I want to test with big data (more than 100 million rows). I changed the test 
case name; is that OK?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228469192
  
--- Diff: store/CSDK/main.cpp ---
@@ -224,8 +406,81 @@ bool readFromS3(JNIEnv *env, char *argv[]) {
 printf("%s\t", carbonRow.getDecimal(10));
 printf("%f\t", carbonRow.getFloat(11));
 printf("\n");
+env->DeleteLocalRef(row);
+env->DeleteLocalRef(array1);
+}
+gettimeofday(&read, NULL);
+time = 1000000 * (read.tv_sec - start.tv_sec) + read.tv_usec - start.tv_usec;
+printf("total lines is %d: build time: %lf, read time is %lf s, average speed is %lf records/s\n",
+   i, buildTime, time / 1000000.0, i / (time / 1000000.0));
+
+reader.close();
+}
+
+/**
+ * read data from S3
+ * parameter is ak sk endpoint
+ *
+ * @param env jni env
+ * @param argv argument vector
+ * @return
+ */
+bool readPerformanceFromS3(JNIEnv *env, char *argv[]) {
--- End diff --

I want to test with big data (more than 100 million rows). I changed the test 
case name; is that OK?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228461385
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +91,18 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next batch row objects
+   */
+  public Object[] readNextBatchRow(int batch) throws Exception {
--- End diff --

ok, optimized: removed the batch parameter from readNextBatchRow and kept it in withBatch


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228459548
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1723,4 +1723,93 @@ public void testReadNextCarbonRowWithProjection() {
 }
   }
 
+  @Test
+  public void testReadNextBatchRow() {
+String path = "./carbondata";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields)).build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+"Hello#World#From#Carbon",
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
--- End diff --

ok, done


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228458636
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -74,27 +75,41 @@ jobject CarbonReader::withHadoopConf(char *key, char 
*value) {
 return carbonReaderBuilderObject;
 }
 
+jobject CarbonReader::withBatch(int batch) {
+jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
+jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"withBatch",
+"(I)Lorg/apache/carbondata/sdk/file/CarbonReaderBuilder;");
+
+jvalue args[1];
+args[0].i = batch;
+carbonReaderBuilderObject = 
jniEnv->CallObjectMethodA(carbonReaderBuilderObject, buildID, args);
+return carbonReaderBuilderObject;
+}
+
 jobject CarbonReader::build() {
 jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
 jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"build",
 "()Lorg/apache/carbondata/sdk/file/CarbonReader;");
 carbonReaderObject = 
jniEnv->CallObjectMethod(carbonReaderBuilderObject, buildID);
+carbonReader = jniEnv->GetObjectClass(carbonReaderObject);
+hasNextID = jniEnv->GetMethodID(carbonReader, "hasNext", "()Z");
--- End diff --

this has been done in PR2792


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-26 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228457234
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -74,27 +75,41 @@ jobject CarbonReader::withHadoopConf(char *key, char 
*value) {
 return carbonReaderBuilderObject;
 }
 
+jobject CarbonReader::withBatch(int batch) {
+jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
+jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"withBatch",
--- End diff --

GetObjectClass shouldn't return null; we should check whether 
carbonReaderBuilderObject is null instead.
GetMethodID is checked.
CallObjectMethodA of withBatch won't throw an exception.



---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228392301
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
@@ -116,6 +116,20 @@ public void initialize(InputSplit inputSplit, 
TaskAttemptContext context)
 return readSupport.readRow(carbonIterator.next());
   }
 
+  /**
+   * get batch result
+   *
+   * @param batch batch size
+   * @return rows
+   */
+  public List getBatchValue(int batch) {
+rowCount += batch;
--- End diff --

ok, optimized


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228391972
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java
 ---
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import 
org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

We need to know the SDK reader performance; then we can measure the JNI 
performance loss by comparing it with the CSDK performance. Also, @KanakaKumar 
said we should test SDK reader performance in 1.5.1. 
The examples module currently has two benchmarks, ConcurrentQueryBenchmark and 
SimpleQueryBenchmark.

If some tests are missing, or more depth (IO time, pages scanned, and so on) 
or granularity is needed, we can add it.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228390686
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java ---
@@ -100,4 +100,24 @@ public int getSize() {
 counter++;
 return row;
   }
+
+  /**
+   * read next batch
+   *
+   * @param batch batch size
+   * @return rows
+   */
+  public List<Object[]> nextBatch(int batch) {
+if (!hasNext()) {
+  throw new NoSuchElementException();
+}
+List<Object[]> row;
+if (counter + batch > rows.size()) {
+  row = rows.subList(counter, rows.size());
+} else {
+  row = rows.subList(counter, counter + batch);
+}
+counter = counter + batch;
--- End diff --

ok, optimized


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228386483
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1616,7 +1616,6 @@ public boolean accept(File dir, String name) {
 assertEquals(RowUtil.getDouble(data, 4), ((double) i) / 2);
 assert (RowUtil.getBoolean(data, 5));
 assertEquals(RowUtil.getInt(data, 6), 17957);
-assertEquals(RowUtil.getLong(data, 7), 154991181400L);
--- End diff --

The value is different between the local machine and the CI machine.


---
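
A likely reason the long value is machine-dependent: the timestamp in the test data is parsed in the JVM's default time zone, so its epoch value shifts with the machine's zone. A minimal illustration:

    import java.text.ParseException;
    import java.text.SimpleDateFormat;

    public class ZoneDependentEpoch {
      public static void main(String[] args) throws ParseException {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Parsed in the JVM default time zone, so the result differs between
        // a local machine and the CI machine unless the zone is pinned.
        System.out.println(f.parse("2019-02-12 03:03:34").getTime());
      }
    }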


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228221442
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +91,18 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next batch row objects
+   */
+  public Object[] readNextBatchRow(int batch) throws Exception {
--- End diff --

Why do we need to pass the batch size here? It should return a batch of the 
same size as set via withBatch(int)?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228220014
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1723,4 +1723,93 @@ public void testReadNextCarbonRowWithProjection() {
 }
   }
 
+  @Test
+  public void testReadNextBatchRow() {
+String path = "./carbondata";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields)).build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+"Hello#World#From#Carbon",
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
--- End diff --

Need to set the batch size and test it, i.e. withBatch(int)?


---
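
Taken together, the review comments point at an API where the batch size is set once on the builder and then consumed without a parameter. A sketch based on the test code in this thread:

    // The batch size is configured once via withBatch(int);
    // readNextBatchRow() then returns at most that many rows per call.
    CarbonReader reader = CarbonReader
        .builder(path, "_temp")
        .withBatch(3)
        .build();
    while (reader.hasNext()) {
      Object[] batch = reader.readNextBatchRow();
      // process up to 3 rows per iteration
    }
    reader.close();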


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228214955
  
--- Diff: store/CSDK/main.cpp ---
@@ -224,8 +406,81 @@ bool readFromS3(JNIEnv *env, char *argv[]) {
 printf("%s\t", carbonRow.getDecimal(10));
 printf("%f\t", carbonRow.getFloat(11));
 printf("\n");
+env->DeleteLocalRef(row);
+env->DeleteLocalRef(array1);
+}
+gettimeofday(&read, NULL);
+time = 1000000 * (read.tv_sec - start.tv_sec) + read.tv_usec - start.tv_usec;
+printf("total lines is %d: build time: %lf, read time is %lf s, average speed is %lf records/s\n",
+   i, buildTime, time / 1000000.0, i / (time / 1000000.0));
+
+reader.close();
+}
+
+/**
+ * read data from S3
+ * parameter is ak sk endpoint
+ *
+ * @param env jni env
+ * @param argv argument vector
+ * @return
+ */
+bool readPerformanceFromS3(JNIEnv *env, char *argv[]) {
--- End diff --

All internal-comparison performance test cases are not needed.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228213436
  
--- Diff: store/CSDK/main.cpp ---
@@ -99,21 +102,187 @@ bool readFromLocalWithoutProjection(JNIEnv *env) {
 printf("%s\t", carbonRow.getDecimal(10));
 printf("%f\t", carbonRow.getFloat(11));
 printf("\n");
+env->DeleteLocalRef(row);
+env->DeleteLocalRef(array1);
 }
 
 carbonReaderClass.close();
 }
 
+/**
+ * test next Row Performance
+ *
+ * @param env  jni env
+ * @return
+ */
+bool testNextRowPerformance(JNIEnv *env, char *path, int printNum, char 
*argv[], int argc) {
--- End diff --

Same comment: this is an internal comparison. No need to add it to the example 
test cases.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228209499
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -74,27 +75,41 @@ jobject CarbonReader::withHadoopConf(char *key, char 
*value) {
 return carbonReaderBuilderObject;
 }
 
+jobject CarbonReader::withBatch(int batch) {
+jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
+jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"withBatch",
+"(I)Lorg/apache/carbondata/sdk/file/CarbonReaderBuilder;");
+
+jvalue args[1];
+args[0].i = batch;
+carbonReaderBuilderObject = 
jniEnv->CallObjectMethodA(carbonReaderBuilderObject, buildID, args);
+return carbonReaderBuilderObject;
+}
+
 jobject CarbonReader::build() {
 jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
 jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"build",
 "()Lorg/apache/carbondata/sdk/file/CarbonReader;");
 carbonReaderObject = 
jniEnv->CallObjectMethod(carbonReaderBuilderObject, buildID);
+carbonReader = jniEnv->GetObjectClass(carbonReaderObject);
+hasNextID = jniEnv->GetMethodID(carbonReader, "hasNext", "()Z");
--- End diff --

This filling can be moved down into those functions: just check whether the 
value is null or 0 and fill it only then, so that we fill it only once.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228206123
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -74,27 +75,41 @@ jobject CarbonReader::withHadoopConf(char *key, char 
*value) {
 return carbonReaderBuilderObject;
 }
 
+jobject CarbonReader::withBatch(int batch) {
+jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
+jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"withBatch",
--- End diff --

Need to add validation: GetObjectClass and GetMethodID can return null.

Also need to propagate the Java exception to C++ for CallObjectMethodA.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228202864
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
@@ -116,6 +116,20 @@ public void initialize(InputSplit inputSplit, 
TaskAttemptContext context)
 return readSupport.readRow(carbonIterator.next());
   }
 
+  /**
+   * get batch result
+   *
+   * @param batch batch size
+   * @return rows
+   */
+  public List getBatchValue(int batch) {
+rowCount += batch;
--- End diff --

We may not get a complete batch of data from RowBatch.nextBatch(); it can also 
return fewer rows. So incrementing rowCount by the batch size is wrong; 
we need to increment it by the number of rows actually returned from the batch iterator.


---
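
A sketch of the fix being suggested, assuming carbonIterator here is the ChunkRowIterator exposing the nextBatch(int) shown earlier in this thread (not necessarily the merged code):

    public List<Object[]> getBatchValue(int batch) {
      // nextBatch may return fewer rows than requested (the last batch),
      // so count what actually came back instead of adding `batch`.
      List<Object[]> batchRows = carbonIterator.nextBatch(batch);
      rowCount += batchRows.size();
      return batchRows;
    }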


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228191657
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java
 ---
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import 
org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

This call is not required. Please remove it.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228183334
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/RowBatch.java ---
@@ -100,4 +100,24 @@ public int getSize() {
 counter++;
 return row;
   }
+
+  /**
+   * read next batch
+   *
+   * @param batch batch size
+   * @return rows
+   */
+  public List<Object[]> nextBatch(int batch) {
+if (!hasNext()) {
+  throw new NoSuchElementException();
+}
+List<Object[]> row;
+if (counter + batch > rows.size()) {
+  row = rows.subList(counter, rows.size());
+} else {
+  row = rows.subList(counter, counter + batch);
+}
+counter = counter + batch;
--- End diff --

What if the code enters the if check (line 116)? Incrementing the counter by 
the batch size is wrong, because we fetched less data than the batch size 
(only up to rows.size()).


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-25 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228156985
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1616,7 +1616,6 @@ public boolean accept(File dir, String name) {
 assertEquals(RowUtil.getDouble(data, 4), ((double) i) / 2);
 assert (RowUtil.getBoolean(data, 5));
 assertEquals(RowUtil.getInt(data, 6), 17957);
-assertEquals(RowUtil.getLong(data, 7), 154991181400L);
--- End diff --

Why was this validation removed?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228028896
  
--- Diff: README.md ---
@@ -61,6 +61,7 @@ CarbonData is built using Apache Maven, to [build 
CarbonData](https://github.com
  * [CarbonData Pre-aggregate 
DataMap](https://github.com/apache/carbondata/blob/master/docs/preaggregate-datamap-guide.md)
 
  * [CarbonData Timeseries 
DataMap](https://github.com/apache/carbondata/blob/master/docs/timeseries-datamap-guide.md)
 
 * [SDK 
Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md) 
+* [CSDK 
Guide](https://github.com/apache/carbondata/blob/master/docs/CSDK-guide.md)
--- End diff --

ok, done


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228028346
  
--- Diff: store/CSDK/main.cpp ---
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include "CarbonReader.h"
+#include "CarbonRow.h"
+#include 
 
 using namespace std;
--- End diff --

This is the main file in C/C++, but it is only for tests. In the future, CSDK 
will adopt a test framework (such as googletest) to replace main.cpp.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228026273
  
--- Diff: store/CSDK/CMakeLists.txt ---
@@ -1,17 +1,17 @@
-cmake_minimum_required (VERSION 2.8)
-project (CJDK)
+cmake_minimum_required(VERSION 2.8)
--- End diff --

ok, added


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228025690
  
--- Diff: store/CSDK/main.cpp ---
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include "CarbonReader.h"
+#include "CarbonRow.h"
+#include 
--- End diff --

ok, I also changed the others.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228025597
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -17,6 +17,7 @@
 
 #include "CarbonReader.h"
 #include 
+#include 
--- End diff --

ok, done


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228025429
  
--- Diff: store/CSDK/CMakeLists.txt ---
@@ -1,17 +1,17 @@
-cmake_minimum_required (VERSION 2.8)
-project (CJDK)
+cmake_minimum_required(VERSION 2.8)
--- End diff --

I think there is no need to add a license header in this file.
CMakeLists.txt is like pom.xml; it's not a code file. 
The TensorFlow and Caffe projects also don't add a license header to 
CMakeLists.txt.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r228024318
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java
 ---
@@ -74,4 +76,13 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
 return currentChunk.next();
   }
 
+  /**
+   * get
--- End diff --

ok, optimized


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227681835
  
--- Diff: README.md ---
@@ -61,6 +61,7 @@ CarbonData is built using Apache Maven, to [build 
CarbonData](https://github.com
  * [CarbonData Pre-aggregate 
DataMap](https://github.com/apache/carbondata/blob/master/docs/preaggregate-datamap-guide.md)
 
  * [CarbonData Timeseries 
DataMap](https://github.com/apache/carbondata/blob/master/docs/timeseries-datamap-guide.md)
 
 * [SDK 
Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md) 
+* [CSDK 
Guide](https://github.com/apache/carbondata/blob/master/docs/CSDK-guide.md)
--- End diff --

I think it is better to call it the C++ SDK Guide, since the SDK is for C++.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227681593
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1521,4 +1521,294 @@ public boolean accept(File dir, String name) {
   e.printStackTrace();
 }
   }
+
+   @Test
--- End diff --

Is there a test case in CPP?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227681343
  
--- Diff: store/CSDK/main.cpp ---
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include "CarbonReader.h"
+#include "CarbonRow.h"
+#include 
 
 using namespace std;
--- End diff --

Why is this file called main.cpp?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227681257
  
--- Diff: store/CSDK/main.cpp ---
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include "CarbonReader.h"
+#include "CarbonRow.h"
+#include 
--- End diff --

Move before "CarbonReader.h"


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227680646
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -17,6 +17,7 @@
 
 #include "CarbonReader.h"
 #include 
+#include 
--- End diff --

I think the system header files should come before the carbon header file, so 
"CarbonReader.h" should be moved down.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227680374
  
--- Diff: store/CSDK/CMakeLists.txt ---
@@ -1,17 +1,17 @@
-cmake_minimum_required (VERSION 2.8)
-project (CJDK)
+cmake_minimum_required(VERSION 2.8)
--- End diff --

Shouldn't we add a license header?


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-24 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r227679865
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java
 ---
@@ -74,4 +76,13 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
 return currentChunk.next();
   }
 
+  /**
+   * get
--- End diff --

The description is not correct.


---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-19 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r226565398
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java
 ---
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import 
org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

1. The write code is only for preparing the data to read; this PR didn't test 
write performance. 
I will write an SDKWriterBenchmark for writing data after CSDK supports 
writing carbondata.

2. The data is from the cloud stream, and this PR will enlarge the data.

---


[GitHub] carbondata pull request #2816: [CARBONDATA-3003] Suppor read batch row in CS...

2018-10-19 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2816#discussion_r226558824
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/benchmark/SDKReaderBenchmark.java
 ---
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+
+import 
org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.sdk.file.*;
+
+/**
+ * Test SDK read performance
+ */
+public class SDKReaderBenchmark {
--- End diff --

1. It seems this class is not only for the reader but also for the writer. If 
so, please fix the class name; if not, I think you can include them in one 
class.
2. For a benchmark, I don't see how the data is generated.



---