[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-08 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r200843765
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -53,11 +60,75 @@ public int fillVector(int[] filteredRowId, 
ColumnVectorInfo[] vectorInfo, int ch
 throw new UnsupportedOperationException("internal error");
   }
 
-  @Override
-  public byte[] getChunkData(int rowId) {
-return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+DataType srcDataType = columnPage.getColumnSpec().getSchemaDataType();
+DataType targetDataType = columnPage.getDataType();
+if (columnPage.getNullBits().get(rowId)) {
+  // if this row is null, return default null represent in byte array
+  return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+}
+if ((columnType == ColumnType.COMPLEX_PRIMITIVE) && 
this.isAdaptiveComplexPrimitive()) {
+  if (srcDataType == DataTypes.DOUBLE || srcDataType == 
DataTypes.FLOAT) {
+double doubleData = columnPage.getDouble(rowId);
+if (srcDataType == DataTypes.FLOAT) {
+  float out = (float) doubleData;
--- End diff --

Converting to the actual type (float) and then getting its bytes adds one additional 
conversion per row. Can we avoid this by extracting/copying only the required bytes based on the type?
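
Something along these lines, as a rough sketch only (whether ColumnPage#getFloat exists and 
can read the adaptively stored page without the same conversion internally would need checking):

```java
// Pick the accessor by source type so only the needed bytes are produced per row,
// instead of always reading a double and then casting it down to float.
private byte[] getFloatingBytes(ColumnPage columnPage, int rowId, DataType srcDataType) {
  if (srcDataType == DataTypes.FLOAT) {
    return ByteUtil.toBytes(columnPage.getFloat(rowId));   // 4 bytes
  }
  return ByteUtil.toBytes(columnPage.getDouble(rowId));    // 8 bytes
}
```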


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-08 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r200843282
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
 ---
@@ -359,38 +412,36 @@ public void freeMemory() {
 }
   }
 
-  @Override
-  public void convertValue(ColumnPageValueConverter codec) {
-int pageSize = getPageSize();
+  @Override public void convertValue(ColumnPageValueConverter codec) {
 if (dataType == DataTypes.BYTE) {
-  for (long i = 0; i < pageSize; i++) {
+  for (long i = 0; i < totalLength / ByteUtil.SIZEOF_BYTE; i++) {
--- End diff --

The for-loop end condition (totalLength / ByteUtil.SIZEOF_BYTE) is evaluated 
for every row. If we extract the page-size computation into a method, we can 
avoid this.
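
Rough sketch (the bound could also live in a small helper method; each data type would use 
its own SIZEOF_* divisor):

```java
@Override public void convertValue(ColumnPageValueConverter codec) {
  if (dataType == DataTypes.BYTE) {
    // Compute the iteration bound once instead of re-evaluating the division per row.
    long endLoop = totalLength / ByteUtil.SIZEOF_BYTE;
    for (long i = 0; i < endLoop; i++) {
      // ... existing per-row conversion
    }
  }
  // ... other data types
}
```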


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-08 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r200842962
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -53,11 +60,75 @@ public int fillVector(int[] filteredRowId, 
ColumnVectorInfo[] vectorInfo, int ch
 throw new UnsupportedOperationException("internal error");
   }
 
-  @Override
-  public byte[] getChunkData(int rowId) {
-return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+DataType srcDataType = columnPage.getColumnSpec().getSchemaDataType();
+DataType targetDataType = columnPage.getDataType();
+if (columnPage.getNullBits().get(rowId)) {
+  // if this row is null, return default null represent in byte array
+  return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+}
+if ((columnType == ColumnType.COMPLEX_PRIMITIVE) && 
this.isAdaptiveComplexPrimitive()) {
+  if (srcDataType == DataTypes.DOUBLE || srcDataType == 
DataTypes.FLOAT) {
+double doubleData = columnPage.getDouble(rowId);
+if (srcDataType == DataTypes.FLOAT) {
+  float out = (float) doubleData;
+  return ByteUtil.toBytes(out);
+} else {
+  return ByteUtil.toBytes(doubleData);
+}
+  } else if (DataTypes.isDecimal(srcDataType)) {
+throw new RuntimeException("unsupported type: " + srcDataType);
+  } else if ((srcDataType == DataTypes.BYTE) ||
+  (srcDataType == DataTypes.BOOLEAN) ||
+  (srcDataType == DataTypes.SHORT) ||
+  (srcDataType == DataTypes.SHORT_INT) ||
+  (srcDataType == DataTypes.INT) ||
+  (srcDataType == DataTypes.LONG) ||
+  (srcDataType == DataTypes.TIMESTAMP)) {
+long longData = columnPage.getLong(rowId);
--- End diff --

Should we read the bytes from the column page based on its type? Otherwise, for small 
types like byte and short, reading a long would consume 8 bytes from the page, 
which leads to wrong data.
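
A rough sketch of reading by the stored width instead (dispatching here on the page's data 
type; whether the page type or the schema type is the right discriminator depends on how 
the adaptive codec stores the values):

```java
// Read with the accessor matching the stored width so a byte/short page is not over-read.
long longData;
if (targetDataType == DataTypes.BYTE) {
  longData = columnPage.getByte(rowId);
} else if (targetDataType == DataTypes.SHORT) {
  longData = columnPage.getShort(rowId);
} else if (targetDataType == DataTypes.INT) {
  longData = columnPage.getInt(rowId);
} else {
  longData = columnPage.getLong(rowId);
}
```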


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-05 Thread sounakr
GitHub user sounakr reopened a pull request:

https://github.com/apache/carbondata/pull/2417

[WIP][Complex Column Enhancements]Primitive DataType Adaptive Encoding

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sounakr/incubator-carbondata 
primitive_adaptive

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2417


commit 546761fc2fba34b7694b9d127373ab11e792bb91
Author: ajantha-bhat 
Date:   2018-07-03T13:35:09Z

backup

commit dc35518631566dc7197c64ca9866e2c0971f2322
Author: ajantha-bhat 
Date:   2018-07-03T17:06:49Z

some more

commit e771e86ff0caf0ae74b3cfeea17c79de845468d8
Author: sounakr 
Date:   2018-07-04T04:13:39Z

Read

commit 14c387a518b2af904f75e30f855e4340280bbc24
Author: ajantha-bhat 
Date:   2018-07-04T04:52:26Z

fix negative array size issue

commit f65742253c0322b357522a759f062502a902e361
Author: ajantha-bhat 
Date:   2018-07-04T07:07:01Z

TODO: revert CCC and test case change

commit 396efbff6a1c10f0c5020836fb669a51f151db62
Author: ajantha-bhat 
Date:   2018-07-04T10:24:04Z

struct of int

commit 26c32ce7d8b3266c883d2f21d9f6fe02e6370141
Author: sounakr 
Date:   2018-07-04T09:38:56Z

Safe Page Changes

commit a0c2c60af324b24bbd111b9f62a1a7acdc698982
Author: sounakr 
Date:   2018-07-04T10:25:07Z

L1

commit dfaedea96449f07fed7a36ea7f47e54b44d83cfc
Author: sounakr 
Date:   2018-07-04T12:01:45Z

Unsafe Fix Changes

commit 210c5ab733ceeafdd3bc1593995da39acb0294a7
Author: ajantha-bhat 
Date:   2018-07-04T12:41:43Z

fixed array type

commit ce35af9fd9fb23e191f11d6d0f441ac9c8068147
Author: ajantha-bhat 
Date:   2018-07-04T15:07:15Z

issue fixes

commit 7eed4c75ea17197c602373ceab3fb6185e1d261b
Author: ajantha-bhat 
Date:   2018-07-04T15:36:13Z

unsafe issue fix

commit b244b9b62ad448c862a1fddfb19126c8e3f22dfd
Author: ajantha-bhat 
Date:   2018-07-04T15:51:06Z

fix style

commit 16d4280cd021ee389e538e23bcf1f62f6e849e7b
Author: ajantha-bhat 
Date:   2018-07-05T02:28:09Z

compilation fix

commit a37ee3dff7f9c43ed88d4205f72f8ebbc6a73c3a
Author: ajantha-bhat 
Date:   2018-07-05T02:45:06Z

null value

commit be861f3b1c7b748605c5be3b0a55d241c5de8394
Author: ajantha-bhat 
Date:   2018-07-05T05:16:53Z

refactoring changes

commit 19ec84d82e27e4ed5f23f8c12bec1b9d507e2ec6
Author: sounakr 
Date:   2018-07-05T10:47:56Z

Float DataType Support

commit f8b5b1adb21b1e42f23e30bd469730c23972ec35
Author: sounakr 
Date:   2018-07-05T11:30:26Z

Refactor

commit 2882c17552f81c821e43ccd6cda79feba7d60873
Author: ajantha-bhat 
Date:   2018-07-05T11:33:14Z

clean up

commit a9e8141b48eda7cabcd4748980196b64ab2cb685
Author: sounakr 
Date:   2018-07-05T15:15:38Z

Adaptive Complex

commit 663f0296dbee80efca0fda3fe39b91d3a9ad57c4
Author: sounakr 
Date:   2018-07-05T16:47:12Z

TimeStamp Adaptive and Date Block




---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-05 Thread sounakr
Github user sounakr closed the pull request at:

https://github.com/apache/carbondata/pull/2417


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-05 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r200320193
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3177,4 +3177,34 @@ public static void 
setLocalDictColumnsToWrapperSchema(List columns
 }
 return columnLocalDictGenMap;
   }
+
+  public static DataType getMappingDataType(String type) {
--- End diff --

done


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-05 Thread dhatchayani
Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r200307851
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3177,4 +3177,34 @@ public static void 
setLocalDictColumnsToWrapperSchema(List columns
 }
 return columnLocalDictGenMap;
   }
+
+  public static DataType getMappingDataType(String type) {
--- End diff --

Please reuse 
org.apache.carbondata.core.util.DataTypeUtil#valueOf(java.lang.String) instead; 
this seems to be a duplicate method.
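
i.e. roughly (assuming the incoming `type` string uses the same names that 
DataTypeUtil#valueOf understands):

```java
// Reuse the existing utility rather than duplicating the string-to-type mapping.
DataType dataType = DataTypeUtil.valueOf(type);
```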


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-02 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199482930
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ---
@@ -161,14 +174,16 @@ private static DataType fitLongMinMax(long max, long 
min) {
   }
 
   private static DataType fitMinMax(DataType dataType, Object max, Object 
min) {
-if (dataType == DataTypes.BYTE) {
+if ((dataType == DataTypes.BYTE) || (dataType == DataTypes.BOOLEAN)) {
--- End diff --

Use switch instead of if-else.


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-02 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199481217
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +127,70 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+throw new UnsupportedOperationException(
+"internal error: should be called for only dictionary columns");
   }
 
   @Override
   public void freeMemory() {
 
   }
 
+  private void fillData(int[] rowMapping, ColumnVectorInfo 
columnVectorInfo,
+  CarbonColumnVector vector) {
+int offsetRowId = columnVectorInfo.offset;
+int vectorOffset = columnVectorInfo.vectorOffset;
+int maxRowId = offsetRowId + columnVectorInfo.size;
+BitSet nullBitSet = columnPage.getNullBits();
+TableSpec.ColumnSpec columnSpec = columnPage.getColumnSpec();
+if (columnSpec.getColumnType() == PLAIN_VALUE) {
+  for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+if (nullBitSet.get(currentRowId)) {
+  // to handle the null values
+  vector.putNull(vectorOffset++);
+} else {
+  if (columnSpec.getSchemaDataType() == DataTypes.STRING) {
+byte[] data = columnPage.getBytes(currentRowId);
+if (ByteUtil.UnsafeComparer.INSTANCE
+.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 
data)) {
+  vector.putNull(vectorOffset++);
+} else {
+  vector.putBytes(vectorOffset++, 0, data.length, data);
+}
+  } else if (columnSpec.getSchemaDataType() == DataTypes.BOOLEAN) {
+boolean data = columnPage.getBoolean(currentRowId);
+vector.putBoolean(vectorOffset++, data);
+  } else if (columnSpec.getSchemaDataType() == DataTypes.SHORT) {
+short data = columnPage.getShort(currentRowId);
+vector.putShort(vectorOffset++, data);
--- End diff --

Use switch instead of if-else.


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-07-02 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199481192
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -30,32 +40,72 @@ public ColumnPageWrapper(ColumnPage columnPage) {
 this.columnPage = columnPage;
   }
 
+  public ColumnPage getColumnPage() {
+return columnPage;
+  }
+
   @Override
   public int fillRawData(int rowId, int offset, byte[] data, 
KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+throw new UnsupportedOperationException(
+"internal error: should be called for only dictionary columns");
   }
 
   @Override
   public int fillSurrogateKey(int rowId, int chunkIndex, int[] 
outputSurrogateKey,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+throw new UnsupportedOperationException(
+"internal error: should be called for only dictionary columns");
   }
 
   @Override
   public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// fill the vector with data in column page
+ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+CarbonColumnVector vector = columnVectorInfo.vector;
+fillData(null, columnVectorInfo, vector);
+return chunkIndex + 1;
   }
 
+
   @Override
   public int fillVector(int[] filteredRowId, ColumnVectorInfo[] 
vectorInfo, int chunkIndex,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+CarbonColumnVector vector = columnVectorInfo.vector;
+fillData(filteredRowId, columnVectorInfo, vector);
+return chunkIndex + 1;
   }
 
-  @Override
-  public byte[] getChunkData(int rowId) {
-return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+// TODO: No need to convert to Byte array, handle like measure
+// But interface currently doesn't support, need to add new interface.
+if (columnType == ColumnType.PLAIN_VALUE) {
+  if (columnPage.getNullBits().get(rowId)) {
+// if this row is null, return default null represent in byte array
+return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+  }
+  if (columnPage.getDataType() == DataTypes.BYTE) {
+byte byteData = columnPage.getByte(rowId);
+return ByteUtil.toBytes(byteData);
+  } else if (columnPage.getDataType() == DataTypes.SHORT) {
--- End diff --

Use switch instead of if-else.


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-29 Thread sounakr
Github user sounakr commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199081933
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -405,13 +446,30 @@ public void putData(int rowId, Object value) {
 } else if (dataType == DataTypes.STRING
 || dataType == DataTypes.BYTE_ARRAY
 || dataType == DataTypes.VARCHAR) {
-  putBytes(rowId, (byte[]) value);
-  statsCollector.update((byte[]) value);
+  byte[] valueWithLength;
+  if (columnSpec.getColumnType() != ColumnType.PLAIN_VALUE) {
+// This case is for GLOBAL_DICTIONARY and DIRECT_DICTIONARY. In 
this
+// scenario the dataType is BYTE_ARRAY and passed bytearray should
+// be saved.
+putBytes(rowId, (byte[]) value);
+statsCollector.update((byte[]) value);
+  } else {
+if (dataType == DataTypes.VARCHAR) {
+  // Add length and then add the data.
+  valueWithLength = addIntLengthToByteArray((byte[]) value);
+} else {
+  valueWithLength = addShortLengthToByteArray((byte[]) value);
+}
+putBytes(rowId, valueWithLength);
+statsCollector.update((byte[]) valueWithLength);
+  }
 } else {
   throw new RuntimeException("unsupported data type: " + dataType);
 }
   }
 
+
--- End diff --

Done


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-29 Thread sounakr
Github user sounakr commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199081333
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ---
@@ -58,16 +67,36 @@ public static EncodingFactory getInstance() {
   @Override
   public ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, 
ColumnPage inputPage) {
 // TODO: add log
+ColumnPageEncoder pageEncoder = null;
 if (columnSpec instanceof TableSpec.MeasureSpec) {
   return createEncoderForMeasure(inputPage);
-} else {
-  if (newWay) {
-return createEncoderForDimension((TableSpec.DimensionSpec) 
columnSpec, inputPage);
-  } else {
-assert columnSpec instanceof TableSpec.DimensionSpec;
+} else if (columnSpec instanceof TableSpec.DimensionSpec) {
+  pageEncoder = createCodecForDimension((TableSpec.DimensionSpec) 
columnSpec, inputPage);
+  if (pageEncoder == null) {
 return createEncoderForDimensionLegacy((TableSpec.DimensionSpec) 
columnSpec);
   }
 }
+return pageEncoder;
+  }
+
+  private ColumnPageEncoder 
createCodecForDimension(TableSpec.DimensionSpec columnSpec,
+  ColumnPage inputPage) {
+switch (columnSpec.getColumnType()) {
+  case PLAIN_VALUE:
+if ((inputPage.getDataType() == DataTypes.BYTE) || 
(inputPage.getDataType()
+== DataTypes.SHORT) || (inputPage.getDataType() == 
DataTypes.INT) || (
+inputPage.getDataType() == DataTypes.LONG)) {
+  return 
selectCodecByAlgorithmForIntegral(inputPage.getStatistics()).createEncoder(null);
+} else if ((inputPage.getDataType() == DataTypes.FLOAT) || 
(inputPage.getDataType()
+== DataTypes.DOUBLE)) {
+  return 
selectCodecByAlgorithmForFloating(inputPage.getStatistics()).createEncoder(null);
+} else if (inputPage.getDataType() == DataTypes.STRING) {
+  // TODO. Currently let string go through legacy encoding. Later 
will change the encoding.
+  return null;
+}
+break;
+}
+return null;
   }
 
   private ColumnPageEncoder 
createEncoderForDimension(TableSpec.DimensionSpec columnSpec,
--- End diff --

Done


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-29 Thread sounakr
Github user sounakr commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199081394
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -436,10 +436,11 @@ public static boolean isFixedSizeDataType(DataType 
dataType) {
*
* @param dataInBytesdata
* @param actualDataType actual data type
+   * @param isTimeStampConversion
* @return actual data after conversion
*/
   public static Object getDataBasedOnDataTypeForNoDictionaryColumn(byte[] 
dataInBytes,
-  DataType actualDataType) {
+  DataType actualDataType, boolean isTimeStampConversion) {
--- End diff --

Done


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-29 Thread sounakr
Github user sounakr commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199081355
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java 
---
@@ -354,6 +341,18 @@ public EncodedTablePage getEncodedTablePage() {
   .getColumnType());
   }
 }
+//for (int i = 0; i < dimensionPages.length; i++) {
--- End diff --

Done


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-29 Thread sounakr
Github user sounakr commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r199081296
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ---
@@ -38,6 +38,15 @@
 import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import 
org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
 
+import static 
org.apache.carbondata.core.metadata.datatype.DataTypes.BOOLEAN;
--- End diff --

Done


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198877123
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -436,10 +436,11 @@ public static boolean isFixedSizeDataType(DataType 
dataType) {
*
* @param dataInBytesdata
* @param actualDataType actual data type
+   * @param isTimeStampConversion
* @return actual data after conversion
*/
   public static Object getDataBasedOnDataTypeForNoDictionaryColumn(byte[] 
dataInBytes,
-  DataType actualDataType) {
+  DataType actualDataType, boolean isTimeStampConversion) {
--- End diff --

Add one more overloaded method to pass `isTimeStampConversion`.
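
A minimal sketch of such an overload (the delegated default value here is an assumption, 
not from the patch):

```java
// Keeps the existing two-argument signature for current callers and delegates
// to the new three-argument method.
public static Object getDataBasedOnDataTypeForNoDictionaryColumn(byte[] dataInBytes,
    DataType actualDataType) {
  return getDataBasedOnDataTypeForNoDictionaryColumn(dataInBytes, actualDataType, true);
}
```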


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198877038
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java 
---
@@ -354,6 +341,18 @@ public EncodedTablePage getEncodedTablePage() {
   .getColumnType());
   }
 }
+//for (int i = 0; i < dimensionPages.length; i++) {
--- End diff --

remove commented code


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198874999
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ---
@@ -58,16 +67,36 @@ public static EncodingFactory getInstance() {
   @Override
   public ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, 
ColumnPage inputPage) {
 // TODO: add log
+ColumnPageEncoder pageEncoder = null;
 if (columnSpec instanceof TableSpec.MeasureSpec) {
   return createEncoderForMeasure(inputPage);
-} else {
-  if (newWay) {
-return createEncoderForDimension((TableSpec.DimensionSpec) 
columnSpec, inputPage);
-  } else {
-assert columnSpec instanceof TableSpec.DimensionSpec;
+} else if (columnSpec instanceof TableSpec.DimensionSpec) {
+  pageEncoder = createCodecForDimension((TableSpec.DimensionSpec) 
columnSpec, inputPage);
+  if (pageEncoder == null) {
 return createEncoderForDimensionLegacy((TableSpec.DimensionSpec) 
columnSpec);
   }
 }
+return pageEncoder;
+  }
+
+  private ColumnPageEncoder 
createCodecForDimension(TableSpec.DimensionSpec columnSpec,
+  ColumnPage inputPage) {
+switch (columnSpec.getColumnType()) {
+  case PLAIN_VALUE:
+if ((inputPage.getDataType() == DataTypes.BYTE) || 
(inputPage.getDataType()
+== DataTypes.SHORT) || (inputPage.getDataType() == 
DataTypes.INT) || (
+inputPage.getDataType() == DataTypes.LONG)) {
+  return 
selectCodecByAlgorithmForIntegral(inputPage.getStatistics()).createEncoder(null);
+} else if ((inputPage.getDataType() == DataTypes.FLOAT) || 
(inputPage.getDataType()
+== DataTypes.DOUBLE)) {
+  return 
selectCodecByAlgorithmForFloating(inputPage.getStatistics()).createEncoder(null);
+} else if (inputPage.getDataType() == DataTypes.STRING) {
+  // TODO. Currently let string go through legacy encoding. Later 
will change the encoding.
+  return null;
+}
+break;
+}
+return null;
   }
 
   private ColumnPageEncoder 
createEncoderForDimension(TableSpec.DimensionSpec columnSpec,
--- End diff --

remove it, nobody uses it


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198874161
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ---
@@ -38,6 +38,15 @@
 import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import 
org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
 
+import static 
org.apache.carbondata.core.metadata.datatype.DataTypes.BOOLEAN;
--- End diff --

just add `import static 
org.apache.carbondata.core.metadata.datatype.DataTypes.*`


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198873521
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java
 ---
@@ -283,16 +283,16 @@ public byte getByte(int rowId) {
 
   @Override
   public short getShort(int rowId) {
--- End diff --

Check for float also


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198872182
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -405,13 +446,30 @@ public void putData(int rowId, Object value) {
 } else if (dataType == DataTypes.STRING
 || dataType == DataTypes.BYTE_ARRAY
 || dataType == DataTypes.VARCHAR) {
-  putBytes(rowId, (byte[]) value);
-  statsCollector.update((byte[]) value);
+  byte[] valueWithLength;
+  if (columnSpec.getColumnType() != ColumnType.PLAIN_VALUE) {
+// This case is for GLOBAL_DICTIONARY and DIRECT_DICTIONARY. In 
this
+// scenario the dataType is BYTE_ARRAY and passed bytearray should
+// be saved.
+putBytes(rowId, (byte[]) value);
+statsCollector.update((byte[]) value);
+  } else {
+if (dataType == DataTypes.VARCHAR) {
+  // Add length and then add the data.
+  valueWithLength = addIntLengthToByteArray((byte[]) value);
+} else {
+  valueWithLength = addShortLengthToByteArray((byte[]) value);
+}
+putBytes(rowId, valueWithLength);
+statsCollector.update((byte[]) valueWithLength);
+  }
 } else {
   throw new RuntimeException("unsupported data type: " + dataType);
 }
   }
 
+
--- End diff --

remove unnecessary gaps


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198872075
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java ---
@@ -405,13 +446,30 @@ public void putData(int rowId, Object value) {
 } else if (dataType == DataTypes.STRING
 || dataType == DataTypes.BYTE_ARRAY
 || dataType == DataTypes.VARCHAR) {
-  putBytes(rowId, (byte[]) value);
-  statsCollector.update((byte[]) value);
+  byte[] valueWithLength;
+  if (columnSpec.getColumnType() != ColumnType.PLAIN_VALUE) {
+// This case is for GLOBAL_DICTIONARY and DIRECT_DICTIONARY. In 
this
+// scenario the dataType is BYTE_ARRAY and passed bytearray should
+// be saved.
+putBytes(rowId, (byte[]) value);
+statsCollector.update((byte[]) value);
+  } else {
+if (dataType == DataTypes.VARCHAR) {
+  // Add length and then add the data.
+  valueWithLength = addIntLengthToByteArray((byte[]) value);
+} else {
+  valueWithLength = addShortLengthToByteArray((byte[]) value);
+}
+putBytes(rowId, valueWithLength);
+statsCollector.update((byte[]) valueWithLength);
--- End diff --

Move down


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198868873
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+if (columnPage.getColumnSpec().getColumnType() == 
ColumnType.DIRECT_DICTIONARY) {
+  int surrogate = columnPage.getInt(rowId);
+  int input = ByteBuffer.wrap(compareValue).getInt();
+  return surrogate - input;
+} else {
+  byte[] data;
+  if (columnPage.getDataType() == DataTypes.INT) {
--- End diff --

First convert `compareValue` to the respective data type and then compare it with the 
actual value.
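
A rough sketch of that direction for the INT case (using the same accessors that already 
appear in this method):

```java
if (columnPage.getDataType() == DataTypes.INT) {
  // Decode compareValue into the page's type and compare values directly,
  // instead of serialising the stored value back to bytes.
  int actual = columnPage.getInt(rowId);
  int input = ByteBuffer.wrap(compareValue).getInt();
  return Integer.compare(actual, input);
} else if (columnPage.getDataType() == DataTypes.STRING) {
  byte[] data = columnPage.getBytes(rowId);
  return ByteUtil.UnsafeComparer.INSTANCE
      .compareTo(data, 0, data.length, compareValue, 0, compareValue.length);
}
```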


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198867991
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+if (columnPage.getColumnSpec().getColumnType() == 
ColumnType.DIRECT_DICTIONARY) {
--- End diff --

remove dictionary


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198866126
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -17,45 +17,102 @@
 
 package org.apache.carbondata.core.datastore.chunk.store;
 
+import java.nio.ByteBuffer;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.ColumnType;
+import org.apache.carbondata.core.datastore.TableSpec;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
 import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.util.ByteUtil;
 
 public class ColumnPageWrapper implements DimensionColumnPage {
 
   private ColumnPage columnPage;
+  private TableSpec.ColumnSpec columnSpec;
+  private int columnValueSize = 0;
+
 
   public ColumnPageWrapper(ColumnPage columnPage) {
 this.columnPage = columnPage;
+this.columnSpec = columnPage.getColumnSpec();
   }
 
   @Override
   public int fillRawData(int rowId, int offset, byte[] data, 
KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// TODO verify the implementation. Mostly this is for dictionary.
+int surrogate = columnPage.getInt(rowId);
+ByteBuffer buffer = ByteBuffer.wrap(data);
+buffer.putInt(offset, surrogate);
+return columnValueSize;
   }
 
   @Override
   public int fillSurrogateKey(int rowId, int chunkIndex, int[] 
outputSurrogateKey,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+outputSurrogateKey[chunkIndex] = columnPage.getInt(rowId);
+return chunkIndex + 1;
   }
 
   @Override
   public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// fill the vector with data in column page
+ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+CarbonColumnVector vector = columnVectorInfo.vector;
+fillData(null, columnVectorInfo, vector);
+return chunkIndex + 1;
   }
 
+
   @Override
   public int fillVector(int[] filteredRowId, ColumnVectorInfo[] 
vectorInfo, int chunkIndex,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+ColumnVectorInfo columnVectorInfo = vectorInfo[chunkIndex];
+CarbonColumnVector vector = columnVectorInfo.vector;
+fillData(filteredRowId, columnVectorInfo, vector);
+return chunkIndex + 1;
   }
 
-  @Override
-  public byte[] getChunkData(int rowId) {
-return columnPage.getBytes(rowId);
+  @Override public byte[] getChunkData(int rowId) {
+ColumnType columnType = columnPage.getColumnSpec().getColumnType();
+if (columnType == ColumnType.DIRECT_DICTIONARY) {
--- End diff --

remove dictionary and direct dictionary


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198865395
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+if (columnPage.getColumnSpec().getColumnType() == 
ColumnType.DIRECT_DICTIONARY) {
+  int surrogate = columnPage.getInt(rowId);
+  int input = ByteBuffer.wrap(compareValue).getInt();
+  return surrogate - input;
+} else {
+  byte[] data;
+  if (columnPage.getDataType() == DataTypes.INT) {
+data = ByteUtil.toBytes(columnPage.getInt(rowId));
+  } else if (columnPage.getDataType() == DataTypes.STRING) {
+data = columnPage.getBytes(rowId);
+  } else {
+throw new RuntimeException("invalid data type for dimension: " + 
columnPage.getDataType());
+  }
+  return ByteUtil.UnsafeComparer.INSTANCE
+  .compareTo(data, 0, data.length, compareValue, 0, 
compareValue.length);
+}
   }
 
   @Override
   public void freeMemory() {
 
   }
 
+  private void fillData(int[] rowMapping, ColumnVectorInfo 
columnVectorInfo,
+  CarbonColumnVector vector) {
+int offsetRowId = columnVectorInfo.offset;
+int vectorOffset = columnVectorInfo.vectorOffset;
+int maxRowId = offsetRowId + columnVectorInfo.size;
+BitSet nullBitset = columnPage.getNullBits();
+switch (columnSpec.getColumnType()) {
+  case DIRECT_DICTIONARY:
+DirectDictionaryGenerator generator = 
columnVectorInfo.directDictionaryGenerator;
+assert (generator != null);
+DataType dataType = generator.getReturnType();
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+int surrogate = columnPage.getInt(currentRowId);
+Object valueFromSurrogate = 
generator.getValueFromSurrogate(surrogate);
+if (valueFromSurrogate == null) {
+  vector.putNull(vectorOffset++);
+} else {
+  if (dataType == DataTypes.INT) {
+vector.putInt(vectorOffset++, (int) valueFromSurrogate);
+  } else {
+vector.putLong(vectorOffset++, (long) valueFromSurrogate);
+  }
+}
+  }
+}
+break;
+  case GLOBAL_DICTIONARY:
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+int data = columnPage.getInt(currentRowId);
+vector.putInt(vectorOffset++, data);
+  }
+}
+break;
+  case PLAIN_VALUE:
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+if (columnSpec.getSchemaDataType() == DataTypes.STRING) {
+  byte[] data = columnPage.getBytes(currentRowId);
+  if (isNullPlainValue(data)) {
+vector.putNull(vectorOffset++);
+  } else {
+vector.putBytes(vectorOffset++, 0, data.length, data);
+  }
+} else if (columnSpec.getSchemaDataType() == 
DataTypes.BOOLEAN) {
+  boolean data = columnPage.getBoolean(currentRowId);
+  vector.putBoolean(vectorOffset++, (boolean) data);
+} else if (columnSpec.getSchemaDataType() == DataTypes.INT) {
+  // TODO have to check for other dataTypes. Only INT 
Specified Now.
+  int data = columnPage.getInt(currentRowId);
+  vector.putInt(vectorOffset++, (int) data);
+} else if (columnSpec.getSchemaDataType() == DataTypes.LONG) {
+  long data = columnPage.getLong(currentRowId);
+  vector.putLong(vectorOffset++, (long) data);
+} else if (columnSpec.getSchemaDataType() == 
DataTypes.TIMESTAMP) {
+  long 

---

[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198864334
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+if (columnPage.getColumnSpec().getColumnType() == 
ColumnType.DIRECT_DICTIONARY) {
+  int surrogate = columnPage.getInt(rowId);
+  int input = ByteBuffer.wrap(compareValue).getInt();
+  return surrogate - input;
+} else {
+  byte[] data;
+  if (columnPage.getDataType() == DataTypes.INT) {
+data = ByteUtil.toBytes(columnPage.getInt(rowId));
+  } else if (columnPage.getDataType() == DataTypes.STRING) {
+data = columnPage.getBytes(rowId);
+  } else {
+throw new RuntimeException("invalid data type for dimension: " + 
columnPage.getDataType());
+  }
+  return ByteUtil.UnsafeComparer.INSTANCE
+  .compareTo(data, 0, data.length, compareValue, 0, 
compareValue.length);
+}
   }
 
   @Override
   public void freeMemory() {
 
   }
 
+  private void fillData(int[] rowMapping, ColumnVectorInfo 
columnVectorInfo,
+  CarbonColumnVector vector) {
+int offsetRowId = columnVectorInfo.offset;
+int vectorOffset = columnVectorInfo.vectorOffset;
+int maxRowId = offsetRowId + columnVectorInfo.size;
+BitSet nullBitset = columnPage.getNullBits();
+switch (columnSpec.getColumnType()) {
+  case DIRECT_DICTIONARY:
+DirectDictionaryGenerator generator = 
columnVectorInfo.directDictionaryGenerator;
+assert (generator != null);
+DataType dataType = generator.getReturnType();
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+int surrogate = columnPage.getInt(currentRowId);
+Object valueFromSurrogate = 
generator.getValueFromSurrogate(surrogate);
+if (valueFromSurrogate == null) {
+  vector.putNull(vectorOffset++);
+} else {
+  if (dataType == DataTypes.INT) {
+vector.putInt(vectorOffset++, (int) valueFromSurrogate);
+  } else {
+vector.putLong(vectorOffset++, (long) valueFromSurrogate);
+  }
+}
+  }
+}
+break;
+  case GLOBAL_DICTIONARY:
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+int data = columnPage.getInt(currentRowId);
+vector.putInt(vectorOffset++, data);
+  }
+}
+break;
+  case PLAIN_VALUE:
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+if (columnSpec.getSchemaDataType() == DataTypes.STRING) {
+  byte[] data = columnPage.getBytes(currentRowId);
+  if (isNullPlainValue(data)) {
+vector.putNull(vectorOffset++);
+  } else {
+vector.putBytes(vectorOffset++, 0, data.length, data);
+  }
+} else if (columnSpec.getSchemaDataType() == 
DataTypes.BOOLEAN) {
+  boolean data = columnPage.getBoolean(currentRowId);
+  vector.putBoolean(vectorOffset++, (boolean) data);
+} else if (columnSpec.getSchemaDataType() == DataTypes.INT) {
+  // TODO have to check for other dataTypes. Only INT 
Specified Now.
+  int data = columnPage.getInt(currentRowId);
+  vector.putInt(vectorOffset++, (int) data);
--- End diff --

remove all typecasts, not required


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198864176
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+if (columnPage.getColumnSpec().getColumnType() == 
ColumnType.DIRECT_DICTIONARY) {
+  int surrogate = columnPage.getInt(rowId);
+  int input = ByteBuffer.wrap(compareValue).getInt();
+  return surrogate - input;
+} else {
+  byte[] data;
+  if (columnPage.getDataType() == DataTypes.INT) {
+data = ByteUtil.toBytes(columnPage.getInt(rowId));
+  } else if (columnPage.getDataType() == DataTypes.STRING) {
+data = columnPage.getBytes(rowId);
+  } else {
+throw new RuntimeException("invalid data type for dimension: " + 
columnPage.getDataType());
+  }
+  return ByteUtil.UnsafeComparer.INSTANCE
+  .compareTo(data, 0, data.length, compareValue, 0, 
compareValue.length);
+}
   }
 
   @Override
   public void freeMemory() {
 
   }
 
+  private void fillData(int[] rowMapping, ColumnVectorInfo 
columnVectorInfo,
+  CarbonColumnVector vector) {
+int offsetRowId = columnVectorInfo.offset;
+int vectorOffset = columnVectorInfo.vectorOffset;
+int maxRowId = offsetRowId + columnVectorInfo.size;
+BitSet nullBitset = columnPage.getNullBits();
+switch (columnSpec.getColumnType()) {
+  case DIRECT_DICTIONARY:
+DirectDictionaryGenerator generator = 
columnVectorInfo.directDictionaryGenerator;
+assert (generator != null);
+DataType dataType = generator.getReturnType();
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+int surrogate = columnPage.getInt(currentRowId);
+Object valueFromSurrogate = 
generator.getValueFromSurrogate(surrogate);
+if (valueFromSurrogate == null) {
+  vector.putNull(vectorOffset++);
+} else {
+  if (dataType == DataTypes.INT) {
+vector.putInt(vectorOffset++, (int) valueFromSurrogate);
+  } else {
+vector.putLong(vectorOffset++, (long) valueFromSurrogate);
+  }
+}
+  }
+}
+break;
+  case GLOBAL_DICTIONARY:
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+int data = columnPage.getInt(currentRowId);
+vector.putInt(vectorOffset++, data);
+  }
+}
+break;
+  case PLAIN_VALUE:
+for (int rowId = offsetRowId; rowId < maxRowId; rowId++) {
+  int currentRowId = (rowMapping == null) ? rowId : 
rowMapping[rowId];
+  if (nullBitset.get(currentRowId)) {
+vector.putNull(vectorOffset++);
+  } else {
+if (columnSpec.getSchemaDataType() == DataTypes.STRING) {
+  byte[] data = columnPage.getBytes(currentRowId);
+  if (isNullPlainValue(data)) {
+vector.putNull(vectorOffset++);
+  } else {
+vector.putBytes(vectorOffset++, 0, data.length, data);
+  }
+} else if (columnSpec.getSchemaDataType() == 
DataTypes.BOOLEAN) {
+  boolean data = columnPage.getBoolean(currentRowId);
+  vector.putBoolean(vectorOffset++, (boolean) data);
+} else if (columnSpec.getSchemaDataType() == DataTypes.INT) {
+  // TODO have to check for other dataTypes. Only INT 
Specified Now.
--- End diff --

remove it


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198864088
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -77,14 +134,115 @@ public boolean isExplicitSorted() {
 return false;
   }
 
-  @Override
-  public int compareTo(int rowId, byte[] compareValue) {
-throw new UnsupportedOperationException("internal error");
+  @Override public int compareTo(int rowId, byte[] compareValue) {
+if (columnPage.getColumnSpec().getColumnType() == 
ColumnType.DIRECT_DICTIONARY) {
+  int surrogate = columnPage.getInt(rowId);
+  int input = ByteBuffer.wrap(compareValue).getInt();
+  return surrogate - input;
+} else {
+  byte[] data;
+  if (columnPage.getDataType() == DataTypes.INT) {
+data = ByteUtil.toBytes(columnPage.getInt(rowId));
+  } else if (columnPage.getDataType() == DataTypes.STRING) {
+data = columnPage.getBytes(rowId);
+  } else {
+throw new RuntimeException("invalid data type for dimension: " + 
columnPage.getDataType());
+  }
+  return ByteUtil.UnsafeComparer.INSTANCE
+  .compareTo(data, 0, data.length, compareValue, 0, 
compareValue.length);
+}
   }
 
   @Override
   public void freeMemory() {
 
   }
 
+  private void fillData(int[] rowMapping, ColumnVectorInfo 
columnVectorInfo,
+  CarbonColumnVector vector) {
+int offsetRowId = columnVectorInfo.offset;
+int vectorOffset = columnVectorInfo.vectorOffset;
+int maxRowId = offsetRowId + columnVectorInfo.size;
+BitSet nullBitset = columnPage.getNullBits();
+switch (columnSpec.getColumnType()) {
+  case DIRECT_DICTIONARY:
--- End diff --

No need to handle `DIRECT_DICTIONARY` and `GLOBAL_DICTIONARY`


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198861716
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -17,45 +17,102 @@
 
 package org.apache.carbondata.core.datastore.chunk.store;
 
+import java.nio.ByteBuffer;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.ColumnType;
+import org.apache.carbondata.core.datastore.TableSpec;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
 import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.util.ByteUtil;
 
 public class ColumnPageWrapper implements DimensionColumnPage {
 
   private ColumnPage columnPage;
+  private TableSpec.ColumnSpec columnSpec;
+  private int columnValueSize = 0;
+
 
   public ColumnPageWrapper(ColumnPage columnPage) {
 this.columnPage = columnPage;
+this.columnSpec = columnPage.getColumnSpec();
   }
 
   @Override
   public int fillRawData(int rowId, int offset, byte[] data, 
KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// TODO verify the implementation. Mostly this is for dictionary.
+int surrogate = columnPage.getInt(rowId);
+ByteBuffer buffer = ByteBuffer.wrap(data);
+buffer.putInt(offset, surrogate);
+return columnValueSize;
   }
 
   @Override
   public int fillSurrogateKey(int rowId, int chunkIndex, int[] 
outputSurrogateKey,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+outputSurrogateKey[chunkIndex] = columnPage.getInt(rowId);
--- End diff --

not required, remove


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198861844
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -17,45 +17,102 @@
 
 package org.apache.carbondata.core.datastore.chunk.store;
 
+import java.nio.ByteBuffer;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.ColumnType;
+import org.apache.carbondata.core.datastore.TableSpec;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
 import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.util.ByteUtil;
 
 public class ColumnPageWrapper implements DimensionColumnPage {
 
   private ColumnPage columnPage;
+  private TableSpec.ColumnSpec columnSpec;
+  private int columnValueSize = 0;
+
 
   public ColumnPageWrapper(ColumnPage columnPage) {
 this.columnPage = columnPage;
+this.columnSpec = columnPage.getColumnSpec();
   }
 
   @Override
   public int fillRawData(int rowId, int offset, byte[] data, 
KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// TODO verify the implementation. Mostly this is for dictionary.
+int surrogate = columnPage.getInt(rowId);
+ByteBuffer buffer = ByteBuffer.wrap(data);
+buffer.putInt(offset, surrogate);
+return columnValueSize;
   }
 
   @Override
   public int fillSurrogateKey(int rowId, int chunkIndex, int[] 
outputSurrogateKey,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+outputSurrogateKey[chunkIndex] = columnPage.getInt(rowId);
+return chunkIndex + 1;
   }
 
   @Override
   public int fillVector(ColumnVectorInfo[] vectorInfo, int chunkIndex,
   KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// fill the vector with data in column page
--- End diff --

no need


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198861537
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 ---
@@ -17,45 +17,102 @@
 
 package org.apache.carbondata.core.datastore.chunk.store;
 
+import java.nio.ByteBuffer;
+import java.util.BitSet;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.ColumnType;
+import org.apache.carbondata.core.datastore.TableSpec;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.page.ColumnPage;
+import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.scan.executor.infos.KeyStructureInfo;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
 import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import org.apache.carbondata.core.util.ByteUtil;
 
 public class ColumnPageWrapper implements DimensionColumnPage {
 
   private ColumnPage columnPage;
+  private TableSpec.ColumnSpec columnSpec;
+  private int columnValueSize = 0;
+
 
   public ColumnPageWrapper(ColumnPage columnPage) {
 this.columnPage = columnPage;
+this.columnSpec = columnPage.getColumnSpec();
   }
 
   @Override
   public int fillRawData(int rowId, int offset, byte[] data, 
KeyStructureInfo restructuringInfo) {
-throw new UnsupportedOperationException("internal error");
+// TODO verify the implementation. Mostly this is for dictionary.
+int surrogate = columnPage.getInt(rowId);
+ByteBuffer buffer = ByteBuffer.wrap(data);
+buffer.putInt(offset, surrogate);
+return columnValueSize;
--- End diff --

No implementation is required here as this is used only for dictionary columns. It will 
go to FixedChunReader, so just `throw new UnsupportedOperationException`.
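
In other words, roughly (the message text mirrors the other unsupported methods in this class):

```java
@Override
public int fillRawData(int rowId, int offset, byte[] data, KeyStructureInfo restructuringInfo) {
  throw new UnsupportedOperationException(
      "internal error: should be called for only dictionary columns");
}
```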


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-28 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2417#discussion_r198857878
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
 ---
@@ -221,6 +221,9 @@ private boolean isEncodedWithMeta(DataChunk2 
pageMetadata) {
 if (encodings != null && encodings.size() == 1) {
   Encoding encoding = encodings.get(0);
   switch (encoding) {
+case ADAPTIVE_INTEGRAL:
+case ADAPTIVE_DELTA_INTEGRAL:
+case ADAPTIVE_FLOATING:
--- End diff --

Is `ADAPTIVE_DELTA_FLOATING` not required to be added here as well?
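
For reference, a hedged sketch with that case included (the helper name is illustrative; 
whether the switch simply returns true for these encodings is inferred from the method name 
isEncodedWithMeta, not shown in the diff):

```java
private static boolean isAdaptiveEncoding(Encoding encoding) {
  switch (encoding) {
    case ADAPTIVE_INTEGRAL:
    case ADAPTIVE_DELTA_INTEGRAL:
    case ADAPTIVE_FLOATING:
    case ADAPTIVE_DELTA_FLOATING:  // the additional case asked about
      return true;
    default:
      return false;
  }
}
```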


---


[GitHub] carbondata pull request #2417: [WIP][Complex Column Enhancements]Primitive D...

2018-06-26 Thread sounakr
GitHub user sounakr opened a pull request:

https://github.com/apache/carbondata/pull/2417

[WIP][Complex Column Enhancements]Primitive DataType Adaptive Encoding

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sounakr/incubator-carbondata 
primitive_adaptive

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2417


commit 6d8a958726af35103d3a64d081e28fc2c423c990
Author: sounakr 
Date:   2018-06-26T11:18:39Z

Adaptive Encoding For No Dictionary For INT DataType

commit 86e1756f19c34ffd2020c1cf3f177bf3205fdabe
Author: sounakr 
Date:   2018-06-26T14:04:49Z

String DataType No Dictionary Rectification

commit 16bf26e8504221d7213ed655065bde921aef71e1
Author: sounakr 
Date:   2018-06-26T14:47:31Z

Long DataType No Dictionary

commit 97380d56dd70f7a158fdda86443e13fb270acc5e
Author: sounakr 
Date:   2018-06-26T22:16:37Z

TimeStamp DataType No Dictionary

commit 3343d0074c908cd6ee6b1f6fb481179f4f37d1fb
Author: sounakr 
Date:   2018-06-27T03:07:01Z

Review




---