[GitHub] carbondata issue #2778: [CARBONDATA-2980][BloomDataMap] Fix bug in clearing ...

2018-09-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2778
  
LGTM


---


[GitHub] carbondata issue #2778: [CARBONDATA-2980][BloomDataMap] Fix bug in clearing ...

2018-09-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2778
  
@xuchuanyin ...`CarbonLRUCache` is a single instance per JVM, but the cache 
implementation differs by purpose; for example, we have separate cache 
provider implementations for blockDataMap and dictionary. You can check the 
`CacheProvider` class for more details on it.
`Cache` is a standard interface which has very generic methods. Because we 
store all the entries in the LRU cache to control memory through LRU 
eviction, all the keys are kept in the LRU cache map. BloomDataMap should 
also keep track of its own keys and then use the invalidate API to clean all the 
required keys. Check the `clear` method implementation of `BlockletDataMapFactory`.
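
To illustrate the suggestion, here is a minimal sketch (hypothetical classes, not the actual CarbonData implementation) of a dataMap factory that remembers the keys it has loaded into the shared LRU cache and evicts only those in its `clear` method:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: Cache is a simplified stand-in for the generic interface obtained
// from a CacheProvider; the factory owns the bookkeeping of its cache keys.
public class ExampleDataMapFactory {

  interface Cache<K, V> {
    V get(K key);
    void put(K key, V value);
    void invalidate(K key);   // evicts a single entry from the underlying LRU map
  }

  private final Cache<String, Object> cache;
  private final Set<String> ownedKeys = new HashSet<>();

  public ExampleDataMapFactory(Cache<String, Object> cache) {
    this.cache = cache;
  }

  public void load(String key, Object dataMap) {
    cache.put(key, dataMap);
    ownedKeys.add(key);       // remember the key so clear() can clean it up later
  }

  // Evicts only this factory's entries; entries of other tables stay cached.
  public void clear() {
    for (String key : ownedKeys) {
      cache.invalidate(key);
    }
    ownedKeys.clear();
  }
}
```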


---


[GitHub] carbondata issue #2778: [CARBONDATA-2980][BloomDataMap] Fix bug in clearing ...

2018-09-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2778
  
@xuchuanyin ...I think this PR's changes are not correct. You are calling 
`lruCache.clear()`, which will clear the complete LRU cache map; that means 
all the entries for all the tables will be cleared, because the LRUCache instance is 
only 1 per JVM. I believe through this PR you are only trying to clear the 
stale entries for the table being recreated.
You can check the flow of drop table to understand how the dataMap cache 
gets cleared on dropping the table, and try to incorporate the Bloom dataMap cache 
clearing changes accordingly. You can use the existing API `void invalidate(K 
key)` and try not to introduce a new API `void invalidateAll()`.


---


[GitHub] carbondata pull request #2774: [CARBONDATA-2979] select count fails when car...

2018-09-27 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2774

[CARBONDATA-2979] select count fails when carbondata file is written 
through SDK and read through sparkfileformat for complex datatype 
map(struct->array->map)

**Problem**
Select query fails for the map type when data is loaded using the Avro SDK 
and an external table using the carbon file format is used to query the data

**Analysis**
When data is loaded through the Avro SDK with a schema of type 
struct>, the fieldName was hard coded to "val". Because of this, during 
query the schema written in the file footer and the schema inferred for the 
external table had a mismatch, which led to the failure.

**Solution**
Instead of hard coding the field name as "val", use the field name given in 
the schema.
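
As a rough illustration of the fix (hypothetical helper and field names, not the actual SDK code), the child field name is taken from the schema and "val" is used only as a fallback:

```java
// Sketch only: shows the naming decision, not the actual AvroCarbonWriter logic.
public final class FieldNameExample {

  static String childFieldName(String parentName, String schemaFieldName) {
    // Hard coding "val" made the footer schema differ from the schema inferred
    // for the external table; using the schema-provided name keeps them in sync.
    return (schemaFieldName == null || schemaFieldName.isEmpty())
        ? parentName + ".val"   // fallback only when the schema gives no name
        : schemaFieldName;
  }

  public static void main(String[] args) {
    System.out.println(childFieldName("mapField", "houseDetails")); // houseDetails
    System.out.println(childFieldName("mapField", ""));             // mapField.val
  }
}
```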

 - [ ] Any interfaces changed?
No 
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
UT Added
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata fix_map_type_issues

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2774.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2774


commit 1e1eda1dcfcfdf424c9fa7748555d4400c6c14c3
Author: manishgupta88 
Date:   2018-09-27T12:32:34Z

Fixed query failure issue for map type when data is loaded using avro SDK 
and external table using carbon file format is used to query the data




---


[GitHub] carbondata issue #2758: [CARBONDATA-2972] Debug Logs and function added for ...

2018-09-27 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2758
  
LGTM


---


[GitHub] carbondata issue #2766: [CARBONDATA-2973] Added documentation for fallback c...

2018-09-26 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2766
  
LGTM


---


[GitHub] carbondata pull request #2758: [CARBONDATA-2972] Debug Logs and function add...

2018-09-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2758#discussion_r220531471
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoder.java
 ---
@@ -78,6 +78,13 @@ public DataType getTargetDataType(ColumnPage inputPage) {
 }
   }
 
+  public Encoding getTypeOfEncoding() {
+if (CarbonUtil.isEncodedWithMeta(getEncodingList())) {
+  return getEncodingList().get(0);
--- End diff --

1. `getEncodingList()` is called 2 times. Hold its value in one variable 
and use at both the places
2. Change the method name from `getTypeOfEncoding()` to `getEncodingType()`


---


[GitHub] carbondata pull request #2759: [HOTFIX] Fix NPE in LRU cache when entry from...

2018-09-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2759#discussion_r220505034
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/memory/UnsafeMemoryManager.java 
---
@@ -182,10 +224,15 @@ public long getUsableMemory() {
*/
   public static MemoryBlock allocateMemoryWithRetry(long taskId, long size)
   throws MemoryException {
+return allocateMemoryWithRetry(taskId, size, false);
+  }
+
+  public static MemoryBlock allocateMemoryWithRetry(long taskId, long 
size, boolean isDriver)
--- End diff --

ok done


---


[GitHub] carbondata pull request #2759: [HOTFIX] Fix NPE in LRU cache when entry from...

2018-09-25 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2759

[HOTFIX] Fix NPE in LRU cache when entry from the same table is getting 
evicted to load another entry from same table

**Problem**
When the driver LRU cache size is configured to a small value, then while 
running concurrent queries, loading the block dataMap into the LRU cache 
sometimes evicts one of the dataMap entries from the same table because of 
shortage of space. Due to this, an NPE is thrown in the flow after loading the 
dataMap cache.
This happens because when a cacheable entry is removed from the LRU cache, 
invalidate is called on that cacheable entry to clear the unsafe memory used by 
that entry. The invalidate method sets the references to null and clears the unsafe 
memory, which leads to an NPE when the entry is accessed again.

**Solution**
Currently the dataMap cache uses unsafe off-heap memory for dataMap caching. To 
avoid this problem, the code is modified to use unsafe with on-heap memory so that 
the JVM itself takes care of clearing the memory when required; we no longer need 
to explicitly set the references to null.
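
The difference can be sketched with plain NIO buffers (illustration only, not the CarbonData UnsafeMemoryManager): on-heap memory is reclaimed by the garbage collector once the cache entry becomes unreachable, whereas off-heap memory has to be freed explicitly, which is what required nulling the references and caused the NPE.

```java
import java.nio.ByteBuffer;

public final class MemoryModeExample {

  static ByteBuffer allocate(int size, boolean onHeap) {
    // on-heap: backed by a byte[] that the GC manages automatically
    // off-heap: direct memory the caller is responsible for releasing
    return onHeap ? ByteBuffer.allocate(size) : ByteBuffer.allocateDirect(size);
  }

  public static void main(String[] args) {
    ByteBuffer block = allocate(1024, true); // the driver-side cache prefers on-heap here
    block.putLong(0, 42L);
    System.out.println(block.getLong(0));    // 42
  }
}
```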

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Yes verified manually   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata lru_cache_NPE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2759


commit 598fceb3e7ee4f5797ef75ef04cd3b50c426984b
Author: manishgupta88 
Date:   2018-09-25T13:51:08Z

Fix NPE in LRU cache when an entry from the same table is getting evicted to 
load another entry




---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-09-18 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
LGTM


---


[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2725
  
@xuchuanyin ...please check my reply on the mailing list

http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Optimizing-the-writing-of-min-max-for-a-column-td62515.html


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298762
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1831,6 +1831,16 @@
 
   public static final short LOCAL_DICT_ENCODED_BYTEARRAY_SIZE = 3;
 
+  /**
+   * property to be used for specifying the max character limit for 
string/varchar data type till
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298776
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader 
fileReader, int columnInde
* @return
*/
   BitSetGroup getIndexedData();
+
+  /**
+   * Return the array which contains the flag for each column whether the 
min max for that column
+   * is written in metadata or not
+   *
+   * @return
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218298749
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapRowIndexes.java
 ---
@@ -40,12 +40,14 @@
 
   int BLOCK_LENGTH = 8;
 
+  int BLOCK_MIN_MAX_FLAG = 9;
+
   // below variables are specific for blockletDataMap
-  int BLOCKLET_INFO_INDEX = 9;
+  int BLOCKLET_INFO_INDEX = 10;
--- End diff --

No, it will not have any impact as this code change is in the query part


---


[GitHub] carbondata issue #2725: [CARBONDATA-2942] Add read and write support for wri...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2725
  
@ravipesala ..Fixed review comments. Kindly review


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139529
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/DataRefNode.java ---
@@ -125,4 +125,12 @@ DimensionRawColumnChunk readDimensionChunk(FileReader 
fileReader, int columnInde
* @return
*/
   BitSetGroup getIndexedData();
+
+  /**
+   * Return the array which contains the flag for each column whether the 
min max for that column
+   * is written in metadata or not
+   *
+   * @return
+   */
+  boolean[] isMinMaxSet();
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139409
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RowLevelRangeGrtThanFiterExecuterImpl.java
 ---
@@ -194,9 +198,12 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks 
rawBlockletColumnChunks,
   BitSetGroup bitSetGroup = new 
BitSetGroup(rawColumnChunk.getPagesCount());
   FilterExecuter filterExecuter = null;
   boolean isExclude = false;
+  boolean isMinMaxSetForFilterDimension =
+  rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
   for (int i = 0; i < rawColumnChunk.getPagesCount(); i++) {
 if (rawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(rawColumnChunk.getMaxValues()[i], 
this.filterRangeValues)) {
+  if (!isMinMaxSetForFilterDimension || 
isScanRequired(rawColumnChunk.getMaxValues()[i],
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139494
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
 ---
@@ -133,6 +134,21 @@
 return null;
   }
 
+  @Override
+  public boolean[] isMinMaxSet() {
+BlockletIndex blockletIndex =
+
blockInfos.get(index).getDetailInfo().getBlockletInfo().getBlockletIndex();
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139378
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFilterScanner.java
 ---
@@ -122,11 +122,11 @@ public boolean isScanRequired(DataRefNode dataBlock) {
 bitSet = ((ImplicitColumnFilterExecutor) filterExecuter)
 .isFilterValuesPresentInBlockOrBlocklet(
 dataBlock.getColumnsMaxValue(),
-dataBlock.getColumnsMinValue(), blockletId);
+dataBlock.getColumnsMinValue(), blockletId, 
dataBlock.isMinMaxSet());
   } else {
 bitSet = this.filterExecuter
 .isScanRequired(dataBlock.getColumnsMaxValue(),
-dataBlock.getColumnsMinValue());
+dataBlock.getColumnsMinValue(), dataBlock.isMinMaxSet());
--- End diff --

For blocklet min max comparison, the footer is read in the executor and 
the same min max flag information is used. The information serialized 
from the driver is not used for blocklet min max comparison in the executor


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139444
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java
 ---
@@ -108,10 +108,13 @@ public BitSetGroup 
applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
   BitSetGroup bitSetGroup = new 
BitSetGroup(dimensionRawColumnChunk.getPagesCount());
   filterValues = dimColumnExecuterInfo.getFilterKeys();
   boolean isDecoded = false;
+  boolean isMinMaxSetForFilterDimension =
+  rawBlockletColumnChunks.getDataBlock().isMinMaxSet()[chunkIndex];
   for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
 if (dimensionRawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
-  dimensionRawColumnChunk.getMinValues()[i], 
dimColumnExecuterInfo.getFilterKeys())) {
+  if (!isMinMaxSetForFilterDimension || isScanRequired(
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218139467
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/blocklet/index/BlockletIndex.java
 ---
@@ -75,4 +80,12 @@ public void setMinMaxIndex(BlockletMinMaxIndex 
minMaxIndex) {
 this.minMaxIndex = minMaxIndex;
   }
 
+  @Override public void write(DataOutput out) throws IOException {
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [CARBONDATA-2942] Add read and write support ...

2018-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2725#discussion_r218138883
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -250,6 +275,42 @@ public static BlockletIndex 
getBlockletIndex(EncodedBlocklet encodedBlocklet,
   }
 
   /**
+   * This method will combine the writeMinMax flag from all the pages. If 
any page for a given
+   * dimension has writeMinMax flag set to false then min max for that 
dimension will not be
+   * written in any of the page and metadata
+   *
+   * @param blockletMinMaxIndex
+   * @param encodedBlocklet
+   */
+  private static List mergeWriteMinMaxFlagForAllPages(
+  BlockletMinMaxIndex blockletMinMaxIndex, EncodedBlocklet 
encodedBlocklet) {
+Boolean[] mergedWriteMinMaxFlag =
+new Boolean[encodedBlocklet.getNumberOfDimension() + 
encodedBlocklet.getNumberOfMeasure()];
+// set writeMinMax flag to true for all the columns by default and 
then update if stats object
+// has the this flag set to false
+Arrays.fill(mergedWriteMinMaxFlag, true);
+for (int i = 0; i < encodedBlocklet.getNumberOfDimension(); i++) {
+  for (int pageIndex = 0; pageIndex < 
encodedBlocklet.getNumberOfPages(); pageIndex++) {
+EncodedColumnPage encodedColumnPage =
+
encodedBlocklet.getEncodedDimensionColumnPages().get(i).getEncodedColumnPageList()
+.get(pageIndex);
+SimpleStatsResult stats = encodedColumnPage.getStats();
+if (!stats.writeMinMax()) {
+  mergedWriteMinMaxFlag[i] = stats.writeMinMax();
+  String columnName = 
encodedColumnPage.getActualPage().getColumnSpec().getFieldName();
+  LOGGER.info("Min Max writing ignored for column " + columnName + 
" from page 0 to "
--- End diff --

ok


---


[GitHub] carbondata pull request #2725: [WIP] Added code to support storing min max f...

2018-09-16 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2725

[WIP] Added code to support storing min max for string columns based on 
number of characters


 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
string_min_max_decision

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2725.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2725


commit 6ef6d880e68aa3bfc1c704659c916b384e5d545a
Author: manishgupta88 
Date:   2018-09-14T05:13:20Z

Modified code to support writing the min max flag for all the columns in 
the metadata. This will help in deciding whether min max for a column is 
written or not

commit 4ed24f3972cb4c03133912ba8a1c9d35b3122e7a
Author: manishgupta88 
Date:   2018-09-15T06:20:03Z

Modified code to support filter query using min max flag for a column




---


[GitHub] carbondata issue #2716: [HOTFIX] Fixed 2.3 CI

2018-09-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2716
  
LGTM


---


[GitHub] carbondata pull request #2702: [WIP] Fixed multiple complex type issue

2018-09-10 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2702

[WIP] Fixed multiple complex type issue

**This PR contains**
1. Fix to support nested complex types up to integer length. When the data 
length in one row for a nested complex type crossed the short limit, an 
exception was being thrown. Now lengths up to the integer limit are supported 
(see the sketch after this list).
2. Fixed the error message when a map type was given as a sort column in the SDK
3. Fixed map type parsing issue for nested structures like 
struct> or array>
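
A small sketch of the length-encoding part of point 1 (illustrative only, not the actual writer code): serializing the complex-value length as an int removes the limit a short length field imposes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public final class ComplexLengthExample {

  static void writeValue(DataOutputStream out, byte[] complexValue) throws IOException {
    // previously: out.writeShort(complexValue.length) -> fails past Short.MAX_VALUE
    out.writeInt(complexValue.length);        // int length supports much larger rows
    out.write(complexValue);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    writeValue(new DataOutputStream(bos), new byte[70_000]); // larger than Short.MAX_VALUE
    System.out.println(bos.size()); // 70004 = 4-byte length + payload
  }
}
```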

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Added test cases   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata fix_map_type_issues

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2702


commit f9e76162043206a006cf68822121c995f12b3c6b
Author: manishgupta88 
Date:   2018-09-10T07:12:33Z

Fixed multiple complex type issue and added test cases for validation




---


[GitHub] carbondata issue #2687: [CARBONDATA-2876]Fix Avro decimal datatype with prec...

2018-09-10 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2687
  
LGTM


---


[GitHub] carbondata issue #2698: [HOTFIX] Fixed LRU cache bug to invalidate the cache...

2018-09-07 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2698
  
@ravipesala ..handled review comment. Kindly review and merge


---


[GitHub] carbondata pull request #2698: [HOTFIX] Fixed LRU cache bug to invalidate th...

2018-09-07 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2698#discussion_r215984873
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCacheKeyValue.java
 ---
@@ -100,6 +100,10 @@ public long getMemorySize() {
   return size;
 }
 
+@Override public void invalidate() {
--- End diff --

Set the bloomFilters object to null


---


[GitHub] carbondata issue #2685: [CARBONDATA-2910] Support backward compatability in ...

2018-09-07 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2685
  
LGTM


---


[GitHub] carbondata issue #2694: [CARBONDATA-2876]AVRO datatype support through SDK

2018-09-06 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2694
  
LGTM


---


[GitHub] carbondata pull request #2698: [HOTFIX] Fixed LRU cache bug to invalidate th...

2018-09-06 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2698

[HOTFIX] Fixed LRU cache bug to invalidate the cacheable object to clean up 
the resources

This PR contains the fix for an LRU cache bug: invalidate the Cacheable 
object while removing it from the LRU cache. This will help in clearing the unsafe 
memory for cacheable objects like BlockDataMaps
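
A minimal sketch of the eviction hook (hypothetical types, not CarbonLRUCache itself): when the LRU map drops its eldest entry, invalidate() is called on that entry so any memory it holds outside the map is released.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class InvalidatingLruCache<K, V extends InvalidatingLruCache.Cacheable> {

  public interface Cacheable {
    void invalidate();   // release resources (e.g. unsafe memory) owned by this entry
  }

  private final Map<K, V> map;

  public InvalidatingLruCache(final int maxEntries) {
    // access-ordered map: the least recently used entry is evicted first
    this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxEntries) {
          eldest.getValue().invalidate();  // clean up before dropping the reference
          return true;
        }
        return false;
      }
    };
  }

  public void put(K key, V value) { map.put(key, value); }

  public V get(K key) { return map.get(key); }
}
```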

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata lru_cache_bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2698


commit ba4221cc18700b1c22866dbd86055065aefdb1d4
Author: manishgupta88 
Date:   2018-09-06T17:26:49Z

Fixed LRU cache bug to invalidate the Cacheable object while removing it 
from LRU cache. This will help in clearing the unsafe memory for cacheable 
objects like BlockDataMaps




---


[GitHub] carbondata pull request #2687: [CARBONDATA-2876]Fix Avro decimal datatype wi...

2018-09-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2687#discussion_r214790148
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -407,6 +413,19 @@ private Object avroFieldToObjectForUnionType(Schema 
avroField, Object fieldValue
   out = null;
 }
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+// As binary type is not supported yet,value will be null
+if (logicalType instanceof LogicalTypes.Decimal) {
+  BigDecimal decimalValue = new BigDecimal(new 
String(((ByteBuffer) fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  out = (decimalValue.round(
+  new MathContext(((LogicalTypes.Decimal) 
avroField.getLogicalType()).getPrecision(
+  .setScale(((LogicalTypes.Decimal) 
avroField.getLogicalType()).getScale(),
+  RoundingMode.HALF_UP);
--- End diff --

replace the BigDecimal conversion with the below code at all places in the code
`BigDecimal bigDecimal = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(), CarbonCommonConstants.DEFAULT_CHARSET_CLASS))
.setScale(dimension.getColumnSchema().getScale(), 
RoundingMode.HALF_UP);`


---


[GitHub] carbondata pull request #2663: [CARBONDATA-2894] Add support for complex map...

2018-09-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2663#discussion_r214717788
  
--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/Field.java 
---
@@ -213,4 +218,58 @@ public String getColumnComment() {
   public void setColumnComment(String columnComment) {
 this.columnComment = columnComment;
   }
+
+  private void initComplexTypeChildren() {
+if (getDataType().isComplexType()) {
+  StructField subFields = prepareSubFields(getFieldName(), 
getDataType());
+  if (DataTypes.isArrayType(getDataType()) || 
DataTypes.isMapType(getDataType())) {
+children = subFields.getChildren();
+  } else if (DataTypes.isStructType(getDataType())) {
+children = ((StructType) subFields.getDataType()).getFields();
+  }
+}
+  }
+
+  /**
+   * prepare sub fields for complex types
+   *
+   * @param fieldName
+   * @param dType
--- End diff --

ok


---


[GitHub] carbondata issue #2663: [CARBONDATA-2894] Add support for complex map type t...

2018-09-03 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2663
  
@ravipesala ...handled review comments. Kindly review and merge


---


[GitHub] carbondata pull request #2663: [CARBONDATA-2894] Add support for complex map...

2018-09-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2663#discussion_r214643424
  
--- Diff: 
integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
 ---
@@ -214,6 +216,30 @@ class SparkCarbonFileFormat extends FileFormat
   data
 }
 
+private def extractMapData(data: AnyRef, mapType: MapType): 
ArrayObject = {
+  val mapData = data.asInstanceOf[MapData]
+  val keys: ArrayData = mapData.keyArray()
+  val values: ArrayData = mapData.valueArray()
+  var keyValueHolder = scala.collection.mutable.ArrayBuffer[AnyRef]()
--- End diff --

ok done


---


[GitHub] carbondata pull request #2663: [CARBONDATA-2894] Add support for complex map...

2018-09-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2663#discussion_r214643373
  
--- Diff: 
integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonSparkDataSourceUtil.scala
 ---
@@ -198,7 +249,20 @@ object CarbonSparkDataSourceUtil {
   val dataType = convertSparkToCarbonDataType(field.dataType)
   dataType match {
 case s: CarbonStructType =>
-  new Field(field.name, s, s.getFields)
+  val subFields = prepareSubFields(field.name, s)
--- End diff --

moved the preparation to Field class


---


[GitHub] carbondata issue #2654: [CARBONDATA-2896] Adaptive Encoding for Primitive da...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2654
  
@dhatchayani 
You can raise one more to improve the code in some places:
1. Unify isScanRequired code in all the filter classes using ENUM and flag 
based on min max comparison
2. Create new page wrapper that extends from ColumnPageWrapper and sends 
the actual data for no dictionary primitive type columns


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214371546
  
--- Diff: 
datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java
 ---
@@ -331,8 +332,18 @@ private BloomQueryModel 
buildQueryModelInternal(CarbonColumn carbonColumn,
   // for dictionary/date columns, convert the surrogate key to bytes
   internalFilterValue = CarbonUtil.getValueAsBytes(DataTypes.INT, 
convertedValue);
 } else {
-  // for non dictionary dimensions, is already bytes,
-  internalFilterValue = (byte[]) convertedValue;
+  // for non dictionary dimensions, numeric columns will be of 
original data,
+  // so convert the data to bytes
+  if (DataTypeUtil.isPrimitiveColumn(carbonColumn.getDataType())) {
+if (convertedValue == null) {
--- End diff --

if possible, initialize and store the flag in the constructor and remove the 
`DataTypeUtil.isPrimitiveColumn` check wherever applicable in the below code


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214361007
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java
 ---
@@ -110,8 +112,19 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks 
rawBlockletColumnChunks,
   boolean isDecoded = false;
   for (int i = 0; i < dimensionRawColumnChunk.getPagesCount(); i++) {
 if (dimensionRawColumnChunk.getMaxValues() != null) {
-  if (isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
-  dimensionRawColumnChunk.getMinValues()[i], 
dimColumnExecuterInfo.getFilterKeys())) {
+  boolean scanRequired;
+  // for no dictionary measure column comparison can be done
+  // on the original data as like measure column
+  if 
(DataTypeUtil.isPrimitiveColumn(dimColumnEvaluatorInfo.getDimension().getDataType())
+  && 
!dimColumnEvaluatorInfo.getDimension().hasEncoding(Encoding.DICTIONARY)) {
+scanRequired = 
isScanRequired(dimensionRawColumnChunk.getMaxValues()[i],
--- End diff --

You can create an `isPrimitiveNoDictionaryColumn` flag and check 
`DataTypeUtil.isPrimitiveColumn` in the constructor. This will avoid the check 
for every page. Do this for all the filters.
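
A short sketch of the suggested pattern (hypothetical executor, placeholder comparisons only): the flag is computed once in the constructor and reused inside the page loop.

```java
public class ExampleFilterExecutor {

  private final boolean isPrimitiveNoDictionaryColumn;

  public ExampleFilterExecutor(boolean isPrimitiveDataType, boolean isDictionaryEncoded) {
    // evaluated once per executor instance instead of once per page
    this.isPrimitiveNoDictionaryColumn = isPrimitiveDataType && !isDictionaryEncoded;
  }

  private boolean isScanRequiredOnTypedData(byte[] max, byte[] min) {
    return true;  // placeholder comparison on the original typed data
  }

  private boolean isScanRequiredOnBytes(byte[] max, byte[] min) {
    return true;  // placeholder byte-wise comparison
  }

  public void applyFilter(byte[][] maxValues, byte[][] minValues) {
    for (int page = 0; page < maxValues.length; page++) {
      boolean scanRequired = isPrimitiveNoDictionaryColumn
          ? isScanRequiredOnTypedData(maxValues[page], minValues[page])
          : isScanRequiredOnBytes(maxValues[page], minValues[page]);
      // ... prune or scan the page based on scanRequired
    }
  }
}
```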


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214356896
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
 ---
@@ -239,12 +239,25 @@ private boolean isEncodedWithMeta(DataChunk2 
pageMetadata) {
   protected DimensionColumnPage decodeDimension(DimensionRawColumnChunk 
rawColumnPage,
   ByteBuffer pageData, DataChunk2 pageMetadata, int offset)
   throws IOException, MemoryException {
+List encodings = pageMetadata.getEncoders();
 if (isEncodedWithMeta(pageMetadata)) {
   ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, 
pageData, offset,
   null != rawColumnPage.getLocalDictionary());
   
decodedPage.setNullBits(QueryUtil.getNullBitSet(pageMetadata.presence));
-  return new ColumnPageWrapper(decodedPage, 
rawColumnPage.getLocalDictionary(),
-  isEncodedWithAdaptiveMeta(pageMetadata));
+  int[] invertedIndexes = new int[0];
--- End diff --

add a comment to explain that this scenario is to handle no dictionary 
primitive type columns where inverted index can be created on row id's during 
data load


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214354541
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/TablePage.java 
---
@@ -363,7 +398,16 @@ public EncodedTablePage getEncodedTablePage() {
   columnPageEncoder = encodingFactory.createEncoder(
   spec,
   noDictDimensionPages[noDictIndex]);
-  encodedPage = 
columnPageEncoder.encode(noDictDimensionPages[noDictIndex++]);
+  encodedPage = 
columnPageEncoder.encode(noDictDimensionPages[noDictIndex]);
+  DataType targetDataType =
+  
columnPageEncoder.getTargetDataType(noDictDimensionPages[noDictIndex]);
+  if (null != targetDataType) {
+LOGGER.info("Encoder result ---> Source data type: " + 
noDictDimensionPages[noDictIndex]
--- End diff --

make this log at debug level and add a check whether debug logging is enabled
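
The requested guard looks roughly like this (java.util.logging used purely for illustration; the project has its own logging service):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public final class DebugLogExample {
  private static final Logger LOGGER = Logger.getLogger(DebugLogExample.class.getName());

  static void logEncoderResult(String sourceType, String targetType) {
    if (LOGGER.isLoggable(Level.FINE)) {   // the equivalent of a debug-enabled check
      LOGGER.fine("Encoder result ---> Source data type: " + sourceType
          + " Destination data type: " + targetType);
    }
  }

  public static void main(String[] args) {
    logEncoderResult("STRING", "INT");
  }
}
```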


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214352965
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java
 ---
@@ -346,12 +371,21 @@ static ColumnPageCodec 
selectCodecByAlgorithmForDecimal(SimpleStatsResult stats,
   // no effect to use adaptive or delta, use compression only
   return new DirectCompressCodec(stats.getDataType());
 }
+boolean isSort = false;
+boolean isInvertedIndex = false;
+if (columnSpec instanceof TableSpec.DimensionSpec
+&& columnSpec.getColumnType() != ColumnType.COMPLEX_PRIMITIVE) {
+  isSort = ((TableSpec.DimensionSpec) columnSpec).isInSortColumns();
+  isInvertedIndex = isSort && ((TableSpec.DimensionSpec) 
columnSpec).isDoInvertedIndex();
+}
--- End diff --

Put the above changes in one method, as the same code is used in the above 
places also, and then call this method while creating the encoding type
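
The extraction could look like this (hypothetical helper, flags passed in as plain booleans for the sketch): the sort / inverted-index decision lives in one place and each codec-selection site calls it.

```java
public final class EncodingFlagsExample {

  static final class Flags {
    final boolean isSort;
    final boolean isInvertedIndex;
    Flags(boolean isSort, boolean isInvertedIndex) {
      this.isSort = isSort;
      this.isInvertedIndex = isInvertedIndex;
    }
  }

  static Flags resolveFlags(boolean isDimensionSpec, boolean isComplexPrimitive,
      boolean inSortColumns, boolean doInvertedIndex) {
    // mirrors the duplicated snippet: only non-complex-primitive dimension specs
    // that are sort columns get sort / inverted-index handling
    boolean isSort = isDimensionSpec && !isComplexPrimitive && inSortColumns;
    boolean isInvertedIndex = isSort && doInvertedIndex;
    return new Flags(isSort, isInvertedIndex);
  }
}
```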


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214351650
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/TableSpec.java ---
@@ -91,6 +92,30 @@ private void addMeasures(List measures) {
 }
   }
 
+  /**
+   * No dictionary and complex dimensions of the table
+   *
+   * @return
+   */
+  public DimensionSpec[] getNoDictAndComplexDimensions() {
+List noDicOrCompIndexes = new 
ArrayList<>(dimensionSpec.length);
+int noDicCount = 0;
+for (int i = 0; i < dimensionSpec.length; i++) {
+  if (dimensionSpec[i].getColumnType() == ColumnType.PLAIN_VALUE
+  || dimensionSpec[i].getColumnType() == 
ColumnType.COMPLEX_PRIMITIVE
+  || dimensionSpec[i].getColumnType() == ColumnType.COMPLEX) {
+noDicOrCompIndexes.add(i);
+noDicCount++;
+  }
+}
+
+DimensionSpec[] dims = new DimensionSpec[noDicCount];
+for (int i = 0; i < dims.length; i++) {
+  dims[i] = dimensionSpec[noDicOrCompIndexes.get(i)];
+}
+return dims;
--- End diff --

Avoid the below for loop in this method
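
A sketch of the single-pass alternative (simplified stand-in type, not the real DimensionSpec): collect the matching specs directly and convert once, instead of collecting indexes and copying in a second loop.

```java
import java.util.ArrayList;
import java.util.List;

public final class NoDictDimensionExample {

  static final class DimensionSpec {
    final String columnType;
    DimensionSpec(String columnType) { this.columnType = columnType; }
  }

  static DimensionSpec[] getNoDictAndComplexDimensions(DimensionSpec[] dimensionSpec) {
    List<DimensionSpec> matches = new ArrayList<>(dimensionSpec.length);
    for (DimensionSpec spec : dimensionSpec) {
      if ("PLAIN_VALUE".equals(spec.columnType)
          || "COMPLEX_PRIMITIVE".equals(spec.columnType)
          || "COMPLEX".equals(spec.columnType)) {
        matches.add(spec);                        // keep the spec itself, not its index
      }
    }
    return matches.toArray(new DimensionSpec[0]); // single pass, no second copy loop
  }
}
```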


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214338168
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -375,6 +454,47 @@ public void 
writeRawRowAsIntermediateSortTempRowToOutputStream(Object[] row,
 outputStream.write(rowBuffer.array(), 0, packSize);
   }
 
+  /**
+   * Write the data to stream
+   *
+   * @param data
+   * @param outputStream
+   * @param idx
+   * @throws IOException
+   */
+  private void writeDataToStream(Object data, DataOutputStream 
outputStream, int idx)
+  throws IOException {
+DataType dataType = noDicSortDataTypes[idx];
+if (null == data) {
+  outputStream.writeBoolean(false);
+  return;
--- End diff --

do not use a return statement; instead use the if-else block wisely
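
The suggested control flow, sketched with a simplified payload (not the real sort temp row serialization):

```java
import java.io.DataOutputStream;
import java.io.IOException;

public final class WriteDataExample {

  static void writeDataToStream(Object data, DataOutputStream outputStream) throws IOException {
    if (data == null) {
      outputStream.writeBoolean(false);   // null marker, nothing else to write
    } else {
      outputStream.writeBoolean(true);    // presence marker followed by the value
      outputStream.writeUTF(data.toString());
    }
  }
}
```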


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214336180
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -224,10 +237,15 @@ public IntermediateSortTempRow 
readWithNoSortFieldConvert(
 
 // read no-dict & sort data
 for (int idx = 0; idx < this.noDictSortDimCnt; idx++) {
-  short len = inputStream.readShort();
-  byte[] bytes = new byte[len];
-  inputStream.readFully(bytes);
-  noDictSortDims[idx] = bytes;
+  // for no dict measure column get the original data
+  if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[idx])) {
+noDictSortDims[idx] = readDataFromStream(inputStream, idx);
+  } else {
+short len = inputStream.readShort();
+byte[] bytes = new byte[len];
+inputStream.readFully(bytes);
+noDictSortDims[idx] = bytes;
+  }
--- End diff --

There is similar code above as well. Refactor the code into one method and call 
it from both these places


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341633
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/partition/impl/RawRowComparator.java
 ---
@@ -30,24 +33,39 @@
 public class RawRowComparator implements Comparator {
   private int[] sortColumnIndices;
   private boolean[] isSortColumnNoDict;
+  private DataType[] noDicDataTypes;
 
-  public RawRowComparator(int[] sortColumnIndices, boolean[] 
isSortColumnNoDict) {
+  public RawRowComparator(int[] sortColumnIndices, boolean[] 
isSortColumnNoDict,
+  DataType[] noDicDataTypes) {
 this.sortColumnIndices = sortColumnIndices;
 this.isSortColumnNoDict = isSortColumnNoDict;
+this.noDicDataTypes = noDicDataTypes;
   }
 
   @Override
   public int compare(CarbonRow o1, CarbonRow o2) {
 int diff = 0;
 int i = 0;
+int noDicIdx = 0;
 for (int colIdx : sortColumnIndices) {
   if (isSortColumnNoDict[i]) {
-byte[] colA = (byte[]) o1.getObject(colIdx);
-byte[] colB = (byte[]) o2.getObject(colIdx);
-diff = UnsafeComparer.INSTANCE.compareTo(colA, colB);
-if (diff != 0) {
-  return diff;
+if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[noDicIdx])) {
+  // for no dictionary numeric column get comparator based on the 
data type
+  SerializableComparator comparator = 
org.apache.carbondata.core.util.comparator.Comparator
--- End diff --

increment `noDicIdx` in if block and remove from method end


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341135
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/IntermediateSortTempRowComparator.java
 ---
@@ -45,18 +52,31 @@ public int compare(IntermediateSortTempRow rowA, 
IntermediateSortTempRow rowB) {
 int diff = 0;
 int dictIndex = 0;
 int nonDictIndex = 0;
+int noDicTypeIdx = 0;
 
 for (boolean isNoDictionary : isSortColumnNoDictionary) {
 
   if (isNoDictionary) {
-byte[] byteArr1 = rowA.getNoDictSortDims()[nonDictIndex];
-byte[] byteArr2 = rowB.getNoDictSortDims()[nonDictIndex];
-nonDictIndex++;
+if 
(DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[noDicTypeIdx])) {
+  // use data types based comparator for the no dictionary measure 
columns
+  SerializableComparator comparator = 
org.apache.carbondata.core.util.comparator.Comparator
--- End diff --

Increment the no dictionary type index here `noDicTypeIdx` in if block and 
not at the end


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341442
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/NewRowComparator.java
 ---
@@ -43,15 +53,31 @@ public NewRowComparator(boolean[] 
noDictionarySortColumnMaping) {
   public int compare(Object[] rowA, Object[] rowB) {
 int diff = 0;
 int index = 0;
+int dataTypeIdx = 0;
+int noDicSortIdx = 0;
 
-for (boolean isNoDictionary : noDictionarySortColumnMaping) {
-  if (isNoDictionary) {
-byte[] byteArr1 = (byte[]) rowA[index];
-byte[] byteArr2 = (byte[]) rowB[index];
+for (int i = 0; i < noDicDimColMapping.length; i++) {
+  if (noDicDimColMapping[i]) {
+if (noDicSortColumnMapping[noDicSortIdx++]) {
+  if (DataTypeUtil.isPrimitiveColumn(noDicDataTypes[dataTypeIdx])) 
{
+// use data types based comparator for the no dictionary 
measure columns
+SerializableComparator comparator =
--- End diff --

increment `dataTypeIdx` in if block and remove from method end


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214337384
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/SortStepRowHandler.java
 ---
@@ -359,9 +433,14 @@ public void 
writeRawRowAsIntermediateSortTempRowToOutputStream(Object[] row,
 
 // write no-dict & sort
 for (int idx = 0; idx < this.noDictSortDimCnt; idx++) {
-  byte[] bytes = (byte[]) row[this.noDictSortDimIdx[idx]];
-  outputStream.writeShort(bytes.length);
-  outputStream.write(bytes);
+  if (DataTypeUtil.isPrimitiveColumn(noDicSortDataTypes[idx])) {
--- End diff --

I can see that at multiple places for every row 
DataTypeUtil.isPrimitiveColumn is getting used. Please check the load 
performance impact of this


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214341815
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/comparator/UnsafeRowComparator.java
 ---
@@ -60,26 +64,50 @@ public int compare(UnsafeCarbonRow rowL, Object 
baseObjectL, UnsafeCarbonRow row
   if (isNoDictionary) {
 short lengthA = CarbonUnsafe.getUnsafe().getShort(baseObjectL,
 rowA + dictSizeInMemory + sizeInNonDictPartA);
-byte[] byteArr1 = new byte[lengthA];
 sizeInNonDictPartA += 2;
-CarbonUnsafe.getUnsafe()
-.copyMemory(baseObjectL, rowA + dictSizeInMemory + 
sizeInNonDictPartA,
-byteArr1, CarbonUnsafe.BYTE_ARRAY_OFFSET, lengthA);
-sizeInNonDictPartA += lengthA;
-
 short lengthB = CarbonUnsafe.getUnsafe().getShort(baseObjectR,
 rowB + dictSizeInMemory + sizeInNonDictPartB);
-byte[] byteArr2 = new byte[lengthB];
 sizeInNonDictPartB += 2;
-CarbonUnsafe.getUnsafe()
-.copyMemory(baseObjectR, rowB + dictSizeInMemory + 
sizeInNonDictPartB,
-byteArr2, CarbonUnsafe.BYTE_ARRAY_OFFSET, lengthB);
-sizeInNonDictPartB += lengthB;
+DataType dataType = 
tableFieldStat.getNoDicSortDataType()[noDicSortIdx];
+if (DataTypeUtil.isPrimitiveColumn(dataType)) {
+  Object data1 = null;
--- End diff --

increment `noDicSortIdx` in if block and remove from method end


---


[GitHub] carbondata issue #2644: [CARBONDATA-2853] Implement file-level min/max index...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2644
  
@QiangCai In general, I can see that you have put empty lines at many places 
in the code. Please remove those empty lines everywhere and add some code 
comments for better understanding


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214310465
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java ---
@@ -96,14 +96,35 @@ private static FileFooter3 
getFileFooter3(List infoList,
 return footer;
   }
 
-  public static BlockletIndex getBlockletIndex(
-  org.apache.carbondata.core.metadata.blocklet.index.BlockletIndex 
info) {
+  public static 
org.apache.carbondata.core.metadata.blocklet.index.BlockletMinMaxIndex
+  convertExternalMinMaxIndex(BlockletMinMaxIndex minMaxIndex) {
--- End diff --

please add a method comment to explain the meaning of 
convertExternalMinMaxIndex


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214303472
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datamap;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import 
org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import 
org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.reader.CarbonIndexFileReader;
+import org.apache.carbondata.core.scan.filter.FilterUtil;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecuter;
+import 
org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.format.BlockIndex;
+
+@InterfaceAudience.Internal
+public class StreamDataMap {
+
+  private CarbonTable carbonTable;
+
+  private AbsoluteTableIdentifier identifier;
--- End diff --

If carbonTable is stored, then there is no need to store the identifier; you can 
get it from carbonTable


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214307411
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datamap;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import 
org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import 
org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.reader.CarbonIndexFileReader;
+import org.apache.carbondata.core.scan.filter.FilterUtil;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecuter;
+import 
org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.format.BlockIndex;
+
+@InterfaceAudience.Internal
+public class StreamDataMap {
+
+  private CarbonTable carbonTable;
+
+  private AbsoluteTableIdentifier identifier;
+
+  private FilterExecuter filterExecuter;
+
+  public StreamDataMap(CarbonTable carbonTable) {
+this.carbonTable = carbonTable;
+this.identifier = carbonTable.getAbsoluteTableIdentifier();
+  }
+
+  public void init(FilterResolverIntf filterExp) {
+if (filterExp != null) {
+
+  List minMaxCacheColumns = new ArrayList<>();
+  for (CarbonDimension dimension : carbonTable.getDimensions()) {
+if (!dimension.isComplex()) {
+  minMaxCacheColumns.add(dimension);
+}
+  }
+  minMaxCacheColumns.addAll(carbonTable.getMeasures());
+
+  List listOfColumns =
+  carbonTable.getTableInfo().getFactTable().getListOfColumns();
+  int[] columnCardinality = new int[listOfColumns.size()];
+  for (int index = 0; index < columnCardinality.length; index++) {
+columnCardinality[index] = Integer.MAX_VALUE;
+  }
+
+  SegmentProperties segmentProperties =
+  new SegmentProperties(listOfColumns, columnCardinality);
+
+  filterExecuter = FilterUtil.getFilterExecuterTree(
+  filterExp, segmentProperties, null, minMaxCacheColumns);
+}
+  }
+
+  public List prune(List segments) throws IOException 
{
+if (filterExecuter == null) {
+  return listAllStreamFiles(segments, false);
+} else {
+  List streamFileList = new ArrayList<>();
+  for (StreamFile streamFile : listAllStreamFiles(segments, true)) {
+if (isScanRequire(streamFile)) {
+  streamFileList.add(streamFile);
+  streamFile.setMinMaxIndex(null);
+}
+  }
+  return streamFileList;
+}
+  }
+
+  private boolean isScanRequire(StreamFile streamFile) {
+// backward compatibility, old stream file without min/max index
+if (streamFile.getMinMaxIndex() == null) {
+  return true;
+}
+
+byte[][] maxValue = streamFile.getMinMaxIndex().getMaxValues();
+byte[][] m

[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214313170
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala
 ---
@@ -205,8 +205,9 @@ class StreamHandoffRDD[K, V](
 segmentList.add(Segment.toSegment(handOffSegmentId, null))
 val splits = inputFormat.getSplitsOfStreaming(
   job,
-  
carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getAbsoluteTableIdentifier,
-  segmentList
+  segmentList,
+  carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+  null
--- End diff --

Once you add the overloaded method as explained in the above comment, you can 
call the method with 3 arguments from here


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214311953
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -342,60 +341,52 @@ public void refreshSegmentCacheIfRequired(JobContext 
job, CarbonTable carbonTabl
   /**
* use file list in .carbonindex file to get the split of streaming.
*/
-  public List getSplitsOfStreaming(JobContext job, 
AbsoluteTableIdentifier identifier,
-  List streamSegments) throws IOException {
+  public List getSplitsOfStreaming(JobContext job, 
List streamSegments,
+  CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) 
throws IOException {
 List splits = new ArrayList();
 if (streamSegments != null && !streamSegments.isEmpty()) {
   numStreamSegments = streamSegments.size();
   long minSize = Math.max(getFormatMinSplitSize(), 
getMinSplitSize(job));
   long maxSize = getMaxSplitSize(job);
-  for (Segment segment : streamSegments) {
-String segmentDir =
-CarbonTablePath.getSegmentPath(identifier.getTablePath(), 
segment.getSegmentNo());
-FileFactory.FileType fileType = 
FileFactory.getFileType(segmentDir);
-if (FileFactory.isFileExist(segmentDir, fileType)) {
-  SegmentIndexFileStore segmentIndexFileStore = new 
SegmentIndexFileStore();
-  segmentIndexFileStore.readAllIIndexOfSegment(segmentDir);
-  Map carbonIndexMap = 
segmentIndexFileStore.getCarbonIndexMap();
-  CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
-  for (byte[] fileData : carbonIndexMap.values()) {
-indexReader.openThriftReader(fileData);
-try {
-  // map block index
-  while (indexReader.hasNext()) {
-BlockIndex blockIndex = indexReader.readBlockIndexInfo();
-String filePath = segmentDir + File.separator + 
blockIndex.getFile_name();
-Path path = new Path(filePath);
-long length = blockIndex.getFile_size();
-if (length != 0) {
-  BlockLocation[] blkLocations;
-  FileSystem fs = FileFactory.getFileSystem(path);
-  FileStatus file = fs.getFileStatus(path);
-  blkLocations = fs.getFileBlockLocations(path, 0, length);
-  long blockSize = file.getBlockSize();
-  long splitSize = computeSplitSize(blockSize, minSize, 
maxSize);
-  long bytesRemaining = length;
-  while (((double) bytesRemaining) / splitSize > 1.1) {
-int blkIndex = getBlockIndex(blkLocations, length - 
bytesRemaining);
-splits.add(makeSplit(segment.getSegmentNo(), path, 
length - bytesRemaining,
-splitSize, blkLocations[blkIndex].getHosts(),
-blkLocations[blkIndex].getCachedHosts(), 
FileFormat.ROW_V1));
-bytesRemaining -= splitSize;
-  }
-  if (bytesRemaining != 0) {
-int blkIndex = getBlockIndex(blkLocations, length - 
bytesRemaining);
-splits.add(makeSplit(segment.getSegmentNo(), path, 
length - bytesRemaining,
-bytesRemaining, blkLocations[blkIndex].getHosts(),
-blkLocations[blkIndex].getCachedHosts(), 
FileFormat.ROW_V1));
-  }
-} else {
-  //Create empty hosts array for zero length files
-  splits.add(makeSplit(segment.getSegmentNo(), path, 0, 
length, new String[0],
-  FileFormat.ROW_V1));
-}
-  }
-} finally {
-  indexReader.closeThriftReader();
+
+  if (filterResolverIntf == null) {
+if (carbonTable != null) {
+  Expression filter = getFilterPredicates(job.getConfiguration());
+  if (filter != null) {
+carbonTable.processFilterExpression(filter, null, null);
+filterResolverIntf = carbonTable.resolveFilter(filter);
+  }
+}
+  }
+  StreamDataMap streamDataMap =
+  DataMapStoreManager.getInstance().getStreamDataMap(carbonTable);
+  streamDataMap.init(filterResolverIntf);
+  List streamFiles = streamDataMap.prune(streamSegments);
+  for (StreamFile streamFile : streamFiles) {
+if (FileFactory.isFileExist(streamFile.getFilePath())) {
+  Path path = new Path(streamFile.getFilePath());
+  long length = streamFile.

[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214311329
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -342,60 +341,52 @@ public void refreshSegmentCacheIfRequired(JobContext 
job, CarbonTable carbonTabl
   /**
* use file list in .carbonindex file to get the split of streaming.
*/
-  public List getSplitsOfStreaming(JobContext job, 
AbsoluteTableIdentifier identifier,
-  List streamSegments) throws IOException {
+  public List getSplitsOfStreaming(JobContext job, 
List streamSegments,
+  CarbonTable carbonTable, FilterResolverIntf filterResolverIntf) 
throws IOException {
--- End diff --

You can write an overloaded method for getSplitsOfStreaming. One which 
accepts 3 parameters and one with 4 parameters.
1.  getSplitsOfStreaming(JobContext job, AbsoluteTableIdentifier 
identifier,List streamSegments)
-- From this method you can call the other method and pass null as the 4th 
argument. This will avoid passing null at all the places above.
2. getSplitsOfStreaming(JobContext job, List streamSegments, 
CarbonTable carbonTable, FilterResolverIntf filterResolverIntf)
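
A rough sketch of the delegating overload (a sketch only: the parameter and 
generic types are assumed for illustration, and it is shown with CarbonTable 
rather than AbsoluteTableIdentifier because the 4-argument variant needs the 
table to resolve the filter):

    // sketch only: a 3-argument overload that keeps existing callers unchanged
    public List<InputSplit> getSplitsOfStreaming(JobContext job, List<Segment> streamSegments,
        CarbonTable carbonTable) throws IOException {
      // delegate to the 4-argument variant introduced in this PR with a null filter,
      // so call sites such as StreamHandoffRDD do not need to pass null themselves
      return getSplitsOfStreaming(job, streamSegments, carbonTable, null);
    }

The 4-argument variant from this PR then remains the single place where the 
filter-based pruning happens.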


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214305126
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/StreamDataMap.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datamap;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import 
org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import 
org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.reader.CarbonIndexFileReader;
+import org.apache.carbondata.core.scan.filter.FilterUtil;
+import org.apache.carbondata.core.scan.filter.executer.FilterExecuter;
+import 
org.apache.carbondata.core.scan.filter.executer.ImplicitColumnFilterExecutor;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.format.BlockIndex;
+
+@InterfaceAudience.Internal
+public class StreamDataMap {
--- End diff --

Please check the feasibility of extending the DataMap interface and 
implementing all its methods, to keep it similar to BlockDataMap. I think it 
should be feasible


---


[GitHub] carbondata pull request #2644: [CARBONDATA-2853] Implement file-level min/ma...

2018-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2644#discussion_r214316607
  
--- Diff: 
streaming/src/main/java/org/apache/carbondata/streaming/CarbonStreamRecordWriter.java
 ---
@@ -212,9 +271,13 @@ private void initializeAtFirstRow() throws 
IOException, InterruptedException {
 byte[] col = (byte[]) columnValue;
 output.writeShort(col.length);
 output.writeBytes(col);
+dimensionStatsCollectors[dimCount].update(col);
   } else {
 output.writeInt((int) columnValue);
+
dimensionStatsCollectors[dimCount].update(ByteUtil.toBytes((int) columnValue));
--- End diff --

For the min/max comparison you are converting from int to byte array for every 
row, which can impact write performance. Instead you can typecast to int and do 
the comparison on the primitive values. Once all the data is loaded, convert 
the final values into byte arrays based on the data type at the end, so that 
there is only one conversion for the final min/max values.
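
For illustration, a minimal sketch of the idea using a hypothetical collector 
class (not the actual dimensionStatsCollectors implementation): track min/max 
as primitives per row and convert to bytes only once at the end.

    // hypothetical helper, for illustration only
    class IntMinMaxCollector {
      private int min = Integer.MAX_VALUE;
      private int max = Integer.MIN_VALUE;

      void update(int value) {
        // per-row update works on primitives, no byte[] allocation
        if (value < min) min = value;
        if (value > max) max = value;
      }

      // single conversion when the final min/max is written out
      byte[] minAsBytes() { return ByteUtil.toBytes(min); }
      byte[] maxAsBytes() { return ByteUtil.toBytes(max); }
    }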


---


[GitHub] carbondata issue #2671: [CARBONDATA-2876]AVRO datatype support through SDK

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2671
  
LGTM...can be merged once build passes


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214244301
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +181,124 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case UNION:
+// Union type will be internally stored as Struct
+// Fill data object only if fieldvalue is instance of datatype
+// For other field datatypes, fill value as Null
+List unionFields = avroField.schema().getTypes();
+int notNullUnionFieldsCount = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+notNullUnionFieldsCount++;
+  }
+}
+Object[] values = new Object[notNullUnionFieldsCount];
+int j = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL) && 
checkFieldValueType(
+  unionField.getType(), fieldValue)) {
+values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
+break;
+  }
+  if (notNullUnionFieldsCount != 1) {
+j++;
+  }
+}
--- End diff --

Modify the code as below:

    int j = 0;
    for (Schema unionField : unionFields) {
      if (unionField.getType().equals(Schema.Type.NULL)) {
        continue;
      }
      if (checkFieldValueType(unionField.getType(), fieldValue)) {
        values[j] = avroFieldToObjectForUnionType(unionField, fieldValue, avroField);
        break;
      }
      j++;
    }


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214047186
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortParameters.java
 ---
@@ -88,6 +88,12 @@
 
   private DataType[] measureDataType;
 
+  private DataType[] noDicDataType;
+
+  private DataType[] noDicSortDataType;
+
+  private DataType[] noDicNoSortDataType;
--- End diff --

Write the usage of each type as a comment so that readers get a better 
understanding of why three separate arrays are needed.
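
For example, something along these lines (the descriptions are assumed for 
illustration, not taken from the PR):

    /** data types of all no-dictionary dimensions */
    private DataType[] noDicDataType;

    /** data types of the no-dictionary dimensions that are sort columns */
    private DataType[] noDicSortDataType;

    /** data types of the no-dictionary dimensions that are not sort columns */
    private DataType[] noDicNoSortDataType;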


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214044420
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -976,4 +978,122 @@ public static long 
getDataBasedOnRestructuredDataType(byte[] data, DataType rest
 return value;
   }
 
+  /**
+   * Check if the column is a no dictionary primitive column
+   *
+   * @param dataType
+   * @return
+   */
+  public static boolean isPrimitiveColumn(DataType dataType) {
+if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE || 
dataType == DataTypes.SHORT
+|| dataType == DataTypes.INT || dataType == DataTypes.LONG || 
DataTypes.isDecimal(dataType)
+|| dataType == DataTypes.FLOAT || dataType == DataTypes.DOUBLE
+|| dataType == DataTypes.BYTE_ARRAY) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Put the data to unsafe memory
+   *
+   * @param dataType
+   * @param data
+   * @param baseObject
+   * @param address
+   * @param size
+   * @param sizeInBytes
+   */
+  public static void putDataToUnsafe(DataType dataType, Object data, 
Object baseObject,
+  long address, int size, int sizeInBytes) {
+dataType = DataTypeUtil.valueOf(dataType.getName());
+if (dataType == DataTypes.BOOLEAN) {
+  CarbonUnsafe.getUnsafe().putBoolean(baseObject, address + size, 
(boolean) data);
+} else if (dataType == DataTypes.BYTE) {
+  CarbonUnsafe.getUnsafe().putByte(baseObject, address + size, (byte) 
data);
+} else if (dataType == DataTypes.SHORT) {
+  CarbonUnsafe.getUnsafe().putShort(baseObject, address + size, 
(short) data);
+} else if (dataType == DataTypes.INT) {
+  CarbonUnsafe.getUnsafe().putInt(baseObject, address + size, (int) 
data);
+} else if (dataType == DataTypes.LONG) {
+  CarbonUnsafe.getUnsafe().putLong(baseObject, address + size, (long) 
data);
+} else if (DataTypes.isDecimal(dataType) || dataType == 
DataTypes.DOUBLE) {
+  CarbonUnsafe.getUnsafe().putDouble(baseObject, address + size, 
(double) data);
+} else if (dataType == DataTypes.FLOAT) {
+  CarbonUnsafe.getUnsafe().putFloat(baseObject, address + size, 
(float) data);
+} else if (dataType == DataTypes.BYTE_ARRAY) {
+  CarbonUnsafe.getUnsafe()
+  .copyMemory(data, CarbonUnsafe.BYTE_ARRAY_OFFSET, baseObject, 
address + size,
+  sizeInBytes);
+}
+  }
+
+  /**
+   * Retrieve/Get the data from unsafe memory
+   *
+   * @param dataType
+   * @param baseObject
+   * @param address
+   * @param size
+   * @param sizeInBytes
+   * @return
+   */
+  public static Object getDataFromUnsafe(DataType dataType, Object 
baseObject, long address,
+  int size, int sizeInBytes) {
+dataType = DataTypeUtil.valueOf(dataType.getName());
+Object data = new Object();
+if (dataType == DataTypes.BOOLEAN) {
+  data = CarbonUnsafe.getUnsafe().getBoolean(baseObject, address + 
size);
+} else if (dataType == DataTypes.BYTE) {
+  data = CarbonUnsafe.getUnsafe().getByte(baseObject, address + size);
+} else if (dataType == DataTypes.SHORT) {
+  data = CarbonUnsafe.getUnsafe().getShort(baseObject, address + size);
+} else if (dataType == DataTypes.INT) {
+  data = CarbonUnsafe.getUnsafe().getInt(baseObject, address + size);
+} else if (dataType == DataTypes.LONG) {
+  data = CarbonUnsafe.getUnsafe().getLong(baseObject, address + size);
+} else if (DataTypes.isDecimal(dataType) || dataType == 
DataTypes.DOUBLE) {
+  data = CarbonUnsafe.getUnsafe().getDouble(baseObject, address + 
size);
+} else if (dataType == DataTypes.FLOAT) {
+  data = CarbonUnsafe.getUnsafe().getDouble(baseObject, address + 
size);
+} else if (dataType == DataTypes.BYTE_ARRAY) {
+  CarbonUnsafe.getUnsafe()
+  .copyMemory(baseObject, address + size, data, 
CarbonUnsafe.BYTE_ARRAY_OFFSET,
+  sizeInBytes);
+}
+return data;
+  }
+
+  /**
+   * Put the data to vector
+   *
+   * @param vector
+   * @param value
+   * @param vectorRow
+   * @param length
+   */
+  public static void putDataToVector(CarbonColumnVector vector, byte[] 
value, int vectorRow,
--- End diff --

Move this method to DimensionVectorDataProcessor.java


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214051061
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -450,6 +450,114 @@ public static boolean isHeaderValid(String tableName, 
String[] csvHeader,
 return type;
   }
 
+  /**
+   * Get the no dictionary data types on the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicDataTypes(String databaseName, String 
tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicCount++;
+  }
+}
+DataType[] type = new DataType[noDicCount];
+noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+type[noDicCount++] = dimensions.get(i).getDataType();
+  }
+}
+return type;
+  }
+
+  /**
+   * Get the no dictionary sort column mapping of the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static boolean[] getNoDicSortColMapping(String databaseName, 
String tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+List noDicIndexes = new ArrayList<>(dimensions.size());
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicIndexes.add(i);
+noDicCount++;
+  }
+}
+
+boolean[] noDicSortColMapping = new boolean[noDicCount];
+for (int i = 0; i < noDicSortColMapping.length; i++) {
+  if (dimensions.get(noDicIndexes.get(i)).isSortColumn()) {
+noDicSortColMapping[i] = true;
+  }
+}
+return noDicSortColMapping;
+  }
+
+  /**
+   * Get the data types of the no dictionary sort columns
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicSortDataTypes(String databaseName, 
String tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicSortCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY) && 
dimensions.get(i).isSortColumn()) {
+noDicSortCount++;
+  }
+}
+DataType[] type = new DataType[noDicSortCount];
+noDicSortCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY) && 
dimensions.get(i).isSortColumn()) {
+type[noDicSortCount++] = dimensions.get(i).getDataType();
+  }
+}
+return type;
+  }
+
+  /**
+   * Get the data types of the no dictionary no sort columns
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicNoSortDataTypes(String databaseName, 
String tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicNoSortCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY) && 
!dimensions.get(i)
+  .isSortColumn()) {
+noDicNoSortCount++;
+  }
+}
+DataType[] type = new DataType[noDicNoSortCount];
+noDicNoSortCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY) && 
!dimensions.get(i)
+  .isSortColumn()) {
+type[noDicNoSortCount++] = dimensions.get(i).getDataType();
+  }
+}
+return type;
--- End diff --

same comment as above


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214049718
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -450,6 +450,114 @@ public static boolean isHeaderValid(String tableName, 
String[] csvHeader,
 return type;
   }
 
+  /**
+   * Get the no dictionary data types on the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicDataTypes(String databaseName, String 
tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicCount++;
+  }
+}
+DataType[] type = new DataType[noDicCount];
+noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+type[noDicCount++] = dimensions.get(i).getDataType();
+  }
+}
+return type;
+  }
+
+  /**
+   * Get the no dictionary sort column mapping of the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static boolean[] getNoDicSortColMapping(String databaseName, 
String tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+List noDicIndexes = new ArrayList<>(dimensions.size());
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicIndexes.add(i);
+noDicCount++;
+  }
+}
+
+boolean[] noDicSortColMapping = new boolean[noDicCount];
+for (int i = 0; i < noDicSortColMapping.length; i++) {
+  if (dimensions.get(noDicIndexes.get(i)).isSortColumn()) {
+noDicSortColMapping[i] = true;
+  }
+}
+return noDicSortColMapping;
+  }
+
+  /**
+   * Get the data types of the no dictionary sort columns
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicSortDataTypes(String databaseName, 
String tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicSortCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY) && 
dimensions.get(i).isSortColumn()) {
+noDicSortCount++;
+  }
+}
+DataType[] type = new DataType[noDicSortCount];
+noDicSortCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY) && 
dimensions.get(i).isSortColumn()) {
+type[noDicSortCount++] = dimensions.get(i).getDataType();
+  }
--- End diff --

same comment as above. Check on the callers and merge all 3 methods into 
one if possible


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214055038
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
 ---
@@ -320,12 +325,22 @@ public static CarbonFactDataHandlerModel 
getCarbonFactDataHandlerModel(CarbonLoa
 List allDimensions = carbonTable.getDimensions();
 int dictDimCount = allDimensions.size() - 
segmentProperties.getNumberOfNoDictionaryDimension()
 - segmentProperties.getComplexDimensions().size();
+CarbonColumn[] noDicAndComplexColumns =
+new 
CarbonColumn[segmentProperties.getNumberOfNoDictionaryDimension() + 
segmentProperties
+.getComplexDimensions().size()];
+int noDic = 0;
--- End diff --

Remove this unused variable


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214038515
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -976,4 +978,122 @@ public static long 
getDataBasedOnRestructuredDataType(byte[] data, DataType rest
 return value;
   }
 
+  /**
+   * Check if the column is a no dictionary primitive column
+   *
+   * @param dataType
+   * @return
+   */
+  public static boolean isPrimitiveColumn(DataType dataType) {
+if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE || 
dataType == DataTypes.SHORT
+|| dataType == DataTypes.INT || dataType == DataTypes.LONG || 
DataTypes.isDecimal(dataType)
+|| dataType == DataTypes.FLOAT || dataType == DataTypes.DOUBLE
+|| dataType == DataTypes.BYTE_ARRAY) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Put the data to unsafe memory
+   *
+   * @param dataType
+   * @param data
+   * @param baseObject
+   * @param address
+   * @param size
+   * @param sizeInBytes
+   */
+  public static void putDataToUnsafe(DataType dataType, Object data, 
Object baseObject,
--- End diff --

Create a new class CarbonUnsafeUtil and move this method there


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214038624
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -976,4 +978,122 @@ public static long 
getDataBasedOnRestructuredDataType(byte[] data, DataType rest
 return value;
   }
 
+  /**
+   * Check if the column is a no dictionary primitive column
+   *
+   * @param dataType
+   * @return
+   */
+  public static boolean isPrimitiveColumn(DataType dataType) {
+if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE || 
dataType == DataTypes.SHORT
+|| dataType == DataTypes.INT || dataType == DataTypes.LONG || 
DataTypes.isDecimal(dataType)
+|| dataType == DataTypes.FLOAT || dataType == DataTypes.DOUBLE
+|| dataType == DataTypes.BYTE_ARRAY) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Put the data to unsafe memory
+   *
+   * @param dataType
+   * @param data
+   * @param baseObject
+   * @param address
+   * @param size
+   * @param sizeInBytes
+   */
+  public static void putDataToUnsafe(DataType dataType, Object data, 
Object baseObject,
+  long address, int size, int sizeInBytes) {
+dataType = DataTypeUtil.valueOf(dataType.getName());
+if (dataType == DataTypes.BOOLEAN) {
+  CarbonUnsafe.getUnsafe().putBoolean(baseObject, address + size, 
(boolean) data);
+} else if (dataType == DataTypes.BYTE) {
+  CarbonUnsafe.getUnsafe().putByte(baseObject, address + size, (byte) 
data);
+} else if (dataType == DataTypes.SHORT) {
+  CarbonUnsafe.getUnsafe().putShort(baseObject, address + size, 
(short) data);
+} else if (dataType == DataTypes.INT) {
+  CarbonUnsafe.getUnsafe().putInt(baseObject, address + size, (int) 
data);
+} else if (dataType == DataTypes.LONG) {
+  CarbonUnsafe.getUnsafe().putLong(baseObject, address + size, (long) 
data);
+} else if (DataTypes.isDecimal(dataType) || dataType == 
DataTypes.DOUBLE) {
+  CarbonUnsafe.getUnsafe().putDouble(baseObject, address + size, 
(double) data);
+} else if (dataType == DataTypes.FLOAT) {
+  CarbonUnsafe.getUnsafe().putFloat(baseObject, address + size, 
(float) data);
+} else if (dataType == DataTypes.BYTE_ARRAY) {
+  CarbonUnsafe.getUnsafe()
+  .copyMemory(data, CarbonUnsafe.BYTE_ARRAY_OFFSET, baseObject, 
address + size,
+  sizeInBytes);
+}
+  }
+
+  /**
+   * Retrieve/Get the data from unsafe memory
+   *
+   * @param dataType
+   * @param baseObject
+   * @param address
+   * @param size
+   * @param sizeInBytes
+   * @return
+   */
+  public static Object getDataFromUnsafe(DataType dataType, Object 
baseObject, long address,
--- End diff --

same comment as above


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214049002
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -450,6 +450,114 @@ public static boolean isHeaderValid(String tableName, 
String[] csvHeader,
 return type;
   }
 
+  /**
+   * Get the no dictionary data types on the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicDataTypes(String databaseName, String 
tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicCount++;
+  }
+}
+DataType[] type = new DataType[noDicCount];
+noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+type[noDicCount++] = dimensions.get(i).getDataType();
+  }
+}
--- End diff --

The second for loop can be removed and the data types can be filled in a single 
iteration by collecting them in an ArrayList; while returning, you can convert 
it back to an array.
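
A minimal single-pass sketch of that approach (generic types assumed for 
illustration):

    public static DataType[] getNoDicDataTypes(String databaseName, String tableName) {
      CarbonTable carbonTable =
          CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
      List<CarbonDimension> dimensions = carbonTable.getDimensionByTableName(tableName);
      List<DataType> noDicTypes = new ArrayList<>(dimensions.size());
      for (CarbonDimension dimension : dimensions) {
        if (!dimension.hasEncoding(Encoding.DICTIONARY)) {
          // collect in the same pass instead of counting first and filling later
          noDicTypes.add(dimension.getDataType());
        }
      }
      return noDicTypes.toArray(new DataType[noDicTypes.size()]);
    }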


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214049631
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -450,6 +450,114 @@ public static boolean isHeaderValid(String tableName, 
String[] csvHeader,
 return type;
   }
 
+  /**
+   * Get the no dictionary data types on the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static DataType[] getNoDicDataTypes(String databaseName, String 
tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicCount++;
+  }
+}
+DataType[] type = new DataType[noDicCount];
+noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+type[noDicCount++] = dimensions.get(i).getDataType();
+  }
+}
+return type;
+  }
+
+  /**
+   * Get the no dictionary sort column mapping of the table
+   *
+   * @param databaseName
+   * @param tableName
+   * @return
+   */
+  public static boolean[] getNoDicSortColMapping(String databaseName, 
String tableName) {
+CarbonTable carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(databaseName, tableName);
+List dimensions = 
carbonTable.getDimensionByTableName(tableName);
+List noDicIndexes = new ArrayList<>(dimensions.size());
+int noDicCount = 0;
+for (int i = 0; i < dimensions.size(); i++) {
+  if (!dimensions.get(i).hasEncoding(Encoding.DICTIONARY)) {
+noDicIndexes.add(i);
+noDicCount++;
+  }
+}
+
+boolean[] noDicSortColMapping = new boolean[noDicCount];
+for (int i = 0; i < noDicSortColMapping.length; i++) {
+  if (dimensions.get(noDicIndexes.get(i)).isSortColumn()) {
+noDicSortColMapping[i] = true;
+  }
--- End diff --

same comment as above


---


[GitHub] carbondata pull request #2654: [CARBONDATA-2896] Adaptive Encoding for Primi...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2654#discussion_r214055654
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
 ---
@@ -320,12 +325,22 @@ public static CarbonFactDataHandlerModel 
getCarbonFactDataHandlerModel(CarbonLoa
 List allDimensions = carbonTable.getDimensions();
 int dictDimCount = allDimensions.size() - 
segmentProperties.getNumberOfNoDictionaryDimension()
 - segmentProperties.getComplexDimensions().size();
+CarbonColumn[] noDicAndComplexColumns =
+new 
CarbonColumn[segmentProperties.getNumberOfNoDictionaryDimension() + 
segmentProperties
+.getComplexDimensions().size()];
+int noDic = 0;
+int noDicAndComp = 0;
 for (CarbonDimension dim : allDimensions) {
   if (!dim.isComplex() && !dim.hasEncoding(Encoding.DICTIONARY) &&
   dim.getDataType() == DataTypes.VARCHAR) {
 // ordinal is set in 
CarbonTable.fillDimensionsAndMeasuresForTables()
 varcharDimIdxInNoDict.add(dim.getOrdinal() - dictDimCount);
   }
+  if (!dim.hasEncoding(Encoding.DICTIONARY) || dim.isComplex() || null 
!= dim
+  .getComplexParentDimension()) {
+noDicAndComplexColumns[noDicAndComp++] =
--- End diff --

Modify the if condition to check only for dictionary encoding:
if (!dim.hasEncoding(Encoding.DICTIONARY))


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214024187
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -310,6 +503,31 @@ private static Field prepareFields(Schema.Field 
avroField) {
 } else {
   return null;
 }
+  case UNION:
+int i = 0;
+// Get union types and store as Struct
+ArrayList unionFields = new ArrayList<>();
+for (Schema avroSubField : avroField.schema().getTypes()) {
+  StructField unionField = prepareSubFields(avroField.name() + 
i++, avroSubField);
--- End diff --

check for NULL schema here


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214023293
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +181,126 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case UNION:
+// Union type will be internally stored as Struct
+// Fill data object only if fieldvalue is instance of datatype
+// For other field datatypes, fill value as Null
+List unionFields = avroField.schema().getTypes();
+int notNullUnionFieldsCount = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+notNullUnionFieldsCount++;
+  }
+}
+Object[] values = new Object[notNullUnionFieldsCount];
+int j = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+if (checkFieldValueType(unionField.getType(), fieldValue)) {
+  values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
+} else {
+  values[j] = null;
+}
+j++;
+  }
+}
+out = new StructObject(values);
+break;
+  default:
+out = avroPrimitiveFieldToObject(type, logicalType, fieldValue);
+}
+return out;
+  }
+
+  /**
+   * For Union type, fill data if Schema.Type is instance of fieldValue
+   * and return result
+   *
+   * @param type
+   * @param fieldValue
+   * @return
+   */
+  private boolean checkFieldValueType(Schema.Type type, Object fieldValue) 
{
+switch (type) {
+  case INT:
+return (fieldValue instanceof Integer);
+  case BOOLEAN:
+return (fieldValue instanceof Boolean);
+  case LONG:
+return (fieldValue instanceof Long);
+  case DOUBLE:
+return (fieldValue instanceof Double);
+  case STRING:
+return (fieldValue instanceof Utf8 || fieldValue instanceof 
String);
+  case FLOAT:
+return (fieldValue instanceof Float);
+  case RECORD:
+return (fieldValue instanceof GenericData.Record);
+  case ARRAY:
+return (fieldValue instanceof GenericData.Array || fieldValue 
instanceof ArrayList);
+  case BYTES:
+return (fieldValue instanceof ByteBuffer);
+  case MAP:
+return (fieldValue instanceof HashMap);
+  case ENUM:
+return (fieldValue instanceof GenericData.EnumSymbol);
+  default:
+return false;
+}
+  }
+
+  private Object avroPrimitiveFieldToObject(Schema.Type type, LogicalType 
logicalType,
+  Object fieldValue) {
+Object out;
+switch (type) {
+  case INT:
+if (logicalType != null) {
+  if (logicalType instanceof LogicalTypes.Date) {
+int dateIntValue = (int) fieldValue;
+out = dateIntValue * 
DateDirectDictionaryGenerator.MILLIS_PER_DAY;
+  } else {
+LOGGER.warn("Actual type: INT, Logical Type: " + 
logicalType.getName());
+out = fieldValue;
+  }
+} else {
+  out = fieldValue;
+}
+break;
+  case BOOLEAN:
+  case LONG:
+if (logicalType != null && !(logicalType instanceof 
LogicalTypes.TimestampMillis)) {
+  if (logicalType instanceof LogicalTypes.TimestampMicros) {
+long dateIntValue = (long) fieldValue;
+out = dateIntValue / 1000L;
+  } else {
+LOGGER.warn("Actual type: INT, Logical Type: " + 
logicalType.getName());
+out = fieldValue;
+  }
+} else {
+  out = fieldValue;
+}
+break;
+  case DOUBLE:
+  case STRING:
+  case ENUM:
+out = fieldValue;
+break;
+  case FLOAT:
+// direct conversion will change precision. So parse from string.
+// also carbon internally needs float as double
+out = Double.parseDouble(fieldValue.toString());
+break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new S

[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214024597
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -310,6 +503,31 @@ private static Field prepareFields(Schema.Field 
avroField) {
 } else {
   return null;
 }
+  case UNION:
+int i = 0;
+// Get union types and store as Struct
+ArrayList unionFields = new ArrayList<>();
+for (Schema avroSubField : avroField.schema().getTypes()) {
+  StructField unionField = prepareSubFields(avroField.name() + 
i++, avroSubField);
+  if (unionField != null) {
+unionFields.add(unionField);
+  }
+}
+if (unionFields.size() != 0) {
--- End diff --

replace the 'unionFields.size() != 0' check with '!unionFields.isEmpty()'


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214022747
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +181,126 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case UNION:
+// Union type will be internally stored as Struct
+// Fill data object only if fieldvalue is instance of datatype
+// For other field datatypes, fill value as Null
+List unionFields = avroField.schema().getTypes();
+int notNullUnionFieldsCount = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+notNullUnionFieldsCount++;
+  }
+}
+Object[] values = new Object[notNullUnionFieldsCount];
+int j = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+if (checkFieldValueType(unionField.getType(), fieldValue)) {
+  values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
+} else {
+  values[j] = null;
+}
--- End diff --

1. Remove the else block
2. Combine the above 2 if conditions into 1 using the && operator
3. Break the loop once the check succeeds


---


[GitHub] carbondata issue #2663: [CARBONDATA-2894] Add support for complex map type t...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2663
  
retest this please


---


[GitHub] carbondata issue #2671: [CARBONDATA-2876]AVRO datatype support through SDK

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2671
  
@Indhumathi27 ...please modify the code as per the comments, then we can 
continue with further review of the code


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213914935
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
--- End diff --

Remove this semi-colon


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213916136
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
+} else {
+  out = fieldValue;
+}
+break;
+  case UNION:
+List fieldsUnion = avroField.schema().getTypes();
+int countIfNotNull = 0;
+for (Schema unionField : fieldsUnion) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+countIfNotNull++;
+  }
+}
+Object[] values;
+values = new Object[countIfNotNull];
--- End diff --

1. Rename countIfNotNull to notNullUnionFieldsCount
2. Merge the above 2 lines into 'Object[] values = new Object[countIfNotNull];'
3. Check the union behavior when it contains only the NULL type and handle it 
if any special handling is required


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213915304
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
+} else {
+  out = fieldValue;
--- End diff --

The else case handling is not required as the Binary data type is not 
supported...so write a proper comment to say that only the decimal logical type 
is supported for the Avro BYTES data type. Once the binary data type is 
supported we can add the else block


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213922678
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
+} else {
+  out = fieldValue;
+}
+break;
+  case UNION:
+List fieldsUnion = avroField.schema().getTypes();
+int countIfNotNull = 0;
+for (Schema unionField : fieldsUnion) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+countIfNotNull++;
+  }
+}
+Object[] values;
+values = new Object[countIfNotNull];
+int j = 0;
+for (Schema unionField : fieldsUnion) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
--- End diff --

1. Try to reuse the code as much as possible. You can extract the primitive 
type computation into a separate method and override only the complex types, 
with different methods for union and for the rest of the data types.
2. Write a function for the union type to identify the schema for which the 
computation needs to be done. Do not run the computation for all the union 
types, as at any one time only one type of value will exist.


---


[GitHub] carbondata issue #2663: [CARBONDATA-2894] Add support for complex map type t...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2663
  
retest this please


---


[GitHub] carbondata issue #2649: [CARBONDATA-2869] Add support for Avro Map data type...

2018-08-29 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2649
  
@ravipesala ...fixed review comment. Kindly review and merge


---


[GitHub] carbondata pull request #2663: [CARBONDATA-2894] Add support for complex map...

2018-08-27 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2663

[CARBONDATA-2894] Add support for complex map type through spark carbon 
file format API

This PR supports loading and querying the complex map type through the spark 
carbon file format API.

**Note: This PR is dependent on PR #2649** 

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Added test cases   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
map_spark_carbon_file_support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2663.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2663


commit e67bd0cb485c4bed15ce8ac3ef3be9b3a4f3798e
Author: manishgupta88 
Date:   2018-08-20T04:59:29Z

Added support for Avro Map type using SDK

commit 6db7f2a0d7c02406e0ecc9aa7ac69e2ec2e540a6
Author: manishgupta88 
Date:   2018-08-27T13:47:21Z

Add support for complex map type using spark carbon file format API




---


[GitHub] carbondata issue #2651: [HOTFIX] Support TableProperties Map API for SDK

2018-08-24 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2651
  
LGTM...can be merged once build is passed


---


[GitHub] carbondata pull request #2651: [HOTFIX] Support TableProperties Map API for ...

2018-08-24 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2651#discussion_r212582796
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java 
---
@@ -272,6 +272,56 @@ public CarbonWriterBuilder withLoadOptions(Map options) {
 return this;
   }
 
+  /**
+   * To support the table properties for sdk writer
+   *
+   * @param options key,value pair of create table properties.
+   * supported keys values are
+   * a. blocksize -- [1-2048] values in MB. Default value is 1024
+   * b. blockletsize -- values in MB. Default value is 64 MB
+   * c. localDictionaryThreshold -- positive value, default is 1
+   * d. enableLocalDictionary -- true / false. Default is false
+   * e. sortcolumns -- comma separated column. "c1,c2". Default all 
dimensions are sorted.
+   *
+   * @return updated CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withTableProperties(Map 
options) {
+Objects.requireNonNull(options, "Table properties should not be null");
+//validate the options.
+if (options.size() > 5) {
+  throw new IllegalArgumentException("Supports only 5 options now. "
+  + "Refer method header or documentation");
+}
+
+for (String option: options.keySet()) {
+  if (!option.equalsIgnoreCase("blocksize") &&
+  !option.equalsIgnoreCase("blockletsize") &&
+  !option.equalsIgnoreCase("localDictionaryThreshold") &&
+  !option.equalsIgnoreCase("enableLocalDictionary") &&
+  !option.equalsIgnoreCase("sortcolumns")) {
+throw new IllegalArgumentException("Unsupported options. "
--- End diff --

Better to put all the allowed options in a Set and then check each property 
using the Set.contains API, which will be faster and gives cleaner code
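
A minimal sketch of that idea (the constant name and the lower-casing of the 
keys are assumptions; the option names come from the javadoc above, and 
java.util.Set/HashSet/Arrays are assumed to be imported):

    private static final Set<String> SUPPORTED_TABLE_PROPERTIES = new HashSet<>(Arrays.asList(
        "blocksize", "blockletsize", "localdictionarythreshold",
        "enablelocaldictionary", "sortcolumns"));

    // inside withTableProperties(Map<String, String> options)
    for (String option : options.keySet()) {
      if (!SUPPORTED_TABLE_PROPERTIES.contains(option.toLowerCase())) {
        throw new IllegalArgumentException(
            "Unsupported option: " + option + ". Refer method header or documentation");
      }
    }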


---


[GitHub] carbondata issue #2649: [WIP] Add support for Avro Map data type support for...

2018-08-23 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2649
  
@jackylk 
1. This PR is only for SDK support
2. Yes map is implemented as an array of struct. For more 
details you can check the design document

https://docs.google.com/document/d/1HHe2fdkIh3Jyz1y3494_2kGRSc4muTWuilAmwg5lpVw/edit?usp=sharing


---


[GitHub] carbondata pull request #2649: [WIP] Add support for Avro Map data type supp...

2018-08-22 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2649

[WIP] Add support for Avro Map data type support for SDK

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata sdk_map_support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2649.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2649


commit 40c678fe7b5892ccb6fe30b5b484fe612c1fda54
Author: manishgupta88 
Date:   2018-08-20T04:59:29Z

changes for supporting map complex type for SDK




---


[GitHub] carbondata pull request #2613: [HOTFIX] Modified code to fix the degrade in ...

2018-08-07 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2613#discussion_r208182587
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/iterator/RawResultIterator.java
 ---
@@ -53,153 +39,124 @@
*/
   private CarbonIterator detailRawQueryResultIterator;
 
-  private boolean prefetchEnabled;
-  private List currentBuffer;
-  private List backupBuffer;
-  private int currentIdxInBuffer;
-  private ExecutorService executorService;
-  private Future fetchFuture;
-  private Object[] currentRawRow = null;
-  private boolean isBackupFilled = false;
+  /**
+   * Counter to maintain the row counter.
+   */
+  private int counter = 0;
+
+  private Object[] currentConveretedRawRow = null;
+
+  /**
+   * LOGGER
+   */
+  private static final LogService LOGGER =
+  LogServiceFactory.getLogService(RawResultIterator.class.getName());
+
+  /**
+   * batch of the result.
+   */
+  private RowBatch batch;
 
   public RawResultIterator(CarbonIterator 
detailRawQueryResultIterator,
-  SegmentProperties sourceSegProperties, SegmentProperties 
destinationSegProperties,
-  boolean isStreamingHandOff) {
+  SegmentProperties sourceSegProperties, SegmentProperties 
destinationSegProperties) {
 this.detailRawQueryResultIterator = detailRawQueryResultIterator;
 this.sourceSegProperties = sourceSegProperties;
 this.destinationSegProperties = destinationSegProperties;
-this.executorService = Executors.newFixedThreadPool(1);
-
-if (!isStreamingHandOff) {
-  init();
-}
   }
 
-  private void init() {
-this.prefetchEnabled = CarbonProperties.getInstance().getProperty(
-CarbonCommonConstants.CARBON_COMPACTION_PREFETCH_ENABLE,
-
CarbonCommonConstants.CARBON_COMPACTION_PREFETCH_ENABLE_DEFAULT).equalsIgnoreCase("true");
-try {
-  new RowsFetcher(false).call();
-  if (prefetchEnabled) {
-this.fetchFuture = executorService.submit(new RowsFetcher(true));
-  }
-} catch (Exception e) {
-  LOGGER.error(e, "Error occurs while fetching records");
-  throw new RuntimeException(e);
-}
-  }
+  @Override public boolean hasNext() {
 
-  /**
-   * fetch rows
-   */
-  private final class RowsFetcher implements Callable {
-private boolean isBackupFilling;
-
-private RowsFetcher(boolean isBackupFilling) {
-  this.isBackupFilling = isBackupFilling;
-}
-
-@Override
-public Void call() throws Exception {
-  if (isBackupFilling) {
-backupBuffer = fetchRows();
-isBackupFilled = true;
+if (null == batch || checkIfBatchIsProcessedCompletely(batch)) {
+  if (detailRawQueryResultIterator.hasNext()) {
+batch = null;
+batch = detailRawQueryResultIterator.next();
+counter = 0; // batch changed so reset the counter.
   } else {
-currentBuffer = fetchRows();
+return false;
   }
-  return null;
 }
-  }
 
-  private List fetchRows() {
-if (detailRawQueryResultIterator.hasNext()) {
-  return detailRawQueryResultIterator.next().getRows();
+if (!checkIfBatchIsProcessedCompletely(batch)) {
+  return true;
 } else {
-  return new ArrayList<>();
+  return false;
 }
   }
 
-  private void fillDataFromPrefetch() {
-try {
-  if (currentIdxInBuffer >= currentBuffer.size() && 0 != 
currentIdxInBuffer) {
-if (prefetchEnabled) {
-  if (!isBackupFilled) {
-fetchFuture.get();
-  }
-  // copy backup buffer to current buffer and fill backup buffer 
asyn
-  currentIdxInBuffer = 0;
-  currentBuffer = backupBuffer;
-  isBackupFilled = false;
-  fetchFuture = executorService.submit(new RowsFetcher(true));
-} else {
-  currentIdxInBuffer = 0;
-  new RowsFetcher(false).call();
+  @Override public Object[] next() {
--- End diff --

ok


---


[GitHub] carbondata pull request #2613: [HOTFIX] Modified code to fix the degrade in ...

2018-08-07 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2613

[HOTFIX] Modified code to fix the degrade in compaction performance

Problem
Compaction performance for 3.5 billion records degraded by 16-20%

Analysis:
Code modification in RawResultIterator.java has caused the problem, wherein a 
few extra checks are performed compared to the previous code

Fix
Revert the changes in the RawResultIterator class

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 NA
 - [ ] Document update required?
No
 - [ ] Testing done
Verified in cluster testing   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata revert_pr_2133

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2613


commit 1481759abb7e8728bca95a557ce610f6116a2652
Author: m00258959 
Date:   2018-08-07T05:24:16Z

Problem
Compaction performance for 3.5 billion degraded by 16-20%

Analysis:
Code modification in RawResultIerator.java has caused the problem wherein 
few extra checks are performed as compared to previous code

Fix
Revert the changes in RawResultIerator class




---


[GitHub] carbondata issue #2608: [CARBONDATA-2829][CARBONDATA-2832] Fix creating merg...

2018-08-06 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2608
  
LGTM


---


[GitHub] carbondata issue #2601: [CARBONDATA-2804][DataMap] fix the bug when bloom fi...

2018-08-03 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2601
  
LGTM


---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2601#discussion_r207224591
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion 
getFormatVersion(CarbonTable carbonTable)
   }
   storePath = 
carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo());
 }
-
-CarbonFile[] carbonFiles = FileFactory
-.getCarbonFile(storePath)
-.listFiles(new CarbonFileFilter() {
-  @Override
-  public boolean accept(CarbonFile file) {
-if (file == null) {
-  return false;
-}
-return file.getName().endsWith("carbondata");
-  }
-});
-if (carbonFiles == null || carbonFiles.length < 1) {
-  return CarbonProperties.getInstance().getFormatVersion();
+// get the carbon index file header
+FileFactory.FileType fileType = FileFactory.getFileType(storePath);
+ColumnarFormatVersion version = null;
+if (FileFactory.isFileExist(storePath, fileType)) {
+  SegmentIndexFileStore fileStore = new SegmentIndexFileStore();
+  fileStore.readAllIIndexOfSegment(storePath);
+  Map carbonIndexMap = fileStore.getCarbonIndexMap();
+  if (carbonIndexMap.size() == 0) {
+version = CarbonProperties.getInstance().getFormatVersion();
+  }
+  CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
+  for (byte[] fileData : carbonIndexMap.values()) {
+indexReader.openThriftReader(fileData);
+IndexHeader indexHeader = indexReader.readIndexHeader();
+version = 
ColumnarFormatVersion.valueOf((short)indexHeader.getVersion());
+break;
--- End diff --

Once reading is complete, close the indexReader using a try/finally block (`indexReader.closeThriftReader()`); a sketch is below.
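
A minimal sketch of that suggestion, reusing the names from the diff above (this is not the final implementation, just the suggested try/finally shape):

```java
// Sketch only: close the thrift reader even if reading the header fails,
// so its resources are not leaked. The names (version, carbonIndexMap)
// come from the surrounding method shown in the diff.
CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
for (byte[] fileData : carbonIndexMap.values()) {
  try {
    indexReader.openThriftReader(fileData);
    IndexHeader indexHeader = indexReader.readIndexHeader();
    version = ColumnarFormatVersion.valueOf((short) indexHeader.getVersion());
  } finally {
    indexReader.closeThriftReader();
  }
  // only the first readable index file is needed to determine the version
  break;
}
```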


---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2601#discussion_r207223659
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion 
getFormatVersion(CarbonTable carbonTable)
   }
   storePath = 
carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo());
 }
-
-CarbonFile[] carbonFiles = FileFactory
-.getCarbonFile(storePath)
-.listFiles(new CarbonFileFilter() {
-  @Override
-  public boolean accept(CarbonFile file) {
-if (file == null) {
-  return false;
-}
-return file.getName().endsWith("carbondata");
-  }
-});
-if (carbonFiles == null || carbonFiles.length < 1) {
-  return CarbonProperties.getInstance().getFormatVersion();
+// get the carbon index file header
+FileFactory.FileType fileType = FileFactory.getFileType(storePath);
+ColumnarFormatVersion version = null;
+if (FileFactory.isFileExist(storePath, fileType)) {
--- End diff --

Rename storePath to segmentPath


---


[GitHub] carbondata pull request #2601: [CARBONDATA-2804][DataMap] fix the bug when b...

2018-08-02 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2601#discussion_r207225748
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -3212,28 +3213,27 @@ public static ColumnarFormatVersion 
getFormatVersion(CarbonTable carbonTable)
   }
   storePath = 
carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo());
 }
-
-CarbonFile[] carbonFiles = FileFactory
-.getCarbonFile(storePath)
-.listFiles(new CarbonFileFilter() {
-  @Override
-  public boolean accept(CarbonFile file) {
-if (file == null) {
-  return false;
-}
-return file.getName().endsWith("carbondata");
-  }
-});
-if (carbonFiles == null || carbonFiles.length < 1) {
-  return CarbonProperties.getInstance().getFormatVersion();
+// get the carbon index file header
+FileFactory.FileType fileType = FileFactory.getFileType(storePath);
+ColumnarFormatVersion version = null;
+if (FileFactory.isFileExist(storePath, fileType)) {
+  SegmentIndexFileStore fileStore = new SegmentIndexFileStore();
+  fileStore.readAllIIndexOfSegment(storePath);
+  Map<String, byte[]> carbonIndexMap = fileStore.getCarbonIndexMap();
+  if (carbonIndexMap.size() == 0) {
+version = CarbonProperties.getInstance().getFormatVersion();
+  }
+  CarbonIndexFileReader indexReader = new CarbonIndexFileReader();
+  for (byte[] fileData : carbonIndexMap.values()) {
+indexReader.openThriftReader(fileData);
+IndexHeader indexHeader = indexReader.readIndexHeader();
+version = 
ColumnarFormatVersion.valueOf((short)indexHeader.getVersion());
+break;
+  }
+} else {
+  version = CarbonProperties.getInstance().getFormatVersion();
--- End diff --

1. This else condition is not required because, for a valid segment, its path will exist in the file system.
2. If the path does not exist at all, then read the version info from the next valid segment and log a warning that the valid segment path does not exist in the system (see the sketch below).
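
A rough sketch of point 2, assuming `validSegments` is the list already computed above; `readVersionFromIndexFiles` is a hypothetical helper wrapping the index-header reading shown in the diff, and `LOGGER` is assumed to be the class logger:

```java
// Sketch only: fall back to the next valid segment instead of silently
// returning the default format version when one segment path is missing.
ColumnarFormatVersion version = null;
for (Segment segment : validSegments) {
  String segmentPath = carbonTable.getSegmentPath(segment.getSegmentNo());
  if (FileFactory.isFileExist(segmentPath, FileFactory.getFileType(segmentPath))) {
    version = readVersionFromIndexFiles(segmentPath);
    if (version != null) {
      break;
    }
  } else {
    LOGGER.warn("Valid segment path does not exist in the file system: " + segmentPath);
  }
}
if (version == null) {
  // no segment yielded a version, fall back to the configured default
  version = CarbonProperties.getInstance().getFormatVersion();
}
```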


---


[GitHub] carbondata issue #2600: [CARBONDATA-2813] Fixed code to get data size from L...

2018-08-02 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2600
  
LGTM


---


[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....

2018-08-01 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2595
  
@xuchuanyin Usually in production scenarios the driver memory will be less 
than the executor memory. Now we are using unsafe memory for caching the 
block/blocklet dataMap in the driver. Currently the unsafe memory configured 
for the executor is also getting used for the driver, which is not a good idea.
Therefore it is required to separate out the driver and executor unsafe memory.
You can observe the same in the Spark configuration: Spark provides different 
parameters for configuring the driver and executor memory overhead to control 
unsafe memory usage, namely
spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead (see 
the illustration below).
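
For reference, a hedged illustration of how those two Spark parameters are typically set (values are purely illustrative):

```
# spark-defaults.conf (illustrative values only)
spark.yarn.driver.memoryOverhead    1024
spark.yarn.executor.memoryOverhead  2048
```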


---


[GitHub] carbondata pull request #2595: [Documentation] [Unsafe Configuration] Added ...

2018-08-01 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2595

[Documentation] [Unsafe Configuration] Added 
carbon.unsafe.driver.working.memory.in.mb parameter to differentiate between 
driver and executor unsafe memory

Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate 
between driver and executor unsafe memory
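
A hedged sketch of how the two sides could then be tuned independently in 
carbon.properties (values are illustrative; `carbon.unsafe.working.memory.in.mb` 
is assumed to be the existing executor-side property):

```
# carbon.properties (illustrative values)
# existing property: unsafe working memory on the executor side
carbon.unsafe.working.memory.in.mb=1024
# property added by this PR: unsafe working memory on the driver side
carbon.unsafe.driver.working.memory.in.mb=512
```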

 - [ ] Any interfaces changed?
No 
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
Yes. Updated
 - [ ] Testing done
Verified manually   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
unsafe_driver_property

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2595


commit 8196be0a50d9d0f38452cfdf5fbe0b0485973e87
Author: manishgupta88 
Date:   2018-08-01T14:08:30Z

Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate 
between driver and executor unsafe memory




---


[GitHub] carbondata pull request #2593: [CARBONDATA-2753][Compatibility] Merge Index ...

2018-08-01 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2593

[CARBONDATA-2753][Compatibility] Merge Index file not getting created with 
blocklet information for old store

**Problem**
Merge Index file not getting created with blocklet information for old store

**Analysis**
In the legacy store (store version <= 1.1), blocklet information is not written 
in the carbon index files. When the merge index is created using the Alter DDL 
command on an old store, the merge index file should be created with blocklet 
information, as per the new store. This is not happening because the flag to 
read the carbondata file footer is not passed as true from the Alter DDL 
command flow.

**Fix**
Pass the flag to read carbondataFileFooter as true while creating the merge 
Index file using Alter DDL command

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Manually verified   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
merge_index_compatibility

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2593


commit 4af316a7cebd0f05a08c4f4519302961eccbcb1f
Author: manishgupta88 
Date:   2018-08-01T08:54:52Z

Problem
Merge Index file not getting created with blocklet information for old store

Analysis
In the legacy store (store version <= 1.1), blocklet information is not written 
in the carbon index files. When the merge index is created using the Alter DDL 
command on an old store, the merge index file should be created with blocklet 
information, as per the new store. This is not happening because the flag to 
read the carbondata file footer is not passed as true from the Alter DDL 
command flow.

Fix
Pass the flag to read carbondataFileFooter as true while creating the merge 
Index file using Alter DDL command




---


[GitHub] carbondata issue #2585: [CARBONDATA-2805]fix the ordering mismatch of segmen...

2018-08-01 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2585
  
LGTM


---

