[GitHub] incubator-carbondata pull request #296: [CARBONDATA-382]Like Filter Query Op...

2016-11-04 Thread sujith71955
GitHub user sujith71955 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/296

[CARBONDATA-382]Like Filter Query Optimization for Dictionary Columns

**Like Filter Query Optimization for Dictionary Columns**
a) Added a pushdown mechanism for the LIKE filters startsWith, endsWith and
contains, so that these filters are processed in the Carbon layer itself.
b) This mechanism can significantly improve the performance of LIKE filter
queries on dictionary columns, since block- and blocklet-level pruning is
done in the Carbon layer before the filters are applied to the dictionary
columns.
c) Since three new expressions have been added in the Carbon layer, Carbon
applies the expression only once per startsWith/endsWith/contains filter
query, so the dictionary lookup for the LIKE expression also happens only
once.
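
For illustration, here is a rough sketch of how such a mapping could look
(the expression class names are placeholders for the three new Carbon
expressions, not necessarily the actual class names):

    // Hypothetical sketch: map the three LIKE shapes to dedicated Carbon
    // expressions so block/blocklet pruning happens before the dictionary
    // lookup, and each query needs only one dictionary pass.
    Expression pushDownLike(String columnName, String likePattern) {
      String value = likePattern.replace("%", "");
      if (likePattern.endsWith("%") && !likePattern.startsWith("%")) {
        return new StartsWithExpression(columnName, value);   // LIKE 'abc%'
      } else if (likePattern.startsWith("%") && !likePattern.endsWith("%")) {
        return new EndsWithExpression(columnName, value);     // LIKE '%abc'
      }
      return new ContainsExpression(columnName, value);       // LIKE '%abc%'
    }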


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/incubator-carbondata 
master_filterstartswith

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #296


commit 608906a25afc2bbabecfdd3c7bfa1351e83de9a0
Author: sujith71955 <sujithchacko.2...@gmail.com>
Date:   2016-11-04T17:36:18Z

[CARBONDATA-382]Like Filter Query Optimization for Dictionary Columns




---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim property fo...

2016-11-01 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957010
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/graphgenerator/GraphGenerator.java
 ---
@@ -998,4 +1000,24 @@ private void prepareIsUseInvertedIndex(List<CarbonDimension> dims,
 graphConfig.setIsUseInvertedIndex(
 isUseInvertedIndexList.toArray(new Boolean[isUseInvertedIndexList.size()]));
   }
+
+  /**
+   * Preparing the boolean [] to map whether the dimension use trim or not.
+   *
+   * @param dims
+   * @param graphConfig
+   */
+  private void prepareIsUseTrim(List<CarbonDimension> dims,
+GraphConfigurationInfo graphConfig) {
+List<Boolean> isUseTrimList = new ArrayList<Boolean>();
+for (CarbonDimension dimension : dims) {
+  if (dimension.isUseTrim()) {
--- End diff --

Can we add this trim option as a property of the column, i.e. inside the
column properties, rather than setting it directly as a CarbonDimension
field? CarbonColumn already has a property map that holds column-related
properties; I think we can use that. Please check the feasibility.
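
A rough sketch of what that could look like (the key name and accessor are
assumptions, not the actual API):

    // Hypothetical sketch: keep the trim flag in the existing
    // column-properties map instead of adding a field to CarbonDimension.
    Map<String, String> columnProperties = carbonColumn.getColumnProperties();
    columnProperties.put("use_trim", "true");

    // Wherever the flag is consulted later:
    boolean useTrim = columnProperties != null
        && Boolean.parseBoolean(columnProperties.get("use_trim"));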


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim property fo...

2016-11-01 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957803
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -472,6 +475,7 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
   }
 }
+<<<<<<< HEAD
--- End diff --

Does this file have an unresolved merge conflict?


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim property fo...

2016-11-01 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957411
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java
 ---
@@ -1694,5 +1699,19 @@ public void setTableOption(String tableOption) {
   public TableOptionWrapper getTableOptionWrapper() {
 return tableOptionWrapper;
   }
+
+  public String getIsUseTrim() {
+return isUseTrim;
+  }
+
+  public void setIsUseTrim(Boolean[] isUseTrim) {
+for (Boolean flag: isUseTrim) {
+  if (flag) {
+this.isUseTrim += "T";
--- End diff --

Use TRUE/FALSE for better readability.


---


[GitHub] incubator-carbondata pull request #246: [CARBONDATA-321] If user changes the...

2016-10-17 Thread sujith71955
GitHub user sujith71955 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/246

[CARBONDATA-321] If user changes the blocklet size the queries will b…

**Problem:**
If the user changes the blocklet size (e.g. to 50), queries fail.
Currently the byte-size calculation in the number compressor is:
x = int array length * bit length
byteSize = x/8, plus one more byte when x%8 != 0
When the cardinality is 50 the bit length is 6, so for an index array of
length 3 the byte size is 3 (18 bits rounded up), but the reverse
calculation of the int-array length from that byte size comes out as 4:
int arrayLength = (3*8)/6 = 4
**Solution:**
We need to round the bit length up to the nearest value divisible by 8.
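
A minimal sketch of the proposed rounding (the helper name is illustrative):

    // Round the bit length up to the next multiple of 8 so that
    // bits -> bytes -> bits is lossless and the reverse calculation
    // recovers the original int-array length.
    private static int roundToByteBoundary(int bitLength) {
      return ((bitLength + 7) / 8) * 8;   // e.g. 6 -> 8, 16 -> 16
    }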



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/incubator-carbondata 
master_BlockletIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #246


commit e1d91197b4173625a1cb8c469cec95290bb93185
Author: sujith71955 <sujithchacko.2...@gmail.com>
Date:   2016-10-17T11:12:16Z

[CARBONDATA-321] If the user changes the blocklet size, queries fail; this
happens because the inverted-index compression is not handled properly.




---


[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...

2016-10-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/194#discussion_r83547082
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -1426,4 +1423,25 @@ private static void 
getUnknownExpressionsList(Expression expression,
   getUnknownExpressionsList(child, lst);
 }
   }
+  /**
+   * This method compares double values. It preserves -0.0 and 0.0
+   * equality as per ==, and also preserves the NaN equality check as per
+   * java.lang.Double.equals().
+   *
+   * @param d1 double value for equality check
+   * @param d2 double value for equality check
+   * @return int result of comparing the two double values.
+   */
+  public static int compare(Double d1, Double d2) {
+if ((d1.doubleValue() == d2.doubleValue()) || (Double.isNaN(d1) && Double.isNaN(d2))) {
+  return 0;
+}
+if (d1 < d2) {
--- End diff --

We can save one condition with an else; that's fine, I will update it as
per your comments. Thanks for reviewing.


---


[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...

2016-10-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/194#discussion_r83546963
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -1426,4 +1423,25 @@ private static void 
getUnknownExpressionsList(Expression expression,
   getUnknownExpressionsList(child, lst);
 }
   }
+  /**
+   * This method compares double values. It preserves -0.0 and 0.0
+   * equality as per ==, and also preserves the NaN equality check as per
+   * java.lang.Double.equals().
+   *
+   * @param d1 double value for equality check
+   * @param d2 double value for equality check
+   * @return int result of comparing the two double values.
+   */
+  public static int compare(Double d1, Double d2) {
+if ((d1.doubleValue() == d2.doubleValue()) || (Double.isNaN(d1) && Double.isNaN(d2))) {
+  return 0;
+}
+if (d1 < d2) {
--- End diff --

Since we return as soon as any condition matches, I think if vs. if/else
makes no difference in this context.
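
For reference, a hedged completion of the truncated hunk above, assuming
the remainder follows the usual compare contract:

    public static int compare(Double d1, Double d2) {
      // 0.0 == -0.0 here (unlike Double.compareTo), and NaN equals NaN,
      // as per java.lang.Double.equals()
      if ((d1.doubleValue() == d2.doubleValue())
          || (Double.isNaN(d1) && Double.isNaN(d2))) {
        return 0;
      }
      if (d1 < d2) {
        return -1;
      }
      return 1;
    }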


---


[GitHub] incubator-carbondata pull request #241: [CARBONDATA-319] Bad Records logging...

2016-10-15 Thread sujith71955
GitHub user sujith71955 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/241

[CARBONDATA-319] Bad Records logging for column data type is not proper

**Problem:**
Bad-records logging for a column's data type is not correct in the case of
long: while creating the table metadata, the Carbon system uses BIGINT
instead of LONG and internally converts BIGINT to the long type for
processing. While processing the data, if any long-typed value has an
issue, it is logged into the bad records with data type "Long", which is
wrong since, per the metadata, the column's data type is BIGINT.
**Solution:**
Pass a name for each data type ENUM constant in its constructor; when
logging a bad record, get the column data type's name from the ENUM
constant. For LONG the name will be BIGINT.
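
A minimal sketch of that solution (enum and member names are assumed):

    public enum DataType {
      INT("INT"),
      LONG("BIGINT"),   // logged as BIGINT to match the table metadata
      DOUBLE("DOUBLE");

      private final String name;

      DataType(String name) {
        this.name = name;
      }

      public String getName() {
        return name;
      }
    }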


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/incubator-carbondata 
master_BadRecord_Issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #241


commit 2c08ecdbb50e57afcea26d25d80fe2b7f8208310
Author: sujith71955 <sujithchacko.2...@gmail.com>
Date:   2016-10-15T15:39:27Z

[CARBONDATA-319]
Bad-records logging for a column's data type is not correct in the case of
long: the Carbon system uses BIGINT instead of LONG while creating the
table metadata and internally converts BIGINT to long for processing. If
any long-typed value has an issue while processing the data, it is logged
into the bad records with data type "Long", which is wrong since, per the
metadata, the column's data type is BIGINT.




---


[GitHub] incubator-carbondata pull request #224: [CARBONDATA-239]Add scan_blocklet_nu...

2016-10-12 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/224#discussion_r82983904
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/scanner/impl/FilterScanner.java 
---
@@ -78,10 +80,11 @@ public FilterScanner(BlockExecutionInfo blockExecutionInfo) {
* @throws QueryExecutionException
* @throws FilterUnsupportedException
*/
-  @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder blocksChunkHolder)
+  @Override public AbstractScannedResult scanBlocklet(BlocksChunkHolder blocksChunkHolder,
+  QueryStatisticsModel queryStatisticsModel)
   throws QueryExecutionException {
 try {
-  fillScannedResult(blocksChunkHolder);
+  fillScannedResult(blocksChunkHolder, queryStatisticsModel);
--- End diff --

Pass the model in the constructor so that there is no need to change all
the APIs.
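
A rough sketch of the suggestion (the field name is assumed):

    // Inject the statistics model once so scanBlocklet(...) can keep its
    // original single-argument signature.
    private final QueryStatisticsModel queryStatisticsModel;

    public FilterScanner(BlockExecutionInfo blockExecutionInfo,
        QueryStatisticsModel queryStatisticsModel) {
      this.queryStatisticsModel = queryStatisticsModel;
      // ... existing initialization using blockExecutionInfo ...
    }

scanBlocklet can then call fillScannedResult(blocksChunkHolder,
queryStatisticsModel) using the stored field, without the extra parameter.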


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

The pro of this approach: suppose in one load the user loads dirty data and
then realizes it needs trimming; in the next load he can simply enable the
option and load the data. However, this will also increase the dictionary
space (untrimmed values from earlier loads remain in the dictionary), and
the dictionary-lookup overhead in queries will increase as well.


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968468
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

Guys, I think if we read from configuration during data loading whether to
trim or not, we need to do the same while processing filters: decide based
on the configuration value.
For example: while loading I enabled the trim property, so the system trims
and loads the data; now in a filter query, if the user provides whitespace,
it also needs to be trimmed while creating the filter model. This provides
more system consistency. Conversely, if the user does not enable trim, we
won't trim either while loading or while querying.
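
A small sketch of that consistency rule (the helper name is assumed):

    // Apply the same normalization on both paths: CSV values at load time
    // and literal values when building the filter model.
    private static String normalize(String value, boolean trimEnabled) {
      return trimEnabled ? value.trim() : value;
    }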


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82758567
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

You need to handle the same in CarbonFilters.scala as well, since the
spaces are not trimmed while processing filter expressions; that path also
needs to be taken care of as part of this feature.


---


[GitHub] incubator-carbondata pull request #148: [CARBONDATA-233] bad record logger s...

2016-10-07 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/148#discussion_r82339778
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java
 ---
@@ -89,41 +95,55 @@ public static String hasBadRecord(String key) {
 return badRecordEntry.remove(key);
   }
 
-  public void addBadRecordsToBilder(Object[] row, int size, String reason, String valueComparer) {
+  public void addBadRecordsToBuilder(Object[] row, int size, String reason, String valueComparer,
+  boolean badRecordsLogRedirect, boolean badRecordLoggerEnable) {
 StringBuilder logStrings = new StringBuilder();
+size = row.length;
--- End diff --

Better to validate whether we need to log the bad record at the start of
the method, and only then continue with the log-string creation logic.
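
A sketch of the suggested early exit (parameter list taken from the diff
above):

    public void addBadRecordsToBuilder(Object[] row, int size, String reason,
        String valueComparer, boolean badRecordsLogRedirect, boolean badRecordLoggerEnable) {
      if (!badRecordLoggerEnable && !badRecordsLogRedirect) {
        return;   // nothing to log or redirect, skip building the log string
      }
      StringBuilder logStrings = new StringBuilder();
      // ... existing log-string creation logic ...
    }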


---


[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-09-30 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/189#discussion_r81310297
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -419,6 +420,7 @@ class TableNewProcessor(cm: tableModel, sqlContext: 
SQLContext) {
 schemaEvol
   .setSchemaEvolutionEntryList(new 
util.ArrayList[SchemaEvolutionEntry]())
 tableSchema.setTableId(UUID.randomUUID().toString)
+
tableSchema.setBlocksize(Integer.parseInt(cm.tableBlockSize.getOrElse(0).toString))
--- End diff --

What if the user provides a value such as 1024M or 1MB? You will get an
exception, and that case has not been handled. I think we need to handle
it.
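
An illustrative way to accept unit suffixes (the helper name and accepted
units are assumptions; the real property may define its own grammar):

    private static int parseBlockSizeInMb(String value) {
      String v = value.trim().toUpperCase();
      if (v.endsWith("MB")) {
        v = v.substring(0, v.length() - 2);
      } else if (v.endsWith("M")) {
        v = v.substring(0, v.length() - 1);
      }
      return Integer.parseInt(v.trim());   // still throws on truly bad input
    }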


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-09-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281079
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

The use case is not clear to me; can you make it clearer by providing more
details?


---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-09-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281050
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

Why do we need to set this in the schema model?


---


[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-09-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280420
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/metadata/schema/table/TableSchema.java
 ---
@@ -102,6 +107,14 @@ public void setSchemaEvalution(SchemaEvolution 
schemaEvalution) {
 this.schemaEvalution = schemaEvalution;
   }
 
+  public int getBlockszie() {
+return blockszie;
+  }
+
+  public void setBlockszie(int blockszie) {
+this.blockszie = blockszie;
--- End diff --

Provide a proper variable name: 'blocksize'.


---


[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-09-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/189#discussion_r81280323
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
@@ -546,6 +548,18 @@ class CarbonSqlParser()
 colGrpNames.toString()
   }
 
+  protected def getTableBlockSize(tableProperties: Map[String, String]): Integer = {
+var tableBlockSize: Integer = 0
+if (tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).isDefined) {
+  val blockSizeStr: String = tableProperties.get(CarbonCommonConstants.TABLE_BLOCKSIZE).get
+  try {
+tableBlockSize = Integer.parseInt(blockSizeStr)
+  } catch {
+case e: NumberFormatException => tableBlockSize = 0
--- End diff --

Better to assign a named default value, so that the system is always in a
consistent state.
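
A minimal sketch of that fallback, in Java (the default constant is an
assumption, not an existing CarbonCommonConstants field):

    int tableBlockSize;
    try {
      tableBlockSize = Integer.parseInt(blockSizeStr);
    } catch (NumberFormatException e) {
      tableBlockSize = DEFAULT_TABLE_BLOCK_SIZE;   // named default instead of 0
    }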


---


[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...

2016-09-23 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/194#discussion_r80221578
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -1401,8 +1401,7 @@ public static void logError(Throwable e, boolean 
invalidRowsPresent) {
   public static boolean nanSafeEqualsDoubles(Double d1, Double d2) {
 Boolean xIsNan = Double.isNaN(d1);
 Boolean yIsNan = Double.isNaN(d2);
-if ((xIsNan && yIsNan) || (d1.doubleValue() == d2.doubleValue())) {
-
+if ((d1.doubleValue() == d2.doubleValue()) || (xIsNan && yIsNan)) {
--- End diff --

Yes Ramana, that's right; I will fix it.


---


[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...

2016-09-17 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79291710
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -287,12 +307,18 @@ private Integer compareDateValues(Long loadValue, 
Long userValue) {
 }
 
   } else {
-LOG.error("Unable to acquire the metadata lock");
+String errorMsg = "Delete segment by id is failed for " + 
tableDetails
++ ". Not able to acquire the delete segment lock due to other 
operation running "
--- End diff --

LGTM


---


[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...

2016-09-17 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79291381
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -287,12 +307,18 @@ private Integer compareDateValues(Long loadValue, 
Long userValue) {
 }
 
   } else {
-LOG.error("Unable to acquire the metadata lock");
+String errorMsg = "Delete segment by id is failed for " + 
tableDetails
++ ". Not able to acquire the delete segment lock due to other 
operation running "
--- End diff --

Update the message and provide the operation name.


---


[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...

2016-09-17 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79291346
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/locks/LockUsage.java ---
@@ -28,5 +28,7 @@
   public static String COMPACTION_LOCK = "compaction.lock";
   public static String SYSTEMLEVEL_COMPACTION_LOCK = 
"system_level_compaction.lock";
   public static String TABLE_STATUS_LOCK = "tablestatus.lock";
+  public static String DELETE_SEGMENT_LOCK = "delete_segment.lock";
--- End diff --

Make these final, or use an enum.
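
A hedged sketch of that suggestion (lock-file values taken from the diff
above):

    public enum LockUsage {
      COMPACTION_LOCK("compaction.lock"),
      SYSTEMLEVEL_COMPACTION_LOCK("system_level_compaction.lock"),
      TABLE_STATUS_LOCK("tablestatus.lock"),
      DELETE_SEGMENT_LOCK("delete_segment.lock");

      private final String lockFile;

      LockUsage(String lockFile) {
        this.lockFile = lockFile;
      }

      public String getLockFile() {
        return lockFile;
      }
    }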


---


[GitHub] incubator-carbondata pull request #163: Problem:Column heading was missing i...

2016-09-17 Thread sujith71955
GitHub user sujith71955 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/163

Problem:Column heading was missing in driver statistics table and sca…

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---

…n block time was always zero in query statistics table.

Analysis: While creating the driver statistics table, the header was not
created, and the scan block time was not initialized before being recorded
in the statistics recorder.

Fix: One more line is added to print the column header in the driver
statistics table, and the scan block time is initialized before recording
the statistics.

Impact: Logs of all queries.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/incubator-carbondata 
statistics_recorder_issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/163.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #163


commit 665978e937603fd85c55a64c656655705ca20f0d
Author: akashrn5 <akash.r.nilu...@huawei.com>
Date:   2016-09-17T07:03:31Z

Problem: The column heading was missing in the driver statistics table and
the scan block time was always zero in the query statistics table.

Analysis: While creating the driver statistics table, the header was not
created, and the scan block time was not initialized before being recorded
in the statistics recorder.

Fix: One more line is added to print the column header in the driver
statistics table, and the scan block time is initialized before recording
the statistics.

Impact: Logs of all queries.




---


[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...

2016-09-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79204842
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -306,13 +330,18 @@ private Integer compareDateValues(Long loadValue, 
Long userValue) {
* @return
*/
   public List updateDeletionStatus(String loadDate, String tableFolderPath,
-  Long loadStartTime) {
-ICarbonLock carbonLock = CarbonLockFactory
-
.getCarbonLockObj(absoluteTableIdentifier.getCarbonTableIdentifier(),
-LockUsage.METADATA_LOCK);
+  Long loadStartTime) throws Exception {
+CarbonTableIdentifier carbonTableIdentifier =
+absoluteTableIdentifier.getCarbonTableIdentifier();
+ICarbonLock carbonMetadataLock =
+CarbonLockFactory.getCarbonLockObj(carbonTableIdentifier, 
LockUsage.METADATA_LOCK);
+ICarbonLock carbonTableStatusLock =
+CarbonLockFactory.getCarbonLockObj(carbonTableIdentifier, 
LockUsage.TABLE_STATUS_LOCK);
+String tableDetails =
+carbonTableIdentifier.getDatabaseName() + "." + 
carbonTableIdentifier.getTableName();
 List invalidLoadTimestamps = new ArrayList(0);
 try {
-  if (carbonLock.lockWithRetries()) {
+  if (carbonMetadataLock.lockWithRetries()) {
--- End diff --

Make this an operational log.


---


[GitHub] incubator-carbondata pull request #156: [CARBONDATA-244] Load and delete seg...

2016-09-16 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79187047
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1077,19 +1077,38 @@ object CarbonDataRDDFactory extends Logging {
   val segmentStatusManager = new 
SegmentStatusManager(table.getAbsoluteTableIdentifier)
   val details = segmentStatusManager
 .readLoadMetadata(loadMetadataFilePath)
+  val carbonTableStatusLock = CarbonLockFactory
+
.getCarbonLockObj(table.getAbsoluteTableIdentifier.getCarbonTableIdentifier,
+  LockUsage.TABLE_STATUS_LOCK)
 
   // Delete marked loads
   val isUpdationRequired = DeleteLoadFolders
 .deleteLoadFoldersFromFileSystem(carbonLoadModel, 
hdfsStoreLocation,
   partitioner.partitionCount, isForceDeletion, details)
 
   if (isUpdationRequired) {
+try {
 // Update load metadate file after cleaning deleted nodes
-CarbonLoaderUtil.writeLoadMetadata(
-  carbonLoadModel.getCarbonDataLoadSchema,
-  carbonLoadModel.getDatabaseName,
-  carbonLoadModel.getTableName, details.toList.asJava
-)
+if (carbonTableStatusLock.lockWithRetries()) {
+  logger.info("Table status lock has been successfully acquired.")
+  CarbonLoaderUtil.writeLoadMetadata(
+carbonLoadModel.getCarbonDataLoadSchema,
+carbonLoadModel.getDatabaseName,
+carbonLoadModel.getTableName, details.toList.asJava
+  )
+}
+else {
+  val errorMsg = "Clean files request is failed for " + 
carbonLoadModel.getDatabaseName +
+ "." + carbonLoadModel.getTableName +
+ ". Not able to acquire the table status lock."
--- End diff --

Also mention "due to other operation running in the background"; this
message will give the user a clearer picture of the task failure.


---


[GitHub] incubator-carbondata pull request #159: [Issue Number] CARBONDATA-242

2016-09-16 Thread sujith71955
GitHub user sujith71955 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/159

[Issue Number] CARBONDATA-242

[Problem] The filter result was not compatible with the Hive result when a
null filter member is present in a NOT IN filter model; as per Hive, no
result shall be returned if a NOT IN filter model has a null object for
comparison.
eg: select country from t3 where country not in (null,'china','france')
group by country
[Description] When the user provides a NULL member inside a NOT IN filter
condition, the result set is not compatible with the Hive result.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sujith71955/incubator-carbondata 
master_StartsWithIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #159


commit 8709af0cebc07e8ac25f40f4a31dfdc95b4eb5e6
Author: sujith71955 <sujithchacko.2...@gmail.com>
Date:   2016-09-15T09:23:04Z

[Issue Number] CARBONDATA-242
[Description] When the user provides a NULL member inside a NOT IN filter
condition, the result set is not compatible with the Hive result.




---


[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...

2016-09-02 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/122#discussion_r77335816
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -791,16 +819,23 @@ object GlobalDictionaryUtil extends Logging {
   if (requireDimension.nonEmpty) {
 val model = createDictionaryLoadModel(carbonLoadModel, table, 
requireDimension,
   hdfsLocation, dictfolderPath, false)
+// check if dictionary files contains bad record
+val accumulator = sqlContext.sparkContext.accumulator(0)
 // read local dictionary file, and group by key
 val allDictionaryRdd = readAllDictionaryFiles(sqlContext, 
headers,
-  requireColumnNames, allDictionaryPath)
+  requireColumnNames, allDictionaryPath, accumulator)
 // read exist dictionary and combine
 val inputRDD = new 
CarbonAllDictionaryCombineRDD(allDictionaryRdd, model)
   .partitionBy(new 
ColumnPartitioner(model.primDimensions.length))
 // generate global dictionary files
 val statusList = new 
CarbonGlobalDictionaryGenerateRDD(inputRDD, model).collect()
 // check result status
 checkStatus(carbonLoadModel, sqlContext, model, statusList)
+// if the dictionary contains wrong format record, throw ex
+if (accumulator.value > 0) {
+  throw new DataLoadingException("Data Loading failure, the 
dictionary file " +
--- End diff --

I think the term "content" would be too generic; we could say "dictionary
values" instead. What's your opinion? It is better to use a more meaningful
term related to that particular file, like "dictionary values".


---


[GitHub] incubator-carbondata pull request #122: [CARBONDATA-202] Handled exception t...

2016-09-02 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/122#discussion_r77332668
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -588,30 +588,59 @@ object GlobalDictionaryUtil extends Logging {
  allDictionaryPath: String) = {
 var allDictionaryRdd: RDD[(String, Iterable[String])] = null
 try {
-  // read local dictionary file, and spilt (columnIndex, columnValue)
-  val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
-.map(x => {
+  // parse record and validate record
+  def parseRecord(x: String, accum: Accumulator[Int]): (String, String) = {
 val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
-if (tokens.size != 2) {
-  logError("Read a bad dictionary record: " + x)
-}
-var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
+var columnName: String = ""
 var value: String = ""
-try {
-  columnName = csvFileColumns(tokens(0).toInt)
-  value = tokens(1)
-} catch {
-  case ex: Exception =>
-logError("Reset bad dictionary record as default value")
+// such as "," , "", throw ex
+if (tokens.size == 0) {
+  logError("Read a bad dictionary record: " + x)
+  accum += 1
+} else if (tokens.size == 1) {
+  // such as "1", "jone", throw ex
+  if (x.contains(",") == false) {
+accum += 1
+  } else {
+try {
+  columnName = csvFileColumns(tokens(0).toInt)
+} catch {
+  case ex: Exception =>
+logError("Read a bad dictionary record: " + x)
+accum += 1
+}
+  }
+} else {
+  try {
+columnName = csvFileColumns(tokens(0).toInt)
+value = tokens(1)
+  } catch {
+case ex: Exception =>
+  logError("Read a bad dictionary record: " + x)
+  accum += 1
+  }
 }
 (columnName, value)
-  })
+  }
 
+  val accumulator = sqlContext.sparkContext.accumulator(0)
+  // read local dictionary file, and spilt (columnIndex, columnValue)
+  val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
+.map(x => parseRecord(x, accumulator)).persist()
+  // for accumulator updates performed inside actions only
+  basicRdd.count()
--- End diff --

Again we are calling an action here, which is not necessary.


---


[GitHub] incubator-carbondata pull request #105: [CARBONDATA-189] Drop database casca...

2016-08-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/105#discussion_r76657713
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
@@ -1342,4 +1345,9 @@ class CarbonSqlParser()
   }
 }
 
+  protected lazy val dropDatabaseCascade: Parser[LogicalPlan] =
+DROP ~> (DATABASE|SCHEMA) ~> opt(IF ~> EXISTS) ~> ident ~> CASCADE <~ 
opt(";") ^^ {
+  case cascade => throw new MalformedCarbonCommandException(
+  "Unsupported cascade operation in drop database command")
--- End diff --

Since the system supports both DATABASE and SCHEMA, it is better to provide
a message that covers both, like "Unsupported cascade operation in drop
database/schema command"; otherwise, provide "Unsupported cascade operation
in drop command".


---


[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-27 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/100#discussion_r76519992
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
 // filepath regex, look like "/path/*.dictionary"
 if (filePath.getName.startsWith("*")) {
   val dictExt = filePath.getName.substring(1)
-  val listFiles = filePath.getParentFile.listFiles()
-  if (listFiles.exists(file =>
-file.getName.endsWith(dictExt) && file.getSize > 0)) {
-true
+  if (filePath.getParentFile.exists()) {
+val listFiles = filePath.getParentFile.listFiles()
--- End diff --

What if filePath.getParentFile() returns null?


---


[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-27 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/100#discussion_r76519971
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -591,29 +591,31 @@ object GlobalDictionaryUtil extends Logging {
   val basicRdd = sqlContext.sparkContext.textFile(allDictionaryPath)
 .map(x => {
 val tokens = x.split("" + CSVWriter.DEFAULT_SEPARATOR)
-var index: Int = 0
+if (tokens.size != 2) {
+  logError("[ALL_DICTIONARY] Read a bad dictionary record: " + x)
+}
+var columnName: String = CarbonCommonConstants.DEFAULT_COLUMN_NAME
 var value: String = ""
 try {
-  index = tokens(0).toInt
+  columnName = csvFileColumns(tokens(0).toInt)
   value = tokens(1)
 } catch {
   case ex: Exception =>
-logError("read a bad dictionary record" + x)
+logError("[ALL_DICTIONARY] Reset bad dictionary record as 
default value")
 }
-(index, value)
+(columnName, value)
   })
+
   // group by column index, and filter required columns
   val requireColumnsList = requireColumns.toList
   allDictionaryRdd = basicRdd
 .groupByKey()
-.map(x => (csvFileColumns(x._1), x._2))
 .filter(x => requireColumnsList.contains(x._1))
 } catch {
   case ex: Exception =>
-logError("read local dictionary files failed")
+logError("[ALL_DICTIONARY] Read dictionary files failed. Caused 
by" + ex.getMessage)
--- End diff --

Is the ALL_DICTIONARY prefix really required in this context?


---