Re: About measure in carbon

2016-12-05 Thread manishgupta88
Hi Ravi,

Currently we are using the Hive parser to parse the carbon create table DDL. Hive
does not expose the short and byte datatypes under those names; instead it
supports them as smallInt (16-bit) and tinyInt (8-bit). Carbon currently does
not support these datatypes.

Do we need to handle these datatypes in carbon? If yes, I can create a
JIRA issue to track this feature.
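For reference, the bit widths behind Hive's names map directly onto Java's primitive ranges. A quick illustration (plain Java, not Carbon code):

```java
// tinyInt is an 8-bit signed byte, smallInt a 16-bit signed short,
// matching Java's Byte and Short ranges exactly.
public class HiveIntWidths {
    public static void main(String[] args) {
        System.out.println("tinyInt:  " + Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);   // -128 .. 127
        System.out.println("smallInt: " + Short.MIN_VALUE + " .. " + Short.MAX_VALUE); // -32768 .. 32767
    }
}
```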

Regards
Manish Gupta





[GitHub] incubator-carbondata pull request #292: [CARBONDATA-375] Dictionary cache no...

2016-11-03 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/292

[CARBONDATA-375] Dictionary cache not getting cleared after task completion 
in dictionary decoder

Problem: Dictionary cache not getting cleared after task completion in 
dictionary decoder

Analysis: Currently the LRU cache eviction policy is based on the dictionary access
count: for the cache to remove an entry, its access count must be 0. In the dictionary
decoder, after the conversion of surrogate keys to actual values, the access count for
the query's dictionary columns is not decremented, so when an LRU cache size is
configured those entries are never cleared from memory.

Fix: Add a task completion listener that clears the dictionary on both task 
success and failure.

Impact area: LRU cache eviction policy; the stale entries can lead to query and data 
load failures.
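A minimal sketch of the eviction rule described above (illustrative names, not CarbonData's actual cache classes): an entry is only evictable once its access count drops back to 0, which is why a task-completion listener must decrement the count once the decoder is done.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy access-counted cache: access() pins an entry, release() (the
// task-completion listener's job) unpins it, and evict() refuses to
// remove a pinned entry -- mirroring the bug when release() never runs.
public class CountedLruCache {
    private final Map<String, Object> values = new LinkedHashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    public void put(String key, Object value) {
        values.put(key, value);
        counts.putIfAbsent(key, 0);
    }

    // Query-side access: increments the count so the entry is pinned.
    public Object access(String key) {
        counts.merge(key, 1, Integer::sum);
        return values.get(key);
    }

    // What the task-completion listener does: release the pin.
    public void release(String key) {
        counts.merge(key, -1, Integer::sum);
    }

    // Eviction may only remove entries whose access count is 0.
    public boolean evict(String key) {
        if (counts.getOrDefault(key, 0) > 0) {
            return false; // still pinned: without release(), never evicted
        }
        values.remove(key);
        counts.remove(key);
        return true;
    }

    public static void main(String[] args) {
        CountedLruCache cache = new CountedLruCache();
        cache.put("dictCol", new Object());
        cache.access("dictCol");                     // query pins the entry
        System.out.println(cache.evict("dictCol"));  // false: still pinned
        cache.release("dictCol");                    // listener fires on task end
        System.out.println(cache.evict("dictCol"));  // true: count back to 0
    }
}
```

Without the release() call (the missing decrement this PR fixes), evict() keeps returning false and the dictionary stays in memory.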

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata 
dictionary_decoder_clear_dictionary

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #292


commit b305f34e1014267b3706c287cef7070189fc3c28
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2016-11-03T15:48:03Z





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #234: [CARBONDATA-315] Data loading fails ...

2016-10-13 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/234

[CARBONDATA-315] Data loading fails if parsing a double value returns 
infinity

Problem: Data loading fails if parsing a double value returns infinity

Analysis: During data load, if a value is too big for a double DataType column, 
parsing it as a double returns "Infinity". Because of this, the min and max 
calculation for measures in the carbon data writer step throws an exception.

Fix: If parsing a double value yields Infinity or NaN, make the value null and 
add the row to the bad records.

Impact area: Data loads that contain values which cannot be parsed for their datatype.
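The behavior the fix relies on can be shown in isolation: Java's double parsing does not throw on overflow, it silently returns Infinity, so a guard is needed. A minimal sketch (the method name is illustrative, not Carbon's):

```java
// Out-of-range doubles parse to Infinity rather than throwing; the guard
// converts Infinity/NaN to null so the row can be routed to bad records.
public class DoubleParseGuard {
    static Double parseOrNull(String s) {
        double d = Double.parseDouble(s);
        return (Double.isInfinite(d) || Double.isNaN(d)) ? null : d;
    }

    public static void main(String[] args) {
        System.out.println(Double.parseDouble("1e999")); // Infinity
        System.out.println(parseOrNull("1e999"));        // null
        System.out.println(parseOrNull("3.14"));         // 3.14
    }
}
```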

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata 
double_value_range_failure

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #234


commit f7225f974828edd8b340f88fbfaa2f60d8a7d582
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2016-10-13T09:47:52Z







[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-09-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/189#discussion_r81299063
  
--- Diff: format/src/main/thrift/schema.thrift ---
@@ -124,6 +124,7 @@ struct TableSchema{
1: required string table_id;  // ID used to
2: required list table_columns; // Columns in the table
3: required SchemaEvolution schema_evolution; // History of schema 
evolution of this table
+   4: optional i32 block_size
--- End diff --

@Zhangshunyu ...if you register the property in the hive metastore, the 
property will never be lost, even on restart of the thrift server, just as we do for 
tablepath...you can check the create table flow.




[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-09-29 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/189#discussion_r81279628
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -196,11 +197,22 @@ public AbstractFactDataWriter(String storeLocation, 
int measureCount, int mdKeyL
 blockIndexInfoList = new ArrayList<>();
 // get max file size;
 CarbonProperties propInstance = CarbonProperties.getInstance();
-this.fileSizeInBytes = Long.parseLong(propInstance
-.getProperty(CarbonCommonConstants.MAX_FILE_SIZE,
-CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL))
-* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
-* CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+// If blocksize is defined by create table ddl, then use this value 
otherwise
+// use system level property
+if (blocksize >= 
CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL_MIN_VAL &&
+blocksize <= 
CarbonCommonConstants.MAX_FILE_SIZE_DEFAULT_VAL_MAX_VAL) {
+  LOGGER.audit("The max file size is set in table level. ");
+  this.fileSizeInBytes = blocksize * 
CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR
+  * CarbonCommonConstants.BYTE_TO_KB_CONVERSION_FACTOR * 1L;
+} else {
+  LOGGER.audit("The max file size in create table ddl is not set or is 
invalid, " +
+  "so here use " + CarbonCommonConstants.MAX_FILE_SIZE + "to set. 
");
--- End diff --

@Zhangshunyu ...instead of validating the user-specified block size here, I am 
thinking we can validate it at the time of creating the table, and in the 
error message we can display the allowed range of block sizes so that the user is 
not confused.
If the user does not specify the block size, consider the default value and 
update it in the schema.
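A hypothetical create-time validation along the lines suggested above (the helper name, default, and the 1..2048 MB range are illustrative, not Carbon's actual constants):

```java
// Validate TABLE_BLOCKSIZE once at create-table time; the returned value
// (user value or default) is what would be stored in the schema.
public class BlockSizeValidator {
    static final int MIN_MB = 1;        // illustrative bounds, not Carbon's
    static final int MAX_MB = 2048;
    static final int DEFAULT_MB = 1024; // illustrative default

    public static int validate(String userValue) {
        if (userValue == null) {
            return DEFAULT_MB; // unspecified: default goes into the schema
        }
        int size = Integer.parseInt(userValue.trim());
        if (size < MIN_MB || size > MAX_MB) {
            // error message states the allowed range, as suggested
            throw new IllegalArgumentException(
                "Invalid table block size " + size + ": allowed range is "
                + MIN_MB + " to " + MAX_MB + " MB");
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(validate(null));  // 1024
        System.out.println(validate("256")); // 256
    }
}
```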




[GitHub] incubator-carbondata pull request #180: [CARBONDATA-260] Equal or lesser val...

2016-09-20 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/180

[CARBONDATA-260] Equal or lesser value of MAXCOLUMNS option than column 
count in CSV header results into array index of bound exception

Problem: Equal or lesser value of MAXCOLUMNS option than column count in 
CSV header results into array index of bound exception

Analysis: If the column count in the CSV header is greater than or equal to the 
MAXCOLUMNS option value, an array index out of bound exception is thrown by the 
Univocity CSV parser. This is because, while parsing a row, the parser writes each 
value into an array, increments the index, and then performs one more array access 
with the incremented index, which causes the out-of-bound access. Code snippet from 
the CSV parser attached below.

public void valueParsed() {
this.parsedValues[column++] = appender.getAndReset();
this.appender = appenders[column];
}

e.g. In the above code, if the column count is 7, valid array indices are 0-6; when 
column is 6 entering the first line, the second line throws 
ArrayIndexOutOfBoundsException because column has been incremented to 7.

Fix: Whenever the column count in the CSV header is greater than or equal to the 
MAXCOLUMNS option value (or the default value), increment it by 1.

Impact: Data load flow
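The off-by-one can be reproduced in isolation (a standalone sketch, not the Univocity parser itself): after writing the last value, `column` equals the array length, so the follow-up read overflows unless the arrays get one extra slot.

```java
// Mirrors valueParsed(): write a value, then read appenders[column] with
// the already-incremented index. With exactly MAXCOLUMNS slots the final
// read overflows; one extra slot (the fix) avoids it.
public class MaxColumnsOffByOne {
    public static String parseRow(int maxColumns, int actualColumns) {
        String[] parsedValues = new String[maxColumns];
        String[] appenders = new String[maxColumns];
        int column = 0;
        try {
            for (int i = 0; i < actualColumns; i++) {
                parsedValues[column++] = "v" + i;
                String next = appenders[column]; // follow-up access, as in valueParsed()
            }
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "AIOOBE";
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRow(3, 3)); // column count == MAXCOLUMNS -> AIOOBE
        System.out.println(parseRow(4, 3)); // MAXCOLUMNS + 1 (the fix) -> ok
    }
}
```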


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata 
maxcolumns_array_indexOfBound

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #180


commit 3f32424e55615c8e45470d5169b817f9f703dc3e
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2016-09-20T14:21:33Z







[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...

2016-09-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/161#discussion_r79282917
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/Compactor.scala
 ---
@@ -69,7 +69,9 @@ object Compactor {
   schemaName,
   factTableName,
   validSegments,
-  
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId
+  
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId,
+  colCardinality = Array[Int](0),
--- End diff --

Instead of creating an empty array and reassigning it, you can declare a 
reference of Array[Int] type, like:
var colCardinality: Array[Int] = null




[GitHub] incubator-carbondata pull request #156: [WIP] Added table Status lock while ...

2016-09-16 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79111960
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -306,13 +330,17 @@ private Integer compareDateValues(Long loadValue, 
Long userValue) {
* @return
*/
   public List updateDeletionStatus(String loadDate, String 
tableFolderPath,
-  Long loadStartTime) {
-ICarbonLock carbonLock = CarbonLockFactory
+  Long loadStartTime, String dbName, String tableName) throws 
Exception {
--- End diff --

Same comment as above...take the database and table name from the carbon table 
identifier instance.




[GitHub] incubator-carbondata pull request #156: [WIP] Added table Status lock while ...

2016-09-16 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/156#discussion_r79111804
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -246,15 +247,22 @@ private Integer compareDateValues(Long loadValue, 
Long userValue) {
* updates deletion status
* @param loadIds
* @param tableFolderPath
+   * @param dbName
+   * @param tableName
* @return
*/
-  public List updateDeletionStatus(List loadIds, String 
tableFolderPath) {
-ICarbonLock carbonLock = CarbonLockFactory
+  public List updateDeletionStatus(List loadIds, String 
tableFolderPath,
+  String dbName, String tableName) throws Exception {
--- End diff --

I think there is no need to change the method signature here...the database name and 
table name can be taken from the carbon table identifier, and the carbonTableIdentifier 
object is already available in the method.




[GitHub] incubator-carbondata pull request #148: [CARBONDATA-233] bad record logger s...

2016-09-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/148#discussion_r78314030
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -1133,6 +1149,9 @@ else if(isComplexTypeColumn[j]) {
   
.getDirectDictionaryGenerator(details.getColumnType());
   surrogateKeyForHrrchy[0] =
   
directDictionaryGenerator1.generateDirectSurrogateKey(tuple);
+  if (badRecordsLoggerEnable && surrogateKeyForHrrchy[0] == 1) 
{
+surrogateKeyForHrrchy[0] = -1;
--- End diff --

Replace -1 and 1 with proper identifiers from CarbonCommonConstants to give 
them proper meaning and improve code readability.
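A small sketch of the suggestion. The constant names below are illustrative stand-ins (the real identifiers would live in CarbonCommonConstants):

```java
// Named constants instead of the magic values 1 and -1, so the intent of
// the bad-record check reads directly from the code.
public class SurrogateKeyConstants {
    public static final int MEMBER_DEFAULT_VAL_SURROGATE_KEY = 1;  // illustrative name
    public static final int INVALID_SURROGATE_KEY = -1;            // illustrative name

    public static int markBadRecord(int surrogateKey, boolean badRecordsLoggerEnable) {
        if (badRecordsLoggerEnable && surrogateKey == MEMBER_DEFAULT_VAL_SURROGATE_KEY) {
            return INVALID_SURROGATE_KEY; // flag the value for bad-record handling
        }
        return surrogateKey;
    }

    public static void main(String[] args) {
        System.out.println(markBadRecord(1, true)); // -1
    }
}
```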




[GitHub] incubator-carbondata pull request #124: [CARBONDATA-125] Remove usage of cur...

2016-09-02 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/124

[CARBONDATA-125] Remove usage of currentRestructureNumber from the code

This PR is mainly to improve code quality of Carbon which includes removal 
of an unwanted feature.

Problem: Code contains currentRestructureNumber variable which is not used 
in the code now

Impact area: Data load flow

Fix: Remove the usage of currentRestructureNumber variable everywhere in 
the code

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata 
remove_current_restructure_number

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #124


commit fdb52d7f774acc89cfc27b830ed851926b71bd7e
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2016-09-02T10:36:05Z







[GitHub] incubator-carbondata pull request #107: [CARBONDATA-191]Fix bug when quote c...

2016-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/107#discussion_r76787331
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithSingleQuotechar.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.dataload
+
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * Test Class for data loading when there is single quote in fact data
+ *
+ */
+class TestLoadDataWithSingleQuotechar extends QueryTest with 
BeforeAndAfterAll {
+  override def beforeAll {
+sql("DROP TABLE IF EXISTS carbontable")
+sql(
+  "CREATE TABLE carbontable (id Int, name String) STORED BY 
'carbondata'")
+  }
+
+  test("test data loading with single quote char") {
+try {
+  sql(
+"LOAD DATA LOCAL INPATH 
'./src/test/resources/dataWithSingleQuote.csv' INTO TABLE " +
+  "carbontable OPTIONS('DELIMITER'= ',')")
+  checkAnswer(
+sql("SELECT * from carbontable"),
+Seq(Row("Tom",1),
+  Row("Tony\n3,Lily",2),
+  Row("Games\"",4),
+  Row("prival\"\n6,\"hello\"",5)
+)
--- End diff --

@Jay357089 ...Please perform the below mentioned activities.
1. Can you test the same scenario with same set of data in spark and hive 
to check their behavior and paste the result screenshot here.
2. What about cases when instead of default escape character (\), some 
other escape character is given (like @, ^, /, etc)? Test with some other 
escape characters in carbon, spark and hive and verify that behavior is same 
everywhere and paste the snapshots here.




[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...

2016-08-29 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/82#discussion_r76625697
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
 }
 return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+DataInputStream fileReader = null;
+BufferedReader bufferedReader = null;
+String readLine = null;
+
+try {
+  fileReader =
+  FileFactory.getDataInputStream(csvFilePath, 
FileFactory.getFileType(csvFilePath));
+  bufferedReader =
+  new BufferedReader(new InputStreamReader(fileReader, 
Charset.defaultCharset()));
--- End diff --

@foryou2030 instead of using Charset.defaultCharset(), use the below 
line of code:
Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)
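The reason for the suggestion can be demonstrated standalone: Charset.defaultCharset() depends on the JVM/OS locale, while a named charset decodes the same bytes everywhere. A sketch of readHeader with an explicit charset (StandardCharsets.UTF_8 stands in for CarbonCommonConstants.DEFAULT_CHARSET):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Read the first line (the CSV header) with a fixed, named charset so the
// result does not vary with the platform default encoding.
public class ReadHeaderSketch {
    public static String readHeader(byte[] csvBytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(csvBytes),
                    StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,Tom\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readHeader(csv)); // id,name
    }
}
```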




[GitHub] incubator-carbondata pull request #97: [CARBONDATA-180]give proper error mes...

2016-08-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/97#discussion_r76381309
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -689,6 +689,10 @@ object GlobalDictionaryUtil extends Logging {
   generatePredefinedColDictionary(colDictFilePath, table,
 dimensions, carbonLoadModel, sqlContext, hdfsLocation, 
dictfolderPath)
 }
+if (headers.length > df.columns.length) {
+  logError("Maybe either delimiter or fileheader is wrong")
+  throw new DataLoadingException("Maybe either delimiter or 
fileheader is wrong")
--- End diff --

@QiangCai ...Please modify this message as below:
"Either delimiter or fileheader provided is not correct"




[GitHub] incubator-carbondata pull request #97: [CARBONDATA-180]give proper error mes...

2016-08-26 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/97#discussion_r76373466
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -705,7 +709,7 @@ object GlobalDictionaryUtil extends Logging {
   // check result status
   checkStatus(carbonLoadModel, sqlContext, model, statusList)
 } else {
-  logInfo("have no column need to generate global dictionary in 
Fact file")
+  logInfo("have no column need to generate global dictionary in 
source data files")
--- End diff --

@QiangCai ...Please modify the logger message as below:
"No column found for generating global dictionary in source data files"




[GitHub] incubator-carbondata pull request #90: [CARBONDATA-175]Fix Bug: Load Decimal...

2016-08-25 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/90#discussion_r76257293
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -62,7 +62,12 @@ public static Object 
getMeasureValueBasedOnDataType(String msrValue, DataType da
   case INT:
 return Double.valueOf(msrValue).longValue();
   case LONG:
-return Long.valueOf(msrValue);
+int index = msrValue.indexOf(".");
+if (index != -1) {
+  return Long.valueOf(msrValue.substring(0,index));
+} else {
+  return Long.valueOf(msrValue);
+}
--- End diff --

@lion-x ...It is correct behavior to return a null value in this case, as 
our validation is strict. If any column with the BigInt datatype is included 
as a dimension, we likewise make a decimal value null because of the strict 
validation.
That is why I suggested not changing the behavior for Long parsing; 
instead, change the parsing logic for the Int datatype to keep the strict 
validation behavior.




[GitHub] incubator-carbondata pull request #90: [CARBONDATA-175]Fix Bug: Load Decimal...

2016-08-24 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/90#discussion_r76028231
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -62,7 +62,12 @@ public static Object 
getMeasureValueBasedOnDataType(String msrValue, DataType da
   case INT:
 return Double.valueOf(msrValue).longValue();
   case LONG:
-return Long.valueOf(msrValue);
+int index = msrValue.indexOf(".");
+if (index != -1) {
+  return Long.valueOf(msrValue.substring(0,index));
+} else {
+  return Long.valueOf(msrValue);
+}
--- End diff --

@lion-x ...keep the code the same for the INT and LONG datatype cases...I think 
that would be better than writing this code. Sample code as below:
case INT:
case LONG:
  return Double.valueOf(msrValue).longValue();
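The behavioral difference behind the suggestion can be checked directly: Long.valueOf rejects a decimal string outright, while going through Double first truncates it toward zero, so INT and LONG would behave the same way. A small standalone check:

```java
// Double.valueOf(...).longValue() truncates a decimal string, while
// Long.valueOf throws NumberFormatException on the same input.
public class MeasureParse {
    public static Long parseAsLong(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    public static void main(String[] args) {
        System.out.println(parseAsLong("12.7")); // 12
        try {
            Long.valueOf("12.7");
        } catch (NumberFormatException e) {
            System.out.println("Long.valueOf(\"12.7\") -> NumberFormatException");
        }
    }
}
```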




[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

2016-08-24 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/81#discussion_r76017047
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -773,20 +776,27 @@ object CarbonDataRDDFactory extends Logging {
   } catch {
 case ex: Throwable =>
   loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
-  logInfo("DataLoad failure")
+  ex match {
+case sparkException: SparkException =>
+  if 
(sparkException.getCause.isInstanceOf[DataLoadingException]) {
+executorMessage = sparkException.getCause.getMessage
+errorMessage = errorMessage + ": " + executorMessage
+  }
+case _ =>
--- End diff --

@Zhangshunyu ...for case _, also get the message from the throwable and add it 
to the error message, the same as you have done for the sparkException case.




[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

2016-08-23 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/81#discussion_r75869589
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
   loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
   logInfo("DataLoad failure")
   logger.error(ex)
--- End diff --

@Zhangshunyu Move logInfo and logger.error below the case match block. 
Also log the executorMessage in logInfo. You can move the variable 
message above as well, with the default value "DataLoad failure: ", so that the same 
variable can be used for logInfo.




[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

2016-08-23 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/81#discussion_r75869120
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
   loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
   logInfo("DataLoad failure")
   logger.error(ex)
+  ex match {
+case sparkException: SparkException =>
+  if 
(sparkException.getCause.isInstanceOf[DataLoadingException]) {
+executorMessage = sparkException.getCause.getMessage
+  }
+case _ =>
--- End diff --

@Zhangshunyu ...for the underscore case, also get the message and assign it to 
the variable executorMessage. Otherwise, if control reaches case _, 
executorMessage will be left empty and the message will remain 
"DataLoad failure: ", which is incorrect.
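A sketch of the fallback being asked for, written as a plain Java helper (the Scala match in the PR would do the equivalent): take the message from the cause when one exists (the SparkException path), otherwise from the throwable itself, so the executor message is never left empty.

```java
// Prefer the cause's message (e.g. a wrapped DataLoadingException),
// falling back to the outer throwable so the result is never empty.
public class ExecutorMessage {
    public static String of(Throwable ex) {
        Throwable cause = ex.getCause();
        if (cause != null && cause.getMessage() != null) {
            return cause.getMessage();
        }
        return ex.getMessage() != null ? ex.getMessage() : ex.toString();
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("job aborted",
            new IllegalStateException("bad record found"));
        System.out.println(of(wrapped));                       // bad record found
        System.out.println(of(new RuntimeException("plain"))); // plain
    }
}
```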




[GitHub] incubator-carbondata pull request #83: [CARBONDATA-169]ColDict and All Dicti...

2016-08-23 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/83#discussion_r75865451
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/util/ExternalColumnDictionaryTestCase.scala
 ---
@@ -205,6 +205,20 @@ class ExternalColumnDictionaryTestCase extends 
QueryTest with BeforeAndAfterAll
   loadSqlRelation, "deviceInformationId", "10086")
   }
 
+  test("COLUMNDICT and ALL_DICTIONARY_PATH can not be used together") {
+try {
+  sql(s"""
+LOAD DATA LOCAL INPATH "$complexFilePath1" INTO TABLE loadSqlTest
+
OPTIONS('COLUMNDICT'='$extColDictFilePath1',"ALL_DICTIONARY_PATH"='$extColDictFilePath1')
+""")
+  assert(false)
--- End diff --

ok




[GitHub] incubator-carbondata pull request #83: [CARBONDATA-169]ColDict and All Dicti...

2016-08-23 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/83#discussion_r75860680
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/util/ExternalColumnDictionaryTestCase.scala
 ---
@@ -205,6 +205,20 @@ class ExternalColumnDictionaryTestCase extends 
QueryTest with BeforeAndAfterAll
   loadSqlRelation, "deviceInformationId", "10086")
   }
 
+  test("COLUMNDICT and ALL_DICTIONARY_PATH can not be used together") {
+try {
+  sql(s"""
+LOAD DATA LOCAL INPATH "$complexFilePath1" INTO TABLE loadSqlTest
+
OPTIONS('COLUMNDICT'='$extColDictFilePath1',"ALL_DICTIONARY_PATH"='$extColDictFilePath1')
+""")
+  assert(false)
--- End diff --

assert(false) is not required, as the Load command will throw an exception and 
control will go to the exception block.




[GitHub] incubator-carbondata pull request #83: [CARBONDATA-169]ColDict and All Dicti...

2016-08-23 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/83#discussion_r75860533
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
@@ -1001,6 +1001,12 @@ class CarbonSqlParser()
   "SERIALIZATION_NULL_FORMAT",
   "ALL_DICTIONARY_PATH"
 )
+if (options.exists(_._1.equalsIgnoreCase("COLUMNDICT")) &&
+  options.exists(_._1.equalsIgnoreCase("ALL_DICTIONARY_PATH"))) {
+  val errorMessage = "Error: COLUMNDICT and ALL_DICTIONARY_PATH can 
not be used together" +
+" in options"
+  throw new MalformedCarbonCommandException(errorMessage)
+}
--- End diff --

@Jay357089 ...Move this newly added code below the if block with the 
isSupported check. I think ideally we should first validate whether the options 
are among the ones allowed for the load command, and then perform the other checks.

