[GitHub] carbondata issue #1032: [CARBONDATA-1149] Fixed range info overlapping value...

2018-01-18 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1032
  
Not required, as the partition feature has been re-implemented.


---


[GitHub] carbondata pull request #1830: [CARBONDATA-2051] Added like query ends with ...

2018-01-18 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1830

[CARBONDATA-2051] Added like query 'ends with' and 'contains' filter push 
down support to carbondata

**Problem**
Currently only a like filter with a 'starts with' expression is pushed down to 
carbondata. For 'ends with' and 'contains' like filters, all the data is 
returned to spark and spark then applies the filter on it.
This behavior is fine for queries with a small number of queried columns. But 
as the number of columns and the data volume grow, there is a performance 
impact because more data is sent to spark, thereby increasing the IO. 
If the like filter is pushed down, the filter column is read first and the 
blocks are pruned. In this case the data returned to spark is already 
filtered, and only the blocklets matching the data are fully read. This 
reduces IO and improves query performance.

**Solution**
Modify the code to push down 'ends with' and 'contains' like filters to carbondata.
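
A minimal sketch of the translation this enables, in terms of Spark's data 
source filter API. The Carbon-side filter types below are hypothetical 
stand-ins for illustration, not CarbonData's actual expression classes:

import org.apache.spark.sql.sources.{Filter, StringContains, StringEndsWith, StringStartsWith}

// Hypothetical Carbon-side filter representation, for illustration only.
sealed trait CarbonLikeFilter
case class StartsWithFilter(column: String, prefix: String) extends CarbonLikeFilter
case class EndsWithFilter(column: String, suffix: String) extends CarbonLikeFilter
case class ContainsFilter(column: String, substring: String) extends CarbonLikeFilter

// Before this change only StringStartsWith was translated; EndsWith and
// Contains returned None, so spark received all rows and re-applied the
// filter itself.
def translateLikeFilter(filter: Filter): Option[CarbonLikeFilter] = filter match {
  case StringStartsWith(col, prefix) => Some(StartsWithFilter(col, prefix))
  case StringEndsWith(col, suffix)   => Some(EndsWithFilter(col, suffix))   // newly pushed down
  case StringContains(col, substr)   => Some(ContainsFilter(col, substr))   // newly pushed down
  case _                             => None
}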

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Added a test case to verify that the push down is happening
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata like_query_pushdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1830.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1830


commit bff9fbf316941d0c732c04d3c9b7e775285ec893
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-01-18T08:53:17Z

Added like query ends with and contains with filter push down support to 
carbondata




---


[GitHub] carbondata pull request #1810: [CARBONDATA-2037]Store carbondata locations i...

2018-01-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1810#discussion_r162253805
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMap.java
 ---
@@ -598,16 +627,16 @@ public boolean isScanRequired(FilterResolverIntf 
filterExp) {
 if (filterExp == null) {
   int rowCount = unsafeMemoryDMStore.getRowCount();
   for (int i = 0; i < rowCount; i++) {
-DataMapRow unsafeRow = unsafeMemoryDMStore.getUnsafeRow(i);
-blocklets.add(createBlocklet(unsafeRow, i));
+DataMapRow safeRow = 
unsafeMemoryDMStore.getUnsafeRow(i).convertToSafeRow();
+blocklets.add(createBlocklet(safeRow, i));
   }
 } else {
   int startIndex = findStartIndex(convertToRow(searchStartKey), 
comparator);
   int endIndex = findEndIndex(convertToRow(searchEndKey), comparator);
   FilterExecuter filterExecuter =
   FilterUtil.getFilterExecuterTree(filterExp, segmentProperties, 
null);
   while (startIndex <= endIndex) {
-DataMapRow unsafeRow = 
unsafeMemoryDMStore.getUnsafeRow(startIndex);
+DataMapRow unsafeRow = 
unsafeMemoryDMStore.getUnsafeRow(startIndex).convertToSafeRow();
--- End diff --

Please rename this variable to safeRow as well.


---


[GitHub] carbondata pull request #1813: [CARBONDATA-2015] Restricted maximum length o...

2018-01-16 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1813#discussion_r161963254
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/NonDictionaryFieldConverterImpl.java
 ---
@@ -70,13 +70,14 @@ public NonDictionaryFieldConverterImpl(DataField 
dataField, String nullformat, i
   try {
 byte[] value = DataTypeUtil
 .getBytesBasedOnDataTypeForNoDictionaryColumn(dimensionValue, 
dataType, dateFormat);
-if (dataType == DataTypes.STRING) {
-  assert value.length <= 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT;
+if (dataType == DataTypes.STRING
+&& value.length > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
+  throw new CarbonDataLoadingException("Dataload failed, String 
size cannot exceed "
+  + CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " 
bytes");
 }
 row.update(value, index);
-  } catch (AssertionError ae) {
-throw new CarbonDataLoadingException("Dataload failed, String size 
cannot exceed "
-+ CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " 
bytes");
+  } catch (CarbonDataLoadingException e) {
--- End diff --

Kindly check and confirm whether CarbonDataLoadingException can come from any 
other place in this flow as well.


---


[GitHub] carbondata pull request #1818: [CARBONDATA-2020][Old Store Support] Add filt...

2018-01-16 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1818

[CARBONDATA-2020][Old Store Support] Add filter support for old store 
reading to improve query performance

**Problem**
For old stores, blocklet-level min/max comparison was not happening on the 
executor side, due to which all the blocklets were getting scanned. This 
increased the IO and scanning time in the executor.

**Solution**
Modified the code to retrieve the min/max values from the blocklet node and 
use them for comparison while scanning for valid blocklets.
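
As a rough illustration of the blocklet pruning described above (the names 
and the byte-comparable min/max layout are assumptions, not CarbonData's 
actual classes):

// Each blocklet node carries min/max values per column in the index.
case class BlockletMinMax(min: Array[Byte], max: Array[Byte])

// A blocklet can contain a filter value only if min <= value <= max; all
// other blocklets are skipped without reading their data pages.
def isScanRequired(filterValue: Array[Byte], index: BlockletMinMax)
                  (compare: (Array[Byte], Array[Byte]) => Int): Boolean =
  compare(index.min, filterValue) <= 0 && compare(index.max, filterValue) >= 0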

**Note:** This PR needs to be merged after #1796 

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Yes. Manually verified   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
filter_support_for_old_store

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1818


commit b0c7f1211c74f478fe5f561872b655a1869d6756
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-01-17T05:57:09Z

Add filter support for old store reading




---


[GitHub] carbondata issue #1789: [CARBONDATA-2020] Fix avoid reading of all block inf...

2018-01-16 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1789
  
LGTM


---


[GitHub] carbondata pull request #1810: [WIP]Store carbondata locations in datamap to...

2018-01-16 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1810#discussion_r161739311
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMap.java
 ---
@@ -604,18 +731,26 @@ private ExtendedBlocklet createBlocklet(DataMapRow 
row, int blockletId) {
 detailInfo.setBlockletId((short) blockletId);
 detailInfo.setDimLens(columnCardinality);
 
detailInfo.setSchemaUpdatedTimeStamp(row.getLong(SCHEMA_UPADATED_TIME_INDEX));
-BlockletInfo blockletInfo = new BlockletInfo();
-try {
-  byte[] byteArray = row.getByteArray(BLOCK_INFO_INDEX);
-  ByteArrayInputStream stream = new ByteArrayInputStream(byteArray);
-  DataInputStream inputStream = new DataInputStream(stream);
-  blockletInfo.readFields(inputStream);
-  inputStream.close();
-} catch (IOException e) {
-  throw new RuntimeException(e);
+byte[] byteArray = row.getByteArray(BLOCK_INFO_INDEX);
+BlockletInfo blockletInfo = null;
+if (byteArray.length > 0) {
+  try {
+blockletInfo = new BlockletInfo();
+ByteArrayInputStream stream = new ByteArrayInputStream(byteArray);
+DataInputStream inputStream = new DataInputStream(stream);
+blockletInfo.readFields(inputStream);
+inputStream.close();
+blocklet.setLocation(
--- End diff --

I think this should be outside the if block, because even if the blocklet info 
does not exist (as in the case of an old store) we still need to set the file 
location.


---


[GitHub] carbondata issue #1785: [CARBONDATA-2015] Restricted maximum length of bytes...

2018-01-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1785
  
LGTM


---


[GitHub] carbondata pull request #1796: [WIP] Modified code to add relative blocklet ...

2018-01-12 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1796

[WIP] Modified code to add relative blocklet id during initialization in 
the blocklet data map

Modified code to add relative blocklet id during initialization in the 
blocklet data map

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
add_blocklet_id_to_data_map

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1796


commit 0ba193ef4a53f6bd480c259126622f18c7e80ad8
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-01-12T12:58:56Z

Modified code to add relative blocklet id during initialization in the 
blocklet data map




---


[GitHub] carbondata issue #1769: [CARBONDATA-2014]update table status for failure onl...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1769
  
LGTM...will merge once all build run


---


[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1785#discussion_r161144886
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ---
@@ -113,6 +113,9 @@ object CarbonScalaUtil {
 case s: String => if (s.length > 
CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
   throw new Exception("Dataload failed, String length cannot 
exceed " +
   CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT 
+ " characters")
+} else if (ByteUtil.toBytes(s).length > 
CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT) {
--- End diff --

This can impact the data load performance. Move this check to the 
CarbonDictionaryWriterImpl class, where we are already converting the string 
values to a byte array; that way it adds no extra cost for the conversion of 
string to byte array. A sketch follows.
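
A sketch of where the suggested check would live, assuming the writer is the 
single place that converts the string to bytes (the method name is 
illustrative, and 32000 mirrors MAX_CHARS_PER_COLUMN_DEFAULT as used 
elsewhere in this thread):

import java.nio.charset.StandardCharsets

val maxBytesPerColumn = 32000 // CSVInputFormat.MAX_CHARS_PER_COLUMN_DEFAULT

// The conversion happens exactly once here, so validating the byte length
// at this point adds no extra string-to-byte conversion on the load path.
def toDictionaryBytes(value: String): Array[Byte] = {
  val bytes = value.getBytes(StandardCharsets.UTF_8)
  if (bytes.length > maxBytesPerColumn) {
    throw new RuntimeException(
      s"Dataload failed, String size cannot exceed $maxBytesPerColumn bytes")
  }
  bytes
}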


---


[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1785#discussion_r161144455
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
 ---
@@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterAll {
 } catch {
   case _:Exception => assert(true)
 }
+  }
+
+  test("test load / insert / update with data more than 32000 bytes - 
dictionary_exclude") {
+val testdata = s"$resourcesPath/unicodechar.csv"
+sql("drop table if exists load32000bytes")
+sql("create table load32000bytes(name string) stored by 'carbondata'")
+sql("insert into table load32000bytes select 'aaa'")
+
+assert(intercept[Exception] {
+  sql(s"load data local inpath '$testdata' into table load32000bytes 
OPTIONS ('FILEHEADER'='name')")
+}.getMessage.equals("DataLoad failure: Dataload failed, String size 
cannot exceed 32000 bytes"))
+
+val source = scala.io.Source.fromFile(testdata)
+val data = source.mkString
+
+try {
+  sql(s"insert into load32000bytes values('$data')")
+  assert(false)
+} catch {
+  case _: Exception =>
+assert(true)
+}
+
+try {
--- End diff --

same comment as above


---


[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1785#discussion_r161144939
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
 ---
@@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterAll {
 } catch {
   case _:Exception => assert(true)
 }
+  }
+
+  test("test load / insert / update with data more than 32000 bytes - 
dictionary_exclude") {
+val testdata = s"$resourcesPath/unicodechar.csv"
+sql("drop table if exists load32000bytes")
+sql("create table load32000bytes(name string) stored by 'carbondata'")
+sql("insert into table load32000bytes select 'aaa'")
+
+assert(intercept[Exception] {
+  sql(s"load data local inpath '$testdata' into table load32000bytes 
OPTIONS ('FILEHEADER'='name')")
+}.getMessage.equals("DataLoad failure: Dataload failed, String size 
cannot exceed 32000 bytes"))
+
+val source = scala.io.Source.fromFile(testdata)
+val data = source.mkString
+
+try {
+  sql(s"insert into load32000bytes values('$data')")
+  assert(false)
+} catch {
+  case _: Exception =>
+assert(true)
+}
+
+try {
+  sql(s"update load32000bytes set(name)= ('$data')").show()
+  assert(false)
+} catch {
+  case _: Exception =>
+assert(true)
+}
+
+sql("drop table if exists load32000bytes")
+
+  }
+
+  test("test load / insert / update with data more than 32000 bytes - 
dictionary_include") {
+val testdata = s"$resourcesPath/unicodechar.csv"
+sql("drop table if exists load32000bytes")
+sql("create table load32000bytes(name string) stored by 'carbondata' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='name')")
+sql("insert into table load32000bytes select 'aaa'")
+
+val error = intercept[Exception] {
+  sql(s"load data local inpath '$testdata' into table load32000bytes 
OPTIONS ('FILEHEADER'='name')")
+}.getMessage
+
+assert(intercept[Exception] {
+  sql(s"load data local inpath '$testdata' into table load32000bytes 
OPTIONS ('FILEHEADER'='name')")
+}.getMessage.contains("generate global dictionary failed, Dataload 
failed, String size cannot exceed 32000 bytes"))
+
+val source = scala.io.Source.fromFile(testdata)
+val data = source.mkString
+
+try {
+  sql(s"insert into load32000bytes values('$data')")
+  assert(false)
+} catch {
+  case _: Exception =>
+assert(true)
+}
+
+try {
--- End diff --

avoid using try catch at all the places in the modified code


---


[GitHub] carbondata pull request #1785: [CARBONDATA-2015] Restricted maximum length o...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1785#discussion_r161144435
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
 ---
@@ -187,6 +187,75 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterAll {
 } catch {
   case _:Exception => assert(true)
 }
+  }
+
+  test("test load / insert / update with data more than 32000 bytes - 
dictionary_exclude") {
+val testdata = s"$resourcesPath/unicodechar.csv"
+sql("drop table if exists load32000bytes")
+sql("create table load32000bytes(name string) stored by 'carbondata'")
+sql("insert into table load32000bytes select 'aaa'")
+
+assert(intercept[Exception] {
+  sql(s"load data local inpath '$testdata' into table load32000bytes 
OPTIONS ('FILEHEADER'='name')")
+}.getMessage.equals("DataLoad failure: Dataload failed, String size 
cannot exceed 32000 bytes"))
+
+val source = scala.io.Source.fromFile(testdata)
+val data = source.mkString
+
+try {
+  sql(s"insert into load32000bytes values('$data')")
--- End diff --

Avoid using try/catch and use intercept[Exception].
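
For reference, the intercept[Exception] form fails the test automatically 
when no exception is thrown and returns the thrown exception for further 
assertions:

// Replaces try { ...; assert(false) } catch { case _: Exception => assert(true) }
val ex = intercept[Exception] {
  sql(s"insert into load32000bytes values('$data')")
}
assert(ex.getMessage.contains("String size cannot exceed 32000 bytes"))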


---


[GitHub] carbondata pull request #1769: [CARBONDATA-2014]update table status for fail...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1769#discussion_r161143694
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -251,7 +255,9 @@ case class CarbonLoadDataCommand(
 case ex: Exception =>
   LOGGER.error(ex)
   // update the load entry in table status file for changing the 
status to marked for delete
-  CarbonLoaderUtil.updateTableStatusForFailure(carbonLoadModel)
+  if (isUpdateTableStatusRequired && !table.isHivePartitionTable) {
--- End diff --

!table.isHivePartitionTable: this condition is not required while calling 
updateTableStatusForFailure, as the boolean flag isUpdateTableStatusRequired 
will be set to true only if the table is not a hive partition table.


---


[GitHub] carbondata issue #1782: [CARBONDATA-2019] Enhancement of merge index compact...

2018-01-11 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1782
  
@ravipesala ..handled review comments...kindly review and merge


---


[GitHub] carbondata issue #1782: [WIP] Changes for creating carbon index merge file f...

2018-01-09 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1782
  
retest sdv please


---


[GitHub] carbondata pull request #1782: [WIP] Changes for creating carbon index merge...

2018-01-09 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1782

[WIP] Changes for creating carbon index merge file from old store which did 
not contain the blocklet info in index information

Changes for creating carbon index merge file from old store which did not 
contain the blocklet info in index information

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
enhace_segment_index_compaction

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1782.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1782


commit 5f4a75f7395c8949a5ccf53030ebdcba5411022f
Author: manishgupta88 <tomanishgupta18@...>
Date:   2018-01-09T15:02:36Z

changes for creating carbon index merge file from old store which did not 
contain the blocklet info in index information




---


[GitHub] carbondata issue #1702: [CARBONDATA-1896] Clean files operation improvement

2018-01-05 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1702
  
LGTM


---


[GitHub] carbondata issue #1718: [CARBONDATA-1929][Validation]carbon property configu...

2018-01-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1718
  
LGTM


---


[GitHub] carbondata issue #1734: [CARBONDATA-1912] Handling lock issues for alter ren...

2018-01-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1734
  
LGTM


---


[GitHub] carbondata issue #1734: [CARBONDATA-1912] Handling lock issues for alter ren...

2018-01-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1734
  
LGTM...will merge after all builds run


---


[GitHub] carbondata issue #1760: [CARBONDATA-1979] ][IMPLICIT COLUMN] Modified implic...

2018-01-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1760
  
LGTM


---


[GitHub] carbondata issue #1740: [CARBONDATA-1949, CARBONDATA-1950] Fixed bug related...

2018-01-04 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1740
  
LGTM...I will merge once all builds complete


---


[GitHub] carbondata pull request #1747: [Compatibility] Added changes for backward co...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1747#discussion_r159589331
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java ---
@@ -231,7 +231,22 @@ public String getSchemaFilePath() {
* @return schema file path
*/
   public static String getSchemaFilePath(String tablePath) {
-return tablePath + File.separator + METADATA_DIR + File.separator + 
SCHEMA_FILE;
+return getActualSchemaFilePath(tablePath);
+  }
+
+  private static String getActualSchemaFilePath(String tablePath) {
+String metaPath = tablePath + File.separator + METADATA_DIR;
--- End diff --

use CarbonCommonConstants.FILE_SEPARATOR instead of File.separator


---


[GitHub] carbondata pull request #1747: [Compatibility] Added changes for backward co...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1747#discussion_r159589266
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ---
@@ -291,4 +291,52 @@ object CarbonScalaUtil {
   })
 otherFields
   }
+
+  /**
+   * If the table is from an old store then the table parameters are in 
lowercase. In the current
+   * code we are reading the parameters as camel case.
+   * This method will convert all the schema parts to camel case
+   *
+   * @param parameters
+   * @return
+   */
+  def getDeserializedParameters(parameters: Map[String, String]): 
Map[String, String] = {
--- End diff --

There is no call site for this method.


---


[GitHub] carbondata pull request #1747: [Compatibility] Added changes for backward co...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1747#discussion_r159589250
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala
 ---
@@ -291,4 +291,52 @@ object CarbonScalaUtil {
   })
 otherFields
   }
+
+  /**
+   * If the table is from an old store then the table parameters are in 
lowercase. In the current
+   * code we are reading the parameters as camel case.
+   * This method will convert all the schema parts to camel case
+   *
+   * @param parameters
+   * @return
+   */
+  def getDeserializedParameters(parameters: Map[String, String]): 
Map[String, String] = {
+val keyParts = 
parameters.getOrElse("spark.sql.sources.options.keys.numparts", "0").toInt
+if (keyParts == 0) {
+  parameters
+} else {
+  var keyStr = ""
+  for(i <- 0 until keyParts) {
+keyStr += parameters(s"spark.sql.sources.options.keys.part.$i")
+  }
+  val finalProperties = scala.collection.mutable.Map.empty[String, 
String]
+  keyStr.split(",") foreach {
--- End diff --

Please check the logic here: keyStr is not explicitly appended with a comma, 
but the splitting is done based on commas.
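
One way to address this, reusing the names from the diff above: collect the 
serialized key parts into a sequence instead of concatenating them into one 
string and re-splitting on a separator that was never inserted (this assumes 
each part holds one complete key):

// Read each serialized key part directly into a collection; no string
// concatenation and no comma-based re-splitting.
def readKeyParts(parameters: Map[String, String], keyParts: Int): Seq[String] =
  (0 until keyParts).map(i => parameters(s"spark.sql.sources.options.keys.part.$i"))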


---


[GitHub] carbondata pull request #1082: [CARBONDATA-1218] [GLOBAL SORT] In case of da...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1082#discussion_r159586543
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -145,6 +147,15 @@ public static void 
renameBadRecordsFromInProgressToNormal(
 }
   }
 
+  public static void renameBadRecord(CarbonDataLoadConfiguration 
configuration) {
+// rename the bad record in progress to normal
+CarbonTableIdentifier identifier =
+configuration.getTableIdentifier().getCarbonTableIdentifier();
+
CarbonDataProcessorUtil.renameBadRecordsFromInProgressToNormal(configuration,
--- End diff --

1. Avoid calling the method using the class name; the method is present in the 
same class.
2. I think we can introduce a new class called BadRecordsUtil which will have 
all the functions related to bad records, like renameBadRecord, hasBadRecord, 
renameBadRecordsFromInProgressToNormal (make it private in the new util class)


---


[GitHub] carbondata pull request #1082: [CARBONDATA-1218] [GLOBAL SORT] In case of da...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1082#discussion_r159584326
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/badrecordloger/BadRecordLoggerTest.scala
 ---
@@ -247,6 +248,33 @@ class BadRecordLoggerTest extends QueryTest with 
BeforeAndAfterAll {
 }
   }
 
+  test(
+"test if first load failed due to bad record then second load should 
not failed if there is " +
+"no bad record") {
+sql("drop table IF EXISTS loadIssue")
+sql("""CREATE TABLE IF NOT EXISTS loadIssue(ID BigInt, item String) 
STORED BY 'carbondata'""")
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION,
+  "FAIL").addProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, 
"GLOBAL_SORT")
+try {
+  sql("insert into loadIssue select 'x','Book'");
+} catch {
+  case ex: Exception =>
+assert(true)
+}
+try {
+  sql("insert into loadIssue select 1,'Book'");
+} catch {
+  case ex: Exception =>
+assert(false)
+} finally {
+  CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, 
"FORCE")
+.addProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, "LOCAL_SORT")
+}
+assert(true)
+sql("drop table IF EXISTS loadIssue")
+  }
+
--- End diff --

1. Use only one try and finally block inside the test code: try should start 
from the first line and end at the last, and finally should only reset the 
carbon property.
2. Instead of try/catch, use intercept[Exception], and only for the 1st case 
where an exception is expected. For the 2nd case try/catch is not required; if 
an exception comes, the test case will fail anyway. A restructured sketch 
follows.
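
A restructured sketch of the test body along these lines, assuming the same 
table and properties as in the diff above:

// Single try/finally around the whole body; finally only resets the properties.
try {
  // 1st case: a bad record with the FAIL action must throw.
  intercept[Exception] {
    sql("insert into loadIssue select 'x','Book'")
  }
  // 2nd case: no try/catch needed; an unexpected exception fails the test.
  sql("insert into loadIssue select 1,'Book'")
} finally {
  CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FORCE")
    .addProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, "LOCAL_SORT")
}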


---


[GitHub] carbondata pull request #1732: [CARBONDATA-1946] Exception thrown after alte...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1732#discussion_r159585172
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -775,4 +775,25 @@ public static DataType valueOf(String name) {
 }
   }
 
-}
\ No newline at end of file
+  /**
+   * Method to type case the data based on modified data type. This method 
will used for
+   * retrieving the data after change in data type restructure operation
+   *
+   * @param data
+   * @param restructuredDataType
+   * @param currentDataOffset
+   * @param length
+   * @return
+   */
+  public static long getDataBasedOnRestructuredDataType(byte[] data, 
DataType restructuredDataType,
--- End diff --

Yes, because alter datatype change is only for Int to BigInt and for decimal 
from lower precision to higher precision, so the float to double case is not 
required to be handled.


---


[GitHub] carbondata pull request #1732: [CARBONDATA-1946] Exception thrown after alte...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1732#discussion_r159585515
  
--- Diff: 
integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/ChangeDataTypeTestCases.scala
 ---
@@ -164,6 +164,21 @@ class ChangeDataTypeTestCases extends Spark2QueryTest 
with BeforeAndAfterAll {
 sql("drop table if exists PreAggMain_preagg1")
   }
 
+  test("test data type change for dictionary exclude INT type column") {
--- End diff --

Alter table datatype change does not support short to int, short to long, or 
float to double. It only supports Int to BigInt and decimal from lower 
precision to higher precision. If any other data type change request comes, 
validation is done and an exception is thrown.
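
A compact sketch of that validation rule, written against Spark SQL's type 
classes purely for illustration (CarbonData has its own DataType hierarchy, 
and the exact decimal widening condition is an assumption here):

import org.apache.spark.sql.types.{DataType, DecimalType, IntegerType, LongType}

// Only Int -> BigInt (long) and decimal widening are accepted; any other
// requested change fails validation with an exception.
def isValidDataTypeChange(from: DataType, to: DataType): Boolean = (from, to) match {
  case (IntegerType, LongType) => true
  case (d1: DecimalType, d2: DecimalType) =>
    d2.precision >= d1.precision && d2.scale >= d1.scale
  case _ => false
}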


---


[GitHub] carbondata issue #1754: [CARBONDATA-1975] Wrong input metrics displayed for ...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1754
  
LGTM...please attach the test result snapshots when the jira and the github 
issue are resolved


---


[GitHub] carbondata issue #1734: [CARBONDATA-1912] Handling lock issues for alter ren...

2018-01-03 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1734
  
@ManoharVanam ...In that case can we change the logger error to a warning in 
case of non-existence of the file, and check whether the warning log still 
gets printed? If warning logs are not getting printed in the console then we 
can change the log level instead of changing the logic.


---


[GitHub] carbondata issue #1740: [CARBONDATA-1949, CARBONDATA-1950] Fixed bug related...

2018-01-02 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1740
  
@geetikagupta16 Can you please add a test case for the describe formatted 
command, wherein you take all the rows from the 1st column, assert on the 
total row count, and match each row value with the expected output? Also 
resolve the conflicts.


---


[GitHub] carbondata issue #1734: [CARBONDATA-1912] Handling lock issues for alter ren...

2018-01-02 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1734
  
@ManoharVanam...what is the need of releasing the locks in the catch block 
when the finally block ensures that the lock is released?


---


[GitHub] carbondata issue #1732: [CARBONDATA-1946] Exception thrown after alter data ...

2017-12-29 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1732
  
retest this please


---


[GitHub] carbondata issue #1718: [CARBONDATA-1929][Validation]carbon property configu...

2017-12-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1718
  
@mohammadshahidkhan ..IDG update is required for the min and max values of 
the properties in this PR


---


[GitHub] carbondata issue #1703: [CARBONDATA-1917] While loading, check for stale dic...

2017-12-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1703
  
@dhatchayani ...From my perspective this PR is not required. This is a case 
where the user intentionally deletes the dictionary and sort index files from 
the back end of the system and keeps only the dictionary meta file. But in 
any case the user should not be allowed to delete any of the carbon store 
files. Therefore I think we should not handle this scenario.


---


[GitHub] carbondata issue #1723: [CARBONDATA-1939] Added show segments validation tes...

2017-12-28 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1723
  
LGTM


---


[GitHub] carbondata issue #1732: [CARBONDATA-1946] Exception thrown after alter data ...

2017-12-27 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1732
  
retest this please


---


[GitHub] carbondata pull request #1732: [WIP] Fixed select query failure after alter ...

2017-12-27 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1732

[WIP] Fixed select query failure after alter change data type operation for 
dictionary exclude columns

Fixed select query failure after alter change data type operation for 
dictionary exclude columns

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
select_ater_change_datatype_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1732


commit 31baf92318a033e0ea2fba06f4cc94dc38146770
Author: manishgupta88 <tomanishgupta18@...>
Date:   2017-12-27T17:39:58Z

Handle select query failure after alter change data type operation for 
dictionary exclude columns




---


[GitHub] carbondata issue #1721: [CARBONDATA-1822] Documentation - Added REFRESH TABL...

2017-12-27 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1721
  
LGTM


---


[GitHub] carbondata issue #1708: [CARBONDATA-1928] Seperate the properties for timeou...

2017-12-25 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1708
  
LGTM


---


[GitHub] carbondata issue #1715: [CARBONDATA-1934] Incorrect results are returned by ...

2017-12-22 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1715
  
retest this please


---


[GitHub] carbondata issue #1708: [CARBONDATA-1928] Seperate the properties for timeou...

2017-12-22 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1708
  
LGTM


---


[GitHub] carbondata issue #1696: [CARBONDATA-1884] SDV test cases for CTAS support to...

2017-12-22 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1696
  
LGTM


---


[GitHub] carbondata pull request #1715: [CARBONDATA-1934] Incorrect results are retur...

2017-12-22 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1715

[CARBONDATA-1934] Incorrect results are returned by select query in case 
when the number of blocklets for one part file are > 1 in the same task

Problem: When a select query is triggered, the driver will prune the segments 
and give a list of blocklets that need to be scanned. The number of tasks from 
spark will be equal to the number of blocklets identified.
In the case where one task has more than one blocklet for the same file, the 
BlockExecutionInfo getting formed is incorrect. Due to this the query results 
are incorrect.

Fix: Use the abstract index to fill all the details in BlockExecutionInfo

 - [ ] Any interfaces changed?
No 
 - [ ] Any backward compatibility impacted?
No
 - [ ] Document update required?
No
 - [ ] Testing done
 Manual testing
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata data_loss_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1715


commit b0c518d4aa7d4b2387899deefc0f9ed39b5c463c
Author: manishgupta88 <tomanishgupta18@...>
Date:   2017-12-22T10:35:31Z

Incorrect results are returned by select query in case when the number of 
blocklets for one part file are > 1 in the same task




---


[GitHub] carbondata issue #1311: [CARBONDATA-1439] Wrong Error message shown for Bad ...

2017-12-21 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1311
  
LGTM


---


[GitHub] carbondata issue #1126: [CARBONDATA-1258] CarbonData should not allow loadin...

2017-12-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1126
  
LGTM


---


[GitHub] carbondata issue #1311: [CARBONDATA-1439] Wrong Error message shown for Bad ...

2017-12-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1311
  
retest this please


---


[GitHub] carbondata issue #1670: [CARBONDATA-1899] Add CarbonData concurrency test ca...

2017-12-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1670
  
LGTM


---


[GitHub] carbondata issue #1670: [CARBONDATA-1899] Add CarbonData concurrency test ca...

2017-12-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1670
  
LGTM...once build runs I will merge


---


[GitHub] carbondata issue #1676: [CARBONDATA-1906] Update registerTempTable method be...

2017-12-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1676
  
@xubo245 ...the actual code modification is in 2-3 files; the remaining files 
have been modified only for import changes. Kindly revert the unwanted file 
changes.


---


[GitHub] carbondata pull request #1670: [CARBONDATA-1899] Add CarbonData concurrency ...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1670#discussion_r157947962
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/ConcurrencyTest.scala
 ---
@@ -0,0 +1,355 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+import java.util
+import java.util.concurrent.{Callable, Executors, Future, TimeUnit}
+
+import scala.util.Random
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object ConcurrencyTest {
+
+  var totalNum = 100 * 1000 * 1000
+  var ThreadNum = 16
+  var TaskNum = 100
+  var ResultIsEmpty = true
+  val cardinalityId = 1 * 1
+  val cardinalityCity = 6
+
+  def parquetTableName: String = "comparetest_parquet"
+
+  def orcTableName: String = "comparetest_orc"
+
+  def carbonTableName(version: String): String = 
s"comparetest_carbonV$version"
+
+  // Table schema:
+  // +-------------+-----------+-------------+-------------+------------+
+  // | Column name | Data type | Cardinality | Column type | Dictionary |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | id          | string    | 100,000,000 | dimension   | no         |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | city        | string    | 6           | dimension   | yes        |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | country     | string    | 6           | dimension   | yes        |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | planet      | string    | 100,007     | dimension   | yes        |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | m1          | short     | NA          | measure     | no         |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | m2          | int       | NA          | measure     | no         |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | m3          | big int   | NA          | measure     | no         |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | m4          | double    | NA          | measure     | no         |
+  // +-------------+-----------+-------------+-------------+------------+
+  // | m5          | decimal   | NA          | measure     | no         |
+  // +-------------+-----------+-------------+-------------+------------+
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+val rdd = spark.sparkContext
+  .parallelize(1 to totalNum, 4)
+  .map { x =>
+((x % 1).toString, "city" + x % 6, "country" + x % 6, 
"planet" + x % 10007,
+  (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13,
+  BigDecimal.valueOf(x.toDouble / 11))
+  }.map { x =>
+  Row(x._1, x._2, x._3, x._4, x._5, x._6, x._7, x._8, x._9)
+}
+
+val schema = StructType(
+  Seq(
+StructField("id", StringType, nullable = false),
+StructField("city", StringType, nullable = false),
+StructField("country", StringType, nullable = false),
+StructField("planet", StringType, nullable = false),
+StructField("m1", ShortType, nullable = false),
+StructField("m2", IntegerType, nullable = false),
+StructField("m3", LongType, nullable = false),
+StructField("

[GitHub] carbondata pull request #1670: [CARBONDATA-1899] Add CarbonData concurrency ...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1670#discussion_r157947459
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/ConcurrencyTest.scala
 ---
(same ConcurrencyTest.scala diff context as quoted in the previous message; 
the comment body was not preserved in the archive)

[GitHub] carbondata pull request #1670: [CARBONDATA-1899] Add CarbonData concurrency ...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1670#discussion_r157946201
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/ConcurrencyTest.scala
 ---
(same ConcurrencyTest.scala diff context as quoted above; the comment body 
was not preserved in the archive)

[GitHub] carbondata issue #1670: [CARBONDATA-1899] Add CarbonData concurrency test ca...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1670
  
retest sdv please


---


[GitHub] carbondata issue #1126: [CARBONDATA-1258] CarbonData should not allow loadin...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1126
  
retest this please


---


[GitHub] carbondata issue #1679: [CARBONDATA-1907] Avoid unnecessary logging to impro...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1679
  
@kumarvishal09 ..please review


---


[GitHub] carbondata pull request #1679: [CARBONDATA-1907] Avoid unnecessary logging t...

2017-12-19 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1679

[CARBONDATA-1907] Avoid unnecessary logging to improve query performance 
for no dictionary non string columns

Changes done to return null for no-dictionary columns of non-string data 
types when the data is empty. This avoids excessive logging, which was 
impacting query performance.
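
A minimal sketch of the idea; the method name and shape are illustrative, 
not the actual DataTypeUtil signature:

import java.nio.charset.StandardCharsets

// For a no-dictionary, non-string column, empty bytes mean null: return it
// directly instead of attempting a parse that fails and logs an error for
// every such value.
def parseNoDictionaryValue(data: Array[Byte], isStringType: Boolean): Any =
  if (!isStringType && (data == null || data.isEmpty)) null
  else new String(data, StandardCharsets.UTF_8) // stand-in for the normal parsing path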

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Added UT
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata empty_byte_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1679


commit f5690380ec6d0abaee89acd0bb8fda23eb7f3a33
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2017-12-19T11:57:30Z

Changes done to return null in case of no dictionary column for non string 
data types when data is empty. This is done to avoid excessive logging which is 
impacting the query performance.




---


[GitHub] carbondata issue #1666: [CARBONDATA-1900][Core,processing] Modify loadmetada...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1666
  
LGTM


---


[GitHub] carbondata issue #1126: [CARBONDATA-1258] CarbonData should not allow loadin...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1126
  
Correct the PR header also...remove dots from the end and put a proper 
message


---


[GitHub] carbondata pull request #1126: [CARBONDATA-1258] CarbonData should not allow...

2017-12-19 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1126#discussion_r157700973
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/DateDirectDictionaryGenerator.java
 ---
@@ -42,12 +42,37 @@
 
   private String dateFormat;
 
+  /**
+   * min value supported for date type column
+   */
+  private static final long MIN_VALUE;
+  /**
+   * Max value supported for date type column
+   */
+  private static final long MAX_VALUE;
   /**
* Logger instance
*/
   private static final LogService LOGGER =
   
LogServiceFactory.getLogService(DateDirectDictionaryGenerator.class.getName());
 
+  static {
+SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd");
+df.setTimeZone(TimeZone.getTimeZone("GMT"));
+long minValue = 0;
+long maxValue = 0;
+try {
+  minValue = df.parse("0001-01-01").getTime();
+  maxValue = df.parse("9999-12-31").getTime();
+} catch (ParseException e) {
--- End diff --

As you are defining the date format yourself and parsing it, a ParseException 
will not be thrown. But as we need to handle this exception, you can just put 
a warning log in the catch block and take out the remaining logic. If we 
don't get any findbugs violation for an empty catch block, then it is better 
to just put a comment in the catch block and not do anything.
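
For reference, the bounds being computed in the static block, as a standalone 
sketch (the supported date range 0001-01-01 to 9999-12-31 is taken from the 
quoted diff):

import java.text.SimpleDateFormat
import java.util.TimeZone

val df = new SimpleDateFormat("yyyy-MM-dd")
df.setTimeZone(TimeZone.getTimeZone("GMT"))
// Fixed, well-formed literals with a fixed pattern cannot realistically fail
// to parse, which is why the catch block never fires in practice.
val minValue: Long = df.parse("0001-01-01").getTime
val maxValue: Long = df.parse("9999-12-31").getTime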


---


[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

2017-12-18 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1665
  
retest this please


---


[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

2017-12-17 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1665#discussion_r157405351
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetaStore.scala
 ---
@@ -144,6 +145,15 @@ trait CarbonMetaStore {
   def getThriftTableInfo(tablePath: CarbonTablePath)(sparkSession: 
SparkSession): TableInfo
 
   def getTableFromMetadataCache(database: String, tableName: String): 
Option[CarbonTable]
+
+  def getCarbonDataSourceHadoopRelation(sparkSession: SparkSession,
+  tableIdentifier: TableIdentifier): CarbonDatasourceHadoopRelation
+
+  def getSchemaFromUnresolvedRelation(sparkSession: SparkSession,
--- End diff --

This method is called from the parser, and in my view this logic for 
retrieving the schema from an unresolved relation should not reside in the 
parser. Please let me know if you have a different perspective.


---


[GitHub] carbondata issue #1311: [CARBONDATA-1439] Wrong Error message shown for Bad ...

2017-12-17 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1311
  
retest this please


---


[GitHub] carbondata issue #1641: [CARBONDATA-1882] select with group by and insertove...

2017-12-15 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1641
  
retest sdv please


---


[GitHub] carbondata issue #1167: [CARBONDATA-1304] [IUD Bug] Iud with single pass

2017-12-15 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1167
  
LGTM


---


[GitHub] carbondata issue #1653: [CARBONDATA-1893] Data load with multiple QUOTECHAR ...

2017-12-15 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1653
  
LGTM


---


[GitHub] carbondata issue #1665: [CARBONDATA-1884] Add CTAS support to carbondata

2017-12-15 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1665
  
retest this please


---


[GitHub] carbondata pull request #1665: [CARBONDATA-1884] Add CTAS support to carbond...

2017-12-15 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1665

[CARBONDATA-1884] Add CTAS support to carbondata

Implemented the CTAS feature in carbondata. This will help to create a carbon 
table from other parquet/orc tables.

 - [ ] Any interfaces changed?
 A new DDL has been introduced.
Syntax:
**CREATE TABLE [IF NOT EXISTS] [db_name.]table_name stored by 'carbondata' 
AS select_statement** (see the usage example after this checklist)
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
Yes
 - [ ] Testing done
 Yes, added 12 functional test cases covering various scenarios for CTAS.
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA
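
A minimal usage sketch of the new DDL (database, table, and column names are 
hypothetical):

    CREATE TABLE IF NOT EXISTS target_db.sales_carbon
    STORED BY 'carbondata'
    AS SELECT id, country, salary FROM source_db.sales_parquet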



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata CTAS_implementation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1665


commit 1dc3df29d92d2cddcd73f6ea0c26140bd4a70ad6
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2017-12-13T16:10:19Z

Added code for implementing CTAS




---


[GitHub] carbondata pull request #1167: [CARBONDATA-1304] [IUD BugFix] Iud with sing...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1167#discussion_r157133954
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/TestUpdateCarbonTableWithPersistFalse.scala
 ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.iud
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.constants.LoggerAction
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestUpdateCarbonTableWithPersistFalse extends QueryTest with 
BeforeAndAfterAll {
--- End diff --

Don't add a new test class. Add this test case to the existing update table 
test class.


---


[GitHub] carbondata pull request #1167: [CARBONDATA-1304] [IUD BugFix] Iud with sing...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1167#discussion_r157133997
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/TestUpdateCarbonTableWithSinglePass.scala
 ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.iud
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.constants.LoggerAction
+import org.apache.carbondata.core.constants.{CarbonCommonConstants, 
CarbonLoadOptionConstants}
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestUpdateCarbonTableWithSinglePass extends QueryTest with 
BeforeAndAfterAll {
--- End diff --

Don't add a new test class. Add this test case to the existing update table 
test class.


---


[GitHub] carbondata pull request #1167: [CARBONDATA-1304] [IUD BugFix] Iud with sing...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1167#discussion_r157134388
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -314,14 +315,19 @@ case class CarbonLoadDataCommand(
 } else {
   None
 }
+var loadDataFrame = dataFrame
--- End diff --

Modify the code to make the variable a val:

    val loadDataFrame = if (updateModel.isDefined) {
      Some(getDataFrameWithTupleID)
    } else {
      dataFrame
    }


---


[GitHub] carbondata issue #1663: [CARBONDATA-1897] remove column group in describe co...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1663
  
LGTM


---


[GitHub] carbondata pull request #1653: [CARBONDATA-1893] Data load with multiple QUO...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1653#discussion_r157009066
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -871,6 +871,32 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
   throw new MalformedCarbonCommandException(errorMessage)
 }
 
+// Validate QUOTECHAR length
+if (options.exists(_._1.equalsIgnoreCase("QUOTECHAR"))) {
+  val quoteChar: String = options.get("quotechar").get.head._2
+  if (quoteChar.length > 1 && !isValidEscapeSequence(quoteChar)) {
--- End diff --

No need to validate quote character for a valid escape sequence


---


[GitHub] carbondata pull request #1653: [CARBONDATA-1893] Data load with multiple QUO...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1653#discussion_r157009247
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadOptions.scala
 ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataload
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import 
org.apache.carbondata.spark.exception.MalformedCarbonCommandException
+
+class TestLoadOptions extends QueryTest with BeforeAndAfterAll{
+
+  override def beforeAll {
+sql("drop table if exists TestLoadTableOptions")
+sql("CREATE table TestLoadTableOptions (ID int, date String, country 
String, name String," +
+"phonetype String, serialname String, salary int) stored by 
'org.apache.carbondata.format'")
+
+  }
+
+  override def afterAll {
+sql("drop table if exists TestLoadTableOptions")
+  }
+
+
+  test("test load data with more than one char in quotechar option") {
+val errorMessage = intercept[MalformedCarbonCommandException] {
+  sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/dataretention1.csv' 
INTO TABLE " +
+  s"TestLoadTableOptions OPTIONS('QUOTECHAR'='')")
+}.getMessage
+assert(errorMessage.equals("Quotation cannot be more than one 
character."))
+  }
+
+  test("test load data with more than one char in commentchar option") {
+val errorMessage = intercept[MalformedCarbonCommandException] {
+  sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/dataretention1.csv' 
INTO TABLE " +
+  s"TestLoadTableOptions OPTIONS('COMMENTCHAR'='##')")
+  assert(false)
+}.getMessage
+assert(errorMessage.equals("Comment marker cannot be more than one 
character."))
+  }
+
+  test("test load data with more than one char in escapechar option") {
+val errorMessage = intercept[MalformedCarbonCommandException] {
+  sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/dataretention1.csv' 
INTO TABLE " +
+  s"TestLoadTableOptions OPTIONS('ESCAPECHAR'='')")
+  assert(false)
+}.getMessage
+assert(errorMessage.equals("Escape character cannot be more than one 
character."))
+  }
+
+  test("test load data with invalid escape sequence in quotechar option") {
+val errorMessage = intercept[MalformedCarbonCommandException] {
+  sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/dataretention1.csv' 
INTO TABLE " +
+  s"TestLoadTableOptions OPTIONS('QUOTECHAR'='\\y')")
+}.getMessage
+assert(errorMessage.equals("Quotation cannot be more than one 
character."))
+  }
+
+  test("test load data with with valid escape sequence in quotechar 
option") {
+sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/dataretention1.csv' INTO 
TABLE " +
+s"TestLoadTableOptions OPTIONS('QUOTECHAR'='\\n')")
--- End diff --

Replace quotechar with escapechar in above 2 test cases


---


[GitHub] carbondata pull request #1653: [CARBONDATA-1893] Data load with multiple QUO...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1653#discussion_r156958685
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -932,6 +958,11 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 }
   }
 
+  def isEscapeSequence(loadOption: String): Boolean = {
--- End diff --

Rename it to isValidEscapeSequence and also add one positive test case for 
validating a correct escape sequence.


---


[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1632#discussion_r156954293
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ---
@@ -121,17 +121,18 @@ object DataLoadProcessBuilderOnSpark {
 CarbonProperties.getInstance().getGlobalSortRddStorageLevel()))
 }
 
+val sortStepRowConverter: SortStepRowHandler = new 
SortStepRowHandler(sortParameters)
 import scala.reflect.classTag
+
+// 3. sort
 val sortRDD = convertRDD
-  .sortBy(_.getData, numPartitions = numPartitions)(RowOrdering, 
classTag[Array[AnyRef]])
-  .mapPartitionsWithIndex { case (index, rows) =>
-DataLoadProcessorStepOnSpark.convertTo3Parts(rows, index, 
modelBroadcast,
-  sortStepRowCounter)
-  }
+  .map(r => DataLoadProcessorStepOnSpark.convertTo3Parts(r, 
TaskContext.getPartitionId(),
+modelBroadcast, sortStepRowConverter, sortStepRowCounter))
+  .sortBy(r => r.getData, numPartitions = numPartitions)(RowOrdering, 
classTag[Array[AnyRef]])
--- End diff --

@xuchuanyin ...
This PR is for compressing sort temp files, but this code modification is in 
the global sort data load flow, which does not involve creation of sort temp 
files. Can you please clarify?


---


[GitHub] carbondata pull request #1632: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1632#discussion_r156919460
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/NonDictionaryUtil.java ---
@@ -108,60 +105,21 @@ public static Object getMeasure(int index, Object[] 
row) {
 return measures[index];
   }
 
-  public static byte[] getByteArrayForNoDictionaryCols(Object[] row) {
-
-return (byte[]) row[WriteStepRowUtil.NO_DICTIONARY_AND_COMPLEX];
+  /**
+   * Method to get the required non-dictionary & complex from 3-parted row
+   * @param index
+   * @param row
+   * @return
+   */
+  public static byte[] getNonDictOrComplex(int index, Object[] row) {
--- End diff --

Rename the method to getNoDictOrComplex


---


[GitHub] carbondata issue #1311: [CARBONDATA-1439] Wrong Error message shown for Bad ...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1311
  
@shivangi1015 ...kindly rebase this PR; it has merge conflicts


---


[GitHub] carbondata issue #1655: [CARBONDATA-1894] Add compactionType Parameter to co...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1655
  
LGTM


---


[GitHub] carbondata pull request #1653: [CARBONDATA-1893] Data load with multiple QUO...

2017-12-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1653#discussion_r156888264
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -871,6 +871,32 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
   throw new MalformedCarbonCommandException(errorMessage)
 }
 
+// Validate QUOTECHAR length
+if (options.exists(_._1.equalsIgnoreCase("QUOTECHAR"))) {
+  val quoteChar: String = options.get("quotechar").get.head._2
+  if (quoteChar.length > 1) {
+throw new MalformedCarbonCommandException("Quotation cannot be 
more than one character.")
+  }
+}
+
+// Validate COMMENTCHAR length
+if (options.exists(_._1.equalsIgnoreCase("COMMENTCHAR"))) {
+  val commentChar: String = options.get("commentchar").get.head._2
+  if (commentChar.length > 1) {
+throw new MalformedCarbonCommandException(
+  "Comment marker cannot be more than one character.")
+  }
+}
+
+// Validate ESCAPECHAR length
+if (options.exists(_._1.equalsIgnoreCase("ESCAPECHAR"))) {
+  val escapechar: String = options.get("escapechar").get.head._2
+  if (escapechar.length > 1) {
--- End diff --

\n and \t are valid escape sequences. Add a method that validates the valid and 
allowed escape sequences (a sketch follows below).
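
A minimal sketch of such a validator, assuming a small whitelist of allowed 
two-character escape sequences:

    // Hypothetical helper; the allowed set is illustrative. Note that "\\n"
    // is the two-character string backslash + n as typed by the user in the
    // load option, not the newline character itself.
    def isValidEscapeSequence(loadOption: String): Boolean = {
      val allowedEscapes = Seq("\\n", "\\t", "\\r", "\\b", "\\\\")
      loadOption.length == 1 || allowedEscapes.contains(loadOption)
    }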


---


[GitHub] carbondata issue #1637: [CARBONDATA-1876]clean all the InProgress segments f...

2017-12-13 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1637
  
retest sdv please


---


[GitHub] carbondata issue #1643: [CARBONDATA-1883] Improvement in merge index code

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1643
  
LGTM


---


[GitHub] carbondata issue #1622: [CARBONDATA-1865] Remove unnecessary condition to ch...

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1622
  
LGTM


---


[GitHub] carbondata issue #1646: [CARBONDATA-1886] Delete stale segment folders on ne...

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1646
  
LGTM


---


[GitHub] carbondata issue #1627: [CARBONDATA-1759]make visibility of segments as fals...

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1627
  
LGTM


---


[GitHub] carbondata issue #1630: [CARBONDATA-1826] Carbon 1.3.0 - Spark 2.2: Describe...

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1630
  
LGTM


---


[GitHub] carbondata pull request #1643: [CARBONDATA-1883] Improvement in merge index ...

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1643#discussion_r156416528
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala
 ---
@@ -199,22 +199,34 @@ case class CarbonAlterTableCompactionCommand(
 carbonTable.getAbsoluteTableIdentifier).asScala,
   carbonLoadModel.getTablePath,
   carbonTable, true)
-lock.unlock()
-return
+
+// trigger event for merge index
+val operationContext = new OperationContext
+val alterTableCompactionPreEvent: AlterTableCompactionPreEvent 
=
+  AlterTableCompactionPreEvent(sqlContext.sparkSession,
+carbonTable,
+null,
+"")
+OperationListenerBus.getInstance
+  .fireEvent(alterTableCompactionPreEvent, operationContext)
+
+  } else {
+CarbonDataRDDFactory.startCompactionThreads(
+  sqlContext,
+  carbonLoadModel,
+  storeLocation,
+  compactionModel,
+  lock,
+  operationContext
+)
   }
-  CarbonDataRDDFactory.startCompactionThreads(
-sqlContext,
-carbonLoadModel,
-storeLocation,
-compactionModel,
-lock,
-operationContext
-  )
 } catch {
   case e: Exception =>
 LOGGER.error(s"Exception in start compaction thread. ${ 
e.getMessage }")
 lock.unlock()
--- End diff --

Remove lock.unlock() from catch block


---


[GitHub] carbondata pull request #1630: [CARBONDATA-1826] Carbon 1.3.0 - Spark 2.2: D...

2017-12-12 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1630#discussion_r156305867
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala
 ---
@@ -175,6 +175,15 @@ object CarbonReflectionUtils {
 }
   }
 
+  def getDescribeTableFormattedField[T: TypeTag : reflect.ClassTag](obj: 
T): Boolean = {
+var isFormatted: Boolean = false
--- End diff --

Make it a val.


---


[GitHub] carbondata pull request #1627: [CARBONDATA-1759]make visibility of segments ...

2017-12-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1627#discussion_r156275385
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/DeleteLoadFolders.java
 ---
@@ -122,26 +122,21 @@ private static boolean 
checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
 return false;
   }
 
-  public static boolean deleteLoadFoldersFromFileSystem(
+  public static void deleteLoadFoldersFromFileSystem(
   AbsoluteTableIdentifier absoluteTableIdentifier, boolean 
isForceDelete,
   LoadMetadataDetails[] details) {
-boolean isDeleted = false;
 
 if (details != null && details.length != 0) {
   for (LoadMetadataDetails oneLoad : details) {
 if (checkIfLoadCanBeDeleted(oneLoad, isForceDelete)) {
   String path = getSegmentPath(absoluteTableIdentifier, 0, 
oneLoad);
-  boolean deletionStatus = 
physicalFactAndMeasureMetadataDeletion(path);
-  if (deletionStatus) {
-isDeleted = true;
-oneLoad.setVisibility("false");
-LOGGER.info("Info: Deleted the load " + oneLoad.getLoadName());
-  }
+  physicalFactAndMeasureMetadataDeletion(path);
--- End diff --

While handling deletion in this method, return true and add a warning logger in 
case the file does not exist. As we don't have any other mechanism for 
clean-up, it will be good if we keep all the remaining code the same and only 
set the status to true when the file does not exist (a sketch follows below).
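
A sketch of that handling; the method name mirrors the diff, but the body and 
the java.util.logging logger are illustrative:

    import java.io.File;
    import java.util.logging.Logger;

    class DeleteLoadFoldersSketch {
      private static final Logger LOGGER = Logger.getLogger("DeleteLoadFolders");

      static boolean physicalFactAndMeasureMetadataDeletion(String path) {
        File segmentFolder = new File(path);
        if (!segmentFolder.exists()) {
          // Folder is already gone: report success so the caller still marks
          // the load invisible, and warn so the case is traceable in the logs.
          LOGGER.warning("Segment folder not found, treating as deleted: " + path);
          return true;
        }
        // the existing recursive deletion logic would remain unchanged here
        return deleteRecursively(segmentFolder);
      }

      private static boolean deleteRecursively(File file) {
        File[] children = file.listFiles();
        if (children != null) {
          for (File child : children) {
            deleteRecursively(child);
          }
        }
        return file.delete();
      }
    }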


---


[GitHub] carbondata issue #1625: [CARBONDATA-1869] Null pointer exception thrown when...

2017-12-11 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1625
  
LGTM


---


[GitHub] carbondata pull request #1630: [CARBONDATA-1826] Carbon 1.3.0 - Spark 2.2: D...

2017-12-11 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1630#discussion_r156134688
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala
 ---
@@ -175,6 +175,17 @@ object CarbonReflectionUtils {
 }
   }
 
+  def getDescribeTableFormattedField[T: TypeTag : reflect.ClassTag](obj: 
T): Boolean = {
+var isFormatted: Boolean = false
+val im = rm.reflect(obj)
+for (m <- typeOf[T].members.filter(!_.isMethod)) {
+  if (m.toString.contains("isFormatted")) {
+isFormatted = im.reflectField(m.asTerm).get.asInstanceOf[Boolean]
--- End diff --

Once the condition is satisfied, exit the loop. Also, can you try to use the 
find method here instead of a for loop?
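
A sketch of the suggested rewrite using find, which stops at the first match so 
no explicit loop exit is needed (the runtime mirror is obtained locally here; 
the original class already has one in scope):

    import scala.reflect.ClassTag
    import scala.reflect.runtime.universe._

    object ReflectionSketch {
      def getDescribeTableFormattedField[T: TypeTag : ClassTag](obj: T): Boolean = {
        val rm = runtimeMirror(getClass.getClassLoader)
        val im = rm.reflect(obj)
        // find returns the first non-method member whose name contains
        // "isFormatted", if any; exists then reads its Boolean value and
        // defaults to false when no such member is present
        typeOf[T].members
          .find(m => !m.isMethod && m.toString.contains("isFormatted"))
          .exists(m => im.reflectField(m.asTerm).get.asInstanceOf[Boolean])
      }
    }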


---


[GitHub] carbondata issue #1624: [CARBONDATA-1867] Add support for task/segment level...

2017-12-06 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1624
  
@ravipesala ...handled review comments. Kindly review and merge.


---


[GitHub] carbondata pull request #1624: [WIP][CARBONDATA-1867] Add support for task/s...

2017-12-06 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1624

[WIP][CARBONDATA-1867] Add support for task/segment level pruning

Added support for task/segment level pruning. Added code to compute task-level 
min/max values, which can be used to prune entire tasks/segments (a sketch of 
the idea follows below).
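
A minimal sketch of the pruning idea (types and values are hypothetical; the 
real implementation works on the byte-level min/max stored in the index):

    // Keep a task only if the filter value can fall inside the [min, max]
    // range aggregated over all blocklets belonging to that task.
    case class TaskIndex(taskId: String, min: Long, max: Long)

    def pruneTasks(tasks: Seq[TaskIndex], filterValue: Long): Seq[TaskIndex] =
      tasks.filter(t => filterValue >= t.min && filterValue <= t.max)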

Note: Dependent on PR #1619. Needs to be rebased once PR #1619 is merged.

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
add_task_level_min_max

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1624


commit 5ada233b0e96baf9864407d6c9364495315ca6bc
Author: manishgupta88 <tomanishgupt...@gmail.com>
Date:   2017-12-06T11:41:51Z

Added support for task/segment level pruning. Added code to compute task 
level min/max which can be helpful for task/segment level pruning




---


[GitHub] carbondata pull request #1619: [CARBONDATA-1854] Add support for implicit co...

2017-12-05 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1619

[CARBONDATA-1854] Add support for implicit column filter

Added code to support implicit column filtering

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
implicit_filter_support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1619






---


[GitHub] carbondata pull request #1617: [CARBONDATA-1592] Added TaskPreSubmitExecutio...

2017-12-05 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/1617

[CARBONDATA-1592] Added TaskPreSubmitExecutionEvent for any operations to 
be done before submitting a task

Added TaskPreSubmitExecutionEvent for any operations to be done before 
submitting a task, and added a properties map in carbon table to hold external 
properties, if any.

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
 No
 - [ ] Testing done
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
preExecutorSubmitEvent

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1617






---


[GitHub] carbondata issue #1574: [CARBONDATA-1816] Changing BAD_RECORDS_ACTION defaul...

2017-11-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1574
  
LGTM


---


[GitHub] carbondata issue #1561: [CARBONDATA-1802] Alter query fails if a column is d...

2017-11-29 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1561
  
LGTM


---

