[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread lion-x
Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r83039531
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala
 ---
@@ -0,0 +1,78 @@
+package org.apache.carbondata.spark.testsuite.dataload
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.spark.sql.Row
+
+/**
+  * Created by x00381807 on 2016/9/26.
--- End diff --

Oh, my fault


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82523962
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestDataLoadWithTrimOption.scala
 ---
@@ -0,0 +1,78 @@
+package org.apache.carbondata.spark.testsuite.dataload
+
+import java.io.File
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.spark.sql.Row
+
+/**
+  * Created by x00381807 on 2016/9/26.
--- End diff --

please remove


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82977592
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

pros of this approach will be suppose in one load user loaded with dirty 
data and suddenly he realizes no i need to trim then in the next load he will 
enable the option and load the data, this will increase the dictionary space 
also, also in query dictionary lookup overhead will increase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82968468
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

Guys i think if while data loading we are reading from configuration 
inorder to trim it or not same we need to do while doing filter also, based on 
configuration value decide.
ex:  while loading i enabled trim property, so system will trim and load 
the data, now in filter query also if user provides while space then it needs 
to be trimmed while creating filter model. This will provide more system 
consistentency. if user enable trim then we wont trim it while loading and also 
while querying.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-12 Thread lion-x
Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82960653
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

hi sujith,
I agree with Eason, when user query with some space filter, it should be 
considered as forbidden action.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-11 Thread eason-lyx
Github user eason-lyx commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82923303
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

hi sujith
i am thinking if user already trim the data with the option setting,then 
when user query with some space filter,it would no getting result.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-10-11 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r82758567
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -102,8 +102,8 @@ public void initialize() throws IOException {
 parserSettings.setMaxColumns(
 getMaxColumnsForParsing(csvParserVo.getNumberOfColumns(), 
csvParserVo.getMaxColumns()));
 parserSettings.setNullValue("");
-parserSettings.setIgnoreLeadingWhitespaces(false);
-parserSettings.setIgnoreTrailingWhitespaces(false);
+parserSettings.setIgnoreLeadingWhitespaces(csvParserVo.getTrim());
--- End diff --

Same you need to handle in CarbonFilters.scala also, since while processing 
filter expressions the spaces are not getting trimmed, so here also you need to 
take care based on your feature


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-09-30 Thread lion-x
Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81306658
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

In some cases, users want to trim some string field in csv, he set in Load 
DML option, and we need to handle to in csvinput step. So we the trim should be 
transformed to CSVinput step.

For Example, in some usecases, the LeadingWhiteSpace in a String, like " 
fsdfsd" maybe meaningful. Users want to keep the LeadingWhiteSpace. So we add 
this option for users, let them choose.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-09-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281079
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

Not clear about the usecase, can you make it more clear by providing more 
details


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #200: [CARBONDATA-276]add trim option

2016-09-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r81281050
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -465,6 +466,7 @@ private static void generateGraph(IDataProcessStatus 
schmaModel, SchemaInfo info
 model.setEscapeCharacter(schmaModel.getEscapeCharacter());
 model.setQuoteCharacter(schmaModel.getQuoteCharacter());
 model.setCommentCharacter(schmaModel.getCommentCharacter());
+model.setTrim(schmaModel.getTrim());
--- End diff --

why we need to set this in schemamodel?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---