[GitHub] incubator-carbondata pull request #327: [CARBONDATA-421]Time Stamp Filter is...

2016-11-17 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/327

[CARBONDATA-421]Time Stamp Filter issue with other than -mm-dd format

Problem: When the timestamp format is /mm/dd, i.e. it uses a separator 
other than "-", the filter query does not work. The filter allows only "-", 
so the user has to give the filter value with "-", but because the data was 
loaded with "/", the filter matches nothing and returns 0 results.
Solution: The filter was using the default format. While converting the 
filter values to surrogate keys, it needs to use the format that was used 
during data loading.
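
A minimal sketch of why the choice of format matters, not the actual patch. The class name, the helper, and the two patterns `yyyy-MM-dd` and `yyyy/MM/dd` are assumptions made for illustration; the real formats and conversion code live in CarbonData.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper, not part of CarbonData.
final class TimestampFilterSketch {

  // Parses a filter literal with the given pattern.
  // Returns the epoch millis, or null when the literal does not match the pattern.
  static Long parseOrNull(String value, String pattern) {
    SimpleDateFormat format = new SimpleDateFormat(pattern);
    format.setLenient(false);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    try {
      return format.parse(value).getTime();
    } catch (ParseException e) {
      return null;
    }
  }
}
```

With data loaded as `2016/11/17`, parsing the filter value with the assumed default pattern `yyyy-MM-dd` yields null, so no surrogate key can match; parsing with the load-time pattern `yyyy/MM/dd` succeeds.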

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
TimeStampFilterIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #327


commit f6f0d4caecd7e3fd326de309e3bf43b7095d1e9c
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-11-17T12:59:49Z

Time Stamp Filter issue with other than -mm-dd format




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #325: [CARBONDATA-418]Fixed data loading p...

2016-11-17 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/325

[CARBONDATA-418]Fixed data loading performance issue

Problem: In CarbonCSVBasedSeqGenStep, the dimension column ids string is 
split into a String array for every row. Each call to the split function 
creates new String objects, so doing it per row hurts data loading 
performance.
Solution: Create an instance variable and, in the process row method, 
fill in the column ids once, when the first row is passed to this step.
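
The caching pattern described above can be sketched as follows. This is an illustration only: the class, field names, and the "," delimiter are assumptions, and the real step has a different signature.

```java
// Hypothetical step; only the split-once pattern mirrors the description.
final class ColumnIdCacheSketch {
  private final String dimensionColumnIds; // e.g. "c1,c2,c3" (assumed delimiter)
  private String[] cachedColumnIds;        // filled when the first row arrives
  int splitCalls = 0;                      // counter kept only to show the effect

  ColumnIdCacheSketch(String dimensionColumnIds) {
    this.dimensionColumnIds = dimensionColumnIds;
  }

  // Called once per row; the split happens only for the first row.
  String[] processRow(Object[] row) {
    if (cachedColumnIds == null) {
      cachedColumnIds = dimensionColumnIds.split(",");
      splitCalls++;
    }
    return cachedColumnIds;
  }
}
```

However many rows pass through `processRow`, the array is allocated once instead of once per row.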



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
DataLoadingPerformanceIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #325


commit 024e9be4b63bbd3b5ada639d79b7733e743ca32e
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-11-16T13:02:25Z

Fixed data loading performance issue






[GitHub] incubator-carbondata pull request #296: [CARBONDATA-382]Like Filter Query Op...

2016-11-10 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/296#discussion_r87425199
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterExpressionProcessor.java
 ---
@@ -286,6 +289,13 @@ private FilterResolverIntf 
getFilterResolverBasedOnExpressionType(
   return new RowLevelFilterResolverImpl(expression, 
isExpressionResolve, true,
   tableIdentifier);
 }
+if (currentCondExpression.getFilterExpressionType() == 
ExpressionType.CONTAINS
--- End diff --

For a dictionary column, do we need to create a row level expression? I 
think creating an include filter for the like query will be good enough for 
a dictionary column: because we have the dictionary values, we can search 
the dictionary to get all the valid values and then apply the filter. Please 
correct me if I am wrong :)
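
A rough sketch of the suggestion, assuming the dictionary can be viewed as a map from surrogate key to value. The class and method names are hypothetical and do not correspond to CarbonData's dictionary API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of CarbonData.
final class DictionaryLikeSketch {

  // Scans the dictionary once and collects the surrogate keys whose value
  // contains the fragment. The result can back an include filter, so rows
  // are matched by surrogate key instead of evaluating the expression per row.
  static Set<Integer> matchingSurrogates(Map<Integer, String> dictionary, String fragment) {
    Set<Integer> result = new TreeSet<>();
    for (Map.Entry<Integer, String> entry : dictionary.entrySet()) {
      if (entry.getValue().contains(fragment)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```

The cost of the scan is proportional to the dictionary size rather than to the number of rows.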




[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-03 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86393676
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -311,80 +278,6 @@ private void addSegmentsIfEmpty(JobContext job, 
AbsoluteTableIdentifier absolute
 return result;
   }
 
-  /**
-   * get total number of rows. Same as count(*)
-   *
-   * @throws IOException
-   * @throws IndexBuilderException
-   */
-  public long getRowCount(JobContext job) throws IOException, 
IndexBuilderException {
--- End diff --

This method is useful for count(*) queries, as we can return the number of 
rows from the driver itself; currently we are pushing the query down to the 
executor. Better to keep this method, it will be useful.




[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...

2016-11-03 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86391469
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/datastore/block/Distributable.java
 ---
@@ -16,10 +16,12 @@
  */
 package org.apache.carbondata.core.carbon.datastore.block;
 
+import java.io.IOException;
+
 /**
- * Abstract class which is maintains the locations of node.
+ * interface to get the locations of node. Used for making task 
distribution based on locality
  */
-public abstract class Distributable implements Comparable {
+public interface Distributable extends Comparable {
 
-  public abstract String[] getLocations();
+  String[] getLocations() throws IOException;
--- End diff --

Any reason to throw IOException from this method? I think this is not 
required.




[GitHub] incubator-carbondata pull request #237: [CARBONDATA-317] - CSV having only s...

2016-10-16 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/237#discussion_r83549178
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/csv/CarbonCsvRelation.scala
 ---
@@ -148,6 +150,10 @@ case class CarbonCsvRelation protected[spark] (
   .withSkipHeaderRecord(false)
 CSVParser.parse(firstLine, 
csvFormat).getRecords.get(0).asScala.toArray
   }
+  if(null == firstRow) {
+throw new DataLoadingException("Please check your input path and 
make sure " +
--- End diff --

Maybe the csv file does not have a header and the user is passing the header 
from the load command. In that case, is this a valid message?




[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...

2016-10-16 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/194#discussion_r83546209
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -1426,4 +1423,25 @@ private static void 
getUnknownExpressionsList(Expression expression,
   getUnknownExpressionsList(child, lst);
 }
   }
+  /**
+   * This method will compare double values it will preserve
+   * the -0.0 and 0.0 equality as per == ,also preserve NaN equality check 
as per
+   * java.lang.Double.equals()
+   *
+   * @param d1 double value for equality check
+   * @param d2 double value for equality check
+   * @return boolean after comparing two double values.
+   */
+  public static int compare(Double d1, Double d2) {
--- End diff --

Move this method to DataTypeUtil as it can be used from multiple places and 
change the method name to compareDouble




[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...

2016-10-16 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/194#discussion_r83546183
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -1426,4 +1423,25 @@ private static void 
getUnknownExpressionsList(Expression expression,
   getUnknownExpressionsList(child, lst);
 }
   }
+  /**
+   * This method will compare double values it will preserve
+   * the -0.0 and 0.0 equality as per == ,also preserve NaN equality check 
as per
+   * java.lang.Double.equals()
+   *
+   * @param d1 double value for equality check
+   * @param d2 double value for equality check
+   * @return boolean after comparing two double values.
+   */
+  public static int compare(Double d1, Double d2) {
+if ((d1.doubleValue() == d2.doubleValue()) || (Double.isNaN(d1) && 
Double.isNaN(d2))) {
+  return 0;
+}
+if (d1 < d2) {
--- End diff --

Can't we use else if and else? Why are three if conditions required?
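
For illustration, the if / else if / else shape being asked for might look like the sketch below. The quoted diff is cut off after the second condition, so the -1 and 1 return values and the class name are assumptions, not the PR's code.

```java
// Hypothetical class; the first condition is taken from the quoted diff.
final class CompareDoubleSketch {

  // Returns 0 when the values are equal (treating -0.0 == 0.0 and NaN == NaN),
  // -1 when d1 is smaller, and 1 otherwise.
  static int compareDouble(Double d1, Double d2) {
    if (d1.doubleValue() == d2.doubleValue() || (Double.isNaN(d1) && Double.isNaN(d2))) {
      return 0;
    } else if (d1 < d2) {
      return -1;
    } else {
      return 1;
    }
  }
}
```

Note that when exactly one operand is NaN, both comparisons are false and this sketch falls through to 1; whether that is the desired ordering is a question for the PR author.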




[GitHub] incubator-carbondata pull request #196: [CARBONDATA-271]Fixed non filter dat...

2016-09-23 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/196

[CARBONDATA-271]Fixed non filter data mismatch issue.

Problem: While generating the default end key we take the LONG.MAX key and 
generate the end key using the segment key generator. If the cardinality is 
less than that, the generator gives some value within its cardinality and 
btree searching fails.

Solution: Get the dimension cardinality from the segment properties, as this 
is the max value for the segment.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
NonFilterDataMismatchIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #196


commit db865d428639be0ab7e593f92931f85fcf71
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-09-23T10:48:52Z

Fixed non filter data mismatch issue.






[GitHub] incubator-carbondata pull request #184: [CARBONDATA-264]Fixed limit query sc...

2016-09-20 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/184

[CARBONDATA-264]Fixed limit query scan time statistics issue

Problem: Scan time is not logged in the statistics in case of a limit query.
Solution: Moved statistics recording from the iterator to the queryExecutor 
finish method, because in case of a limit query the call does not reach the 
hasNext method after the records are consumed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
statisticsissue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #184


commit 60c92b08d642c9580533b5cf7cc27e0ecf84b244
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-09-21T03:37:09Z

Fixed limit query statistics issue






[GitHub] incubator-carbondata pull request #183: [CARBONDATA-263]Configurable blockle...

2016-09-20 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/183

[CARBONDATA-263]Configurable blocklet distribution

Problem: In case of a limit query, if the limit value is 100, then after 
consuming 100 records the executor service is not shut down, and this may 
cause memory issues.
Solution: Add the executor service to the query model; the executor needs to 
be shut down after query execution in carbonscan rdd.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
configurableblockletdistribution

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #183


commit 4c803c9c503d1081a25289103aff60f6d8e9e0d7
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-09-20T18:57:43Z

configurable blocklet distribution






[GitHub] incubator-carbondata pull request #182: [CARBONDATA-262]Fixed limit query is...

2016-09-20 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/182

[CARBONDATA-262]Fixed limit query issue

Problem: In case of a limit query, if the limit value is 100, then after 
consuming 100 records the executor service is not shut down, and this may 
cause memory issues.

Solution: Add the executor service to the query model; the executor needs to 
be shut down after query execution in carbonscan rdd.
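
A minimal sketch of the shutdown pattern, using only `java.util.concurrent`. The class and method are hypothetical; in the PR the executor is carried by the query model and shut down from the scan RDD.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical reader, not part of CarbonData.
final class LimitQuerySketch {

  // Submits totalTasks tasks but consumes only `limit` results.
  // The finally block shuts the executor down even though the consumer
  // stops early, so its threads and queued tasks do not linger.
  static List<Integer> readWithLimit(ExecutorService executor, int totalTasks, int limit) {
    List<Future<Integer>> futures = new ArrayList<>();
    List<Integer> rows = new ArrayList<>();
    try {
      for (int i = 0; i < totalTasks; i++) {
        final int value = i;
        Callable<Integer> task = () -> value;
        futures.add(executor.submit(task));
      }
      for (int i = 0; i < limit && i < futures.size(); i++) {
        rows.add(futures.get(i).get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
    return rows;
  }
}
```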

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
limitqueryissue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #182


commit 2155403f3b217788a71221d14be4af309b891831
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-09-20T13:49:31Z

fixed limit query issue






[GitHub] incubator-carbondata pull request #170: [CARBONDATA-253]OOM issue if distrib...

2016-09-17 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/170

[CARBONDATA-253]OOM issue if distribution is based on blocklet during query 
execution

Problem: During query execution, when distribution is based on blocklet, the 
same blocks are loaded multiple times because the hashCode and equals 
methods do not honor the same contract. This can cause an OOM issue.
Solution: The same class is used to identify unique blocks both while 
distributing and while loading, so create a wrapper class that implements 
hashCode and equals based on file path, offset and length. This removes 
duplicate blocks, so only one copy of a block's metadata is loaded in memory.
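
A sketch of such a wrapper, assuming the three fields named in the description. The class name and field types are hypothetical; the actual wrapper in the PR may differ.

```java
import java.util.Objects;

// Hypothetical key class: two keys are equal exactly when file path,
// offset and length are equal, and hashCode is derived from the same fields.
final class BlockKeySketch {
  private final String filePath;
  private final long offset;
  private final long length;

  BlockKeySketch(String filePath, long offset, long length) {
    this.filePath = filePath;
    this.offset = offset;
    this.length = length;
  }

  @Override public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof BlockKeySketch)) {
      return false;
    }
    BlockKeySketch that = (BlockKeySketch) other;
    return offset == that.offset && length == that.length
        && Objects.equals(filePath, that.filePath);
  }

  @Override public int hashCode() {
    return Objects.hash(filePath, offset, length);
  }
}
```

Because equal keys always produce the same hash code, a `HashSet` or `HashMap` keyed on this class holds one entry per distinct block.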

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
equalsAndHashCodeIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #170


commit 49a76db0ff8e63fc95a309984d215086d888fc7b
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-09-17T13:25:36Z

equalsAndHashCodeIssue






[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-16 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79207294
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
 ---
@@ -20,20 +20,13 @@ package org.apache.spark.sql
 import java.text.SimpleDateFormat
 import java.util.Date
 
-import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier
--- End diff --

done




[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-15 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79109942
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws 
IOException {
 
   /**
* get valid segment for given table
+   *
+   * @return
+   * @throws IOException
+   */
+  public InvalidSegmentsInfo getInvalidSegments() throws IOException {
--- End diff --

ok i will handle 




[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

2016-09-15 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/158

[CARBONDATA-241]Fixed out of memory issue during query execution

**Problem:** During a long run, query execution takes more and more time and 
eventually throws an out of memory error.
**Reason**: Compaction merges segments, and each segment's metadata is 
loaded in memory. After compaction the compacted segments are invalid, but 
their metadata is not removed from memory. This stale metadata piles up and 
takes more memory, and after a few days query execution throws OOM.
**Solution**: Remove invalid blocks from memory.
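
As a rough illustration of the eviction step, assuming metadata is cached per segment id. The cache class, its methods, and the segment ids below are hypothetical and do not reflect CarbonData's actual cache.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory cache keyed by segment id.
final class SegmentCacheSketch {
  private final Map<String, Object> metadataBySegmentId = new ConcurrentHashMap<>();

  void load(String segmentId, Object metadata) {
    metadataBySegmentId.put(segmentId, metadata);
  }

  // Drops metadata of segments that are no longer valid, for example
  // segments that were merged into a new one by compaction.
  void removeInvalidSegments(List<String> invalidSegmentIds) {
    for (String segmentId : invalidSegmentIds) {
      metadataBySegmentId.remove(segmentId);
    }
  }

  int size() {
    return metadataBySegmentId.size();
  }
}
```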

 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata OOMIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #158


commit 3ff39301df586597eebf3a8d92ca3c60f5eba531
Author: kumarvishal <kumarvishal.1...@gmail.com>
Date:   2016-09-15T08:41:00Z

Fixed out of memory issue during query execution






[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-09-01 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77135391
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/SingleQueryStatisticsRecorder.java
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.carbon.querystatistics;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+/**
+ * Class will be used to record and log the query statistics
+ */
+public class SingleQueryStatisticsRecorder implements Serializable {
--- End diff --

Remove Serializable, as it is not required.




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-09-01 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77134233
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/SingleQueryStatisticsRecorder.java
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.core.carbon.querystatistics;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+
+/**
+ * Class will be used to record and log the query statistics
+ */
+public class SingleQueryStatisticsRecorder implements Serializable {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(SingleQueryStatisticsRecorder.class.getName());
+  /**
+   * serialization version
+   */
+  private static final long serialVersionUID = -1L;
+
+  /**
+   * singleton QueryStatisticsRecorder for driver
+   */
+  private HashMap<String, List> queryStatisticsMap;
--- End diff --

use Map<String,List>




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-09-01 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r77120540
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsCommonConstants.java
 ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.carbon.querystatistics;
+
+public final class QueryStatisticsCommonConstants {
--- End diff --

CarbonCommonConstants also we can change it to interface




[GitHub] incubator-carbondata pull request #111: [CARBONDATA-194] ArrayIndexOfBoundEx...

2016-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/111#discussion_r76973485
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -104,6 +108,20 @@ public void initialize() throws IOException {
   }
 
   /**
+   * This method will decide the number of coulmns to be parsed for a row 
by univocity parser
+   *
+   * @param columnCountInSchema total number of columns in schema
+   * @return
+   */
+  private int getMaxColumnsForParsing(int columnCountInSchema) {
+int maxNumberOfColumnsForParsing = columnCountInSchema;
+if (columnCountInSchema < MAX_NUMBER_OF_COLUMNS_FOR_PARSING) {
+  maxNumberOfColumnsForParsing = MAX_NUMBER_OF_COLUMNS_FOR_PARSING;
+}
+return maxNumberOfColumnsForParsing;
--- End diff --

I added +10 to avoid this bug. I think there is now no need to add 10, as we 
are allowing the user to give the max number of columns. @gvramana Please 
comment.




[GitHub] incubator-carbondata pull request #111: [CARBONDATA-194] ArrayIndexOfBoundEx...

2016-08-31 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/111#discussion_r76936492
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java
 ---
@@ -41,6 +41,10 @@
 public class UnivocityCsvParser {
 
   /**
+   * Max number of columns that will be parsed for a row by univocity 
parsing
+   */
+  private static final int MAX_NUMBER_OF_COLUMNS_FOR_PARSING = 2000;
--- End diff --

I think it will fail if the csv file has more than 2000 columns and the 
schema you have selected has fewer columns. Better to expose a property so 
the user can also configure the max number of columns in the csv file.
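
One way to picture the suggestion is shown below. This is a sketch only: the class, the nullable `userConfiguredMax` parameter standing in for a user property, and the fallback logic are assumptions; only the 2000 default comes from the quoted diff.

```java
// Hypothetical helper, not part of CarbonData.
final class MaxColumnsSketch {
  static final int DEFAULT_MAX_COLUMNS = 2000; // value taken from the quoted diff

  // Uses the user-supplied limit when present, otherwise the default,
  // and never returns less than the number of columns in the schema.
  static int maxColumnsForParsing(int columnCountInSchema, Integer userConfiguredMax) {
    int configured = userConfiguredMax != null ? userConfiguredMax : DEFAULT_MAX_COLUMNS;
    return Math.max(columnCountInSchema, configured);
  }
}
```

A csv file with 5000 columns and a 10-column schema would then parse once the user sets the limit to 5000, instead of failing against the fixed 2000.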




[GitHub] incubator-carbondata pull request #106: [CARBONDATA-160]Data mismatch issue ...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/106#discussion_r76785449
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java ---
@@ -57,7 +68,17 @@
 import org.apache.carbondata.scan.expression.conditional.ListExpression;
 import 
org.apache.carbondata.scan.expression.exception.FilterIllegalMemberException;
 import 
org.apache.carbondata.scan.expression.exception.FilterUnsupportedException;
-import org.apache.carbondata.scan.filter.executer.*;
+import org.apache.carbondata.scan.filter.executer.AndFilterExecuterImpl;
--- End diff --

ok




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76760660
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/result/iterator/DetailQueryResultIterator.java
 ---
@@ -45,10 +47,28 @@
 
   public DetailQueryResultIterator(List infos, 
QueryModel queryModel) {
 super(infos, queryModel);
+this.queryModel = queryModel;
+  }
+
+  private Boolean flag;
+
+  private Long total = 0L;
+
+  private QueryModel queryModel;
+
+  @Override public boolean hasNext() {
+flag = super.hasNext();
+if(!flag && total > 0) {
--- End diff --

Why is the total > 0 check required?




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76760563
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/result/iterator/DetailQueryResultIterator.java
 ---
@@ -45,10 +47,28 @@
 
   public DetailQueryResultIterator(List infos, 
QueryModel queryModel) {
 super(infos, queryModel);
+this.queryModel = queryModel;
+  }
+
+  private Boolean flag;
+
+  private Long total = 0L;
--- End diff --

Why is the wrapper Long required? Why not the primitive long?
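
A small sketch of the difference behind this question, using a hypothetical class `LongCounterSketch` that is not part of the PR. With a `Long` accumulator each `+=` unboxes the value and boxes the result again, and comparing a null `Long` with `>` throws a NullPointerException; a primitive `long` has neither issue.

```java
// Hypothetical illustration class, not taken from the pull request.
class LongCounterSketch {
  // Primitive accumulator: plain arithmetic, can never be null.
  public static long sumPrimitive(long[] values) {
    long total = 0L;
    for (long v : values) {
      total += v;
    }
    return total;
  }

  // Wrapper accumulator: every += unboxes total and boxes the new sum.
  public static Long sumBoxed(long[] values) {
    Long total = 0L;
    for (long v : values) {
      total += v;
    }
    return total;
  }

  // Unboxing happens in the comparison, so a null argument throws
  // NullPointerException.
  public static boolean isPositive(Long total) {
    return total > 0;
  }

  public static void main(String[] args) {
    long[] values = {3L, 4L, 5L};
    System.out.println(sumPrimitive(values));
    System.out.println(sumBoxed(values));
    try {
      isPositive(null);
    } catch (NullPointerException e) {
      System.out.println("unboxing null threw NullPointerException");
    }
  }
}
```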




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76760461
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/result/iterator/DetailQueryResultIterator.java
 ---
@@ -45,10 +47,28 @@
 
   public DetailQueryResultIterator(List infos, 
QueryModel queryModel) {
 super(infos, queryModel);
+this.queryModel = queryModel;
+  }
+
+  private Boolean flag;
+
+  private Long total = 0L;
+
+  private QueryModel queryModel;
+
+  @Override public boolean hasNext() {
+flag = super.hasNext();
--- End diff --

Why are we overriding hasNext()? We can handle this in the superclass 
AbstractDetailQueryResultIterator.
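
One way to read this suggestion, sketched with hypothetical classes (`AbstractResultIteratorSketch` and `DetailIteratorSketch` are illustrative stand-ins, not the real CarbonData iterators): the base class does the end-of-iteration bookkeeping inside its own hasNext(), so the subclass needs no override.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the abstract iterator: bookkeeping lives here.
abstract class AbstractResultIteratorSketch<T> implements Iterator<T> {
  private final Iterator<T> delegate;
  private boolean finished;
  private int statisticsRecordedCount;

  public AbstractResultIteratorSketch(List<T> rows) {
    this.delegate = rows.iterator();
  }

  @Override public boolean hasNext() {
    boolean more = delegate.hasNext();
    // Record statistics exactly once, the first time the data runs out.
    if (!more && !finished) {
      finished = true;
      statisticsRecordedCount++;
    }
    return more;
  }

  @Override public T next() {
    return delegate.next();
  }

  @Override public void remove() {
    throw new UnsupportedOperationException();
  }

  public int getStatisticsRecordedCount() {
    return statisticsRecordedCount;
  }
}

// The concrete iterator inherits the behaviour without overriding hasNext().
class DetailIteratorSketch extends AbstractResultIteratorSketch<String> {
  public DetailIteratorSketch(List<String> rows) {
    super(rows);
  }

  public static void main(String[] args) {
    DetailIteratorSketch it = new DetailIteratorSketch(Arrays.asList("a", "b"));
    while (it.hasNext()) {
      System.out.println(it.next());
    }
    System.out.println(it.getStatisticsRecordedCount());
  }
}
```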




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76758327
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsRecorder.java
 ---
@@ -61,14 +67,275 @@ public QueryStatisticsRecorder(String queryId) {
*/
   public synchronized void recordStatistics(QueryStatistic statistic) {
 queryStatistics.add(statistic);
+// refresh query Statistics Map
+String key = statistic.getQueryId();
+if (!StringUtils.isEmpty(key)) {
+  // 240954528274124_0 and 240954528274124 is the same query id
+  key = key.substring(0, 15);
+}
+if (queryStatisticsMap.get(key) != null) {
--- End diff --

The query id is based on the task, not on the segment.
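
For reference, the `key.substring(0, 15)` call in the diff throws StringIndexOutOfBoundsException for any id shorter than 15 characters. The sketch below shows a length-independent alternative; it assumes, based only on the code comment in the diff, that everything after the first underscore is a task suffix. The class `QueryIdKey` is hypothetical.

```java
// Hypothetical helper, not part of CarbonData.
class QueryIdKey {
  // Returns the part of the id before the first '_' (assumed to be the
  // task suffix separator), or the id unchanged when there is none.
  public static String baseQueryId(String queryId) {
    if (queryId == null || queryId.isEmpty()) {
      return queryId;
    }
    int separator = queryId.indexOf('_');
    return separator < 0 ? queryId : queryId.substring(0, separator);
  }

  public static void main(String[] args) {
    System.out.println(baseQueryId("240954528274124_0"));
    System.out.println(baseQueryId("240954528274124"));
  }
}
```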




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76752133
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsRecorder.java
 ---
@@ -61,14 +67,275 @@ public QueryStatisticsRecorder(String queryId) {
*/
   public synchronized void recordStatistics(QueryStatistic statistic) {
 queryStatistics.add(statistic);
+// refresh query Statistics Map
+String key = statistic.getQueryId();
+if (!StringUtils.isEmpty(key)) {
+  // 240954528274124_0 and 240954528274124 is the same query id
+  key = key.substring(0, 15);
+}
+if (queryStatisticsMap.get(key) != null) {
+  queryStatisticsMap.get(key).add(statistic);
+} else {
+  List newQueryStatistics = new 
ArrayList();
+  newQueryStatistics.add(statistic);
+  queryStatisticsMap.put(key, newQueryStatistics);
+}
   }
 
   /**
* Below method will be used to log the statistic
*/
   public void logStatistics() {
 for (QueryStatistic statistic : queryStatistics) {
-  LOGGER.statistic(statistic.getStatistics(queryIWthTask));
+  LOGGER.statistic(statistic.getStatistics());
+}
+  }
+
+  /**
+   * Below method will be used to show statistic log as table
+   */
+  public void logStatisticsTable() {
+String tableInfo = putStatisticsIntoTable();
+if (null != tableInfo) {
+  LOGGER.statistic(tableInfo);
+}
+  }
+
+  /**
+   * Below method will parse queryStatisticsMap and put time into table
+   */
+  public String putStatisticsIntoTable() {
+for (String key: queryStatisticsMap.keySet()) {
+  try {
+// TODO: get the finished query, and print Statistics
+if (queryStatisticsMap.get(key).size() > 8) {
+  String jdbc_connection_time = "";
+  String sql_parse_time = "";
+  String load_meta_time = "";
+  String block_identification_time = "";
+  String schedule_time = "";
+  String driver_part_time = "";
+  String executor_part_time = "";
+  String load_index_time = "";
+  String scan_data_time = "";
+  String dictionary_load_time = "";
--- End diff --

Dictionary loading time is required on both sides, executor and driver; 
please add it on the driver side.




[GitHub] incubator-carbondata pull request #91: [WIP] Add performance statistics logs...

2016-08-30 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/91#discussion_r76751950
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/QueryStatisticsRecorder.java
 ---
@@ -61,14 +67,275 @@ public QueryStatisticsRecorder(String queryId) {
*/
   public synchronized void recordStatistics(QueryStatistic statistic) {
 queryStatistics.add(statistic);
+// refresh query Statistics Map
+String key = statistic.getQueryId();
+if (!StringUtils.isEmpty(key)) {
+  // 240954528274124_0 and 240954528274124 is the same query id
+  key = key.substring(0, 15);
+}
+if (queryStatisticsMap.get(key) != null) {
+  queryStatisticsMap.get(key).add(statistic);
+} else {
+  List newQueryStatistics = new 
ArrayList();
+  newQueryStatistics.add(statistic);
+  queryStatisticsMap.put(key, newQueryStatistics);
+}
   }
 
   /**
* Below method will be used to log the statistic
*/
   public void logStatistics() {
 for (QueryStatistic statistic : queryStatistics) {
-  LOGGER.statistic(statistic.getStatistics(queryIWthTask));
+  LOGGER.statistic(statistic.getStatistics());
+}
+  }
+
+  /**
+   * Below method will be used to show statistic log as table
+   */
+  public void logStatisticsTable() {
+String tableInfo = putStatisticsIntoTable();
+if (null != tableInfo) {
+  LOGGER.statistic(tableInfo);
+}
+  }
+
+  /**
+   * Below method will parse queryStatisticsMap and put time into table
+   */
+  public String putStatisticsIntoTable() {
+for (String key: queryStatisticsMap.keySet()) {
+  try {
+// TODO: get the finished query, and print Statistics
+if (queryStatisticsMap.get(key).size() > 8) {
+  String jdbc_connection_time = "";
+  String sql_parse_time = "";
+  String load_meta_time = "";
+  String block_identification_time = "";
+  String schedule_time = "";
+  String driver_part_time = "";
+  String executor_part_time = "";
+  String load_index_time = "";
--- End diff --

Change load_index_time to block loading time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #88: [CARBONDATA-173]The exception info is...

2016-08-25 Thread kumarvishal09
Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/88#discussion_r76234490
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -489,9 +489,10 @@ object GlobalDictionaryUtil extends Logging {
 val preDictDimensionOption = dimensions.filter(
   _.getColName.equalsIgnoreCase(dimParent))
 if (preDictDimensionOption.length == 0) {
-  logError(s"No column $dimParent exists in ${table.getDatabaseName}.${table.getTableName}")
-  throw new DataLoadingException(s"No column $colName exists " +
-  s"in ${table.getDatabaseName}.${table.getTableName}")
+  logError(s"No key column $dimParent exists in ${table.getDatabaseName}.${table.getTableName}")
+  throw new DataLoadingException(s"No key column $colName exists in " +
+s"${table.getDatabaseName}.${table.getTableName}, please make sure " +
+s"that $colName can be dictionary.")
--- End diff --

Please add a message stating that only a key column can be part of the dictionary.

