[jira] [Commented] (CARBONDATA-241) OOM error during query execution in long run

2016-09-15 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/CARBONDATA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495467#comment-15495467 ]

ASF GitHub Bot commented on CARBONDATA-241:
---

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79109942
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws IOException {
 
  /**
   * get valid segment for given table
+   *
+   * @return
+   * @throws IOException
+   */
+  public InvalidSegmentsInfo getInvalidSegments() throws IOException {
--- End diff --

ok, I will handle it


> OOM error during query execution in long run
> 
>
> Key: CARBONDATA-241
> URL: https://issues.apache.org/jira/browse/CARBONDATA-241
> Project: CarbonData
>  Issue Type: Bug
>Reporter: kumar vishal
>Assignee: kumar vishal
>
> **Problem:** During a long run, query execution takes progressively more 
> time and eventually throws an out-of-memory error.
> **Reason:** During compaction we compact segments, and each segment's 
> metadata is loaded into memory. After compaction the compacted segments are 
> invalid, but their metadata is not removed from memory. Because of this, 
> duplicate metadata piles up, consuming more and more memory, and after a few 
> days query execution throws OOM.
> **Solution:** Remove the metadata of invalid blocks from memory.
>  
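The fix described above (evicting metadata of compacted, now-invalid segments) can be sketched roughly as follows. This is an illustrative sketch only; the class and method names are hypothetical, not CarbonData's actual API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical driver-side cache of per-segment metadata. The bug pattern:
// entries are added on every load/compaction, but never removed, so copies
// for invalidated segments accumulate until the JVM runs out of memory.
class SegmentMetadataCache {
  private final Map<String, Object> metadataBySegmentId = new ConcurrentHashMap<>();

  void put(String segmentId, Object metadata) {
    metadataBySegmentId.put(segmentId, metadata);
  }

  // After compaction, drop the metadata of the compacted (invalid) segments
  // so duplicate copies do not pile up across load/compaction cycles.
  void invalidate(List<String> invalidSegmentIds) {
    for (String id : invalidSegmentIds) {
      metadataBySegmentId.remove(id);
    }
  }

  int size() {
    return metadataBySegmentId.size();
  }
}
```

A `getInvalidSegments()`-style call (as in the diff above) would supply the segment ids to `invalidate`, keeping the cache bounded by the number of currently valid segments.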



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-243) Filters on dictionary column not always work.

2016-09-15 Thread Yue Shang (JIRA)

 [ https://issues.apache.org/jira/browse/CARBONDATA-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Shang updated CARBONDATA-243:
-
Description: 
Filter on a dictionary column does not always work.

The loaded table contains 100 columns and 100M rows; one of the columns, c100, has a cardinality of 10M and is included in the dictionary (DICTIONARY_INCLUDE).

DDL statement as follows:
CREATE TABLE IF NOT EXISTS big_table4
(
  c100 Int,
  c1 Int, c2 Int, c3 Int, c4 Int, c5 Int, c6 Int, c7 Int, c8 Int, c9 Int, c10 Int,
  c11 Int, c12 Int, c13 Int, c14 Int, c15 Int, c16 Int, c17 Int, c18 Int, c19 Int, c20 Int,
  c21 Int, c22 Int, c23 Int, c24 Int, c25 Int, c26 Int, c27 Int, c28 Int, c29 Int, c30 Int,
  c31 Int, c32 Int, c33 Int, c34 Int, c35 Int, c36 Int, c37 Int, c38 Int, c39 Int, c40 Int,
  c41 Int, c42 Int, c43 Int, c44 Int, c45 Int, c46 Int, c47 Int, c48 Int, c49 Int, c50 Int,
  c51 Int, c52 Int, c53 Int, c54 Int, c55 Int, c56 Int, c57 Int, c58 Int, c59 Int, c60 Int,
  c61 Int, c62 Int, c63 Int, c64 Int, c65 Int, c66 Int, c67 Int, c68 Int, c69 Int, c70 Int,
  c71 Int, c72 Int, c73 Int, c74 Int, c75 Int, c76 Int, c77 Int, c78 Int, c79 Int, c80 Int,
  c81 Int, c82 Int, c83 Int, c84 Int, c85 Int, c86 Int, c87 Int, c88 Int, c89 Int, c90 Int,
  c91 Int, c92 Int, c93 Int, c94 Int, c95 Int, c96 Int, c97 Int, c98 Int, c99 Int,
  c101 String
)
STORED BY 'carbondata'
TBLPROPERTIES ("DICTIONARY_INCLUDE"="c101,c100,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15")
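For context on what DICTIONARY_INCLUDE does to a filter on c100, the core idea of dictionary encoding can be sketched as below. This is an illustrative sketch, not CarbonData's actual classes: values are replaced by integer surrogate keys, and an equality-style filter is evaluated by first translating the literal to its surrogate key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary-encoded column (hypothetical names). Each distinct value
// gets an integer surrogate key; the column stores only the keys.
class DictionaryColumn {
  private final Map<String, Integer> dictionary = new HashMap<>();
  private final List<Integer> encoded = new ArrayList<>();

  void add(String value) {
    Integer key = dictionary.computeIfAbsent(value, v -> dictionary.size());
    encoded.add(key);
  }

  // Returns row ids whose value equals the literal. If the literal is not
  // present in the dictionary, no row can possibly match.
  List<Integer> filterEquals(String literal) {
    List<Integer> rows = new ArrayList<>();
    Integer key = dictionary.get(literal);
    if (key == null) {
      return rows;
    }
    for (int i = 0; i < encoded.size(); i++) {
      if (encoded.get(i).equals(key)) {
        rows.add(i);
      }
    }
    return rows;
  }
}
```

With 10M distinct values in c100, the dictionary itself is large, which is why the filter path through the dictionary matters for this bug report.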


I tried this query to make sure the value I'm querying exists:
select c100 from big_table4 where c100 like '1234_'

+-----+
| c100|
+-----+
|12340|
|12341|
|12342|
|12343|
|12344|
|12345|
|12346|
|12347|
|12348|
|12349|
+-----+
But when I tried "select c100 from big_table4 where c100 like '12345'", I got a runtime error (the log is at the bottom of this mail).

The weirdest part is that some values can be queried while others cannot. For example,
cc.sql("select c100 from big_table4 where c100 like '116'")
returns exactly the correct answer.

Any idea about this?
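For reference, both queries above should match row 12345 under standard SQL LIKE semantics: '_' matches exactly one character, '%' matches any run, and a pattern with no wildcards behaves like an exact match. A minimal matcher sketch (illustrative only, not CarbonData's filter implementation) makes that expectation concrete, which suggests the NullPointerException below is a bug rather than a semantics issue:

```java
import java.util.regex.Pattern;

// Toy SQL LIKE matcher: translates a LIKE pattern into a Java regex and
// requires a full match, mirroring the standard wildcard semantics.
class LikeMatcher {
  static boolean like(String value, String pattern) {
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");       // '%' matches any run of characters
      } else if (c == '_') {
        regex.append('.');        // '_' matches exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(c))); // literal character
      }
    }
    return Pattern.matches(regex.toString(), value);
  }
}
```

Under these semantics, '1234_' matches '12340' through '12349', and the pattern '12345' (no wildcards) matches only the value '12345'.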


   log



ERROR 14-09 17:16:52,024 - [Executor task launch worker-0][partitionID:table4;queryID:275884499876481_0]
java.lang.NullPointerException
at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:95)
at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:87)
at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:47)
at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:193)
at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR 14-09 17:16:52,026 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Exception occurred in query execution. Please check logs.
at scala.sys.package$.error(package.scala:27)

[jira] [Created] (CARBONDATA-243) Filters on dictionary column not always work.

2016-09-15 Thread Yue Shang (JIRA)
Yue Shang created CARBONDATA-243:


 Summary: Filters on dictionary column not always work.
 Key: CARBONDATA-243
 URL: https://issues.apache.org/jira/browse/CARBONDATA-243
 Project: CarbonData
  Issue Type: Bug
  Components: examples
Affects Versions: 0.2.0-incubating
 Environment: Ubuntu 14.04
Reporter: Yue Shang
Priority: Minor


Filter on a dictionary column does not always work.

The loaded table contains 100 columns and 100M rows; one of the columns, c100, has a cardinality of 10M and is included in the dictionary (DICTIONARY_INCLUDE).

I tried this query to make sure the value I'm querying exists:
select c100 from big_table4 where c100 like '1234_'

+-----+
| c100|
+-----+
|12340|
|12341|
|12342|
|12343|
|12344|
|12345|
|12346|
|12347|
|12348|
|12349|
+-----+
But when I tried "select c100 from big_table4 where c100 like '12345'", I got a runtime error (the same NullPointerException and stack trace as in the mail above).

The weirdest part is that some values can be queried while others cannot. For example,
cc.sql("select c100 from big_table4 where c100 like '116'")
returns exactly the correct answer.

Any idea about this?

[jira] [Commented] (CARBONDATA-241) OOM error during query execution in long run

2016-09-15 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/CARBONDATA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493871#comment-15493871 ]

ASF GitHub Bot commented on CARBONDATA-241:
---

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/158#discussion_r79008577
  
--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ---
@@ -20,20 +20,13 @@ package org.apache.spark.sql
 import java.text.SimpleDateFormat
 import java.util.Date
 
-import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier
--- End diff --

Merged this commit's changes (the compilation-issue fix) separately, so those changes can be taken out of this PR.


> OOM error during query execution in long run
> 
>
> Key: CARBONDATA-241
> URL: https://issues.apache.org/jira/browse/CARBONDATA-241
> Project: CarbonData
>  Issue Type: Bug
>Reporter: kumar vishal
>Assignee: kumar vishal
>
> **Problem:** During a long run, query execution takes progressively more 
> time and eventually throws an out-of-memory error.
> **Reason:** During compaction we compact segments, and each segment's 
> metadata is loaded into memory. After compaction the compacted segments are 
> invalid, but their metadata is not removed from memory. Because of this, 
> duplicate metadata piles up, consuming more and more memory, and after a few 
> days query execution throws OOM.
> **Solution:** Remove the metadata of invalid blocks from memory.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)