[jira] [Commented] (CARBONDATA-241) OOM error during query execution in long run
[ https://issues.apache.org/jira/browse/CARBONDATA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495467#comment-15495467 ]

ASF GitHub Bot commented on CARBONDATA-241:
-------------------------------------------

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79109942

    --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
    @@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws IOException {

       /**
        * get valid segment for given table
    +   *
    +   * @return
    +   * @throws IOException
    +   */
    +  public InvalidSegmentsInfo getInvalidSegments() throws IOException {

    --- End diff --

    ok i will handle

> OOM error during query execution in long run
> ---------------------------------------------
>
>                 Key: CARBONDATA-241
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-241
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: kumar vishal
>            Assignee: kumar vishal
>
> **Problem:** During a long run, query execution takes more and more time and eventually throws an out-of-memory error.
> **Reason:** During compaction we merge segments, and each segment's metadata is loaded in memory. After compaction the compacted segments are invalid, but their metadata is not removed from memory. Because of this, duplicate metadata piles up and consumes more and more memory, and after a few days query execution throws OOM.
> **Solution:** Remove the blocks of invalid segments from memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
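The fix the quoted issue calls for (evicting metadata of segments invalidated by compaction) can be sketched roughly as follows. This is an illustrative sketch only, not CarbonData's actual implementation; the class and method names here are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a driver-side cache of per-segment block metadata.
// After compaction the compacted segments become invalid; if their entries
// are never evicted, stale metadata accumulates across compactions and
// eventually exhausts the heap (the OOM described in CARBONDATA-241).
public class SegmentMetadataCache {
    private final Map<String, Object> blockMetadataBySegment = new ConcurrentHashMap<>();

    public void put(String segmentId, Object blockMetadata) {
        blockMetadataBySegment.put(segmentId, blockMetadata);
    }

    // Called after compaction (or after re-reading the table status file):
    // drop cached metadata for every segment that is no longer valid.
    public void removeInvalidSegments(List<String> invalidSegmentIds) {
        for (String segmentId : invalidSegmentIds) {
            blockMetadataBySegment.remove(segmentId);
        }
    }

    public int cachedSegmentCount() {
        return blockMetadataBySegment.size();
    }
}
```

The key point is only the eviction call itself: whatever structure holds per-segment metadata must be cleaned whenever the set of valid segments changes, not just appended to at load time.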
[jira] [Updated] (CARBONDATA-243) Filters on dictionary column not always work.
[ https://issues.apache.org/jira/browse/CARBONDATA-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Shang updated CARBONDATA-243:
---------------------------------
    Description:

Filters on a dictionary column do not always work.

The loaded table contains 100 columns and 100M rows, and one of the columns, c100, has a cardinality of 10M. c100 is dictionary-included. The DDL statement is as follows:

CREATE TABLE IF NOT EXISTS big_table4 (
  c100 Int, c1 Int, c2 Int, c3 Int, c4 Int, c5 Int, c6 Int, c7 Int, c8 Int, c9 Int,
  c10 Int, c11 Int, c12 Int, c13 Int, c14 Int, c15 Int, c16 Int, c17 Int, c18 Int, c19 Int,
  c20 Int, c21 Int, c22 Int, c23 Int, c24 Int, c25 Int, c26 Int, c27 Int, c28 Int, c29 Int,
  c30 Int, c31 Int, c32 Int, c33 Int, c34 Int, c35 Int, c36 Int, c37 Int, c38 Int, c39 Int,
  c40 Int, c41 Int, c42 Int, c43 Int, c44 Int, c45 Int, c46 Int, c47 Int, c48 Int, c49 Int,
  c50 Int, c51 Int, c52 Int, c53 Int, c54 Int, c55 Int, c56 Int, c57 Int, c58 Int, c59 Int,
  c60 Int, c61 Int, c62 Int, c63 Int, c64 Int, c65 Int, c66 Int, c67 Int, c68 Int, c69 Int,
  c70 Int, c71 Int, c72 Int, c73 Int, c74 Int, c75 Int, c76 Int, c77 Int, c78 Int, c79 Int,
  c80 Int, c81 Int, c82 Int, c83 Int, c84 Int, c85 Int, c86 Int, c87 Int, c88 Int, c89 Int,
  c90 Int, c91 Int, c92 Int, c93 Int, c94 Int, c95 Int, c96 Int, c97 Int, c98 Int, c99 Int,
  c101 String
)
STORED BY 'carbondata'
TBLPROPERTIES ("DICTIONARY_INCLUDE"="c101,c100, c1, c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14, c15")

I tried this query to make sure the values I'm querying exist:

select c100 from big_table4 where c100 like '1234_'

+-----+
| c100|
+-----+
|12340|
|12341|
|12342|
|12343|
|12344|
|12345|
|12346|
|12347|
|12348|
|12349|
+-----+

But when I tried "select c100 from big_table4 where c100 like '12345'", I got a runtime error (the log is at the bottom of this mail). The weirdest part is that some values can be queried while others cannot:

cc.sql("select c100 from big_table4 where c100 like '116'")

This query returns the correct answer. Any idea about this?
log:

ERROR 14-09 17:16:52,024 - [Executor task launch worker-0][partitionID:table4;queryID:275884499876481_0]
java.lang.NullPointerException
	at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:95)
	at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:87)
	at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:47)
	at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:193)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

ERROR 14-09 17:16:52,026 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
	at scala.sys.package$.error(package.scala:27)
	at
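For context on the reporter's queries: in SQL LIKE, '_' matches exactly one character and '%' matches any sequence, so '1234_' matches 12340 through 12349, while '12345' contains no wildcard at all and behaves as an exact match, which an engine may evaluate on a different filter path (for example a dictionary equality lookup) than a pattern match. A minimal, self-contained illustration of the matching semantics, not CarbonData code:

```java
import java.util.regex.Pattern;

// Illustration of SQL LIKE semantics: map SQL wildcards to a Java regex.
// '_' -> any single character, '%' -> any sequence; everything else is
// matched literally (regex metacharacters are quoted).
public class LikeMatcher {
    public static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '_') {
                regex.append('.');          // one arbitrary character
            } else if (c == '%') {
                regex.append(".*");          // any (possibly empty) sequence
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal
            }
        }
        return Pattern.matches(regex.toString(), value);
    }
}
```

Under these semantics all three of the reporter's queries are well-formed, so the NullPointerException likely comes from how the wildcard-free pattern is evaluated rather than from the pattern itself.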
[jira] [Created] (CARBONDATA-243) Filters on dictionary column not always work.
Yue Shang created CARBONDATA-243:
------------------------------------

             Summary: Filters on dictionary column not always work.
                 Key: CARBONDATA-243
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-243
             Project: CarbonData
          Issue Type: Bug
          Components: examples
    Affects Versions: 0.2.0-incubating
         Environment: Ubuntu 14.04
            Reporter: Yue Shang
            Priority: Minor


Filters on a dictionary column do not always work.

The loaded table contains 100 columns and 100M rows, and one of the columns, c100, has a cardinality of 10M. c100 is dictionary-included.

I tried this query to make sure the values I'm querying exist:

select c100 from big_table4 where c100 like '1234_'

+-----+
| c100|
+-----+
|12340|
|12341|
|12342|
|12343|
|12344|
|12345|
|12346|
|12347|
|12348|
|12349|
+-----+

But when I tried "select c100 from big_table4 where c100 like '12345'", I got a runtime error (the log is at the bottom of this mail). The weirdest part is that some values can be queried while others cannot:

cc.sql("select c100 from big_table4 where c100 like '116'")

This query returns the correct answer. Any idea about this?
log:

ERROR 14-09 17:16:52,024 - [Executor task launch worker-0][partitionID:table4;queryID:275884499876481_0]
java.lang.NullPointerException
	at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:95)
	at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:87)
	at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:47)
	at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:193)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

ERROR 14-09 17:16:52,026 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Exception occurred in query execution.Please check logs.
	at scala.sys.package$.error(package.scala:27)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:203)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:174)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
[jira] [Commented] (CARBONDATA-241) OOM error during query execution in long run
[ https://issues.apache.org/jira/browse/CARBONDATA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493871#comment-15493871 ]

ASF GitHub Bot commented on CARBONDATA-241:
-------------------------------------------

Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79008577

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ---
    @@ -20,20 +20,13 @@ package org.apache.spark.sql

     import java.text.SimpleDateFormat
     import java.util.Date

    -import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier

    --- End diff --

    Merged this commit's changes (a compilation-issue fix) separately, so those changes can be taken out of this PR.

> OOM error during query execution in long run
> ---------------------------------------------
>
>                 Key: CARBONDATA-241
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-241
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: kumar vishal
>            Assignee: kumar vishal
>
> **Problem:** During a long run, query execution takes more and more time and eventually throws an out-of-memory error.
> **Reason:** During compaction we merge segments, and each segment's metadata is loaded in memory. After compaction the compacted segments are invalid, but their metadata is not removed from memory. Because of this, duplicate metadata piles up and consumes more and more memory, and after a few days query execution throws OOM.
> **Solution:** Remove the blocks of invalid segments from memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)