Re: query err 'NullPointerException' but fine after table cached in memory

2017-01-11 Thread Kumar Vishal
Hi,
Can you please share the executor log?

-Regards
Kumar Vishal

On Thu, Jan 12, 2017 at 1:59 PM, Li Peng wrote:

> Hello,
>
> We use CarbonData 0.2.0; the problem is as follows:
>
> Only one column, 'store_id', throws a NullPointerException when queried, but
> the query works fine once some values or the table are cached in memory.
>
> store_id's type is int, its cardinality is 200 thousand, and it is
> configured with dictionary encoding and an inverted index.
>
> sql:
> select
> order_code,saletype,checkout_date,cashier_code,item_cont,
> invoice_price,giveamt,saleamt
> from store.sale where store_id=28
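> 
> For reference, a DDL sketch of such a table (for illustration only: the
> column types are guessed from the query above, and the DICTIONARY_INCLUDE
> setting is assumed from the description, not taken from our actual DDL):
> 
> create table store.sale (
>   order_code string, saletype string, checkout_date timestamp,
>   cashier_code string, item_cont int, invoice_price double,
>   giveamt double, saleamt double,
>   store_id int  -- cardinality ~200 thousand, dictionary + inverted index
> ) stored by 'carbondata'
> tblproperties ('DICTIONARY_INCLUDE'='store_id');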
>
> error:
> ERROR 12-01 10:40:16,861 - [Executor task launch worker-0][partitionID:sale;queryID:1438806645368420_0]
> java.lang.NullPointerException
> at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.intialiseInfos(AbstractDetailQueryResultIterator.java:117)
> at org.apache.carbondata.scan.result.iterator.AbstractDetailQueryResultIterator.<init>(AbstractDetailQueryResultIterator.java:107)
> at org.apache.carbondata.scan.result.iterator.DetailQueryResultIterator.<init>(DetailQueryResultIterator.java:43)
> at org.apache.carbondata.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:39)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:216)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> ERROR 12-01 10:40:16,874 - Exception in task 0.1 in stage 0.0 (TID 1)
> java.lang.RuntimeException: Exception occurred in query execution. Please check logs.
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.<init>(CarbonScanRDD.scala:226)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:192)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 

[jira] [Created] (CARBONDATA-628) Issue where measure selection out of table order gives wrong result with vectorized reader enabled

2017-01-11 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created CARBONDATA-628:
---------------------------------------

 Summary: Issue where measure selection out of table order gives 
wrong result with vectorized reader enabled
 Key: CARBONDATA-628
 URL: https://issues.apache.org/jira/browse/CARBONDATA-628
 Project: CarbonData
  Issue Type: Bug
Reporter: Ravindra Pesala
Assignee: Ravindra Pesala
Priority: Minor


If the table is created with the measure order m1, m2 and the user selects the 
measures as m2, m1, then the query returns a wrong result with the vectorized 
reader enabled.
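
For illustration, a minimal repro sketch (hypothetical table name and columns; 
the ticket does not include the exact queries):

create table t_measures (d1 string, m1 int, m2 int) stored by 'carbondata';
-- select the measures in the reverse of the table creation order:
select m2, m1 from t_measures;
-- with the vectorized reader enabled this returns wrong results (the bug
-- reported here); selecting m1, m2 in table order returns correct results.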





[jira] [Created] (CARBONDATA-627) Fix Union unit test case for spark2

2017-01-11 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-627:
---------------------------------------

 Summary: Fix Union unit test case for spark2
 Key: CARBONDATA-627
 URL: https://issues.apache.org/jira/browse/CARBONDATA-627
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
 Fix For: 1.0.0-incubating


UnionTestCase fails on Spark 2; we should fix it.
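
For context, a query of the shape such a union test typically exercises 
(hypothetical tables and column; the ticket does not name the failing case):

select c1 from t1 union all select c1 from t2;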





Re: [jira] [Created] (CARBONDATA-624) Complete CarbonData document to be present in git and the same needs to sync with carbondata.apache.org for further updates.

2017-01-11 Thread Liang Chen
OK, thank you for starting this work.
One thing to note: please only put .md files in GitHub; we don't suggest
adding other kinds of files (PDF, text, and so on).

Regards
Liang





[jira] [Created] (CARBONDATA-625) Abnormal behaviour of Int datatype

2017-01-11 Thread Geetika Gupta (JIRA)
Geetika Gupta created CARBONDATA-625:
---------------------------------------

 Summary: Abnormal behaviour of Int datatype
 Key: CARBONDATA-625
 URL: https://issues.apache.org/jira/browse/CARBONDATA-625
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.0.0-incubating
 Environment: Spark: 1.6  and hadoop: 2.6.5 
Reporter: Geetika Gupta
Priority: Minor
 Attachments: Screenshot from 2017-01-11 18-36-24.png, 
testMaxValueForBigInt.csv

I was trying to create a table with an int column and load data into it. The 
data load completed successfully, but when I viewed the table's data, some of 
it was wrong. I was loading BigInt data into an int column, and every row of 
the int column was loaded with the first value of the CSV. The queries are 
below:

create table xyz(a int, b string) stored by 'carbondata';

Data load query:
LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/testMaxValueForBigInt.csv' 
into table xyz OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','FILEHEADER'='a,b');

select query:
select * from xyz;
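
For illustration, rows of the kind the attached CSV presumably contains 
(hypothetical values near the bigint maximum; an int column can only hold 
-2147483648 to 2147483647, so values like these cannot fit in column a):

9223372036854775807,name_1
9223372036854775806,name_2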

PFA the screenshot of the output and the csv file.



[jira] [Created] (CARBONDATA-623) If we drop the table after this condition (first we load data into the table with single pass true and use kettle false, and then load data into the same table a 2nd time with sing

2017-01-11 Thread Payal (JIRA)
Payal created CARBONDATA-623:
---------------------------------------

 Summary: If we drop the table after this condition (first we load 
data into the table with single pass true and use kettle false, and then load 
data into the same table a second time with single pass true and use kettle 
false), it throws Error: java.lang.NullPointerException
 Key: CARBONDATA-623
 URL: https://issues.apache.org/jira/browse/CARBONDATA-623
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Reporter: Payal


1. First we load data into the table with single pass true and use kettle 
false; the data loads successfully and the result set is correct.
2. Then we load data into the same table again with single pass true and use 
kettle false; again the data loads successfully and the result set is correct.
3. But after that, if we drop the table, it throws a NullPointerException.

Queries

0: jdbc:hive2://hadoop-master:1> CREATE TABLE uniqdata_INCLUDEDICTIONARY 
(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ 
timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, 
Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 
'org.apache.carbondata.format' 
TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (1.13 seconds)
0: jdbc:hive2://hadoop-master:1> LOAD DATA INPATH 
'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv' into table 
uniqdata_INCLUDEDICTIONARY OPTIONS('DELIMITER'=',' , 
'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='false','USE_KETTLE'
 ='false');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (22.814 seconds)
0: jdbc:hive2://hadoop-master:1> 
0: jdbc:hive2://hadoop-master:1> select count (distinct CUST_NAME) from 
uniqdata_INCLUDEDICTIONARY ;
+-------+--+
|  _c0  |
+-------+--+
| 7002  |
+-------+--+
1 row selected (3.055 seconds)
0: jdbc:hive2://hadoop-master:1> select  count(CUST_NAME) from 
uniqdata_INCLUDEDICTIONARY ;
+-------+--+
|  _c0  |
+-------+--+
| 7013  |
+-------+--+
1 row selected (0.366 seconds)
0: jdbc:hive2://hadoop-master:1> LOAD DATA INPATH 
'hdfs://hadoop-master:54311/data/uniqdata/7000_UniqData.csv' into table 
uniqdata_INCLUDEDICTIONARY OPTIONS('DELIMITER'=',' , 
'QUOTECHAR'='"','BAD_RECORDS_LOGGER_ENABLE'='TRUE', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='true','USE_KETTLE'
 ='false');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (4.837 seconds)
0: jdbc:hive2://hadoop-master:1> select  count(CUST_NAME) from 
uniqdata_INCLUDEDICTIONARY ;
+--------+--+
|  _c0   |
+--------+--+
| 14026  |
+--------+--+
1 row selected (0.458 seconds)
0: jdbc:hive2://hadoop-master:1> select count (distinct CUST_NAME) from 
uniqdata_INCLUDEDICTIONARY ;
+-------+--+
|  _c0  |
+-------+--+
| 7002  |
+-------+--+
1 row selected (3.173 seconds)
0: jdbc:hive2://hadoop-master:1> drop table uniqdata_includedictionary;
Error: java.lang.NullPointerException (state=,code=0)




Logs 

WARN  11-01 12:56:52,722 - Lost task 0.0 in stage 61.0 (TID 1740, hadoop-slave-2): FetchFailed(BlockManagerId(0, hadoop-slave-3, 45331), shuffleId=22, mapId=0, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to hadoop-slave-3:45331
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:504)
at