[jira] [Created] (CARBONDATA-906) Always OOM error when importing large dataset (100 million rows)

2017-04-11 Thread Crabo Yang (JIRA)
Crabo Yang created CARBONDATA-906:
-

 Summary: Always OOM error when importing large dataset (100 million rows)
 Key: CARBONDATA-906
 URL: https://issues.apache.org/jira/browse/CARBONDATA-906
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: Crabo Yang


java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:457)
at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
at 
org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.addDataToDictionaryMap(ColumnReverseDictionaryInfo.java:101)
at 
org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.addDictionaryChunk(ColumnReverseDictionaryInfo.java:88)
at 
org.apache.carbondata.core.cache.dictionary.DictionaryCacheLoaderImpl.fillDictionaryValuesAndAddToDictionaryChunks(DictionaryCacheLoaderImpl.java:113)
at 
org.apache.carbondata.core.cache.dictionary.DictionaryCacheLoaderImpl.load(DictionaryCacheLoaderImpl.java:81)
at 
org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.loadDictionaryData(AbstractDictionaryCache.java:236)
at 
org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:186)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.getDictionary(ReverseDictionaryCache.java:174)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:67)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:38)
at org.apache.carbondata.processing.newflow.converter.impl.DictionaryFieldConverterImpl.<init>(DictionaryFieldConverterImpl.java:92)
at 
org.apache.carbondata.processing.newflow.converter.impl.FieldEncoderFactory.createFieldEncoder(FieldEncoderFactory.java:77)
at 
org.apache.carbondata.processing.newflow.converter.impl.RowConverterImpl.initialize(RowConverterImpl.java:102)
at 
org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl.initialize(DataConverterProcessorStepImpl.java:69)
at 
org.apache.carbondata.processing.newflow.steps.SortProcessorStepImpl.initialize(SortProcessorStepImpl.java:57)
at 
org.apache.carbondata.processing.newflow.steps.DataWriterProcessorStepImpl.initialize(DataWriterProcessorStepImpl.java:79)
at 
org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:45)
at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:425)
at 
org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:383)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

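Not part of the original report, just a hedged sketch of what one might try,
assuming the OOM is simply the executors running out of heap while the reverse
dictionary cache is built during load. The sizes are placeholders, not
recommendations; on YARN these are usually passed as --conf options to
spark-submit rather than set in code.

import org.apache.spark.SparkConf

// Placeholder sizes; tune for the actual cluster and data volume.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")                      // more heap for the dictionary cache
  .set("spark.driver.memory", "4g")                        // dictionaries are also cached on the driver
  .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC")  // a collector that copes better with large heaps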





[jira] [Created] (CARBONDATA-907) The grammar for DELETE SEGMENT FOR DATE on the website is not correct

2017-04-11 Thread chenerlu (JIRA)
chenerlu created CARBONDATA-907:
---

 Summary: The grammar for DELETE SEGMENT FOR DATE on the website is not correct
 Key: CARBONDATA-907
 URL: https://issues.apache.org/jira/browse/CARBONDATA-907
 Project: CarbonData
  Issue Type: Bug
Reporter: chenerlu


The grammar for DELETE SEGMENT FOR DATE on the website is not correct.





[jira] [Created] (CARBONDATA-908) bitmap encode

2017-04-11 Thread Jarck (JIRA)
Jarck created CARBONDATA-908:


 Summary: bitmap encode
 Key: CARBONDATA-908
 URL: https://issues.apache.org/jira/browse/CARBONDATA-908
 Project: CarbonData
  Issue Type: New Feature
  Components: core, data-load, data-query
Reporter: Jarck
Assignee: Jarck


For frequent filter queries on low-cardinality columns, using bitmap encoding
can speed up the query.

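Not from the proposal itself, just a minimal sketch of the idea with made-up
names (not CarbonData classes): keep one bitmap of row ids per distinct value
of a low-cardinality column, so an equality or OR filter becomes a bitmap
union instead of a full scan.

import scala.collection.mutable

// Sketch only: one BitSet of row ids per distinct value of a low-cardinality column.
class BitmapIndex {
  private val bitmaps = mutable.Map.empty[String, mutable.BitSet]

  def add(rowId: Int, value: String): Unit =
    bitmaps.getOrElseUpdate(value, mutable.BitSet.empty) += rowId

  // "col = v1 OR col = v2" becomes a union of the matching bitmaps.
  def filterAny(values: Seq[String]): mutable.BitSet =
    values.flatMap(bitmaps.get).foldLeft(mutable.BitSet.empty)(_ | _)
}

val idx = new BitmapIndex
Seq("CN", "US", "CN", "IN", "US").zipWithIndex.foreach { case (v, i) => idx.add(i, v) }
println(idx.filterAny(Seq("CN", "IN")))   // BitSet(0, 2, 3)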




[jira] [Created] (CARBONDATA-909) Single pass option in dataframe writer

2017-04-11 Thread Sanoj MG (JIRA)
Sanoj MG created CARBONDATA-909:
---

 Summary: Single pass option in dataframe writer
 Key: CARBONDATA-909
 URL: https://issues.apache.org/jira/browse/CARBONDATA-909
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 1.2.0-incubating, 1.1.1-incubating
 Environment: HDP 2.5 / Spark 1.6
Reporter: Sanoj MG
Assignee: Sanoj MG
Priority: Minor
 Fix For: 1.2.0-incubating, 1.1.1-incubating


While creating a CarbonData table from a dataframe, it is currently not possible
to specify single-pass load in Spark 1.6. An option is required to specify it
as below:

df.write.format("carbondata")
  .option("tableName", "test")
  .option("compress", "true")
  .option("single_pass", "true")
  .mode(SaveMode.Overwrite)
  .save()





how to distribute carbon.properties file

2017-04-11 Thread ZhuWilliam
As we know, carbon.properties is the configuration file of CarbonData. For now,
however, the driver and executors load this file from the local disk, which
means we have to distribute the file to all YARN nodes. At the same time, we
also have to configure items like the following:

--conf
"spark.driver.extraJavaOptions=-Dcarbon.properties.filepath=/home/carbon/carbon.properties"
  
--conf
"spark.executor.extraJavaOptions=-Dcarbon.properties.filepath=/home/carbon/carbon.properties"

I think the best way to handle this is to use "--files", e.g.:

--files /home/carbon/carbon.properties

Then CarbonData should try to load the file from the classpath first.

Also, we hope we can use --conf to override the configurations in
carbon.properties, e.g.:

--conf 'carbon.properties.filepath=HDFSLock'

It's easy to implement this: the properties loading code just needs to try the
classpath first and then fall back to the configured file path.

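A rough sketch of that idea (my own illustration, not the actual CarbonData
loading code): try the classpath resource that --files ships to the container
working directory, then fall back to -Dcarbon.properties.filepath.

import java.io.{File, FileInputStream, InputStream}
import java.util.Properties

// Sketch only: classpath first, then the configured file path.
def loadCarbonProperties(): Properties = {
  val props = new Properties()
  val fromClasspath: InputStream =
    Thread.currentThread().getContextClassLoader.getResourceAsStream("carbon.properties")
  if (fromClasspath != null) {
    props.load(fromClasspath)
    fromClasspath.close()
  } else {
    val path = System.getProperty("carbon.properties.filepath",
      System.getProperty("user.dir") + File.separator + "carbon.properties")
    val file = new File(path)
    if (file.exists()) {
      val in = new FileInputStream(file)
      props.load(in)
      in.close()
    }
  }
  props
}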





View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/how-to-distribute-cabon-properties-file-tp10687.html


CarbonLock Exception refactor

2017-04-11 Thread ZhuWilliam

CarbonData will try to lock the table/column/dict when loading data. However,
for now, it silences the exception and just returns false, then tells the user
"Table is locked for updation. Please try after some time", which makes people
confused. There are a lot of reasons that can make locking fail. Maybe we
should refactor here to give users a more specific error.

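A rough sketch of what "more specific" could look like (my own illustration,
not the existing CarbonLock API): carry the reason the lock could not be
acquired instead of collapsing every failure into a bare false.

// Sketch only, hypothetical types.
sealed trait LockResult
case object Locked extends LockResult
final case class LockFailed(reason: String, cause: Option[Throwable] = None) extends LockResult

def tryLock(acquire: () => Boolean): LockResult =
  try {
    if (acquire()) Locked
    else LockFailed("lock file already held by another load, try again later")
  } catch {
    case e: java.io.IOException =>
      LockFailed(s"could not reach the lock file location: ${e.getMessage}", Some(e))
  }

// The caller can now tell the user why the table is locked:
tryLock(() => false) match {
  case Locked           => println("lock acquired")
  case LockFailed(r, _) => println(s"Table is locked: $r")
}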


View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonLock-Exception-refractor-tp10686.html


[jira] [Created] (CARBONDATA-899) Added Support for Decimal data type and Fixed the timestamp and date issues for Spark-2.1

2017-04-11 Thread Bhavya Aggarwal (JIRA)
Bhavya Aggarwal created CARBONDATA-899:
--

 Summary: Added Support for Decimal data type and Fixed the 
timestamp and date issues for Spark-2.1
 Key: CARBONDATA-899
 URL: https://issues.apache.org/jira/browse/CARBONDATA-899
 Project: CarbonData
  Issue Type: Improvement
  Components: presto-integration
Reporter: Bhavya Aggarwal
Assignee: Bhavya Aggarwal
Priority: Minor


Correct support for the Decimal data type is added, and issues related to
timestamp and date are resolved.





[jira] [Created] (CARBONDATA-897) Redundant Fields Inside * **Global Dictionary Configurations** in Configuration-parameters.md

2017-04-11 Thread Pallavi Singh (JIRA)
Pallavi Singh created CARBONDATA-897:


 Summary: Redundant Fields Inside  * **Global Dictionary 
Configurations** in Configuration-parameters.md
 Key: CARBONDATA-897
 URL: https://issues.apache.org/jira/browse/CARBONDATA-897
 Project: CarbonData
  Issue Type: Bug
  Components: docs
Reporter: Pallavi Singh
Assignee: Pallavi Singh
Priority: Minor
 Attachments: Configurations.png

In the Configuration-parameters.md file, under the Global Dictionary
Configurations table, the row for the high.cardinality.threshold field has
extra columns with redundant values.





[jira] [Created] (CARBONDATA-902) NoClassDefFoundError for Decimal datatype during select queries

2017-04-11 Thread Neha Bhardwaj (JIRA)
Neha Bhardwaj created CARBONDATA-902:


 Summary: NoClassDefFoundError for Decimal datatype during select 
queries
 Key: CARBONDATA-902
 URL: https://issues.apache.org/jira/browse/CARBONDATA-902
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
 Environment: Spark 2.1, Hive 1.2.1
Reporter: Neha Bhardwaj
Priority: Minor
 Attachments: testHive1.csv

The decimal data type raises an exception while selecting data from the table
in Hive.

Steps to reproduce:
1) In Spark Shell :

 a) Create Table -
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = 
SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:54310/opt/data")
 
 scala> carbon.sql(""" create table testHive1(id int,name string,dob 
timestamp,experience decimal,salary double,incentive bigint) stored 
by 'carbondata' """).show 

 b) Load Data - 
scala> carbon.sql(""" load data inpath 
'hdfs://localhost:54310/Files/testHive1.csv' into table testHive1 """ ).show


2) In Hive : 

 a) Add Jars - 
add jar 
/home/neha/incubator-carbondata/integration/hive/carbondata-hive-1.1.0-incubating-SNAPSHOT.jar;
add jar 
/home/neha/incubator-carbondata/assembly/target/scala-2.11/carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar;
 
 b) Set Properties - 
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;

c) Alter location - 
hive> alter table testHive1 set LOCATION 
'hdfs://localhost:54310/opt/data/default/testhive1' ;

d) Alter FileFormat -
alter table testHive1 set FILEFORMAT
INPUTFORMAT "org.apache.carbondata.hive.MapredCarbonInputFormat"
OUTPUTFORMAT "org.apache.carbondata.hive.MapredCarbonOutputFormat"
SERDE "org.apache.carbondata.hive.CarbonHiveSerDe";

 e) Create Table -
create table testHive1(id int,name string,dob timestamp,experience 
decimal,salary double,incentive bigint);

 f) Execute Queries - 
select * from testHive1;

3) Query :
hive> select * from testHive1;

Expected Output : 
ResultSet should display all the data present in the table.

Result:
Exception in thread "[main][partitionID:testhive1;queryID:8945394553892]" 
java.lang.NoClassDefFoundError: org/apache/spark/sql/types/Decimal
at 
org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector.getMeasureData(AbstractScannedResultCollector.java:109)
at 
org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector.fillMeasureData(AbstractScannedResultCollector.java:78)
at 
org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.fillMeasureData(DictionaryBasedResultCollector.java:158)
at 
org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.collectData(DictionaryBasedResultCollector.java:115)
at 
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:51)
at 
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:32)
at 
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:50)
at 
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
at 
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:31)
at org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.<init>(ChunkRowIterator.java:41)
at 
org.apache.carbondata.hive.CarbonHiveRecordReader.initialize(CarbonHiveRecordReader.java:84)
at org.apache.carbondata.hive.CarbonHiveRecordReader.<init>(CarbonHiveRecordReader.java:66)
at 
org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:68)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at 

[jira] [Created] (CARBONDATA-900) Is null query on a newly added measure column is not returning proper results

2017-04-11 Thread Manish Gupta (JIRA)
Manish Gupta created CARBONDATA-900:
---

 Summary:  Is null query on a newly added measure column is not 
returning proper results
 Key: CARBONDATA-900
 URL: https://issues.apache.org/jira/browse/CARBONDATA-900
 Project: CarbonData
  Issue Type: Bug
Reporter: Manish Gupta
Assignee: Manish Gupta
Priority: Minor
 Fix For: 1.1.0-incubating


When an is-null query is executed on a newly added measure column, control goes
to the RowLevelFilterExecuterImpl class, where measure existence is checked. In
case the measure is not found, the bitset group is not populated with default
values, due to which that block does not return any result (see the sketch
after the sample data below).
The queries below can be executed to reproduce the issue:

CREATE TABLE uniqdata110 (CUST_ID int,CUST_NAME String) STORED BY 'carbondata'
LOAD DATA INPATH '' into table uniqdata110 
OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='TRUE', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME')
ALTER TABLE uniqdata110  ADD COLUMNS (a6 int)
LOAD DATA INPATH '' into table uniqdata110 
OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='TRUE', 
'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,a6')
select * from uniqdata110
select * from uniqdata110 where a6 is null

Data:
7,hello1
8,welcome1
bye,11

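A generic sketch of the default-bitset idea described above (hypothetical
names, not CarbonData's RowLevelFilterExecuterImpl): when the filtered measure
does not exist in an older block, an is-null filter should mark every row as
matching instead of leaving the bitset empty.

import scala.collection.mutable

// Sketch only: evaluate "a6 is null" for one block. A block written before the
// column was added has null in every row, so the result must default to
// "all rows set" rather than stay empty.
def isNullFilterBitSet(columnExists: Boolean,
                       nullFlags: Array[Boolean],
                       rowCount: Int): mutable.BitSet = {
  val bits = mutable.BitSet.empty
  if (!columnExists) {
    (0 until rowCount).foreach(bits += _)
  } else {
    nullFlags.zipWithIndex.collect { case (true, i) => i }.foreach(bits += _)
  }
  bits
}

println(isNullFilterBitSet(columnExists = false, nullFlags = Array.empty, rowCount = 3)) // BitSet(0, 1, 2)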





[jira] [Created] (CARBONDATA-901) fix some spelling mistakes

2017-04-11 Thread Cao Gaofei (JIRA)
Cao Gaofei created CARBONDATA-901:
-

 Summary: fix some spelling mistakes
 Key: CARBONDATA-901
 URL: https://issues.apache.org/jira/browse/CARBONDATA-901
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Cao Gaofei
Priority: Trivial








[jira] [Created] (CARBONDATA-896) Throws NullPointerException while inserting data

2017-04-11 Thread sehriff (JIRA)
sehriff created CARBONDATA-896:
--

 Summary: Throws NullPointerException while inserting data
 Key: CARBONDATA-896
 URL: https://issues.apache.org/jira/browse/CARBONDATA-896
 Project: CarbonData
  Issue Type: Bug
Reporter: sehriff


Inserting data into a carbon table from Hive using SQL like:
cc.sql("insert into carbon.table_carbon select * from hivetable").show
gives the following error:
Job aborted due to stage failure: Task 0 in stage 26.3 failed 4 times, most 
recent failure: Lost task 0.3 in stage 26.3 (TID 5628, HDD013): 
java.lang.NullPointerException
at 
org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getLastModifiedTime(AbstractDFSCarbonFile.java:135)
at 
org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isFileModified(AbstractDFSCarbonFile.java:210)
at 
org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.isDictionaryMetaFileModified(AbstractDictionaryCache.java:119)
at 
org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:158)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.getDictionary(ReverseDictionaryCache.java:174)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:67)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:38)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:463)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:469)
at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:413)
at 
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:342)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)





[jira] [Created] (CARBONDATA-904) ArrayIndexOutOfBoundsException

2017-04-11 Thread SWATI RAO (JIRA)
SWATI RAO created CARBONDATA-904:


 Summary: ArrayIndexOutOfBoundsException 
 Key: CARBONDATA-904
 URL: https://issues.apache.org/jira/browse/CARBONDATA-904
 Project: CarbonData
  Issue Type: Bug
Reporter: SWATI RAO
 Attachments: Test_Data1.csv, Test_Data1_h1.csv

The OR operator is not working properly.

When we execute this query in Hive it works fine, but when we execute the same
query in CarbonData it throws an exception:
java.lang.ArrayIndexOutOfBoundsException

HIVE:
0: jdbc:hive2://hadoop-master:1> create table Test_Boundary_h1 (c1_int 
int,c2_Bigint Bigint,c3_Decimal Decimal(38,30),c4_double double,c5_string 
string,c6_Timestamp Timestamp,c7_Datatype_Desc string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ;
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (1.177 seconds)
0: jdbc:hive2://hadoop-master:1> load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/Test_Data1_h1.csv' OVERWRITE INTO TABLE 
Test_Boundary_h1 ;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.437 seconds)
0: jdbc:hive2://hadoop-master:1> select c6_Timestamp,max(c6_Timestamp) from 
Test_Boundary_h1 where c6_Timestamp ='2017-07-01 12:07:28' or c6_Timestamp 
='2019-07-05 13:07:30' or c6_Timestamp = '1999-01-06 10:05:29' group by 
c6_Timestamp ;
+------------------------+------------------------+--+
|      c6_Timestamp      |          _c1           |
+------------------------+------------------------+--+
| 2017-07-01 12:07:28.0  | 2017-07-01 12:07:28.0  |
+------------------------+------------------------+--+
1 row selected (1.637 seconds)

CARBONDATA:
0: jdbc:hive2://hadoop-master:1> create table Test_Boundary (c1_int 
int,c2_Bigint Bigint,c3_Decimal Decimal(38,30),c4_double double,c5_string 
string,c6_Timestamp Timestamp,c7_Datatype_Desc string) STORED BY 
'org.apache.carbondata.format' ;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (4.48 seconds)

0: jdbc:hive2://hadoop-master:1> LOAD DATA INPATH 
'hdfs://192.168.2.145:54310/BabuStore/Data/Test_Data1.csv' INTO table 
Test_Boundary 
OPTIONS('DELIMITER'=',','QUOTECHAR'='','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='')
 ;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (4.445 seconds)
0: jdbc:hive2://hadoop-master:1> select c6_Timestamp,max(c6_Timestamp) from 
Test_Boundary where c6_Timestamp ='2017-07-01 12:07:28' or c6_Timestamp =' 
2019-07-05 13:07:30' or c6_Timestamp = '1999-01-06 10:05:29' group by 
c6_Timestamp ;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 
(TID 8, hadoop-master): java.lang.RuntimeException: 
java.util.concurrent.ExecutionException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)
at 
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:50)
at 
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:32)
at 
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:50)
at 
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
at 
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:31)
at org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.<init>(ChunkRowIterator.java:41)
at 
org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:79)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:204)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at 

[jira] [Created] (CARBONDATA-905) Unable to execute method public: org.apache.hadoop.hive.ql.metadata.HiveException

2017-04-11 Thread SWATI RAO (JIRA)
SWATI RAO created CARBONDATA-905:


 Summary: Unable to execute method public: 
org.apache.hadoop.hive.ql.metadata.HiveException
 Key: CARBONDATA-905
 URL: https://issues.apache.org/jira/browse/CARBONDATA-905
 Project: CarbonData
  Issue Type: Bug
 Environment: Spark1.6
Reporter: SWATI RAO
 Fix For: 1.1.0-incubating
 Attachments: Test_Data1.csv, Test_Data1_h1.csv

When we execute the same query in Hive it works fine, but when we execute it in
CarbonData an "org.apache.hadoop.hive.ql.metadata.HiveException" occurs.

HIVE:
0: jdbc:hive2://hadoop-master:1> create table Test_Boundary_h1 (c1_int 
int,c2_Bigint Bigint,c3_Decimal Decimal(38,30),c4_double double,c5_string 
string,c6_Timestamp Timestamp,c7_Datatype_Desc string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ;
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (1.177 seconds)
0: jdbc:hive2://hadoop-master:1> load data local inpath 
'/opt/Carbon/CarbonData/TestData/Data/Test_Data1_h1.csv' OVERWRITE INTO TABLE 
Test_Boundary_h1 ;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.437 seconds)

0: jdbc:hive2://hadoop-master:1> select 
min(c1_int),max(c1_int),sum(c1_int),avg(c1_int) , count(c1_int), 
variance(c1_int) from Test_Boundary_h1 where rand(c1_int)=0.6201007799387834 or 
rand(c1_int)=0.45540022789662593 ;
+-------+-------+-------+-------+------+-------+--+
|  _c0  |  _c1  |  _c2  |  _c3  | _c4  |  _c5  |
+-------+-------+-------+-------+------+-------+--+
| NULL  | NULL  | NULL  | NULL  | 0    | NULL  |
+-------+-------+-------+-------+------+-------+--+
1 row selected (0.996 seconds)


CARBONDATA:
0: jdbc:hive2://hadoop-master:1> create table Test_Boundary (c1_int 
int,c2_Bigint Bigint,c3_Decimal Decimal(38,30),c4_double double,c5_string 
string,c6_Timestamp Timestamp,c7_Datatype_Desc string) STORED BY 
'org.apache.carbondata.format' ;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (4.48 seconds)

0: jdbc:hive2://hadoop-master:1> LOAD DATA INPATH 
'hdfs://192.168.2.145:54310/BabuStore/Data/Test_Data1.csv' INTO table 
Test_Boundary 
OPTIONS('DELIMITER'=',','QUOTECHAR'='','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='')
 ;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (4.445 seconds)

0: jdbc:hive2://hadoop-master:1> select 
min(c1_int),max(c1_int),sum(c1_int),avg(c1_int) , count(c1_int), 
variance(c1_int) from Test_Boundary where rand(c1_int)=0.6201007799387834 or 
rand(c1_int)=0.45540022789662593 ;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 19.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
19.0 (TID 826, hadoop-master): 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method 
public org.apache.hadoop.hive.serde2.io.DoubleWritable 
org.apache.hadoop.hive.ql.udf.UDFRand.evaluate(org.apache.hadoop.io.LongWritable)
  on object org.apache.hadoop.hive.ql.udf.UDFRand@3152da1e of class 
org.apache.hadoop.hive.ql.udf.UDFRand with arguments {null} of size 1
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:981)
at org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:185)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$$anonfun$create$2.apply(GeneratePredicate.scala:68)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$$anonfun$create$2.apply(GeneratePredicate.scala:68)
at 
org.apache.spark.sql.execution.Filter$$anonfun$2$$anonfun$apply$2.apply(basicOperators.scala:74)
at 
org.apache.spark.sql.execution.Filter$$anonfun$2$$anonfun$apply$2.apply(basicOperators.scala:72)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:504)
at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at 

Re: [jira] [Created] (CARBONDATA-836) Error in load using dataframe - columns containing comma

2017-04-11 Thread Jacky Li
Hi Sanoj,

This is because in the CarbonData loading flow, the input data needs to be
scanned twice (once for generating the global dictionary, another for the
actual loading). If the user is writing a Dataframe to CarbonData, and the
input dataframe is costly to compute, it is better to save it as a temporary
CSV file first and load that into CarbonData, instead of computing the
dataframe twice.
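
A sketch from the user side (Spark 1.6 era; df, cc and the tempCSV/tableName
options are the ones shown elsewhere in this thread, and the CSV path is a
placeholder): either persist the dataframe so the two passes do not recompute
it, or materialize it to CSV yourself and LOAD DATA from that path.

import org.apache.spark.sql.SaveMode
import org.apache.spark.storage.StorageLevel

// Option 1 (my addition): keep the expensive dataframe around for both passes.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.write.format("carbondata")
  .option("tableName", "branch1")
  .option("tempCSV", "false")
  .mode(SaveMode.Overwrite)
  .save()
df.unpersist()

// Option 2 (what is suggested above): write the dataframe to CSV once and load that.
// Needs the spark-csv package on Spark 1.6.
df.write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").save("/tmp/branch1_csv")
cc.sql("LOAD DATA INPATH '/tmp/branch1_csv' INTO TABLE branch1")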

However, there is another option that can do a single-pass data load, by using
.option("single_pass", "true"); in this case the input dataframe should be
computed only once. But when I checked the code just now, it seems this
behavior is not implemented. :(
I think you are free to create a JIRA ticket if you want.

Regards,
Jacky

> On 11 Apr 2017, at 10:36 AM, Sanoj MG wrote:
> 
> Hi All,
> 
> In CarbonDataFrameWriter, there is an option to load using CSV file.
> 
> if (options.tempCSV) {
> 
>  loadTempCSV(options)
> } else {
>  loadDataFrame(options)
> }
> 
> Why is this choice required? Is there any issue if we load it directly
> without using CSV?
> 
> I have many dimension tables with commas in string columns, and so always use
> .option("tempCSV", "false"). In CarbonOption, can we set the default value
> to "false" as below?
> 
> def tempCSV: Boolean = options.getOrElse("tempCSV", "false").toBoolean
> 
> Thanks,
> Sanoj
> 
> 
> On Thu, Mar 30, 2017 at 12:14 PM, Sanoj MG (JIRA)  wrote:
> 
>> Sanoj MG created CARBONDATA-836:
>> ---
>> 
>> Summary: Error in load using dataframe  - columns containing
>> comma
>> Key: CARBONDATA-836
>> URL: https://issues.apache.org/jira/browse/CARBONDATA-836
>> Project: CarbonData
>>  Issue Type: Bug
>>  Components: spark-integration
>>Affects Versions: 1.1.0-incubating
>> Environment: HDP sandbox 2.5, Spark 1.6.2
>>Reporter: Sanoj MG
>>Priority: Minor
>> Fix For: NONE
>> 
>> 
>> While trying to load data into a CarbonData table using a dataframe, the
>> columns containing commas are not properly loaded.
>> 
>> Eg:
>> scala> df.show(false)
>> +-------+------+-----------+------------+---------+------+
>> |Country|Branch|Name       |Address     |ShortName|Status|
>> +-------+------+-----------+------------+---------+------+
>> |2      |1     |Main Branch|, Dubai, UAE|UHO      |256   |
>> +-------+------+-----------+------------+---------+------+
>> 
>> 
>> scala>  df.write.format("carbondata").option("tableName",
>> "Branch1").option("compress", "true").mode(SaveMode.Overwrite).save()
>> 
>> 
>> scala> cc.sql("select * from branch1").show(false)
>> 
>> +-------+------+-----------+-------+---------+------+
>> |country|branch|name       |address|shortname|status|
>> +-------+------+-----------+-------+---------+------+
>> |2      |1     |Main Branch|       | Dubai   |null  |
>> +-------+------+-----------+-------+---------+------+
>> 
>> 
>> 
>> 
>> 
>> 