[jira] [Created] (CARBONDATA-517) Use carbon property to get the store path/kettle home

2016-12-09 Thread Fei Wang (JIRA)
Fei Wang created CARBONDATA-517:
---

 Summary: Use carbon property to get the store path/kettle home
 Key: CARBONDATA-517
 URL: https://issues.apache.org/jira/browse/CARBONDATA-517
 Project: CarbonData
  Issue Type: Sub-task
  Components: spark-integration
Affects Versions: 0.2.0-incubating
Reporter: Fei Wang
Assignee: Fei Wang


This is to distinguish Carbon configuration from Spark configuration: Carbon
settings should be read through the carbon properties rather than the Spark conf.
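
A minimal sketch of the intended lookup (assuming the CarbonProperties
singleton from org.apache.carbondata.core.util; the property keys are the ones
used elsewhere on this list, and the default values are illustrative):

import org.apache.carbondata.core.util.CarbonProperties;

public class CarbonConfigLookup {
  public static void main(String[] args) {
    // Carbon settings are read from carbon.properties, not from SparkConf
    CarbonProperties props = CarbonProperties.getInstance();
    String storePath = props.getProperty("carbon.storelocation", "/tmp/carbon/store");
    String kettleHome = props.getProperty("carbon.kettle.home", "./carbonplugins");
    System.out.println("storePath=" + storePath + ", kettleHome=" + kettleHome);
  }
}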





Re: About hive integration

2016-12-09 Thread Sea
It looks like we just need to implement CarbonFileStorageFormatDescriptor
and CarbonHiveSerde; CarbonInputFormat/CarbonOutputFormat already exist in the
master branch. Hive's Parquet descriptor below shows the shape of what is needed.


@Liang, can you create a module for hive? 



import java.util.Set;

import org.apache.hadoop.hive.ql.io.AbstractStorageFormatDescriptor;
import org.apache.hadoop.hive.ql.io.IOConstants;
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;

import com.google.common.collect.ImmutableSet;

// Hive's built-in Parquet descriptor, as a template for a Carbon one
public class ParquetFileStorageFormatDescriptor extends AbstractStorageFormatDescriptor {
  @Override
  public Set<String> getNames() {
    // Names usable in STORED AS clauses
    return ImmutableSet.of(IOConstants.PARQUETFILE, IOConstants.PARQUET);
  }

  @Override
  public String getInputFormat() {
    return MapredParquetInputFormat.class.getName();
  }

  @Override
  public String getOutputFormat() {
    return MapredParquetOutputFormat.class.getName();
  }

  @Override
  public String getSerde() {
    return ParquetHiveSerDe.class.getName();
  }
}
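
By analogy, a Carbon descriptor might look like the sketch below. The class
name follows the proposal above; the InputFormat/OutputFormat class names are
assumed to be the existing ones from the master branch, and the SerDe class
name is hypothetical since CarbonHiveSerde is still to be written:

import java.util.Set;

import org.apache.hadoop.hive.ql.io.AbstractStorageFormatDescriptor;

import com.google.common.collect.ImmutableSet;

public class CarbonFileStorageFormatDescriptor extends AbstractStorageFormatDescriptor {
  @Override
  public Set<String> getNames() {
    // "carbondata" matches the name already used in STORED BY 'carbondata'
    return ImmutableSet.of("carbondata");
  }

  @Override
  public String getInputFormat() {
    // Assumed: the CarbonInputFormat that already exists in master
    return "org.apache.carbondata.hadoop.CarbonInputFormat";
  }

  @Override
  public String getOutputFormat() {
    // Assumed: the corresponding CarbonOutputFormat
    return "org.apache.carbondata.hadoop.CarbonOutputFormat";
  }

  @Override
  public String getSerde() {
    // Hypothetical: the CarbonHiveSerde proposed in this thread
    return "org.apache.carbondata.hive.CarbonHiveSerDe";
  }
}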





-- Original --
From: "Liang Chen"
Date: Fri, Dec 9, 2016 11:56 AM
To: "dev"

Subject:  Re: About hive integration



Hi

Agree. Hive is widely used; this is a consensus. The Apache CarbonData
community already has a plan to support hive integration, and we look forward
to seeing your contribution on hive integration as well :)

Regards
Liang 


cenyuhai wrote
> Hi, all:
> Currently carbondata does not work in Hive, which is the most widely used
> query engine. In my company, if I want to use carbon, I need to be able to
> query carbondata tables in Hive.
> I think we should implement the following features in hive:
> 1. DDL create/drop/alter carbondata table
> 2. DML insert(overwrite) /select
> 
> 
> What do you think?






Re: carbondata-0.2 load data failed in yarn mode

2016-12-09 Thread geda
Yes, it works now, thanks.
Following the quick-start wiki, spark-defaults.conf should be configured to
load carbon.properties, but I was using spark-shell, which did not pick up
carbon.properties.
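
For anyone hitting the same issue, a minimal sketch of the quick-start
configuration (the file path is illustrative): spark-defaults.conf points both
the driver and executor JVMs at a carbon.properties file, which spark-shell
then picks up as well.

# spark-defaults.conf (illustrative path)
spark.driver.extraJavaOptions   -Dcarbon.properties.filepath=/usr/local/spark/conf/carbon.properties
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=/usr/local/spark/conf/carbon.properties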



2016-12-09 11:48 GMT+08:00 Liang Chen [via Apache CarbonData Mailing List
archive] :

> Hi
>
> Have you solved this issue after applying new configurations?
>
> Regards
> Liang
>
> geda wrote
> hello:
> I tested data in Spark local mode, then loaded data inpath into a table; it
> worked well. But when I use yarn-client mode with 10,000 rows (size: 940 KB),
> an error happens and there is no lock file to be found in the tmp dir. I
> don't know how to debug this; please help. Thanks.
> spark1.6 hadoop 2.7|2.6 carbondata 0.2
> local mode: runs ok
> $SPARK_HOME/bin/spark-shell --master local[4] --jars /usr/local/spark/lib/carbondata_2.10-0.2.0-incubating-shade-hadoop2.7.1.jar
>
>
> yarn command: runs badly
> $SPARK_HOME/bin/spark-shell --verbose --master yarn-client \
>   --driver-memory 1G --driver-cores 1 --executor-memory 4G \
>   --num-executors 5 --executor-cores 1 \
>   --conf "spark.executor.extraJavaOptions=-XX:NewRatio=2 -XX:PermSize=512m -XX:MaxPermSize=512m -XX:SurvivorRatio=6 -verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps" \
>   --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=512m -XX:PermSize=512m" \
>   --conf spark.yarn.driver.memoryOverhead=1024 \
>   --conf spark.yarn.executor.memoryOverhead=3096 \
>   --jars /usr/local/spark/lib/carbondata_2.10-0.2.0-incubating-shade-hadoop2.7.1.jar
>
> import java.io._
> import org.apache.hadoop.hive.conf.HiveConf
> import org.apache.spark.sql.CarbonContext
> val storePath = "hdfs://test:8020/usr/carbondata/store"
> val cc = new CarbonContext(sc, storePath)
> cc.setConf(HiveConf.ConfVars.HIVECHECKFILEFORMAT.varname, "false")
> cc.setConf("carbon.kettle.home","/usr/local/spark/carbondata/carbonplugins")
>
> cc.sql("CREATE TABLE `LINEORDER3` (   LO_ORDERKEY   bigint,
> LO_LINENUMBER int,   LO_CUSTKEYbigint,   LO_PARTKEY
>  bigint,   LO_SUPPKEYbigint,   LO_ORDERDATE  int,
> LO_ORDERPRIOTITY  string,   LO_SHIPPRIOTITY   int,   LO_QUANTITY   int,
>   LO_EXTENDEDPRICE  int,   LO_ORDTOTALPRICE  int,   LO_DISCOUNT   int,
>   LO_REVENUEint,   LO_SUPPLYCOST int,   LO_TAXint,
>   LO_COMMITDATE int,   LO_SHIPMODE   string ) STORED BY
> 'carbondata'")
> cc.sql(s"load data local inpath 'hdfs://test:8020/tmp/lineorder_1w.tbl'
>  into table lineorder3 options('DELIMITER'='|', 'FILEHEADER'='LO_ORDERKEY,
> LO_LINENUMBER, LO_CUSTKEY, LO_PARTKEY , LO_SUPPKEY , LO_ORDERDATE ,
> LO_ORDERPRIOTITY ,   LO_SHIPPRIOTITY , LO_QUANTITY ,LO_EXTENDEDPRICE ,
> LO_ORDTOTALPRICE ,LO_DISCOUNT , LO_REVENUE  ,   LO_SUPPLYCOST,   LO_TAX,
> LO_COMMITDATE,   LO_SHIPMODE')")
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 8, datanode03-bi-dev): java.lang.RuntimeException: Dictionary file
> lo_orderpriotity is locked for updation. Please try after some time
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:353)
> at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:293)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at scala.Option.foreach(Option.scala:236)
> at
>

Re: B-Tree LRU cache (New Feature)

2016-12-09 Thread jarray888
+1





Re: [Discussion] Some confused properties

2016-12-09 Thread jarray888
When you load data into carbon and the source data file contains dirty data,
the bad records are redirected to this location:

#Path where the bad records are stored 
carbon.badRecords.location=/opt/Carbon/Spark/badrecords  





Re: [Discussion] Please vote and comment for carbon data file format change

2016-12-09 Thread jarray888
+1. The current data format has a first-time-query slowness issue; it should be fixed.


