[jira] [Created] (KYLIN-2496) Table snapshot should be no greater than 300MB

2017-03-09 Thread Kailun Zhang (JIRA)
Kailun Zhang created KYLIN-2496:
---

 Summary: Table snapshot should be no greater than 300MB
 Key: KYLIN-2496
 URL: https://issues.apache.org/jira/browse/KYLIN-2496
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2
Reporter: Kailun Zhang
 Fix For: v1.5.2


My fact table has 10 million rows and joins with a lookup table on userid; the 
lookup table has 6 million rows. I set the column gender as a dimension to 
build the cube, but the build failed with java.lang.IllegalStateException: 
Table snapshot should be no greater than 300 MB, but 
TableDesc[database=mydatabase name=my table name] size is 1442042137.
Can Kylin handle joining a lookup table with such high cardinality?
How can I resolve this problem and build the cube? Thanks!
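For scale, the size reported in the exception works out to roughly 1,375 MB, 
more than four times the 300 MB snapshot cap. A quick arithmetic check (plain 
Java, nothing Kylin-specific):

```java
public class SnapshotSize {
    public static void main(String[] args) {
        long bytes = 1442042137L;            // size from the exception message
        double mb = bytes / (1024.0 * 1024); // ~1375 MB
        System.out.println(mb);
        System.out.println(mb > 300);        // well over the 300 MB limit
    }
}
```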



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: build cube with spark ERROR

2017-03-09 Thread ShaoFeng Shi
Spark didn't find the MySQL connector jar on the classpath; check:
https://stackoverflow.com/questions/33192886/com-mysql-jdbc-driver-not-found-on-classpath-while-starting-spark-sql-and-thrift

You can add additional Spark jars in kylin.properties, e.g.:

kylin.engine.spark.additional-jars=/path/to/mysql-connector-java-5.1.38-bin.jar
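
To confirm the driver class is actually visible before re-running the build, a 
small classpath probe can help (an illustrative helper, not part of Kylin or 
Spark; the driver class name is the one from the error above):

```java
// Probe whether a class can be resolved on the current classpath.
public class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Should print true once mysql-connector-java is on the classpath.
        System.out.println(isOnClasspath("com.mysql.jdbc.Driver"));
    }
}
```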


2017-03-10 11:37 GMT+08:00 仇同心 :

> Hi all,
>
> When building the cube with Spark, I hit some errors that seem to be
> related to the Hive metastore connection. Can you help me?
>
> javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
>         at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
>         at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
>         at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
>         at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
>         at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>         at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
>         at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
>         at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>         at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
>         at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
>         at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
>         at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
>         at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
>         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
>         at 

[jira] [Created] (KYLIN-2495) query exception when integer column encoded as date/time encoding

2017-03-09 Thread hongbin ma (JIRA)
hongbin ma created KYLIN-2495:
-

 Summary: query exception when integer column encoded as date/time 
encoding 
 Key: KYLIN-2495
 URL: https://issues.apache.org/jira/browse/KYLIN-2495
 Project: Kylin
  Issue Type: Bug
Reporter: hongbin ma
Assignee: hongbin ma


In KYLIN-, we claimed that an integer column can use date/time encoding. 
However, when I tried to query such a cube, an exception was thrown:

{code}
java.sql.SQLException: Error while executing SQL "select * from fact0309
LIMIT 5": For input string: "70225920"
{code}

the fact table desc is: 

{code}
hive> desc fact0309;
OK
tdate   int 
country string  
price   decimal(10,0) 
{code}

and the sample data is:

{code}
19980302US  100
19920403CN  100
19920403US  33
{code}
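One way to see why "70225920" trips the "For input string" error while the 
sample rows do not: the sample values are valid yyyyMMdd dates, but the 
reported value is not. A minimal sketch using plain JDK date parsing (this is 
illustrative only, not Kylin's actual date-encoding codec):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class DateCheck {
    // Strictly parse a string as a yyyyMMdd date.
    static boolean parsesAsDate(String s) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setLenient(false); // reject impossible dates like month 59
        try {
            fmt.parse(s);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsDate("19980302")); // true: a sample row
        System.out.println(parsesAsDate("70225920")); // false: the failing value
    }
}
```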





Question Regarding Cube Query Time

2017-03-09 Thread shaileshprajapati15
Hello,
I am doing a POC on Kylin cubes and have built a cube on TPC-DS data (~40 GB). 
The build was successful, but I am facing issues with queries. Simple 
aggregation queries return results in sub-seconds, but queries with 
ORDER BY / GROUP BY take far too long. At first, queries were failing with a 
timeout error because of the record-scan threshold, so I increased the 
"kylin.query.scan.threshold" value in kylin.properties. That fixed the 
threshold error, but the queries still take around 200 seconds, which is not 
acceptable given that Hive returns the result in 10 seconds for the same 
query. I am attaching one of the queries (standard TPC-DS query q3) I am 
trying to run:
SELECT date_dim.d_year, item.i_brand_id, item.i_brand,
       sum(facttable.ss_ext_discount_amt) sum_agg
FROM store_sales facttable
INNER JOIN date_dim date_dim ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
INNER JOIN item item ON (facttable.ss_item_sk = item.i_item_sk)
WHERE item.i_manufact_id = 783 AND date_dim.d_moy = 11
GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
LIMIT 100;
My cluster details: 10 nodes (each with 32 cores and 64 GB RAM) running HDP 2.5, 
HBase 1.1.2.2.5.3.0-37 (fully distributed mode).

Just to investigate, I checked the region server logs on all the nodes and 
found that during query execution only one region server was doing all the 
work while the others were idle. My cube's HBase table was also showing a 
region count of 1, so I tried changing the following properties, but still no 
luck:
kylin.hbase.hfile.size.gb=1
kylin.hbase.region.count.min=8
Please let me know if any other configuration is needed to fix the large 
query time.
Thanks 



[jira] [Created] (KYLIN-2494) Model has no dup column on dimensions and measures

2017-03-09 Thread liyang (JIRA)
liyang created KYLIN-2494:
-

 Summary: Model has no dup column on dimensions and measures
 Key: KYLIN-2494
 URL: https://issues.apache.org/jira/browse/KYLIN-2494
 Project: Kylin
  Issue Type: Improvement
Reporter: liyang
Assignee: liyang


It does not make sense for a column to appear as both a dimension and a 
measure in a model. A column must be either a dimension or a measure.





[jira] [Created] (KYLIN-2493) BufferOverflowException in FactDistinctColumnsMapper when a value is very long

2017-03-09 Thread XIE FAN (JIRA)
XIE FAN created KYLIN-2493:
--

 Summary: BufferOverflowException in FactDistinctColumnsMapper when 
a value is very long
 Key: KYLIN-2493
 URL: https://issues.apache.org/jira/browse/KYLIN-2493
 Project: Kylin
  Issue Type: Bug
Reporter: XIE FAN
Assignee: XIE FAN


Error stack:
Error: java.nio.BufferOverflowException
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:183)
        at java.nio.ByteBuffer.put(ByteBuffer.java:832)
        at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:157)
        at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:48)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
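
The failure mode itself is easy to reproduce in isolation: ByteBuffer.put 
throws BufferOverflowException whenever the value being written is larger than 
the buffer's remaining capacity, which is what a very long column value does 
to the mapper's fixed-size buffer. A minimal sketch (not Kylin code; sizes are 
arbitrary):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class OverflowRepro {
    // Returns true if writing valueSize bytes into a bufferSize-byte
    // buffer overflows, mirroring the exception in the report above.
    static boolean overflows(int bufferSize, int valueSize) {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        try {
            buf.put(new byte[valueSize]); // throws if valueSize > remaining()
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(16, 64)); // true: value larger than buffer
        System.out.println(overflows(16, 8));  // false: value fits
    }
}
```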


