Re: Build Error in Step: Build Dimension Dictionary

2016-11-22 Thread Billy(Yiming) Liu
Kylin builds snapshots for lookup tables, and NULL values should be fine
there. Could you log a JIRA issue and include the steps to reproduce?
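The reporter below traced the NullPointerException to NULL values in a dimension-table column. To gather reproduce steps for the JIRA, one option is to count the NULLs in each lookup-table column. The following is a minimal sketch that only generates the HiveQL. The class name, table name, and column names are hypothetical. You would run the printed query in the Hive CLI or Beeline:

```java
import java.util.List;
import java.util.stream.Collectors;

public class NullCountQuery {
    // Builds a HiveQL query that counts NULLs in each listed column.
    // COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of NULL rows.
    static String build(String table, List<String> columns) {
        String exprs = columns.stream()
                .map(c -> "COUNT(*) - COUNT(" + c + ") AS " + c + "_nulls")
                .collect(Collectors.joining(", "));
        return "SELECT " + exprs + " FROM " + table;
    }

    public static void main(String[] args) {
        // Hypothetical lookup table and columns; substitute your own.
        System.out.println(build("DEFAULT.MY_LOOKUP", List.of("REGION_ID", "REGION_NAME")));
    }
}
```

Any column whose `_nulls` value is non-zero is a candidate for the case described in this thread.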

2016-11-23 8:46 GMT+08:00 Mars J <xujiao.myc...@gmail.com>:

> We solved this problem. It was caused by a dimension column with NULL
> values; that column is not included in the cube, only in the dimension
> table.
>
> On Nov 19, 2016, at 7:57 PM, Billy(Yiming) Liu <liuyiming@gmail.com> wrote:
>
> Please check whether the step before Build Dimension Dictionary finished successfully.
>
> 2016-11-18 18:54 GMT+08:00 Mars J <xujiao.myc...@gmail.com>:
>
>> I get this error message: "java.lang.NullPointerException
>> result code: 2"
>>
>> There are no further messages in kylin.log.
>>
>> Has anyone met this problem?
>>
>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>
>
>


-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: some confusion about Mandatory Dimensions

2016-11-16 Thread Billy(Yiming) Liu
If you set A, B, and C as mandatory dimensions, Kylin internally saves the
cuboid results grouped by A, B, and C. That does not mean you can only
query by grouping on A, B, and C. If you query only A and B, the final
result is post-aggregated from the above cuboid. The same applies to a query
grouping by A alone. The cost is performance, because more post-aggregation
is needed. However, if you group by D, there would be no result,
because the mandatory dimensions are missing.
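The post-aggregation described above can be sketched as a roll-up from the stored (A, B, C) cuboid to (A, B). This illustrates the idea and is not Kylin's code. For count distinct, Kylin keeps an HLL counter per cell; here an exact HashSet stands in for it. The record and method names are hypothetical. The key point is that the distinct-value structures are merged (unioned) rather than having their counts added. That is why count(distinct u) filtered on A alone still comes out correct, within HLL's approximation error:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PostAggregationSketch {
    // One cell of a hypothetical (A, B, C) cuboid. Kylin would store an HLL counter
    // for count(distinct u); an exact set of u values stands in for it here.
    record CuboidRow(String a, String b, String c, Set<String> distinctU) {}

    // Post-aggregates the (A, B, C) cuboid down to (A, B) by merging the u sets.
    static Map<List<String>, Set<String>> rollUpToAB(List<CuboidRow> cuboid) {
        Map<List<String>, Set<String>> result = new HashMap<>();
        for (CuboidRow row : cuboid) {
            result.computeIfAbsent(List.of(row.a(), row.b()), k -> new HashSet<>())
                  .addAll(row.distinctU());
        }
        return result;
    }

    public static void main(String[] args) {
        List<CuboidRow> cuboid = List.of(
                new CuboidRow("a1", "b1", "c1", Set.of("u1", "u2")),
                new CuboidRow("a1", "b1", "c2", Set.of("u2", "u3")),
                new CuboidRow("a1", "b2", "c1", Set.of("u1")));
        // (a1, b1) merges to {u1, u2, u3}: count(distinct u) = 3, not 2 + 2 = 4.
        System.out.println(rollUpToAB(cuboid).get(List.of("a1", "b1")).size()); // 3
    }
}
```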

2016-11-17 13:31 GMT+08:00 张晓明(zhangxiaoming), Technology Product Center:

> Hi all,
>
>  I have created a cube in my system with mandatory dimensions A, B, C,
> and a count distinct measure on field “u” using HLL.
>
> When the cube segment completed, I queried it with Kylin SQL such as
> “select count(distinct u) from table where A=xxx and b=yyy” or “select
> count(distinct u) from table where A=xxx”. The result is correct.
>
> In my understanding, all of the query conditions must be set
> (A=xxx, B=, C=zzz) for the Kylin SQL to work.
>
> So my question is: how does Kylin query the result so that the distinct
> value is right? That is unbelievable.
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: query hbase table data from kylin

2016-11-15 Thread Billy(Yiming) Liu
Kylin supports Hive and Kafka as data sources, not HBase.
As a workaround, you could use a tool to transfer the data from HBase to
Hive first.
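One possible workaround assumes Hive's HBase integration is available on the Hive side, meaning the hive-hbase-handler jar and the HBase client configuration are in place. First, map the HBase table into Hive as an external table through HBaseStorageHandler. Then copy it into a native Hive table that Kylin can read.

The sketch below only builds the HiveQL statements. The class name, table names, column family, and qualifiers are hypothetical. Every column is typed STRING for simplicity. The copy would have to be re-run before each cube build, so this is a batch approach, not real-time:

```java
import java.util.List;
import java.util.stream.Collectors;

public class HBaseToHiveStatements {
    // Returns HiveQL that (1) maps an existing HBase table into Hive through Hive's
    // HBaseStorageHandler and (2) copies the rows into a native Hive table.
    // All columns are STRING here for simplicity; real tables would need proper types.
    static List<String> build(String hbaseTable, String family,
                              List<String> qualifiers, String hiveTable) {
        String staging = hiveTable + "_hbase";
        String cols = qualifiers.stream()
                .map(q -> q + " STRING")
                .collect(Collectors.joining(", "));
        // ":key" maps the HBase row key; each qualifier maps to family:qualifier.
        String mapping = qualifiers.stream()
                .map(q -> family + ":" + q)
                .collect(Collectors.joining(","));
        return List.of(
                "CREATE EXTERNAL TABLE IF NOT EXISTS " + staging
                        + " (rowkey STRING, " + cols + ")"
                        + " STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'"
                        + " WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key," + mapping + "')"
                        + " TBLPROPERTIES ('hbase.table.name' = '" + hbaseTable + "')",
                "DROP TABLE IF EXISTS " + hiveTable,
                "CREATE TABLE " + hiveTable + " AS SELECT * FROM " + staging);
    }

    public static void main(String[] args) {
        // Hypothetical names; run the printed statements in Hive CLI or Beeline.
        build("events", "cf", List.of("user_id", "amount"), "default.events_copy")
                .forEach(s -> System.out.println(s + ";"));
    }
}
```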

2016-11-16 12:21 GMT+08:00 夏天松 :

> Hi Kylin team,
> Our Kylin version is 1.5.3.
> We want to use Kylin as our data warehouse, and we want to query HBase
> data from Kylin (like Impala does for HBase). We write data to HBase in
> real time, and we would like to do some verification analysis in Kylin.
>
> Is there any way to do this in Kylin?
>
> Thanks.
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: Kylin Dependencies

2016-11-02 Thread Billy(Yiming) Liu
Here is a quick start for running Kylin on Docker:
https://github.com/kyligence/kylin-docker

From the Dockerfile, you can find Kylin's dependencies.
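Li Yang's answer quoted below lists the Hadoop client libraries and configs for HDFS, YARN, Hive, and HBase. Building on that, here is a minimal sketch for checking that the client config files are present on a slimmed-down host. The class name and file list are assumptions. The list uses the standard *-site.xml names, plus mapred-site.xml because the 1.5.x build engine submits MapReduce jobs. It does not check the client jars, which are still required:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClientConfigCheck {
    // Standard client config file names for the services Kylin talks to
    // (an assumed set; adjust to your cluster).
    static final List<String> REQUIRED = List.of(
            "core-site.xml", "hdfs-site.xml", "yarn-site.xml",
            "mapred-site.xml", "hive-site.xml", "hbase-site.xml");

    // Returns the required files that are not present in any of the given directories.
    static List<String> missing(List<File> confDirs) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            boolean found = confDirs.stream().anyMatch(d -> new File(d, name).isFile());
            if (!found) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass the directories holding copies of the remote cluster's
        // Hadoop, Hive, and HBase client configs.
        List<File> dirs = new ArrayList<>();
        for (String a : args) {
            dirs.add(new File(a));
        }
        System.out.println("Missing: " + missing(dirs));
    }
}
```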

2016-11-02 22:46 GMT+08:00 Alberto Ramón :

> The configs I can try (it will be an interesting exercise for me),
> but the libraries...
> Can these libraries be statically compiled into Kylin?
> Any idea or solution for resolving all dependencies without installing
> HDFS, YARN, Hive, and HBase on this minimal Linux?
>
> The idea is to build "minimal Linux + Kylin" and dockerize it
> (the result must be small, < 150 MB).
>
>
> 2016-11-02 14:20 GMT+01:00 Li Yang :
>
>> Kylin needs the Hadoop client libraries and configs, including HDFS,
>> YARN, Hive, and HBase.
>>
>> On Sun, Oct 30, 2016 at 1:42 AM, Alberto Ramón wrote:
>>
>>> Hi,
>>>
>>> Target:
>>>   All Kylin Docker images are VERY heavy (GBs and hundreds of
>>> processes). That is good for development/testing, but a bad idea for
>>> production. I'm trying to install Kylin on a minimal Linux, ideally
>>> Alpine or similar.
>>>
>>> I have:
>>>  - a clean install of Linux (minimal CentOS, for example) without
>>> Hadoop, with Kylin installed from the binary package
>>>  - remote HBase & Hive
>>>
>>>
>>> Which Kylin dependencies will I need on my CentOS / Alpine?
>>>
>>> BR, Alb
>>>
>>
>>
>


-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: [Announce] Apache Kylin 1.5.4.1 released

2016-09-28 Thread Billy(Yiming) Liu
Thanks, Shaofeng. Releasing is hard work.

2016-09-28 17:34 GMT+08:00 ShaoFeng Shi :

> The Apache Kylin team is pleased to announce the immediate availability of
> the 1.5.4.1 release.
>
> This is a bug fix release based on 1.5.4; all of the changes in this
> release can be found in:
> https://kylin.apache.org/docs15/release_notes.html
>
> You can download the source release and binary packages from
> https://www.apache.org/dyn/closer.cgi?path=/kylin/apache-kylin-1.5.4.1/
>
> More information about the binary packages is on Kylin's download page
> https://kylin.apache.org/download/
>
> Apache Kylin is an open source Distributed Analytics Engine designed to
> provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop,
> supporting extremely large datasets.
>
> Apache Kylin lets you query massive data sets at sub-second latency in 3
> steps:
> 1. Identify star schema data on Hadoop.
> 2. Build a cube on Hadoop.
> 3. Query data with ANSI SQL and get sub-second results via ODBC, JDBC,
> or RESTful API.
>
> Thanks to everyone who has contributed to the 1.5.4.1 release.
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kylin.apache.org/
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: [Award] Kylin won InfoWorld Bossie Awards 2016, again!

2016-09-22 Thread Billy(Yiming) Liu
So cool. Congrats.

2016-09-23 11:09 GMT+08:00 Hao Chen :

> Congratulations!
>
> Hao
>
> On Sep 23, 2016, at 12:35 AM, Luke Han  wrote:
>
> Hi community,
>  You may already know, in the latest announced news from InfoWorld,
> Apache Kylin has been selected for this year's Bossie Awards again:
> *The Best Open Source Big Data Tools*. This is second time we won this.
>   There are 12 projects in this years list, they are TensorFlow, Beam,
> Spark, Kylin
> Kafka,Impala, Elasticsearch, SlamData, Zeppelin, Solr, Streamsets, Titan.
> Most of
> them are Apache projects (include incubating). Congrats to ASF!
>
>Would like to take this monument to say thanks to everybody, we
> never could
> won such industry recognition without everyone's contribution, thanks:-)
>
> Here's news link, enjoy it:
> http://www.infoworld.com/article/3120856/open-source-tools/bossie-awards-2016-the-best-open-source-big-data-tools.html#slide9
>
> Thanks.
> Luke Han
>
>


-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: group by on varchar column

2016-09-21 Thread Billy(Yiming) Liu
So you hit this bug: https://issues.apache.org/jira/browse/KYLIN-1971
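The report quoted below describes the condition behind this bug: the fact table and the dimension table share the name of the group-by column. A small hypothetical check over the two schemas' column lists can spot that condition. The class and method names are illustrative only:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SameNameColumnCheck {
    // Returns column names present in both the fact table and a lookup table,
    // compared case-insensitively (Hive column names are case-insensitive).
    static Set<String> sharedColumns(List<String> factColumns, List<String> lookupColumns) {
        Set<String> lookup = new LinkedHashSet<>();
        for (String c : lookupColumns) {
            lookup.add(c.toUpperCase(Locale.ROOT));
        }
        Set<String> shared = new LinkedHashSet<>();
        for (String c : factColumns) {
            String upper = c.toUpperCase(Locale.ROOT);
            if (lookup.contains(upper)) {
                shared.add(upper);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Hypothetical schemas: DESCRIPTION exists in both tables.
        System.out.println(sharedColumns(
                List.of("ORDER_ID", "CITY_ID", "DESCRIPTION"),
                List.of("CITY_ID", "CITY_NAME", "DESCRIPTION"))); // [CITY_ID, DESCRIPTION]
    }
}
```

Shared join keys such as CITY_ID will also be reported, and those are expected. The other overlapping names, like DESCRIPTION here, are the ones that match the reporter's situation.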

2016-09-21 19:41 GMT+08:00 Sandeep Khurana :

> Found the issue: the problem occurs when the fact and dimension tables
> have a column with the same name as the group-by column. I removed the
> identically named column from the fact table.
>
> On Tue, Sep 20, 2016 at 2:13 AM, Sandeep Khurana 
> wrote:
>
>> BTW, we are using Kylin version 1.5.2.
>>
>> On Tue, Sep 20, 2016 at 2:03 AM, Sandeep Khurana 
>> wrote:
>>
>>> Hello
>>>
>>> I have a query where I group by a VARCHAR column. The column values
>>> are long sentences (not just single words). This column is part of a
>>> dimension table.
>>>
>>> When I select only from the dimension table with this group by, I get
>>> ~2000 records.
>>>
>>> But when I join this dimension with the fact table and run the group by
>>> query, I get just 1 record, as Kylin somehow treats the VARCHAR column
>>> values as NULL. Not a single row has a NULL value in this VARCHAR field.
>>>
>>> When I copy and paste the same query and run it on the Hive tables, I
>>> get more than a thousand records.
>>>
>>> The strange thing is that when I change the group-by column to another
>>> VARCHAR column (city_name), whose values are single words, and run it
>>> in the Kylin SQL editor, I get proper records.
>>>
>>> Two questions:
>>>
>>> - Any idea why this happens? Especially since Hive gives proper records,
>>> whereas Kylin returns just one record with this long VARCHAR field as
>>> NULL.
>>>
>>> - Is there any workaround?
>>>
>>>
>>
>>
>
>
> --
> Architect
> Infoworks.io
> http://Infoworks.io
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: Sample Cube Building Error during #4 Step Name: Build Dimension Dictionary

2016-09-21 Thread Billy(Yiming) Liu
The function initializeSerDe was introduced in Hive 1.0+; it seems some
old Hive libraries exist in $hive_dependency.
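To confirm which Hive jars are being picked up, here is a minimal diagnostic sketch. It lists the hive-* jars in a classpath string, assumed to be the value of $hive_dependency joined with the platform path separator. When run with that classpath, it also reports which jar SerDeUtils is loaded from. If the output shows two different versions of the same Hive jar, the older one is the likely culprit. The class and method names are hypothetical:

```java
import java.io.File;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

public class HiveJarDiagnostics {
    // Lists classpath entries whose file name looks like a Hive jar ("hive-*.jar"),
    // so two versions of the same jar (e.g. an old hive-exec) are easy to spot.
    static List<String> hiveJars(String classpath) {
        List<String> jars = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            String name = new File(entry).getName();
            if (name.startsWith("hive-") && name.endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    // Returns the jar or directory a class is loaded from on this JVM's classpath,
    // or null if it is absent or comes from the bootstrap class loader.
    static URL locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, HiveJarDiagnostics.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? null : src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the $hive_dependency value as the first argument (assumed to be a
        // path-separator-joined list); otherwise this JVM's own classpath is used.
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        hiveJars(cp).forEach(System.out::println);
        System.out.println("SerDeUtils loaded from: "
                + locate("org.apache.hadoop.hive.serde2.SerDeUtils"));
    }
}
```

Wildcard entries such as `lib/*` are not expanded by `hiveJars`. For those, the `locate` result is the more reliable signal.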

2016-09-21 13:47 GMT+08:00 Sunwei :

> Hi all:
>
>   I'm using *apache-kylin-1.5.4-HBase1.x-bin* with *Hadoop 2.6.4, Hive 1.2.1,
> HBase 1.1.4*.
>
>   When I build the sample cube "kylin_sales_cube", I get an error at #4
> Step Name: Build Dimension Dictionary
>
> log output is:
>
> java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(Lorg/apache/hadoop/hive/serde2/Deserializer;Lorg/apache/hadoop/conf/Configuration;Ljava/util/Properties;Ljava/util/Properties;)V
>  at org.apache.hive.hcatalog.mapreduce.InternalUtil.initializeDeserializer(InternalUtil.java:156)
>  at org.apache.hive.hcatalog.mapreduce.HCatRecordReader.createDeserializer(HCatRecordReader.java:127)
>  at org.apache.hive.hcatalog.mapreduce.HCatRecordReader.initialize(HCatRecordReader.java:92)
>  at org.apache.hive.hcatalog.data.transfer.impl.HCatInputFormatReader.read(HCatInputFormatReader.java:87)
>  at org.apache.kylin.source.hive.HiveTableReader.loadHCatRecordItr(HiveTableReader.java:174)
>  at org.apache.kylin.source.hive.HiveTableReader.next(HiveTableReader.java:99)
>  at org.apache.kylin.dict.TableColumnValueEnumerator.moveNext(TableColumnValueEnumerator.java:43)
>  at org.apache.kylin.dict.DictionaryGenerator$NumberDictBuilder.build(DictionaryGenerator.java:174)
>  at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:81)
>  at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
>  at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:321)
>  at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:185)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>
> How can I fix it?
>
> Thanks!
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: Kylin and BI Tools

2016-09-19 Thread Billy(Yiming) Liu
So cool, impressive. Thank you, Alberto.

2016-09-19 21:42 GMT+08:00 Alberto Ramón :

> Hello
>
> This is the conclusion of all my previous articles about Kylin and
> different tools, with some successes and some failures :)
>
>
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>
>
>
> If you have any comments or improvements, feel free to suggest changes.
> Many thanks to the "Kylin Team", Alb
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: [Announce] Apache Kylin 1.5.4 released

2016-09-16 Thread Billy(Yiming) Liu
Thanks Shaofeng and our community.

2016-09-16 21:25 GMT+08:00 ShaoFeng Shi :

> The Apache Kylin team is pleased to announce the immediate availability of
> the 1.5.4 release.
>
> This is a bug fix release based on 1.5.3; all of the changes in this
> release can be found in:
> https://kylin.apache.org/docs15/release_notes.html
>
> You can download the source release and binary packages from
> https://www.apache.org/dyn/closer.cgi?path=/kylin/apache-kylin-1.5.4/
>
> More information about the binary packages is on Kylin's download page
> https://kylin.apache.org/download/
>
> Apache Kylin is an open source Distributed Analytics Engine designed to
> provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop,
> supporting extremely large datasets.
>
> Apache Kylin lets you query massive data sets at sub-second latency in 3
> steps:
> 1. Identify star schema data on Hadoop.
> 2. Build a cube on Hadoop.
> 3. Query data with ANSI SQL and get sub-second results via ODBC, JDBC,
> or RESTful API.
>
> Thanks to everyone who has contributed to the 1.5.4 release.
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kylin.apache.org/
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: How does Kylin communicate with underlying Hadoop services ?

2016-09-14 Thread Billy(Yiming) Liu
Hive is Kylin's data source, not its query engine; Kylin is the query
engine itself.
The Hive command you found is used to retrieve data from the source during
cube builds, not to run user queries.
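The earlier reply quoted below mentions the kylin.hive.client setting, which chooses whether Kylin runs the Hive CLI or Beeline against Hive during a build. A minimal sketch that reads this setting from kylin.properties with java.util.Properties follows. The class name is hypothetical. The "cli" fallback used when the key is absent is an assumption, so check the default shipped with your version:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class HiveClientSetting {
    // Reads kylin.hive.client from kylin.properties content.
    // The "cli" fallback is an assumption about the default, not taken from Kylin's code.
    static String hiveClient(Reader propertiesSource) throws IOException {
        Properties props = new Properties();
        props.load(propertiesSource);
        return props.getProperty("kylin.hive.client", "cli").trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // e.g. $KYLIN_HOME/conf/kylin.properties
            try (Reader r = Files.newBufferedReader(Path.of(args[0]))) {
                System.out.println("kylin.hive.client = " + hiveClient(r));
            }
        } else {
            // Inline example when no file is given.
            System.out.println(hiveClient(new StringReader("kylin.hive.client=beeline\n")));
        }
    }
}
```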

2016-09-14 10:42 GMT+08:00 udana pathirana :

> So basically you execute the Beeline or Hive command to send queries to
> Hive (according to the logs, I could see the same thing).
> I am just curious why you didn't use a JDBC connection to HiveServer2
> (Thrift server) instead, without executing system commands.
> Something like:
>
> Connection con = DriverManager.getConnection(
>     "jdbc:hive2://hive-server.hadoop.local:1/", "", "");
> Statement stmt = con.createStatement();
> String tableName = "testHiveDriverTable";
>
> On Mon, Sep 12, 2016 at 9:46 PM, Yiming Liu 
> wrote:
>
>> 1. Yes
>> 2. Both Hive CLI and Beeline are supported; check kylin.hive.client in
>> kylin.properties.
>> 3. Yes. ZK works as a lock service for cube builds.
>>
>> 2016-09-12 10:00 GMT+08:00 udana pathirana :
>>
>>> I have some questions about how Kylin connects to different services
>>>
>>> 1) Does Kylin communicate with HBase using hbase-client library
>>> (org.apache.hadoop.hbase.client.Connection) ? I assume it reads HBase
>>> master node/port from the hbase-site.xml ?
>>>
>>> 2) Does Kylin communicate with Hive using JDBC/SQL protocol? Same as
>>> Beeline ? If so, does it read Hive server/port from hive-site.xml ?
>>>
>>> 3) Does Kylin connect to ZK ? IF so why ? How does it connect ; using
>>> zkCli or client library ?
>>>
>>
>>
>>
>> --
>> With Warm regards
>>
>> Yiming Liu (刘一鸣)
>>
>
>


-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: which cube is used when you build many cubes under a model

2016-09-13 Thread Billy(Yiming) Liu
Kylin uses a cost-based routing algorithm to find the lowest-cost cube to
serve the query.
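Here is a rough illustration of cost-based routing. It is not Kylin's actual cost model, which is internal and more detailed. The idea: keep only the cubes whose dimensions cover everything the query references, then pick the one with the lowest estimated cost. The Cube record, its estimatedCost field, and the class name are hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CubeRouterSketch {
    // Hypothetical cube description: its dimensions and an estimated query cost.
    record Cube(String name, Set<String> dimensions, long estimatedCost) {}

    // Picks the cheapest cube whose dimensions cover every dimension the query uses.
    static Optional<Cube> route(List<Cube> cubes, Set<String> queryDims) {
        return cubes.stream()
                .filter(c -> c.dimensions().containsAll(queryDims))
                .min(Comparator.comparingLong(Cube::estimatedCost));
    }

    public static void main(String[] args) {
        List<Cube> cubes = List.of(
                new Cube("cube1", Set.of("A", "B", "C"), 1_000_000),
                new Cube("cube2", Set.of("A", "B"), 10_000),
                new Cube("cube3", Set.of("C", "D"), 500));
        // cube3 is cheapest but lacks A; cube2 covers A at lower cost than cube1.
        System.out.println(route(cubes, Set.of("A")).map(Cube::name).orElse("no match")); // cube2
    }
}
```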

2016-09-13 15:04 GMT+08:00 Mars J :

> Hi,
>  I'm very curious: if I define and build many cubes under one specific
> model, e.g. cube1, cube2, cube3, these cubes may have the same dimensions.
> If a query involves such a dimension, which cube will Kylin use to
> respond?
>



-- 
With Warm regards

Yiming Liu (刘一鸣)


Re: Cassandra?

2016-09-13 Thread Billy(Yiming) Liu
https://community.mapr.com/docs/DOC-1558

This document introduces running Kylin on MapR. HBase is a mandatory
component.
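This addresses the row-key question quoted below. Kylin identifies each cuboid by an ID that is a bitmask of its dimensions, and that cuboid ID leads the row key. As a result, the rows of one cuboid sit together in HBase (within each shard). A query grouping by two dimensions is routed to a cuboid containing them. It then becomes a range scan over that cuboid's rows, narrowed further by filters on leading dimensions, rather than a scan of the whole table.

The sketch below is simplified. It omits the shard prefix and the dictionary encoding that Kylin's real keys use. The fixed-width padding and the class and method names are assumptions for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeySketch {
    // Simplified row key: 8-byte cuboid ID, then each dimension value padded or
    // truncated to a fixed width. Kylin's real keys also carry a shard prefix and
    // dictionary-encoded values; both are omitted here.
    static byte[] rowKey(long cuboidId, String[] dimValues, int width) {
        ByteBuffer buf = ByteBuffer.allocate(8 + dimValues.length * width);
        buf.putLong(cuboidId); // big-endian, so non-negative IDs sort in numeric order
        for (String v : dimValues) {
            buf.put(Arrays.copyOf(v.getBytes(StandardCharsets.UTF_8), width));
        }
        return buf.array();
    }

    // All keys of one cuboid fall in [start, stop), so answering a query from that
    // cuboid is one contiguous range scan rather than a full-table scan.
    static byte[][] scanRange(long cuboidId) {
        byte[] start = ByteBuffer.allocate(8).putLong(cuboidId).array();
        byte[] stop = ByteBuffer.allocate(8).putLong(cuboidId + 1).array();
        return new byte[][] {start, stop};
    }

    public static void main(String[] args) {
        long cuboidAB = 0b110; // hypothetical: with dimensions A, B, C, bits set for A and B
        byte[] key = rowKey(cuboidAB, new String[] {"2016", "US"}, 4);
        byte[][] range = scanRange(cuboidAB);
        // HBase compares keys as unsigned bytes, hence compareUnsigned.
        boolean inRange = Arrays.compareUnsigned(key, range[0]) >= 0
                && Arrays.compareUnsigned(key, range[1]) < 0;
        System.out.println(key.length + " bytes, in range: " + inRange); // 16 bytes, in range: true
    }
}
```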

2016-09-13 14:43 GMT+08:00 Something Something :

> One of my clients uses the MapR distribution and they don't use HBase. We
> tried using Kylin with MapR tables and it didn't work.
>
> We could write and contribute this, but here's my question:
>
> It appears the HBase key is this:
>
> +++  =>  values of measures for this combination,
> correct?
>
> This means every query (except when all DIMs are grouped) would require a
> scan, correct?
>
> How does Kylin do the HBase key lookup? I mean, if a query requires
> grouping by only 2 DIMs,  & , how are 'lookup' and 'scanning' handled?
>
>
>
> On Mon, Sep 12, 2016 at 6:28 PM, ShaoFeng Shi 
> wrote:
>
>> Cassandra isn't supported, as we don't see a strong need for it; if you
>> have such a case, please share the scenario, thanks!
>>
>> Besides, we welcome contributions from the community; if you are willing
>> to implement this, please let us know.
>>
>
>


-- 
With Warm regards

Yiming Liu (刘一鸣)