Re: Error while building cube from stream

2016-09-26 Thread ShaoFeng Shi
Hi Tony,

You're correct: the global dictionary isn't supported in the streaming
builder (this is the first report of it). Could you please open a JIRA?
https://issues.apache.org/jira/secure/Dashboard.jspa

BTW, we're developing a new version of the streaming engine, which will
reuse most of the logic of the batch cubing engine and is planned to roll
out in v1.6. I believe this issue will not occur with the new design.
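
To illustrate the concern raised in this thread about precise count distinct across segments, here is a minimal sketch. It assumes each segment encodes the column with its own dictionary, and it uses plain Python sets as a stand-in for bitmaps. The function names and sample user ids are hypothetical; this is not Kylin's code.

```python
def build_dictionary(values):
    """Assign dense integer ids (0, 1, 2, ...) to the distinct values, in sorted order."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def encode(values, dictionary):
    """Encode values as a set of integer ids (a stand-in for a bitmap)."""
    return {dictionary[value] for value in values}


# Hypothetical user ids seen in two different segments.
segment_a = ["alice", "bob"]
segment_b = ["carol", "dave"]

# Per-segment dictionaries: ids restart at 0 in every segment, so the same
# integer can stand for different users in different segments.
local_a = encode(segment_a, build_dictionary(segment_a))
local_b = encode(segment_b, build_dictionary(segment_b))
local_merged = local_a | local_b

# One dictionary shared by all segments: every user gets a unique id.
shared = build_dictionary(segment_a + segment_b)
global_merged = encode(segment_a, shared) | encode(segment_b, shared)

true_distinct = len(set(segment_a + segment_b))
print(len(local_merged), len(global_merged), true_distinct)
```

With per-segment ids the merged set undercounts, because ids from different segments collide; with a shared dictionary the merged count matches the true number of distinct users.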


Re: Error while building cube from stream

2016-09-26 Thread Tony Lee
Thanks.

But this does not work on a streaming cube.

I read some of the code and found that in class *StreamingCubeBuilder* the
dictionary map is built by *DictionaryGenerator.buildDictionary()* instead
of *DictionaryManager.buildDictionary()*. Does this mean that streaming
cubes do not support the global dictionary?

I added USERID to the dimensions, and then the cube built successfully. But
I think the result will be incorrect if I calculate count distinct across
different segments. Is that right?


Tony


Re: Error while building cube from stream

2016-09-24 Thread ShaoFeng Shi
Hi Tony,

The error occurred while building a bitmap counter (for distinct count).
From your cube descriptor, it seems no global dictionary is specified for
the user id column. Please check this blog:
https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
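
As a rough way to spot this condition in a cube descriptor, the sketch below lists bitmap measures whose column has no entry under "dictionaries". The descriptor is a trimmed copy of the one quoted in this thread. The helper name is hypothetical, and the assumption that each dictionary entry carries a "column" key is mine; check it against the blog post above.

```python
import json

# Trimmed copy of the cube descriptor quoted in this thread.
CUBE_DESC_JSON = """
{
  "name": "hot_play_c",
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {"type": "constant", "value": "1", "next_parameter": null},
        "returntype": "bigint"
      }
    },
    {
      "name": "COUNT_DISTINCT_USER",
      "function": {
        "expression": "COUNT_DISTINCT",
        "parameter": {"type": "column", "value": "USERID", "next_parameter": null},
        "returntype": "bitmap"
      }
    }
  ],
  "dictionaries": []
}
"""


def bitmap_columns_without_dictionary(cube_desc):
    """Return columns used by bitmap measures that have no dictionary entry.

    Assumption: each item under "dictionaries" names its column in a
    "column" key.
    """
    declared = {entry.get("column") for entry in (cube_desc.get("dictionaries") or [])}
    missing = []
    for measure in cube_desc.get("measures", []):
        function = measure.get("function", {})
        if function.get("returntype") != "bitmap":
            continue
        column = function.get("parameter", {}).get("value")
        if column not in declared:
            missing.append(column)
    return missing


cube_desc = json.loads(CUBE_DESC_JSON)
missing_columns = bitmap_columns_without_dictionary(cube_desc)
print(missing_columns)
```

For the descriptor in this thread the check reports USERID, which is the column behind the COUNT_DISTINCT_USER measure.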

2016-09-22 10:49 GMT+08:00 Tony Lee :

> Thanks, ShaoFeng Shi. That is the reason.
>
> Unfortunately, I now have a new problem with precise count distinct.
>
> I added a streaming table on version 1.5.4 with my own JSON, which looks
> like this:
> {
> "logTimestamp":1474456891127,
> "datetime":"2016-09-21 19:21:31",
> "uploadTime":"20160921192023",
> "userId":"f2d28cbf9e21340a49e97063486db1f5",
> "accountId":"84108490",
> "otherfield":""
> }
>
> *The error message while building the cube is*
>
> 2016-09-22 10:01:40,731 ERROR [main StreamingCLI:103]: error start streaming
> java.lang.RuntimeException: error build cube from StreamingBatch
>     at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.build(StreamingCubeBuilder.java:105)
>     at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:79)
>     at org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:123)
>     at org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:97)
> Caused by: java.lang.NullPointerException
>     at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:100)
>     at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:89)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.buildValueOf(InMemCubeBuilderInputConverter.java:122)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.buildValue(InMemCubeBuilderInputConverter.java:94)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.convert(InMemCubeBuilderInputConverter.java:70)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:542)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:523)
>     at org.apache.kylin.gridtable.GTAggregateScanner.iterator(GTAggregateScanner.java:139)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.createBaseCuboid(InMemCubeBuilder.java:339)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:166)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:135)
>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:122)
>     at org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:80)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> *and the cube json is*
> {
>   "uuid": "db91bcea-b33f-48af-a2f5-6014b14031f4",
>   "last_modified": 1474511879506,
>   "version": "1.5.4",
>   "name": "hot_play_c",
>   "model_name": "hot_play_cube",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
>     {
>       "name": "DEFAULT.HOT_PLAY.HOUR_START",
>       "table": "DEFAULT.HOT_PLAY",
>       "column": "HOUR_START",
>       "derived": null
>     },
>     {
>       "name": "DEFAULT.HOT_PLAY.MINUTE_START",
>       "table": "DEFAULT.HOT_PLAY",
>       "column": "MINUTE_START",
>       "derived": null
>     }
>   ],
>   "measures": [
>     {
>       "name": "_COUNT_",
>       "function": {
>         "expression": "COUNT",
>         "parameter": {
>           "type": "constant",
>           "value": "1",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "COUNT_DISTINCT_USER",
>       "function": {
>         "expression": "COUNT_DISTINCT",
>         "parameter": {
>           "type": "column",
>           "value": "USERID",
>           "next_parameter": null
>         },
>         "returntype": "bitmap"
>       },
>       "dependent_measure_ref": null
>     }
>   ],
>   "dictionaries": [],
>   "rowkey": {
>     "rowkey_columns": [
>       {
>         "column": "HOUR_START",
>         "encoding": "time",
>         "isShardBy": false
>       },
>       {
>         "column": "MINUTE_START",
>         "encoding": "time",
>         "isShardBy": false
>       }
>     ]
>   },
>   "hbase_mapping": {
>     "column_family": [
>       {
>         "name": "F1",
> 

Re: Error while building cube from stream

2016-09-20 Thread Alberto Ramón
I don't know, but can you check this change: KYLIN-1744, in V1.3?




Error while building cube from stream

2016-09-20 Thread Tony Lee
Hi,

I was building a cube from a stream as the documentation
(http://kylin.apache.org/docs15/tutorial/cube_streaming.html) describes.

I was using 1.5.3 and encountered this error. The same error occurs on
1.5.4. Everything works fine on 1.5.2.1.

Any idea how to solve this?


2016-09-20 20:31:51,520 INFO  [main KafkaStreamingInput:129]: finish to get streaming batch, total message count:30
2016-09-20 20:31:51,532 DEBUG [main CubeManager:855]: Reloaded new cube: STREAMING_CUBE with reference beingCUBE[name=STREAMING_CUBE] having 1 segments:KYLIN_2822I1W3CX
2016-09-20 20:31:51,536 INFO  [main CubeManager:314]: Updating cube instance 'STREAMING_CUBE'
2016-09-20 20:31:51,538 WARN  [main StreamingCLI:127]: invalid args:streaming start STREAMING_CUBE 147437454_147437460 -start 147437454 -end 147437460 -cube STREAMING_CUBE
2016-09-20 20:31:51,539 ERROR [main StreamingCLI:103]: error start streaming
java.lang.IllegalStateException: Segments overlap: STREAMING_CUBE[FULL_BUILD] and STREAMING_CUBE[FULL_BUILD]
    at org.apache.kylin.cube.CubeValidator.validate(CubeValidator.java:85)
    at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:358)
    at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:301)
    at org.apache.kylin.cube.CubeManager.appendSegment(CubeManager.java:441)
    at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.createBuildable(StreamingCubeBuilder.java:118)
    at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:76)
    at org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:123)
    at org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:97)
2016-09-20 20:31:51,543 INFO  [Thread-0 ConnectionManager$HConnectionImplementation:1678]: Closing zookeeper sessionid=0x35708fbc2740013
2016-09-20 20:31:51,549 INFO  [Thread-0 ZooKeeper:684]: Session: 0x35708fbc2740013 closed
2016-09-20 20:31:51,549 INFO  [main-EventThread ClientCnxn:512]: EventThread shut down
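
For readers unfamiliar with this kind of validation, the "Segments overlap" message suggests a check that no two segments of a cube cover the same time range. Below is a minimal sketch of such a check. It assumes half-open [start, end) ranges with start < end; the Segment type, function names and sample values are hypothetical, and this is not Kylin's CubeValidator.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    name: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: Segment, b: Segment) -> bool:
    """Two half-open ranges overlap when each one starts before the other ends."""
    return a.start < b.end and b.start < a.end


def first_overlap(segments: List[Segment]) -> Optional[Tuple[Segment, Segment]]:
    """Return the first overlapping pair of neighbours in start order, or None.

    For non-empty ranges, sorting by start means any overlap shows up
    between two neighbouring segments.
    """
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    for previous, current in zip(ordered, ordered[1:]):
        if overlaps(previous, current):
            return previous, current
    return None


# Hypothetical ranges: back-to-back segments are fine, intersecting ones are not.
adjacent = [Segment("seg_1", 0, 10), Segment("seg_2", 10, 20)]
clashing = [Segment("seg_1", 0, 10), Segment("seg_2", 5, 20)]

print(first_overlap(adjacent))
print(first_overlap(clashing))
```

Under this sketch, appending a segment whose range intersects an existing one would be rejected, while a segment that starts exactly where the previous one ends would be accepted.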