Hi Tony,

You're correct: the global dictionary isn't supported in the stream builder (this is the first report of it). Could you please open a JIRA? https://issues.apache.org/jira/secure/Dashboard.jspa
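
To illustrate why a precise count distinct needs a dictionary shared across segments, here is a minimal Python sketch. It is a simplified model under stated assumptions, not Kylin code: Python sets stand in for bitmaps, and `build_dictionary` and `encode` are hypothetical helpers that assign dense integer IDs the way a per-segment dictionary might.

```python
def build_dictionary(values):
    """Assign a dense integer ID to each distinct value, in sorted order."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def encode(values, dictionary):
    """Return the set of integer IDs for the values (a stand-in for a bitmap)."""
    return {dictionary[value] for value in values}


# Two segments with four different users in total (made-up sample data).
segment_a = ["alice", "bob"]
segment_b = ["carol", "dave"]

# Per-segment dictionaries: each segment starts numbering from 0,
# so different users in different segments share the same IDs.
bitmap_a = encode(segment_a, build_dictionary(segment_a))
bitmap_b = encode(segment_b, build_dictionary(segment_b))
per_segment_count = len(bitmap_a | bitmap_b)

# One dictionary shared by all segments: every user gets a unique ID.
global_dict = build_dictionary(segment_a + segment_b)
global_count = len(encode(segment_a, global_dict) | encode(segment_b, global_dict))

print(per_segment_count, global_count)
```

In this model the merged per-segment bitmaps undercount the users, because IDs from one segment collide with IDs from the other, while the shared dictionary gives the true distinct count.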

BTW, we're developing a new version of the streaming engine, which will reuse most of the logic of the batch cubing engine and is planned to roll out in v1.6. I believe there will be no such issue with the new design.

2016-09-26 14:56 GMT+08:00 Tony Lee <btony...@gmail.com>:

> Thanks
>
> But this does not work on a streaming cube.
>
> I read some code and found that in class *StreamingCubeBuilder*, the
> dictionary map is built by *DictionaryGenerator.buildDictionary()*
> instead of *DictionaryManager.buildDictionary()*. Does this mean that
> streaming cubes do not support the global dictionary?
>
> I added USERID to the dimensions, and then the cube was built successfully.
> But I think the result will be incorrect if I calculate count distinct
> across different segments. Is that right?
>
>
> Tony
>
> On Sat, Sep 24, 2016 at 10:29 PM, ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
>> Hi Tony,
>>
>> The error occurred when building a bitmap counter (for distinct
>> count); from your cube descriptor, it seems no global dictionary
>> is specified for the user id column. Please check this blog:
>> https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
>>
>> 2016-09-22 10:49 GMT+08:00 Tony Lee <btony...@gmail.com>:
>>
>>> Thanks, ShaoFeng Shi. That is the reason.
>>>
>>> But unfortunately, I have a new problem with precise count distinct.
>>>
>>> I added a streaming table on version 1.5.4 with my own JSON, which
>>> looks like this:
>>> {
>>>   "logTimestamp":1474456891127,
>>>   "datetime":"2016-09-21 19:21:31",
>>>   "uploadTime":"20160921192023",
>>>   "userId":"f2d28cbf9e21340a49e97063486db1f5",
>>>   "accountId":"84108490",
>>>   "otherfield":"...."
>>> }
>>>
>>> *The error message while building the cube is*
>>>
>>> 2016-09-22 10:01:40,731 ERROR [main StreamingCLI:103]: error start streaming
>>> java.lang.RuntimeException: error build cube from StreamingBatch
>>>     at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.build(StreamingCubeBuilder.java:105)
>>>     at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:79)
>>>     at org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:123)
>>>     at org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:97)
>>> Caused by: java.lang.NullPointerException
>>>     at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:100)
>>>     at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:89)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.buildValueOf(InMemCubeBuilderInputConverter.java:122)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.buildValue(InMemCubeBuilderInputConverter.java:94)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.convert(InMemCubeBuilderInputConverter.java:70)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:542)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:523)
>>>     at org.apache.kylin.gridtable.GTAggregateScanner.iterator(GTAggregateScanner.java:139)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.createBaseCuboid(InMemCubeBuilder.java:339)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:166)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:135)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:122)
>>>     at org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:80)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> *and the cube json is*
>>> {
>>>   "uuid": "db91bcea-b33f-48af-a2f5-6014b14031f4",
>>>   "last_modified": 1474511879506,
>>>   "version": "1.5.4",
>>>   "name": "hot_play_c",
>>>   "model_name": "hot_play_cube",
>>>   "description": "",
>>>   "null_string": null,
>>>   "dimensions": [
>>>     {
>>>       "name": "DEFAULT.HOT_PLAY.HOUR_START",
>>>       "table": "DEFAULT.HOT_PLAY",
>>>       "column": "HOUR_START",
>>>       "derived": null
>>>     },
>>>     {
>>>       "name": "DEFAULT.HOT_PLAY.MINUTE_START",
>>>       "table": "DEFAULT.HOT_PLAY",
>>>       "column": "MINUTE_START",
>>>       "derived": null
>>>     }
>>>   ],
>>>   "measures": [
>>>     {
>>>       "name": "_COUNT_",
>>>       "function": {
>>>         "expression": "COUNT",
>>>         "parameter": {
>>>           "type": "constant",
>>>           "value": "1",
>>>           "next_parameter": null
>>>         },
>>>         "returntype": "bigint"
>>>       },
>>>       "dependent_measure_ref": null
>>>     },
>>>     {
>>>       "name": "COUNT_DISTINCT_USER",
>>>       "function": {
>>>         "expression": "COUNT_DISTINCT",
>>>         "parameter": {
>>>           "type": "column",
>>>           "value": "USERID",
>>>           "next_parameter": null
>>>         },
>>>         "returntype": "bitmap"
>>>       },
>>>       "dependent_measure_ref": null
>>>     }
>>>   ],
>>>   "dictionaries": [],
>>>   "rowkey": {
>>>     "rowkey_columns": [
>>>       {
>>>         "column": "HOUR_START",
>>>         "encoding": "time",
>>>         "isShardBy": false
>>>       },
>>>       {
>>>         "column": "MINUTE_START",
>>>         "encoding": "time",
>>>         "isShardBy": false
>>>       }
>>>     ]
>>>   },
>>>   "hbase_mapping": {
>>>     "column_family": [
>>>       {
>>>         "name": "F1",
>>>         "columns": [
>>>           {
>>>             "qualifier": "M",
>>>             "measure_refs": [
>>>               "_COUNT_"
>>>             ]
>>>           }
>>>         ]
>>>       },
>>>       {
>>>         "name": "F2",
>>>         "columns": [
>>>           {
>>>             "qualifier": "M",
>>>             "measure_refs": [
>>>               "COUNT_DISTINCT_USER"
>>>             ]
>>>           }
>>>         ]
>>>       }
>>>     ]
>>>   },
>>>   "aggregation_groups": [
>>>     {
>>>       "includes": [
>>>         "HOUR_START",
>>>         "MINUTE_START"
>>>       ],
>>>       "select_rule": {
>>>         "hierarchy_dims": [],
>>>         "mandatory_dims": [],
>>>         "joint_dims": []
>>>       }
>>>     }
>>>   ],
>>>   "signature": "QXddyWCVVCYQcozxd4Zh2w==",
>>>   "notify_list": [],
>>>   "status_need_notify": [
>>>     "ERROR",
>>>     "DISCARDED",
>>>     "SUCCEED"
>>>   ],
>>>   "partition_date_start": 0,
>>>   "partition_date_end": 3153600000000,
>>>   "auto_merge_time_ranges": [
>>>     604800000,
>>>     2419200000
>>>   ],
>>>   "retention_range": 0,
>>>   "engine_type": 2,
>>>   "storage_type": 2,
>>>   "override_kylin_properties": {}
>>> }
>>>
>>> *No error after I change the returntype to hllc(16).*
>>>
>>> *I have struggled for several days. Any hints about this?*
>>>
>>> On Wed, Sep 21, 2016 at 10:47 PM, ShaoFeng Shi <shaofeng...@apache.org>
>>> wrote:
>>>
>>>> Hi Tony,
>>>>
>>>> It seems your cube isn't partitioned (no partition date column
>>>> specified); please check or provide the cube JSON.
>>>>
>>>> 2016-09-21 0:30 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>>
>>>>> I don't know, but can you check this change: KYLIN-1744
>>>>> <https://issues.apache.org/jira/browse/KYLIN-1744> in V1.3?
>>>>>
>>>>>
>>>>> 2016-09-20 14:50 GMT+02:00 Tony Lee <btony...@gmail.com>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was building a cube from a stream as the document
>>>>>> (http://kylin.apache.org/docs15/tutorial/cube_streaming.html) says.
>>>>>>
>>>>>> I was using 1.5.3 and encountered this error. Same error on 1.5.4.
>>>>>> Everything is fine on 1.5.2.1.
>>>>>>
>>>>>> Any idea how to solve this?
>>>>>>
>>>>>>
>>>>>> 2016-09-20 20:31:51,520 INFO [main KafkaStreamingInput:129]: finish to get streaming batch, total message count:30
>>>>>> 2016-09-20 20:31:51,532 DEBUG [main CubeManager:855]: Reloaded new cube: STREAMING_CUBE with reference beingCUBE[name=STREAMING_CUBE] having 1 segments:KYLIN_2822I1W3CX
>>>>>> 2016-09-20 20:31:51,536 INFO [main CubeManager:314]: Updating cube instance 'STREAMING_CUBE'
>>>>>> 2016-09-20 20:31:51,538 WARN [main StreamingCLI:127]: invalid args:streaming start STREAMING_CUBE 1474374540000_1474374600000 -start 1474374540000 -end 1474374600000 -cube STREAMING_CUBE
>>>>>> 2016-09-20 20:31:51,539 ERROR [main StreamingCLI:103]: error start streaming
>>>>>> java.lang.IllegalStateException: Segments overlap: STREAMING_CUBE[FULL_BUILD] and STREAMING_CUBE[FULL_BUILD]
>>>>>>     at org.apache.kylin.cube.CubeValidator.validate(CubeValidator.java:85)
>>>>>>     at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:358)
>>>>>>     at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:301)
>>>>>>     at org.apache.kylin.cube.CubeManager.appendSegment(CubeManager.java:441)
>>>>>>     at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.createBuildable(StreamingCubeBuilder.java:118)
>>>>>>     at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:76)
>>>>>>     at org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:123)
>>>>>>     at org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:97)
>>>>>> 2016-09-20 20:31:51,543 INFO [Thread-0 ConnectionManager$HConnectionImplementation:1678]: Closing zookeeper sessionid=0x35708fbc2740013
>>>>>> 2016-09-20 20:31:51,549 INFO [Thread-0 ZooKeeper:684]: Session: 0x35708fbc2740013 closed
>>>>>> 2016-09-20 20:31:51,549 INFO [main-EventThread ClientCnxn:512]: EventThread shut down
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>

--
Best regards,

Shaofeng Shi 史少锋