[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-griffin/pull/389 ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209132757 --- Diff: service/src/main/java/org/apache/griffin/core/job/JobServiceImpl.java --- @@ -551,18 +551,19 @@ public String getJobHdfsPersistPath(String jobName, long timestamp) { return null; } if (jobList.get(0).getType().toLowerCase().equals("batch")) { -return getPersistPath(ENV_BATCH) + "/" + jobName + "/" + timestamp + ""; +return getSinksPath(ENV_BATCH) + "/" + jobName + "/" + timestamp + ""; } -return getPersistPath(ENV_STREAMING) + "/" + jobName + "/" + timestamp + ""; +return getSinksPath(ENV_STREAMING) + "/" + jobName + "/" + timestamp + ""; } -private String getPersistPath(String jsonString) { +private String getSinksPath(String jsonString) { try { JSONObject obj = new JSONObject(jsonString); JSONArray persistArray = obj.getJSONArray("persist"); --- End diff -- "persist" should change to "sinks" for the new format of env.json ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209130221 --- Diff: service/src/main/resources/env/env_streaming.json --- @@ -3,10 +3,10 @@ "log.level": "WARN", "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}", "init.clear": true, -"batch.interval": "30s", -"process.interval": "3m", +"batch.interval": "2s", +"process.interval": "10s", "config": { - "spark.default.parallelism": 4, --- End diff -- I prefer keep this line ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209132142 --- Diff: service/src/main/java/org/apache/griffin/core/measure/entity/Rule.java --- @@ -49,14 +45,18 @@ Licensed to the Apache Software Foundation (ASF) under one @NotNull private String dslType; --- End diff -- I suggest set dslType as enum too, containing "spark-sql", "df-ops" and "griffin-dsl" ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209129596 --- Diff: service/src/main/resources/env/env_streaming.json --- @@ -3,10 +3,10 @@ "log.level": "WARN", "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}", "init.clear": true, -"batch.interval": "30s", -"process.interval": "3m", +"batch.interval": "2s", +"process.interval": "10s", "config": { - "spark.default.parallelism": 4, + "spark.master": "local[*]", --- End diff -- this line should be removed ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209130016 --- Diff: service/src/main/resources/env/env_streaming.json --- @@ -3,10 +3,10 @@ "log.level": "WARN", "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}", "init.clear": true, -"batch.interval": "30s", -"process.interval": "3m", +"batch.interval": "2s", +"process.interval": "10s", --- End diff -- I suggest configure like this: ``` "batch.interval": "1m", "process.interval": "5m", ``` ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209130355 --- Diff: service/src/main/resources/env/env_streaming.json --- @@ -17,40 +17,40 @@ "spark.hadoop.fs.hdfs.impl.disable.cache": true } }, - "persist": [ + "sinks": [ { - "type": "log", + "type": "CONSOLE", "config": { -"max.log.lines": 2 +"max.log.lines": 100 } }, { - "type": "hdfs", + "type": "HDFS", "config": { -"path": "hdfs:///griffin/persist" +"path": "hdfs:///griffin/persist", +"max.persist.lines": 1, +"max.lines.per.file": 1 } }, { - "type": "http", + "type": "ELASTICSEARCH", "config": { "method": "post", "api": "http://es:9200/griffin/accuracy"; } } ], - "info.cache": [ + "griffin.checkpoint": [ { "type": "zk", "config": { "hosts": "zk:2181", "namespace": "griffin/infocache", "lock.path": "lock", "mode": "persist", -"init.clear": false, +"init.clear": true, --- End diff -- should configure "init.clear" as false for production usage. ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209127874 --- Diff: service/src/main/java/org/apache/griffin/core/job/entity/JobDataSegment.java --- @@ -43,19 +43,19 @@ Licensed to the Apache Software Foundation (ASF) under one @NotNull private String dataConnectorName; -private boolean baseline = false; +private boolean asTmsBaseline = false; --- End diff -- I prefer the field name to be "asTsBaseline" ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209128145 --- Diff: service/src/main/java/org/apache/griffin/core/measure/entity/Measure.java --- @@ -104,6 +129,26 @@ public void setDeleted(boolean deleted) { this.deleted = deleted; } +@PrePersist +@PreUpdate +public void save() throws JsonProcessingException { +if (sinksList != null) { +this.sinks = JsonUtil.toJson(sinksList); +} else { +this.sinks = null; +} +} + +@PostLoad +public void load() throws IOException { +if (!StringUtils.isEmpty(sinks)) { +this.sinksList = JsonUtil.toEntity(sinks, new TypeReference>>() { --- End diff -- sinksList should be a `List`, why is it `List>` here? ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
Github user bhlx3lyx7 commented on a diff in the pull request: https://github.com/apache/incubator-griffin/pull/389#discussion_r209129552 --- Diff: service/src/main/resources/env/env_batch.json --- @@ -1,55 +1,34 @@ { "spark": { "log.level": "WARN", -"checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}", -"batch.interval": "20s", -"process.interval": "1m", "config": { - "spark.default.parallelism": 4, - "spark.task.maxFailures": 5, - "spark.streaming.kafkaMaxRatePerPartition": 1000, - "spark.streaming.concurrentJobs": 4, - "spark.yarn.maxAppAttempts": 5, - "spark.yarn.am.attemptFailuresValidityInterval": "1h", - "spark.yarn.max.executor.failures": 120, - "spark.yarn.executor.failuresValidityInterval": "1h", - "spark.hadoop.fs.hdfs.impl.disable.cache": true + "spark.master": "local[*]" --- End diff -- this line should be removed, you can also remove the whole "config" field here ---
[GitHub] incubator-griffin pull request #389: update measure field to support new for...
GitHub user ahutsunshine opened a pull request: https://github.com/apache/incubator-griffin/pull/389 update measure field to support new format and ut 1.update env_batch.json and env_streaming.json 2.Rule - add "inDataFrameName" and "outDataFrameName", remove "name" - add "out" param array, move "metric", "record" param inside "out" array DataSource - add boolean field "baseline" - change "cache" to "checkpoint" DataConnector - add "dataFrameName" Measure - add "sinks" string array - update dqType from String to enum JobServiceImpl - change "persist" to "sinks" - compare literal string "hdfs" case insensitively 3.update measure ut and fix predicate ut bug You can merge this pull request into a Git repository by running: $ git pull https://github.com/ahutsunshine/incubator-griffin master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-griffin/pull/389.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #389 commit a1c4d2d9efecd79f1c0207d2787d99960ac6a7f3 Author: ahutsunshine Date: 2018-08-09T19:09:37Z update measure field and ut ---