[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-griffin/pull/389


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209132757
  
--- Diff: 
service/src/main/java/org/apache/griffin/core/job/JobServiceImpl.java ---
@@ -551,18 +551,19 @@ public String getJobHdfsPersistPath(String jobName, 
long timestamp) {
 return null;
 }
 if (jobList.get(0).getType().toLowerCase().equals("batch")) {
-return getPersistPath(ENV_BATCH) + "/" + jobName + "/" + 
timestamp + "";
+return getSinksPath(ENV_BATCH) + "/" + jobName + "/" + 
timestamp + "";
 }
 
-return getPersistPath(ENV_STREAMING) + "/" + jobName + "/" + 
timestamp + "";
+return getSinksPath(ENV_STREAMING) + "/" + jobName + "/" + 
timestamp + "";
 }
 
-private String getPersistPath(String jsonString) {
+private String getSinksPath(String jsonString) {
 try {
 JSONObject obj = new JSONObject(jsonString);
 JSONArray persistArray = obj.getJSONArray("persist");
--- End diff --

"persist" should change to "sinks" for the new format of env.json


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209130221
  
--- Diff: service/src/main/resources/env/env_streaming.json ---
@@ -3,10 +3,10 @@
 "log.level": "WARN",
 "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}",
 "init.clear": true,
-"batch.interval": "30s",
-"process.interval": "3m",
+"batch.interval": "2s",
+"process.interval": "10s",
 "config": {
-  "spark.default.parallelism": 4,
--- End diff --

I prefer keep this line


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209132142
  
--- Diff: 
service/src/main/java/org/apache/griffin/core/measure/entity/Rule.java ---
@@ -49,14 +45,18 @@ Licensed to the Apache Software Foundation (ASF) under 
one
 @NotNull
 private String dslType;
--- End diff --

I suggest set dslType as enum too, containing "spark-sql", "df-ops" and 
"griffin-dsl"


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209129596
  
--- Diff: service/src/main/resources/env/env_streaming.json ---
@@ -3,10 +3,10 @@
 "log.level": "WARN",
 "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}",
 "init.clear": true,
-"batch.interval": "30s",
-"process.interval": "3m",
+"batch.interval": "2s",
+"process.interval": "10s",
 "config": {
-  "spark.default.parallelism": 4,
+  "spark.master": "local[*]",
--- End diff --

this line should be removed


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209130016
  
--- Diff: service/src/main/resources/env/env_streaming.json ---
@@ -3,10 +3,10 @@
 "log.level": "WARN",
 "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}",
 "init.clear": true,
-"batch.interval": "30s",
-"process.interval": "3m",
+"batch.interval": "2s",
+"process.interval": "10s",
--- End diff --

I suggest configure like this:
```
"batch.interval": "1m",
"process.interval": "5m",
```



---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209130355
  
--- Diff: service/src/main/resources/env/env_streaming.json ---
@@ -17,40 +17,40 @@
   "spark.hadoop.fs.hdfs.impl.disable.cache": true
 }
   },
-  "persist": [
+  "sinks": [
 {
-  "type": "log",
+  "type": "CONSOLE",
   "config": {
-"max.log.lines": 2
+"max.log.lines": 100
   }
 },
 {
-  "type": "hdfs",
+  "type": "HDFS",
   "config": {
-"path": "hdfs:///griffin/persist"
+"path": "hdfs:///griffin/persist",
+"max.persist.lines": 1,
+"max.lines.per.file": 1
   }
 },
 {
-  "type": "http",
+  "type": "ELASTICSEARCH",
   "config": {
 "method": "post",
 "api": "http://es:9200/griffin/accuracy";
   }
 }
   ],
-  "info.cache": [
+  "griffin.checkpoint": [
 {
   "type": "zk",
   "config": {
 "hosts": "zk:2181",
 "namespace": "griffin/infocache",
 "lock.path": "lock",
 "mode": "persist",
-"init.clear": false,
+"init.clear": true,
--- End diff --

should configure "init.clear" as false for production usage.


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209127874
  
--- Diff: 
service/src/main/java/org/apache/griffin/core/job/entity/JobDataSegment.java ---
@@ -43,19 +43,19 @@ Licensed to the Apache Software Foundation (ASF) under 
one
 @NotNull
 private String dataConnectorName;
 
-private boolean baseline = false;
+private boolean asTmsBaseline = false;
--- End diff --

I prefer the field name to be "asTsBaseline"


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209128145
  
--- Diff: 
service/src/main/java/org/apache/griffin/core/measure/entity/Measure.java ---
@@ -104,6 +129,26 @@ public void setDeleted(boolean deleted) {
 this.deleted = deleted;
 }
 
+@PrePersist
+@PreUpdate
+public void save() throws JsonProcessingException {
+if (sinksList != null) {
+this.sinks = JsonUtil.toJson(sinksList);
+} else {
+this.sinks = null;
+}
+}
+
+@PostLoad
+public void load() throws IOException {
+if (!StringUtils.isEmpty(sinks)) {
+this.sinksList = JsonUtil.toEntity(sinks, new 
TypeReference>>() {
--- End diff --

sinksList should be a `List`, why is it `List>` 
here?


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread bhlx3lyx7
Github user bhlx3lyx7 commented on a diff in the pull request:

https://github.com/apache/incubator-griffin/pull/389#discussion_r209129552
  
--- Diff: service/src/main/resources/env/env_batch.json ---
@@ -1,55 +1,34 @@
 {
   "spark": {
 "log.level": "WARN",
-"checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}",
-"batch.interval": "20s",
-"process.interval": "1m",
 "config": {
-  "spark.default.parallelism": 4,
-  "spark.task.maxFailures": 5,
-  "spark.streaming.kafkaMaxRatePerPartition": 1000,
-  "spark.streaming.concurrentJobs": 4,
-  "spark.yarn.maxAppAttempts": 5,
-  "spark.yarn.am.attemptFailuresValidityInterval": "1h",
-  "spark.yarn.max.executor.failures": 120,
-  "spark.yarn.executor.failuresValidityInterval": "1h",
-  "spark.hadoop.fs.hdfs.impl.disable.cache": true
+  "spark.master": "local[*]"
--- End diff --

this line should be removed, you can also remove the whole "config" field 
here


---


[GitHub] incubator-griffin pull request #389: update measure field to support new for...

2018-08-09 Thread ahutsunshine
GitHub user ahutsunshine opened a pull request:

https://github.com/apache/incubator-griffin/pull/389

update measure field to support new format and ut

1.update env_batch.json and env_streaming.json
2.Rule
- add "inDataFrameName" and "outDataFrameName", remove "name"
-  add "out" param array, move "metric", "record" param inside "out" array
DataSource
- add boolean field "baseline"
- change "cache" to "checkpoint"
DataConnector
- add "dataFrameName"
Measure
- add "sinks" string array
- update dqType from String to enum
JobServiceImpl
- change "persist" to "sinks"
-  compare literal string "hdfs" case insensitively

3.update measure ut and fix predicate ut bug

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ahutsunshine/incubator-griffin master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-griffin/pull/389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #389


commit a1c4d2d9efecd79f1c0207d2787d99960ac6a7f3
Author: ahutsunshine 
Date:   2018-08-09T19:09:37Z

update measure field and ut




---