Hi Vikram,


In your JSON body, I notice that the rule in the "rules" field has no "out" field, 
which means the Griffin measure application will only run the calculation without 
persisting any output. It looks like you only changed "dsl.type" from "griffin-dsl" 
to "spark-sql". For a "griffin-dsl" rule with "dq.type" set to "profiling", we create 
an output for it in the transform phase: 
https://github.com/apache/incubator-griffin/blob/griffin-0.3.0-incubating-rc1/measure/src/main/scala/org/apache/griffin/measure/step/builder/dsl/transform/ProfilingExpr2DQSteps.scala#L97
For a "spark-sql" rule, however, we don't parse the SQL, so we can't infer what its 
output should be; you need to configure the "out" field manually to enable it.
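
As an example, your rule entry could look roughly like the snippet below once an 
"out" field is added. This is only a sketch based on the measure configuration 
guide; please verify the exact field names and values (e.g. "type" and "name") 
against the demo JSON linked below:

    {
      "dsl.type": "spark-sql",
      "dq.type": "PROFILING",
      "out.dataframe.name": "id_count_2",
      "rule": "SELECT count(id) AS cnt, max(age) AS Max_Age FROM demo_src",
      "out": [
        {
          "type": "metric",
          "name": "id_count_2"
        }
      ]
    }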


You can refer to this document for how to configure the "out" field: 
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#rule
Or simply refer to the demo JSON for spark-sql profiling rules:
https://github.com/apache/incubator-griffin/blob/griffin-0.3.0-incubating-rc1/measure/src/test/resources/_profiling-batch-sparksql.json


Hope this helps.


--

Regards,
Lionel, Liu



At 2018-10-11 17:30:29, "Vikram Jain" <vikram.j...@enquero.com> wrote:
>Hello,
>
>I was trying to create a measure and write the rule in Spark-SQL directly 
>instead of Griffin-DSL. I use Postman to create the measure. The measure is 
>created successfully, the job is created and executed successfully.
>
>However, the output metrics of execution of jobs are not persisted in 
>ElasticSearch. The entry is created in Elastic but the "metricValues" array is 
>NULL.
>
>The same SQL query works fine directly on Spark-Shell.
>
>I am not using Docker; I built the environment (Griffin 0.3.0) on my local 
>machine. All the measures created using the UI execute well, and measures 
>created using Postman with a griffin-dsl rule also work well.
>
>Below is the body of json which I am passing to add measure API call from 
>Postman. Please help me understand what is going wrong.
>
>
>{
>   "name": "custom_profiling_measure_2",
>   "measure.type": "griffin",
>   "dq.type": "PROFILING",
>   "rule.description": {
>     "details": [
>       {
>         "name": "id",
>         "infos": "Total Count"
>       }
>     ]
>   },
>   "process.type": "BATCH",
>   "owner": "test",
>   "description": "custom_profiling_measure_2",
>   "data.sources": [
>     {
>       "name": "source",
>       "connectors": [
>         {
>           "name": "source123",
>           "type": "HIVE",
>           "version": "1.2",
>           "data.unit": "1day",
>           "data.time.zone": "",
>           "config": {
>             "database": "default",
>             "table.name": "demo_src",
>             "where": ""
>           }
>         }
>       ]
>     }
>   ],
>   "evaluate.rule": {
>     "out.dataframe.name": "profiling_2",
>     "rules": [
>       {
>         "dsl.type": "spark-sql",
>         "dq.type": "PROFILING",
>         "rule": "SELECT count(id) AS cnt, max(age) AS Max_Age from demo_src",
>         "out.dataframe.name": "id_count_2"
>       }
>     ]
>   }
>}
>
>
>
>
>
>Regards,
>
>Vikram
>
