from:"Lionel Liu"

[jira] [Commented] (GRIFFIN-211) [Service] JobInstance appUrl error

2018-11-15 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688896#comment-16688896
 ] 

Lionel Liu commented on GRIFFIN-211:


I think the space characters inside the URL are a bug; we need to fix this.

> [Service] JobInstance appUrl error
> --
>
> Key: GRIFFIN-211
> URL: https://issues.apache.org/jira/browse/GRIFFIN-211
> Project: Griffin (Incubating)
> Issue Type: Bug
> Reporter: xiangrong,chen
> Assignee: xiangrong,chen
> Priority: Major
> Attachments: WeChat Image_20181116093345.png
>
>
> When I try to get a job instance by calling the API 
> "http://localhost:8080/api/v1/jobs/instances?jobId=8&page=0&size=10", 
> the response is:
> {
>  "id": 212,
>  "sessionId": 53,
>  "state": "SUCCESS",
>  "type": "BATCH",
>  "appId": "application_1542252279758_0053",
>  "appUri": "http://griffin:8088 /cluster/app/ application_1542252279758_0053",
>  "predicateGroup": "PG",
>  "predicateName": "accu-job1_predicate_154233028",
>  "timestamp": 154233028,
>  "expireTimestamp": 1542934800028
>  },
> I found 2 blanks, one on each side of '/cluster/app/', and they make this 
> URL unusable.
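
For illustration only, here is a minimal sketch of one way to keep such blanks out of the URI when it is assembled. The class and method names (AppUriBuilder, buildAppUri) are hypothetical and are not taken from the Griffin service code; the sketch simply trims each part before joining them.

```java
// Hypothetical sketch: AppUriBuilder and buildAppUri are illustrative names,
// not Griffin's actual implementation.
class AppUriBuilder {

    // Joins the YARN base URL and the application id without stray whitespace.
    static String buildAppUri(String yarnBaseUrl, String appId) {
        String base = yarnBaseUrl.trim();
        // Drop a trailing slash so the path separator is not doubled.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base + "/cluster/app/" + appId.trim();
    }

    public static void main(String[] args) {
        // Inputs carry the kind of blanks reported in this issue.
        System.out.println(buildAppUri("http://griffin:8088 ", " application_1542252279758_0053"));
    }
}
```

Trimming at the point where the parts are joined is only one option; the blanks could equally be removed where the configured values are read.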



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] Build another release for 0.4.0

2018-11-14 Thread Lionel Liu

I agree. We've made several enhancements since the last release; let's do this.

Thanks,
Lionel

On Wed, Nov 14, 2018 at 10:11 AM Eugene Liu  wrote:

> I agree.
>
> It's time to go to the next iteration
>
> Thanks
> Eugene
> 
> From: William Guo 
> Sent: Wednesday, November 14, 2018 10:06 AM
> To: dev@griffin.incubator.apache.org
> Subject: [DISCUSS] Build another release for 0.4.0
>
> hi all,
>
> We have implemented several features and fixed a lot of bugs recently.
>
> I think it is time for Apache Griffin to build the 0.4.0 release. What do you
> think?
>
>
> Thanks,
> William
>

[jira] [Created] (GRIFFIN-209) [Measure] In paramUtil, the util function getParamStringMap doesn't work as expected

2018-11-13 Thread Lionel Liu (JIRA)

Lionel Liu created GRIFFIN-209:
--

Summary: [Measure] In paramUtil, the util function getParamStringMap doesn't work as expected
Key: GRIFFIN-209
URL: https://issues.apache.org/jira/browse/GRIFFIN-209
Project: Griffin (Incubating)
Issue Type: Bug
Reporter: Lionel Liu
Assignee: Lionel Liu
Fix For: 0.3.1-incubating


Need to update paramUtil, to make getParamStringMap and the other util 
functions work as expected.




Re: Hooks support in JobService

2018-10-22 Thread Lionel Liu

LGTM, we can start this.


Thanks,
Lionel
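
As a starting point for the prototype, the design quoted below could be sketched roughly as follows. This is only a sketch under assumptions: GriffinHook, GriffinHookEvent, onEvent and the event family names come from the proposal, while HookSketch, JobAuditHook and fire are hypothetical, and reading the ordered hook list from the properties file and creating the hooks as Spring beans is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: GriffinHook, GriffinHookEvent and onEvent follow the proposal;
// HookSketch, JobAuditHook and fire are hypothetical names.
class HookSketch {

    // Root of the event hierarchy described in the proposal.
    static abstract class GriffinHookEvent {}

    static class JobEvent extends GriffinHookEvent {
        final String jobName;
        JobEvent(String jobName) { this.jobName = jobName; }
    }

    static class MeasureEvent extends GriffinHookEvent {}

    // Single-method hook interface.
    interface GriffinHook {
        void onEvent(GriffinHookEvent event);
    }

    // A hook that reacts to job events only and ignores everything else.
    static class JobAuditHook implements GriffinHook {
        final List<String> seen = new ArrayList<>();

        @Override
        public void onEvent(GriffinHookEvent event) {
            if (event instanceof JobEvent) {
                seen.add(((JobEvent) event).jobName);
            }
        }
    }

    // Calls every enabled hook in the configured order.
    static void fire(List<GriffinHook> hooksInOrder, GriffinHookEvent event) {
        for (GriffinHook hook : hooksInOrder) {
            hook.onEvent(event);
        }
    }

    public static void main(String[] args) {
        JobAuditHook audit = new JobAuditHook();
        List<GriffinHook> hooks = new ArrayList<>();
        hooks.add(audit);
        fire(hooks, new JobEvent("accu-job1"));
        fire(hooks, new MeasureEvent()); // ignored by JobAuditHook
        System.out.println(audit.seen);
    }
}
```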

On Mon, Oct 22, 2018 at 8:58 PM Eugene Liu  wrote:

> I'm fine with this hook proposal.
>
> One comment: could we define an annotation to take a list of hook class
> names as well?
>
> 
> From: William Guo 
> Sent: Monday, October 22, 2018 3:42 PM
> To: dev@griffin.incubator.apache.org
> Subject: Re: Hooks support in JobService
>
> Sounds good to me.
>
>    - application.properties would have a property containing list of hook
>      class names (or probably, spring bean names?), configuring subset and
>      order of hook instances enabled at deployment time.
> [William] I like a logical name, such as the Spring bean name.
>
>    - Hook instantiation mechanism should create them as proper spring
>      beans, so that they could use same property file for configuration, and
>      access existing repositories and services. At the same time, only
>      explicitly enabled hooks should be instantiated, to avoid initializing
>      any optional heavy hooks each time.
> [William] We can launch our hook mechanism by leveraging the Spring bean
> instantiation mechanism.
>
>    - Hooks would be implementing GriffinHook interface, and particular
>      implementation would use any kind of internal logic, from spring
>      integration to some asynchronous handling.
> [William] Sounds good to me.
>
>    - Hook interface would have single method accepting subclass of
>      GriffinHookEvent. Hook implementation would pick events it would like to
>      react on, and ignore the others. To give it some structure, events would
>      be organized in hierarchy, like JobEvents, JobInstanceEvents,
>      MeasureEvents, etc.
> [William] Events are common parts for the Event driven flow.
>
>    - New extension points would be added by introducing new types of events
>      corresponding to some point inside griffin services. Service instance
>      would be creating corresponding event object, and call onEvent method on
>      all hook instances defined in .properties file, according to defined
>      order.
> [William] Let's start with basic events and get them running in a prototype;
> we can refine this later.
>
>
>
> On Mon, Oct 22, 2018 at 2:51 PM Nick Sokolov  wrote:
>
> > I think this is best detail I can give right now without writing any
> > code:
> >
> >    - application.properties would have a property containing list of hook
> >      class names (or probably, spring bean names?), configuring subset and
> >      order of hook instances enabled at deployment time.
> >    - Hook instantiation mechanism should create them as proper spring
> >      beans, so that they could use same property file for configuration,
> >      and access existing repositories and services. At the same time, only
> >      explicitly enabled hooks should be instantiated, to avoid initializing
> >      any optional heavy hooks each time.
> >    - Hooks would be implementing GriffinHook interface, and particular
> >      implementation would use any kind of internal logic, from spring
> >      integration to some asynchronous handling.
> >    - Hook interface would have single method accepting subclass of
> >      GriffinHookEvent. Hook implementation would pick events it would like
> >      to react on, and ignore the others. To give it some structure, events
> >      would be organized in hierarchy, like JobEvents, JobInstanceEvents,
> >      MeasureEvents, etc.
> >    - New extension points would be added by introducing new types of
> >      events corresponding to some point inside griffin services. Service
> >      instance would be creating corresponding event object, and call
> >      onEvent method on all hook instances defined in .properties file,
> >      according to defined order.
> >
> > Probably it's time to build some technical prototype.
> >
> > On Sun, Oct 21, 2018 at 6:46 PM Eugene Liu  wrote:
> >
> > > Guys,
> > >
> > > you have agreed on an integration style that keeps two potential
> > > solutions, Hive/Spring, and leaves the final decision to users. I'm fine
> > > with this agreement.
> > >
> > > But do you have a comprehensive design schema? I think the discussion of
> > > this topic is not closed in jira; we should stay on the same page about
> > > event/message/sync/async details.
> > >
> > > thanks
> > > 
> > > From: William Guo 
> > > Sent: Monday, October 22, 2018 8:26 AM
> > > To: dev@griffin.incubator.apache.org
> > > Subject: Re: Hooks support in JobService
> > >
> > > Sounds good to me.
> > >
> > > You can implement your preferred hive-style, I can implement spring
> > > integration based on your interface.
> > >
> > > William
> > >
> > > On Mon, Oct 22, 2018 at 2:10 AM Nick Sokolov wrote:
> > >
> > > > Hi William,
> > > >
> > > > Totally agree on keeping interface as abstract as possible.
> > > > Interfaces feel like natural solution here.
> > > > It should be relatively easy to provide spring integration
> > > > implementation of hook interface.
> > > > Probably we can even have

[jira] [Commented] (GRIFFIN-200) Lifecycle hooks support

2018-10-19 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656370#comment-16656370
 ] 

Lionel Liu commented on GRIFFIN-200:


I've finished reading all of your discussion here; it is really a long thread.

I like the idea of a griffin job lifecycle hook mechanism. It will help if users
need some extra processing before or after a griffin job
create/start/pause/stop/delete action. Synchronous semantics seem better for
hook processes in lifecycles, e.g. if I add a hook before a griffin job starts,
I expect the job to start only after the hook process ends. Maybe we can use
synchronous semantics by default, and let users opt into asynchronous
execution through another parameter, another function, or something like that.

On the other hand, I think we can also leverage a similar mechanism for the job
instance lifecycle. Some potential use cases might be:
- before each job instance starts, check whether the data source files or some
mark files exist, to enable the job.
- before each job instance starts, move the data source file to the specified
place first.
- after a job instance ends, remove the data source files.
- after a job instance ends, the result files (like the missing records in the
accuracy measure) need to be the data source file of another job instance; the
hook there could help to trigger the next job.

Maybe these examples are not all appropriate, but I believe that some of them
are useful.
Actually, the first example, checking some mark files before a job instance
starts, is currently implemented in the griffin service, in an inappropriate
way. This is a good chance to do it the right way.
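
To make the synchronous-by-default idea concrete, here is a minimal sketch. JobLifecycleHook, JobLifecycleEvent and onJobEvent follow the signature proposed in the issue below; BeforeJobStarted, dispatchSync and dispatchAsync are hypothetical names used only for illustration, and the asynchronous variant is just one possible way to offer the option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch only: BeforeJobStarted, dispatchSync and dispatchAsync are hypothetical.
class LifecycleDispatchSketch {

    interface JobLifecycleEvent {}

    static class BeforeJobStarted implements JobLifecycleEvent {}

    interface JobLifecycleHook {
        void onJobEvent(JobLifecycleEvent event) throws Exception;
    }

    // Synchronous dispatch: returns only after every hook has finished, and a
    // hook failure propagates so the caller can decide not to run the action.
    static void dispatchSync(List<JobLifecycleHook> hooks, JobLifecycleEvent event) throws Exception {
        for (JobLifecycleHook hook : hooks) {
            hook.onJobEvent(event);
        }
    }

    // Asynchronous variant: runs the same hooks on a background thread; the
    // caller can continue immediately or wait on the returned future.
    static CompletableFuture<Void> dispatchAsync(List<JobLifecycleHook> hooks, JobLifecycleEvent event) {
        return CompletableFuture.runAsync(() -> {
            try {
                dispatchSync(hooks, event);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        List<JobLifecycleHook> hooks = new ArrayList<>();
        hooks.add(event -> log.add("hook finished"));

        // The job "starts" only after the synchronous dispatch returns.
        dispatchSync(hooks, new BeforeJobStarted());
        log.add("job started");
        System.out.println(log);
    }
}
```

With the synchronous call, the ordering "hook ends, then job starts" is guaranteed by plain control flow; with the asynchronous call, the caller has to wait on the future if it needs that ordering.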

> Lifecycle hooks support
> ---
>
> Key: GRIFFIN-200
> URL: https://issues.apache.org/jira/browse/GRIFFIN-200
> Project: Griffin (Incubating)
> Issue Type: New Feature
> Reporter: Nikolay Sokolov
> Assignee: William Guo
> Priority: Minor
>
> In some environments, users might want to perform certain actions 
> before/after job is created, before/after job is activated, before/after job 
> is deleted, and so on.
> To fulfill that need, some hook plugin mechanism can be provided, similar to 
> what Hive is doing. User would place respective jar files into Service module 
> classpath at deployment time, and would specify class names using some 
> annotation or using property listing class names (particular mechanism is yet 
> to be determined).
> Proposed signature:
> {code:none}
> public interface JobLifecycleHook {
>     void onJobEvent(JobLifecycleEvent event) throws Exception;
> }
> public class BeforeJobCreated implements JobLifecycleEvent { ... }
> public class AfterJobCreated implements JobLifecycleEvent { ... }
> public class BeforeJobDeleted implements JobLifecycleEvent { ... }
> public class AfterJobDeleted implements JobLifecycleEvent { ... }
> {code}




Re: [DISCUSS] Graduate Apache Griffin (incubating) as a TLP

2018-10-18 Thread Lionel Liu

Hi all,

We've updated the Podling Suitable Name Search ticket:
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-148

BTW, I think we can remove the "creation of a set of bylaws" line, to just
follow the ASF bylaws.

Thanks,
Lionel

On Thu, Oct 18, 2018 at 10:37 AM Henry Saputra wrote:

> Hi Dave,
>
> You are right. Will wait until the name search has been resolved.
>
> - Henry
>
> On Wed, Oct 17, 2018 at 8:46 AM Dave Fisher  wrote:
>
> > Hi -
> >
> > The Podling Suitable Name Search is unresolved:
> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-148
> >
> > Note that links are given, but some remarks about the results are needed.
> >
> > Regards,
> > Dave
> >
> > > On Oct 17, 2018, at 8:35 AM, Henry Saputra wrote:
> > >
> > > Thanks for driving the discussion, William.
> > >
> > > As one of the mentors of the podling, I support the graduation proposal
> > > and am looking forward to the VOTE thread.
> > >
> > > - Henry
> > >
> > > On Mon, Oct 15, 2018 at 5:51 PM William Guo  wrote:
> > >
> > >> Hi all,
> > >>
> > >> After an enthusiastic discussion with the community:
> > >>
> > >> https://lists.apache.org/thread.html/ba389dd1f7a9e8c82912d4dbf06abceda5461429806f5f7b112fc05d@%3Cdev.griffin.apache.org%3E
> > >> culminating with a positive vote:
> > >>
> > >> https://lists.apache.org/thread.html/df9c3a36d66c140b82ae655d49e6d3437564cb895724689ff266d954@%3Cdev.griffin.apache.org%3E
> > >>
> > >> https://lists.apache.org/thread.html/3fdf92ae3510b6bca3ac25f51200c0b1e48a4ea1185945733259a7b7@%3Cdev.griffin.apache.org%3E
> > >>
> > >> We'd like to bring this to a discussion at the IPMC.
> > >>
> > >> Please see the proposed resolution below and let us know what you
> > >> think.
> > >> A few stats to help with the discussion:
> > >>
> > >> Now we have the developer team[1,2] from
> > >> eBay, VMWARE, NetEase, Pingan Bank, Huawei,
> > >> Grid Dynamics, Paypal, Alipay, Yitu, JD, Ontario Institute for Cancer
> > >> Research.
> > >>
> > >> 458 commits on development of the project.
> > >> 437 PRs on the Github
> > >> 31 contributors
> > >> 206+ issues created
> > >> 180+ issues resolved
> > >>
> > >> dev has 76 subscribers
> > >> 285 emails sent by 24 people, divided into 139 topics in the last
> > >> month.[3]
> > >>
> > >> Please check out the Apache Maturity Model Assessment for Griffin[4] for
> > >> more information.
> > >>
> > >> [1]http://griffin.incubator.apache.org/docs/community.html
> > >> [2]http://griffin.incubator.apache.org/docs/contributors.html
> > >> [3]https://lists.apache.org/trends.html?d...@griffin.apache.org:2018-9
> > >> [4]
> > >> https://cwiki.apache.org/confluence/display/GRIFFIN/ASF+Maturity+Evaluation
> > >>
> > >> Thanks,
> > >> William.
> > >>
> > >>
> > >> Establish the Apache Griffin Project
> > >>
> > >> WHEREAS, the Board of Directors deems it to be in the best
> > >> interests of the Foundation and consistent with the
> > >> Foundation's purpose to establish a Project Management
> > >> Committee charged with the creation and maintenance of
> > >> open-source software, for distribution at no charge to
> > >> the public, related to a data quality solution for big data,
> > >> including both streaming and batch mode. It offers an unified
> > >> process to measure data quality from different perspectives.
> > >>
> > >> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > >> Committee (PMC), to be known as the "Apache Griffin Project",
> > >> be and hereby is established pursuant to Bylaws of the
> > >> Foundation; and be it further
> > >>
> > >> RESOLVED, that the Apache Griffin Project be and hereby is
> > >> responsible for the creation and maintenance of software
> > >> related to a data quality solution for big data,
> > >> including both streaming and batch mode. It offers an unified
> > >> process to measure data qualit

Re: [DISCUSS] Graduate Apache Griffin (incubating) as a TLP

2018-10-16 Thread Lionel Liu

As I've commented on the dev list, I agree with this proposal.

Thanks,
Lionel

On Tue, Oct 16, 2018 at 4:59 PM William Guo  wrote:

> Hi Bertrand,
>
> Thanks for the feedback! We will rephrase this in the coming voting phase.
>
> William
>
> On Tue, Oct 16, 2018 at 4:25 PM Bertrand Delacretaz <
> bdelacre...@codeconsult.ch> wrote:
>
> > Hi,
> >
> > On Tue, Oct 16, 2018 at 2:51 AM William Guo  wrote:
> > > ...to establish a Project Management
> > > Committee charged with the creation and maintenance of
> > > open-source software, for distribution at no charge to
> > > the public, related to a data quality solution for big data,
> > > including both streaming and batch mode. It offers an unified
> > > process to measure data quality from different perspectives...
> >
> > I suggest removing the last phrase, "It offers an unified...", I think
> > it's good to avoid being too specific in these descriptions. That
> > phrase appears twice in the resolution, as usual.
> >
> > Apart from that, +1 on graduation!
> >
> > -Bertrand
> >
>

[jira] [Commented] (GRIFFIN-205) Accuracy measure check should provide matchedFraction to store

2018-10-13 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648899#comment-16648899
 ] 

Lionel Liu commented on GRIFFIN-205:


In my opinion, the "total", "miss" and "matched" counts in accuracy measure
results are raw metrics, which can be aggregated in later calculations. The
"matchedFraction" field, however, is a calculated metric, which cannot be used
in later aggregation. Combining these metrics in the same metric value might
mislead users.
I think it's OK to add the "matchedFraction" field to accuracy metrics, but we
need to clarify the difference between the count metrics and the fraction
metric in the documentation.
BTW, if we add this field in batch mode, it would be better to keep it
consistent in streaming mode as well.
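
A small worked example of why the fraction does not aggregate like the counts. The class and method names are hypothetical and the numbers are made up for illustration: two runs with 90 of 100 and 1 of 10 records matched.

```java
// Sketch only: AccuracyAggregationSketch and its methods are hypothetical names,
// and the sample counts are invented for illustration.
class AccuracyAggregationSketch {

    // Aggregates the raw counts first, then derives one fraction from the sums.
    static double fractionFromCounts(long[] matched, long[] total) {
        long matchedSum = 0;
        long totalSum = 0;
        for (int i = 0; i < total.length; i++) {
            matchedSum += matched[i];
            totalSum += total[i];
        }
        return (double) matchedSum / totalSum;
    }

    // Naively averages the per-run fractions, ignoring how large each run was.
    static double meanOfFractions(long[] matched, long[] total) {
        double sum = 0;
        for (int i = 0; i < total.length; i++) {
            sum += (double) matched[i] / total[i];
        }
        return sum / total.length;
    }

    public static void main(String[] args) {
        long[] matched = {90, 1};
        long[] total = {100, 10};
        System.out.println("from counts:       " + fractionFromCounts(matched, total));
        System.out.println("mean of fractions: " + meanOfFractions(matched, total));
    }
}
```

Summing the counts gives 91 of 110 matched, roughly 0.83, while averaging the two stored fractions gives about 0.5, so a stored "matchedFraction" should not be fed into later aggregation.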

> Accuracy measure check should provide matchedFraction to store
> --
>
> Key: GRIFFIN-205
> URL: https://issues.apache.org/jira/browse/GRIFFIN-205
> Project: Griffin (Incubating)
> Issue Type: Improvement
> Components: accuracy-batch, accuracy-real-time
> Affects Versions: 0.3.1-incubating
> Reporter: Artem Shutak
> Assignee: Artem Shutak
> Priority: Major
>
> Currently, {{accuracy}} measure results contain "total", "miss" and 
> "matched" counts.
> As a result, it's hard to analyze the accuracy fraction based on results stored 
> in ElasticSearch, because ElasticSearch does not provide a straightforward 
> capability to get "field divided by field" query results.
> {{Accuracy}} measure results should also contain a {{matchedFraction}} field. 




RE: [VOTE] Graduate Apache Griffin (incubating) as a TLP

2018-10-12 Thread Lionel, Liu

+1, I agree.

Thanks
Lionel, Liu

From: William Guo
Sent: October 12, 2018 17:39
To: dev@griffin.incubator.apache.org
Subject: [VOTE] Graduate Apache Griffin (incubating) as a TLP

hi all,

Please vote on the proposal for Apache Griffin graduation to TLP to submit
to the Incubator PMC.

Vote:
[ ] +1 - Recommend Graduation of Apache Griffin as a TLP
[ ] -1 - Do not recommend graduation of Apache Griffin because ….

This vote will stay open for at least 72 hours.

At the mentors' request, they did a maturity model analysis [1] and wrote
contribution guidelines. [2]

The Graduation Proposal was written and discussed on the dev list. [3]

[1]
https://cwiki.apache.org/confluence/display/GRIFFIN/ASF+Maturity+Evaluation

[2] http://griffin.incubator.apache.org/docs/contribute.html

[3]
https://lists.apache.org/thread.html/ba389dd1f7a9e8c82912d4dbf06abceda5461429806f5f7b112fc05d@%3Cdev.griffin.apache.org%3E

Establish the Apache Griffin Project

WHEREAS, the Board of Directors deems it to be in the best interests of
the Foundation and consistent with the Foundation's purpose to establish
a Project Management Committee charged with the creation and maintenance
of open-source software, for distribution at no charge to the public,
related to a data quality solution for big data, including both streaming
and batch mode.
It offers an unified process to measure data quality from different
perspectives.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
(PMC), to be known as the "Apache Griffin Project", be and hereby is
established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Griffin Project be and hereby is responsible
for the creation and maintenance of software related to a data quality
solution for big data,
including both streaming and batch mode.
It offers an unified process to measure data quality from different
perspectives; and be it further

RESOLVED, that the office of "Vice President, Apache Griffin" be and
hereby is created, the person holding such office to serve at the
direction of the Board of Directors as the chair of the Apache Griffin
Project, and to have primary responsibility for management of the
projects within the scope of responsibility of the Apache Griffin
Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are
appointed to serve as the initial members of the Apache Griffin Project:

* Alex Lv 
* Deyi Yao 
* Eugene Liu 
* Grant Guo 
* He Wang 
* Henry Saputra 
* Jason Liao 
* John Liu 
* Juan Li 
* Liang Shao 
* Lionel Liu 
* Luciano Resende 
* Nick Sokolov 
* Shawn Sha 
* Vincent Zhao 
* William Guo 
* Yuqin Xuan 

NOW, THEREFORE, BE IT FURTHER RESOLVED, that William Guo be appointed
to the office of Vice President, Apache Griffin, to serve in accordance
with and subject to the direction of the Board of Directors and the
Bylaws of the Foundation until death, resignation, retirement, removal
or disqualification, or until a successor is appointed; and be it
further

RESOLVED, that the initial Apache Griffin PMC be and hereby is tasked
with the creation of a set of bylaws intended to encourage open
development and increased participation in the Apache Griffin Project;
and be it further

RESOLVED, that the Apache Griffin Project be and hereby is tasked with
the migration and rationalization of the Apache Incubator Griffin
podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator
Griffin podling encumbered upon the Apache Incubator PMC are hereafter
discharged.

Thanks,
William Guo

RE: Metrics not persisted when writing a query in SPARK-SQL instead of Griffin DSL

2018-10-11 Thread Lionel, Liu

That’s cool!

Thanks
Lionel, Liu

From: Vikram Jain
Sent: October 11, 2018 22:52
To: Lionel Liu
Cc: dev@griffin.incubator.apache.org
Subject: Re: Metrics not persisted when writing a query in SPARK-SQL instead of 
Griffin DSL

Thank you Lionel for your help. 
We figured it out just before your mail arrived :) 

Regards,
Vikram



On 11-Oct-2018, at 8:20 PM, Lionel Liu  wrote:


Hi Vikram,

In your JSON body, I notice that the rule in the "rules" field has no "out"
field, which means the griffin measure application will only calculate, without
producing any output. You might have just changed the "dsl.type" from
"griffin-dsl" to "spark-sql". For a "griffin-dsl" rule with "dq.type" as
"profiling", we create an output for it in the transform phase:
https://github.com/apache/incubator-griffin/blob/griffin-0.3.0-incubating-rc1/measure/src/main/scala/org/apache/griffin/measure/step/builder/dsl/transform/ProfilingExpr2DQSteps.scala#L97
For a "spark-sql" rule, however, we don't parse the SQL, so we don't know how
it would work; you need to configure the output field manually to enable it.

You can refer to this document to configure the output field:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#rule

Or simply refer to the demo JSON for spark-sql profiling rules:
https://github.com/apache/incubator-griffin/blob/griffin-0.3.0-incubating-rc1/measure/src/test/resources/_profiling-batch-sparksql.json

Hope this helps.

--
Regards,
Lionel, Liu


At 2018-10-11 17:30:29, "Vikram Jain"  wrote:
>Hello,
>
>I was trying to create a measure and write the rule in Spark-SQL directly 
>instead of Griffin-DSL. I use Postman to create the measure. The measure is 
>created successfully, the job is created and executed successfully.
>
>However, the output metrics of execution of jobs are not persisted in 
>ElasticSearch. The entry is created in Elastic but the "metricValues" array is 
>NULL.
>
>The same SQL query works fine directly on Spark-Shell.
>
>I am not using Docker and building the environment (Griffin 3.0) on my local 
>machine. All the measures created using UI are executing well. And measures 
>created using Postman with griffin-dsl rule are also working well.
>
>Below is the body of json which I am passing to add measure API call from 
>Postman. Please help me understand what is going wrong.
>
>
>{
>  "name": "custom_profiling_measure_2",
>  "measure.type": "griffin",
>  "dq.type": "PROFILING",
>  "rule.description": {
>    "details": [
>      {
>        "name": "id",
>        "infos": "Total Count"
>      }
>    ]
>  },
>  "process.type": "BATCH",
>  "owner": "test",
>  "description": "custom_profiling_measure_2",
>  "data.sources": [
>    {
>      "name": "source",
>      "connectors": [
>        {
>          "name": "source123",
>          "type": "HIVE",
>          "version": "1.2",
>          "data.unit": "1day",
>          "data.time.zone": "",
>          "config": {
>            "database": "default",
>            "table.name": "demo_src",
>            "where": ""
>          }
>        }
>      ]
>    }
>  ],
>  "evaluate.rule": {
>    "out.dataframe.name": "profiling_2",
>    "rules": [
>      {
>        "dsl.type": "spark-sql",
>        "dq.type": "PROFILING",
>        "rule": "SELECT count(id) AS cnt, max(age) AS Max_Age from demo_src",
>        "out.dataframe.name": "id_count_2"
>      }
>    ]
>  }
>}
>
>
>
>
>
>Regards,
>
>Vikram
>

Re: Metrics not persisted when writing a query in SPARK-SQL instead of Griffin DSL

2018-10-11 Thread Lionel Liu



Hi Vikram,


In your JSON body, I notice that the rule in the "rules" field has no "out"
field, which means the griffin measure application will only calculate, without
producing any output. You might have just changed the "dsl.type" from
"griffin-dsl" to "spark-sql". For a "griffin-dsl" rule with "dq.type" as
"profiling", we create an output for it in the transform phase:
https://github.com/apache/incubator-griffin/blob/griffin-0.3.0-incubating-rc1/measure/src/main/scala/org/apache/griffin/measure/step/builder/dsl/transform/ProfilingExpr2DQSteps.scala#L97
For a "spark-sql" rule, however, we don't parse the SQL, so we don't know how
it would work; you need to configure the output field manually to enable it.


You can refer to this document to configure the output field:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md#rule
Or simply refer to the demo JSON for spark-sql profiling rules:
https://github.com/apache/incubator-griffin/blob/griffin-0.3.0-incubating-rc1/measure/src/test/resources/_profiling-batch-sparksql.json
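
As a rough illustration, the rule from your request might look like this with an output configured. This is only a sketch: the "out" entry with "type" and "name" is my assumption of the shape described in the configuration guide linked above, and the metric name simply reuses your "out.dataframe.name", so please check both against the guide and the demo JSON for your Griffin version.

```json
{
  "dsl.type": "spark-sql",
  "dq.type": "PROFILING",
  "rule": "SELECT count(id) AS cnt, max(age) AS Max_Age from demo_src",
  "out.dataframe.name": "id_count_2",
  "out": [
    {
      "type": "metric",
      "name": "id_count_2"
    }
  ]
}
```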


Hope this could help you.


--

Regards,
Lionel, Liu



At 2018-10-11 17:30:29, "Vikram Jain"  wrote:
>Hello,
>
>I was trying to create a measure and write the rule in Spark-SQL directly 
>instead of Griffin-DSL. I use Postman to create the measure. The measure is 
>created successfully, the job is created and executed successfully.
>
>However, the output metrics of execution of jobs are not persisted in 
>ElasticSearch. The entry is created in Elastic but the "metricValues" array is 
>NULL.
>
>The same SQL query works fine directly on Spark-Shell.
>
>I am not using Docker and building the environment (Griffin 3.0) on my local 
>machine. All the measures created using UI are executing well. And measures 
>created using Postman with griffin-dsl rule are also working well.
>
>Below is the body of json which I am passing to add measure API call from 
>Postman. Please help me understand what is going wrong.
>
>
>{
>  "name": "custom_profiling_measure_2",
>  "measure.type": "griffin",
>  "dq.type": "PROFILING",
>  "rule.description": {
>    "details": [
>      {
>        "name": "id",
>        "infos": "Total Count"
>      }
>    ]
>  },
>  "process.type": "BATCH",
>  "owner": "test",
>  "description": "custom_profiling_measure_2",
>  "data.sources": [
>    {
>      "name": "source",
>      "connectors": [
>        {
>          "name": "source123",
>          "type": "HIVE",
>          "version": "1.2",
>          "data.unit": "1day",
>          "data.time.zone": "",
>          "config": {
>            "database": "default",
>            "table.name": "demo_src",
>            "where": ""
>          }
>        }
>      ]
>    }
>  ],
>  "evaluate.rule": {
>    "out.dataframe.name": "profiling_2",
>    "rules": [
>      {
>        "dsl.type": "spark-sql",
>        "dq.type": "PROFILING",
>        "rule": "SELECT count(id) AS cnt, max(age) AS Max_Age from demo_src",
>        "out.dataframe.name": "id_count_2"
>      }
>    ]
>  }
>}
>
>
>
>
>
>Regards,
>
>Vikram
>

Re: [DISCUSS] Graduate Apache Griffin (incubating) as a TLP

2018-10-09 Thread Lionel Liu

I think it's time to graduate. I agree with this proposal, and I'd like
to remain part of the PPMC to contribute to the Griffin community.

Thanks,
Lionel

On Tue, Oct 9, 2018 at 11:51 PM William Guo  wrote:

> Hi All,
>
> With the 0.3.0-incubating release officially out, the Apache Griffin
> community and its mentors believe it is time to consider graduation to the
> TLP.
>
> Apache Griffin entered incubation in December of 2016. Since then, the
> Griffin community has learned a lot about how to do things the Apache way.
> Now we have a healthy and engaged community, ready to help with all
> questions from the Griffin community. We have delivered five releases, and
> we can now do self-driven releases at a good cadence. The PPMC has
> demonstrated a good understanding of growing the community by electing 9
> individuals as committers and PPMC members.
> The PPMC addressed the maturity issues one by one, following the
> Apache Project Maturity Model; currently all the License and IP issues are
> resolved.
> This demonstrates our understanding of ASF's IP policies.
>
> All in all, I believe this project is qualified as a true TLP and we should
> recognize this fact by formally awarding it such a status. This thread
> means to open up the very same discussion that we had among the mentors and
> Griffin community to the rest of the IPMC. It is a DISCUSS thread so feel
> free to ask questions.
>
> To get you all going, here are a few data points which may help:
>
> Project status:
> http://incubator.apache.org/projects/griffin.html
>
> Project website:
> http://griffin.incubator.apache.org/
>
> Project documentation:
> http://griffin.incubator.apache.org/docs/quickstart.html
> http://griffin.incubator.apache.org/docs/download.html
>
> Maturity assessment:
> https://cwiki.apache.org/confluence/display/GRIFFIN/ASF+Maturity+Evaluation
>
> DRAFT of the board resolution is at the bottom of this email
>
> Proposed PMC size: 18 members
>
> Total number of committers: 18 members
>
> PMC affiliation (* indicated chair):
>
>  - eBay * (7)
>  - Meituan (1)
>  - VMWare (1)
>  - Ontario Institute for Cancer Research (1)
>  - NetEase (1)
>  - ASF (1)
>  - Pingan Bank (1)
>  - Satori Software (1)
>  - Grid Dynamics (1)
>  - JD.com (1)
>  - IBM (1)
>  - Huawei (1)
>
>
> 452 commits on develop
> 431 PRs on GitHub
> 30 contributors across all branches
>
> 200 issues created
> 195 issues resolved
>
> dev list averaged ~20 msgs/month over last 12 months
>
>
> committer affiliations:
>
> active
>
> * eBay
> * meituan
> * VMWare
> * Grid Dynamics
>
> occasional
>
>  * Ontario Institute for Cancer Research
>  * NetEase
>  * ASF
>  * Pingan Bank
>  * Satori Software
>  * JD.com
>  * IBM
>  * Huawei
>
>
> Thanks,
> William
>
>
> **Notes:**
>
> * I'm proposing myself as initial PMC chair -- Please comment on this or
> propose other persons as well
>
> * This draft includes all existing PPMC members and mentors into the new
> PMC.
>
> - For all: please indicate if you want to keep being part of the
> PMC or if you prefer to be removed.
>
>
> --
>
> ## Resolution to create a TLP from graduating Incubator podling
>
> X. Establish the Apache Griffin Project
>
> WHEREAS, the Board of Directors deems it to be in the best
> interests of the Foundation and consistent with the
> Foundation's purpose to establish a Project Management
> Committee charged with the creation and maintenance of
> open-source software, for distribution at no charge to
> the public, related to a data quality solution for big data,
> including both streaming and batch mode. It offers an unified
> process to measure data quality from different perspectives.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> Committee (PMC), to be known as the "Apache Griffin Project",
> be and hereby is established pursuant to Bylaws of the
> Foundation; and be it further
>
> RESOLVED, that the Apache Griffin Project be and hereby is
> responsible for the creation and maintenance of software
> related to a data quality solution for big data,
> including both streaming and batch mode. It offers an unified
> process to measure data quality from different perspectives;
> and be it further
>
> RESOLVED, that the office of "Vice President, Apache Griffin" be
> and hereby is created, the person holding such office to
> serve at the direction of the Board of Directors as the chair
> of the Apache Griffin Project, and to have primary responsibility
> for management of the projects within the scope of
> responsibility of the Apache Griffin

[jira] [Commented] (GRIFFIN-202) Allow viewing of raw measure details from UI

2018-10-07 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641359#comment-16641359
 ] 

Lionel Liu commented on GRIFFIN-202:


This could be useful for users, and I agree with it. The textual format of the 
measure details just describes the measure, which acts as a template for its 
jobs. 
The textual format might not provide any more information than the current table 
format, whether in JSON or YAML, but it would let us consider allowing users to 
create customized measurements in textual form. That would be a great 
enhancement.
Considering this, in the textual format I would prefer to show only the necessary 
fields of the measure, the same ones used in measure creation.

> Allow viewing of raw measure details from UI
> 
>
> Key: GRIFFIN-202
> URL: https://issues.apache.org/jira/browse/GRIFFIN-202
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Nikolay Sokolov
>Priority: Minor
>
> Sometimes it's desirable to see raw JSON of specific measure. This is 
> especially useful when custom measurements are created from API.
> Two implementations are possible (one or even both can be implemented):
> * show link with URL of measure details API on UI
> * add section under "Mapping Rules", showing formatted plain text of API 
> response on UI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (GRIFFIN-195) [UI] Don't list all table objects from UI

2018-10-07 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-195.

   Resolution: Fixed
Fix Version/s: 1.0.0-incubating

Issue resolved by pull request 431
[https://github.com/apache/incubator-griffin/pull/431]

> [UI] Don't list all table objects from UI
> -
>
> Key: GRIFFIN-195
> URL: https://issues.apache.org/jira/browse/GRIFFIN-195
> Project: Griffin (Incubating)
>  Issue Type: Sub-task
>Reporter: Nikolay Sokolov
>Priority: Major
> Fix For: 1.0.0-incubating
>
>
> Listing all table objects when rendering the list of tables to profile takes a lot 
> of time and adds significant latency, even if the response is cached, when the number 
> of tables is big and/or the schema contains lots of columns.
> Two solutions are possible:
> 1) Without GRIFFIN-194, call the DB list API and then the list tables API 
> for each DB, in order to collect all tables. When a table is clicked on, 
> a request to get the table details can be made.
> 2) With GRIFFIN-194, the DB list followed by a number of table list operations can 
> be replaced by a "list all table names in all databases" operation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-19 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621545#comment-16621545
 ] 

Lionel Liu commented on GRIFFIN-190:


Hi [~cwoytasik], this problem confuses me too; I think we need more tests to 
figure out where it goes wrong.

Since you're creating jobs via the UI, I suggest you generate a dq.json file like 
this: 
[https://github.com/apache/incubator-griffin/blob/griffin-0.2.0-incubating-rc4/griffin-doc/measure/measure-batch-sample.md#batch-profiling-sample],
[https://github.com/apache/incubator-griffin/blob/griffin-0.2.0-incubating-rc4/measure/src/test/resources/_profiling-batch-griffindsl.json]
and submit the job directly to the Spark cluster, to locate where the problem is.

In this way, we can also get the application log for details.

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
> Attachments: PLDataLineageLoad061818.csv
>
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-19 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621534#comment-16621534
 ] 

Lionel Liu commented on GRIFFIN-190:


That's helpful [~chemikadze], I'll test that and document it.

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
> Attachments: PLDataLineageLoad061818.csv
>
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-193) Profiling measure UX improvements

2018-09-19 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620463#comment-16620463
 ] 

Lionel Liu commented on GRIFFIN-193:


That would be a useful enhancement for deployments with a huge number of tables in production.

> Profiling measure UX improvements
> -
>
> Key: GRIFFIN-193
> URL: https://issues.apache.org/jira/browse/GRIFFIN-193
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Nikolay Sokolov
>Priority: Major
>
> While the profiling measure UI works fine on a small scale, it becomes tricky 
> to use when the number of tables and databases grows, becoming almost unusable 
> at 1000+ tables. APIs listing large numbers of tables fail frequently and 
> take minutes to complete, it's hard to find tables in the UI, and some animations 
> also start to run slowly.
> This ticket will have subtasks for both UI and service improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-18 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619276#comment-16619276
 ] 

Lionel Liu commented on GRIFFIN-190:


OK, I think I understand you now.

For the null-count rule "count(source.`object`) AS `object-nullcount` WHERE 
source.`object` IS NULL", there's no result. I've checked your data, and it seems 
fine too. How about testing null-count on the other columns?

Enum count measures the number of items grouped by the values of an enum column.
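
The two profiling measures mentioned above can be illustrated with a small Python sketch. This is not Griffin code: the helper names null_count and enum_count and the sample rows are made up for illustration, and Griffin itself computes these measures in Spark.

```python
from collections import Counter


def null_count(rows, column):
    """Count rows whose value in `column` is None (the analogue of SQL NULL)."""
    return sum(1 for row in rows if row.get(column) is None)


def enum_count(rows, column):
    """Count rows per distinct value of `column`, like a GROUP BY on that column."""
    return dict(Counter(row.get(column) for row in rows))


# Hypothetical sample data, not taken from the attached CSV.
rows = [
    {"object": "table_a", "status": "OK"},
    {"object": None, "status": "FAILED"},
    {"object": "table_b", "status": "OK"},
]

print(null_count(rows, "object"))
print(enum_count(rows, "status"))
```

Running the same two helpers over each column in turn is one way to check whether the missing null-count result is specific to the `object` column.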

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
> Attachments: PLDataLineageLoad061818.csv
>
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-17 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618415#comment-16618415
 ] 

Lionel Liu commented on GRIFFIN-190:


I need to double-check this: "some of the jobs are completing successfully now 
with metric files, but some of the rules still fail." Do you mean that all jobs of 
some rules, such as null-count, always fail? Or that only some jobs of the 
null-count rule fail while others succeed?

If the former, where all the null-count rule jobs fail, we need to check the 
dq.json.

If the latter, that might be caused by data differences among partitions: for 
example, the data of 11:00 may behave differently from the data of 12:00, which 
can make the calculation fail. In that case we have to check the data.

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
> Attachments: PLDataLineageLoad061818.csv
>
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-191) Attempt to create job for 0-day partitioned accuracy measure fails

2018-09-16 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616623#comment-16616623
 ] 

Lionel Liu commented on GRIFFIN-191:


Sounds like an edge case; we'll test it and look into it. Thanks 
[~chemikadze].

> Attempt to create job for 0-day partitioned accuracy measure fails
> --
>
> Key: GRIFFIN-191
> URL: https://issues.apache.org/jira/browse/GRIFFIN-191
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Nikolay Sokolov
>Priority: Trivial
>
> If accuracy measure is created through UI with partition size of 0 days, 
> attempt to create job for such measure from UI fails with error "Missing 
> 'as.baseline' config in 'data.segments'".
> Request:
> {code:none}
> {"job.name":"my-accuracy-job-0","job.type":"batch","measure.id":404,"cron.expression":"0
>  * * * * ?","cron.time.zone":"GMT7:00","data.segments":[]}
> {code}
> Response:
> {code:none}
> {"timestamp":1536989880993,"status":400,"error":"Bad 
> Request","code":"40005","message":"Missing 'as.baseline' config in 
> 'data.segments'","path":"/api/v1/jobs"}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-191) Attempt to create job for 0-day partitioned accuracy measure fails

2018-09-15 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616351#comment-16616351
 ] 

Lionel Liu commented on GRIFFIN-191:


Hi [~chemikadze], how did you submit that request? Through the UI or the API?

It looks like it went through the API directly. The field "data.segments" is 
actually required, as described here: 
[https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/api-guide.md#add-job]

For accuracy there are two data sources, so we need to choose one of them 
as the timestamp baseline, which decides what the timestamp of the calculated 
metric should be aligned with.
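
As a rough illustration of a request with a baseline segment, here is a Python sketch that builds such a job body and checks that exactly one segment is marked as the baseline. The top-level keys and "as.baseline" come from the request and error message quoted in this issue; the per-segment keys "data.connector.name" and "segment.range", the connector names, and the time zone value are assumptions for illustration, so check them against the linked api-guide before use.

```python
import json


def build_accuracy_job(measure_id, job_name, source_connector, target_connector,
                       time_zone):
    """Build a job request body with exactly one baseline data segment."""
    segments = [
        {
            # Assumed key names; see the api-guide for the authoritative schema.
            "data.connector.name": source_connector,
            "as.baseline": True,  # metric timestamps follow this data source
            "segment.range": {"begin": "-1h", "length": "1h"},
        },
        {
            "data.connector.name": target_connector,
            "segment.range": {"begin": "-1h", "length": "1h"},
        },
    ]
    return {
        "job.name": job_name,
        "job.type": "batch",
        "measure.id": measure_id,
        "cron.expression": "0 * * * * ?",
        "cron.time.zone": time_zone,
        "data.segments": segments,
    }


def has_single_baseline(job):
    """True if exactly one entry in data.segments sets as.baseline to true."""
    segments = job.get("data.segments", [])
    return sum(1 for seg in segments if seg.get("as.baseline") is True) == 1


# Hypothetical connector names and time zone.
job = build_accuracy_job(404, "my-accuracy-job-0", "source_conn", "target_conn",
                         "GMT+8:00")
print(json.dumps(job, indent=2))
print(has_single_baseline(job))
```

An empty "data.segments" list, as in the failing request, would not pass the has_single_baseline check.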

> Attempt to create job for 0-day partitioned accuracy measure fails
> --
>
> Key: GRIFFIN-191
> URL: https://issues.apache.org/jira/browse/GRIFFIN-191
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Nikolay Sokolov
>Priority: Trivial
>
> If accuracy measure is created through UI with partition size of 0 days, 
> attempt to create job for such measure from UI fails with error "Missing 
> 'as.baseline' config in 'data.segments'".
> Request:
> {code:none}
> {"job.name":"my-accuracy-job-0","job.type":"batch","measure.id":404,"cron.expression":"0
>  * * * * ?","cron.time.zone":"GMT7:00","data.segments":[]}
> {code}
> Response:
> {code:none}
> {"timestamp":1536989880993,"status":400,"error":"Bad 
> Request","code":"40005","message":"Missing 'as.baseline' config in 
> 'data.segments'","path":"/api/v1/jobs"}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (GRIFFIN-192) Job listing attempt with VirtualJobs in database fails with error

2018-09-15 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616338#comment-16616338
 ] 

Lionel Liu edited comment on GRIFFIN-192 at 9/15/18 2:56 PM:
-

Thanks [~chemikadze], you're right, we'll fix this bug later.

Line: 
[https://github.com/apache/incubator-griffin/blob/3bbbcb32686bb691bdef9af71ef48685d04ea63f/service/src/main/java/org/apache/griffin/core/job/JobServiceImpl.java#L331]

For a virtual job we don't set the job type as batch or streaming, so we 
need another branch that returns a no-op jobOperator for virtual jobs, because a 
virtual job doesn't need any operation.
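
The extra branch could look roughly like the following sketch. It is written in Python only to show the shape of the idea; the actual service code is Java, and every class and function name here is hypothetical. It also assumes that a virtual job is recognizable by a missing job type.

```python
class BatchJobOperator:
    def describe(self, job):
        return "batch job " + job["name"]


class StreamingJobOperator:
    def describe(self, job):
        return "streaming job " + job["name"]


class NoOpJobOperator:
    """Stand-in operator for virtual jobs, which need no scheduling work."""

    def describe(self, job):
        return "virtual job " + job["name"] + " (no operation)"


_OPERATORS = {"batch": BatchJobOperator(), "streaming": StreamingJobOperator()}


def get_job_operator(job):
    """Pick an operator by job type, falling back to a no-op for virtual jobs."""
    job_type = job.get("type")
    if job_type is None:  # assumption: virtual jobs carry no batch/streaming type
        return NoOpJobOperator()
    if job_type not in _OPERATORS:
        raise ValueError("job type not supported: " + str(job_type))
    return _OPERATORS[job_type]


print(get_job_operator({"name": "v1"}).describe({"name": "v1"}))
```

With a fallback of this kind, listing jobs would no longer fail just because a virtual job is present, while genuinely unknown types would still be rejected.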


was (Author: lionel_3l):
Thanks [~chemikadze], you're right, we'll fix this bug later.

> Job listing attempt with VirtualJobs in database fails with error
> -
>
> Key: GRIFFIN-192
> URL: https://issues.apache.org/jira/browse/GRIFFIN-192
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Nikolay Sokolov
>Priority: Trivial
>
> If VirtualJob instances are there in database (for example, created from UI), 
> JOB_TYPE_DOES_NOT_SUPPORT is thrown by getJobOperator, inside getJobDataBeans.
> Effectively presence of VirtualJob in db makes Job page of UI unusable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-192) Job listing attempt with VirtualJobs in database fails with error

2018-09-15 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616338#comment-16616338
 ] 

Lionel Liu commented on GRIFFIN-192:


Thanks [~chemikadze], you're right, we'll fix this bug later.

> Job listing attempt with VirtualJobs in database fails with error
> -
>
> Key: GRIFFIN-192
> URL: https://issues.apache.org/jira/browse/GRIFFIN-192
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Nikolay Sokolov
>Priority: Trivial
>
> If VirtualJob instances are there in database (for example, created from UI), 
> JOB_TYPE_DOES_NOT_SUPPORT is thrown by getJobOperator, inside getJobDataBeans.
> Effectively presence of VirtualJob in db makes Job page of UI unusable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: UI roadmap questions

2018-09-13 Thread Lionel Liu

Hi Nick,

For rendering profiling results on the UI, we currently show them in a table, because
it's not easy to put profiling metrics with timestamps in one chart.
Maybe we could rethink this and show them in a customized page or
something like that.

The other two requirements are not supported on the UI yet; it would be really
nice to have those features.

Looking forward to your contribution.

Thanks,
Lionel



On Fri, Sep 14, 2018 at 8:14 AM William Guo  wrote:

> hi Nick Sokolov,
>
> For the UI part, currently, it supports typical use cases.
> We are focused on backend features for now, but you are always welcome to
> contribute to the UI to enable all backend features.
>
> It is the Apache way: just create the ticket and send us your PR, and we will
> review and merge it.
>
> http://griffin.apache.org/docs/contribute.html
>
>
> Thanks,
> William
>
>
> On Fri, Sep 14, 2018 at 1:58 AM Nick Sokolov  wrote:
>
> > Hi all,
> >
> > We are considering using Griffin in our project, and while there are no
> > major questions with the backend, it looks like the UI is missing some
> > features we are going to need for our use cases:
> > - showing profiling results in DQ metrics
> > - ability to create spark-sql measures
> > - possibly, the same for completeness measures
> > I'm fine to implement those on our side, however I would like to double-check:
> > is anyone working/planning to work on those? Will these changes be welcomed
> > by the community, or are they against the current vision of the project? At
> > the end of the day, it would be good for us to avoid reinventing the wheel
> > or being stuck in our fork incompatible with upstream.
> >
> > Thanks in advance!
> >
>

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-13 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613227#comment-16613227
 ] 

Lionel Liu commented on GRIFFIN-190:


Yes, the backslashes are generated by the UI; the code is around here: 
[https://github.com/apache/incubator-griffin/blob/griffin-0.2.0-incubating/ui/angular/src/app/measure/create-measure/pr/pr.component.ts#L291]

But we've tested this, and it works well in our docker container.

I noticed that your livy version is 0.3.0 too, so it should behave the same as 
ours.

If you want to fix this yourself, you can try skipping this 
step: 
[https://github.com/apache/incubator-griffin/blob/griffin-0.2.0-incubating/service/src/main/java/org/apache/griffin/core/job/SparkSubmitJob.java#L181]
 * If this works, maybe our livy is configured differently, or the two are 
different versions.
 * If this causes an error in livy's log, you can just remove all the backslashes 
in the UI code around the link above.

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-11 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611501#comment-16611501
 ] 

Lionel Liu commented on GRIFFIN-190:


How did you submit the job? Directly with curl via livy's API, using the escaped 
json string? Or via livy's API using the config json file? Or 
via the griffin server?
 * If you're directly submitting the json string, you need to escape it like 
this: 
\"rule\": \"approx_count_distinct(source.\\`asset\\`) AS \\`asset-distcount\\`\"
 * If you're submitting a json file, you don't need backslash escapes, like 
this: 
"rule" : "approx_count_distinct(source.`asset`) AS `asset-distcount`"
 * If you're submitting it via the griffin server, you don't need backslash 
escapes either.
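
The difference between the first two cases can be seen in a short Python sketch using only the standard json module. It covers the quote escaping only; any extra backtick escaping needed by a shell is outside this sketch, and the variable names are illustrative.

```python
import json

rule = {"rule": "approx_count_distinct(source.`asset`) AS `asset-distcount`"}

# Written to a JSON config file: plain JSON, no extra escaping needed.
as_file_content = json.dumps(rule)

# Embedded as a string value inside another JSON document: the inner quotes
# must be escaped, which json.dumps does when applied a second time.
as_embedded_string = json.dumps(as_file_content)

print(as_file_content)
print(as_embedded_string)

# Decoding twice recovers the original rule, so nothing is lost by the escaping.
assert json.loads(json.loads(as_embedded_string)) == rule
```

If backslashes show up in a place that expects the plain file form, the rule text no longer matches what the measure parser expects, which is the kind of mismatch discussed above.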

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [ANNOUNCE] Apache Griffin-0.3.0-incubating released

2018-09-10 Thread Lionel Liu

I agree; we need to update our documents, some of which might be
out of date.

Thanks,
Lionel

On Tue, Sep 11, 2018 at 7:56 AM William Guo  wrote:

> Congrats!
> I think we need to prepare for graduation now, e.g. update all documents,
> wiki, use cases.
>
> Thanks,
> William
>
> On Mon, Sep 10, 2018 at 3:10 PM Henry Saputra wrote:
>
> > Congrats for latest release, guys!
> >
> > - Henry
> >
> > On Thu, Sep 6, 2018 at 10:40 PM Lionel Liu  wrote:
> >
> > > Hi all,
> > >
> > > The Apache Griffin (incubating) team is pleased to announce the release of
> > > Griffin 0.3.0-incubating.
> > >
> > > Apache Griffin is a data quality solution for modern data systems;
> > > it defines a standard process to define and measure data quality for
> > > well-known dimensions.
> > >
> > > The release is available at:
> > > https://www.apache.org/dyn/closer.cgi/incubator/griffin
> > >
> > > Thanks,
> > >
> > > The Apache Griffin (incubating) team
> > >
> > > =
> > > *DISCLAIMER*
> > > Apache Griffin is an effort undergoing incubation at The Apache Software
> > > Foundation (ASF), sponsored by Incubator.
> > > Incubation is required of all newly accepted projects until a further
> > > review indicates that the infrastructure, communications, and decision
> > > making process have stabilized in a manner consistent with other successful
> > > ASF projects.
> > > While incubation status is not necessarily a reflection of the completeness
> > > or stability of the code, it does indicate that the project has yet to be
> > > fully endorsed by the ASF.
> > >
> >
>

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-07 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607886#comment-16607886
 ] 

Lionel Liu commented on GRIFFIN-190:


Actually, in griffin-0.2.0-incubating, the "email" and "sms" parameters are 
not supported by the application. We didn't remove them from the code or from 
env.json, which leads to some misunderstanding. You can just remove them from 
env.json; it will not make any difference.

We're using ElasticSearch as the default metric storage, so users can 
leverage the alerting function of ES for notifications, and griffin can focus 
just on the DQ calculation.

In the latest version, griffin-0.3.0-incubating, the "email" and "sms" 
parameters are removed. I think you can try that version: it is almost the same as 
0.2.0, with a better config experience for the spark job parameters for livy and 
a clearer job config structure. The latest version was released recently and the 
documents are not all updated yet; we're still working on them.

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[ANNOUNCE] Apache Griffin-0.3.0-incubating released

2018-09-06 Thread Lionel Liu

Hi all,

The Apache Griffin (incubating) team is pleased to announce the release of
Griffin 0.3.0-incubating.

Apache Griffin is a data quality solution for modern data systems;
it defines a standard process to define and measure data quality for
well-known dimensions.

The release is available at:
https://www.apache.org/dyn/closer.cgi/incubator/griffin

Thanks,

The Apache Griffin (incubating) team

=
*DISCLAIMER*
Apache Griffin is an effort undergoing incubation at The Apache Software
Foundation (ASF), sponsored by Incubator.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other successful
ASF projects.
While incubation status is not necessarily a reflection of the completeness
or stability of the code, it does indicate that the project has yet to be
fully endorsed by the ASF.

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-06 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606643#comment-16606643
 ] 

Lionel Liu commented on GRIFFIN-190:


Hi [~mkisly], 

"When using something 
sparkJob.file=hdfs://localhost:9000/griffin/griffin-measure.jar we get an error 
wrong fs  expected [file:///] ."

Where did you get this error? In the griffin service log or in the livy log? I 
suppose it would be in the livy log: in the griffin service, we do NOT parse the 
value of "sparkJob.file" as a path, we just send the string value directly to 
livy as the value of the "file" field, like this: "file": 
"hdfs://localhost:9000/griffin/griffin-measure.jar".

In application.properties, "fs.defaultFS" is only used to check for done file 
existence; it does not affect the spark job submission.
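
The pass-through of "sparkJob.file" described above can be pictured with a small Python sketch. It is an illustration of the idea rather than the service's Java code: the function name to_livy_batch is hypothetical, and "sparkJob.className" is included only as an assumed second property. "file" and "className" are fields of a Livy batch request body.

```python
def to_livy_batch(props):
    """Copy sparkJob.* string values into a Livy batch request body unchanged."""
    # The value is copied as a plain string; it is never parsed as a path here,
    # so any "wrong fs" error would have to come from a later component.
    body = {"file": props["sparkJob.file"]}
    if "sparkJob.className" in props:  # assumed property name
        body["className"] = props["sparkJob.className"]
    return body


props = {"sparkJob.file": "hdfs://localhost:9000/griffin/griffin-measure.jar"}
print(to_livy_batch(props))
```

Because the string arrives at livy untouched, the livy and spark configuration is the more likely place to look for the file system mismatch.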

I guess there might be some issue with the environment. I'm not sure how your 
livy and spark are configured; maybe you can refer to the scripts we use to 
build our docker image:

[https://github.com/bhlx3lyx7/griffin-docker/tree/master/env2/conf/spark]

[https://github.com/bhlx3lyx7/griffin-docker/tree/master/env2/conf/livy]

Or the error might be caused by other parameters such as "sparkJob.jars" or 
"spark.yarn.dist.files"; they also matter if you need to enable the Hive Context 
when submitting spark jobs.

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-190) Blank Health and DQ Metrics Screen

2018-09-05 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605153#comment-16605153
 ] 

Lionel Liu commented on GRIFFIN-190:


Hi [~cwoytasik], you might need to check a few things.

1. Assuming you're using the default env.json, the results should be persisted 
in hdfs if the measure job succeeds. You can find them under the path 
hdfs:///griffin/persist//, where there will be several directories named after 
the timestamps at which the job was triggered; the metrics are listed inside.
 * If the "_METRICS" file looks good, it means that the job succeeded in spark.
 * If the "_METRICS" file doesn't exist, we have to find the yarn log of the spark 
application for the job. To do that, find the application id in the 
livy log or griffin service log, then fetch the yarn log like this: 
yarn logs -applicationId  > app.log
This exports the application log into app.log, where you can find the ERROR 
messages.

2. If the results exist in hdfs, we can try to query them from ES like this: 

curl -XGET 
':9200/griffin/accuracy/_search?pretty&filter_path=hits.hits._source' -d 
'{"query":{"match_all":{}},  "sort": [{"tmst": {"order": "asc"}}]}'

If they don't exist there, something might have gone wrong when the spark 
application submitted the metrics to ES.
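
For anyone scripting this check, the same ES query can be assembled in Python. This sketch only builds the URL and body and does not send anything; the function name build_metric_search and the host value in the usage line are placeholders, while the index, type, port, and query body mirror the curl command above.

```python
import json
from urllib.parse import urlencode


def build_metric_search(es_host, metric_type="accuracy"):
    """Return (url, body) for listing stored metrics sorted by timestamp."""
    params = urlencode({"pretty": "true", "filter_path": "hits.hits._source"})
    url = "http://{}:9200/griffin/{}/_search?{}".format(es_host, metric_type, params)
    body = json.dumps({
        "query": {"match_all": {}},
        "sort": [{"tmst": {"order": "asc"}}],  # oldest metric first
    })
    return url, body


# "es.example.internal" is a placeholder host, not a real endpoint.
url, body = build_metric_search("es.example.internal")
print(url)
print(body)
```

An empty hits list in the response, while "_METRICS" files exist in hdfs, would point at the step where the spark application writes to ES.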

> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>
> Griffin is up and running.  We have both an accuracy measure and a profiling 
> measure that is set to run every minute via jobs.  When we click the chart 
> icon next to the job we receive a "no content" message.  When we click on the 
> Health link or DQ Metrics link they think for a second and then display a 
> blank screen.  We are thinking this might be ES related, but aren't 
> completely sure.  Need some help.  We assume it's a path or property setup 
> issue.  Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (GRIFFIN-102) [Service] Fix bug of fetching metrics of different jobs with the same name

2018-09-05 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604152#comment-16604152
 ] 

Lionel Liu commented on GRIFFIN-102:


Actually it's not that easy.

Say we have a metric named "test" with some points between 2018-08-09T12:00 and 
2018-08-10T12:00, and then the metric is deleted. Three days later, we create another 
metric also named "test", which produces metric points after 
2018-08-13T12:00. In ES, we only save the metric name "test" as the key, so when 
we fetch the metrics of "test", ES returns two slices of data. If the two 
metrics are of the same type, like "accuracy", the result will confuse the later user. 
If they are of different types, it will lead to errors.
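
To make the collision concrete, here is a small in-memory sketch. It does not use ES at all; the instance ids and values are hypothetical, and keying by (name, id) is only one possible way to tell the two metrics apart, not a decided fix:

```python
from collections import defaultdict

# (metric name, metric instance id, timestamp, value); ids and values are hypothetical.
points = [
    ("test", 1, "2018-08-09T12:00", 0.91),
    ("test", 1, "2018-08-10T12:00", 0.93),
    ("test", 2, "2018-08-13T12:00", 0.40),
]


def group_points(points, key):
    """Group metric points by the given key function."""
    grouped = defaultdict(list)
    for point in points:
        grouped[key(point)].append(point)
    return dict(grouped)


# Keyed by name only: the deleted metric and the new one come back together.
by_name = group_points(points, key=lambda p: p[0])
# Keyed by name and instance id: the two slices stay separate.
by_name_and_id = group_points(points, key=lambda p: (p[0], p[1]))

print(len(by_name["test"]))
print(sorted(by_name_and_id.keys()))
```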

> [Service] Fix bug of fetching metrics of different jobs with the same name
> --
>
> Key: GRIFFIN-102
> URL: https://issues.apache.org/jira/browse/GRIFFIN-102
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Lionel Liu
>Assignee: He Wang
>Priority: Major
>  Labels: SP_5
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> When fetching metrics from ES, if one job has the same name as another job 
> which was deleted, the metrics are merged and become confusing.




[jira] [Commented] (GRIFFIN-189) Griffin - Livy error

2018-09-04 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603919#comment-16603919
 ] 

Lionel Liu commented on GRIFFIN-189:


Maybe I can have a look at your Dockerfile, to check if there's any environment 
issue.

> Griffin - Livy error
> 
>
> Key: GRIFFIN-189
> URL: https://issues.apache.org/jira/browse/GRIFFIN-189
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>  Labels: beginner, newbie, usability
> Attachments: sparkJob.properties, sparkJob.properties
>
>
> We are trying to get griffin set up and after creating measure and jobs and 
> letting them run we have noticed the results are not available via the DQ 
> metrics link or metric link from the job itself.  We have noticed when the 
> job gets submitted the following spark context and error message are 
> generated.  We assume we must have a setting in one of the directories set 
> incorrectly.  Thoughts?
>  
> INFO 20972 --- [ryBean_Worker-2] o.a.g.c.j.SparkSubmitJob : {
>   "measure.type" : "griffin",
>   "id" : 13,
>   "name" : "LineageAccuracy",
>   "owner" : "test",
>   "description" : "AccuracyTest",
>   "organization" : null,
>   "deleted" : false,
>   "timestamp" : 153599832,
>   "dq.type" : "accuracy",
>   "process.type" : "batch",
>   "data.sources" : [ {
>     "id" : 16,
>     "name" : "source",
>     "connectors" : [ {
>   "id" : 17,
>   "name" : "source1535741016027",
>   "type" : "HIVE",
>   "version" : "1.2",
>   "predicates" : [ ],
>   "data.unit" : "1day",
>   "config" : {
>     "database" : "default",
>     "table.name" : "lineage"
>   }
>     } ]
>   }, {
>     "id" : 18,
>     "name" : "target",
>     "connectors" : [ {
>   "id" : 19,
>   "name" : "target1535741022277",
>   "type" : "HIVE",
>   "version" : "1.2",
>   "predicates" : [ ],
>   "data.unit" : "1day",
>   "config" : {
>     "database" : "default",
>     "table.name" : "lineageload"
>   }
>     } ]
>   } ],
>   "evaluate.rule" : {
>     "id" : 14,
>     "rules" : [ {
>   "id" : 15,
>   "rule" : "source.asset=target.asset AND source.element=target.element 
> AND source.elementtype=target.elementtype AND source.object=target.object AND 
> source.objecttype=target.objecttype AND source.objectfield=target.objectfield 
> AND source.sourceelement=target.sourceelement AND 
> source.sourceobject=target.sourceobject AND 
> source.sourcefield=target.sourcefield AND 
> source.sourcefieldname=target.sourcefieldname AND 
> source.transformationtext=target.transformationtext AND 
> source.displayindicator=target.displayindicator",
>   "name" : "accuracy",
>   "dsl.type" : "griffin-dsl",
>   "dq.type" : "accuracy"
>     } ]
>   },
>   "measure.type" : "griffin"
> }
> {color:#FF}2018-09-04 13:12:00.752 ERROR 20972 --- [ryBean_Worker-2] 
> o.a.g.c.j.SparkSubmitJob : Post to livy error. 500 Internal 
> Server Error{color}
> [EL Fine]: sql: 2018-09-04 
> 13:12:00.754--ClientSession(787879814)--Connection(1389579691)--UPDATE 
> JOBINSTANCEBEAN SET predicate_job_deleted = ?, STATE = ? WHERE (ID = ?)




[jira] [Commented] (GRIFFIN-189) Griffin - Livy error

2018-09-04 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603913#comment-16603913
 ] 

Lionel Liu commented on GRIFFIN-189:


Well, seems like you are using the old version 0.2.0.

There might be another issue: is the hdfs path "hdfs://localhost:9000" the 
correct fs.name in your docker container?

 

PS: we've also updated the griffin docker image for 0.3.0, aligned with the 0.3.0 
code that will be released in the next few days. You can give it a try.


[jira] [Commented] (GRIFFIN-189) Griffin - Livy error

2018-09-04 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603814#comment-16603814
 ] 

Lionel Liu commented on GRIFFIN-189:


Hi, it seems the submission of the spark job failed because of a livy server error.
 # Which environment are you working on? The docker image we provided, or an 
environment you've deployed yourself?
 # Have you configured "livy.uri" with the correct url, like 
"http://<livy host>:8998/batches"?
 # Does the livy server work normally? You can run a simple test against it.
 # Can you submit the livy test from the griffin server machine?
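
The "livy.uri" check can be done mechanically before looking at the server itself. A minimal sketch, assuming the value should be an http(s) url with an explicit port and the /batches path; the function name and the example host are hypothetical:

```python
from urllib.parse import urlparse


def check_livy_uri(uri):
    """Return a list of problems found in a livy.uri value (empty if it looks fine)."""
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        problems.append("scheme should be http or https")
    if not parsed.hostname:
        problems.append("host is missing")
    if parsed.port is None:
        problems.append("port is missing (livy listens on 8998 by default)")
    if parsed.path.rstrip("/") != "/batches":
        problems.append("path should be /batches")
    return problems


good = check_livy_uri("http://livy.example.com:8998/batches")  # hypothetical host
no_host = check_livy_uri("http://:8998/batches")
print(good, no_host)
```

This only validates the shape of the setting. Whether the server answers still needs a real request from the griffin server machine.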

> Griffin - Livy error
> 
>
> Key: GRIFFIN-189
> URL: https://issues.apache.org/jira/browse/GRIFFIN-189
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Cory Woytasik
>Priority: Major
>  Labels: beginner, newbie, usability
>

Re: griffin docker mirror error

2018-09-03 Thread Lionel Liu

cool, I'll fix this.

On Mon, Sep 3, 2018 at 2:03 PM Eugene Liu  wrote:

> Hi Lionel
>
> I prefer document update, it's easy and feasible
>
> thanks
>
> -Original Message-----
> From: Lionel Liu 
> Sent: 2018年9月3日 9:57
> To: dev@griffin.incubator.apache.org
> Subject: Re: griffin docker mirror error
>
> Hi Eugene,
>
> You are right, zookeeper:3.5 is an official docker image, which might be
> not synchronized to registry.docker-cn.com, we can not do anything on
> this.
> One solution is to re-push zookeeper:3.5 as apachegriffin/zookeeper:3.5
> like the other docker images, then it might be synchronized to
> registry.docker-cn.com.
> Or we can just change the document, change "docker pull
> registry.docker-cn.com/zookeeper:3.5" to "docker pull zookeeper:3.5".
> Which do you prefer?
>
> Thanks,
> Lionel
> On Sun, Sep 2, 2018 at 11:57 AM Eugene Liu  wrote:
>
> > docker pull registry.docker-cn.com/zookeeper:3.5
> >
> > Error response from daemon: manifest for
> > registry.docker-cn.com/zookeeper:3.5 not found
> >
> >
> > docker pull zookeeper:3.5
> >
> > 3.5: Pulling from library/zookeeper
> >
> > 8e3ba11ec2a2: Pull complete
> >
> > 311ad0da4533: Pull complete
> >
> >
> >
> > it seems zk docker mirror not linking to main repository
>
>
> 
> From: Jin Liu 
> Sent: Monday, September 3, 2018 1:56 PM
> To: liu...@apache.org
> Subject: griffin docker mirror error
>
>
>
>

[RESULT][VOTE] Release of Apache Griffin 0.3.0-incubating [RC1]

2018-09-03 Thread Lionel Liu

Hi all,

The vote passed with 5 [+1] binding votes and no [-1] votes. Please check
the following tally.

+1 binding: [5]
  Lv Alex
  Eugene Liu
  Lionel Liu
  William Guo
  Kevin Yao

0 : [0]

-1 : [0]

The vote thread is listed here:
https://lists.apache.org/thread.html/b5184c861d299462f369a87b0a911bfd00905b1c79232ff242e0b78a@%3Cdev.griffin.apache.org%3E

Thanks,
Lionel
On behalf of Apache Griffin PPMC

Re: [VOTE] Release of Apache Griffin-0.3.0-incubating [RC1]

2018-09-02 Thread Lionel Liu

I checked,

incubating in the name,

these files exist:
CHANGES.txt,
source release zip file,
pom file,
signature files,
hash files,
NO md5 file

signature files good,
hash files good,

LICENSE, NOTICE, DISCLAIMER files are all good.
licenses check success.
source build success.

No cat-X licenses for source release, No cat-B licenses for source code
included.

I vote +1.

Thanks,
Lionel


On Fri, Aug 31, 2018 at 9:53 PM Kevin Yao  wrote:

> vote +1
>
> On Fri, Aug 31, 2018 at 3:44 PM William Guo  wrote:
>
> > I checked,
> >
> > LICENSE
> > NOTICE (YEAR IS RIGHT)
> > DISCLAIMER
> >
> > license ok
> >
> > source build successfully.
> >
> > also checked third party licenses,
> >
> > no category-X GPL, LGPL, CC non commerical, JSON , BSD 4 clause, Apache
> 1.0
> >
> > so I vote +1
> >
> > Thanks,
> > William
> >
> >
> >
> >
> >
> > On Fri, Aug 31, 2018 at 3:10 PM Eugene Liu  wrote:
> >
> > > based on verification below,  I vote +1
> > >
> > > 1.
> > > gpg --verify griffin-0.3.0-incubating-source-release.zip.asc
> > > griffin-0.3.0-incubating-source-release.zip
> > > gpg: Signature made Thu 30 Aug 2018 10:17:00 PM CST using RSA key ID
> > > 02379561
> > > gpg: Good signature from "Lionel " [ultimate]
> > >
> > > 2.
> > > gpg --verify griffin-0.3.0-incubating.pom.asc
> > griffin-0.3.0-incubating.pom
> > > gpg: Signature made Thu 30 Aug 2018 10:17:00 PM CST using RSA key ID
> > > 02379561
> > > gpg: checking the trustdb
> > > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> > > gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
> > > gpg: next trustdb check due at 2028-03-29
> > > gpg: Good signature from "Lionel " [ultimate]
> > >
> > > 3.
> > > for f in *.sha1; do echo "$(cat $f) ${f/.sha1/}"; done | sha1sum -c
> > > griffin-0.3.0-incubating-source-release.zip: OK
> > >
> > > 4.
> > > mvn apache-rat:check
> > > [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0
> generated: 0
> > > approved: 135 licence.
> > > [INFO]
> > >
> 
> > > [INFO] Reactor Summary:
> > > [INFO]
> > > [INFO] Apache Griffin 0.3.0-incubating 0.3.0-incubating ... SUCCESS [
> > > 0.878 s]
> > > [INFO] Apache Griffin :: UI :: Default UI . SUCCESS [
> > > 0.132 s]
> > > [INFO] Apache Griffin :: Web Service .. SUCCESS [
> > > 0.211 s]
> > > [INFO] Apache Griffin :: Measures 0.3.0-incubating  SUCCESS [
> > > 0.146 s]
> > > [INFO]
> > >
> 
> > > [INFO] BUILD SUCCESS
> > > [INFO]
> > >
> 
> > > [INFO] Total time: 2.042 s
> > > [INFO] Finished at: 2018-08-31T14:47:03+08:00
> > > [INFO]
> > >
> 
> > >
> > > 5.
> > > mvn clean install
> > > [INFO]
> > >
> 
> > > [INFO] Reactor Summary:
> > > [INFO]
> > > [INFO] Apache Griffin 0.3.0-incubating 0.3.0-incubating ... SUCCESS [
> > > 7.137 s]
> > > [INFO] Apache Griffin :: UI :: Default UI . SUCCESS
> > [08:22
> > > min]
> > > [INFO] Apache Griffin :: Web Service .. SUCCESS
> > [01:26
> > > min]
> > > [INFO] Apache Griffin :: Measures 0.3.0-incubating  SUCCESS
> > [01:40
> > > min]
> > > [INFO]
> > >
> 
> > > [INFO] BUILD SUCCESS
> > > [INFO]
> > >
> 
> > > [INFO] Total time: 11:36 min
> > > [INFO] Finished at: 2018-08-31T14:59:57+08:00
> > > [INFO]
> > >
> 
> > >
> > > 6.
> > > mvn license:add-third-party
> > > [INFO] Writing third-party file to
> > >
> >
> /home/king/source/github/incubator-griffin/rel/v/griffin-0.3.0-incubating/measure/target/generated-sources/license/THIRD-PARTY.txt
> > > [INFO]
> > >
> 
> > > [INFO] Reactor Summary:
> > > [INFO]
> > > [INFO] Apache Griffin 0.3.0-incubating 0.3.0-incubating ... SUCCESS [
> > > 21.319 s]
> > > [INFO] Apache Griffin :: UI :: Default UI . SUCCESS [
> > > 0.025 s]
> > > [INFO] Apache Griffin :: Web Service .. SUCCESS [
> > > 3.885 s]
> > > [INFO] Apache Griffin :: Measures 0.3.0-incubating  SUCCESS [
> > > 2.723 s]
> > > [INFO]
> > >
> 
> > > [INFO] BUILD SUCCESS
> > > [INFO]
> > >
> 
> > > [INFO] Total time: 35.998 s
> > > [INFO] Finished at: 2018-08-31T15:01:57+08:00
> > > [INFO]
> > >
> 
> > >
> > >
> >
>

Re: griffin docker mirror error

2018-09-02 Thread Lionel Liu

Hi Eugene,

You are right, zookeeper:3.5 is an official docker image, which might not be
synchronized to registry.docker-cn.com, and we can not do anything about that.
One solution is to re-push zookeeper:3.5 as apachegriffin/zookeeper:3.5
like the other docker images, then it might be synchronized to
registry.docker-cn.com.
Or we can just update the document, changing "docker pull
registry.docker-cn.com/zookeeper:3.5" to "docker pull zookeeper:3.5".
Which do you prefer?

Thanks,
Lionel
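
For the document-change option, the rewrite of the pull command is mechanical. A minimal sketch, assuming the mirror prefix is exactly "registry.docker-cn.com/"; the function names are hypothetical:

```python
MIRROR_PREFIX = "registry.docker-cn.com/"


def official_image(reference, mirror_prefix=MIRROR_PREFIX):
    """Drop the mirror registry prefix so the image is pulled from the default registry."""
    if reference.startswith(mirror_prefix):
        return reference[len(mirror_prefix):]
    return reference


def pull_command(reference):
    """Build the docker pull command for the un-mirrored image reference."""
    return "docker pull " + official_image(reference)


print(pull_command("registry.docker-cn.com/zookeeper:3.5"))
```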

On Sun, Sep 2, 2018 at 11:57 AM Eugene Liu  wrote:

> docker pull registry.docker-cn.com/zookeeper:3.5
>
> Error response from daemon: manifest for
> registry.docker-cn.com/zookeeper:3.5 not found
>
>
> docker pull zookeeper:3.5
>
> 3.5: Pulling from library/zookeeper
>
> 8e3ba11ec2a2: Pull complete
>
> 311ad0da4533: Pull complete
>
>
>
> it seems zk docker mirror not linking to main repository
>
>

[VOTE] Release of Apache Griffin-0.3.0-incubating [RC1]

2018-08-30 Thread Lionel Liu

Hi all,

This is a call for a vote on releasing Apache Griffin 0.3.0-incubating,
release candidate 1.
Apache Griffin is a data quality service for modern data systems. It
defines a standard process to define and measure data quality for well-known
dimensions.
With Apache Griffin, users will be able to quickly define their data
quality requirements and then get the results in near real time in a
systematic way.


** Highlights **
* Refactor measure module for better abstraction.
* Support missing records download for accuracy measurement.
* Support regular expression detection count in profiling measurement.
* Fix several bugs on UI.


The source tarball, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/incubator/griffin/0.3.0-incubating/

The tag to be voted upon is 0.3.0-incubating:
https://git-wip-us.apache.org/repos/asf?p=incubator-griffin.git;a=shortlog;h=refs/tags/griffin-0.3.0-incubating

The release hash is:
https://git-wip-us.apache.org/repos/asf?p=incubator-griffin.git;a=commit;h=797cc62c94449e485d3af910bc8557ca9841bb22

The Nexus Staging URL:
https://repository.apache.org/content/repositories/orgapachegriffin-1018


Release artifacts are signed with the following key:
7F00C3BA90F3ECAEECB843A79BD6EC6C02379561
KEYS file available:
https://dist.apache.org/repos/dist/dev/incubator/griffin/KEYS

For information about the contents of this release, see:
https://dist.apache.org/repos/dist/dev/incubator/griffin/0.3.0-incubating/CHANGES.txt


Please vote on releasing this package as Apache Griffin 0.3.0-incubating


The vote will be open for 72 hours.

[ ] +1 Release this package as Apache Griffin 0.3.0-incubating
[ ] +0 no opinion
[ ] -1 Do not release this package because ...


You can follow the steps here to verify the release before you vote:
https://cwiki.apache.org/confluence/display/GRIFFIN/How+to+Verify+Release+Package
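
As an illustration of the digest part of the verification, here is a minimal sketch. It assumes each digest file holds the hex digest as its first token and that the algorithm is SHA-1; pass another algorithm name if the release uses a different one. The function names and the temporary demo files are hypothetical, and this does not replace the gpg signature check:

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(artifact_path, digest_path, algorithm="sha1"):
    """Compare an artifact against the first token of its digest file."""
    with open(digest_path, "r") as handle:
        expected = handle.read().split()[0].lower()
    return file_digest(artifact_path, algorithm) == expected


# Demo on temporary files standing in for a release artifact and its digest file.
with tempfile.TemporaryDirectory() as workdir:
    artifact = os.path.join(workdir, "artifact.zip")
    with open(artifact, "wb") as handle:
        handle.write(b"hello")
    with open(artifact + ".sha1", "w") as handle:
        handle.write(hashlib.sha1(b"hello").hexdigest() + "\n")
    matches = digest_matches(artifact, artifact + ".sha1")
    with open(artifact + ".sha1", "w") as handle:
        handle.write("0" * 40 + "\n")
    mismatch = digest_matches(artifact, artifact + ".sha1")

print(matches, mismatch)
```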


Thanks,
Lionel
On behalf of Apache Griffin PPMC

[jira] [Resolved] (GRIFFIN-187) Support Empty String in Profiling Measure

2018-08-30 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-187.

Resolution: Done

> Support Empty String in Profiling Measure
> -
>
> Key: GRIFFIN-187
> URL: https://issues.apache.org/jira/browse/GRIFFIN-187
> Project: Griffin (Incubating)
>  Issue Type: New Feature
>Reporter: Spencer Hivert
>Priority: Minor
>
> Here at Credit Karma, we've discovered that it's also useful to check if the 
> string is empty. We were originally using the null count functionality, 
> however we were misled, as the field was not null but rather empty.
> Add the ability to check for empty strings!
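
The difference between the two counts is easy to see on a small column. A minimal sketch in plain Python, not the griffin measure code; the function name and the sample values are hypothetical:

```python
def profile_column(values):
    """Count null, empty-string, and non-empty values of one column."""
    null_count = sum(1 for v in values if v is None)
    empty_count = sum(1 for v in values if v == "")
    return {
        "null": null_count,
        "empty": empty_count,
        "non_empty": len(values) - null_count - empty_count,
    }


# A null count alone reports 1 here and misses the two empty strings.
counts = profile_column(["a", "", None, "", "b"])
print(counts)
```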




[jira] [Resolved] (GRIFFIN-186) [UI] Re-Factor Profiling Measure Creation

2018-08-30 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-186.

Resolution: Done

> [UI] Re-Factor Profiling Measure Creation
> -
>
> Key: GRIFFIN-186
> URL: https://issues.apache.org/jira/browse/GRIFFIN-186
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Spencer Hivert
>Priority: Minor
>
> The current code structure contained in 
> "/incubator-griffin/ui/angular/src/app/measure/create-measure/pr" is 
> confusing and difficult to work with.
> Each step should have a separate component rather than combining all into a 
> single component.




[jira] [Resolved] (GRIFFIN-184) [Service] download miss records

2018-08-30 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-184.

Resolution: Done

> [Service] download miss records
> ---
>
> Key: GRIFFIN-184
> URL: https://issues.apache.org/jira/browse/GRIFFIN-184
> Project: Griffin (Incubating)
>  Issue Type: New Feature
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
>





[jira] [Closed] (GRIFFIN-172) when will version of 1.0.0 be released

2018-08-30 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu closed GRIFFIN-172.
--
Resolution: Not A Problem

> when will version of 1.0.0 be released 
> ---
>
> Key: GRIFFIN-172
> URL: https://issues.apache.org/jira/browse/GRIFFIN-172
> Project: Griffin (Incubating)
>  Issue Type: Task
>Affects Versions: 1.0.0-incubating
>Reporter: coal chan
>Priority: Minor
>
> I see that 1.0.0-incubating was planned to be released on 2018-06-30, but 
> that date has passed.
> Was there any difficulty?




[jira] [Resolved] (GRIFFIN-164) Make 'Regular expression detection count' available in UI

2018-08-30 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-164.

Resolution: Done

> Make 'Regular expression detection count' available in UI
> -
>
> Key: GRIFFIN-164
> URL: https://issues.apache.org/jira/browse/GRIFFIN-164
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.1.6-incubating
>Reporter: Enrico D'Urso
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> Hi,
> I have been playing for one month now with Griffin.
> Given my experience, some companies (included the one am working for as a 
> consultant) prefer doing stuff using UI.
> Personally, I find very useful the following feature:
>  
>  * Regular expression detection count
> which is, I have a column which should contain just numbers so I want to 
> check if my ETL process, wrongly, has populated my table with non-numeric 
> values.
> I have been able to run such a job creating my self the right config.json, in 
> particular, using spark-sql as dialect:
> {code:java}
> select count(*) from src where account_id rlike '[^0-9]'
> {code}
> I saw that in pr.component.ts there is a commented line of code:
> {code:java}
> // {"id":10,"itemName":"Regular Expression Detection Count","category": 
> "Advanced Statistics"}
> {code}
> which I think is what I am talking about.
> Also, I can read:
> {code:java}
> // case 'Regular Expression Detection Count': // return 
> 'count(source.`'+col.name+'`) where source.`'+col.name+'` LIKE ';
> {code}
> which should be the griffin-dsl dialect, even if, probably, the regex should 
> be added just after LIKE.
> Then, once that the above griffin-dsl statement is available in the backend, 
> ProfilingRulePlanTrans class
> should map that into 'rlike' Spark-sql clause.
> Am not sure where (and if) ProfilingRulePlanTrans should be modified as 
> preGroupbyClause should contains everything, but I do not have enough 
> knowledge about it.
>  
> Please judge yourself the priority of such a feature, which knowing well the 
> code, should not be too hard to make.
> Thanks,
>  
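
The "Regular expression detection count" described in the issue can be illustrated outside of spark. A minimal sketch in plain Python, assuming search semantics (a value counts if the pattern is found anywhere in it) and that null values never count; the function name and sample values are hypothetical:

```python
import re


def regex_detection_count(values, pattern):
    """Count the non-null values in which the pattern is found at least once."""
    compiled = re.compile(pattern)
    return sum(1 for v in values if v is not None and compiled.search(v))


# With [^0-9], every value containing a non-digit character is counted.
account_ids = ["123", "12a", None, "", "4 5"]
bad_ids = regex_detection_count(account_ids, r"[^0-9]")
print(bad_ids)
```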




[jira] [Resolved] (GRIFFIN-146) [Service] Prepare and test job state and action service

2018-08-30 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-146.

Resolution: Done

> [Service] Prepare and test job state and action service
> ---
>
> Key: GRIFFIN-146
> URL: https://issues.apache.org/jira/browse/GRIFFIN-146
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>Assignee: Yuqin Xuan
>Priority: Major
>
> cherry pick and test , push to master




[jira] [Resolved] (GRIFFIN-185) [UI] download miss records

2018-08-21 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-185.

   Resolution: Done
Fix Version/s: 1.0.0-incubating

This function has been tested in docker image griffin_spark2:0.2.1

> [UI] download miss records
> --
>
> Key: GRIFFIN-185
> URL: https://issues.apache.org/jira/browse/GRIFFIN-185
> Project: Griffin (Incubating)
>  Issue Type: New Feature
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
> Fix For: 1.0.0-incubating
>
>





[jira] [Comment Edited] (GRIFFIN-188) Docker dev question

2018-08-21 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588281#comment-16588281
 ] 

Lionel Liu edited comment on GRIFFIN-188 at 8/22/18 2:19 AM:
-

Hi [~djkooks], it seems you're using the docker containers as the griffin dependent 
environment, and running GriffinWebApplication locally or via your IDE, am I 
right?

1. In the 'service/src/main/resource/application.properties' you set: 

```

spring.datasource.url=jdbc:postgresql://192.168.99.100:5432/quartz?autoReconnect=true&useSSL=false

```

192.168.99.100 should be your docker host ip address. Since you can not access 
the docker container directly, in the docker-compose.yml file we've mapped port 
5432 of the docker container to port 35432 of the docker host. Thus you need to set 
it like this:

```

spring.datasource.url=jdbc:postgresql://192.168.99.100:35432/quartz?autoReconnect=true&useSSL=false

```

2. I've noticed that you're running the code of the master branch. Because we've 
modified the json format of the measure module recently, the docker image 
`bhlx3lyx7/griffin_spark2:0.2.0` is out of date. We've also updated the docker 
image in the last few days, so you can pull the new docker image 
`bhlx3lyx7/griffin_spark2:0.2.1`, and modify the version 
number in the docker-compose.yml you're using too.

We'll also update the document later.

Hope this helps you, thanks.
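
The port lookup from point 1 can be sketched as follows, assuming a docker-compose style "host:container" ports entry. The helper names are hypothetical, and the query parameters of the datasource url are left out for brevity:

```python
def host_port_for(container_port, port_mappings):
    """Look up the docker host port published for a container port."""
    for mapping in port_mappings:
        host_port, mapped_container_port = mapping.split(":")
        if int(mapped_container_port) == container_port:
            return int(host_port)
    raise KeyError("container port {} is not published".format(container_port))


def jdbc_url(host, port, database="quartz"):
    """Build the postgres JDBC url that the service should connect to."""
    return "jdbc:postgresql://{}:{}/{}".format(host, port, database)


# "host:container", as in a docker-compose ports entry.
PORT_MAPPINGS = ["35432:5432"]

url = jdbc_url("192.168.99.100", host_port_for(5432, PORT_MAPPINGS))
print(url)
```

From outside the container, the service has to use the host side of the mapping, which is why 5432 becomes 35432 here.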


was (Author: lionel_3l):
Hi [~djkooks], seems like you're using docker container as griffin dependent 
environment, and running GriffinWebApplication at local or via your IDE, am I 
right?

1. In the 'service/src/main/resource/application.properties' you set: 

```

spring.datasource.url=jdbc:postgresql://192.168.99.100:5432/quartz?autoReconnect=true&useSSL=false

```

192.168.99.100 should be your docker host ip address, since you can not access 
docker container directly, in the docker-compose.yml file we've mapped the port 
5432 of docker container to the port 35432 of docker host. Thus you need to set 
it like this:

```

spring.datasource.url=jdbc:postgresql://192.168.99.100:35432/quartz?autoReconnect=true&useSSL=false

```

 

2. I've noticed that you're running the code of master branch, because we've 
modified the json format of measure module recently, the docker image 
`bhlx3lyx7/griffin_spark2:0.2.0` is out of date. We've also updated the docker 
image in these days, you can pull the new docker image 
`bhlx3lyx7/griffin_spark2:0.2.1`, and modify the version 
number in docker-compose.yml you're using too.

 

Hope this helps you, thanks.

> Docker dev question
> ---
>
> Key: GRIFFIN-188
> URL: https://issues.apache.org/jira/browse/GRIFFIN-188
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: Kwang-in (Dennis) JUNG
>        Assignee: Lionel Liu
>Priority: Trivial
>
> Hello,
> I'm following guide in `environment for dev`, and finished docker containers 
> setup(API goes well via postman).
> Now, I setup the properties value and run GriffinWebApplication, but it 
> failed:
> ```
> 2018-08-21 14:45:12.385 INFO 7667 --- [ main] o.a.g.c.c.EnvConfig : {
>  "spark" : {
>  "log.level" : "WARN",
>  "checkpoint.dir" : "hdfs:///griffin/checkpoint/${JOB_NAME}",
>  "init.clear" : true,
>  "batch.interval" : "1m",
>  "process.interval" : "5m",
>  "config" : {
>  "spark.default.parallelism" : 4,
>  "spark.task.maxFailures" : 5,
>  "spark.streaming.kafkaMaxRatePerPartition" : 1000,
>  "spark.streaming.concurrentJobs" : 4,
>  "spark.yarn.maxAppAttempts" : 5,
>  "spark.yarn.am.attemptFailuresValidityInterval" : "1h",
>  "spark.yarn.max.executor.failures" : 120,
>  "spark.yarn.executor.failuresValidityInterval" : "1h",
>  "spark.hadoop.fs.hdfs.impl.disable.cache" : true
>  }
>  },
>  "sinks" : [ {
>  "type" : "CONSOLE",
>  "config" : {
>  "max.log.lines" : 100
>  }
>  }, {
>  "type" : "HDFS",
>  "config" : {
>  "path" : "hdfs:///griffin/persist",
>  "max.persist.lines" : 1,
>  "max.lines.per.file" : 1
>  }
>  }, {
>  "type" : "ELASTICSEARCH",
>  "config" : {
>  "method" : "post",
>  "api" : "http://es:9200/griffin/accuracy"
>  }
>  } ],
>  "griffin.checkpoint" : [ {
>  "type"

[jira] [Commented] (GRIFFIN-188) Docker dev question

2018-08-21 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588281#comment-16588281
 ] 

Lionel Liu commented on GRIFFIN-188:


Hi [~djkooks], it seems you're using the docker containers as Griffin's dependent 
environment and running GriffinWebApplication locally or from your IDE, am I 
right?

1. In 'service/src/main/resources/application.properties' you set: 

```

spring.datasource.url=jdbc:postgresql://192.168.99.100:5432/quartz?autoReconnect=true=false

```

192.168.99.100 should be your docker host's IP address. You cannot access the 
docker container directly; in the docker-compose.yml file we've mapped port 
5432 of the docker container to port 35432 of the docker host. Thus you need to 
set it like this:

```

spring.datasource.url=jdbc:postgresql://192.168.99.100:35432/quartz?autoReconnect=true=false

```
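
Before restarting the service, it can help to confirm that the mapped port is actually reachable from the machine running GriffinWebApplication. The following is only a rough sketch, not part of Griffin: the helper names `jdbc_host_port` and `is_reachable` are made up for illustration, and the host and port are the ones mentioned above.

```python
import socket
from urllib.parse import urlsplit


def jdbc_host_port(jdbc_url):
    """Extract (host, port) from a 'jdbc:postgresql://host:port/db?...' URL."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    # Drop the 'jdbc:' prefix so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    return parts.hostname, parts.port


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical check against the docker host port mapped in docker-compose.yml.
    host, port = jdbc_host_port("jdbc:postgresql://192.168.99.100:35432/quartz")
    print(host, port, "reachable" if is_reachable(host, port) else "NOT reachable")
```

If the check fails for 35432 as well, the docker host address itself is probably different on your machine.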

 

2. I've noticed that you're running the code from the master branch. Because 
we've recently modified the JSON format of the measure module, the docker image 
`bhlx3lyx7/griffin_spark2:0.2.0` is out of date. We've also updated the docker 
image in the last few days, so you can pull the new image 
`bhlx3lyx7/griffin_spark2:0.2.1` and update the version number in the 
docker-compose.yml you're using too.

 

Hope this helps you, thanks.

> Docker dev question
> ---
>
> Key: GRIFFIN-188
> URL: https://issues.apache.org/jira/browse/GRIFFIN-188
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: Kwang-in (Dennis) JUNG
>        Assignee: Lionel Liu
>Priority: Trivial
>
> Hello,
> I'm following guide in `environment for dev`, and finished docker containers 
> setup(API goes well via postman).
> Now, I setup the properties value and run GriffinWebApplication, but it 
> failed:
> ```
> 2018-08-21 14:45:12.385 INFO 7667 --- [ main] o.a.g.c.c.EnvConfig : {
>  "spark" : {
>  "log.level" : "WARN",
>  "checkpoint.dir" : "hdfs:///griffin/checkpoint/${JOB_NAME}",
>  "init.clear" : true,
>  "batch.interval" : "1m",
>  "process.interval" : "5m",
>  "config" : {
>  "spark.default.parallelism" : 4,
>  "spark.task.maxFailures" : 5,
>  "spark.streaming.kafkaMaxRatePerPartition" : 1000,
>  "spark.streaming.concurrentJobs" : 4,
>  "spark.yarn.maxAppAttempts" : 5,
>  "spark.yarn.am.attemptFailuresValidityInterval" : "1h",
>  "spark.yarn.max.executor.failures" : 120,
>  "spark.yarn.executor.failuresValidityInterval" : "1h",
>  "spark.hadoop.fs.hdfs.impl.disable.cache" : true
>  }
>  },
>  "sinks" : [ {
>  "type" : "CONSOLE",
>  "config" : {
>  "max.log.lines" : 100
>  }
>  }, {
>  "type" : "HDFS",
>  "config" : {
>  "path" : "hdfs:///griffin/persist",
>  "max.persist.lines" : 1,
>  "max.lines.per.file" : 1
>  }
>  }, {
>  "type" : "ELASTICSEARCH",
>  "config" : {
>  "method" : "post",
>  "api" : "http://es:9200/griffin/accuracy"
>  }
>  } ],
>  "griffin.checkpoint" : [ {
>  "type" : "zk",
>  "config" : {
>  "hosts" : "zk:2181",
>  "namespace" : "griffin/infocache",
>  "lock.path" : "lock",
>  "mode" : "persist",
>  "init.clear" : false,
>  "close.clear" : false
>  }
>  } ]
> }
> 2018-08-21 14:45:12.387 INFO 7667 --- [ main] o.a.g.c.u.FileUtil : Location 
> is empty. Read from default path.
> 2018-08-21 14:45:12.396 INFO 7667 --- [ main] o.a.g.c.u.FileUtil : Location 
> is empty. Read from default path.
> 2018-08-21 14:45:12.397 INFO 7667 --- [ main] o.s.b.f.c.PropertiesFactoryBean 
> : Loading properties file from class path resource [quartz.properties]
> 2018-08-21 14:45:12.400 INFO 7667 --- [ main] o.a.g.c.u.PropertiesUtil : Read 
> properties successfully from /quartz.properties.
> 2018-08-21 14:45:12.516 INFO 7667 --- [ main] o.q.i.StdSchedulerFactory : 
> Using default implementation for ThreadExecutor
> 2018-08-21 14:45:12.605 INFO 7667 --- [ main] o.q.c.SchedulerSignalerImpl : 
> Initialized Scheduler Signaller of type: class 
> org.quartz.core.SchedulerSignalerImpl
> 2018-08-21 14:45:12.605 INFO 7667 --- [ main] o.q.c.QuartzSched

[jira] [Assigned] (GRIFFIN-188) Docker dev question

2018-08-21 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu reassigned GRIFFIN-188:
--

Assignee: Lionel Liu

> Docker dev question
> ---
>
> Key: GRIFFIN-188
> URL: https://issues.apache.org/jira/browse/GRIFFIN-188
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: Kwang-in (Dennis) JUNG
>        Assignee: Lionel Liu
>Priority: Trivial
>
> Hello,
> I'm following guide in `environment for dev`, and finished docker containers 
> setup(API goes well via postman).
> Now, I setup the properties value and run GriffinWebApplication, but it 
> failed:
> ```
> 2018-08-21 14:45:12.385 INFO 7667 --- [ main] o.a.g.c.c.EnvConfig : {
>  "spark" : {
>  "log.level" : "WARN",
>  "checkpoint.dir" : "hdfs:///griffin/checkpoint/${JOB_NAME}",
>  "init.clear" : true,
>  "batch.interval" : "1m",
>  "process.interval" : "5m",
>  "config" : {
>  "spark.default.parallelism" : 4,
>  "spark.task.maxFailures" : 5,
>  "spark.streaming.kafkaMaxRatePerPartition" : 1000,
>  "spark.streaming.concurrentJobs" : 4,
>  "spark.yarn.maxAppAttempts" : 5,
>  "spark.yarn.am.attemptFailuresValidityInterval" : "1h",
>  "spark.yarn.max.executor.failures" : 120,
>  "spark.yarn.executor.failuresValidityInterval" : "1h",
>  "spark.hadoop.fs.hdfs.impl.disable.cache" : true
>  }
>  },
>  "sinks" : [ {
>  "type" : "CONSOLE",
>  "config" : {
>  "max.log.lines" : 100
>  }
>  }, {
>  "type" : "HDFS",
>  "config" : {
>  "path" : "hdfs:///griffin/persist",
>  "max.persist.lines" : 1,
>  "max.lines.per.file" : 1
>  }
>  }, {
>  "type" : "ELASTICSEARCH",
>  "config" : {
>  "method" : "post",
>  "api" : "http://es:9200/griffin/accuracy"
>  }
>  } ],
>  "griffin.checkpoint" : [ {
>  "type" : "zk",
>  "config" : {
>  "hosts" : "zk:2181",
>  "namespace" : "griffin/infocache",
>  "lock.path" : "lock",
>  "mode" : "persist",
>  "init.clear" : false,
>  "close.clear" : false
>  }
>  } ]
> }
> 2018-08-21 14:45:12.387 INFO 7667 --- [ main] o.a.g.c.u.FileUtil : Location 
> is empty. Read from default path.
> 2018-08-21 14:45:12.396 INFO 7667 --- [ main] o.a.g.c.u.FileUtil : Location 
> is empty. Read from default path.
> 2018-08-21 14:45:12.397 INFO 7667 --- [ main] o.s.b.f.c.PropertiesFactoryBean 
> : Loading properties file from class path resource [quartz.properties]
> 2018-08-21 14:45:12.400 INFO 7667 --- [ main] o.a.g.c.u.PropertiesUtil : Read 
> properties successfully from /quartz.properties.
> 2018-08-21 14:45:12.516 INFO 7667 --- [ main] o.q.i.StdSchedulerFactory : 
> Using default implementation for ThreadExecutor
> 2018-08-21 14:45:12.605 INFO 7667 --- [ main] o.q.c.SchedulerSignalerImpl : 
> Initialized Scheduler Signaller of type: class 
> org.quartz.core.SchedulerSignalerImpl
> 2018-08-21 14:45:12.605 INFO 7667 --- [ main] o.q.c.QuartzScheduler : Quartz 
> Scheduler v.2.2.2 created.
> 2018-08-21 14:45:22.613 INFO 7667 --- [ main] o.s.s.q.LocalDataSourceJobStore 
> : Could not detect database type. Assuming locks can be taken.
> 2018-08-21 14:45:22.613 INFO 7667 --- [ main] o.s.s.q.LocalDataSourceJobStore 
> : Using db table-based data access locking (synchronization).
> Aug 21, 2018 2:45:22 PM org.apache.tomcat.jdbc.pool.ConnectionPool init
> SEVERE: Unable to create initial connections of pool.
> org.postgresql.util.PSQLException: The connection attempt failed.
>  at 
> org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:272)
>  at 
> org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51)
>  at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:215)
>  at org.postgresql.Driver.makeConnection(Driver.java:404)
>  at org.postgresql.Driver.connect(Driver.java:272)
>  at 
> org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:310)
>  at 
> org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:203)
>  at 
> org.apache.tomcat.jdbc.pool.ConnectionPool.createConn

Re: Metrics of a streaming job

2018-08-12 Thread Lionel Liu

Hi Vikram,

It seems you're following the guide for streaming mode:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md#how-to-use-griffin-docker-images-in-streaming-mode
At present, the UI and service don't support creating and submitting
streaming measures. You can only submit the Spark application manually via
shell, as described in the steps of that guide.
By tracing the log, you can follow the status of the calculation. When the
result of each mini-batch is printed in the log, it should also be posted to
Elasticsearch, after which you can retrieve the metrics from ES.
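
As a rough illustration of that last retrieval step, here is a minimal sketch that only builds an Elasticsearch `_search` request. It assumes the metrics land under the `griffin/accuracy` path (as in the ELASTICSEARCH sink setting `http://es:9200/griffin/accuracy`); the `name` field, the metric name and the helper `build_metric_search` are assumptions for illustration, so please check them against your own documents.

```python
import json


def build_metric_search(es_base, metric_name, size=10):
    """Return (url, json_body) for an Elasticsearch _search request."""
    url = es_base.rstrip("/") + "/griffin/accuracy/_search"
    body = {
        # The "name" field is an assumption about how metric documents are keyed.
        "query": {"match": {"name": metric_name}},
        "size": size,
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical metric name; send the printed body with any HTTP client.
    url, body = build_metric_search("http://es:9200", "my_streaming_metric")
    print("POST", url)
    print(body)
```

The function deliberately does not send the request, so it can be tried without a running ES instance.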

Thanks,
Lionel

On Fri, Aug 10, 2018 at 5:52 PM Vikram Jain  wrote:

> Hello,
> I’m trying my hand at Griffin and was able to successfully deploy it on
> Docker.
> The UI and REST APIs seem to work smoothly for the batch jobs but I’m
> facing issues running streaming jobs.
> As per my understanding, metrics of streaming jobs are also stored in
> ElasticSearch. However, when I look at ES after streaming job execution
> starts, I don’t see any metrics being stored there.
>
> Please help me understand where the metrics of streaming jobs are stored
> and how to retrieve them. I am using the sample database and instructions
> presented in the Apache Griffin Docker guide:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md
>
>
> Thanks & Regards,
> Vikram Jain

[jira] [Resolved] (GRIFFIN-178) [UI] job state not update after performing start/stop actions on job page

2018-07-12 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-178.

   Resolution: Fixed
Fix Version/s: 1.0.0-incubating

Issue resolved by pull request 346
[https://github.com/apache/incubator-griffin/pull/346]

> [UI] job state not update after performing start/stop actions on job page
> -
>
> Key: GRIFFIN-178
> URL: https://issues.apache.org/jira/browse/GRIFFIN-178
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
> Fix For: 1.0.0-incubating
>
> Attachments: 178.png
>
>
> job state is not updated properly after performing a start/stop job action; 
> it still shows the previous state
> attached issue snapshot
>  




[jira] [Resolved] (GRIFFIN-177) [UI] measure page wrong display

2018-07-12 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-177.

   Resolution: Fixed
Fix Version/s: 1.0.0-incubating

Issue resolved by pull request 344
[https://github.com/apache/incubator-griffin/pull/344]

> [UI] measure page wrong display 
> 
>
> Key: GRIFFIN-177
> URL: https://issues.apache.org/jira/browse/GRIFFIN-177
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
> Fix For: 1.0.0-incubating
>
> Attachments: 177.png
>
>
> found the measure page wrongly displayed; the symptom is attached 
> root cause: {{row["dq.type"].toLowerCase()}} in 
> measure.component.html causes infinite loading if row["dq.type"] is null
>  
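
The root cause above is an unguarded method call on a possibly missing value. The actual fix lives in the Angular template (see the pull request); the snippet below is only a language-neutral sketch of the guard in Python, with a hypothetical helper `dq_type_label` and a plain dict standing in for the row.

```python
def dq_type_label(row):
    """Lower-case row["dq.type"], returning "" when it is missing or null."""
    value = row.get("dq.type")
    # Guard first: calling .lower() on None would raise AttributeError.
    return value.lower() if isinstance(value, str) else ""


if __name__ == "__main__":
    print(dq_type_label({"dq.type": "ACCURACY"}))
    print(repr(dq_type_label({"dq.type": None})))
```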




[jira] [Resolved] (GRIFFIN-174) Fix deploy document

2018-07-09 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-174.

   Resolution: Resolved
Fix Version/s: 1.0.0-incubating

Fixed the wrong links and environment names in the document.

> Fix deploy document
> ---
>
> Key: GRIFFIN-174
> URL: https://issues.apache.org/jira/browse/GRIFFIN-174
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.2.0-incubating
>Reporter: Kwang-in (Dennis) JUNG
>    Assignee: Lionel Liu
>Priority: Trivial
> Fix For: 1.0.0-incubating
>
>
> Hello.
> While following the deploy guide, I found some wrong links/file names, so I 
> opened a PR to fix them.
> [https://github.com/apache/incubator-griffin/pull/340]
>  
>  * Wrong link of SQL table
>  * Wrong environment name
> My fix may be wrong, so could anyone check it?
> Thanks.




[jira] [Assigned] (GRIFFIN-174) Fix deploy document

2018-07-09 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu reassigned GRIFFIN-174:
--

Assignee: Lionel Liu

> Fix deploy document
> ---
>
> Key: GRIFFIN-174
> URL: https://issues.apache.org/jira/browse/GRIFFIN-174
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.2.0-incubating
>Reporter: Kwang-in (Dennis) JUNG
>    Assignee: Lionel Liu
>Priority: Trivial
>




[jira] [Updated] (GRIFFIN-164) Make 'Regular expression detection count' available in UI

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-164:
---
Affects Version/s: 1.0.0-incubating

> Make 'Regular expression detection count' available in UI
> -
>
> Key: GRIFFIN-164
> URL: https://issues.apache.org/jira/browse/GRIFFIN-164
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.1.6-incubating
>Reporter: Enrico D'Urso
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> Hi,
> I have been playing with Griffin for one month now.
> In my experience, some companies (including the one I am working for as a 
> consultant) prefer doing things through a UI.
> Personally, I find the following feature very useful:
>  
>  * Regular expression detection count
> that is, I have a column which should contain only numbers, so I want to 
> check whether my ETL process has wrongly populated my table with non-numeric 
> values.
> I have been able to run such a job by creating the right config.json myself, 
> in particular using spark-sql as the dialect:
> {code:java}
> select count(*) from src where account_id rlike '[^0-9]'
> {code}
> I saw that in pr.component.ts there is a commented-out line of code:
> {code:java}
> // {"id":10,"itemName":"Regular Expression Detection Count","category": 
> "Advanced Statistics"}
> {code}
> which I think is what I am talking about.
> Also, I can read:
> {code:java}
> // case 'Regular Expression Detection Count': // return 
> 'count(source.`'+col.name+'`) where source.`'+col.name+'` LIKE ';
> {code}
> which should be the griffin-dsl dialect, although the regex should probably 
> be added just after LIKE.
> Then, once the above griffin-dsl statement is available in the backend, the 
> ProfilingRulePlanTrans class
> should map it to the 'rlike' Spark SQL clause.
> I am not sure where (and if) ProfilingRulePlanTrans should be modified, as 
> preGroupbyClause should contain everything, but I do not have enough 
> knowledge about it.
>  
> Please judge the priority of such a feature yourself; for someone who knows 
> the code well, it should not be too hard to implement.
> Thanks,
>  
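
To make the request above concrete, here is a small sketch of what a "regular expression detection count" amounts to. It is not Griffin code: `regex_count_rule` and `count_matching` are hypothetical helpers, the first only formats a spark-sql rule string like the one quoted above, and the second reproduces the same count locally with Python's `re` module (whose regex dialect is close to, but not identical with, the Java regex used by Spark's rlike).

```python
import re


def regex_count_rule(table, column, pattern):
    """Build a spark-sql rule that counts rows whose column matches pattern."""
    if "'" in pattern:
        # Keep the sketch simple: no attempt at SQL string escaping.
        raise ValueError("pattern must not contain single quotes")
    return "select count(*) from {} where `{}` rlike '{}'".format(table, column, pattern)


def count_matching(values, pattern):
    """Count values in which the pattern is found anywhere (like rlike)."""
    compiled = re.compile(pattern)
    return sum(1 for v in values if compiled.search(v))


if __name__ == "__main__":
    print(regex_count_rule("src", "account_id", "[^0-9]"))
    # Sample values: two of them contain a non-digit character.
    print(count_matching(["123", "12a", "", "4 5"], "[^0-9]"))
```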




[jira] [Updated] (GRIFFIN-164) Make 'Regular expression detection count' available in UI

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-164:
---
Affects Version/s: (was: 0.1.6-incubating)

> Make 'Regular expression detection count' available in UI
> -
>
> Key: GRIFFIN-164
> URL: https://issues.apache.org/jira/browse/GRIFFIN-164
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.1.6-incubating
>Reporter: Enrico D'Urso
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>




[jira] [Updated] (GRIFFIN-164) Make 'Regular expression detection count' available in UI

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-164:
---
Fix Version/s: 1.0.0-incubating

> Make 'Regular expression detection count' available in UI
> -
>
> Key: GRIFFIN-164
> URL: https://issues.apache.org/jira/browse/GRIFFIN-164
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.1.6-incubating
>Reporter: Enrico D'Urso
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>




[jira] [Updated] (GRIFFIN-164) Make 'Regular expression detection count' available in UI

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-164:
---
Affects Version/s: (was: 1.0.0-incubating)
   0.1.6-incubating

> Make 'Regular expression detection count' available in UI
> -
>
> Key: GRIFFIN-164
> URL: https://issues.apache.org/jira/browse/GRIFFIN-164
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 0.1.6-incubating
>Reporter: Enrico D'Urso
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>




[jira] [Updated] (GRIFFIN-166) [UI] fix Create measure page layout

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-166:
---
Summary: [UI] fix Create measure page layout   (was: [UI] fix Metrics 
Publish layout )

> [UI] fix Create measure page layout 
> 
>
> Key: GRIFFIN-166
> URL: https://issues.apache.org/jira/browse/GRIFFIN-166
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
> Fix For: 1.0.0-incubating
>
> Attachments: Screen Shot 2018-05-17 at 4.28.03 PM.png
>
>
> current layout is attached 




[jira] [Updated] (GRIFFIN-166) [UI] fix Metrics Publish layout

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-166:
---
Summary: [UI] fix Metrics Publish layout   (was: [UI] fix Publish layout )

> [UI] fix Metrics Publish layout 
> 
>
> Key: GRIFFIN-166
> URL: https://issues.apache.org/jira/browse/GRIFFIN-166
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
> Fix For: 1.0.0-incubating
>
> Attachments: Screen Shot 2018-05-17 at 4.28.03 PM.png
>
>
> current layout is attached 




[jira] [Closed] (GRIFFIN-168) moderate severity security vulnerability detected in hoek < 4.2.1

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu closed GRIFFIN-168.
--
Resolution: Resolved

> moderate severity security vulnerability detected in hoek < 4.2.1 
> --
>
> Key: GRIFFIN-168
> URL: https://issues.apache.org/jira/browse/GRIFFIN-168
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Alex Lv
>Assignee: Alex Lv
>Priority: Major
>
> We found a potential security vulnerability in one of your dependencies
> *asfsecurity,*
>  
> We found a potential security vulnerability in a repository for which you 
> have been granted security alert access.
> Repository: *apache/incubator-griffin* 
> [https://github.com/apache/incubator-griffin]
> Known *moderate severity* security vulnerability detected in *hoek < 4.2.1* 
> defined in *package-lock.json* 
> [https://github.com/apache/incubator-griffin/blob/master/ui/angular/package-lock.json].
> *package-lock.json* update suggested: *hoek ~> 4.2.1*.
> _Always verify the validity and compatibility of suggestions with your 
> codebase._
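
For anyone wanting to check a lock file by hand, the condition in the alert ("hoek < 4.2.1") can be expressed as a simple version comparison. This is only a sketch with hypothetical helper names; it encodes nothing beyond the condition quoted above and handles plain numeric versions only (no pre-release tags).

```python
def parse_version(text):
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_flagged_hoek(version):
    """True if the version falls under the quoted condition 'hoek < 4.2.1'."""
    return parse_version(version) < (4, 2, 1)


if __name__ == "__main__":
    for v in ("2.16.3", "4.2.0", "4.2.1"):
        print(v, "flagged" if is_flagged_hoek(v) else "ok")
```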




[jira] [Commented] (GRIFFIN-170) [Measure] Refactor read steps to take the read responsibility instead of data connector and streaming data cache client

2018-07-06 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534555#comment-16534555
 ] 

Lionel Liu commented on GRIFFIN-170:


modify TimestampStorage to TimestampAnchor, or try to make it optional

> [Measure] Refactor read steps to take the read responsibility instead of data 
> connector and streaming data cache client
> ---
>
> Key: GRIFFIN-170
> URL: https://issues.apache.org/jira/browse/GRIFFIN-170
> Project: Griffin (Incubating)
>  Issue Type: Task
>Affects Versions: 1.0.0-incubating
>    Reporter: Lionel Liu
>Assignee: Lionel Liu
>Priority: Major
> Fix For: 1.0.0-incubating
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Currently, Griffin actually reads data via the data connector (in batch mode) 
> and the streaming data cache client (in streaming mode). It should be 
> refactored to read data via read steps, to make the concept much easier for 
> users.




[jira] [Closed] (GRIFFIN-171) [UI] UI changes to be align with service latest api

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu closed GRIFFIN-171.
--
Resolution: Resolved

> [UI] UI changes to be align with service latest api
> ---
>
> Key: GRIFFIN-171
> URL: https://issues.apache.org/jira/browse/GRIFFIN-171
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
>
>  
> need UI changes corresponding to changes in PR: 
> https://github.com/apache/incubator-griffin/pull/320




[jira] [Updated] (GRIFFIN-171) [UI] UI changes to be align with service latest api

2018-07-06 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-171:
---
Summary: [UI] UI changes to be align with service latest api  (was: [UI] UI 
changes corresponding to PR320)

> [UI] UI changes to be align with service latest api
> ---
>
> Key: GRIFFIN-171
> URL: https://issues.apache.org/jira/browse/GRIFFIN-171
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
>
>  
> need UI changes corresponding to changes in PR: 
> https://github.com/apache/incubator-griffin/pull/320




[jira] [Commented] (GRIFFIN-173) [Measure] Support JDBC connection as data source

2018-07-05 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534369#comment-16534369
 ] 

Lionel Liu commented on GRIFFIN-173:


[~maver1ck], Spark Streaming splits data from a streaming source like Kafka into 
mini-batches and generates a data set with the arrival timestamp for each one, 
so the DQ metrics in streaming mode actually describe these mini-batches. 

The TimestampStorage in the streaming data connector stores these timestamps; 
in the batch data connector, to stay consistent, we store the application 
timestamp as well.

The stored timestamps help in the streaming process and are not important for 
the batch process.
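
To illustrate the idea only (this is neither Spark's nor Griffin's implementation), a toy sketch of grouping records into mini-batches keyed by a timestamp could look like this. The function name, the record shape `(arrival_ms, value)` and the interval are all assumptions.

```python
from collections import defaultdict


def to_mini_batches(records, interval_ms):
    """Group (arrival_ms, value) records into batches keyed by batch start time."""
    batches = defaultdict(list)
    for arrival_ms, value in records:
        # Floor the arrival time to the start of its interval.
        batch_start = (arrival_ms // interval_ms) * interval_ms
        batches[batch_start].append(value)
    return dict(sorted(batches.items()))


if __name__ == "__main__":
    sample = [(1000, "a"), (1500, "b"), (2100, "c")]
    for tmst, values in to_mini_batches(sample, 1000).items():
        # A DQ metric would be computed per batch and stored with its timestamp.
        print(tmst, values)
```

Each key plays the role of the stored timestamp: in streaming mode there is one per mini-batch, while a batch job has just a single application timestamp.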

> [Measure] Support JDBC connection as data source
> 
>
> Key: GRIFFIN-173
> URL: https://issues.apache.org/jira/browse/GRIFFIN-173
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: Maciej Bryński
>Assignee: William Guo
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DoD: 
> Support JDBC connection as data source.




[jira] [Commented] (GRIFFIN-173) [Measure] Support JDBC connection as data source

2018-07-05 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1652#comment-1652
 ] 

Lionel Liu commented on GRIFFIN-173:


Hi [~maver1ck], 

I agree with you. Spark already supports JDBC, which makes it much easier.

The only thing we need to do is implement a new batch data connector for 
JDBC, with a "JDBC" type supported here: 
[https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/datasource/connector/DataConnectorFactory.scala#L59]

I suggest you try implementing and testing this; it would be a good chance to 
get to know Griffin and help it grow.
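
As a starting point, the Spark side of such a connector mostly comes down to passing the standard JDBC options to the DataFrame reader. The sketch below only assembles those options in Python; `spark_jdbc_options` is a hypothetical helper, and the URL, table and credentials are placeholders. A real Griffin connector would be written in Scala inside the measure module, and its config keys are not defined anywhere in this thread.

```python
def spark_jdbc_options(url, table, user, password, driver=None):
    """Assemble the option map for Spark's built-in "jdbc" data source."""
    options = {"url": url, "dbtable": table, "user": user, "password": password}
    if driver:
        # Optional: the JDBC driver class name.
        options["driver"] = driver
    return options


if __name__ == "__main__":
    opts = spark_jdbc_options(
        "jdbc:postgresql://dbhost:5432/mydb",  # placeholder connection details
        "public.accounts", "griffin", "secret",
        driver="org.postgresql.Driver",
    )
    # With PySpark available this would be used as:
    #   df = spark.read.format("jdbc").options(**opts).load()
    print(opts)
```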

 

Thanks,

Lionel

> [Measure] Support JDBC connection as data source
> 
>
> Key: GRIFFIN-173
> URL: https://issues.apache.org/jira/browse/GRIFFIN-173
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: Maciej Bryński
>Assignee: William Guo
>Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DoD: 
> Support JDBC connection as data source.




[jira] [Created] (GRIFFIN-170) [Measure] Refactor read steps to take the read responsibility instead of data connector and streaming data cache client

2018-06-22 Thread Lionel Liu (JIRA)

Lionel Liu created GRIFFIN-170:
--

 Summary: [Measure] Refactor read steps to take the read 
responsibility instead of data connector and streaming data cache client
 Key: GRIFFIN-170
 URL: https://issues.apache.org/jira/browse/GRIFFIN-170
 Project: Griffin (Incubating)
  Issue Type: Task
Affects Versions: 1.0.0-incubating
Reporter: Lionel Liu
Assignee: Lionel Liu
 Fix For: 1.0.0-incubating


Currently, griffin reads data via the data connector (in batch mode) and the 
streaming data cache client (in streaming mode). It should be refactored to 
read data via read steps, which makes the concept much easier for users.




Re: Griffin Web API

2018-06-05 Thread Lionel Liu

Hi Karan,

For the web APIs, you can import the json files here
https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman
into your postman, and here's the api guide document:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/api-guide.md

For your questions:

1. You create a measure to define the accuracy or profiling task, and then
create a job to schedule that measure; that's how it works.
Currently, in batch mode, a job is scheduled by a cron expression, which is
usually used for hourly or daily jobs. Griffin doesn't supply an API for
one-time jobs; you could try a cron expression that triggers only once, or
just schedule the job every few minutes instead.

2. We call the calculation results metrics. You can follow the document:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/api-guide.md#metrics
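
For the one-time trigger idea in point 1, here is a minimal sketch in Python.
It assumes the scheduler accepts Quartz-style cron expressions (seconds,
minutes, hours, day of month, month, day of week, optional year); please
verify that against the Griffin version you run. The helper name
one_time_cron is hypothetical and not part of Griffin.

```python
from datetime import datetime


def one_time_cron(when: datetime) -> str:
    """Build a Quartz-style cron expression that matches a single instant.

    Assumption: the scheduler supports the optional 7th (year) field.
    '?' means "no specific value" for day-of-week in Quartz syntax.
    """
    return (f"{when.second} {when.minute} {when.hour} "
            f"{when.day} {when.month} ? {when.year}")


# Recurring schedules in the same (assumed) syntax, for comparison.
HOURLY = "0 0 * * * ?"  # at minute 0 of every hour
DAILY = "0 0 1 * * ?"   # every day at 01:00:00

print(one_time_cron(datetime(2018, 6, 6, 9, 30, 0)))
```

Because the year field pins the expression to one instant, the job should
fire only once if the scheduler honours that field.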

Thanks,
Lionel

On Tue, Jun 5, 2018 at 1:44 PM, Karan Gupta  wrote:

> Hi Lionel,
>
>
>
> I was trying to explore the Web API services of Griffin,
>
>
>
> I have following questions
>
>
>
>1. How to invoke a web-service API for a profiling/accuracy task? (or)
>Can we?
>2. How to poll for result?
>
>
>
> Thank you,
>
> Karan Gupta
> --
> Any comments or statements made in this email are not necessarily those of
> Tavant Technologies. The information transmitted is intended only for the
> person or entity to which it is addressed and may contain confidential
> and/or privileged material. If you have received this in error, please
> contact the sender and delete the material from any computer. All emails
> sent from or to Tavant Technologies may be subject to our monitoring
> procedures.
>

[jira] [Commented] (GRIFFIN-167) Griffin practical use cases

2018-06-04 Thread Lionel Liu (JIRA)



[ 
https://issues.apache.org/jira/browse/GRIFFIN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499860#comment-16499860
 ] 

Lionel Liu commented on GRIFFIN-167:


Hi [~coal],

We have deployed griffin in the eBay production environment for several use 
cases, in both batch and streaming mode.

We also provide some user story documents on the griffin site, but because of 
some site-building issues they are not viewable there at the moment.

However, you can read the documents on GitHub:

streaming: 
https://github.com/apache/incubator-griffin-site/blob/master/source/_posts/userstory.md

batch: https://github.com/apache/incubator-griffin-site/pull/3/files

Furthermore, this email thread has some more information: 
https://lists.apache.org/thread.html/6afad173a4f2590f197a7ee4851233bdebf774d8c997fa15f9fba383@%3Cdev.griffin.apache.org%3E

For your second question: the release version can be deployed, but different 
environments may need different workarounds to access their resources. 
Griffin currently focuses on the data quality domain; for environment-specific 
issues, you can send emails to our user list 
(us...@griffin.incubator.apache.org) or dev list 
(dev@griffin.incubator.apache.org) for help.

I recommend you try the docker image first: 
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md
That would be a better way to try it out.

Welcome to apache griffin, and we can talk more if you like.

Thanks,

Lionel

> Griffin practical use cases
> ---
>
> Key: GRIFFIN-167
> URL: https://issues.apache.org/jira/browse/GRIFFIN-167
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: coal chan
>Priority: Trivial
>
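> Our company is currently building a data quality system to address the data problems that occur frequently. I came across griffin, an open source system, through a Google search; judging from the overview it can meet our needs, but searching further I found almost no concrete use cases online. So my questions are:
>  # Are there any documented cases of griffin being used directly inside eBay?
>  # The project is currently in the incubator stage; can the latest released version be used directly in a company's production environment?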




[jira] [Resolved] (GRIFFIN-123) [Measure] Code review of measure module, refactor and enhance the code style

2018-06-01 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-123.

Resolution: Done

> [Measure] Code review of measure module, refactor and enhance the code style
> 
>
> Key: GRIFFIN-123
> URL: https://issues.apache.org/jira/browse/GRIFFIN-123
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>    Reporter: Lionel Liu
>Assignee: William Guo
>Priority: Major
>   Original Estimate: 72h
>  Time Spent: 12h
>  Remaining Estimate: 60h
>
> Code review of measure module, refactor and enhance the code style




[jira] [Resolved] (GRIFFIN-159) [Measure] Refactor generation of measure metrics

2018-06-01 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-159.

Resolution: Done

> [Measure] Refactor generation of measure metrics
> 
>
> Key: GRIFFIN-159
> URL: https://issues.apache.org/jira/browse/GRIFFIN-159
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>        Assignee: Lionel Liu
>Priority: Major
> Fix For: 1.0.0-incubating
>
>
> Measure metrics are currently generated as json; we're planning to enhance 
> the metrics generation.
>  # abstract metrics data generation interface.
>  # implementation of current metric data generation.
>  # implementation of advanced metric data generation.




[jira] [Resolved] (GRIFFIN-158) Functional test of generic scheduler

2018-06-01 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-158.

   Resolution: Done
Fix Version/s: 1.0.0-incubating

> Functional test of generic scheduler
> 
>
> Key: GRIFFIN-158
> URL: https://issues.apache.org/jira/browse/GRIFFIN-158
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Kevin Yao
>Assignee: Kevin Yao
>Priority: Major
> Fix For: 1.0.0-incubating
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Functional test of generic schedule for both streaming and batch jobs.




[jira] [Assigned] (GRIFFIN-145) [Service] Refactor of service API

2018-06-01 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu reassigned GRIFFIN-145:
--

Assignee: wanyin  (was: Kevin Yao)

> [Service] Refactor of service API 
> --
>
> Key: GRIFFIN-145
> URL: https://issues.apache.org/jira/browse/GRIFFIN-145
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>Assignee: wanyin
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>





Re: [REVIEW] Apache griffin podling report for June

2018-06-01 Thread Lionel Liu

LGTM

Thanks,
Lionel

On Wed, May 30, 2018 at 1:30 PM, wenzhao  wrote:

> Dear all,
>
> Please help to review the griffin podling report for June.
>
> --
> Griffin
>
> Griffin is an open source Data Quality solution for distributed data systems
> at any scale in both streaming and batch data contexts.
>
> Griffin has been incubating since 2016-12-05.
>
> Three most important issues to address in the move towards graduation:
>
>   1. Preparation for graduation.
>   2. Grow and market the community.
>   3. Improve user guide documents.
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
> of?
>
>   - None
>
> How has the community developed since the last report?
>
>   - 3 new committers have been elected.
>   - More contributors have contributed to our community.
>   - More users have contacted us about use cases.
>
> How has the project developed since the last report?
>
>   - Active development is moving on well, 53 commits in last three months.
>   - Released new version 0.2.0-incubating.
>   - Support measure job scheduler for batch mode.
>   - Use postgresql instead of mysql by default.
>   - Upgrade default spark version from 1.6.x to 2.2.1 in measure module.
>   - Upgrade default hive version from 1.2.1 to 2.2.0 in measure module.
>   - Refactor measure module for clearer structure.
>   - Working toward the next version release.
>
> How would you assess the podling's maturity?
>
>   [] Initial setup
>   [] Working towards first release
>   [] Community building - One new contributor
>   [X] Nearing graduation
>   [] Other:
>
> Date of last release:
>
>   - May 16th, 2018
>
> When were the last committers or PPMC members elected?
>
>   - 3 new committers were elected on Mar 30th, 2018
>
> Signed-off-by:
>
>   [ ](griffin) Henry Saputra
>  Comments:
>   [ ](griffin) Kasper Sørensen
>  Comments:
>   [ ](griffin) Uma Maheswara Rao Gangumalla
>  Comments:
>   [ ](griffin) Luciano Resende
>  Comments:
> --
>
> thanks, Vincent
>

[jira] [Resolved] (GRIFFIN-91) [Server] Enhance livy configuration, to support all the parameters of livy in users' environments

2018-06-01 Thread Lionel Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/GRIFFIN-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-91.
---
   Resolution: Fixed
Fix Version/s: 1.0.0-incubating

Issue resolved by pull request 290
[https://github.com/apache/incubator-griffin/pull/290]

> [Server] Enhance livy configuration, to support all the parameters of livy in 
> users' environments
> -
>
> Key: GRIFFIN-91
> URL: https://issues.apache.org/jira/browse/GRIFFIN-91
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>        Reporter: Lionel Liu
>Assignee: Kevin Yao
>Priority: Major
>  Labels: SP_5
> Fix For: 1.0.0-incubating
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
>  
> In some environments, users need to submit spark jobs through livy with some 
> specific parameters like "-negotiate", we need to support all the parameters 
> in griffin service solution




Re: Issues with running Griffin Docker

2018-05-28 Thread Lionel Liu

Hi Xuexu,

In the document of griffin docker guide:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md#environment-preparation

In step 2, we need to increase vm.max_map_count to use the elasticsearch docker
image:
*sysctl -w vm.max_map_count=262144*

That is the command for linux; for mac, could you search for the equivalent
command?


Thanks,
Lionel



On Mon, May 28, 2018 at 10:59 PM, XUEXU GUO 
wrote:

> Hi,
>
> I am running Docker stuff on my local(Mac).
>
> I always got ES container quit for some unknown reason. I checked the ES
> contain logs, the last log is the following:
>  [2018-05-28T13:07:53,441][INFO ][o.e.n.Node   ] []
> initializing ..
>
> And the same time, griffin container doesn't respond the request to
> http://locahost:38080
>
>
> the following is the result of "docker ps -a"
>
> CONTAINER ID   IMAGE                            COMMAND                  CREATED       STATUS                     NAMES
> b2ba28a3744e   bhlx3lyx7/griffin_spark2:0.2.0   "/etc/bootstrap-all.…"   2 hours ago   Up 2 hours                 griffin
> 5c0354da7fe2   bhlx3lyx7/elasticsearch          "/docker-entrypoint.…"   2 hours ago   Exited (137) 2 hours ago   es
>
> PORTS (griffin): 6066/tcp, 8030-8033/tcp, 8040/tcp, 9000/tcp, 10020/tcp,
> 19888/tcp, 27017/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp,
> 50090/tcp, 0.0.0.0:32122->2122/tcp, 0.0.0.0:33306->3306/tcp,
> 0.0.0.0:35432->5432/tcp, 0.0.0.0:38042->8042/tcp, 0.0.0.0:38080->8080/tcp,
> 0.0.0.0:38088->8088/tcp, 0.0.0.0:38998->8998/tcp, 0.0.0.0:39083->9083/tcp
>

Re:Profiling Job for multiple tables

2018-05-25 Thread Lionel Liu

Hi Karan,


I think it can work, even if it seems a little strange. Griffin supports multiple 
data sources, as in accuracy measures. You can declare 4 data sources with 
different names.
However, you need to declare a rule for each data source.
For example, if you have data sources s1, s2, s3 and s4, 
you need to declare rules like this:
"rules": [
  {
    "rule": "select count(*) from s1",
    ...
  },
  {
    "rule": "select count(*) from s2",
    ...
  },
  {
    "rule": "select count(*) from s3",
    ...
  },
  {
    "rule": "select count(*) from s4",
    ...
  }
]
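
If the list of sources grows, the repeated rules could be generated rather
than typed by hand. This is only a sketch in Python: the helper name
count_rules is hypothetical, and the fields elided with "..." above are
deliberately left out rather than guessed.

```python
import json


def count_rules(source_names):
    """Return one row-count rule per data source name.

    Only the "rule" field comes from the example in this thread; any other
    fields a real rule needs must be added by the caller.
    """
    return [{"rule": f"select count(*) from {name}"} for name in source_names]


rules = count_rules(["s1", "s2", "s3", "s4"])
print(json.dumps({"rules": rules}, indent=2))
```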


--

Regards,
Lionel, Liu

At 2018-05-25 19:31:06, "Karan Gupta" <karan.gu...@tavant.com> wrote:


Hi Lionel,

 

I want to run a custom profiling job for multiple tables in one instance. Is it 
achievable through Griffin? If yes, could you guide me as to how to declare 
more than 4 sources in the config file and use them in the profiling job?

 

 

Thank you,

Karan Gupta


Re: Unable to persist profiling results in HDFS

2018-05-24 Thread Lionel Liu

Hi Karan,

It looks good to me, but there may be one small issue.
Earlier we used "evaluateRule" as the field name, but it was later changed to
"evaluate.rule". You need to check which one the code you are using expects,
in this part:
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/config/params/user/UserParam.scala#L30
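
If it turns out your build expects the other spelling, the key can be
switched without editing the file by hand. A minimal sketch in Python; the
helper name normalize_rule_key is hypothetical, and which spelling is
correct for your build still has to be confirmed in UserParam.scala.

```python
import json


def normalize_rule_key(config: dict, wanted: str = "evaluate.rule") -> dict:
    """Return a copy of config whose rule section is stored under `wanted`.

    Both spellings mentioned in this thread are recognised. The input dict
    is not modified.
    """
    known = ("evaluateRule", "evaluate.rule")
    fixed = dict(config)
    for key in known:
        if key != wanted and key in fixed:
            # Keep an existing `wanted` entry if there is one, then drop
            # the other spelling.
            fixed.setdefault(wanted, fixed[key])
            del fixed[key]
    return fixed


config = {"name": "CheckAlphaNumeric", "evaluateRule": {"rules": []}}
print(json.dumps(normalize_rule_key(config), indent=2))
```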

If this does not solve your issue, please show me the application log.

Thanks,
Lionel


On Thu, May 24, 2018 at 2:38 PM, Karan Gupta  wrote:

> Hi Lionel,
>
>
>
> I created a custom config.json and defined a custom rule to run is spark-sql> and submitted it through spark. The spark submit runs fine
> without any issues. On HDFS, I could see the directory with the custom rule
> name but I am unable to see the _METRIC file where the results will be
> persisted, I only see a _START file. What am I missing here?
>
>
>
> HDFS -> ../persist/CheckAlphaNumeric/1527143022495/_START
>
>
>
> Config.json ->
>
>
>
>
>
> {
>   "name": "CheckAlphaNumeric",
>   "process.type": "batch",
>   "data.sources": [
>     {
>       "name": "src",
>       "connectors": [
>         {
>           "type": "hive",
>           "version": "1.2",
>           "config": {
>             "database": "griffined",
>             "table.name": "check_table"
>           }
>         }
>       ]
>     }
>   ],
>   "evaluateRule": {
>     "rules": [
>       {
>         "dsl.type": "spark-sql",
>         "dq.type": "profiling",
>         "name": "checkalphnumeric",
>         "rule": "SELECT count(name) FROM src WHERE name REGEXP '^[a-zA-Z0-9]+$'",
>         "metric": {
>           "name": "check_rules"
>         }
>       }
>     ]
>   }
> }
>
>
>
>
>

[jira] [Resolved] (GRIFFIN-166) [UI] fix Publish layout

2018-05-21 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-166.

   Resolution: Fixed
Fix Version/s: 1.0.0-incubating

Issue resolved by pull request 283
[https://github.com/apache/incubator-griffin/pull/283]

> [UI] fix Publish layout 
> 
>
> Key: GRIFFIN-166
> URL: https://issues.apache.org/jira/browse/GRIFFIN-166
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Juan Li
>Assignee: Juan Li
>Priority: Major
> Fix For: 1.0.0-incubating
>
> Attachments: Screen Shot 2018-05-17 at 4.28.03 PM.png
>
>
> current layout is attached 




[jira] [Resolved] (GRIFFIN-163) [Measure] Merge 0.2.0 for spark 2 into master branch

2018-05-16 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-163.

   Resolution: Fixed
Fix Version/s: 1.0.0-incubating

Issue resolved by pull request 282
[https://github.com/apache/incubator-griffin/pull/282]

> [Measure] Merge 0.2.0 for spark 2 into master branch
> 
>
> Key: GRIFFIN-163
> URL: https://issues.apache.org/jira/browse/GRIFFIN-163
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>        Assignee: Lionel Liu
>Priority: Major
> Fix For: 1.0.0-incubating
>
>





[jira] [Updated] (GRIFFIN-158) Functional test of generic scheduler

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-158:
---
Sprint: Apache Sprint 4  (was: Apache Sprint 3)

> Functional test of generic scheduler
> 
>
> Key: GRIFFIN-158
> URL: https://issues.apache.org/jira/browse/GRIFFIN-158
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: Kevin Yao
>Assignee: Kevin Yao
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Functional test of generic schedule for both streaming and batch jobs.




[jira] [Updated] (GRIFFIN-146) [Service] Prepare and test job state and action service

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-146:
---
Sprint: Apache Sprint 4  (was: Apache Sprint 3)

> [Service] Prepare and test job state and action service
> ---
>
> Key: GRIFFIN-146
> URL: https://issues.apache.org/jira/browse/GRIFFIN-146
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>Assignee: Yuqin Xuan
>Priority: Major
>
> Cherry-pick and test, then push to master.




[jira] [Updated] (GRIFFIN-145) [Service] Refactor of service API

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-145:
---
Sprint: Apache Sprint 2, Apache Sprint 4  (was: Apache Sprint 2, Apache 
Sprint 3)

> [Service] Refactor of service API 
> --
>
> Key: GRIFFIN-145
> URL: https://issues.apache.org/jira/browse/GRIFFIN-145
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>Assignee: Kevin Yao
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>





[jira] [Updated] (GRIFFIN-115) [Measure][UT] Enhance UT of rule part in measure module

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-115:
---
Sprint: Apache Sprint 2, Apache Sprint 4  (was: Apache Sprint 2, Apache 
Sprint 3)

> [Measure][UT] Enhance UT of rule part in measure module
> ---
>
> Key: GRIFFIN-115
> URL: https://issues.apache.org/jira/browse/GRIFFIN-115
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>Assignee: William Guo
>Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> In measure module, rule part needs to enhance UT, to cover more conditions.




[jira] [Updated] (GRIFFIN-159) [Measure] Refactor generation of measure metrics

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-159:
---
Sprint: Apache Sprint 4  (was: Apache Sprint 3)

> [Measure] Refactor generation of measure metrics
> 
>
> Key: GRIFFIN-159
> URL: https://issues.apache.org/jira/browse/GRIFFIN-159
> Project: Griffin (Incubating)
>  Issue Type: Task
>    Reporter: Lionel Liu
>        Assignee: Lionel Liu
>Priority: Major
> Fix For: 1.0.0-incubating
>
>
> Measure metrics are currently generated as json; we're planning to enhance 
> the metrics generation.
>  # abstract metrics data generation interface.
>  # implementation of current metric data generation.
>  # implementation of advanced metric data generation.




[jira] [Resolved] (GRIFFIN-55) [Server & UI] Profiling process Design

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu resolved GRIFFIN-55.
---
Resolution: Done

> [Server & UI] Profiling process Design 
> ---
>
> Key: GRIFFIN-55
> URL: https://issues.apache.org/jira/browse/GRIFFIN-55
> Project: Griffin (Incubating)
>  Issue Type: Task
>Affects Versions: 0.2.0-incubating
>    Reporter: Lionel Liu
>Assignee: Hang Hu
>Priority: Major
> Fix For: 0.2.0-incubating
>
>
> DoD: Design doc of the profiling process, including the interface between ui 
> and server.




[jira] [Closed] (GRIFFIN-63) [Server] [UI] add field "pattern" in data.sources of measure json

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu closed GRIFFIN-63.
-
Resolution: Won't Fix

> [Server] [UI]  add  field "pattern" in data.sources of measure json
> ---
>
> Key: GRIFFIN-63
> URL: https://issues.apache.org/jira/browse/GRIFFIN-63
> Project: Griffin (Incubating)
>  Issue Type: Task
>Affects Versions: 0.2.0-incubating
>Reporter: deyi
>Priority: Major
> Fix For: 0.2.0-incubating
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
>   Add  field "pattern" in data.sources of measure json,so that it does 
> not need to match type of data source , increasing server code flexibility 
> and extensibility.




[jira] [Updated] (GRIFFIN-79) Some tasks griffin ops needs to optimized for cluster env

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-79:
--
Fix Version/s: (was: 0.2.0-incubating)
   1.0.0-incubating

> Some tasks griffin ops needs to optimized for cluster env
> -
>
> Key: GRIFFIN-79
> URL: https://issues.apache.org/jira/browse/GRIFFIN-79
> Project: Griffin (Incubating)
>  Issue Type: Task
>Reporter: William Guo
>Assignee: William Guo
>Priority: Major
> Fix For: 1.0.0-incubating
>
>
> Hi there,
> Below are meeting notes for 2018 Q1:
> Issues:
>    1. Data recovery: when a job hangs (dq=0 or stops), data is lost after
>    the job is restarted
>    2. HDFS maintenance: the griffin job produces no output, because it
>    fails to connect to hadoop
>    3. When the count priority job hangs (dq=0 or stops): may be caused by
>    an app memory leak
>    4. Jobs run unstably and need to be restarted too often
>1. Multi-connector support
>2. Uniqueness
>3. Tagged-metrics
>4. Freshness
>5. Validity
> Best Regards,
> Jenny




[jira] [Updated] (GRIFFIN-144) Added support for Elastic Search 6.

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-144:
---
Affects Version/s: (was: 0.2.0-incubating)
   1.0.0-incubating

> Added support for Elastic Search 6.
> ---
>
> Key: GRIFFIN-144
> URL: https://issues.apache.org/jira/browse/GRIFFIN-144
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 1.0.0-incubating
>Reporter: Sparsh Singhal
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> We have to specify the content type when posting data to and getting data 
> from Elasticsearch. Created a pull request on GitHub.
>  
> For Elastic Search 6 requirements: 
> [https://www.elastic.co/blog/strict-content-type-checking-for-elasticsearch-rest-requests]
>  




[jira] [Updated] (GRIFFIN-144) Added support for Elastic Search 6.

2018-05-15 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-144:
---
Fix Version/s: (was: 0.2.0-incubating)
   1.0.0-incubating

> Added support for Elastic Search 6.
> ---
>
> Key: GRIFFIN-144
> URL: https://issues.apache.org/jira/browse/GRIFFIN-144
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Affects Versions: 1.0.0-incubating
>Reporter: Sparsh Singhal
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> We have to specify the content type when posting data to and getting data 
> from Elasticsearch. Created a pull request on GitHub.
>  
> For Elastic Search 6 requirements: 
> [https://www.elastic.co/blog/strict-content-type-checking-for-elasticsearch-rest-requests]
>  




[ANNOUNCE] Apache Griffin-0.2.0-incubating released

2018-05-15 Thread Lionel Liu

Hi all,

The Apache Griffin (incubating) team is pleased to announce the release of
Griffin 0.2.0-incubating.

Apache Griffin is a data quality solution for modern data systems.
It defines a standard process to define and measure data quality for
well-known dimensions.

The release is available at:
https://www.apache.org/dyn/closer.cgi/incubator/griffin

Thanks,

The Apache Griffin (incubating) team

=
*DISCLAIMER*
Apache Griffin is an effort undergoing incubation at The Apache Software
Foundation (ASF), sponsored by Incubator.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other successful
ASF projects.
While incubation status is not necessarily a reflection of the completeness
or stability of the code, it does indicate that the project has yet to be
fully endorsed by the ASF.

Re: Griffin DQ Metric Populated

2018-05-15 Thread Lionel Liu

Hi Karan,

That's cool you've made it. Thanks for your contribution.

Actually the code you mentioned has been fixed by someone else:
https://github.com/apache/incubator-griffin/tree/master/measure/src/main/scala/org/apache/griffin/measure/persist#L68
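
For readers following along, the patch quoted below boils down to sending
the metric with an explicit "Content-Type: application/json" header. Here is
a small sketch of that idea in Python (not the Scala code in
HttpPersist.scala). The endpoint host and the payload values are
placeholders made up for illustration, and the request is only built, not
sent.

```python
import json
import urllib.request


def build_metric_request(url: str, metric: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST with an explicit Content-Type."""
    body = json.dumps(metric).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
req = build_metric_request(
    "http://es-host:9200/griffin/accuracy",
    {"name": "accu_job", "tmst": 0,
     "value": {"total": 1, "miss": 0, "matched": 1}},
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs
# without a server.
print(req.get_method(), req.header_items())
```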

Thanks,
Lionel

On Tue, May 15, 2018 at 8:52 PM, Karan Gupta <karan.gu...@tavant.com> wrote:

> Hi Lionel,
>
>
>
> We were finally able to populate the Griffin DQ Metrics and view the same
> on the Griffin Console.
>
>
>
> Here is the final adjustment I made in the HttpPersist.scala -> def
> httpResult
>
>
>
> Patching -> val header = Map[String, Object]("Content-Type"->"
> application/json")
>
>
>
> Reference -> curl -X POST "http:///griffin/accuracy/" -H
> "Content-Type:application/json" -d @
>
>
>
> The above patching worked and now I am able to see the Griffin DQ Metrics.
>
>
>
>
>
> We are using Spark 1.6.2 in our HDP cluster.. May be, could that be the
> reason?
>
> The scalaj.http libraries that we use, probably are older… may be…….Not
> sure..
>
>
>
> Anyway, All is well that ends well 
>
>
>
> A Big THANKS to @Lionel Liu <lionel...@apache.org> for all his help
>
>
>
> Thank you,
>
> Karan Gupta
>

Re:RE: No Index Formation in Elastic Search

2018-05-12 Thread Lionel Liu

Hi Karan,


Great that it works for you.
As far as I know, ES does not require us to create indices before inserting 
data: it can generate the index and the schema when the first value is 
inserted. 
In our environment we didn't create any index or mapping schema in ES, and it 
works as well. I don't know why it fails in your environment. Which version 
of ES are you using?


This is why we don't have any document about the ES index. If different 
versions of ES behave differently, we'll investigate further and fix it.


--

Regards,
Lionel, Liu



At 2018-05-11 17:42:22, "Karan Gupta" <karan.gu...@tavant.com> wrote:
>Hi Lionel,
>
>Thank you for your quick revert.
>
>I recreated the ES index as you suggested.
>I no more see any errors on Griffin console as I used to see earlier.
>But I don’t see any documents on the ES index either…
>The Jobs are running and completing though and HDFS is having the latest job 
>run metrics.
>
>Any suggestions here?
>
>Env.json has "method": "post" for ES persist part.
>Should it be POST?
>
>Thanks,
>Best,
>Karan
>From: Lionel Liu <lionel...@apache.org>
>Sent: Friday, May 11, 2018 3:45 PM
>To: Karan Gupta <karan.gu...@tavant.com>
>Cc: dev@griffin.incubator.apache.org
>Subject: Re: No Index Formation in Elastic Search
>
>Hi Karan,
>
>I've double checked my environment, sorry for the last reply, I pasted the old 
>version one.
>In the current version, the metric does like this:
>{
>  "name" : "accu_job",
>  "tmst" : 152481240,
>  "value" : {
>"total" : 125000,
>"miss" : 505,
>"matched" : 124495
>  }
>}
>
>I curl for mapping schema by this command:
>curl -XGET ':9200/_mapping?pretty=true'
>
>And get the schema like this:
>{
>  "griffin" : {
>"mappings" : {
>  "accuracy" : {
>"properties" : {
>  "name" : {
>"type" : "text",
>"fields" : {
>  "keyword" : {
>"type" : "keyword",
>"ignore_above" : 256
>  }
>}
>  },
>  "tmst" : {
>"type" : "long"
>  },
>  "value" : {
>"properties" : {
>  "matched" : {
>"type" : "long"
>  },
>  "miss" : {
>"type" : "long"
>  },
>  "total" : {
>"type" : "long"
>  }
>}
>  }
>}
>  }
>}
>  }
>}
>
>It's a bit different with the metrics persisted on hdfs, "name" equals 
>"metricName", "tmst" equals "timestamp", and the "value" fields are exactly 
>the same.
>{"metricName":"job_names","timestamp":152580492,"value":{"total":19,"miss":2,"matched":17}}
>
>For the details you can refer to:
>https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334
>https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110
>
>A later version may refactor the metrics schema; any such change will be 
>highlighted in the release notes.
>
>
>Hope this helps you.
>
>Thanks,
>Lionel
>
>On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <karan.gu...@tavant.com> wrote:

Re: No Index Formation in Elastic Search

2018-05-11 Thread Lionel Liu

Hi Karan,

I've double-checked my environment. Sorry about my last reply: I pasted the
one from the old version.
In the current version, the metric looks like this:
{
  "name" : "accu_job",
  "tmst" : 152481240,
  "value" : {
    "total" : 125000,
    "miss" : 505,
    "matched" : 124495
  }
}

I fetched the mapping schema with this command:
curl -XGET ':9200/_mapping?pretty=true'

and got the following schema:
{
  "griffin" : {
    "mappings" : {
      "accuracy" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "tmst" : {
            "type" : "long"
          },
          "value" : {
            "properties" : {
              "matched" : {
                "type" : "long"
              },
              "miss" : {
                "type" : "long"
              },
              "total" : {
                "type" : "long"
              }
            }
          }
        }
      }
    }
  }
}
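
As a rough illustration of how a metric document lines up with this mapping, here is a minimal Python sketch. It is not part of Griffin: the `conforms` helper and the `ACCURACY_PROPERTIES` name are hypothetical, and the check is stricter than Elasticsearch itself, which does not require every mapped field to be present.

```python
# Field types copied from the "accuracy" mapping above. The "keyword"
# sub-field of "name" is left out because it does not affect this check.
ACCURACY_PROPERTIES = {
    "name": {"type": "text"},
    "tmst": {"type": "long"},
    "value": {
        "properties": {
            "matched": {"type": "long"},
            "miss": {"type": "long"},
            "total": {"type": "long"},
        }
    },
}


def conforms(doc, properties):
    """Return True if every mapped field is present in doc with a matching type.

    Hypothetical local check only; it is an assumption of this sketch,
    not how Elasticsearch validates documents.
    """
    for field, spec in properties.items():
        if field not in doc:
            return False
        value = doc[field]
        if "properties" in spec:
            # Nested object: recurse into its own properties.
            if not isinstance(value, dict) or not conforms(value, spec["properties"]):
                return False
        elif spec.get("type") == "long":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                return False
        elif spec.get("type") in ("text", "keyword"):
            if not isinstance(value, str):
                return False
    return True


sample = {
    "name": "accu_job",
    "tmst": 152481240,
    "value": {"total": 125000, "miss": 505, "matched": 124495},
}
print(conforms(sample, ACCURACY_PROPERTIES))
```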

It's a bit different from the metrics persisted on HDFS: "name" corresponds to
"metricName", "tmst" corresponds to "timestamp", and the "value" fields are
exactly the same.

{"metricName":"job_names","timestamp":152580492,"value":{"total":19,"miss":2,"matched":17}}
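
The field renaming described above can be sketched in a few lines of Python. This is only an illustration of the correspondence between the two formats; the function name `hdfs_metric_to_es_doc` is hypothetical, and this is not the code in HdfsPersist.scala or HttpPersist.scala.

```python
import json


def hdfs_metric_to_es_doc(record):
    """Rename the HDFS-persisted field names to the ones used in Elasticsearch.

    Hypothetical helper: "metricName" becomes "name", "timestamp" becomes
    "tmst", and the "value" object is copied unchanged.
    """
    return {
        "name": record["metricName"],
        "tmst": record["timestamp"],
        "value": dict(record["value"]),
    }


# The sample line from the _METRICS file quoted in this thread.
hdfs_line = (
    '{"metricName":"job_names","timestamp":152580492,'
    '"value":{"total":19,"miss":2,"matched":17}}'
)
es_doc = hdfs_metric_to_es_doc(json.loads(hdfs_line))
print(json.dumps(es_doc, indent=2))
```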


For the details you can refer to:

https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HdfsPersist.scala#L334

https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/persist/HttpPersist.scala#L110


A later version may refactor the metrics schema; any such change will be
highlighted in the release notes.


Hope this helps you.

Thanks,
Lionel

On Fri, May 11, 2018 at 5:52 PM, Karan Gupta <karan.gu...@tavant.com> wrote:

> Hi,
>
>
>
> Following is a sample JSON that is stored in HDFS by Griffin.
>
> It resides in: hdfs:///griffin/streaming/persist/job_names/1525804920000/_METRICS
>
>
>
> There are also _LOG, _START, __missRecords files created for each Job. I
> assume they are not meant for storage in ES.
>
>
>
> Sample JSON:
>
> {"metricName":"job_names","timestamp":152580492,"value":{"total":19,"miss":2,"matched":17}}
>
>
>
> This does not match the “schema” that you have outlined below.
>
>
>
> Are we using an older version of Griffin? Can you help me with some
> clarity?
>
>
>
> Thanks,
>
> Best,
>
> Karan
>
> *From:* Lionel Liu <lionel...@apache.org>
> *Sent:* Wednesday, May 9, 2018 11:36 AM
> *To:* dev@griffin.incubator.apache.org; Karan Gupta <
> karan.gu...@tavant.com>
>
> *Subject:* Re: No Index Formation in Elastic Search
>
>
>
> Hi Karan,
>
>
>
> Sorry, I left out the field "__tmst", which is the timestamp attached to
> each output value record.
>
> The mappings schema should be:
>
>
>
> {
>   "mappings": {
>     "accuracy": {
>       "properties": {
>         "name" : {"type": "keyword"},
>         "tmst" : {"type": "long"},
>         "value" : {
>           "properties": {
>             "__tmst": {"type": "long"},
>             "total": {"type": "long"},
>             "miss": {"type": "long"},
>             "matched": {"type": "long"}
>           }
>         }
>       }
>     }
>   }
> }
>
>
>
> Thanks,
>
> Lionel
>
>
>
> On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <karan.gu...@tavant.com>
> wrote:
>
> Hi Lionel,
>
> I tried the below CURL which you sent me
>
> curl -X PUT 'http:///griffin?pretty=true' -H 'Content-Type:
> application/json' -d  '{"mappings": {"accuracy": {"properties": {"name" :
> {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties":
> {"total"

[RESULT][VOTE] Release of Apache-Griffin-0.2.0-incubating [RC4]

2018-05-11 Thread Lionel Liu

Dear IPMC Community,

I am pleased to announce that the Incubator PMC has approved the release of
Apache Griffin-0.2.0-incubating.

The vote has passed with:
3 binding "+1" votes, and 4 non-binding "+1" votes
no "0" votes
no "-1" votes

The votes were

https://lists.apache.org/thread.html/668a9cc4402b4441bf75a7072ce284029669ef1334ed4395a7630be1@%3Cgeneral.incubator.apache.org%3E

+1, Henry Saputra (binding)
+1, Kasper Sørensen (binding)
+1, Justin Mclean (binding)

+1, Kevin Yao (non-binding)
+1, William Guo (non-binding)
+1, Lionel Liu (non-binding)
+1, Shao Feng Shi (non-binding)

Thank you for your support!

We'll continue with the release now.

Lionel,
on behalf of Apache Griffin PPMC

[jira] [Assigned] (GRIFFIN-129) [Community] Build charter for Griffin community

2018-05-11 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu reassigned GRIFFIN-129:
--

Assignee: Alex Lv  (was: Juan Li)

> [Community] Build charter for Griffin community
> ---
>
> Key: GRIFFIN-129
> URL: https://issues.apache.org/jira/browse/GRIFFIN-129
> Project: Griffin (Incubating)
> Issue Type: Task
> Reporter: Lionel Liu
> Assignee: Alex Lv
> Priority: Major
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Make a charter for Griffin community
> References:
> [https://xerces.apache.org/charter.pdf]
> [https://hc.apache.org/charter.html]
> [https://commons.apache.org/oldcharter.html]
> [https://xalan.apache.org/old/xalan-c/charter.html]
> [https://xmlgraphics.apache.org/charter.html]
> [https://db.apache.org/derby/derby_charter.html#Secure]




[jira] [Created] (GRIFFIN-163) [Measure] Merge 0.2.0 for spark 2 into master branch

2018-05-11 Thread Lionel Liu (JIRA)

Lionel Liu created GRIFFIN-163:
--

Summary: [Measure] Merge 0.2.0 for spark 2 into master branch
Key: GRIFFIN-163
URL: https://issues.apache.org/jira/browse/GRIFFIN-163
Project: Griffin (Incubating)
Issue Type: Task
Reporter: Lionel Liu
Assignee: Lionel Liu







[jira] [Updated] (GRIFFIN-123) [Measure] Code review of measure module, refactor and enhance the code style

2018-05-11 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-123:
---
Sprint: Apache Sprint 2, Apache Sprint 4  (was: Apache Sprint 2, Apache Sprint 3)

> [Measure] Code review of measure module, refactor and enhance the code style
> 
>
> Key: GRIFFIN-123
> URL: https://issues.apache.org/jira/browse/GRIFFIN-123
> Project: Griffin (Incubating)
> Issue Type: Improvement
> Reporter: Lionel Liu
> Assignee: William Guo
> Priority: Major
> Original Estimate: 72h
> Time Spent: 12h
> Remaining Estimate: 60h
>
> Code review of measure module, refactor and enhance the code style




[jira] [Updated] (GRIFFIN-133) [UI] Add missing records download link on UI

2018-05-11 Thread Lionel Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lionel Liu updated GRIFFIN-133:
---
Sprint: Apache Sprint 4

> [UI] Add missing records download link on UI
> 
>
> Key: GRIFFIN-133
> URL: https://issues.apache.org/jira/browse/GRIFFIN-133
> Project: Griffin (Incubating)
> Issue Type: Task
> Reporter: Lionel Liu
> Assignee: Yuqin Xuan
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When a user views the accuracy chart, provide a download link for the missing records.



