[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=468374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468374
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 10/Aug/20 02:53
Start Date: 10/Aug/20 02:53
Worklog Time Spent: 10m 
  Work Description: guoyuepeng commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-671141520


   @chitralverma 
   
   The merge process as following:
   use python 2.7
   
   - run ./merge_pr.py
   - Which pull request would you like to merge? (e.g. 34): 575 - select 575
   - Proceed with merging pull request #575? (y/n): y
   - Merge complete (local ref PR_TOOL_MERGE_PR_575_MASTER). Push to 
apache-git? (y/n): y
   - Would you like to pick 1aa8995a into another branch? (y/n): n
   - Would you like to update an associated JIRA? (y/n): y
   - Enter comma-separated fix version(s) [0.6.0]:
   
   You should have permission for this. tell me if you encounter any problem.
   
   Thanks,
   William



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468374)
Time Spent: 2h 10m  (was: 2h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=468373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-468373
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 10/Aug/20 02:50
Start Date: 10/Aug/20 02:50
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #575:
URL: https://github.com/apache/griffin/pull/575


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 468373)
Time Spent: 2h  (was: 1h 50m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=467097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467097
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 06/Aug/20 06:41
Start Date: 06/Aug/20 06:41
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-669736481


   absolutely, I'm all in for Griffin. :)
   @wankunde can you please merge this. 
   
   Also, can you tell me how the requests are merged for this project so that I 
can help close some of the open PRs. Thanks.
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467097)
Time Spent: 1h 50m  (was: 1h 40m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=467088=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-467088
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 06/Aug/20 06:20
Start Date: 06/Aug/20 06:20
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-669727067


   LGTM, @chitralverma Many thanks for your work.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 467088)
Time Spent: 1h 40m  (was: 1.5h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=452026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452026
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 28/Jun/20 12:25
Start Date: 28/Jun/20 12:25
Worklog Time Spent: 10m 
  Work Description: wankunde commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-650745454


   Hi, @chitralverma , could you provide an implementation example of the 
`open` and `close` methods?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452026)
Time Spent: 1h 10m  (was: 1h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450431
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 13:17
Start Date: 24/Jun/20 13:17
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-648813731


   @wankunde the `open` and `close` methods are for future custom sinks 
implementations, for example, Redis, JDBC etc that do not rely on spark 
datasource v1/ v2. Such data sources require one-time initialization of 
connection/ connection pool which can then be serialized to all executor each 
time the write operation is called.
   
   This PR also acts as a basic cleanup for structured streaming sinks which 
I'm working on.
   I'm also planning to rewrite HDFSSink as FileBasedSink much like 
FileBasedDataConnector and include many other sinks.
   
   Griffin is going to get really exciting.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450431)
Time Spent: 1h  (was: 50m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450428
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 13:11
Start Date: 24/Jun/20 13:11
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #575:
URL: https://github.com/apache/griffin/pull/575#discussion_r444881499



##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String

Review comment:
   absolutely, I had the same in mind but I was planning to do it as part 
of a separate config refactoring in the near future.
   
   Do you suggest I do this right now or later? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450428)
Time Spent: 50m  (was: 40m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450426
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 13:10
Start Date: 24/Jun/20 13:10
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on a change in pull request #575:
URL: https://github.com/apache/griffin/pull/575#discussion_r444880634



##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String
   val timeStamp: Long
 
   val config: Map[String, Any]
 
   val block: Boolean
 
-  def available(): Boolean
+  /**
+   * Ensures that the pre-requisites (if any) of the Sink are met before 
opening it.
+   */
+  def validate(): Boolean
 
-  def start(msg: String): Unit
-  def finish(): Unit
+  /**
+   * Allows initialization of the connection to the sink (if required).
+   *
+   * @param applicationId Spark Application ID
+   */
+  def open(applicationId: String): Unit

Review comment:
   @wankunde this is as per the existing implementation. I just changed the 
variable names to remove ambiguity and made no functional change. This has been 
done in many other places also.
   
   I'll refactor the applicationId in favor of more description soon.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450426)
Time Spent: 40m  (was: 0.5h)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=450388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450388
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 24/Jun/20 11:05
Start Date: 24/Jun/20 11:05
Worklog Time Spent: 10m 
  Work Description: wankunde commented on a change in pull request #575:
URL: https://github.com/apache/griffin/pull/575#discussion_r444802163



##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String
   val timeStamp: Long
 
   val config: Map[String, Any]
 
   val block: Boolean
 
-  def available(): Boolean
+  /**
+   * Ensures that the pre-requisites (if any) of the Sink are met before 
opening it.
+   */
+  def validate(): Boolean
 
-  def start(msg: String): Unit
-  def finish(): Unit
+  /**
+   * Allows initialization of the connection to the sink (if required).
+   *
+   * @param applicationId Spark Application ID
+   */
+  def open(applicationId: String): Unit

Review comment:
   What's the use of `applicationId `?  Can we use `jobName` instead?

##
File path: measure/src/main/scala/org/apache/griffin/measure/sink/Sink.scala
##
@@ -18,30 +18,57 @@
 package org.apache.griffin.measure.sink
 
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
 
 import org.apache.griffin.measure.Loggable
 
 /**
- * sink metric and record
+ * Base trait for batch and Streaming Sinks.
+ * To implement custom sinks, extend your classes with this trait.
  */
 trait Sink extends Loggable with Serializable {
-  val metricName: String
+
+  val jobName: String

Review comment:
   It's better to unify the names of variable, and easier to understand.
   
   In `DQConfig` is `name`, in `BatchDQApp` is `metricName`, in `DQContext` is 
`name`, in `SinkFactory` is 
   `jobName`. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 450388)
Time Spent: 0.5h  (was: 20m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=447980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447980
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 18/Jun/20 18:48
Start Date: 18/Jun/20 18:48
Worklog Time Spent: 10m 
  Work Description: chitralverma commented on pull request #575:
URL: https://github.com/apache/griffin/pull/575#issuecomment-646243010


   @guoyuepeng @wankunde Can you please review this. Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447980)
Time Spent: 20m  (was: 10m)

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (GRIFFIN-305) Standardize Sink Hierarchy

2020-06-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GRIFFIN-305?focusedWorklogId=447342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-447342
 ]

ASF GitHub Bot logged work on GRIFFIN-305:
--

Author: ASF GitHub Bot
Created on: 17/Jun/20 15:06
Start Date: 17/Jun/20 15:06
Worklog Time Spent: 10m 
  Work Description: chitralverma opened a new pull request #575:
URL: https://github.com/apache/griffin/pull/575


   **What changes were proposed in this pull request?**
   
   Currently, the implementation of `Sinks` in Griffin poses the below issues. 
This PR aims at fixing these issues.
   - `Sinks` are based on the recursive MultiSink class which is a sink itself 
but the underlying implementation is that of a `Seq` which causes ambiguity and 
isn't much useful. This has been removed.
   - Some unused code like `SinkContext` has been removed.
   - Data is converted from the performant DataFrame to RDD while persisting in 
both streaming and batch pipelines. A new method `sinkBatchRecords` has been 
added to allow operations directly on DataFrame for batch pipelines. Streaming 
will still use the old implementation which will be replaced with structured 
streaming.
   - Refactored the methods of `Sink` like changed `start`/ `finish` to `open`/ 
`close` and `jobName` was incorrectly passed as `metricName`.
   - Presently, only one instance of a sink with a given type can be defined in 
the env config. This will not allow the cases where you want to configure 
multiple sinks of same type like HDFS or JDBC. Added sink `name` to env config 
which is used to define the sink that should be used in the job config also.
   - Updated all sinks as per the changes above. With some additional changes 
to ConsoleSink
   
   **Does this PR introduce any user-facing change?**
   Yes. As mentioned above, the sink config has changed in env and job configs.
   
   How was this patch tested?
   Griffin test suite and additional unit test cases



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 447342)
Remaining Estimate: 0h
Time Spent: 10m

> Standardize Sink Hierarchy
> --
>
> Key: GRIFFIN-305
> URL: https://issues.apache.org/jira/browse/GRIFFIN-305
> Project: Griffin
>  Issue Type: Sub-task
>Reporter: Chitral Verma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)