[jira] [Resolved] (GOBBLIN-6) Support eventual consistent filesystems like S3

2017-09-07 Thread Abhishek Tiwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Tiwari resolved GOBBLIN-6.
---
Resolution: Fixed

Issue resolved by pull request #1993
[https://github.com/apache/incubator-gobblin/pull/1993]

> Support eventual consistent filesystems like S3
> ---
>
> Key: GOBBLIN-6
> URL: https://issues.apache.org/jira/browse/GOBBLIN-6
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Abhishek Tiwari
>
> In our environment we use Gobblin for shipping logs to S3. Gobblin on its
> own works pretty well, but in a couple of places it assumes the underlying fs
> is consistent, which is not true in some cases, e.g. in some S3 regions.
> To overcome this I added a couple of retries for when a created dir/file does
> not exist right away after publish.
> The following changes were added:
> - Refactored RetryWriter into gobblin-core so it can be used in WriterUtils
> without creating circular dependencies.
> - Introduced mkdirsWithRecursivePermissionWithRetry to allow retries if a
> directory does not exist right after creation on an eventually consistent fs.
> - Added retry support (data.publisher.retry.enabled=true) to publishers such as
> TimestampDataPublisher and TimePartitionedDataPublisher to support eventually
> consistent filesystem targets.
> - The tmp fs can be specified with compaction.tmp.fs in a compaction job, so
> HDFS can be used as the tmp fs while the result is stored on S3. Previously it
> was not possible to use different filesystems for tmp and target.
> - Retry can be enabled for compaction (compaction.retry.enabled=true) if you
> don't want to fail immediately when a directory does not show up right away,
> which can happen on an eventually consistent fs.
> - Added the dataset name to the compaction MR job name, which makes it
> significantly easier to identify which compaction job belongs to which dataset.
> - Some minor modifications to support non-Avro extensions.
>  
> *Github Url* : https://github.com/linkedin/gobblin/pull/1993 
> *Github Reporter* : *treff7es* 
> *Github Created At* : 2017-07-03T13:36:42Z 
> *Github Updated At* : 2017-07-10T18:24:51Z 
> h3. Comments 
> 
> *treff7es* wrote on 2017-07-03T13:41:24Z : I split my previous pull
> request (#1686) into multiple ones. This one is about supporting eventually
> consistent filesystems. I will create a separate pull request for the
> JSON-specific changes, which live in a separate module.
> This PR also contains a change that the following PR will need as well, to
> support non-.avro file extensions. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-312648695 
> 
> [~ibuenros] wrote on 2017-07-10T18:19:48Z : @htran1 can you take a look? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314191228 
> 
> [~hutran] wrote on 2017-07-10T18:24:51Z : @abti, can you also take a look 
> since you reviewed the original PR that this was split from? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/1993#issuecomment-314192685
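The retry-on-visibility idea in the change list above can be sketched as a small polling helper. This is a minimal, hypothetical sketch: `VisibilityRetry` and `waitUntilVisible` are illustrative names, not Gobblin's RetryWriter or `mkdirsWithRecursivePermissionWithRetry`. It polls a caller-supplied existence check with exponential backoff. In real use the check would likely wrap something like Hadoop's `FileSystem.exists(Path)` on the freshly created S3 path.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not Gobblin's API: polls a visibility check with
// exponential backoff until it succeeds or the attempts run out.
final class VisibilityRetry {

  private VisibilityRetry() {}

  /**
   * Returns true as soon as isVisible reports true; returns false after
   * maxAttempts failed checks or if the waiting thread is interrupted.
   */
  static boolean waitUntilVisible(BooleanSupplier isVisible, int maxAttempts,
      long initialBackoffMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoff = initialBackoffMillis;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (isVisible.getAsBoolean()) {
        return true; // the created dir/file is now visible
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(backoff);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          return false;
        }
        backoff *= 2; // double the wait before the next check
      }
    }
    return false;
  }
}
```

A publisher or compaction step could treat a `false` result as a failure. The sketch does not assume how `data.publisher.retry.enabled` or `compaction.retry.enabled` are actually wired into Gobblin.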



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (GOBBLIN-246) Create a ConsistentFsWrapper which converts all inconsistent operations in S3 to a consistent version

2017-09-07 Thread Abhishek Tiwari (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158038#comment-16158038
 ] 

Abhishek Tiwari commented on GOBBLIN-246:
-

PR that originally added support for inconsistent FS operations: 
https://github.com/apache/incubator-gobblin/pull/1993 

> Create a ConsistentFsWrapper which converts all inconsistent operations in 
> S3 to a consistent version
> --
>
> Key: GOBBLIN-246
> URL: https://issues.apache.org/jira/browse/GOBBLIN-246
> Project: Apache Gobblin
>  Issue Type: Improvement
>  Components: gobblin-core
>Reporter: Abhishek Tiwari
>Assignee: Kuai Yu
>
> Create a ConsistentFsWrapper which converts all inconsistent operations in 
> S3 to a consistent version. This will avoid a lot of tmpFs creation and make 
> the consistency handling more reusable for other modules.





[jira] [Created] (GOBBLIN-246) Create a ConsistentFsWrapper which converts all inconsistent operations in S3 to a consistent version

2017-09-07 Thread Abhishek Tiwari (JIRA)
Abhishek Tiwari created GOBBLIN-246:
---

 Summary: Create a ConsistentFsWrapper which converts all 
inconsistent operations in S3 to a consistent version
 Key: GOBBLIN-246
 URL: https://issues.apache.org/jira/browse/GOBBLIN-246
 Project: Apache Gobblin
  Issue Type: Improvement
  Components: gobblin-core
Reporter: Abhishek Tiwari
Assignee: Kuai Yu


Create a ConsistentFsWrapper which converts all inconsistent operations in S3 
to a consistent version. This will avoid a lot of tmpFs creation and make the 
consistency handling more reusable for other modules.
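The wrapper idea can be sketched as a decorator. Callers use the wrapper exactly like the underlying filesystem, but mutating calls return only once their result is visible. This is a hypothetical sketch: it assumes a minimal `SimpleFs` interface in place of Hadoop's `FileSystem` and shows only `mkdirs`. The names and retry parameters are illustrative, not the design that was eventually implemented.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical minimal filesystem abstraction standing in for Hadoop's FileSystem.
interface SimpleFs {
  boolean mkdirs(String path) throws IOException;

  boolean exists(String path) throws IOException;
}

// Sketch of a consistency-enforcing decorator: same interface as the delegate,
// but mkdirs only returns once the new directory is visible.
final class ConsistentFsWrapper implements SimpleFs {
  private final SimpleFs delegate;
  private final int maxChecks;
  private final long backoffMillis;

  ConsistentFsWrapper(SimpleFs delegate, int maxChecks, long backoffMillis) {
    if (maxChecks < 1) {
      throw new IllegalArgumentException("maxChecks must be >= 1");
    }
    this.delegate = delegate;
    this.maxChecks = maxChecks;
    this.backoffMillis = backoffMillis;
  }

  @Override
  public boolean mkdirs(String path) throws IOException {
    if (!delegate.mkdirs(path)) {
      return false; // creation itself failed; nothing to wait for
    }
    for (int check = 1; check <= maxChecks; check++) {
      if (delegate.exists(path)) {
        return true; // the directory is now visible to readers
      }
      if (check < maxChecks) {
        try {
          Thread.sleep(backoffMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted while waiting for " + path);
        }
      }
    }
    throw new IOException("Directory not visible after " + maxChecks + " checks: " + path);
  }

  @Override
  public boolean exists(String path) throws IOException {
    return delegate.exists(path); // reads pass straight through
  }
}
```

Because the retry logic lives in one wrapper, modules that only see a `SimpleFs` would get consistent behavior without each one managing its own tmp filesystem.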





[jira] [Resolved] (GOBBLIN-245) Create topic specific extract of a WorkUnit in KafkaSource

2017-09-07 Thread Hung Tran (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-245.
---
Resolution: Fixed

Issue resolved by pull request #2095
[https://github.com/apache/incubator-gobblin/pull/2095]

> Create topic specific extract of a WorkUnit in KafkaSource
> --
>
> Key: GOBBLIN-245
> URL: https://issues.apache.org/jira/browse/GOBBLIN-245
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The current KafkaSource ignores topic-specific configurations when creating the
> Extract of a WorkUnit. The task is to create the extract with topic-specific
> configurations if any, or else with job-level configurations.
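The lookup this issue asks for (a topic-specific value if present, the job-level value otherwise) can be sketched as follows. This is a hypothetical illustration. The `<topic>.<key>` naming convention and the key names used in the test are placeholders, not KafkaSource's actual configuration scheme.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: prefer a topic-specific value, fall back to the job-level one.
// The "<topic>.<key>" naming convention is an assumption for illustration only.
final class TopicConfig {
  private TopicConfig() {}

  /** Returns the topic-specific value if set, else the job-level value, else null. */
  static String resolve(Properties jobProps, String topic, String key) {
    String topicValue = jobProps.getProperty(topic + "." + key);
    return topicValue != null ? topicValue : jobProps.getProperty(key);
  }

  /** Collects the resolved values an Extract could be built from; missing keys are skipped. */
  static Map<String, String> extractSettings(Properties jobProps, String topic, String... keys) {
    Map<String, String> settings = new HashMap<>();
    for (String key : keys) {
      String value = resolve(jobProps, topic, key);
      if (value != null) {
        settings.put(key, value);
      }
    }
    return settings;
  }
}
```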





Re: Gobblin build failure when using custom flavor

2017-09-07 Thread Abhishek Tiwari
Is this still an issue?

On Wed, Aug 9, 2017 at 5:51 AM, Vicky Kak  wrote:

> Hi All,
>
> I have been building the tag gobblin_0.9.0 with Kafka 0.9. Here are the
> changes that I made:
>
> 1)
> vicky@vicky-Latitude-E5570:~/git/apache/incubator-gobblin$ git diff
> diff --git a/gobblin-distribution/gobblin-flavor-custom.gradle b/gobblin-distribution/gobblin-flavor-custom.gradle
> index 5b363bd45..a890c4944 100644
> --- a/gobblin-distribution/gobblin-flavor-custom.gradle
> +++ b/gobblin-distribution/gobblin-flavor-custom.gradle
> @@ -1,6 +1,6 @@
>  dependencies {
>// Example jobs
>// compile project(':gobblin-example')
> -
> +  compile project(':gobblin-modules:gobblin-kafka-09')
>  }
>
> diff --git a/gobblin-runtime/src/test/resources/runtime_test/state_store/.keep b/gobblin-runtime/src/test/resources/runtime_test/state_store/.keep
> deleted file mode 100644
> index e69de29bb..0
> diff --git a/gobblin-runtime/src/test/resources/runtime_test/writer_output/.keep b/gobblin-runtime/src/test/resources/runtime_test/writer_output/.keep
> deleted file mode 100644
> index e69de29bb..0
> diff --git a/gobblin-runtime/src/test/resources/runtime_test/writer_staging/.keep b/gobblin-runtime/src/test/resources/runtime_test/writer_staging/.keep
> deleted file mode 100644
> index e69de29bb..0
> diff --git a/settings.gradle b/settings.gradle
> index c63463e0f..ac8db6940 100644
> --- a/settings.gradle
> +++ b/settings.gradle
> @@ -41,7 +41,7 @@ def modules = ['gobblin-admin',
>  // Disable jacoco for now as Kafka 0.8 is the default version and jacoco does not like the same classes
>  // being declared in different modules
>  def jacocoBlacklist =  new HashSet([
> -"gobblin-modules:gobblin-kafka-09"
> +"gobblin-modules:gobblin-kafka-08"
>  ])
>
>  modules.each { module ->
>
> 2) I also added gobblin-flavor-custom.gradle to the root directory,
> which is vicky@vicky-Latitude-E5570:~/git/apache/incubator-gobblin
>
> Finally, I ran
> ./gradlew -PgobblinFlavor=custom assemble
>
> It created the Gobblin distribution with Kafka 0.9.
>
> When I run the build, which includes the tests, I get these failures:
>
> 1) gobblin.metastore.DatabaseJobHistoryStoreV100Test:setUp failed with the
> following error:
>
> de.flapdoodle.embed.process.exceptions.DistributionException: java.io.IOException: File /home/vicky/.embedmysql/MySQL-5.6/mysql-5.6.24-linux-glibc2.5-x86_64.tar.gz
> at de.flapdoodle.embed.process.runtime.Starter.prepare(Starter.java:69)
> at de.flapdoodle.embed.process.runtime.Starter.prepare(Starter.java:49)
> at com.wix.mysql.EmbeddedMysql.<init>(EmbeddedMysql.java:39)
> at com.wix.mysql.EmbeddedMysql$Builder.start(EmbeddedMysql.java:131)
> at gobblin.metastore.testing.TestMetastoreDatabaseServer.<init>(TestMetastoreDatabaseServer.java:94)
> at gobblin.metastore.testing.TestMetastoreDatabaseFactory.ensureDatabaseExists(TestMetastoreDatabaseFactory.java:72)
> at gobblin.metastore.testing.TestMetastoreDatabaseFactory.get(TestMetastoreDatabaseFactory.java:52)
> at gobblin.metastore.testing.TestMetastoreDatabaseFactory.get(TestMetastoreDatabaseFactory.java:47)
> at gobblin.metastore.DatabaseJobHistoryStoreTest.setUp(DatabaseJobHistoryStoreTest.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> I was able to resolve this by downloading MySQL from
> https://downloads.mysql.com/archives/get/file/mysql-5.6.24-linux-glibc2.5-x86_64.tar.gz
>
> and copying it to /home/vicky/.embedmysql/MySQL-5.6/.
>
> I did not investigate whether this can be fixed through configuration; I just
> worked around it this way.
>
> 2) The next failure is:
>
> 
> 
> :gobblin-yarn:test
> :gobblin-modules:gobblin-kafka-08:compileTestJava FAILED
> > Building 75% > :gobblin-runtime:test > 58 tests completed
>
> FAILURE: Build failed with an exception.
>
> * What went wrong:
> Execution failed for task ':gobblin-modules:gobblin-kafka-08:compileTestJava'.
> > Compilation failed; see the compiler error output for details.
>
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or
> --debug option to get more log output.
>
> BUILD FAILED
>
> Total time: 5 mins 19.311 secs
> 
> 
>
> Since I have blacklisted kafka-08 in settings.gradle, that might be what is
> causing the compilation error. Can someone try to run the build with the
> kafka-09 flavor on Gobblin 0.9 and let me know?
> I expected the build and run to work smoothly, but they do not work as
> expected.
>
> Thanks,
> Vicky
>


[jira] [Resolved] (GOBBLIN-203) Postgresql Extractor

2017-09-07 Thread Abhishek Tiwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Tiwari resolved GOBBLIN-203.
-
Resolution: Fixed

Issue resolved by pull request #2054
[https://github.com/apache/incubator-gobblin/pull/2054]

> Postgresql Extractor
> 
>
> Key: GOBBLIN-203
> URL: https://issues.apache.org/jira/browse/GOBBLIN-203
> Project: Apache Gobblin
>  Issue Type: New Feature
>  Components: gobblin-sql
>Reporter: Tilak Patidar
>Assignee: Shirshanka Das
>Priority: Minor
>  Labels: features
>
> Gobblin provides support for JDBC-compliant databases through JDBCExtractor 
> and JDBCSource. PostgreSQL is one of the most widely used RDBMSs, and adding a 
> dedicated implementation for it in Gobblin could be useful for users. This 
> implementation is inspired by the existing MySQL implementation.





[jira] [Updated] (GOBBLIN-245) Create topic specific extract of a WorkUnit in KafkaSource

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-245:
--
Description: The current KafkaSource ignores topic-specific configurations when 
creating the Extract of a WorkUnit. The task is to create the extract with 
topic-specific configurations if any, or else with job-level configurations.  
(was: The current KafkaSource ignores topic-specific configurations when 
creating the Extract of a WorkUnit. The task is to create the extract with 
topic-specific configurations if any and fall back to job-level configurations 
otherwise.)

> Create topic specific extract of a WorkUnit in KafkaSource
> --
>
> Key: GOBBLIN-245
> URL: https://issues.apache.org/jira/browse/GOBBLIN-245
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The current KafkaSource ignores topic-specific configurations when creating the
> Extract of a WorkUnit. The task is to create the extract with topic-specific
> configurations if any, or else with job-level configurations.





[jira] [Updated] (GOBBLIN-245) Create topic specific extract for a WorkUnit in KafkaSource

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-245:
--
Description: The current KafkaSource ignores topic-specific configurations when 
creating the Extract of a WorkUnit  (was: The runOnce feature of Gobblin does not 
work correctly. A job without a schedule will be re-run upon Gobblin restart or 
modifications to the file. Additionally, `*.done` files are not being written.

Root cause:
In JobScheduler, Gobblin checks whether the job has a schedule in the method 
`scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
`ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the method 
`scheduleGeneralConfiguredJobs()` checks the key 
`ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
`RunOnceJobListener` that creates the `*.done` file. However, 
`scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
JobListener, Map, Class)`, so the property has not been set yet, and the 
`*.done` file is never written.

On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
that have already been executed. However, because the done file is not present, 
the job gets repeated.

 
*Github Url* : https://github.com/linkedin/gobblin/issues/1195 
*Github Reporter* : [~ibuenros] 
*Github Created At* : 2016-08-11T21:03:24Z 
*Github Updated At* : 2017-01-12T04:59:43Z)

> Create topic specific extract for a WorkUnit in KafkaSource
> ---
>
> Key: GOBBLIN-245
> URL: https://issues.apache.org/jira/browse/GOBBLIN-245
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The current KafkaSource ignores topic-specific configurations when creating the
> Extract of a WorkUnit





[jira] [Updated] (GOBBLIN-245) Create topic specific extract of a WorkUnit in KafkaSource

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-245:
--
Description: The current KafkaSource ignores topic-specific configurations when 
creating the Extract of a WorkUnit. The task is to create the extract with 
topic-specific configurations if any and fall back to job-level configurations 
otherwise.  (was: The current KafkaSource ignores topic-specific configurations 
when creating the Extract of a WorkUnit)

> Create topic specific extract of a WorkUnit in KafkaSource
> --
>
> Key: GOBBLIN-245
> URL: https://issues.apache.org/jira/browse/GOBBLIN-245
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The current KafkaSource ignores topic-specific configurations when creating the
> Extract of a WorkUnit. The task is to create the extract with topic-specific
> configurations if any and fall back to job-level configurations otherwise.





[jira] [Updated] (GOBBLIN-245) Create topic specific extract of a WorkUnit in KafkaSource

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-245:
--
Summary: Create topic specific extract of a WorkUnit in KafkaSource  (was: 
Create topic specific extract for a WorkUnit in KafkaSource)

> Create topic specific extract of a WorkUnit in KafkaSource
> --
>
> Key: GOBBLIN-245
> URL: https://issues.apache.org/jira/browse/GOBBLIN-245
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The current KafkaSource ignores topic-specific configurations when creating the
> Extract of a WorkUnit





[jira] [Updated] (GOBBLIN-245) Create topic specific extract for a WorkUnit in KafkaSource

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-245:
--
Issue Type: Task  (was: Bug)

> Create topic specific extract for a WorkUnit in KafkaSource
> ---
>
> Key: GOBBLIN-245
> URL: https://issues.apache.org/jira/browse/GOBBLIN-245
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Bug:Generic, Core:JobManagement, Core:Publisher
>
> The runOnce feature of Gobblin does not work correctly. A job without a schedule 
> will be re-run upon Gobblin restart or modifications to the file. 
> Additionally, `*.done` files are not being written.
> Root cause:
> In JobScheduler, Gobblin checks whether the job has a schedule in the method 
> `scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the 
> method `scheduleGeneralConfiguredJobs()` checks the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
> `RunOnceJobListener` that creates the `*.done` file. However, 
> `scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
> JobListener, Map, Class)`, so the property has not been set yet, and the 
> `*.done` file is never written.
> On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
> that have already been executed. However, because the done file is not present, 
> the job gets repeated.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1195 
> *Github Reporter* : [~ibuenros] 
> *Github Created At* : 2016-08-11T21:03:24Z 
> *Github Updated At* : 2017-01-12T04:59:43Z





[jira] [Updated] (GOBBLIN-87) Gobblin runOnce not working correctly

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-87:
-
Issue Type: Bug  (was: Task)

> Gobblin runOnce not working correctly
> -
>
> Key: GOBBLIN-87
> URL: https://issues.apache.org/jira/browse/GOBBLIN-87
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Issac Buenrostro
>Assignee: Zhixiong Chen
>  Labels: Core:JobManagement
>
> The runOnce feature of Gobblin does not work correctly. A job without a schedule 
> will be re-run upon Gobblin restart or modifications to the file. 
> Additionally, `*.done` files are not being written.
> Root cause:
> In JobScheduler, Gobblin checks whether the job has a schedule in the method 
> `scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the 
> method `scheduleGeneralConfiguredJobs()` checks the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
> `RunOnceJobListener` that creates the `*.done` file. However, 
> `scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
> JobListener, Map, Class)`, so the property has not been set yet, and the 
> `*.done` file is never written.
> On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
> that have already been executed. However, because the done file is not present, 
> the job gets repeated.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1195 
> *Github Reporter* : [~ibuenros] 
> *Github Created At* : 2016-08-11T21:03:24Z 
> *Github Updated At* : 2017-01-12T04:59:43Z
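The ordering problem in the root-cause analysis can be reduced to a few lines. The sketch below is a simplified, hypothetical model, not JobScheduler code. The property names are placeholders standing in for `ConfigurationKeys.JOB_RUN_ONCE_KEY` and the job schedule key. `fixedOrder` shows one possible fix (set the flag before choosing the listener), but the issue itself does not prescribe that fix.

```java
import java.util.Properties;

// Simplified, hypothetical model of the GOBBLIN-87 ordering bug; not JobScheduler code.
final class RunOnceOrdering {
  static final String RUN_ONCE_KEY = "job.runonce"; // placeholder for JOB_RUN_ONCE_KEY
  static final String SCHEDULE_KEY = "job.schedule"; // placeholder schedule key

  private RunOnceOrdering() {}

  /** Models scheduleJob(...): marks the job run-once when it has no schedule. */
  static void markRunOnceIfUnscheduled(Properties jobProps) {
    if (!jobProps.containsKey(SCHEDULE_KEY)) {
      jobProps.setProperty(RUN_ONCE_KEY, "true");
    }
  }

  /** Models the listener choice in scheduleGeneralConfiguredJobs(). */
  static boolean wouldCreateRunOnceListener(Properties jobProps) {
    return Boolean.parseBoolean(jobProps.getProperty(RUN_ONCE_KEY, "false"));
  }

  /** Order described in the issue: the listener is chosen before the flag is set. */
  static boolean buggyOrder(Properties jobProps) {
    boolean listener = wouldCreateRunOnceListener(jobProps);
    markRunOnceIfUnscheduled(jobProps);
    return listener;
  }

  /** One possible fix: derive the flag first, then choose the listener. */
  static boolean fixedOrder(Properties jobProps) {
    markRunOnceIfUnscheduled(jobProps);
    return wouldCreateRunOnceListener(jobProps);
  }
}
```

For an unscheduled job, `buggyOrder` returns false, so no `RunOnceJobListener` is created and no `*.done` file would be written. `fixedOrder` returns true.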





[jira] [Updated] (GOBBLIN-87) Gobblin runOnce not working correctly

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-87:
-
Issue Type: Task  (was: Bug)

> Gobblin runOnce not working correctly
> -
>
> Key: GOBBLIN-87
> URL: https://issues.apache.org/jira/browse/GOBBLIN-87
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Issac Buenrostro
>Assignee: Zhixiong Chen
>  Labels: Bug:Generic, Core:JobManagement, Core:Publisher
>
> The runOnce feature of Gobblin does not work correctly. A job without a schedule 
> will be re-run upon Gobblin restart or modifications to the file. 
> Additionally, `*.done` files are not being written.
> Root cause:
> In JobScheduler, Gobblin checks whether the job has a schedule in the method 
> `scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the 
> method `scheduleGeneralConfiguredJobs()` checks the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
> `RunOnceJobListener` that creates the `*.done` file. However, 
> `scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
> JobListener, Map, Class)`, so the property has not been set yet, and the 
> `*.done` file is never written.
> On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
> that have already been executed. However, because the done file is not present, 
> the job gets repeated.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1195 
> *Github Reporter* : [~ibuenros] 
> *Github Created At* : 2016-08-11T21:03:24Z 
> *Github Updated At* : 2017-01-12T04:59:43Z





[jira] [Updated] (GOBBLIN-87) Gobblin runOnce not working correctly

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-87:
-
External issue URL:   (was: https://github.com/linkedin/gobblin/issues/1195)

> Gobblin runOnce not working correctly
> -
>
> Key: GOBBLIN-87
> URL: https://issues.apache.org/jira/browse/GOBBLIN-87
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Issac Buenrostro
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The runOnce feature of Gobblin does not work correctly. A job without a schedule 
> will be re-run upon Gobblin restart or modifications to the file. 
> Additionally, `*.done` files are not being written.
> Root cause:
> In JobScheduler, Gobblin checks whether the job has a schedule in the method 
> `scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the 
> method `scheduleGeneralConfiguredJobs()` checks the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
> `RunOnceJobListener` that creates the `*.done` file. However, 
> `scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
> JobListener, Map, Class)`, so the property has not been set yet, and the 
> `*.done` file is never written.
> On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
> that have already been executed. However, because the done file is not present, 
> the job gets repeated.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1195 
> *Github Reporter* : [~ibuenros] 
> *Github Created At* : 2016-08-11T21:03:24Z 
> *Github Updated At* : 2017-01-12T04:59:43Z





[jira] [Updated] (GOBBLIN-87) Gobblin runOnce not working correctly

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-87:
-
Labels: Source:Kafka  (was: Bug:Generic Core:JobManagement Core:Publisher)

> Gobblin runOnce not working correctly
> -
>
> Key: GOBBLIN-87
> URL: https://issues.apache.org/jira/browse/GOBBLIN-87
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Issac Buenrostro
>Assignee: Zhixiong Chen
>  Labels: Source:Kafka
>
> The runOnce feature of Gobblin does not work correctly. A job without a schedule 
> will be re-run upon Gobblin restart or modifications to the file. 
> Additionally, `*.done` files are not being written.
> Root cause:
> In JobScheduler, Gobblin checks whether the job has a schedule in the method 
> `scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the 
> method `scheduleGeneralConfiguredJobs()` checks the key 
> `ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
> `RunOnceJobListener` that creates the `*.done` file. However, 
> `scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
> JobListener, Map, Class)`, so the property has not been set yet, and the 
> `*.done` file is never written.
> On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
> that have already been executed. However, because the done file is not present, 
> the job gets repeated.
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/1195 
> *Github Reporter* : [~ibuenros] 
> *Github Created At* : 2016-08-11T21:03:24Z 
> *Github Updated At* : 2017-01-12T04:59:43Z





[jira] [Created] (GOBBLIN-245) Create topic specific extract for a WorkUnit in KafkaSource

2017-09-07 Thread Zhixiong Chen (JIRA)
Zhixiong Chen created GOBBLIN-245:
-

 Summary: Create topic specific extract for a WorkUnit in 
KafkaSource
 Key: GOBBLIN-245
 URL: https://issues.apache.org/jira/browse/GOBBLIN-245
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Zhixiong Chen
Assignee: Zhixiong Chen


The runOnce feature of Gobblin does not work correctly. A job without a schedule 
will be re-run upon Gobblin restart or modifications to the file. Additionally, 
`*.done` files are not being written.

Root cause:
In JobScheduler, Gobblin checks whether the job has a schedule in the method 
`scheduleJob(Properties, JobListener, Map, Class)` and sets the key 
`ConfigurationKeys.JOB_RUN_ONCE_KEY` accordingly. On the other hand, the method 
`scheduleGeneralConfiguredJobs()` checks the key 
`ConfigurationKeys.JOB_RUN_ONCE_KEY` and, if the job is run-once, creates the 
`RunOnceJobListener` that creates the `*.done` file. However, 
`scheduleGeneralConfiguredJobs()` is called before `scheduleJob(Properties, 
JobListener, Map, Class)`, so the property has not been set yet, and the 
`*.done` file is never written.

On Gobblin restart, Gobblin checks for the presence of done files and skips jobs 
that have already been executed. However, because the done file is not present, 
the job gets repeated.

 
*Github Url* : https://github.com/linkedin/gobblin/issues/1195 
*Github Reporter* : [~ibuenros] 
*Github Created At* : 2016-08-11T21:03:24Z 
*Github Updated At* : 2017-01-12T04:59:43Z





[jira] [Created] (GOBBLIN-244) Need additional info for gobblin tracking hourly-deduped

2017-09-07 Thread Kuai Yu (JIRA)
Kuai Yu created GOBBLIN-244:
---

 Summary: Need additional info for gobblin tracking hourly-deduped
 Key: GOBBLIN-244
 URL: https://issues.apache.org/jira/browse/GOBBLIN-244
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Kuai Yu
Assignee: Kuai Yu


Add the previous record count and the number of execution runs.





[jira] [Updated] (GOBBLIN-238) Implement EnvelopePayloadConverter and EnvelopeSchemaDecorator

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-238:
--
Description: 
The current implementation of EnvelopeSchemaConverter has several flaws:
- It assumes a top-level payload schema field.
- The output record is the schema'ed payload, but the output schema is a String.

The task implements two types of EnvelopeSchemaConverter: 
EnvelopePayloadConverter and 

{code:java}
converter.envelopeSchemaConverter.schemaIdField="metadata.payloadSchemaId"
{code}


  was:
The current implementation of EnvelopeSchemaConverter only recognizes top-level 
payload schema and bytes fields. This task is to support nested payload schema 
and bytes fields. For example, the converter will extract the payload with the 
following configurations:

{code:java}
EnvelopeSchemaConverter.schemaIdField="metadata.payloadSchemaId"
EnvelopeSchemaConverter.payloadField="nestedRecord.payload"
{code}



> Implement EnvelopePayloadConverter and EnvelopeSchemaDecorator
> --
>
> Key: GOBBLIN-238
> URL: https://issues.apache.org/jira/browse/GOBBLIN-238
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Core:Converter
>
> The current implementation of EnvelopeSchemaConverter has several flaws:
> - It assumes a top-level payload schema field.
> - The output record is the schema'ed payload, but the output schema is a String.
> The task implements two types of EnvelopeSchemaConverter: 
> EnvelopePayloadConverter and 
> {code:java}
> converter.envelopeSchemaConverter.schemaIdField="metadata.payloadSchemaId"
> {code}





[jira] [Created] (GOBBLIN-243) Metadata format change

2017-09-07 Thread Lei Sun (JIRA)
Lei Sun created GOBBLIN-243:
---

 Summary: Metadata format change
 Key: GOBBLIN-243
 URL: https://issues.apache.org/jira/browse/GOBBLIN-243
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Lei Sun
Assignee: Lei Sun








[jira] [Updated] (GOBBLIN-238) Implement EnvelopePayloadConverter and EnvelopeSchemaDecorator

2017-09-07 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/GOBBLIN-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated GOBBLIN-238:
--
Summary: Implement EnvelopePayloadConverter and EnvelopeSchemaDecorator  
(was: Support nested payloadSchemaId in EnvelopeSchemaConverter )

> Implement EnvelopePayloadConverter and EnvelopeSchemaDecorator
> --
>
> Key: GOBBLIN-238
> URL: https://issues.apache.org/jira/browse/GOBBLIN-238
> Project: Apache Gobblin
>  Issue Type: Task
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
>  Labels: Core:Converter
>
> The current implementation of EnvelopeSchemaConverter only recognizes 
> top-level payload schema and bytes fields. This task is to support nested payload 
> schema and bytes fields. For example, the converter will extract the payload 
> with the following configurations:
> {code:java}
> EnvelopeSchemaConverter.schemaIdField="metadata.payloadSchemaId"
> EnvelopeSchemaConverter.payloadField="nestedRecord.payload"
> {code}
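Resolving a dotted path such as `metadata.payloadSchemaId` or `nestedRecord.payload` against a nested record can be sketched as a walk over one path segment at a time. This is a hypothetical illustration that models records as nested `Map`s. A real converter would walk Avro `GenericRecord` fields (for example via `GenericRecord.get(String)`) instead, and `NestedFieldPath` is not a Gobblin class.

```java
import java.util.Map;

// Hypothetical sketch: resolve a dotted field path against a record modelled as nested Maps.
final class NestedFieldPath {
  private NestedFieldPath() {}

  /** Returns the value at the dotted path, or null if any segment is missing. */
  static Object get(Map<String, ?> record, String dottedPath) {
    Object current = record;
    for (String segment : dottedPath.split("\\.")) {
      if (!(current instanceof Map)) {
        return null; // the path descends into a non-record value
      }
      current = ((Map<?, ?>) current).get(segment);
      if (current == null) {
        return null; // segment not present at this level
      }
    }
    return current;
  }
}
```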





Re: Apache Gobblin first release

2017-09-07 Thread Jean-Baptiste Onofré

Hi,

It sounds like good timing for a first milestone release.

I'm ready to help you with this first release.
Let's chat together to prepare this.

Regards
JB

On 09/06/2017 11:45 AM, Abhishek Tiwari wrote:

Hi all,

Since we are well settled into the Apache world now with:
- All source licenses updated to Apache
- Source packages renamed to org.apache.gobblin
- Apache code review process established
- Apache issue tracking and mailing lists in full flow

I am proposing that we make our first release, and I am also volunteering
to be the first 'Release Manager'. So unless anyone feels otherwise I will
start the process. I will also document the steps along the way.

Regards,
Abhishek



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com