[jira] [Commented] (GOBBLIN-187) Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage
[ https://issues.apache.org/jira/browse/GOBBLIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263899#comment-16263899 ]

Joel Baranick commented on GOBBLIN-187:
---------------------------------------

[~abti] Any ideas here? This ends up causing our EFS to keep growing, incurring more cost.

> Gobblin Helix doesn't clean up `.job.state` files, causing unbounded disk usage
> -------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-187
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-187
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Joel Baranick
>
> When Gobblin is running on Helix, the `GobblinHelixJobLauncher.createJob` method writes
> the job state to a `.job.state` file. Nothing cleans up these files. The
> result is unbounded disk usage. `.job.state` files should be deleted at the
> completion of jobs.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
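Until the files are deleted at job completion, a periodic sweep of the working directory might keep disk usage bounded. The following is a minimal sketch of such a workaround, not Gobblin's own behavior. The function name, the age threshold, and the rule that an old, unmodified file belongs to a finished job are all assumptions made for illustration.

```python
import time
from pathlib import Path


def remove_stale_job_state_files(root, max_age_seconds):
    """Delete files ending in `.job.state` under `root` that are older
    than `max_age_seconds`, and return the list of removed paths.

    Assumption (not from the issue): a file that has not been modified
    for that long belongs to a job that has already completed.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in sorted(Path(root).rglob("*.job.state")):
        # Only regular files whose last modification is before the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run against the directory where the job state files accumulate, for example from a scheduled task, with a threshold comfortably longer than the longest expected job.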
[jira] [Updated] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Azmal Sheik updated GOBBLIN-321:
--------------------------------
    Description: 
I was trying to load CSV file data to HDFS with the job configuration below, but I'm getting a class-not-found error. I checked lib/gobblin-core.jar and the class TextFileBasedSource is present, yet it still reports that the class is not found.
Can anyone help?
Here are the job and logs.
*JOB:*
job.name=json-gobblin-hdfs
job.group=Gobblin-Json-Demo
job.description=Publishing JSON data from files to HDFS in Avro format.
job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
job.lock.enabled=false
distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
source.class=gobblin.source.extractor.filebased.TextFileBasedSource
converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
writer.builder.class=gobblin.writer.AvroDataWriterBuilder
source.entity=
source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
gobblin.converter.schemaInjector.schema=SCHEMA
converter.csv.to.json.delimiter=","
extract.table.name=CsvToAvro
extract.namespace=gobblin.example
extract.table.type=APPEND_ONLY
source.schema={"namespace":"example.avro", "type":"record", "name":"User", "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number", "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
gobblin.converter.schemaInjector.schema=SCHEMA
converter.csv.to.json.delimiter=","
qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
qualitychecker.row.policy.types=OPTIONAL
data.publisher.type=gobblin.publisher.BaseDataPublisher
writer.destination.type=HDFS
writer.output.format=AVRO
fs.uri=hdfs://:8020/
writer.fs.uri=hdfs://...:8020/
state.store.fs.uri=hdfs://:8020/
mr.job.root.dir=/user/ndxmetadata/output/working
state.store.dir=/user/ndxmetadata/output/state-store
writer.staging.dir=/user/ndxmetadata/output/task-staging
writer.output.dir=/user/ndxmetadata/output/task-output
data.publisher.final.dir=/user/ndxmetadata/output/
---
Logs attached below

  was:
I was trying to load CSV file data to HDFS with the job configuration below, but I'm getting a class-not-found error. I checked lib/gobblin-core.jar and the class TextFileBasedSource is present, yet it still reports that the class is not found.
Can anyone help?
Here are the job and logs.
*JOB:*
job.name=json-gobblin-hdfs
job.group=Gobblin-Json-Demo
job.description=Publishing JSON data from files to HDFS in Avro format.
job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
job.lock.enabled=false
distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
source.class=gobblin.source.extractor.filebased.TextFileBasedSource
converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
writer.builder.class=gobblin.writer.AvroDataWriterBuilder
source.entity=
source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
gobblin.converter.schemaInjector.schema=SCHEMA
converter.csv.to.json.delimiter=","
extract.table.name=CsvToAvro
extract.namespace=gobblin.example
extract.table.type=APPEND_ONLY
# source data schema
source.schema={"namespace":"example.avro", "type":"record", "name":"User", "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number", "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
gobblin.converter.schemaInjector.schema=SCHEMA
converter.csv.to.json.delimiter=","
# quality checker configuration properties
qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
qualitychecker.row.policy.types=OPTIONAL
# data publisher class to be used
data.publisher.type=gobblin.publisher.BaseDataPublisher
# writer configuration properties
writer.destination.type=HDFS
writer.output.format=AVRO
fs.uri=hdfs://:8020/
writer.fs.uri=hdfs://...:8020/
state.store.fs.uri=hdfs://:8020/
mr.job.root.dir=/user/ndxmetadata/output/working
state.store.dir=/user/ndxmetadata/output/state-store
writer.staging.dir=/user/ndxmetadata/output/task-staging
writer.output.dir=/user/ndxmetadata/output/task-output
data.publisher.final.dir=/user/ndxmetadata/output/
---
Logs attached below

> CSV to HDFS ISSUE
> -----------------
>
>                 Key:
[jira] [Updated] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Azmal Sheik updated GOBBLIN-321:
--------------------------------
    Attachment: job

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job
>
> I was trying to load CSV file data to HDFS with the job configuration below, but I'm getting a class-not-found error. I checked lib/gobblin-core.jar and the class TextFileBasedSource is present, yet it still reports that the class is not found.
> Can anyone help?
> Here are the job and logs.
> *JOB:*
> ## job configuration file ##
> job.name=json-gobblin-hdfs
> job.group=Gobblin-Json-Demo
> job.description=Publishing JSON data from files to HDFS in Avro format.
> job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
> job.lock.enabled=false
> distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
> source.class=gobblin.source.extractor.filebased.TextFileBasedSource
> converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
> writer.builder.class=gobblin.writer.AvroDataWriterBuilder
> source.entity=
> source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> extract.table.name=CsvToAvro
> extract.namespace=gobblin.example
> extract.table.type=APPEND_ONLY
> # source data schema
> source.schema={"namespace":"example.avro", "type":"record", "name":"User", "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number", "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> # quality checker configuration properties
> qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
> qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
> qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
> qualitychecker.row.policy.types=OPTIONAL
> # data publisher class to be used
> data.publisher.type=gobblin.publisher.BaseDataPublisher
> # writer configuration properties
> writer.destination.type=HDFS
> writer.output.format=AVRO
> fs.uri=hdfs://:8020/
> writer.fs.uri=hdfs://...:8020/
> state.store.fs.uri=hdfs://:8020/
> mr.job.root.dir=/user/ndxmetadata/output/working
> state.store.dir=/user/ndxmetadata/output/state-store
> writer.staging.dir=/user/ndxmetadata/output/task-staging
> writer.output.dir=/user/ndxmetadata/output/task-output
> data.publisher.final.dir=/user/ndxmetadata/output/
> ---
> Logs attached below
[jira] [Comment Edited] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263892#comment-16263892 ]

Azmal Sheik edited comment on GOBBLIN-321 at 11/23/17 7:01 AM:
---------------------------------------------------------------

The Gobblin dist is 10, but I loaded the 11 core and core-base jars in lib.

was (Author: sheik5azmal):
10

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job.txt
[jira] [Assigned] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Baranick reassigned GOBBLIN-321:
-------------------------------------
    Assignee: Joel Baranick

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Assignee: Joel Baranick
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job.txt
[jira] [Comment Edited] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263895#comment-16263895 ]

Joel Baranick edited comment on GOBBLIN-321 at 11/23/17 7:07 AM:
-----------------------------------------------------------------

From your logs, the class you are loading is {{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}. 0.11.0 doesn't use the Apache namespaces. Compare [0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java] to [master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java]. You will see that the namespaces in master are all prefixed with {{org.apache.}} because Gobblin was adopted as an Apache incubator project. The last pre-incubator release is 0.11.0.

was (Author: jbaranick):
From your logs, the class you are loading is {{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}. 0.11.0 doesn't use the Apache namespaces. Compare [0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java] to [master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java]

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job.txt
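One way to confirm which namespace the jars in lib actually contain is to list their entries, since a jar is a zip archive and each class is stored under its package path. The sketch below is only an illustration: the function name is hypothetical, and it assumes the jars sit directly in a single directory such as the one named by job.jars.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, simple_name):
    """Map each jar in `lib_dir` to the fully qualified names of the
    classes it contains whose simple name is `simple_name`.

    Classes in the default (unnamed) package are not reported.
    """
    suffix = "/" + simple_name + ".class"
    matches = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            names = [
                # "a/b/C.class" becomes "a.b.C"
                entry[: -len(".class")].replace("/", ".")
                for entry in archive.namelist()
                if entry.endswith(suffix)
            ]
        if names:
            matches[jar.name] = sorted(names)
    return matches
```

Calling `find_class_in_jars("/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/", "TextFileBasedSource")` would show whether the class lives under `gobblin.` or `org.apache.gobblin.` in each jar, which is the name that `source.class` has to match.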
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263895#comment-16263895 ]

Joel Baranick commented on GOBBLIN-321:
---------------------------------------

From your logs, the class you are loading is {{org.apache.gobblin.source.extractor.filebased.TextFileBasedSource}}. 0.11.0 doesn't use the Apache namespaces. Compare [0.11.0|https://github.com/apache/incubator-gobblin/blob/gobblin_0.11.0/gobblin-core/src/main/java/gobblin/source/extractor/filebased/TextFileBasedSource.java] to [master|https://github.com/apache/incubator-gobblin/blob/master/gobblin-core/src/main/java/org/apache/gobblin/source/extractor/filebased/TextFileBasedSource.java]

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job.txt
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263897#comment-16263897 ]

Joel Baranick commented on GOBBLIN-321:
---------------------------------------

[~sheik5azmal] Where did you see guidance to use the Apache-qualified namespace? Maybe some documentation is leading people astray during this transition period.

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Assignee: Joel Baranick
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job.txt
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263898#comment-16263898 ]

Azmal Sheik commented on GOBBLIN-321:
-------------------------------------

[~joelbarnard] I tried both org.apache.gobblin.source.extractor.filebased.TextFileBasedSource and gobblin.source.extractor.filebased.TextFileBasedSource, but both give the same class-not-found error.

> CSV to HDFS ISSUE
> -----------------
>
>                 Key: GOBBLIN-321
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-321
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Azmal Sheik
>            Assignee: Joel Baranick
>            Priority: Critical
>              Labels: beginner, newbie, starter
>         Attachments: gobblin-current.log, job.txt
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263891#comment-16263891 ]

Joel Baranick commented on GOBBLIN-321:
---------------------------------------

What version of gobblin?

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
>
>
> I was trying to load csv file data to HDFS with below job conf But I'm facing
> class not found error, I have checked in lib/gobblin-core.jar the class
> TextFileBasedSource is present but it was saying class not found.
> Can anyone help over here
> Here is JOB,LOGS
> *JOB :
> *
> ## job configuration file ##
> job.name=json-gobblin-hdfs
> job.group=Gobblin-Json-Demo
> job.description=Publishing JSON data from files to HDFS in Avro format.
> job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
> job.lock.enabled=false
> distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
> source.class=gobblin.source.extractor.filebased.TextFileBasedSource
> converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
> writer.builder.class=gobblin.writer.AvroDataWriterBuilder
> source.entity=
> source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> extract.table.name=CsvToAvro
> extract.namespace=gobblin.example
> extract.table.type=APPEND_ONLY
> # source data schema
> source.schema={"namespace":"example.avro", "type":"record", "name":"User",
> "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number",
> "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
> gobblin.converter.schemaInjector.schema=SCHEMA
> converter.csv.to.json.delimiter=","
> # quality checker configuration properties
> qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
> qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
> qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
> qualitychecker.row.policy.types=OPTIONAL
> # data publisher class to be used
> data.publisher.type=gobblin.publisher.BaseDataPublisher
> # writer configuration properties
> writer.destination.type=HDFS
> writer.output.format=AVRO
> fs.uri=hdfs://:8020/
> writer.fs.uri=hdfs://...:8020/
> state.store.fs.uri=hdfs://:8020/
> mr.job.root.dir=/user/ndxmetadata/output/working
> state.store.dir=/user/ndxmetadata/output/state-store
> writer.staging.dir=/user/ndxmetadata/output/task-staging
> writer.output.dir=/user/ndxmetadata/output/task-output
> data.publisher.final.dir=/user/ndxmetadata/output/
> ---
> Log's attached below

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263892#comment-16263892 ]

Azmal Sheik commented on GOBBLIN-321:
-------------------------------------

10

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263906#comment-16263906 ]

Azmal Sheik commented on GOBBLIN-321:
-------------------------------------

[~jbaranick] No Problem Thanks for quick response :)

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Assignee: Joel Baranick
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
[jira] [Updated] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Azmal Sheik updated GOBBLIN-321:
--------------------------------

Attachment: job.txt

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
[jira] [Updated] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Azmal Sheik updated GOBBLIN-321:
--------------------------------

Attachment: (was: job)

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
[jira] [Commented] (GOBBLIN-321) CSV to HDFS ISSUE
[ https://issues.apache.org/jira/browse/GOBBLIN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263904#comment-16263904 ]

Joel Baranick commented on GOBBLIN-321:
---------------------------------------

Well, if you are using <= 0.11.0, I would stick to the {{gobblin.}} namespaces. Also, I'd pick either 0.10.0 or 0.11.0 and try with that.

> CSV to HDFS ISSUE
> -----------------
>
> Key: GOBBLIN-321
> URL: https://issues.apache.org/jira/browse/GOBBLIN-321
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Azmal Sheik
> Assignee: Joel Baranick
> Priority: Critical
> Labels: beginner, newbie, starter
> Attachments: gobblin-current.log, job.txt
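
One way to act on the namespace advice in the comment above is to ask the JVM which fully qualified name actually resolves against the jars in use. The following is a minimal sketch and not part of Gobblin: `NamespaceCheck` and `firstLoadable` are hypothetical names, and the `org.apache.gobblin.` prefix is an assumption about releases after 0.11.0 that this thread does not state.

```java
import java.util.Optional;

public class NamespaceCheck {

    /**
     * Returns the first candidate class name that can be loaded from the
     * current classpath, or an empty Optional if none of them can.
     * Hypothetical helper for illustration only.
     */
    public static Optional<String> firstLoadable(String... candidates) {
        for (String name : candidates) {
            try {
                // 'false' = locate the class without running its static initializers.
                Class.forName(name, false, NamespaceCheck.class.getClassLoader());
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable under this name; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Class taken from source.class in the reported job configuration.
        String suffix = "source.extractor.filebased.TextFileBasedSource";
        Optional<String> found = firstLoadable(
                "gobblin." + suffix,              // namespace named in the comment
                "org.apache.gobblin." + suffix);  // assumed newer namespace
        System.out.println(found.orElse("not loadable under either namespace"));
    }
}
```

Running it with the distribution's `lib/` jars on the classpath (for example `java -cp "lib/*:." NamespaceCheck`) should show which prefix, if either, matches the installed version, so that `source.class`, `converter.classes` and the other class-valued properties can use the same one.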
[jira] [Created] (GOBBLIN-321) CSV to HDFS ISSUE
Azmal Sheik created GOBBLIN-321:
--------------------------------

Summary: CSV to HDFS ISSUE
Key: GOBBLIN-321
URL: https://issues.apache.org/jira/browse/GOBBLIN-321
Project: Apache Gobblin
Issue Type: Bug
Reporter: Azmal Sheik
Priority: Critical
Attachments: gobblin-current.log

I was trying to load csv file data to HDFS with below job conf But I'm facing class not found error, I have checked in lib/gobblin-core.jar the class TextFileBasedSource is present but it was saying class not found.
Can anyone help over here
Here is JOB,LOGS
*JOB :
*
## job configuration file ##
job.name=json-gobblin-hdfs
job.group=Gobblin-Json-Demo
job.description=Publishing JSON data from files to HDFS in Avro format.
job.jars=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/lib/
job.lock.enabled=false
distcp.persist.dir=/home/ndxmetadata/Ravi/Gobblin/gobblin-dist/
source.class=gobblin.source.extractor.filebased.TextFileBasedSource
converter.classes="gobblin.converter.StringSchemaInjector,gobblin.converter.csv.CsvToJsonConverter,gobblin.converter.avro.JsonIntermediateToAvroConverter"
writer.builder.class=gobblin.writer.AvroDataWriterBuilder
source.entity=
source.filebased.data.directory=file://home/ndxmetadata/Ravi/Gobblin/sample
gobblin.converter.schemaInjector.schema=SCHEMA
converter.csv.to.json.delimiter=","
extract.table.name=CsvToAvro
extract.namespace=gobblin.example
extract.table.type=APPEND_ONLY
# source data schema
source.schema={"namespace":"example.avro", "type":"record", "name":"User", "fields":[{"name":"name", "type":"string"}, {"name":"favorite_number", "type":"int"}, {#"name":"favorite_color", "type":"string"}]}
gobblin.converter.schemaInjector.schema=SCHEMA
converter.csv.to.json.delimiter=","
# quality checker configuration properties
qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL
qualitychecker.row.policies=gobblin.policies.schema.SchemaRowCheckPolicy
qualitychecker.row.policy.types=OPTIONAL
# data publisher class to be used
data.publisher.type=gobblin.publisher.BaseDataPublisher
# writer configuration properties
writer.destination.type=HDFS
writer.output.format=AVRO
fs.uri=hdfs://:8020/
writer.fs.uri=hdfs://...:8020/
state.store.fs.uri=hdfs://:8020/
mr.job.root.dir=/user/ndxmetadata/output/working
state.store.dir=/user/ndxmetadata/output/state-store
writer.staging.dir=/user/ndxmetadata/output/task-staging
writer.output.dir=/user/ndxmetadata/output/task-output
data.publisher.final.dir=/user/ndxmetadata/output/
---
Log's attached below

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
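
A side observation on the reported configuration: the value of `source.filebased.data.directory` is written with two slashes after `file:`. Whether this has anything to do with the class-not-found error is not established in this thread, and how Gobblin or Hadoop treats the value is not shown here. The sketch below only illustrates how `java.net.URI` splits the two spellings; `FileUriCheck` and `describe` are hypothetical names introduced for the example.

```java
import java.net.URI;

public class FileUriCheck {

    /** Formats a URI string as "authority | path" using java.net.URI parsing. */
    public static String describe(String uri) {
        URI parsed = URI.create(uri);
        // getAuthority() returns null when the URI has no authority component.
        return parsed.getAuthority() + " | " + parsed.getPath();
    }

    public static void main(String[] args) {
        // Spelling used in the reported job configuration (two slashes):
        // "home" is parsed as the authority, not as part of the path.
        System.out.println(describe("file://home/ndxmetadata/Ravi/Gobblin/sample"));
        // Three slashes: empty authority, so the path starts at /home.
        System.out.println(describe("file:///home/ndxmetadata/Ravi/Gobblin/sample"));
    }
}
```

With the two-slash form, `home` is parsed as the authority and the path begins at `/ndxmetadata`; with three slashes the authority is absent and the path begins at `/home`.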
[jira] [Created] (GOBBLIN-320) Add metrics to GobblinHelixJobScheduler
Kuai Yu created GOBBLIN-320:
----------------------------

Summary: Add metrics to GobblinHelixJobScheduler
Key: GOBBLIN-320
URL: https://issues.apache.org/jira/browse/GOBBLIN-320
Project: Apache Gobblin
Issue Type: Improvement
Reporter: Kuai Yu
Assignee: Kuai Yu

--
This message was sent by Atlassian JIRA (v6.4.14#64029)