Fallon created BEAM-2761:
----------------------------

             Summary: Write to empty BigQuery partition fails with "No schema specified on job or table." despite having provided schema
                 Key: BEAM-2761
                 URL: https://issues.apache.org/jira/browse/BEAM-2761
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
    Affects Versions: 2.1.0, 2.2.0
            Reporter: Fallon
            Assignee: Thomas Groh
            Priority: Minor


In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs that write an empty PCollection to a 
BigQuery partition fail with "java.lang.RuntimeException: Failed to create load 
job with id prefix". The failure is accompanied by the message "No schema 
specified on job or table", even though a schema is supplied via withJsonSchema.

Command to run job:
{code}
mvn compile exec:java \
     -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
     -Dexec.args="--runner=DataflowRunner --project=<GCP project> \
                  --gcpTempLocation=<tmp location>" \
     -Pdataflow-runner
{code}

Code to reproduce the problem:
{code:title=EmptyPCollection.java|borderStyle=solid}
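package org.apache.beam.examples;

import com.google.api.services.bigquery.model.TableRow;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class EmptyPCollection {

  public static void main(String[] args) {

    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.setTempLocation("<your tmp location>");
    Pipeline pipeline = Pipeline.create(options);

    String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", "
        + "\"mode\": \"required\"}]}";
    String table = "mydataset.pets";
    List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
    PCollection<String> inputText =
        pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
    PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        String text = c.element();
        // No pet starts with "X", so nothing is output and the PCollection is empty.
        // Change the prefix to "D" (Dog) and the job works fine.
        if (text.startsWith("X")) {
          TableRow row = new TableRow();
          row.set("pet", text);
          c.output(row);
        }
      }
    }));

    rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

    pipeline.run().waitUntilFinish();

  }
}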
{code}
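
The reason the pipeline above writes nothing can be checked in plain Java, without Beam. This is only a sketch: the class name FilterCheck and the helper startingWith are hypothetical and not part of the reproduction. They mirror the startsWith test in the DoFn on the same three input strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterCheck {

  // Hypothetical helper: keeps the elements the DoFn would output for a given prefix.
  static List<String> startingWith(List<String> input, String prefix) {
    List<String> kept = new ArrayList<>();
    for (String text : input) {
      if (text.startsWith(prefix)) {
        kept.add(text);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
    System.out.println(startingWith(pets, "X"));  // prints []
    System.out.println(startingWith(pets, "D"));  // prints [Dog]
  }
}
```

With the prefix "X" the result is empty, which corresponds to the failing case. With "D" one element survives, which corresponds to the case reported to work.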






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
