Hi Richard, Can you share a little more info about your environment? Here's a smattering of questions for which answers may be useful.
* What runner are you using? * What version of the SDK? * Does this reproduce in the DirectRunner? * Can you share a full reproduction? (e.g., in a github gist)? * What is happening on the machine(s) executing the job? Is there high CPU? Is the disk active? Etc. Thanks, Dan On Tue, Apr 4, 2017 at 9:33 AM, Richard Hanson <[email protected]> wrote: > I am testing apache beam to read/ write xml files. But I encounter a > problem that even the code is just to read a single xml file and write it > out without doing any transformation, the process seems to hang > indefinitely. The output looks like below: > > [pool-2-thread-5-ScalaTest-running-XmlSpec] INFO > org.apache.beam.sdk.io.FileBasedSource > - Matched 1 files for pattern /tmp/input/my.xml > [pool-6-thread-1] INFO org.apache.beam.sdk.io.Write - Initializing write > operation org.apache.beam.sdk.io.XmlSink$XmlWriteOperation@1c72df2c > > > The code basically do the following: > > val options = PipelineOptionsFactory.create > val p = Pipeline.create(options) > > > val xml = XmlSource.from[Record](new File("/tmp/input/my.xml"). > toPath.toString).withRootElement("RootE").withRecordElement("Record"). > withRecordClass(classOf[Record]) > > p.apply(Read.from(xml)).apply(Write.to(XmlSink.write(). > toFilenamePrefix("xml").ofRecordClass(classOf[Record]) > .withRootElement("RootE"))) > > p.run.waitUntilFinish > > > What part may be missing in my program? > > Thanks >
