Re: HDFSFileSource and distributed Apex questions

2017-05-02 Thread Lukasz Cwik
Moving this to user@beam.apache.org

In the latest snapshot version of Apache Beam, file based sources like
AvroIO/TextIO were updated to support reading from Hadoop, see
HadoopFileSystem
.
If you're using 0.6.0 or older, you'll need to stick with HDFSFileSource (soon
to be removed).
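
As a rough sketch (not runnable as-is; it needs a configured runner and an HDFS cluster, and the path below is hypothetical), reading through the new HadoopFileSystem support looks something like:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;

// Sketch only: assumes a Beam snapshot where TextIO resolves hdfs:// paths
// through the registered HadoopFileSystem (see the configuration step below).
// "namenode:8020" and the input path are hypothetical placeholders.
Pipeline p = Pipeline.create(options);
p.apply("ReadLines", TextIO.read().from("hdfs://namenode:8020/user/me/input.txt"));
```

Note there is no HDFS-specific source here; the hdfs:// scheme is handled by the registered FileSystem.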

As for your error, it seems as though you have not set up the Hadoop
configuration:
* 0.6.0 or older: set the configuration on the HDFSFileSource with
HDFSFileSource#withConfiguration

* latest snapshot: set the configuration on HadoopFileSystemOptions
with HadoopFileSystemOptions#setHdfsConfiguration
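
A hedged sketch of both options (the Configuration value, host, and port are hypothetical placeholders; exact signatures depend on the version you build against):

```java
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

// Point Hadoop at your cluster; "namenode:8020" is a placeholder.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode:8020");

// 0.6.0 or older: pass the Configuration to the source itself.
HDFSFileSource source =
    HDFSFileSource.fromText(options.getInputFile()).withConfiguration(conf);

// Latest snapshot: register the Configuration on the pipeline options so
// that hdfs:// paths resolve through HadoopFileSystem.
HadoopFileSystemOptions hdfsOptions =
    PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class);
hdfsOptions.setHdfsConfiguration(Collections.singletonList(conf));
Pipeline p = Pipeline.create(hdfsOptions);
```

Without one of these, the source falls back to a Configuration that knows nothing about your cluster, which matches both errors below.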



On Tue, May 2, 2017 at 3:40 PM, Sean Story 
wrote:

> Hi all,
>
> Super newb question here - I'm just getting started playing with beam, and
> wanted to check out its capabilities to run on Apex. So I tried to follow
> the directions here:
> https://beam.apache.org/documentation/runners/apex/
>
> The directions were a little vague around using a file on hdfs "(example
> project needs to be modified to include HDFS file provider)"
> So I removed this line:
> p.apply("ReadLines", TextIO.Read.from(options.getInputFile()))
> and replaced it with these lines:
> HDFSFileSource source =
> HDFSFileSource.fromText(options.getInputFile());
> p.apply(Read.from(source))
> in the WordCount.java example class (with corresponding pom changes to
> pull in the requisite dependencies).
>
> and ended up running into:
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Unable to find any files matching /tmp/input/pom.xml
> at org.apache.beam.sdks.java.io.hdfs.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:518)
> at org.apache.beam.sdk.io.hdfs.HDFSFileSource$7.run(HDFSFileSource.java:346)
> at org.apache.beam.sdk.io.hdfs.HDFSFileSource$7.run(HDFSFileSource.java:339)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.beam.sdk.io.hdfs.HDFSFileSource.validate(HDFSFileSource.java:339)
> at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:104)
> at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:89)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:488)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:402)
> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
> at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:161)
> at org.apache.beam.examples.WordCount.main(WordCount.java:186)
> ... 6 more
>
>
> My assumption is that this is because it was looking locally (rather than
> in HDFS) for my pom file, so I changed my input to explicitly point at
> hdfs, like:
> `--inputFile=hdfs:///user/sean.story/pom.xml`
> 
>
> which made me get this error:
>
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
> at org.apache.beam.sdk.io.hdfs.HDFSFileSource.validate(HDFSFileSource.java:353)
> at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:104)
> at org.apache.beam.sdk.io.Read$Bounded.expand(Read.java:89)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:488)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:402)
> at 

Re: Slack channel invite

2017-05-02 Thread Ismaël Mejía
Done.

On Tue, May 2, 2017 at 1:05 PM, Josh Di Fabio  wrote:
> Please will someone kindly invite joshdifa...@gmail.com to the Beam slack
> channel?


Slack channel invite

2017-05-02 Thread Josh Di Fabio
Please will someone kindly invite joshdifa...@gmail.com to the Beam slack
channel?