Re: Regarding Beam Slack Channel

2017-03-27 Thread Gareth Western

Could I also have an invite please?


On 28. mars 2017 07:43, Jean-Baptiste Onofré wrote:
Oh my bad. I know there's some limitation about the number of 
members/history when using the free version.


Thanks !
Regards
JB

On 03/27/2017 10:27 PM, Dan Halperin wrote:
There are 106 members in the channel right now. I think the problem 
is the
number of outstanding invitations. (Folks, if you've got an invite -- 
please

accept!)

On Mon, Mar 27, 2017 at 4:39 AM, Jean-Baptiste Onofré mailto:j...@nanthrax.net>> wrote:

Hi,

As said during the week end, we reached the max number of members 
on the

Slack channel (90). We are checking different ways to increase that.
I will add you as soon as it's OK.
Sorry for the delay,

Regards
JB


On 03/27/2017 11:09 AM, Chenchong Qin wrote:

Hello

Can someone please add me to the Beam slack channel?

Thanks.


--
Jean-Baptiste Onofré
jbono...@apache.org 
http://blog.nanthrax.net
Talend - http://www.talend.com








Re: Apache Beam @ JavaZone 2017?

2017-02-13 Thread Gareth Western

Whoops, wrong URL sorry!

Try https://2017.javazone.no/speakers

:)


On 13. feb. 2017 16:21, Gareth Western wrote:
I'm helping with the organisation of JavaZone 2017, and would love to 
see a talk about Apache Beam there this year. Is there anyone on the 
list who would be interested in submitting a presentation?


More about the conference here: https://2017.javazone.no/speak

Feel free to contact me if you have any questions!

Kind regards,

Gareth





Apache Beam @ JavaZone 2017?

2017-02-13 Thread Gareth Western
I'm helping with the organisation of JavaZone 2017, and would love to 
see a talk about Apache Beam there this year. Is there anyone on the 
list who would be interested in submitting a presentation?


More about the conference here: https://2017.javazone.no/speak

Feel free to contact me if you have any questions!

Kind regards,

Gareth



Re: Debugging a failed Dataflow job

2017-01-24 Thread Gareth Western

Thanks Chris. That's useful to know.

I cross-posted the question to SO [1] and was told that the error 
message was because I had forgotten to authorise the dataflow API for 
this project in the Cloud Console (PEBCAK).


1. 
https://stackoverflow.com/questions/41815519/cloud-dataflow-job-failed-without-reason/



On 23. jan. 2017 23:34, Chris Fei wrote:
I believe that documentation refers to the 1.X releases of the Google 
Dataflow SDK. BlockingDataflowPipelineRunner exists there, but it 
doesn't exist in Beam 0.4.0. Take a look at the breaking changes for 
the 2.X releases at 
https://cloud.google.com/dataflow/release-notes/release-notes-java-2, 
which outline how to migrate away from BlockingDataflowPipelineRunner.


You should just be able to specify DataflowRunner in your options, and 
then add the --blockOnRun option.


Chris


On Mon, Jan 23, 2017, at 05:03 PM, Gareth Western wrote:


Thank you. How does one use the BlockingDataflowPipelineRunner with 
Beam?


Specifying it in the PipelineOptions results in an Exception with a 
message stating that it is not one of the supported pipeline runners:


"Exception in thread "main" java.lang.IllegalArgumentException: Class 
'com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner' 
does not implement PipelineRunner. Supported pipeline runners 
[DataflowRunner, DirectRunner, FlinkRunner, TestFlinkRunner]"



On 23. jan. 2017 22:31, Lukasz Cwik wrote:
Please take a look at 
https://cloud.google.com/dataflow/pipelines/logging#monitoring-pipeline-logs


On Mon, Jan 23, 2017 at 12:21 PM, Gareth Western 
mailto:gar...@garethwestern.com>> wrote:


I'm having trouble running my pipeline using the dataflow
runner. The job is submitted successfully:

Dataflow SDK version: 0.4.0
Submitted job: 2017-01-23_12_17_20-13351949104581541182

but in the Dataflow console it simply says " workflow failed". I
can't seem to find any more details regarding the cause of the
failure. Where should I be looking?







Re: Debugging a failed Dataflow job

2017-01-23 Thread Gareth Western

Thank you. How does one use the BlockingDataflowPipelineRunner with Beam?

Specifying it in the PipelineOptions results in an Exception with a 
message stating that it is not one of the supported pipeline runners:


"Exception in thread "main" java.lang.IllegalArgumentException: Class 
'com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner' 
does not implement PipelineRunner. Supported pipeline runners 
[DataflowRunner, DirectRunner, FlinkRunner, TestFlinkRunner]"



On 23. jan. 2017 22:31, Lukasz Cwik wrote:
Please take a look at 
https://cloud.google.com/dataflow/pipelines/logging#monitoring-pipeline-logs


On Mon, Jan 23, 2017 at 12:21 PM, Gareth Western 
mailto:gar...@garethwestern.com>> wrote:


I'm having trouble running my pipeline using the dataflow runner.
The job is submitted successfully:

Dataflow SDK version: 0.4.0
Submitted job: 2017-01-23_12_17_20-13351949104581541182

but in the Dataflow console it simply says " workflow failed". I
can't seem to find any more details regarding the cause of the
failure. Where should I be looking?






Debugging a failed Dataflow job

2017-01-23 Thread Gareth Western
I'm having trouble running my pipeline using the dataflow runner. The 
job is submitted successfully:


Dataflow SDK version: 0.4.0
Submitted job: 2017-01-23_12_17_20-13351949104581541182

but in the Dataflow console it simply says " workflow failed". I can't 
seem to find any more details regarding the cause of the failure. Where 
should I be looking?




Re: KafkaIO: reset topic for reading from the start with every run

2017-01-23 Thread Gareth Western
Thanks Thomas. I'll be sure to convey that in the demo. The Flink local 
runner performs nicely. I'm now setting up the Flink cluster for the 
next test.



On 23. jan. 2017 20:20, Thomas Groh wrote:
You should also generally expect the DirectRunner to be slower than 
production runners - the goals of the DirectRunner are primarily 
ensuring that Pipelines are portable to other production runners and 
enforcing the Beam Programming Model while enabling local iteration 
and development, and exposing bugs early. As a result, there is a 
relatively large amount of additional work done per-element that will 
slow the Pipeline, and consume additional local resources as the 
Pipeline executes.






Re: KafkaIO: reset topic for reading from the start with every run

2017-01-23 Thread Gareth Western
Actually I made a silly mistake in my configuration and had the wrong 
topic name. Tired eyes. Thanks for confirming the config!



On 23. jan. 2017 19:17, Raghu Angadi wrote:


On Mon, Jan 23, 2017 at 10:10 AM, Gareth Western 
mailto:gar...@garethwestern.com>> wrote:


*ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString(),**
**ConsumerConfig.CLIENT_ID_CONFIG,
"your_client_id",**
**ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,
"earliest"*


Out of these three, you only need the last one. It should work, what 
did you observe?




KafkaIO: reset topic for reading from the start with every run

2017-01-23 Thread Gareth Western

Apologies if this is slightly off-Beam-topic.

I'm setting up a small demo for some colleagues to parse a 21-million 
line CSV file using a few different runners (Direct, Flink, and Google 
Dataflow). The TextIO using the direct runner seems a bit slow, so I was 
wondering if it's perhaps related to the IO. So I pushed the CSV to a 
Kafka topic (one line per entry). Now to test the topic I'd like to read 
from the start every time. I'm quite new to Kafka, so I'm not sure how 
to "reset" it. A bit of searching indicates that something like this for 
the ConsumerProperties should be correct:


p.apply(KafkaIO.read()
.withBootstrapServers(options.getBrokerUrl())
.withTopics(ImmutableList.of(options.getTopic()))
.withValueCoder(StringUtf8Coder.of())
.updateConsumerProperties(
*ImmutableMap.of(**
**ConsumerConfig.GROUP_ID_CONFIG, 
UUID.randomUUID().toString(),**

**ConsumerConfig.CLIENT_ID_CONFIG, "your_client_id",**
**ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"**
**)*
))

But it didn't seem to work. Is there some other setting that I'm missing?

Kind regards,

Gareth