Apache Beam, version 0.6.0 with Python SDK

2017-03-16 Thread Ahmet Altay
The Apache Beam community is pleased to announce the availability of the
0.6.0 release [1].

This release introduces a new SDK for the Python programming language [2].
Additionally, the release adds a new IO connector for Apache HBase in the
Java SDK, along with the usual batch of bug fixes and improvements. Finally,
several runners have improved their support for the Beam model, including
support for the recently introduced State and Timer API and for Beam's
connectors to distributed file systems. For all major changes in this
release, please refer to the release notes [3].

The 0.6.0 release is now the recommended version; we encourage everyone to
upgrade from any earlier releases.

We thank all users and contributors who have helped make this release
possible. If you haven't already, we'd like to invite you to join us, as we
work towards our first release with API stability.

- Ahmet Altay, on behalf of the Apache Beam community.

[1] https://beam.apache.org/get-started/downloads/

[2] https://beam.apache.org/blog/2017/03/16/python-sdk-release.html
[3] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12339256


Re: Approach to writing to Redis in Streaming Pipeline

2017-03-16 Thread Ismaël Mejía
Hello,

It is probably not worth the effort to write a new RedisIO from scratch,
considering there is an ongoing pull request for this:

https://github.com/apache/beam/pull/1687

You may want to check whether the current work in progress covers your
needs, and possibly lend a hand to improve it if it does not.

Regards,
Ismaël


Re: Approach to writing to Redis in Streaming Pipeline

2017-03-16 Thread sowmya balasubramanian
Thanks a ton, Raghu and Eugene! The @Setup and @Teardown approach is what
I was looking for. I will try it and see how it goes.

Regards,
Sowmya


Re: Approach to writing to Redis in Streaming Pipeline

2017-03-16 Thread Eugene Kirpichov
Yes, please use a ParDo. The Custom Sink API is not intended for use with
unbounded collections (i.e., pretty much any streaming pipeline) and is
generally due for a redesign. ParDo is currently almost always the better
choice when you want to implement a connector that writes data to a
third-party system, unless you are just implementing export to a particular
file format (in which case FileBasedSink is appropriate).

Concur with what Raghu said about @Setup/@Teardown.


Re: Approach to writing to Redis in Streaming Pipeline

2017-03-16 Thread Raghu Angadi
ParDo is OK.

Do you open a connection in each processElement() invocation? If you can
reuse the connection, you can open it once in the @Setup method and close
it in @Teardown.
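
For what it's worth, here is a minimal sketch of such a DoFn using the
Jedis client (the host, port, and KV element type are assumptions for
illustration, not part of the original pipeline):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import redis.clients.jedis.Jedis;

    // Sketch: reuse one Redis connection per DoFn instance instead of
    // opening and closing one per element.
    class WriteToRedisFn extends DoFn<KV<String, String>, Void> {
      // transient: the connection is not serialized with the DoFn.
      private transient Jedis jedis;

      @Setup
      public void setup() {
        // Runs once per DoFn instance, not once per element.
        jedis = new Jedis("redis-host", 6379);
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        KV<String, String> kv = c.element();
        jedis.set(kv.getKey(), kv.getValue());
      }

      @Teardown
      public void teardown() {
        // Close the connection when the runner discards the instance.
        if (jedis != null) {
          jedis.close();
        }
      }
    }

Because @Setup runs once per DoFn instance rather than once per element,
this avoids the connect/disconnect churn that can trigger the
SocketException described below.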

Raghu.


Approach to writing to Redis in Streaming Pipeline

2017-03-16 Thread sowmya balasubramanian
Hi All,

I am a newbie who has recently entered the world of GCP and pipelines.

I have a streaming pipeline that writes to a Redis sink at the end. The
pipeline writes about 60,000 events per 15-minute window. I implemented
the write to Redis using a ParDo.

The prototype worked well for a small set of streaming events. However,
when I tested with my full dataset, every now and then the Redis client
(Jedis) threw a SocketException. (The client opens a connection every
time it has to write to Redis, then closes it.)

A couple of questions:

   1. Is there a preferred Redis client for the pipeline?
   2. Does it make sense to write a Custom Redis sink instead of a ParDo?

Thanks,
Sowmya


Re: Cannot provide coder

2017-03-16 Thread Kenneth Knowles
Hi Antony,

Thanks for the report!

You are right about the easiest way to fix this: use Create.withCoder(...).
This is generally the best practice, as Create infers coders in a way that
is different from everywhere else (we would like this to be invisible, but
unfortunately we cannot quite make it so).
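
For example, here is a minimal sketch of the explicit-coder approach (the
MyData class and pipeline setup are placeholders for illustration):

    import java.io.Serializable;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.coders.SerializableCoder;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class CreateWithCoderExample {
      // Stand-in for a user-defined Serializable value class.
      static class MyData implements Serializable {
        final String value;
        MyData(String value) { this.value = value; }
      }

      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
        // Attach the coder explicitly instead of relying on Create's
        // value-based inference, which does not consult @DefaultCoder.
        p.apply(Create.of(new MyData("a"), new MyData("b"))
                      .withCoder(SerializableCoder.of(MyData.class)));
        p.run();
      }
    }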

But I also think that (1) your example should work and use
SerializableCoder, and (2) the error message you saw is not good enough.
So I have filed https://issues.apache.org/jira/browse/BEAM-1736.

Kenn

On Mon, Mar 13, 2017 at 8:28 AM, Antony Mayi  wrote:

> to answer myself:
>
> I should have said that I am creating the PCollection using Create.of,
> which apparently leads to this getDefaultCoder() implementation:
> CoderRegistry.getDefaultCoder(T exampleValue), which doesn't try
> inspecting the annotation (as opposed to
> CoderRegistry.getDefaultCoder(Class<T> clazz)).
> So I guess I have to explicitly use Create.of().withCoder().
>
> antony.
>
>
> On Monday, 13 March 2017, 13:37, Antony Mayi  wrote:
>
>
> Hi,
>
> trying to create a PCollection of my custom data class but keep failing
> due to a CannotProvideCoderException:
>
> My class is declared as follows:
>
> @DefaultCoder(SerializableCoder.class)
> public class Data<T extends BaseData> implements Serializable {
>
> and it fails like this:
>
> Caused by: org.apache.beam.sdk.coders.CannotProvideCoderException: Cannot
> provide coder based on value with class my.project.Data: No CoderFactory
> has been registered for the class.
>
> Why doesn't it pick up the coder?
>
> thanks,
> Antony.