Re: messageId from PubSubIO

2018-01-17 Thread Csaba Kassai
ok, thanks for the feature request.

On Wed, 17 Jan 2018 at 19:52 Lukasz Cwik  wrote:

> It is currently not exposed. I filed
> https://issues.apache.org/jira/browse/BEAM-3489 as a feature request.
> It shouldn't be difficult to add if you would like to try to tackle this
> feature.
>
> Here is a pointer to the contribution guide for more details:
> https://beam.apache.org/contribute/contribution-guide/
>
> On Wed, Jan 17, 2018 at 4:45 AM, Csaba Kassai 
> wrote:
>
>> Hi,
>>
>> is it possible to somehow get the messageId field of a Pub/Sub message
>> in a DoFn after using the PubSubIO Beam source to read the messages?
>>
>> I need the default id which was assigned by the Pub/Sub service. I want
>> to log it for debugging purposes.
>>
>> Using a custom attribute for the unique id and the withIdAttribute()
>> method is not possible for me, because I have no influence on the publisher
>> in this case.
>>
>> I use the 2.2.0 version of the Dataflow Java SDK.
>>
>> Thanks,
>>
>> Csabi
>>
>
>

-- 
Csaba Kassai
Data Architect
M: +36703379122
Doctusoft


Re: Strata Conference this March 6-8

2018-01-17 Thread Holden Karau
So having folks join the BoF via streaming would probably require meeting
somewhere other than a coffee shop, so as not to be jerks in the coffee shop.

On Wed, Jan 17, 2018 at 2:53 PM, Matthias Baetens <
matthias.baet...@datatonic.com> wrote:

> Sure, I'd be very happy to organise something. This is about Strata San
> Jose though, right? Maybe we can organise a remote session in which we can
> join (depending on when you would organise the BoF) or have a channel
> set up if the talks are broadcast?
>
> Also: will there be any Beam talks at Strata London or is this not known
> yet? Keen to get involved and set things up around that date as well.
>
> On Wed, Jan 17, 2018 at 8:37 AM, Jean-Baptiste Onofré 
> wrote:
>
>> That's a great idea ! I'm sure that Matthias (organizer of the Beam
>> London Meetup) can help us to plan something.
>>
>> Regards
>> JB
>>
>>
>> On 01/17/2018 08:57 AM, Ismaël Mejía wrote:
>>
>>> Maybe a good idea to try to organize a Beam meetup in London on the
>>> same dates in case some of the people around can jump in and talk too.
>>>
>>> On Wed, Jan 17, 2018 at 2:51 AM, Ron Gonzalez 
>>> wrote:
>>>
 Works for me...

 On Tuesday, January 16, 2018, 5:45:33 PM PST, Holden Karau
  wrote:


 How would folks feel about during the afternoon break (3:20-4:20) on the
 Wednesday (same day as Eugene's talk)? We could do the Philz which is a
 bit
 of a walk but gets us away from the big crowd and also lets folks not
 attending the conference but in the area join us.

 On Tue, Jan 16, 2018 at 5:29 PM, Ron Gonzalez 
 wrote:

 Cool, let me know if you guys finally schedule it. I will definitely
 try to
 make it to Eugene's talk but having an informal BoF in the area would be
 nice...

 Thanks,
 Ron

 On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky
  wrote:


 All for it

 Boris Lublinsky
 FDP Architect
 boris.lublin...@lightbend.com
 https://www.lightbend.com/

 On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:

 +1 to BoF

 On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk <
 dmi...@postmates.com>
 wrote:

 Probably won't be attending the conference, but totally down for a BoF.

 On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau 
 wrote:

 Do interested folks have any timing constraints around a BoF?

 On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson <
 je...@bigdatainstitute.io>
 wrote:

 +1 to BoF. I don't know if any Beam talks will be on the schedule.

 We could do an informal BoF at the Philz nearby or similar?




 --
 Twitter: https://twitter.com/holdenkarau




 --
 Best regards,
 Dmitry Demeshchuk.






 --
 Twitter: https://twitter.com/holdenkarau

>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
>
> --
>
>
> *Matthias Baetens*
>
>
> *datatonic | data power unleashed*
>
> office +44 203 668 3680  |  mobile +44 74 918 20646
>
> Level24 | 1 Canada Square | Canary Wharf | E14 5AB London
> 
>
>
> We've been announced as one of the top global Google Cloud Machine Learning
> partners.
>



-- 
Twitter: https://twitter.com/holdenkarau


Re: Strata Conference this March 6-8

2018-01-17 Thread Matthias Baetens
Sure, I'd be very happy to organise something. This is about Strata San
Jose though, right? Maybe we can organise a remote session in which we can
join (depending on when you would organise the BoF) or have a channel
set up if the talks are broadcast?

Also: will there be any Beam talks at Strata London or is this not known
yet? Keen to get involved and set things up around that date as well.

On Wed, Jan 17, 2018 at 8:37 AM, Jean-Baptiste Onofré 
wrote:

> That's a great idea ! I'm sure that Matthias (organizer of the Beam London
> Meetup) can help us to plan something.
>
> Regards
> JB
>
>
> On 01/17/2018 08:57 AM, Ismaël Mejía wrote:
>
>> Maybe a good idea to try to organize a Beam meetup in London on the
>> same dates in case some of the people around can jump in and talk too.
>>
>> On Wed, Jan 17, 2018 at 2:51 AM, Ron Gonzalez 
>> wrote:
>>
>>> Works for me...
>>>
>>> On Tuesday, January 16, 2018, 5:45:33 PM PST, Holden Karau
>>>  wrote:
>>>
>>>
>>> How would folks feel about during the afternoon break (3:20-4:20) on the
>>> Wednesday (same day as Eugene's talk)? We could do the Philz which is a
>>> bit
>>> of a walk but gets us away from the big crowd and also lets folks not
>>> attending the conference but in the area join us.
>>>
>>> On Tue, Jan 16, 2018 at 5:29 PM, Ron Gonzalez 
>>> wrote:
>>>
>>> Cool, let me know if you guys finally schedule it. I will definitely try
>>> to
>>> make it to Eugene's talk but having an informal BoF in the area would be
>>> nice...
>>>
>>> Thanks,
>>> Ron
>>>
>>> On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky
>>>  wrote:
>>>
>>>
>>> All for it
>>>
>>> Boris Lublinsky
>>> FDP Architect
>>> boris.lublin...@lightbend.com
>>> https://www.lightbend.com/
>>>
>>> On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:
>>>
>>> +1 to BoF
>>>
>>> On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk
>>> wrote:
>>>
>>> Probably won't be attending the conference, but totally down for a BoF.
>>>
>>> On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau 
>>> wrote:
>>>
>>> Do interested folks have any timing constraints around a BoF?
>>>
>>> On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson <
>>> je...@bigdatainstitute.io>
>>> wrote:
>>>
>>> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>>>
>>> We could do an informal BoF at the Philz nearby or similar?

>>>
>>>
>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Dmitry Demeshchuk.
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 


*Matthias Baetens*


*datatonic | data power unleashed*

office +44 203 668 3680  |  mobile +44 74 918 20646

Level24 | 1 Canada Square | Canary Wharf | E14 5AB London


We've been announced as one of the top global Google Cloud Machine Learning
partners.


Scio 0.5.0-alpha1 is out

2018-01-17 Thread Neville Li
Hi all,

We just released Scio 0.5.0-alpha1. This release includes a typed BigQuery
performance improvement by bypassing intermediate TableRow JSONs. It has
shown a 2x speed up in some of our benchmarks.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.5.0-alpha1

*"Ia Io"*
Breaking changes

   - BigQueryIO in JobTest#output now requires a type parameter. Explicit
   .map(T.toTableRow) of test data is no longer needed.
   - Package com.spotify.scio.extra.transforms is moved from scio-extra to
   scio-core, under com.spotify.scio.transforms.

See this section for more details.
Features

   - Support reading BigQuery as Avro #964, #992
   - Add TFRecordSpec support for Featran #1002
   - Add AsyncLookupDoFn #1012

Bug fixes

   - Fix SCollectionMatchers serialization #1001
   - Check runner version #1008, #1009


Re: messageId from PubSubIO

2018-01-17 Thread Lukasz Cwik
It is currently not exposed. I filed
https://issues.apache.org/jira/browse/BEAM-3489 as a feature request.
It shouldn't be difficult to add if you would like to try to tackle this
feature.

Here is a pointer to the contribution guide for more details:
https://beam.apache.org/contribute/contribution-guide/
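
To make the request concrete, here is a rough sketch in plain Java of what carrying the service-assigned ID through to a per-element logging step could look like. Nothing in it is a Beam class: `Envelope` and `debugLine` are hypothetical names made up for illustration, and the real element type would come from PubsubIO once BEAM-3489 is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class MessageIdDebug {

  /** Hypothetical stand-in for a Pub/Sub message that keeps the service-assigned ID. */
  static final class Envelope {
    final String messageId; // assigned by the service, not by the publisher
    final byte[] payload;
    final Map<String, String> attributes;

    Envelope(String messageId, byte[] payload, Map<String, String> attributes) {
      this.messageId = messageId;
      this.payload = payload.clone();
      this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }
  }

  /** Builds the debug line that a DoFn-like step could log for each element. */
  static String debugLine(Envelope m) {
    return "messageId=" + m.messageId
        + " bytes=" + m.payload.length
        + " attributes=" + m.attributes.size();
  }

  public static void main(String[] args) {
    Envelope m = new Envelope(
        "1234567890",
        "hello".getBytes(StandardCharsets.UTF_8),
        Collections.singletonMap("source", "sensor-a"));
    // Prints: messageId=1234567890 bytes=5 attributes=1
    System.out.println(debugLine(m));
  }
}
```

The point of the sketch is only that the ID travels with the payload and attributes, so no cooperation from the publisher is needed to log it.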

On Wed, Jan 17, 2018 at 4:45 AM, Csaba Kassai 
wrote:

> Hi,
>
> is it possible to somehow get the messageId field of a Pub/Sub message in
> a DoFn after using the PubSubIO Beam source to read the messages?
>
> I need the default id which was assigned by the Pub/Sub service. I want to
> log it for debugging purposes.
>
> Using a custom attribute for the unique id and the withIdAttribute()
> method is not possible for me, because I have no influence on the publisher
> in this case.
>
> I use the 2.2.0 version of the Dataflow Java SDK.
>
> Thanks,
>
> Csabi
>


Re: Some interesting use case

2018-01-17 Thread zlgonzalez
Thanks Boris. Yeah, we can talk about it at the BoF...

Thanks,
Ron


Original message
From: Boris Lublinsky
Date: 1/17/18 6:10 AM (GMT-08:00)
To: user@beam.apache.org
Cc: d...@beam.apache.org, Charles Chen
Subject: Re: Some interesting use case

Ron,
If you are talking about TensorFlow SavedModel format, I personally think
that it is overkill for model serving. My preferred option is to use
traditional TF export, which can be optimized for serving.
As for processing I am using TF Java APIs, which basically is a population of
the tensor column.

But if you are really interested, we can talk about it in San Jose or set up a
conference call if you want to discuss it sooner.

Boris Lublinsky
FDP Architect
boris.lublin...@lightbend.com
https://www.lightbend.com/



On Jan 16, 2018, at 10:53 PM, Ron Gonzalez wrote:

Yes you're right. I believe this is the use case that I'm after. So
if I understand correctly, transforms that do aggregations just assume that the
batch of data being aggregated is passed as part of a tensor column. Is it
possible to hook up a lookup call to another Tensorflow Serving servable for a
join in batch mode?

Will a saved model when loaded into a tensorflow serving model actually have
the definitions of the metadata when retrieved using the tensorflow serving
metadata api?

Thanks,
Ron

On Tuesday, January 16, 2018, 6:16:01 PM PST, Charles Chen wrote:

This sounds similar to the use case for tf.Transform, a library that depends
on Beam: https://github.com/tensorflow/transform

On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez wrote:

Hi,
  I was wondering if anyone has encountered or used Beam in the following
manner:

  1. During machine learning training, use Beam to create the event table.
The flow may consist of some joins, aggregations, row-based transformations,
etc...
  2. Once the model is created, deploy the model to some scoring service via
PMML (or some other scoring service).
  3. Enable the SAME transformations used in #1 by using a separate engine
but thereby guaranteeing that it will transform the data identically as the
engine used in #1.

  I think this is a pretty interesting use case where Beam is used to
guarantee portability across engines and deployment (batch to true streaming,
not micro-batch). What's not clear to me is with respect to how batch joins
would translate during one-by-one scoring (probably lookups) or how
aggregations given that some kind of history would need to be stored (and how
much is kept is configurable too).

  Thoughts?

Thanks,
Ron




Re: Some interesting use case

2018-01-17 Thread Boris Lublinsky
Ron,
If you are talking about TensorFlow SavedModel format, I personally think that
it is overkill for model serving. My preferred option is to use traditional TF
export, which can be optimized for serving.
As for processing I am using TF Java APIs, which basically is a population of
the tensor column.

But if you are really interested, we can talk about it in San Jose or set up a
conference call if you want to discuss it sooner.

Boris Lublinsky
FDP Architect
boris.lublin...@lightbend.com
https://www.lightbend.com/

> On Jan 16, 2018, at 10:53 PM, Ron Gonzalez  wrote:
> 
> Yes you're right. I believe this is the use case that I'm after. So if I 
> understand correctly, transforms that do aggregations just assume that the 
> batch of data being aggregated is passed as part of a tensor column. Is it 
> possible to hook up a lookup call to another Tensorflow Serving servable for 
> a join in batch mode?
> 
> Will a saved model when loaded into a tensorflow serving model actually have 
> the definitions of the metadata when retrieved using the tensorflow serving 
> metadata api?
> 
> Thanks,
> Ron
> 
> On Tuesday, January 16, 2018, 6:16:01 PM PST, Charles Chen  
> wrote:
> 
> 
> This sounds similar to the use case for tf.Transform, a library that depends 
> on Beam: https://github.com/tensorflow/transform 
> 
> On Tue, Jan 16, 2018 at 5:51 PM Ron Gonzalez wrote:
> Hi,
>   I was wondering if anyone has encountered or used Beam in the following 
> manner:
>  
>   1. During machine learning training, use Beam to create the event table. 
> The flow may consist of some joins, aggregations, row-based transformations, 
> etc...
>   2. Once the model is created, deploy the model to some scoring service via 
> PMML (or some other scoring service).
>   3. Enable the SAME transformations used in #1 by using a separate engine 
> but thereby guaranteeing that it will transform the data identically as the 
> engine used in #1.
> 
>   I think this is a pretty interesting use case where Beam is used to 
> guarantee portability across engines and deployment (batch to true streaming, 
> not micro-batch). What's not clear to me is with respect to how batch joins 
> would translate during one-by-one scoring (probably lookups) or how 
> aggregations given that some kind of history would need to be stored (and how 
> much is kept is configurable too).
> 
>   Thoughts?
> 
> Thanks,
> Ron
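
The core of steps 1 and 3 in the quoted use case (one transformation definition, applied identically in batch and in one-by-one scoring) can be sketched in plain Java roughly as follows. This is only an illustration under assumptions: `SharedTransform`, `FEATURES`, and the toy feature logic are invented here, no Beam or TensorFlow API is used, and the harder questions raised in the thread (joins as lookups, aggregations that need stored history) are not addressed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class SharedTransform {

  /** Row-based transformation defined once: scale the amount and flag large values. */
  static final Function<double[], double[]> FEATURES =
      row -> new double[] {row[0] / 100.0, row[0] > 500.0 ? 1.0 : 0.0};

  /** Batch path (training): apply the shared transformation to every row. */
  static List<double[]> transformBatch(List<double[]> rows) {
    List<double[]> out = new ArrayList<>();
    for (double[] row : rows) {
      out.add(FEATURES.apply(row));
    }
    return out;
  }

  /** One-by-one path (scoring): apply the same transformation to a single row. */
  static double[] transformOne(double[] row) {
    return FEATURES.apply(row);
  }

  public static void main(String[] args) {
    List<double[]> rows = Arrays.asList(new double[] {250.0}, new double[] {900.0});
    List<double[]> batch = transformBatch(rows);
    for (int i = 0; i < rows.size(); i++) {
      // Both paths share FEATURES, so the outputs match element for element.
      System.out.println(Arrays.equals(batch.get(i), transformOne(rows.get(i))));
    }
  }
}
```

Because both paths call the same function object, any change to the feature logic reaches training and scoring together, which is the guarantee the use case asks for.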



messageId from PubSubIO

2018-01-17 Thread Csaba Kassai
Hi,

is it possible to somehow get the messageId field of a Pub/Sub message in
a DoFn after using the PubSubIO Beam source to read the messages?

I need the default id which was assigned by the Pub/Sub service. I want to
log it for debugging purposes.

Using a custom attribute for the unique id and the withIdAttribute() method
is not possible for me, because I have no influence on the publisher in
this case.

I use the 2.2.0 version of the Dataflow Java SDK.

Thanks,

Csabi


Re: Strata Conference this March 6-8

2018-01-17 Thread Jean-Baptiste Onofré
That's a great idea! I'm sure that Matthias (organizer of the Beam London
Meetup) can help us to plan something.


Regards
JB

On 01/17/2018 08:57 AM, Ismaël Mejía wrote:

Maybe a good idea to try to organize a Beam meetup in London on the
same dates in case some of the people around can jump in and talk too.

On Wed, Jan 17, 2018 at 2:51 AM, Ron Gonzalez  wrote:

Works for me...

On Tuesday, January 16, 2018, 5:45:33 PM PST, Holden Karau
 wrote:


How would folks feel about during the afternoon break (3:20-4:20) on the
Wednesday (same day as Eugene's talk)? We could do the Philz which is a bit
of a walk but gets us away from the big crowd and also lets folks not
attending the conference but in the area join us.

On Tue, Jan 16, 2018 at 5:29 PM, Ron Gonzalez  wrote:

Cool, let me know if you guys finally schedule it. I will definitely try to
make it to Eugene's talk but having an informal BoF in the area would be
nice...

Thanks,
Ron

On Tuesday, January 16, 2018, 5:06:53 PM PST, Boris Lublinsky
 wrote:


All for it

Boris Lublinsky
FDP Architect
boris.lublin...@lightbend.com
https://www.lightbend.com/

On Jan 16, 2018, at 7:01 PM, Ted Yu  wrote:

+1 to BoF

On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk 
wrote:

Probably won't be attending the conference, but totally down for a BoF.

On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau  wrote:

Do interested folks have any timing constraints around a BoF?

On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson 
wrote:

+1 to BoF. I don't know if any Beam talks will be on the schedule.


We could do an informal BoF at the Philz nearby or similar?





--
Twitter: https://twitter.com/holdenkarau




--
Best regards,
Dmitry Demeshchuk.






--
Twitter: https://twitter.com/holdenkarau


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com