Announcing the Community Over Code 2024 Streaming Track

2024-03-20 Thread James Hughes
Hi all,

Community Over Code, the ASF conference, will be held in Denver, Colorado,
October 7-10, 2024. The call for presentations is open now through
April 15, 2024. (This is two months earlier than last year!)

I am one of the co-chairs for the stream processing track, and we would
love to see you there and hope that you will consider submitting a talk.

About the Streaming track:

There are many top-level ASF projects that focus on and push the envelope
for stream and event processing. ActiveMQ, Beam, BookKeeper, Camel, Flink,
Kafka, Pulsar, RocketMQ, and Spark are all household names in the stream
processing and analytics world at this point. These projects show that
stream processing has unique characteristics requiring deep expertise. On
the other hand, users need easy-to-apply solutions. The streaming track
will host talks focused on the use cases and advances of these projects, as
well as other developments in the streaming world.

Thanks and see you in October!

Jim


Announcing the Community Over Code 2023 Streaming Track

2023-06-09 Thread James Hughes
Hi all,

Community Over Code, the ASF conference, will be held in Halifax, Nova
Scotia, October 7-10, 2023. The call for presentations is open now through
July 13, 2023.

I am one of the co-chairs for the stream processing track, and we would
love to see you there and hope that you will consider submitting a talk.

About the Streaming track:

There are many top-level ASF projects that focus on and push the envelope
for stream and event processing. ActiveMQ, Beam, BookKeeper, Camel, Flink,
Kafka, Pulsar, RocketMQ, and Spark are all household names in the stream
processing and analytics world at this point. These projects show that
stream processing has unique characteristics requiring deep expertise. On
the other hand, users need easy-to-apply solutions. The streaming track
will host talks focused on the use cases and advances of these projects, as
well as other developments in the streaming world.

Thanks and see you in October!

Jim


Go SDK Example

2018-06-03 Thread James Wilson
Hi All,

This is the first time I am trying to contribute to a large open source
project. I was going to tackle BEAM-4292, "Add streaming word count example,"
for the Go SDK. Do I assign it to myself, or just complete the task and open a
pull request? I read through the contributing page on the Apache Beam site,
but it didn't go into how to tackle your first task. Any help would be
appreciated.

Best,
James

Re: October Apache Beam Newsletter

2017-10-09 Thread James
Cool, very informative, thanks!

On Tue, Oct 10, 2017 at 2:39 AM Griselda Cuevas  wrote:

> Hi Apache Beam Community,
>
> Our first Apache Beam Newsletter is here! I'm sharing a table of contents
> of what's in this edition, which covers everything that has happened in the
> project from June 2017 until October 2017.
>
> You can find the full content in this Google Doc:
> https://docs.google.com/document/d/1BbpQne-9ng93G-_-UKH2C4UNafcEQcLALtu38qsXfI8/edit?usp=sharing
>
> Enjoy!
>
> * * * * * October 2017 Newsletter Table of Contents * * * * *
>
> >> What's Been Done
> - Beam SQL DSL APIs
> - Nexmark
> - Splittable DoFn
> - Improvements to reading and writing files
> - Improvements to BigQueryIO
> - New I/O connectors
> - Docker development images and reproducible-builds
> - Website updates: New Beam Execution Model page & the Mobile Gaming
> Walkthrough was updated w/ new sample code for Python
>
> >> What We Are Working On
> - Portability
> - Splittable DoFn for Python SDK
> - Website: Improve CoGroupByKey docs & website navigation/usability
>
> >> What's Planned
> - FileIO.write()
>
> >>  New Members (Welcome!)
> - Daniel Harper, BBC, London (UK)
>
> >>  Talks & Meetups
> - Talks @ YOW Data Sydney and Strata NY
> - Meetups @ London & dinner in NY
> - Speakers & Meetup Founders group
>
> >>  Resources
> - Capability Matrix
> - Contribution Guide
> - Featured talk & sample intro talk deck
>


Slack Invitation Request

2017-10-08 Thread James Comfort
Hi,

May I have an invitation to the Apache Beam Slack channel?

Thanks!
Jimmy


Re: slack invite

2017-08-07 Thread James
Done.

On Tue, Aug 8, 2017 at 10:14 AM Dumi Loghin  wrote:

> Hi,
>
> Can you add me to the slack channel? Thanks!
>
> Dumi
>


Re: Beam Slack channel

2017-06-29 Thread James
Invite sent.
On Thu, 29 Jun 2017 at 6:07 PM Tolsa, Camille 
wrote:

> Hello,
>
>
> Could you also invite me at camille.to...@gmail.com thanks
>
> On 29 June 2017 at 09:29, Ismaël Mejía  wrote:
>
>> Invitation sent!
>>
>> On Thu, Jun 29, 2017 at 9:16 AM, Patrick Reames
>>  wrote:
>> > Can i also get an invite?
>> >
>> > On 2017-06-25 08:51 (-0500), Aleksandr  wrote:
>> >> Hello,>
>> >> Can someone  please add me to the slack channel?>
>> >>
>> >> Best regards>
>> >> Aleksandr Gortujev.>
>> >>
>> >
>>
>
>
>
>


Re: SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread James
Hi Tyler,

I think UPSERT is a good alternative: it is as concise as INSERT and has the
right semantics. It is just that users rarely seem to use UPSERT either
(perhaps because there is no UPDATE in batch big-data processing).

By *"INSERT will behave differently in batch & stream processing"* I mean:
if we use the "INSERT" solution I described above, there will be ten
INSERTs:

*INSERT INTO result(rowkey, col1) values(...)*

*INSERT INTO result(rowkey, col2) values(...)*

*...*

*INSERT INTO result(rowkey, col10) values(...)*

Although we issue ten INSERTs, only ONE new record appears in the target
table, because nine of the INSERTs are actually UPDATEs of that record. So
in stream computing *INSERT = (INSERT or UPDATE)*, while in batch, *INSERT
is just INSERT*.

I think the essence of the problem is that there is no UPDATE in batch, but
streaming requires it.
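
For comparison, here is a rough sketch of what the per-column writes could
look like with UPSERT (this assumes Phoenix/Calcite-style UPSERT syntax; the
key and values below are just placeholders):

-- Assumed syntax: Phoenix/Calcite-style UPSERT. Each statement inserts the
-- row if `rowkey` is new, otherwise it updates only the listed column.
UPSERT INTO result (rowkey, col1) VALUES ('key-1', 42);
UPSERT INTO result (rowkey, col2) VALUES ('key-1', 7);
-- ... one UPSERT per value column, up to col10

That would keep INSERT's usual meaning while still expressing "insert or
update" explicitly.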



Tyler Akidau <taki...@google.com> wrote on Thu, Jun 22, 2017 at 11:35 PM:

> Calcite appears to have UPSERT
> <https://issues.apache.org/jira/browse/CALCITE-492> support, can we just
> use that instead?
>
> Also, I don't understand your statement that "INSERT will behave
> differently in batch & stream processing". Can you explain further?
>
>
> -Tyler
>
>
> On Thu, Jun 22, 2017 at 7:35 AM Jesse Anderson <je...@bigdatainstitute.io>
> wrote:
>
>> If I'm understanding correctly, Hive does that with an insert into
>> followed by a select statement that does the aggregation.
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
>>
>> On Thu, Jun 22, 2017 at 1:32 AM James <xumingmi...@gmail.com> wrote:
>>
>>> Hi team,
>>>
>>> I am thinking about a problem related to SQL and stream computing, and I
>>> want to hear your opinions.
>>>
>>> In stream computing, there is a typical case like this:
>>>
>>> *We want to calculate a big wide result table, which has one rowkey and
>>> ten
>>> value columns:*
>>> *create table result (*
>>> *rowkey varchar(127) PRIMARY KEY,*
>>> *col1 int,*
>>> *col2 int,*
>>> *...*
>>> *col10 int*
>>> *);*
>>>
>>> Each of the value columns is calculated by a complex query, so there will
>>> be ten SQL statements to populate this table. For each statement:
>>>
>>> * First check whether there is a row for the specified `rowkey`.
>>> * If yes, then `update`, otherwise `insert`.
>>>
>>> There is actually a dedicated SQL syntax called `MERGE` designed for
>>> this (SQL:2008); a sample usage is:
>>>
>>> MERGE INTO result D
>>>USING (SELECT rowkey, col1 FROM input WHERE flag = 80) S
>>>ON (D.rowkey = S.rowkey)
>>>WHEN MATCHED THEN UPDATE SET D.col1 = S.col1
>>>WHEN NOT MATCHED THEN INSERT (D.rowkey, D.col1) VALUES (S.rowkey, S.col1)
>>>
>>>
>>> *The semantics fit perfectly, but it is very verbose, and ordinary users
>>> rarely use this syntax.*
>>>
>>> So my colleagues invented a new syntax for this scenario (or more
>>> precisely, a new interpretation of the INSERT statement). For the above
>>> scenario, the user always writes an `insert` statement:
>>>
>>> insert into result(rowkey, col1) values(...)
>>> insert into result(rowkey, col2) values(...)
>>>
>>> The SQL interpreter does a trick behind the scenes: if the `rowkey`
>>> exists, it updates; otherwise it inserts. This solution is very concise,
>>> but it violates the semantics of `insert`: with this solution, INSERT
>>> behaves differently in batch & stream processing.
>>>
>>> What do you think? Which do you prefer, and what is your reasoning?
>>>
>>> Looking forward to your opinions, thanks in advance.
>>>
>> --
>> Thanks,
>>
>> Jesse
>>
>


Re: SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread James
Hi Jesse,

Yeah, I know the insert...select syntax. In my scenario, each of the value
columns is calculated separately (and they might be calculated from different
data sources), so insert...select might not be sufficient.

Jesse Anderson <je...@bigdatainstitute.io> wrote on Thu, Jun 22, 2017 at 10:35 PM:

> If I'm understanding correctly, Hive does that with an insert into followed
> by a select statement that does the aggregation.
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
>
> On Thu, Jun 22, 2017 at 1:32 AM James <xumingmi...@gmail.com> wrote:
>
>> Hi team,
>>
>> I am thinking about a problem related to SQL and stream computing, and I
>> want to hear your opinions.
>>
>> In stream computing, there is a typical case like this:
>>
>> *We want to calculate a big wide result table, which has one rowkey and
>> ten
>> value columns:*
>> *create table result (*
>> *rowkey varchar(127) PRIMARY KEY,*
>> *col1 int,*
>> *col2 int,*
>> *...*
>> *col10 int*
>> *);*
>
>
>>
>> Each of the value columns is calculated by a complex query, so there will
>> be ten SQL statements to populate this table. For each statement:
>>
>> * First check whether there is a row for the specified `rowkey`.
>> * If yes, then `update`, otherwise `insert`.
>>
>> There is actually a dedicated SQL syntax called `MERGE` designed for
>> this (SQL:2008); a sample usage is:
>>
>> MERGE INTO result D
>>USING (SELECT rowkey, col1 FROM input WHERE flag = 80) S
>>ON (D.rowkey = S.rowkey)
>>WHEN MATCHED THEN UPDATE SET D.col1 = S.col1
>>WHEN NOT MATCHED THEN INSERT (D.rowkey, D.col1) VALUES (S.rowkey, S.col1)
>>
>>
>> *The semantics fit perfectly, but it is very verbose, and ordinary users
>> rarely use this syntax.*
>
>
>>
>> So my colleagues invented a new syntax for this scenario (or more
>> precisely, a new interpretation of the INSERT statement). For the above
>> scenario, the user always writes an `insert` statement:
>>
>> insert into result(rowkey, col1) values(...)
>> insert into result(rowkey, col2) values(...)
>>
>> The SQL interpreter does a trick behind the scenes: if the `rowkey`
>> exists, it updates; otherwise it inserts. This solution is very concise,
>> but it violates the semantics of `insert`: with this solution, INSERT
>> behaves differently in batch & stream processing.
>>
>> What do you think? Which do you prefer, and what is your reasoning?
>>
>> Looking forward to your opinions, thanks in advance.
>>
> --
> Thanks,
>
> Jesse
>


SQL in Stream Computing: MERGE or INSERT?

2017-06-22 Thread James
Hi team,

I am thinking about a problem related to SQL and stream computing, and I
want to hear your opinions.

In stream computing, there is a typical case like this:

*We want to calculate a big wide result table, which has one rowkey and ten
value columns:*
*create table result (*
*rowkey varchar(127) PRIMARY KEY,*
*col1 int,*
*col2 int,*
*...*
*col10 int*
*);*

Each of the value columns is calculated by a complex query, so there will
be ten SQL statements to populate this table. For each statement:

* First check whether there is a row for the specified `rowkey`.
* If yes, then `update`, otherwise `insert`.

There is actually a dedicated SQL syntax called `MERGE` designed for
this (SQL:2008); a sample usage is:

MERGE INTO result D
   USING (SELECT rowkey, col1 FROM input WHERE flag = 80) S
   ON (D.rowkey = S.rowkey)
   WHEN MATCHED THEN UPDATE SET D.col1 = S.col1
   WHEN NOT MATCHED THEN INSERT (D.rowkey, D.col1) VALUES (S.rowkey, S.col1)


*The semantics fit perfectly, but it is very verbose, and ordinary users
rarely use this syntax.*

So my colleagues invented a new syntax for this scenario (or more
precisely, a new interpretation of the INSERT statement). For the above
scenario, the user always writes an `insert` statement:

insert into result(rowkey, col1) values(...)
insert into result(rowkey, col2) values(...)

The SQL interpreter does a trick behind the scenes: if the `rowkey` exists,
it updates; otherwise it inserts. This solution is very concise, but it
violates the semantics of `insert`: with this solution, INSERT behaves
differently in batch & stream processing.
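
To make the trick concrete, here is a rough sketch (the key and values are
purely illustrative) of the MERGE that the interpreter would effectively run
for one of those INSERT statements:

-- Illustrative rewrite of: insert into result(rowkey, col1) values('key-1', 42)
-- under the insert-or-update interpretation of INSERT.
MERGE INTO result D
   USING (VALUES ('key-1', 42)) AS S (rowkey, col1)
   ON (D.rowkey = S.rowkey)
   WHEN MATCHED THEN UPDATE SET D.col1 = S.col1
   WHEN NOT MATCHED THEN INSERT (D.rowkey, D.col1) VALUES (S.rowkey, S.col1)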

What do you think? Which do you prefer, and what is your reasoning?

Looking forward to your opinions, thanks in advance.


Re: Beam 2.0 Release Q and A

2017-05-17 Thread James
Thanks a lot, Jesse! I learned a lot from your previous Q & A blog post (and
surely will from this new one as well).

Jean-Baptiste Onofré wrote on Thu, May 18, 2017 at 2:37 AM:

> Awesome ! Great work Jesse !
>
> Regards
> JB
> On May 17, 2017, at 14:26, Jesse Anderson  wrote:
>>
>> After the first release of Beam, I did a Q and A
>> 
>> with the users and developers of Beam. Now that we've done the first stable
>> release, I want to update the Q and A. This will help us promote Beam and
>> how people are using it in production.
>>
>> I've created a Google Doc
>> 
>> with
>> questions I often get about Beam. I'd love to get your answers to have as
>> many points of view as possible. You can answer as many questions as you'd
>> like.
>>
>> Once we're done, I'll publish the responses to a new blog post and send out
>> the URL.
>>
>> Thanks,
>>
>> Jesse
>>
>>


Re: Fwd: Slack Invite

2017-05-05 Thread James
Done.

On 5 May 2017, 4:57 PM +0800, Josh , wrote:
> Could someone add me too please? at j...@permutive.com
>
> On Fri, May 5, 2017 at 9:08 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > Done
> >
> > Regards
> > JB
> >
> >
> > On 05/05/2017 10:02 AM, Edward Bosher wrote:
> > > Hi,
> > >
> > > Whenever you have time I'd love to get an invite to slack on this email 
> > > address.
> > >
> > > edbosher at gmail com
> > >
> > > Thanks,
> > > Ed
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>


Re: Slack Channel Request

2017-04-13 Thread James
Could I also have an invite please?

On 2017-03-28 08:28 (+0800), Davor Bonaci  wrote: 
> Invite sent.
> 
> On Sat, Mar 25, 2017 at 2:48 AM, Prabeesh K.  wrote:
> 
> > Hi Jean,
> >
> > Thank you for your reply. I am eagerly waiting for the other options.
> >
> > Regards,
> > Prabeesh K.
> >
> > On 25 March 2017 at 10:08, Jean-Baptiste Onofré  wrote:
> >
> >> Unfortunately we reached the max number of people on Slack (90).
> >>
> >> Let me see what we can do.
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 03/24/2017 09:49 PM, Prabeesh K. wrote:
> >>
> >>> Hi,
> >>>
> >>> Can someone please add me to the Apache Beam slack channel?
> >>>
> >>> Regards,
> >>>
> >>> Prabeesh K.
> >>>
> >>>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
> >
>