Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1805

2016-12-13 Thread Pei He
Thanks Dan for helping.

It looks to me like it is related to my change to use BatchRequest in
GcsUtil.fileSize().
I think BatchRequest is using the default timeout, which is too short.

Looking more into this issue.


On Tue, Dec 13, 2016 at 3:31 PM, Dan Halperin  wrote:

> If you look at the console output, we are retrying:
>
> [WARNING] Upload attempt failed, sleeping before retrying staging of 
> classpath: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_RunnableOnService_Dataflow/.repository/com/google/auth/google-auth-library-credentials/0.6.0/google-auth-library-credentials-0.6.0.jar
>
>
> If you look at the console output, we are retrying, up to 4 times
> according to the code.
>
>
> On Tue, Dec 13, 2016 at 2:33 PM, Jason Kuster <jasonkus...@google.com.invalid> wrote:
>
>> This is still failing due to timeouts. Pei, Davor said you might know
>> about
>> retries for this -- what's the deal here?
>>
>> On Tue, Dec 13, 2016 at 1:11 PM, Apache Jenkins Server <
>> jenk...@builds.apache.org> wrote:
>>
>> > See > > RunnableOnService_Dataflow/changes>
>> >
>> >
>>
>>
>> --
>> ---
>> Jason Kuster
>> Apache Beam (Incubating) / Google Cloud Dataflow
>>
>
>


Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1805

2016-12-13 Thread Dan Halperin
If you look at the console output, we are retrying:

[WARNING] Upload attempt failed, sleeping before retrying staging of
classpath: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_RunnableOnService_Dataflow/.repository/com/google/auth/google-auth-library-credentials/0.6.0/google-auth-library-credentials-0.6.0.jar


If you look at the console output, we are retrying, up to 4 times according
to the code.
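
The retry behaviour described above (a warning, a sleep, then another attempt, with a fixed cap) can be sketched as follows. This is only an illustration and not the actual Beam staging code: the class RetrySketch, the method retry, and the doubling sleep are assumptions; only the cap of 4 comes from the message above.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetrySketch {
    /** Runs the action, retrying after any exception until maxAttempts is reached. */
    static <T> T retry(Callable<T> action, int maxAttempts, long initialSleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long sleepMillis = initialSleepMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    System.err.println("[WARNING] attempt " + attempt
                            + " failed, sleeping before retrying");
                    Thread.sleep(sleepMillis);
                    sleepMillis *= 2; // assumed backoff policy: double the sleep each time
                }
            }
        }
        throw last; // every attempt failed; surface the last failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated upload that times out twice and then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated timeout");
            }
            return "uploaded";
        }, 4, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Note that retries only help if each individual attempt is given enough time to finish; a per-request timeout that is too short will fail every attempt in the same way.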


On Tue, Dec 13, 2016 at 2:33 PM, Jason Kuster <
jasonkus...@google.com.invalid> wrote:

> This is still failing due to timeouts. Pei, Davor said you might know about
> retries for this -- what's the deal here?
>
> On Tue, Dec 13, 2016 at 1:11 PM, Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
> > See  > RunnableOnService_Dataflow/changes>
> >
> >
>
>
> --
> ---
> Jason Kuster
> Apache Beam (Incubating) / Google Cloud Dataflow
>


Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1806

2016-12-13 Thread Kenneth Knowles
Failure in
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Dataflow/1806/
is caused by https://github.com/apache/incubator-beam/pull/1541, which I am
reverting.

On Tue, Dec 13, 2016 at 3:16 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  RunnableOnService_Dataflow/changes>
>
>


Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1805

2016-12-13 Thread Jason Kuster
This is still failing due to timeouts. Pei, Davor said you might know about
retries for this -- what's the deal here?

On Tue, Dec 13, 2016 at 1:11 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  RunnableOnService_Dataflow/changes>
>
>


-- 
---
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow


Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-13 Thread Eugene Kirpichov
Agreed with Kenn. I was the one who raised this point on the design doc,
and really I just want to make sure that pipeline authors have a way to let
their users use regular paths from command line and from String/FilePath -
it doesn't have to be an IOChannelFactory or FileSystems feature per se,
but the design needs to make sure there's some well-known way to do it, and
advertise it, including in the documentation of these classes.

Though I'm conflicted on whether it'd be ok to have, say,
TextIO.Read.from() take only a URI rather than a String (though under the
hood it would of course pass a URI to FileSystems APIs).

On Tue, Dec 13, 2016 at 1:15 PM Kenneth Knowles 
wrote:

> I don't think there is any conflict here.
>
> On Tue, Dec 13, 2016 at 12:34 PM, Pei He  wrote:
>
> > One design decision made during previous design discussion [1] is
> > "Replacing FilePath with URI for resolving files paths". This has been
> > brought back to dev@ mailing list in my previous email.
> >
>
> The direction of this argument, in my opinion, gets the burden of proof
> wrong.
>
> The original design document effectively proposed "instead of using URIs,
> let's make a Beam-specific abstraction" and [1] is just the natural comment
> "let's just use URI". This works for the internet, and gives interop with
> essentially all code, so you need a very special reason not to do it (and
> special cases generally manifest as custom URI schemes).
>
> > Comment [2] asked me to clarify the impact on Windows OS users because
> > users have to specify the path in the URI format, such as:
> > "file:///C:/home/input-*"
> > "C:/home/"
> >
>
> It is not really true that users have to do this. For the command line, it
> is the responsibility of the code that parses "--filesToStage
> C:\my\windows\path". Users should absolutely be able to specify paths like
> this on Windows, and it is not difficult and nothing your proposal needs to
> solve.
>
> With programmatic creation in Java code, the same principle applies: the
> environment-specific String/File/Path should be converted to a URI at the
> membrane. Making an API take a URI makes it completely obvious to a Java
> programmer that if they have a String/File/Path they need to convert it
> appropriately.
>
> Kenn
>
>
> > Using URIs in the API is to ensure Beam code is file systems agnostic.
> >
> > Another alternative is Java Path/File. It is used in the current
> > IOChannelFactory API, and it works poorly. For example, Path throws when
> > there are file scheme or asterisk in the path:
> > new File("file:///C:/home/").toPath() throws in toPath().
> > Paths.get("C:/home/").resolve("output-*") throws in resolve().
> >
> > any thoughts and suggestions are welcome.
> >
> > Thanks
> > --
> > Pei
> >
> > ---
> > [1]:
> > https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-
> > XJsVG3qel2lhdKTknmZ_7M/edit?disco=A30vtPU#heading=h.p3gc3colc2cs
> >
> > [2]:
> > https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-
> > XJsVG3qel2lhdKTknmZ_7M/edit?disco=A02O1cY
> >
> > On Tue, Dec 6, 2016 at 1:25 PM, Kenneth Knowles 
> > wrote:
> >
> > > Thanks for the thorough answers. It all sounds good to me.
> > >
> > > On Tue, Dec 6, 2016 at 12:57 PM, Pei He 
> > wrote:
> > >
> > > > Thanks Kenn for the feedback and questions.
> > > >
> > > > I responded inline.
> > > >
> > > > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles
>  > >
> > > > wrote:
> > > >
> > > > > I really like this document. It is easy to read and informative.
> > Three
> > > > > things not addressed by the document:
> > > > >
> > > > > 1. Major Beam use cases. I'm sure we have a few in the SDK that
> could
> > > be
> > > > > outlined in terms of the new API with pseudocode.
> > > >
> > > >
> > > > (I am writing pseudocode directly with FileSystem interface to
> > > demonstrate.
> > > > However, clients will use the utility FileSystems. This is for us to
> > > have a
> > > > layer between the file systems providers' interface and the client
> > > > interface. We can add utility functions to FileSystems for common use
> > > > patterns as needed.)
> > > >
> > > > Major Beam use cases are the followings:
> > > > A. FileBasedSource:
> > > > // a. Get input URIs and file sizes from users provided specs.
> > > > // Note: I updated the match() to be a bulk operation after I sent my
> > > last
> > > > email.
> > > > List results = match(specList);
> > > > List inputMetadataList = FluentIterable.from(results)
> > > > .transformAndConcat(
> > > > new Function() {
> > > >   @Override
> > > >   public Iterable apply(MatchResult result) {
> > > > return Arrays.asList(result.metadata());
> > > >   });
> > > >
> > > > // b. Read from a start offset to support the source splitting.
> > > > SeekableByteChannel seekChannel = open(fileUri);
> > > > seekChannel.position(source.getStartOffset());
> > > > seekChannel.read(...);
> > > >
> > > > B. FileBasedSink:
> > > > // bulk rename temporary files 

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-13 Thread Kenneth Knowles
I don't think there is any conflict here.

On Tue, Dec 13, 2016 at 12:34 PM, Pei He  wrote:

> One design decision made during previous design discussion [1] is
> "Replacing FilePath with URI for resolving files paths". This has been
> brought back to dev@ mailing list in my previous email.
>

The direction of this argument, in my opinion, gets the burden of proof
wrong.

The original design document effectively proposed "instead of using URIs,
let's make a Beam-specific abstraction" and [1] is just the natural comment
"let's just use URI". This works for the internet, and gives interop with
essentially all code, so you need a very special reason not to do it (and
special cases generally manifest as custom URI schemes).

> Comment [2] asked me to clarify the impact on Windows OS users because
> users have to specify the path in the URI format, such as:
> "file:///C:/home/input-*"
> "C:/home/"
>

It is not really true that users have to do this. For the command line, it
is the responsibility of the code that parses "--filesToStage
C:\my\windows\path". Users should absolutely be able to specify paths like
this on Windows, and it is not difficult and nothing your proposal needs to
solve.

With programmatic creation in Java code, the same principle applies: the
environment-specific String/File/Path should be converted to a URI at the
membrane. Making an API take a URI makes it completely obvious to a Java
programmer that if they have a String/File/Path they need to convert it
appropriately.
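
To illustrate converting at the boundary, here is a minimal sketch using only the JDK. The class SpecToUriSketch and the method toUri are hypothetical names, not part of Beam or of the proposal, and the rule for recognising a scheme (two or more characters followed by "://") is an assumption chosen so that a Windows drive letter such as "C:" is not mistaken for one.

```java
import java.net.URI;
import java.nio.file.Paths;

final class SpecToUriSketch {
    /**
     * Converts a user-supplied spec to a URI. Strings that already carry a
     * multi-character scheme followed by "://" are parsed as URIs; anything
     * else is handed to the platform's default file system as a local path.
     */
    static URI toUri(String spec) {
        if (spec.matches("[A-Za-z][A-Za-z0-9+.-]+://.*")) {
            return URI.create(spec);
        }
        // Paths.get() applies the local path syntax, so its result (and whether
        // characters such as '*' are accepted at all) depends on the OS.
        return Paths.get(spec).toUri();
    }

    public static void main(String[] args) {
        System.out.println(toUri("gs://bucket/input-*"));
        System.out.println(toUri("data/input.txt"));
    }
}
```

With a helper of this shape, command-line parsing can accept a native path and the rest of the code sees only URIs. Glob characters in local paths are not handled by this sketch.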

Kenn


> Using URIs in the API is to ensure Beam code is file systems agnostic.
>
> Another alternative is Java Path/File. It is used in the current
> IOChannelFactory API, and it works poorly. For example, Path throws when
> there are file scheme or asterisk in the path:
> new File("file:///C:/home/").toPath() throws in toPath().
> Paths.get("C:/home/").resolve("output-*") throws in resolve().
>
> any thoughts and suggestions are welcome.
>
> Thanks
> --
> Pei
>
> ---
> [1]:
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-
> XJsVG3qel2lhdKTknmZ_7M/edit?disco=A30vtPU#heading=h.p3gc3colc2cs
>
> [2]:
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-
> XJsVG3qel2lhdKTknmZ_7M/edit?disco=A02O1cY
>
> On Tue, Dec 6, 2016 at 1:25 PM, Kenneth Knowles 
> wrote:
>
> > Thanks for the thorough answers. It all sounds good to me.
> >
> > On Tue, Dec 6, 2016 at 12:57 PM, Pei He 
> wrote:
> >
> > > Thanks Kenn for the feedback and questions.
> > >
> > > I responded inline.
> > >
> > > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles  >
> > > wrote:
> > >
> > > > I really like this document. It is easy to read and informative.
> Three
> > > > things not addressed by the document:
> > > >
> > > > 1. Major Beam use cases. I'm sure we have a few in the SDK that could
> > be
> > > > outlined in terms of the new API with pseudocode.
> > >
> > >
> > > (I am writing pseudocode directly with FileSystem interface to
> > demonstrate.
> > > However, clients will use the utility FileSystems. This is for us to
> > have a
> > > layer between the file systems providers' interface and the client
> > > interface. We can add utility functions to FileSystems for common use
> > > patterns as needed.)
> > >
> > > Major Beam use cases are the followings:
> > > A. FileBasedSource:
> > > // a. Get input URIs and file sizes from users provided specs.
> > > // Note: I updated the match() to be a bulk operation after I sent my
> > last
> > > email.
> > > List results = match(specList);
> > > List inputMetadataList = FluentIterable.from(results)
> > > .transformAndConcat(
> > > new Function() {
> > >   @Override
> > >   public Iterable apply(MatchResult result) {
> > > return Arrays.asList(result.metadata());
> > >   });
> > >
> > > // b. Read from a start offset to support the source splitting.
> > > SeekableByteChannel seekChannel = open(fileUri);
> > > seekChannel.position(source.getStartOffset());
> > > seekChannel.read(...);
> > >
> > > B. FileBasedSink:
> > > // bulk rename temporary files to output files
> > > rename(tempUris, outputUris);
> > >
> > > C. General file operations:
> > > a. resolve paths
> > > b. create file to write, open file to read (for example in tests).
> > > c. bulk delete files/directories
> > >
> > >
> > >
> > > 2. Related work. How does this differ from other filesystem APIs and
> why?
> > >
> > > We need three sets of functionalities:
> > > 1. resolve paths.
> > > 2. read and write channels.
> > > 3. bulk files management operations(bulk delete/rename/match).
> > >
> > > And, they are available from Java nio, hadoop FileSystem APIs, and
> other
> > > standard library such as java.net.URI.
> > >
> > > Current IOChannelFactory interface uses Java nio for (1) and (2), and
> > > define its own interface for (3).
> > >
> > > In my redesign, I made the following choices:
> > > For (1), I replaced Java nio with URI, because it is standardized and
> > > precise and does

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-13 Thread Amir Bahmanyari
How can I unsubscribe?
I will be away from this subject for some time 
Will rejoin once I get back
Thanks colleagues
Happy holidays 

Sent from my iPhone

> On Dec 13, 2016, at 12:34 PM, Pei He  wrote:
> 
> One design decision made during previous design discussion [1] is "Replacing
> FilePath with URI for resolving files paths". This has been brought back to
> dev@ mailing list in my previous email.
> 
> Comment [2] asked me to clarify the impact on Windows OS users because
> users have to specify the path in the URI format, such as:
> "file:///C:/home/input-*"
> "C:/home/"
> 
> Using URIs in the API is to ensure Beam code is file systems agnostic.
> 
> Another alternative is Java Path/File. It is used in the current
> IOChannelFactory API, and it works poorly. For example, Path throws when
> there are file scheme or asterisk in the path:
> new File("file:///C:/home/").toPath() throws in toPath().
> Paths.get("C:/home/").resolve("output-*") throws in resolve().
> 
> any thoughts and suggestions are welcome.
> 
> Thanks
> --
> Pei
> 
> ---
> [1]:
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit?disco=A30vtPU#heading=h.p3gc3colc2cs
> 
> [2]:
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit?disco=A02O1cY
> 
> On Tue, Dec 6, 2016 at 1:25 PM, Kenneth Knowles 
> wrote:
> 
>> Thanks for the thorough answers. It all sounds good to me.
>> 
>>> On Tue, Dec 6, 2016 at 12:57 PM, Pei He  wrote:
>>> 
>>> Thanks Kenn for the feedback and questions.
>>> 
>>> I responded inline.
>>> 
>>> On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles 
>>> wrote:
>>> 
>>>> I really like this document. It is easy to read and informative. Three
>>>> things not addressed by the document:
>>>>
>>>> 1. Major Beam use cases. I'm sure we have a few in the SDK that could be
>>>> outlined in terms of the new API with pseudocode.
>>> 
>>> 
>>> (I am writing pseudocode directly with FileSystem interface to
>> demonstrate.
>>> However, clients will use the utility FileSystems. This is for us to
>> have a
>>> layer between the file systems providers' interface and the client
>>> interface. We can add utility functions to FileSystems for common use
>>> patterns as needed.)
>>> 
>>> Major Beam use cases are the followings:
>>> A. FileBasedSource:
>>> // a. Get input URIs and file sizes from users provided specs.
>>> // Note: I updated the match() to be a bulk operation after I sent my
>> last
>>> email.
>>> List results = match(specList);
>>> List inputMetadataList = FluentIterable.from(results)
>>>.transformAndConcat(
>>>new Function() {
>>>  @Override
>>>  public Iterable apply(MatchResult result) {
>>>return Arrays.asList(result.metadata());
>>>  });
>>> 
>>> // b. Read from a start offset to support the source splitting.
>>> SeekableByteChannel seekChannel = open(fileUri);
>>> seekChannel.position(source.getStartOffset());
>>> seekChannel.read(...);
>>> 
>>> B. FileBasedSink:
>>> // bulk rename temporary files to output files
>>> rename(tempUris, outputUris);
>>> 
>>> C. General file operations:
>>> a. resolve paths
>>> b. create file to write, open file to read (for example in tests).
>>> c. bulk delete files/directories
>>> 
>>> 
>>> 
>>> 2. Related work. How does this differ from other filesystem APIs and why?
>>> 
>>> We need three sets of functionalities:
>>> 1. resolve paths.
>>> 2. read and write channels.
>>> 3. bulk files management operations(bulk delete/rename/match).
>>> 
>>> And, they are available from Java nio, hadoop FileSystem APIs, and other
>>> standard library such as java.net.URI.
>>> 
>>> Current IOChannelFactory interface uses Java nio for (1) and (2), and
>>> define its own interface for (3).
>>> 
>>> In my redesign, I made the following choices:
>>> For (1), I replaced Java nio with URI, because it is standardized and
>>> precise and doesn't require additional implementation of a Path interface
>>> from file system providers.
>>> 
>>> For (2), I kept the uses of Java nio (Writable/SeekableByteChannel),
>> since
>>> I don't see any things that need to improve and I don't see any better
>>> alternatives (hadoop's FSDataInput/OutputStream provide same
>>> functionalities, but requires additional dependencies).
>>> 
>>> For (3), reasons that I didn't choose Java nio or hadoop are:
>>> 1. Beam needs bulk operations API for better performance, however Java
>> nio
>>> and hadoop FileSystems are single file based API.
>>> 2. Have APIs that are File systems agnostic. For example, we can use URI
>>> instead of Path.
>>> 3. Have APIs that are minimum, and easy to implement by file system
>>> providers.
>>> 4. Introducing less dependencies.
>>> 5. It is easy to build an adaptor based on Java nio or hadoop interfaces.
>>> 
>>>> 3. Discussion of non-Java languages. It would be good to know what classes
>>>> in e.g. Python we might use in place of URI, SeekableByteChannel, etc.
>>> 
>>> I don't want to mi

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-13 Thread Pei He
One design decision made during previous design discussion [1] is "Replacing
FilePath with URI for resolving files paths". This has been brought back to
dev@ mailing list in my previous email.

Comment [2] asked me to clarify the impact on Windows OS users because
users have to specify the path in the URI format, such as:
"file:///C:/home/input-*"
"C:/home/"

Using URIs in the API is to ensure Beam code is file systems agnostic.

Another alternative is Java Path/File. It is used in the current
IOChannelFactory API, and it works poorly. For example, Path throws when
there is a file scheme or an asterisk in the path:
new File("file:///C:/home/").toPath() throws in toPath().
Paths.get("C:/home/").resolve("output-*") throws in resolve().
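
For comparison, resolving a glob against a directory with java.net.URI does not go through the local path syntax at all. The sketch below is an illustration only: the class UriResolveSketch and the bucket name are assumptions. As far as I can tell, the two Path failures above are specific to the Windows default file system, where ':' and '*' are rejected, which is exactly the kind of platform dependence a URI avoids.

```java
import java.net.URI;

final class UriResolveSketch {
    /** Resolves a child name (which may contain '*') against a directory URI. */
    static URI resolve(String directory, String child) {
        return URI.create(directory).resolve(child);
    }

    public static void main(String[] args) {
        // The trailing slash matters: without it the last segment is replaced.
        System.out.println(resolve("file:///C:/home/", "output-*").getPath());
        System.out.println(resolve("gs://bucket/dir/", "output-*"));
        System.out.println(resolve("gs://bucket/dir", "output-*"));
    }
}
```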

Any thoughts and suggestions are welcome.

Thanks
--
Pei

---
[1]:
https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit?disco=A30vtPU#heading=h.p3gc3colc2cs

[2]:
https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit?disco=A02O1cY

On Tue, Dec 6, 2016 at 1:25 PM, Kenneth Knowles 
wrote:

> Thanks for the thorough answers. It all sounds good to me.
>
> On Tue, Dec 6, 2016 at 12:57 PM, Pei He  wrote:
>
> > Thanks Kenn for the feedback and questions.
> >
> > I responded inline.
> >
> > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles 
> > wrote:
> >
> > > I really like this document. It is easy to read and informative. Three
> > > things not addressed by the document:
> > >
> > > 1. Major Beam use cases. I'm sure we have a few in the SDK that could
> be
> > > outlined in terms of the new API with pseudocode.
> >
> >
> > (I am writing pseudocode directly with FileSystem interface to
> demonstrate.
> > However, clients will use the utility FileSystems. This is for us to
> have a
> > layer between the file systems providers' interface and the client
> > interface. We can add utility functions to FileSystems for common use
> > patterns as needed.)
> >
> > Major Beam use cases are the followings:
> > A. FileBasedSource:
> > // a. Get input URIs and file sizes from users provided specs.
> > // Note: I updated the match() to be a bulk operation after I sent my
> last
> > email.
> > List results = match(specList);
> > List inputMetadataList = FluentIterable.from(results)
> > .transformAndConcat(
> > new Function() {
> >   @Override
> >   public Iterable apply(MatchResult result) {
> > return Arrays.asList(result.metadata());
> >   });
> >
> > // b. Read from a start offset to support the source splitting.
> > SeekableByteChannel seekChannel = open(fileUri);
> > seekChannel.position(source.getStartOffset());
> > seekChannel.read(...);
> >
> > B. FileBasedSink:
> > // bulk rename temporary files to output files
> > rename(tempUris, outputUris);
> >
> > C. General file operations:
> > a. resolve paths
> > b. create file to write, open file to read (for example in tests).
> > c. bulk delete files/directories
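
The "read from a start offset" step in (A.b) above can be sketched with the JDK alone. A local file opened through java.nio stands in for the open(fileUri) call of the proposal, which is not implemented here; OffsetReadSketch and readRange are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class OffsetReadSketch {
    /** Reads up to length bytes starting at startOffset; fewer if the file ends first. */
    static byte[] readRange(Path file, long startOffset, int length) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            channel.position(startOffset); // seek to where this split begins
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file is reached
            }
            return Arrays.copyOf(buffer.array(), buffer.position());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("offset-read", ".txt");
        try {
            Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
            byte[] bytes = readRange(file, 4, 3);
            System.out.println(new String(bytes, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(file);
        }
    }
}
```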
> >
> >
> >
> > 2. Related work. How does this differ from other filesystem APIs and why?
> >
> > We need three sets of functionalities:
> > 1. resolve paths.
> > 2. read and write channels.
> > 3. bulk files management operations(bulk delete/rename/match).
> >
> > And, they are available from Java nio, hadoop FileSystem APIs, and other
> > standard library such as java.net.URI.
> >
> > Current IOChannelFactory interface uses Java nio for (1) and (2), and
> > define its own interface for (3).
> >
> > In my redesign, I made the following choices:
> > For (1), I replaced Java nio with URI, because it is standardized and
> > precise and doesn't require additional implementation of a Path interface
> > from file system providers.
> >
> > For (2), I kept the uses of Java nio (Writable/SeekableByteChannel),
> since
> > I don't see any things that need to improve and I don't see any better
> > alternatives (hadoop's FSDataInput/OutputStream provide same
> > functionalities, but requires additional dependencies).
> >
> > For (3), reasons that I didn't choose Java nio or hadoop are:
> > 1. Beam needs bulk operations API for better performance, however Java
> nio
> > and hadoop FileSystems are single file based API.
> > 2. Have APIs that are File systems agnostic. For example, we can use URI
> > instead of Path.
> > 3. Have APIs that are minimum, and easy to implement by file system
> > providers.
> > 4. Introducing less dependencies.
> > 5. It is easy to build an adaptor based on Java nio or hadoop interfaces.
> >
> > 3. Discussion of non-Java languages. It would be good to know what
> classes
> > > in e.g. Python we might use in place of URI, SeekableByteChannel, etc.
> >
> > I don't want to mislead people here without a thorough investigation. You
> > can see from your second question, that would require iterations on
> design
> > and prototyping.
> >
> > I didn't introduce any Java specific requirements in the redesign.
> > Resolving paths, seeking with channels or streams, file m

Re: Beam Tuple

2016-12-13 Thread Kenneth Knowles
If the scope is really just tuples, then supposing a user chooses to go
with Apache Commons tuples or javatuples it seems that the problem to be
solved is easily providing coders for common data types that are not part
of Beam. I think we should address this anyhow.

The scope of having a common format is much more broad. Remember that a
coder is just a proxy for a well-defined binary format [1], so a solution
will fall somewhere in that arena. Even before encoding IDs, we had some
rudimentary support for tagging the most critical common formats [2] [3]
but it was too runner-specific and not a general solution.
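
To make the "a coder is a proxy for a well-defined binary format" point concrete, here is a minimal sketch that encodes a two-field tuple with only the JDK. It does not use Beam's Coder API: PairCodecSketch, its Pair class, and the chosen layout (DataOutputStream.writeUTF for the string, then a 4-byte big-endian int) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class PairCodecSketch {
    /** A tiny untagged two-element tuple. */
    static final class Pair {
        final String first;
        final int second;

        Pair(String first, int second) {
            this.first = first;
            this.second = second;
        }
    }

    /** Layout: writeUTF(first) followed by a 4-byte big-endian int. */
    static byte[] encode(Pair pair) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(pair.first);
            out.writeInt(pair.second);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    static Pair decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String first = in.readUTF();
            int second = in.readInt();
            return new Pair(first, second);
        }
    }

    public static void main(String[] args) throws IOException {
        Pair decoded = decode(encode(new Pair("ab", 7)));
        System.out.println(decoded.first + ", " + decoded.second);
    }
}
```

Any two programs that agree on this layout can exchange the tuple, whichever tuple library each of them uses, which is why the question of a common format is broader than the choice of a tuple class.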

Kenn

[1]
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L227
[2]
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/KvCoder.java#L129
[3]
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/IterableCoder.java#L73

On Dec 13, 2016 09:03, "Jean-Baptiste Onofré"  wrote:

> Hi Robert,
>
> Agree, however which one the user would use ? Create his own one ?
>
> Today, I think Beam is heavily flexible in term of data format (which is
> great), but the trade off is that the end-users have to write lot of
> boilerplate code (just to convert from one type to another).
>
> So, basically, the purpose of a Beam Tuple is to have something provided
> out of box: if the user wants to use another tuple, that's fine.
> Generally speaking, the discussion about data format extension is about to
> simplify the way for users to manipulate popular data formats.
>
> Regards
> JB
>
> On 12/13/2016 05:56 PM, Robert Bradshaw wrote:
>
>> The Java language isn't very amenable to Tuple APIs as there are several
>> (mutually exclusive?) tradeoffs that must be made, each with their pros
>> and
>> cons. What advantage is there of Beam providing its own tuple API vs.
>> letting users pick whatever tuple library they want and using that with
>> Beam?
>>
>> (I suppose we're already using and encouraging AutoValue which covers a
>> lot
>> of tuple cases.)
>>
>> On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <
>> apban...@cisco.com> wrote:
>>
>> We have created one. An untagged Tuple. Will be happy to contribute it to
>>> the community
>>>
>>> Aparup
>>>
>>> On Dec 13, 2016, at 5:11 AM, Amit  wrote:

 I'll add that I know of Beam's PTuple, but my question is about much
 simpler Tuples, untagged.

 On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré 
 wrote:

 Hi Amit,
>
> as discussed together, I think a Tuple abstraction would be good in the
> SDK (more than in the data format extension).
>
> Regards
> JB
>
> On 12/13/2016 11:06 AM, Amit Sela wrote:
>> Hi all,
>>
>> I was wondering why Beam doesn't have tuples as part of the SDK ?
>> To the best of my knowledge all currently supported (OSS) runners:
>>
> Spark,
>>>
 Flink, Apex provide a Tuple abstraction and I was wondering if Beam
>>
> should
>
>> too ?
>>
>> Consider KV for example; it is a special ("*keyed*" by the first
>> field)
>> implementation Tuple2.
>> While KV's importance is far more than being a Tuple2, I'm wondering
>> if
>>
> the
>
>> SDK would benefit from a proper TupleX support ?
>>
>> Thanks,
>> Amit
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Beam Tuple

2016-12-13 Thread Stephen Sisk
I don't have enough info to comment on whether Tuples are the right answer
- but the user problem here is real.

There's a fundamental question I had as a new Beam user which was "how do I
get my data from one ParDo to the next?" This is *really key* - without it,
doing basic pipelines is not possible, so there should hopefully be
something very simple for users. This is also an area where advanced users
(aka, people reading this list) have a lot of knowledge they can use to
decide the exact correct solution to their problem, but beginning users
learning Beam just want to know how to do this seemingly simple task - if the
answer is "here, read lots of documentation about coders", we're giving users
an intimidating first experience that will likely block their first pipeline.

Having *something* that's a simple answer would be helpful. What I've seen
from the docs doesn't seem to make it clear. The Beam docs don't talk about
it at all (yet!), and the old Dataflow docs, from what I can see, force the
user to make several jumps of understanding and read docs in different
areas.

For AutoValue - do we have clear guidance/code labs/examples showing users
how to use AutoValue and what coder to use with AutoValue? There's a real
trade-off there since it involves users learning several concepts vs
Tuples, which it sounds like most folks trying to do data processing would
be familiar with from other tools.
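
For readers weighing the two options, this is roughly the shape of a named value class, written out by hand. It is a sketch of the kind of boilerplate (constructor, accessors, equals, hashCode, toString) that AutoValue is designed to generate, not AutoValue output, and the class WordCount and its fields are made up for illustration. Which coder to pair with such a class is a separate question that this sketch does not answer.

```java
import java.util.Objects;

final class WordCount {
    private final String word;
    private final long count;

    WordCount(String word, long count) {
        this.word = Objects.requireNonNull(word, "word");
        this.count = count;
    }

    String word() {
        return word;
    }

    long count() {
        return count;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof WordCount)) {
            return false;
        }
        WordCount that = (WordCount) other;
        return word.equals(that.word) && count == that.count;
    }

    @Override
    public int hashCode() {
        return Objects.hash(word, count);
    }

    @Override
    public String toString() {
        return "WordCount{word=" + word + ", count=" + count + "}";
    }

    public static void main(String[] args) {
        System.out.println(new WordCount("beam", 3));
    }
}
```

A generic tuple would replace word() and count() with positional accessors: less code to write, but the field names are lost at every use site.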

Like I said - I'm not speaking up for or against Tuples, but Beam should
have an answer. If we did have a built-in Tuple, I would think it would be
good for it to have a robust coder already in the coder registry.

Robert - can you speak to what exactly the Tuple tradeoffs are, and why it
wouldn't be appropriate for Beam to at least push users towards one? I'd
like to hear more about that.

S

On Tue, Dec 13, 2016 at 10:03 AM Robert Bradshaw
 wrote:

> On Tue, Dec 13, 2016 at 9:02 AM, Jean-Baptiste Onofré 
> wrote:
> > Hi Robert,
> >
> > Agree, however which one the user would use ? Create his own one ?
>
> Whichever suits their needs best, which could include his or her own.
>
> > Today, I think Beam is heavily flexible in term of data format (which is
> > great), but the trade off is that the end-users have to write lot of
> > boilerplate code (just to convert from one type to another).
> >
> > So, basically, the purpose of a Beam Tuple is to have something provided
> out
> > of box: if the user wants to use another tuple, that's fine.
> > Generally speaking, the discussion about data format extension is about
> to
> > simplify the way for users to manipulate popular data formats.
>
> If I understand correctly, the proposal is to pick (or write) a Tuple
> API and bless it by shipping it with the SDK along with beam-specific
> helper code. I'd be helpful to see concretely how large of a savings
> this would be to a user, and whether that's worth the cost.
>
> > On 12/13/2016 05:56 PM, Robert Bradshaw wrote:
> >>
> >> The Java language isn't very amenable to Tuple APIs as there are several
> >> (mutually exclusive?) tradeoffs that must be made, each with their pros
> >> and
> >> cons. What advantage is there of Beam providing its own tuple API vs.
> >> letting users pick whatever tuple library they want and using that with
> >> Beam?
> >>
> >> (I suppose we're already using and encouraging AutoValue which covers a
> >> lot
> >> of tuple cases.)
> >>
> >> On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <
> >> apban...@cisco.com> wrote:
> >>
> >>> We have created one. An untagged Tuple. Will be happy to contribute it
> to
> >>> the community
> >>>
> >>> Aparup
> >>>
>  On Dec 13, 2016, at 5:11 AM, Amit  wrote:
> 
>  I'll add that I know of Beam's PTuple, but my question is about much
>  simpler Tuples, untagged.
> 
>  On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré  >
>  wrote:
> 
> > Hi Amit,
> >
> > as discussed together, I think a Tuple abstraction would be good in
> the
> > SDK (more than in the data format extension).
> >
> > Regards
> > JB
> >
> >> On 12/13/2016 11:06 AM, Amit Sela wrote:
> >> Hi all,
> >>
> >> I was wondering why Beam doesn't have tuples as part of the SDK ?
> >> To the best of my knowledge all currently supported (OSS) runners:
> >>>
> >>> Spark,
> >>
> >> Flink, Apex provide a Tuple abstraction and I was wondering if Beam
> >
> > should
> >>
> >> too ?
> >>
> >> Consider KV for example; it is a special ("*keyed*" by the first
> >> field)
> >> implementation Tuple2.
> >> While KV's importance is far more than being a Tuple2, I'm wondering
> >> if
> >
> > the
> >>
> >> SDK would benefit from a proper TupleX support ?
> >>
> >> Thanks,
> >> Amit
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http

Re: Beam Tuple

2016-12-13 Thread Robert Bradshaw
On Tue, Dec 13, 2016 at 9:02 AM, Jean-Baptiste Onofré  wrote:
> Hi Robert,
>
> Agree, however which one the user would use ? Create his own one ?

Whichever suits their needs best, which could include his or her own.

> Today, I think Beam is heavily flexible in term of data format (which is
> great), but the trade off is that the end-users have to write lot of
> boilerplate code (just to convert from one type to another).
>
> So, basically, the purpose of a Beam Tuple is to have something provided out
> of box: if the user wants to use another tuple, that's fine.
> Generally speaking, the discussion about data format extension is about to
> simplify the way for users to manipulate popular data formats.

If I understand correctly, the proposal is to pick (or write) a Tuple
API and bless it by shipping it with the SDK along with beam-specific
helper code. It'd be helpful to see concretely how large of a savings
this would be to a user, and whether that's worth the cost.

> On 12/13/2016 05:56 PM, Robert Bradshaw wrote:
>>
>> The Java language isn't very amenable to Tuple APIs as there are several
>> (mutually exclusive?) tradeoffs that must be made, each with their pros
>> and
>> cons. What advantage is there of Beam providing its own tuple API vs.
>> letting users pick whatever tuple library they want and using that with
>> Beam?
>>
>> (I suppose we're already using and encouraging AutoValue which covers a
>> lot
>> of tuple cases.)
>>
>> On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <
>> apban...@cisco.com> wrote:
>>
>>> We have created one. An untagged Tuple. Will be happy to contribute it to
>>> the community
>>>
>>> Aparup
>>>
 On Dec 13, 2016, at 5:11 AM, Amit  wrote:

 I'll add that I know of Beam's PTuple, but my question is about much
 simpler Tuples, untagged.

 On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré 
 wrote:

> Hi Amit,
>
> as discussed together, I think a Tuple abstraction would be good in the
> SDK (more than in the data format extension).
>
> Regards
> JB
>
>> On 12/13/2016 11:06 AM, Amit Sela wrote:
>> Hi all,
>>
>> I was wondering why Beam doesn't have tuples as part of the SDK ?
>> To the best of my knowledge all currently supported (OSS) runners:
>>>
>>> Spark,
>>
>> Flink, Apex provide a Tuple abstraction and I was wondering if Beam
>
> should
>>
>> too ?
>>
>> Consider KV for example; it is a special ("*keyed*" by the first
>> field)
>> implementation Tuple2.
>> While KV's importance is far more than being a Tuple2, I'm wondering
>> if
>
> the
>>
>> SDK would benefit from a proper TupleX support ?
>>
>> Thanks,
>> Amit
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Davor Bonaci
>
> I wanted to suggest if we can have sort of a window or timeline for
> feature/bug code freeze prior to release to ensure stability?
>

Release branches are a typical solution; I think we just need to get better
at using them appropriately.


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Neelesh Salian
+1 to Davor and JB's point.
The incubator stuff can certainly be done thoroughly after the board's
decision (if in favor).
Graduation aside, keeping up the release cadence would be better for
the product life cycle.

Perhaps this belongs in a different thread altogether, but I wanted to ask
whether we can have some sort of window or timeline for a feature/bug code
freeze prior to release to ensure stability?


On Tue, Dec 13, 2016 at 9:19 AM, Davor Bonaci 
wrote:

> I'd suggest to proceed with 0.4.0-incubating (as JB previously planned).
>
> My reasoning: I don't think we'll be able to release a non-incubating
> release next week, regardless of the Board's graduation decision. I think
> it will take a while (more details to follow on a separate thread). On the
> other hand, 0.3.0-incubating has some important issues (e.g., template
> projects don't work across runners, WordCount has issues on Windows OS). I
> think it makes sense to fix these issues for our users, and have a better
> product if/when the graduation announcement comes.
>
> On Tue, Dec 13, 2016 at 9:05 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi,
> >
> > Either way is fine for me too.
> >
> > We discussed about the release schedule independently from the graduation
> > process, that's why 0.4.0-incubator was planned around today.
> >
> > Regards
> > JB
> >
> >
> > On 12/13/2016 06:02 PM, Daniel Kulp wrote:
> >
> >> Hate to suggest this….
> >>
> >> Assuming the Board OK’s the graduation next Wednesday, if we wait till
> >> then to do the build, we can drop the incubator stuff entirely and
> it
> >> could be a “first release” outside of incubation.   We could avoid the
> >> extra vote on the incubator list, etc….
> >>
> >> Would it make sense to delay the week?   Not a big deal either way, but
> I
> >> don’t think I’ve ever seen a project do a release between the graduation
> >> vote and the board vote.   Every project I’ve seen decided to wait to
> have
> >> the “we’ve graduated!” release.
> >>
> >> Dan
> >>
> >>
> >>
> >> On Dec 13, 2016, at 9:43 AM, Dan Halperin 
> >>> wrote:
> >>>
> >>> Update: we think we've knocked off all the 0.4.0-incubating blockers,
> >>> including postponing some. JB is going to start the release process
> soon!
> >>>
> >>> On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré  >
> >>> wrote:
> >>>
> >>> Very good point Frances.
> 
>  Definitely something we have to do.
> 
>  Regards
>  JB
> 
> 
>  On 12/04/2016 07:38 AM, Frances Perry wrote:
> 
>  Sounds great, JB!
> >
> > The major blocker in my opinion is to finish the polishing pass on
> the
> > quickstarts and example archetypes, so that users will have a great
> > experience trying out 0.4.0-incubating. I know we've made some
> > significant
> > progress there in the last few weeks, but I don't think we've quite
> > finished. For example, https://issues.apache.org/jira/browse/BEAM-909
> > is
> > unresolved and marked as 0.4.0-incubating.
> >
> > On Sat, Dec 3, 2016 at 10:26 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net
> > >
> > wrote:
> >
> > Hi beamers,
> >
> >>
> >> We plan a 0.4.0-incubating release pretty soon. I propose to manage
> >> this
> >> release.
> >>
> >> I started to review the Jira with fix version set to
> 0.4.0-incubating.
> >>
> >> Please, update the fix version in Jira if you are working on
> specific
> >> Jira
> >> and you want to include in the 0.4.0-incubating release.
> >>
> >> Thanks
> >> Regards
> >> JB
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >>
> >>
> > --
>  Jean-Baptiste Onofré
>  jbono...@apache.org
>  http://blog.nanthrax.net
>  Talend - http://www.talend.com
> 
> 
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>



-- 
Neelesh Srinivas Salian
Customer Operations Engineer


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Davor Bonaci
I'd suggest proceeding with 0.4.0-incubating (as JB previously planned).

My reasoning: I don't think we'll be able to release a non-incubating
release next week, regardless of the Board's graduation decision. I think
it will take a while (more details to follow on a separate thread). On the
other hand, 0.3.0-incubating has some important issues (e.g., template
projects don't work across runners, WordCount has issues on Windows OS). I
think it makes sense to fix these issues for our users, and have a better
product if/when the graduation announcement comes.

On Tue, Dec 13, 2016 at 9:05 AM, Jean-Baptiste Onofré 
wrote:

> Hi,
>
> Either way is fine for me too.
>
> We discussed about the release schedule independently from the graduation
> process, that's why 0.4.0-incubator was planned around today.
>
> Regards
> JB
>
>
> On 12/13/2016 06:02 PM, Daniel Kulp wrote:
>
>> Hate to suggest this….
>>
>> Assuming the Board OK’s the graduation next Wednesday, if we wait till
>> then to do the build, we can drop the incubator stuff entirely and it
>> could be a “first release” outside of incubation.   We could avoid the
>> extra vote on the incubator list, etc….
>>
>> Would it make sense to delay the week?   Not a big deal either way, but I
>> don’t think I’ve ever seen a project do a release between the graduation
>> vote and the board vote.   Every project I’ve seen decided to wait to have
>> the “we’ve graduated!” release.
>>
>> Dan
>>
>>
>>
>> On Dec 13, 2016, at 9:43 AM, Dan Halperin 
>>> wrote:
>>>
>>> Update: we think we've knocked off all the 0.4.0-incubating blockers,
>>> including postponing some. JB is going to start the release process soon!
>>>
>>> On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> Very good point Frances.

 Definitely something we have to do.

 Regards
 JB


 On 12/04/2016 07:38 AM, Frances Perry wrote:

 Sounds great, JB!
>
> The major blocker in my opinion is to finish the polishing pass on the
> quickstarts and example archetypes, so that users will have a great
> experience trying out 0.4.0-incubating. I know we've made some
> significant
> progress there in the last few weeks, but I don't think we've quite
> finished. For example, https://issues.apache.org/jira/browse/BEAM-909
> is
> unresolved and marked as 0.4.0-incubating.
>
> On Sat, Dec 3, 2016 at 10:26 PM, Jean-Baptiste Onofré  >
> wrote:
>
> Hi beamers,
>
>>
>> We plan a 0.4.0-incubating release pretty soon. I propose to manage
>> this
>> release.
>>
>> I started to review the Jira with fix version set to 0.4.0-incubating.
>>
>> Please, update the fix version in Jira if you are working on specific
>> Jira
>> and you want to include in the 0.4.0-incubating release.
>>
>> Thanks
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
> --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com


>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Jean-Baptiste Onofré

Hi,

Either way is fine for me too.

We discussed the release schedule independently of the
graduation process; that's why 0.4.0-incubating was planned around today.


Regards
JB

On 12/13/2016 06:02 PM, Daniel Kulp wrote:

Hate to suggest this….

Assuming the Board OK’s the graduation next Wednesday, if we wait till then to 
do the build, we can drop the incubator stuff entirely and it could be a 
“first release” outside of incubation.   We could avoid the extra vote on the 
incubator list, etc….

Would it make sense to delay the week?   Not a big deal either way, but I don’t 
think I’ve ever seen a project do a release between the graduation vote and the 
board vote.   Every project I’ve seen decided to wait to have the “we’ve 
graduated!” release.

Dan




On Dec 13, 2016, at 9:43 AM, Dan Halperin  wrote:

Update: we think we've knocked off all the 0.4.0-incubating blockers,
including postponing some. JB is going to start the release process soon!

On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré 
wrote:


Very good point Frances.

Definitely something we have to do.

Regards
JB


On 12/04/2016 07:38 AM, Frances Perry wrote:


Sounds great, JB!

The major blocker in my opinion is to finish the polishing pass on the
quickstarts and example archetypes, so that users will have a great
experience trying out 0.4.0-incubating. I know we've made some significant
progress there in the last few weeks, but I don't think we've quite
finished. For example, https://issues.apache.org/jira/browse/BEAM-909 is
unresolved and marked as 0.4.0-incubating.

On Sat, Dec 3, 2016 at 10:26 PM, Jean-Baptiste Onofré 
wrote:

Hi beamers,


We plan a 0.4.0-incubating release pretty soon. I propose to manage this
release.

I started to review the Jira with fix version set to 0.4.0-incubating.

Please, update the fix version in Jira if you are working on specific
Jira
and you want to include in the 0.4.0-incubating release.

Thanks
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam Tuple

2016-12-13 Thread Robert Bradshaw
The Java language isn't very amenable to Tuple APIs as there are several
(mutually exclusive?) tradeoffs that must be made, each with their pros and
cons. What advantage is there of Beam providing its own tuple API vs.
letting users pick whatever tuple library they want and using that with
Beam?

(I suppose we're already using and encouraging AutoValue which covers a lot
of tuple cases.)

On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <
apban...@cisco.com> wrote:

> We have created one. An untagged Tuple. Will be happy to contribute it to
> the community
>
> Aparup
>
> > On Dec 13, 2016, at 5:11 AM, Amit  wrote:
> >
> > I'll add that I know of Beam's PTuple, but my question is about much
> > simpler Tuples, untagged.
> >
> > On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré 
> > wrote:
> >
> >> Hi Amit,
> >>
> >> as discussed together, I think a Tuple abstraction would be good in the
> >> SDK (more than in the data format extension).
> >>
> >> Regards
> >> JB
> >>
> >>> On 12/13/2016 11:06 AM, Amit Sela wrote:
> >>> Hi all,
> >>>
> >>> I was wondering why Beam doesn't have tuples as part of the SDK ?
> >>> To the best of my knowledge all currently supported (OSS) runners:
> Spark,
> >>> Flink, Apex provide a Tuple abstraction and I was wondering if Beam
> >> should
> >>> too ?
> >>>
> >>> Consider KV for example; it is a special ("*keyed*" by the first field)
> >>> implementation Tuple2.
> >>> While KV's importance is far more than being a Tuple2, I'm wondering if
> >> the
> >>> SDK would benefit from a proper TupleX support ?
> >>>
> >>> Thanks,
> >>> Amit
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
>


Re: Beam Tuple

2016-12-13 Thread Jean-Baptiste Onofré

Hi Robert,

Agreed; however, which one would the user use? Create their own?

Today, I think Beam is very flexible in terms of data formats (which is
great), but the trade-off is that end users have to write a lot of
boilerplate code (just to convert from one type to another).


So, basically, the purpose of a Beam Tuple is to have something provided
out of the box: if the user wants to use another tuple, that's fine.
Generally speaking, the discussion about the data format extension is about
simplifying the way users manipulate popular data formats.


Regards
JB

On 12/13/2016 05:56 PM, Robert Bradshaw wrote:

The Java language isn't very amenable to Tuple APIs as there are several
(mutually exclusive?) tradeoffs that must be made, each with their pros and
cons. What advantage is there of Beam providing its own tuple API vs.
letting users pick whatever tuple library they want and using that with
Beam?

(I suppose we're already using and encouraging AutoValue which covers a lot
of tuple cases.)

On Tue, Dec 13, 2016 at 8:20 AM, Aparup Banerjee (apbanerj) <
apban...@cisco.com> wrote:


We have created one. An untagged Tuple. Will be happy to contribute it to
the community

Aparup


On Dec 13, 2016, at 5:11 AM, Amit  wrote:

I'll add that I know of Beam's PTuple, but my question is about much
simpler Tuples, untagged.

On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré 
wrote:


Hi Amit,

as discussed together, I think a Tuple abstraction would be good in the
SDK (more than in the data format extension).

Regards
JB


On 12/13/2016 11:06 AM, Amit Sela wrote:
Hi all,

I was wondering why Beam doesn't have tuples as part of the SDK ?
To the best of my knowledge all currently supported (OSS) runners:

Spark,

Flink, Apex provide a Tuple abstraction and I was wondering if Beam

should

too ?

Consider KV for example; it is a special ("*keyed*" by the first field)
implementation Tuple2.
While KV's importance is far more than being a Tuple2, I'm wondering if

the

SDK would benefit from a proper TupleX support ?

Thanks,
Amit



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com







--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Daniel Kulp
Hate to suggest this….

Assuming the Board OK’s the graduation next Wednesday, if we wait till then to 
do the build, we can drop the incubator stuff entirely and it could be a 
“first release” outside of incubation.   We could avoid the extra vote on the 
incubator list, etc….

Would it make sense to delay the week?   Not a big deal either way, but I don’t 
think I’ve ever seen a project do a release between the graduation vote and the 
board vote.   Every project I’ve seen decided to wait to have the “we’ve 
graduated!” release.

Dan



> On Dec 13, 2016, at 9:43 AM, Dan Halperin  wrote:
> 
> Update: we think we've knocked off all the 0.4.0-incubating blockers,
> including postponing some. JB is going to start the release process soon!
> 
> On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré 
> wrote:
> 
>> Very good point Frances.
>> 
>> Definitely something we have to do.
>> 
>> Regards
>> JB
>> 
>> 
>> On 12/04/2016 07:38 AM, Frances Perry wrote:
>> 
>>> Sounds great, JB!
>>> 
>>> The major blocker in my opinion is to finish the polishing pass on the
>>> quickstarts and example archetypes, so that users will have a great
>>> experience trying out 0.4.0-incubating. I know we've made some significant
>>> progress there in the last few weeks, but I don't think we've quite
>>> finished. For example, https://issues.apache.org/jira/browse/BEAM-909 is
>>> unresolved and marked as 0.4.0-incubating.
>>> 
>>> On Sat, Dec 3, 2016 at 10:26 PM, Jean-Baptiste Onofré 
>>> wrote:
>>> 
>>> Hi beamers,
 
 We plan a 0.4.0-incubating release pretty soon. I propose to manage this
 release.
 
 I started to review the Jira with fix version set to 0.4.0-incubating.
 
 Please, update the fix version in Jira if you are working on specific
 Jira
 and you want to include in the 0.4.0-incubating release.
 
 Thanks
 Regards
 JB
 --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com
 
 
>>> 
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>> 

-- 
Daniel Kulp
dk...@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com



Re: Beam Tuple

2016-12-13 Thread Aparup Banerjee (apbanerj)
We have created one. An untagged Tuple. Will be happy to contribute it to the 
community

Aparup

> On Dec 13, 2016, at 5:11 AM, Amit  wrote:
> 
> I'll add that I know of Beam's PTuple, but my question is about much
> simpler Tuples, untagged.
> 
> On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré 
> wrote:
> 
>> Hi Amit,
>> 
>> as discussed together, I think a Tuple abstraction would be good in the
>> SDK (more than in the data format extension).
>> 
>> Regards
>> JB
>> 
>>> On 12/13/2016 11:06 AM, Amit Sela wrote:
>>> Hi all,
>>> 
>>> I was wondering why Beam doesn't have tuples as part of the SDK ?
>>> To the best of my knowledge all currently supported (OSS) runners: Spark,
>>> Flink, Apex provide a Tuple abstraction and I was wondering if Beam
>> should
>>> too ?
>>> 
>>> Consider KV for example; it is a special ("*keyed*" by the first field)
>>> implementation Tuple2.
>>> While KV's importance is far more than being a Tuple2, I'm wondering if
>> the
>>> SDK would benefit from a proper TupleX support ?
>>> 
>>> Thanks,
>>> Amit
>>> 
>> 
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>> 


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Dan Halperin
Update: we think we've knocked off all the 0.4.0-incubating blockers,
including postponing some. JB is going to start the release process soon!

On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré 
wrote:

> Very good point Frances.
>
> Definitely something we have to do.
>
> Regards
> JB
>
>
> On 12/04/2016 07:38 AM, Frances Perry wrote:
>
>> Sounds great, JB!
>>
>> The major blocker in my opinion is to finish the polishing pass on the
>> quickstarts and example archetypes, so that users will have a great
>> experience trying out 0.4.0-incubating. I know we've made some significant
>> progress there in the last few weeks, but I don't think we've quite
>> finished. For example, https://issues.apache.org/jira/browse/BEAM-909 is
>> unresolved and marked as 0.4.0-incubating.
>>
>> On Sat, Dec 3, 2016 at 10:26 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>> Hi beamers,
>>>
>>> We plan a 0.4.0-incubating release pretty soon. I propose to manage this
>>> release.
>>>
>>> I started to review the Jira with fix version set to 0.4.0-incubating.
>>>
>>> Please, update the fix version in Jira if you are working on specific
>>> Jira
>>> and you want to include in the 0.4.0-incubating release.
>>>
>>> Thanks
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-13 Thread Dan Halperin
status.apache.org is an excellent resource:
http://status.apache.org/#wplaceholder_86

It does indeed look like Jenkins has been having issues for nearly a week.

On Mon, Dec 12, 2016 at 6:32 PM, Kenneth Knowles 
wrote:

> Great. That means the timestamp change I made for Travis, ported to
> Jenkins, should reveal more.
>
> Meanwhile - any known issues with Jenkins or Maven Central? Status
> dashboard for Maven Central doesn't look unhappy.
>
> On Mon, Dec 12, 2016 at 6:25 PM, Dan Halperin  >
> wrote:
>
> > From the "bad run", the Maven part took 35 minutes and presumably the
> rest
> > is Jenkins / Maven / downloading overhead.
> >
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Reactor Summary:
> > [INFO]
> > [INFO] Apache Beam :: Parent ................................................ SUCCESS [ 37.558 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Build Tools ........................... SUCCESS [  8.172 s]
> > [INFO] Apache Beam :: SDKs .................................................. SUCCESS [ 11.521 s]
> > [INFO] Apache Beam :: SDKs :: Java .......................................... SUCCESS [ 10.801 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Core .................................. SUCCESS [04:04 min]
> > [INFO] Apache Beam :: Runners ............................................... SUCCESS [ 14.945 s]
> > [INFO] Apache Beam :: Runners :: Core Java .................................. SUCCESS [ 44.654 s]
> > [INFO] Apache Beam :: Runners :: Direct Java ................................ SUCCESS [02:06 min]
> > [INFO] Apache Beam :: Runners :: Google Cloud Dataflow ...................... SUCCESS [ 33.245 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO .................................... SUCCESS [  4.047 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: Google Cloud Platform ........... SUCCESS [04:32 min]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: HDFS ............................ SUCCESS [ 32.009 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: JMS ............................. SUCCESS [ 19.006 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: Kafka ........................... SUCCESS [ 22.021 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: Kinesis ......................... SUCCESS [ 22.817 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: MongoDB ......................... SUCCESS [ 27.276 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: JDBC ............................ SUCCESS [ 23.662 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes ...................... SUCCESS [  3.115 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter ........... SUCCESS [ 17.818 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Examples .......... SUCCESS [ 15.111 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Examples - Java 8 . SUCCESS [ 24.477 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Extensions ............................ SUCCESS [  5.759 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Extensions :: Join library ............ SUCCESS [ 22.293 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Extensions :: Sorter .................. SUCCESS [ 31.145 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Java 8 Tests .......................... SUCCESS [  8.033 s]
> > [INFO] Apache Beam :: Runners :: Flink ...................................... SUCCESS [  6.503 s]
> > [INFO] Apache Beam :: Runners :: Flink :: Core .............................. SUCCESS [ 43.593 s]
> > [INFO] Apache Beam :: Runners :: Flink :: Examples .......................... SUCCESS [ 20.006 s]
> > [INFO] Apache Beam :: Runners :: Spark ...................................... SUCCESS [03:10 min]
> > [INFO] Apache Beam :: Runners :: Apex ....................................... SUCCESS [01:03 min]
> > [INFO] Apache Beam :: Examples .............................................. SUCCESS [  8.124 s]
> > [INFO] Apache Beam :: Examples :: Java ...................................... SUCCESS [05:29 min]
> > [INFO] Apache Beam :: Examples :: Java 8 .................................... SUCCESS [ 19.005 s]
> > [INFO] ------------------------------------------------------------------------
> > [INFO] BUILD SUCCESS
> > [INFO] ------------------------------------------------------------------------
> > [INFO] Total time: 34:30 min
> > [INFO] Finished at: 2016-12-09T18:50:49+00:00
> > [INFO] Final Memory: 196M/1051M
> > [INFO] ------------------------------------------------------------------------
> >
> >
> >
> > On Mon, Dec 12, 2016 at 5:36 PM, Kenneth Knowles  >
> > wrote:
> >
> > > Hi all,
> > >
> > > We have a huge Jenkins backlog, surely exacerbated by the fact that our
> > > test time (precommit and postcommit mvn install) has roughly doubled in
> > the
> > > last few days.
> > >
> > > Here's the quick link to the trend:
> > > https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/buildTimeTrend
> > >
> > > Good 33m build at 2016-12-09 02:42:
> > > https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/2041/
> > >
> > > Bad 59m build at 2016-12-09 18:00 (triggered by timer):
> > > https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/2048/
> > >
> > > There are a couple middling runs in between that I can't place
> > immediately
> > > into either bucket. I'm still l

Re: Beam Tuple

2016-12-13 Thread Amit Sela
I'll add that I know of Beam's PTuple, but my question is about much
simpler Tuples, untagged.

On Tue, Dec 13, 2016 at 1:56 PM Jean-Baptiste Onofré 
wrote:

> Hi Amit,
>
> as discussed together, I think a Tuple abstraction would be good in the
> SDK (more than in the data format extension).
>
> Regards
> JB
>
> On 12/13/2016 11:06 AM, Amit Sela wrote:
> > Hi all,
> >
> > I was wondering why Beam doesn't have tuples as part of the SDK ?
> > To the best of my knowledge all currently supported (OSS) runners: Spark,
> > Flink, Apex provide a Tuple abstraction and I was wondering if Beam
> should
> > too ?
> >
> > Consider KV for example; it is a special ("*keyed*" by the first field)
> > implementation Tuple2.
> > While KV's importance is far more than being a Tuple2, I'm wondering if
> the
> > SDK would benefit from a proper TupleX support ?
> >
> > Thanks,
> > Amit
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Beam Tuple

2016-12-13 Thread Jean-Baptiste Onofré

Hi Amit,

as discussed together, I think a Tuple abstraction would be good in the 
SDK (more than in the data format extension).


Regards
JB

On 12/13/2016 11:06 AM, Amit Sela wrote:

Hi all,

I was wondering why Beam doesn't have tuples as part of the SDK ?
To the best of my knowledge all currently supported (OSS) runners: Spark,
Flink, Apex provide a Tuple abstraction and I was wondering if Beam should
too ?

Consider KV for example; it is a special ("*keyed*" by the first field)
implementation Tuple2.
While KV's importance is far more than being a Tuple2, I'm wondering if the
SDK would benefit from a proper TupleX support ?

Thanks,
Amit



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Beam Tuple

2016-12-13 Thread Amit Sela
Hi all,

I was wondering why Beam doesn't have tuples as part of the SDK ?
To the best of my knowledge all currently supported (OSS) runners: Spark,
Flink, Apex provide a Tuple abstraction and I was wondering if Beam should
too ?

Consider KV for example; it is a special ("*keyed*" by the first field)
implementation Tuple2.
While KV's importance is far more than being a Tuple2, I'm wondering if the
SDK would benefit from a proper TupleX support ?

Thanks,
Amit
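
For readers following the thread, here is a minimal sketch of what a simple
untagged pair might look like in plain Java. The class name Tuple2 and the
accessors first() and second() are hypothetical and are not part of the Beam
SDK. The comparison with KV in the comments assumes only that KV is built with
KV.of(key, value) and read with getKey() and getValue(). A type like this
would presumably also need a Coder before it could flow through a pipeline,
which the sketch leaves out.

```java
import java.util.Objects;

/** Hypothetical untagged 2-tuple for illustration; not a Beam SDK class. */
final class Tuple2<A, B> {
  private final A first;
  private final B second;

  private Tuple2(A first, B second) {
    this.first = first;
    this.second = second;
  }

  /** Static factory, mirroring the style of KV.of(key, value). */
  static <A, B> Tuple2<A, B> of(A first, B second) {
    return new Tuple2<>(first, second);
  }

  /** Positional accessors; unlike KV, neither field is treated as a key. */
  A first() {
    return first;
  }

  B second() {
    return second;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Tuple2)) {
      return false;
    }
    Tuple2<?, ?> other = (Tuple2<?, ?>) o;
    return Objects.equals(first, other.first) && Objects.equals(second, other.second);
  }

  @Override
  public int hashCode() {
    return Objects.hash(first, second);
  }

  @Override
  public String toString() {
    return "(" + first + ", " + second + ")";
  }

  public static void main(String[] args) {
    Tuple2<String, Integer> t = Tuple2.of("beam", 4);
    System.out.println(t.first() + " / " + t.second() + " / " + t);
  }
}
```

Value-based equals and hashCode are the part a user would otherwise write by
hand for every arity, which is roughly the boilerplate in question.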


Re: PCollection to PCollection Conversion

2016-12-13 Thread Amit Sela
It seems that there were a lot of good points raised here, and I tend to
agree that something as trivial and lean as "ToString" should be a part of
core.
I'm particularly fond of makeString(prefix, toString, suffix) in various
combinations (Scala-like).
For "fromString", I think JB has a good point leveraging JAXB and Jackson -
though I think this should be in extensions as it is not as lean as
toString.

Thanks,
Amit
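
As a rough illustration of the makeString idea, here is a sketch in plain
Java, outside of any Beam transform. The helper names makeString and
joinAsJsonArray are hypothetical, and reading makeString as a per-element
prefix + toString() + suffix is an assumption about what was meant. The second
helper touches on the JSON concern quoted further down this thread: joining
with a separator puts commas only between elements, so there is no trailing
comma to remove.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helpers for illustration; not part of the Beam SDK. */
final class MakeStringSketch {

  /** Per-element conversion: prefix + element.toString() + suffix. */
  static <T> String makeString(String prefix, T element, String suffix) {
    return prefix + String.valueOf(element) + suffix;
  }

  /**
   * Joins already-serialized JSON objects into a single array. The separator
   * is placed only between elements, and the brackets are added once.
   */
  static String joinAsJsonArray(List<String> jsonObjects) {
    return jsonObjects.stream().collect(Collectors.joining(",", "[", "]"));
  }

  public static void main(String[] args) {
    System.out.println(makeString("<", 42, ">"));

    List<String> objects =
        Arrays.asList("{\"key\": \"value\"}", "{\"key\": \"value\"}");
    System.out.println(joinAsJsonArray(objects));
  }
}
```

The per-element helper needs no knowledge of its neighbors, which is why it
fits a simple transform; the array helper needs to see the whole collection,
which is closer to the writer-style problem raised in the thread.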

On Wed, Nov 30, 2016 at 5:13 AM Jean-Baptiste Onofré 
wrote:

> Hi Jesse,
>
> yes, I started something there (using JAXB and Jackson). Let me polish
> and push.
>
> Regards
> JB
>
> On 11/29/2016 10:00 PM, Jesse Anderson wrote:
> > I went through the string conversions. Do you have an example of writing
> > out XML/JSON/etc too?
> >
> > On Tue, Nov 29, 2016 at 3:46 PM Jean-Baptiste Onofré 
> > wrote:
> >
> >> Hi Jesse,
> >>
> >>
> >>
> https://github.com/jbonofre/incubator-beam/tree/DATAFORMAT/sdks/java/extensions/dataformat
> >>
> >> it's very simple and stupid and of course not complete at all (I have
> >> other commits but not merged as they need some polishing), but as I
> >> said, it's a base of discussion.
> >>
> >> Regards
> >> JB
> >>
> >> On 11/29/2016 09:23 PM, Jesse Anderson wrote:
> >>> @jb Sounds good. Just let us know once you've pushed.
> >>>
> >>> On Tue, Nov 29, 2016 at 2:54 PM Jean-Baptiste Onofré 
> >>> wrote:
> >>>
>  Good point Eugene.
> 
>  Right now, it's a DoFn collection to experiment a bit (a pure
>  extension). It's pretty stupid ;)
> 
>  But, you are right, depending the direction of such extension, it
> could
>  cover more use cases (even if it's not my first intention ;)).
> 
>  Let me push the branch (pretty small) as an illustration, and in the
>  mean time, I'm preparing a document (more focused on the use cases).
> 
>  WDYT ?
> 
>  Regards
>  JB
> 
>  On 11/29/2016 08:47 PM, Eugene Kirpichov wrote:
> > Hi JB,
> > Depending on the scope of what you want to ultimately accomplish with
>  this
> > extension, I think it may make sense to write a proposal document and
> > discuss it.
> > If it's just a collection of utility DoFn's for various well-defined
> > source/target format pairs, then that's probably not needed, but if
> >> it's
> > anything more, then I think it is.
> > That will help avoid a lot of churn if people propose reasonable
> > significant changes.
> >
> > On Tue, Nov 29, 2016 at 11:15 AM Jean-Baptiste Onofré <
> j...@nanthrax.net
> >>>
> > wrote:
> >
> >> By the way Jesse, I gonna push my DATAFORMAT branch on my github
> and I
> >> will post on the dev mailing list when done.
> >>
> >> Regards
> >> JB
> >>
> >> On 11/29/2016 07:01 PM, Jesse Anderson wrote:
> >>> I want to bring this thread back up since we've had time to think
> >> about
> >> it
> >>> more and make a plan.
> >>>
> >>> I think a format-specific converter will be more time consuming
> task
>  than
> >>> we originally thought. It'd have to be a writer that takes another
>  writer
> >>> as a parameter.
> >>>
> >>> I think a string converter can be done as a simple transform.
> >>>
> >>> I think we should start with a simple string converter and plan
> for a
> >>> format-specific writer.
> >>>
> >>> What are your thoughts?
> >>>
> >>> Thanks,
> >>>
> >>> Jesse
> >>>
> >>> On Thu, Nov 10, 2016 at 10:33 AM Jesse Anderson <
> >> je...@smokinghand.com
> >
> >>> wrote:
> >>>
> >>> I was thinking about what the outputs would look like last night. I
> >>> realized that more complex formats like JSON and XML may or may not
> >> output
> >>> the data in a valid format.
> >>>
> >>> Doing a direct conversion on unbounded collections would work just
>  fine.
> >>> They're self-contained. For writing out bounded collections, that's
>  where
> >>> we'll hit the issues. This changes the uber conversion transform
> >> into a
> >>> transform that needs to be a writer.
> >>>
> >>> If a transform executes a JSON conversion on a per element basis,
> >> we'd
> >> get
> >>> this:
> >>> {
> >>> "key": "value"
> >>> }, {
> >>> "key": "value"
> >>> },
> >>>
> >>> That isn't valid JSON.
> >>>
> >>> The conversion transform would need to know do several things when
> >> writing
> >>> out a file. It would need to add brackets for an array. Now we
> have:
> >>> [
> >>> {
> >>> "key": "value"
> >>> }, {
> >>> "key": "value"
> >>> },
> >>> ]
> >>>
> >>> We still don't have valid JSON. We have to remove the last comma or
>  have
> >>> the uber transform start putting in the commas, except for the last
> >> element.
> >>>
> >>> [
> >>> {
> >>> "key": "value"
> >>> }, {
> >>> "key": "value"
>