Re: Build failed in Jenkins: beam_SeedJob #356

2017-07-20 Thread Stephen Sisk
Apologies for the seed job failure spam - I'm working on getting the Jdbc IO IT working in Jenkins. S On Thu, Jul 20, 2017 at 6:12 PM Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See > >

Re: Docs/guidelines on writing filesystem sources and sinks

2017-07-06 Thread Stephen Sisk
Hi Dmitry, I'm excited to hear that you'd like to do this work. If you haven't already, I'd first suggest that you open a JIRA issue to make sure other folks know you're working on this. I was involved in working on the recent java HDFS file system implementation, so I'll try and share what I

Re: Making it easier to run IO ITs

2017-07-05 Thread Stephen Sisk
I also wrote up this dev doc that goes into more depth on how this will all work, as well as what it will be like to create a new IO IT. https://docs.google.com/document/d/1fISxgeq4Cbr-YRJQDgpnHxfTiQiHv8zQgb47dSvvJ78/edit?usp=sharing S On Wed, Jul 5, 2017 at 3:11 PM Stephen Sisk

Making it easier to run IO ITs

2017-07-05 Thread Stephen Sisk
hey all, I wanted to share an early draft of what it'll be like to invoke mvn for the IO integration tests in the future when we have the integration with kubernetes going. I'm really excited about these changes - working on the IO ITs, I have to run them frequently, and the command lines to run

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-28 Thread Stephen Sisk
hi! I'm hopeful we can get the fix for BEAM-2533 into this release as well, there's a bigtable fix in the next version that'd be good to have. The bigtable client release should be in the next day or two. S On Mon, Jun 26, 2017 at 12:03 PM Jean-Baptiste Onofré wrote: > Hi

Re: Java Cross-JDK Test Available on Jenkins Postcommit!

2017-06-12 Thread Stephen Sisk
Thanks Mark! this is great. Really appreciate your work on this. S On Fri, Jun 9, 2017 at 4:18 PM Mark Liu wrote: > Good catch! Actually beam_PreCommit_Java_MavenInstall > > > and

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-30 Thread Stephen Sisk
e Cassandra and Elasticsearch5 examples based > on HIF that will be clearly redundant once we have the native > versions, so they should maybe be moved into the proposed website > section. What do you guys think? > > Any other ideas/comments on the general subject? > >

Re: Python SDK: BigTableIO

2017-05-30 Thread Stephen Sisk
Hey Matthias, to add on to what Chamikara mentioned, we have lots of info in the generic IO authoring guide [1], the Python IO authoring guide [2] and the PTransform Style Guide[3]. The PTransform style guide doesn't sound like it applies, but it has a lot of specific tips from lessons we've

Re: [New Proposal] Hive connector using native api

2017-05-24 Thread Stephen Sisk
one comment I had that I realized was worth bringing back to the mailing list: The Write transform here does batching using startBundle/finishBundle, but I suspect it'd be better to use the GroupIntoBatches transform before doing the actual write. I *think* our general guidance in the future

Re: [DISCUSS] HadoopInputFormat based IOs

2017-05-23 Thread Stephen Sisk
hey, Thanks for bringing this up! It's definitely an interesting question and I can see both sides of the argument. I can see the appeal of HIFIO wrapper IOs as stop-gaps and if they have good test coverage, it does ensure that the HIFIO route is working. If we have good IT coverage, it also

Re: [DISCUSSION] Encouraging more contributions

2017-04-24 Thread Stephen Sisk
general +1 to the concept, including driving down assigned-but-not-actually-being-worked-on items. I also really like the idea of having a mentor on tickets. Etienne, Re: specific help for I/Os - are the I/O Authoring docs not a good answer? https://beam.apache.org/documentation/io/io-toc/ (or

Re: Read/Write Transform Documentation

2017-04-18 Thread Stephen Sisk
Hi Andrew, I'm excited to hear you're working on an I/O - I'd love to hear any feedback about the docs we've got written so far. Sorry they're in a partially completed state. Are you looking to develop in python or java? There's more specific docs for python available in the python SDK guide

Re: Renaming SideOutput

2017-04-11 Thread Stephen Sisk
strong +1 for changing the name away from sideOutput - the fact that sideInput and sideOutput are not really related was definitely a source of confusion for me when learning beam. S On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh wrote: > Hey everyone: > > I'd like to

Re: HDFS and Google Cloud Storage

2017-04-11 Thread Stephen Sisk
This is a great question! I filed https://issues.apache.org/jira/browse/BEAM-1929 to update the I/O docs to make sure they answer this. S On Tue, Apr 11, 2017 at 8:20 AM Shen Li wrote: > Thanks! > > Shen > > On Tue, Apr 11, 2017 at 11:10 AM, Jean-Baptiste Onofré

Re: IO ITs: Hosting Docker images

2017-04-10 Thread Stephen Sisk
> JB > > On Apr 10, 2017, 18:58, at 18:58, Ekrem Aksoy <ekremak...@gmail.com> > wrote: > >Hi Stephen, > > > >Can we piggyback on current Apache Docker Hub account? I think images > >can > >be held there, too. > > > >-E > > >

Re: IO ITs: Hosting Docker images

2017-04-10 Thread Stephen Sisk
age, we can > store the image in our own "IT dockerhub". > > > > Regards > > JB > > > >> On 04/08/2017 01:03 AM, Stephen Sisk wrote: > >> Wanted to see if anyone else had opinions on this/provide a quick > update. > >> > >

Re: IO ITs: Hosting Docker images

2017-04-04 Thread Stephen Sisk
, Apr 4, 2017 at 10:00 AM Lukasz Cwik <lc...@google.com.invalid> wrote: > Is this something that Apache infra could help us with? > > On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk <s...@google.com.invalid> > wrote: > > > Summary: > > > > For IO ITs that u

IO ITs: Hosting Docker images

2017-04-03 Thread Stephen Sisk
Summary: For IO ITs that use data stores that need custom docker images in order to run, we can't currently use them in a kubernetes cluster (which is where we host our data stores.) I have a couple options for how to solve this and am looking for feedback from folks involved in creating IO

Re: IO IT Patterns: Simplifying data loading

2017-03-29 Thread Stephen Sisk
ue, Mar 28, 2017 at 10:27 PM Chamikara Jayalath <chamik...@apache.org> wrote: > On Tue, Mar 28, 2017 at 3:00 AM Etienne Chauchot <echauc...@gmail.com> > wrote: > > > Hi Stephen, > > > > I have some comments below: > > > > > > Le 24/03/2017

Re: IO IT Patterns: Simplifying data loading

2017-03-28 Thread Stephen Sisk
t/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java On Tue, Mar 28, 2017 at 3:00 AM Etienne Chauchot <echauc...@gmail.com> wrote: > Hi Stephen, > > I have some comments below: > > > Le 24/03/2017 à 00:26, Stephen Sisk a écrit : > > hi! > > > > I just opened

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2017-03-27 Thread Stephen Sisk
I haven't been involved with ValidatesRunner/NeedsRunner too much so I'll avoid commenting on whether a particular interpretation is correct or not. re: "- There might be a use case for testing a transform against all runners, and we don't have an agreed-upon solution about how to do that:

Re: IO IT Patterns: Simplifying data loading

2017-03-23 Thread Stephen Sisk
thanks, appreciated :) On Thu, Mar 23, 2017 at 4:59 PM Ted Yu <yuzhih...@gmail.com> wrote: > Looks like you forgot to include JIRA number: BEAM-1799 > > Cheers > > On Thu, Mar 23, 2017 at 4:26 PM, Stephen Sisk <s...@google.com.invalid> > wrote: > > >

IO IT Patterns: Simplifying data loading

2017-03-23 Thread Stephen Sisk
hi! I just opened a jira ticket that I wanted to make sure the mailing list got a chance to see. The problem is that the current design pattern for doing data loading in IO ITs (either writing a small program or using an external tool) is complex, inefficient and requires extra steps like

Re: Docker image dependencies

2017-03-22 Thread Stephen Sisk
ation or references to help > developers bootstrap their Kubernetes so they can contribute and > validate the tests on their own. > > On Wed, Mar 22, 2017 at 12:14 AM, Stephen Sisk <s...@google.com.invalid> > wrote: > > Hey Ismael, > > > > I definitely agree with you

Re: Docker image dependencies

2017-03-15 Thread Stephen Sisk
d explicitly update if needed. > > 3. It's better that docker images are under a unique responsibility scope > as different IOs can use the same resources, so they should use the same > provided docker. > > By the way, I also have a docker coming for RedisIO ;) > > Regards >

Docker image dependencies

2017-03-14 Thread Stephen Sisk
hi! as part of doing the work to enable IO ITs, we decided we want to use docker. As part of that, we need to run docker images and they'll probably be pulled from a docker repository. Questions: * What docker repositories (and users on docker hub) do we as a group allow for images we'll run for

Re: Default shading configuration and opting out

2017-03-14 Thread Stephen Sisk
Thanks for working on this. Working in IO, I see us having to continually catch and fix missing guava shading, so it seems deserved. S On Tue, Mar 14, 2017 at 2:13 PM Aviem Zur wrote: > Hi all, > > https://github.com/apache/beam/pull/2096 introduced a common shading >

shared location for IO IT resources (pipelineoptions/k8 scripts)

2017-03-07 Thread Stephen Sisk
hey, It is the case that different IOs will be created that connect to the same data stores - HadoopInputFormat in particular uses ES and cassandra, which are also used in their respective IOs as well. Jdbc is likely to have the same type of overlap. This came up while reviewing

Re: Merge HadoopInputFormatIO and HDFSIO in a single module

2017-03-01 Thread Stephen Sisk
I wanted to follow up on this thread since I see some potential blocking questions arising, and I'm trying to help dipti along with her PR. Dipti's PR[1] is currently written to put files into: io/hadoop/inputformat The recent changes to create hadoop-common created: io/hadoop-common This means

Re: Metrics for Beam IOs.

2017-02-14 Thread Stephen Sisk
hat information. So then the question becomes - does it make sense for these common transform metrics to be exposed by runner implementations or within common beam code? S On Tue, Feb 14, 2017 at 9:21 AM Ben Chambers <bchamb...@google.com.invalid> wrote: > On Tue, Feb 14, 201

IO Authoring Guide - first draft

2017-01-27 Thread Stephen Sisk
Here's the doc I've been working on: https://docs.google.com/document/d/1nGGP2sLb5fLamB_dnkHVHC8BVjDD_SE46mQPIPkK5cQ/edit?usp=sharing The general purpose of the doc is to share high level design thoughts and process for authoring an IO transform. Topics include: * List of example IOs that people

Re: IO Integration tests - concrete proposal

2017-01-25 Thread Stephen Sisk
IT (name proposal) > >if we all have an agreement on these points. Maybe it requires some >more >discussions (methods in the interface, are almost passthrough >implementations -EmbeddedIOService, RealIOService - needed, ...) > >Etienne > > >Le 24/01/2017 à 06:47, Step

Re: Hosting data stores for IO Transform testing

2017-01-20 Thread Stephen Sisk
hey folks! I wanted to gather any last thoughts that people might have. I'd like to get started setting this up - anyone else have input? S On Thu, Jan 19, 2017 at 11:41 AM Stephen Sisk <s...@google.com> wrote: > Glad to hear you support kubernetes (although to be clear, I'

Re: IO Integration tests - concrete proposal

2017-01-19 Thread Stephen Sisk
> >> This would be different from embedding the data in the specific IT >> implementation and would also create a coupling between ITs from >> potentially multiple languages. >> >> On Tue, Jan 17, 2017 at 4:27 PM, Stephen Sisk <s...@google.com.invalid> >> wrote:

Re: Hosting data stores for IO Transform testing

2017-01-17 Thread Stephen Sisk
Thanks for taking the time to comment. > > > > My comments are below in the email: > > > > > > Le 24/12/2016 à 00:07, Stephen Sisk a écrit : > > > >> hey Etienne - > >> > >> thanks for your thoughts and thanks for sharing your

IO Integration tests - concrete proposal

2017-01-17 Thread Stephen Sisk
Hi all! As I've discussed previously on this list[1], ensuring that we have high quality IO Transforms is important to beam. We want to do this without adding too much burden on developers wanting to contribute. Below I have a concrete proposal for what an IO integration test would look like and

Re: Some Thoughts on IO Integration Tests

2017-01-12 Thread Stephen Sisk
opic from) > found in KafkaReadITPipelineOptions that extends PipelineOptions. > > > On Thu, Jan 12, 2017 at 5:56 PM, Stephen Sisk <s...@google.com.invalid> > wrote: > > > I see the need for/like KinesisIOTestPipelineOptions - it would allow all > > ITs/perf tests that need a kine

Re: Some Thoughts on IO Integration Tests

2017-01-12 Thread Stephen Sisk
IOOptionsTest2 which extends MyIOOptions. MyIOOptions can have all > > the shared fields such as port or host or whatever and then > > MyIOOptionsTest1 can have specific parameters for Test1 and > > MyIOOptionsTest2 can have specific parameters for Test2. > > > >

Re: splitIntoBundles vs. generateInitialSplits

2017-01-09 Thread Stephen Sisk
hi! I think your strikethrough got lost due to this being a text-only email list. To make sure, I think you're asking the following: " would it be reasonable to think of splitIntoBundles as generateSplits? " (ie, you strikethrough'd Initial) They are very similar and I definitely also think of