Slack team for Apache Beam?

2017-06-29 Thread P. Ramanjaneya Reddy
Hi All,

Please let me know which team I need to enter here.


[image: Inline image 3]

[image: Inline image 2]

[image: Inline image 1]


Re: Issue adding new IO(InfluxDB) with coder in Apache Beam

2017-06-29 Thread P. Ramanjaneya Reddy
I'm available @ Hangout.

On Fri, Jun 30, 2017 at 11:25 AM, Jean-Baptiste Onofré 
wrote:

> Thanks, let me paste it into an editor and do a review.
>
> I'll keep you posted. By the way, are you on Slack or Hangout to chat quickly?
>
> Regards
> JB
>
>
> On 06/30/2017 07:46 AM, P. Ramanjaneya Reddy wrote:
>
>> Hi JB,
>>
>> Here is the complete code for the InfluxDB read/write methods; I have followed
>> the same generic approach as the other IOs.
>>
>> [...]

Re: Issue adding new IO(InfluxDB) with coder in Apache Beam

2017-06-29 Thread Jean-Baptiste Onofré

Thanks, let me paste it into an editor and do a review.

I'll keep you posted. By the way, are you on Slack or Hangout to chat quickly?

Regards
JB

On 06/30/2017 07:46 AM, P. Ramanjaneya Reddy wrote:

Hi JB,

Here is the complete code for the InfluxDB read/write methods; I have followed
the same generic approach as the other IOs.

[...]

Re: Issue adding new IO(InfluxDB) with coder in Apache Beam

2017-06-29 Thread P. Ramanjaneya Reddy
Hi JB,

Here is the complete code for the InfluxDB read/write methods; I have followed
the same generic approach as the other IOs.

package org.apache.beam.sdk.io.influxdb;

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;
import static com.google.common.base.Preconditions.checkState;

import com.google.auto.value.AutoValue;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;

import org.apache.beam.sdk.values.PDone;
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.BatchPoints;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;
import org.influxdb.dto.QueryResult;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.TimeUnit;
import javax.annotation.Nullable;

/**
 * {@link PTransform}s for reading from and writing to InfluxDB.
 */
public class InfluxDbIO {
  private static final Logger LOG = LoggerFactory.getLogger(InfluxDbIO.class);

  private InfluxDbIO() {
  }

  /**
   * Callback for the parser to use to submit data.
   */
  public interface ParserCallback<T> extends Serializable {
    /**
     * Output the parsed object.
     *
     * @param output the element to emit
     */
    void output(T output);
  }

  /**
   * Interface for the parser that is used to parse the input read from InfluxDB
   * into the appropriate types.
   *
   * @param <T> type of the parsed elements
   */
  public interface Parser<T> extends Serializable {
    void parse(String input, ParserCallback<T> callback) throws IOException;
  }

  /**
   * For the default {@code Read<String>} case, this is the parser that is used:
   * it simply passes each input record through as a String.
   */
  private static final Parser<String> TEXT_PARSER = new Parser<String>() {
    @Override
    public void parse(String input, ParserCallback<String> callback)
        throws IOException {
      callback.output(input);
    }
  };

  /**
   * Read data from InfluxDB.
   *
   * @param <T> Type of the data to be read.
   */
  public static <T> Read<T> read() {
    return new AutoValue_InfluxDbIO_Read.Builder<T>().build();
  }

  /**
   * A {@link PTransform} to read data from InfluxDB.
   */
  @AutoValue
  public abstract static class Read<T> extends PTransform<PBegin, PCollection<T>> {
    @Nullable abstract Parser<T> getParser();
    @Nullable abstract String getUri();
    @Nullable abstract String getDatabase();
    @Nullable abstract Coder<T> getCoder();

    abstract Builder<T> toBuilder();

    @AutoValue.Builder
    abstract static class Builder<T> {
      abstract Builder<T> setParser(Parser<T> parser);
      abstract Builder<T> setUri(String uri);
      abstract Builder<T> setDatabase(String database);
      abstract Builder<T> setCoder(Coder<T> coder);
      abstract Read<T> build();
    }

    public <X> Read<X> withParser(Parser<X> parser) {
      checkNotNull(parser);
      Builder<X> builder = (Builder<X>) toBuilder();
      return builder.setParser(parser).setCoder(null).build();
    }

    /**
     * Example documentation for withUri.
     */
    public Read<T> withUri(String uri) {
      checkNotNull(uri);
      return toBuilder().setUri(uri).build();
    }

    public Read<T> withDatabase(String database) {
      checkNotNull(database);
      return toBuilder().setDatabase(database).build();
    }

    public Read<T> withCoder(Coder<T> coder) {
      checkArgument(coder != null,
          "InfluxDbIO.read().withCoder(coder) called with null coder");
      return toBuilder().setCoder(coder).build();
    }

    @Override
    public PCollection<T> expand(PBegin input) {

      PCollection<T> influxDb = input.apply(
          org.apache.beam.sdk.io.Read.from(new BoundedInfluxDbSource<T>(this)));

      System.out.println("***Get InfluxDB ***");

      PCollection<T> output = influxDb.apply("Print List",
          ParDo.of(new DoFn<T, T>() {

            @ProcessElement
            public void processElement(ProcessContext c) {
              System.out.println("Print List: " + c.element());
              c.output(c.element());
            }
          }));


 

Re: Issue adding new IO(InfluxDB) with coder in Apache Beam

2017-06-29 Thread Jean-Baptiste Onofré

Hi,

can you share your code? I will fix that with you. I suggest checking the
expand() method in the read PTransform and the way you use generics there.
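
As a rough sketch of the kind of typing to check (simplified and illustrative, not the actual InfluxDbIO or Beam code; it assumes the bounded source carries the same type parameter as Read), the type parameter should flow from Read<T> into expand(), and the coder supplied via withCoder() should be set on the output PCollection; nothing is ever cast from the PBegin input:

    @Override
    public PCollection<T> expand(PBegin input) {
      // Apply the bounded source to the PBegin; never treat PBegin itself as a PCollection.
      PCollection<T> records = input.apply(
          org.apache.beam.sdk.io.Read.from(new BoundedInfluxDbSource<T>(this)));
      // Attach the user-supplied coder to the resulting PCollection.
      return records.setCoder(getCoder());
    }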


Any plan to donate this IO? I would be happy to review the PR!

Do you leverage some InfluxDB features (like splitting/sharding)?

Regards
JB

On 06/30/2017 07:26 AM, P. Ramanjaneya Reddy wrote:

[...]



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Issue adding new IO(InfluxDB) with coder in Apache Beam

2017-06-29 Thread P. Ramanjaneya Reddy
Hi Beam Dev,

We have developed our own SDK IO functions for InfluxDB read/write
operations in Apache Beam. It works with the default coder, which is
StringUtf8Coder.of().

PCollection<String> output = pipeline.apply(
    InfluxDbIO.read()
        .withUri("http://localhost:8086")
        .withDatabase("beam"));




With reference to MongoDB and JDBC, I implemented the read function with
the setCoder() option in InfluxDB as well, but it is not working.

PCollection<String> output = pipeline.apply(
    InfluxDbIO.read()
        .withParser(new InfluxDbIO.Parser<String>() {
          @Override
          public void parse(String input,
              InfluxDbIO.ParserCallback<String> callback) throws IOException {
            callback.output(input);
          }
        })
        .withUri("http://localhost:8086")
        .withDatabase("beam")
        .withCoder(StringUtf8Coder.of()));

With the coder I am getting this error:

java.lang.ClassCastException: org.apache.beam.sdk.values.PBegin cannot be
cast to org.apache.beam.sdk.values.PCollection
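
For context, this kind of error generally means that a PBegin object reached a place where a PCollection was expected, which unchecked generics allow to compile. A minimal illustration of the failure mode (not taken from the InfluxDbIO code):

PBegin begin = PBegin.in(pipeline);
// Compiles only because of the intermediate PInput cast, but fails at
// pipeline construction time with exactly this ClassCastException.
PCollection<String> wrong = (PCollection<String>) (PInput) begin;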

Thanks & Regards,
Ramanjaneya


Proposal: Watch: a transform for watching growth of sets

2017-06-29 Thread Eugene Kirpichov
Hi all,

Please take a look at this short proposal that came out of implementing
http://s.apache.org/textio-sdf. I think it's a nice generalization. I would
welcome comments on the proposed API, or corner cases of semantics that I
haven't thought about, or more generalizations, etc.

http://s.apache.org/beam-watch-transform

We propose a PTransform Watch.growthOf(poll function) that repeatedly polls
sets associated with each of its inputs and continuously produces new
elements in each set until a per-set termination condition is reached. It
is a generalization of "watch filepattern for matching files".

Code snippet:

PCollection<InputT> inputs = …;
PCollection<KV<InputT, OutputT>> outputs = inputs.apply(
  Watch.growthOf((InputT input, PollReceiver<OutputT> out) -> {
     … out.put(timestamp, value) …
     return Watch.outputCanGrow().withWatermark( … );
     // or:
     return Watch.outputIsFinal();
   })
   .withPollInterval(10 sec)
   .withTerminationPerInput(
     Watch.afterEitherOf(
       Watch.afterTotalOf(5 min),
       Watch.afterOutputStableFor(1 min)))
   .withOutputCoder( … ))
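
As an illustration only, the existing "watch filepattern for matching files" case might look roughly like this under the proposed API (pseudocode in the same style as the snippet above; listMatchingFiles is a placeholder helper, not a real method):

PCollection<String> filepatterns = …;
PCollection<KV<String, String>> newFiles = filepatterns.apply(
  Watch.growthOf((String filepattern, PollReceiver<String> out) -> {
     // Poll the filesystem and emit every file currently matching the pattern.
     for (String file : listMatchingFiles(filepattern)) {
       out.put(Instant.now(), file);
     }
     return Watch.outputCanGrow();
   })
   .withPollInterval(30 sec)
   .withTerminationPerInput(Watch.afterTotalOf(1 hour))
   .withOutputCoder(StringUtf8Coder.of()))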

Thanks!


Re: [DISCUSSION] Encouraging more contributions

2017-06-29 Thread Kai Jiang
That's an interesting experiment. From a new contributor's perspective, I would
like to see more small feature project ideas (just like the Rust community does),
since existing beginner issues only give new contributors a vague roadmap of what
they should do next.

On Thu, Jun 29, 2017 at 1:53 PM, Sourabh Bajaj <
sourabhba...@google.com.invalid> wrote:

> [...]

Re: [DISCUSSION] Encouraging more contributions

2017-06-29 Thread Sourabh Bajaj
The Rust community is trying an interesting experiment for encouraging more
diversity among its contributors:
https://blog.rust-lang.org/2017/06/27/Increasing-Rusts-Reach.html

On Fri, Apr 28, 2017 at 12:05 PM Sourabh Bajaj 
wrote:

> I think they can probably reach out to the mentor for questions like: How
> to navigate the code base? What parts of the code could they use as a
> pattern? This could be done using the preferred mode of communication based
> on the contributor.
>
> My opinion is that large projects and communities may come across as
> intimidating to first time contributors, so being as welcoming and
> encouraging is important.
>
> On Thu, Apr 27, 2017 at 8:52 PM Aviem Zur  wrote:
>
>> @
>> Sourabh Bajaj
>>
>> The mentoring on starter tickets is an interesting idea. How would it
>> technically work?
>>
>> A new contributor assigns a starter ticket to themselves. What happens
>> from
>> there?
>>
>> On Tue, Apr 25, 2017 at 12:01 PM Ismaël Mejía  wrote:
>>
>> > I think it is important to clarify that the developer documentation
>> > discussed in this thread is of two kinds:
>> >
>> > 6.1. Documents with proposals and new designs, those covered by the
>> > Beam Improvement Proposal (BEAM-566), and that we need to put with a
>> > single file index (I remember there was a google dir for this but not
>> > sure it is still valid, and in any case probably the website is a
>> > better place for this). Is there any progress on this?
>> >
>> > 6.2. Documentation about how things work, so new developers can get
>> > into developing features/fixes for the project. These are the kind
>> > that Kenneth/Etienne mention, and include Stephen’s IO guide, but could
>> > definitely be expanded to include things like how the different
>> > runner translations work, or some details on triggers/materialization
>> > of panes/windows from the SDK point of view. However, the hard part of
>> > these documents is that they must be maintained, e.g. updated when the
>> > code evolves, so they don’t get outdated, as JB mentions.
>> >
>> > On Tue, Apr 25, 2017 at 10:47 AM, Wesley Tanaka
>> >  wrote:
>> > > These are the ones I've come across so far, are there others?
>> > >
>> > > * Dynamic DoFn https://s.apache.org/a-new-dofn
>> > >
>> > > ** Splittable DoFn (Obsoletes Source API)
>> > http://s.apache.org/splittable-do-fn
>> > >
>> > > ** State and Timers for DoFn: https://s.apache.org/beam-state
>> > >
>> > >
>> > > * Lateness https://s.apache.org/beam-lateness
>> > >
>> > >
>> > > * Metrics API http://s.apache.org/beam-metrics-api
>> > >
>> > > ** I/O Metrics https://s.apache.org/standard-io-metrics
>> > >
>> > >
>> > > * Runner API http://s.apache.org/beam-runner-api
>> > >
>> > > ** https://s.apache.org/beam-runner-composites
>> > >
>> > > ** https://s.apache.org/beam-side-inputs-1-pager
>> > >
>> > >
>> > > * Fn API http://s.apache.org/beam-fn-api
>> > >
>> > > ---
>> > > Wesley Tanaka
>> > > https://wtanaka.com/
>> > >
>> > >
>> > > On Monday, April 24, 2017, 2:45:45 PM HST, Sourabh Bajaj <
>> > sourabhba...@google.com.INVALID> wrote:
>> > > For 6. I think having them in one page on the website where we can
>> find
>> > the
>> > > design docs more easily would be great.
>> > >
>> > > 7. For low-hanging-fruit, one thing I really liked from some Mozilla
>> > > projects was assigning a mentor on the ticket. Someone you can reach
>> out
>> > to
>> > > if you have questions. I think this makes the entry barrier really low
>> > for
>> > > first time contributors who might feel intimidated asking questions
>> > > completely in public.
>> > >
>> > > On Mon, Apr 24, 2017 at 10:06 AM Kenneth Knowles
>> > > >
>> > > wrote:
>> > >
>> > >> I like the subject Etienne has brought up, and will give it a number
>> in
>> > >> this list :-)
>> > >>
>> > >> 6. Have more technical reference docs (not just workspace set up) for
>> > >> contributors.
>> > >>
>> > >> I think this overlaps a lot with a prior discussion about where to
>> > collect
>> > >> design proposals [1]. Design docs used to be just dropped into a
>> public
>> > >> folder, but that got disorganized. And that thread was about work in
>> > >> progress, so JIRA was a good place for details after a dev@ thread
>> > agrees
>> > >> on a proposal. At this point, the designs are pretty solid
>> conceptually
>> > or
>> > >> even implemented and we could start to build out deeper technical
>> bits
>> > on
>> > >> the web site, or at least some place that people can find it. We do
>> have
>> > >> the Testing Guide and the PTransform Style Guide and somewhere near
>> > there
>> > >> we could have deeper references. I think we need a broader vision for
>> > the
>> > >> "table of contents" here.
>> > >>
>> > >> For my docs (triggers, lateness, runner API, side inputs, state,
>> > coders) I
>> > >> haven't had time, but I do intend to both translate from GDoc to some
>> > other
>> > >> format and also rewrite versions for users where appropriate.
>> Probably
>> > this
>> > >> will mean c

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-06-29 Thread Jean-Baptiste Onofré

FYI,

I opened https://github.com/apache/beam/pull/3471 to fix the SpannerIO test on 
my machine. I don't understand how the test can pass without defining the 
project ID (it should always fail on the precondition).


I will create the release branch once this PR is merged.

Regards
JB

On 06/29/2017 06:29 AM, Jean-Baptiste Onofré wrote:

Hi Stephen,

Thanks for the update.

I have an issue on my machine with SpannerIOTest. I will create the release
branch as soon as this is fixed. Then, we will be able to cherry-pick the fixes we
want.


I keep you posted.

Regards
JB

On 06/28/2017 09:37 PM, Stephen Sisk wrote:

hi!

I'm hopeful we can get the fix for BEAM-2533 into this release as well;
there's a Bigtable fix in the next version that'd be good to have. The
Bigtable client release should be out in the next day or two.

S

On Mon, Jun 26, 2017 at 12:03 PM Jean-Baptiste Onofré 
wrote:


Hi guys,

just a quick update about the 2.1.0 release.

I will complete the Jira triage tomorrow.

I plan to create the release branch Wednesday.

Thanks !
Regards
JB

On 06/22/2017 04:23 AM, Jean-Baptiste Onofré wrote:

Hi guys,

As we released 2.0.0 (first stable release) last month during ApacheCon, and to
maintain our release pace, I would like to release 2.1.0 next week.

This release would include a lot of bug fixes and some new features:

https://issues.apache.org/jira/projects/BEAM/versions/12340528

I volunteer to be release manager for this one.

Thoughts ?

Thanks,
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com







--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: KafkaIO, Warning on offset gap

2017-06-29 Thread Elmar Weber

Thanks.

I read a bit about whether we can read the topic config with the KafkaConsumer;
I could not find any method that allows that. Also, we would have to check the
global defaults first and then the per-topic override (as KafkaIO can
subscribe to multiple topics).



I did a quick scan of the Kafka command-line code [1] and it looks like
all of this is done via ZooKeeper. As ZooKeeper isn't supported as a config
source for KafkaIO at the moment, as far as I can see, it's probably
not worth implementing something around it.




[1] 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConfigCommand.scala#L101



On 06/28/2017 9:54 PM, Raghu Angadi wrote:

Fixing it in https://github.com/apache/beam/pull/3461.

Thanks for reporting the issue.

On Wed, Jun 28, 2017 at 8:37 AM, Raghu Angadi  wrote:


Hi Elmar,

You are right. We should not log this at all when the gaps are expected, as
you pointed out. I don't think the client can check whether compaction is
enabled for a topic through the Consumer API.

I think we should remove the log. The user can't really act on it other
than reporting it. I will send a PR.

As a temporary workaround you can disable logging for a particular class
on the worker with the --workerLogLevelOverrides option. But this
would suppress the rest of the logging from the reader.
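
For example, assuming the warning is emitted under the KafkaIO class itself (the exact logger name is an assumption), the override is a JSON map from logger to level passed as a pipeline option, roughly:

  --workerLogLevelOverrides='{"org.apache.beam.sdk.io.kafka.KafkaIO": "ERROR"}'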

Raghu


On Wed, Jun 28, 2017 at 4:12 AM, Elmar Weber  wrote:


Hello,

I'm testing the KafkaIO with Google Cloud Dataflow and getting warnings
when working with compacted logs. In the code there is a relevant check:


https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1158

// sanity check
if (offset != expected) {
  LOG.warn("{}: gap in offsets for {} at {}. {} records missing.",
      this, pState.topicPartition, expected, offset - expected);
}

From what I understand, this can happen when log compaction is enabled
because the relevant entry can get cleaned up by Kafka in favour of a newer one.
In this case, shouldn't this be an info log, and/or warn only when log
compaction is disabled for the topic?


I'm still debugging some stuff because the pipeline also stops reading on
compacted logs; I'm not sure if this is related or could also be an issue with
my Kafka test installation, but as far as I understand the gaps are
expected behaviour with log compaction enabled.

Thanks,
Elmar










Jenkins build is still unstable: beam_Release_NightlySnapshot #462

2017-06-29 Thread Apache Jenkins Server
See