[Proposal] Beam Newsletter

2017-09-21 Thread Griselda Cuevas
Hi Beam Community,

I have a proposal to start sending *monthly newsletters* to our dev and
user mailing lists. The idea is to summarize what's happening in the
project and keep everyone informed, especially new members and people
interested in specific initiatives/efforts, and to help make progress
toward concrete milestones visible.

This is what I propose:

1. I'll send an email to the dev & user mailing lists with a "Call for
updates" and a link to a Google doc like this
so people can add their updates.
2. I'll edit the doc to get it ready to share.
3. I'll send an email with highlights and a link to the finalized Google
doc.

I propose to do this monthly.

What do you guys think?

Gris


Re: Slack channel

2017-09-21 Thread Thomas Groh
Done.

On Thu, Sep 21, 2017 at 1:33 PM, Max Barrios  wrote:

> Hello
>
> Can someone please add me to the Beam slack channel?
>
> Thanks,
>
> -Max
>


Slack channel

2017-09-21 Thread Max Barrios
Hello

Can someone please add me to the Beam slack channel?

Thanks,

-Max


Regarding Beam Slack Channel

2017-09-21 Thread Srinivas Reddy
Hello,
Can someone please add me to the Beam slack channel?

Thanks.

-Srinivas



--

Srinivas Mahendar Reddy

+91-9177684321

mrsrinivas.com/so


Best way to read files with timestamps in names

2017-09-21 Thread Vilhelm von Ehrenheim
Hi!
I have encountered this a few times but have only solved it with some ugly
hacks so far, so I thought I'd ask this time.

If I have a bunch of files with timestamps in their names but with no
timestamps in the data, how should I best read them into a PCollection of
timestamped values?

The files are JSON or CSV files, and if I use TextIO.read I don't have
the filenames available anymore.

Is the best way to do this to write your own source? In that case, how can
I most easily get the filename or timestamp into the data while reusing
essentially everything else from TextIO? I tried doing this with a
file-based source but it didn't pan out too well.

Or is it better to use a DoFn that takes a PCollection of filenames and
then reads the files itself, fanning out the records? I have had some bad
experiences with fan-out, so I'm not sure this is good either.
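
For concreteness, the hack I've been using looks roughly like this (an
untested sketch; the filename pattern, timestamp format, and raw String
output are placeholders for my actual setup):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.transforms.DoFn;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

/** Reads each file and stamps every line with the timestamp parsed from its name. */
class ReadFileWithTimestampFn extends DoFn<String, String> {
  // Placeholder: expects names like gs://bucket/events-2017-09-21.csv
  private static final Pattern NAME_TS = Pattern.compile("(\\d{4}-\\d{2}-\\d{2})");
  private static final DateTimeFormatter FMT =
      DateTimeFormat.forPattern("yyyy-MM-dd").withZoneUTC();

  @ProcessElement
  public void processElement(ProcessContext c) throws IOException {
    String fileSpec = c.element();
    Matcher m = NAME_TS.matcher(fileSpec);
    if (!m.find()) {
      return; // no parseable timestamp in the name; skip the file
    }
    Instant ts = FMT.parseDateTime(m.group(1)).toInstant();

    ResourceId file = FileSystems.matchSingleFileSpec(fileSpec).resourceId();
    try (BufferedReader reader = new BufferedReader(
        Channels.newReader(FileSystems.open(file), "UTF-8"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Every record from the file gets the file's timestamp.
        c.outputWithTimestamp(line, ts);
      }
    }
  }

  // Needed because the assigned timestamps lie before the input elements'.
  @Override
  public Duration getAllowedTimestampSkew() {
    return Duration.millis(Long.MAX_VALUE);
  }
}

The fan-out worry is visible right there: one filename element emits every
line of its file, so there is no parallelism within a file.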

If anyone has solved this it would be really interesting to know what the
best approach would be.

Thanks!
Vilhelm von Ehrenheim


Re: Testing transforms with generics using PAssert.containsAnyOrder

2017-09-21 Thread Vilhelm von Ehrenheim
Digging a bit more in this. I might have set the Coder in the wrong place
before.

Using setCoder directly after the parameterized transform, explicitly
specifying the correct subclass (`AvroCoder.of(Person.class)` in this case)
does give me a Person object in the next step as suggested.
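
Concretely, the shape of the fix looks like this (MyAggregateTransform and
input are stand-ins for my actual pipeline):

import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.values.PCollection;

// Set the coder immediately after the parameterized transform, so the
// pipeline round-trips the concrete Person subclass, not the superclass.
PCollection<Person> people = input
    .apply(new MyAggregateTransform<Person>())
    .setCoder(AvroCoder.of(Person.class));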

Thank you so much and sorry for my confusion!

It's still a bit interesting that I didn't need to do this explicitly when
running in Dataflow, but it was needed in the testing case.
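
And in case the nullable-field error from before resurfaces, the
ReflectData.AllowNull route Thomas suggested would presumably look
something like this (untested sketch):

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
import org.apache.beam.sdk.coders.AvroCoder;

// Derive a reflect schema that marks every field as nullable, then pass
// it to AvroCoder explicitly instead of relying on the default schema.
Schema nullableSchema = ReflectData.AllowNull.get().getSchema(Person.class);
AvroCoder<Person> personCoder = AvroCoder.of(Person.class, nullableSchema);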

Br,
Vilhelm

On Wed, Sep 20, 2017 at 11:50 PM, Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:

> I don't have any mix of types in the actual PCollections. I just wanted to
> parameterize a transform that does essentially the same kind of aggregation
> regardless of the types used for the aggregation. They are later used in
> different places in my pipelines.
>
> I'm not even sure I have a problem with the coder, to be honest. It works
> really well running in Dataflow but gives me strange results when running
> it in the tests, even though I'm doing the same transforms. That's why I
> thought there might be a difference in how PAssert does it, but I realize
> that transforming it into a string before the PAssert also gives me the
> superclass representation. I really don't see what could be the cause of
> the difference, as this is not the case in Dataflow.
>
>
> On Wed, Sep 20, 2017 at 11:39 PM, Thomas Groh  wrote:
>
>> It looks like you're trying to encode a field with a null value, and Avro
>> does not consider that field to be nullable. I believe you can use
>> "ReflectData.AllowNull" to construct a schema which can encode/decode these
>> records, but by default such a thing is not done. Additionally, if there
>> are multiple types which require different encodings or processing, it gets
>> harder to reason about the coder decoding the original inputs with full
>> fidelity - splitting these into multiple PCollections is preferable if they
>> need to be processed as their original type.
>>
>> On Wed, Sep 20, 2017 at 2:24 PM, Vilhelm von Ehrenheim <
>> vonehrenh...@gmail.com> wrote:
>>
>>> @Daniel: What is the TestSatisfies in your PAssert.satisfies? I tried
>>> with my own SerializableFunction but cannot get it to work.
>>>
>>> @Eugene: I can get rid of the error by assigning a default coder to the
>>> superclass. However, the test problem persists. I tried to do a
>>> .setCoder() on the resulting collection but that failed with a
>>> NullPointerException. I really don't get why, since the stacktrace is
>>> quite extensive but not very helpful:
>>>
>>> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
>>> java.lang.NullPointerException: in co.motherbrain.materializer.Person null 
>>> of co.motherbrain.materializer.Person
>>>
>>> at 
>>> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:344)
>>> at 
>>> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:314)
>>> at 
>>> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:208)
>>> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
>>> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>>> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:356)
>>> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:338)
>>> at 
>>> co.motherbrain.materializer.MaterializerTest.testMaterializePersonFn(MaterializerTest.java:109)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at 
>>> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>>> at 
>>> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>>> at 
>>> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>>> at 
>>> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>>> at 
>>> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>>> at 
>>> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:327)
>>> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>>> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>>> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>>> at 
>>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>>> at 
>>> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>>> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>>> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>>> at