Re: Flink runner scala version

2017-07-26 Thread Aljoscha Krettek
Hi Boris,

There is this PR that introduces a profile for building with Scala 2.11 
dependencies: https://github.com/apache/beam/pull/3255 
. However, this does not integrate 
with the release process and would require manually compiling Beam for Scala 
2.11.

I'll make sure that this gets in quickly now.

Best,
Aljoscha

> On 25. Jul 2017, at 16:12, Boris Lublinsky  
> wrote:
> 
> Flink Beam runner is great but it is still on Scala version 2.10.
> Are their any plans with dates for support of Scala 2.11?
> 
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com 
> https://www.lightbend.com/
> 



Re: [Python] Stateful processing in Python SDK

2017-07-26 Thread Ahmet Altay
HI Vilhelm,

Python SDK currently does not support stateful processing. We should update
the capability matrix to show this. I filed
https://issues.apache.org/jira/browse/BEAM-2687 to track this feature. Feel
free to follow it there or better make it happen. As far as I know, nobody
is actively working on it and will unlikely to be supported in the short
term.

Thank you,
Ahmet

On Tue, Jul 25, 2017 at 3:49 AM, Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:

> Hi!
> Is there any way to do stateful processing in Python Beam SDK?
>
> I am trying to train a LSHForest for approximate nearest neighbor search.
> Using the scikit-learn implementation it is possible to do partial fit's so
> I can gather up mini batches and fit the model on those in sequence using
> ParDo. However, to my understanding, there is no way for me to control on
> how many bundles the ParDo will execute over and therefore the training
> makes little sense and I will end up with a lot of different models, rather
> than one.
>
> Another approach would be to create a CombineFn that accumulates values by
> training  the model on but There is no intuitive way to combine models in
> `merge_accumulators` so I don't think that'll fit either.
>
> Does it makes sense to pass the whole pcollection as a list in a side
> input and train the model as so? In that case how should I chop the pcol
> into batches that I can loop over in a nice way? If I read the whole set
> I'll most likely run out of memory.
>
> I've found that there exist stateful processing in the Java SDK but it
> seems to be missing in python still.
>
> Any help/ideas are greatly appreciated.
>
> Thanks,
> Vilhelm von Ehrenheim
>


Re: Problem when trying to specify the runner when executing as a packaged uberjar

2017-07-26 Thread Nathan Deren
Hi,

I know this thread is a couple of months old, but I’m running into a similar 
problem. I’m trying to package up my dataflow pipeline as a jar for spinning up 
multiple configurable jobs on a QA environment, but I keep getting the same 
exception (but with the dataflow runner instead of the flink runner; it still 
sees the direct runner). I’m packaging it all up with the 
maven-assembly-plugin; what should I specify in my Maven classpath so that the 
dataflow runner gets recognized? My configuration is as follows:

maven-assembly-plugin


make-assembly
package

single







com.fully.qualified.pipeline.mainclass



jar-with-dependencies



On 2017-05-24 14:45 (-0700), Thomas Groh 
mailto:t...@google.com>> wrote:
> You should specify the Flink profile when you execute, with>
> '-Pflink-runner'. That will add the Flink dependency to your classpath when>
> you execute the pipeline.>
>
> On Wed, May 24, 2017 at 2:35 PM, Claire Yuan 
> mailto:cl...@yahoo-inc.com>>>
> wrote:>
>
> > Hi all,>
> >   When I tried specify the runner in command for the examples:>
> > *mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount>
> > -Dexec.args="--output=wordcount.txt --runner=FlinkRunner" *>
> >>
> >   The building was failed and got error message as:>
> > *Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java>
> > (default-cli) on project beam-examples-java: An exception occured while>
> > executing the Java class. null: InvocationTargetException: Unknown 'runner'>
> > specified 'FlinkRunner', supported pipeline runners [DirectRunner] -> [Help>
> > 1]*>
> >>
> >I am wondering if anyone got the same issue and how you solved it.>
> >>
>

Zonar Systems


Re: Missing getOptions on Pipeline class

2017-07-26 Thread Csaba Kassai
Hi Eugene,

thanks for the answer, with the templates it totally makes sense.
Csabi


On Wed, 26 Jul 2017 at 18:10 Eugene Kirpichov  wrote:

> Hi Csaba,
>
> getOptions() was removed, and capturing PipelineOptions in the transform
> constructor is discouraged (or perhaps forbidden - not sure) because of the
> addition of templates (ValueProvider's) - the pipeline may be constructed,
> saved in a template, and then the template can be run with different
> PipelineOptions. Because of that, the Pipeline object itself can not have a
> defined set of options; and if you capture the PipelineOptions in the
> transform constructor, you may end up with confusing behavior if this
> pipeline is run with a different set of options than it was constructed
> with.
>
> Not sure if we have docs for that yet on the Beam website (if we don't, we
> should), but meanwhile take a look at the Dataflow docs
> https://cloud.google.com/dataflow/docs/templates/creating-templates
>
> Note that you're still allowed to access PipelineOptions, for example,
> from a DoFn, via ProcessContext.pipelineOptions() - because at runtime,
> options are always available.
>
> Let me know if this helps.
>
> On Wed, Jul 26, 2017 at 8:52 AM Csaba Kassai 
> wrote:
>
>> Hi,
>>
>> we are currently migrating our pipelines written with the 1.9.x
>> (pre-beam) Dataflow Java SDK to the 2.0.0 version which is based on the
>> 2.0.0 Beam SDK.
>> One change which cases a lot of headache is that getOptions method was
>> removed from the Pipeline class.
>> We used this method a lot during constructing the pipelines, for example
>> in composite PTransforms like this:
>>
>> class MyTransform extends PTransform, PDone> {
>>
>>public PDone apply(PCollection input) {
>>  PipelineOptions options = input.getPipeline().getOptions();
>> 
>>   }
>> }
>>
>> Could you tell me what was the motivation for removing this method from
>> the pipeline?
>> Is there a nicer way to get the options inside a composite
>> transformation, than to pass it via the constructor?
>>
>> Thanks,
>> Csabi
>>
>>
>>
>>
> --
--
[image: photo]
Csaba Kassai 
Data Architect
M:  +36703379122
LinkedIn  *•* Facebook
 *•* Blog 
Doctusoft 


Re: Slack invite

2017-07-26 Thread Punit Naik
Thanks!

On Jul 26, 2017 11:26 PM, "Mingmin Xu"  wrote:

> done
>
> On Wed, Jul 26, 2017 at 10:52 AM, Punit Naik 
> wrote:
>
>> Could I get one slack invite too, please?
>>
>>
>> On Jul 26, 2017 11:20 PM, "Mingmin Xu"  wrote:
>>
>> sent, welcome @Nathan.
>>
>> On Wed, Jul 26, 2017 at 10:47 AM, Nathan Deren <
>> nathan.de...@zonarsystems.com> wrote:
>>
>>> Hi,
>>>
>>> Could I get a slack invite, please?
>>>
>>> Thanks very much!
>>> —Nathan Deren
>>>
>>
>>
>>
>> --
>> 
>> Mingmin
>>
>>
>>
>
>
> --
> 
> Mingmin
>


Re: Slack invite

2017-07-26 Thread Mingmin Xu
done

On Wed, Jul 26, 2017 at 10:52 AM, Punit Naik  wrote:

> Could I get one slack invite too, please?
>
>
> On Jul 26, 2017 11:20 PM, "Mingmin Xu"  wrote:
>
> sent, welcome @Nathan.
>
> On Wed, Jul 26, 2017 at 10:47 AM, Nathan Deren <
> nathan.de...@zonarsystems.com> wrote:
>
>> Hi,
>>
>> Could I get a slack invite, please?
>>
>> Thanks very much!
>> —Nathan Deren
>>
>
>
>
> --
> 
> Mingmin
>
>
>


-- 

Mingmin


Re: Slack invite

2017-07-26 Thread Punit Naik
Could I get one slack invite too, please?

On Jul 26, 2017 11:20 PM, "Mingmin Xu"  wrote:

sent, welcome @Nathan.

On Wed, Jul 26, 2017 at 10:47 AM, Nathan Deren <
nathan.de...@zonarsystems.com> wrote:

> Hi,
>
> Could I get a slack invite, please?
>
> Thanks very much!
> —Nathan Deren
>



-- 

Mingmin


Re: Slack invite

2017-07-26 Thread Nathan Deren
Thanks very much!

From: Mingmin Xu
Reply-To: "user@beam.apache.org"
Date: Wednesday, July 26, 2017 at 10:50 AM
To: "user@beam.apache.org"
Subject: Re: Slack invite

sent, welcome @Nathan.

On Wed, Jul 26, 2017 at 10:47 AM, Nathan Deren 
mailto:nathan.de...@zonarsystems.com>> wrote:
Hi,

Could I get a slack invite, please?

Thanks very much!
—Nathan Deren



--

Mingmin


Re: Slack invite

2017-07-26 Thread Mingmin Xu
sent, welcome @Nathan.

On Wed, Jul 26, 2017 at 10:47 AM, Nathan Deren <
nathan.de...@zonarsystems.com> wrote:

> Hi,
>
> Could I get a slack invite, please?
>
> Thanks very much!
> —Nathan Deren
>



-- 

Mingmin


Slack invite

2017-07-26 Thread Nathan Deren
Hi,

Could I get a slack invite, please?

Thanks very much!
—Nathan Deren


Re: Missing getOptions on Pipeline class

2017-07-26 Thread Eugene Kirpichov
Hi Csaba,

getOptions() was removed, and capturing PipelineOptions in the transform
constructor is discouraged (or perhaps forbidden - not sure) because of the
addition of templates (ValueProvider's) - the pipeline may be constructed,
saved in a template, and then the template can be run with different
PipelineOptions. Because of that, the Pipeline object itself can not have a
defined set of options; and if you capture the PipelineOptions in the
transform constructor, you may end up with confusing behavior if this
pipeline is run with a different set of options than it was constructed
with.

Not sure if we have docs for that yet on the Beam website (if we don't, we
should), but meanwhile take a look at the Dataflow docs
https://cloud.google.com/dataflow/docs/templates/creating-templates

Note that you're still allowed to access PipelineOptions, for example, from
a DoFn, via ProcessContext.pipelineOptions() - because at runtime, options
are always available.

Let me know if this helps.

On Wed, Jul 26, 2017 at 8:52 AM Csaba Kassai 
wrote:

> Hi,
>
> we are currently migrating our pipelines written with the 1.9.x (pre-beam)
> Dataflow Java SDK to the 2.0.0 version which is based on the 2.0.0 Beam SDK.
> One change which cases a lot of headache is that getOptions method was
> removed from the Pipeline class.
> We used this method a lot during constructing the pipelines, for example
> in composite PTransforms like this:
>
> class MyTransform extends PTransform, PDone> {
>
>public PDone apply(PCollection input) {
>  PipelineOptions options = input.getPipeline().getOptions();
> 
>   }
> }
>
> Could you tell me what was the motivation for removing this method from
> the pipeline?
> Is there a nicer way to get the options inside a composite transformation,
> than to pass it via the constructor?
>
> Thanks,
> Csabi
>
>
>
>


Missing getOptions on Pipeline class

2017-07-26 Thread Csaba Kassai
Hi,

we are currently migrating our pipelines written with the 1.9.x (pre-beam)
Dataflow Java SDK to the 2.0.0 version which is based on the 2.0.0 Beam SDK.
One change which cases a lot of headache is that getOptions method was
removed from the Pipeline class.
We used this method a lot during constructing the pipelines, for example in
composite PTransforms like this:

class MyTransform extends PTransform, PDone> {

   public PDone apply(PCollection input) {
 PipelineOptions options = input.getPipeline().getOptions();

  }
}

Could you tell me what was the motivation for removing this method from the
pipeline?
Is there a nicer way to get the options inside a composite transformation,
than to pass it via the constructor?

Thanks,
Csabi