[GitHub] beam pull request #3553: [BEAM-2610] upgrade to version 2.2.0

2017-07-13 Thread XuMingmin
Github user XuMingmin closed the pull request at:

https://github.com/apache/beam/pull/3553


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3553: [BEAM-2610] upgrade to version 2.2.0

2017-07-13 Thread XuMingmin
GitHub user XuMingmin reopened a pull request:

https://github.com/apache/beam/pull/3553

[BEAM-2610] upgrade to version 2.2.0

As described in task 
[BEAM-2610](https://issues.apache.org/jira/browse/BEAM-2610), this is the first 
PR to do the job. Feel free to merge it if nothing went wrong while creating the 
PR, as any remaining issues are supposed to be fixed in the second PR.

Btw, the failure is expected, as dsl/sql is still using `2.1.0-SNAPSHOT`; 
it will be fixed in PR 2.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3553


commit eae0d05bd7c088accd927dcfe3e511efbb11c9fd
Author: Ismaël Mejía 
Date:   2017-06-20T09:49:25Z

This closes #3391

commit 6e4357225477d6beb4cb9735255d1759f4fab168
Author: Eugene Kirpichov 
Date:   2017-06-19T18:56:29Z

Retries http code 0 (usually network error)

commit c1a2226c90bed7b7bf68a4cd240c849dc46e55ac
Author: Luke Cwik 
Date:   2017-06-20T15:53:59Z

Retries http code 0 (usually network error)

This closes #3394

commit 5e12e9d75ab78f210b3b024a77c52aaec033218c
Author: jasonkuster 
Date:   2017-06-20T19:05:22Z

Remove notifications from JDK versions test.

commit 0eb4004a8b91760a66585fb486226513686af002
Author: Kenneth Knowles 
Date:   2017-06-20T19:39:34Z

This closes #3403: Remove notifications from JDK versions test.

commit b7ff103f6ee10b07c50ddbd5a49a6a8ce6686087
Author: Eugene Kirpichov 
Date:   2017-06-16T21:27:51Z

Increases backoff in GcsUtil

commit 59598d8f41e65f9a068d7446457395e112dc3bc7
Author: Luke Cwik 
Date:   2017-06-20T20:11:06Z

Increases backoff in GcsUtil

This closes #3381

commit a0523b2dab617d6aee59708a8d8959f42049fce9
Author: Vikas Kedigehalli 
Date:   2017-06-19T18:24:14Z

Fix dataflow runner test to call pipeline.run instead of runner.run

commit f51fdd960cbfbb9ab2b2870606bd0e221d4beceb
Author: chamik...@google.com 
Date:   2017-06-20T20:32:49Z

This closes #3393

commit 08ec0d4dbff330ecd48c806cd764ab5a96835bd9
Author: Robert Bradshaw 
Date:   2017-06-20T18:01:03Z

Port fn_api_runner to be able to use runner protos.

commit e4ef23e16859e31e09e5fe6cf861d6f3db816b22
Author: Robert Bradshaw 
Date:   2017-06-20T20:47:31Z

Closes #3361

commit f69e3b53fafa4b79b21095d4b65edbe7cfeb7d2a
Author: Pei He 
Date:   2017-06-19T22:55:48Z

FlinkRunner: remove the unused ReflectiveOneToOneOverrideFactory.

commit 52794096aa8b4d614423fd787835f5b89b1ea1ac
Author: Pei He 
Date:   2017-06-19T23:10:02Z

Flink runner: refactor the translator into two phases: rewriting and 
translating.

commit 608a9c4590ebd94e53ee1ec7f3ad60bfb4905c11
Author: Pei He 
Date:   2017-06-20T21:12:55Z

This closes #3275

commit 42a2de91adf1387bb8eaf9aa515a24f6f276bf40
Author: Mairbek Khadikov 
Date:   2017-06-14T20:03:36Z

Support ValueProviders in SpannerIO.Write

commit 10e47646dd5f20d4049d670249cae56c51768ae0
Author: Eugene Kirpichov 
Date:   2017-06-20T21:25:56Z

This closes #3358: [BEAM-1542] Support ValueProviders in SpannerIO

commit 69b01a6118702277348d2f625af669225c9ed99e
Author: Reuven Lax 
Date:   2017-05-13T19:53:08Z

Add spilling code to WriteFiles.

commit 698b89e2b5b88403a5c762b039d3ec8c48b25b26
Author: Eugene Kirpichov 
Date:   2017-06-20T21:28:39Z

This closes #3161: [BEAM-2302] Add spilling code to WriteFiles.

commit a06c8bfae6fb9e35deeb4adfdd7761889b12be89
Author: Eugene Kirpichov 
Date:   2017-02-02T01:26:55Z

[BEAM-1377] Splittable DoFn in Dataflow streaming runner

Transform expansion and translation for the involved primitive
transforms. Of course, the current PR will only work after the
respective Dataflow worker and backend changes are released.

commit 4f6032c9c1774a9797e3ff25cc2a05fe56453f21
Author: Eugene Kirpichov 
Date:   2017-06-19T15:34:31Z

Bump Dataflow worker to 20170619

commit fd40d4b29d3e46dfe25bd7cea65eb7b51dde135f
Author: Eugene Kirpichov 
Date:   2017-06-20T23:27:26Z

This closes #1898: [BEAM-1377] Splittable DoFn in Dataflow streaming runner

commit ef19024d2e9dc046c6699aeee1edc483beb9a009
Author: Ahmet Altay 
Date:   2017-06-20T21:25:55Z

Add a cloud-pubsub dependency to the list of gcp extra packages

commit 

[jira] [Commented] (BEAM-2610) upgrade to version 2.2.0

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086825#comment-16086825
 ] 

ASF GitHub Bot commented on BEAM-2610:
--

GitHub user XuMingmin reopened a pull request:

https://github.com/apache/beam/pull/3553

[BEAM-2610] upgrade to version 2.2.0


[jira] [Commented] (BEAM-2562) Add integration test for logical operators

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086817#comment-16086817
 ] 

ASF GitHub Bot commented on BEAM-2562:
--

GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3560

[BEAM-2562] Add integration test for logical operators

R: @xumingming @takidau  @jbonofre 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuMingmin/beam BEAM-2562

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3560.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3560


commit 79b3bda24f5f0907de22188586cdf2e44ca52f24
Author: mingmxu 
Date:   2017-07-13T16:44:27Z

support Types.BOOLEAN, add integration test for logical operations




> Add integration test for logical operators
> --
>
> Key: BEAM-2562
> URL: https://issues.apache.org/jira/browse/BEAM-2562
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3560: [BEAM-2562] Add integration test for logical operat...

2017-07-13 Thread XuMingmin
GitHub user XuMingmin opened a pull request:

https://github.com/apache/beam/pull/3560

[BEAM-2562] Add integration test for logical operators



[jira] [Commented] (BEAM-2614) Harness doesn't build with Java7

2017-07-13 Thread Manu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086800#comment-16086800
 ] 

Manu Zhang commented on BEAM-2614:
--

The real problem is that {{beam-sdks-java-javadoc}} has a dependency on 
{{beam-sdks-java-harness}} regardless of the Java version, and it fails to build 
as follows:

{code}
[ERROR] Failed to execute goal on project beam-sdks-java-javadoc: Could not 
resolve dependencies for project 
org.apache.beam:beam-sdks-java-javadoc:pom:2.1.0: Failure to find 
org.apache.beam:beam-sdks-java-harness:jar:2.1.0 in 
https://repo.maven.apache.org/maven2 was cached in the local repository, 
resolution will not be reattempted until the update interval of central has 
elapsed or updates are forced
{code}


> Harness doesn't build with Java7
> 
>
> Key: BEAM-2614
> URL: https://issues.apache.org/jira/browse/BEAM-2614
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Beam is supposed to fully build with Java7. However, the {{harness}} module 
> doesn't:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile 
> (default-compile) on project beam-sdks-java-harness: Fatal error compiling: 
> invalid target release: 1.8 -> [Help 1]
> {code}





[jira] [Commented] (BEAM-2614) Harness doesn't build with Java7

2017-07-13 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086734#comment-16086734
 ] 

Luke Cwik commented on BEAM-2614:
-

I think this is working as intended: our only goal is that users can write 
pipelines using Java 7, not necessarily that every module can be built with 
Java 7 (e.g. the java8 lambda modules). In this case we should be able to use 
the java8 profile to enable/disable building the harness module.



Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3575

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4370

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086675#comment-16086675
 ] 

Dmitry Demeshchuk commented on BEAM-2572:
-

[~altay] I actually had to struggle quite a lot before I could make it work 
properly. The juliaset example at some point stopped working for me (some 
setuptools-related issues, which I have yet to reproduce and report here in 
JIRA), so I spent a couple of days making it work for my use case (installing 
psycopg2's dependencies). It involved talking to people, reading the docs on 
setuptools and distutils, plus a lot of debugging of Dataflow jobs. I can say 
for sure that if any other data engineers or data scientists decided to go down 
the same path, they would be very likely to just give up.

The interface we ended up having at Postmates was basically this:
{code}
import dataflow

p = dataflow.Pipeline(
    'my-namespace',
    provision=[
        ['apt-get', 'install', '-y', 'libpq-dev'],
        ['pip', 'install', 'psycopg2'],
    ]
)
{code}

While I think some of the decisions here (hiding the pipeline options object, 
etc.) were questionable, it was at least much easier for people to just write a 
single Python script and make things run on Dataflow, without learning about 
the complications of dependency handling or the way setuptools works.

I also understand that this approach may not be usable for non-Dataflow runners 
(although we don't have any others for Python yet, besides the direct one). But 
I do think that saying "if you use AWS sources and sinks, you'd have to write a 
setup.py file and do some magic" is a bit of overkill.

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Dmitry Demeshchuk
>Assignee: Ahmet Altay
>Priority: Minor
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!
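
For context, a minimal sketch of the boto3 path from point 2 above, assuming 
only that boto3/botocore are installed and that credentials resolve from the 
environment; the bucket, key, and retry cap are placeholders:

{code}
import boto3
from botocore.config import Config

# boto3/botocore handle retrying themselves (point 3 above);
# the retry cap is configurable per client.
s3 = boto3.client('s3', config=Config(retries={'max_attempts': 10}))

s3.put_object(Bucket='my-bucket', Key='demo.txt', Body=b'hello')
# Point 1 above: depending on the operation, the object may not be
# immediately visible for reading again (S3's consistency model
# differs from HDFS).
data = s3.get_object(Bucket='my-bucket', Key='demo.txt')['Body'].read()
{code}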





[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086657#comment-16086657
 ] 

Ahmet Altay commented on BEAM-2572:
---

The supported way today to set up environment variables on Dataflow workers (and 
hopefully this will be true for future runners) is to use a custom setup.py 
(e.g. https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/). 
This allows executing arbitrary commands at installation time, including setting 
up environment variables. We can improve the documentation. Do you think that 
this is not easy enough for users in general?
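
For reference, a minimal setup.py sketch of that pattern, modeled on the page 
linked above; the package name and command list are placeholders:

{code}
import subprocess

import setuptools
from distutils.command.build import build as _build

# Placeholder commands, run on each worker at installation time.
CUSTOM_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'libpq-dev'],
]

class CustomCommands(setuptools.Command):
  """Runs the custom commands as part of the build step."""

  def initialize_options(self):
    pass

  def finalize_options(self):
    pass

  def run(self):
    for command in CUSTOM_COMMANDS:
      subprocess.check_call(command)

class build(_build):
  """Extends the standard build step to trigger the custom commands."""
  sub_commands = _build.sub_commands + [('CustomCommands', None)]

setuptools.setup(
    name='my-pipeline-package',  # placeholder
    version='0.0.1',
    packages=setuptools.find_packages(),
    cmdclass={'build': build, 'CustomCommands': CustomCommands},
)
{code}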



Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2625

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3396

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3574

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086535#comment-16086535
 ] 

Dmitry Demeshchuk commented on BEAM-2572:
-

re 1: I just don't want us to end up in a situation like this:

List: We just released an S3 filesystem! Please use it and tell us what you 
think!
User7231: Hi, how do I provide credentials for the filesystem, in case I run my 
stuff on Dataflow?
List: Just set the environment variables AWS_ACCESS_KEY_ID and 
AWS_SECRET_ACCESS_KEY on your Dataflow nodes!
User7231: Cool, how can I do that?
List: Well, there's no official way, so you just hack yourself a custom 
package, or something like that!

We only have two runners for Python right now: Direct and Dataflow. I think it 
would make sense to make things runnable on Dataflow too, even if configuring 
the environment is going to be a Dataflow-specific mechanism, totally 
independent from Beam. What worries me about making this a Dataflow feature is 
that the whole Beam S3 feature would become dependent on the Dataflow planning 
and release cycle before it can be somewhat usable to people.

re 2, 3: That's a good point. FWIW, I'm all for reducing the scope and 
complexity of this feature. I'd rather have a non-ideal solution in a month 
than an ideal solution someday.


I apologize for dragging this conversation out so far; there just seems to be 
no clear consensus on the subject, and I really want this to be usable beyond 
just the direct runner.



Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4369

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #4368

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086467#comment-16086467
 ] 

Chamikara Jayalath commented on BEAM-2572:
--

1: Please see my previous comment. How the environment variable will be set is 
runner-specific. I don't see the need to provide a generalized user interface 
for this (I might be wrong :-) ).

2, 3: I agree that an environment-variable-based approach is inadequate if we 
want to customize to the level you mentioned (different access credentials for 
read/write, per-bucket credentials). The question is whether we need to 
customize to that level in practice. That seems out of scope for this JIRA 
issue, and it might make sense to raise the question on the dev list.



[jira] [Commented] (BEAM-2538) Spanner IO ITs failing

2017-07-13 Thread Mairbek Khadikov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086459#comment-16086459
 ] 

Mairbek Khadikov commented on BEAM-2538:


 Stephen, can we close this?

> Spanner IO ITs failing
> --
>
> Key: BEAM-2538
> URL: https://issues.apache.org/jira/browse/BEAM-2538
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions, sdk-java-gcp
>Reporter: Stephen Sisk
>Assignee: Mairbek Khadikov
>
> Both in local dev, and in the jenkins PostCommit_MavenInstall tests, the 
> spanner integration tests are currently failing. 
> There appear to be two different failures occurring. 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/4245/org.apache.beam$beam-runners-google-cloud-dataflow-java/testReport/junit/org.apache.beam.sdk.io.gcp.spanner/SpannerReadIT/
> first error:
> java.lang.NullPointerException
>   at 
> org.apache.beam.sdk.io.gcp.spanner.SpannerReadIT.tearDown(SpannerReadIT.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at 
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> and the second error: 
> java.lang.NoClassDefFoundError: 
> org/apache/commons/text/RandomStringGenerator$Builder
>   at 
> org.apache.beam.sdk.io.gcp.spanner.SpannerReadIT.generateDatabaseName(SpannerReadIT.java:164)
>   at 
> org.apache.beam.sdk.io.gcp.spanner.SpannerReadIT.setUp(SpannerReadIT.java:95)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at 
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: 
> 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2624

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086434#comment-16086434
 ] 

Dmitry Demeshchuk commented on BEAM-2572:
-

A couple of problems come to mind about environment-originated configuration:

1. How do we configure the runner's environment in the first place, at the user 
level? Another pipeline option? Or do we make users hack a solution themselves? 
I agree that it's technically possible, just like provisioning a Dataflow 
container from inside Beam is, but it currently requires a lot of 
trial-and-error hacking. If we go down that path, I'd like to figure out this 
environment-configuration piece first, because without it the FileSystem 
implementation would be useless.

2. Some people on this thread (and on the mailing list) mentioned that we may 
want to have multiple sets of credentials. Reading and writing may use separate 
accounts/tokens, and so may access to different buckets. How would we configure 
that through the environment? Separating reading/writing concerns seems doable, 
but I'm not so sure about per-bucket access, for instance. Maybe it's fine to 
say "we won't support that, at least for now".

3. It feels like the environment may be a bit too generally accessible/visible, 
which makes accidental leaking of credentials much easier. Maybe we should at 
least store them in files, e.g. {{~/.aws/credentials}} or 
{{~/.config/gcloud/}}? But then, that makes multi-credential access a bit 
trickier.
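
To make point 2 concrete, a hedged sketch of separate credential sets using 
explicit boto3 sessions instead of a single process-wide pair of environment 
variables; the key values and bucket names are placeholders:

{code}
import boto3

# One session per credential set, instead of one environment-wide
# AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY pair.
read_session = boto3.session.Session(
    aws_access_key_id='READ_KEY_ID',          # placeholder
    aws_secret_access_key='READ_SECRET_KEY',  # placeholder
)
write_session = boto3.session.Session(
    aws_access_key_id='WRITE_KEY_ID',
    aws_secret_access_key='WRITE_SECRET_KEY',
)

reader = read_session.client('s3')
writer = write_session.client('s3')
body = reader.get_object(Bucket='input-bucket', Key='in.txt')['Body'].read()
writer.put_object(Bucket='output-bucket', Key='out.txt', Body=body)
{code}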



Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3395

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3573

2017-07-13 Thread Apache Jenkins Server
See 




Build failed in Jenkins: beam_PostCommit_Python_Verify #2720

2017-07-13 Thread Apache Jenkins Server
See 


Changes:

[klk] Unbundle Context and WindowedContext.

--
[...truncated 567.12 KB...]
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
test_default_value_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-36.1.1.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-3.1.1.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
test_as_singleton_with_different_defaults 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-36.1.1.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-3.1.1.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
test_as_singleton_without_unique_labels 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
DEPRECATION: pip install --download has been deprecated and will be removed in 
the future. Pip now has a download command that should be used instead.
Collecting pyhamcrest (from -r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/PyHamcrest-1.9.0.tar.gz
Collecting mock (from -r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/mock-2.0.0.tar.gz
Collecting setuptools (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded 
/tmp/dataflow-requirements-cache/setuptools-36.1.1.zip
Collecting six (from pyhamcrest->-r postcommit_requirements.txt (line 1))
  File was already downloaded /tmp/dataflow-requirements-cache/six-1.10.0.tar.gz
Collecting funcsigs>=1 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded 
/tmp/dataflow-requirements-cache/funcsigs-1.0.2.tar.gz
Collecting pbr>=0.11 (from mock->-r postcommit_requirements.txt (line 2))
  File was already downloaded /tmp/dataflow-requirements-cache/pbr-3.1.1.tar.gz
Successfully downloaded pyhamcrest mock setuptools six funcsigs pbr
test_empty_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
test_flattened_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
test_iterable_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok
test_multi_valued_singleton_side_input 
(apache_beam.transforms.sideinputs_test.SideInputsTest) ... ok

==
ERROR: test_multiple_empty_outputs 
(apache_beam.transforms.ptransform_test.PTransformTest)
--
Traceback (most recent call last):
  File 
"
 line 262, in test_multiple_empty_outputs
pipeline.run()
  File 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2623

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3394

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2595) WriteToBigQuery does not work with nested json schema

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086384#comment-16086384
 ] 

ASF GitHub Bot commented on BEAM-2595:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3556


> WriteToBigQuery does not work with nested json schema
> -
>
> Key: BEAM-2595
> URL: https://issues.apache.org/jira/browse/BEAM-2595
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 2.1.0
> Environment: mac os local runner, Python
>Reporter: Andrea Pierleoni
>Assignee: Sourabh Bajaj
>Priority: Minor
>  Labels: gcp
> Fix For: 2.1.0
>
>
> I am trying to use the new `WriteToBigQuery` PTransform added to 
> `apache_beam.io.gcp.bigquery` in version 2.1.0-RC1
> I need to write to a bigquery table with nested fields.
> The only way to specify nested schemas in bigquery is with the JSON schema.
> None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the 
> json schema, but they accept a schema as an instance of the class 
> `apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`
> I am composing the `TableFieldSchema` as suggested here 
> [https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436],
>  and it looks fine when passed to the PTransform `WriteToBigQuery`. 
> The problem is that the base class `PTransformWithSideInputs` try to pickle 
> and unpickle the function 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515]
>   (that includes the TableFieldSchema instance) and for some reason when the 
> class is unpickled some `FieldList` instance are converted to simple lists, 
> and the pickling validation fails.
> Would it be possible to extend the test coverage to nested json objects for 
> bigquery?
> They are also relatively easy to parse into a TableFieldSchema.
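
For context, a short sketch of the nested-schema case, assuming the fix from 
PR #3556 (see the commits below), which lets WriteToBigQuery accept the schema 
as a plain dictionary; the table reference and field layout are illustrative:

{code}
from apache_beam.io.gcp.bigquery import WriteToBigQuery

# Nested (RECORD) schema as a dictionary; the patched BigQueryWriteFn
# converts it internally via parse_table_schema_from_json(json.dumps(...)).
nested_schema = {
    'fields': [
        {'name': 'user', 'type': 'RECORD', 'mode': 'NULLABLE',
         'fields': [
             {'name': 'id', 'type': 'STRING', 'mode': 'NULLABLE'},
             {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'},
         ]},
        {'name': 'score', 'type': 'FLOAT', 'mode': 'NULLABLE'},
    ]
}

write = WriteToBigQuery(
    table='my_table', dataset='my_dataset', project='my-project',  # placeholders
    schema=nested_schema)
{code}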





[GitHub] beam pull request #3556: [BEAM-2595] Allow table schema objects in BQ DoFn

2017-07-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3556




[2/2] beam git commit: This closes #3556

2017-07-13 Thread chamikara
This closes #3556


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e8c55744
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e8c55744
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e8c55744

Branch: refs/heads/master
Commit: e8c5574483edc28d8bea30e55aa2d54b1d566722
Parents: 5fd2c6e eb951c2
Author: Chamikara Jayalath 
Authored: Thu Jul 13 13:52:02 2017 -0700
Committer: Chamikara Jayalath 
Committed: Thu Jul 13 13:52:02 2017 -0700

--
 sdks/python/apache_beam/io/gcp/bigquery.py  | 100 +++---
 sdks/python/apache_beam/io/gcp/bigquery_test.py | 105 +--
 2 files changed, 180 insertions(+), 25 deletions(-)
--




[jira] [Commented] (BEAM-1799) IO ITs: simplify data loading design pattern

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086382#comment-16086382
 ] 

ASF GitHub Bot commented on BEAM-1799:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2507


> IO ITs: simplify data loading design pattern
> 
>
> Key: BEAM-1799
> URL: https://issues.apache.org/jira/browse/BEAM-1799
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Stephen Sisk
>Assignee: Stephen Sisk
> Fix For: 2.0.0
>
>
> Problems with the current solution
> =
> * The IO IT data loading guidelines [1] are complicated & aren't "native 
> junit" - you end up working around junit rather than working with it (I was a 
> part of defining them[0], so I critique the rules with (heart) )
> * Doing data loading using external tools means we have additional 
> dependencies outside of the tests themselves. If we *must* use them, it's 
> worth the time, but I think we have another option. I find it especially 
> amusing since the data loading tools are things like ycsb which themselves 
> are benchmarking tools ("I heard you like performance benchmarking, so here's 
> a performance benchmarking tool to use before you use your performance 
> benchmarking tool"), and really are just solving the problem of "I want to 
> write data in parallel to this data store" - that sounds familiar :) 
> The current guidelines also don't scale well to performance tests:
> * We want to write medium sized data for perf tests - doing data loading 
> using external tools means a minimum of 2 reads & writes. For the small scale 
> ITs, that's not a big deal, but for the large scale tests, if we assume we're 
> working with a fixed budget, more data transferred/stored ~= fewer tests.
> * If you want to verify that large data sets are correct (or create them), 
> you need to actually read and write those large data sets - currently, the 
> plan is that data loading/testing infrastructure only runs on one machine, so 
> those operations are going to be slow. We aren't working with actual large 
> data sets, so it won't take too long, but it's always nice to have faster 
> tests.
> New Proposed Solution
> ===
> Instead of trying to test read and write separately, the test should be a 
> "write, then read back what you just wrote", all using the IO under test. To 
> support scenarios like "I want to run my read test repeatedly without 
> re-writing the data", tests would add flags for "skipCleanUp" and 
> "useExistingData".
> Check out the example I wrote up [2]
> I didn't want to invest much time on this before I opened a Jira/talked to 
> others, so I plan on expanding on this a bit more/formalizing it in the 
> testing docs.
> A reminder of some context:
> * The goals for the ITs & Perf tests are that they are *not* intended to be 
> the place where we exercise specific scenarios. Instead, they are tripwires 
> designed to find problems with code *we already believe works* (as proven by 
> the unit tests) when it runs against real data store instances/runners using 
> multiple nodes of both.
> There are some definite disadvantages: 
> * There is a class of bugs that you can miss doing this. (namely: "I mangled 
> the data on the way into the data store, and then reverse-mangled it again on 
> the way back out so it looks fine, even though it is bad in the db") I assume 
> that many of us have tested storage code in the past, and so we've thought 
> about this trade-off. In this particular environment, where it's 
> expensive/tricky to do independent testing of the storage code, I think this 
> is the right trade off.
> * The data loading scripts cannot be re-used between languages. I think this 
> will be a pretty small relative cost compared to the cost of writing the IO 
> in multiple languages, so it shouldn't matter too much. I think we'll save 
> more time in not needing to use external tools for loading data.
> * Read-only or write-only data stores - in this case, we'll either need to 
> default to the old plan, or implement data loading or verification using beam
> * This assumes the data store support parallelism - in the case where the 
> read or write cannot be split, we probably should limit the amount of data we 
> process in the tests to what we can reasonably do on a single worker anyway.
> * It's harder to debug when this fails - I agree, and part of what I hope to 
> invest a little time in as I go forward is to make it easier to determine 
> what the actual failure is. Presumably folks debugging a particular IO's 
> failures have tools to look at that IO and will be able to quickly determine 
> if it's failing on the read or write.
> * As with the previously before 
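
For illustration, a minimal sketch (not from the proposal itself) of the 
write-then-read shape in the Python SDK, using WriteToText/ReadFromText as 
stand-ins for the IO under test; paths and row count are placeholders:

{code}
import apache_beam as beam
from apache_beam.testing.util import assert_that, equal_to

rows = ['row-%d' % i for i in range(100)]  # placeholder test data

# Phase 1: write through the IO under test.
with beam.Pipeline() as p:
  _ = (p
       | 'CreateTestRows' >> beam.Create(rows)
       | 'WriteUnderTest' >> beam.io.WriteToText('/tmp/io-it/out'))

# Phase 2: read back through the same IO and verify what was written.
with beam.Pipeline() as p:
  readback = p | 'ReadUnderTest' >> beam.io.ReadFromText('/tmp/io-it/out*')
  assert_that(readback, equal_to(rows))
{code}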

[1/2] beam git commit: [BEAM-2595] Allow table schema objects in BQ DoFn

2017-07-13 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master 5fd2c6e13 -> e8c557448


[BEAM-2595] Allow table schema objects in BQ DoFn


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/eb951c2e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/eb951c2e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/eb951c2e

Branch: refs/heads/master
Commit: eb951c2e161294510d5a23f7c641592b0a8be087
Parents: 5fd2c6e
Author: Sourabh Bajaj 
Authored: Thu Jul 13 12:02:31 2017 -0700
Committer: Chamikara Jayalath 
Committed: Thu Jul 13 13:51:15 2017 -0700

--
 sdks/python/apache_beam/io/gcp/bigquery.py  | 100 +++---
 sdks/python/apache_beam/io/gcp/bigquery_test.py | 105 +--
 2 files changed, 180 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/eb951c2e/sdks/python/apache_beam/io/gcp/bigquery.py
--
diff --git a/sdks/python/apache_beam/io/gcp/bigquery.py 
b/sdks/python/apache_beam/io/gcp/bigquery.py
index da8be68..23fd310 100644
--- a/sdks/python/apache_beam/io/gcp/bigquery.py
+++ b/sdks/python/apache_beam/io/gcp/bigquery.py
@@ -1191,22 +1191,20 @@ class BigQueryWriteFn(DoFn):
 
   @staticmethod
   def get_table_schema(schema):
-    # Transform the table schema into a bigquery.TableSchema instance.
-    if isinstance(schema, basestring):
-      table_schema = bigquery.TableSchema()
-      schema_list = [s.strip() for s in schema.split(',')]
-      for field_and_type in schema_list:
-        field_name, field_type = field_and_type.split(':')
-        field_schema = bigquery.TableFieldSchema()
-        field_schema.name = field_name
-        field_schema.type = field_type
-        field_schema.mode = 'NULLABLE'
-        table_schema.fields.append(field_schema)
-      return table_schema
-    elif schema is None:
-      return schema
-    elif isinstance(schema, bigquery.TableSchema):
+    """Transform the table schema into a bigquery.TableSchema instance.
+
+    Args:
+      schema: The schema to be used if the BigQuery table to write has to be
+        created. This is a dictionary object created in the WriteToBigQuery
+        transform.
+    Returns:
+      table_schema: The schema to be used if the BigQuery table to write has
+        to be created but in the bigquery.TableSchema format.
+    """
+    if schema is None:
       return schema
+    elif isinstance(schema, dict):
+      return parse_table_schema_from_json(json.dumps(schema))
     else:
       raise TypeError('Unexpected schema argument: %s.' % schema)
 
@@ -1289,13 +1287,83 @@ class WriteToBigQuery(PTransform):
     self.batch_size = batch_size
     self.test_client = test_client
 
+  @staticmethod
+  def get_table_schema_from_string(schema):
+    """Transform the string table schema into a bigquery.TableSchema instance.
+
+    Args:
+      schema: The string schema to be used if the BigQuery table to write has
+        to be created.
+    Returns:
+      table_schema: The schema to be used if the BigQuery table to write has
+        to be created but in the bigquery.TableSchema format.
+    """
+    table_schema = bigquery.TableSchema()
+    schema_list = [s.strip() for s in schema.split(',')]
+    for field_and_type in schema_list:
+      field_name, field_type = field_and_type.split(':')
+      field_schema = bigquery.TableFieldSchema()
+      field_schema.name = field_name
+      field_schema.type = field_type
+      field_schema.mode = 'NULLABLE'
+      table_schema.fields.append(field_schema)
+    return table_schema
+
+  @staticmethod
+  def table_schema_to_dict(table_schema):
+    """Create a dictionary representation of table schema for serialization
+    """
+    def get_table_field(field):
+      """Create a dictionary representation of a table field
+      """
+      result = {}
+      result['name'] = field.name
+      result['type'] = field.type
+      result['mode'] = getattr(field, 'mode', 'NULLABLE')
+      if hasattr(field, 'description') and field.description is not None:
+        result['description'] = field.description
+      if hasattr(field, 'fields') and field.fields:
+        result['fields'] = [get_table_field(f) for f in field.fields]
+      return result
+
+    if not isinstance(table_schema, bigquery.TableSchema):
+      raise ValueError("Table schema must be of the type bigquery.TableSchema")
+    schema = {'fields': []}
+    for field in table_schema.fields:
+      schema['fields'].append(get_table_field(field))
+    return schema
+
+  @staticmethod
+  def get_dict_table_schema(schema):
+    """Transform the table schema into a dictionary instance.
+
+    Args:
+      schema: The schema to be used if the BigQuery table to write has to be
+ 
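
For illustration, a hedged usage sketch of the two helpers introduced in the 
hunk above; the schema string and assertion are placeholders:

{code}
from apache_beam.io.gcp.bigquery import WriteToBigQuery

# Flat comma-separated form, parsed by the new static helper:
table_schema = WriteToBigQuery.get_table_schema_from_string(
    'name:STRING,age:INTEGER')

# Round-trip through the picklable dictionary form used across the DoFn:
schema_dict = WriteToBigQuery.table_schema_to_dict(table_schema)
assert schema_dict['fields'][0]['name'] == 'name'
{code}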

[jira] [Resolved] (BEAM-2353) FileNamePolicy context parameters allow backwards compatibility where we really don't want any

2017-07-13 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-2353.
---
Resolution: Fixed

> FileNamePolicy context parameters allow backwards compatibility where we 
> really don't want any
> --
>
> Key: BEAM-2353
> URL: https://issues.apache.org/jira/browse/BEAM-2353
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>
> Currently, in {{FileBasedSink}} the {{FileNamePolicy}} object accepts 
> parameters of type {{Context}} and {{WindowedContext}} respectively.
> These contexts are a coding technique to allow easy backwards compatibility 
> when adding new parameters. However, if a new parameter is added to the file 
> name policy it is likely data loss for the user to not incorporate it, so in 
> fact that is never a safe backwards compatible change.
> These are brand-new APIs and marked experimental. This is important enough I 
> think we should make the breaking change.
> We should inline all the parameters of the context, so that we _cannot_ add 
> parameters and maintain compatibility. Instead, if we have new ones we want 
> to add, it will have to be a new method or some such.





[GitHub] beam pull request #2507: [BEAM-1799] JdbcIOIT now uses writeThenRead style

2017-07-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2507




[2/2] beam git commit: This closes #2507: JdbcIOIT now uses writeThenRead style

2017-07-13 Thread kenn
This closes #2507: JdbcIOIT now uses writeThenRead style


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5fd2c6e1
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5fd2c6e1
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5fd2c6e1

Branch: refs/heads/master
Commit: 5fd2c6e139387d3bf1a297adaf5dc4687bcda7ee
Parents: 5f972e8 a6201ed
Author: Kenneth Knowles 
Authored: Thu Jul 13 13:43:36 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jul 13 13:43:36 2017 -0700

--
 sdks/java/io/common/pom.xml |  10 +
 .../org/apache/beam/sdk/io/common/TestRow.java  | 114 +++
 sdks/java/io/jdbc/pom.xml   |  10 +-
 .../org/apache/beam/sdk/io/jdbc/JdbcIOIT.java   | 203 ++-
 .../org/apache/beam/sdk/io/jdbc/JdbcIOTest.java | 115 ++-
 .../beam/sdk/io/jdbc/JdbcTestDataSet.java   | 130 
 .../apache/beam/sdk/io/jdbc/JdbcTestHelper.java |  81 
 7 files changed, 377 insertions(+), 286 deletions(-)
--




[1/2] beam git commit: JdbcIOIT now uses writeThenRead style

2017-07-13 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 5f972e8b2 -> 5fd2c6e13


JdbcIOIT now uses writeThenRead style


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a6201ed1
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a6201ed1
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a6201ed1

Branch: refs/heads/master
Commit: a6201ed1488d9ae95637002744bc316f72401e56
Parents: 5f972e8
Author: Stephen Sisk 
Authored: Fri Jun 16 11:04:07 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jul 13 13:43:27 2017 -0700

--
 sdks/java/io/common/pom.xml |  10 +
 .../org/apache/beam/sdk/io/common/TestRow.java  | 114 +++
 sdks/java/io/jdbc/pom.xml   |  10 +-
 .../org/apache/beam/sdk/io/jdbc/JdbcIOIT.java   | 203 ++-
 .../org/apache/beam/sdk/io/jdbc/JdbcIOTest.java | 115 ++-
 .../beam/sdk/io/jdbc/JdbcTestDataSet.java   | 130 
 .../apache/beam/sdk/io/jdbc/JdbcTestHelper.java |  81 
 7 files changed, 377 insertions(+), 286 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a6201ed1/sdks/java/io/common/pom.xml
--
diff --git a/sdks/java/io/common/pom.xml b/sdks/java/io/common/pom.xml
index df0d94b..1a6f54b 100644
--- a/sdks/java/io/common/pom.xml
+++ b/sdks/java/io/common/pom.xml
@@ -38,5 +38,15 @@
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
     </dependency>
+    <dependency>
+      <groupId>com.google.auto.value</groupId>
+      <artifactId>auto-value</artifactId>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <scope>test</scope>
+    </dependency>
   </dependencies>
 </project>

http://git-wip-us.apache.org/repos/asf/beam/blob/a6201ed1/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/TestRow.java
--
diff --git 
a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/TestRow.java 
b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/TestRow.java
new file mode 100644
index 000..5f0a2fb
--- /dev/null
+++ 
b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/TestRow.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.common;
+
+import com.google.auto.value.AutoValue;
+import com.google.common.collect.ImmutableMap;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * Used to pass values around within test pipelines.
+ */
+@AutoValue
+public abstract class TestRow implements Serializable, Comparable<TestRow> {
+  /**
+   * Manually create a test row.
+   */
+  public static TestRow create(Integer id, String name) {
+return new AutoValue_TestRow(id, name);
+  }
+
+  public abstract Integer id();
+  public abstract String name();
+
+  public int compareTo(TestRow other) {
+return id().compareTo(other.id());
+  }
+
+  /**
+   * Creates a {@link org.apache.beam.sdk.io.common.TestRow} from the seed value.
+   */
+  public static TestRow fromSeed(Integer seed) {
+return create(seed, getNameForSeed(seed));
+  }
+
+  /**
+   * Returns the name field value produced from the given seed.
+   */
+  public static String getNameForSeed(Integer seed) {
+return "Testval" + seed;
+  }
+
+  /**
+   * Returns a range of {@link org.apache.beam.sdk.io.common.TestRow}s for seed values between
+   * rangeStart (inclusive) and rangeEnd (exclusive).
+   */
+  public static Iterable<TestRow> getExpectedValues(int rangeStart, int rangeEnd) {
+    List<TestRow> ret = new ArrayList<>(rangeEnd - rangeStart + 1);
+for (int i = rangeStart; i < rangeEnd; i++) {
+  ret.add(fromSeed(i));
+}
+return ret;
+  }
+
+  /**
+   * Uses the input Long values as seeds to produce {@link org.apache.beam.sdk.io.common.TestRow}s.
+   */
+  public static class DeterministicallyConstructTestRowFn extends DoFn<Long, TestRow> {
+@ProcessElement
+public void 

Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3393

2017-07-13 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3539: [BEAM-2353] Unbundle Context and WindowedContext.

2017-07-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3539


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2353) FileNamePolicy context parameters allow backwards compatibility where we really don't want any

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086352#comment-16086352
 ] 

ASF GitHub Bot commented on BEAM-2353:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3539


> FileNamePolicy context parameters allow backwards compatibility where we 
> really don't want any
> --
>
> Key: BEAM-2353
> URL: https://issues.apache.org/jira/browse/BEAM-2353
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Reuven Lax
> Fix For: 2.2.0
>
>
> Currently, in {{FileBasedSink}} the {{FileNamePolicy}} object accepts 
> parameters of type {{Context}} and {{WindowedContext}} respectively.
> These contexts are a coding technique to allow easy backwards compatibility 
> when adding new parameters. However, if a new parameter is added to the file 
> name policy it is likely data loss for the user to not incorporate it, so in 
> fact that is never a safe backwards compatible change.
> These are brand-new APIs and marked experimental. This is important enough I 
> think we should make the breaking change.
> We should inline all the parameters of the context, so that we _cannot_ add 
> parameters and maintain compatibility. Instead, if we have new ones we want 
> to add, it will have to be a new method or some such.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[2/2] beam git commit: This closes #3539: Unbundle FileNamePolicy Context and WindowedContext

2017-07-13 Thread kenn
This closes #3539: Unbundle FileNamePolicy Context and WindowedContext


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5f972e8b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5f972e8b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5f972e8b

Branch: refs/heads/master
Commit: 5f972e8b2525660a2c09e6f9f21a13b5b7b46366
Parents: 889776f 64997ef
Author: Kenneth Knowles 
Authored: Thu Jul 13 12:16:42 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jul 13 12:16:42 2017 -0700

--
 .../examples/common/WriteOneFilePerWindow.java  |  19 +-
 .../complete/game/utils/WriteToText.java|  18 +-
 .../construction/WriteFilesTranslationTest.java |  12 +-
 .../beam/sdk/io/DefaultFilenamePolicy.java  |  47 ++--
 .../org/apache/beam/sdk/io/FileBasedSink.java   | 198 --
 .../java/org/apache/beam/sdk/io/AvroIOTest.java | 263 ++-
 .../apache/beam/sdk/io/FileBasedSinkTest.java   |  88 +++
 .../org/apache/beam/sdk/io/WriteFilesTest.java  | 122 -
 8 files changed, 358 insertions(+), 409 deletions(-)
--




[1/2] beam git commit: Unbundle Context and WindowedContext.

2017-07-13 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 889776fca -> 5f972e8b2


Unbundle Context and WindowedContext.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/64997efa
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/64997efa
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/64997efa

Branch: refs/heads/master
Commit: 64997efa597a6fd74f4a6b6a7ab48d663c56845f
Parents: 91c7d3d
Author: Reuven Lax 
Authored: Mon Jul 10 21:30:50 2017 -0700
Committer: Kenneth Knowles 
Committed: Thu Jul 13 09:29:23 2017 -0700

--
 .../examples/common/WriteOneFilePerWindow.java  |  19 +-
 .../complete/game/utils/WriteToText.java|  18 +-
 .../construction/WriteFilesTranslationTest.java |  12 +-
 .../beam/sdk/io/DefaultFilenamePolicy.java  |  47 ++--
 .../org/apache/beam/sdk/io/FileBasedSink.java   | 198 --
 .../java/org/apache/beam/sdk/io/AvroIOTest.java | 263 ++-
 .../apache/beam/sdk/io/FileBasedSinkTest.java   |  88 +++
 .../org/apache/beam/sdk/io/WriteFilesTest.java  | 122 -
 8 files changed, 358 insertions(+), 409 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/64997efa/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
--
diff --git 
a/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
 
b/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
index 49865ba..abd14b7 100644
--- 
a/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
+++ 
b/examples/java/src/main/java/org/apache/beam/examples/common/WriteOneFilePerWindow.java
@@ -28,7 +28,9 @@ import 
org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
 import org.apache.beam.sdk.io.fs.ResourceId;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
+import org.apache.beam.sdk.transforms.windowing.PaneInfo;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PDone;
 import org.joda.time.format.DateTimeFormatter;
@@ -88,14 +90,18 @@ public class WriteOneFilePerWindow extends PTransform<PCollection<String>, PDone>

[jira] [Commented] (BEAM-2612) support variance builtin aggregation function

2017-07-13 Thread Kai Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086340#comment-16086340
 ] 

Kai Jiang commented on BEAM-2612:
-

Yes, I am working on this file.

> support variance builtin aggregation function
> -
>
> Key: BEAM-2612
> URL: https://issues.apache.org/jira/browse/BEAM-2612
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>
> two builtin aggregate functions
> VAR_POP
> the population variance (square of the population standard deviation)
> VAR_SAMP
> the sample variance (square of the sample standard deviation)
> https://calcite.apache.org/docs/reference.html#aggregate-functions
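> For reference, a minimal sketch of an accumulator covering both definitions
> (illustrative only; it uses the numerically naive sum-of-squares form, which
> a production CombineFn would want to harden against cancellation and empty
> input):
> {code}
> // var_pop  = sum(x^2)/n - mean^2
> // var_samp = (sum(x^2) - sum(x)^2 / n) / (n - 1)
> class VarianceAccumulator {
>   long count;
>   double sum;
>   double sumOfSquares;
>
>   void add(double x) {
>     count++;
>     sum += x;
>     sumOfSquares += x * x;
>   }
>
>   double varPop() {
>     double mean = sum / count;
>     return sumOfSquares / count - mean * mean;
>   }
>
>   double varSamp() {
>     return (sumOfSquares - sum * sum / count) / (count - 1);
>   }
> }
> {code}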



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #3572

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086259#comment-16086259
 ] 

Chamikara Jayalath commented on BEAM-1286:
--

Assigned JIRA to Kamil BTW. Thanks.

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Kamil Szewczyk
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate user experience.
> At least, I guess that if nothing to be staged is found, we should fail.
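> A minimal sketch of the fail-fast check being proposed (illustrative names;
> not the DataflowRunner's actual code):
> {code}
> static void validateFilesToStage(List<String> filesToStage) {
>   for (String path : filesToStage) {
>     if (!new File(path).exists()) {
>       // Fail eagerly instead of logging an error and moving on.
>       throw new IllegalArgumentException("File to stage does not exist: " + path);
>     }
>   }
> }
> {code}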



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086258#comment-16086258
 ] 

Chamikara Jayalath commented on BEAM-1286:
--

You have to be a project administrator to add users to roles, right? Looks 
like I don't have access to do that.

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Kamil Szewczyk
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate user experience.
> At least, I guess that if nothing to be staged is found, we should fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Chamikara Jayalath (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath reassigned BEAM-1286:


Assignee: Kamil Szewczyk

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Kamil Szewczyk
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate user experience.
> At least, I guess that if nothing to be staged is found, we should fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2622

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086237#comment-16086237
 ] 

Kenneth Knowles commented on BEAM-1286:
---

OK - the role Contributor needed to be added, and now you should be able to 
handle it.

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate user experience.
> At least, I guess that if nothing to be staged is found, we should fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086235#comment-16086235
 ] 

Kenneth Knowles commented on BEAM-1286:
---

Are you unable to assign? I do know that contributors have grabbed JIRAs for 
themselves, so [~szewinho] should be able to take the issue. Can you just try 
again before I do it?

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate user experience.
> At least, I guess that if nothing to be staged is found, we should fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086231#comment-16086231
 ] 

Dmitry Demeshchuk commented on BEAM-2572:
-

[~ste...@apache.org]: Hundred percent agree. My thinking was to make a special 
credentials class that hides the attribute values, making them inaccessible via 
the __getattr__() method.

If we run on EC2 and the instance has the right IAM instance profile, that's 
fine. Passing credentials should probably be optional, and I'd expect that 
whoever runs a pipeline through Beam would be at least aware of what the runner 
is and what information it requires.

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Dmitry Demeshchuk
>Assignee: Ahmet Altay
>Priority: Minor
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086224#comment-16086224
 ] 

Chamikara Jayalath commented on BEAM-1286:
--

[~davor] [~kenn] could one of you assign this to Kamil?

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate user experience.
> At least, I guess that if nothing to be staged is found, we should fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Dmitry Demeshchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086202#comment-16086202
 ] 

Dmitry Demeshchuk commented on BEAM-2572:
-

Hi Cham,

That sounds good, let's focus on the FileSystem sub-classes; as you said, since 
they get constructed at runtime, they are somewhat special compared to regular 
PTransforms.

It looks like so far we've been focusing on the implementation side of things: 
using pre-setup environment and/or creating a custom package for passing the 
parameters, passing the parameters to the FileSystem class, etc. What if we 
try to discuss the user interface first, and then for each option see what the 
implementation would be? It is likely that some interface proposals will turn 
out not to be viable due to Beam's execution context or stuff like that, but 
that's fine.

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Dmitry Demeshchuk
>Assignee: Ahmet Altay
>Priority: Minor
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3392

2017-07-13 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2621

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2556) Client-side throttling for Datastore connector

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086165#comment-16086165
 ] 

ASF GitHub Bot commented on BEAM-2556:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3558


> Client-side throttling for Datastore connector
> --
>
> Key: BEAM-2556
> URL: https://issues.apache.org/jira/browse/BEAM-2556
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Colin Phipps
>Assignee: Colin Phipps
>Priority: Minor
>  Labels: datastore
>
> The Datastore connector currently has exponential backoff on errors, which is 
> good. But it does not do any other throttling of its write load in response 
> to errors; once a request succeeds, it resumes writing as quickly as it can.
> Write loads will be more stable and more likely to complete if the client 
> throttles itself in the event that it receives high rates of errors from the 
> Datastore service; specifically 
> https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg
>  is a technique that Google has had success with on other services.
> We (Datastore) have a patch in progress to add this behaviour to the 
> connector.
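> For reference, a minimal sketch of the adaptive throttling described in that
> chapter (illustrative, not the patch itself): track attempted requests and
> backend accepts, and reject locally with probability
> max(0, (requests - K * accepts) / (requests + 1)). The trailing-window
> bookkeeping from the chapter is omitted for brevity:
> {code}
> import java.util.Random;
>
> class AdaptiveThrottler {
>   private static final double K = 2.0; // multiplier suggested in the SRE book
>   private long requests; // requests attempted by the application layer
>   private long accepts;  // requests the backend actually accepted
>
>   // Probability with which to reject a new request locally.
>   double rejectionProbability() {
>     return Math.max(0.0, (requests - K * accepts) / (requests + 1.0));
>   }
>
>   synchronized boolean tryRequest(Random rng) {
>     requests++;
>     if (rng.nextDouble() < rejectionProbability()) {
>       return false; // throttled locally, without hitting the service
>     }
>     return true;
>   }
>
>   synchronized void recordAccept() {
>     accepts++;
>   }
> }
> {code}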



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3558: [BEAM-2556] Implement retries in the read connector...

2017-07-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3558


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: Implement retries in the read connector.

2017-07-13 Thread chamikara
Repository: beam
Updated Branches:
  refs/heads/master 66b4a1be0 -> 889776fca


Implement retries in the read connector.

Respect non-retryable error codes from Datastore.
Add unit tests to check that retryable errors are retried.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/016baf34
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/016baf34
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/016baf34

Branch: refs/heads/master
Commit: 016baf3465bbccbc9d3df6999b38b1b2533aee8c
Parents: 66b4a1b
Author: Colin Phipps 
Authored: Mon Jul 10 16:09:23 2017 +
Committer: Colin Phipps 
Committed: Thu Jul 13 11:11:21 2017 +

--
 .../beam/sdk/io/gcp/datastore/DatastoreV1.java  | 45 -
 .../sdk/io/gcp/datastore/DatastoreV1Test.java   | 51 +++-
 2 files changed, 94 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/016baf34/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
index 5f65428..1ed6430 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore/DatastoreV1.java
@@ -40,6 +40,7 @@ import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.MoreObjects;
 import com.google.common.base.Strings;
 import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableSet;
 import com.google.datastore.v1.CommitRequest;
 import com.google.datastore.v1.Entity;
 import com.google.datastore.v1.EntityResult;
@@ -65,6 +66,7 @@ import java.io.Serializable;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.NoSuchElementException;
+import java.util.Set;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.PipelineRunner;
 import org.apache.beam.sdk.annotations.Experimental;
@@ -238,6 +240,14 @@ public class DatastoreV1 {
   static final int DATASTORE_BATCH_UPDATE_BYTES_LIMIT = 9_000_000;
 
   /**
+   * Non-retryable errors.
+   * See https://cloud.google.com/datastore/docs/concepts/errors#Error_Codes .
+   */
+  private static final Set<Code> NON_RETRYABLE_ERRORS =
+      ImmutableSet.of(Code.FAILED_PRECONDITION, Code.INVALID_ARGUMENT, Code.PERMISSION_DENIED,
+          Code.UNAUTHENTICATED);
+
+  /**
   * Returns an empty {@link DatastoreV1.Read} builder. Configure the source {@code projectId},
   * {@code query}, and optionally {@code namespace} and {@code numQuerySplits} using
   * {@link DatastoreV1.Read#withProjectId}, {@link DatastoreV1.Read#withQuery},
@@ -840,6 +850,14 @@ public class DatastoreV1 {
   private final V1DatastoreFactory datastoreFactory;
   // Datastore client
   private transient Datastore datastore;
+  private final Counter rpcErrors =
+      Metrics.counter(DatastoreWriterFn.class, "datastoreRpcErrors");
+  private final Counter rpcSuccesses =
+      Metrics.counter(DatastoreWriterFn.class, "datastoreRpcSuccesses");
+  private static final int MAX_RETRIES = 5;
+  private static final FluentBackoff RUNQUERY_BACKOFF =
+      FluentBackoff.DEFAULT
+          .withMaxRetries(MAX_RETRIES).withInitialBackoff(Duration.standardSeconds(5));
 
   public ReadFn(V1Options options) {
 this(options, new V1DatastoreFactory());
@@ -857,6 +875,28 @@ public class DatastoreV1 {
 options.getLocalhost());
   }
 
+  private RunQueryResponse runQueryWithRetries(RunQueryRequest request) throws Exception {
+    Sleeper sleeper = Sleeper.DEFAULT;
+    BackOff backoff = RUNQUERY_BACKOFF.backoff();
+    while (true) {
+      try {
+        RunQueryResponse response = datastore.runQuery(request);
+        rpcSuccesses.inc();
+        return response;
+      } catch (DatastoreException exception) {
+        rpcErrors.inc();
+
+        if (NON_RETRYABLE_ERRORS.contains(exception.getCode())) {
+          throw exception;
+        }
+        if (!BackOffUtils.next(sleeper, backoff)) {
+          LOG.error("Aborting after {} retries.", MAX_RETRIES);
+          throw exception;
+        }
+      }
+    }
+  }
+
   /** Read and output entities for the given query. */
   @ProcessElement
   public void processElement(ProcessContext context) throws Exception {
@@ -878,7 +918,7 @@ public class DatastoreV1 {
  

[2/2] beam git commit: This closes #3558

2017-07-13 Thread chamikara
This closes #3558


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/889776fc
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/889776fc
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/889776fc

Branch: refs/heads/master
Commit: 889776fcad72e45a93ce4e206ee728595361b1cb
Parents: 66b4a1b 016baf3
Author: Chamikara Jayalath 
Authored: Thu Jul 13 11:28:28 2017 -0700
Committer: Chamikara Jayalath 
Committed: Thu Jul 13 11:28:28 2017 -0700

--
 .../beam/sdk/io/gcp/datastore/DatastoreV1.java  | 45 -
 .../sdk/io/gcp/datastore/DatastoreV1Test.java   | 51 +++-
 2 files changed, 94 insertions(+), 2 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Flink #3391

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2620) Revisit GroupByKey.createWithFewKeys

2017-07-13 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2620:
-

 Summary: Revisit GroupByKey.createWithFewKeys
 Key: BEAM-2620
 URL: https://issues.apache.org/jira/browse/BEAM-2620
 Project: Beam
  Issue Type: Bug
  Components: beam-model, beam-model-runner-api, sdk-java-core
Reporter: Thomas Groh


This doesn't have a parallel within the GroupByKeyPayload, so there's currently 
no way to send it through the Runner API.

The place it will almost always be created is in a {{Combine.globally()}}.

It's potentially useful as an optimizer hint. The Dataflow Runner in streaming 
mode disables combiner lifting unless the GroupByKey has the fewKeys property 
set to true. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2550) test JOINs with DSL methods

2017-07-13 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin closed BEAM-2550.

   Resolution: Fixed
Fix Version/s: 2.2.0

> test JOINs with DSL methods
> ---
>
> Key: BEAM-2550
> URL: https://issues.apache.org/jira/browse/BEAM-2550
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: James Xu
>  Labels: dsl_sql_merge
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2555) add README page in dsl/sql

2017-07-13 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086080#comment-16086080
 ] 

Xu Mingmin commented on BEAM-2555:
--

Maybe remove the existing one, as I don't see any other module that has one.

[~xumingming] [~takidau] any comments?

> add README page in dsl/sql
> --
>
> Key: BEAM-2555
> URL: https://issues.apache.org/jira/browse/BEAM-2555
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2159) CAST operator support

2017-07-13 Thread Tarush Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086075#comment-16086075
 ] 

Tarush Grover commented on BEAM-2159:
-

Task is complete. Please close this issue.

> CAST operator support
> -
>
> Key: BEAM-2159
> URL: https://issues.apache.org/jira/browse/BEAM-2159
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2424) CAST operator supporting numeric, date and timestamp types

2017-07-13 Thread Tarush Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarush Grover closed BEAM-2424.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

> CAST operator supporting numeric, date and timestamp types
> --
>
> Key: BEAM-2424
> URL: https://issues.apache.org/jira/browse/BEAM-2424
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Tarush Grover
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2618) Add a GroupByKeyTest where the inputs are windowed into SlidingWindows

2017-07-13 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2618:
-

 Summary: Add a GroupByKeyTest where the inputs are windowed into 
SlidingWindows
 Key: BEAM-2618
 URL: https://issues.apache.org/jira/browse/BEAM-2618
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh
Assignee: Thomas Groh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2171) Power function

2017-07-13 Thread Tarush Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarush Grover closed BEAM-2171.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

> Power function
> --
>
> Key: BEAM-2171
> URL: https://issues.apache.org/jira/browse/BEAM-2171
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Tarush Grover
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2619) Add a GroupByKeyTest where the input is windowed into sessions

2017-07-13 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2619:
-

 Summary: Add a GroupByKeyTest where the input is windowed into 
sessions
 Key: BEAM-2619
 URL: https://issues.apache.org/jira/browse/BEAM-2619
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh
Assignee: Thomas Groh


This demonstrates the merging behavior.
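A minimal sketch of such a test, assuming the standard Beam testing utilities
(a TestPipeline named {{pipeline}}); this is not the actual test added for the
issue:

{code}
PCollection<KV<String, Iterable<Integer>>> grouped =
    pipeline
        .apply(Create.timestamped(
            TimestampedValue.of(KV.of("k", 1), new Instant(0)),
            TimestampedValue.of(KV.of("k", 2), new Instant(5000))))
        .apply(Window.<KV<String, Integer>>into(
            Sessions.withGapDuration(Duration.standardSeconds(10))))
        .apply(GroupByKey.<String, Integer>create());
// The two timestamps fall within one gap duration of each other, so the
// per-key sessions merge and both values land in a single group for key "k".
pipeline.run();
{code}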



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2171) Power function

2017-07-13 Thread Tarush Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086072#comment-16086072
 ] 

Tarush Grover commented on BEAM-2171:
-

Yes this task is finished. Closing it.

> Power function
> --
>
> Key: BEAM-2171
> URL: https://issues.apache.org/jira/browse/BEAM-2171
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Tarush Grover
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-2616) Add ViewTest with sessions

2017-07-13 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2616:
-

 Summary: Add ViewTest with sessions
 Key: BEAM-2616
 URL: https://issues.apache.org/jira/browse/BEAM-2616
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh


Reading should be exercised.

Writing a view whose input is windowed into sessions should be forbidden by 
default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3559: Retry on Datastore client python socket errors

2017-07-13 Thread vikkyrk
GitHub user vikkyrk opened a pull request:

https://github.com/apache/beam/pull/3559

Retry on Datastore client python socket errors

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vikkyrk/incubator-beam py_ds_retry

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3559


commit 288def3a556ded4dc4c116dd7b7c9ff42b671ef8
Author: Vikas Kedigehalli 
Date:   2017-07-13T17:29:23Z

datastoreio: retry on socket errors




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (BEAM-2615) Add ViewTests with SlidingWindows

2017-07-13 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2615:
-

 Summary: Add ViewTests with SlidingWindows
 Key: BEAM-2615
 URL: https://issues.apache.org/jira/browse/BEAM-2615
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh
Assignee: Davor Bonaci


For both reading and writing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2171) Power function

2017-07-13 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086065#comment-16086065
 ] 

Xu Mingmin commented on BEAM-2171:
--

[~app-tarush], this task is finished, right? Can you close it?

> Power function
> --
>
> Key: BEAM-2171
> URL: https://issues.apache.org/jira/browse/BEAM-2171
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Tarush Grover
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2281) call SqlFunctions in operator implementation

2017-07-13 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin updated BEAM-2281:
-
Labels:   (was: dsl_sql_merge)

> call SqlFunctions in operator implementation
> 
>
> Key: BEAM-2281
> URL: https://issues.apache.org/jira/browse/BEAM-2281
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Xu Mingmin
>
> Calcite has a collection of functions in 
> {{org.apache.calcite.runtime.SqlFunctions}}. It sounds like a good source to 
> leverage when adding operators as {{BeamSqlExpression}}. 
> [~xumingming] [~app-tarush], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2612) support variance builtin aggregation function

2017-07-13 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086059#comment-16086059
 ] 

Xu Mingmin commented on BEAM-2612:
--

cool, you can refer to 
https://github.com/apache/beam/blob/DSL_SQL/dsls/sql/src/main/java/org/apache/beam/dsls/sql/transform/BeamBuiltinAggregations.java
 as examples.

> support variance builtin aggregation function
> -
>
> Key: BEAM-2612
> URL: https://issues.apache.org/jira/browse/BEAM-2612
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>
> two builtin aggregate functions
> VAR_POP
> the population variance (square of the population standard deviation)
> VAR_SAMP
> the sample variance (square of the sample standard deviation)
> https://calcite.apache.org/docs/reference.html#aggregate-functions



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-1962) Connection should be closed in case start() throws exception

2017-07-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated BEAM-1962:
-
Description: 
In JmsIO#start() :

{code}
  try {
Connection connection;
if (spec.getUsername() != null) {
  connection =
  connectionFactory.createConnection(spec.getUsername(), 
spec.getPassword());
} else {
  connection = connectionFactory.createConnection();
}
connection.start();
this.connection = connection;
  } catch (Exception e) {
throw new IOException("Error connecting to JMS", e);
  }
{code}
If start() throws exception, connection should be closed.

  was:
In JmsIO#start() :
{code}
  try {
Connection connection;
if (spec.getUsername() != null) {
  connection =
  connectionFactory.createConnection(spec.getUsername(), 
spec.getPassword());
} else {
  connection = connectionFactory.createConnection();
}
connection.start();
this.connection = connection;
  } catch (Exception e) {
throw new IOException("Error connecting to JMS", e);
  }
{code}
If start() throws exception, connection should be closed.


> Connection should be closed in case start() throws exception
> 
>
> Key: BEAM-1962
> URL: https://issues.apache.org/jira/browse/BEAM-1962
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Ted Yu
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> In JmsIO#start() :
> {code}
>   try {
> Connection connection;
> if (spec.getUsername() != null) {
>   connection =
>   connectionFactory.createConnection(spec.getUsername(), 
> spec.getPassword());
> } else {
>   connection = connectionFactory.createConnection();
> }
> connection.start();
> this.connection = connection;
>   } catch (Exception e) {
> throw new IOException("Error connecting to JMS", e);
>   }
> {code}
> If start() throws exception, connection should be closed.
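> A minimal sketch of the proposed fix (illustrative; an actual patch may 
> differ): close the half-open connection before rethrowing, so it is not 
> leaked when start() fails.
> {code}
> Connection connection = null;
> try {
>   if (spec.getUsername() != null) {
>     connection = connectionFactory.createConnection(spec.getUsername(), spec.getPassword());
>   } else {
>     connection = connectionFactory.createConnection();
>   }
>   connection.start();
>   this.connection = connection;
> } catch (Exception e) {
>   if (connection != null) {
>     try {
>       connection.close();
>     } catch (Exception ignored) {
>       // Best effort; the original failure is the one worth reporting.
>     }
>   }
>   throw new IOException("Error connecting to JMS", e);
> }
> {code}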



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2335) Document various maven commands for running tests

2017-07-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated BEAM-2335:
-
Labels: document  (was: )

> Document various maven commands for running tests
> -
>
> Key: BEAM-2335
> URL: https://issues.apache.org/jira/browse/BEAM-2335
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ted Yu
>  Labels: document
>
> In this discussion thread, various maven commands for running / not running 
> selected tests were mentioned:
> http://search-hadoop.com/m/Beam/gfKHFd9bPDh5WJr1?subj=Re+How+can+I+disable+running+Python+SDK+tests+when+testing+my+Java+change+
> We should document these commands under 
> https://beam.apache.org/contribute/testing/ 
> Borisa raised the following questions:
> how do I execute only one test marked as @NeedsRunner?
> How do I execute one specific test in java io?
> How to execute one specific test in any of the runners?
> How to use beamTestpipelineoptions with few json examples?
> Will mvn clean verify execute ALL tests against all runners?
> For the #1 above, we can create a profile which is used to run tests in the 
> NeedsRunner category.
> See the following:
> http://stackoverflow.com/questions/3100924/how-to-run-junit-tests-by-category-in-maven
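> For the category mechanism in #1, a minimal illustration (assumed names; 
> Beam's actual wiring lives in its test configuration):
> {code}
> // Illustrative only. Tagging a test with a JUnit category lets a Maven
> // profile include or exclude it via the Surefire <groups> configuration,
> // e.g. org.apache.beam.sdk.testing.NeedsRunner.
> @Category(NeedsRunner.class)
> @Test
> public void testRequiresRunner() {
>   // test body that needs a pipeline runner to execute
> }
> {code}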



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (BEAM-2610) upgrade to version 2.2.0

2017-07-13 Thread Xu Mingmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Mingmin closed BEAM-2610.

   Resolution: Fixed
Fix Version/s: 2.2.0

> upgrade to version 2.2.0
> 
>
> Key: BEAM-2610
> URL: https://issues.apache.org/jira/browse/BEAM-2610
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
> Fix For: 2.2.0
>
>
> This task syncs changes from master branch which is now using version 
> 2.2.0-SNAPSHOT. 
> As usual, there will be two PRs,
> 1. a pull request from master to DSL_SQL, this one is merged by ignoring any 
> errors;
> 2. a second PR to finish the change in DSL_SQL, and also fix any potential 
> issue;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2610) upgrade to version 2.2.0

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085904#comment-16085904
 ] 

ASF GitHub Bot commented on BEAM-2610:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3554


> upgrade to version 2.2.0
> 
>
> Key: BEAM-2610
> URL: https://issues.apache.org/jira/browse/BEAM-2610
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>  Labels: dsl_sql_merge
>
> This task syncs changes from master branch which is now using version 
> 2.2.0-SNAPSHOT. 
> As usual, there will be two PRs,
> 1. a pull request from master to DSL_SQL, this one is merged by ignoring any 
> errors;
> 2. a second PR to finish the change in DSL_SQL, and also fix any potential 
> issue;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3554: [BEAM-2610] upgrade to version 2.2.0

2017-07-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3554


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: [BEAM-2610] This closes #3554

2017-07-13 Thread takidau
[BEAM-2610] This closes #3554


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/5fea7463
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/5fea7463
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/5fea7463

Branch: refs/heads/DSL_SQL
Commit: 5fea74638551c4e1928ce0f3ddea833354764d9f
Parents: ec494f6 45a3fe0
Author: Tyler Akidau 
Authored: Thu Jul 13 08:56:35 2017 -0700
Committer: Tyler Akidau 
Committed: Thu Jul 13 08:56:35 2017 -0700

--
 dsls/pom.xml | 2 +-
 dsls/sql/pom.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--




[1/2] beam git commit: upgrade pom to 2.2.0-SNAPSHOT

2017-07-13 Thread takidau
Repository: beam
Updated Branches:
  refs/heads/DSL_SQL ec494f675 -> 5fea74638


upgrade pom to 2.2.0-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/45a3fe0a
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/45a3fe0a
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/45a3fe0a

Branch: refs/heads/DSL_SQL
Commit: 45a3fe0a4c97684d44eb94c1a1da4515ad3779c2
Parents: ec494f6
Author: mingmxu 
Authored: Wed Jul 12 19:04:08 2017 -0700
Committer: mingmxu 
Committed: Wed Jul 12 20:14:54 2017 -0700

--
 dsls/pom.xml | 2 +-
 dsls/sql/pom.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/45a3fe0a/dsls/pom.xml
--
diff --git a/dsls/pom.xml b/dsls/pom.xml
index a518d03..1647114 100644
--- a/dsls/pom.xml
+++ b/dsls/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.beam</groupId>
     <artifactId>beam-parent</artifactId>
-    <version>2.1.0-SNAPSHOT</version>
+    <version>2.2.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 

http://git-wip-us.apache.org/repos/asf/beam/blob/45a3fe0a/dsls/sql/pom.xml
--
diff --git a/dsls/sql/pom.xml b/dsls/sql/pom.xml
index 54f590e..5e670a0 100644
--- a/dsls/sql/pom.xml
+++ b/dsls/sql/pom.xml
@@ -24,7 +24,7 @@
   <parent>
     <groupId>org.apache.beam</groupId>
    <artifactId>beam-dsls-parent</artifactId>
-    <version>2.1.0-SNAPSHOT</version>
+    <version>2.2.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 



Jenkins build is back to normal : beam_PostCommit_Java_ValidatesRunner_Apex #1975

2017-07-13 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2017-07-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085648#comment-16085648
 ] 

Steve Loughran commented on BEAM-2572:
--

bear in mind that credentials are highly sensitive facts, which mustn't leak 
into logs, stack traces, bug reports. If credentials are to be passed around 
this way, make sure that they are never visible.

note also that if you are running on EC2, you get session credentials for free 
from the IAM service; all you need to do is set up the auth chain right.
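The masking idea is language-agnostic; a minimal sketch in Java with 
illustrative names, keeping the secret out of every default rendering path:

{code}
final class MaskedCredentials {
  private final transient String secretKey; // excluded from default serialization

  MaskedCredentials(String secretKey) {
    this.secretKey = secretKey;
  }

  String reveal() {
    return secretKey; // the only deliberate access path
  }

  @Override
  public String toString() {
    return "MaskedCredentials{secretKey=****}"; // never echo the secret
  }
}
{code}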

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py
>Reporter: Dmitry Demeshchuk
>Assignee: Ahmet Altay
>Priority: Minor
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Build failed in Jenkins: beam_PostCommit_Java_ValidatesRunner_Spark #2620

2017-07-13 Thread Apache Jenkins Server
See 


--
[...truncated 449.15 KB...]
2017-07-13T12:27:04.466 [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterAllStateMachineTest
2017-07-13T12:27:04.466 [INFO] Running 
org.apache.beam.runners.core.triggers.ReshuffleTriggerStateMachineTest
2017-07-13T12:27:04.469 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.ReshuffleTriggerStateMachineTest
2017-07-13T12:27:04.469 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterProcessingTimeStateMachineTest
2017-07-13T12:27:04.472 [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.AfterProcessingTimeStateMachineTest
2017-07-13T12:27:04.472 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterPaneStateMachineTest
2017-07-13T12:27:04.474 [INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0 s - in 
org.apache.beam.runners.core.triggers.AfterPaneStateMachineTest
2017-07-13T12:27:04.474 [INFO] Running 
org.apache.beam.runners.core.triggers.AfterSynchronizedProcessingTimeStateMachineTest
2017-07-13T12:27:04.477 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 
0, Time elapsed: 0.001 s - in 
org.apache.beam.runners.core.triggers.AfterSynchronizedProcessingTimeStateMachineTest
2017-07-13T12:27:05.103 [INFO] 
2017-07-13T12:27:05.103 [INFO] Results:
2017-07-13T12:27:05.103 [INFO] 
2017-07-13T12:27:05.103 [INFO] Tests run: 227, Failures: 0, Errors: 0, Skipped: 0
2017-07-13T12:27:05.103 [INFO] 
[JENKINS] Recording test results
2017-07-13T12:27:10.255 [INFO] 
2017-07-13T12:27:10.255 [INFO] --- 
build-helper-maven-plugin:3.0.0:regex-properties (render-artifact-id) @ 
beam-runners-core-java ---
2017-07-13T12:27:10.307 [INFO] 
2017-07-13T12:27:10.307 [INFO] --- jacoco-maven-plugin:0.7.8:report (report) @ 
beam-runners-core-java ---
2017-07-13T12:27:10.308 [INFO] Loading execution data file 

2017-07-13T12:27:10.343 [INFO] Analyzed bundle 'Apache Beam :: Runners :: Core 
Java' with 193 classes
2017-07-13T12:27:10.736 [INFO] 
2017-07-13T12:27:10.736 [INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ 
beam-runners-core-java ---
2017-07-13T12:27:10.765 [INFO] Building jar: 

2017-07-13T12:27:10.905 [INFO] 
2017-07-13T12:27:10.905 [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ beam-runners-core-java ---
2017-07-13T12:27:12.080 [INFO] 
2017-07-13T12:27:12.080 [INFO] --- maven-jar-plugin:3.0.2:test-jar 
(default-test-jar) @ beam-runners-core-java ---
2017-07-13T12:27:12.094 [INFO] Building jar: 

2017-07-13T12:27:12.161 [INFO] 
2017-07-13T12:27:12.161 [INFO] --- maven-shade-plugin:3.0.0:shade 
(bundle-and-repackage) @ beam-runners-core-java ---
2017-07-13T12:27:12.164 [INFO] Excluding 
org.apache.beam:beam-sdks-java-core:jar:2.2.0-SNAPSHOT from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
com.google.protobuf:protobuf-java:jar:3.2.0 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-core:jar:2.8.9 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
com.fasterxml.jackson.core:jackson-databind:jar:2.8.9 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.14 from 
the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding net.bytebuddy:byte-buddy:jar:1.6.8 
from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding org.apache.avro:avro:jar:1.8.2 from 
the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
org.codehaus.jackson:jackson-core-asl:jar:1.9.13 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
com.thoughtworks.paranamer:paranamer:jar:2.7 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding org.tukaani:xz:jar:1.5 from the shaded 
jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
org.xerial.snappy:snappy-java:jar:1.1.4-M3 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
org.apache.commons:commons-compress:jar:1.14 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 
org.apache.commons:commons-lang3:jar:3.6 from the shaded jar.
2017-07-13T12:27:12.164 [INFO] Excluding 

[jira] [Created] (BEAM-2614) Harness doesn't build with Java7

2017-07-13 Thread JIRA
Jean-Baptiste Onofré created BEAM-2614:
--

 Summary: Harness doesn't build with Java7
 Key: BEAM-2614
 URL: https://issues.apache.org/jira/browse/BEAM-2614
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-extensions
Affects Versions: 2.0.0, 2.1.0
Reporter: Jean-Baptiste Onofré
Assignee: Jean-Baptiste Onofré


Beam is supposed to build fully with Java 7. However, the {{harness}} module 
doesn't:

{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project beam-sdks-java-harness: Fatal error compiling: invalid target 
release: 1.8 -> [Help 1]
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-1286) DataflowRunner handling of missing filesToStage

2017-07-13 Thread Kamil Szewczyk (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085604#comment-16085604
 ] 

Kamil Szewczyk commented on BEAM-1286:
--

OK, so could someone assign this task to me? I will start contributing.

> DataflowRunner handling of missing filesToStage
> ---
>
> Key: BEAM-1286
> URL: https://issues.apache.org/jira/browse/BEAM-1286
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>  Labels: newbie, starter
>
> DataflowRunner allows filesToStage to be missing -- it logs an error and 
> moves on. Is this the right behavior? It can complicate the user experience.
> At a minimum, if nothing to stage is found, we should fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
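
For illustration, a minimal sketch of the fail-fast behavior proposed above; the helper class and its message are hypothetical, not DataflowRunner's actual code, and only `filesToStage` is the real pipeline option:

{code}
import java.util.List;

// Hypothetical validation helper sketching the stricter behavior proposed
// in BEAM-1286: fail fast instead of logging an error and moving on.
class StagingValidator {
  static void validate(List<String> filesToStage) {
    if (filesToStage == null || filesToStage.isEmpty()) {
      throw new IllegalArgumentException(
          "No files to stage were found; the job would run without its "
              + "classpath. Set --filesToStage explicitly.");
    }
  }
}
{code}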


[jira] [Commented] (BEAM-2556) Client-side throttling for Datastore connector

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085603#comment-16085603
 ] 

ASF GitHub Bot commented on BEAM-2556:
--

GitHub user cph6 opened a pull request:

https://github.com/apache/beam/pull/3558

[BEAM-2556] Implement retries in the read connector.

Retry failed RunQuery calls.
Respect non-retryable error codes from Datastore.
Add unit tests to check that retryable errors are retried.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cph6/beam datastore_better_error_handling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3558


commit 016baf3465bbccbc9d3df6999b38b1b2533aee8c
Author: Colin Phipps 
Date:   2017-07-10T16:09:23Z

Implement retries in the read connector.

Respect non-retryable error codes from Datastore.
Add unit tests to check that retryable errors are retried.




> Client-side throttling for Datastore connector
> --
>
> Key: BEAM-2556
> URL: https://issues.apache.org/jira/browse/BEAM-2556
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Colin Phipps
>Assignee: Colin Phipps
>Priority: Minor
>  Labels: datastore
>
> The Datastore connector currently has exponential backoff on errors, which is 
> good. But it does not do any other throttling of its write load in response 
> to errors; once a request succeeds, it resumes writing as quickly as it can.
> Write loads will be more stable and more likely to complete if the client 
> throttles itself when it receives high rates of errors from the 
> Datastore service; specifically 
> https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg
>  is a technique that Google has had success with on other services.
> We (Datastore) have a patch in progress to add this behaviour to the 
> connector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
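
For illustration, a minimal sketch of the client-side adaptive throttling technique from the SRE chapter cited above; the class and method names are illustrative rather than the connector's actual patch, and the sliding window the chapter uses is omitted for brevity:

{code}
import java.util.concurrent.ThreadLocalRandom;

// Adaptive throttling: reject a request locally with probability
// max(0, (requests - K * accepts) / (requests + 1)).
class AdaptiveThrottler {
  private final double k; // multiplier; the chapter suggests K around 2
  private long requests;  // requests attempted by the application layer
  private long accepts;   // requests the backend actually accepted

  AdaptiveThrottler(double k) { this.k = k; }

  /** Returns false when the request should be rejected locally. */
  synchronized boolean tryAcquire() {
    double pReject = Math.max(0, (requests - k * accepts) / (requests + 1.0));
    requests++;
    return ThreadLocalRandom.current().nextDouble() >= pReject;
  }

  /** Record that the backend accepted a request. */
  synchronized void recordAccept() { accepts++; }
}
{code}

Because locally rejected requests still count toward {{requests}}, the local rejection rate rises as the backend's accept rate falls, which keeps a struggling service from being hammered the moment it starts answering again.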


[GitHub] beam pull request #3558: [BEAM-2556] Implement retries in the read connector...

2017-07-13 Thread cph6
GitHub user cph6 opened a pull request:

https://github.com/apache/beam/pull/3558

[BEAM-2556] Implement retries in the read connector.

Retry failed RunQuery calls.
Respect non-retryable error codes from Datastore.
Add unit tests to check that retryable errors are retried.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cph6/beam datastore_better_error_handling

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3558


commit 016baf3465bbccbc9d3df6999b38b1b2533aee8c
Author: Colin Phipps 
Date:   2017-07-10T16:09:23Z

Implement retries in the read connector.

Respect non-retryable error codes from Datastore.
Add unit tests to check that retryable errors are retried.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
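
For illustration, a sketch of the retry shape this PR describes: retry transient RunQuery failures with exponential backoff and surface non-retryable ones immediately. The exception type and the set of codes treated as retryable are assumptions, not the PR's actual code:

{code}
import java.io.IOException;
import java.util.concurrent.Callable;

class RetryingQueryRunner {

  // Hypothetical exception carrying a Datastore-style status code.
  static class StatusException extends IOException {
    final int code;
    StatusException(int code) { this.code = code; }
  }

  // Assumed transient codes: DEADLINE_EXCEEDED(4), ABORTED(10),
  // INTERNAL(13), UNAVAILABLE(14); the connector's real set may differ.
  private static boolean isRetryable(int code) {
    return code == 4 || code == 10 || code == 13 || code == 14;
  }

  static <T> T runWithRetries(Callable<T> runQuery, int maxAttempts)
      throws Exception {
    long backoffMillis = 100;
    for (int attempt = 1; ; attempt++) {
      try {
        return runQuery.call(); // e.g. one RunQuery RPC
      } catch (StatusException e) {
        if (!isRetryable(e.code) || attempt >= maxAttempts) {
          throw e; // non-retryable, or out of attempts
        }
        Thread.sleep(backoffMillis);
        backoffMillis *= 2; // exponential backoff between attempts
      }
    }
  }
}
{code}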


[jira] [Updated] (BEAM-2565) Add integration test for CASE operators

2017-07-13 Thread James Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Xu updated BEAM-2565:
---
Summary: Add integration test for CASE operators  (was: Add integration 
test for OTHER operators)

> Add integration test for CASE operators
> ---
>
> Key: BEAM-2565
> URL: https://issues.apache.org/jira/browse/BEAM-2565
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to normal : beam_PostCommit_Python_Verify #2717

2017-07-13 Thread Apache Jenkins Server
See 




svn commit: r20441 - in /dev/beam/2.1.0: apache-beam-2.1.0-source-release.zip apache-beam-2.1.0-source-release.zip.asc apache-beam-2.1.0-source-release.zip.md5 apache-beam-2.1.0-source-release.zip.sha

2017-07-13 Thread jbonofre
Author: jbonofre
Date: Thu Jul 13 09:43:39 2017
New Revision: 20441

Log:
Fix source dist content

Modified:
dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip
dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.asc
dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.md5
dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.sha1

Modified: dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip
==
Binary files - no diff available.

Modified: dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.asc
==
--- dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.asc (original)
+++ dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.asc Thu Jul 13 09:43:39 
2017
@@ -1,17 +1,17 @@
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
-iQIcBAABAgAGBQJZZykCAAoJEL/y7kLIKC52XQEP/js6Oe0iY0KmXt8/l6TA7AvU
-ZJe8yE0HAd9/xSk9Prlrrt6iu8aeEiu9SeGxrutySuYqjAPF+XQifS2oXmh3mDGk
-pErB7eeRYAWzClXtNrDkjfheEQByqAr9ZOWlfCh/PcQJYQHILzQJPgkzQ850g5jE
-QwVNbb7uBSfuoGy1QUzKZhRshiISToh2J2h7syyocMQl6Kq9xUMwNx26SkHId6EI
-2Ha4mjibGItyuS6qUnzNsl2N++P1gfqqWrn+QuuFX8SScwqPpQ19nQg85YsfRGjo
-5LrfhLdf9SOqkIo1I5mBwP/6IbVYKcGV+9VsjvB+jYueeW0+QSkjd6FO9GEF8X5j
-i7+roeBVN7gvGoApLmYsIZDsbothtOnebTUSXvw1DjCjkPOdEvsNOPLgZpzAPBQu
-V6kbiIlFbPBy1x1fsgKsNhKKQB+9hia9YZZKTdcec/tCUau38x8anTAj/Z8U98tC
-zO1UcrZp+KJ8uFsVqFM+Ab+2X4snXNvK1wXWiD9fX2aFC6OWd6t10H+qgqMcLjH5
-32syXrWOywqa9198WKe5EnUL5X7WMJCR7CKFfiFOzH0SUsQqt+n2+XxaHuNwEXQe
-umbr3JaGnnSPtUu5HUa4dgTe7ooXWSE/xLw4hodihYJQMqMn+I98K0vb3y5yMgL0
-VoBCuUWAzmNMHO3LD/Lc
-=rTy5
+iQIcBAABAgAGBQJZZ0BiAAoJEL/y7kLIKC52AxgP/3/ET1JZut4JeLgQ/02t6yu4
+5nxdw4jIS1cQee0pzT7GYs2Vgm/7Fw7eEVdHgmlOsDPGDyxug5HKL/lJ18pHdFPV
+oCyrc+mhgnMRwi9ApDTyjXF663JnOjUN3kf9MIax98pBbunUA+SDZJwk5OLLdIqI
+VCc01XjoeEfQ6yT1VUX4mShrSJqxsQtK1o2wcET40oRSv+4LsT6ERkcFi6jp8uER
+mymTG/xP9KjTJRJP+U3lcwO7zUXZw6xjUl9AXBsu/llkb8xPWc1ig4LMeAD+DLyi
+JLt3FhONuW5RsvXeUl5RzVeAi8eI7a5eD9bbjR7El4dYSXdnLlhkLeSUj8Dvlb41
+PXqCv5iOBwhkJzpmbj8xyp+RK2HIbDJ6fLR/zY09Ul4NS7ITQXYcyyuJVlJqm5E8
+hgRCCbwzLb9fcw58TItL5ly64nSJkp3XU9hvTPGR+OBKzvGdZy81Hw+5G6ygG3AU
+0wq/HdnaRw8xCL5v0SxHgsRg08uVIdn3Nsbim8B2fN4eFJBsBvKnmWx7FQKNI8jK
+QED6OyyLYHUToEYshSQdbsqUNjFUZl6TGpT86JYfLR0cYMfTwLzBwLCBJdmJiwws
+womkF9s22SHczg6cvPeKJGfOhFGh3dF/k24p3TTkt9sXchxyjJocD+eI9oE3fRzq
+IHCgpUznoSAato4S0e4K
+=ur8k
 -----END PGP SIGNATURE-----

Modified: dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.md5
==
--- dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.md5 (original)
+++ dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.md5 Thu Jul 13 09:43:39 
2017
@@ -1 +1 @@
-2b20e810096f1e147b29477558801e53  apache-beam-2.1.0-source-release.zip
+6f13b5238fda13f64558b5e212f2c073  apache-beam-2.1.0-source-release.zip

Modified: dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.sha1
==
--- dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.sha1 (original)
+++ dev/beam/2.1.0/apache-beam-2.1.0-source-release.zip.sha1 Thu Jul 13 
09:43:39 2017
@@ -1 +1 @@
-d29be1070a93d2f4217fbfad8248da30f04dd01c  apache-beam-2.1.0-source-release.zip
+a5024670baa2850caca7beaea92abb721ba44532  apache-beam-2.1.0-source-release.zip




[jira] [Created] (BEAM-2613) Add integration test for comparison operators

2017-07-13 Thread James Xu (JIRA)
James Xu created BEAM-2613:
--

 Summary: Add integration test for comparison operators
 Key: BEAM-2613
 URL: https://issues.apache.org/jira/browse/BEAM-2613
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: James Xu
Assignee: James Xu






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-2564) Add integration test for string operators

2017-07-13 Thread James Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Xu resolved BEAM-2564.

   Resolution: Fixed
Fix Version/s: Not applicable

> Add integration test for string operators
> -
>
> Key: BEAM-2564
> URL: https://issues.apache.org/jira/browse/BEAM-2564
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>  Labels: dsl_sql_merge
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2560) Add integration test for arithmetic operators

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085396#comment-16085396
 ] 

ASF GitHub Bot commented on BEAM-2560:
--

GitHub user xumingming opened a pull request:

https://github.com/apache/beam/pull/3557

[BEAM-2560] Add integration test for arithmetic operators.

Re-implemented the arithmetic operators & refactored string functions 
integration test to utilize `ExpressionChecker`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xumingming/beam 
BEAM-2560-integration-test-for-arithmetic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3557


commit 715615438ac72b2d4bf154e2816a65d2ccdd46ea
Author: James Xu 
Date:   2017-07-07T03:04:46Z

[BEAM-2560] Add integration test for arithmetic operators.

And also refactor BeamSqlStringFunctionsIntegrationTest to use 
ExpressionChecker




> Add integration test for arithmetic operators
> -
>
> Key: BEAM-2560
> URL: https://issues.apache.org/jira/browse/BEAM-2560
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
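
For illustration, a self-contained stand-in for the {{ExpressionChecker}} pattern this PR refers to: collect (expression, expected) pairs fluently, then evaluate and assert each one. The real helper runs expressions through the SQL DSL; here the evaluator is pluggable and all names are illustrative:

{code}
import static org.junit.Assert.assertEquals;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class MiniExpressionChecker {
  private final Map<String, Object> cases = new LinkedHashMap<>();

  // Fluent accumulation, mirroring the style referenced in the PR.
  MiniExpressionChecker addExpr(String sql, Object expected) {
    cases.put(sql, expected);
    return this;
  }

  // Evaluate every collected expression and compare with its expectation.
  void buildRunAndCheck(Function<String, Object> evaluate) {
    for (Map.Entry<String, Object> c : cases.entrySet()) {
      assertEquals(c.getKey(), c.getValue(), evaluate.apply(c.getKey()));
    }
  }
}
{code}

A test would then chain {{addExpr("1 + 2", 3).addExpr("2.0 * 1.5", 3.0)}} and finish with a single {{buildRunAndCheck}} call, which keeps one test method per operator family instead of one per expression.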


[GitHub] beam pull request #3557: [BEAM-2560] Add integration test for arithmetic ope...

2017-07-13 Thread xumingming
GitHub user xumingming opened a pull request:

https://github.com/apache/beam/pull/3557

[BEAM-2560] Add integration test for arithmetic operators.

Re-implemented the arithmetic operators & refactored string functions 
integration test to utilize `ExpressionChecker`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xumingming/beam 
BEAM-2560-integration-test-for-arithmetic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3557


commit 715615438ac72b2d4bf154e2816a65d2ccdd46ea
Author: James Xu 
Date:   2017-07-07T03:04:46Z

[BEAM-2560] Add integration test for arithmetic operators.

And also refactor BeamSqlStringFunctionsIntegrationTest to use 
ExpressionChecker




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3336: Beam-2171 Power function

2017-07-13 Thread app-tarush
Github user app-tarush closed the pull request at:

https://github.com/apache/beam/pull/3336


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3386: [BEAM-2424] CAST operator supporting numeric, date ...

2017-07-13 Thread app-tarush
Github user app-tarush closed the pull request at:

https://github.com/apache/beam/pull/3386


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2424) CAST operator supporting numeric, date and timestamp types

2017-07-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085375#comment-16085375
 ] 

ASF GitHub Bot commented on BEAM-2424:
--

Github user app-tarush closed the pull request at:

https://github.com/apache/beam/pull/3386


> CAST operator supporting numeric, date and timestamp types
> --
>
> Key: BEAM-2424
> URL: https://issues.apache.org/jira/browse/BEAM-2424
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Tarush Grover
>Assignee: Tarush Grover
>  Labels: dsl_sql_merge
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2232) ApiSurface tests should run on the jar, not the pre-shaded code.

2017-07-13 Thread Innocent (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Innocent reassigned BEAM-2232:
--

Assignee: Innocent

> ApiSurface tests should run on the jar, not the pre-shaded code.
> 
>
> Key: BEAM-2232
> URL: https://issues.apache.org/jira/browse/BEAM-2232
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Innocent
>
> Currently, errors in the core SDK ApiSurface definition and loading are 
> caught only by tests of the ApiSurface of a module that depends on it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (BEAM-2612) support variance builtin aggregation function

2017-07-13 Thread Kai Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085347#comment-16085347
 ] 

Kai Jiang edited comment on BEAM-2612 at 7/13/17 8:17 AM:
--

cc [~mingmxu]
After this ticket, I can work on STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP


was (Author: vectorijk):
After this ticket, I can work on STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP

> support variance builtin aggregation function
> -
>
> Key: BEAM-2612
> URL: https://issues.apache.org/jira/browse/BEAM-2612
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>
> Two built-in aggregate functions:
> VAR_POP: the population variance (the square of the population standard deviation)
> VAR_SAMP: the sample variance (the square of the sample standard deviation)
> https://calcite.apache.org/docs/reference.html#aggregate-functions



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
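
For illustration, a minimal accumulator for the two aggregates described above, using the standard sum / sum-of-squares decomposition; this sketches only the math, not the DSL's actual CombineFn:

{code}
// VAR_POP  = (sumSq - sum^2 / n) / n
// VAR_SAMP = (sumSq - sum^2 / n) / (n - 1)
class VarianceAccumulator {
  private long n;       // count of inputs
  private double sum;   // running sum
  private double sumSq; // running sum of squares

  void add(double x) { n++; sum += x; sumSq += x * x; }

  double varPop()  { return (sumSq - sum * sum / n) / n; }

  double varSamp() { return (sumSq - sum * sum / n) / (n - 1); }
}
{code}

A production implementation would guard the n = 0 and n = 1 cases and prefer a numerically stabler formulation (e.g. Welford's algorithm), since the sum-of-squares form can lose precision on large inputs.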


[jira] [Commented] (BEAM-2612) support variance builtin aggregation function

2017-07-13 Thread Kai Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085347#comment-16085347
 ] 

Kai Jiang commented on BEAM-2612:
-

After this ticket, I can work on STDDEV_POP, STDDEV_SAMP, COVAR_POP, COVAR_SAMP

> support variance builtin aggregation function
> -
>
> Key: BEAM-2612
> URL: https://issues.apache.org/jira/browse/BEAM-2612
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>
> Two built-in aggregate functions:
> VAR_POP: the population variance (the square of the population standard deviation)
> VAR_SAMP: the sample variance (the square of the sample standard deviation)
> https://calcite.apache.org/docs/reference.html#aggregate-functions



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

