[jira] [Created] (BEAM-2822) Add support for progress reporting in fn API

2017-08-29 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2822:
---

 Summary: Add support for progress reporting in fn API
 Key: BEAM-2822
 URL: https://issues.apache.org/jira/browse/BEAM-2822
 Project: Beam
  Issue Type: New Feature
  Components: beam-model
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (BEAM-2820) SDKs should measure time spent for active element

2017-08-29 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli reassigned BEAM-2820:
---

Assignee: Vikas Kedigehalli  (was: Ahmet Altay)

> SDKs should measure time spent for active element
> -
>
> Key: BEAM-2820
> URL: https://issues.apache.org/jira/browse/BEAM-2820
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>Priority: Minor
>
> Measuring the time spent in the active element for each PTransform will be 
> important for progress reporting, as outlined in 
> https://s.apache.org/beam-fn-api-progress-reporting. Each SDK needs to 
> provide this to the Runner to enable efficient dynamic work rebalancing.





[jira] [Created] (BEAM-2820) SDKs should measure time spent for active element

2017-08-29 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2820:
---

 Summary: SDKs should measure time spent for active element
 Key: BEAM-2820
 URL: https://issues.apache.org/jira/browse/BEAM-2820
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-harness, sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay
Priority: Minor


Measuring the time spent in the active element for each PTransform will be 
important for progress reporting, as outlined in 
https://s.apache.org/beam-fn-api-progress-reporting. Each SDK needs to provide 
this to the Runner to enable efficient dynamic work rebalancing.
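As an illustration, per-transform active-element timing in an SDK harness could look like the following minimal Python sketch. This is a hypothetical stand-in, not Beam's actual API; the class name `ActiveElementTimer` and the transform label are made up for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ActiveElementTimer:
    """Accumulates wall-clock time spent actively processing elements,
    keyed by transform name (illustrative only, not Beam's real harness)."""

    def __init__(self):
        self.active_ns = defaultdict(int)

    @contextmanager
    def track(self, transform_name):
        # Only time spent inside this block counts as "active" time.
        start = time.monotonic_ns()
        try:
            yield
        finally:
            self.active_ns[transform_name] += time.monotonic_ns() - start

    def report(self):
        # The kind of summary an SDK harness might hand to the Runner
        # for progress estimation, in seconds per transform.
        return {name: ns / 1e9 for name, ns in self.active_ns.items()}


timer = ActiveElementTimer()
for element in range(3):
    with timer.track('ParDo(MyDoFn)'):
        _ = element * element  # stand-in for user processing
print(timer.report())
```

A real implementation would also need to account for elements that are in flight when a progress request arrives, not just completed ones.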





[jira] [Resolved] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-27 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli resolved BEAM-2509.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> Fn API Runner hangs in grpc controller mode
> ---
>
> Key: BEAM-2509
> URL: https://issues.apache.org/jira/browse/BEAM-2509
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model-fn-api, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.1.0
>
>
> The tests at 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
>  currently run only in direct mode, but they should run in grpc mode as well. 
> The grpc mode is currently broken and needs fixing. Once we enable it, these 
> tests can catch issues like https://github.com/apache/beam/pull/3431





[jira] [Commented] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-23 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061379#comment-16061379
 ] 

Vikas Kedigehalli commented on BEAM-2509:
-

cc: [~altay]

> Fn API Runner hangs in grpc controller mode
> ---
>
> Key: BEAM-2509
> URL: https://issues.apache.org/jira/browse/BEAM-2509
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model-fn-api, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Luke Cwik
>Priority: Minor
>
> The tests at 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
>  currently run only in direct mode, but they should run in grpc mode as well. 
> The grpc mode is currently broken and needs fixing. Once we enable it, these 
> tests can catch issues like https://github.com/apache/beam/pull/3431





[jira] [Created] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-23 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2509:
---

 Summary: Fn API Runner hangs in grpc controller mode
 Key: BEAM-2509
 URL: https://issues.apache.org/jira/browse/BEAM-2509
 Project: Beam
  Issue Type: Bug
  Components: beam-model-fn-api, sdk-py
Reporter: Vikas Kedigehalli
Assignee: Luke Cwik
Priority: Minor


The tests at 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
 currently run only in direct mode, but they should run in grpc mode as well. 
The grpc mode is currently broken and needs fixing. Once we enable it, these 
tests can catch issues like https://github.com/apache/beam/pull/3431





[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039467#comment-16039467
 ] 

Vikas Kedigehalli commented on BEAM-2418:
-

[~bookman_google] could you try running it without templates (by passing query 
and other options via command line arguments) and see if it works? 

> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions, sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Stephen Sisk
>Assignee: Davor Bonaci
>Priority: Blocker
> Fix For: 2.1.0
>
>
> We have user reports that DatastoreIO does not work when they try to use it.
> We believe this is a result of our effort to minimize dependencies in the 
> core SDK (protobuf in this case). ProtoCoder is not registered by default, so 
> a user would need to explicitly include 'beam-sdks-java-extensions-protobuf' 
> in their maven dependencies to get it. 
> We need to confirm this, but if so, we will probably need to fix it in the 
> next release so that ProtoCoder is available when using DatastoreIO.
> cc [~vikasrk]





[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039324#comment-16039324
 ] 

Vikas Kedigehalli commented on BEAM-2418:
-

Looks like we do include 'beam-sdks-java-extensions-protobuf' 
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/pom.xml#L76,
 and we also have integration tests that pass 
(https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java#L105)

Taking a look.

> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions, sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Stephen Sisk
>Assignee: Davor Bonaci
>Priority: Blocker
> Fix For: 2.1.0
>
>
> We have user reports that DatastoreIO does not work when they try to use it.
> We believe this is a result of our effort to minimize dependencies in the 
> core SDK (protobuf in this case). ProtoCoder is not registered by default, so 
> a user would need to explicitly include 'beam-sdks-java-extensions-protobuf' 
> in their maven dependencies to get it. 
> We need to confirm this, but if so, we will probably need to fix it in the 
> next release so that ProtoCoder is available when using DatastoreIO.
> cc [~vikasrk]





[jira] [Created] (BEAM-2286) Improve mobile gaming example user experience

2017-05-12 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2286:
---

 Summary: Improve mobile gaming example user experience
 Key: BEAM-2286
 URL: https://issues.apache.org/jira/browse/BEAM-2286
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay
Priority: Minor


A few things I noticed while running the mobile gaming example that could be improved:

1. When running on the direct runner, the default input is too large (20G), so 
it seems as though the pipeline is stuck, with no progress updates or metrics.
This could be improved by using a much smaller dataset by default.

2. Even when running on the Dataflow runner, with default worker settings and 
autoscaling, it still takes more than 30 minutes to run. We could use a much 
smaller dataset here too. 

The documentation of these examples could also be improved, both in the code 
docstrings and in the Beam quick start guide. 





[jira] [Commented] (BEAM-2242) Apache Beam Java modules do not correctly shade test artifacts

2017-05-10 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005157#comment-16005157
 ] 

Vikas Kedigehalli commented on BEAM-2242:
-

[~lcwik] Let me know if this PR https://github.com/apache/beam/pull/2688 would 
be of any help in catching these issues. 
CC: [~davor]

> Apache Beam Java modules do not correctly shade test artifacts
> --
>
> Key: BEAM-2242
> URL: https://issues.apache.org/jira/browse/BEAM-2242
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Blocker
> Fix For: 2.0.0
>
>
> javap dump of TextIOTest.class
> Note the unshaded references to org.apache.commons.compress.*
> {code}
> Compiled from "TextIOTest.java"
> public class org.apache.beam.sdk.io.TextIOTest {
> ...
>   private static java.io.File writeToFile(java.lang.String[], 
> java.lang.String, org.apache.beam.sdk.io.TextIO$CompressionType) throws 
> java.io.IOException;
> descriptor: 
> ([Ljava/lang/String;Ljava/lang/String;Lorg/apache/beam/sdk/io/TextIO$CompressionType;)Ljava/io/File;
> Code:
>0: getstatic #6  // Field 
> tempFolder:Ljava/nio/file/Path;
>3: aload_1
>4: invokeinterface #7,  2// InterfaceMethod 
> java/nio/file/Path.resolve:(Ljava/lang/String;)Ljava/nio/file/Path;
>9: invokeinterface #8,  1// InterfaceMethod 
> java/nio/file/Path.toFile:()Ljava/io/File;
>   14: astore_3
>   15: new   #9  // class java/io/FileOutputStream
>   18: dup
>   19: aload_3
>   20: invokespecial #10 // Method 
> java/io/FileOutputStream."":(Ljava/io/File;)V
>   23: astore4
>   25: getstatic #11 // Field 
> org/apache/beam/sdk/io/TextIOTest$4.$SwitchMap$org$apache$beam$sdk$io$TextIO$CompressionType:[I
>   28: aload_2
>   29: invokevirtual #12 // Method 
> org/apache/beam/sdk/io/TextIO$CompressionType.ordinal:()I
>   32: iaload
>   33: tableswitch   { // 1 to 5
>  1: 68
>  2: 71
>  3: 85
>  4: 99
>  5: 131
>default: 145
>   }
>   68: goto  157
>   71: new   #13 // class 
> java/util/zip/GZIPOutputStream
>   74: dup
>   75: aload 4
>   77: invokespecial #14 // Method 
> java/util/zip/GZIPOutputStream."":(Ljava/io/OutputStream;)V
>   80: astore4
>   82: goto  157
>   85: new   #15 // class 
> org/apache/commons/compress/compressors/bzip2/BZip2CompressorOutputStream
>   88: dup
>   89: aload 4
>   91: invokespecial #16 // Method 
> org/apache/commons/compress/compressors/bzip2/BZip2CompressorOutputStream."":(Ljava/io/OutputStream;)V
>   94: astore4
>   96: goto  157
>   99: new   #17 // class 
> java/util/zip/ZipOutputStream
>  102: dup
>  103: aload 4
>  105: invokespecial #18 // Method 
> java/util/zip/ZipOutputStream."":(Ljava/io/OutputStream;)V
>  108: astore5
>  110: aload 5
>  112: new   #19 // class java/util/zip/ZipEntry
>  115: dup
>  116: ldc   #20 // String entry
>  118: invokespecial #21 // Method 
> java/util/zip/ZipEntry."":(Ljava/lang/String;)V
>  121: invokevirtual #22 // Method 
> java/util/zip/ZipOutputStream.putNextEntry:(Ljava/util/zip/ZipEntry;)V
>  124: aload 5
>  126: astore4
>  128: goto  157
>  131: new   #23 // class 
> org/apache/commons/compress/compressors/deflate/DeflateCompressorOutputStream
> {code}





[jira] [Assigned] (BEAM-2143) (Mis)Running Dataflow Wordcount gives non-helpful errors

2017-05-04 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli reassigned BEAM-2143:
---

Assignee: Vikas Kedigehalli  (was: Sourabh Bajaj)

> (Mis)Running Dataflow Wordcount gives non-helpful errors
> 
>
> Key: BEAM-2143
> URL: https://issues.apache.org/jira/browse/BEAM-2143
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Reporter: Ben Chambers
>Assignee: Vikas Kedigehalli
>Priority: Blocker
> Fix For: First stable release
>
>
> If you run a pipeline and forget to specify `tempLocation` (but specify 
> something else, such as `stagingLocation`) you get two messages indicating 
> you forgot to specify `stagingLocation`. 
> One says "no stagingLocation specified, choosing ..." the other says "error, 
> the staging location isn't readable" (if you give it just a bucket and not an 
> object within a bucket).
> This is surprising to me as a user, since (1) I specified a staging location 
> and (2) the flag I actually need to modify is `--tempLocation`.





[jira] [Commented] (BEAM-2143) (Mis)Running Dataflow Wordcount gives non-helpful errors

2017-05-04 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997553#comment-15997553
 ] 

Vikas Kedigehalli commented on BEAM-2143:
-

Taking a look for Java.

> (Mis)Running Dataflow Wordcount gives non-helpful errors
> 
>
> Key: BEAM-2143
> URL: https://issues.apache.org/jira/browse/BEAM-2143
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-gcp
>Reporter: Ben Chambers
>Assignee: Sourabh Bajaj
>Priority: Blocker
> Fix For: First stable release
>
>
> If you run a pipeline and forget to specify `tempLocation` (but specify 
> something else, such as `stagingLocation`) you get two messages indicating 
> you forgot to specify `stagingLocation`. 
> One says "no stagingLocation specified, choosing ..." the other says "error, 
> the staging location isn't readable" (if you give it just a bucket and not an 
> object within a bucket).
> This is surprising to me as a user, since (1) I specified a staging location 
> and (2) the flag I actually need to modify is `--tempLocation`.





[jira] [Created] (BEAM-2167) Add appropriate display data to BigQueryIO and tests.

2017-05-03 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2167:
---

 Summary: Add appropriate display data to BigQueryIO and tests. 
 Key: BEAM-2167
 URL: https://issues.apache.org/jira/browse/BEAM-2167
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-gcp
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli


BigQueryIO no longer surfaces any meaningful display data. We should figure out 
what can and should be displayed, and add tests. 





[jira] [Updated] (BEAM-2167) Add appropriate display data to BigQueryIO and tests.

2017-05-03 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-2167:

Affects Version/s: First stable release

> Add appropriate display data to BigQueryIO and tests. 
> --
>
> Key: BEAM-2167
> URL: https://issues.apache.org/jira/browse/BEAM-2167
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: First stable release
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>
> BigQueryIO no longer surfaces any meaningful display data. We should figure 
> out what can and should be displayed, and add tests. 





[jira] [Created] (BEAM-2156) Clean up import guards for GCP libraries in the python SDK

2017-05-03 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2156:
---

 Summary: Clean up import guards for GCP libraries in the python SDK
 Key: BEAM-2156
 URL: https://issues.apache.org/jira/browse/BEAM-2156
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay
Priority: Minor


To protect against environments that do not have the GCP libraries installed, we 
have import guards like 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/datastore/v1/helper.py#L21
 which is ugly. We need to come up with a better approach to handle such scenarios.
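For reference, the guard pattern in question looks roughly like the following sketch. The module name `my_gcp_client_lib` and the `read_from_gcp` helper are hypothetical stand-ins for the real GCP imports.

```python
# Minimal sketch of the import-guard pattern; 'my_gcp_client_lib' is a
# hypothetical optional dependency standing in for the real GCP libraries.
try:
    import my_gcp_client_lib  # may not be installed in all environments
    HAS_GCP = True
except ImportError:
    my_gcp_client_lib = None
    HAS_GCP = False


def read_from_gcp(path):
    """Fail with an actionable message when the optional dependency is missing."""
    if not HAS_GCP:
        raise ImportError(
            'GCP dependencies are not installed; install the gcp extra '
            'package to use this feature.')
    return my_gcp_client_lib.read(path)
```

One cleaner alternative is to centralize the try/except in a single module and import the availability flag from there, rather than repeating the guard in every file.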





[jira] [Created] (BEAM-2080) Add custom maven enforcer rules to catch banned classes and dependencies

2017-04-25 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-2080:
---

 Summary: Add custom maven enforcer rules to catch banned classes 
and dependencies
 Key: BEAM-2080
 URL: https://issues.apache.org/jira/browse/BEAM-2080
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Affects Versions: Not applicable
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
Priority: Minor


The maven enforcer plugin standard rules aren't sufficient to catch certain 
issues like:
* An artifact built as an uber/bundled jar (usually with shade plugin) 
including banned classes. 
* An artifact pom that depends on banned dependencies. (The bannedDependencies 
rule provided by the enforcer plugin doesn't always work because it doesn't look 
at the dependency-reduced pom generated by the shade plugin.)
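The core check such a custom rule would perform can be sketched outside of the Maven enforcer API: scan the entries of a bundled jar for classes under a banned package prefix. The banned prefix below is just an example, and jar contents are built in memory for the demo.

```python
# Sketch of the check a custom enforcer rule would perform (not the actual
# Maven plugin API): flag entries in a bundled jar whose path falls under a
# banned package prefix.
import io
import zipfile

BANNED_PREFIXES = ('org/apache/commons/compress/',)  # example banned package


def banned_entries(jar_bytes):
    """Return .class entries in the jar that live under a banned prefix."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [name for name in jar.namelist()
                if name.endswith('.class') and name.startswith(BANNED_PREFIXES)]


# Build a tiny in-memory "jar" with one clean and one banned class.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as jar:
    jar.writestr('org/apache/beam/sdk/io/TextIO.class', b'')
    jar.writestr('org/apache/commons/compress/X.class', b'')

print(banned_entries(buf.getvalue()))
# -> ['org/apache/commons/compress/X.class']
```

A real rule would additionally walk the dependency-reduced pom, which is exactly the part the stock bannedDependencies rule misses.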





[jira] [Commented] (BEAM-1631) Flink runner: submit job to a Flink-on-YARN cluster

2017-04-21 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979634#comment-15979634
 ] 

Vikas Kedigehalli commented on BEAM-1631:
-

[~aljoscha] I was able to come up with something without the need to have a 
bin/flink installation, 
https://github.com/vikkyrk/incubator-beam/commit/7405d376db390aab0f4b658b34c35b2e50eca63b
 (just a hack, needs cleanup), but it still requires users to have a 
{{HADOOP_CONF_DIR}}. If we want to go with my approach, I am happy to clean it 
up and send out a PR. 

> Flink runner: submit job to a Flink-on-YARN cluster
> ---
>
> Key: BEAM-1631
> URL: https://issues.apache.org/jira/browse/BEAM-1631
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Aljoscha Krettek
>
> As far as I understand, running Beam pipelines on a Flink cluster can be done 
> in two ways:
> * Run directly with a Flink runner, and specifying {{--flinkMaster}} pipeline 
> option via, say, {{mvn exec}}.
> * Produce a bundled JAR, and use {{bin/flink}} to submit the same pipeline.
> These two ways are equivalent, and work well on a standalone Flink cluster.
> Submitting to a Flink-on-YARN cluster is more complicated. You can still produce a 
> bundled JAR, and use {{bin/flink -yid }} to submit such a job. 
> However, that seems impossible with a Flink runner directly.
> If so, we should add the ability to the Flink runner to submit a job to a 
> Flink-on-YARN cluster directly.





[jira] [Commented] (BEAM-976) Update examples README.md to fix instructions to run pipelines

2017-04-20 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977371#comment-15977371
 ] 

Vikas Kedigehalli commented on BEAM-976:


[~davor] Is https://github.com/apache/beam/tree/master/examples/java a 
legitimate doc that users can rely on? If so, it seems outdated and needs fixing. 

> Update examples README.md to fix instructions to run pipelines
> --
>
> Key: BEAM-976
> URL: https://issues.apache.org/jira/browse/BEAM-976
> Project: Beam
>  Issue Type: Task
>  Components: examples-java
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>Priority: Minor
> Fix For: First stable release
>
>






[jira] [Assigned] (BEAM-1956) Flatten operation should respect input type hints.

2017-04-20 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli reassigned BEAM-1956:
---

Assignee: Ahmet Altay  (was: Vikas Kedigehalli)

> Flatten operation should respect input type hints.
> --
>
> Key: BEAM-1956
> URL: https://issues.apache.org/jira/browse/BEAM-1956
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Ahmet Altay
> Fix For: First stable release
>
>
> Input type hints are currently not respected by the Flatten operation, and 
> instead the `Any` type is chosen as a fallback. This could lead to using a 
> pickle coder even if a custom coder type hint was provided for the input 
> PCollections. 
> Also, this could lead to undesirable results, particularly when a Flatten 
> operation is followed by a GroupByKey operation, which requires the key coder 
> to be deterministic. Even if the user provides deterministic coder type hints 
> for their PCollections, defaulting to `Any` results in using the pickle 
> coder (non-deterministic). As a result, CoGroupByKey is broken in scenarios 
> where the input PCollection coder is deterministic for the type while the 
> pickle coder is not.





[jira] [Commented] (BEAM-1910) test_using_slow_impl very flaky locally

2017-04-18 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973678#comment-15973678
 ] 

Vikas Kedigehalli commented on BEAM-1910:
-

Yeah, most likely related to cleanups. Now that .c and .so files are added to 
.gitignore, we might not notice them, but they were probably not cleaned up. 

> test_using_slow_impl very flaky locally
> ---
>
> Key: BEAM-1910
> URL: https://issues.apache.org/jira/browse/BEAM-1910
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Eugene Kirpichov
>Assignee: Ahmet Altay
>
> Most times this test fails on my machine when running:
> mvn verify -am -T 1C
> test_using_slow_impl (apache_beam.coders.slow_coders_test.SlowCoders) ... FAIL
> ...
> ___ summary 
> 
> ERROR:   docs: commands failed
>   lint: commands succeeded
> ERROR:   py27: commands failed
>   py27cython: commands succeeded
>   py27gcp: commands succeeded
> [ERROR] Command execution failed.
> org.apache.commons.exec.ExecuteException: Process exited with an error: 1 
> (Exit value: 1)
>   at 
> org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
>   at 
> org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
>   at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:764)
>   at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:711)
>   at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:289)
>   at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>   at 
> org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:185)
>   at 
> org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:181)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Unfortunately the test doesn't print anything to maven output, so I don't 
> know what went wrong. I also don't know how to rerun the individual test 
> myself.





[jira] [Commented] (BEAM-1910) test_using_slow_impl very flaky locally

2017-04-18 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973683#comment-15973683
 ] 

Vikas Kedigehalli commented on BEAM-1910:
-

Related to discussion in 
https://lists.apache.org/thread.html/d379b359cbaf7c0920c2d317dd2126ecf9b2439599195313a57611d5@%3Cdev.beam.apache.org%3E

> test_using_slow_impl very flaky locally
> ---
>
> Key: BEAM-1910
> URL: https://issues.apache.org/jira/browse/BEAM-1910
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Eugene Kirpichov
>Assignee: Ahmet Altay
>
> Most times this test fails on my machine when running:
> mvn verify -am -T 1C
> test_using_slow_impl (apache_beam.coders.slow_coders_test.SlowCoders) ... FAIL
> ...
> ___ summary 
> 
> ERROR:   docs: commands failed
>   lint: commands succeeded
> ERROR:   py27: commands failed
>   py27cython: commands succeeded
>   py27gcp: commands succeeded
> [ERROR] Command execution failed.
> org.apache.commons.exec.ExecuteException: Process exited with an error: 1 
> (Exit value: 1)
>   at 
> org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
>   at 
> org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
>   at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:764)
>   at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:711)
>   at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:289)
>   at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>   at 
> org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:185)
>   at 
> org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:181)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Unfortunately the test doesn't print anything to maven output, so I don't 
> know what went wrong. I also don't know how to rerun the individual test 
> myself.





[jira] [Commented] (BEAM-1631) Flink runner: submit job to a Flink-on-YARN cluster

2017-04-17 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971355#comment-15971355
 ] 

Vikas Kedigehalli commented on BEAM-1631:
-

[~aljoscha] [~davor] Even if we do this, we still need the `HADOOP_CONF_DIR` 
environment variable pointing to the Hadoop/YARN configuration on the user's 
machine. Is that an acceptable solution?

> Flink runner: submit job to a Flink-on-YARN cluster
> ---
>
> Key: BEAM-1631
> URL: https://issues.apache.org/jira/browse/BEAM-1631
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Davor Bonaci
>Assignee: Aljoscha Krettek
>
> As far as I understand, running Beam pipelines on a Flink cluster can be done 
> in two ways:
> * Run directly with a Flink runner, and specifying {{--flinkMaster}} pipeline 
> option via, say, {{mvn exec}}.
> * Produce a bundled JAR, and use {{bin/flink}} to submit the same pipeline.
> These two ways are equivalent, and work well on a standalone Flink cluster.
> Submitting to a Flink-on-YARN cluster is more complicated. You can still produce a 
> bundled JAR, and use {{bin/flink -yid }} to submit such a job. 
> However, that seems impossible with a Flink runner directly.
> If so, we should add the ability to the Flink runner to submit a job to a 
> Flink-on-YARN cluster directly.





[jira] [Commented] (BEAM-1956) Flatten operation should respect input type hints.

2017-04-12 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966906#comment-15966906
 ] 

Vikas Kedigehalli commented on BEAM-1956:
-

cc [~lcwik] [~robertwb] [~dhalp...@google.com]

> Flatten operation should respect input type hints.
> --
>
> Key: BEAM-1956
> URL: https://issues.apache.org/jira/browse/BEAM-1956
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: First stable release
>
>
> Input type hints are currently not respected by the Flatten operation, and 
> instead the `Any` type is chosen as a fallback. This could lead to using a 
> pickle coder even if a custom coder type hint was provided for the input 
> PCollections. 
> Also, this could lead to undesirable results, particularly when a Flatten 
> operation is followed by a GroupByKey operation, which requires the key coder 
> to be deterministic. Even if the user provides deterministic coder type hints 
> for their PCollections, defaulting to `Any` results in using the pickle 
> coder (non-deterministic). As a result, CoGroupByKey is broken in scenarios 
> where the input PCollection coder is deterministic for the type while the 
> pickle coder is not.





[jira] [Created] (BEAM-1956) Flatten operation should respect input type hints.

2017-04-12 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1956:
---

 Summary: Flatten operation should respect input type hints.
 Key: BEAM-1956
 URL: https://issues.apache.org/jira/browse/BEAM-1956
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
 Fix For: First stable release


Input type hints are currently not respected by the Flatten operation; the 
`Any` type is chosen as a fallback instead. This can cause a pickle coder to 
be used even when a custom coder type hint was provided for the input 
PCollections. 

This can also produce undesirable results, particularly when a Flatten 
operation is followed by a GroupByKey operation, which requires the key coder 
to be deterministic. Even if the user provides deterministic coder type hints 
for their PCollections, defaulting to `Any` results in the (non-deterministic) 
pickle coder being used. As a consequence, CoGroupByKey is broken whenever the 
input PCollection coder is deterministic for the type while the pickle coder 
is not.
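The non-determinism of the fallback pickle coder can be demonstrated with the standard library alone (plain Python, no Beam): equal mappings serialized in iteration order produce different bytes, which is exactly what breaks byte-equality grouping of keys.

```python
import pickle

# Two dicts that compare equal but were built in different insertion orders.
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}

assert d1 == d2                              # equal values...
assert pickle.dumps(d1) != pickle.dumps(d2)  # ...but different encodings
```

A runner grouping by encoded key bytes would put these two "equal" keys in different groups.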





[jira] [Updated] (BEAM-1951) Add datastoreio integration tests for python.

2017-04-12 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1951:

Fix Version/s: First stable release

> Add datastoreio integration tests for python. 
> --
>
> Key: BEAM-1951
> URL: https://issues.apache.org/jira/browse/BEAM-1951
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: First stable release
>
>
> An example can be found here, 
> https://github.com/apache/beam/blob/3711c0caf91e1c4d32c055bdff098f81f56b49c1/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py





[jira] [Updated] (BEAM-1952) Add more integration tests for python SDK

2017-04-12 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1952:

Fix Version/s: First stable release

> Add more integration tests for python SDK
> -
>
> Key: BEAM-1952
> URL: https://issues.apache.org/jira/browse/BEAM-1952
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Ahmet Altay
> Fix For: First stable release
>
>
> This is a tracking ticket for adding more python integration tests. 
> 1. https://issues.apache.org/jira/browse/BEAM-1951 (datastoreio)
> TODO: Add more





[jira] [Commented] (BEAM-1952) Add more integration tests for python SDK

2017-04-12 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966254#comment-15966254
 ] 

Vikas Kedigehalli commented on BEAM-1952:
-

cc [~sb2nov] [~chamikara] [~charleschen] [~robertwb] [~markflyhigh]

> Add more integration tests for python SDK
> -
>
> Key: BEAM-1952
> URL: https://issues.apache.org/jira/browse/BEAM-1952
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Ahmet Altay
> Fix For: First stable release
>
>
> This is a tracking ticket for adding more python integration tests. 
> 1. https://issues.apache.org/jira/browse/BEAM-1951 (datastoreio)
> TODO: Add more





[jira] [Created] (BEAM-1952) Add more integration tests for python SDK

2017-04-12 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1952:
---

 Summary: Add more integration tests for python SDK
 Key: BEAM-1952
 URL: https://issues.apache.org/jira/browse/BEAM-1952
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay


This is a tracking ticket for adding more python integration tests. 







[jira] [Updated] (BEAM-1952) Add more integration tests for python SDK

2017-04-12 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1952:

Description: 
This is a tracking ticket for adding more python integration tests. 

1. https://issues.apache.org/jira/browse/BEAM-1951 (datastoreio)
TODO: Add more




  was:
This is a tracking ticket for adding more python integration tests. 




> Add more integration tests for python SDK
> -
>
> Key: BEAM-1952
> URL: https://issues.apache.org/jira/browse/BEAM-1952
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Ahmet Altay
>
> This is a tracking ticket for adding more python integration tests. 
> 1. https://issues.apache.org/jira/browse/BEAM-1951 (datastoreio)
> TODO: Add more





[jira] [Created] (BEAM-1951) Add datastoreio integration tests for python.

2017-04-12 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1951:
---

 Summary: Add datastoreio integration tests for python. 
 Key: BEAM-1951
 URL: https://issues.apache.org/jira/browse/BEAM-1951
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli


An example can be found here, 

https://github.com/apache/beam/blob/3711c0caf91e1c4d32c055bdff098f81f56b49c1/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes_it_test.py





[jira] [Created] (BEAM-1895) Create transform in python sdk should be a custom source

2017-04-05 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1895:
---

 Summary: Create transform in python sdk should be a custom source
 Key: BEAM-1895
 URL: https://issues.apache.org/jira/browse/BEAM-1895
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay


This allows the Create transform to be runner-independent. 





[jira] [Created] (BEAM-1894) Race conditions in python direct runner eager mode

2017-04-05 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1894:
---

 Summary: Race conditions in python direct runner eager mode
 Key: BEAM-1894
 URL: https://issues.apache.org/jira/browse/BEAM-1894
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay
 Fix For: First stable release


test_eager_pipeline 
(https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline_test.py#L283)
 fails with the following error:
ERROR: test_eager_pipeline (apache_beam.pipeline_test.PipelineTest)
--
Traceback (most recent call last):
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline_test.py",
 line 285, in test_eager_pipeline
self.assertEqual([1, 4, 9], p | Create([1, 2, 3]) | Map(lambda x: x*x))
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/transforms/ptransform.py",
 line 387, in __ror__
p.run().wait_until_finish()
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py",
 line 160, in run
self.to_runner_api(), self.runner, self.options).run(False)
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py",
 line 169, in run
return self.runner.run(self)
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 99, in run
result.wait_until_finish()
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 166, in wait_until_finish
self._executor.await_completion()
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py",
 line 336, in await_completion
self._executor.await_completion()
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py",
 line 308, in __call__
uncommitted_bundle.get_elements_iterable())
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/evaluation_context.py",
 line 176, in append_to_cache
self._cache.append(applied_ptransform, tag, elements)
  File 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 138, in append
self._cache[(applied_ptransform, tag)].extend(elements)
TypeError: 'NoneType' object has no attribute '__getitem__'


This is triggered when Create is changed to a custom source. 
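The `TypeError` above suggests the runner's result cache is touched from multiple threads without synchronization. A minimal, hypothetical sketch of guarding such a cache with a lock (names are illustrative, not the actual DirectRunner code):

```python
import threading
from collections import defaultdict


class BundleCache:
    """Hypothetical thread-safe append cache keyed by (transform, tag)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = defaultdict(list)

    def append(self, transform, tag, elements):
        # Guard both the defaultdict lookup and the list extend.
        with self._lock:
            self._cache[(transform, tag)].extend(elements)

    def get(self, transform, tag):
        with self._lock:
            return list(self._cache[(transform, tag)])


cache = BundleCache()
threads = [threading.Thread(target=cache.append, args=("t", None, [i]))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(cache.get("t", None)) == list(range(8))
```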





[jira] [Created] (BEAM-1865) Input Coder of GroupByKey should be a KV Coder in the Python SDK

2017-04-03 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1865:
---

 Summary: Input Coder of GroupByKey should be a KV Coder in the 
Python SDK
 Key: BEAM-1865
 URL: https://issues.apache.org/jira/browse/BEAM-1865
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli


The `Any` type is compatible with `KV` in Python. The coder for the `Any` type 
is a fallback coder or a `FastPrimitivesCoder`, but for a `GroupByKey` 
operation this needs to be a `TupleCoder` to ensure that the generated 
pipeline representation is runnable by a runner written in a different 
language (in the Fn API world).
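The cross-language requirement can be illustrated with a minimal component-wise KV encoding: each component is length-prefixed so a runner can split the pair into key and value bytes without any language-specific deserialization. This is an illustrative sketch, not Beam's actual `TupleCoder`:

```python
import struct


def encode_kv(key_bytes, value_bytes):
    # Length-prefix each component so the pair can be split without
    # deserializing either side.
    return (struct.pack(">I", len(key_bytes)) + key_bytes +
            struct.pack(">I", len(value_bytes)) + value_bytes)


def decode_kv(blob):
    klen = struct.unpack(">I", blob[:4])[0]
    key = blob[4:4 + klen]
    vlen = struct.unpack(">I", blob[4 + klen:8 + klen])[0]
    value = blob[8 + klen:8 + klen + vlen]
    return key, value


assert decode_kv(encode_kv(b"k", b"v1")) == (b"k", b"v1")
```

With an opaque pickle encoding of the whole pair, a non-Python runner could not extract the key bytes for grouping; the component-wise form makes that possible.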





[jira] [Commented] (BEAM-1800) Can't save datastore objects

2017-03-24 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940746#comment-15940746
 ] 

Vikas Kedigehalli commented on BEAM-1800:
-

That is weird, because the protobuf message is serialized to a string before 
being sent to httplib 
(https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/python/googledatastore/connection.py#L191).

[~mlambert] could you bypass Beam and try writing directly using this library 
to see if it fails? That would tell us whether it is a Beam issue or not. 
https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/python/googledatastore/connection.py#L127
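The reported `TypeError: must be str, not unicode` comes from Python 2's httplib refusing to concatenate a `str` header block with a `unicode` body. The usual fix is to coerce the payload to bytes before the request. A Python 3 sketch of that invariant (the helper name is hypothetical, not from either library):

```python
def ensure_bytes(payload, encoding="utf-8"):
    """Coerce a request body to bytes; text bodies are explicitly encoded."""
    if isinstance(payload, bytes):
        return payload
    if isinstance(payload, str):
        return payload.encode(encoding)
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")


assert ensure_bytes(b"abc") == b"abc"
assert ensure_bytes("abc") == b"abc"
```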

> Can't save datastore objects
> 
>
> Key: BEAM-1800
> URL: https://issues.apache.org/jira/browse/BEAM-1800
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Mike Lambert
>Assignee: Vikas Kedigehalli
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it 
> errors out on a strange unicode issue when trying to write a batch. 
> Stacktrace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in 
> apache_beam.runners.common.DoFnRunner.receive 
> (apache_beam/runners/common.c:5142)
>   self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 267, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7201)
>   self.reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 279, in 
> apache_beam.runners.common.DoFnRunner.reraise_augmented 
> (apache_beam/runners/common.c:7590)
>   raise type(exn), args, sys.exc_info()[2] 
> File "apache_beam/runners/common.py", line 263, in 
> apache_beam.runners.common.DoFnRunner.process 
> (apache_beam/runners/common.c:7090)
>   self._dofn_simple_invoker(element) 
> File "apache_beam/runners/common.py", line 198, in 
> apache_beam.runners.common.DoFnRunner._dofn_simple_invoker 
> (apache_beam/runners/common.c:5262)
>   self._process_outputs(element, self.dofn_process(element.value)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 354, in process
>   self._flush_batch() 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py",
>  line 363, in _flush_batch
>   helper.write_mutations(self._datastore, self._project, self._mutations) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 187, in write_mutations
>   commit(commit_request) 
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", 
> line 174, in wrapper
>   return fun(*args, **kwargs) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 185, in commit
>   datastore.commit(req) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 140, in commit
>   datastore_pb2.CommitResponse) 
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", 
> line 199, in _call_method
>   method='POST', body=payload, headers=headers) 
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 
> 631, in new_request
>   redirections, connection_type) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1609, in request (response, content)
>   = self._request(conn, authority, uri, request_uri, method, body, headers, 
> redirections, cachekey) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1351, in _request (response, content)
>   = self._conn_request(conn, request_uri, method, body, headers) 
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1273, in _conn_request
>   conn.request(method, request_uri, body, headers) 
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
>   self._send_request(method, url, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
>self.endheaders(body) 
> File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
>   self._send_output(message_body) 
> File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
>   msg += message_body 
> TypeError: must be str, not unicode
> [while running 'write to datastore/Convert to Mutation']
> {noformat}
> My code is basically:
> {noformat}
> | 'convert from entity' >> beam.Map(ConvertFromEntity)
> | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Where {{ConvertFromEntity}} converts from a google.cloud.datastore object 
> (which has a nice API/interface) into the underlying protobuf (which is what 
> the beam gcp/datastore library expects):
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
> return helpers.entity_to_protobuf(entity)

[jira] [Updated] (BEAM-1076) DatastoreIO template Options

2017-03-22 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1076:

Fix Version/s: First stable release

> DatastoreIO template Options
> 
>
> Key: BEAM-1076
> URL: https://issues.apache.org/jira/browse/BEAM-1076
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: First stable release
>
>






[jira] [Updated] (BEAM-1076) DatastoreIO template Options

2017-03-22 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1076:

Fix Version/s: (was: 0.6.0)

> DatastoreIO template Options
> 
>
> Key: BEAM-1076
> URL: https://issues.apache.org/jira/browse/BEAM-1076
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: First stable release
>
>






[jira] [Resolved] (BEAM-823) Improve DatastoreIO Documentation

2017-03-15 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli resolved BEAM-823.

   Resolution: Fixed
Fix Version/s: 0.6.0

> Improve DatastoreIO Documentation
> -
>
> Key: BEAM-823
> URL: https://issues.apache.org/jira/browse/BEAM-823
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 0.3.0-incubating
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>Priority: Minor
> Fix For: 0.6.0
>
>
> Few things to be added to the DatastoreIO documentation,
> 1. Inequality Filter queries are not splittable.
> 2. Clarify Source is Batch only, while Sink support both Batch and Streaming. 





[jira] [Resolved] (BEAM-1021) DatastoreIO for python

2017-03-15 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli resolved BEAM-1021.
-
   Resolution: Fixed
Fix Version/s: 0.6.0
   Not applicable

> DatastoreIO for python
> --
>
> Key: BEAM-1021
> URL: https://issues.apache.org/jira/browse/BEAM-1021
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: Not applicable, 0.6.0
>
>






[jira] [Resolved] (BEAM-1076) DatastoreIO template Options

2017-03-15 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli resolved BEAM-1076.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> DatastoreIO template Options
> 
>
> Key: BEAM-1076
> URL: https://issues.apache.org/jira/browse/BEAM-1076
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
> Fix For: 0.6.0
>
>






[jira] [Updated] (BEAM-1632) Current bigtable protos include google rpc generated classes in its jar

2017-03-06 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1632:

Description: bigtable-protos include generated classes like 
'com.google.rpc.Code' in its jar rather than depending on 
'grpc-google-common-protos'. This conflicts with 'Datastore' dependencies. I am 
not sure what the right solution is but for now the workaround is to tag the 
usage of 'grpc-google-common-protos' as 'usedDependency' to prevent 
'maven-dependency:analyze' issues.   (was: bigtable-protos include generated 
classes like 'com.google.rpc.Code' in its jar rather than depending on 
'grpc-google-common-protos'. I am not sure what the right solution is but for 
now the workaround is to tag the usage of 'grpc-google-common-protos' as 
'usedDependency' to prevent 'maven-dependency:analyze' issues. )

> Current bigtable protos include google rpc generated classes in its jar
> ---
>
> Key: BEAM-1632
> URL: https://issues.apache.org/jira/browse/BEAM-1632
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Daniel Halperin
>Priority: Minor
>
> bigtable-protos include generated classes like 'com.google.rpc.Code' in its 
> jar rather than depending on 'grpc-google-common-protos'. This conflicts with 
> 'Datastore' dependencies. I am not sure what the right solution is but for 
> now the workaround is to tag the usage of 'grpc-google-common-protos' as 
> 'usedDependency' to prevent 'maven-dependency:analyze' issues. 





[jira] [Created] (BEAM-1632) Current bigtable protos includes google rpc generated classes that conflict with Datastore dependencies

2017-03-06 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1632:
---

 Summary: Current bigtable protos includes google rpc generated 
classes that conflict with Datastore dependencies
 Key: BEAM-1632
 URL: https://issues.apache.org/jira/browse/BEAM-1632
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-gcp
Reporter: Vikas Kedigehalli
Assignee: Daniel Halperin
Priority: Minor


bigtable-protos include generated classes like 'com.google.rpc.Code' in its jar 
rather than depending on 'grpc-google-common-protos'. I am not sure what the 
right solution is but for now the workaround is to tag the usage of 
'grpc-google-common-protos' as 'usedDependency' to prevent 
'maven-dependency:analyze' issues. 





[jira] [Updated] (BEAM-1632) Current bigtable protos include google rpc generated classes in its jar

2017-03-06 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1632:

Summary: Current bigtable protos include google rpc generated classes in 
its jar  (was: Current bigtable protos includes google rpc generated classes 
that conflict with Datastore dependencies)

> Current bigtable protos include google rpc generated classes in its jar
> ---
>
> Key: BEAM-1632
> URL: https://issues.apache.org/jira/browse/BEAM-1632
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Daniel Halperin
>Priority: Minor
>
> bigtable-protos include generated classes like 'com.google.rpc.Code' in its 
> jar rather than depending on 'grpc-google-common-protos'. I am not sure what 
> the right solution is but for now the workaround is to tag the usage of 
> 'grpc-google-common-protos' as 'usedDependency' to prevent 
> 'maven-dependency:analyze' issues. 





[jira] [Updated] (BEAM-1076) DatastoreIO template Options

2017-02-27 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1076:

Summary: DatastoreIO template Options  (was: Datastore Delete template)

> DatastoreIO template Options
> 
>
> Key: BEAM-1076
> URL: https://issues.apache.org/jira/browse/BEAM-1076
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>






[jira] [Resolved] (BEAM-1538) Add a fast version of BufferedElementCountingOutputStream

2017-02-27 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli resolved BEAM-1538.
-
   Resolution: Fixed
Fix Version/s: Not applicable

This is no longer needed, as this functionality is now embedded into the 
IterableCoder directly. 

> Add a fast version of BufferedElementCountingOutputStream
> -
>
> Key: BEAM-1538
> URL: https://issues.apache.org/jira/browse/BEAM-1538
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Ahmet Altay
>Priority: Minor
> Fix For: Not applicable
>
>
> We are currently using the pure Python version of the stream, which is slow. 
> We need to implement a Cython version.





[jira] [Created] (BEAM-1538) Add a fast version of BufferedElementCountingOutputStream

2017-02-22 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1538:
---

 Summary: Add a fast version of BufferedElementCountingOutputStream
 Key: BEAM-1538
 URL: https://issues.apache.org/jira/browse/BEAM-1538
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Ahmet Altay
Priority: Minor


We are currently using the pure Python version of the stream, which is slow. We 
need to implement a Cython version.
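For context, the stream encodes elements as count-prefixed runs terminated by a zero count, so the decoder never needs to know the total element count up front. A simplified pure-Python sketch (single-byte counts and element lengths are an assumption for brevity; the real implementation uses varints and is not this code):

```python
import io


def encode_elements(elements, buffer_size=2):
    """Write runs of length-prefixed elements, each run preceded by its
    element count, terminated by a zero count (simplified sketch).
    Assumes counts and element lengths fit in one byte."""
    out = io.BytesIO()
    buf = []

    def flush():
        if buf:
            out.write(bytes([len(buf)]))
            for e in buf:
                out.write(bytes([len(e)]))
                out.write(e)
            buf.clear()

    for e in elements:
        buf.append(e)
        if len(buf) == buffer_size:
            flush()
    flush()
    out.write(b"\x00")  # zero count terminates the stream
    return out.getvalue()


def decode_elements(blob):
    pos, result = 0, []
    while True:
        count = blob[pos]
        pos += 1
        if count == 0:
            return result
        for _ in range(count):
            n = blob[pos]
            pos += 1
            result.append(blob[pos:pos + n])
            pos += n


assert decode_elements(encode_elements([b"a", b"bb", b"c"])) == [b"a", b"bb", b"c"]
```

The hot loop here (per-element buffering and byte writes) is exactly the kind of code that benefits from a Cython translation.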





[jira] [Created] (BEAM-1537) Report more accurate size estimation for Iterables of unknown length

2017-02-22 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1537:
---

 Summary: Report more accurate size estimation for Iterables of 
unknown length
 Key: BEAM-1537
 URL: https://issues.apache.org/jira/browse/BEAM-1537
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core, sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
Priority: Minor








[jira] [Commented] (BEAM-1524) Timestamp precision and byte representation should be consistent across SDKs

2017-02-21 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877044#comment-15877044
 ] 

Vikas Kedigehalli commented on BEAM-1524:
-

No issues, just that this property wasn't satisfied by the Python SDK and needs 
to be made consistent.

> Timestamp precision and byte representation should be consistent across SDKs
> 
>
> Key: BEAM-1524
> URL: https://issues.apache.org/jira/browse/BEAM-1524
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-java-core, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>
> Runners cannot have a timestamp precision that is lower than what an SDK 
> uses. Also, with the Fn API, a runner should be able to support multiple 
> SDKs that may be written in different languages. It is important that we 
> standardize the precision of timestamps to ensure that runners are 
> compatible across all the SDKs.
> NOTE: Timestamps are shifted such that lexicographic ordering of the bytes 
> corresponds to chronological order. This also needs to be consistent across 
> SDKs.
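The byte-ordering property mentioned in the NOTE can be sketched as follows: shifting a signed timestamp into unsigned space and encoding it big-endian makes lexicographic byte comparison agree with chronological order. The shift constant and 8-byte width here are illustrative assumptions, not the normative Beam encoding:

```python
import struct

SHIFT = 1 << 63  # moves the signed range into unsigned space


def encode_ordered_timestamp(millis):
    """Encode a signed millisecond timestamp so that byte-wise
    (lexicographic) comparison matches chronological order (sketch)."""
    return struct.pack(">Q", millis + SHIFT)


ts = [-1000, -1, 0, 1, 10**12]
encoded = [encode_ordered_timestamp(t) for t in ts]
assert encoded == sorted(encoded)  # byte order == chronological order
```

Without the shift, negative timestamps would encode with a high sign bit and sort after all positive ones.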





[jira] [Comment Edited] (BEAM-1524) Timestamp precision and byte representation should be consistent across SDKs

2017-02-21 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877044#comment-15877044
 ] 

Vikas Kedigehalli edited comment on BEAM-1524 at 2/21/17 11:54 PM:
---

No issues, just that this property wasn't satisfied by the Python SDK and needs 
to be made consistent.


was (Author: vikasrk):
No issues, just that it property wasn't satisfied by Python SDK and needs to be 
make consistent.

> Timestamp precision and byte representation should be consistent across SDKs
> 
>
> Key: BEAM-1524
> URL: https://issues.apache.org/jira/browse/BEAM-1524
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-java-core, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>
> Runners cannot have a timestamp precision that is lower than what an SDK 
> uses. Also, with the Fn API, a runner should be able to support multiple 
> SDKs that may be written in different languages. It is important that we 
> standardize the precision of timestamps to ensure that runners are 
> compatible across all the SDKs.
> NOTE: Timestamps are shifted such that lexicographic ordering of the bytes 
> corresponds to chronological order. This also needs to be consistent across 
> SDKs.





[jira] [Updated] (BEAM-1524) Timestamp precision and byte representation should be consistent across SDKs

2017-02-21 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1524:

Summary: Timestamp precision and byte representation should be consistent 
across SDKs  (was: Timestamp precision should be consistent across SDKs)

> Timestamp precision and byte representation should be consistent across SDKs
> 
>
> Key: BEAM-1524
> URL: https://issues.apache.org/jira/browse/BEAM-1524
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-java-core, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Kenneth Knowles
>
> Runners cannot have a timestamp precision that is lower than what an SDK 
> uses. Also, with the Fn API, a runner should be able to support multiple 
> SDKs that may be written in different languages. It is important that we 
> standardize the precision of timestamps to ensure that runners are 
> compatible across all the SDKs.





[jira] [Updated] (BEAM-1524) Timestamp precision and byte representation should be consistent across SDKs

2017-02-21 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1524:

Description: 
Runners cannot have a timestamp precision that is lower than what an SDK uses. 
Also, with the Fn API, a runner should be able to support multiple SDKs that 
may be written in different languages. It is important that we standardize the 
precision of timestamps to ensure that runners are compatible across all the 
SDKs.

NOTE: Timestamps are shifted such that lexicographic ordering of the bytes 
corresponds to chronological order. This also needs to be consistent across 
SDKs.

  was:Runners cannot have a timestamp precision that is lesser than what a SDK 
uses. Also with Fn API a runner should be able to support multiple SDKs that 
maybe written in different languages. It is important that we standardize the 
precision of timestamp to ensure that runners are compatible across all the 
SDKs.


> Timestamp precision and byte representation should be consistent across SDKs
> 
>
> Key: BEAM-1524
> URL: https://issues.apache.org/jira/browse/BEAM-1524
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-java-core, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Kenneth Knowles
>
> Runners cannot have a timestamp precision that is lower than what an SDK 
> uses. Also, with the Fn API, a runner should be able to support multiple 
> SDKs that may be written in different languages. It is important that we 
> standardize the precision of timestamps to ensure that runners are 
> compatible across all the SDKs.
> NOTE: Timestamps are shifted such that lexicographic ordering of the bytes 
> corresponds to chronological order. This also needs to be consistent across 
> SDKs.





[jira] [Created] (BEAM-1522) Add support for PaneInfo in Python SDK

2017-02-21 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1522:
---

 Summary: Add support for PaneInfo in Python SDK
 Key: BEAM-1522
 URL: https://issues.apache.org/jira/browse/BEAM-1522
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli


PaneInfo provides information about the pane an element belongs to. It is 
already available in the Java SDK 
(https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java) 
and needs to be added to the Python SDK.
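A pane's metadata can be packed into a single byte (first/last flags plus a timing value). The bit layout below is an illustrative assumption rather than the documented encoding, though it does yield 0x0F for a first-and-last pane with unknown timing:

```python
from enum import IntEnum


class Timing(IntEnum):
    EARLY = 0
    ON_TIME = 1
    LATE = 2
    UNKNOWN = 3


def encode_pane(is_first, is_last, timing):
    # bit 0: first, bit 1: last, bits 2-3: timing (illustrative layout)
    return int(is_first) | (int(is_last) << 1) | (timing << 2)


def decode_pane(b):
    return bool(b & 1), bool(b >> 1 & 1), Timing(b >> 2 & 3)


assert encode_pane(True, True, Timing.UNKNOWN) == 0x0F
assert decode_pane(0x0F) == (True, True, Timing.UNKNOWN)
```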





[jira] [Created] (BEAM-1504) Python standard coders test doesn't run with setup test

2017-02-16 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1504:
---

 Summary: Python standard coders test doesn't run with setup test
 Key: BEAM-1504
 URL: https://issues.apache.org/jira/browse/BEAM-1504
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
Priority: Minor


_python setup.py test_ does not run _standard_coders_test_. This is because 
dynamically generated tests are not visible to the _nose.collect_ plugin. 

Fix: use _nose-parameterized_ to create parameterized tests.
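Collectors miss test cases generated purely at run time; attaching one named method per case to the TestCase class at import time (roughly the technique _nose-parameterized_ uses) keeps them statically discoverable. A self-contained sketch using only unittest:

```python
import unittest

CASES = [(1, 1), (2, 4), (3, 9)]


class SquareTest(unittest.TestCase):
    pass


def _make_test(x, expected):
    def test(self):
        self.assertEqual(x * x, expected)
    return test


# Attach one named method per case so any collector sees them statically.
for x, expected in CASES:
    setattr(SquareTest, f"test_square_{x}", _make_test(x, expected))

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(SquareTest))
assert result.testsRun == 3 and result.wasSuccessful()
```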





[jira] [Commented] (BEAM-1467) Use well-known coder types for known window coders

2017-02-15 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869039#comment-15869039
 ] 

Vikas Kedigehalli commented on BEAM-1467:
-

WindowedValueCoder implementation and tests: 
https://github.com/apache/beam/pull/2018

Note: PaneInfo isn't supported in the Python SDK yet, so we hard-code the value 
to 0x0F, which represents PaneInfo.NO_FIRING.

> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.





[jira] [Updated] (BEAM-1467) Use well-known coder types for known window coders

2017-02-15 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli updated BEAM-1467:

Description: 
Known window types include:

* GlobalWindow
* IntervalWindow
* WindowedValueCoder

Standardizing the name and encodings of these windows will enable many more 
pipelines to work across the Fn API with low overhead.

  was:
Known window types include:

* GlobalWindow
* IntervalWindow

Standardizing the name and encodings of these windows will enable many more 
pipelines to work across the Fn API with low overhead.


> Use well-known coder types for known window coders
> --
>
> Key: BEAM-1467
> URL: https://issues.apache.org/jira/browse/BEAM-1467
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Known window types include:
> * GlobalWindow
> * IntervalWindow
> * WindowedValueCoder
> Standardizing the name and encodings of these windows will enable many more 
> pipelines to work across the Fn API with low overhead.





[jira] [Commented] (BEAM-1471) Make IterableCoder binary compatible across SDKs

2017-02-13 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864188#comment-15864188
 ] 

Vikas Kedigehalli commented on BEAM-1471:
-

Verified that when the iterable length is known, the IterableCoder is 
compatible across the Java and Python SDKs; the tests in 
https://github.com/apache/beam/pull/1996 confirm that. 

When the iterable length is unknown, they are not compatible: the Python SDK 
currently errors out. This still needs to be fixed. 
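A sketch of the two layouts in question. The framing below is an assumption modeled on the Java SDK's iterable coder, not a spec quote: a known length is written as a 4-byte big-endian count before the elements, while an unknown length is marked with -1 followed by size-prefixed batches ended by an empty batch:

```python
import struct

def encode_iterable(elements, encode_element, length_known=True):
    """Illustrative encoder for the two IterableCoder layouts (framing is an
    assumption based on the Java SDK, not the authoritative format)."""
    out = bytearray()
    if length_known:
        elements = list(elements)
        out += struct.pack(">i", len(elements))   # known-length prefix
        for e in elements:
            out += encode_element(e)
    else:
        out += struct.pack(">i", -1)              # unknown-length marker
        for e in elements:
            out += bytes([1])                     # batch of one element
            out += encode_element(e)
        out += bytes([0])                         # empty batch terminates
    return bytes(out)
```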

> Make IterableCoder binary compatible across SDKs
> 
>
> Key: BEAM-1471
> URL: https://issues.apache.org/jira/browse/BEAM-1471
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>
> Ensure IterableCoder across SDKs binary compatible and add tests.





[jira] [Commented] (BEAM-1448) Coder encode/decode context documentation is lacking

2017-02-09 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860860#comment-15860860
 ] 

Vikas Kedigehalli commented on BEAM-1448:
-

Some more discussion on this thread, 
https://lists.apache.org/list.html?*@beam.apache.org:lte=1M:Questions%20about%20coders

> Coder encode/decode context documentation is lacking
> 
>
> Key: BEAM-1448
> URL: https://issues.apache.org/jira/browse/BEAM-1448
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>  Labels: documentation
>
> Coder encode/decode context documentation is lacking.
> * Documentation of {{Coder}} methods {{encode}} and {{decode}} should include 
> description of {{context}} argument and explain how to relate to it when 
> implementing.
> * Consider renaming the static {{Context}} values {{NESTED}} and {{OUTER}} to 
> more accurate names.
> * Emphasize the use of CoderProperties as the best way to test a coder.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/fbd2d6b869ac2b0225ec39461b14158a03f304a930782d39ac9a60a6@%3Cdev.beam.apache.org%3E]
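To make the {{NESTED}}/{{OUTER}} distinction concrete, a minimal sketch (illustrative framing, not Beam's actual coder classes): in the outer context a value owns the remainder of the stream and needs no delimiter, while nested inside another value it must be length-prefixed so the decoder knows where it ends:

```python
import struct

def encode_bytes(value, nested):
    """Encode a bytes payload; `nested` plays the role of the context arg."""
    if nested:
        # Nested: more data follows, so length-prefix the payload.
        return struct.pack(">i", len(value)) + value
    # Outer: end-of-stream delimits the value; write the raw bytes.
    return value

def decode_bytes(stream, nested):
    """Return (value, remaining_stream)."""
    if nested:
        (n,) = struct.unpack(">i", stream[:4])
        return stream[4:4 + n], stream[4 + n:]
    return stream, b""  # outer: the value consumes everything
```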





[jira] [Commented] (BEAM-1225) Add a ToString transform in Java SDK

2016-12-29 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786506#comment-15786506
 ] 

Vikas Kedigehalli commented on BEAM-1225:
-

Common as in for text-based sinks, debugging, etc., and not necessarily to be 
used in place of machine-readable output. There is a long discussion and a 
proposal here: 
https://lists.apache.org/thread.html/caodmqrgukodgvhfx7fbfeikix8ezojtgpjcews6cr1knuu8...@mail.gmail.com
 

> Add a ToString transform in Java SDK
> 
>
> Key: BEAM-1225
> URL: https://issues.apache.org/jira/browse/BEAM-1225
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Vikas Kedigehalli
>Assignee: Vikas Kedigehalli
>Priority: Minor
>
> It is a common pattern in Beam pipelines to convert a PCollection to a 
> PCollection of Strings. It involves the pipeline author having to write a 
> SimpleFunction or DoFn that just calls the 'toString' method on each element 
> and then use it through MapElements or a ParDo. Having a ToString transform 
> would help avoid writing this boilerplate code. 





[jira] [Created] (BEAM-1225) Add a ToString transform in Java SDK

2016-12-28 Thread Vikas Kedigehalli (JIRA)
Vikas Kedigehalli created BEAM-1225:
---

 Summary: Add a ToString transform in Java SDK
 Key: BEAM-1225
 URL: https://issues.apache.org/jira/browse/BEAM-1225
 Project: Beam
  Issue Type: Improvement
Reporter: Vikas Kedigehalli
Assignee: Vikas Kedigehalli
Priority: Minor


It is a common pattern in Beam pipelines to convert a PCollection to a 
PCollection of Strings. It involves the pipeline author having to write a 
SimpleFunction or DoFn that just calls the 'toString' method on each element 
and then use it through MapElements or a ParDo. Having a ToString transform 
would help avoid writing this boilerplate code. 


