[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-07 Thread Colin Bookman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041124#comment-16041124
 ] 

Colin Bookman commented on BEAM-2418:
-

This particular issue was fixed by using the shadowJar plugin to merge the 
service files, and by upgrading Gradle from version 1.4 to 3.5.
 
https://gist.github.com/cobookman/e30268b4cfa8d0cebbd1e4ae8ef848f0 

Thanks for the help.

> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-extensions, sdk-java-gcp
> Affects Versions: 2.0.0
> Reporter: Stephen Sisk
> Assignee: Luke Cwik
> Priority: Blocker
> Fix For: Not applicable
>
>
> We have user reports that DatastoreIO does not work when they try to use it.
> We believe this is a result of our effort to minimize our dependencies in the 
> core SDK (protobuf in this case). ProtoCoder is not registered by default, so 
> a user would need to explicitly include 'beam-sdks-java-extensions-protobuf' 
> in their Maven dependencies to get it. 
> We need to confirm it, but if so, we will probably need to fix this in the 
> next release so that ProtoCoder is available when using DatastoreIO.
> cc [~vikasrk]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-07 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040457#comment-16040457
 ] 

Luke Cwik commented on BEAM-2418:
-

The way you are building your jar file is broken: it includes multiple copies 
of the same CoderProviderRegistrar service file. Java does not understand jar 
files that contain multiple copies of the same file. In this specific case the 
three *META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar* 
files should have been concatenated into one. 

{code}
lcwik@lcwik0:~/beam2418$ jar tvf dataflow-teleport-1.0-Alpha.jar | grep META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar
    71 Fri May 12 17:03:24 PDT 2017 META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar
   150 Fri May 12 16:56:14 PDT 2017 META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar
   130 Fri May 12 17:03:38 PDT 2017 META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar
{code}
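
For anyone who wants to run the same check from Java rather than with `jar tvf | grep`, here is a minimal sketch using only the JDK. The class and method names (`JarEntryCounter`, `countEntries`) are made up for illustration and are not part of Beam. It assumes the jar can be read sequentially with `ZipInputStream`, which reports each entry in archive order, so repeated names are counted individually.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Beam: counts jar entries with a given name.
class JarEntryCounter {

    /** Counts how many entries in the zip/jar stream carry exactly the given name. */
    static int countEntries(InputStream jarStream, String entryName) throws IOException {
        int count = 0;
        // ZipInputStream walks the archive entry by entry, so an entry name
        // that appears several times is seen several times.
        try (ZipInputStream zip = new ZipInputStream(jarStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar; args[1]: entry name to look for.
        Path jar = Paths.get(args[0]);
        try (InputStream in = Files.newInputStream(jar)) {
            System.out.println(countEntries(in, args[1]) + " entries named " + args[1]);
        }
    }
}
```

A count greater than one for a `META-INF/services/...` entry would point to the same packaging problem shown in the listing above.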

The culprit seems to be that the build file in your project is incorrectly 
assembling the *uber* jar:
{code}
task uberjar(type: Jar) {
    from files(sourceSets.main.output.classesDir)
    from {configurations.compile.collect {zipTree(it)}} {
        exclude "META-INF/*.SF"
        exclude "META-INF/*.DSA"
        exclude "META-INF/*.RSA"
    }
    manifest {
        attributes 'Main-Class': mainClassName
    }
}
{code}

Please take a look at the shadow gradle plugin and this section of their 
documentation about merging resources (specifically 2.7.1. Merging Service 
Descriptor Files): 
http://imperceptiblethoughts.com/shadow/#controlling_jar_content_merging
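
To make concrete what "merging service descriptor files" means, here is a rough Java sketch of the idea. It is not the shadow plugin's implementation. The class name `ServiceFileMerge` and the provider names in `main` are hypothetical. The sketch concatenates the descriptors, strips `#` comments and blank lines, and keeps each provider class name once.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of service-descriptor merging; not the shadow plugin's code.
class ServiceFileMerge {

    /**
     * Merges the text of several service descriptor files into one, keeping
     * each provider class name once, in first-seen order.
     */
    static String merge(List<String> descriptorContents) {
        Set<String> providers = new LinkedHashSet<>();
        for (String content : descriptorContents) {
            for (String line : content.split("\\R")) {
                // Everything after '#' is a comment in a service descriptor.
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return String.join("\n", providers) + "\n";
    }

    public static void main(String[] args) {
        // Made-up provider names, standing in for the three registrar files.
        String merged = merge(Arrays.asList(
                "com.example.FirstRegistrar\n",
                "com.example.SecondRegistrar\n# a comment\n",
                "com.example.ThirdRegistrar\ncom.example.FirstRegistrar\n"));
        System.out.print(merged);
    }
}
```

With a single merged descriptor in the uber jar, every registrar listed by any of the original jars can be found by the service lookup.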

> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-extensions, sdk-java-gcp
> Affects Versions: 2.0.0
> Reporter: Stephen Sisk
> Assignee: Vikas Kedigehalli
> Priority: Blocker
> Fix For: Not applicable


[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Colin Bookman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040093#comment-16040093
 ] 

Colin Bookman commented on BEAM-2418:
-

[~lcwik], the ProtobufCoderProviderRegistrar is included in the jar file. 
Here's the jar file in question: 
https://storage.googleapis.com/beam-dataflowio-bucket/dataflow-teleport-1.0-Alpha.jar
 

Here's the entire code for the Beam pipeline: 
https://github.com/cobookman/DatastoreToGCS/tree/beam
Here's the build script I'm running that produces the error: 
https://github.com/cobookman/DatastoreToGCS/blob/beam/scripts/datastore_to_gcs.sh



> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-extensions, sdk-java-gcp
> Affects Versions: 2.0.0
> Reporter: Stephen Sisk
> Assignee: Vikas Kedigehalli
> Priority: Blocker
> Fix For: 2.1.0


[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039838#comment-16039838
 ] 

Luke Cwik commented on BEAM-2418:
-

[~bookman_google] Does build/libs/*.jar contain a jar representing 
beam-sdks-java-extensions-protobuf?
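
A question like this can also be checked from inside the JVM. The following is a minimal sketch with a hypothetical class name (`ServiceFileLocator`) that lists every classpath location providing the registrar descriptor, using `ClassLoader.getResources`. Note the assumption that each classpath entry typically contributes at most one URL per resource name, so duplicate copies inside a single uber jar would not show up separately here.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Beam.
class ServiceFileLocator {

    /** Returns the URL of every classpath location that provides the named resource. */
    static List<URL> locate(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        String name = "META-INF/services/org.apache.beam.sdk.coders.CoderProviderRegistrar";
        // Prints one line per classpath location that ships this descriptor.
        for (URL url : locate(name)) {
            System.out.println(url);
        }
    }
}
```

Running it with the same classpath as the pipeline shows which jars contribute a registrar descriptor.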



[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Colin Bookman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039733#comment-16039733
 ] 

Colin Bookman commented on BEAM-2418:
-

Same issue. I tried with the following arguments:


java -jar build/libs/*.jar \
  --runner=DataflowRunner \
  --project=my-project \
  --stagingLocation=gs://my-project.appspot.com/staging/ \
  --tempLocation=gs://my-project.appspot.com/temp/


```
Jun 06, 2017 2:57:37 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 1 files. Enable logging at DEBUG level to see which files will be staged.
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for IngestEntities/ParDo(GqlQueryTranslate)/ParMultiDo(GqlQueryTranslate).out0 [PCollection]. Correct one of the following root causes:
  No Coder has been manually specified;  you may do so using .setCoder().
  Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder for com.google.datastore.v1.Query.
  Building a Coder using a registered CoderProvider failed.
  See suppressed exceptions for detailed failures.
  Using the default output Coder from the producing PTransform failed: Unable to provide a Coder for com.google.datastore.v1.Query.
  Building a Coder using a registered CoderProvider failed.
  See suppressed exceptions for detailed failures.
    at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
    at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:250)
    at org.apache.beam.sdk.values.PCollection.finishSpecifying(PCollection.java:104)
    at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:147)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:481)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:422)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:277)
    at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read.expand(DatastoreV1.java:581)
    at org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$Read.expand(DatastoreV1.java:226)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:482)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:441)
    at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
    at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:179)
    at com.google.cloud.dataflow.teleport.DatastoreToGcs.main(DatastoreToGcs.java:50)
    at com.google.cloud.dataflow.teleport.Main.main(Main.java:50)
```

> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-extensions, sdk-java-gcp
> Affects Versions: 2.0.0
> Reporter: Stephen Sisk
> Assignee: Davor Bonaci
> Priority: Blocker
> Fix For: 2.1.0


[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039467#comment-16039467
 ] 

Vikas Kedigehalli commented on BEAM-2418:
-

[~bookman_google] could you try running it without templates (by passing query 
and other options via command line arguments) and see if it works? 


[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Colin Bookman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039392#comment-16039392
 ] 

Colin Bookman commented on BEAM-2418:
-

If it helps, I'm trying to build this as a template. Here's my command line:

java -jar build/libs/*.jar \
  --runner=DataflowRunner \
  --project=my-project \
  --stagingLocation=gs://my-project.appspot.com/staging/ \
  --tempLocation=gs://my-project.appspot.com/temp/ \
  --templateLocation=gs://my-project.appspot.com/templates/




[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Vikas Kedigehalli (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039324#comment-16039324
 ] 

Vikas Kedigehalli commented on BEAM-2418:
-

Looks like we do include 'beam-sdks-java-extensions-protobuf' 
(https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/pom.xml#L76), 
and we also have integration tests that pass 
(https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/datastore/V1ReadIT.java#L105).

Taking a look.



[jira] [Commented] (BEAM-2418) Datastore IO does not work out of the box

2017-06-06 Thread Colin Bookman (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039282#comment-16039282
 ] 

Colin Bookman commented on BEAM-2418:
-

I tried adding `compile group: 'org.apache.beam', name: 'beam-sdks-java-extensions-protobuf', version: '2.0.0'` 
to my build. That still did not solve the issue.

Here's a gist that shows the stack trace, java code, and gradle build file: 
https://gist.github.com/cobookman/e4d2f2b89b4c3cadae9cd83892162758



> Datastore IO does not work out of the box
> -
>
> Key: BEAM-2418
> URL: https://issues.apache.org/jira/browse/BEAM-2418
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-extensions, sdk-java-gcp
> Affects Versions: 2.0.0
> Reporter: Stephen Sisk
> Assignee: Davor Bonaci