Re: [External] : Re: outputReceiver.output() does not emit the result immediately

2021-01-26 Thread yu . b . zhang
Hi Boyuan, 

Thanks for replying. We are using Beam 2.25.0 and the direct runner for testing. 
We are trying to develop an unbounded streaming service connector with a 
splittable DoFn. In our connector.read(), we want to commit the message back to 
the stream after outputting the record to the downstream user pipeline. The read 
and user pipeline look like this:
public class Connector {
  public static Connector.Read read() {
    return new AutoValue_Connector_Read.Builder()
        .setStream("")
        .setStreamPartitions(Collections.singletonList(0))
        .build();
  }

  @AutoValue
  public abstract static class Read extends PTransform<PBegin, PCollection<Record>> {
    @Override
    public PCollection<Record> expand(PBegin input) {
      PCollection<SourceDescriptor> output = input.getPipeline()
          .apply(Impulse.create())
          .apply(ParDo.of(new GenerateSourceDescriptor(this)));

      // then apply the SDF read DoFn on it
      return output.apply(ParDo.of(new ReadDoFn(this)));
    }
  }
}

@DoFn.UnboundedPerElement
class ReadDoFn extends DoFn<SourceDescriptor, Record> {
  @ProcessElement
  public ProcessContinuation processElement(
      @Element SourceDescriptor sourceDescriptor,
      RestrictionTracker tracker,
      OutputReceiver<Record> receiver) {

    while (true) {
      List<Message> messages = getMessageFromStream(cursor);
      if (messages.isEmpty()) {
        return DoFn.ProcessContinuation.resume();
      }
      for (Message message : messages) {
        if (!tracker.tryClaim(message)) {
          return DoFn.ProcessContinuation.stop();
        }

        Record record = new Record(message);
        // output to user pipeline
        receiver.outputWithTimestamp(record, Instant.now());
      }
      // commit this batch of messages and get the updated cursor
      // to read the next batch of messages
      cursor = commitMessage();
    }
  }
}

// User pipeline that uses Connector.read() to read from the stream

class UserPipline {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
p.getOptions().as(StreamingOptions.class).setStreaming(true);

PCollection>
output =
p.apply("Read Stream", Connector.read().setStream("stream1"))
.apply("Log Record", ParDo.of(new DoFn>() {
@ProcessElement
public void processElement(@Element Record input, 
OutputReceiver> out) {
System.out.printf("[User Pipeline] received offset %s : %s : %s \n",
input.getOffset(), input.getKV().getKey(), input.getKV().getValue());
out.output(input.getKV());
}
}));
}
}
Since we commit the messages after `outputReceiver.output()` and use the cursor 
from the commit response to fetch the next messages, the following can happen: 
if `outputReceiver.output()` does not emit immediately and instead buffers 
messages 0, 1, 2, and the user pipeline then stops and restarts, messages 0 and 
1 are lost, because `outputReceiver.output()` has not emitted them but they have 
already been committed in the connector.

Is this the expected behavior of `outputReceiver.output()`? If so, how can we 
properly commit the messages (checkpoint) in the connector so that downstream 
does not lose messages when starting over?

Thanks,
Yu
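
One common way to avoid the loss described above is to register the commit as a
callback and run it only after the runner reports that the bundle's output has
been durably persisted. Beam's Java SDK offers a `BundleFinalizer` parameter for
`@ProcessElement` for this kind of callback; whether it fits this connector on
the direct runner in 2.25.0 is not confirmed here. The sketch below models only
the ordering, in plain Java with no Beam dependency. `FakeStream`,
`BundleSketch`, and `CommitAfterFinalizeDemo` are hypothetical names made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the stream client; not part of Beam or any real SDK. */
class FakeStream {
  long committedOffset = -1; // last offset acknowledged to the stream

  void commit(long offset) {
    committedOffset = Math.max(committedOffset, offset);
  }
}

/** Models one bundle: outputs are buffered and commits wait for finalization. */
class BundleSketch {
  private final FakeStream stream;
  private final List<Runnable> finalizers = new ArrayList<>();
  final List<Long> emitted = new ArrayList<>();

  BundleSketch(FakeStream stream) {
    this.stream = stream;
  }

  /** Emit a record and register (but do not run) the commit for its offset. */
  void process(long offset) {
    emitted.add(offset);
    finalizers.add(() -> stream.commit(offset));
  }

  /** Invoked only once the bundle's output is assumed to be durably persisted. */
  void onBundleSuccess() {
    finalizers.forEach(Runnable::run);
    finalizers.clear();
  }
}

class CommitAfterFinalizeDemo {
  public static void main(String[] args) {
    FakeStream stream = new FakeStream();
    BundleSketch bundle = new BundleSketch(stream);
    bundle.process(0);
    bundle.process(1);
    bundle.process(2);
    // A restart at this point would re-read from offset 0: nothing is committed yet.
    System.out.println("before finalize: " + stream.committedOffset);
    bundle.onBundleSuccess();
    System.out.println("after finalize: " + stream.committedOffset);
  }
}
```

With this ordering a restart can replay messages that were output but not yet
committed, so downstream may see duplicates, but committed messages are never
ones that were still sitting in a buffer.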
 

> On Jan 26, 2021, at 10:13, Boyuan Zhang  wrote:
> 
> +dev  
> 
> Hi Yu,
> Which runner are you using for your pipeline? Also it would be helpful to 
> share your pipeline code as well.
> 
> On Mon, Jan 25, 2021 at 10:19 PM  wrote:
> Hi Beam Community,
> 
> I have a splittable `DoFn` that reads messages from a stream and outputs the 
> results downstream. The pseudocode looks like:
> @DoFn.ProcessElement
> public DoFn.ProcessContinuation processElement(
>     @DoFn.Element SourceDescriptor sourceDescriptor,
>     RestrictionTracker tracker,
>     WatermarkEstimator watermarkEstimator,
>     DoFn.OutputReceiver receiver) throws Exception {
>   while (true) {
>     messages = getMessageFromStream();
>     if (messages.isEmpty()) {
>       return DoFn.ProcessContinuation.resume();
>     }
>     for (message : messages) {
>       if (!tracker.tryClaim(message)) {
>         return DoFn.ProcessContinuation.stop();
>       }
>       record = Record(message);
>       receiver.outputWithTimestamp(record, message.getTimestamp());
>     }
>   }
> }
> 
> I expected to see the 

Re: Problems building latest beam source

2021-01-26 Thread Kyle Weaver
This should be fixed now.

On Tue, Jan 26, 2021 at 10:23 AM Kyle Weaver  wrote:

> I missed this thread and filed a JIRA for it:
> https://issues.apache.org/jira/browse/BEAM-11689
>
> > We could swap the spring.io repo for the pentaho nexus one
> public.nexus.pentaho.org (here probably in Beam
> https://github.com/apache/beam/blob/67989cafecf3f5d4bce879e9f6b9a690955e84d5/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy#L45
> )
>
> +1 I agree this is the correct solution here.
>
> On Tue, Jan 26, 2021 at 10:01 AM Tyson Hamilton 
> wrote:
>
>> I saw this issue on another OSS repo (
>> https://github.com/apache/hudi/issues/2479), they just removed the
>> spring.io repository but I'm not sure where their dep would come from
>> then. We could swap the spring.io repo for the pentaho nexus one
>> public.nexus.pentaho.org (here probably in Beam
>> https://github.com/apache/beam/blob/67989cafecf3f5d4bce879e9f6b9a690955e84d5/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy#L45
>> )
>>
>> On Mon, Jan 25, 2021 at 2:19 PM Tomo Suzuki  wrote:
>>
>>> I'm seeing the same problem.
>>>
>>> * What went wrong:
>>> Execution failed for task ':sdks:java:io:hcatalog:compileJava'.
>>> > Could not resolve all files for configuration
>>> ':sdks:java:io:hcatalog:provided'.
>>>> Could not resolve
>>> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
>>>  Required by:
>>>  project :sdks:java:io:hcatalog >
>>> org.apache.hive:hive-exec:2.1.0 > org.apache.calcite:calcite-core:1.6.0
>>>   > Could not resolve
>>> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
>>>  > Could not get resource '
>>> https://repo.spring.io/plugins-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom
>>> '.
>>> > Could not HEAD '
>>> https://repo.spring.io/plugins-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom'.
>>> Received status code 401 from server: Unauthorized
>>>
>>> On Mon, Jan 25, 2021 at 10:34 AM Steve Niemitz 
>>> wrote:
>>>
>>>> I ran into issues
>>>> resolving org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde, it looks
>>>> like the spring repo hosting it has locked the artifact, I get a 401
>>>> unauthorized trying to download it.
>>>>
>>>> Has anyone else run into this?  I assume many people have the artifact
>>>> cached locally and so haven't run into it yet.
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>


Re: [Proposal] Release Guide improvements

2021-01-26 Thread Robert Burke
Dang it. Good catch. Thanks.
https://issues.apache.org/jira/browse/BEAM-11693 is the right JIRA.

On Tue, Jan 26, 2021 at 3:56 PM Robert Burke  wrote:

> Hello Beamers!
>
> I just filed a JIRA BEAM-11217 [1] to make some changes and update the
> content of the release guide, based on my experiences with running the
> 2.26.0 release.
>
> Overall, the content is pretty thorough, but it's been added to
> organically over time, and is due for a bit of a cleanup. My big issues
> were around consistency within the document itself, that not all pre-reqs
> for building artifacts are declared, and that it's very easy to lose track
> of what needs to be done. None are too hard to resolve, just need a few
> passes to clean them up. Who better than someone who recently ran through
> it (me)?
>
> I've got a PR 13815 [2] out now to address initial formatting issues which
> for the most part does not affect the content. The main visible change is
> the email template is fixed (it's not presently displayed), among a few
> minor content additions.
>
> After that, I'd like to rewrite the introduction to clarify the release
> process and goals (generate and publish artifacts), and clean up some of
> the constants and such we will use. Overall the goal is to make it easier
> for any committer to read the document and possibly run a release. There
> are certainly other changes we can do (like make a release environment
> container)
>
> I understand our wonderful tech writing folks are pretty busy right now
> with the other website changes. I certainly welcome their input, but likely
> don't need per-PR review for most of these changes.
>
> 
>
> One bigger change to the content that certainly warrants discussion here:
> I'd like to remove the manual command level copies of the automation
> scripts in the release document.
>
> My reasoning: They are redundant, leading to lengthy descriptions that are
> skipped if running the scripts in question. They're often not kept in sync
> with the scripts themselves. I propose we remove the copies from the guide
> *and* improve the documentation in the scripts themselves for those who
> wish to execute the command manually.  This will better document what the
> scripts are doing, and avoids the redundancy, and the errors they can lead
> to.
>
> 
>
> If you'd like to be included on individual reviews, please let me know,
> and I'll add you to the various PRs. Otherwise, I'll be leaning on Pablo
> (2.27.0 release manager) and Cham (2.28.0 release manager) for reviews.
>
> Thank you for your time!
> Robert Burke (2.26.0 Release manager)
>
>
> [1] https://issues.apache.org/jira/browse/BEAM-11217
> [2] https://github.com/apache/beam/pull/13815
>


Re: [Proposal] Release Guide improvements

2021-01-26 Thread Kyle Weaver
> One bigger change to the content that certainly warrants discussion here:
I'd like to remove the manual command level copies of the automation
scripts in the release document.

I approve. I already removed a large chunk of the copies in [1], but I may
have missed some places.

I think BEAM-11217 is the wrong jira?

[1] https://github.com/apache/beam/pull/11764

On Tue, Jan 26, 2021 at 3:56 PM Robert Burke  wrote:

> Hello Beamers!
>
> I just filed a JIRA BEAM-11217 [1] to make some changes and update the
> content of the release guide, based on my experiences with running the
> 2.26.0 release.
>
> Overall, the content is pretty thorough, but it's been added to
> organically over time, and is due for a bit of a cleanup. My big issues
> were around consistency within the document itself, that not all pre-reqs
> for building artifacts are declared, and that it's very easy to lose track
> of what needs to be done. None are too hard to resolve, just need a few
> passes to clean them up. Who better than someone who recently ran through
> it (me)?
>
> I've got a PR 13815 [2] out now to address initial formatting issues which
> for the most part does not affect the content. The main visible change is
> the email template is fixed (it's not presently displayed), among a few
> minor content additions.
>
> After that, I'd like to rewrite the introduction to clarify the release
> process and goals (generate and publish artifacts), and clean up some of
> the constants and such we will use. Overall the goal is to make it easier
> for any committer to read the document and possibly run a release. There
> are certainly other changes we can do (like make a release environment
> container)
>
> I understand our wonderful tech writing folks are pretty busy right now
> with the other website changes. I certainly welcome their input, but likely
> don't need per-PR review for most of these changes.
>
> 
>
> One bigger change to the content that certainly warrants discussion here:
> I'd like to remove the manual command level copies of the automation
> scripts in the release document.
>
> My reasoning: They are redundant, leading to lengthy descriptions that are
> skipped if running the scripts in question. They're often not kept in sync
> with the scripts themselves. I propose we remove the copies from the guide
> *and* improve the documentation in the scripts themselves for those who
> wish to execute the command manually.  This will better document what the
> scripts are doing, and avoids the redundancy, and the errors they can lead
> to.
>
> 
>
> If you'd like to be included on individual reviews, please let me know,
> and I'll add you to the various PRs. Otherwise, I'll be leaning on Pablo
> (2.27.0 release manager) and Cham (2.28.0 release manager) for reviews.
>
> Thank you for your time!
> Robert Burke (2.26.0 Release manager)
>
>
> [1] https://issues.apache.org/jira/browse/BEAM-11217
> [2] https://github.com/apache/beam/pull/13815
>


[Proposal] Release Guide improvements

2021-01-26 Thread Robert Burke
Hello Beamers!

I just filed a JIRA BEAM-11217 [1] to make some changes and update the
content of the release guide, based on my experiences with running the
2.26.0 release.

Overall, the content is pretty thorough, but it's been added to organically
over time, and is due for a bit of a cleanup. My big issues were around
consistency within the document itself, that not all pre-reqs for building
artifacts are declared, and that it's very easy to lose track of what needs
to be done. None are too hard to resolve, just need a few passes to clean
them up. Who better than someone who recently ran through it (me)?

I've got a PR 13815 [2] out now to address initial formatting issues which
for the most part does not affect the content. The main visible change is
the email template is fixed (it's not presently displayed), among a few
minor content additions.

After that, I'd like to rewrite the introduction to clarify the release
process and goals (generate and publish artifacts), and clean up some of
the constants and such we will use. Overall the goal is to make it easier
for any committer to read the document and possibly run a release. There
are certainly other changes we can make (like building a release environment
container).

I understand our wonderful tech writing folks are pretty busy right now
with the other website changes. I certainly welcome their input, but likely
don't need per-PR review for most of these changes.



One bigger change to the content that certainly warrants discussion here:
I'd like to remove the manual command level copies of the automation
scripts in the release document.

My reasoning: They are redundant, leading to lengthy descriptions that are
skipped if running the scripts in question. They're often not kept in sync
with the scripts themselves. I propose we remove the copies from the guide
*and* improve the documentation in the scripts themselves for those who
wish to execute the command manually.  This will better document what the
scripts are doing, and avoids the redundancy, and the errors they can lead
to.



If you'd like to be included on individual reviews, please let me know, and
I'll add you to the various PRs. Otherwise, I'll be leaning on Pablo
(2.27.0 release manager) and Cham (2.28.0 release manager) for reviews.

Thank you for your time!
Robert Burke (2.26.0 Release manager)


[1] https://issues.apache.org/jira/browse/BEAM-11217
[2] https://github.com/apache/beam/pull/13815


Re: Beam support Flink Async I/O operator

2021-01-26 Thread Boyuan Zhang
+dev 

On Tue, Jan 26, 2021 at 1:07 PM Eleanore Jin  wrote:

> Hi community,
>
> Does Beam support Flink Async I/O operator? if so, can you please share
> the doc, and if not, is there any workaround to achieve the same in Beam
> semantics?
>
>
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673
>
> Thanks a lot!
> Eleanore
>


Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Burke
I believe so.

In most instances the Go SDK requires users to register their DoFns at
package init time, keyed by the type's or function's fully qualified path as
determined by Go, which is consistent across architectures, at least with
the standard toolchain.

Those strings are used to look things up on distributed workers, regardless
of the architecture.
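
The idea of looking functions up by a stable name rather than by an address in
the binary can be sketched as follows. This is written in Java only to match
the other code in this digest; it is not the Go SDK's API. `FnRegistry`,
`RegistryDemo`, and the name `example.com/pipeline.upperFn` are made up for
illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry keyed by a stable, fully qualified name. */
class FnRegistry {
  private static final Map<String, Function<String, String>> FNS = new HashMap<>();

  static void register(String qualifiedName, Function<String, String> fn) {
    FNS.put(qualifiedName, fn);
  }

  static Function<String, String> lookup(String qualifiedName) {
    Function<String, String> fn = FNS.get(qualifiedName);
    if (fn == null) {
      throw new IllegalStateException("not registered: " + qualifiedName);
    }
    return fn;
  }
}

class RegistryDemo {
  // Made-up name standing in for a fully qualified function path.
  static final String UPPER = "example.com/pipeline.upperFn";

  /** Plays the role of package init: it runs the same way in every binary. */
  static void init() {
    FnRegistry.register(UPPER, s -> s.toUpperCase());
  }

  public static void main(String[] args) {
    init();
    // The launcher ships only the string; the worker resolves it locally.
    System.out.println(FnRegistry.lookup(UPPER).apply("beam"));
  }
}
```

Because only the string crosses the wire, the launcher and the worker can be
built for different architectures as long as both run the same registration
code at startup.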



On Tue, Jan 26, 2021, 11:33 AM Robert Bradshaw  wrote:

> Cool. Are DoFn (et al) references compatible across cross-compiled
> binaries?
>
> On Tue, Jan 26, 2021 at 11:23 AM Robert Burke  wrote:
>
>> Go cross compilation is as simple as setting the right flag env variables
>> [1], but can be as complicated as requiring a cross compiling GCC instance
>> installed if CGO[2] is necessary. I think we're probably clear on just
>> needing the flag though for the various Boot executables.
>>
>> For go pipelines we'd need to update the shared runner code to support
>> selecting the cross compiled worker binary environment. I believe it's hard
>> set to amd64 linux at present, but that's a separate issue.
>>
>> [1] https://golangcookbook.com/chapters/running/cross-compiling/
>> [2] https://golang.org/cmd/cgo/
>>
>> On Tue, Jan 26, 2021, 10:25 AM Robert Bradshaw 
>> wrote:
>>
>>> +1
>>>
>>> I don't think it would be that hard to build and release arm-based
>>> docker images. (Perhaps just a matter of changing the docker file to depend
>>> on a different base, and doing some cross-compile. That would suss out
>>> whether we're inadvertently taking on any incompatible dependencies.)
>>>
>>> Theoretically, if one does that and manually specifies the container, it
>>> could just work for Python (assuming no wheel files are specified as manual
>>> dependencies). For Java, if one builds/deploys an uberjar (on a different
>>> architecture), there may be issues in any transitive dependency that has
>>> JNI code (us or users). I'd imagine this issue is common to and being
>>> explored by many of the other Java big data systems in use; it'd be
>>> interesting to know what solutions are out there.
>>>
>>> For go, the executable is uploaded directly into the container. We'd
>>> probably have to do something fancier like cross-compiling the executable
>>> (and making sure the UserFn references, which I think are just pointers
>>> into the binary, still work if the launcher is one architecture and the
>>> workers another).
>>>
>>> Definitely worth exploring.
>>>
>>>
>>>
>>>
>>> On Tue, Jan 26, 2021 at 10:09 AM Ismaël Mejía  wrote:
>>>
>>>> I stumbled today on this user request:
>>>> BEAM-10982 Wheel support for linux aarch64
>>>>
>>>> It made me wonder if with the advent of ARM64 processors not only in
>>>> the client but server side (Graviton and others) if it is worth that
>>>> we start to think about having support for this architecture on the
>>>> python installers and in the docker images. It seems that for the
>>>> latter it should not be that difficult given that our parent images
>>>> are already multi-arch.
>>>>
>>>> Are there some possible issues or binary/platform specific
>>>> dependencies that impede us from doing this?
>>>>
>>>


Re: Contributor permission for Beam Jira

2021-01-26 Thread Pablo Estrada
Welcome, Jamie!
And thanks for the contribution! I've marked you as a contributor on JIRA,
so you can assign issues to yourself now.
Best
-P.

On Tue, Jan 26, 2021 at 9:47 AM Jamie Thomson 
wrote:

> Hello,
>
>
>
> I am exploring Beam ahead of a new job I’m starting soon in which I’ll be
> using Beam/Dataflow. I’ve spotted a small typo in the Beam docs that I’d
> like to correct, as (a) it’s rather misleading and (b) correcting typos is a
> nice, easy way of getting involved in Beam.
>
> Can I be added as a contributor to Beam’s Jira issue tracker? My ASF Jira
> username is “jamet”.
>
>
>
> Regards
>
> Jamie
>


Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Bradshaw
Cool. Are DoFn (et al) references compatible across cross-compiled
binaries?

On Tue, Jan 26, 2021 at 11:23 AM Robert Burke  wrote:

> Go cross compilation is as simple as setting the right flag env variables
> [1], but can be as complicated as requiring a cross compiling GCC instance
> installed if CGO[2] is necessary. I think we're probably clear on just
> needing the flag though for the various Boot executables.
>
> For go pipelines we'd need to update the shared runner code to support
> selecting the cross compiled worker binary environment. I believe it's hard
> set to amd64 linux at present, but that's a separate issue.
>
> [1] https://golangcookbook.com/chapters/running/cross-compiling/
> [2] https://golang.org/cmd/cgo/
>
> On Tue, Jan 26, 2021, 10:25 AM Robert Bradshaw 
> wrote:
>
>> +1
>>
>> I don't think it would be that hard to build and release arm-based docker
>> images. (Perhaps just a matter of changing the docker file to depend on a
>> different base, and doing some cross-compile. That would suss out whether
>> we're inadvertently taking on any incompatible dependencies.)
>>
>> Theoretically, if one does that and manually specifies the container, it
>> could just work for Python (assuming no wheel files are specified as manual
>> dependencies). For Java, if one builds/deploys an uberjar (on a different
>> architecture), there may be issues in any transitive dependency that has
>> JNI code (us or users). I'd imagine this issue is common to and being
>> explored by many of the other Java big data systems in use; it'd be
>> interesting to know what solutions are out there.
>>
>> For go, the executable is uploaded directly into the container. We'd
>> probably have to do something fancier like cross-compiling the executable
>> (and making sure the UserFn references, which I think are just pointers
>> into the binary, still work if the launcher is one architecture and the
>> workers another).
>>
>> Definitely worth exploring.
>>
>>
>>
>>
>> On Tue, Jan 26, 2021 at 10:09 AM Ismaël Mejía  wrote:
>>
>>> I stumbled today on this user request:
>>> BEAM-10982 Wheel support for linux aarch64
>>>
>>> It made me wonder if with the advent of ARM64 processors not only in
>>> the client but server side (Graviton and others) if it is worth that
>>> we start to think about having support for this architecture on the
>>> python installers and in the docker images. It seems that for the
>>> latter it should not be that difficult given that our parent images
>>> are already multi-arch.
>>>
>>> Are there some possible issues or binary/platform specific
>>> dependencies that impede us from doing this?
>>>
>>


Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Burke
Go cross compilation is as simple as setting the right flag env variables
[1], but can be as complicated as requiring a cross compiling GCC instance
installed if CGO[2] is necessary. I think we're probably clear on just
needing the flag though for the various Boot executables.

For go pipelines we'd need to update the shared runner code to support
selecting the cross compiled worker binary environment. I believe it's hard
set to amd64 linux at present, but that's a separate issue.

[1] https://golangcookbook.com/chapters/running/cross-compiling/
[2] https://golang.org/cmd/cgo/

On Tue, Jan 26, 2021, 10:25 AM Robert Bradshaw  wrote:

> +1
>
> I don't think it would be that hard to build and release arm-based docker
> images. (Perhaps just a matter of changing the docker file to depend on a
> different base, and doing some cross-compile. That would suss out whether
> we're inadvertently taking on any incompatible dependencies.)
>
> Theoretically, if one does that and manually specifies the container, it
> could just work for Python (assuming no wheel files are specified as manual
> dependencies). For Java, if one builds/deploys an uberjar (on a different
> architecture), there may be issues in any transitive dependency that has
> JNI code (us or users). I'd imagine this issue is common to and being
> explored by many of the other Java big data systems in use; it'd be
> interesting to know what solutions are out there.
>
> For go, the executable is uploaded directly into the container. We'd
> probably have to do something fancier like cross-compiling the executable
> (and making sure the UserFn references, which I think are just pointers
> into the binary, still work if the launcher is one architecture and the
> workers another).
>
> Definitely worth exploring.
>
>
>
>
> On Tue, Jan 26, 2021 at 10:09 AM Ismaël Mejía  wrote:
>
>> I stumbled today on this user request:
>> BEAM-10982 Wheel support for linux aarch64
>>
>> It made me wonder if with the advent of ARM64 processors not only in
>> the client but server side (Graviton and others) if it is worth that
>> we start to think about having support for this architecture on the
>> python installers and in the docker images. It seems that for the
>> latter it should not be that difficult given that our parent images
>> are already multi-arch.
>>
>> Are there some possible issues or binary/platform specific
>> dependencies that impede us from doing this?
>>
>


Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Bradshaw
+1

I don't think it would be that hard to build and release arm-based docker
images. (Perhaps just a matter of changing the docker file to depend on a
different base, and doing some cross-compile. That would suss out whether
we're inadvertently taking on any incompatible dependencies.)

Theoretically, if one does that and manually specifies the container, it
could just work for Python (assuming no wheel files are specified as manual
dependencies). For Java, if one builds/deploys an uberjar (on a different
architecture), there may be issues in any transitive dependency that has
JNI code (us or users). I'd imagine this issue is common to and being
explored by many of the other Java big data systems in use; it'd be
interesting to know what solutions are out there.
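
To illustrate why JNI code in an uberjar is sensitive to architecture: such
libraries typically have to pick a native binary that matches the JVM's
`os.arch` at run time, so a jar that bundles only the build machine's binary
can fail on a worker with a different CPU. The sketch below shows that
selection step, assuming a made-up resource layout (`native/<arch>/lib<name>.so`)
and a hypothetical `NativeArch` class; real libraries use their own layouts.

```java
import java.util.Locale;

class NativeArch {
  /** Normalize a JVM os.arch value to a coarse architecture label. */
  static String normalize(String osArch) {
    String a = osArch.toLowerCase(Locale.ROOT);
    if (a.equals("amd64") || a.equals("x86_64")) {
      return "x86_64";
    }
    if (a.equals("aarch64") || a.equals("arm64")) {
      return "aarch64";
    }
    return "unknown";
  }

  /** Hypothetical resource path a JNI-backed library might load from inside a jar. */
  static String nativeResource(String libName, String osArch) {
    return "native/" + normalize(osArch) + "/lib" + libName + ".so";
  }

  public static void main(String[] args) {
    // The architecture is read from the running JVM, not from the build machine.
    String arch = System.getProperty("os.arch");
    System.out.println(nativeResource("example", arch));
  }
}
```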

For go, the executable is uploaded directly into the container. We'd
probably have to do something fancier like cross-compiling the executable
(and making sure the UserFn references, which I think are just pointers
into the binary, still work if the launcher is one architecture and the
workers another).

Definitely worth exploring.




On Tue, Jan 26, 2021 at 10:09 AM Ismaël Mejía  wrote:

> I stumbled today on this user request:
> BEAM-10982 Wheel support for linux aarch64
>
> It made me wonder if with the advent of ARM64 processors not only in
> the client but server side (Graviton and others) if it is worth that
> we start to think about having support for this architecture on the
> python installers and in the docker images. It seems that for the
> latter it should not be that difficult given that our parent images
> are already multi-arch.
>
> Are there some possible issues or binary/platform specific
> dependencies that impede us from doing this?
>


Re: Problems building latest beam source

2021-01-26 Thread Kyle Weaver
I missed this thread and filed a JIRA for it:
https://issues.apache.org/jira/browse/BEAM-11689

> We could swap the spring.io repo for the pentaho nexus one
public.nexus.pentaho.org (here probably in Beam
https://github.com/apache/beam/blob/67989cafecf3f5d4bce879e9f6b9a690955e84d5/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy#L45
)

+1 I agree this is the correct solution here.

On Tue, Jan 26, 2021 at 10:01 AM Tyson Hamilton  wrote:

> I saw this issue on another OSS repo (
> https://github.com/apache/hudi/issues/2479), they just removed the
> spring.io repository but I'm not sure where their dep would come from
> then. We could swap the spring.io repo for the pentaho nexus one
> public.nexus.pentaho.org (here probably in Beam
> https://github.com/apache/beam/blob/67989cafecf3f5d4bce879e9f6b9a690955e84d5/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy#L45
> )
>
> On Mon, Jan 25, 2021 at 2:19 PM Tomo Suzuki  wrote:
>
>> I'm seeing the same problem.
>>
>> * What went wrong:
>> Execution failed for task ':sdks:java:io:hcatalog:compileJava'.
>> > Could not resolve all files for configuration
>> ':sdks:java:io:hcatalog:provided'.
>>> Could not resolve
>> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
>>  Required by:
>>  project :sdks:java:io:hcatalog > org.apache.hive:hive-exec:2.1.0
>> > org.apache.calcite:calcite-core:1.6.0
>>   > Could not resolve
>> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
>>  > Could not get resource '
>> https://repo.spring.io/plugins-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom
>> '.
>> > Could not HEAD '
>> https://repo.spring.io/plugins-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom'.
>> Received status code 401 from server: Unauthorized
>>
>> On Mon, Jan 25, 2021 at 10:34 AM Steve Niemitz 
>> wrote:
>>
>>> I ran into issues
>>> resolving org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde, it looks
>>> like the spring repo hosting it has locked the artifact, I get a 401
>>> unauthorized trying to download it.
>>>
>>> Has anyone else run into this?  I assume many people have the artifact
>>> cached locally and so haven't run into it yet.
>>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: outputReceiver.output() does not emit the result immediately

2021-01-26 Thread Boyuan Zhang
+dev 

Hi Yu,
Which runner are you using for your pipeline? Also it would be helpful to
share your pipeline code as well.

On Mon, Jan 25, 2021 at 10:19 PM  wrote:

> Hi Beam Community,
>
> I have a splittable `DoFn` that reads messages from a stream and outputs
> the results downstream. The pseudocode looks like:
>
> @DoFn.ProcessElement
> public DoFn.ProcessContinuation processElement(
>     @DoFn.Element SourceDescriptor sourceDescriptor,
>     RestrictionTracker tracker,
>     WatermarkEstimator watermarkEstimator,
>     DoFn.OutputReceiver receiver) throws Exception {
>   while (true) {
>     messages = getMessageFromStream();
>     if (messages.isEmpty()) {
>       return DoFn.ProcessContinuation.resume();
>     }
>     for (message : messages) {
>       if (!tracker.tryClaim(message)) {
>         return DoFn.ProcessContinuation.stop();
>       }
>       record = Record(message);
>       receiver.outputWithTimestamp(record, message.getTimestamp());
>     }
>   }
> }
>
>
> I expected to see the output downstream immediately, but the results
> are grouped into batches (of 4 or 5 outputs) before being emitted
> downstream. Is this batch size configurable in the `DoFn` or the runner?
>
> Thanks for any answer,
> Yu
>
>
>
>


Multiple architectures support on Beam (ARM)

2021-01-26 Thread Ismaël Mejía
I stumbled today on this user request:
BEAM-10982 Wheel support for linux aarch64

It made me wonder whether, with the advent of ARM64 processors not only on
the client but also on the server side (Graviton and others), it is worth
starting to think about supporting this architecture in the Python
installers and in the Docker images. For the latter it should not be that
difficult, given that our parent images are already multi-arch.

Are there some possible issues or binary/platform specific
dependencies that impede us from doing this?


Re: Problems building latest beam source

2021-01-26 Thread Tyson Hamilton
I saw this issue on another OSS repo (
https://github.com/apache/hudi/issues/2479), they just removed the spring.io
repository but I'm not sure where their dep would come from then. We could
swap the spring.io repo for the pentaho nexus one public.nexus.pentaho.org
(here probably in Beam
https://github.com/apache/beam/blob/67989cafecf3f5d4bce879e9f6b9a690955e84d5/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy#L45
)

On Mon, Jan 25, 2021 at 2:19 PM Tomo Suzuki  wrote:

> I'm seeing the same problem.
>
> * What went wrong:
> Execution failed for task ':sdks:java:io:hcatalog:compileJava'.
> > Could not resolve all files for configuration
> ':sdks:java:io:hcatalog:provided'.
>> Could not resolve
> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
>  Required by:
>  project :sdks:java:io:hcatalog > org.apache.hive:hive-exec:2.1.0
> > org.apache.calcite:calcite-core:1.6.0
>   > Could not resolve
> org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde.
>  > Could not get resource '
> https://repo.spring.io/plugins-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom
> '.
> > Could not HEAD '
> https://repo.spring.io/plugins-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom'.
> Received status code 401 from server: Unauthorized
>
> On Mon, Jan 25, 2021 at 10:34 AM Steve Niemitz 
> wrote:
>
>> I ran into issues
>> resolving org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde, it looks
>> like the spring repo hosting it has locked the artifact, I get a 401
>> unauthorized trying to download it.
>>
>> Has anyone else run into this?  I assume many people have the artifact
>> cached locally and so haven't run into it yet.
>>
>
>
> --
> Regards,
> Tomo
>


Contributor permission for Beam Jira

2021-01-26 Thread Jamie Thomson
Hello,

I am exploring Beam ahead of a new job I’m starting soon in which I’ll be using 
Beam/Dataflow. I’ve spotted a small typo in the Beam docs that I’d like to 
correct, as (a) it’s rather misleading and (b) correcting typos is a nice, easy 
way of getting involved in Beam.

Can I be added as a contributor to Beam’s Jira issue tracker? My ASF Jira 
username is “jamet”.

Regards
Jamie


Re: [ANNOUNCE] New committer: Piotr Szuberski

2021-01-26 Thread Reza Rokni
Congrats!

On Tue, Jan 26, 2021 at 4:25 AM Piotr Szuberski 
wrote:

> Thank you everyone! I really don't know what to say. I'm truly honoured. I
> do hope I will be able to keep up with the contributions.
>
> On 2021/01/22 16:32:45, Alexey Romanenko 
> wrote:
> > Hi everyone,
> >
> > Please join me and the rest of the Beam PMC in welcoming a new
> committer: Piotr Szuberski .
> >
> > Piotr started to contribute to Beam about one year ago and he did it
> very actively since then. He contributed to the different areas, like
> adding a cross-language functionality to existing IOs, improving ITs and
> performance tests environment/runtime, he actively worked on dependency
> updates [1].
> >
> > In consideration of his contributions, the Beam PMC trusts him with the
> responsibilities of a Beam committer [2].
> >
> > Thank you for your contributions, Piotr!
> >
> > -Alexey, on behalf of the Apache Beam PMC
> >
> > [1]
> https://github.com/apache/beam/pulls?q=is%3Apr+author%3Apiotr-szuberski <
> https://github.com/apache/beam/pulls?q=is:pr+author:piotr-szuberski>
> > [2]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> >
> >
> >
>