Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Michał Walenia
That's interesting, the command works for me (it crashes on communication with GCP, but that was expected). From the log output it seems you're using Windows. Did you use WSL to run this? Do you have an option to use Linux to check this command? The JSON created in the Gradle script may be treated

Re: Full stream-stream join semantics

2019-11-27 Thread Reza Rokni
Hi, With regard to the processing needed for sort: the first naive implementation of the prototype did a read and sort for every Timer that fired (timers were set to fire for every LHS element timestamp, a property of the use case we were looking at). This worked but was very slow as you would

Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Rehman Murad Ali
Thank you, Brian and Michal, for replying. Here is the full command: ./gradlew :runners:google-cloud-dataflow-java:examples:preCommitLegacyWorker -PdataflowProject=apache-beam-testing -Pproject=apache-beam-testing -PgcpProject=apache-beam-testing

Re: Python staging file weirdness

2019-11-27 Thread Valentyn Tymofieiev
The test job specifies [1] a requirements.txt file that contains two entries: pyhamcrest, mock. We download [2] the sources of the packages specified in the requirements file, and of the packages they depend on. While doing so, it appears that we use a cache directory on Jenkins to store the sources of the packages [3],

Re: Full stream-stream join semantics

2019-11-27 Thread Kenneth Knowles
Yes, I am suggesting to add more intelligent state data structures for just that sort of join. I tagged Reza because his work basically does it, but explicitly pulls a BagState into memory and sorts it. We just need to avoid that. It is the sort of thing that already exists in some engines so

Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Zohaib Baig
+Rehman Murad Ali On Thu, Nov 28, 2019 at 2:58 AM Brian Hulette wrote: > It looks like you passed an argument like > -DbeamTestPipelineOptions >

Re: Update on push-down for SQL IOs.

2019-11-27 Thread Kenneth Knowles
Nice! Thanks for the very thorough summary. I think this will be a really good thing for Beam. Most of the IO sources are very highly optimized for querying and will do it more efficiently than the Beam runner when the structure of the query matches. I'm really excited to see the performance

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Kenneth Knowles
On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold wrote: > On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles wrote: > > > > > *Opt-in*: This is a powerful idea that I think changes everything. > >- for an experimental new IO, a separate artifact; this way we can > also see downloads > >

Python interactive runner: test dependencies removed

2019-11-27 Thread Udi Meiri
As part of a move to stop using the deprecated (and racy) setup.py keywords setup_requires and tests_require, interactive runner dependencies have been removed from tests in https://github.com/apache/beam/pull/10227. If this breaks any tests, please let me know.

Re: [PROPOSAL] Preparing for Beam 2.18 release

2019-11-27 Thread Valentyn Tymofieiev
+1. Thanks, Udi! On Wed, Nov 27, 2019 at 12:58 PM Ahmet Altay wrote: > Thank you Udi for keeping the release cadence. +1 to cutting 2.18.0 branch > on time. > > On Thu, Nov 21, 2019 at 10:07 AM Udi Meiri wrote: > >> Thanks Cham. Tomo, if there are any dependencies you believe are blockers >>

Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Brian Hulette
It looks like you passed an argument like -DbeamTestPipelineOptions

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Elliotte Rusty Harold
On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles wrote: > > *Opt-in*: This is a powerful idea that I think changes everything. >- for an experimental new IO, a separate artifact; this way we can also > see downloads >- for experimental code fragments, add checkState that the relevant >

Re: [PROPOSAL] Preparing for Beam 2.18 release

2019-11-27 Thread Ahmet Altay
Thank you Udi for keeping the release cadence. +1 to cutting 2.18.0 branch on time. On Thu, Nov 21, 2019 at 10:07 AM Udi Meiri wrote: > Thanks Cham. Tomo, if there are any dependencies you believe are blockers > please mark them. > Also, only the sub-tasks >

Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
Agree, this makes sense. On Wed, Nov 27, 2019 at 6:23 PM Luke Cwik wrote: > That looks good as well. > > I would suggest that we make the classpath scanning system pluggable using > PipelineOptions. For example in GcpOptions[1], we use two default instance > factories. The first one controls

Python staging file weirdness

2019-11-27 Thread Udi Meiri
I was investigating a Dataflow postcommit test failure (endpoints_pb2 missing), and saw this in the staging directory: $ gsutil ls gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882

Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Andrew Pilloud
What we need is an annotation checker to ensure every public method is tagged either @Experimental or @Deprecated. That way there will be no confusion about what we expect to be stable. If we really want to offer stable APIs there exist many tools (such as JAPICC[1]) to ensure we don't make breaking

[DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Kenneth Knowles
Hi all, I wanted to start a thread dedicated to the discussion of how to manage our @Experimental annotations, API evolution in general, etc. After some email back-and-forth this will get too big, so then I will try to summarize it in a document. But I think a thread to start with makes sense.

Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2019-11-27 Thread Chuck Yang
I would love to fix this, but not sure if I have the bandwidth at the moment. Anyway, created the jira here: https://jira.apache.org/jira/browse/BEAM-8841 Thanks! Chuck

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-27 Thread Kenneth Knowles
On Tue, Nov 26, 2019 at 7:00 PM Chamikara Jayalath wrote: > > > On Tue, Nov 26, 2019 at 6:17 PM Reza Rokni wrote: > >> With regards to @Experimental there are a couple of discussions around >> its usage ( or rather over usage! ) on dev@. It is something that we >> need to clean up ( some of

Re: Detecting resources to stage

2019-11-27 Thread Luke Cwik
That looks good as well. I would suggest that we make the classpath scanning system pluggable using PipelineOptions. For example in GcpOptions[1], we use two default instance factories. The first one controls which class is used as the factory[2] and the second one instantiates an instance of

Hadoop client version 2.8.5 from 2.7 (EOL)

2019-11-27 Thread Tomo Suzuki
Hi Beam developers, I created a PR to upgrade the Hadoop client version: https://github.com/apache/beam/pull/10222. However, I don't have a Hadoop cluster to test this. Can anybody check whether this change is compatible with a real Hadoop 2.7 / 2.8 cluster? -- Regards, Tomo

Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Michał Walenia
Hi, can you please post the command you used in the terminal? It seems you used a wrong combination of quotes, but I'd need to see it to be sure. Cheers, Michal On Wed, Nov 27, 2019 at 5:11 PM Rehman Murad Ali < rehman.murad...@venturedive.com> wrote: > Hi Community, > > I have been recently

Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
I didn't think it through, but this is something I have in mind. Keep existing implementation for URLClassLoader, and use URLClassLoader for experimental support of Java 11. List urls; if (classLoader instanceof URLClassLoader) { urls = Arrays.asList(((URLClassLoader)
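The idea in the message above (keep the URLClassLoader path where it applies, and fall back to something else on newer JDKs) can be sketched roughly as follows. This is only an illustration and not the code from the thread: the class name ClasspathResources and the method detect are hypothetical, and the fallback here simply reads the java.class.path system property as a stand-in for the classgraph-based scanning discussed in this thread.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: lists classpath entries to stage.
public class ClasspathResources {

    /** Returns the classpath entries visible through the given class loader, as URL strings. */
    static List<String> detect(ClassLoader classLoader) {
        List<String> entries = new ArrayList<>();
        if (classLoader instanceof URLClassLoader) {
            // Java 8 style: the application class loader exposes its URLs directly.
            for (URL url : ((URLClassLoader) classLoader).getURLs()) {
                entries.add(url.toString());
            }
        } else {
            // Assumption for this sketch: on Java 9+ the application class loader is
            // no longer a URLClassLoader, so fall back to the java.class.path property.
            String classpath = System.getProperty("java.class.path", "");
            for (String entry : classpath.split(File.pathSeparator)) {
                if (!entry.isEmpty()) {
                    entries.add(new File(entry).toURI().toString());
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Print whatever the current class loader exposes.
        detect(ClasspathResources.class.getClassLoader()).forEach(System.out::println);
    }
}
```

The system-property fallback only sees the traditional classpath; it does not cover the module path, which is one reason a dedicated scanner such as classgraph was being considered.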

Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Rehman Murad Ali
Hi Community, I have recently been trying to test Dataflow jobs with Beam. I have set up a Gcloud account and tried to copy a file from the local system to Gcloud storage (which works fine). Now I am trying to run the preCommitLegacyWorker task locally and I am getting the following error:

Re: Detecting resources to stage

2019-11-27 Thread Łukasz Gajowy
This looks promising. Do you think you could share your code as well? That part sounds very calming: "ClassGraph is fully compatible with the new JPMS module system (Project Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the module path. However, the code is also fully

Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
Today I tried using the classgraph [1] library to scan the classpath in Java 11 instead of using URLClassLoader, and after that, the job worked on Dataflow. The logic of scanning the classpath is pretty sophisticated [2], and classgraph doesn't have any dependencies. I'm wondering if we can relocate it to

Re: goVet and clickHouse tests failing

2019-11-27 Thread Elliotte Rusty Harold
I did get through this one, and made the classic mistake of not immediately committing the steps I took to writing. I believe it involved some combination of setting go paths in environment variables. I seem to have added this to the end of my .profile: export GOROOT=/usr/local/go export

Re: real real-time beam

2019-11-27 Thread Jan Lukavský
> Trigger firings can have decreasing event timestamps w/ the minimum timestamp combiner*. I do think the issue at hand is best analyzed in terms of the explicit ordering on panes. And I do think we need to have an explicit guarantee or annotation strong enough to describe a

Re: goVet and clickHouse tests failing

2019-11-27 Thread Amogh Tiwari
Hi Elliotte, I am facing a similar goVet issue. It would be great if you can guide me through the solution. Please let me know the steps that you followed. Regards, Amogh On Thu, Nov 21, 2019 at 6:06 PM Elliotte Rusty Harold wrote: > Tentatively, the goVet issue does seem to have been an issue

Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-27 Thread Ankur Goenka
Congrats Daniel! On Mon, Nov 25, 2019 at 10:02 PM Tanay Tummalapalli wrote: > Congratulations! > > On Mon, Nov 25, 2019 at 11:12 PM Mark Liu wrote: > >> Congratulations, Daniel! >> >> On Mon, Nov 25, 2019 at 9:31 AM Ahmet Altay wrote: >> >>> Congratulations, Daniel! >>> >>> On Sat, Nov 23,