Re: New Design Doc for Cost Based Optimization

2019-07-12 Thread Kenneth Knowles
Thanks for these thorough docs. I feel I am learning a lot from what you are sharing. I've commented w/ my questions on the doc. Kenn On Wed, Jul 10, 2019 at 2:54 PM Alireza Samadian wrote: > Dear Members of Beam Community, > > Previously I had shared a document discussing row count estimation

Re: [PROPOSAL] Prepare for LTS bugfix release 2.7.1

2019-07-12 Thread Kenneth Knowles
I went ahead and took over all the bugs and did the cherrypicks for the remaining backports targeting 2.7.1: https://issues.apache.org/jira/issues/?jql=statusCategory%20%3D%20new%20AND%20project%20%3D%2012319527%20AND%20fixVersion%20%3D%2012344458 The tests are not healthy. I have not had time to

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-07-12 Thread Kenneth Knowles
I've seen some discussion on the doc. I cannot tell whether the questions are resolved or what the status of review is. Would you mind looping this thread with a quick summary? This is such a major piece of work I don't want it to sit with everyone thinking they are waiting on someone else, or any

Re: python precommits failing at head

2019-07-12 Thread Tanay Tummalapalli
Thank you, Valentyn! I'll retest it. Hopefully it's a transient issue. Regards, - Tanay Tummalapalli On Sat, Jul 13, 2019 at 2:39 AM Valentyn Tymofieiev wrote: > No, we did not reduce the timeout recently. Looking at console logs, > nothing happened for an hour or so, > > 06:57:50

Re: python precommits failing at head

2019-07-12 Thread Valentyn Tymofieiev
No, we did not reduce the timeout recently. Looking at console logs, nothing happened for an hour or so:
06:57:50 py27-cython: commands succeeded
06:57:50 congratulations :)
06:57:50 > Task :sdks:python:preCommitPy2
08:22:33 Build timed out (after 120 minutes). Marking the build

Re: python precommits failing at head

2019-07-12 Thread Tanay Tummalapalli
Hi Udi, I rebased another PR[1] onto the fix mentioned above. The lint error is fixed, but the "beam_PreCommit_Python_Commit" Jenkins job is failing because of a timeout at 120 minutes[2]. The log says "Build timed out (after 120 minutes). Marking the build as aborted." Another PR's Python

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Clarification on previous message. Only happens on local file system where it is unable to match a pattern string. Via a `gs://` link it is able to do multiple file matching. On Fri, Jul 12, 2019 at 1:36 PM Shannon Duncan wrote: > Awesome. I got it working for a single file, but for a structure

Re: Bucketed histogram metrics in beam. Anyone currently looking into this?

2019-07-12 Thread Steve Niemitz
I've been doing some experiments in my own fork of the Dataflow worker using HdrHistogram [1] to record histograms. I export them to our own stats collector, not Stackdriver, but have been having good success with them. The problem is that the dataflow worker metrics implementation is totally

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Awesome. I got it working for a single file, but for a structure of /part-0001/index, /part-0001/data, /part-0002/index, /part-0002/data, I tried /part-* and /part-*/data, and neither finds the multipart files. However, if I just use /part-0001/data, it finds and reads it. Any ideas why?
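
One way to separate a layout problem from a matching problem in the local-filesystem case is to check what the same patterns match with Python's standard-library glob module. This is only a sanity check that assumes the directory layout is exactly as described; it does not use or reproduce Beam's own FileSystems.match logic, which may behave differently.

```python
import glob
import os
import tempfile

# Build a throwaway tree shaped like the one described above:
# part-0001/{index,data} and part-0002/{index,data}.
root = tempfile.mkdtemp()
for part in ("part-0001", "part-0002"):
    os.makedirs(os.path.join(root, part))
    for name in ("index", "data"):
        with open(os.path.join(root, part, name), "w") as f:
            f.write(name)

# "part-*" matches the part directories themselves, not the files inside them.
dir_matches = sorted(glob.glob(os.path.join(root, "part-*")))

# "part-*/data" descends one level and matches only the data files.
data_matches = sorted(glob.glob(os.path.join(root, "part-*", "data")))

print([os.path.relpath(p, root) for p in dir_matches])
print([os.path.relpath(p, root) for p in data_matches])
```

If the stdlib matches both data files but the pipeline does not, the difference is in how the local filesystem implementation expands the pattern rather than in the file layout.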

Re: Beam/Samza Ensuring At Least Once semantics

2019-07-12 Thread Lukasz Cwik
That seems to be an issue with how the commit is being restarted in Samza and not with the Kafka source. On Thu, Jul 11, 2019 at 4:44 PM Deshpande, Omkar wrote: > Yes, we are resuming from samza’s last commit. But the problem is that the > last commit was done for data in the window that is not
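
The general rule behind the at-least-once concern is that a source offset should only be committed once everything up to it has been emitted downstream; committing data that is still buffered in an open window can lose it on restart. The function below is a hypothetical, self-contained simulation of that rule with fixed-size windows. It is not how Samza or Beam's Kafka source actually checkpoint, and all names and parameters are made up for illustration.

```python
from typing import List, Tuple


def consume_with_commits(records: List[Tuple[int, str]], window_size: int,
                         crash_after: int, committed: int) -> Tuple[List[str], int]:
    """Hypothetical consumer that commits an offset only after the window
    containing it has been emitted.

    records: (offset, value) pairs in offset order.
    crash_after: simulate a failure after reading this many records (-1 = never).
    committed: first offset not yet covered by an emitted window.
    Returns (values emitted in this run, new committed offset).
    """
    emitted: List[str] = []
    buffer: List[Tuple[int, str]] = []
    read = 0
    for offset, value in records:
        if offset < committed:
            continue  # already emitted before the last commit
        if read == crash_after:
            # Buffered-but-unemitted records were never committed,
            # so a restart will read them again.
            return emitted, committed
        buffer.append((offset, value))
        read += 1
        if len(buffer) == window_size:
            emitted.extend(v for _, v in buffer)
            committed = buffer[-1][0] + 1  # commit only past emitted data
            buffer = []
    return emitted, committed


records = list(enumerate("abcdef"))
first, committed = consume_with_commits(records, window_size=2, crash_after=3, committed=0)
second, committed = consume_with_commits(records, window_size=2, crash_after=-1, committed=committed)
print(first, second, committed)
```

In the simulated crash, record "c" was read but still buffered, so the committed offset stays at 2 and the restart re-reads it. Had the offset been committed at read time, the restart would have begun at 3 and "c" would have been lost.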

Re: Jira Contributors List

2019-07-12 Thread sridhar inuog
Thank you, guys! On Fri, Jul 12, 2019 at 12:30 PM Rui Wang wrote: > Indeed! Welcome! > > > -Rui > > On Fri, Jul 12, 2019 at 10:16 AM Pablo Estrada wrote: > >> It seems that both have been added. Welcome! >> >> On Fri, Jul 12, 2019 at 10:12 AM Rui Wang wrote: >> >>> Hi Francesco, >>> >>>

Re: Bucketed histogram metrics in beam. Anyone currently looking into this?

2019-07-12 Thread Pablo Estrada
I am not aware of anyone working on this. I do recall a couple of things:
- These metrics can be very large in terms of space. Users may cause themselves trouble if they define too many of them.
- Not enough reason not to do it, but certainly worth considering.
- There is some code added by
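
To make the space concern concrete: a bucketed histogram's size grows roughly with its number of buckets, and that number depends heavily on the bucketing scheme. The hypothetical helpers below (not part of any Beam or Stackdriver API) count how many exponentially growing buckets versus fixed-width buckets are needed to cover the same value range.

```python
import math


def exponential_bucket_count(start: float, growth: float, max_value: float) -> int:
    """Number of upper bounds start, start*growth, start*growth**2, ...
    needed until a bound reaches max_value (hypothetical helper)."""
    count, bound = 1, start
    while bound < max_value:
        bound *= growth
        count += 1
    return count


def linear_bucket_count(width: float, max_value: float) -> int:
    """Number of fixed-width buckets [0, width), [width, 2*width), ...
    needed to cover [0, max_value) (hypothetical helper)."""
    return math.ceil(max_value / width)


print(exponential_bucket_count(1, 2, 1_000_000))  # doubling bounds 1 .. 2**20
print(linear_bucket_count(1000, 1_000_000))       # width-1000 buckets
```

Covering 1 to 1,000,000 takes 21 doubling bounds but 1,000 buckets of width 1,000, and that per-histogram cost is multiplied by every metric a user defines.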

Re: Jira Contributors List

2019-07-12 Thread Rui Wang
Indeed! Welcome! -Rui On Fri, Jul 12, 2019 at 10:16 AM Pablo Estrada wrote: > It seems that both have been added. Welcome! > > On Fri, Jul 12, 2019 at 10:12 AM Rui Wang wrote: > >> Hi Francesco, >> >> What's your JIRA ID? >> >> >> -Rui >> >> On Thu, Jul 11, 2019 at 9:17 AM Francesco Perera

Bucketed histogram metrics in beam. Anyone currently looking into this?

2019-07-12 Thread Alex Amato
Hi, I was wondering if anyone has any plans to introduce bucketed histogram to beam (different from Distribution, which is just min, max, sum and count values)? I have some thoughts about how it could be done so that it integrates with stackdriver. Essentially I am referring to a timeseries of
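
As a rough illustration of the difference being described: a Distribution keeps only min, max, sum and count, while a bucketed histogram also keeps one counter per value range. The class below is a hypothetical, runner-independent sketch with explicit bucket boundaries; it is not Beam's metrics API and assumes nothing about how the result would be exported to Stackdriver.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class BucketedHistogram:
    """Hypothetical sketch: Distribution-style summary plus per-bucket counts.

    bounds are sorted upper bounds. counts[0] holds values below bounds[0],
    counts[i] holds values v with bounds[i-1] <= v < bounds[i], and
    counts[-1] holds values >= bounds[-1].
    """
    bounds: List[float]
    counts: List[int] = field(init=False)
    count: int = 0
    total: float = 0.0
    min_value: float = float("inf")
    max_value: float = float("-inf")

    def __post_init__(self) -> None:
        # One underflow bucket, len(bounds) - 1 inner buckets, one overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def update(self, value: float) -> None:
        # The four fields a Distribution already tracks...
        self.count += 1
        self.total += value
        self.min_value = min(self.min_value, value)
        self.max_value = max(self.max_value, value)
        # ...plus the bucket the value falls into.
        self.counts[bisect.bisect_right(self.bounds, value)] += 1


h = BucketedHistogram([10, 100])
for v in [5, 10, 50, 100, 500]:
    h.update(v)
print(h.counts, h.count, h.total, h.min_value, h.max_value)
```

The Distribution fields alone cannot tell whether the five values cluster or spread out; the bucket counts (one below 10, two in [10, 100), two at or above 100) can, at the cost of extra storage per metric.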

Re: Jira Contributors List

2019-07-12 Thread Pablo Estrada
It seems that both have been added. Welcome! On Fri, Jul 12, 2019 at 10:12 AM Rui Wang wrote: > Hi Francesco, > > What's your JIRA ID? > > > -Rui > > On Thu, Jul 11, 2019 at 9:17 AM Francesco Perera > wrote: > >> Hi, >> I am new to the beam community but I am eager to contribute back. I am >>

Re: Jira Contributors List

2019-07-12 Thread Rui Wang
Hi Francesco, What's your JIRA ID? -Rui On Thu, Jul 11, 2019 at 9:17 AM Francesco Perera wrote: > Hi, > I am new to the beam community but I am eager to contribute back. I am > going to work on this issue : >

Re: [VOTE] Vendored Dependencies Release

2019-07-12 Thread Kai Jiang
+1 (non-binding) On Thu, Jul 11, 2019 at 8:27 PM Lukasz Cwik wrote: > Please review the release of the following artifacts that we vendor: > * beam-vendor-grpc_1_21_0 > * beam-vendor-guava-26_0-jre > * beam-vendor-bytebuddy-1_9_3 > > Hi everyone, > Please review and vote on the release

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
I tried to pass ArrayList in and it wouldn't generalize it to List. It required me to convert my ArrayLists to Lists. On Fri, Jul 12, 2019 at 10:20 AM Lukasz Cwik wrote: > Additional coders would be useful. Note that we usually don't have coders > for specific collection types like ArrayList

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Lukasz Cwik
Additional coders would be useful. Note that we usually don't have coders for specific collection types like ArrayList but prefer to have Coders for their general counterparts like List, Map, Iterable. There has been discussion in the past about making the MapCoder a deterministic coder when a
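
Determinism matters here because Beam groups keys by their encoded bytes, so two equal keys must always encode identically. A map encoded in its iteration order can violate that, while one encoded in sorted key order cannot. The function below is a hypothetical sketch with a made-up wire format (length-prefixed UTF-8 keys, 8-byte signed values); it is not Beam's MapCoder, and it assumes the keys have both a deterministic encoding and a total order.

```python
from typing import Dict


def encode_map(m: Dict[str, int], sort_keys: bool = True) -> bytes:
    """Hypothetical encoder. With sort_keys=True, equal maps always produce
    identical bytes regardless of insertion order."""
    keys = sorted(m) if sort_keys else list(m)
    out = bytearray()
    for key in keys:
        key_bytes = key.encode("utf-8")
        out += len(key_bytes).to_bytes(4, "big")       # length-prefix the key
        out += key_bytes
        out += m[key].to_bytes(8, "big", signed=True)  # fixed-width value
    return bytes(out)


a = {"b": 2, "a": 1}
b = {"a": 1, "b": 2}
print(a == b)
print(encode_map(a) == encode_map(b))
print(encode_map(a, sort_keys=False) == encode_map(b, sort_keys=False))
```

The two maps compare equal, but only the sorted encoding gives them the same bytes, which is the property a key coder needs.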

Re: Phrase triggering jobs problem

2019-07-12 Thread Michał Walenia
Thanks for the heads up, I'll get in touch with him so that I don't duplicate the research. On Fri, Jul 12, 2019 at 3:55 PM Lukasz Cwik wrote: > I believe Scott Wegner investigated the new plugin (about 10 months ago) > because it seemed like it could filter out running tests based upon paths

Re: Circular dependencies between DataflowRunner and google cloud IO

2019-07-12 Thread Lukasz Cwik
Yes, there is a dependency between Dataflow -> GCP IOs and this is expected since Dataflow depends on parts of those implementations for its own execution purposes. We definitely don't want GCP IOs depending on Dataflow since we would like users of other runners to still be able to use GCP IOs
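
The constraint described here is that dependencies may point from the Dataflow runner to the GCP IOs but not back. A quick way to check a proposed change against that rule is a depth-first cycle search over the module graph. The module names below are simplified placeholders, not Beam's actual Gradle project paths, and the function is an illustrative sketch rather than anything in Beam's build.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {m: WHITE for m in deps}
    path: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GRAY
        path.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in deps:
        if color[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


allowed = {"dataflow-runner": ["gcp-io", "sdk-core"], "gcp-io": ["sdk-core"], "sdk-core": []}
with_back_edge = {"dataflow-runner": ["gcp-io", "sdk-core"],
                  "gcp-io": ["sdk-core", "dataflow-runner"], "sdk-core": []}
print(find_cycle(allowed))
print(find_cycle(with_back_edge))
```

A test placed in the GCP IO module that needs the Dataflow runner adds exactly the back edge in the second graph, which is why it surfaces as a circular dependency.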

Circular dependencies between DataflowRunner and google cloud IO

2019-07-12 Thread Michał Walenia
Hi all, recently when I was trying to implement a performance test of BigQueryIO, I ran into an issue when trying to run the test on Dataflow. The problem was that I encountered a circular dependency when compiling the tests. I added the test in org.apache.beam.sdk.io.gcp.bigquery package, so I

Re: Phrase triggering jobs problem

2019-07-12 Thread Katarzyna Kucharczyk
Just for knowledge-sharing purposes, here is a link to the conversation about the new plugin. Kasia On Fri, Jul 12, 2019 at 10:47 AM Michał Walenia wrote: > Hi, > I think I'd like to take a look at it.

Re: Phrase triggering jobs problem

2019-07-12 Thread Michał Walenia
Hi, I think I'd like to take a look at it. I'll assign the issue to myself and I'll keep you posted on my findings. Have a good day Michal On Thu, Jul 11, 2019 at 8:10 PM Udi Meiri wrote: > Opened https://issues.apache.org/jira/browse/BEAM-7725 for migration off > the old plugin onto the new