[jira] [Updated] (BEAM-7246) Add Google Spanner IO on Python SDK

2020-05-18 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-7246:
---
Issue Type: New Feature  (was: Bug)

> Add Google Spanner IO on Python SDK 
> 
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: New Feature
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shoaib Zafar
>Priority: P2
>  Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10029) Add Spanner IO Performance Tests for Python SDK

2020-05-18 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-10029:

Summary: Add Spanner IO Performance Tests for Python SDK  (was: Add Spanner 
IO Performance Test for Python)

> Add Spanner IO Performance Tests for Python SDK
> ---
>
> Key: BEAM-10029
> URL: https://issues.apache.org/jira/browse/BEAM-10029
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shehzaad Nakhoda
>Assignee: Shoaib Zafar
>Priority: P2
>
> Add performance tests so that the SpannerIO functionality can move into 
> production (i.e. out of experimental).





[jira] [Commented] (BEAM-10029) Add Spanner IO Performance Tests for Python SDK

2020-05-18 Thread Shehzaad Nakhoda (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110660#comment-17110660
 ] 

Shehzaad Nakhoda commented on BEAM-10029:
-

cc: [~chamikara] [~altay]

> Add Spanner IO Performance Tests for Python SDK
> ---
>
> Key: BEAM-10029
> URL: https://issues.apache.org/jira/browse/BEAM-10029
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shehzaad Nakhoda
>Assignee: Shoaib Zafar
>Priority: P2
>
> Add performance tests so that the SpannerIO functionality can move into 
> production (i.e. out of experimental).





[jira] [Created] (BEAM-10029) Add Spanner IO Performance Test for Python

2020-05-18 Thread Shehzaad Nakhoda (Jira)
Shehzaad Nakhoda created BEAM-10029:
---

 Summary: Add Spanner IO Performance Test for Python
 Key: BEAM-10029
 URL: https://issues.apache.org/jira/browse/BEAM-10029
 Project: Beam
  Issue Type: Test
  Components: io-py-gcp
Reporter: Shehzaad Nakhoda
Assignee: Shoaib Zafar


Spanner IO (Python SDK) contains a PTransform that uses the Batch API to read 
from Spanner. Currently, it only has DirectRunner unit tests. To make this 
functionality available to users, integration tests also need to be added.





[jira] [Updated] (BEAM-10029) Add Spanner IO Performance Test for Python

2020-05-18 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-10029:

Description: Add performance tests so that the SpannerIO functionality can 
move into production (i.e. out of experimental).  (was: Spanner IO (Python SDK) 
contains a PTransform that uses the Batch API to read from Spanner. 
Currently, it only has DirectRunner unit tests. To make this functionality 
available to users, integration tests also need to be added.)

> Add Spanner IO Performance Test for Python
> --
>
> Key: BEAM-10029
> URL: https://issues.apache.org/jira/browse/BEAM-10029
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shehzaad Nakhoda
>Assignee: Shoaib Zafar
>Priority: P2
>
> Add performance tests so that the SpannerIO functionality can move into 
> production (i.e. out of experimental).





[jira] [Commented] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2020-02-12 Thread Shehzaad Nakhoda (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035192#comment-17035192
 ] 

Shehzaad Nakhoda commented on BEAM-2535:


[~reuvenlax] [~kenn]  can this be marked as resolved?

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 25h 10m
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification of the event time of 
> resulting outputs. These timestamps will determine a side channel with a new 
> "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, 
> so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum 
> of the input watermark and the timer watermark. In this way, whenever a timer 
> is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is 
> expired; this may be as simple as exhausting the timer channel as soon as the 
> input watermark indicates expiration of a window.
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It 
> seems reasonable to use timers as an implementation detail (e.g. in 
> runners-core utilities) without wanting any of this additional machinery, for 
> example if there is no possibility of output from the timer callback.
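The separation proposed above can be illustrated with a toy Python model. It is not Beam's API, and `Timer` and `TimerWatermarkModel` are hypothetical names. Each timer carries a firing time, which is compared against the input watermark, and a separate output timestamp. The output watermark is held to the minimum of the input watermark and the earliest pending output timestamp, which plays the role of the "timer watermark".

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Timer:
    # Hypothetical timer record: fires when the input watermark reaches
    # `fire_at`, but its outputs are stamped with `output_ts` (the hold).
    fire_at: int
    output_ts: int = field(compare=False)
    name: str = field(compare=False, default="")


class TimerWatermarkModel:
    """Toy model separating a timer's firing time from its output time."""

    def __init__(self):
        self.input_watermark = float("-inf")
        self.pending = []  # min-heap ordered by fire_at only

    def set_timer(self, timer):
        heapq.heappush(self.pending, timer)

    def timer_watermark(self):
        # Earliest output timestamp that a pending timer may still emit.
        if not self.pending:
            return float("inf")
        return min(t.output_ts for t in self.pending)

    def output_watermark(self):
        # The output watermark is constrained by both input channels.
        return min(self.input_watermark, self.timer_watermark())

    def advance_input_watermark(self, wm):
        """Advance the input watermark; return (name, output_ts) of fired timers."""
        self.input_watermark = wm
        fired = []
        while self.pending and self.pending[0].fire_at <= wm:
            t = heapq.heappop(self.pending)
            fired.append((t.name, t.output_ts))
        return fired
```

In this model, a timer set to fire at 100 with output timestamp 10 holds the output watermark at 10 until it fires. Its output is therefore never late downstream.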





[jira] [Assigned] (BEAM-8543) Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle

2020-02-11 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-8543:
--

Assignee: Shehzaad Nakhoda

> Dataflow streaming timers are not strictly time ordered when set earlier 
> mid-bundle
> ---
>
> Key: BEAM-8543
> URL: https://issues.apache.org/jira/browse/BEAM-8543
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers, timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but it breaks when 
> timerB sets another timer.
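One possible way to keep firings strictly ordered is sketched below as a toy Python model, not the Dataflow runner's code. `fire_timers_in_order` is a hypothetical helper. It pulls eligible timers one at a time from a single priority queue, so a timer set by a callback mid-bundle competes with timers that were already eligible, instead of running after all of them.

```python
import heapq
import itertools


def fire_timers_in_order(initial_timers, watermark, callbacks):
    """Fire eligible timers strictly by timestamp (toy model).

    initial_timers: iterable of (timestamp, timer_id).
    callbacks: dict timer_id -> fn(timestamp) returning a list of new
        (timestamp, timer_id) pairs that the callback sets.
    Returns [(timer_id, timestamp), ...] in firing order.
    Simplification: resetting an already-pending timer id is not
    de-duplicated here, unlike real timer semantics.
    """
    counter = itertools.count()  # tie-breaker keeps heap entries comparable
    heap = [(ts, next(counter), tid) for ts, tid in initial_timers]
    heapq.heapify(heap)
    fired = []
    while heap and heap[0][0] <= watermark:
        ts, _, tid = heapq.heappop(heap)
        fired.append((tid, ts))
        for new_ts, new_tid in callbacks[tid](ts):
            # New timers go back into the same queue, so they are ordered
            # against timers that were already eligible to fire.
            heapq.heappush(heap, (new_ts, next(counter), new_tid))
    return fired
```

In the scenario from the issue, this yields timerB at `t`, then timerB at `t + 1`, then timerA.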





[jira] [Assigned] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-02-11 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-1589:
--

Assignee: Shehzaad Nakhoda

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner can do this simply, but other runners, such as DirectRunner, 
> need to traverse all of the state to do this, which may be harder.
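Here is a toy Python model of the requested hook. All names are hypothetical, and `window_end` stands in for a window's max timestamp. When the watermark passes a window's end plus the allowed lateness, the expiration callback is called once for every key holding state in that window. Only after that is the state garbage-collected, so buffered values can be flushed rather than silently dropped.

```python
from collections import defaultdict


class ToyStateStore:
    """Hypothetical per-(key, window) state with an expiration hook."""

    def __init__(self, on_window_expiration, allowed_lateness=0):
        self.on_window_expiration = on_window_expiration  # fn(key, window_end, values)
        self.allowed_lateness = allowed_lateness
        # window_end -> {key: list of buffered values}
        self.state = defaultdict(lambda: defaultdict(list))

    def add(self, key, window_end, value):
        self.state[window_end][key].append(value)

    def advance_watermark(self, watermark):
        expired = [w for w in self.state
                   if w + self.allowed_lateness < watermark]
        for window_end in sorted(expired):
            # Give the user a last look at each key's state before GC.
            for key, values in self.state[window_end].items():
                self.on_window_expiration(key, window_end, list(values))
            del self.state[window_end]
```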





[jira] [Assigned] (BEAM-1819) Key should be available in @OnTimer methods

2020-02-11 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-1819:
--

Assignee: Shehzaad Nakhoda

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.
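The idea can be sketched in Python with a hypothetical timer queue rather than Beam's API. Because every timer is registered under a key, the queue can pass that key to the callback directly, much like an extra `@OnTimer` parameter. The user then never has to copy the key into state.

```python
import heapq


class KeyedTimerQueue:
    """Hypothetical timer queue in which each timer remembers its key."""

    def __init__(self):
        self._heap = []  # entries: (timestamp, key, timer_id)

    def set(self, key, timer_id, timestamp):
        heapq.heappush(self._heap, (timestamp, key, timer_id))

    def fire_until(self, watermark, on_timer):
        # Deliver the key alongside the timer when it fires.
        while self._heap and self._heap[0][0] <= watermark:
            timestamp, key, timer_id = heapq.heappop(self._heap)
            on_timer(key=key, timer_id=timer_id, timestamp=timestamp)
```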





[jira] [Commented] (BEAM-6858) Support side inputs injected into a DoFn

2019-10-31 Thread Shehzaad Nakhoda (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963849#comment-16963849
 ] 

Shehzaad Nakhoda commented on BEAM-6858:


[~reuvenlax] can this be marked resolved? thanks

> Support side inputs injected into a DoFn
> 
>
> Key: BEAM-6858
> URL: https://issues.apache.org/jira/browse/BEAM-6858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Beam currently supports injecting main inputs into a DoFn process method. A 
> user can write the following:
> @ProcessElement public void process(@Element InputT element)
> And Beam will (using ByteBuddy code generation) inject the input element into 
> the process method.
> We would like to also support the same for side inputs. For example:
> @ProcessElement public void process(@Element InputT element, 
> @SideInput("tag1") String input1, @SideInput("tag2") Integer input2) 
> This requires the existing process-method analysis framework to capture these 
> side inputs. The ParDo code would have to verify the type of the side input 
> and include them in the list of side inputs. This would also eliminate the 
> need for the user to explicitly call withSideInputs on the ParDo.
>  
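The Java proposal relies on analysing the annotated process method. A rough Python analogue can be built with `inspect.signature`. The `SideInput` marker and the helper functions below are hypothetical and not part of the Beam Python SDK. Parameters whose default is a `SideInput(tag)` marker are filled from a dict of side-input values, and each value is checked against the parameter's annotation. The required tags are discovered from the signature itself, which is the analogue of no longer calling `withSideInputs` explicitly.

```python
import inspect


class SideInput:
    """Hypothetical marker: 'inject the side input with this tag here'."""

    def __init__(self, tag):
        self.tag = tag


def required_side_input_tags(fn):
    # Discover side inputs from the signature instead of declaring them.
    return [p.default.tag for p in inspect.signature(fn).parameters.values()
            if isinstance(p.default, SideInput)]


def call_with_side_inputs(fn, element, side_inputs):
    """Call fn(element, ...) with side inputs injected by tag."""
    kwargs = {}
    for name, param in inspect.signature(fn).parameters.items():
        if isinstance(param.default, SideInput):
            value = side_inputs[param.default.tag]
            if (param.annotation is not inspect.Parameter.empty
                    and not isinstance(value, param.annotation)):
                # Mirrors the proposal's type check on side inputs.
                raise TypeError(f"side input {param.default.tag!r} has wrong type")
            kwargs[name] = value
    return fn(element, **kwargs)


def process(element, input1: str = SideInput("tag1"),
            input2: int = SideInput("tag2")):
    # Example user function mirroring the Java signature in the issue.
    return f"{element}-{input1}-{input2}"
```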





[jira] [Work started] (BEAM-6857) Support dynamic timers

2019-10-31 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6857 started by Shehzaad Nakhoda.
--
> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
> new DoFn<InputT, OutputT>() {
>   @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);
>   @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);
>   // ... set timers in processElement
>   @OnTimer("timer1") public void onTimer1() { ... }
>   @OnTimer("timer2") public void onTimer2() { ... }
> }
> However, there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers, with something like the following:
> new DoFn<InputT, OutputT>() {
>   @TimerId("timer") private final TimerSpec timer1 =
>       TimerSpecs.dynamicTimer(...);
>   @ProcessElement public void process(@TimerId("timer") DynamicTimer timer) {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
>   @OnTimer("timer") public void onTimer1(@TimerTag String tag) { ... }
> }
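The proposed semantics can be shown with a toy Python model. The class and method names are hypothetical and do not reflect any Beam API. One timer declaration holds a family of timers keyed by a tag chosen at runtime, and setting the same tag again replaces its timestamp. A single callback receives the tag of whichever timer fired.

```python
class DynamicTimerFamily:
    """Hypothetical dynamic timer: one declaration, many runtime tags."""

    def __init__(self, on_timer):
        self._on_timer = on_timer  # callback(tag, timestamp)
        self._timers = {}          # tag -> timestamp

    def set(self, tag, timestamp):
        # Re-setting an existing tag replaces its timestamp, just as a
        # static timer has at most one pending firing per timer id.
        self._timers[tag] = timestamp

    def clear(self, tag):
        self._timers.pop(tag, None)

    def advance_watermark(self, watermark):
        due = sorted((ts, tag) for tag, ts in self._timers.items()
                     if ts <= watermark)
        for ts, tag in due:
            if self._timers.get(tag) != ts:
                continue  # cleared or reset by an earlier callback
            del self._timers[tag]
            self._on_timer(tag, ts)
        # Simplification: timers set by callbacks fire on the next advance.
```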





[jira] [Commented] (BEAM-7274) Protobuf Beam Schema support

2019-08-21 Thread Shehzaad Nakhoda (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912136#comment-16912136
 ] 

Shehzaad Nakhoda commented on BEAM-7274:


[~alexvanboxel] any update on this? I'm looking to get started on BEAM-4455 
which depends on this. Thanks in advance!

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.





[jira] [Assigned] (BEAM-6857) Support dynamic timers

2019-07-20 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6857:
--

Assignee: Shehzaad Nakhoda

> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
> new DoFn<InputT, OutputT>() {
>   @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);
>   @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);
>   // ... set timers in processElement
>   @OnTimer("timer1") public void onTimer1() { ... }
>   @OnTimer("timer2") public void onTimer2() { ... }
> }
> However, there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers, with something like the following:
> new DoFn<InputT, OutputT>() {
>   @TimerId("timer") private final TimerSpec timer1 =
>       TimerSpecs.dynamicTimer(...);
>   @ProcessElement public void process(@TimerId("timer") DynamicTimer timer) {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
>   @OnTimer("timer") public void onTimer1(@TimerTag String tag) { ... }
> }





[jira] [Assigned] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-07-18 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6694:
--

Assignee: Shehzaad Nakhoda

> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
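To make the expected output concrete, here is an exact (not approximate) Python sketch of the N-tile semantics. It assumes the Java convention that asking for `num_quantiles` values returns the minimum, the maximum, and evenly spaced values between them, so 5 values gives quartiles. The function names are hypothetical. A real transform would replace the full sort with a mergeable approximate summary so that it can run as a combiner over large PCollections.

```python
from collections import defaultdict


def exact_quantiles(values, num_quantiles):
    """Return num_quantiles values: min, evenly spaced ranks, max."""
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    ordered = sorted(values)
    if not ordered:
        return []
    last = len(ordered) - 1
    # Pick the element at rank round(i * last / (num_quantiles - 1)).
    return [ordered[round(i * last / (num_quantiles - 1))]
            for i in range(num_quantiles)]


def exact_quantiles_per_key(pairs, num_quantiles):
    """Per-key variant: group (key, value) pairs, then compute N-tiles."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: exact_quantiles(vs, num_quantiles)
            for key, vs in grouped.items()}
```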





[jira] [Commented] (BEAM-7019) Reify transform for Python SDK

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887640#comment-16887640
 ] 

Shehzaad Nakhoda commented on BEAM-7019:


[~reuvenlax][~altay] BEAM-7388 was filed and has been resolved already.

> Reify transform for Python SDK
> --
>
> Key: BEAM-7019
> URL: https://issues.apache.org/jira/browse/BEAM-7019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
> Fix For: 2.14.0
>
>
> PTransforms for converting between explicit and implicit form of various Beam
> values.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reify.java]
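Here is a plain-Python sketch of the explicit/implicit conversion that Reify performs. It assumes a hypothetical `WindowedValue` record standing in for the runner's per-element metadata. `reify_timestamps` moves the implicit timestamp into the value, and `extract_timestamps` moves it back. This roughly mirrors Java's `Reify.timestamps()` and `Reify.extractTimestampsFromValues()`, but the Python names are illustrative only.

```python
from typing import Any, NamedTuple, Tuple


class WindowedValue(NamedTuple):
    """Hypothetical element: a value plus implicit metadata."""
    value: Any
    timestamp: int
    window: Tuple[int, int]  # [start, end)


class TimestampedValue(NamedTuple):
    value: Any
    timestamp: int


def reify_timestamps(elements):
    # Implicit -> explicit: the timestamp becomes part of the value.
    return [WindowedValue(TimestampedValue(e.value, e.timestamp),
                          e.timestamp, e.window)
            for e in elements]


def extract_timestamps(elements):
    # Explicit -> implicit: the value's timestamp becomes the element's.
    return [WindowedValue(e.value.value, e.value.timestamp, e.window)
            for e in elements]
```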





[jira] [Resolved] (BEAM-7019) Reify transform for Python SDK

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda resolved BEAM-7019.

   Resolution: Duplicate
Fix Version/s: 2.14.0

> Reify transform for Python SDK
> --
>
> Key: BEAM-7019
> URL: https://issues.apache.org/jira/browse/BEAM-7019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
> Fix For: 2.14.0
>
>
> PTransforms for converting between explicit and implicit form of various Beam
> values.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reify.java]





[jira] [Work started] (BEAM-7246) Create a Spanner IO for Python

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7246 started by Shehzaad Nakhoda.
--
> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6855:
--

Assignee: (was: Shehzaad Nakhoda)

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>






[jira] [Work started] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6855 started by Shehzaad Nakhoda.
--
> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6855:
--

Assignee: Shehzaad Nakhoda

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Commented] (BEAM-6675) The JdbcIO sink should accept schemas

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887535#comment-16887535
 ] 

Shehzaad Nakhoda commented on BEAM-6675:


[~reuvenlax] Can this be marked as resolved? Thanks.

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.
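Here is a rough Python sketch of such a default mapping. The schema is modelled as a hypothetical ordered list of `(field_name, type)` pairs. The field names produce a parameterised `INSERT` with one `?` placeholder per field, and each row is turned into parameters in schema order. Identifier quoting and type conversion are deliberately omitted.

```python
def insert_statement(table, schema):
    """Build a parameterised INSERT from an ordered (name, type) schema."""
    columns = ", ".join(name for name, _ in schema)
    placeholders = ", ".join("?" for _ in schema)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


def row_parameters(row, schema):
    """Order a dict-like row's values to match the placeholders."""
    return tuple(row[name] for name, _ in schema)
```

The `?` style matches Python's `sqlite3` (qmark) and JDBC `PreparedStatement` placeholders, so the output can be passed to `connection.execute(sql, params)` in a quick local check.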





[jira] [Work started] (BEAM-6858) Support side inputs injected into a DoFn

2019-07-15 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6858 started by Shehzaad Nakhoda.
--
> Support side inputs injected into a DoFn
> 
>
> Key: BEAM-6858
> URL: https://issues.apache.org/jira/browse/BEAM-6858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Beam currently supports injecting main inputs into a DoFn process method. A 
> user can write the following:
> @ProcessElement public void process(@Element InputT element)
> And Beam will (using ByteBuddy code generation) inject the input element into 
> the process method.
> We would like to also support the same for side inputs. For example:
> @ProcessElement public void process(@Element InputT element, 
> @SideInput("tag1") String input1, @SideInput("tag2") Integer input2) 
> This requires the existing process-method analysis framework to capture these 
> side inputs. The ParDo code would have to verify the type of the side input 
> and include them in the list of side inputs. This would also eliminate the 
> need for the user to explicitly call withSideInputs on the ParDo.
>  





[jira] [Work started] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6694 started by Shehzaad Nakhoda.
--
> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java





[jira] [Work started] (BEAM-7018) Regex transform for Python SDK

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7018 started by Shehzaad Nakhoda.
--
> Regex transform for Python SDK
> --
>
> Key: BEAM-7018
> URL: https://issues.apache.org/jira/browse/BEAM-7018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> PTransforms that use regular expressions to process elements in a PCollection.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]
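Here is a rough, framework-free Python illustration of these transforms. The function names are loosely modelled on Java's `Regex.matches`, `Regex.find` and `Regex.replaceAll`, but are otherwise assumptions. Each transform is a per-element function. The match and find variants drop non-matching elements and emit a chosen group, while replacement rewrites every element.

```python
import re


def regex_matches(elements, pattern, group=0):
    """Keep elements that fully match; emit the requested group."""
    compiled = re.compile(pattern)
    for element in elements:
        m = compiled.fullmatch(element)
        if m:
            yield m.group(group)


def regex_find(elements, pattern, group=0):
    """Keep elements containing a match; emit the first match's group."""
    compiled = re.compile(pattern)
    for element in elements:
        m = compiled.search(element)
        if m:
            yield m.group(group)


def regex_replace_all(elements, pattern, replacement):
    """Replace every match in every element."""
    compiled = re.compile(pattern)
    for element in elements:
        yield compiled.sub(replacement, element)
```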





[jira] [Assigned] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6694:
--

Assignee: (was: Shehzaad Nakhoda)

> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java





[jira] [Work started] (BEAM-6756) Support lazy iterables in schemas

2019-07-04 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6756 started by Shehzaad Nakhoda.
--
> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.
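Here is a small Python sketch of the difference. A hypothetical `fetch_page` callback stands in for a runner's paging API. A lazy grouped iterable fetches a page only when the consumer reaches it, and each new iteration starts paging again. Materializing instead would pull every page into memory up front.

```python
class LazyPagedIterable:
    """Re-iterable view over grouped values, fetched one page at a time."""

    def __init__(self, fetch_page, num_pages):
        self._fetch_page = fetch_page  # hypothetical: page index -> list
        self._num_pages = num_pages

    def __iter__(self):
        for page in range(self._num_pages):
            # Only the current page is held by this iterator.
            yield from self._fetch_page(page)
```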





[jira] [Work started] (BEAM-6696) GroupIntoBatches transform for Python SDK

2019-06-14 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6696 started by Shehzaad Nakhoda.
--
> GroupIntoBatches transform for Python SDK
> -
>
> Key: BEAM-6696
> URL: https://issues.apache.org/jira/browse/BEAM-6696
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add a PTransform that batches inputs to a desired batch size. Batches will 
> contain only elements of a single key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java
> Unlike BatchElements transform 
> (https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/python/apache_beam/transforms/util.py#L461)
>  GroupIntoBatches will use state to batch across bundles as well.
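Here is a simplified Python model of the per-key batching described above. The class and method names are hypothetical, and a plain dict that survives across bundles stands in for Beam's per-key state. Elements are buffered per key until the batch size is reached. Leftovers are emitted by `flush`, where a real transform would rely on something like an end-of-window timer.

```python
from collections import defaultdict


class ToyGroupIntoBatches:
    """Per-key batching whose buffers outlive a single bundle."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffers = defaultdict(list)  # stands in for per-key state

    def process(self, key, value):
        buf = self.buffers[key]
        buf.append(value)
        if len(buf) >= self.batch_size:
            del self.buffers[key]
            return [(key, buf)]
        return []

    def process_bundle(self, bundle):
        out = []
        for key, value in bundle:
            out.extend(self.process(key, value))
        return out

    def flush(self):
        # Stand-in for an end-of-window timer emitting partial batches.
        out = [(k, v) for k, v in self.buffers.items()]
        self.buffers.clear()
        return out
```

Unlike batching within a single bundle, the first bundle's element for key `a` here is combined with an element from a later bundle.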





[jira] [Work started] (BEAM-7019) Reify transform for Python SDK

2019-06-14 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7019 started by Shehzaad Nakhoda.
--
> Reify transform for Python SDK
> --
>
> Key: BEAM-7019
> URL: https://issues.apache.org/jira/browse/BEAM-7019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> PTransforms for converting between explicit and implicit form of various Beam
> values.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reify.java]





[jira] [Work started] (BEAM-6674) The JdbcIO source should produce schemas

2019-06-14 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6674 started by Shehzaad Nakhoda.
--
> The JdbcIO source should produce schemas
> 
>
> Key: BEAM-6674
> URL: https://issues.apache.org/jira/browse/BEAM-6674
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Work started] (BEAM-6675) The JdbcIO sink should accept schemas

2019-06-14 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6675 started by Shehzaad Nakhoda.
--
> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.
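One way to picture such a default mapping is to derive the INSERT statement and its positional parameters from the schema's field order. The sketch uses Python's built-in `sqlite3` module as a stand-in for JDBC (both use `?` placeholders). The schema is modeled as an ordered list of field names, rows as dicts, and `build_insert`/`write_rows` are hypothetical helpers, not JdbcIO APIs.

```python
import sqlite3
from typing import Any, Dict, List, Sequence


def build_insert(table: str, field_names: Sequence[str]) -> str:
    # One column and one "?" placeholder per schema field, in schema order.
    columns = ", ".join(field_names)
    placeholders = ", ".join("?" for _ in field_names)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


def write_rows(conn: sqlite3.Connection, table: str,
               field_names: Sequence[str], rows: List[Dict[str, Any]]) -> None:
    statement = build_insert(table, field_names)
    # Bind each row's values positionally, following the schema's field order,
    # so the order of keys inside each row dict does not matter.
    params = [tuple(row[name] for name in field_names) for row in rows]
    conn.executemany(statement, params)
    conn.commit()
```

Table and column names are interpolated directly into the SQL, so in this sketch they must come from a trusted schema rather than from user data; only the values go through placeholders.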





[jira] [Assigned] (BEAM-7021) ToString transform for Python SDK

2019-05-31 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-7021:
--

Assignee: Shehzaad Nakhoda

> ToString transform for Python SDK
> -
>
> Key: BEAM-7021
> URL: https://issues.apache.org/jira/browse/BEAM-7021
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Labels: starter
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> PTransforms for converting a PCollection of arbitrary elements, of key-value
> pairs, or of Iterables to a PCollection of Strings.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ToString.java]
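The three conversions can be sketched in plain Python as follows. Lists stand in for PCollections, the function names are hypothetical, and the default `","` delimiter is an assumption made for illustration.

```python
from typing import Any, Iterable, List, Tuple


def elements_to_string(pcoll: Iterable[Any]) -> List[str]:
    # Arbitrary elements: use each element's str() representation.
    return [str(element) for element in pcoll]


def kvs_to_string(pcoll: Iterable[Tuple[Any, Any]], delimiter: str = ",") -> List[str]:
    # Key-value pairs: join key and value with the delimiter.
    return [f"{key}{delimiter}{value}" for key, value in pcoll]


def iterables_to_string(pcoll: Iterable[Iterable[Any]], delimiter: str = ",") -> List[str]:
    # Iterables: join the str() of every item with the delimiter.
    return [delimiter.join(str(item) for item in it) for it in pcoll]
```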





[jira] [Assigned] (BEAM-7021) ToString transform for Python SDK

2019-05-31 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-7021:
--

Assignee: (was: Shehzaad Nakhoda)

> ToString transform for Python SDK
> -
>
> Key: BEAM-7021
> URL: https://issues.apache.org/jira/browse/BEAM-7021
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Priority: Minor
>  Labels: starter
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> PTransforms for converting a PCollection of arbitrary elements, of key-value
> pairs, or of Iterables to a PCollection of Strings.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ToString.java]





[jira] [Commented] (BEAM-7021) ToString transform for Python SDK

2019-05-31 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852915#comment-16852915
 ] 

Shehzaad Nakhoda commented on BEAM-7021:


[~altay] can we close this please?

> ToString transform for Python SDK
> -
>
> Key: BEAM-7021
> URL: https://issues.apache.org/jira/browse/BEAM-7021
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Labels: starter
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> PTransforms for converting a PCollection of arbitrary elements, of key-value
> pairs, or of Iterables to a PCollection of Strings.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ToString.java]





[jira] [Commented] (BEAM-7246) Create a Spanner IO for Python

2019-05-22 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846264#comment-16846264
 ] 

Shehzaad Nakhoda commented on BEAM-7246:


Chamikara, yes, I’m hoping to share a design soon. The Java implementation 
will be the inspiration (like most Python stuff in Beam!). Thanks for the 
pointer.

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Issue Comment Deleted] (BEAM-6675) The JdbcIO sink should accept schemas

2019-05-16 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-6675:
---
Comment: was deleted

(was: [~reuvenlax] To understand the scope a bit better, could you point to 
an example of IOs that accept schemas?)

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.





[jira] [Issue Comment Deleted] (BEAM-6674) The JdbcIO source should produce schemas

2019-05-16 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-6674:
---
Comment: was deleted

(was: [~reuvenlax] A couple of sentences of elaboration would help scope this a 
bit better. For example, pointers to existing IOs that produce schemas. thanks)

> The JdbcIO source should produce schemas
> 
>
> Key: BEAM-6674
> URL: https://issues.apache.org/jira/browse/BEAM-6674
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Commented] (BEAM-6675) The JdbcIO sink should accept schemas

2019-05-16 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841241#comment-16841241
 ] 

Shehzaad Nakhoda commented on BEAM-6675:


[~reuvenlax] To understand the scope a bit better, could you point to an 
example of IOs that accept schemas?

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.





[jira] [Comment Edited] (BEAM-6674) The JdbcIO source should produce schemas

2019-05-16 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841240#comment-16841240
 ] 

Shehzaad Nakhoda edited comment on BEAM-6674 at 5/16/19 11:38 AM:
--

[~reuvenlax] A couple of sentences of elaboration would help scope this a bit 
better. For example, pointers to existing IOs that produce schemas. thanks


was (Author: shehzaadn):
[~reuvenlax] A couple of sentences of elaboration would help point us in the 
right direction. For example, pointers to existing IOs that produce schemas.

> The JdbcIO source should produce schemas
> 
>
> Key: BEAM-6674
> URL: https://issues.apache.org/jira/browse/BEAM-6674
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Updated] (BEAM-7246) Create a Spanner IO for Python

2019-05-08 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-7246:
---
Description: 
Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
Testing in this work item will be in the form of DirectRunner tests and manual 
testing.

Integration and performance tests are a separate work item (not included here).


See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
Google Cloud Spanner to the Database column for the Python/Batch row.

  was:
Add I/O support for Google Cloud Spanner for the Python SDK.
Testing in this work item will be in the form of DirectRunner tests and manual 
testing.

Integration and performance tests are a separate work item (not included here).


See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
Google Cloud Spanner to the Database column for the Python/Batch row.


> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Updated] (BEAM-7246) Create a Spanner IO for Python

2019-05-08 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-7246:
---
Description: 
Add I/O support for Google Cloud Spanner for the Python SDK.
Testing in this work item will be in the form of DirectRunner tests and manual 
testing.

Integration and performance tests are a separate work item (not included here).


See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
Google Cloud Spanner to the Database column for the Python/Batch row.

  was:
Add I/O support for Google Cloud Spanner for the Python SDK.
Integration and performance tests are a separate work item (not included here).

See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
Google Cloud Spanner to the Database column for the Python/Batch row.


> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK.
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Updated] (BEAM-7246) Create a Spanner IO for Python

2019-05-08 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-7246:
---
Description: 
Add I/O support for Google Cloud Spanner for the Python SDK.
Integration and performance tests are a separate work item (not included here).

See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
Google Cloud Spanner to the Database column for the Python/Batch row.

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Commented] (BEAM-562) DoFn Reuse: Add new methods to DoFn

2019-05-08 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835919#comment-16835919
 ] 

Shehzaad Nakhoda commented on BEAM-562:
---

[~altay] thanks for the heads up. Can you please assign this to 
[~myffi...@gmail.com]?

> DoFn Reuse: Add new methods to DoFn
> ---
>
> Key: BEAM-562
> URL: https://issues.apache.org/jira/browse/BEAM-562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Labels: sdk-consistency
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> The Java SDK added setup and teardown methods to DoFns. This makes DoFns 
> reusable and provides performance improvements. The Python SDK should add 
> support for these new DoFn methods:
> Proposal doc: 
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
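The benefit of these lifecycle methods can be shown with a toy runner: an expensive resource is created once in `setup`, reused for every element of every bundle, and released once in `teardown`. This is a plain-Python sketch; `ExpensiveClient`, `LookupFn` and `run_bundles` are hypothetical, and only the `setup` and `teardown` names come from the issue.

```python
from typing import Iterable, Iterator, List


class ExpensiveClient:
    """Hypothetical costly resource, e.g. a connection pool."""
    instances = 0  # counts how many clients were ever created

    def __init__(self) -> None:
        ExpensiveClient.instances += 1
        self.closed = False

    def lookup(self, x: int) -> int:
        return x * 2

    def close(self) -> None:
        self.closed = True


class LookupFn:
    """DoFn-like class: setup/teardown bracket the lifetime of the instance."""

    def setup(self) -> None:
        self.client = ExpensiveClient()

    def process(self, element: int) -> Iterator[int]:
        yield self.client.lookup(element)

    def teardown(self) -> None:
        self.client.close()


def run_bundles(fn: LookupFn, bundles: Iterable[List[int]]) -> List[int]:
    # Toy runner: one setup, the same instance reused for every bundle,
    # and one teardown at the end.
    fn.setup()
    outputs: List[int] = []
    try:
        for bundle in bundles:
            for element in bundle:
                outputs.extend(fn.process(element))
    finally:
        fn.teardown()
    return outputs
```

Processing two bundles creates a single client. The toy runner always calls `teardown`; a real runner decides when, and whether, it does so.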





[jira] [Assigned] (BEAM-7021) ToString transform for Python SDK

2019-05-07 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-7021:
--

Assignee: Shehzaad Nakhoda

> ToString transform for Python SDK
> -
>
> Key: BEAM-7021
> URL: https://issues.apache.org/jira/browse/BEAM-7021
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>  Labels: starter
>
> PTransforms for converting a PCollection of arbitrary elements, of key-value
> pairs, or of Iterables to a PCollection of Strings.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ToString.java]





[jira] [Assigned] (BEAM-7019) Reify transform for Python SDK

2019-05-07 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-7019:
--

Assignee: Shehzaad Nakhoda  (was: Ahmet Altay)

> Reify transform for Python SDK
> --
>
> Key: BEAM-7019
> URL: https://issues.apache.org/jira/browse/BEAM-7019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> PTransforms for converting between the explicit and implicit forms of various
> Beam values.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reify.java]





[jira] [Assigned] (BEAM-7018) Regex transform for Python SDK

2019-05-07 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-7018:
--

Assignee: Shehzaad Nakhoda  (was: Ahmet Altay)

> Regex transform for Python SDK
> --
>
> Key: BEAM-7018
> URL: https://issues.apache.org/jira/browse/BEAM-7018
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> PTransforms that use regular expressions to process elements in a PCollection.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]
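The kind of operations involved can be sketched with Python's `re` module. The helpers below are hypothetical and only loosely modeled on the Java transform's match, find and replace-all operations; lists stand in for PCollections.

```python
import re
from typing import Iterable, List


def regex_matches(elements: Iterable[str], pattern: str, group: int = 0) -> List[str]:
    # Keep elements that the pattern matches in full; emit the requested group.
    compiled = re.compile(pattern)
    return [m.group(group) for m in map(compiled.fullmatch, elements) if m]


def regex_find(elements: Iterable[str], pattern: str, group: int = 0) -> List[str]:
    # Keep elements containing a match anywhere; emit the first match's group.
    compiled = re.compile(pattern)
    return [m.group(group) for m in map(compiled.search, elements) if m]


def regex_replace_all(elements: Iterable[str], pattern: str, replacement: str) -> List[str]:
    # Replace every match in every element.
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, e) for e in elements]
```

The distinction between a full match and a find matters: `"c3d"` contains a match for `[a-z](\d+)` but is not matched in full, so only `regex_find` would keep it.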





[jira] [Assigned] (BEAM-6696) GroupIntoBatches transform for Python SDK

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6696:
--

Assignee: Shehzaad Nakhoda

> GroupIntoBatches transform for Python SDK
> -
>
> Key: BEAM-6696
> URL: https://issues.apache.org/jira/browse/BEAM-6696
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add a PTransform that batches inputs to a desired batch size. Batches will 
> contain only elements of a single key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java
> Unlike the BatchElements transform 
> (https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/python/apache_beam/transforms/util.py#L461),
>  GroupIntoBatches will use state to batch across bundles as well.
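The difference can be sketched in plain Python. Below, per-key buffers live on the object and therefore survive from one bundle to the next, standing in for Beam state; a batcher confined to a single bundle would instead have to emit whatever it held at the end of each bundle. `GroupIntoBatchesSketch` is hypothetical, and the explicit `flush` stands in for whatever mechanism (for example, a timer) a real implementation would use to emit incomplete batches.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Hashable, Iterable, List, Tuple

KV = Tuple[Hashable, Any]


class GroupIntoBatchesSketch:
    """Batches values per key; buffers persist across bundles like keyed state."""

    def __init__(self, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self._buffers: DefaultDict[Hashable, List[Any]] = defaultdict(list)

    def process_bundle(self, bundle: Iterable[KV]) -> List[Tuple[Hashable, List[Any]]]:
        out = []
        for key, value in bundle:
            buffer = self._buffers[key]
            buffer.append(value)
            if len(buffer) == self.batch_size:  # full batch: emit and reset
                out.append((key, list(buffer)))
                buffer.clear()
        return out

    def flush(self) -> List[Tuple[Hashable, List[Any]]]:
        # Emit incomplete batches, e.g. when the window closes.
        out = [(key, list(values)) for key, values in self._buffers.items() if values]
        self._buffers.clear()
        return out
```

With a batch size of 2, a first bundle `[("a", 1), ("b", 2)]` emits nothing. Its values complete batches only when the next bundle arrives, which batching within a single bundle cannot do.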





[jira] [Assigned] (BEAM-562) DoFn Reuse: Add new methods to DoFn

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-562:
-

Assignee: Shehzaad Nakhoda

> DoFn Reuse: Add new methods to DoFn
> ---
>
> Key: BEAM-562
> URL: https://issues.apache.org/jira/browse/BEAM-562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Labels: sdk-consistency
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> The Java SDK added setup and teardown methods to DoFns. This makes DoFns 
> reusable and provides performance improvements. The Python SDK should add 
> support for these new DoFn methods:
> Proposal doc: 
> https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#





[jira] [Assigned] (BEAM-6858) Support side inputs injected into a DoFn

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6858:
--

Assignee: Shehzaad Nakhoda

> Support side inputs injected into a DoFn
> 
>
> Key: BEAM-6858
> URL: https://issues.apache.org/jira/browse/BEAM-6858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Beam currently supports injecting main inputs into a DoFn process method. A 
> user can write the following:
> @ProcessElement public void process(@Element InputT element)
> And Beam will (using ByteBuddy code generation) inject the input element into 
> the process method.
> We would like to also support the same for side inputs. For example:
> @ProcessElement public void process(@Element InputT element, 
> @SideInput("tag1") String input1, @SideInput("tag2") Integer input2) 
> This requires the existing process-method analysis framework to capture these 
> side inputs. The ParDo code would have to verify the type of the side input 
> and include them in the list of side inputs. This would also eliminate the 
> need for the user to explicitly call withSideInputs on the ParDo.
>  
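Beam Java does this with annotations and ByteBuddy code generation. As a rough analogy only, the same analyze-then-inject flow can be sketched in Python with `inspect.signature`. A hypothetical `SideInput` marker used as a parameter default plays the role of `@SideInput("tag")`. The analysis step collects the declared tags and types, and the invoker checks each supplied value's type before passing it in. None of these names are Beam APIs.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


class SideInput:
    """Hypothetical marker, analogous to Java's @SideInput("tag")."""

    def __init__(self, tag: str) -> None:
        self.tag = tag


def analyze(process: Callable) -> Dict[str, Tuple[str, Any]]:
    # Map each side-input parameter name to (tag, declared type).
    specs = {}
    for name, param in inspect.signature(process).parameters.items():
        if isinstance(param.default, SideInput):
            specs[name] = (param.default.tag, param.annotation)
    return specs


def invoke(process: Callable, element: Any, side_inputs: Dict[str, Any]) -> Any:
    kwargs = {}
    for name, (tag, expected) in analyze(process).items():
        if tag not in side_inputs:
            raise KeyError(f"side input {tag!r} was not provided")
        value = side_inputs[tag]
        # Verify the side input's type against the declared annotation, if any.
        if expected is not inspect.Parameter.empty and not isinstance(value, expected):
            raise TypeError(f"side input {tag!r} should be {expected.__name__}")
        kwargs[name] = value
    return process(element, **kwargs)


def example_process(element: str,
                    input1: str = SideInput("tag1"),
                    input2: int = SideInput("tag2")) -> str:
    return f"{element}:{input1}:{input2}"
```

Because the analysis step already knows every tag the method needs, a framework could assemble the side-input list itself. That is the sense in which the issue expects an explicit `withSideInputs` call to become unnecessary.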





[jira] [Assigned] (BEAM-6694) ApproximateQuantiles transform for Python SDK

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6694:
--

Assignee: Shehzaad Nakhoda

> ApproximateQuantiles transform for Python SDK
> -
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Shehzaad Nakhoda
>Priority: Minor
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
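The output shape can be illustrated with an exact, in-memory Python sketch. It sorts all values, so it is not the bounded-memory approximate algorithm the transform is meant to use. It only assumes the convention that `num_quantiles` values are returned, running from the minimum to the maximum inclusive, so 5 gives quartile boundaries. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def quantiles_globally(values: Iterable[Any], num_quantiles: int) -> List[Any]:
    # Exact N-tiles: sort everything, then pick evenly spaced positions that
    # include the first (minimum) and last (maximum) element.
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2 (min and max)")
    data = sorted(values)
    if not data:
        return []
    last = len(data) - 1
    return [data[(i * last) // (num_quantiles - 1)] for i in range(num_quantiles)]


def quantiles_per_key(pairs: Iterable[Tuple[Hashable, Any]],
                      num_quantiles: int) -> Dict[Hashable, List[Any]]:
    # Group values by key, then compute N-tiles for each key independently.
    grouped: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: quantiles_globally(vs, num_quantiles) for key, vs in grouped.items()}
```

For example, the values 0 through 100 with `num_quantiles=5` give `[0, 25, 50, 75, 100]`.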





[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6855:
--

Assignee: Shehzaad Nakhoda

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Assigned] (BEAM-6856) Support dynamic MapState on Dataflow

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6856:
--

Assignee: Shehzaad Nakhoda

> Support dynamic MapState on Dataflow
> 
>
> Key: BEAM-6856
> URL: https://issues.apache.org/jira/browse/BEAM-6856
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Assigned] (BEAM-6756) Support lazy iterables in schemas

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6756:
--

Assignee: Shehzaad Nakhoda  (was: Reuven Lax)

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.
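The difference between a lazy and a materialized iterable can be sketched in plain Python. The hypothetical `PagedIterable` below fetches a page only when iteration reaches it, so a consumer that stops early never loads the remaining pages, and each new iteration starts again from the first page. `fetch_page` stands in for however a runner would read one page of grouped values.

```python
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class PagedIterable(Generic[T]):
    """Hypothetical lazy, re-iterable view over values stored in pages."""

    def __init__(self, fetch_page: Callable[[int], List[T]], num_pages: int) -> None:
        self._fetch_page = fetch_page
        self._num_pages = num_pages

    def __iter__(self) -> Iterator[T]:
        # A page is fetched only when iteration reaches it, and nothing is
        # cached, so roughly one page is held in memory at a time.
        for page in range(self._num_pages):
            yield from self._fetch_page(page)
```

Reading the first four values of a 1000-page iterable with three values per page fetches only pages 0 and 1; materializing it into a list would fetch all 1000.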





[jira] [Assigned] (BEAM-6674) The JdbcIO source should produce schemas

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6674:
--

Assignee: Shehzaad Nakhoda

> The JdbcIO source should produce schemas
> 
>
> Key: BEAM-6674
> URL: https://issues.apache.org/jira/browse/BEAM-6674
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Assigned] (BEAM-6675) The JdbcIO sink should accept schemas

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6675:
--

Assignee: Shehzaad Nakhoda

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.





[jira] [Assigned] (BEAM-6673) BigQueryIO.Read should automatically produce schemas

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6673:
--

Assignee: Shehzaad Nakhoda

> BigQueryIO.Read should automatically produce schemas
> 
>
> Key: BEAM-6673
> URL: https://issues.apache.org/jira/browse/BEAM-6673
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The output PCollections should contain 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4455) Provide automatic schema registration for Protos

2019-05-05 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-4455:
--

Assignee: Shehzaad Nakhoda  (was: Reuven Lax)

> Provide automatic schema registration for Protos
> 
>
> Key: BEAM-4455
> URL: https://issues.apache.org/jira/browse/BEAM-4455
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Need to make sure this is a compatible change


