[jira] (BEAM-1354) TextIO should comply with PTransform style guide

2017-01-31 Thread Aviem Zur (JIRA)
Aviem Zur updated an issue

Beam / BEAM-1354
TextIO should comply with PTransform style guide

Change By: Aviem Zur

* TextIO.Read,Write.Bound,Unbound - Bound,Unbound are banned names. Instead, TextIO should make the user specify the type parameters explicitly, and have simply TextIO.Read and TextIO.Write themselves be the transform types.
* Both should take simply String, and not use Coder as a general-purpose serialization mechanism.
** The Javadoc should be changed to reflect this.
* Should perhaps use AutoValue for parameter builders
 This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d) 


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2017-02-01 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848318#comment-15848318
 ] 

Aviem Zur commented on BEAM-1092:
-

[~davor] I agree that a general solution requires some thought.

Regarding KafkaIO: this should probably be shaded in its pom for now, as it is 
commonly used in clusters which may supply a different version of Guava.
I myself had my application crash on a Spark cluster due to KafkaIO not having 
its Guava dependency shaded.

While it is true that users can shade their applications themselves, it might be 
too much to expect them to go through this process: figure out that they have a 
transitive dependency on a different version of Guava, and then shade it.

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
>
> Beam shades away some of its dependencies like Guava to avoid user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners who add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetypes pom files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1379) Shade Guava in beam-sdks-java-io-kafka module

2017-02-02 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850030#comment-15850030
 ] 

Aviem Zur commented on BEAM-1379:
-

KafkaIO is commonly used in clusters which may supply a different version of 
Guava.

I myself had my application crash on a Spark cluster due to KafkaIO not having 
its Guava dependency shaded.
While it is true that users can shade their applications themselves, it might be 
too much to expect them to go through this process: figure out that they have a 
transitive dependency on a different version of Guava, and then shade it.

> Shade Guava in beam-sdks-java-io-kafka module
> -
>
> Key: BEAM-1379
> URL: https://issues.apache.org/jira/browse/BEAM-1379
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Shade Guava in Kafka IO module to avoid collisions with Guava versions 
> supplied in different clusters.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1379) Shade Guava in beam-sdks-java-io-kafka module

2017-02-02 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1379:
---

 Summary: Shade Guava in beam-sdks-java-io-kafka module
 Key: BEAM-1379
 URL: https://issues.apache.org/jira/browse/BEAM-1379
 Project: Beam
  Issue Type: Task
  Components: sdk-java-extensions
Reporter: Aviem Zur
Assignee: Davor Bonaci


Shade Guava in Kafka IO module to avoid collisions with Guava versions supplied 
in different clusters.
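One way to do this, sketched below, is a maven-shade-plugin relocation in the Kafka IO pom. The relocated package name, plugin version handling, and overall configuration here are illustrative assumptions, not the exact configuration merged for this issue:

```xml
<!-- Illustrative sketch: relocate Guava inside the Kafka IO artifact so
     the shaded copy cannot collide with a cluster-provided Guava.
     The shadedPattern below is an assumption, not the one in the PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.beam.sdk.io.kafka.repackaged.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the module's bytecode references the repackaged Guava classes, so whichever Guava version the cluster puts on the classpath no longer matters to KafkaIO.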



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2017-02-02 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850033#comment-15850033
 ] 

Aviem Zur commented on BEAM-1092:
-

[~dhalp...@google.com] Sure.
See: https://issues.apache.org/jira/browse/BEAM-1379 and 
https://github.com/apache/beam/pull/1906

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
>
> Beam shades away some of its dependencies like Guava to avoid user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners who add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetypes pom files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1354) TextIO should comply with PTransform style guide

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1354:
---

Assignee: Aviem Zur

> TextIO should comply with PTransform style guide
> 
>
> Key: BEAM-1354
> URL: https://issues.apache.org/jira/browse/BEAM-1354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Aviem Zur
>  Labels: backward-incompatible
>
> * TextIO.Read,Write.Bound,Unbound - Bound,Unbound are banned names. Instead, 
> TextIO should make the user specify the type parameters explicitly, and have 
> simply TextIO.Read and TextIO.Write themselves be the transform types.
> * Both should take simply String, and not use Coder as a general-purpose 
> serialization mechanism.
> ** The Javadoc should be changed to reflect this.
> * Should perhaps use AutoValue for parameter builders



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1397) Introduce IO metrics

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1397:
---

Assignee: Aviem Zur  (was: Daniel Halperin)

> Introduce IO metrics
> 
>
> Key: BEAM-1397
> URL: https://issues.apache.org/jira/browse/BEAM-1397
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Introduce usage of the metrics API in IOs.
> POC using {{CountingInput}}:
> * Add metrics to {{CountingInput}}
> * {{RunnableOnService}} test which creates a pipeline and asserts these 
> metrics.
> * Close any gaps in Direct runner and Spark runner to support these metrics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1397) Introduce IO metrics

2017-02-05 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1397:
---

 Summary: Introduce IO metrics
 Key: BEAM-1397
 URL: https://issues.apache.org/jira/browse/BEAM-1397
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Davor Bonaci


Introduce usage of the metrics API in IOs.

POC using {{CountingInput}}:
* Add metrics to {{CountingInput}}
* {{RunnableOnService}} test which creates a pipeline and asserts these 
metrics.
* Close any gaps in Direct runner and Spark runner to support these metrics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1397) Introduce IO metrics

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1397:
---

Assignee: Daniel Halperin  (was: Davor Bonaci)

> Introduce IO metrics
> 
>
> Key: BEAM-1397
> URL: https://issues.apache.org/jira/browse/BEAM-1397
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>
> Introduce usage of the metrics API in IOs.
> POC using {{CountingInput}}:
> * Add metrics to {{CountingInput}}
> * {{RunnableOnService}} test which creates a pipeline and asserts these 
> metrics.
> * Close any gaps in Direct runner and Spark runner to support these metrics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1398) KafkaIO metrics

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1398:
---

Assignee: Aviem Zur  (was: Daniel Halperin)

> KafkaIO metrics
> ---
>
> Key: BEAM-1398
> URL: https://issues.apache.org/jira/browse/BEAM-1398
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Add metrics to {{KafkaIO}} using the metrics API.
> Metrics (Feel free to add more ideas here) per topic/split (Where applicable):
> * Backlog in bytes.
> * Backlog in number of messages.
> * Messages consumed.
> * Bytes consumed.
> * Error counts.
> * Messages produced.
> * Time spent reading.
> * Time spent writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1398) KafkaIO metrics

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1398:
---

Assignee: Daniel Halperin  (was: Davor Bonaci)

> KafkaIO metrics
> ---
>
> Key: BEAM-1398
> URL: https://issues.apache.org/jira/browse/BEAM-1398
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>
> Add metrics to {{KafkaIO}} using the metrics API.
> Metrics (Feel free to add more ideas here) per topic/split (Where applicable):
> * Backlog in bytes.
> * Backlog in number of messages.
> * Messages consumed.
> * Bytes consumed.
> * Error counts.
> * Messages produced.
> * Time spent reading.
> * Time spent writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1398) KafkaIO metrics

2017-02-05 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1398:
---

 Summary: KafkaIO metrics
 Key: BEAM-1398
 URL: https://issues.apache.org/jira/browse/BEAM-1398
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-extensions
Reporter: Aviem Zur
Assignee: Davor Bonaci


Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per topic/split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Error counts.
* Messages produced.
* Time spent reading.
* Time spent writing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1126.
-
   Resolution: Invalid
Fix Version/s: Not applicable

> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: Not applicable
>
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this 
> number can be more human-comprehensible than bytes. Something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}} could work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1334) Split UsesMetrics category and tests into UsesCommittedMetrics and UsesAttemptedMetrics

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1334.
-
   Resolution: Done
Fix Version/s: 0.6.0

> Split UsesMetrics category and tests into UsesCommittedMetrics and 
> UsesAttemptedMetrics
> ---
>
> Key: BEAM-1334
> URL: https://issues.apache.org/jira/browse/BEAM-1334
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
> Fix For: 0.6.0
>
>
> Some runners may not be able to implement both {{committed}} and 
> {{attempted}} results in {{MetricResult}}.
> Seeing this, split the current {{RunnableOnService}} test 
> {{org.apache.beam.sdk.metrics.MetricsTest#metricsReportToQuery}} into two 
> tests, which test attempted and committed results separately with categories 
> {{UsesCommittedMetrics}} and {{UsesAttemptedMetrics}} instead of the current 
> category {{UsesMetrics}}.
> Discussion that led to this can be seen in this 
> [PR|https://github.com/apache/beam/pull/1750#issuecomment-275412984]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (BEAM-1379) Shade Guava in beam-sdks-java-io-kafka module

2017-02-05 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1379.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> Shade Guava in beam-sdks-java-io-kafka module
> -
>
> Key: BEAM-1379
> URL: https://issues.apache.org/jira/browse/BEAM-1379
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
> Fix For: 0.6.0
>
>
> Shade Guava in Kafka IO module to avoid collisions with Guava versions 
> supplied in different clusters.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BEAM-1433) Remove coder from TextIO

2017-02-07 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1433:
---

 Summary: Remove coder from TextIO
 Key: BEAM-1433
 URL: https://issues.apache.org/jira/browse/BEAM-1433
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Aviem Zur


Remove coder usage in TextIO.
TextIO should only deal with Strings.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1402) Make TextIO and AvroIO use best-practice types.

2017-02-07 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857408#comment-15857408
 ] 

Aviem Zur commented on BEAM-1402:
-

OK. Created sub-task BEAM-1433 and PR.

> Make TextIO and AvroIO use best-practice types.
> ---
>
> Key: BEAM-1402
> URL: https://issues.apache.org/jira/browse/BEAM-1402
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>
> Replace static Read/Write classes with type-instantiated classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1402) Make TextIO and AvroIO use best-practice types.

2017-02-07 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857339#comment-15857339
 ] 

Aviem Zur commented on BEAM-1402:
-

I've started working on some of what [~jkff] mentioned in BEAM-1354.
Namely, I've removed coder from {{TextIO}} in 
https://github.com/aviemzur/beam/commit/2b07ad168c58755cad3a12aceed19df814e9904b
Can this effort be merged?

> Make TextIO and AvroIO use best-practice types.
> ---
>
> Key: BEAM-1402
> URL: https://issues.apache.org/jira/browse/BEAM-1402
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>
> Replace static Read/Write classes with type-instantiated classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-1354) TextIO should comply with PTransform style guide

2017-02-07 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857335#comment-15857335
 ] 

Aviem Zur commented on BEAM-1354:
-

I've started working on this (locally). 
I've gone ahead with removing the coder from TextIO.
Can this effort be merged?

> TextIO should comply with PTransform style guide
> 
>
> Key: BEAM-1354
> URL: https://issues.apache.org/jira/browse/BEAM-1354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Aviem Zur
>  Labels: backward-incompatible
> Fix For: Not applicable
>
>
> * TextIO.Read,Write.Bound,Unbound - Bound,Unbound are banned names. Instead, 
> TextIO should make the user specify the type parameters explicitly, and have 
> simply TextIO.Read and TextIO.Write themselves be the transform types.
> * Both should take simply String, and not use Coder as a general-purpose 
> serialization mechanism.
> ** The Javadoc should be changed to reflect this.
> * Should perhaps use AutoValue for parameter builders



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1334) Split UsesMetrics category and tests into UsesCommittedMetrics and UsesAttemptedMetrics

2017-01-28 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1334:
---

Assignee: Aviem Zur  (was: Ben Chambers)

> Split UsesMetrics category and tests into UsesCommittedMetrics and 
> UsesAttemptedMetrics
> ---
>
> Key: BEAM-1334
> URL: https://issues.apache.org/jira/browse/BEAM-1334
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Some runners may not be able to implement both {{committed}} and 
> {{attempted}} results in {{MetricResult}}.
> Seeing this, split the current {{RunnableOnService}} test 
> {{org.apache.beam.sdk.metrics.MetricsTest#metricsReportToQuery}} into two 
> tests, which test attempted and committed results separately with categories 
> {{UsesCommittedMetrics}} and {{UsesAttemptedMetrics}} instead of the current 
> category {{UsesMetrics}}.
> Discussion that led to this can be seen in this 
> [PR|https://github.com/apache/beam/pull/1750#issuecomment-275412984]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1334) Split UsesMetrics category and tests into UsesCommittedMetrics and UsesAttemptedMetrics

2017-01-28 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1334:
---

 Summary: Split UsesMetrics category and tests into 
UsesCommittedMetrics and UsesAttemptedMetrics
 Key: BEAM-1334
 URL: https://issues.apache.org/jira/browse/BEAM-1334
 Project: Beam
  Issue Type: Task
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Davor Bonaci
Priority: Minor


Some runners may not be able to implement both {{committed}} and {{attempted}} 
results in {{MetricResult}}.

Seeing this, split the current {{RunnableOnService}} test 
{{org.apache.beam.sdk.metrics.MetricsTest#metricsReportToQuery}} into two 
tests, which test attempted and committed results separately with categories 
{{UsesCommittedMetrics}} and {{UsesAttemptedMetrics}} instead of the current 
category {{UsesMetrics}}.

Discussion that led to this can be seen in this 
[PR|https://github.com/apache/beam/pull/1750#issuecomment-275412984]
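The committed vs. attempted distinction above can be illustrated with a minimal model. The classes below are illustrative stand-ins, not Beam's actual {{MetricResult}} API: a runner that cannot report committed results is modeled as an absent value, which is exactly why the test categories need to be split.

```java
// Minimal, illustrative model of a metric result that distinguishes
// attempted from committed values. A runner that cannot report committed
// results throws, so a single UsesMetrics test category cannot cover both.
public class MetricResultSketch {

  static final class MetricResult {
    private final long attempted;
    private final Long committed; // null when the runner cannot report it

    MetricResult(long attempted, Long committed) {
      this.attempted = attempted;
      this.committed = committed;
    }

    long attempted() {
      return attempted;
    }

    long committed() {
      if (committed == null) {
        throw new UnsupportedOperationException("committed metrics not supported");
      }
      return committed;
    }
  }

  public static void main(String[] args) {
    // A runner that only tracks attempted values:
    MetricResult result = new MetricResult(42L, null);
    System.out.println(result.attempted()); // prints 42
    try {
      result.committed();
    } catch (UnsupportedOperationException e) {
      System.out.println("committed unsupported"); // reached for this runner
    }
  }
}
```

A {{UsesAttemptedMetrics}} test would assert only on attempted values, while a {{UsesCommittedMetrics}} test additionally exercises the path that some runners cannot support.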



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1334) Split UsesMetrics category and tests into UsesCommittedMetrics and UsesAttemptedMetrics

2017-01-28 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1334:

Assignee: Ben Chambers  (was: Davor Bonaci)

> Split UsesMetrics category and tests into UsesCommittedMetrics and 
> UsesAttemptedMetrics
> ---
>
> Key: BEAM-1334
> URL: https://issues.apache.org/jira/browse/BEAM-1334
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Ben Chambers
>Priority: Minor
>
> Some runners may not be able to implement both {{committed}} and 
> {{attempted}} results in {{MetricResult}}.
> Seeing this, split the current {{RunnableOnService}} test 
> {{org.apache.beam.sdk.metrics.MetricsTest#metricsReportToQuery}} into two 
> tests, which test attempted and committed results separately with categories 
> {{UsesCommittedMetrics}} and {{UsesAttemptedMetrics}} instead of the current 
> category {{UsesMetrics}}.
> Discussion that led to this can be seen in this 
> [PR|https://github.com/apache/beam/pull/1750#issuecomment-275412984]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] (BEAM-1344) Uniform metrics step name semantics

2017-01-29 Thread Aviem Zur (JIRA)
 Aviem Zur created an issue 

Beam / BEAM-1344
 Beam /  BEAM-1344 
Uniform metrics step name semantics

Issue Type: Improvement
Assignee: Davor Bonaci
Components: sdk-java-core
Created: 29/Jan/17 15:14
Priority: Major
Reporter: Aviem Zur

Agree on and implement uniform metrics step name semantics which runners would adhere to. 
Current discussion seems to point at a string with the pipeline graph path to the step's transform. Something along the lines of: "PBegin/SomeInputTransform/SomeParDo/...MyTransform.#Running_number_for_collisions". 
Also agree on and implement metrics querying semantics. Current discussion seems to point at a substring or regex matching of steps on given string input. 
Original dev list discussion 
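The naming scheme discussed above can be sketched as a small name generator: full pipeline-graph path, with a running number appended only on collision. The class name, method name, and {{#}} separator below are illustrative assumptions, not Beam's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of unique step names: the full graph path is used
// as-is, and a running number is appended only when two steps would
// otherwise share the same path.
public class StepNames {
  private final Map<String, Integer> seen = new HashMap<>();

  String uniqueName(String path) {
    int n = seen.merge(path, 1, Integer::sum);
    return n == 1 ? path : path + "#" + n;
  }

  public static void main(String[] args) {
    StepNames names = new StepNames();
    System.out.println(names.uniqueName("PBegin/SomeInputTransform/SomeParDo/MyTransform"));
    // A second step with the same path gets a collision suffix:
    System.out.println(names.uniqueName("PBegin/SomeInputTransform/SomeParDo/MyTransform"));
  }
}
```

Querying by substring or regex then works naturally against these path-shaped names, e.g. matching all steps under {{SomeParDo}}.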

[jira] [Updated] (BEAM-1465) No natural place to flush/close resources in FileBasedWriter

2017-02-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1465:

Summary: No natural place to flush/close resources in FileBasedWriter  
(was: No natural place to flush outputs in FileBasedWriter)

> No natural place to flush/close resources in FileBasedWriter
> 
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>
> {{FileBasedWriter}} API does not have a natural place to flush/close 
> resources opened by the writer.
> For example, if you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}}, to match the existing 
> {{prepareWrite(WritableByteChannel channel)}}, could work.
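The gap can be sketched with stdlib types only: a writer that wraps the sink's channel in a buffered stream has nowhere to flush unless a {{finishWrite()}} hook exists. Class and method names follow the issue text, but the implementation below is an illustrative sketch, not Beam's {{FileBasedWriter}}.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.nio.channels.WritableByteChannel;

public class SketchWriterDemo {

  abstract static class SketchWriter {
    protected OutputStream out;

    // Called once the sink's channel is open (mirrors prepareWrite).
    void prepareWrite(WritableByteChannel channel) {
      out = new BufferedOutputStream(Channels.newOutputStream(channel));
    }

    abstract void write(String value) throws IOException;

    // Proposed hook: flush buffered output before the channel is closed.
    void finishWrite() throws IOException {
      out.flush();
    }
  }

  static final class TextWriter extends SketchWriter {
    @Override
    void write(String value) throws IOException {
      out.write(value.getBytes(StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    TextWriter writer = new TextWriter();
    writer.prepareWrite(Channels.newChannel(bytes));
    writer.write("hello");
    System.out.println(bytes.size()); // prints 0: still sitting in the buffer
    writer.finishWrite();
    System.out.println(bytes.size()); // prints 5: flushed through the channel
  }
}
```

Without the {{finishWrite()}} call, the buffered bytes would be silently dropped when the channel is closed out from under the stream.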



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (BEAM-1465) No natural place to flush/close resources in FileBasedWriter

2017-02-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1465:
---

Assignee: Aviem Zur  (was: Daniel Halperin)

> No natural place to flush/close resources in FileBasedWriter
> 
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> {{FileBasedWriter}} API does not have a natural place to flush/close 
> resources opened by the writer.
> For example, if you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}}, to match the existing 
> {{prepareWrite(WritableByteChannel channel)}}, could work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1465) No natural place to flush outputs in FileBasedWriter

2017-02-21 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1465:

Description: 
{{FileBasedWriter}} API does not have a natural place to flush/close resources 
opened by the writer.

For example, if you create an {{OutputStream}} in your {{Writer}} using the 
{{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there is 
no natural place to call its {{flush()}} method.

Maybe something like {{finishWrite()}}, to match the existing 
{{prepareWrite(WritableByteChannel channel)}}, could work.

  was:
{{FileBasedWriter}} API does not have a natural place to flush output streams.

If you create an {{OutputStream}} in your {{Writer}} using the 
{{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there is 
no natural place to call its {{flush()}} method.

Maybe something like {{finishWrite()}}, to match the existing 
{{prepareWrite(WritableByteChannel channel)}}, could work.


> No natural place to flush outputs in FileBasedWriter
> 
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>
> {{FileBasedWriter}} API does not have a natural place to flush/close 
> resources opened by the writer.
> For example, if you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}}, to match the existing 
> {{prepareWrite(WritableByteChannel channel)}}, could work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1398) KafkaIO metrics

2017-02-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1398:

Description: 
Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Error counts.
* Messages produced.
* Time spent reading.
* Time spent writing.

Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
values.

  was:
Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per topic/split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Error counts.
* Messages produced.
* Time spent reading.
* Time spent writing.

Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
values.


> KafkaIO metrics
> ---
>
> Key: BEAM-1398
> URL: https://issues.apache.org/jira/browse/BEAM-1398
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Add metrics to {{KafkaIO}} using the metrics API.
> Metrics (Feel free to add more ideas here) per split (Where applicable):
> * Backlog in bytes.
> * Backlog in number of messages.
> * Messages consumed.
> * Bytes consumed.
> * Error counts.
> * Messages produced.
> * Time spent reading.
> * Time spent writing.
> Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
> values.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Description: 
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests you're usually in a 
hurry to build and do not want to go through the slower verifications.
You should still be able to force these verifications with '-Prelease' as 
before, even if '-DskipTests' is specified.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]

  was:
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests you're usually in a 
hurry to build and do not want to go through the slower verifications.
You should still be able to force these verifications with '-Prelease' as 
before, even if '-DskipTests' is specified.


> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is that if you're skipping tests you're usually in 
> a hurry to build and do not want to go through the slower verifications.
> You should still be able to force these verifications with '-Prelease' as 
> before, even if '-DskipTests' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]
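One plausible way to wire this in Maven, sketched under the assumption that the parent pom uses the standard skip properties of the involved plugins ({{checkstyle.skip}}, {{findbugs.skip}}, {{rat.skip}}); the profile id is illustrative:

```xml
<!-- Illustrative sketch: a profile that activates whenever the skipTests
     property is defined (i.e. -DskipTests was passed) and turns off the
     slower verifications via the plugins' standard skip flags. -->
<profile>
  <id>skip-slow-verifications</id>
  <activation>
    <property>
      <name>skipTests</name>
    </property>
  </activation>
  <properties>
    <checkstyle.skip>true</checkstyle.skip>
    <findbugs.skip>true</findbugs.skip>
    <rat.skip>true</rat.skip>
  </properties>
</profile>
```

A release profile can then re-set these properties to {{false}}, so '-Prelease' still forces the full verifications even when '-DskipTests' is given.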



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Description: 
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.

  was:
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.


> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-DskipTests' is specified.





[jira] [Updated] (BEAM-1513) Skip slower verifications if '-Dquick' specified. Enable them otherwise

2017-02-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Description: 
Skip slower verifications (checkstyle, rat and findbugs) if '-Dquick' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-Dquick' is specified.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]
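One way this could be wired up in the parent pom (a hedged sketch, not the actual Beam build: the profile id and property wiring are assumptions, though `checkstyle.skip`, `rat.skip`, and `findbugs.skip` are the standard skip properties honored by the respective plugins):

```xml
<!-- Sketch: a profile activated whenever -Dquick is passed on the command
     line, flipping the skip flags of the slower verification plugins. -->
<profile>
  <id>quick</id>
  <activation>
    <property>
      <name>quick</name>
    </property>
  </activation>
  <properties>
    <checkstyle.skip>true</checkstyle.skip>
    <rat.skip>true</rat.skip>
    <findbugs.skip>true</findbugs.skip>
  </properties>
</profile>
```

A '-Prelease' profile could then re-enable the checks by setting these properties back to {{false}}, since an explicitly activated profile takes precedence.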

  was:
Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command. Enable them otherwise.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.
Should still be able to force these verifications with '-Prelease' as before, 
even if '-DskipTests' is specified.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]


> Skip slower verifications if '-Dquick' specified. Enable them otherwise
> ---
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-Dquick' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-Dquick' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Updated] (BEAM-1513) Skip slower verifications if '-Dquick' specified

2017-02-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Summary: Skip slower verifications if '-Dquick' specified  (was: Skip 
slower verifications if '-DskipTests' specified)

> Skip slower verifications if '-Dquick' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command. Enable them otherwise.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-DskipTests' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Assigned] (BEAM-1446) Create should take a TypeDescriptor as an alternative to explicitly specifying the Coder

2017-02-23 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1446:
---

Assignee: Aviem Zur

> Create should take a TypeDescriptor as an alternative to explicitly 
> specifying the Coder
> 
>
> Key: BEAM-1446
> URL: https://issues.apache.org/jira/browse/BEAM-1446
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Aviem Zur
>Priority: Minor
>
> {{getDefaultCreateCoder}} is provided with the Pipeline's {{CoderRegistry}}, 
> which enables it to use standard Coder Inference. For the construction of the 
> Default Create Coder, explicitly providing the TypeDescriptor allows it to 
> ask the CoderRegistry directly rather than attempting to reconstruct the 
> TypeDescriptor based on the elements within the Create.
> This also makes some coder specifications significantly more terse, as the 
> type signature must be respecified but the entire coder need not be 
> constructed (e.g. {{KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of());}} 
> becomes {{new TypeDescriptor<KV<Integer, String>>() {};}}, which is at least 
> somewhat simpler to type out).





[jira] [Assigned] (BEAM-1379) Shade Guava in beam-sdks-java-io-kafka module

2017-02-24 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1379:
---

Assignee: Aviem Zur  (was: Davor Bonaci)

> Shade Guava in beam-sdks-java-io-kafka module
> -
>
> Key: BEAM-1379
> URL: https://issues.apache.org/jira/browse/BEAM-1379
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> Shade Guava in Kafka IO module to avoid collisions with Guava versions 
> supplied in different clusters.





[jira] [Assigned] (BEAM-1061) PreCommit test with side inputs

2017-02-24 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1061:
---

Assignee: (was: Aviem Zur)

> PreCommit test with side inputs
> ---
>
> Key: BEAM-1061
> URL: https://issues.apache.org/jira/browse/BEAM-1061
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Daniel Halperin
>
> We should have at least one precommit integration test that exercises side 
> inputs on all runners. Existing tests exercise sources, files, per-key, 
> combiners, windowing, ...; side inputs is one lacking part of the model it 
> would be nice to touch.





[jira] [Updated] (BEAM-1146) Decrease spark runner startup overhead

2017-02-24 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1146:

Description: 
BEAM-921 introduced a lazy singleton instantiated once in each JVM (driver & 
executors) which utilizes reflection to find all subclasses of Source and Coder.
While this is beneficial in its own right, the change added about one minute 
of overhead in spark runner startup time (which causes the first job/stage to 
take up to a minute).
The change is in class {{BeamSparkRunnerRegistrator}}.
Reflection (specifically the Reflections library) was used here because there 
is currently no way of knowing all the source and coder classes at runtime.

  was:
BEAM-921 introduced a lazy singleton instantiated once in each machine (driver 
& executors) which utilizes reflection to find all subclasses of Source and 
Coder.
While this is beneficial in its own right, the change added about one minute 
of overhead in spark runner startup time (which causes the first job/stage to 
take up to a minute).
The change is in class {{BeamSparkRunnerRegistrator}}.
Reflection (specifically the Reflections library) was used here because there 
is currently no way of knowing all the source and coder classes at runtime.


> Decrease spark runner startup overhead
> --
>
> Key: BEAM-1146
> URL: https://issues.apache.org/jira/browse/BEAM-1146
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.5.0
>
>
> BEAM-921 introduced a lazy singleton instantiated once in each JVM (driver & 
> executors) which utilizes reflection to find all subclasses of Source and 
> Coder.
> While this is beneficial in its own right, the change added about one minute 
> of overhead in spark runner startup time (which causes the first job/stage to 
> take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}.
> Reflection (specifically the Reflections library) was used here because there 
> is currently no way of knowing all the source and coder classes 
> at runtime.





[jira] [Updated] (BEAM-1145) Remove classifier from shaded spark runner artifact

2017-02-24 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1145:

Description: 
Shade plugin configured in spark runner's pom adds a classifier to spark runner 
shaded jar
{code:xml}
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>spark-app</shadedClassifierName>
{code}
This means, that in order for a user application that is dependent on 
spark-runner to work in cluster mode, they have to add the classifier in their 
dependency declaration, like so:
{code:xml}
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>0.4.0-incubating-SNAPSHOT</version>
  <classifier>spark-app</classifier>
</dependency>
{code}
Otherwise, if they do not specify classifier, the jar they get is unshaded, 
which in cluster mode, causes collisions between different guava versions.
Example exception in cluster mode when adding the dependency without classifier:
{code}
16/12/12 06:58:56 WARN TaskSetManager: Lost task 4.0 in stage 8.0 (TID 153, 
***.com): java.lang.NoSuchMethodError: 
com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at 
org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:137)
at 
org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:98)
at 
org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:180)
at 
org.apache.spark.streaming.StateSpec$$anonfun$1.apply(StateSpec.scala:179)
at 
org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:57)
at 
org.apache.spark.streaming.rdd.MapWithStateRDDRecord$$anonfun$updateRecordWithData$1.apply(MapWithStateRDD.scala:55)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at 
org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
at 
org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:155)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
at 
org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:149)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:275)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
I would suggest that the classifier be removed from the shaded jar, to avoid 
confusion among users, and have a better user experience.

P.S. Looks like Dataflow runner does not add a classifier to its shaded jar.
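A sketch of the suggested change (hedged: the plugin shown is the standard maven-shade-plugin, but the exact execution block in the Beam pom may differ). Dropping the classifier means the shaded jar replaces the main artifact, so a plain dependency declaration picks it up:

```xml
<!-- Sketch: shade without attaching a separately-classified artifact, so the
     shaded jar becomes the main artifact that users depend on. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- was: shadedArtifactAttached=true with the spark-app classifier -->
        <shadedArtifactAttached>false</shadedArtifactAttached>
      </configuration>
    </execution>
  </executions>
</plugin>
```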

  was:
Shade plugin configured in spark runner's pom adds a classifier to spark runner 
shaded jar
{code:xml}
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>spark-app</shadedClassifierName>
{code}
This means, that in order for a user application that is dependent on 
spark-runner to work in cluster mode, they have to add the classifier in their 
dependency declaration, like so:
{code:xml}
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>0.4.0-incubating-SNAPSHOT</version>
  <classifier>spark-app</classifier>
</dependency>
{code}
Otherwise, if they do not specify classifier, the jar they get is unshaded, 
which in cluster mode, causes collisions between different guava versions.
Example exception in cluster mode when adding the dependency without classifier:
{code}
16/12/12 06:58:56 WARN TaskSetManager: Lost task 4.0 in stage 8.0 (TID 153, 
lvsriskng02.lvs.paypal.com): java.lang.NoSuchMethodError: 
com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at 
org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:137)
at 
org.apache.beam.runners.spark.stateful.StateSpecFunctions$1.apply(StateSpecFunctions.java:98)
at 

[jira] [Updated] (BEAM-1146) Decrease spark runner startup overhead

2017-02-24 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1146:

Description: 
BEAM-921 introduced a lazy singleton instantiated once in each JVM (driver & 
executors) which utilizes reflection to find all subclasses of Source and Coder.
While this is beneficial in its own right, the change added about one minute of 
overhead in spark runner startup time (which causes the first job/stage to take 
up to a minute).
The change is in class {{BeamSparkRunnerRegistrator}}.

  was:
BEAM-921 introduced a lazy singleton instantiated once in each JVM (driver & 
executors) which utilizes reflection to find all subclasses of Source and Coder.
While this is beneficial in its own right, the change added about one minute 
of overhead in spark runner startup time (which causes the first job/stage to 
take up to a minute).
The change is in class {{BeamSparkRunnerRegistrator}}.
Reflection (specifically the Reflections library) was used here because there 
is currently no way of knowing all the source and coder classes at runtime.


> Decrease spark runner startup overhead
> --
>
> Key: BEAM-1146
> URL: https://issues.apache.org/jira/browse/BEAM-1146
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.5.0
>
>
> BEAM-921 introduced a lazy singleton instantiated once in each JVM (driver & 
> executors) which utilizes reflection to find all subclasses of Source and 
> Coder.
> While this is beneficial in its own right, the change added about one minute 
> of overhead in spark runner startup time (which causes the first job/stage to 
> take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}.





[jira] [Updated] (BEAM-1465) No natural place to flush resources in FileBasedWriter

2017-02-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1465:

Description: 
{{FileBasedWriter}} API does not have a natural place to flush resources opened 
by the writer.

For example, if you create an {{OutputStream}} in your {{Writer}} using the 
{{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there is 
no natural place to call its {{flush()}} method.

Maybe something like {{finishWrite()}} to match the existing 
{{prepareWrite(WritableByteChannel channel)}} can work.
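To make the gap concrete, here is a minimal stdlib-only sketch (hedged: the class and method names only mirror the shape of the Beam API and are hypothetical, not the real {{FileBasedWriter}}). A writer that wraps the sink's channel in a buffering stream has nowhere to flush unless a {{finishWrite()}}-style hook exists:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical writer sketch, not the actual Beam API.
class SketchWriter {
    private BufferedWriter out;

    // Mirrors prepareWrite(WritableByteChannel): wrap the sink's channel
    // in a buffering stream.
    void prepareWrite(WritableByteChannel channel) {
        out = new BufferedWriter(
            new OutputStreamWriter(Channels.newOutputStream(channel)));
    }

    void write(String value) throws IOException {
        out.write(value);
        out.newLine();
    }

    // The proposed hook: flush buffered output before the sink closes
    // the underlying channel.
    void finishWrite() throws IOException {
        out.flush();
    }
}

public class SketchWriterDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        SketchWriter writer = new SketchWriter();
        writer.prepareWrite(Channels.newChannel(bytes));
        writer.write("hello");
        // Without this call, "hello" may sit in the buffer and never
        // reach the channel.
        writer.finishWrite();
        System.out.println(bytes.toString().trim()); // prints "hello"
    }
}
```

If the sink closed the channel without such a hook, the buffered bytes would simply be lost, which is the problem this issue describes.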

  was:
{{FileBasedWriter}} API does not have a natural place to flush/close resources 
opened by the writer.

For example, if you create an {{OutputStream}} in your {{Writer}} using the 
{{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there is 
no natural place to call its {{flush()}} method.

Maybe something like {{finishWrite()}} to match the existing 
{{prepareWrite(WritableByteChannel channel)}} can work.


> No natural place to flush resources in FileBasedWriter
> --
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> {{FileBasedWriter}} API does not have a natural place to flush resources 
> opened by the writer.
> For example, if you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}} to match the existing 
> {{prepareWrite(WritableByteChannel channel)}} can work.





[jira] [Updated] (BEAM-1512) Optimize leaf transforms materialization

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1512:

Summary: Optimize leaf transforms materialization  (was: Optimize leaf 
transformation materialization)

> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Amit Sela
>
> Optimize leaf materialization in {{EvaluationContext}}. Use register for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}}, which adds overhead.





[jira] [Assigned] (BEAM-1512) Optimize leaf transforms materialization

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1512:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Optimize leaf transforms materialization
> 
>
> Key: BEAM-1512
> URL: https://issues.apache.org/jira/browse/BEAM-1512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Optimize leaf materialization in {{EvaluationContext}}. Use register for 
> DStream leaves and an empty {{foreachPartition}} for other leaves instead of 
> the current {{count()}}, which adds overhead.





[jira] [Created] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1513:
---

 Summary: Skip slower verifications if '-DskipTests' specified
 Key: BEAM-1513
 URL: https://issues.apache.org/jira/browse/BEAM-1513
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Aviem Zur
Assignee: Davor Bonaci


Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
specified in the maven command.
The reasoning behind this is that if you're skipping tests, you're usually in a 
hurry to build and do not want to go through the slower verifications.





[jira] [Assigned] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1513:
---

Assignee: Aviem Zur  (was: Davor Bonaci)

> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.





[jira] [Updated] (BEAM-1513) Skip slower verifications if '-DskipTests' specified

2017-02-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1513:

Priority: Minor  (was: Major)

> Skip slower verifications if '-DskipTests' specified
> 
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-DskipTests' was 
> specified in the maven command.
> The reasoning behind this is that if you're skipping tests, you're usually in 
> a hurry to build and do not want to go through the slower verifications.





[jira] [Assigned] (BEAM-1356) `mvn clean install -DskipTests` still runs python tests

2017-02-20 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1356:
---

Assignee: Aviem Zur  (was: Ahmet Altay)

> `mvn clean install -DskipTests` still runs python tests
> ---
>
> Key: BEAM-1356
> URL: https://issues.apache.org/jira/browse/BEAM-1356
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Daniel Halperin
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> e.g.
> Using 
> /Users/dhalperi/IdeaProjects/beam/sdks/python/.eggs/PyHamcrest-1.9.0-py2.7.egg
> running egg_info
> writing dependency_links to apache_beam_sdk.egg-info/dependency_links.txt
> writing top-level names to apache_beam_sdk.egg-info/top_level.txt
> writing requirements to apache_beam_sdk.egg-info/requires.txt
> writing entry points to apache_beam_sdk.egg-info/entry_points.txt
> writing apache_beam_sdk.egg-info/PKG-INFO
> reading manifest file 'apache_beam_sdk.egg-info/SOURCES.txt'
> reading manifest template 'MANIFEST.in'
> writing manifest file 'apache_beam_sdk.egg-info/SOURCES.txt'
> running build_ext
> test_str_utf8_coder (apache_beam.coders.coders_test.CodersTest) ... ok
> test_default_fallback_path (apache_beam.coders.coders_test.FallbackCoderTest)
> Test fallback path picks a matching coder if no coder is registered. ... 
> /Users/dhalperi/IdeaProjects/beam/sdks/python/apache_beam/coders/typecoders.py:136:
>  UserWarning: Using fallback coder for typehint: <class 'apache_beam.coders.coders_test.DummyClass'>.
>   warnings.warn('Using fallback coder for typehint: %r.' % typehint)
> ok
> test_basics (apache_beam.coders.coders_test.PickleCoderTest) ... ok
> test_equality (apache_beam.coders.coders_test.PickleCoderTest) ... ok
> test_proto_coder (apache_beam.coders.coders_test.ProtoCoderTest) ... 
> /Users/dhalperi/IdeaProjects/beam/sdks/python/apache_beam/coders/typecoders.py:136:
>  UserWarning: Using fallback coder for typehint: <class 'apache_beam.coders.proto2_coder_test_messages_pb2.MessageA'>.
>   warnings.warn('Using fallback coder for typehint: %r.' % typehint)
> ok
> test_base64_pickle_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_bytes_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_custom_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_deterministic_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_dill_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_fast_primitives_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_float_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_global_window_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_iterable_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_length_prefix_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_nested_observables (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_pickle_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_proto_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_singleton_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_timestamp_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_tuple_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_tuple_sequence_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok
> test_utf8_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_varint_coder (apache_beam.coders.coders_test_common.CodersTest) ... ok
> test_windowed_value_coder (apache_beam.coders.coders_test_common.CodersTest) 
> ... ok





[jira] [Assigned] (BEAM-1513) Skip slower verifications if '-Dquick' specified. Enable them otherwise

2017-02-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1513:
---

Assignee: Davor Bonaci  (was: Aviem Zur)

> Skip slower verifications if '-Dquick' specified. Enable them otherwise
> ---
>
> Key: BEAM-1513
> URL: https://issues.apache.org/jira/browse/BEAM-1513
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>Priority: Minor
>
> Skip slower verifications (checkstyle, rat and findbugs) if '-Dquick' was 
> specified in the maven command. Enable them otherwise.
> Should still be able to force these verifications with '-Prelease' as before, 
> even if '-Dquick' is specified.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Updated] (BEAM-1398) KafkaIO metrics

2017-02-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1398:

Description: 
Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Messages produced.

Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
values.

  was:
Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Messages produced.
* Time spent reading.
* Time spent writing.

Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
values.


> KafkaIO metrics
> ---
>
> Key: BEAM-1398
> URL: https://issues.apache.org/jira/browse/BEAM-1398
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Add metrics to {{KafkaIO}} using the metrics API.
> Metrics (Feel free to add more ideas here) per split (Where applicable):
> * Backlog in bytes.
> * Backlog in number of messages.
> * Messages consumed.
> * Bytes consumed.
> * Messages produced.
> Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
> values.





[jira] [Resolved] (BEAM-1465) No natural place to flush resources in FileBasedWriter

2017-02-22 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1465.
-
   Resolution: Fixed
Fix Version/s: 0.6.0

> No natural place to flush resources in FileBasedWriter
> --
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> {{FileBasedWriter}} API does not have a natural place to flush resources 
> opened by the writer.
> For example, if you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}} to match the existing 
> {{prepareWrite(WritableByteChannel channel)}} can work.





[jira] [Assigned] (BEAM-1437) Spark runner StreamingListeners are not recoverable.

2017-02-09 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1437:
---

Assignee: Aviem Zur

> Spark runner StreamingListeners are not recoverable.
> 
>
> Key: BEAM-1437
> URL: https://issues.apache.org/jira/browse/BEAM-1437
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> Spark runner registers user-provided StreamingListeners in 
> {{SparkRunnerStreamingContextFactory#create}} but this is WRONG since the 
> recovery/resume path skips this code (it should contain only the DStreamGraph 
> creation, which is checkpointed).
> In order for any StreamingListener to recover, they are probably better off 
> registered here: 
> https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkRunner.java#L171





[jira] [Assigned] (BEAM-1465) No natural place to flush outputs in FileBasedWriter

2017-02-10 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1465:
---

Assignee: Daniel Halperin  (was: Davor Bonaci)

> No natural place to flush outputs in FileBasedWriter
> 
>
> Key: BEAM-1465
> URL: https://issues.apache.org/jira/browse/BEAM-1465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>
> {{FileBasedWriter}} API does not have a natural place to flush output streams.
> If you create an {{OutputStream}} in your {{Writer}} using the 
> {{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there 
> is no natural place to call its {{flush()}} method.
> Maybe something like {{finishWrite()}} to match the existing 
> {{prepareWrite(WritableByteChannel channel)}} can work.





[jira] [Created] (BEAM-1465) No natural place to flush outputs in FileBasedWriter

2017-02-10 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1465:
---

 Summary: No natural place to flush outputs in FileBasedWriter
 Key: BEAM-1465
 URL: https://issues.apache.org/jira/browse/BEAM-1465
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Davor Bonaci


{{FileBasedWriter}} API does not have a natural place to flush output streams.

If you create an {{OutputStream}} in your {{Writer}} using the 
{{FileBasedSink}}'s channel and this {{OutputStream}} buffers outputs, there is 
no natural place to call its {{flush()}} method.

Maybe something like {{finishWrite()}} to match the existing 
{{prepareWrite(WritableByteChannel channel)}} can work.
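The problem above can be shown with a minimal, stdlib-only sketch. The class and method names ({{FlushDemo}}, {{bytesBeforeFlush}}, and the {{finishWrite()}} behavior) are illustrative assumptions, not Beam API: a {{BufferedOutputStream}} layered over the sink's channel holds bytes back until something flushes it, which is exactly the hook the writer lacks.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    // Underlying sink, standing in for the channel owned by the sink.
    static final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    // Bytes visible in the sink before any flush: the data is still held
    // in the BufferedOutputStream's buffer, so this returns 0.
    public static int bytesBeforeFlush() throws IOException {
        sink.reset();
        BufferedOutputStream out = new BufferedOutputStream(sink, 1024);
        out.write("hello".getBytes(StandardCharsets.UTF_8));
        return sink.size();
    }

    // What a hypothetical finishWrite() hook would do: flush the stream so
    // the buffered bytes actually reach the underlying channel.
    public static int bytesAfterFlush() throws IOException {
        sink.reset();
        BufferedOutputStream out = new BufferedOutputStream(sink, 1024);
        out.write("hello".getBytes(StandardCharsets.UTF_8));
        out.flush();
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesBeforeFlush()); // 0
        System.out.println(bytesAfterFlush());  // 5
    }
}
```

Without a flush hook called before the channel is closed, those buffered bytes can be silently lost.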





[jira] [Assigned] (BEAM-1466) JSON utils extension

2017-02-11 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1466:
---

Assignee: Aviem Zur  (was: Davor Bonaci)

> JSON utils extension
> 
>
> Key: BEAM-1466
> URL: https://issues.apache.org/jira/browse/BEAM-1466
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Create a JSON extension in Java extensions which will contain transforms to 
> aid with handling JSONs.
> Suggested transforms:
> * Parse JSON strings to type OutputT.
> * Serialize InputT to JSON strings.
> * JSON format file based source.
> * JSON format file based sink.





[jira] [Assigned] (BEAM-1294) Long running UnboundedSource Readers via Broadcasts

2017-02-13 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1294:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Long running UnboundedSource Readers via Broadcasts
> ---
>
> Key: BEAM-1294
> URL: https://issues.apache.org/jira/browse/BEAM-1294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> When reading from an UnboundedSource, current implementation will cause each 
> split to create a new Reader every micro-batch.
> As long as the overhead of creating a reader is relatively low, it's 
> reasonable (though I'd still be happy to get rid of it), but in cases where 
> the creation overhead is large it becomes unreasonable, forcing large batches.
> One way to solve this could be to create a pool of lazy-init readers to serve 
> each executor, maybe via Broadcast variables. 





[jira] [Commented] (BEAM-1457) Enable rat plugin and findbugs plugin in default build

2017-02-13 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863223#comment-15863223
 ] 

Aviem Zur commented on BEAM-1457:
-

Not really sure what was decided around this issue.

> Enable rat plugin and findbugs plugin in default build
> --
>
> Key: BEAM-1457
> URL: https://issues.apache.org/jira/browse/BEAM-1457
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Today, maven rat plugin and findbugs plugin only run when `release` profile 
> is specified.
> Since these plugins do not add a large amount of time compared to the normal 
> build, and their checks are required to pass to approve pull requests - let's 
> enable them by default.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Resolved] (BEAM-1433) Remove coder from TextIO

2017-02-13 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1433.
-
   Resolution: Done
Fix Version/s: 0.6.0

> Remove coder from TextIO
> 
>
> Key: BEAM-1433
> URL: https://issues.apache.org/jira/browse/BEAM-1433
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: 0.6.0
>
>
> Remove coder usage in TextIO.
> TextIO should only deal with Strings.





[jira] [Created] (BEAM-1457) Enable rat plugin and findbugs plugin in default build

2017-02-10 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1457:
---

 Summary: Enable rat plugin and findbugs plugin in default build
 Key: BEAM-1457
 URL: https://issues.apache.org/jira/browse/BEAM-1457
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Aviem Zur
Assignee: Davor Bonaci


Today, maven rat plugin and findbugs plugin only run when `release` profile is 
specified.
Since these plugins do not add a large amount of time compared to the normal 
build, let's enable them by default.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-1457) Enable rat plugin and findbugs plugin in default build

2017-02-10 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861391#comment-15861391
 ] 

Aviem Zur commented on BEAM-1457:
-

I ran a quick test on my laptop to see how much time they add to the build (of 
the entire project):

'mvn clean install -DskipTests' => Total time: 03:51 min
'mvn clean install apache-rat:check findbugs:check -DskipTests' => Total time: 
05:29 min (Added 01:38 min)
'mvn clean install' => Total time: 09:37 min
'mvn clean install apache-rat:check findbugs:check' => Total time: 11:13 min 
(Added 01:36 min)

> Enable rat plugin and findbugs plugin in default build
> --
>
> Key: BEAM-1457
> URL: https://issues.apache.org/jira/browse/BEAM-1457
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Davor Bonaci
>
> Today, maven rat plugin and findbugs plugin only run when `release` profile 
> is specified.
> Since these plugins do not add a large amount of time compared to the normal 
> build, let's enable them by default.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Updated] (BEAM-1457) Enable rat plugin and findbugs plugin in default build

2017-02-10 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1457:

Description: 
Today, maven rat plugin and findbugs plugin only run when `release` profile is 
specified.
Since these plugins do not add a large amount of time compared to the normal 
build, and their checks are required to pass to approve pull requests - let's 
enable them by default.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]

  was:
Today, maven rat plugin and findbugs plugin only run when `release` profile is 
specified.
Since these plugins do not add a large amount of time compared to the normal 
build, let's enable them by default.

[Original dev list 
discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]


> Enable rat plugin and findbugs plugin in default build
> --
>
> Key: BEAM-1457
> URL: https://issues.apache.org/jira/browse/BEAM-1457
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Today, maven rat plugin and findbugs plugin only run when `release` profile 
> is specified.
> Since these plugins do not add a large amount of time compared to the normal 
> build, and their checks are required to pass to approve pull requests - let's 
> enable them by default.
> [Original dev list 
> discussion|https://lists.apache.org/thread.html/e1f80e54b44b4a39630d978abe79fb6a6cecf71d9821ee1881b47afb@%3Cdev.beam.apache.org%3E]





[jira] [Created] (BEAM-1448) Coder encode/decode context documentation is lacking

2017-02-09 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1448:
---

 Summary: Coder encode/decode context documentation is lacking
 Key: BEAM-1448
 URL: https://issues.apache.org/jira/browse/BEAM-1448
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Aviem Zur
Assignee: Davor Bonaci


Coder encode/decode context documentation is lacking.

* Documentation of the {{Coder}} methods {{encode}} and {{decode}} should 
include a description of the {{context}} argument and explain how 
implementations should handle it.
* Consider renaming the static {{Context}} values {{NESTED}} and {{OUTER}} to 
more accurate names.
* Emphasize the use of CoderProperties as the best way to test a coder.

[Original dev list 
discussion|https://lists.apache.org/thread.html/fbd2d6b869ac2b0225ec39461b14158a03f304a930782d39ac9a60a6@%3Cdev.beam.apache.org%3E]
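The distinction the {{context}} argument captures can be sketched with plain streams. This is an illustrative stand-in, not Beam's actual coders (Beam's {{StringUtf8Coder}} uses a varint length prefix rather than the fixed 4-byte prefix shown here): in an outer context the value owns the rest of the stream, while in a nested context the encoding must be self-delimiting because more values follow.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ContextDemo {
    // OUTER context: the value owns the whole stream, so raw bytes suffice.
    public static byte[] encodeOuter(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // NESTED context: other values follow in the stream, so the encoding
    // must be self-delimiting -- here via a length prefix.
    public static byte[] encodeNested(String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(raw.length); // delimiter so the next value can be found
        out.write(raw);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodeOuter("abc").length);  // 3
        System.out.println(encodeNested("abc").length); // 7 (4-byte prefix + 3)
    }
}
```

Documenting the argument in these terms would make clear why the two contexts can legitimately produce different byte sequences for the same value.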





[jira] [Updated] (BEAM-1563) Flatten Spark runner libraries.

2017-02-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1563:

Description: 
Flatten Spark runner libraries into a single package (or two) so that 
everything is private.
See [~kenn] comment: 
https://github.com/apache/beam/pull/2050#discussion_r103136216

  was:
Flatten Spark runner libraries into a single package (or two) so that 
everything is private.
See [~kenn] comment: https://github.com/apache/beam/pull/2050


> Flatten Spark runner libraries.
> ---
>
> Key: BEAM-1563
> URL: https://issues.apache.org/jira/browse/BEAM-1563
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> Flatten Spark runner libraries into a single package (or two) so that 
> everything is private.
> See [~kenn] comment: 
> https://github.com/apache/beam/pull/2050#discussion_r103136216





[jira] [Commented] (BEAM-1572) Add per-stage matching of scope in metrics for the DirectRunner

2017-02-27 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887219#comment-15887219
 ] 

Aviem Zur commented on BEAM-1572:
-

Please note that this relates to https://issues.apache.org/jira/browse/BEAM-1344
The ultimate goal is to have a uniform querying logic across all runners.

> Add per-stage matching of scope in metrics for the DirectRunner
> ---
>
> Key: BEAM-1572
> URL: https://issues.apache.org/jira/browse/BEAM-1572
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-extensions
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>
> e.g. Metrics with scope "Top/Outer/Inner" should be matched by queries with 
> "Top/Outer" scope.





[jira] [Assigned] (BEAM-1551) Allow PAssert#that to take a message

2017-02-26 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1551:
---

Assignee: Aviem Zur

> Allow PAssert#that to take a message
> 
>
> Key: BEAM-1551
> URL: https://issues.apache.org/jira/browse/BEAM-1551
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Aviem Zur
>
> This permits users to describe their PAsserts, which should show up in the 
> failure message.
> Additionally, the default message of a PAssert failure should contain the 
> name of the input PValue.





[jira] [Updated] (BEAM-1337) Use our coder infrastructure for coders for state

2017-02-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1337:

Description: 
Today the user must explicitly provide any coders needed for serializing state 
data. We'd rather use the coder registry and infer the coder.

Currently all factory methods in {{StateSpecs}} take a coder argument. For 
example:
{code}
StateSpecs.value(coderForT);
{code}

We could leverage the coder registry and provide different overloads:

TypeDescriptor:
{code}
StateSpecs.value(typeDescriptorForT); 
{code}

Reflection:
{code}
StateSpec.value();
{code}

  was:Today the user must explicitly provide any coders needed for serializing 
state data. We'd rather use the coder registry and infer the coder.


> Use our coder infrastructure for coders for state
> -
>
> Key: BEAM-1337
> URL: https://issues.apache.org/jira/browse/BEAM-1337
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Aviem Zur
>Priority: Minor
>
> Today the user must explicitly provide any coders needed for serializing 
> state data. We'd rather use the coder registry and infer the coder.
> Currently all factory methods in {{StateSpecs}} take a coder argument. For 
> example:
> {code}
> StateSpecs.value(coderForT);
> {code}
> We could leverage the coder registry and provide different overloads:
> TypeDescriptor:
> {code}
> StateSpecs.value(typeDescriptorForT); 
> {code}
> Reflection:
> {code}
> StateSpec.value();
> {code}
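The overload idea above can be sketched with a toy registry. This is an illustrative analogy, not Beam's {{CoderRegistry}} API ({{RegistryDemo}} and {{valueSpec}} are invented names, and coders are represented as plain strings): the explicit overload keeps today's behavior, while the type-based overload looks the coder up instead of requiring the caller to supply it.

```java
import java.util.HashMap;
import java.util.Map;

public class RegistryDemo {
    // Stand-in for a coder registry: type -> coder (modeled as a name).
    static final Map<Class<?>, String> registry = new HashMap<>();
    static {
        registry.put(String.class, "StringUtf8Coder");
        registry.put(Integer.class, "VarIntCoder");
    }

    // Explicit variant: caller supplies the coder, as the current
    // factory methods require.
    static String valueSpec(String explicitCoder) {
        return explicitCoder;
    }

    // Inferred variant: the overload taking a type looks the coder up
    // in the registry instead.
    static String valueSpec(Class<?> type) {
        String coder = registry.get(type);
        if (coder == null) {
            throw new IllegalStateException("Unable to infer a coder for " + type);
        }
        return coder;
    }

    public static void main(String[] args) {
        System.out.println(valueSpec("StringUtf8Coder")); // explicit
        System.out.println(valueSpec(String.class));      // inferred
    }
}
```

The reflective variant in the issue would go one step further and recover the type parameter itself, falling back to an error like the one above when inference fails.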





[jira] [Assigned] (BEAM-351) Add DisplayData to KafkaIO

2017-02-26 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-351:
--

Assignee: Aviem Zur

> Add DisplayData to KafkaIO
> --
>
> Key: BEAM-351
> URL: https://issues.apache.org/jira/browse/BEAM-351
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Ben Chambers
>Assignee: Aviem Zur
>Priority: Minor
>  Labels: starter
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.





[jira] [Assigned] (BEAM-774) Implement Metrics support for Spark runner

2017-01-08 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-774:
--

Assignee: Aviem Zur  (was: Amit Sela)

> Implement Metrics support for Spark runner
> --
>
> Key: BEAM-774
> URL: https://issues.apache.org/jira/browse/BEAM-774
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Ben Chambers
>Assignee: Aviem Zur
>






[jira] [Commented] (BEAM-775) Remove Aggregators from the Java SDK

2017-03-21 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935104#comment-15935104
 ] 

Aviem Zur commented on BEAM-775:


Apex runner does not support metrics either.

> Remove Aggregators from the Java SDK
> 
>
> Key: BEAM-775
> URL: https://issues.apache.org/jira/browse/BEAM-775
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Pablo Estrada
>  Labels: backward-incompatible
> Fix For: First stable release
>
>






[jira] [Comment Edited] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-24 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940346#comment-15940346
 ] 

Aviem Zur edited comment on BEAM-1802 at 3/24/17 1:37 PM:
--

[~iemejia] Please call {{stop()}} after calling {{waitUntilFinish()}}; it 
should stop the first context so you can run a second one after it.


was (Author: aviemzur):
[~iemejia] Please call `stop()` after calling `waitUntilFinish()` it should 
stop the first context so you can run a second one after it.

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)





[jira] [Updated] (BEAM-1398) KafkaIO metrics

2017-03-28 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1398:

Description: 
Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Messages produced.

Add {{NeedsRunner}} test which creates a pipeline and asserts metrics values.

  was:
Add metrics to {{KafkaIO}} using the metrics API.

Metrics (Feel free to add more ideas here) per split (Where applicable):
* Backlog in bytes.
* Backlog in number of messages.
* Messages consumed.
* Bytes consumed.
* Messages produced.

Add {{RunnableOnService}} test which creates a pipeline and asserts metrics 
values.


> KafkaIO metrics
> ---
>
> Key: BEAM-1398
> URL: https://issues.apache.org/jira/browse/BEAM-1398
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> Add metrics to {{KafkaIO}} using the metrics API.
> Metrics (Feel free to add more ideas here) per split (Where applicable):
> * Backlog in bytes.
> * Backlog in number of messages.
> * Messages consumed.
> * Bytes consumed.
> * Messages produced.
> Add {{NeedsRunner}} test which creates a pipeline and asserts metrics values.
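The backlog metrics above reduce to simple per-split bookkeeping. A stdlib-only sketch (the class, field names, and numbers are illustrative assumptions, not KafkaIO internals): the reader tracks the latest known offset at the head of its partition and the last offset it consumed, and backlog in messages is the difference.

```java
public class BacklogDemo {
    // Per-split bookkeeping a reader could keep while consuming
    // (values here are arbitrary for illustration).
    static long latestOffset = 1_000L;   // head of the partition
    static long consumedOffset = 880L;   // last offset this split has read
    static long bytesConsumed = 52_340L; // running byte counter

    // Backlog in messages: how far the reader lags behind the head.
    static long backlogMessages() {
        return latestOffset - consumedOffset;
    }

    public static void main(String[] args) {
        System.out.println(backlogMessages()); // 120
        System.out.println(bytesConsumed);
    }
}
```

In KafkaIO these values would be reported through the metrics API (counters for messages/bytes consumed, a gauge-style value for backlog) rather than printed.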





[jira] [Resolved] (BEAM-1617) Add Gauge metric type to Java SDK

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1617.
-
Resolution: Implemented

> Add Gauge metric type to Java SDK
> -
>
> Key: BEAM-1617
> URL: https://issues.apache.org/jira/browse/BEAM-1617
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>






[jira] [Updated] (BEAM-1617) Add Gauge metric type to Java SDK

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1617:

Fix Version/s: First stable release

> Add Gauge metric type to Java SDK
> -
>
> Key: BEAM-1617
> URL: https://issues.apache.org/jira/browse/BEAM-1617
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>






[jira] [Resolved] (BEAM-1810) Spark runner combineGlobally uses Kryo serialization

2017-03-26 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1810.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Spark runner combineGlobally uses Kryo serialization
> 
>
> Key: BEAM-1810
> URL: https://issues.apache.org/jira/browse/BEAM-1810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> {{TransformTranslator#combineGlobally}} inadvertently uses Kryo serialization 
> to serialize data. This should never happen.
> This is due to Spark's implementation of {{RDD.isEmpty}} which takes the 
> first element to check if there are elements in the {{RDD}}, it is then 
> serialized with Spark's configured serializer (In our case, Kryo).





[jira] [Commented] (BEAM-1805) a new option `asInnerJoin` for CoGroupByKey

2017-03-26 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942640#comment-15942640
 ] 

Aviem Zur commented on BEAM-1805:
-

Have you taken a look at 
[join-library|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java]?

> a new option `asInnerJoin` for CoGroupByKey
> ---
>
> Key: BEAM-1805
> URL: https://issues.apache.org/jira/browse/BEAM-1805
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> {{CoGroupByKey}} joins multiple {{PCollection<KV<K, V>>}}s, acting as a full 
> outer join.
> Option {{asInnerJoin()}} restricts the output to inner-join behavior.





[jira] [Commented] (BEAM-1806) a new option `asLeftOuterJoin` for CoGroupByKey

2017-03-26 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942642#comment-15942642
 ] 

Aviem Zur commented on BEAM-1806:
-

Have you taken a look at 
[join-library|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java]?

> a new option `asLeftOuterJoin` for CoGroupByKey
> ---
>
> Key: BEAM-1806
> URL: https://issues.apache.org/jira/browse/BEAM-1806
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> Similar to BEAM-1805, restrict it to a left outer join. 
> The first {{PCollection}} is used as the key; if it's empty, the output is 
> ignored.





[jira] [Commented] (BEAM-775) Remove Aggregators from the Java SDK

2017-03-22 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937681#comment-15937681
 ] 

Aviem Zur commented on BEAM-775:


I think we should instead consider moving forward with 
https://issues.apache.org/jira/browse/BEAM-1763 and removing these assertions 
from TestSparkRunner. This would also remove the need for 
https://github.com/apache/beam/pull/2263

As [~bchambers] mentioned, this should probably not be done using metrics, 
especially since metrics are not supported in all runners.

> Remove Aggregators from the Java SDK
> 
>
> Key: BEAM-775
> URL: https://issues.apache.org/jira/browse/BEAM-775
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Pablo Estrada
>  Labels: backward-incompatible
> Fix For: First stable release
>
>






[jira] [Assigned] (BEAM-463) BoundedHeapCoder should be a StandardCoder and not a CustomCoder

2017-03-29 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-463:
--

Assignee: (was: Aviem Zur)

> BoundedHeapCoder should be a StandardCoder and not a CustomCoder
> 
>
> Key: BEAM-463
> URL: https://issues.apache.org/jira/browse/BEAM-463
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Priority: Minor
>  Labels: backward-incompatible
>
> The issue is that BoundedHeapCoder does not report component encodings which 
> prevents effective runner inspection of the components.





[jira] [Closed] (BEAM-1075) Sub-task?

2017-03-28 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur closed BEAM-1075.
---

> Sub-task?
> -
>
> Key: BEAM-1075
> URL: https://issues.apache.org/jira/browse/BEAM-1075
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>






[jira] [Resolved] (BEAM-1075) Sub-task?

2017-03-28 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1075.
-
   Resolution: Invalid
Fix Version/s: Not applicable

> Sub-task?
> -
>
> Key: BEAM-1075
> URL: https://issues.apache.org/jira/browse/BEAM-1075
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-1075) Sub-task?

2017-03-28 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1075:

Summary: Sub-task?  (was: Shuffle the input read-values to get maximum 
parallelism.)

> Sub-task?
> -
>
> Key: BEAM-1075
> URL: https://issues.apache.org/jira/browse/BEAM-1075
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>






[jira] [Resolved] (BEAM-1792) Spark runner uses its own filtering logic to match metrics

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1792.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Spark runner uses its own filtering logic to match metrics
> --
>
> Key: BEAM-1792
> URL: https://issues.apache.org/jira/browse/BEAM-1792
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
> Fix For: First stable release
>
>






[jira] [Updated] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1074:

Description: 
The SparkRunner uses {{mapWithState}} to read and manage CheckpointMarks, and 
this stateful operation will be followed by a shuffle: 
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159

Since the stateful read maps "splitSource" -> "partition of a list of read 
values", the following shuffle won't benefit in any way (the list of read 
values has not been flatMapped yet). To avoid this shuffle, we need to set 
the input RDD ({{SourceRDD.Unbounded}}) partitioner to be a default 
{{HashPartitioner}} since {{mapWithState}} would use the same partitioner and 
will skip shuffle if the partitioners match.

  was:This will make sure the following stateful read within {{mapWithState}} 
won't shuffle the read values as long as they are grouped in a {{List}}.


> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> The SparkRunner uses {{mapWithState}} to read and manage CheckpointMarks, and 
> this stateful operation will be followed by a shuffle: 
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159
> Since the stateful read maps "splitSource" -> "partition of a list of read 
> values", the following shuffle won't benefit in any way (the list of read 
> values has not been flatMapped yet). To avoid this shuffle, we need to set 
> the input RDD ({{SourceRDD.Unbounded}}) partitioner to be a default 
> {{HashPartitioner}} since {{mapWithState}} would use the same partitioner and 
> will skip shuffle if the partitioners match.
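The skip-shuffle condition above rests on co-partitioning. A stdlib-only sketch of the rule Spark's {{HashPartitioner}} applies ({{HashPartitionerDemo}} and {{partition}} are illustrative names): a key lands in partition {{nonNegativeMod(hashCode, numPartitions)}}, so two RDDs reporting the same partitioner place equal keys in equal partitions and no data movement is needed.

```java
public class HashPartitionerDemo {
    // Spark's HashPartitioner assigns key -> nonNegativeMod(hashCode, n).
    // Java's % can return a negative value, hence the correction.
    static int partition(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        int p = partition("split-3", 8);
        // The same key always lands in the same partition, regardless of
        // which RDD computes it -- the basis for skipping the shuffle.
        System.out.println(p == partition("split-3", 8)); // true
        System.out.println(p >= 0 && p < 8);              // true
    }
}
```

Setting {{SourceRDD.Unbounded}}'s partitioner to such a default lets {{mapWithState}} see matching partitioners and keep the read values where they are.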





[jira] [Updated] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1074:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: BEAM-848)

> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> This will make sure the following stateful read within {{mapWithState}} won't 
> shuffle the read values as long as they are grouped in a {{List}}.





[jira] [Resolved] (BEAM-1397) Introduce IO metrics

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1397.
-
   Resolution: Done
Fix Version/s: First stable release

> Introduce IO metrics
> 
>
> Key: BEAM-1397
> URL: https://issues.apache.org/jira/browse/BEAM-1397
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> Introduce the usage of metrics API in IOs.
> POC using {{CountingInput}}:
> * Add metrics to {{CountingInput}}
> * {{RunnableOnService}} test which creates a pipeline which asserts these 
> metrics.
> * Close any gaps in Direct runner and Spark runner to support these metrics.





[jira] [Updated] (BEAM-848) Shuffle the input read-values to get maximum parallelism.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-848:
---
Summary: Shuffle the input read-values to get maximum parallelism.  (was: A 
better shuffle after reading from within mapWithState.)

> Shuffle the input read-values to get maximum parallelism.
> -
>
> Key: BEAM-848
> URL: https://issues.apache.org/jira/browse/BEAM-848
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> It would be wise to shuffle the read values _after_ flatmap to increase 
> parallelism in processing of the data.





[jira] [Updated] (BEAM-848) A better shuffle after reading from within mapWithState.

2017-03-27 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-848:
---
Description: It would be wise to shuffle the read values _after_ flatmap to 
increase parallelism in processing of the data.  (was: The SparkRunner uses 
{{mapWithState}} to read and manage CheckpointMarks, and this stateful 
operation will be followed by a shuffle: 
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159

Since the stateful read maps "splitSource" -> "partition of a list of read 
values", the following shuffle won't benefit in any way (the list of read 
values has not been flatMapped yet). In order to avoid this shuffle, we need to set 
the input RDD ({{SourceRDD.Unbounded}}) partitioner to be a default 
{{HashPartitioner}} since {{mapWithState}} would use the same partitioner and 
will skip shuffle if the partitioners match.

It would be wise to shuffle the read values _after_ flatmap.

I will break this into two tasks:
# Set default-partitioner to the input RDD.
# Shuffle (using Coders) the input.)

> A better shuffle after reading from within mapWithState.
> 
>
> Key: BEAM-848
> URL: https://issues.apache.org/jira/browse/BEAM-848
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> It would be wise to shuffle the read values _after_ flatmap to increase 
> parallelism in processing of the data.





[jira] [Comment Edited] (BEAM-1840) shaded classes are not getting into the proper package

2017-03-31 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951050#comment-15951050
 ] 

Aviem Zur edited comment on BEAM-1840 at 3/31/17 2:51 PM:
--

The relocation rules are based on the artifact ids, and not the package names.
{code:xml}

<shadedPattern>org.apache.${renderedArtifactId}.repackaged.com.google.common</shadedPattern>

{code}
The way we calculate {{renderedArtifactId}} is replacing non-alphanumeric 
characters in the artifact id with {{.}}
{code:xml}

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <id>render-artifact-id</id>
      <goals>
        <goal>regex-properties</goal>
      </goals>
      <phase>prepare-package</phase>
      <configuration>
        <regexPropertySettings>
          <regexPropertySetting>
            <name>renderedArtifactId</name>
            <regex>[^A-Za-z0-9]</regex>
            <replacement>.</replacement>
            <value>${project.artifactId}</value>
            <failIfNoMatch>false</failIfNoMatch>
          </regexPropertySetting>
        </regexPropertySettings>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}
If we want the relocated classes to be in the {{sdk}} package instead of 
{{sdks}} package what needs to be done is add another replacement on 
{{renderedArtifactId}} which will replace {{sdks}} with {{sdk}}.
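
A quick plain-Java check of that rendering (the artifact id {{beam-sdks-java-io-kafka}} is used only as an illustration; the regex is the one the build-helper execution applies):

```java
public class RenderArtifactId {
    public static void main(String[] args) {
        // Illustrative artifact id; non-alphanumerics are replaced with dots,
        // mirroring the regex-properties execution described above.
        String artifactId = "beam-sdks-java-io-kafka";
        String rendered = artifactId.replaceAll("[^A-Za-z0-9]", ".");
        System.out.println(rendered); // beam.sdks.java.io.kafka

        // The extra replacement suggested above, mapping "sdks" to "sdk" so the
        // relocation target matches the real package root.
        String fixed = rendered.replaceFirst("\\bsdks\\b", "sdk");
        System.out.println("org.apache." + fixed + ".repackaged.com.google.common");
    }
}
```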


was (Author: aviemzur):
The relocation rules are based on the artifact ids, and not the package names.
{{code}}

<shadedPattern>org.apache.${renderedArtifactId}.repackaged.com.google.common</shadedPattern>

{{code}}
The way we calculate {{renderedArtifactId}} is replacing non-alphanumeric 
characters in the artifact id with {{.}}
{{code}}

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <id>render-artifact-id</id>
      <goals>
        <goal>regex-properties</goal>
      </goals>
      <phase>prepare-package</phase>
      <configuration>
        <regexPropertySettings>
          <regexPropertySetting>
            <name>renderedArtifactId</name>
            <regex>[^A-Za-z0-9]</regex>
            <replacement>.</replacement>
            <value>${project.artifactId}</value>
            <failIfNoMatch>false</failIfNoMatch>
          </regexPropertySetting>
        </regexPropertySettings>
      </configuration>
    </execution>
  </executions>
</plugin>
{{code}}
If we want the relocated classes to be in the {{sdk}} package instead of 
{{sdks}} package what needs to be done is add another replacement on 
{{renderedArtifactId}} which will replace {{sdks}} with {{sdk}}.

> shaded classes are not getting into the proper package
> --
>
> Key: BEAM-1840
> URL: https://issues.apache.org/jira/browse/BEAM-1840
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
>
> The current shade configuration relocates classes into packages based on the 
> artifact name; however, this is inconsistent with the package names because the 
> Beam artifact ids follow the directory structure beam-sdks-java-io-* while the 
> package structure is beam/sdk/java/io, so there is an extra 's' that creates a 
> different package.





[jira] [Commented] (BEAM-1840) shaded classes are not getting into the proper package

2017-03-31 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951050#comment-15951050
 ] 

Aviem Zur commented on BEAM-1840:
-

The relocation rules are based on the artifact ids, and not the package names.
{{code}}

<shadedPattern>org.apache.${renderedArtifactId}.repackaged.com.google.common</shadedPattern>

{{code}}
The way we calculate {{renderedArtifactId}} is replacing non-alphanumeric 
characters in the artifact id with {{.}}
{{code}}

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <id>render-artifact-id</id>
      <goals>
        <goal>regex-properties</goal>
      </goals>
      <phase>prepare-package</phase>
      <configuration>
        <regexPropertySettings>
          <regexPropertySetting>
            <name>renderedArtifactId</name>
            <regex>[^A-Za-z0-9]</regex>
            <replacement>.</replacement>
            <value>${project.artifactId}</value>
            <failIfNoMatch>false</failIfNoMatch>
          </regexPropertySetting>
        </regexPropertySettings>
      </configuration>
    </execution>
  </executions>
</plugin>
{{code}}
If we want the relocated classes to be in the {{sdk}} package instead of 
{{sdks}} package what needs to be done is add another replacement on 
{{renderedArtifactId}} which will replace {{sdks}} with {{sdk}}.

> shaded classes are not getting into the proper package
> --
>
> Key: BEAM-1840
> URL: https://issues.apache.org/jira/browse/BEAM-1840
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
>
> The current shade configuration relocates classes into packages based on the 
> artifact name; however, this is inconsistent with the package names because the 
> Beam artifact ids follow the directory structure beam-sdks-java-io-* while the 
> package structure is beam/sdk/java/io, so there is an extra 's' that creates a 
> different package.





[jira] [Updated] (BEAM-1840) shaded classes are not getting into the proper package

2017-03-31 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1840:

Affects Version/s: (was: First stable release)

> shaded classes are not getting into the proper package
> --
>
> Key: BEAM-1840
> URL: https://issues.apache.org/jira/browse/BEAM-1840
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Priority: Trivial
>  Labels: newbie, starter
>
> The current shade configuration relocates classes into packages based on the 
> artifact name; however, this is inconsistent with the package names because the 
> Beam artifact ids follow the directory structure beam-sdks-java-io-* while the 
> package structure is beam/sdk/java/io, so there is an extra 's' that creates a 
> different package.





[jira] [Resolved] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2017-03-25 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1074.
-
   Resolution: Fixed
Fix Version/s: First stable release

> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
> Fix For: First stable release
>
>
> This will make sure the following stateful read within {{mapWithState}} won't 
> shuffle the read values as long as they are grouped in a {{List}}.





[jira] [Commented] (BEAM-1802) Spark Runner does not shutdown correctly when executing multiple pipelines in sequence

2017-03-24 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940346#comment-15940346
 ] 

Aviem Zur commented on BEAM-1802:
-

[~iemejia] Please call `stop()` after calling `waitUntilFinish()`; this should 
stop the first context so you can run a second one after it.

> Spark Runner does not shutdown correctly when executing multiple pipelines in 
> sequence
> --
>
> Key: BEAM-1802
> URL: https://issues.apache.org/jira/browse/BEAM-1802
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Assignee: Aviem Zur
>
> I found this while running the Nexmark queries in sequence in local mode. I 
> had the correct configuration but it didn't seem to work.
> 17/03/24 12:07:49 WARN org.apache.spark.SparkContext: Multiple running 
> SparkContexts detected in the same JVM!
> org.apache.spark.SparkException: Only one SparkContext may be running in this 
> JVM (see SPARK-2243). To ignore this error, set 
> spark.driver.allowMultipleContexts = true. The currently running SparkContext 
> was created at:
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
> org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:100)
> org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:69)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:206)
> org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:91)
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:266)
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1233)
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
>   at 
> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)





[jira] [Assigned] (BEAM-1276) StateSpecs.combiningValue interface is very awkward to use

2017-03-25 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1276:
---

Assignee: (was: Aviem Zur)

> StateSpecs.combiningValue interface is very awkward to use
> --
>
> Key: BEAM-1276
> URL: https://issues.apache.org/jira/browse/BEAM-1276
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Daniel Mills
>Priority: Minor
>
> Using StateSpecs.combiningValue with built-in combiners is very verbose.  For 
> example, to keep a running sum of ints:
> {code}
> @StateId("count")
> private final StateSpec<Object, AccumulatorCombiningState<Integer, int[], Integer>> countSpec =
> StateSpecs.combiningValue(
>   Sum.ofIntegers().getAccumulatorCoder(pipeline.getCoderRegistry(), 
> VarIntCoder.of()), Sum.ofIntegers());
> {code}
> This involves getting a reference to the pipeline into the DoFn, 
> guessing/finding the proper type parameters for Sum.ofIntegers(), and 
> manually pulling the accumulator coder out.
> For combiners like Sum.ofIntegers() that have a fixed accumulator, the 
> combiningValue call should be able to deduce that.  Additionally, it would be 
> nice to remove the type of the accumulator from the StateSpec object, since 
> the user only needs the input and output types in their code.





[jira] [Assigned] (BEAM-1706) Ban Guava as a transitive dependency

2017-03-25 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1706:
---

Assignee: (was: Aviem Zur)

> Ban Guava as a transitive dependency
> 
>
> Key: BEAM-1706
> URL: https://issues.apache.org/jira/browse/BEAM-1706
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>
> Leaks of Guava dependencies to users can cause conflicts since provided 
> dependencies may rely on different versions of Guava.
> Configure Maven enforcer plugin to ban Guava as a transitive dependency.
> This will force our modules to explicitly declare all Guava dependencies (And 
> we'll shade and repackage them).





[jira] [Updated] (BEAM-1706) Ban Guava as a transitive dependency

2017-03-25 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1706:

Description: 
Leaks of Guava dependencies to users can cause conflicts since provided 
dependencies may rely on different versions of Guava.
Configure Maven enforcer plugin to ban Guava as a transitive dependency.
This will force our modules to explicitly declare all Guava dependencies (And 
we'll shade and relocate them).

Consider using the [Maven Enforcer Plugin's banTransitiveDependencies 
rule|http://maven.apache.org/enforcer/enforcer-rules/banTransitiveDependencies.html]
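
One possible shape for this, sketched from the linked rule's documentation (a sketch only; the execution id and the scope excludes are illustrative, and the exact whitelist would need tuning for Beam's modules):

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>ban-transitive-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <banTransitiveDependencies>
            <excludes>
              <!-- illustrative: tolerate test- and provided-scope leakage -->
              <exclude>*:*:*:*:*:test</exclude>
              <exclude>*:*:*:*:*:provided</exclude>
            </excludes>
          </banTransitiveDependencies>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}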

  was:
Leaks of Guava dependencies to users can cause conflicts since provided 
dependencies may rely on different versions of Guava.
Configure Maven enforcer plugin to ban Guava as a transitive dependency.
This will force our modules to explicitly declare all Guava dependencies (And 
we'll shade and repackage them).


> Ban Guava as a transitive dependency
> 
>
> Key: BEAM-1706
> URL: https://issues.apache.org/jira/browse/BEAM-1706
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Aviem Zur
>
> Leaks of Guava dependencies to users can cause conflicts since provided 
> dependencies may rely on different versions of Guava.
> Configure Maven enforcer plugin to ban Guava as a transitive dependency.
> This will force our modules to explicitly declare all Guava dependencies (And 
> we'll shade and relocate them).
> Consider using the [Maven Enforcer Plugin's banTransitiveDependencies 
> rule|http://maven.apache.org/enforcer/enforcer-rules/banTransitiveDependencies.html]





[jira] [Assigned] (BEAM-1075) Shuffle the input read-values to get maximum parallelism.

2017-03-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1075:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Shuffle the input read-values to get maximum parallelism.
> -
>
> Key: BEAM-1075
> URL: https://issues.apache.org/jira/browse/BEAM-1075
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>






[jira] [Assigned] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2017-03-19 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur reassigned BEAM-1074:
---

Assignee: Aviem Zur  (was: Amit Sela)

> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Aviem Zur
>
> This will make sure the following stateful read within {{mapWithState}} won't 
> shuffle the read values as long as they are grouped in a {{List}}.





[jira] [Commented] (BEAM-1765) Remove Aggregators from Spark runner

2017-03-20 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934063#comment-15934063
 ] 

Aviem Zur commented on BEAM-1765:
-

How can this be removed from the runners independently of the Java SDK?

Most uses of aggregators in the Spark runner are to pass these aggregators to SDK 
methods and constructors which require them. For example: {{ReduceFnRunner}}.

Another use is the {{PAssert}} success and counter aggregators, which allow 
runners to ensure that all assertions in tests actually happened and do not pass 
tests that should have failed.

The only other use is the support for aggregators in {{SparkPipelineResult}}, 
which can be removed independently.

> Remove Aggregators from Spark runner
> 
>
> Key: BEAM-1765
> URL: https://issues.apache.org/jira/browse/BEAM-1765
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Pablo Estrada
>Assignee: Amit Sela
>
> I have started removing aggregators from the Java SDK, but runners use them 
> in different ways that I can't figure out well. This is to track the 
> independent effort in Spark.





[jira] [Comment Edited] (BEAM-1765) Remove Aggregators from Spark runner

2017-03-20 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934063#comment-15934063
 ] 

Aviem Zur edited comment on BEAM-1765 at 3/21/17 4:04 AM:
--

Awesome to hear we are moving forward on removing aggregators!

How can this be removed from the runners independently of the Java SDK?

Most uses of aggregators in the Spark runner are to pass these aggregators to SDK 
methods and constructors which require them. For example: {{ReduceFnRunner}}.

Another use is the {{PAssert}} success and counter aggregators, which allow 
runners to ensure that all assertions in tests actually happened and do not pass 
tests that should have failed.

The only other use is the support for aggregators in {{SparkPipelineResult}}, 
which can be removed independently.


was (Author: aviemzur):
How can this be removed from the runners independently of the Java SDK?

Most uses of aggregators in the Spark runner are to pass these aggregators to SDK 
methods and constructors which require them. For example: {{ReduceFnRunner}}.

Another use is the {{PAssert}} success and counter aggregators, which allow 
runners to ensure that all assertions in tests actually happened and do not pass 
tests that should have failed.

The only other use is the support for aggregators in {{SparkPipelineResult}}, 
which can be removed independently.

> Remove Aggregators from Spark runner
> 
>
> Key: BEAM-1765
> URL: https://issues.apache.org/jira/browse/BEAM-1765
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Pablo Estrada
>Assignee: Amit Sela
>
> I have started removing aggregators from the Java SDK, but runners use them 
> in different ways that I can't figure out well. This is to track the 
> independent effort in Spark.





[jira] [Comment Edited] (BEAM-1765) Remove Aggregators from Spark runner

2017-03-20 Thread Aviem Zur (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934063#comment-15934063
 ] 

Aviem Zur edited comment on BEAM-1765 at 3/21/17 4:57 AM:
--

Awesome to see we are moving forward on removing aggregators!

How can this be removed from the runners independently of the Java SDK?

Most uses of aggregators in the Spark runner are to pass these aggregators to SDK 
methods and constructors which require them. For example: {{ReduceFnRunner}}.

Another use is the {{PAssert}} success and counter aggregators, which allow 
runners to ensure that all assertions in tests actually happened and do not pass 
tests that should have failed.

The only other use is the support for aggregators in {{SparkPipelineResult}}, 
which can be removed independently.


was (Author: aviemzur):
Awesome to hear we are moving forward on removing aggregators!

How can this be removed from the runners independently of the Java SDK?

Most uses of aggregators in the Spark runner are to pass these aggregators to SDK 
methods and constructors which require them. For example: {{ReduceFnRunner}}.

Another use is the {{PAssert}} success and counter aggregators, which allow 
runners to ensure that all assertions in tests actually happened and do not pass 
tests that should have failed.

The only other use is the support for aggregators in {{SparkPipelineResult}}, 
which can be removed independently.

> Remove Aggregators from Spark runner
> 
>
> Key: BEAM-1765
> URL: https://issues.apache.org/jira/browse/BEAM-1765
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Pablo Estrada
>Assignee: Amit Sela
>
> I have started removing aggregators from the Java SDK, but runners use them 
> in different ways that I can't figure out well. This is to track the 
> independent effort in Spark.





[jira] [Resolved] (BEAM-1652) Add code style xml to the website

2017-03-14 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1652.
-
   Resolution: Done
Fix Version/s: Not applicable

> Add code style xml to the website
> -
>
> Key: BEAM-1652
> URL: https://issues.apache.org/jira/browse/BEAM-1652
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>






[jira] [Created] (BEAM-1720) Consider minimization on shaded jars

2017-03-14 Thread Aviem Zur (JIRA)
Aviem Zur created BEAM-1720:
---

 Summary: Consider minimization on shaded jars
 Key: BEAM-1720
 URL: https://issues.apache.org/jira/browse/BEAM-1720
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Aviem Zur
Assignee: Aviem Zur


Some of our modules shade and relocate Guava as part of their build. This adds 
a considerable amount to their file size, even though they do not use every 
single class in Guava's jar.
The Maven shade-plugin solves this with minimization, which shades into the jar 
only the classes the module actually uses.

There are a few issues with configuring minimization for our shaded modules 
currently:

1. There is a bug in the shade-plugin where minimization declared on a project 
with pom packaging fails. I have created a 
[ticket|https://issues.apache.org/jira/browse/MSHADE-253] and a 
[PR|https://github.com/apache/maven-plugins/pull/107] with a fix, but it could 
take a while before it is available to us.
2. Minimization on all of our modules adds a significant amount of build time, 
which we wish to avoid (up from 2:30 minutes for a full build of the project, 
skipping tests, on a MacBook Pro laptop to over 5 minutes).
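
For reference, the shade-plugin switch in question is {{minimizeJar}} (a sketch only; Beam's actual shade configuration carries relocation and filter settings beyond what is shown here):

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- keep only classes actually reachable from this module -->
    <minimizeJar>true</minimizeJar>
  </configuration>
</plugin>
{code}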





[jira] [Resolved] (BEAM-1650) Add code style xml to the project repo and website

2017-03-14 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur resolved BEAM-1650.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Add code style xml to the project repo and website 
> ---
>
> Key: BEAM-1650
> URL: https://issues.apache.org/jira/browse/BEAM-1650
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Aviem Zur
>Assignee: Aviem Zur
> Fix For: Not applicable
>
>
> Contributors who develop using IntelliJ IDEA or Eclipse could benefit from 
> having an Eclipse code formatter XML which matches the checkstyle 
> enforcements of Beam.
> Add such a code style xml to the repository and link to it from the website 
> under the IDE setup sections: 
> https://beam.apache.org/contribute/contribution-guide/#intellij
> https://beam.apache.org/contribute/contribution-guide/#eclipse





[jira] [Updated] (BEAM-1581) JSON sources and sinks

2017-03-15 Thread Aviem Zur (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aviem Zur updated BEAM-1581:

Description: 
JSON source and sink to read/write JSON files.
Similarly to {{XmlSource}}/{{XmlSink}}, these would be a {{JsonSource}}/{{JsonSink}} 
which are a {{FileBasedSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}

  was:
A new IO (with source and sink) which will read/write Json files.
Similarly to {{XmlSource}}/{{XmlSink}}, this IO should have a 
{{JsonSource}}/{{JsonSink}} which are a {{FileBasedSource}}/{{FileBasedSink}}.
Consider using methods/code (or refactor these) found in {{AsJsons}} and 
{{ParseJsons}}


> JSON sources and sinks
> --
>
> Key: BEAM-1581
> URL: https://issues.apache.org/jira/browse/BEAM-1581
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> JSON source and sink to read/write JSON files.
> Similarly to {{XmlSource}}/{{XmlSink}}, these would be a {{JsonSource}}/{{JsonSink}} 
> which are a {{FileBasedSource}}/{{FileBasedSink}}.
> Consider using methods/code (or refactor these) found in {{AsJsons}} and 
> {{ParseJsons}}




