Re: Setup of Scala/Flink project using Bazel

2021-05-20 Thread Salva Alcántara
Hi Austin,

In the end I added the following target override for Scala:

```
maven_install(
    artifacts = [
        # testing
        maven.artifact(
            group = "com.google.truth",
            artifact = "truth",
            version = "1.0.1",
        ),
    ] + flink_artifacts(
        addons = FLINK_ADDONS,
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ) + flink_testing_artifacts(
        scala_version = FLINK_SCALA_VERSION,
        version = FLINK_VERSION,
    ),
    fetch_sources = True,
    # This override results in Scala-related classes being removed from the
    # deploy jar as required (?)
    override_targets = {
        "org.scala-lang.scala-library": "@io_bazel_rules_scala_scala_library//:io_bazel_rules_scala_scala_library",
        "org.scala-lang.scala-reflect": "@io_bazel_rules_scala_scala_reflect//:io_bazel_rules_scala_scala_reflect",
        "org.scala-lang.scala-compiler": "@io_bazel_rules_scala_scala_compiler//:io_bazel_rules_scala_scala_compiler",
        "org.scala-lang.modules.scala-parser-combinators_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_parser_combinators//:io_bazel_rules_scala_scala_parser_combinators",
        "org.scala-lang.modules.scala-xml_%s" % FLINK_SCALA_VERSION: "@io_bazel_rules_scala_scala_xml//:io_bazel_rules_scala_scala_xml",
    },
    repositories = MAVEN_REPOSITORIES,
)
```

and now it works as expected, meaning:

```
bazel build //src/main/scala/org/example:word_count_deploy.jar
```

produces a jar with both Flink and Scala-related classes removed (since they
are provided by the runtime). I did a quick check and the Flink job runs
just fine in a local cluster. It would be nice if the community could
confirm that this is indeed the way to build Flink-based Scala
applications...
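For a quick sanity check of the deploy jar, its contents can be listed and filtered, e.g. (paths follow the repo layout; the NoOp helper class, if present, is expected to be the only match):

```
jar tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep -iE 'scala|flink'
```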

BTW, I updated the repo with the aforementioned override:
https://github.com/salvalcantara/bazel-flink-scala, in case you want to give
it a try.
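For context, `flink_artifacts` and `flink_testing_artifacts` above are helper macros defined in the repo rather than part of rules_jvm_external. A hypothetical sketch of what such a helper might look like (the artifact names and defaults here are assumptions for illustration; the real definition may differ):

```python
# Hypothetical sketch of a flink_artifacts-style helper (Starlark; also runs
# as plain Python): expands Flink/Scala versions into Maven coordinate strings.
def flink_artifacts(addons = [], scala_version = "2.12", version = "1.12.0"):
    # Artifacts whose Maven IDs carry a Scala-version suffix.
    scala_suffixed = [
        "flink-clients",
        "flink-scala",
        "flink-streaming-scala",
        "flink-streaming-java",
    ] + addons
    coords = ["org.apache.flink:flink-core:%s" % version]
    for artifact in scala_suffixed:
        coords.append("org.apache.flink:%s_%s:%s" % (artifact, scala_version, version))
    return coords
```

The returned coordinate strings can then be concatenated with `maven.artifact(...)` entries, as in the `maven_install` call above.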




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Salva Alcántara
That would be awesome Austin, thanks again for your help on that. In the
meantime, I also filed an issue in the `rules_scala` repo:
https://github.com/bazelbuild/rules_scala/issues/1268.





Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Austin Cawley-Edwards
I know @Aaron Levin  is using `rules_scala` for
building Flink apps, perhaps he can help us out here (and hope he doesn't
mind the ping).



On Wed, May 12, 2021 at 4:13 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Yikes, I see what you mean. I also cannot get `neverlink` or adding the
> org.scala.lang artifacts to the deploy_env to remove them from the uber jar.
>
> I'm not super familiar with sbt/Scala, but do you know how exactly the
> assembly `includeScala` option works? Is it just a flag that is passed to scalac?
>
> I've found where rules_scala defines how to call `scalac`, but am lost
> here[1].
>
> Best,
> Austin
>
> [1]:
> https://github.com/bazelbuild/rules_scala/blob/c9cc7c261d3d740eb91ef8ef048b7cd2229d12ec/scala/private/rule_impls.bzl#L72-L139
>
> On Wed, May 12, 2021 at 3:20 PM Salva Alcántara 
> wrote:
>
>> Hi Austin,
>>
>> Yep, removing Flink dependencies is working well as you pointed out.
>>
>> The problem now is that I would also need to remove the Scala library... By
>> inspecting the jar you will see a lot of Scala-related classes. If you take
>> a look at the end of the build.sbt file, I have
>>
>> ```
>> // exclude Scala library from assembly
>> assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
>> ```
>>
>> so the fat jar generated by running `sbt assembly` does not contain
>> scala-related classes, which are also "provided". You can compare the
>> bazel-built jar with the one built by sbt
>>
>> ```
>> $ jar tf target/scala-2.12/bazel-flink-scala-assembly-0.1-SNAPSHOT.jar
>> META-INF/MANIFEST.MF
>> org/
>> org/example/
>> BUILD
>> log4j.properties
>> org/example/WordCount$$anon$1$$anon$2.class
>> org/example/WordCount$$anon$1.class
>> org/example/WordCount$.class
>> org/example/WordCount.class
>> ```
>>
>> Note that there are neither Flink nor Scala classes. In the jar generated
>> by
>> bazel, however, I can still see Scala classes...
>>
>>
>>
>>
>


Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Austin Cawley-Edwards
Yikes, I see what you mean. I also cannot get `neverlink` or adding the
org.scala.lang artifacts to the deploy_env to remove them from the uber jar.

I'm not super familiar with sbt/Scala, but do you know how exactly the
assembly `includeScala` option works? Is it just a flag that is passed to scalac?

I've found where rules_scala defines how to call `scalac`, but am lost
here[1].

Best,
Austin

[1]:
https://github.com/bazelbuild/rules_scala/blob/c9cc7c261d3d740eb91ef8ef048b7cd2229d12ec/scala/private/rule_impls.bzl#L72-L139

On Wed, May 12, 2021 at 3:20 PM Salva Alcántara 
wrote:

> Hi Austin,
>
> Yep, removing Flink dependencies is working well as you pointed out.
>
> The problem now is that I would also need to remove the Scala library... By
> inspecting the jar you will see a lot of Scala-related classes. If you take
> a look at the end of the build.sbt file, I have
>
> ```
> // exclude Scala library from assembly
> assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
> ```
>
> so the fat jar generated by running `sbt assembly` does not contain
> scala-related classes, which are also "provided". You can compare the
> bazel-built jar with the one built by sbt
>
> ```
> $ jar tf target/scala-2.12/bazel-flink-scala-assembly-0.1-SNAPSHOT.jar
> META-INF/MANIFEST.MF
> org/
> org/example/
> BUILD
> log4j.properties
> org/example/WordCount$$anon$1$$anon$2.class
> org/example/WordCount$$anon$1.class
> org/example/WordCount$.class
> org/example/WordCount.class
> ```
>
> Note that there are neither Flink nor Scala classes. In the jar generated
> by
> bazel, however, I can still see Scala classes...
>
>
>
>


Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Salva Alcántara
Hi Austin,

Yep, removing Flink dependencies is working well as you pointed out.

The problem now is that I would also need to remove the Scala library... By
inspecting the jar you will see a lot of Scala-related classes. If you take
a look at the end of the build.sbt file, I have

```
// exclude Scala library from assembly
assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false)
```

so the fat jar generated by running `sbt assembly` does not contain
scala-related classes, which are also "provided". You can compare the
bazel-built jar with the one built by sbt 

```
$ jar tf target/scala-2.12/bazel-flink-scala-assembly-0.1-SNAPSHOT.jar
META-INF/MANIFEST.MF
org/
org/example/
BUILD
log4j.properties
org/example/WordCount$$anon$1$$anon$2.class
org/example/WordCount$$anon$1.class
org/example/WordCount$.class
org/example/WordCount.class
```

Note that there are neither Flink nor Scala classes. In the jar generated by
bazel, however, I can still see Scala classes...
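As an aside, on sbt-assembly 1.x the `AssemblyOption` used by the `includeScala` setting above is no longer a case class, so the `.copy(...)` form is replaced by `with*` setters; a version-dependent sketch:

```
// sbt-assembly 1.x style (check the plugin version before copying this)
assembly / assemblyOption := (assembly / assemblyOption).value.withIncludeScala(false)
```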





Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Austin Cawley-Edwards
Hi Salva,

I think you're almost there. Confusion is definitely not helped by the
ADDONS/PROVIDED_ADDONS thingy – I think I tried to get too fancy with that
in the linked thread.

I think the only thing you have to do differently is to adjust the target
you are building/ deploying – instead of
`//src/main/scala/org/example:flink_app_deploy.jar`, your target with the
provided env applied is
`//src/main/scala/org/example:word_count_deploy.jar`. I've verified this in
the following ways:

1. Building and checking the JAR itself:
   * `bazel build //src/main/scala/org/example:word_count_deploy.jar`
   * `jar -tf bazel-bin/src/main/scala/org/example/word_count_deploy.jar | grep flink`
     * Shows only the tools/flink/NoOp class

2. Running the word count jar locally, to ensure the main class is picked
up correctly:

```
./bazel-bin/src/main/scala/org/example/word_count
USAGE:
WordCount  
```

3. Had fun with the Bazel query language[1], inspecting the difference in
the dependencies between the deploy env and the word_count_deploy.jar:

```
bazel query 'filter("@maven//:org_apache_flink.*",
  deps(//src/main/scala/org/example:word_count_deploy.jar) except
  deps(//:default_flink_deploy_env))'
INFO: Empty results
Loading: 0 packages loaded
```

This is to say that there are no Flink dependencies in the deploy JAR that
are not accounted for in the deploy env.
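The same query shape should also work for spotting leftover Scala classes, assuming the Scala artifacts come from the same `@maven` repository and are part of the deploy env (a sketch, untested):

```
bazel query 'filter("@maven//:org_scala_lang.*",
  deps(//src/main/scala/org/example:word_count_deploy.jar) except
  deps(//:default_flink_deploy_env))'
```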


So I think you're all good, but let me know if I've misunderstood! Or if
you find a better way of doing the provided deps – I'd be very interested!

Best,
Austin

[1]: https://docs.bazel.build/versions/master/query.html

On Wed, May 12, 2021 at 10:28 AM Salva Alcántara 
wrote:

> Hi Austin,
>
> I followed your instructions and gave `rules_jvm_external` a try.
>
> Overall, I think I advanced a bit, but I'm not quite there yet. I have
> followed the link [1] given by Matthias, making the necessary changes to my
> repo:
>
> https://github.com/salvalcantara/bazel-flink-scala
>
> In particular, the relevant (bazel) BUILD file looks like this:
>
> ```
> package(default_visibility = ["//visibility:public"])
>
> load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")
>
> filegroup(
>     name = "scala-main-srcs",
>     srcs = glob(["*.scala"]),
> )
>
> scala_library(
>     name = "flink_app",
>     srcs = [":scala-main-srcs"],
>     deps = [
>         "@maven//:org_apache_flink_flink_core",
>         "@maven//:org_apache_flink_flink_clients_2_12",
>         "@maven//:org_apache_flink_flink_scala_2_12",
>         "@maven//:org_apache_flink_flink_streaming_scala_2_12",
>         "@maven//:org_apache_flink_flink_streaming_java_2_12",
>     ],
> )
>
> java_binary(
>     name = "word_count",
>     srcs = ["//tools/flink:noop"],
>     deploy_env = ["//:default_flink_deploy_env"],
>     main_class = "org.example.WordCount",
>     deps = [
>         ":flink_app",
>     ],
> )
> ```
>
> The idea is to use `deploy_env` within `java_binary` to provide the Flink
> dependencies. This causes those dependencies to get removed from the final
> fat jar that one gets by running:
>
> ```
> bazel build //src/main/scala/org/example:flink_app_deploy.jar
> ```
>
> The problem now is that the jar still includes the Scala library, which
> should also be dropped from the jar as it is part of the provided
> dependencies within the Flink cluster. I am reading this blog post in [2]
> without luck yet...
>
> Regards,
>
> Salva
>
> [1]
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html
>
> [2]
>
> https://yishanhe.net/address-dependency-conflict-for-bazel-built-scala-spark/
>
>
>
>


Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Salva Alcántara
Hi Austin,

I followed your instructions and gave `rules_jvm_external` a try.

Overall, I think I advanced a bit, but I'm not quite there yet. I have
followed the link [1] given by Matthias, making the necessary changes to my
repo:

https://github.com/salvalcantara/bazel-flink-scala

In particular, the relevant (bazel) BUILD file looks like this:

```
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

filegroup(
    name = "scala-main-srcs",
    srcs = glob(["*.scala"]),
)

scala_library(
    name = "flink_app",
    srcs = [":scala-main-srcs"],
    deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)

java_binary(
    name = "word_count",
    srcs = ["//tools/flink:noop"],
    deploy_env = ["//:default_flink_deploy_env"],
    main_class = "org.example.WordCount",
    deps = [
        ":flink_app",
    ],
)
```

The idea is to use `deploy_env` within `java_binary` to provide the Flink
dependencies. This causes those dependencies to get removed from the final
fat jar that one gets by running:

```
bazel build //src/main/scala/org/example:flink_app_deploy.jar
```

The problem now is that the jar still includes the Scala library, which
should also be dropped from the jar as it is part of the provided
dependencies within the Flink cluster. I am reading this blog post in [2]
without luck yet...

Regards,

Salva

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-anyone-have-an-example-of-Bazel-working-with-Flink-td35898.html

[2]
https://yishanhe.net/address-dependency-conflict-for-bazel-built-scala-spark/
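For reference, the definition of `//:default_flink_deploy_env` is not shown in this thread. A minimal sketch, under the assumption that it simply mirrors the provided Flink deps of `flink_app` (the placeholder main class is an assumption; any class on the runtime classpath works):

```
# Root BUILD file (sketch): a java_binary whose runtime closure is everything
# the Flink runtime already provides. Binaries listing this target in
# deploy_env get these dependencies subtracted from their *_deploy.jar.
java_binary(
    name = "default_flink_deploy_env",
    main_class = "tools.flink.NoOp",  # placeholder; never actually run
    runtime_deps = [
        "@maven//:org_apache_flink_flink_core",
        "@maven//:org_apache_flink_flink_clients_2_12",
        "@maven//:org_apache_flink_flink_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_scala_2_12",
        "@maven//:org_apache_flink_flink_streaming_java_2_12",
    ],
)
```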





Re: Setup of Scala/Flink project using Bazel

2021-05-04 Thread Austin Cawley-Edwards
Great! Feel free to post back if you run into anything else or come up with
a nice template – I agree it would be a nice thing for the community to
have.

Best,
Austin

On Tue, May 4, 2021 at 12:37 AM Salva Alcántara 
wrote:

> Hey Austin,
>
> There was no special reason for vendoring using `bazel-deps`, really. I
> just
> took another project as a reference for mine and that project was already
> using `bazel-deps`. I am going to give `rules_jvm_external` a try, and
> hopefully I can make it work!
>
> Regards,
>
> Salva
>
>
>
>


Re: Setup of Scala/Flink project using Bazel

2021-05-03 Thread Salva Alcántara
Hey Austin,

There was no special reason for vendoring using `bazel-deps`, really. I just
took another project as a reference for mine and that project was already
using `bazel-deps`. I am going to give `rules_jvm_external` a try, and
hopefully I can make it work!

Regards,

Salva





Re: Setup of Scala/Flink project using Bazel

2021-05-03 Thread Austin Cawley-Edwards
Hey Salva,

This appears to be a bug in the `bazel-deps` tool, caused by mixing Scala
and Java dependencies. The tool seems to use the same target name for both,
and thus produces duplicate targets (one for Scala and one for Java).

If you look at the dict lines that are reported as conflicting, you'll see
the duplicate "vendor/org/apache/flink:flink_clients" target:

```
"vendor/org/apache/flink:flink_clients": ["lang||java", "name||//vendor/org/apache/flink:flink_clients", ...],
"vendor/org/apache/flink:flink_clients": ["lang||scala:2.12.11", "name||//vendor/org/apache/flink:flink_clients", ...],
```

Can I ask what made you choose the `bazel-deps` tool instead of the official
bazelbuild/rules_jvm_external[1]? That might be a bit more verbose, but it
has better support and supports Scala as well.


Alternatively, you might look into customizing the target templates for
`bazel-deps` to suffix targets with the lang. Something like:

```
_JAVA_LIBRARY_TEMPLATE = """
java_library(
    name = "{name}_java",
..."""

_SCALA_IMPORT_TEMPLATE = """
scala_import(
    name = "{name}_scala",
..."""
```


Best,
Austin

[1]: https://github.com/bazelbuild/rules_jvm_external

On Mon, May 3, 2021 at 1:20 PM Salva Alcántara 
wrote:

> Hi Matthias,
>
> Thanks a lot for your reply. I am already aware of that reference, but it's
> not exactly what I need. What I'd like to have is the typical word count
> (hello world) app migrated from sbt to bazel, in order to use it as a
> template for my Flink/Scala apps.
>
>
>
>
>
>


Re: Setup of Scala/Flink project using Bazel

2021-05-03 Thread Salva Alcántara
Hi Matthias,

Thanks a lot for your reply. I am already aware of that reference, but it's
not exactly what I need. What I'd like to have is the typical word count
(hello world) app migrated from sbt to bazel, in order to use it as a
template for my Flink/Scala apps.







Re: Setup of Scala/Flink project using Bazel

2021-05-03 Thread Matthias Pohl
ming-scala] # provided
> flink-connector-kafka:
>   lang: java
>   version: "0.10.2"
> flink-test-utils:
>   lang: java
>   version: "0.10.2"
> ```
>
> For downloading the dependencies, I'm running
>
> ```
> bazel run //:parse generate -- --repo-root ~/Projects/bazel-flink-scala
> --sha-file vendor/workspace.bzl --target-file vendor/target_file.bzl --deps
> dependencies.yaml
> ```
>
> Which runs just fine, but then when I try to build the project
>
> ```
> bazel build //:job
> ```
>
> I'm getting this error
>
> ```
> Starting local Bazel server and connecting to it...
> ERROR: Traceback (most recent call last):
> File
> "/Users/salvalcantara/Projects/me/bazel-flink-scala/WORKSPACE", line
> 44, column 25, in 
> build_external_workspace(name = "vendor")
> File
>
> "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl",
> line 258, column 91, in build_external_workspace
> return build_external_workspace_from_opts(name = name,
> target_configs =
> list_target_data(), separator = list_target_data_separator(), build_header
> =
> build_header())
> File
>
> "/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl",
> line 251, column 40, in list_target_data
> "vendor/org/apache/flink:flink_clients":
>
> ["lang||scala:2.12.11","name||//vendor/org/apache/flink:flink_clients","visibility||//visibility:public","kind||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
> Error: dictionary expression has duplicate key:
> "vendor/org/apache/flink:flink_clients"
> ERROR: error loading package 'external': Package 'external' contains errors
> INFO: Elapsed time: 3.644s
> INFO: 0 processes.
> FAILED: Build did NOT complete successfully (0 packages loaded)
> ```
>
> Why is that? Can anyone help? It would be great to have detailed
> instructions and project templates for Flink/Scala applications using
> Bazel. I've put
> everything together in the following repo:
> https://github.com/salvalcantara/bazel-flink-scala, feel free to send a PR
> or whatever.
>
> PS: Also posted in SO:
>
> https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel
>
>
>


Setup of Scala/Flink project using Bazel

2021-04-30 Thread Salva Alcántara
File
"/Users/salvalcantara/Projects/me/bazel-flink-scala/vendor/target_file.bzl",
line 251, column 40, in list_target_data
"vendor/org/apache/flink:flink_clients":
["lang||scala:2.12.11","name||//vendor/org/apache/flink:flink_clients","visibility||//visibility:public","kind||import","deps|||L|||","jars|||L|||//external:jar/org/apache/flink/flink_clients_2_12","sources|||L|||","exports|||L|||","runtimeDeps|||L|||//vendor/commons_cli:commons_cli|||//vendor/org/slf4j:slf4j_api|||//vendor/org/apache/flink:force_shading|||//vendor/com/google/code/findbugs:jsr305|||//vendor/org/apache/flink:flink_streaming_java_2_12|||//vendor/org/apache/flink:flink_core|||//vendor/org/apache/flink:flink_java|||//vendor/org/apache/flink:flink_runtime_2_12|||//vendor/org/apache/flink:flink_optimizer_2_12","processorClasses|||L|||","generatesApi|||B|||false","licenses|||L|||","generateNeverlink|||B|||false"],
Error: dictionary expression has duplicate key:
"vendor/org/apache/flink:flink_clients"
ERROR: error loading package 'external': Package 'external' contains errors
INFO: Elapsed time: 3.644s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
```

Why is that? Can anyone help? It would be great to have detailed instructions
and project templates for Flink/Scala applications using Bazel. I've put
everything together in the following repo:
https://github.com/salvalcantara/bazel-flink-scala, feel free to send a PR
or whatever.

PS: Also posted in SO:
https://stackoverflow.com/questions/67331792/setup-of-scala-flink-project-using-bazel


