[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-593282253 Hello @vsoch - I spoke to co-organizer of Airflow Summit - @leahecole - and I think we can figure out something with regards you being speaker at the Summit. @leahecole -> can you please reach out to @vsoch and take it from here. I would love to make this happen! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-590053213 I see, pity, let's see. It's the first-time event so we do not know what to expect and for now do not foresee reimbursements, but if we are successful with finding sponsors, who knows, maybe we will introduce a diversity fund. I will keep you updated :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-590051343 BTW. @vsoch I am going to give a talk together with @aijamalnk on Diversity and building even more welcoming community at the Airflow Summit (http://airflowsummit.org). And I will certainly use some of the things we've learned during this conversation. But I have just a thought - since you are nearby (I guess :) ) maybe you would like to give some talk about your side of the experience (and a bit about Singularity). I think that might be cool. We have CFP open : https://pretalx.com/apache-airflow-summit-bay-area-2020/cfp :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-590050910 Indeed Awesome work 🎉 🎉 🎉 🎉 🎉 🎉 . Likewise :). Thanks for cooperation on that one ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-590033385 one transient error - but I restarted it. Looks good ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-590033396 :crossed_fingers: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-590026699 It could be a transient error or some temporary problem with master (despite of the protections we have in place it breaks occassionally). We do not see it now in the latest master, so I suggest to rebase to the latest master (it's green: https://travis-ci.org/apache/airflow/builds/653784320?utm_medium=notification&utm_source=github_status) and see if the problem is still there. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-589639710 > looks like the error linked above was one off - a later build didn't trigger it. Yeah. We still have some flaky tests - much less than before but they happen sometimes due to resource problems we have with Travis :(. We are close to migrate to Github Actions but it's one of the things that gets delayed as we are busy with other higher-priority stuff. It's annoying but it's not a blocker This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-588461323 > One quick note about the docs for pylint - the files are now changed to have pylint_main.sh and pylint_tests.sh. Also, this unexpectedly started that huge build process that failed before. Given a .pylintrc file in some root and running pylint, it would be great to see a much simpler (not wrapped / abstracted) run of pylint on a single file. I'd really recommend (again) pre-commits. They will run pylint on all modified files only. This is far better than having to run it manually - the will simply run pylint (or whatever of the other tests) automatically on all the files you modified when committing it. You don't even have to remember about it. And the reason for the build process is to make sure that you run it in the same environment as Travis CI and other people. The problems with tools like pylint is that the results depend often on installed libraries, pylint version etc. For example you need to have installed stubs for python libraries in order to do full static analysis. So there is huge variability depending on your local environment and you can simply get different results locally than on Travis. That's why we run pylint tests inside the docker container which mirrors this container that runs on Travis. This is the only way to make sure everyone runs in the same environment. It happened many times that people reported "I see different errors than those on Travis" - and it turned out that they have different pylint or python version. This is the real benefit of the "huge build". you start to appreciate it when your team of contributors is totally distributed, maintain their own environment and consists of 600 individuals - each with their own setup. Simply - it looks differently from the point of view of single person and differently from the point of view of team of few hundred people. Sometimes efficiency of team require some sacrifice of efficiency of single individual. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-588435931 > How would you like me to address this? > > ``` > PATH=/root:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/hive/bin:/usr/local/go/bin:/opt/gcloud/bin > > * Module airflow.providers.singularity.operators.singularity > > airflow/providers/singularity/operators/singularity.py:65:4: R0913: Too many arguments (12/10) (too-many-arguments) > ``` > > I had added working_dir to kwargs, but then during the test there was an error that the argument wasn't found. So I added it, but now I'm over. The pylint defats we have are 'reasonable' i.e work in most cases but if you have a justified case and do not think refactoring / using a structure instead of n-parameters you could disable pylint check - this specific check for this statement only by adding appropriate # pylint: disable comment. This is a sign from the programmer that this is a deliberate decision and theat it is conscious choice to disable tis rule. You can find plenty of examples in the code and more detailed explanation including some guidance on the decision making in the https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst in pylint section This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-587354436 Hey @vsoch a bit unrelated, but wanted you to know that we have just merged this update to CONTRIBUTING documentation to make ti easier for people like you to understand the dynamics of our project's communication: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#how-to-communicate This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-586675640 I think we do not have it explained very well - and I am not 100% sure myself. But @mik-laj can help with that. One more thing @mik-laj - maybe we should add a short explanation/howto on how to add the documentation - where things should be excluded, how to add new packages and guides? For now it's not at all obvious what steps should be taken in case a new provider's package is added. A short "how to add new package documentation" would be great in CONTRIB docs. It does not have to be long - just to explain what to do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-586641854 Sorry - devl...@airflow.apache.org - see CONTRIB.rst. I was typing it on my tablet and autocomplete kicked in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-586640329 > Conflict resolved - it was the same line as before, another package was added! I added a new line (it was getting too long) so hopefully that won't make the linter unhappy... yep. it is a very active project. so conflicts are expected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-586633715 and you need t rebase as well ontop of latest master & you have conflict with setup.py This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-586633655 > Questions: > > * where does example_singularity_operator.py go - it used to be under contrib/example_dags but now, that entire folder is empty? in the meantime we completed AIP-21 and every integration is now moved to 'providers' package.Singularity will need to be moved to providers/singularity > * this error is triggering for the docs (is it related to the above)? I don't see any script_files with grep, nor do I see how super() relates to the example_singularity_operator.py file. just look where any of the other providers are added - there is a TOC where you can navigate to your documentation - otherwise users won't be able to find the docs. > > ``` > writing additional pages... search/usr/local/lib/python3.6/site-packages/sphinx_rtd_theme/search.html:20: RemovedInSphinx30Warning: To modify script_files in the theme is deprecated. Please insert a
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-586169342 > High level feedback since this was a topic earlier: > > * the extra linting for the Dockerfile, which comes down to fairly trivial things like using cd in a command for one layer and using curl instead of wget, has led to several additional errors that are frustrating to deal with. If I knew the project perfectly sure I might have known these in advance, but for a new contributor that gets these bugs (which are somewhat subjective) that then has to figure them out and fix, it's an extra pain that simply doesn't need to exist. > * given the long duration of a PR, the addition of new linters / tests makes it even more confusing. > > I'm happy to keep providing this feedback, and if you don't want it let me know! Feedback is welcome :). I understand it's a bit of pain with linting, but I would treat it as a great learning experience. For me this is the best trainer I have - when I make a mistake or don't do something very well and can do better, I get feedback. That's the case with linting - I've learned a lot of good practices from linting - hadolint for Docker and especially shellcheck for bash. And especially that those lint rules have very good explanation rationale and most of them has the effect that they prevent errors that others might have problems with. For example "using cd" has a good explanation and I think it's a fantastic practice to use WORKDIR instead of cd in Dockerfiles as it makes it more controllable - you always know which directory you are in without having to parse the shell statement. Here is more detailed explanation with example and rationale: https://github.com/hadolint/hadolint/wiki/DL3003 You might of course disagree with particular rules (and you have full right to do so) but we also have the rule in community "disagree but engage" - community decisions are more important than individual preferences. That's one of the things that I had to learn when I joined the community - there are a number of things I do not agree with but then after community decided otherwise i follow and even engage with). On the other hand there were a few rules that were annoying and we raised a proposal to remove them eventually, so there is still place for discussing them in the devlist - and if you have good reasoning why the rules should be removed, I am sure people who will be interested in it will have opinions and maybe decision will be made to disable those rules. * Regarding adding new checks - here again community is more important. When you look at it from your individual point of view, it's bad, but if you look at it from the community, there are more than 100 PRs opened at any time so we would never had a chance to improve - steady improvements are crucial, if you don't improve you actually deteriorate, so while annoying, it's also expected that after rebase you might have to adapt to some new rules. That's actually a sign of healthy project. Also if you had pre-commits installed and working, those "encoding" lines would be removed automatically for you. Whenever we add new rules we try to make them self-fixing and having pre-commit is one of the reasons for that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-585964299 > 1. installing Singularity as a binary inside a container (these bases already exist) > 2. running the container via docker-compose > 3. But then running tests inside of that container > > Is this possible? I see now !. In this case I think the easiest is to add the binary to Docker image and running it from there. Then there is no need to run it as separate image/runtime. That will be way simpler. We already do that for the minicluster - for hadoop tests: Docker installation here: https://github.com/apache/airflow/blob/master/Dockerfile#L184 And starting the cluster here: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/entrypoint_ci.sh#L140 J. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-585943248 > okay just to clarify - you want a Singularity + Airflow container run via a similar kind cluster? You said something about adding specific commands `--start-singularity` and `--stop-singularity` and I'm not sure what / where that is referring to, and how / why we would want to start or stop singularity (it's not a service, it's just a binary installed). I assume we can start singularity from the host and be able to forward this connection to inside the airlfow-testing container so that we can connect to it. With Kind - we are starting it from within the container (by forwarded docker socket) , but it could be started from the host as well (mongo, kerberos and others are started from the host using `docker-compose` configuration and then we can connect to them from the "airflow-testing" by specifying their service names (mongo/kerberos etc). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-585941481 > okay so I'm tracing the kubernetes (runtime) as an example, and I have a quick question. In scripts/ci/in_container/entrypoint_ci.sh I see that given the kubernetes runtime, you are setting some variable for a container name to be the airflow name with suffix "-kubernetes." > > and this would suggest the container is brought up inside of this container? Wouldn't it make more sense to have a docker-compose run that runs some equivalent of airflow-testing but with Singularity installed inside? Yes it would make sense - similar to what we do with other integrations (say kerberos or mongo). What we are doing with kind is a bit different - we are forwarding the host's docker credential to inside the airflow-testing container and we are running the kind command line from inside it. This - in effect - reaches out to the host's docker and set's up the kind cluster using host's docker engine. With singularity - if it can be run in the host and we can connect to it from inside the airflow-testing container that would be best. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-585939693 Somehow I missed it completely. Sorry. Responding now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-579443300 > Could you give me some pointers about where to write the Dockerfile, what example from the current codebase (link) is what I should try to copy the logic for? I think the closest one is Kubernetes and I am leaning towards creating another RUNTIME (singularity). There are few of scripts to update. But it's not that hard at the end. I think a custom `--start-singularity` `--stop-singularity` switches should be added similar to `--start-kind-cluster` `--stop-kind-cluster`. It's best if you look for RUNTIME capital wherever it's used but here is the guide Breeze (for local development and testing your solution): - https://github.com/apache/airflow/blob/master/breeze-complete#L40 -> options to add runtime are added here amongst others - https://github.com/apache/airflow/blob/master/breeze#L203 -> Displaying chosen RUNTIME - https://github.com/apache/airflow/blob/master/breeze#L595 -> Flags of Breeze - https://github.com/apache/airflow/blob/master/breeze#L1007 -> that's where RUNTIME leads to adding extra docker compose - https://github.com/apache/airflow/blob/master/breeze#L391 -> help describing the options. Docker Compose: - https://github.com/apache/airflow/blob/master/scripts/ci/docker-compose/runtime-kubernetes.yml - similar should be created for singularity and used in Breeze. In container: - https://github.com/apache/airflow/blob/master/scripts/ci/in_container/entrypoint_ci.sh#L164 - entrypoint for the CI where Kubernetes is started after choosing the RUNTIME="kubernetes" here is where the setup for singularity can be done. It runs https://github.com/apache/airflow/blob/master/scripts/ci/in_container/kubernetes/setup_kind_cluster.sh under the hood. That's it for interactive "local" breeze session. T Then if you run tests the script goes further. It builds airflow kubernetes image and deploys it to kubernetes. In interactive session, the deployment should be run manually as described in https://github.com/apache/airflow/blob/master/TESTING.rst#running-tests-with-kubernetes but if you run tests in travis (RUN_TESTS="true") those steps are executed manually. https://github.com/apache/airflow/blob/master/scripts/ci/in_container/entrypoint_ci.sh#L218 you have the deployment of image to kubernetes: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/deploy_airflow_to_kubernetes.sh and one of the scripts it runs https://github.com/apache/airflow/blob/master/scripts/ci/in_container/kubernetes/docker/rebuild_airflow_image.sh - that's where the Kubernetes image is built. I hope that's enough leads. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-579315631 Thanks @vsoch ! Really sorry for not getting back before - I've been travelling and missed that last comment. Fantastic work! I really liked how you explained the steps you did to make it works! Kudos! > So, although I'm hesitant that something about travis won't provide us with all the permissions that we need, I'm going to give it a try to add a docker compose file to reproduce this. My question for you is how I should go about providing this integration - the examples given previously expose services, and for this case we would want an environment with airflow and singularity in the same space. Do you have thoughts or examples for how to approach this? Travis and any other environment will provide all that we need. We already run in privileged mode (and pretty much all CI systems allow that in order to make all kinds of docker-in-docker approach). In fact what we already do is we are mounting docker socket to within the airflow-testing so that you can even run docker commands to reach out the host docker engine. This is needed to run kind (Kubernetes-In-Docker). > And then given that we can add a custom container, it would need to use one of the Singularity bases (note they are alpine), and then also include dependencies for airflow (or is the install already handled given the container, and happens before the testing?) Either way, if customization is needed on top of the Singularity base (that isn't handled by the testing suite) what would be the best way to do this - building the image elsewhere and using here, or building on demand? Hmm, I'd assume that having a base Singularity should be enough and that we can reach it via "singularity" host name via docker compose. But I understand you need to build in some Airflow dependencies in? Am I right? What kind of dependencies you need to build in that container? Is this like local airflow sources or some airflow dependencies? Excuse my ignorance as I do not know much about singularity. We have few approaches we can get: 1) In case of Kubernetes (Kind) we are using RUNTIME="kubernetes" env and we are checking that in the initial CI scripts - if the RUNTIME="kubernetes" we will install kubernetes-in-docker cluster and provision it with the latest Airflow locally built CI image and then we run our tests with this. This is most complex thing we do and I hope we do not have to do the same. 2) If those are about some small packages to install at running singularity after it is started already, we could simply do it in the CI scripts when we start up and see "INTEGRATION_SINGULARITY" is set to "true" (this will be properly set to true if you add it similarly to other integrations of ours and when --integration singularity switch is used). 3) If you need something pre-backed in the Docker image itself then I think we need a separate Dockerfile with FROM: singularity and some steps to build the dockerfile before docker-compose is run (and refer to the newly built image in Docker compose. My best bet if that is the case, that would be that you build and test such images locally for local testing and once it works I can help to integrate it in the CI scripts properly (You can try yourself - this will likely be just appropriate 'docker build' command executed only when --integration singularity is used) I hope this is helpful! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-576861320 That's OK! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-576477396 > But one quick note - regardless of the testing troubles, the support on this PR has been absolutely excellent, night and day when compared to my previous experience. I'm really appreciative of that :) Thanks for the kind words :). The trouble with that it that we can't afford to spend that much time for every single new person coming :(. The project needs to be scalable. That's why my goal is to have it all as smooth as possible and as easy to self-study and self-discover as possible - only reaching out to mentors/committers/other experienced contributors when really needed. Of course you can't do it from day 1 or even day 5, but my hope is that if we relentlessly work on it and improve it and also get a lot of feedback from people like you - we can approach this "ideal" where everything is self-explanatory and easy to follow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-576475634 > If it makes you feel better, just chock it up to me being an idiot and I promise I’ll never bother this community again. First of all it's not a matter of making me feel better at all. I certainly don't think of you as an idiot, and I would never, ever call anyone idiot, just because that person has no time, or bandwidth or capacity, to read and understand the TON of documentation we have. And if anything - that's a sign that maybe we should do better. You seem to be competent and you know how to program and fix things on your own and do your best to do it - I certainly cannot blame you that you do not know our complex setup. And Airflow IS a complex project. It has a LOT of moving parts and my goal is to make it approachable to even one-time contributors and my ultimate goal is to make it as easy and painless as possible - while keeping the high bar on quality, test automation, check automation etc. I am just curious what are the obstacles to that - and whether we could make it somewhat easier to someone like you who just comes to the project from outside for a short time. That's why I am looking for some suggestions/areas of improvements. I know the project too well myself to see the obstacles - there is even a term for that "Expert Blindness" - http://mnav.com/expert-blindness/ that's why I am looking for help and suggestions of people like you. To have the project much easier approachable is my goal from the very beginning of the project and - believe me - a year ago it took literally days to set it all up so that you could run tests. Now (if we do not have sudden breakage of master by 3rd party dependencies) you can have it up and running in 10 minutes (with fast network). I think so far - by looking of what you've done, I will make a few improvements: - I am going to make "Breeze" THE default dev environment (we have it much lighter as of a few weeks) and if you had it installed (by simply running ./breeze command), then pre-commit would just work out-of-the box or (and?) - we can direct new contributors to simply follow the CI messages - they are descriptive enough and should be good to start I already added a few improvements to Contributing and we even have this PR in progress to explain how communication is expected to look like : https://github.com/apache/airflow/pull/7204 - I will be happy if you add your comments there. WDYT? Anything else you think we can improve to make it more approachable? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-576470784 > The linting appears to be all set, but since Singularity is installed, the tests are obviously going to fail. Let me know how you would like to proceed - it's a non trivial thing to get Singularity installed from within Travis. I don't think we have enough singularity experience to add it. We approached in various ways for other integrations: - some of the integrations (mongo etc.) are installed as docker-compose images (for example here: https://github.com/apache/airflow/blob/master/scripts/ci/docker-compose/integration-mongo.yml) - some of them (Kubernetes) are installed via kind (Kubernetes-in-Docker) - here: https://github.com/apache/airflow/blob/master/scripts/ci/in_container/kubernetes/setup_kind_cluster.sh - some of them we install inside the docker container via regular Dockerfile image build instructions (minicluster/hive/hadoop) - https://github.com/apache/airflow/blob/3730c24c41470cd331c5109539ee2fa0c9f4e74a/Dockerfile#L152 https://github.com/apache/airflow/blob/master/Dockerfile#L185 I do not know if/whether any of such options is possible for Singularity - please take a look and see if this is possible to use either of the options above? If we find integration too difficult - In a number cases we rely on "unit tests" where we mock the external systems and mark the tests as skipped - we have a custom marker `pytest.mark.integration` that does it: https://github.com/apache/airflow/blob/master/TESTING.rst#running-integration-tests . If you have `pytest.mark.integration("singularity")` as decorator, the test will be skipped unless you have INTEGRATION_SINGULARITY env variable defined. You can leave such tests in the code and manually test it when you have singularity (but it's best if we can figure out a way to add it to the docker-compose we are already running). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow
potiuk commented on issue #7191: [AIRFLOW-4030] second attempt to add singularity to airflow URL: https://github.com/apache/airflow/pull/7191#issuecomment-576467652 > Also, could you give feedback on: > > > Title Validator — Wrong commit title: adding singularity operator and tests > > What is wrong about it? I think it's quite clearly described - it's about the "commit title". As mentioned in the PR description: > Commit message/PR title starts with [AIRFLOW-]. AIRFLOW- = JIRA ID* Commit message should start with [AIRFLOW-] similarly as PR description. You can run `git commit --amend` to change it and then `git push --force` to push it. It's also quite clearly described in the "Pull Request guidelines": https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines with example: > Preface your commit's subject & PR title with [AIRFLOW-XXX] COMMIT_MSG where XXX is the JIRA number. For example: [AIRFLOW-5574] Fix Google Analytics script loading. I have a question @vsoch - it seems that for some reason it is quite difficult to follow (for you at least). We have not only good and comprehensive documentation but also we have spend quite a lot of time to have messages printed either by the pre-commit system, or docs and require additional explanation (mostly repeating what is in the documentation already) - and pretty much careful reading the error messages could (I think) prevent asking extra questions - that was at least the goal. The messages are designed to guide you and In my answers I mostly repeat the messages already written there when you ask the question and link to already existing documentation. I am really curious why it happens. Maybe we can improve it somehow - is the documentation not written clearly enough? Is there any problem with surfacing the error messages ? Or maybe there is a problem with finding relevant information? Or maybe you need more time to get familiar with the tools we are using ? We tried (with professional technical writer) to place the documentation exactly where it is supposed to be, plus we push it exactly at the place where problems happen (for example in pre-commit checks) and try to make it as clear as possible - but apparently you still have problems with finding it. I am really curious - would it be possible for you to explain how are you approaching it, how are you reading (or maybe not?) the messages you get and how you are understanding (or not) what we try to communicate. As a novice user you are the best "test" for it - maybe also you can have some ideas, proposals or maybe even a Pull Request that can improve it and make the documentation/messages better? Looking forward to more insight. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services