I was able to build OK on my system. I think the download must be failing when it runs "mvn clean"; that step works fine on my system (it downloads that plugin OK).
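The URLs in the log that follows are not arbitrary. Maven builds them from the plugin's coordinates using its standard repository layout and tries each configured repository in turn, which is why the cloudera.thirdparty.repo attempt comes before the successful download from central. The same relative path also locates the cached copy under ~/.m2/repository, so a narrower alternative to deleting the whole local cache is to remove just that one artifact version. The bash sketch below is only an illustration. The function names and the LOCAL_REPO override are hypothetical, and it assumes the default local repository location (no custom localRepository in settings.xml).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Maven or Impala) illustrating Maven's
# default repository layout, as used by both repositories in this log.

# Relative path of one artifact file:
#   groupId with dots turned into slashes / artifactId / version /
#   artifactId-version.extension
artifact_path() {
  local group="$1" artifact="$2" version="$3" ext="$4"
  printf '%s/%s/%s/%s-%s.%s\n' \
    "${group//.//}" "$artifact" "$version" "$artifact" "$version" "$ext"
}

# Full download URL: repository base URL (trailing slash optional) + relative path.
artifact_url() {
  local base="${1%/}"
  shift
  printf '%s/%s\n' "$base" "$(artifact_path "$@")"
}

# Delete one cached artifact version instead of all of ~/.m2/repository.
# LOCAL_REPO is a hypothetical override; by default the standard location is used.
purge_local_artifact() {
  local repo="${LOCAL_REPO:-$HOME/.m2/repository}"
  local dir="$repo/${1//.//}/$2/$3"
  if [ -d "$dir" ]; then
    rm -rf -- "$dir"
    echo "removed: $dir"
  else
    echo "not cached: $dir"
  fi
}
```

For example, `artifact_url https://repository.cloudera.com/content/repositories/third-party org.apache.maven.plugins maven-clean-plugin 2.5 pom` reproduces the URL Ravi reported as File Not Found, and `purge_local_artifact org.apache.maven.plugins maven-clean-plugin 2.5` removes only that plugin version, so the next `mvn clean` has to download it again.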
tarmstrong@tarmstrong-box2:~/Impala/impala/fe$ mvn clean
[INFO] Scanning for projects...
[INFO]
[INFO] -----------------< org.apache.impala:impala-frontend >------------------
[INFO] Building Apache Impala Query Engine Frontend 0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from cloudera.thirdparty.repo: https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom (3.9 kB at 65 kB/s)
Downloading from cloudera.thirdparty.repo: https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.jar
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.jar (25 kB at 616 kB/s)
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ impala-frontend ---
[INFO] Deleting /home/tarmstrong/Impala/impala/fe/target
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.548 s
[INFO] Finished at: 2020-04-20T16:23:48-07:00
[INFO] ------------------------------------------------------------------------

On Mon, Apr 20, 2020 at 4:05 PM Tim Armstrong <tarmstr...@cloudera.com> wrote:

> Can you attach the maven build log? It should be at logs/mvn/mvn.log.
>
> I'm pretty sure that this *should* be downloaded from maven central (i.e. here: https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-clean-plugin/2.5), but I'm not sure why maven is trying to download it from the Cloudera repository. It looks like it's a transitive dependency of some other projects we depend on.
>
> Unfortunately maven is quite opaque and can be non-deterministic in what it picks to download. We did a lot of cleanup after the 3.3 release, so maybe something fixed that problem. One of the bigger issues we had in the past was that some repositories had conflicting snapshot versions of different dependencies.
>
> I'm building the 3.3.0 branch locally now to see if I run into the same issue. It's busy downloading the internet as we speak.
>
> Things that have worked in the past for similar issues and might resolve this one:
>
> - Upgrading to a newer maven version
> - Deleting ~/.m2/repository (the local maven cache). Unfortunately this forces it to re-download everything, which can take a while.
> - Praying to the maven gods.
>
> - Tim
>
> On Fri, Apr 17, 2020 at 2:40 PM ravi kanth <ravi....@gmail.com> wrote:
>
>> Hi Tim,
>>
>> I configured all the dependencies and tried running buildall.sh with the -release flag. However, the maven build got stuck and failed downloading:
>> https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
>>
>> I looked up this pom and got a File Not Found response.
>>
>> Thanks,
>> Rav
>>
>>
>> On Mon, Apr 13, 2020 at 11:01 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>
>>> For those following along, I created a code review to improve the README a bit: https://gerrit.cloudera.org/#/c/15719/
>>>
>>> Thanks Ravi for asking these questions; it helps us make the project better.
>>>
>>> On Mon, Apr 6, 2020 at 9:00 PM ravi kanth <ravi....@gmail.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Thanks for taking the time to explain everything in detail. I will invest more time in building this cluster and will reach out to the community if I face any issues.
>>>>
>>>> Thanks,
>>>> Rav
>>>>
>>>>
>>>> On Mon, Apr 6, 2020 at 9:45 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>>>
>>>>> > I had the following already set up and working, as they were mentioned as mandatory for building Impala from GitHub (The components needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry)
>>>>>
>>>>> We should probably remove some of that stuff from the README on GitHub; it's mainly confusing. The real dev docs are on the Apache wiki and the real user docs are elsewhere. Those are just some notes about how the development environment works that are not of general interest.
>>>>>
>>>>>
>>>>> > 1. Is there well-written documentation on how to build the source code from scratch for multi-node environments?
>>>>>
>>>>> The build scripts are all the same: the impalad, statestored, and catalogd binaries used in the dev environment are deployable in production setups. For a production deployment you want a release build (pass the -release flag to buildall.sh).
>>>>>
>>>>> On Mon, Apr 6, 2020 at 9:40 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>>>>
>>>>>>
>>>>>> On Thu, Apr 2, 2020 at 6:08 PM ravi kanth <ravi....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Hoping you are all staying safe in these tough times. I am using this time to learn about Impala. :)
>>>>>>>
>>>>>>> I want to do the following:
>>>>>>>
>>>>>>> 1. Set up Impala on 5 nodes (1 master + 4 data)
>>>>>>> 2. I don't want to use prepackaged Impala from 3rd-party vendors; instead, I want to do it strictly from scratch.
>>>>>>>
>>>>>>> This is what I did:
>>>>>>> 1. Downloaded the latest release (3.3.0) from https://impala.apache.org/downloads.html
>>>>>>> 2. Observed that the download is a source project and not a binary.
>>>>>>>    This means I need to build the source and generate the binaries.
>>>>>>> 3. Digging deeper and reading through the following docs, I understand that it's not straightforward to bring up an Impala cluster; a lot of pre-setup needs to be done first.
>>>>>>>
>>>>>>> https://github.com/apache/impala
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Build+Prerequisites
>>>>>>>
>>>>>>> I had the following already set up and working, as they were mentioned as mandatory for building Impala from GitHub (The components needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry):
>>>>>>> 1. Hadoop
>>>>>>> 2. Hive
>>>>>>> 3. Sentry
>>>>>>> I also installed and configured HBase but haven't brought up the service. (I don't understand why it was needed in the first place, but I installed and configured it anyway to keep the Impala build happy :))
>>>>>>>
>>>>>>> Questions:
>>>>>>> 1. Is there well-written documentation on how to build the source code from scratch for multi-node environments?
>>>>>>>
>>>>>>> I understand that https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala deals with building; however, it clearly mentions that it's for development purposes. The opening line of the document, "*This page describes how to build Impala from source and how to configure and run Impala in a single node development environment.*", says it's intended for single-node development.
>>>>>>>
>>>>>>> Also, the comments on this page don't sound positive, which makes me wonder whether the instructions really work. However, it was last updated in Oct 2019, which is good.
>>>>>>>
>>>>>> The comment is just that https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch is the recommended approach, which is less manual. I don't see any comments saying that it doesn't work. AFAIK the page you linked still works.
>>>>>>
>>>>>> I'd suggest starting with the front page of the wiki if you want developer docs; it's easier to find the most relevant material if you start there: https://cwiki.apache.org/confluence/display/IMPALA/Impala+Home
>>>>>>
>>>>>>
>>>>>>> 2. The same build page from the previous question also mentions that the source code is compatible with CentOS 7. However, if you look at bin/bootstrap_build.sh, it's all hardcoded to Ubuntu (also mentioned in the comments). So it seems like I have to make some changes to the scripts to make them compatible with CentOS. Please let me know if I am wrong and if there is anything readily available. Unfortunately, I couldn't locate any.
>>>>>>>
>>>>>> bootstrap_development.sh supports CentOS. bootstrap_build.sh is not really used much, only in a Jenkins job AFAIK.
>>>>>>
>>>>>>>
>>>>>>> 3. The same build page has a section titled
>>>>>>> *Installing and Configuring Impala (Obsolete)*
>>>>>>> If it's obsolete, where can I find the latest installation and configuration document?
>>>>>>>
>>>>>> The wiki is mostly developer documentation; user-facing documentation is here: https://impala.apache.org/docs/build/html/index.html.
>>>>>>
>>>>>> It does have some info about how you might run the different services, but as of right now the Apache Impala project doesn't provide a multi-node cluster management solution. Users that I know of tend to either use their own scripts, use docker containers, or use Cloudera Manager. The hardest part is wiring it up to other services: you need the various hive/hadoop configurations so that Impala can connect to the various storage and metadata services. At the moment we're in a similar position to, say, the core Linux kernel project, where Apache Impala as a project has been focused on the core technology and not so much on packaging, distribution, orchestration, etc. That's been left to others, similar to the relationship between the Linux kernel and Red Hat, Debian, Ubuntu, etc. I think we'd all like to make it more accessible, especially for people wanting to try it out, because the project website is obviously the first place people will come and look.
>>>>>>
>>>>>>
>>>>>>> 4. https://cwiki.apache.org/confluence/display/IMPALA/Impala+Build+Prerequisites mentions setting up PostgreSQL to bring up Impala. I am aware that Impala needs the Hive Metastore for metadata management, which in my case is pointing to MySQL. So, do I still need Postgres?
>>>>>>>
>>>>>> Those instructions are for setting up a development environment. The development environment includes its own versions of all dependencies, including HMS, and will set them all up pointing at the postgres instance. If you want to point it at your own installation of HMS, etc., then it doesn't really apply.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> So, to bring up Impala it looks like we need a ton of other databases/technologies.
>>>>>>>
>>>>>> Yeah, that's the nature of the big data ecosystem. There's good and bad about it. Impala is focused on being a great query engine for data stored in a bunch of different formats. The good is that we can focus on that one problem; the bad is that it's not self-contained.
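Tim's point about wiring Impala up to other services, together with Ravi's MySQL question, can be illustrated with a small pre-flight check. This is a hypothetical helper, not an Impala tool, and it rests on assumptions rather than on anything stated in this thread: that Impala is handed client copies of core-site.xml, hdfs-site.xml and hive-site.xml in one directory, and that it talks to an existing, remote Hive Metastore. Under that setup Impala only needs the metastore's Thrift endpoint, so whether HMS is backed by MySQL or Postgres does not appear in the check at all.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not an Impala tool). Assumes Impala is given
# client copies of these Hadoop/Hive config files and uses a remote HMS.
check_impala_client_conf() {
  local conf_dir="$1" missing=0 f
  # Client configs assumed necessary to reach storage (HDFS) and metadata (HMS).
  for f in core-site.xml hdfs-site.xml hive-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f"
      missing=1
    fi
  done
  # The metastore is reached over Thrift; hive.metastore.uris names that
  # endpoint. The database behind HMS (MySQL, Postgres, ...) is not referenced.
  # This is a plain grep, not a real XML parse.
  if [ -f "$conf_dir/hive-site.xml" ] &&
     ! grep -q 'hive\.metastore\.uris' "$conf_dir/hive-site.xml"; then
    echo "hive-site.xml does not mention hive.metastore.uris"
    missing=1
  fi
  if [ "$missing" -eq 0 ]; then
    echo "ok: $conf_dir"
  fi
  return "$missing"
}
```

Run against a hypothetical directory such as ./impala-conf, it prints "ok: ./impala-conf" only when all three files exist and hive-site.xml mentions hive.metastore.uris. The check is deliberately shallow and only catches obvious omissions; it says nothing about whether the values are correct.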
>>>>>>
>>>>>>
>>>>>>> In short, I have heard great things about Impala's efficient analytical query processing on Parquet, and I am eager to play with it. However, the documentation is causing a lot of pain and is at times disappointing. Sorry about that.
>>>>>>>
>>>>>> If you want to kick the tires on a single-node setup, the Apache Kudu team put together this docker-based quickstart: https://github.com/apache/kudu/blob/master/examples/quickstart/impala/README.adoc. It's not suitable for production deployments, but it is self-contained. I would highly recommend it because it sounds like it addresses the pain points you are hitting.
>>>>>>
>>>>>> The development environment you get from running bootstrap_development.sh is also good for playing around on a single node, but it takes longer and has more potential to hit snags because it's building from scratch: https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Hoping to hear from some brilliant minds.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rav
>>>>>>>
>>>>>>