Can you attach the Maven build log? It should be at logs/mvn/mvn.log. I'm pretty sure that this *should* be downloaded from Maven Central (i.e. here: https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-clean-plugin/2.5), but I'm not sure why Maven is trying to download it from the Cloudera repository instead. It looks like it's a transitive dependency of some other projects we depend on.
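In case it's useful while you dig out the log, here is a minimal sketch for pulling the relevant lines out of it and for checking whether a repository actually serves the POM. The function names are made up for illustration and are not part of Impala's build scripts, and the Maven Central URL in the usage comment assumes the standard repo1.maven.org layout.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not Impala tooling.

# Print every line of a Maven log that mentions the given artifact,
# prefixed with its line number, so the repository URL Maven tried is visible.
find_artifact_lines() {
  log_file="$1"   # e.g. logs/mvn/mvn.log
  artifact="$2"   # e.g. maven-clean-plugin
  grep -n -F -- "$artifact" "$log_file"
}

# Print the HTTP status code a repository returns for a URL
# (200 means the file is there, 404 means it is not).
http_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Example usage (nothing below runs automatically):
#   find_artifact_lines logs/mvn/mvn.log maven-clean-plugin
#   http_status https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
```

Running http_status against both the URL from the log and the Maven Central URL should show which repository is actually missing the file.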
Unfortunately Maven is quite opaque and can be non-deterministic in what it picks to download. We have done a lot of cleanup since the 3.3 release, so maybe something fixed that problem. One of the bigger issues we had in the past was that some repositories had conflicting snapshot versions of different dependencies.

I'm building the 3.3.0 branch locally now to see if I run into the same issue. It's busy downloading the internet as we speak.

Things that have resolved similar issues in the past:

- Upgrading to a newer Maven version.
- Deleting ~/.m2/repository (the local Maven cache). Unfortunately this forces it to re-download everything, which can take a while.
- Praying to the Maven gods.

- Tim

On Fri, Apr 17, 2020 at 2:40 PM ravi kanth <ravi....@gmail.com> wrote:

> Hi Tim,
>
> I configured all the dependencies and tried building buildall.sh with
> -release flag. However, maven build got stuck & failed downloading:
> https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
>
> I looked up this pom and got a File Not Found response.
>
> Thanks,
> Rav
>
>
> On Mon, Apr 13, 2020 at 11:01 AM Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
>> For those following along, I created a code review to improve the README
>> a bit: https://gerrit.cloudera.org/#/c/15719/
>>
>> Thanks Ravi for asking these questions, it helps us make the project
>> better.
>>
>> On Mon, Apr 6, 2020 at 9:00 PM ravi kanth <ravi....@gmail.com> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for taking the time and explaining everything in detail. I will
>>> invest more time in building this cluster & will reach out to the
>>> community if I face any issues.
>>>
>>> Thanks,
>>> Rav
>>>
>>>
>>> On Mon, Apr 6, 2020 at 9:45 AM Tim Armstrong <tarmstr...@cloudera.com>
>>> wrote:
>>>
>>>> > I had the following already set up and working as they were mentioned
>>>> mandatory for building impala from GitHub (The components needed to
>>>> build Impala are Apache Hadoop, Hive, HBase, and Sentry)
>>>> We should probably remove some of that stuff from the README on github,
>>>> it's mainly confusing - the real dev docs are on apache wiki and the real
>>>> user docs are elsewhere. Those are just some notes about how the
>>>> development environment works that are not of general interest.
>>>>
>>>>
>>>> > 1. Is there a well-written documentation on how to build the source
>>>> code from scratch for multi-node environments?.
>>>> The build scripts are all the same - the impalad, statestored, catalogd
>>>> binaries used in the dev environment are deployable in production setups.
>>>> For a production deployment you want a release build (pass in the -release
>>>> flag to buildall.sh).
>>>>
>>>> On Mon, Apr 6, 2020 at 9:40 AM Tim Armstrong <tarmstr...@cloudera.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 2, 2020 at 6:08 PM ravi kanth <ravi....@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Hoping you all are staying safe in these tough times. And I am
>>>>>> utilizing this time to learn about Impala. :)
>>>>>>
>>>>>> I want to do the following:
>>>>>>
>>>>>> 1. Setup Impala on 5 nodes (1 master + 4 data)
>>>>>> 2. I don't want to use prepackaged Impala from 3rd party
>>>>>> vendors, instead, I strictly wanted to do from scratch.
>>>>>>
>>>>>> This is what I did:
>>>>>> 1. Downloaded the latest Release-3.3.0 available at
>>>>>> https://impala.apache.org/downloads.html
>>>>>> 2. Observed that the download is a source project and not the
>>>>>> binary. Which means I need to build the source and generate the binaries.
>>>>>> 3.
>>>>>> So, digging deeper & reading through the following docs I
>>>>>> understand that it's not straightforward to bring up an impala cluster;
>>>>>> instead there is a lot of pre-setup that needs to be done.
>>>>>>
>>>>>> https://github.com/apache/impala
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Build+Prerequisites
>>>>>>
>>>>>> I had the following already set up and working as they were mentioned
>>>>>> mandatory for building impala from GitHub (The components needed to
>>>>>> build Impala are Apache Hadoop, Hive, HBase, and Sentry)
>>>>>> 1. Hadoop
>>>>>> 2. Hive
>>>>>> 3. Sentry
>>>>>> Also, installed and configured but haven't brought up the service for
>>>>>> HBase. (I don't understand why this was needed in the first place but still
>>>>>> installed & configured it to make Impala building happy :))
>>>>>>
>>>>>> Questions:
>>>>>> 1. Is there a well-written documentation on how to build the source
>>>>>> code from scratch for multi-node environments?.
>>>>>>
>>>>>> I understand
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
>>>>>> deals with building; however, it clearly mentions that it's for development
>>>>>> purposes. Also, the starting line in the document "*This page
>>>>>> describes how to build Impala from source and how to configure and run
>>>>>> Impala in a single node development environment.*" says it's intended
>>>>>> for single-node development.
>>>>>>
>>>>>
>>>>>> Also, the comments on this page don't sound positive which makes me
>>>>>> wonder if they really work. However, it was last updated in Oct, 2019
>>>>>> which is good.
>>>>>>
>>>>> The comment is just that
>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>>>>> is the recommended approach, which is less manual. I don't see any comments
>>>>> saying that it doesn't work.
>>>>> AFAIK the page you linked still works.
>>>>>
>>>>> I'd suggest starting with the front page of the wiki if you want
>>>>> developer docs, it's easier finding the most relevant stuff if you start
>>>>> there: https://cwiki.apache.org/confluence/display/IMPALA/Impala+Home
>>>>>
>>>>>
>>>>>> 2. The same build page from the previous question also mentions that
>>>>>> the source code is compatible with CentOS 7. However, if you look at
>>>>>> bin/bootstrap_build.sh, it's all hardcoded to Ubuntu (also mentioned in the
>>>>>> comments). So, it seems like I have to do some changes to the scripts to
>>>>>> make it compatible with CentOS. Please correct me if I am wrong and if
>>>>>> there is anything readily available. Unfortunately, I couldn't locate
>>>>>> any.
>>>>>>
>>>>> bootstrap_development.sh supports CentOS. bootstrap_build.sh is not
>>>>> really used much, only in a Jenkins job AFAIK.
>>>>>
>>>>>>
>>>>>> 3. In the same build page, it was mentioned
>>>>>> *Installing and Configuring Impala (Obsolete)*
>>>>>> If it's obsolete, where can I find the latest installation &
>>>>>> configuration document?
>>>>>>
>>>>> The wiki is mostly developer documentation, user-facing documentation
>>>>> is here: https://impala.apache.org/docs/build/html/index.html.
>>>>>
>>>>> It does have some info about how you might run the different services,
>>>>> but as of right now the Apache Impala project doesn't provide a multi-node
>>>>> cluster management solution. Users that I know of tend to either use their
>>>>> own scripts, use docker containers, or use Cloudera Manager. The hardest
>>>>> part is wiring it up to other services - you need the various hive/hadoop
>>>>> configurations so that Impala can connect to the various storage and
>>>>> metadata services.
>>>>> At the moment we're in a similar position to, say, the
>>>>> core linux kernel project, where Apache Impala as a project has been
>>>>> focused on the core technology and not so much on packaging, distribution,
>>>>> orchestration, etc - that's been left to others, similar to the
>>>>> relationship between the linux kernel and Red Hat, Debian, Ubuntu, etc. I
>>>>> think we'd all like to make it more accessible, especially for people
>>>>> wanting to try it out, because the project website is obviously the first
>>>>> place people will come and look.
>>>>>
>>>>>
>>>>>> 4.
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Build+Prerequisites
>>>>>> mentions setting up PostgreSQL to bring up Impala. I am aware
>>>>>> that Impala needs Hive Metastore for metadata management, which in my case
>>>>>> is pointing to MySQL. So, do I still need Postgres?
>>>>>>
>>>>> Those instructions are for setting up a development environment. The
>>>>> development environment includes its own versions of all dependencies
>>>>> including HMS and will set them all up pointing at the postgres instance.
>>>>> If you want to point it at your own installation of HMS, etc, then it
>>>>> doesn't really apply.
>>>>>
>>>>>
>>>>>>
>>>>>> So, to bring up Impala it looks like we need a ton of other
>>>>>> databases/technologies.
>>>>>>
>>>>> Yeah, that's the nature of the big data ecosystem. There's good and
>>>>> bad about it. Impala is focused on being a great query engine for data
>>>>> stored in a bunch of different formats - the good is that we can focus on
>>>>> that one problem, the bad is that it's not self-contained.
>>>>>
>>>>>
>>>>>> In short, I heard great things about Impala for its efficient analytical
>>>>>> query processing based on Parquet and I am eagerly waiting to play with
>>>>>> it.
>>>>>> However, the documentation is creating a lot of pain and is at times
>>>>>> disappointing. Sorry about that.
>>>>>>
>>>>> If you want to kick the tires on a single node setup, the Apache Kudu
>>>>> team put together this docker-based quickstart:
>>>>> https://github.com/apache/kudu/blob/master/examples/quickstart/impala/README.adoc.
>>>>> It's not suitable for production deployments but it is self-contained. I
>>>>> would highly recommend this because it sounds like it's addressing the
>>>>> pain points you are hitting.
>>>>>
>>>>> The development environment you get from running
>>>>> bootstrap_development.sh also is good for playing around on a single node,
>>>>> but takes longer and has more potential to hit snags because it's building
>>>>> from scratch:
>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>>>>>
>>>>>
>>>>>>
>>>>>> Hoping to hear from some brilliant minds.
>>>>>>
>>>>>> Thanks,
>>>>>> Rav
>>>>>>
>>>>>