I was able to build OK on my system. I think the download must be failing
when your build runs "mvn clean"; on my system that step works fine and
downloads the plugin OK:

tarmstrong@tarmstrong-box2:~/Impala/impala/fe$ mvn clean
[INFO] Scanning for projects...
[INFO]
[INFO] -----------------< org.apache.impala:impala-frontend >------------------
[INFO] Building Apache Impala Query Engine Frontend 0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from cloudera.thirdparty.repo: https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom (3.9 kB at 65 kB/s)
Downloading from cloudera.thirdparty.repo: https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.jar
Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.jar
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.jar (25 kB at 616 kB/s)
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ impala-frontend ---
[INFO] Deleting /home/tarmstrong/Impala/impala/fe/target
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.548 s
[INFO] Finished at: 2020-04-20T16:23:48-07:00
[INFO] ------------------------------------------------------------------------
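One way to compare a failing build with the successful run above is to scan logs/mvn/mvn.log for artifacts that Maven started downloading but never reported as downloaded from any repository. This Python sketch assumes each "Downloading from <repo>: <url>" and "Downloaded from <repo>: <url>" entry sits on a single line, as it does in an unwrapped Maven log. The function name `unresolved_artifacts` is hypothetical and not part of Impala's tooling.

```python
import re
from typing import Dict, Iterable, List, Set

# Matches transfer lines such as (an optional "[INFO] " prefix is tolerated):
#   Downloading from central: https://repo.maven.apache.org/maven2/.../x-1.0.pom
#   Downloaded from central: https://repo.maven.apache.org/maven2/.../x-1.0.pom (3.9 kB at 65 kB/s)
_TRANSFER = re.compile(r"^(?:\[INFO\]\s+)?(Downloading|Downloaded) from ([^:\s]+): (\S+)")


def _artifact_key(url: str) -> str:
    # Repositories have different base URLs, so identify an artifact by its
    # last three path segments: <artifactId>/<version>/<file name>.
    return "/".join(url.rstrip("/").split("/")[-3:])


def unresolved_artifacts(lines: Iterable[str]) -> List[str]:
    """Return artifacts that were requested but never reported as downloaded."""
    requested: Dict[str, List[str]] = {}  # artifact key -> repository ids tried
    downloaded: Set[str] = set()
    for line in lines:
        m = _TRANSFER.match(line.strip())
        if not m:
            continue
        verb, repo, url = m.groups()
        key = _artifact_key(url)
        if verb == "Downloading":
            requested.setdefault(key, []).append(repo)
        else:
            downloaded.add(key)
    return [
        f"{key} (tried: {', '.join(repos)})"
        for key, repos in requested.items()
        if key not in downloaded
    ]
```

For example, `with open("logs/mvn/mvn.log") as f: print(unresolved_artifacts(f))` would list artifacts that no configured repository served. A request that only the Cloudera third-party repository missed is not reported, because a later repository served the same file.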

On Mon, Apr 20, 2020 at 4:05 PM Tim Armstrong <tarmstr...@cloudera.com> wrote:

> Can you attach the maven build log? It should be at logs/mvn/mvn.log.
>
> I'm pretty sure that this *should* be downloaded from Maven Central (see
> https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-clean-plugin/2.5),
> but I'm not sure why Maven is trying to download it from the Cloudera
> third-party repository. It looks like it's a transitive dependency of some
> other projects we depend on.
>
> Unfortunately, Maven is quite opaque and can be non-deterministic in what
> it picks to download. We did a lot of cleanup after the 3.3 release,
> so maybe something fixed that problem. One of the bigger issues we had in
> the past was that some repositories had conflicting snapshot versions of
> different dependencies.
>
> I'm building the 3.3.0 branch locally now and seeing if I run into the
> same issue. It's busy downloading the internet as we speak.
>
> Things that have resolved similar issues in the past:
>
>    - Upgrading to a newer maven version
>    - Deleting ~/.m2/repository (the local maven cache). Unfortunately
>    this forces it to re-download everything, which can take a while.
>    - Praying to the maven gods.
>
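A narrower alternative to deleting all of ~/.m2/repository is to find the `*.lastUpdated` marker files that Maven leaves behind after a failed download, and to remove a single artifact version from the cache. While those markers are present, Maven may not retry the remote repository until its update interval has passed. Running `mvn -U` forces a check for missing releases. This Python sketch assumes the default local repository location and Maven's standard directory layout. The helper names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def local_artifact_dir(repo_root: Path, group_id: str, artifact_id: str, version: str) -> Path:
    # Local layout: <root>/<groupId with dots as directories>/<artifactId>/<version>/
    return repo_root.joinpath(*group_id.split("."), artifact_id, version)


def find_failed_markers(repo_root: Path) -> List[Path]:
    # "<file>.lastUpdated" files record download attempts that did not succeed.
    return sorted(repo_root.rglob("*.lastUpdated"))


def purge_artifact(repo_root: Path, group_id: str, artifact_id: str, version: str) -> bool:
    """Delete one artifact version from the local cache; return True if it existed."""
    target = local_artifact_dir(repo_root, group_id, artifact_id, version)
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

For example, `purge_artifact(Path.home() / ".m2" / "repository", "org.apache.maven.plugins", "maven-clean-plugin", "2.5")` would drop only the plugin from this thread. The next build then re-downloads that one artifact instead of everything.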
> - Tim
>
> On Fri, Apr 17, 2020 at 2:40 PM ravi kanth <ravi....@gmail.com> wrote:
>
>> Hi Tim,
>>
>> I configured all the dependencies and tried running buildall.sh with the
>> -release flag. However, the Maven build got stuck and failed downloading:
>> https://repository.cloudera.com/content/repositories/third-party/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
>>
>> I looked up this pom and got a File Not Found response.
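The URL Maven requests follows directly from the artifact's coordinates, and the same relative path is tried against each configured repository in turn. This Python sketch rebuilds both URLs for the plugin in question. The two base URLs are copied from the build log at the top of this thread, and the helper names are hypothetical. In that successful run, the Cloudera third-party request is followed by a download from central. That suggests a missing file at the first URL does not by itself explain the failure.

```python
from typing import Dict, List

# Repository ids and base URLs as they appear in the build log in this thread.
# The order mirrors that log; the real order depends on the POMs and settings.xml.
REPOSITORIES: Dict[str, str] = {
    "cloudera.thirdparty.repo": "https://repository.cloudera.com/content/repositories/third-party",
    "central": "https://repo.maven.apache.org/maven2",
}


def artifact_path(group_id: str, artifact_id: str, version: str, ext: str = "jar") -> str:
    # Maven 2 layout: groupId dots become directories, then artifactId/version/file.
    group_path = group_id.replace(".", "/")
    return f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


def candidate_urls(group_id: str, artifact_id: str, version: str, ext: str = "jar") -> List[str]:
    # One URL per repository, all sharing the same relative path.
    path = artifact_path(group_id, artifact_id, version, ext)
    return [f"{base}/{path}" for base in REPOSITORIES.values()]
```

To see which repository actually serves the file, check each URL by hand, for example with `curl -I <url>`.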
>>
>> Thanks,
>> Rav
>>
>>
>> On Mon, Apr 13, 2020 at 11:01 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>
>>> For those following along, I created a code review to improve the README
>>> a bit: https://gerrit.cloudera.org/#/c/15719/
>>>
>>> Thanks Ravi for asking these questions, it helps us make the project
>>> better.
>>>
>>> On Mon, Apr 6, 2020 at 9:00 PM ravi kanth <ravi....@gmail.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> Thanks for taking the time to explain everything in detail. I will
>>>> invest more time in building this cluster and will reach out to the
>>>> community if I face any issues.
>>>>
>>>> Thanks,
>>>> Rav
>>>>
>>>>
>>>> On Mon, Apr 6, 2020 at 9:45 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>>>
>>>>> > I had the following already set up and working as they were
>>>>> > mentioned as mandatory for building Impala from GitHub (The components
>>>>> > needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry)
>>>>>
>>>>> We should probably remove some of that stuff from the README on
>>>>> GitHub; it's mostly confusing. The real dev docs are on the Apache wiki and
>>>>> the real user docs are elsewhere. Those are just some notes about how the
>>>>> development environment works that are not of general interest.
>>>>>
>>>>>
>>>>> > 1. Is there well-written documentation on how to build the source
>>>>> > code from scratch for multi-node environments?
>>>>>
>>>>> The build scripts are all the same - the impalad, statestored,
>>>>> catalogd binaries used in the dev environment are deployable in production
>>>>> setups. For a production deployment you want a release build (pass in the
>>>>> -release flag to buildall.sh).
>>>>>
>>>>> On Mon, Apr 6, 2020 at 9:40 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 2, 2020 at 6:08 PM ravi kanth <ravi....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I hope you are all staying safe in these tough times. I am
>>>>>>> using this time to learn about Impala. :)
>>>>>>>
>>>>>>> I want to do the following:
>>>>>>>
>>>>>>> 1. Set up Impala on 5 nodes (1 master + 4 data)
>>>>>>> 2. I don't want to use a prepackaged Impala from third-party
>>>>>>> vendors; instead, I strictly want to build it from scratch.
>>>>>>>
>>>>>>> This is what I did:
>>>>>>> 1. Downloaded the latest release, 3.3.0, available at
>>>>>>> https://impala.apache.org/downloads.html
>>>>>>> 2. Observed that the download is a source release and not a
>>>>>>> binary, which means I need to build the source to generate the
>>>>>>> binaries.
>>>>>>> 3. So, digging deeper and reading through the following docs, I
>>>>>>> understand that it's not straightforward to bring up an Impala cluster;
>>>>>>> instead, there is a lot of setup that needs to be done first.
>>>>>>>
>>>>>>> https://github.com/apache/impala
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Build+Prerequisites
>>>>>>>
>>>>>>> I already had the following set up and working, since they are
>>>>>>> mentioned as mandatory for building Impala from GitHub ("The components
>>>>>>> needed to build Impala are Apache Hadoop, Hive, HBase, and Sentry"):
>>>>>>> 1. Hadoop
>>>>>>> 2. Hive
>>>>>>> 3. Sentry
>>>>>>> I also installed and configured HBase but haven't brought up the
>>>>>>> service. (I don't understand why it was needed in the first place, but I
>>>>>>> still installed and configured it to keep the Impala build happy :))
>>>>>>> Questions:
>>>>>>> 1. Is there well-written documentation on how to build the source
>>>>>>> code from scratch for multi-node environments?
>>>>>>>
>>>>>>> I understand that
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
>>>>>>> covers building; however, it clearly mentions that it's for development
>>>>>>> purposes. Also, the opening line of the document, "*This page
>>>>>>> describes how to build Impala from source and how to configure and run
>>>>>>> Impala in a single node development environment.*", says it's
>>>>>>> intended for single-node development.
>>>>>>>
>>>>>>
>>>>>>> Also, the comments on this page don't sound positive, which makes me
>>>>>>> wonder whether the instructions really work. However, it was last
>>>>>>> updated in Oct 2019, which is good.
>>>>>>>
>>>>>> The comment just says that
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>>>>>> is the recommended approach, since it's less manual. I don't see any
>>>>>> comments saying that it doesn't work. AFAIK the page you linked still works.
>>>>>>
>>>>>> I'd suggest starting with the front page of the wiki if you want
>>>>>> developer docs; it's easier to find the most relevant material if you start
>>>>>> there: https://cwiki.apache.org/confluence/display/IMPALA/Impala+Home
>>>>>>
>>>>>>
>>>>>>> 2. The same build page from the previous question also mentions that
>>>>>>> the source code is compatible with CentOS 7. However, if you look at
>>>>>>> bin/bootstrap_build.sh, it's all hardcoded for Ubuntu (as the comments
>>>>>>> also mention). So it seems like I have to make some changes to the
>>>>>>> scripts to make them compatible with CentOS. Please correct me if I am
>>>>>>> wrong, and let me know if there is anything readily available;
>>>>>>> unfortunately, I couldn't locate anything.
>>>>>>>
>>>>>> bootstrap_development.sh supports CentOS. bootstrap_build.sh is not
>>>>>> really used much, only in a Jenkins job AFAIK.
>>>>>>
>>>>>>>
>>>>>>> 3. In the same build page, it was mentioned
>>>>>>> *Installing and Configuring Impala (Obsolete)*
>>>>>>> If it's obsolete, where can I find the latest installation and
>>>>>>> configuration documentation?
>>>>>>>
>>>>>> The wiki is mostly developer documentation, user-facing documentation
>>>>>> is here: https://impala.apache.org/docs/build/html/index.html.
>>>>>>
>>>>>> It does have some info about how you might run the different
>>>>>> services, but as of right now the Apache Impala project doesn't provide a
>>>>>> multi-node cluster management solution. Users that I know of tend to
>>>>>> either use their own scripts, use Docker containers, or use Cloudera
>>>>>> Manager. The hardest part is wiring it up to other services - you need the
>>>>>> various Hive/Hadoop configurations so that Impala can connect to the
>>>>>> various storage and metadata services. At the moment we're in a similar
>>>>>> position to, say, the core Linux kernel project, where Apache Impala as a
>>>>>> project has been focused on the core technology and not so much on
>>>>>> packaging, distribution, orchestration, etc. That's been left to others,
>>>>>> similar to the relationship between the Linux kernel and Red Hat, Debian,
>>>>>> Ubuntu, etc. I think we'd all like to make it more accessible, especially
>>>>>> for people wanting to try it out, because the project website is obviously
>>>>>> the first place people will come and look.
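As an illustration of that wiring: the Hadoop-style XML files involved are plain `<configuration>` lists of name/value properties. This Python sketch renders a minimal core-site.xml and hive-site.xml that point at an existing NameNode and Hive Metastore. `fs.defaultFS` and `hive.metastore.uris` are standard Hadoop/Hive property names, but the host names and ports are assumptions. A real deployment typically needs more properties, plus hdfs-site.xml. Because Impala reaches the metastore over Thrift via `hive.metastore.uris`, the database behind the metastore (MySQL in Ravi's case) does not appear here.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_config_xml(props: Dict[str, str]) -> str:
    """Render properties in the <configuration><property>... format used by
    core-site.xml, hdfs-site.xml and hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical hosts: an existing HDFS NameNode and a remote Hive Metastore.
CORE_SITE = {"fs.defaultFS": "hdfs://namenode.example.com:8020"}
HIVE_SITE = {"hive.metastore.uris": "thrift://metastore.example.com:9083"}
```

To write the files out, use `pathlib.Path("hive-site.xml").write_text(hadoop_config_xml(HIVE_SITE))`. Where those files must live depends on how the daemons are launched.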
>>>>>>
>>>>>>
>>>>>>> 4.
>>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Impala+Build+Prerequisites
>>>>>>> mentions setting up PostgreSQL to bring up Impala. I am aware
>>>>>>> that Impala needs the Hive Metastore for metadata management, which in my
>>>>>>> case points to MySQL. So, do I still need Postgres?
>>>>>>>
>>>>>> Those instructions are for setting up a development environment. The
>>>>>> development environment includes its own versions of all dependencies
>>>>>> including HMS and will set them all up pointing at the postgres instance.
>>>>>> If you want to point it at your own installation of HMS, etc., then it
>>>>>> doesn't really apply.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> So, to bring up Impala it looks like we need a ton of other
>>>>>>> databases/technologies.
>>>>>>>
>>>>>> Yeah, that's the nature of the big data ecosystem. There's good and
>>>>>> bad about it. Impala is focused on being a great query engine for data
>>>>>> stored in a bunch of different formats - the good is that we can focus on
>>>>>> that one problem, the bad is that it's not self-contained.
>>>>>>
>>>>>>
>>>>>>> In short, I have heard great things about Impala's efficient analytical
>>>>>>> query processing on Parquet, and I am eager to play with it.
>>>>>>> However, the documentation is causing a lot of pain and is at times
>>>>>>> disappointing. Sorry about that.
>>>>>>>
>>>>>> If you want to kick the tires on a single-node setup, the Apache Kudu
>>>>>> team put together this Docker-based quickstart:
>>>>>> https://github.com/apache/kudu/blob/master/examples/quickstart/impala/README.adoc.
>>>>>> It's not suitable for production deployments, but it is self-contained. I
>>>>>> would highly recommend it because it sounds like it addresses the pain
>>>>>> points you are hitting.
>>>>>>
>>>>>> The development environment you get from running
>>>>>> bootstrap_development.sh is also good for playing around on a single
>>>>>> node, but it takes longer and is more likely to hit snags because it
>>>>>> builds everything from scratch:
>>>>>> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Hoping to hear from some brilliant minds.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rav
>>>>>>>
>>>>>>
