Re: GVO request
Started a job, saw Jim's job afterwards. Will cancel mine.

On Tue, Feb 28, 2017 at 3:53 PM, Bharath Vissapragada wrote:
> Can someone GVO this please.
>
> https://gerrit.cloudera.org/#/c/5792/
Re: GVO request
done

On Tue, Feb 28, 2017 at 3:53 PM, Bharath Vissapragada wrote:
> Can someone GVO this please.
>
> https://gerrit.cloudera.org/#/c/5792/
GVO request
Can someone GVO this please. https://gerrit.cloudera.org/#/c/5792/
Re: Toolchain - versioning dependencies with the same version number
On 28 February 2017 at 12:57, Marcel Kornacker wrote:

> Yes, I too am particularly concerned about maintaining the ability to
> build offline, and downloading the same things over and over again.
>
> I don't quite understand the case against versioning - if gc'ing
> obsolete versions in order to reduce storage space is a concern, then
> it's probably fine to a) blow away and re-download everything, or b)
> throw away old versions manually, if you happen to be in a situation
> where a) isn't possible.

The issue I have with versioning is that there's no way to understand the
link between the version number, and what actually changed. It's a kludge
to deal with the fact that the toolchain can't handle this kind of
situation.

That said, my immediate goal is to make sure that everyone picks up the new
LZ4 build. So I'll add a new version for now, and we can revisit this some
other time.
Re: Toolchain - versioning dependencies with the same version number
Yes, I too am particularly concerned about maintaining the ability to
build offline, and downloading the same things over and over again.

I don't quite understand the case against versioning - if gc'ing obsolete
versions in order to reduce storage space is a concern, then it's probably
fine to a) blow away and re-download everything, or b) throw away old
versions manually, if you happen to be in a situation where a) isn't
possible.

On Tue, Feb 28, 2017 at 12:20 PM, Tim Armstrong wrote:

> I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
> bad regression in usability to make it the default and particularly
> provide no way to opt out.
>
> The toolchain is almost 1GB though, which is pretty problematic to
> download if a developer is on coffee-shop wifi, cellular wireless,
> airplane wifi, etc. It'd also be pretty easy for a developer working
> offline to switch branches, run buildall.sh, have gcc, etc,
> automatically deleted and then be stuck unable to build anything.
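Option b) above (throwing away old versions manually) could be helped by a small listing script. The following is only a sketch: it assumes a hypothetical cache layout with one sub-directory per toolchain build ID, which is not something the current toolchain scripts are known to create, and the function name is made up for illustration. It only reports candidates and deletes nothing.

```python
import os


def stale_build_dirs(cache_root, keep_build_ids):
    """List cached toolchain directories that are candidates for removal.

    Assumption: cache_root holds one sub-directory per build ID (a
    hypothetical layout). Plain files are ignored, and nothing is deleted;
    the caller decides what to remove.
    """
    stale = []
    for name in sorted(os.listdir(cache_root)):
        path = os.path.join(cache_root, name)
        if os.path.isdir(path) and name not in keep_build_ids:
            stale.append(path)
    return stale
```

Keeping the deletion step out of the helper means a developer working offline can review the list before removing anything they might still need.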
Re: Toolchain - versioning dependencies with the same version number
I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
bad regression in usability to make it the default and particularly provide
no way to opt out.

The toolchain is almost 1GB though, which is pretty problematic to download
if a developer is on coffee-shop wifi, cellular wireless, airplane wifi,
etc. It'd also be pretty easy for a developer working offline to switch
branches, run buildall.sh, have gcc, etc, automatically deleted and then be
stuck unable to build anything.

On Tue, Feb 28, 2017 at 9:07 AM, Henry Robinson wrote:

> I'd prefer not to do that because it's something of a hack and generates
> too many artifacts if we make incremental build changes, not to mention
> the extra complexity required to make such a change because new tarballs
> might need to be uploaded.
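The branch-switching problem described here is what the IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/ suggestion elsewhere in this thread is aimed at. A minimal sketch of that idea follows. The function name is hypothetical, and treating an already-set IMPALA_TOOLCHAIN_DIR as an opt-out is an assumption made for illustration, not existing behaviour.

```python
import os


def toolchain_dir(env):
    """Pick a toolchain directory that is unique per toolchain build ID.

    Assumption: an explicitly set IMPALA_TOOLCHAIN_DIR wins, so a developer
    can opt out of the per-build-ID layout. Otherwise the directory is
    derived from IMPALA_HOME and IMPALA_TOOLCHAIN_BUILD_ID.
    """
    override = env.get("IMPALA_TOOLCHAIN_DIR")
    if override:
        return override
    return os.path.join(env["IMPALA_HOME"], env["IMPALA_TOOLCHAIN_BUILD_ID"])
```

With a layout like this, switching to a branch with a different build ID selects a different directory instead of overwriting the current one, at the cost of the garbage-collection question raised in the thread.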
Re: Toolchain - versioning dependencies with the same version number
I'd prefer not to do that because it's something of a hack and generates
too many artifacts if we make incremental build changes, not to mention the
extra complexity required to make such a change because new tarballs might
need to be uploaded.

On Tue, Feb 28, 2017 at 8:55 AM Lars Volker wrote:

> Can we add another version string component like -1 or -impala1, or add a
> dummy patch to the affected packages to allow for new versions with the
> same upstream version? I think this is what Linux distributions commonly do
> to have several versions of the same upstream version.
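For reference, the suffix scheme under discussion (an extra component such as -impala1 appended to the upstream version) might look like the sketch below. The "-impala<N>" format and both function names are assumptions for illustration only; the toolchain scripts do not necessarily parse versions this way.

```python
import re

# Hypothetical naming scheme: "<upstream version>-impala<N>".
_LOCAL_REV = re.compile(r"^(?P<upstream>.+?)(?:-impala(?P<rev>\d+))?$")


def split_version(version):
    """Split a version string into (upstream_version, local_revision).

    A version without the suffix is treated as local revision 0.
    """
    m = _LOCAL_REV.match(version)
    rev = int(m.group("rev")) if m.group("rev") else 0
    return m.group("upstream"), rev


def bump_local_revision(version):
    """Return the next local revision for the same upstream version."""
    upstream, rev = split_version(version)
    return "%s-impala%d" % (upstream, rev + 1)
```

Under this scheme a rebuild of lz4-1.7.5 with new flags would be published as 1.7.5-impala1, which is exactly the extra artifact and tarball upload that the objection above is concerned about.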
Re: Toolchain - versioning dependencies with the same version number
Can we add another version string component like -1 or -impala1, or add a
dummy patch to the affected packages to allow for new versions with the
same upstream version? I think this is what Linux distributions commonly do
to have several versions of the same upstream version.

On Feb 27, 2017 21:15, "Henry Robinson" wrote:

Yes, it would force re-downloading. At my office, downloading a toolchain
takes a matter of a few seconds, so I'm not sure the cost is that great.
And if it turned out to be problematic, one could always change the
toolchain directory for different branches. Having something locally that
set IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/ would
work.

However I wouldn't want to force that behaviour into the toolchain scripts
because of the need for garbage collection it would raise - it wouldn't be
clear when to delete old toolchains programmatically.

On 27 February 2017 at 20:51, Tim Armstrong wrote:

> Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
> entire toolchain every time a developer switches between branches with
> different build IDs?
>
> I know some developers do that frequently, e.g. to try and reproduce bugs
> on older versions or backport patches.
>
> I agree it would be good to fix this, since I've run into this problem
> before, I'm just not quite sure what the best solution is. In the other
> case where I had this issue with LLVM I changed the version number (by
> appending noasserts-) to it, but that's really just a hack.
>
> -Tim
>
> On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson wrote:
>
> > As Matt said, I have a patch that implements build ID-based versioning
> > at https://gerrit.cloudera.org/#/c/6166/2.
> >
> > Does anyone want to take a look? If we could get this in soon it would
> > help smooth over the LZ4 change which is going in shortly.
> >
> > On 27 February 2017 at 14:21, Henry Robinson wrote:
> >
> > > I agree that that might be useful, and that it's a separately
> > > addressable problem.
> > >
> > > On 27 February 2017 at 14:18, Matthew Jacobs wrote:
> > >
> > >> Just catching up to this e-mail, though I had seen your code reviews
> > >> and I think this approach makes sense. An additional concern would be
> > >> how to identify how a toolchain package was built, and AFAIK this is
> > >> tricky now if only the 'toolchain ID' is known. Before I saw this
> > >> e-mail I was thinking about this problem (which I think we can address
> > >> separately), and that we might want to write the native-toolchain git
> > >> hash with every toolchain build so that the exact build scripts are
> > >> associated with those build artifacts. I filed
> > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> > >> problem.
> > >>
> > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson wrote:
> > >> > As written, the toolchain can't apparently deal with the possibility
> > >> > of build flags changing, but a dependency version remaining the same.
> > >> >
> > >> > LZ4 has never (afaict) been built with optimization enabled. I have a
> > >> > commit that enables -O3, but that continues to produce artifacts for
> > >> > lz4-1.7.5 with no version change. This is a problem because
> > >> > bootstrapping the toolchain will fail to pick up the new binaries -
> > >> > because the previously downloaded version is still in the local
> > >> > cache, and won't be overwritten because the version has not changed.
> > >> >
> > >> > I think the simplest way to fix this is to write the toolchain build
> > >> > ID to the dependency version file (that's in the local cache only)
> > >> > when it's downloaded. If that ID changes, the dependency will be
> > >> > re-downloaded.
> > >> >
> > >> > This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID
> > >> > will invalidate all dependencies, and bin/bootstrap_toolchain.py will
> > >> > re-download all of them. My feeling is that that cost is better than
> > >> > trying to individually determine whether a dependency has changed
> > >> > between toolchain builds.
> > >> >
> > >> > Any thoughts on whether this is the right way to go?
> > >> >
> > >> > Henry
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679
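The original proposal in this thread (record the toolchain build ID next to each cached dependency and re-download when it changes) can be sketched roughly as follows. This is not the code from the Gerrit patch: the marker file name, its one-line format and both function names are assumptions for illustration.

```python
import os

# Hypothetical marker file stored inside each cached package directory.
MARKER_NAME = "toolchain_build_id"


def needs_download(pkg_dir, version, build_id):
    """Return True if the cached package is missing or was downloaded for a
    different dependency version or toolchain build ID."""
    marker = os.path.join(pkg_dir, MARKER_NAME)
    if not os.path.isfile(marker):
        return True
    with open(marker) as f:
        return f.read().strip() != "%s %s" % (version, build_id)


def record_download(pkg_dir, version, build_id):
    """Write the marker after a successful download (local cache only)."""
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, MARKER_NAME), "w") as f:
        f.write("%s %s\n" % (version, build_id))
```

As noted in the proposal, a check like this treats every build ID bump as a change to every dependency, so lz4-1.7.5 rebuilt with -O3 is picked up, but so is everything else.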