I've made some comments about the conversion process here:
https://issues.apache.org/jira/browse/LUCENE-6933?focusedCommentId=15064208#comment-15064208
Feel free to try it out.
https://github.com/dweiss/lucene-solr-svn2git
I don't know what the next steps are. This looks like a good starting
I've filed https://issues.apache.org/jira/browse/LUCENE-6937 as a parent
issue to discuss and work through a migration.
I'm going to assume we are going to go ahead with this until someone steps
up and says otherwise. So far we seem to have consensus. In any case, that
JIRA is probably the best
> The question I had (I am sure a very dumb one): WHY do we care about history
preserved perfectly in Git?
For me it's for sentimental, archival and task-challenge reasons. Robert's
requirement is that git praise/blame/log works on a given file and
shows its true history of changes. Everyone
I filed LUCENE-6937 as a parent issue for an SVN->Git migration. I've
linked the issue that Dawid is working on, as well as a new issue for
converting the build to work correctly in a Git checkout rather than SVN.
- Mark
On Tue, Dec 15, 2015 at 1:26 PM Mark Miller wrote:
On 12/16/2015 5:53 PM, Alexandre Rafalovitch wrote:
> On 16 December 2015 at 00:44, Dawid Weiss wrote:
>> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>> locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
>> If you
+1 totally agree. Anyway, the bloat should largely be the binaries &
unrelated projects, not code (small text files).
On Wed, Dec 16, 2015 at 10:36 PM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:
> In defense of more history immediately available--it is often far more
> useful
On Thu, Dec 17, 2015, at 12:53 AM, Alexandre Rafalovitch wrote:
> On 16 December 2015 at 00:44, Dawid Weiss wrote:
> > 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
> > locally (including empty interim commits to cater for svn:mergeinfos) is
On 16 December 2015 at 00:44, Dawid Weiss wrote:
> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
> locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
> If you strip the stuff like javadocs and side projects (Nutch,
In defense of more history immediately available--it is often far more
useful to poke around code history/run blame to figure out some code than
to take it at face value. Putting this in a secondary place like the
Apache SVN repo IMO reduces the readability of the code itself. This is
doubly true
3 is typically solved by adding a .gitignore or .gitkeep file in what would
be an empty directory, if the directory itself is important.
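
For illustration, the placeholder-file approach can be sketched in Python as below. The function name `add_gitkeep` and the choice of `.gitkeep` as the marker are assumptions for the example, not part of any existing migration script; the marker files it creates are new content that was never in SVN.

```python
import os

def add_gitkeep(root, marker=".gitkeep"):
    """Create an empty marker file in every empty directory under root.

    Returns the sorted list of marker paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        # A directory is "empty" here if it holds no files and no subdirectories.
        if not dirnames and not filenames:
            path = os.path.join(dirpath, marker)
            open(path, "w").close()
            created.append(path)
    return sorted(created)
```

Running it a second time creates nothing, since every previously empty directory now contains its marker.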
On Tue, Dec 15, 2015 at 12:21 PM, Dawid Weiss wrote:
>
> Oh, just for completeness -- moving to git is not just about the version
>
It's not true that nobody is working on this. I have been working on the
SVN dump in the meantime. You would not believe how incredibly complex the
process of processing that (remote) dump is. Let me highlight a few key
issues:
1) There is no "one" Lucene SVN repository that can be transferred to
And if nobody steps up and "solves" the current technical issue will that
simply accelerate the (desired) shift to using git as the main repo for
future Lucene/Solr development? Would there be any downside to that outcome?
Is there any formal Apache policy for new projects as to whether they can
I know that, but I meant historical checkouts -- and if you add fake files
you're altering history :)
D.
On Tue, Dec 15, 2015 at 7:24 PM, Mike Drob wrote:
> 3 is typically solved by adding a .gitignore or .gitkeep file in what
> would be an empty directory, if the directory
I thought the general consensus at minimum was to investigate a git mirror
that stripped some artifacts out (jars etc) to lighten up the work of the
process. If at some point the project switched to git, such a mirror might
be a suitable git repo for the project with archived older versions in
Anyone willing to lead this discussion to some kind of better resolution?
Did that whole back and forth help with any ideas on the best path forward?
I know it's a complicated issue, git / svn, the light side, the dark side,
but doesn't GitHub also depend on this mirroring? It's going to be super
I don't think you will get a volunteer until someone sums up the discussion
with a proposal that someone is not going to veto or something. We can't
expect everyone to read the same tea leaves and come to the same
conclusion.
Perhaps a stripped down mirror is the consensus. I'd rather we had some
If we move to git, stripping out jars seems to be an independent decision?
Can you even strip out jars and preserve history (i.e. not change
hashes and invalidate everyone's forks/clones)?
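
On the hash part of the question: as far as I know, a git commit ID is a SHA-1 over the commit object, and that object names its tree and its parent commits, so changing any earlier tree changes every later commit ID. A minimal sketch, assuming a simplified commit body (real commits also carry author and committer lines) and using blob hashes as stand-ins for tree hashes:

```python
import hashlib

def git_hash(kind, body):
    """SHA-1 git assigns to an object: header '<kind> <size>' + NUL byte + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit(tree, parent, message):
    """Hash a simplified commit body (assumption: no author/committer lines)."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_hash("commit", body.encode())

# Stand-ins for tree hashes; the byte strings are made up for illustration.
tree_with_jar = git_hash("blob", b"sources + lib.jar")
tree_without_jar = git_hash("blob", b"sources")
later_tree = git_hash("blob", b"later sources")

# Same second commit on top of two first commits that differ only by the JAR.
tip_original = commit(later_tree, commit(tree_with_jar, None, "initial"), "second")
tip_stripped = commit(later_tree, commit(tree_without_jar, None, "initial"), "second")

print(tip_original != tip_stripped)  # prints True
```

The second commit's own tree is identical in both histories, yet its ID differs because its parent's ID differs.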
I did run across this:
Ok, give me some time and I'll see what I can achieve. Now that I actually
wrote an SVN dump parser (validator and serializer) things are under much
better control...
I'll try to achieve the following:
1) selectively drop unnecessary stuff from history (cms/, javadocs/, JARs
and perhaps other
Let's just move to git. It's almost 2016. I suspect many contributors are
probably primarily working off the github mirror anyway. Is there any
great argument for delaying?
On Dec 15, 2015 11:51 AM, "Mark Miller" wrote:
> I don't think you will get a volunteer until
Oh, just for completeness -- moving to git is not just about the version
management, it's also:
1) all the scripts that currently do validations, etc.
2) what to do with svn:* properties
3) what to do with empty folders (not available in git).
I don't volunteer to solve these :)
Dawid
On Tue,
Let's just make some JIRA issues. I'm not worried about volunteers for any
of it yet, just a direction we agree upon. Once we know where we are going,
we generally don't have a big volunteer problem. We haven't heard from Uwe
yet, but it really does seem like moving to Git makes the most sense.
I'm
If Dawid is volunteering to sort out this mess, +1 to let him make it
a move to git. I don't care if we disagree about JARs, I trust he will
do a good job and that is more important.
On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss wrote:
>
> It's not true that nobody is
FYI.
- All of Lucene's SVN, incremental deltas, uncompressed: 5.0G
- the above, tar.bz2: 1.2G
Sadly, I didn't succeed at recreating a local SVN repo from those
incremental dumps. svnadmin load fails with a cryptic error related to
the fact that revision numbers of node-copy operations refer to
Have we heard anything more from Infrastructure? It seems the thing to
do right now is to get more of a conversation going with them to
understand the issue at hand. Once the release is done, I'd be happy to
try and get that conversation going faster than it is.
Upayavira
On Tue, Dec 8, 2015, at
github will reject files larger than 100MB and will warn for files larger
than 50MB (https://help.github.com/articles/working-with-large-files/).
They have recently released Git Large File Storage to alleviate issues
caused by these restrictions (
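
As a rough illustration of the two limits quoted above, here is a Python sketch that flags files in a working tree that would trigger the warning or the rejection. The names `classify` and `oversized_files` are made up for the example, and reading "MB" as MiB is an assumption. It only looks at files on disk, not at blobs buried in history.

```python
import os

# Thresholds from the limits quoted above; treating "MB" as MiB is an assumption.
WARN_BYTES = 50 * 1024 * 1024
REJECT_BYTES = 100 * 1024 * 1024

def classify(size_bytes):
    """Return 'reject', 'warn' or 'ok' for a file of the given size."""
    if size_bytes > REJECT_BYTES:
        return "reject"
    if size_bytes > WARN_BYTES:
        return "warn"
    return "ok"

def oversized_files(root):
    """Map each file under root that would warn or be rejected to its verdict."""
    verdicts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            verdict = classify(os.path.getsize(path))
            if verdict != "ok":
                verdicts[path] = verdict
    return verdicts
```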
On Tue, Dec 8, 2015 at 2:05 PM, Upayavira wrote:
> Have we heard anything more from Infrastructure?
Alas, no, unfortunately, at least from what I've seen ...
I would love to know if this memory leak in git-svn is a known issue
so we can be more informed (we've asked several
Dumb question, but searching around suggests that git-svn can be killed and
then resumed with `git svn fetch`. Shouldn't that resolve any
process-level memory leak?
On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Hello devs,
>
> The infra team has
If you do that, then the changes do not sync to github, and there's a 99%
chance that the next time a change is seen by the mirroring process (or by
the hourly cron that updates all the svn->git mirrors) the same memory leak
would happen.
On Tue, Dec 8, 2015 at 12:40 PM, Scott Blum
Here's what I've just got on the Infra hipchat channel:
The ASF has a tool, svn2gitupdate[1], which I presume uses git-svn,
which fails periodically. When it does fail, it takes with it all other
ASF projects that make use of the same tool, until an admin can
intervene and restart things.
When
> Don’t know how much we have of historic jars in our history.
I actually do know. Or will know. In about ~10 hours. I wrote a script
that does the following:
1) git log all revisions touching https://svn.apache.org/repos/asf/lucene
2) grep revision numbers
3) use svnrdump to get every single
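
The first two steps can be sketched in Python as follows. This is not the script described above, only an illustration; it assumes the log is `svn log` output, whose entry headers start with `r<number> |`. The third step is cut off in this message, so it is left out. The sample text is made up and is not real Lucene history.

```python
import re

# Header line of an `svn log` entry, e.g. "r42 | someone | 2015-12-08 ... | 1 line".
REV_HEADER = re.compile(r"^r(\d+) \|", re.MULTILINE)

def revision_numbers(svn_log_text):
    """Return the sorted, de-duplicated revision numbers found in `svn log` output."""
    return sorted({int(n) for n in REV_HEADER.findall(svn_log_text)})

# Made-up sample in the shape `svn log` prints.
SAMPLE = """\
------------------------------------------------------------------------
r7 | someone | 2015-12-08 21:16:00 +0000 (Tue, 08 Dec 2015) | 1 line

example message mentioning r999 | not a header
------------------------------------------------------------------------
r3 | someone | 2015-12-07 10:00:00 +0000 (Mon, 07 Dec 2015) | 1 line

another message
------------------------------------------------------------------------
"""

print(revision_numbers(SAMPLE))  # prints [3, 7]
```

Anchoring the pattern at the start of a line keeps revision-like text inside commit messages from being picked up.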
The LFS cost at GitHub starts at >1 GB. Don’t know how much we have of historic
jars in our history. Also, as far as I understand, Apache is free to install
their own git-lfs server, so the repository will use an Apache-operated server
for storing the large files instead of GitHub’s own storage
One more thing, perhaps of importance, the raw Lucene repo contains
all the history of projects that then turned top-level (Nutch,
Mahout). These could also be dropped (or ignored) when converting to
git. If we agree JARs are not relevant, why should projects not
directly related to Lucene/ Solr
It seems you'd want to preserve that history in a frozen/archived Apache
SVN repo for Lucene. Then make the new git repo slimmer before switching.
Folks who want very old versions or are doing research can at least go through
the original SVN repo.
On Tuesday, December 8, 2015, Dawid Weiss
> So you're trying to minimise the size of a git clone?
Yes and no. I'm just a curious individual. My gut feeling is that even
with 10+ years of history and binary blobs inside, the size of the
repo (git or SVN) should *not* be much of a problem. It's merely ~47k
worth of revisions... :)
Dawid
You can't avoid having the history in SVN. The ASF has one large repo,
and won't be deleting that repo, so the history will survive in
perpetuity, regardless of what we do now.
Upayavira
On Tue, Dec 8, 2015, at 09:24 PM, Doug Turnbull wrote:
> It seems you'd want to preserve that history in a
So you're trying to minimise the size of a git clone?
I'd agree that Nutch etc aren't relevant.
Upayavira
On Tue, Dec 8, 2015, at 09:16 PM, Dawid Weiss wrote:
> One more thing, perhaps of importance, the raw Lucene repo contains
> all the history of projects that then turned top-level (Nutch,
>
Grant's record 1.3 GB commit added HTML files with JavaDocs to the
cms; probably not that relevant either.
svn log -v -r1240618 https://svn.apache.org/repos/asf/lucene
It's fun exploring, actually... I bet with a few proper exclusions one
can get down to manageable size. As always with
If you pull from ASF git (git.a.o) or github, you are not using git-svn at
all, bypassing the actual git-svn problem.
Check out
https://github.com/apache/infrastructure-puppet/tree/deployment/modules/git_mirror_asf
for what we use, specifically the update-mirror.sh script. That is was
I had not heard of git-lfs; looks promising
https://git-lfs.github.com/?utm_source=github_site_medium=blog_campaign=gitlfs
On Sunday, December 6, 2015, Jan Høydahl wrote:
> If the size of historic jars is the problem here, would looking into
> git-lfs for *.jar be one
If the size of historic jars is the problem here, would looking into git-lfs
for *.jar be one workaround? I might also be totally off here :-)
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
> 6. des. 2015 kl. 00.46 skrev Scott Blum :
>
> If
I tried it once (for storing large text files -- Polish dictionaries,
uncompressed -- on github), but it simply didn't work. More headaches
than benefits (to me).
Dawid
On Sun, Dec 6, 2015 at 10:04 PM, Doug Turnbull
wrote:
> I had not heard of git-lfs looks
I'm fine if we drop the jars, really. I'm just fond of having a "real"
history of a project, that's all. And I don't think the conversion
problem stems from JARs alone; I think there's some other underlying
issue. I asked for a filtered dump of the svn repo branch, perhaps I
can experiment a bit
If I understand this thread (perhaps not?), the issue comes from syncing
git and svn? If we move to git only, all old versions and jars will live in
svn so anyone who needs to build an old version is all set. The move to git
can retain history without jars for "blame".
re: keeping old jars around...
Having all the old jars around is a nice idea, but do we know that
anybody really cares?
Straw-man two question poll:
1> What's the most recent version of Solr/Lucene you'd be OK with
nuking the jars?
2> In the last year, what's the oldest version of Solr/Lucene
On Sat, Dec 5, 2015 at 5:53 PM, david.w.smi...@gmail.com
wrote:
> I understand Gus; but we’d like to separate the question of whether we should
> move from svn to git from fixing the git mirror.
Except moving to git is one path to fixing the issue, so it's not
really
If lucene was a new project being started today, is there any question
about whether it would be managed in svn or git? If not, this might be a
good impetus for moving to a better world.
On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley wrote:
> On Sat, Dec 5, 2015 at 5:53 PM,
I understand Gus; but we’d like to separate the question of whether we
should move from svn to git from fixing the git mirror. It’s contentious —
I encourage you to search the list archives for some of the arguments.
On Sat, Dec 5, 2015 at 12:53 PM Gus Heck wrote:
> If I
Ouch... not having an official mirror would be a huge burden on those of us
managing org-specific forks. :(
On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> Hello devs,
>
> The infra team has notified us (Lucene/Solr) that in 26 days our
> git-svn mirror
> I don't think jar files are 'history' and it was a mistake we had so
> many in source control before we cleaned that up. it is much better
> without them.
Depends how you look at it. If your goal is to be able to actually
build ancient versions then dropping those JARs is going to be a real
If we moved to git would a read only svn for older versions still exist? If
so no reason to keep any jars at all in git.
On Dec 4, 2015 4:22 PM, "Robert Muir" wrote:
> On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss wrote:
> >> [...] several GBs unless we
> Does anyone know of a link to this git-svn issue? Is it a known
issue? If there's something simple we can do (remove old jars from
our svn history, remove old branches), maybe we can sidestep the issue
and infra will allow it to keep running?
I believe it is partially covered under
On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss wrote:
>> [...] several GBs unless we remove those JARs from our history.
>
> 1) History is important, don't dump it.
I don't think jar files are 'history' and it was a mistake we had so
many in source control before we cleaned
It'd be cool to actually reintegrate ancient CVS history as well (I
think not all of it was moved to SVN).
https://sourceforge.net/projects/lucene/
D.
On Fri, Dec 4, 2015 at 10:30 PM, Upayavira wrote:
> Even if we moved to git and did an svn rm on
>
On Fri, Dec 4, 2015 at 4:25 PM, Dawid Weiss wrote:
>> I don't think jar files are 'history' and it was a mistake we had so
>> many in source control before we cleaned that up. it is much better
>> without them.
>
> Depends how you look at it. If your goal is to be able to
Maybe a silly question, but has anybody actually looked into
git-svn itself? E.g. talking to the git-svn team with our example to help
them troubleshoot the link, or running a test sync under a profiler.
Also, it is running into OOM, but how big is the system doing the sync?
If the issue is upgrading the
> [...] several GBs unless we remove those JARs from our history.
1) History is important, don't dump it.
2) git isn't dumb -- git clone -b master --single-branch would only
fetch what's actually needed/ referenced. We could split the history
into "pre-ivy" and "post-ivy" branches so that
Even if we moved to git and did an svn rm on
https://svn.apache.org/repos/asf/lucene/dev, the entire history of Lucene would
remain in the ASF Subversion repository. Nothing we can do to prevent that!!
Upayavira
On Fri, Dec 4, 2015, at 09:26 PM, Gus Heck wrote:
> If we moved to git would a
Hello devs,
The infra team has notified us (Lucene/Solr) that in 26 days our
git-svn mirror will be turned off, because running it consumes too
many system resources, affecting other projects, apparently because of
a memory leak in git-svn.
Does anyone know of a link to this git-svn issue? Is
.@mikemccandless.com]
Sent: Friday, December 04, 2015 2:58 PM
To: Lucene/Solr dev
Cc: infrastruct...@apache.org
Subject: Lucene/Solr git mirror will soon turn off
Hello devs,
The infra team has notified us (Lucene/Solr) that in 26 days our
git-svn mirror will be turned off, because running
Oh, nevermind -- I think I know why:
License
GNU Library or Lesser General Public License version 2.0 (LGPLv2)
D.
On Fri, Dec 4, 2015 at 10:33 PM, Dawid Weiss wrote:
> It'd be cool to actually reintegrate ancient CVS history as well (I
> think not all of it was moved to
Many old builds will also have problems even with a git checkout. If you
actually wanted to try and build them it would be much more sane to work
from the SVN history I'd hope we can retain.
Mark
On Fri, Dec 4, 2015 at 4:55 PM Robert Muir wrote:
> On Fri, Dec 4, 2015 at 4:25
In the original report, the Infrastructure team said that throwing
memory at it did not solve the problem. And I believe they threw *a lot*
of memory at it.
There may well be other options - just needs someone to dive in and
look!
Upayavira
On Fri, Dec 4, 2015, at 11:10 PM, Alexandre
As I said earlier - our history is inside the ASF SVN repo. The only way
our history would be lost would be if the whole repo was deleted, which
I suspect won't happen for a while. So even if we imported a snapshot
over to Git, our full SVN history is immutably stored in SVN (even if we
did svn rm
> Cc: infrastruct...@apache.org
> Subject: RE: Lucene/Solr git mirror will soon turn off
>
> I know Infra has tried a number of things to resolve this, to no avail. But
> did we try "git-svn --revision=" to only mirror "post-LUCENE-3930" (ivy,
>
>
> > -Original Message-
> > From: Dyer, James [mailto:james.d...@ingramcontent.com]
> > Sent: Friday, December 04, 2015 10:48 PM
> > To: dev@lucene.apache.org
> > Cc: infrastruct...@apache.org
I agree with Rob on this — delete the ‘jar’s from git history, for all the
reasons Rob said. If someone wants to attempt to actually *build* an old
release, and thus needs the jars, then they are welcome to use ASF SVN
archives for that purpose instead, and even then apparently it will be a