Re: Moving to git?
There is only one good rule though - no merge commits in the history :) Ever. Do whatever you want beyond that. A clean, simple history for each branch is the only sensible use of Git I've seen.

+1

- Mark

On Sat, May 30, 2015 at 9:00 AM Adrien Grand jpou...@gmail.com wrote:

The main benefit I see is that external contributors would get their name in the commit log. On the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care, but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on.

-0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well.

-- Adrien

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

--
- Mark
about.me/markrmiller
Re: Moving to git?
bq. That is my workflow, get over it.

Done.

bq. It's not something we vote about. It's just like the editor I choose to use.

I have care 0 about your vote or opinion on anything. Bad community member.

- Mark

On Mon, Jun 1, 2015 at 4:57 AM Robert Muir rcm...@gmail.com wrote:

I use merge actually. It's just fine. That is my workflow, get over it. It's not something we vote about. It's just like the editor I choose to use.

On Mon, Jun 1, 2015 at 2:37 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:

There is only one good rule though - no merge commits in the history :) Ever. Do whatever you want beyond that. A clean, simple history for each branch is the only sensible use of Git I've seen.

+1

- Mark

On Sat, May 30, 2015 at 9:00 AM Adrien Grand jpou...@gmail.com wrote:

The main benefit I see is that external contributors would get their name in the commit log. On the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care, but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on.

-0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well.

-- Adrien

--
- Mark
about.me/markrmiller
Re: Moving to git?
I use merge actually. It's just fine. That is my workflow, get over it. It's not something we vote about. It's just like the editor I choose to use.

On Mon, Jun 1, 2015 at 2:37 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:

There is only one good rule though - no merge commits in the history :) Ever. Do whatever you want beyond that. A clean, simple history for each branch is the only sensible use of Git I've seen.

+1

- Mark

On Sat, May 30, 2015 at 9:00 AM Adrien Grand jpou...@gmail.com wrote:

The main benefit I see is that external contributors would get their name in the commit log. On the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care, but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on.

-0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well.

-- Adrien
Re: Moving to git?
I've honestly never understood the perspective of eliminating merge commits (though I've had to work with it, and the rebasing required got me into some of the worst git snafus I've ever been in). Merges are history too. Why would anyone want to lose the information that code was merged from a branch? For example, if a problem was introduced when code lines were merged, that's useful info about when/how it happened and where more attention needs to be focused. Not saying it's wrong, just saying I don't understand it...

I like git and use it where I can, but as was noted earlier it will probably be necessary for the project to establish the way it wishes to use it, or it will likely create significant chaos as one person tries to eliminate merges in the history and another person preserves them; one person forks and makes pull requests while another commits directly... who reviews the pull request... Do committers use pull requests, or only non-committers?

Food for thought: https://www.atlassian.com/git/tutorials/comparing-workflows/

I don't use the github repo for solr when I build it from the repo right now because it seems to be a secondary add-on and I always favor the canonical source, because the last thing I want is to deal with an extra layer and figuring out where the pitfalls in the translation between layers might be.

My $0.02,
Gus

On Mon, Jun 1, 2015 at 2:37 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:

There is only one good rule though - no merge commits in the history :) Ever. Do whatever you want beyond that. A clean, simple history for each branch is the only sensible use of Git I've seen.

+1

- Mark

On Sat, May 30, 2015 at 9:00 AM Adrien Grand jpou...@gmail.com wrote:

The main benefit I see is that external contributors would get their name in the commit log. On the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care, but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on.

-0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well.

-- Adrien

--
http://www.the111shift.com
Re: Moving to git?
Nice! On Sun, May 31, 2015 at 1:31 PM Steve Davids sdav...@gmail.com wrote: bq. Something needs to be done about all those jars in the source history, I will not let this go. I went ahead and used the BFG Repo Cleaner https://rtyley.github.io/bfg-repo-cleaner/ tool to drop all of the old jars in the git history, here are the findings: $ git clone --mirror https://github.com/apache/lucene-solr.git lucene-solr-mirror 489M lucene-solr-mirror $ java -jar ~/Downloads/bfg-1.12.3.jar --delete-files *.jar --protect-blobs-from trunk,branch_5x,branch_4x lucene-solr-mirror $ cd lucene-solr-mirror $ git reflog expire --expire=now --all git gc --prune=now --aggressive 182M lucene-solr-mirror $ cat lucene-solr-mirror.bfg-report/2015-05-31/10-16-36/deleted-files.txt af4eed0506b53f17a4d22e4f1630ee03cb7991e5 177868 Tidy.jar 53f82a1c4c492dc810c27317857bbb02afd6fa58 62983 activation-1.1.jar 3beb3b802ffd7502ac4b4d47e0b2a75d08e30cc3 1034049 ant-1.6.5.jar 704717779f6d0d7eb026dc7af78a35e51adeec8b 1323005 ant-1.7.1.jar 7f5be4a4e05939429353a90e882846aeac72b976 1933743 ant-1.8.2.jar 063cce4f940033fa6e33d3e590cf6f5051129295 93518 ant-junit-1.7.1.jar 704717779f6d0d7eb026dc7af78a35e51adeec8b 1323005 apache-ant-1.7.1.jar 063cce4f940033fa6e33d3e590cf6f5051129295 93518 apache-ant-junit-1.7.1.jar e3c62523fb93b5e2f73365e6cee0d0bc68e48556 95511 apache-mime4j-core-0.7.jar 1f7bf1ea13697ca0243d399ca6e5d864dd8bec0b 300168 apache-mime4j-dom-0.7.jar bab8b31fb99256e13fc6010701db560243c47fa7 26027 apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar 5c4007c7e74af85d823243153d308f80e084eff0 22478 apache-solr-noggit-r1099557.jar f59a39b011591edafc7955e97ae0d195fdf8b42e 22376 apache-solr-noggit-r1209632.jar 2a07c61d9ecb9683a135b7847682e7c36f19bbfe 22770 apache-solr-noggit-r1211150.jar 30be80e0b838a9c1445936b6966ccfc7ff165ae5 36776 apache-solr-noggit-r730138.jar 97d779912d38d2524a0e20efa849a4b6f01a4b46 21229 apache-solr-noggit-r730138.jar a798b805d0ce92606697cc1b2aac42bf416076e3 37259 apache-solr-noggit-r944541.jar 
Re: Moving to git?
You guys totally miss the point on clone. No, I think you miss our point. The thing is that svn checkout gives you enough, to do what you need to do. git clone --depth 1 does as well -- you work on your stuff, then you diff against the baseline, submit a patch. Like I said -- we differ in opinions on what's easier to do. For me diffing against trunk with svn is *terribly* slow and backporting to any other branch is an annoying manual and tedious process. D. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
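The shallow-clone workflow mentioned above (clone with --depth 1, work on your stuff, diff against the baseline, submit a patch) can be sketched roughly as follows. This is only a sketch under stated assumptions: it builds a throwaway local repository as a stand-in for the real upstream so that it runs offline, and the directory names, the file name notes.txt and the commit identity are all invented for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Stand-in "upstream" repository with two commits (hypothetical content).
git init -q "$work/upstream"
cd "$work/upstream"
echo "first line" > notes.txt
git add notes.txt
git -c user.name=Example -c user.email=example@example.invalid commit -q -m "first commit"
echo "second line" >> notes.txt
git -c user.name=Example -c user.email=example@example.invalid commit -q -a -m "second commit"

# Shallow clone: only the most recent commit is fetched. The file:// URL
# matters here, because --depth is ignored for plain local-path clones.
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

# Work on your stuff, then diff against the baseline to produce a patch.
cd "$work/shallow"
echo "my change" >> notes.txt
git diff > "$work/my-change.patch"

echo "commits visible in the shallow clone: $(git rev-list --count HEAD)"
```

The resulting my-change.patch is what would be attached to an issue; the shallow clone itself carries only one commit of history, which is the trade-off being argued about in this thread.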
Re: Moving to git?
On Sun, May 31, 2015 at 2:32 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Yeah, but it misses the point -- history is history, if there were jars in it, you shouldn't just strip them, it'd be confusing. How was it back when Lucene was merging with Solr? Didn't it just initiate with a new clean repo? Maybe not all of the history is really needed -- if we limited ourselves to, say, all of the history that includes ivy then the size of the repo would drop significantly... but again, to me size doesn't really matter at all; one initial clone is no-cost. Go make yourself a cup of tea, come back and you're set. It seems like we can do something reasonable here either way. We are talking about a lot of jars. But I would love to see this kinda stuff (what history will be imported/preserved elsewhere) as part of the proposal, that is all. Making the slowest operation of git (which is turtle slow) more reasonable can go a long way to win over people, like me that are more on the -0 side. I re-clone from time to time, maybe someone else will just keep their old workflow and use 5 checkouts or whatever they want. So yeah, I think the size of the jars are very relevant. All these silly jars are maybe even the root cause of the huge repository / git mirroring issue that spawned this thread. And while it might not be relevant to your workflow, try to imagine that other people have different workflow of their own, and are just fine with that. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
I totally agree Doug.

Losing the jars would have a cost: those old branches wouldn't work out of the box if you wanted to run tests on them. But I am not sure how bad that cost really is. It might be zero. I haven't tried to run e.g. lucene 2.x tests with a modern java 7 or java 8, but I bet they probably do not work due to things like hashmap failures. And I think solr before 4.0 will not even compile, because of things like wildcard import + base64 clashes.

So if I had my preference, we'd import as much history as we can, and nuke the silly jars. And I'd like that sourceforge history there too if we can get it, but I don't know if it is really legal. The sourceforge CVS works, see IndexWriter: http://lucene.cvs.sourceforge.net/viewvc/lucene/lucene/com/lucene/index/IndexWriter.java?view=log

On Sun, May 31, 2015 at 3:10 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

I have no dog in the svn vs git debate honestly. I want to say how important it is to keep healthy history. I recently went on a bit of a code archeology dig to figure out why something in Lucene was done the way it was. It was handy that the history went as far back as it did, but I had to switch around to different places to continue the history.

For example, the abrupt shift that seems to be around when Solr/Lucene were put together had me digging for the last pure lucene tag. It's over at lucene/java/branches NOT lucene/dev/tags with the other tags. Then when you get to the branch for lucene-101, the first commit is:

2001: New repository initialized by cvs2svn.

Unable to find a cvs repo, my hunt stopped (love to hear if anyone has a CVS repo -- maybe from Jakarta?)

So removing some jars isn't a big deal. But cutting off history and restarting at some arbitrary point can be annoying and make it harder to dig up more about why things are the way they are.

/steps down from soapbox

-Doug

On Sunday, May 31, 2015, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

Yeah, but it misses the point -- history is history, if there were jars in it, you shouldn't just strip them, it'd be confusing. How was it back when Lucene was merging with Solr? Didn't it just initiate with a new clean repo? Maybe not all of the history is really needed -- if we limited ourselves to, say, all of the history that includes ivy then the size of the repo would drop significantly... but again, to me size doesn't really matter at all; one initial clone is no-cost. Go make yourself a cup of tea, come back and you're set.

Dawid
Re: Moving to git?
I'd like to have full consolidated history, as much as possible, connect-the-dots across whatever CVS/SVN/etc repos to the extent maximally permitted by law, as Doug hints at. Just nuke the jars. I've done this (CVS-SVN-GIT) before. It wasn't that difficult. Eventually (for git) you script it and it gets version after version from CVS or SVN and appends it to git. I admit I didn't care much about svn merging infos though. Any files can be removed/ pruned by rewriting git trees before they're published. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
There are also some rather large '.dat' files in the history. I found them by running a job to delete all blobs > 5MB from the history via:

$ java -jar ~/Downloads/bfg-1.12.3.jar --strip-blobs-bigger-than 5M --protect-blobs-from trunk,branch_5x,branch_4x lucene-solr-mirror

Deleted files
-------------

Filename                       | Git id
-------------------------------+------------------------------------------------------------
DoubleArrayTrie.dat            | 8babf9fa (16.8 MB), f3bfe15b (16.8 MB), ...
TokenInfoDictionary$buffer.dat | 25938b37 (7.0 MB), 7f02420f (7.1 MB), ...
TokenInfoDictionary$trie.dat   | 69e76d64 (16.8 MB)
dat.dat                        | 7445d1c8 (16.0 MB), 79bd7c8b (16.8 MB), 37a215e5 (16.8 MB)
europarl.lines.txt.gz          | e0366f10 (5.5 MB)
tid.dat                        | 5a1e6199 (24.9 MB), 996d3fc5 (28.1 MB), ...
tid_map.dat                    | 690fbea5 (6.3 MB), c1c01405 (6.3 MB), 7a8c1420 (6.4 MB)
wiki_results.txt               | db9e9294 (19.8 MB), 52ff9357 (19.8 MB), ...
wiki_sentence.txt              | 3a38f62e (19.0 MB)

Dropping just those files reduced the repo by 50M, overall size is 131MB.

Note: there is still one large file (> 5MB) in trunk:

* commit df1e3b32 (protected by 'trunk') - contains 1 dirty file :
  - lucene/test-framework/src/resources/org/apache/lucene/util/europarl.lines.txt.gz (5.5 MB)

Also, I failed to provide the numbers for what `git reflog expire --expire=now --all && git gc --prune=now --aggressive` does on a fresh mirror checkout: it results in a repo size of 320M. So, dropping the old jars saves 120MB.

-Steve

On Sun, May 31, 2015 at 4:39 PM, david.w.smi...@gmail.com david.w.smi...@gmail.com wrote:

I like where this is going! I also think history of source code is very important, but not history of ‘.jar’ files that shouldn’t have been in source control in the first place. I’m fiercely negative about large binaries or ‘jar’ files that can be downloaded by the build system (e.g. ivy) in source control. And it was already mentioned that a full history (.jar’s and all) could be kept somewhere more for archival purposes -- which is a good compromise, I think, since “build-ability” of history should be retained (assuming it’s even still possible, given Rob’s comments) but doesn’t have to be convenient (e.g. by it being in a separate repo). +1 to that!

If we were to come up with a new git repo that doesn’t have the ‘.jar’s, it’d be good to also streamline the history prior to the big Lucene + Solr merge due to the paths in source control as to where the trunk, branches, and tags lived. It appears the current repo may have been a blind git import from subversion. A hand-done process that is mindful of these things would result in a nice history. I’ve done this sorta thing once (a project at my last job) and volunteer to do it here if we can get consensus on a move to git.

~ David

On Sun, May 31, 2015 at 4:21 PM Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

I'd like to have full consolidated history, as much as possible, connect-the-dots across whatever CVS/SVN/etc repos to the extent maximally permitted by law, as Doug hints at. Just nuke the jars.

I've done this (CVS-SVN-GIT) before. It wasn't that difficult. Eventually (for git) you script it and it gets version after version from CVS or SVN and appends it to git. I admit I didn't care much about svn merging infos though. Any files can be removed/pruned by rewriting git trees before they're published.

Dawid
Re: Moving to git?
You just made my day with that CVS repo! :)

Though I don't really get a vote -- +1 to your plan Robert.

/polishes history degree

-Doug

On Sun, May 31, 2015 at 3:16 PM, Robert Muir rcm...@gmail.com wrote:

I totally agree Doug.

Losing the jars would have a cost: those old branches wouldn't work out of the box if you wanted to run tests on them. But I am not sure how bad that cost really is. It might be zero. I haven't tried to run e.g. lucene 2.x tests with a modern java 7 or java 8, but I bet they probably do not work due to things like hashmap failures. And I think solr before 4.0 will not even compile, because of things like wildcard import + base64 clashes.

So if I had my preference, we'd import as much history as we can, and nuke the silly jars. And I'd like that sourceforge history there too if we can get it, but I don't know if it is really legal. The sourceforge CVS works, see IndexWriter: http://lucene.cvs.sourceforge.net/viewvc/lucene/lucene/com/lucene/index/IndexWriter.java?view=log

On Sun, May 31, 2015 at 3:10 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

I have no dog in the svn vs git debate honestly. I want to say how important it is to keep healthy history. I recently went on a bit of a code archeology dig to figure out why something in Lucene was done the way it was. It was handy that the history went as far back as it did, but I had to switch around to different places to continue the history.

For example, the abrupt shift that seems to be around when Solr/Lucene were put together had me digging for the last pure lucene tag. It's over at lucene/java/branches NOT lucene/dev/tags with the other tags. Then when you get to the branch for lucene-101, the first commit is:

2001: New repository initialized by cvs2svn.

Unable to find a cvs repo, my hunt stopped (love to hear if anyone has a CVS repo -- maybe from Jakarta?)

So removing some jars isn't a big deal. But cutting off history and restarting at some arbitrary point can be annoying and make it harder to dig up more about why things are the way they are.

/steps down from soapbox

-Doug

On Sunday, May 31, 2015, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

Yeah, but it misses the point -- history is history, if there were jars in it, you shouldn't just strip them, it'd be confusing. How was it back when Lucene was merging with Solr? Didn't it just initiate with a new clean repo? Maybe not all of the history is really needed -- if we limited ourselves to, say, all of the history that includes ivy then the size of the repo would drop significantly... but again, to me size doesn't really matter at all; one initial clone is no-cost. Go make yourself a cup of tea, come back and you're set.

Dawid

--
Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Relevant Search http://manning.com/turnbull from Manning Publications
Re: Moving to git?
Losing the jars would have a cost: those old branches wouldn't work out of box if you wanted to run tests on Yeah, I'd rather not have them at all than have them filtered and crippled. It'll be confusing. There's nothing wrong in preserving the SVN history (or even a full git import from SVN, but in a separate repo) for archival reasons and just starting a new repo from some point in history (where it makes sense, for example the still maintained branches). Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
On Sun, May 31, 2015 at 3:53 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Losing the jars would have a cost: those old branches wouldn't work out of box if you wanted to run tests on Yeah, I'd rather not have them at all than have them filtered and crippled. It'll be confusing. But my argument is that they are already crippled. So what is the purpose of keeping the jars? I'd like to have full consolidated history, as much as possible, connect-the-dots across whatever CVS/SVN/etc repos to the extent maximally permitted by law, as Doug hints at. Just nuke the jars. Propose this and I will use any versioning system you would like on top of it! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
bq. Something needs to be done about all those jars in the source history, I will not let this go.

I went ahead and used the BFG Repo Cleaner tool (https://rtyley.github.io/bfg-repo-cleaner/) to drop all of the old jars in the git history. Here are the findings:

$ git clone --mirror https://github.com/apache/lucene-solr.git lucene-solr-mirror
489M lucene-solr-mirror
$ java -jar ~/Downloads/bfg-1.12.3.jar --delete-files *.jar --protect-blobs-from trunk,branch_5x,branch_4x lucene-solr-mirror
$ cd lucene-solr-mirror
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
182M lucene-solr-mirror
$ cat lucene-solr-mirror.bfg-report/2015-05-31/10-16-36/deleted-files.txt
af4eed0506b53f17a4d22e4f1630ee03cb7991e5 177868 Tidy.jar
53f82a1c4c492dc810c27317857bbb02afd6fa58 62983 activation-1.1.jar
3beb3b802ffd7502ac4b4d47e0b2a75d08e30cc3 1034049 ant-1.6.5.jar
704717779f6d0d7eb026dc7af78a35e51adeec8b 1323005 ant-1.7.1.jar
7f5be4a4e05939429353a90e882846aeac72b976 1933743 ant-1.8.2.jar
063cce4f940033fa6e33d3e590cf6f5051129295 93518 ant-junit-1.7.1.jar
704717779f6d0d7eb026dc7af78a35e51adeec8b 1323005 apache-ant-1.7.1.jar
063cce4f940033fa6e33d3e590cf6f5051129295 93518 apache-ant-junit-1.7.1.jar
e3c62523fb93b5e2f73365e6cee0d0bc68e48556 95511 apache-mime4j-core-0.7.jar
1f7bf1ea13697ca0243d399ca6e5d864dd8bec0b 300168 apache-mime4j-dom-0.7.jar
bab8b31fb99256e13fc6010701db560243c47fa7 26027 apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar
5c4007c7e74af85d823243153d308f80e084eff0 22478 apache-solr-noggit-r1099557.jar
f59a39b011591edafc7955e97ae0d195fdf8b42e 22376 apache-solr-noggit-r1209632.jar
2a07c61d9ecb9683a135b7847682e7c36f19bbfe 22770 apache-solr-noggit-r1211150.jar
30be80e0b838a9c1445936b6966ccfc7ff165ae5 36776 apache-solr-noggit-r730138.jar
97d779912d38d2524a0e20efa849a4b6f01a4b46 21229 apache-solr-noggit-r730138.jar
a798b805d0ce92606697cc1b2aac42bf416076e3 37259 apache-solr-noggit-r944541.jar
9b434f5760dd0d78350bdf8237273c0d5db0174e 21240 apache-solr-noggit-r944541.jar
8217cae0a1bc977b241e0c8517cc2e3e7cede276 43033 asm-3.1.jar
4133d823d96bf3fc26d3a9754375dcc30d8da416 342664 asm-debug-all-4.1.jar
f66e9a8b9868226121961c13e6a32a55d0b2f78a 229116 bcmail-jdk15-1.45.jar
409070b0370a95c14ed4357261afb96b91d10e86 1663318 bcprov-jdk15-1.45.jar
b64b033af70609338c07e2a88a5f7efcd1a84ddb 92027 boilerpipe-1.1.0.jar
96c3bdbdaacd5289b0e654842e435689fbcf22e2 679423 carrot2-core-3.4.0.jar
043c0cb889aea066f7d4126af029d00a0bcd9e81 655412 carrot2-core-3.4.0.jar
f872cbc8eec94f7d5b29a73f99cd13089848a3cd 933657 carrot2-core-3.4.2.jar
ce2d3bf9c28a4ff696d66a82334d15fd0161e890 995243 carrot2-core-3.4.2.jar
be94db93d41bd4ba53b650d421cfa5fb0519b9af 958799 carrot2-core-3.5.0.1.jar
adc127c48137d03e252f526de84a07c8d6bda521 979186 carrot2-core-3.5.0.jar
ab44cf9314b1efff393e05f9c938446887d3570e 981085 carrot2-core-3.5.0.jar
5ca86c5e72b2953feb0b58fbd87f76d0301cbbf6 517641 carrot2-mini-3.1.0.jar
b1b89c9c921f16af22a88db3ff28975a8e40d886 188671 commons-beanutils-1.7.0.jar
e633afbe6842aa92b1a8f0ff3f5b8c0e3283961b 36174 commons-cli-1.1.jar
957b6752af9a60c1bb2a4f65db0e90e5ce00f521 46725 commons-codec-1.3.jar
458d432da88b0efeab640c229903fb5aad274044 58160 commons-codec-1.4.jar
e9013fed78f333c928ff7f828948b91fcb5a92b4 73098 commons-codec-1.5.jar
ee1bc49acae11cc79eceec51f7be785590e99fd8 232771 commons-codec-1.6.jar
41e230feeaa53618b6ac5f8d11792c2eecf4d4fd 559366 commons-collections-3.1.jar
c35fa1fee145cba638884e41b80a401cbe4924ef 575389 commons-collections-3.2.1.jar
78d832c11c42023d4bc12077a1d9b7b5025217bc 143847 commons-compress-1.0.jar
51baf91a2df10184a8cca5cb43f11418576743a1 161361 commons-compress-1.1.jar
61753909c3f32306bf60d09e5345d47058ba2122 168596 commons-compress-1.2.jar
6c826c528b60bb1b25e9053b7f4c920292f6c343 224548 commons-compress-1.3.jar
f80348dfa0b59f0840c25d1b8c25d1490d1eaf51 22017 commons-csv-1.0-SNAPSHOT-r609327.jar
8439e6f1a8b1d82943f84688b8086869255eda86 27361 commons-csv-1.0-SNAPSHOT-r966014.jar
1783dbea232ced6db122268f8faa5ce773c7ea42 139966 commons-digester-1.7.jar
9c8bd13a2002a9ff5b35b873b9f111d5281ad201 148783 commons-digester-2.0.jar
aa209b3887c90933cdc58c8c8572e90435e8e48d 57779 commons-fileupload-1.2.1.jar
7c59774aed4f5dd08778489aaad565690ff7c132 305001 commons-httpclient-3.1.jar
133dc6cb35f5ca2c5920fd0933a557c2def88680 109043 commons-io-1.4.jar
b5c7d692fe5616af4332c1a1db6efd23e3ff881b 163151 commons-io-2.1.jar
ce0ca22c8d29a9be736d775fe50bfdc6ce770186 257923 commons-lang-2.4.jar
532939ecab6b77ccb77af3635c55ff9752b70ab7 261809 commons-lang-2.4.jar
98467d3a653ebad776ffa3542efeb9732fe0b482 284220 commons-lang-2.6.jar
b73a80fab641131e6fbe3ae833549efb3c540d17 38015 commons-logging-1.0.4.jar
1deef144cb17ed2c11c6cdcdcb2d9530fa8d0b47 60686 commons-logging-1.1.1.jar
ae0b63586701efdc7bf03ffb0a840d50950d211c 3566844 core-3.1.1.jar
b9c8c8a170881dfe9c33adc87c26348904510954 364003 cpptasks-1.0b5.jar
99baf20bacd712cae91dd6e4e1f46224cafa1a37 500676 db-4.7.25.jar
c8c4dbb92d6c23a7fbb2813eb721eb4cce91750c 313898
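For anyone who wants a quick total out of a report like the one above, here is a minimal sketch. It assumes each line of deleted-files.txt has the three whitespace-separated columns shown in the listing (blob id, size in bytes, file name). The function name sum_report_sizes and the temporary sample file are made up for illustration and are not part of BFG; the three sample rows are copied from the listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of BFG): total the size column of a
# deleted-files.txt style report. Assumes "id size name" on each line
# and skips lines that have fewer than three fields.
sum_report_sizes() {
    awk 'NF >= 3 { total += $2 } END { printf "%d\n", total }' "$@"
}

# Three rows copied from the listing above, used as sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
af4eed0506b53f17a4d22e4f1630ee03cb7991e5 177868 Tidy.jar
53f82a1c4c492dc810c27317857bbb02afd6fa58 62983 activation-1.1.jar
3beb3b802ffd7502ac4b4d47e0b2a75d08e30cc3 1034049 ant-1.6.5.jar
EOF

sample_total=$(sum_report_sizes "$sample")
echo "bytes listed in sample: $sample_total"
rm -f "$sample"
```

Note that the sum is over uncompressed blob sizes as reported, so it will not match the change in packed repository size (489M to 182M in the message above); it is only a rough way to see which files dominate.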
Re: Moving to git?
Yeah, but it misses the point -- history is history, if there were jars in it, you shouldn't just strip them, it'd be confusing. How was it back when Lucene was merging with Solr? Didn't it just initiate with a new clean repo? Maybe not all of the history is really needed -- if we limited ourselves to, say, all of the history that includes ivy then the size of the repo would drop significantly... but again, to me size doesn't really matter at all; one initial clone is no-cost. Go make yourself a cup of tea, come back and you're set. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
In any case, decisions on these types of things are majority of PMC rules. No one has wanted to call a vote yet. Eventually we will and eventually we will move to git. I'm still in no hurry. - mark On Sun, May 31, 2015 at 9:59 AM Uwe Schindler u...@thetaphi.de wrote: I also clone my SVN working copy locally. After that I just switch branch. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Yonik Seeley [mailto:ysee...@gmail.com] Sent: Sunday, May 31, 2015 3:56 PM To: Solr/Lucene Dev Subject: Re: Moving to git? On Sun, May 31, 2015 at 9:31 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Personally, clone for me is 'rare', I did it once years back, and have never done it since. log, diff and others I do on a daily basis. Yep, I find I need fewer different working directories with git, but when I do want an additional copy, I just make a local copy of an existing repo since it has everything. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- - Mark about.me/markrmiller
Re: Moving to git?
I like where this is going!

I also think history of source code is very important, but not history of ‘.jar’ files that shouldn’t have been in source control in the first place. I’m fiercely negative about large binaries or ‘jar’ files that can be downloaded by the build system (e.g. ivy) in source control. And it was already mentioned that a full history (‘.jar’s and all) could be kept somewhere, more for archival purposes. That is a good compromise, I think, since “build-ability” of history should be retained (assuming it’s even still possible, given Rob’s comments) but doesn’t have to be convenient (e.g. by it being in a separate repo). +1 to that!

If we were to come up with a new git repo that doesn’t have the ‘.jar’s, it’d be good to also streamline the history prior to the big Lucene + Solr merge, because of the paths in source control where the trunk, branches, and tags lived. It appears the current repo may have been a blind git import from subversion. A hand-done process that is mindful of these things would result in a nice history. I’ve done this sorta thing once (a project at my last job) and volunteer to do it here if we can get consensus on a move to git.

~ David

On Sun, May 31, 2015 at 4:21 PM Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
> I'd like to have full consolidated history, as much as possible, connect-the-dots across whatever CVS/SVN/etc repos to the extent maximally permitted by law, as Doug hints at. Just nuke the jars.
>
> I've done this (CVS-SVN-GIT) before. It wasn't that difficult. Eventually (for git) you script it and it gets version after version from CVS or SVN and appends it to git. I admit I didn't care much about svn merging infos though.
>
> Any files can be removed/pruned by rewriting git trees before they're published.
>
> Dawid
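For anyone who wants to see what "rewriting git trees" to drop the jars could look like, here is a minimal sketch. It is only an illustration under stated assumptions: it uses `git filter-branch` (nobody in this thread named a specific tool for this step), it runs against a throwaway repository created with `mktemp`, and the file names and commit identity are made up.

```shell
#!/bin/sh
# Sketch: strip *.jar files from every commit of a throwaway repository.
# The repository, file names and identity below are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.invalid"   # placeholder identity
git config user.name "Example Dev"

# Two commits: one that checks in a jar, one that touches only source.
mkdir lib src
printf 'binary-ish payload\n' > lib/example-1.0.jar
printf 'class A {}\n' > src/A.java
git add . && git commit -q -m "Add source and a jar"
printf 'class B {}\n' > src/B.java
git add . && git commit -q -m "Add more source"

# Rewrite the index of every commit, dropping any path that ends in .jar.
# FILTER_BRANCH_SQUELCH_WARNING only silences the advisory in newer git.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch -- "*.jar"' \
  -- --all >/dev/null 2>&1

# Count jar paths and source paths still listed in the rewritten history.
jars_left=$(git log --name-only --pretty=format: HEAD | grep -c '\.jar$' || true)
sources_left=$(git log --name-only --pretty=format: HEAD | grep -c '\.java$' || true)
echo "jars_left=$jars_left sources_left=$sources_left"
```

The rewritten commits get new hashes, which is why this kind of pruning has to happen before the repository is published, as Dawid says.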
Re: Moving to git?
On Sun, May 31, 2015 at 4:39 PM, david.w.smi...@gmail.com david.w.smi...@gmail.com wrote:
> If we were to come up with a new git repo that doesn’t have the ‘.jar’s, it’d be good to also streamline the history prior to the big Lucene + Solr merge due to the paths in source control as to where the trunk, branches, and tags lived. It appears the current repo may have been a blind git import from subversion. A hand-done process that is mindful of these things would result in a nice history. I’ve done this sorta thing once (a project at my last job) and volunteer to do it here if we can get consensus on a move to git.

The current Git history is totally broken. This is a complete dealbreaker from my perspective, if it's indicative of what an svn -> git conversion will produce.

Look at CheckIndex.java history in git:
https://github.com/apache/lucene-solr/commits/trunk/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java?page=5

It stops at Feb 7, 2012. In subversion it goes back to 2007, to the original issue where Mike added CheckIndex:
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java?view=log
Re: Moving to git?
Having to agree on mechanics certainly is a downside of Git.

There is only one good rule though - no merge commits in the history :) Ever. Do whatever you want beyond that. A clean, simple history for each branch is the only sensible use of Git I've seen.

- Mark

On Sat, May 30, 2015 at 9:00 AM Adrien Grand jpou...@gmail.com wrote:
> The main benefit I see is that external contributors would get their name in the commit log. However on the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on.
>
> -0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well.
>
> --
> Adrien

--
- Mark
about.me/markrmiller
Re: Moving to git?
Regarding history, if we switch to git, our history will remain in svn. Even if the branches are deleted, the history and old revisions are still there.

Upayavira

On Sun 2015, at 10:48 PM, Mark Miller wrote:
> Having to agree on mechanics certainly is a downside of Git.
>
> There is only one good rule though - no merge commits in the history :) Ever. Do whatever you want beyond that. A clean, simple history for each branch is the only sensible use of Git I've seen.
>
> - Mark
Re: Moving to git?
Hmmm. I pulled up this file in IntelliJ in my git checkout and viewed the history. It went back to March 17th 2010 (earlier than the 2012 you found) with git hash 3ee0ace1ba6b9bff3ffaa278c0bba07e6064057d and a commit message of:

git-svn-id: https://svn.apache.org/repos/asf/lucene/solr/branches/newtrunk@924483 13f79535-47bb-0310-9956-ffa450edef68

All files were added in that commit; it's the earliest commit in this git repo. This is the kind of thing I should be able to fix if I build a repo manually.

Side note: I used to be able to see the commands IntelliJ gave to git, but I don’t see it in the latest EAP anyways. I was wondering if it passed an option to git log like --find-renames=40% to be more aggressive in its rename detection.

On Sun, May 31, 2015 at 6:57 PM Robert Muir rcm...@gmail.com wrote:
> And here is IndexWriter with initial revision in 2001, but again git still only stops at Feb 7, 2012.
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java?view=log
>
> Revision 149570
> Added Tue Sep 18 16:29:48 2001 UTC (13 years, 8 months ago) by jvanzyl
> Original Path: lucene/java/trunk/src/java/org/apache/lucene/index/IndexWriter.java
> File length: 15076 byte(s)
> Initial revision
>
> So subversion history looks pretty complete. If we can add other history from sourceforge, fantastic, but there isn't so much going on there. It is git that is totally broken here with respect to history.
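To make the rename-detection point concrete, here is a small sketch of how a path-limited `git log` differs from one that follows renames. It runs in a throwaway repository; the directory names, file contents and commit identity are invented for the illustration, and whether IntelliJ passes anything like this is not something the thread confirms.

```shell
#!/bin/sh
# Sketch: history of a moved file, with and without rename following.
# Paths and identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.invalid"   # placeholder identity
git config user.name "Example Dev"

# Commit 1: the file is created under its old path.
mkdir -p old/path
printf 'line1\nline2\nline3\nline4\nline5\n' > old/path/CheckIndex.java
git add . && git commit -q -m "Add CheckIndex"

# Commit 2: the file is moved and edited in the same commit.
mkdir -p new/path
git mv old/path/CheckIndex.java new/path/CheckIndex.java
printf 'line6\n' >> new/path/CheckIndex.java
git commit -q -am "Move CheckIndex and edit it"

# A plain path-limited log only sees commits that touch the new path.
plain=$(git log --oneline -- new/path/CheckIndex.java | wc -l | tr -d ' ')

# --follow continues across the rename; --find-renames=40% lowers the
# similarity threshold used to decide that a delete+add pair is a rename.
followed=$(git log --oneline --follow --find-renames=40% \
  -- new/path/CheckIndex.java | wc -l | tr -d ' ')

echo "plain=$plain followed=$followed"
```

Note that `--follow` can only cross renames that exist as commits in the repository. If the import starts with one commit that adds every file, as described above, there is nothing earlier for it to follow.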
Re: Moving to git?
I'm all for a small download size in all things, but personally, I download Git repos for a project about 1/20th as often as I download svn checkouts (one of the things I prefer about my Git usage) and I have fast internet. Not a sore spot here.

- Mark

On Sun, May 31, 2015 at 5:38 PM Steve Davids sdav...@gmail.com wrote:
> There are also some rather large '.dat' files in the history. I found this by running a job to delete all blobs larger than 5MB from the history via:
>
> $ java -jar ~/Downloads/bfg-1.12.3.jar --strip-blobs-bigger-than 5M --protect-blobs-from trunk,branch_5x,branch_4x lucene-solr-mirror
>
> Deleted files
> -------------
>
> Filename                       | Git id
> -------------------------------+------------------------------------------------------------
> DoubleArrayTrie.dat            | 8babf9fa (16.8 MB), f3bfe15b (16.8 MB), ...
> TokenInfoDictionary$buffer.dat | 25938b37 (7.0 MB), 7f02420f (7.1 MB), ...
> TokenInfoDictionary$trie.dat   | 69e76d64 (16.8 MB)
> dat.dat                        | 7445d1c8 (16.0 MB), 79bd7c8b (16.8 MB), 37a215e5 (16.8 MB)
> europarl.lines.txt.gz          | e0366f10 (5.5 MB)
> tid.dat                        | 5a1e6199 (24.9 MB), 996d3fc5 (28.1 MB), ...
> tid_map.dat                    | 690fbea5 (6.3 MB), c1c01405 (6.3 MB), 7a8c1420 (6.4 MB)
> wiki_results.txt               | db9e9294 (19.8 MB), 52ff9357 (19.8 MB), ...
> wiki_sentence.txt              | 3a38f62e (19.0 MB)
>
> Dropping just those files reduced the repo by 50M; overall size is 131MB.
>
> Note: there is one file larger than 5MB still in the trunk:
> * commit df1e3b32 (protected by 'trunk') - contains 1 dirty file :
>   - lucene/test-framework/src/resources/org/apache/lucene/util/europarl.lines.txt.gz (5.5 MB)
>
> Also, I failed to provide the numbers on what `git reflog expire --expire=now --all && git gc --prune=now --aggressive` gives on a fresh mirror checkout: it results in a repo size of 320M. So, dropping the old jars saves 120MB.
>
> -Steve

--
- Mark
about.me/markrmiller
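Before running something like BFG, it can help to see which blobs in the history are actually large. The following is a rough sketch using only stock git plumbing, run against a throwaway repository; the file names, the 200000-byte size and the commit identity are invented for the illustration, and this is not the procedure Steve used.

```shell
#!/bin/sh
# Sketch: find the largest blob anywhere in history, then repack.
# Repository contents and identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.invalid"   # placeholder identity
git config user.name "Example Dev"

# A "large" file that is committed and later deleted still lives in history.
head -c 200000 /dev/urandom > big.dat
git add big.dat && git commit -q -m "Add big.dat"
git rm -q big.dat && git commit -q -m "Remove big.dat"
printf 'small\n' > README.txt
git add README.txt && git commit -q -m "Add README"

# rev-list prints "<sha> <path>" for every object reachable from any ref;
# cat-file adds type and size, and %(rest) carries the path through.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -rn \
  | head -n 1)
echo "largest blob: $largest"

# The cleanup pair quoted above: drop reflog entries, then repack and
# prune unreachable objects. Here nothing becomes unreachable, so this
# only demonstrates the invocation.
git reflog expire --expire=now --all && git gc -q --prune=now --aggressive
```

Deleting a file in a later commit does not shrink the repository; only rewriting the commits that introduced it, followed by the expire-and-gc step, reclaims the space.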
Re: Moving to git?
And here is IndexWriter with initial revision in 2001, but again git still only stops at Feb 7, 2012.
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java?view=log

Revision 149570
Added Tue Sep 18 16:29:48 2001 UTC (13 years, 8 months ago) by jvanzyl
Original Path: lucene/java/trunk/src/java/org/apache/lucene/index/IndexWriter.java
File length: 15076 byte(s)
Initial revision

So subversion history looks pretty complete. If we can add other history from sourceforge, fantastic, but there isn't so much going on there. It is git that is totally broken here with respect to history.

On Sun, May 31, 2015 at 6:47 PM, Robert Muir rcm...@gmail.com wrote:
> The current Git history is totally broken. This is a complete dealbreaker from my perspective, if it's indicative of what an svn -> git conversion will produce.
>
> Look at CheckIndex.java history in git:
> https://github.com/apache/lucene-solr/commits/trunk/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java?page=5
>
> It stops at Feb 7, 2012. In subversion it goes back to 2007, to the original issue where Mike added CheckIndex:
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java?view=log
Re: Moving to git?
I have no dog in the svn vs git debate, honestly. I want to say how important it is to keep healthy history.

I recently went on a bit of a code archeology dig to figure out why something in Lucene was done the way it was. It was handy that the history went as far back as it did, but I had to switch around to different places to continue the history. For example, the abrupt shift that seems to be around when Solr/Lucene were put together had me digging for the last pure lucene tag. It's over at lucene/java/branches, NOT lucene/dev/tags with the other tags. Then when you get to the branch for lucene-101, the first commit is:

2001: New repository initialized by cvs2svn.

Unable to find a cvs repo, my hunt stopped (love to hear if anyone has a CVS repo -- maybe from Jakarta?)

So removing some jars isn't a big deal. But cutting off history and restarting at some arbitrary point can be annoying and make it harder to dig up more about why things are the way they are.

/steps down from soapbox

-Doug

On Sunday, May 31, 2015, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
> Yeah, but it misses the point -- history is history, if there were jars in it, you shouldn't just strip them, it'd be confusing.
>
> How was it back when Lucene was merging with Solr? Didn't it just initiate with a new clean repo? Maybe not all of the history is really needed -- if we limited ourselves to, say, all of the history that includes ivy then the size of the repo would drop significantly... but again, to me size doesn't really matter at all; one initial clone is no-cost. Go make yourself a cup of tea, come back and you're set.
>
> Dawid
Re: Moving to git?
+1 for git, great for working on multiple things at once. Side note: git-svn is also not great btw for the kind of merging we need to do with every commit, it kind of works but with too many caveats.

On the note that git clone is slow, sure, because it fetches a fair amount of history which svn doesn't. But to compare just them is unfair, since checkout and clone are not identical. If you want to compare times, you will also have to add up every log, diff, or annotate you do on the tree during your development (of which I certainly do a lot and I am sure others do as well), and git will certainly win if you include all those because it does no network lookup. Clone and checkout are typically one-time operations, why should their speed be a concern in any case?

> I know this has come up a few times in the past but I wanted to bring this up again.
>
> The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more RAM. Forcing a sync causes a fork-bomb:
>
> Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517.
>
> They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop-gap arrangement, they attached a volume to the instance and are using it as swap, to work around the issue that adding RAM requires a restart.
>
> FAQ: How would the memory requirement change if we moved to git instead of mirroring?
> Answer: svn -> git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner.
>
> I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc.
>
> Opinions?
>
> --
> Anshum Gupta
Re: Moving to git?
You guys totally miss the point on clone. The thing is that svn checkout gives you enough to do what you need to do. And yes, it does a network lookup for rarer things like history, but this works just fine in general.

On the other hand, git downloads gigabytes before you can even get started. Something needs to be done about all those jars in the source history. I will not let this go.

On Sun, May 31, 2015 at 9:16 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:
> +1 for git, great for working on multiple things at once. Side note: git-svn is also not great btw for the kind of merging we need to do with every commit, it kind of works but with too many caveats.
>
> On the note that git clone is slow, sure, because it fetches a fair amount of history which svn doesn't. But to compare just them is unfair, since checkout and clone are not identical. If you want to compare times, you will also have to add up every log, diff, or annotate you do on the tree during your development (of which I certainly do a lot and I am sure others do as well), and git will certainly win if you include all those because it does no network lookup. Clone and checkout are typically one-time operations, why should their speed be a concern in any case?
Re: Moving to git?
Personally, clone for me is 'rare', I did it once years back, and have never done it since. log, diff and others I do on a daily basis. Same with svn as well actually, you checkout just once usually.

I think the previous discussion had the agreement that this issue should focus on committers rather than contributors. And committers by definition aren't getting started with Solr. If you want to make things more flexible and faster for contributors, sure, the github mirror provides an svn facade which allows you to check out a subversion wc from its repos for read/write (write's not that useful for us though since github is not the primary repo).

On 31 May 2015 14:25, Robert Muir rcm...@gmail.com wrote:
> You guys totally miss the point on clone. The thing is that svn checkout gives you enough, to do what you need to do. And yes it does network lookup for more rare things like history, but this works just fine in general.
>
> On the other hand git downloads gigabytes, before you can even get started. Something needs to be done about all those jars in the source history, I will not let this go.
Re: Moving to git?
On Sat, May 30, 2015 at 4:20 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
> # time git clone --depth 1 https://github.com/apache/lucene-solr.git

This breaks rule #1 of using git: don't pass any options to any of the commands, or it shits itself.

Git clone is slow; I think the reason is all the old jar files in the repository. It needs to be fixed.
Re: Moving to git?
Don't assume your workflow is everyone else's workflow. And don't try to enforce your workflow on me. I don't use svn OR git in the way you describe.

On Sun, May 31, 2015 at 9:31 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:
> Personally, clone for me is 'rare', I did it once years back, and have never done it since. log, diff and others I do on a daily basis. Same with svn as well actually, you checkout just once usually.
>
> I think the previous discussion had the agreement that this issue should focus on committers rather than contributors. And committers by definition aren't getting started with Solr. If you want to make things more flexible and faster for contributors, sure, the github mirror provides an svn facade which allows you to check out a subversion wc from its repos for read/write (write's not that useful for us though since github is not the primary repo).
RE: Moving to git?
I also clone my SVN working copy locally. After that I just switch branch.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Sunday, May 31, 2015 3:56 PM
> To: Solr/Lucene Dev
> Subject: Re: Moving to git?
>
> On Sun, May 31, 2015 at 9:31 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote:
> > Personally, clone for me is 'rare', I did it once years back, and have never done it since. log, diff and others I do on a daily basis.
>
> Yep, I find I need fewer different working directories with git, but when I do want an additional copy, I just make a local copy of an existing repo since it has everything.
>
> -Yonik
Re: Moving to git?
On Sun, May 31, 2015 at 9:31 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Personally, clone for me is 'rare', I did it once years back, and have never done it since. log, diff and others I do on a daily basis. Yep, I find I need fewer different working directories with git, but when I do want an additional copy, I just make a local copy of an existing repo since it has everything. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
+1 to moving to git. I am not going to attempt to convince those stubborn types that want to stick to SVN. I use git and svn and git simply works better for me. I just want to explain something, because there seems to be a misunderstanding.

time git clone git://git.apache.org/lucene-solr.git test.git

These are all apples-to-oranges comparisons. You're fetching the entire history of all commits. SVN just fetches one branch. I typically fetch more than one branch when I work and it takes (N * svn checkout) times to do so (and no, switch is not much faster). For your comparison to be fairer, you should be comparing git clone to:

time svn co https://svn.apache.org/repos/asf/lucene/dev

I don't know what the actual time of this is or how much disk space it takes. Yes, it is an insane command but it's really an equivalent of having a git clone locally...

Also, even if the initial clone takes 9 minutes (which I think is Apache's git server being dog slow), you can always fetch from any other mirror. It's still the same repository, with all the commits, tags, hashes, etc. Or you can fetch just the latest revision with --depth 1. Many options out there [shrug].

If I ever reach a point where I am working on multiple code trees, I expect that I will have them in separate directories because that will help me keep them straight.

Once you start working with git you probably won't bother having separate folders. Simply because this thing is super helpful (and fast) -- it aggregates all your branch commits into a single patch:

git diff my-branch..origin/master

Again, I do this with SVN too (against a remote URL) and it takes a looong time, every time I do it. A single initial checkout time is not enough to assess overall tool productivity...

Finally, even if your taste is to have separate folders, remember you only need one clone of the repo, ever (and you only pull new commits later on). Managing a different branch then becomes:

cd ..
cp -R lucene-master lucene-my-branch
cd lucene-my-branch
git co master -b my-branch
git push origin HEAD -u

done, you're set. Takes 3 seconds. Longer than a remote svn copy, at least from Europe...

Dawid
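A minimal sketch of the "one clone, several folders" idea above, assuming a throwaway repository stands in for the real lucene-master clone. Instead of cp -R it runs git clone against the local path, which reads the objects from disk rather than from the network. The directory and branch names come from the message; the file name and commit identity are made up for illustration.

```shell
set -eu
# Throwaway identity so the example does not depend on any global git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)

# Stand-in for the single clone you already have on disk.
git init -q "$work/lucene-master"
git -C "$work/lucene-master" symbolic-ref HEAD refs/heads/master
echo "hello" > "$work/lucene-master/README.txt"
git -C "$work/lucene-master" add README.txt
git -C "$work/lucene-master" commit -q -m "initial commit"

# Second working directory, created from the local clone (no network involved).
git clone -q "$work/lucene-master" "$work/lucene-my-branch"
git -C "$work/lucene-my-branch" checkout -q -b my-branch

branch=$(git -C "$work/lucene-my-branch" rev-parse --abbrev-ref HEAD)
commits=$(git -C "$work/lucene-my-branch" rev-list --count HEAD)
echo "branch=$branch commits=$commits"
```

One difference from cp -R: the new directory's origin is the local clone it was made from, so in a real setup you would probably repoint it with git remote set-url before pushing.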
Re: Moving to git?
Did this, out of curiosity (from a server in the U.S.):

# time git clone https://github.com/apache/lucene-solr.git
...
Receiving objects: 100% (563630/563630), 472.01 MiB | 10.46 MiB/s, done.
real 1m13.049s
user 0m46.000s
sys  0m10.060s

# time git clone --depth 1 https://github.com/apache/lucene-solr.git
...
Receiving objects: 100% (9507/9507), 37.40 MiB | 9.84 MiB/s, done.
real 0m7.814s
user 0m2.550s
sys  0m1.110s

# time svn co https://svn.apache.org/repos/asf/lucene/dev/trunk test.svn
...
Checked out revision 1682650.
real 0m34.526s
user 0m12.460s
sys  0m10.320s

As you can see everything is relative...

Dawid
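To see what --depth 1 changes without needing the network, here is a small sketch that builds a three-commit repository locally and clones it both ways. The repository, file name and commit messages are hypothetical. The file:// URL is used because git ignores --depth when cloning from a plain local path.

```shell
set -eu
# Throwaway identity so the example does not depend on any global git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
src="$work/full"

# Source repository with three commits of history.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  echo "change $i" > "$src/file.txt"
  git -C "$src" add file.txt
  git -C "$src" commit -q -m "commit $i"
done

# Shallow clone: only the most recent commit is transferred.
git clone -q --depth 1 "file://$src" "$work/shallow"

full_count=$(git -C "$src" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full_count commits, shallow clone: $shallow_count commit(s)"
```

The working tree in the shallow clone is the same as in a full clone; only the history behind it is truncated, which is why the transfer above drops from 472 MiB to 37 MiB.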
Re: Moving to git?
As I've mentioned multiple times, I think git is super useful when working on multiple unrelated things that affect the same files. What I'd been doing so far with svn is, creating multiple physical directories (checkouts) and working on them, and tracking them, cleaning them up, and deleting them when done. With git, I wouldn't have to do any of that, letting me spend more time on building/fixing things than just managing my changes. On Fri, May 29, 2015 at 9:45 PM, Walter Underwood wun...@wunderwood.org wrote: I’m not a committer, but I’ve built production code with a lot of source control systems and git is by far the the most cumbersome. It does one thing well, handling untrusted contributors. With trusted committers, Subversion is very nice, thank you. Here are the systems I’ve used. * SCCS * RCS * HP history manager * ClearCase * CVS * Perforce * Subversion * git wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 8:58 PM, Ishan Chattopadhyaya ichattopadhy...@gmail.com wrote: Life is so much easier on long train/plane journeys with Git. +1. On Sat, May 30, 2015 at 9:21 AM, Shai Erera ser...@gmail.com wrote: +1 to moving to git. Shai On May 30, 2015 6:24 AM, Anshum Gupta ans...@anshumgupta.net wrote: * There may be other good reasons for using git, but this is not one.* I just added one more to the list. I think most other reasons have already been spoken about in previous discussions. I'm not trying to debate on what is better (I think it's a lot to do with *opinion*). I think it's a reasonable thing to move to a system that allows for distributed version control and makes working on multiple things at the same time easy. But again, that's my thought. The last time the discussion came up, I was +1 to moving and wasn't already using it a lot. Right now, I'm just trying to work on multiple things and find git easier for that purpose. 
I just wanted to bring this back up and see if the opinion of active contributors has changed since the last time by means of a polite and friendly discussion. In the end, we can agree to disagree but it'd be better than not discussing at all. :-) On Fri, May 29, 2015 at 7:24 PM, Walter Underwood wun...@wunderwood.org wrote: There may be other good reasons for using git, but this is not one. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote: On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote: “git breaks when it tries to mirror” is not a convincing argument for moving to git. I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Anshum Gupta -- Anshum Gupta
Re: Moving to git?
Walter Underwood wun...@wunderwood.org wrote: I’m not a committer, but I’ve built production code with a lot of source control systems and git is by far the most cumbersome.

I am not a committer and I have built production code with very few source control systems: CVS, SVN and GIT. GIT is the one I dislike the least; occasionally I even find it quite neat.

It does one thing well, handling untrusted contributors. With trusted committers, [...]

So is this a question of optimizing towards committers or contributors? Ease of use for the core people or more openness towards outside contributions?

- Toke Eskildsen
Re: Moving to git?
The commit then push workflow *is* what allows you to maintain multiple local branches without the nightmare of having multiple checkouts, and managing them. I don't think that moving to GIT has anything to do with Github. With SVN, we also don't have any nice user interface and/or pull requests, so the lack of one in GIT is irrelevant in my opinion. If we consider moving to GIT, we might also want to look at using Gerrit for our code reviews and patch submissions. It makes the code review aspect much easier, nicer and helpful. Shai On Sat, May 30, 2015 at 12:36 PM, Uwe Schindler u...@thetaphi.de wrote: Hi, I think most people say that GIT is easier or better to use because they combine in their mind using „GIT“ with „the Github user interface“. This is indeed very nice to have - I (for myself) am also very happy with using Github, as long as it keeps simple (you only have users from Github sending you pull requests, if you only need to push to one location,…). On the other hand, I always get annoyed that you cannot do the same like “svn update” or “svn commit” in one turn. You always have to pull first and then update or vice versa first commit then push. If you are online there is no reason to have this separated, especially if you are a “committer”. If you are contributor, that fine – because you cannot push, but for committers this is what subversion people like me hate. And all these additional steps are not useful to a “centralized” infrastructure like ASF. As said before, at the ASF, we don’t get the Github interface as “main primary user interface”, because the “committers” have to use the official ASF git installation to push. So we are still not able to easily handle pull requests on github and so on. So there is no useful thing (except the mentioned: no longer need to have multiple checkouts locally) with GIT. The big backside is: without the Github Web interface, GIT is unuseable to me, sorry. 
The command line is a disaster, technical concepts behind GIT are a disaster; everything is a disaster J I would plus one to use Git, if we would solely use “GitHub” as central repository (like Elasticsearch), but with having the “central infrastructure” at the ASF: Clear -1 !!! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de *From:* Anshum Gupta [mailto:ans...@anshumgupta.net] *Sent:* Friday, May 29, 2015 11:08 PM *To:* dev@lucene.apache.org *Subject:* Moving to git? I know this has come up a few times in the past but I wanted to bring this up again. The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more ram. Forcing a sync causes a fork-bomb: Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517. They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop gap arrangement attached a volume to the instance and are using it as swap to work around the adding RAM requires restart issue. FAQ: How would the memory requirement change if we moved to git instead of mirroring? Answer: svn - git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner. I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc. Opinions? -- Anshum Gupta
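The trade-off discussed here (separate commit and push steps in exchange for several local branches in one working directory) can be sketched with a local bare repository standing in for the central server. All names below (central.git, dev, fix-a, fix-b) are hypothetical. It only illustrates the mechanics and does not suggest a workflow for the project.

```shell
set -eu
# Throwaway identity so the example does not depend on any global git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
central="$work/central.git"   # plays the role of the shared server
dev="$work/dev"               # the single working directory

git init -q --bare "$central"
git init -q "$dev"
git -C "$dev" symbolic-ref HEAD refs/heads/master
git -C "$dev" remote add origin "$central"

echo "base" > "$dev/base.txt"
git -C "$dev" add base.txt
git -C "$dev" commit -q -m "base"
git -C "$dev" push -q origin master

# Two unrelated pieces of work, each committed locally on its own branch.
git -C "$dev" checkout -q -b fix-a master
echo "a" > "$dev/a.txt"
git -C "$dev" add a.txt
git -C "$dev" commit -q -m "work on fix-a"

git -C "$dev" checkout -q -b fix-b master
echo "b" > "$dev/b.txt"
git -C "$dev" add b.txt
git -C "$dev" commit -q -m "work on fix-b"

# Only fix-a is published; fix-b stays local until it is pushed.
git -C "$dev" push -q origin fix-a

local_branches=$(git -C "$dev" for-each-ref --format='%(refname:short)' refs/heads | wc -l | tr -d ' ')
shared_branches=$(git -C "$central" for-each-ref --format='%(refname:short)' refs/heads | wc -l | tr -d ' ')
echo "local branches: $local_branches, branches on the central repo: $shared_branches"
```

The extra push step is what lets fix-b exist as committed, versioned work without anyone else seeing it, which is the part svn commit cannot express.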
Re: Moving to git?
bq. I don't think that moving to GIT has anything to do with Github. I think that's a common misconception by people that don't 'get' Git. Most people could care less about Git as it pertains to GitHub except for one thing - it provides a nice central master repo that is hosted that you can push to. That other stuff on GitHub is whatever. Candy, fluff, whatever. The power of and ease of Git vs svn is Git, not GitHub. svn feels like 1980, Git feels like 1990. - Mark On Sat, May 30, 2015 at 7:22 AM Shai Erera ser...@gmail.com wrote: The commit then push workflow *is* what allows you to maintain multiple local branches without the nightmare of having multiple checkouts, and managing them. I don't think that moving to GIT has anything to do with Github. With SVN, we also don't have any nice user interface and/or pull requests, so the lack of one in GIT is irrelevant in my opinion. If we consider moving to GIT, we might also want to look at using Gerrit for our code reviews and patch submissions. It makes the code review aspect much easier, nicer and helpful. Shai On Sat, May 30, 2015 at 12:36 PM, Uwe Schindler u...@thetaphi.de wrote: Hi, I think most people say that GIT is easier or better to use because they combine in their mind using „GIT“ with „the Github user interface“. This is indeed very nice to have - I (for myself) am also very happy with using Github, as long as it keeps simple (you only have users from Github sending you pull requests, if you only need to push to one location,…). On the other hand, I always get annoyed that you cannot do the same like “svn update” or “svn commit” in one turn. You always have to pull first and then update or vice versa first commit then push. If you are online there is no reason to have this separated, especially if you are a “committer”. If you are contributor, that fine – because you cannot push, but for committers this is what subversion people like me hate. 
And all these additional steps are not useful to a “centralized” infrastructure like ASF. As said before, at the ASF, we don’t get the Github interface as “main primary user interface”, because the “committers” have to use the official ASF git installation to push. So we are still not able to easily handle pull requests on github and so on. So there is no useful thing (except the mentioned: no longer need to have multiple checkouts locally) with GIT. The big backside is: without the Github Web interface, GIT is unuseable to me, sorry. The command line is a disaster, technical concepts behind GIT are a disaster; everything is a disaster J I would plus one to use Git, if we would solely use “GitHub” as central repository (like Elasticsearch), but with having the “central infrastructure” at the ASF: Clear -1 !!! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de *From:* Anshum Gupta [mailto:ans...@anshumgupta.net] *Sent:* Friday, May 29, 2015 11:08 PM *To:* dev@lucene.apache.org *Subject:* Moving to git? I know this has come up a few times in the past but I wanted to bring this up again. The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more ram. Forcing a sync causes a fork-bomb: Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517. They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop gap arrangement attached a volume to the instance and are using it as swap to work around the adding RAM requires restart issue. FAQ: How would the memory requirement change if we moved to git instead of mirroring? Answer: svn - git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner. 
I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc. Opinions? -- Anshum Gupta -- - Mark about.me/markrmiller
Re: Moving to git?
A git clone is just too slow right now the way it's set up. So what will be done to fix that? Currently, svn is way faster in the worst case. In the time it takes to git clone, I can do 10 svn checkouts.

I sometimes use git, but usually when working on software, I don't work on trivial things. I do nontrivial stuff and sometimes shit breaks, including git workspaces and including git itself. On average do I get 10 branches per clone()? Not sure.

I don't know if it's because there used to be JAR files in the source trees back in the day or what, but fixing this is really mandatory. Remember for a new developer it's also something they must always do, so it's not just when 'git fucks up for me', it is a slow operation that occasionally people must deal with. And it's just too slow.

git clone:
Cloning into 'lucene-solr'...
remote: Counting objects: 563630, done.
remote: Compressing objects: 100% (136240/136240), done.
remote: Total 563630 (delta 329023), reused 539045 (delta 304604)
Receiving objects: 100% (563630/563630), 356.90 MiB | 1.79 MiB/s, done.
Resolving deltas: 100% (329023/329023), done.
Checking connectivity... done.
real 3m43.157s
user 0m38.243s
sys  0m10.480s

svn co:
Checked out revision 1682598.
real 0m34.713s
user 0m5.504s
sys  0m4.170s

PS don't tell me I don't know how to use git correctly, I don't care to hear your religious arguments. A clone is still the worst case operation, so it must be supported. It's also the first thing someone must do, if they want to work with the codebase. And with git right now, it's broken (too slow) for lucene-solr.

On Fri, May 29, 2015 at 5:07 PM, Anshum Gupta ans...@anshumgupta.net wrote: I know this has come up a few times in the past but I wanted to bring this up again. The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more ram.
Forcing a sync causes a fork-bomb: Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517. They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop gap arrangement attached a volume to the instance and are using it as swap to work around the adding RAM requires restart issue. FAQ: How would the memory requirement change if we moved to git instead of mirroring? Answer: svn - git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner. I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc. Opinions? -- Anshum Gupta - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
The main benefit I see is that external contributors would get their name in the commit log. However on the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on. -0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
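One of the workflows mentioned above (rebasing instead of merging) can be sketched as follows. It replays a feature branch on top of master and then fast-forwards, so no merge commit is created. The repository, branch and file names are hypothetical. This is only one of the options listed in the message and is not a recommendation for the project.

```shell
set -eu
# Throwaway identity so the example does not depend on any global git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
repo="$work/repo"

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo "base" > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "base"

# Work happens on a feature branch...
git -C "$repo" checkout -q -b feature
echo "feature" > "$repo/feature.txt"
git -C "$repo" add feature.txt
git -C "$repo" commit -q -m "feature work"

# ...while master moves on independently.
git -C "$repo" checkout -q master
echo "other" > "$repo/other.txt"
git -C "$repo" add other.txt
git -C "$repo" commit -q -m "unrelated change on master"

# Replay the feature commit on top of master, then fast-forward master to it.
git -C "$repo" checkout -q feature
git -C "$repo" rebase -q master
git -C "$repo" checkout -q master
git -C "$repo" merge -q --ff-only feature

merges=$(git -C "$repo" rev-list --merges --count HEAD)
total=$(git -C "$repo" rev-list --count HEAD)
echo "commits on master: $total, merge commits: $merges"
```

Using --ff-only makes the merge step fail instead of silently creating a merge commit when the branch has not been rebased, which is one way to keep an agreed linear-history rule honest.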
RE: Moving to git?
Hi, I think most people say that GIT is easier or better to use because they combine in their mind using „GIT“ with „the Github user interface“. This is indeed very nice to have - I (for myself) am also very happy with using Github, as long as it keeps simple (you only have users from Github sending you pull requests, if you only need to push to one location,…). On the other hand, I always get annoyed that you cannot do the same like “svn update” or “svn commit” in one turn. You always have to pull first and then update or vice versa first commit then push. If you are online there is no reason to have this separated, especially if you are a “committer”. If you are contributor, that fine – because you cannot push, but for committers this is what subversion people like me hate. And all these additional steps are not useful to a “centralized” infrastructure like ASF. As said before, at the ASF, we don’t get the Github interface as “main primary user interface”, because the “committers” have to use the official ASF git installation to push. So we are still not able to easily handle pull requests on github and so on. So there is no useful thing (except the mentioned: no longer need to have multiple checkouts locally) with GIT. The big backside is: without the Github Web interface, GIT is unuseable to me, sorry. The command line is a disaster, technical concepts behind GIT are a disaster; everything is a disaster J I would plus one to use Git, if we would solely use “GitHub” as central repository (like Elasticsearch), but with having the “central infrastructure” at the ASF: Clear -1 !!! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de http://www.thetaphi.de/ eMail: u...@thetaphi.de From: Anshum Gupta [mailto:ans...@anshumgupta.net] Sent: Friday, May 29, 2015 11:08 PM To: dev@lucene.apache.org Subject: Moving to git? I know this has come up a few times in the past but I wanted to bring this up again. 
The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more ram. Forcing a sync causes a fork-bomb: Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517. They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop gap arrangement attached a volume to the instance and are using it as swap to work around the adding RAM requires restart issue. FAQ: How would the memory requirement change if we moved to git instead of mirroring? Answer: svn - git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner. I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc. Opinions? -- Anshum Gupta
Re: Moving to git?
On 5/30/2015 6:59 AM, Adrien Grand wrote: The main benefit I see is that external contributors would get their name in the commit log. However on the other hand, I'm a bit annoyed that people easily disagree on the workflow: some people merge into the maintenance branch first and then to master, other people merge into master first and then cherry-pick, other people prefer rebasing instead of merging, etc. I personally don't really care but if we agree on moving to Git, I hope we can agree on the workflow at the same time. At least today with svn we have something simple that everybody agrees on. -0: I'm not against it but Subversion works well for me today. If everybody else agrees on switching to Git I would like us to agree on the workflow as well. -0 is my vote as well, as long as we take Adrien's advice about the workflow. The normal workflow must be thoroughly documented if we are going to change our version control tool. There will likely be deviations from that workflow required, and if specific deviations are required for common-but-not-entirely-normal situations, those should be documented as well. Additional TL;DR detail about my thoughts and findings below: The impression that I have gotten from watching these religious wars over version control is that subversion is superior at absolute correctness and faithful maintenance of the version history in the face of problems, while git excels when you have a very large number of people who contribute code but only a few who have write access, or for people who work on a lot of different things in different branches. My impression may not be correct, and I'm absolutely sure that reality is a lot more complex. I was not aware of the speed differences Robert noted. 
I conducted my own timing tests on my 7 megabit DSL connection:

time git clone git://git.apache.org/lucene-solr.git test.git
real 9m20.437s
user 0m43.288s
sys  0m12.192s

time svn co https://svn.apache.org/repos/asf/lucene/dev/trunk test.svn
real 2m16.505s
user 0m6.794s
sys  0m5.267s

If I ever reach a point where I am working on multiple code trees, I expect that I will have them in separate directories because that will help me keep them straight. I gather from comments on this thread that git will let you keep them all in one repo, but I think I would get myself into trouble doing that, working on the wrong one accidentally.

There is a significant (nearly two to one) size difference between the cloned git repo and an equivalent svn checkout, although to be honest, with today's typical storage sizes, this doesn't matter all that much, unless you are maintaining separate directories for multiple git code trees, as I probably would.

elyograg@sauron:~/asf$ du -sm test.*
498 test.git
252 test.svn

I ran git gc --aggressive which reduced the repo by 55MB.

I do use git at work for the code that I write there, and for version control on configuration files for various server installations. Those repos are *nowhere* near as big as lucene-solr, though.

Thanks, Shawn
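For anyone who wants to repeat the size measurement, this sketch shows what git gc does to the object store, using git count-objects -v on a tiny made-up repository. The numbers it prints are for this toy example only and say nothing about lucene-solr. The repository and file names are assumptions.

```shell
set -eu
# Throwaway identity so the example does not depend on any global git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
repo="$work/repo"

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  echo "change $i" > "$repo/file.txt"
  git -C "$repo" add file.txt
  git -C "$repo" commit -q -m "commit $i"
done

# Before gc: every blob, tree and commit is stored as a separate loose object.
loose_before=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')

# gc packs the loose objects into a single compressed pack file.
git -C "$repo" gc -q --aggressive

loose_after=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git -C "$repo" count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"
```

A fresh clone already arrives as a pack, so on a real repository the gain from gc comes mostly from recompressing that pack, not from collecting loose objects as it does here.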
Re: Moving to git?
+1 to move to git! -Yonik On Fri, May 29, 2015 at 5:07 PM, Anshum Gupta ans...@anshumgupta.net wrote: I know this has come up a few times in the past but I wanted to bring this up again. The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more ram. Forcing a sync causes a fork-bomb: Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517. They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop gap arrangement attached a volume to the instance and are using it as swap to work around the adding RAM requires restart issue. FAQ: How would the memory requirement change if we moved to git instead of mirroring? Answer: svn - git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner. I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc. Opinions? -- Anshum Gupta - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote: “git breaks when it tries to mirror” is not a convincing argument for moving to git. I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
There may be other good reasons for using git, but this is not one. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote: On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote: “git breaks when it tries to mirror” is not a convincing argument for moving to git. I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
Life is so much easier on long train/plane journeys with Git. +1. On Sat, May 30, 2015 at 9:21 AM, Shai Erera ser...@gmail.com wrote: +1 to moving to git. Shai On May 30, 2015 6:24 AM, Anshum Gupta ans...@anshumgupta.net wrote: * There may be other good reasons for using git, but this is not one.* I just added one more to the list. I think most other reasons have already been spoken about in previous discussions. I'm not trying to debate on what is better (I think it's a lot to do with *opinion*). I think it's a reasonable thing to move to a system that allows for distributed version control and makes working on multiple things at the same time easy. But again, that's my thought. The last time the discussion came up, I was +1 to moving and wasn't already using it a lot. Right now, I'm just trying to work on multiple things and find git easier for that purpose. I just wanted to bring this back up and see if the opinion of active contributors has changed since the last time by means of a polite and friendly discussion. In the end, we can agree to disagree but it'd be better than not discussing at all. :-) On Fri, May 29, 2015 at 7:24 PM, Walter Underwood wun...@wunderwood.org wrote: There may be other good reasons for using git, but this is not one. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote: On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote: “git breaks when it tries to mirror” is not a convincing argument for moving to git. I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Anshum Gupta
Re: Moving to git?
I’m not a committer, but I’ve built production code with a lot of source control systems and git is by far the most cumbersome. It does one thing well, handling untrusted contributors. With trusted committers, Subversion is very nice, thank you.

Here are the systems I’ve used.

* SCCS
* RCS
* HP history manager
* ClearCase
* CVS
* Perforce
* Subversion
* git

wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

On May 29, 2015, at 8:58 PM, Ishan Chattopadhyaya ichattopadhy...@gmail.com wrote: Life is so much easier on long train/plane journeys with Git. +1. On Sat, May 30, 2015 at 9:21 AM, Shai Erera ser...@gmail.com wrote: +1 to moving to git. Shai On May 30, 2015 6:24 AM, Anshum Gupta ans...@anshumgupta.net wrote: There may be other good reasons for using git, but this is not one. I just added one more to the list. I think most other reasons have already been spoken about in previous discussions. I'm not trying to debate on what is better (I think it's a lot to do with *opinion*). I think it's a reasonable thing to move to a system that allows for distributed version control and makes working on multiple things at the same time easy. But again, that's my thought. The last time the discussion came up, I was +1 to moving and wasn't already using it a lot. Right now, I'm just trying to work on multiple things and find git easier for that purpose. I just wanted to bring this back up and see if the opinion of active contributors has changed since the last time by means of a polite and friendly discussion. In the end, we can agree to disagree but it'd be better than not discussing at all. :-) On Fri, May 29, 2015 at 7:24 PM, Walter Underwood wun...@wunderwood.org wrote: There may be other good reasons for using git, but this is not one.
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote: On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote: “git breaks when it tries to mirror” is not a convincing argument for moving to git. I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Anshum Gupta
Re: Moving to git?
* There may be other good reasons for using git, but this is not one.* I just added one more to the list. I think most other reasons have already been spoken about in previous discussions. I'm not trying to debate on what is better (I think it's a lot to do with *opinion*). I think it's a reasonable thing to move to a system that allows for distributed version control and makes working on multiple things at the same time easy. But again, that's my thought. The last time the discussion came up, I was +1 to moving and wasn't already using it a lot. Right now, I'm just trying to work on multiple things and find git easier for that purpose. I just wanted to bring this back up and see if the opinion of active contributors has changed since the last time by means of a polite and friendly discussion. In the end, we can agree to disagree but it'd be better than not discussing at all. :-) On Fri, May 29, 2015 at 7:24 PM, Walter Underwood wun...@wunderwood.org wrote: There may be other good reasons for using git, but this is not one. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote: On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote: “git breaks when it tries to mirror” is not a convincing argument for moving to git. I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past. -Yonik - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Anshum Gupta
Re: Moving to git?
“git breaks when it tries to mirror” is not a convincing argument for moving to git. It might be an argument for fixing the mirroring in git. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 29, 2015, at 6:03 PM, Yonik Seeley ysee...@gmail.com wrote: +1 to move to git! -Yonik On Fri, May 29, 2015 at 5:07 PM, Anshum Gupta ans...@anshumgupta.net wrote: I know this has come up a few times in the past but I wanted to bring this up again. The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people and they say that the size of the repo needs more and more ram. Forcing a sync causes a fork-bomb: Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517. They tried a few things but it's almost certain that it needs even more RAM, which still is a band-aid as they'd soon need even more RAM. Also, adding RAM involves downtime for git.a.o which needs to be planned. As a stop gap arrangement attached a volume to the instance and are using it as swap to work around the adding RAM requires restart issue. FAQ: How would the memory requirement change if we moved to git instead of mirroring? Answer: svn - git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner. I personally think git does make things easier to manage when you're working on multiple overlapping things and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to be able to work on multiple (local) branches without having to create and maintain directories like: lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc. Opinions? -- Anshum Gupta - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Moving to git?
+1 to moving to git.
Shai
On May 30, 2015 6:24 AM, Anshum Gupta ans...@anshumgupta.net wrote:
*There may be other good reasons for using git, but this is not one.*
I just added one more to the list. I think most other reasons have already been spoken about in previous discussions. I'm not trying to debate on what is better (I think it's a lot to do with *opinion*). I think it's a reasonable thing to move to a system that allows for distributed version control and makes working on multiple things at the same time easy. But again, that's my thought.
The last time the discussion came up, I was +1 to moving and wasn't already using it a lot. Right now, I'm just trying to work on multiple things and find git easier for that purpose. I just wanted to bring this back up and see if the opinion of active contributors has changed since the last time by means of a polite and friendly discussion. In the end, we can agree to disagree but it'd be better than not discussing at all. :-)
On Fri, May 29, 2015 at 7:24 PM, Walter Underwood wun...@wunderwood.org wrote:
There may be other good reasons for using git, but this is not one.
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote:
On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote:
“git breaks when it tries to mirror” is not a convincing argument for moving to git.
I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past.
-Yonik
-- Anshum Gupta
Re: Moving to git?
+1 to git. Git may not be perfect, but SVN isn’t either.
On Fri, May 29, 2015 at 11:59 PM Ishan Chattopadhyaya ichattopadhy...@gmail.com wrote:
Life is so much easier on long train/plane journeys with Git. +1.
On Sat, May 30, 2015 at 9:21 AM, Shai Erera ser...@gmail.com wrote:
+1 to moving to git.
Shai
On May 30, 2015 6:24 AM, Anshum Gupta ans...@anshumgupta.net wrote:
*There may be other good reasons for using git, but this is not one.*
I just added one more to the list. I think most other reasons have already been spoken about in previous discussions. I'm not trying to debate on what is better (I think it's a lot to do with *opinion*). I think it's a reasonable thing to move to a system that allows for distributed version control and makes working on multiple things at the same time easy. But again, that's my thought.
The last time the discussion came up, I was +1 to moving and wasn't already using it a lot. Right now, I'm just trying to work on multiple things and find git easier for that purpose. I just wanted to bring this back up and see if the opinion of active contributors has changed since the last time by means of a polite and friendly discussion. In the end, we can agree to disagree but it'd be better than not discussing at all. :-)
On Fri, May 29, 2015 at 7:24 PM, Walter Underwood wun...@wunderwood.org wrote:
There may be other good reasons for using git, but this is not one.
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
On May 29, 2015, at 6:57 PM, Yonik Seeley ysee...@gmail.com wrote:
On Fri, May 29, 2015 at 9:40 PM, Walter Underwood wun...@wunderwood.org wrote:
“git breaks when it tries to mirror” is not a convincing argument for moving to git.
I'd be +1 without that annoyance as well. As Anshum mentioned, this has come up a number of times in the past.
-Yonik
-- Anshum Gupta
Moving to git?
I know this has come up a few times in the past, but I wanted to bring it up again. The lucene-solr ASF git mirror has been behind by about a day. I was speaking with the infra people, and they say that the size of the repo needs more and more RAM. Forcing a sync causes a fork-bomb:
Can't fork: Cannot allocate memory at /usr/share/perl5/Git.pm line 1517.
They tried a few things, but it's almost certain that it needs even more RAM, which is still a band-aid as they'd soon need even more. Also, adding RAM involves downtime for git.a.o, which needs to be planned. As a stop-gap arrangement, they attached a volume to the instance and are using it as swap, to work around the fact that adding RAM requires a restart.
FAQ: How would the memory requirement change if we moved to git instead of mirroring?
Answer: svn-to-git mirroring is a weird process and has quite the memory leak. Using git directly is much cleaner.
I personally think git does make things easier to manage when you're working on multiple overlapping things, and so we should re-evaluate moving to it. I would have been fine had the mirroring worked, as all I want is a way to work on multiple (local) branches without having to create and maintain directories like lucene-solr-trunk1, lucene-solr-trunk2, or SOLR-, etc.
Opinions?
-- Anshum Gupta