Re: [fossil-users] Fossil scalability

2012-12-28 Thread Stefan Bellon
On Sat, 22 Dec, Stefan Bellon wrote:

 In order to verify whether the problem really is the timestamp, I'm
 now trying to convert just one of the four repositories using the
 method of sharing the workspace. Let's see what happens after 5000 or
 9000 revisions.

This method was quick enough. I was able to convert all 32000
revisions in a day (or so). Much quicker than the other method.

 But perhaps it's not that I'm touching the files, but that I'm
 committing from an empty workspace as I open the checkout each time
 with fossil open --keep and then doing fossil addremove.

It looks like creating the .fslckout file is what takes the time when
starting with a clean checkout for each revision. At present I'm
rsync'ing the four working copies into one fossil workspace per
revision, cleaning the workspace completely apart from the .fslckout
file, which I retain across revisions. This looks promising regarding
long-term performance.
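
A minimal sketch of the approach described above, written in Python as a stand-in for the rsync calls (the actual script is not shown in this thread). The function name merge_into_workspace and the skip rules are assumptions for illustration: files from several working copies are merged into one workspace, unchanged files are left untouched, stale files are removed, and the checkout database (.fslckout or _FOSSIL_) is kept.

```python
import filecmp
import shutil
from pathlib import Path

# Checkout databases that must survive between revisions.
KEEP = {".fslckout", "_FOSSIL_"}


def merge_into_workspace(sources, workspace):
    """Merge several working copies into one workspace.

    Returns the relative paths that were actually copied. Files whose
    content is already identical are not rewritten, so their mtime stays
    as it was.
    """
    workspace = Path(workspace)
    wanted = set()
    copied = []
    for src in map(Path, sources):
        for path in src.rglob("*"):
            rel = path.relative_to(src)
            # Skip Subversion metadata and directories themselves.
            if ".svn" in rel.parts or path.is_dir():
                continue
            wanted.add(rel)
            dest = workspace / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            if not dest.exists() or not filecmp.cmp(path, dest, shallow=False):
                shutil.copy2(path, dest)  # copy2 preserves mode and mtime
                copied.append(rel)
    # Remove everything no source provides, deepest paths first,
    # but never the checkout database.
    stale = sorted(workspace.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in stale:
        rel = path.relative_to(workspace)
        if rel.parts[0] in KEEP:
            continue
        if path.is_file() and rel not in wanted:
            path.unlink()
        elif path.is_dir() and not any(path.iterdir()):
            path.rmdir()
    return copied
```

After such a merge, fossil addremove and fossil commit would be run inside the workspace as before.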

Greetings,
Stefan

-- 
Stefan Bellon
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Fossil scalability

2012-12-27 Thread Eduardo Morras
On Fri, 21 Dec 2012 15:02:50 +0100
Stefan Bellon sbel...@sbellon.de wrote:

 On Fri, 21 Dec, Joerg Sonnenberger wrote:
 
  On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
   In total, the Subversion repositories hold over 45000 revisions.
   The first 5000 revisions were converted in a quite acceptable
   time. But then things started to slow down. At the moment (at
   revision 8150) one Fossil commit takes around half a minute.
  
  There is one issue with commits on large checkouts unrelated to
  repo-cksum and mtime-changes. The SQLite database in the checkout is
  essentially rewritten after each commit, which can be very slow with
  many files.
 
 I have set mtime-changes on and repo-cksum off, like Lluís suggested,
 but only noticed a very minor speed improvement. But now - with those
 settings still in place - I decided to take a closer look at what
 happens at commit time. Especially I paid attention to the point when
 the journal file appears, the point when the New_Version hash is
 output and the point when the journal file disappears/fossil
 terminates.
 
 I made the following observation: In the vast majority of commits, it
 takes around 15 to 18 seconds from fossil commit till the journal
 file appears. Then it takes between 4 and 10 seconds till the
 New_Version message is output and another 11 to 16 seconds till the
 journal file disappears and fossil terminates.
 
 However, there are a very few commits that work differently. They take
 the same time till the journal file appears, but then only a very few
 seconds till New_Version is displayed and again a very short time
 till fossil terminates.
 
 So I would assume that the time till the journal file appears is used
 for checksum/mtime calculation and file system performance. But the
 phases during journal lifetime seem to be the database transaction
 time.
 
 Two observations:
 
 1) It looks like fossil always takes its 15 to 20 seconds (for this
    specific project at this specific state on this specific machine!)
    till the commit actually begins. Perhaps this can be improved (as
    Subversion is faster regarding this respect in the same scenario),
    but perhaps not because checking for modifications has to work
    differently, I don't know.
 
 2) The database transaction time can vary wildly. Most of the cases it
    takes 15 to 25 seconds, in some cases however under 5 seconds.
    Looking at the specific commit data I was unable to detect any
    suspicious difference between the slow and the quick commits: It
    happens for just file modifications and for additions the same.
 

Perhaps you can use a RAM-based disk for the temporary directory.
What does iostat tell you?

Another point to keep in mind (again a "perhaps") is that the database
configuration is biased towards small or lightly used repositories. A
big or busy repository (or both) may stress the db with the current
configuration. Again, iostat and similar tools are your/our friends.

There are some pragmas that can be sent to sqlite: a bigger cache_size,
journal_mode=memory, automatic_index=off, temp_store=2, a bigger
wal_autocheckpoint (fossil default is 1, sqlite default is 1000),
etc. AFAIK, only foreign_keys=off is used.
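
As an illustration of how such pragmas are issued, here is a small sketch using Python's sqlite3 module on a scratch database. It only shows the mechanics on one connection; it does not change what Fossil does with its own connections, and the concrete values (for example the 64 MiB cache) are assumptions chosen for the example, not tuned recommendations.

```python
import sqlite3

# Illustrative values only; a negative cache_size is interpreted as KiB.
PRAGMAS = {
    "cache_size": -65536,        # roughly 64 MiB of page cache
    "journal_mode": "MEMORY",    # keep the rollback journal in RAM
    "automatic_index": "OFF",
    "temp_store": 2,             # 2 = keep temporary tables in memory
    "wal_autocheckpoint": 1000,  # only relevant in WAL mode
}


def apply_pragmas(conn, pragmas=PRAGMAS):
    """Apply each pragma, then read the settings back from the connection."""
    for name, value in pragmas.items():
        conn.execute(f"PRAGMA {name}={value}")
    return {
        name: conn.execute(f"PRAGMA {name}").fetchone()[0]
        for name in pragmas
    }
```

Reading the values back is a cheap way to confirm that a pragma was accepted, since SQLite silently ignores unknown pragma names. Note that journal_mode=MEMORY trades crash safety for speed.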

 Greetings,
 Stefan
 



[fossil-users] Fossil scalability

2012-12-21 Thread Stefan Bellon
Hi!

Previously I haven't used Fossil for very large repositories. But I
like its concept and I am thinking about migrating our 15 years of
history in four parallel Subversion repositories into one Fossil
repository.

I wrote a script to replay the commits from Subversion (at the moment
just trunk) into Fossil and I am wondering whether I will end up with
some usable state or not.

In total, the Subversion repositories hold over 45000 revisions. The
first 5000 revisions were converted in a quite acceptable time. But
then things started to slow down. At the moment (at revision 8150) one
Fossil commit takes around half a minute.

I'm not sure whether the time is spent in the file system trying to
find out which files (of the over 1 ones) to commit or in the real
database operations of Fossil. However, I cannot imagine that it's the
file system, because the same workspace is used daily to do Subversion
commits which work instantaneously.

So, is this some database inefficiency that can be solved via some
regular optimize command? Or is this just the point where Fossil is
not as scalable as other VCSs?

Just for reference, here's what Fossil's stats page of the current
repository state (the conversion is still running) is saying:

Repository Size:            151875584 bytes (151.9MB)
Number Of Artifacts:        41671 (stored as 10572 full text and 31099 delta blobs)
Uncompressed Artifact Size: 69607 bytes average, 31742157 bytes max, 2900554618 bytes (2.9GB) total
Compression Ratio:          19:1
Number Of Check-ins:        8150
Number Of Files:            11667
Number Of Wiki Pages:       0
Number Of Tickets:          0
Duration Of Project:        5649 days or approximately 15.86 years.
Project ID:                 ad001d59eb3892b9dfad405a2bd5a752a04ef448
Server ID:                  61c45d1915244bd9ff087951c9f5c6d59819d350
Fossil Version:             1.24 2012-10-22 12:48:04 [8d758d3715] (gcc-4.3.3 20081214 for GNAT Pro 6.2.2 (20090612))
SQLite Version:             2012-10-09 01:39:25 [01dc032b5b] (3.7.15)
Database Stats:             148316 pages, 1024 bytes/page, 251 free pages, UTF-8, delete mode
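
The figures on the stats page can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted above; the variable and function names are made up for the example.

```python
# Numbers copied from the stats page above.
repo_size_bytes = 151_875_584
db_pages = 148_316
page_size_bytes = 1024
uncompressed_total_bytes = 2_900_554_618


def database_size(pages, page_size):
    """Size of an SQLite file is its page count times its page size."""
    return pages * page_size


def compression_ratio(uncompressed, stored):
    """How many uncompressed bytes each stored byte represents."""
    return uncompressed / stored


size = database_size(db_pages, page_size_bytes)
ratio = compression_ratio(uncompressed_total_bytes, repo_size_bytes)
print(size == repo_size_bytes)  # page count and repository size agree
print(round(ratio))             # matches the reported 19:1
```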

I'm looking forward to hearing your experience with projects of that
size.

Greetings,
Stefan

-- 
Stefan Bellon


Re: [fossil-users] Fossil scalability

2012-12-21 Thread Joerg Sonnenberger
On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
 In total, the Subversion repositories hold over 45000 revisions. The
 first 5000 revisions were converted in a quite acceptable time. But
 then things started to slow down. At the moment (at revision 8150) one
 Fossil commit takes around half a minute.

There is one issue with commits on large checkouts unrelated to
repo-cksum and mtime-changes. The SQLite database in the checkout is
essentially rewritten after each commit, which can be very slow with
many files.

Joerg


Re: [fossil-users] Fossil scalability

2012-12-21 Thread Stefan Bellon
On Fri, 21 Dec, Joerg Sonnenberger wrote:

 On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
  In total, the Subversion repositories hold over 45000 revisions. The
  first 5000 revisions were converted in a quite acceptable time. But
  then things started to slow down. At the moment (at revision 8150)
  one Fossil commit takes around half a minute.
 
 There is one issue with commits on large checkouts unrelated to
 repo-cksum and mtime-changes. The SQLite database in the checkout is
 essentially rewritten after each commit, which can be very slow with
 many files.

I have set mtime-changes on and repo-cksum off, like Lluís suggested,
but only noticed a very minor speed improvement. But now - with those
settings still in place - I decided to take a closer look at what
happens at commit time. Especially I paid attention to the point when
the journal file appears, the point when the New_Version hash is
output and the point when the journal file disappears/fossil terminates.

I made the following observation: In the vast majority of commits, it
takes around 15 to 18 seconds from fossil commit till the journal
file appears. Then it takes between 4 and 10 seconds till the
New_Version message is output and another 11 to 16 seconds till the
journal file disappears and fossil terminates.

However, there are a very few commits that work differently. They take
the same time till the journal file appears, but then only a very few
seconds till New_Version is displayed and again a very short time
till fossil terminates.

So I would assume that the time till the journal file appears is used
for checksum/mtime calculation and file system performance. But the
phases during journal lifetime seem to be the database transaction
time.

Two observations:

1) It looks like fossil always takes its 15 to 20 seconds (for this
   specific project at this specific state on this specific machine!)
   till the commit actually begins. Perhaps this can be improved (as
   Subversion is faster regarding this respect in the same scenario),
   but perhaps not because checking for modifications has to work
   differently, I don't know.

2) The database transaction time can vary wildly. Most of the cases it
   takes 15 to 25 seconds, in some cases however under 5 seconds.
   Looking at the specific commit data I was unable to detect any
   suspicious difference between the slow and the quick commits: It
   happens for just file modifications and for additions the same.
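
The kind of measurement described above can be automated. The following is a sketch under assumptions, not the procedure actually used here: it starts a command, polls for a journal file, and reports when the file first appeared and disappeared relative to the start. SQLite names a rollback journal after its database with a "-journal" suffix; which journal to watch (repository or checkout database) is left to the caller, and the function name is hypothetical.

```python
import os
import subprocess
import time


def time_phases(cmd, journal_path, poll=0.01):
    """Run cmd and record when journal_path first appears and vanishes.

    All times are seconds since the command was started. Only the first
    appearance of the journal is tracked.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    appeared = gone = None
    while proc.poll() is None:
        exists = os.path.exists(journal_path)
        now = time.monotonic() - start
        if exists and appeared is None:
            appeared = now
        elif not exists and appeared is not None and gone is None:
            gone = now
        time.sleep(poll)
    total = time.monotonic() - start
    # If the process ended before we saw the journal vanish, use the end time.
    if appeared is not None and gone is None:
        gone = total
    return {"journal_appeared": appeared, "journal_gone": gone, "total": total}
```

A call might look like time_phases(["fossil", "commit", "-m", "msg"], ".fslckout-journal"), run from inside the checkout. Polling granularity limits the precision, which is acceptable for phases lasting several seconds.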

Greetings,
Stefan

-- 
Stefan Bellon


Re: [fossil-users] Fossil scalability

2012-12-21 Thread Richard Hipp
On Fri, Dec 21, 2012 at 8:25 AM, Joerg Sonnenberger jo...@britannica.bec.de wrote:

 On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
  In total, the Subversion repositories hold over 45000 revisions. The
  first 5000 revisions were converted in a quite acceptable time. But
  then things started to slow down. At the moment (at revision 8150) one
  Fossil commit takes around half a minute.

 There is one issue with commits on large checkouts unrelated to
 repo-cksum and mtime-changes. The SQLite database in the checkout is
 essentially rewritten after each commit, which can be very slow with
 many files.


Joerg means the _FOSSIL_ or .fslckout database that records the current
state of the check-out.  That database contains a record for each file in
the checkout, and that record gets updated for every file on a commit.  So
if you have 10 files, that means 10 UPDATEs.  But those should all
happen within a single transaction, are very small records, and so the
total update time should be under one second.  I take it you are seeing
something different?  Can you give me additional information about what you
are seeing take place on the _FOSSIL_ database so that I can try to track
down the problem?



 Joerg




-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Fossil scalability

2012-12-21 Thread Joerg Sonnenberger
On Fri, Dec 21, 2012 at 09:33:26AM -0500, Richard Hipp wrote:
 On Fri, Dec 21, 2012 at 8:25 AM, Joerg Sonnenberger jo...@britannica.bec.de
  wrote:
 
  On Fri, Dec 21, 2012 at 12:30:25PM +0100, Stefan Bellon wrote:
   In total, the Subversion repositories hold over 45000 revisions. The
   first 5000 revisions were converted in a quite acceptable time. But
   then things started to slow down. At the moment (at revision 8150) one
   Fossil commit takes around half a minute.
 
  There is one issue with commits on large checkouts unrelated to
  repo-cksum and mtime-changes. The SQLite database in the checkout is
  essentially rewritten after each commit, which can be very slow with
  many files.
 
 
 Joerg means the _FOSSIL_ or .fslckout database that records the current
 state of the check-out.  That database contains a record for each file in
 the checkout, and that record gets updated for every file on a commit. 

Correct.

 So if you have 10 files, that means 10 UPDATEs.  But those should all
 happen within a single transaction, are very small records, and so the
 total update time should be under one second.  I take it you are seeing
 something different?  Can you give me additional information about what you
 are seeing take place on the _FOSSIL_ database so that I can try to track
 down the problem?

Last time I looked at the details, it wasn't using UPDATEs, but removed
all entries at the beginning and inserted them again. That creates a
huge number of disk seeks (even within the transaction) as it ended up
rewriting something like a 40MB table. It would definitely help a lot if
only the *changed* entries were UPDATEd.
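
To make the difference concrete, here is a sketch of the two strategies on a simplified, hypothetical table (vfile_demo is not Fossil's real schema, and this is not Fossil's code). One function deletes and reinserts every row; the other writes only rows whose recorded value differs.

```python
import sqlite3


def create_table(conn):
    # Hypothetical, simplified stand-in for a per-file checkout record.
    conn.execute(
        "CREATE TABLE vfile_demo(pathname TEXT PRIMARY KEY, rid INTEGER)"
    )


def rewrite_all(conn, files):
    """Delete every row and insert the full file list again."""
    with conn:  # one transaction
        conn.execute("DELETE FROM vfile_demo")
        conn.executemany(
            "INSERT INTO vfile_demo(pathname, rid) VALUES (?, ?)",
            files.items(),
        )
    return len(files)


def update_changed(conn, files):
    """Touch only rows that are new, modified or no longer present."""
    current = dict(conn.execute("SELECT pathname, rid FROM vfile_demo"))
    changed = [(p, r) for p, r in files.items() if current.get(p) != r]
    removed = [(p,) for p in current if p not in files]
    with conn:  # one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO vfile_demo(pathname, rid) VALUES (?, ?)",
            changed,
        )
        conn.executemany("DELETE FROM vfile_demo WHERE pathname = ?", removed)
    return len(changed) + len(removed)
```

Both functions leave the table in the same state; the return value is the number of rows written, which is what drives the amount of page rewriting for a commit that touches only a few files.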

Joerg


Re: [fossil-users] Fossil scalability

2012-12-21 Thread Matt Welland
Some time ago I did experiments with large numbers of commits and large
amounts of data and I thought the fossil performance was quite acceptable.
I did see things slow down but I don't recall it being as dramatic as what
you are describing.

How does your replay script work? Are you overlapping the subversion repo
with the fossil one and doing an svn update so that fossil only sees the
files touched that actually changed? If all files get touched in the repo
that may slow things down unnecessarily. My method for doing what you are
doing was to use rsync with appropriate ignores in a flow something like
this:

In svn repo:

svn update
rsync -av --delete [some excludes here] ./ /path/to/fossil/repo/

then in fossil repo:
fsl addremove
fsl commit 

This flow ensured that only actual changes were seen by fossil. I.e.
unchanged files remain untouched.

My results:

   Note: This scenario is very unrealistic in that every file has every
   line changed hundreds of times, which creates thousands of tiny
   deltas that are time consuming to transfer.

 Repository size                        704 MB
 Number of artifacts (objects stored)   234078
 Number of check-ins                    140119
 Number of files                        239
 Data size (files in check out area)    178 MB

 Step                                  Time (s)  Comment
 Clone                                   94.609  Higher than would like, happens only once and
                                                 can always fall back to copy + sync (8 secs)
 Copy + sync                              7.690  Equivalent to clone
 Sync (no changes)                        1.630
 Update (no change, no sync)              0.004  Autosync turned off
 Open                                     7.060
 Checkin (move one line in one file)      5.440  Autosync turned off
 Sync (one change)                        2.500
 Revert (all files touched)               6.360






Re: [fossil-users] Fossil scalability

2012-12-21 Thread Stefan Bellon
On Fri, 21 Dec, Matt Welland wrote:

 How does your replay script work? Are you overlapping the subversion
 repo with the fossil one and doing an svn update so that fossil only
 sees the files touched that actually changed?

That's what I'm trying at the moment in order to see whether that
helps. But this only works with one repository and one working copy.

For my real task I want to merge (on directory-level) four
repositories. I'm not sure whether your rsync idea works in this case.
I went for the naive but safe way to do it like the following:

Each Subversion repository has its own workspace. I update the one that
needs to be updated according to chronological timeline. Then I create
a new temporary directory and copy the content of the four workspaces
there. This is the workspace as I want it to be stored in Fossil. Then
I do the following:

  cd temporary_dir
  fossil open --keep ../master.fossil
  fossil user new "$AUTHOR"
  fossil addremove
  fossil commit --date-override "$COMMITDATE" --user "$AUTHOR" -m "$MSG"

Then I remove the temporary directory and continue to update the
Subversion workspace with the next chronological commit in the
timeline, copying the files together, ... and so on.
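
Since the replay is driven from Python anyway, the per-revision commands above could be built as argument lists, which avoids shell quoting problems with commit messages. This is a sketch only; replay_commands and run_replay are hypothetical helper names, and error handling (for example an author that already exists) is left out.

```python
import subprocess


def replay_commands(repo, author, commit_date, message):
    """Return the fossil invocations for one replayed revision."""
    return [
        ["fossil", "open", "--keep", repo],
        ["fossil", "user", "new", author],
        ["fossil", "addremove"],
        [
            "fossil", "commit",
            "--date-override", commit_date,
            "--user", author,
            "-m", message,
        ],
    ]


def run_replay(workdir, commands):
    """Run the commands in order inside workdir, stopping on the first error."""
    for argv in commands:
        subprocess.run(argv, cwd=workdir, check=True)
```

Passing each argument as a separate list element means a message containing spaces, quotes or $ characters reaches fossil unchanged.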

I used Python's shutil.copytree which is documented as cp -p and
therefore should have preserved mode, ownership and timestamp. But I
think I should give it another try with explicit cp -a. I don't think
I can use rsync because of the merge I want to achieve. Or is rsync
even able to merge multiple directories on one side into one on the
other?

But perhaps it's not that I'm touching the files, but that I'm
committing from an empty workspace as I open the checkout each time
with fossil open --keep and then doing fossil addremove.

 If all files get touched in the repo that may slow things down
 unnecessarily. My method for doing what you are doing was to use
 rsync with appropriate ignores in a flow something like this:
 
 In svn repo:
 
 svn update
 rsync -av --delete [some excludes here] ./ /path/to/fossil/repo/
 
 then in fossil repo:
 fsl addremove
 fsl commit 

If I wanted to convert just one repository, then I could just use the
same Subversion workspace as Fossil workspace and just do (meta-code):

svn co .../trunk workdir
fossil init repo.fossil
cd workdir
# parse svn log --xml to obtain the list of revisions in $REVISIONS
fossil open ../repo.fossil
for REV in $REVISIONS; do
    svn up -r"$REV"
    fossil addremove
    fossil commit ...
done

Or even easier, if I just wanted to convert one repository, then I
could take the git-conversion route which gives me all the branches and
tags as well and not just trunk (if I understand it correctly).

In order to verify whether the problem really is the timestamp, I'm now
trying to convert just one of the four repositories using the method of
sharing the workspace. Let's see what happens after 5000 or 9000
revisions.

Greetings,
Stefan

-- 
Stefan Bellon