Re: Design challenges in chunkd self-checking

2010-01-05 Thread Pete Zaitcev
On Tue, 22 Dec 2009 22:36:16 -0500 Jeff Garzik j...@garzik.org wrote: Seems like a mutex-wrapped GLib hash table would work... I dunno about this... See, I think it's like kernel timers: there's a lot of premium on having add and remove quick, and the rest is whatever. The important part is not

Re: Design challenges in chunkd self-checking

2010-01-05 Thread Pete Zaitcev
On Tue, 05 Jan 2010 16:02:58 -0500 Jeff Garzik j...@garzik.org wrote: On 01/05/2010 03:47 PM, Pete Zaitcev wrote: On Tue, 22 Dec 2009 22:36:16 -0500 Jeff Garzik j...@garzik.org wrote: Seems like a mutex-wrapped GLib hash table would work... I dunno about this... See, I think it's like

Re: Design challenges in chunkd self-checking

2010-01-05 Thread Pete Zaitcev
On Tue, 05 Jan 2010 16:53:55 -0500 Jeff Garzik j...@garzik.org wrote: If you have a constant pointer value [for the lifetime of the hash table entry], use g_direct_hash. If you have a nul-terminated string, GLib also has g_str_hash. Of course I considered these, but thanks to our keys

[Patch] tabled: add checksumming to test/large-object.c

2010-01-05 Thread Pete Zaitcev
The block numbers do not give us complete enough coverage, so add a simple checksum that includes all transmitted bytes. Keep block numbers though: they are invaluable when comparing specific damage with traces of events inside tabled. Signed-off-by: Pete Zaitcev zait...@redhat.com --- test

[Patch 2/2] tabled: add a test for larger objects

2010-01-04 Thread Pete Zaitcev
Existing tests only exercised operations with relatively small objects. They did not test pipelining of object data to a sufficient degree. So, let's have a better test case for this (large-object.c). We also change the existing basic-object.c to match. Signed-Off-By: Pete Zaitcev zait

[Patch] cld: write one less newline into PID

2010-01-03 Thread Pete Zaitcev
Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/server.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/server/server.c b/server/server.c index c3e77f9..7c9c07a 100644 --- a/server/server.c +++ b/server/server.c @@ -706,7 +725,7 @@ static int net_open_any(void

Re: [Patch 1/3] tabled: make time2str reentrant

2010-01-03 Thread Pete Zaitcev
On Sun, 03 Jan 2010 03:27:59 -0500 Jeff Garzik j...@garzik.org wrote: - sprintf(datestr, "Date: %s", time2str(timestr, time(NULL))); + sprintf(datestr, "Date: %s", time2str(timestr, 64, time(NULL))); applied 1-3, and then added sizeof() to the above time2str calls... The hardcoded sizes

[Patch 1/3] tabled: make time2str reentrant

2010-01-02 Thread Pete Zaitcev
The main point here is to kill gmtime, but since we're at it, may as well fix the API and add safety (observe, that not all timestr arguments were 64 bytes long in the original code). Signed-Off-By: Pete Zaitcev zait...@redhat.com --- include/httputil.h |2 +- lib/httpstor.c | 12
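For readers skimming the archive, a minimal sketch of what a reentrant time2str could look like. The three-argument signature is inferred from the call quoted in the reply above; the body, the date format and the error handling are assumptions for illustration, not the code from the patch.

```c
#define _POSIX_C_SOURCE 200809L	/* for gmtime_r() */
#include <stddef.h>
#include <time.h>

/*
 * Sketch only: signature inferred from the quoted call, body assumed.
 * Formats t as an HTTP-style GMT date into the caller's buffer.
 * On failure (including a buffer that is too small) the buffer is
 * set to an empty string.
 */
static char *time2str(char *buf, size_t len, time_t t)
{
	struct tm tm;

	/* gmtime_r() fills caller-provided storage; gmtime() returns
	 * a pointer to shared static storage and is not reentrant. */
	if (!gmtime_r(&t, &tm) ||
	    strftime(buf, len, "%a, %d %b %Y %H:%M:%S GMT", &tm) == 0) {
		if (len > 0)
			buf[0] = '\0';
	}
	return buf;
}
```

Passing sizeof(timestr) rather than a literal 64 keeps the length tied to the actual buffer, which is the safety point the patch description raises.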

[Patch 2/3] tabled: return correct owner in lists

2010-01-02 Thread Pete Zaitcev
For some reason we were printing the user who was executing the request instead of the owner of the key in question, even before the changeover to obj_vitals. Signed-Off-By: Pete Zaitcev zait...@redhat.com --- server/bucket.c |6 -- 1 file changed, 4 insertions(+), 2 deletions

[Patch 3/3] tabled: drop StorageNode clause

2010-01-02 Thread Pete Zaitcev
We rely on CLD for builds (on all platforms), so we do not need the StorageNode clause anymore. Signed-Off-By: Pete Zaitcev zait...@redhat.com --- doc/etc.tabled.conf | 14 +-- server/config.c | 77 +- 2 files changed, 6 insertions(+), 85

[Patch 1/1] chunkd: split up fs_list_objs

2009-12-27 Thread Pete Zaitcev
This way we create a set of methods that can be used by self-check to list existing objects. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/be-fs.c | 361 +- server/chunkd.h | 16 ++ 2 files changed, 244 insertions(+), 133 deletions

[Patch 2/4] chunkd: clean-up return paths

2009-12-27 Thread Pete Zaitcev
This version leaves fs_free alone and preserves the quick-quit mechanism. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/be-fs.c | 10 ++ server/server.c | 15 --- 2 files changed, 14 insertions(+), 11 deletions(-) commit

[Patch 3/4] chunkd: add objcache

2009-12-27 Thread Pete Zaitcev
This is a mechanism by which self-check may know what objects were updated and thus should not be disturbed if their checksums fail. Signed-off-by: Pete Zaitcev zait...@redhat.com --- include/Makefile.am |2 include/objcache.h | 72 +++ server/Makefile.am |3

[Patch 4/4] chunkd: add self-checking

2009-12-27 Thread Pete Zaitcev
with the self-checking. The intent is to let the performance of checking scale with the number of objects and amount of data stored in the cell. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/Makefile.am|2 server/be-fs.c| 126 +++- server/chunkd.h

[Patch 1/4] chunkd: drop unused defines

2009-12-25 Thread Pete Zaitcev
Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/be-fs.c |2 -- server/chunkd.h |2 -- 2 files changed, 4 deletions(-) commit 039dba4ecca0f7edb049f6397d98c139da6bef4d Author: Master zait...@lembas.zaitcev.lan Date: Fri Dec 25 21:56:07 2009 -0700 Unused defines, drop

[Patch 2/4] chunkd: Add tchdbsetmutex

2009-12-25 Thread Pete Zaitcev
Documentation says this is necessary for multi-threaded access. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/be-fs.c |3 +++ 1 file changed, 3 insertions(+) commit 8245bc23aa4608a666b8dda767f18ca03a110906 Author: Master zait...@lembas.zaitcev.lan Date: Fri Dec 25 22:14:37

[Patch 3/4] chunkd: make error paths more regular

2009-12-25 Thread Pete Zaitcev
1. Don't try to save a call in a function where normal and error unfolding sequences are different. 2. Use exception labels linked to what caused them, not to what cleanup has to be done. 3. Balance fs_open - fs_close. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/be-fs.c
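Points 1 and 2 above can be illustrated with a small sketch. The function load_head and all label names are hypothetical and are not taken from be-fs.c; the example only shows labels named after the step that failed, with the normal exit kept separate from the error unwinding.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from the start of a file.
 * Returns the number of bytes read and stores a malloc'ed buffer in
 * *out, or returns -errno on failure.
 */
static int load_head(const char *path, size_t len, char **out)
{
	char *buf;
	ssize_t rc;
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		err = -errno;
		goto err_open;		/* label names the failed step */
	}
	buf = malloc(len);
	if (!buf) {
		err = -ENOMEM;
		goto err_alloc;
	}
	rc = read(fd, buf, len);
	if (rc < 0) {
		err = -errno;
		goto err_read;
	}

	/* Normal exit has its own sequence: the buffer is kept. */
	close(fd);
	*out = buf;
	return (int)rc;

	/* Error unwinding, in reverse order of acquisition. */
err_read:
	free(buf);
err_alloc:
	close(fd);
err_open:
	return err;
}
```

With labels tied to causes, adding a new failure point means adding one label in the right place of the unwinding sequence instead of renaming cleanup labels.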

[Patch 4/4] chunkd: Drop unused forward declaration

2009-12-25 Thread Pete Zaitcev
Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/chunkd.h |1 - 1 file changed, 1 deletion(-) commit 6fe147ad2f8833e162d92277ec6827520c5497ba Author: Master zait...@lembas.zaitcev.lan Date: Fri Dec 25 23:18:50 2009 -0700 Unused forward declaration, drop. diff --git a/server

Design challenges in chunkd self-checking

2009-12-22 Thread Pete Zaitcev
I'm looking into adding self-checking to chunkd. This involves basically a process that re-reads everything stored in the chunkserver and verifies that it's still ok. Nothing can be simpler, right? So, current problems for which I'd like input are: - Scheduling and deconflicting with normal

Re: Design challenges in chunkd self-checking

2009-12-22 Thread Pete Zaitcev
On Tue, 22 Dec 2009 17:43:58 -0500 Jeff Garzik j...@garzik.org wrote: It is normal and reasonable to maintain global information about all in-progress operations. Caching systems do that, for example, to ensure multiple cache requests for object A do not initiate multiple simultaneous

[Patch 1/1] Chunk: fix stored checksums

2009-12-20 Thread Pete Zaitcev
Existing code writes checksums of something other than the object data. Fix by summing the object data. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/object.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) This seems too obvious... Where is the trap? diff -urpN -X

Re: a boto-works test for tabled?

2009-12-18 Thread Pete Zaitcev
On Fri, 18 Dec 2009 10:57:45 -0500 Jeff Darcy jda...@redhat.com wrote: On 12/18/2009 05:14 AM, Jeff Garzik wrote: Is there anyone that would be interested in copying (or directly use) /usr/lib/python2.6/site-packages/boto/tests/test_s3connection.py as a tabled boto-works test?

Re: a boto-works test for tabled?

2009-12-18 Thread Pete Zaitcev
On Fri, 18 Dec 2009 10:57:45 -0500 Jeff Darcy jda...@redhat.com wrote: Boto can accept non-Amazon hostnames, but there's a bit of a trick to making it work with tabled. As of September 10, this was the magic formula. I see what happened, the fix was later than 9/10:

Re: [PATCH 0/6 v2] logging refactoring

2009-12-15 Thread Pete Zaitcev
On Mon, 14 Dec 2009 18:18:47 -0800 Colin McCabe cmcc...@alumni.cmu.edu wrote: Also my new patch creates a hail_log.h. It didn't really seem right to force everyone who wanted to use HAIL_LOG to include cldc.h. I'll convert over the rest of chunkd and tabled to the new macros if this patch

[Patch 1/3] tabled: drop commented messages

2009-12-15 Thread Pete Zaitcev
Less clutter is good. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/cldu.c |3 --- server/storage.c | 16 server/storparse.c |1 - 3 files changed, 20 deletions(-) Jeff, if you recall, I promised to do this as a condition for a past merge. commit

[Patch 2/3] tabled: use argument of a thread

2009-12-15 Thread Pete Zaitcev
We replace a comment with code to better show the intent. Once we have several threads, we can plug TLS easier into this. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/replica.c | 69 ++--- 1 file changed, 41 insertions(+), 28 deletions

Re: [PATCH 5/6] cld: modify cld-dns to use logging macros

2009-12-09 Thread Pete Zaitcev
On Tue, 8 Dec 2009 16:11:55 -0800 Colin McCabe cmcc...@alumni.cmu.edu wrote: @@ -161,8 +153,8 @@ static void push_host(GList **host_list, struct cldc_host *hp_in) * This is not reentrant. Better be called before any other threads * are started. */ -int cldc_getaddr(GList

Re: [Patch 1/1] tabled: Add replication daemon

2009-12-04 Thread Pete Zaitcev
On Thu, 03 Dec 2009 00:42:47 -0500 Jeff Garzik j...@garzik.org wrote: On 11/26/2009 09:39 PM, Pete Zaitcev wrote: It seems to me you should a) leave current main-thread libevent code unchanged b) call event_base_new() for the new thread, passing that newly-created event_base to replica

chunkd self-check question

2009-12-02 Thread Pete Zaitcev
I need a way to scan all objects that a Chunk node keeps. There's a function that does it already: fs_list_objs. Looking at it, is there a reason why it uses readdir instead of tchdbiternext? In case of self-checking, scanning directories is undesirable, because if an object somehow (e.g. a

Re: [Patch 1/1] CLD: fix crash in __mutex_get_max (libdb-4.7.so) on F13

2009-12-01 Thread Pete Zaitcev
On Sun, 29 Nov 2009 20:38:45 -0500 Jeff Garzik j...@garzik.org wrote: Interesting... I recall the root cause clearly, now: /usr/include/db.h always refers to the latest installed db4, even if compat-db{,45,46} is installed. Our configure recipe links with the most recent db4 listed in

Re: [Patch 1/2] CLD: factor timers out into a library

2009-11-29 Thread Pete Zaitcev
On Sun, 29 Nov 2009 06:11:44 -0500 Jeff Garzik j...@garzik.org wrote: hmmm... cld now segfaults reliably in koji: http://koji.fedoraproject.org/koji/taskinfo?taskID=1836079 Curious. It works fine here, of course (make distcheck). Did you try to build locally? -- Pete -- To unsubscribe from

Re: [Patch 1/2] CLD: factor timers out into a library

2009-11-29 Thread Pete Zaitcev
On Sun, 29 Nov 2009 15:36:53 -0500, Jeff Garzik j...@garzik.org wrote: On 11/29/2009 02:34 PM, Pete Zaitcev wrote: On Sun, 29 Nov 2009 06:11:44 -0500 Jeff Garzik j...@garzik.org wrote: hmmm... cld now segfaults reliably in koji: http://koji.fedoraproject.org/koji/taskinfo?taskID=1836079

[Patch 1/1] CLD: fix crash in __mutex_get_max (libdb-4.7.so) on F13

2009-11-29 Thread Pete Zaitcev
Fedora 13 comes with db4.8 and apparently the compat-db4.7 is bust. Let us link with 4.8 as a workaround. Signed-Off-By: Pete Zaitcev zait...@redhat.com --- configure.ac |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Not sure how safe or desirable this is, but it seems to work

Re: [Patch 1/1] CLD: fix crash in __mutex_get_max (libdb-4.7.so) on F13

2009-11-29 Thread Pete Zaitcev
On Sun, 29 Nov 2009 20:38:45 -0500, Jeff Garzik j...@garzik.org wrote: On 11/29/2009 08:17 PM, Pete Zaitcev wrote: Fedora 13 comes with db4.8 and apparently the compat-db4.7 is bust. Let us link with 4.8 as a workaround. Interesting... I recall the root cause clearly, now: /usr/include

Re: [PATCH] Some minor CLD test program fixes

2009-11-28 Thread Pete Zaitcev
On Fri, 27 Nov 2009 15:20:36 -0800 cmcc...@alumni.cmu.edu wrote: When doing a raw read(2) in cld_readport(), resume after EINTR. Also resume if we read less than the requested amount. if ((fd = open(fname, O_RDONLY)) == -1) return -errno; - rc = read(fd, buf, LEN);
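The technique described in the quoted patch (retry on EINTR, keep reading after a short read) can be sketched as below. The helper name read_full and its return convention are assumptions for illustration; this is not the code from cld_readport().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from fd, retrying when a
 * signal interrupts read(2) and continuing after short reads.
 * Returns the number of bytes read (less than len only at end of
 * file) or -errno on error.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t rc = read(fd, p + done, len - done);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -errno;
		}
		if (rc == 0)
			break;			/* end of file */
		done += (size_t)rc;
	}
	return (ssize_t)done;
}
```

A caller reading a small port file would pass the whole buffer size and treat any non-negative result as the length of the data actually present.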

[Patch 2/2] CLD: drop dependency on libevent from libcldc

2009-11-28 Thread Pete Zaitcev
. Signed-Off-By: Pete Zaitcev zait...@redhat.com --- configure.ac |4 - include/cldc.h |7 -- include/libtimer.h |3 - lib/cldc-udp.c | 27 - pkg/cld.spec |2 test/.gitignore|1 test/Makefile.am |9 ++- test

[Patch 1/1] tabled: Add replication daemon

2009-11-26 Thread Pete Zaitcev
too much from benchmarks (but if it does, it's only honest to take the hit). It is indispensable. However, there's a plan to add useful monitoring of jobs and other state, such as available nodes. This implementation uses a separate thread. Signed-off-by: Pete Zaitcev zait...@redhat.com

Re: [Patch 3/7] tabled: Reduce verbosity in CLD client

2009-11-14 Thread Pete Zaitcev
On Sat, 14 Nov 2009 03:37:50 -0500, Jeff Garzik j...@garzik.org wrote: On 11/14/2009 01:32 AM, Pete Zaitcev wrote: 1) remove it 2) switch to cld's -D debuglevel option format, and bury the logging statements under a debug level that produces a higher verbosity. Anything

[Patch 1/7] tabled: Fix error path in bucket_del

2009-11-13 Thread Pete Zaitcev
Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/bucket.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/server/bucket.c b/server/bucket.c index c13ec05..b3c73de 100644 --- a/server/bucket.c +++ b/server/bucket.c @@ -591,7 +591,7 @@ bool bucket_del(struct

[Patch 5/7] tabled: Add replication daemon

2009-11-13 Thread Pete Zaitcev
too much from benchmarks (but if it does, it's only honest to take the hit). It is indispensable. However, there's a plan to add useful monitoring of jobs and other state, such as available nodes. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/Makefile.am |4 server/replica.c

[Patch 7/7] tabled: Improve messages in storage.c

2009-11-13 Thread Pete Zaitcev
Mostly, add IDs so we can see which job and which node fails. Signed-off-by: Pete Zaitcev zait...@redhat.com --- server/storage.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/server/storage.c b/server/storage.c index f1822f7..fdb3fc1 100644 --- a/server

CLD crash mystery

2009-11-12 Thread Pete Zaitcev
With the last fixes to the timer code, CLD does not crash often anymore, but it still does, and it's still related to timers. Program received signal SIGSEGV, Segmentation fault. 0x in ?? () (gdb) where #0 0x in ?? () #1 0x080511be in timers_run () at util.c:207 #2 0x0804edfa

Re: Hail usable as a limited local AWS S3 and doc contribution?

2009-10-18 Thread Pete Zaitcev
On Tue, 13 Oct 2009 10:27:00 -0700 (PDT), Zack Perry zack.pe...@sbcglobal.net wrote: While trying to find a less expensive way to validate my application targeting AWS S3, I stumbled across Project Hail. I'm not sure if the assumptions are sound. The S3 is extremely cheap. For example, my

[Patch 3/3] tabled: flip the switch to CLD-based configuration

2009-08-28 Thread Pete Zaitcev
the switch. This patch demonstrates how to flip the two configuration files atomically. Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/test/chunkd-test.conf b/test/chunkd-test.conf index 119aa00..9256b9d 100644 --- a/test/chunkd-test.conf +++ b/test/chunkd-test.conf @@ -9,11 +9,7 @@ /Listen

[Patch] cldcli: permit (much) longer messages

2009-08-27 Thread Pete Zaitcev
a strncat, but it's more concise. Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/tools/cldcli.c b/tools/cldcli.c index eb4ebc4..60ab301 100644 --- a/tools/cldcli.c +++ b/tools/cldcli.c @@ -131,8 +131,8 @@ static void applog(int prio, const char *fmt, ...) va_list ap

[Patch] cld: pad protocol structures

2009-08-25 Thread Pete Zaitcev
-Off-By: Pete Zaitcev zait...@redhat.com diff --git a/include/cld_msg.h b/include/cld_msg.h index 89ab066..e4c8f28 100644 --- a/include/cld_msg.h +++ b/include/cld_msg.h @@ -167,6 +167,7 @@ struct cld_msg_open { uint32_tmode; /** open mode, COM_xxx

[Patch 2/3] Chunkd: Whole hog on applog

2009-08-12 Thread Pete Zaitcev
We abandon the wasteful strategy of gradual changes and simply change to applog-style API wholesale, to put the whole issue behind. Applications need to be rebuilt after new cld-devel is installed. Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/server/chunkd.h b/server/chunkd.h

[Patch 1/2] chunkd: correct printed name

2009-08-12 Thread Pete Zaitcev
Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/server/cldu.c b/server/cldu.c index da08d1b..fc746e4 100644 --- a/server/cldu.c +++ b/server/cldu.c @@ -422,7 +422,7 @@ static int cldu_lock_cb(struct cldc_call_opts *carg, enum cle_err_codes errc) int rc; if (errc

[Patch] libcldc: transition to applog, phase 1

2009-08-11 Thread Pete Zaitcev
of -app_log arguments as a pledge that it has to change again anyway. Long story short, this should be safe to build in Koji without resorting to chain builds. Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/include/cldc.h b/include/cldc.h index f625d5e..712e7c7 100644 --- a/include/cldc.h

[Patch 3/4] chunkd: write our contact information into CLD

2009-08-10 Thread Pete Zaitcev
of elements by hand). Signed-Off-By: Pete Zaitcev zait...@redhat.com diff --git a/doc/api.txt b/doc/api.txt index 1afe3de..002a55f 100644 --- a/doc/api.txt +++ b/doc/api.txt @@ -36,3 +36,34 @@ username==password, the minimum level of authentication necessary to prove that it works. It does not yet

[Patch 1/4] cld: Drop comment about struct layout

2009-08-08 Thread Pete Zaitcev
We do not rely on precise placement of fields in unrelated structs anymore. Signed-Off-By: Pete Zaitcev zait...@redhat.com diff --git a/include/cld_msg.h b/include/cld_msg.h index 124acbb..89ab066 100644 --- a/include/cld_msg.h +++ b/include/cld_msg.h @@ -188,10 +188,6 @@ struct cld_msg_get

[Patch 1/2] tabled: return known to array

2009-08-05 Thread Pete Zaitcev
The .known is the array management item which has no business in a list-managed struct. Phase 1: return it where it was. In phase 2 we will drop it from cldc_host. Then perhaps change array to a circular list. Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/server/cldu.c b/server

[Patch] cldcli: better error messages v2

2009-08-04 Thread Pete Zaitcev
Produce more specific error messages for those who think that host does not necessarily imply port. Also, clean up a bit. Signed-Off-By: Pete Zaitcev zait...@redhat.com diff --git a/tools/cldcli.c b/tools/cldcli.c index 1060d3f..2945596 100644 --- a/tools/cldcli.c +++ b/tools/cldcli.c @@ -9,13

[Patch 2/3] cldcli: suppress warnings from -Wshadow

2009-08-03 Thread Pete Zaitcev
Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/tools/cldcli.c b/tools/cldcli.c index bb5c4cc..90ae601 100644 --- a/tools/cldcli.c +++ b/tools/cldcli.c @@ -437,12 +437,12 @@ static bool cld_p_timer_ctl(void *private, bool add, static int cld_p_pkt_send(void *priv, const void *addr

[Patch 3/3] cldcli: drop useless comment

2009-08-03 Thread Pete Zaitcev
This seems copy-pasted from a daemon like Chunk that has a comment to the tune of now that we have arguments parsed we can switch to syslog. But in cldcli it's meaningless. Signed-off-by: Pete Zaitcev zait...@redhat.com diff --git a/tools/cldcli.c b/tools/cldcli.c index 90ae601..c9b7130 100644

Re: chunkd page updated

2009-07-31 Thread Pete Zaitcev
On Fri, 31 Jul 2009 18:00:46 -0400, Jeff Garzik j...@garzik.org wrote: I updated the chunkd page with API and design notes, capturing some of the emails to Rick and others: http://hail.wiki.kernel.org/index.php/Chunkd This looks pretty good. Ironically, I am thinking about adding a mode
