Re: Beagle-query and dates

2007-10-18 Thread Kevin Kubasik
I definitely like this idea. I think we need to address a few quick
things about how it would be done.

1) We need an advanced query interface; that's the best way to do it. I
know this has been on a wish list forever; maybe next SoC we can ask
for it. (I think there are enough new features in the works as is.) If
we get the cool metadata/RDF stuff figured out in the next few months,
it would be a definite boon to anyone looking to help build an
interface for more advanced queries.
a) A sub-question here: adding just date spans to queries wouldn't
be too hard (from an interface perspective) if we had a calendar
control. I know Windows Forms has one, and I seem to remember
Gtk::Calendar showing up as an autocomplete option during a short
stint of gtkmm work. I don't know whether it's a utility class or an
actual calendar control, but if anyone knows whether it's available
(and stable) in the current Gtk#, I wouldn't mind looking into it. (I
know I'm building up quite a to-do list; I'm hoping to hammer through
it one night in the near future once I get net access.)

2) We already store the most recent modification and/or creation
time in most cases. However, it would be cool to somehow keep a
record of all modification times (say, to the nearest minute, not
every one of a thousand events in a 30-second timeframe). I'm going to
put some thought into this, since just adding fields for every event
is far too expensive for a variety of actions. But it's something
that could be useful for the metadata stuff I'm working on.
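To make the "nearest minute" idea concrete, here is a minimal sketch, assuming modification events arrive as Python datetime values. It truncates each timestamp to the start of its minute and keeps each minute once, so a burst of events inside 30 seconds collapses to one or two entries. coalesce_to_minutes is a hypothetical name; nothing like it exists in beagle.

```python
from datetime import datetime


def coalesce_to_minutes(events):
    """Collapse modification timestamps into one entry per minute.

    Hypothetical helper: each datetime is truncated to the start of its
    minute, duplicates are dropped, and the minutes are returned sorted.
    """
    minutes = {ts.replace(second=0, microsecond=0) for ts in events}
    return sorted(minutes)


if __name__ == "__main__":
    # Thirty events within one minute plus one event in the next minute.
    burst = [datetime(2007, 10, 18, 0, 42, s) for s in range(30)]
    burst.append(datetime(2007, 10, 18, 0, 43, 5))
    for minute in coalesce_to_minutes(burst):
        print(minute.isoformat())
```

Truncation rather than rounding keeps the bucketing trivial; how long such a history is kept would still have to be decided separately.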

3) Dates don't always seem to be 100% reliable in beagle; we have
lots of backends that are prone to wildly inaccurate dates.
While we can get the FileSystemInfo or DirectoryInfo for file hits
(and can in turn get modification/creation times in most cases), I
think it wouldn't be out of the question to add two new properties
to Hit (LastModifiedTime and CreationTime) for these values. While we
do find these values in properties sometimes, I think we should try to
make them more universal and stabilize our date information across
beagle's backends. (This is based on some conversations from over 6
months ago; it's possible all this has been fixed, but at the very
least the convention of making the creation/modification time part of
the indexables and hits (not properties) is worth considering.)

4) A small technicality when it comes to allowing searches with terms
like 'yesterday': would we still require date:? That is,
date:yesterday
vs.
yesterday (assuming we want to be able to search for documents
containing the word yesterday, this doesn't exactly work)

And for multi-term phrases like '2 days ago', would we require quotes?
date:"2 days ago"
vs.
date:2 days ago

The first is a little harder to discover, so we would probably need to
add it to our hint page. With the second, it's just impossible to
intelligently work out what the user wants to do (I think).
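Whichever syntax wins, the phrase eventually has to become an absolute date. A minimal sketch, assuming a client-side rewrite into the date:YYYYMMDD form from D Bera's quoted example (date:20071017); expand_date_term and the small set of accepted phrases are hypothetical, not part of beagle's query parser. Stripping surrounding quotes lets date:"2 days ago" and a bare value go through the same path.

```python
import re
from datetime import date, timedelta

# Hypothetical: only the phrases discussed in this thread ("yesterday",
# "N days ago") plus "today" are recognised.
_DAYS_AGO = re.compile(r"^(\d+)\s+days?\s+ago$")


def expand_date_term(value, today=None):
    """Rewrite a relative date phrase as date:YYYYMMDD, or return None."""
    if today is None:
        today = date.today()
    text = value.strip().strip('"').strip().lower()
    if text == "today":
        day = today
    elif text == "yesterday":
        day = today - timedelta(days=1)
    else:
        match = _DAYS_AGO.match(text)
        if match is None:
            return None  # not a phrase this sketch understands
        day = today - timedelta(days=int(match.group(1)))
    return "date:" + day.strftime("%Y%m%d")


if __name__ == "__main__":
    ref = date(2007, 10, 18)
    print(expand_date_term("yesterday", today=ref))
    print(expand_date_term('"2 days ago"', today=ref))
```

Returning None for anything unrecognised leaves the original term alone, which sidesteps the ambiguity of treating every bare word as a date.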

Cheers,
Kevin Kubasik

On 10/17/07, D Bera [EMAIL PROTECTED] wrote:
  Beagle does support date queries, though.  So if you knew the date,
  you could programmatically construct the extra query parameter, which
  would be something like:
 
  date:20071017

 It was added after 0.2.16, which unfortunately means it's only in the svn
 trunk and not in 0.2.16.x, 0.2.17, 0.2.18 *sigh*.

 --
 -
 Debajyoti Bera @ http://dtecht.blogspot.com
 beagle / KDE fan
 Mandriva / Inspiron-1100 user
 ___
 Dashboard-hackers mailing list
 Dashboard-hackers@gnome.org
 http://mail.gnome.org/mailman/listinfo/dashboard-hackers



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-18 Thread Joe Shaw
Hi,

On 10/16/07, D Bera [EMAIL PROTECTED] wrote:
 A follow-up question: I did not find any API documentation for
 Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
 there.

My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
the general ADO.Net API patterns and that the latter is more or less a
drop-in replacement for the former.  A few things may need to be
tweaked, but in general just changing the using statements at the
top of each source file should be all that's needed.

 If M.D.Sqlite does not have a way to return rows on demand, I
 am against the migration. In the worst case, we can ship with a
 modified copy of M.D.Sqlite, but I am not sure what that will buy us.

You've always been able to get rows on demand via ADO.Net, it's just a
matter of the implementation underneath.  The old one (not modified by
us) would load all of them into memory.  I'm not sure how the new one
performs memory-wise.  If the Mono guys don't have any idea, the right
thing to do here would be to create a large test database (or use an
existing TextCache or FAStore db) and do a SELECT * using the 3
implementations and walk the results, using heap-buddy and/or
heap-shot to analyze their memory usage.
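The measurement recipe above (build a large db, SELECT *, walk the results, compare memory) can be illustrated outside Mono. The sketch below uses Python's standard sqlite3 and tracemalloc modules as stand-ins for the ADO.NET providers and heap-buddy/heap-shot, contrasting iterating a cursor row by row with fetchall(). The table layout is made up, and this says nothing about how Mono.Data.Sqlite itself behaves.

```python
import sqlite3
import tracemalloc


def build_test_db(rows):
    """In-memory stand-in for a large TextCache/FAStore-style table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cache (uri TEXT, data TEXT)")
    conn.executemany(
        "INSERT INTO cache VALUES (?, ?)",
        ((f"file:///tmp/{i}", "x" * 100) for i in range(rows)),
    )
    conn.commit()
    return conn


def walk_streaming(conn):
    """Iterate the cursor: rows are fetched from SQLite one at a time."""
    count = 0
    for _row in conn.execute("SELECT * FROM cache"):
        count += 1
    return count


def walk_all_at_once(conn):
    """fetchall() builds a Python list holding every row before returning."""
    return len(conn.execute("SELECT * FROM cache").fetchall())


def peak_bytes(walk, conn):
    """Peak Python-level allocation while walk(conn) runs (tracemalloc)."""
    tracemalloc.start()
    try:
        walk(conn)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    db = build_test_db(50_000)
    print("streaming peak bytes:", peak_bytes(walk_streaming, db))
    print("fetchall peak bytes: ", peak_bytes(walk_all_at_once, db))
```

tracemalloc only traces memory allocated by Python, so SQLite's own page cache is not counted; a whole-heap profiler like heap-shot is closer to what the mail suggests for the Mono providers.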

 In the same breath, what is the benefit of M.D.Sqlite over
 M.D.SqliteClient for beagle ? I figured out there are some ADO.Net
 advantages but other than that ... ?

It's maintained for one, which our modified one essentially isn't.  It
has the backing of the Mono team.  The code is much cleaner and easier
to understand, largely because it doesn't have two separate codepaths
(one for v2 and one for v3).  I am sure the Mono guys have other good
reasons too. :)

Joe


Re: System.InvalidOperationException: Invalid connection string

2007-10-18 Thread Debajyoti Bera
 This is probably related.  Now my beagle log is filling up with:

 20071018 00:42:49.9363 09041 Beagle DEBUG: Unable to determine account name
 for [EMAIL PROTECTED]:993

 Presumably one for each of the bogus folders under

 /home/brian/.evolution/mail/imap/[EMAIL PROTECTED]:993/folders/cur/subfolders/

 Any ideas on how to clean this mess up?  I've asked on the evolution
 list but nobody has responded.

It's something to do with the account_names for those folders as stored in
gconf. I don't know much about these things... maybe you can check the
list at gconf:/apps/evolution/mail/accounts and see if there is any
suspicious entry. It could be a bug in the Evolution backend too...

- dBera



Re: GSoC Weekly Report

2007-10-18 Thread Debajyoti Bera
  A follow-up question: I did not find any API documentation for
  Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
  there.

 My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
 the general ADO.Net API patterns and that the latter is more or less a
 drop-in replacement for the former.  A few things may need to be
 tweaked, but in general just changing the using statements at the
 top of each source file should be all that's needed.

I was looking more for a method for row-by-row retrieval on demand. Real
on-demand, where the implementation does not retrieve all the rows at once
but returns them one by one.

 You've always been able to get rows on demand via ADO.Net, it's just a
 matter of the implementation underneath.  The old one (not modified by
 us) would load all of them into memory.  I'm not sure how the new one
 performs memory-wise.  If the Mono guys don't have any idea, the right

I checked the source out of curiosity
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite/
and the code for DataReader looks exactly the same (I didn't do a diff, just
compared visually) as the one in Mono.Data.SqliteClient. So even if we migrate
(the migration would be easy), we still have to ship with a modified in-house
M.D.Sqlite and keep syncing with upstream. *sigh*

- dBera



Re: GSoC Weekly Report

2007-10-18 Thread Debajyoti Bera
Ignore my previous email ... I was looking at the wrong place :(
This is the right place for the new M.D.Sqlite
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

- dBera


Re: Best way of using Beagle to index data CDs

2007-10-18 Thread Nikolai Weibull
On 10/17/07, D Bera [EMAIL PROTECTED] wrote:
 Hi Nikolai,

  Please direct me to the right mailing list if this isn't the
  appropriate one for things like this.  There doesn't seem to be a
  dashboard-users mailing list.

 This is the right place. Welcome aboard.

  I would like to use Beagle to index my data CDs.  I figured that
  static indexes was the way to go, but I haven't quite determined the
  best way of doing it.  My initial idea was to create an index for each
  CD, so that I could simply remove the index associated with a CD if I
  threw out the CD.  But then I started worrying about using hundreds of
  static indexes.  My idea then was to merge the static indexes into a
  master index that I could rebuild whenever an index was added or
  removed, using $(beagle-manage-index merge).
 
  Does this make sense?  Does anyone have a better suggestion?
 
  Also, I figured that I wanted to add an attribute to each file indexed
  in this way that says what CD it is on, for example, 'disc:N', where N
  is an integer.  There doesn't seem to be a way of doing this yet with
  beagle-build-index.  Is this something that would be interesting to
  see as a patch, or are user-defined attributes outside the scope of
  Beagle?

 You are more or less on the right track. As Kevin pointed out, one way
 is to leverage static indexes. Due to the way static indexes work, it
 isn't directly possible to use them for removable indexes. There is a
 --tag option to static indexes, which can be used to tag files when
 using beagle-build-index. You can use that to identify files from each
 medium. If you merge several indexes, there would be two kinds of
 problems:
 1) Files that are not in the filesystem would not be reported (happens
 for any static index)
 2) If there are files in different removable media but with same
 absolute path, then only one of them will be returned. And there might
 be more weirdness.

OK. Both of these are real problems.

Problem 2 can be solved by making sure that one uses --remap correctly
to make each prefix unique, for example, /media/disc id/.

Problem 1 is a complete bummer.  That makes beagle more or less
unusable to this end.  How do we solve this?

It seems that you've basically solved both of these problems in
BuildRemovableIndex.cs by introducing a new URI protocol (removable)
for solving problem 1 and using media_name for solving problem 2.

However, BuildRemovableIndex.cs hasn't been completed.  It doesn't
seem to be missing that much, though.

How would one tell Beagle to report any removable:///* URI?

I guess I'm not familiar enough with the structure of Beagle to know
where to begin resolving these issues.


HTML mimetype

2007-10-18 Thread Debajyoti Bera
Hey all,
I recently noticed that *.html files are being detected as
application/x-mozilla-bookmarks instead of the correct text/html! This is
due to a quirk in the xdgmime mime database (shared-mime-info), which
recognizes *.html files as application/x-mozilla-bookmarks.
Just for consolation, gnomevfs-info makes the same mistake. I wonder
what Nautilus does.
And no, beagle's HTML filter does not index application/x-mozilla-bookmarks
files. It's trivial to add the mimetype to the HTML filter, but I wonder if
that is the right thing to do. Until this issue is resolved, don't be
surprised if your html files are not indexed! The problem is partly due to
shared-mime-info, so anybody with shared-mime-info-0.22 [1] will face the
same problem.
Does anyone know anything?
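One narrower option than mapping the whole bookmarks type to the HTML filter would be to sniff the start of the file before choosing a filter. A minimal sketch, assuming the filter layer can see both the detected type and the first bytes of the file; effective_mimetype is a hypothetical name, and the doctype check is an assumption about what genuine Netscape/Mozilla bookmark exports look like, not a description of shared-mime-info's rules.

```python
BOOKMARKS_MIME = "application/x-mozilla-bookmarks"
# Netscape/Mozilla bookmark exports begin with this doctype line.
BOOKMARKS_DOCTYPE = b"<!DOCTYPE NETSCAPE-Bookmark-file-1>"


def effective_mimetype(detected, head):
    """Hypothetical filter-selection tweak, not beagle's actual behaviour.

    detected: the type reported by the mime database.
    head: the first few hundred bytes of the file.
    A file labelled as bookmarks that does not start with the bookmarks
    doctype is treated as ordinary HTML so the HTML filter picks it up.
    """
    if detected != BOOKMARKS_MIME:
        return detected
    if head.lstrip().upper().startswith(BOOKMARKS_DOCTYPE.upper()):
        return BOOKMARKS_MIME
    return "text/html"


if __name__ == "__main__":
    print(effective_mimetype(BOOKMARKS_MIME, b"<html><head><title>x"))
    print(effective_mimetype(BOOKMARKS_MIME, BOOKMARKS_DOCTYPE + b"\n<TITLE>"))
```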

- dBera

[1] http://webcvs.freedesktop.org/mime/shared-mime-info/freedesktop.org.xml.in?revision=1.246&view=markup



Migrate to Mono.Data.Sqlite (Was: Re: GSoC Weekly Report)

2007-10-18 Thread Debajyoti Bera
 Ignore my previous email ... I was looking at the wrong place :(
 This is the right place for the new M.D.Sqlite
 http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

Migration from Mono.Data.SqliteClient to Mono.Data.Sqlite completed (rev 
4061).



Re: Best way of using Beagle to index data CDs

2007-10-18 Thread D Bera
  You are more or less on the right track. As Kevin pointed out, one way
  is to leverage static indexes. Due to the way static indexes work, it
...
  medium. If you merge several indexes, there would be two kinds of
  problems:
  1) Files that are not in the filesystem would not be reported (happens
  for any static index)
  2) If there are files in different removable media but with same
  absolute path, then only one of them will be returned. And there might
  be more weirdness.

 Problem 2 can be solved by making sure that one uses --remap correctly
 to make each prefix unique, for example, /media/disc id/.

--remap doesn't work. I even think it was removed from svn trunk.

 Problem 1 is a complete bummer.  That makes beagle more or less
 unusable to this end.  How do we solve this?

 It seems that you've basically solved both of these problems in
 BuildRemovableIndex.cs by introducing a new URI protocol (removable)
 for solving problem 1 and using media_name for solving problem 2.

 However, BuildRemovableIndex.cs hasn't been completed.  It doesn't
 seem to be missing that much, though.

 How would one tell Beagle to report any removable:///* URI?
 I guess I'm not familiar enough with the structure of Beagle to know
 where to begin resolving these issues.

None of these issues requires much knowledge of beagle internals, so I
will try to describe what's there and what's missing. Stop me if
anything is unclear.

By nature, URIs should be unique, so the URI should be changed to
include the media_name as well, e.g.
removable://media_name/relative/path/to/file

The removable: scheme is just a way to capture the path and the media
name in a different kind of URL. I don't think it's standard; beagle
clients should interpret it as a removable-media URL, where the host of
the URL is the name of the medium and the path of the URL is the path
of the file relative to where the medium is mounted.
Note: beagle-search and other clients out there don't yet know about
removable media and would probably ignore such results. They need to
be patched too.
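A minimal sketch of building and splitting such URIs with Python's urllib.parse, assuming medium names may contain characters (spaces, colons) that must be percent-encoded to live in the host part. The helper names are hypothetical; this is not what BuildRemovableIndex.cs does.

```python
from urllib.parse import quote, unquote, urlsplit


def make_removable_uri(media_name, relative_path):
    """Build removable://media_name/relative/path (hypothetical helper).

    The medium name goes in the host part and is fully percent-encoded;
    the relative path keeps its slashes.
    """
    host = quote(media_name, safe="")
    path = quote(relative_path.lstrip("/"))
    return f"removable://{host}/{path}"


def parse_removable_uri(uri):
    """Split a removable: URI back into (media_name, relative_path)."""
    parts = urlsplit(uri)
    if parts.scheme != "removable":
        raise ValueError(f"not a removable: URI: {uri}")
    return unquote(parts.netloc), unquote(parts.path).lstrip("/")


if __name__ == "__main__":
    uri = make_removable_uri("Backup 2007", "photos/trip a.jpg")
    print(uri)
    print(parse_removable_uri(uri))
```

Putting the medium name in the host keeps two files with the same relative path on different media distinct, which is exactly the uniqueness the paragraph asks for.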

BuildRemovableIndex.cs is just a smart wrapper around BuildIndex.cs
which does the above mentioned changes.

The static backends are handled by StaticQueryable.cs;
http://svn.gnome.org/viewvc/beagle?view=revision&revision=3108
contained a modified StaticQueryable.cs which knew about removable
media. When started, the backend would load the possible mappings from
a config file and store them in a mapping_table. Every result passes
through the backend just before it is returned. At that point, the
modified StaticQueryable would take a removable: URL, extract the
media_name and the relative path, use the mapping_table to get the
mount path for that media_name, append the relative path to the end
of the mount path and return the correct file:/// URL. If the medium
is not mounted, it would just report the original removable:// URL and
set a flag saying the medium was not found. The client can then
interact with the user accordingly, e.g. it can either drop all the
unmounted URLs, or display them all and, when the user clicks on an
unmounted URL, ask the user to mount that medium and then open the file.
This client interaction needs to be added to beagle-search.
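The per-result rewrite described above can be sketched as follows, assuming the mapping_table is a plain media_name to mount-path dictionary. ResolvedHit and resolve_hit_uri are hypothetical names; the real work would live in StaticQueryable.cs.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import quote, unquote, urlsplit


@dataclass
class ResolvedHit:
    uri: str           # file:/// URI when mounted, else the removable:// URI
    media_found: bool  # False means "medium not mounted"


def resolve_hit_uri(uri, mapping_table):
    """Rewrite removable://media/rel/path to a file:/// URI if mounted.

    mapping_table is a dict of media_name -> mount path, standing in for
    StaticQueryable's mapping_table. Other URIs pass through untouched.
    """
    parts = urlsplit(uri)
    if parts.scheme != "removable":
        return ResolvedHit(uri, True)
    mount_point = mapping_table.get(unquote(parts.netloc))
    if mount_point is None:
        # Medium not mounted: report the original URI and flag it.
        return ResolvedHit(uri, False)
    # Append the relative path to the mount path, then re-encode as a URI.
    local = PurePosixPath(mount_point) / unquote(parts.path).lstrip("/")
    return ResolvedHit("file://" + quote(str(local)), True)


if __name__ == "__main__":
    table = {"disc07": "/media/cdrom"}
    print(resolve_hit_uri("removable://disc07/photos/a.jpg", table))
    print(resolve_hit_uri("removable://disc08/notes.txt", table))
```

Returning the flag instead of dropping the hit leaves the "drop or prompt to mount" decision to the client, as the paragraph suggests.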

Another major part that needs to be completed is deciding where/how to
store the media_name info for the medium. I was thinking
beagle-removable-index would work like

$ beagle-removable-index --build --medium medium_name [--config /path/to/new/config] --target /path/to/index/ ...

which would create an index at the path pointed to by --target (as it
does now). It would also store a removableconfig.xml file with the
name of the medium (and other possible configuration values) at
/path/to/new/config. If --config is absent, the location will default
to /path/to/index/removableconfig.xml.
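The config-location rule is small enough to pin down in code. A minimal sketch with Python's argparse, assuming --config names the XML file itself and that only the flags shown in the commands here exist; the trailing "..." of the build command is left out because its contents aren't specified. beagle-removable-index is a proposal in this mail, not shipped code.

```python
import argparse
import os


def parse_args(argv):
    """Parse the *proposed* beagle-removable-index options (sketch only)."""
    parser = argparse.ArgumentParser(prog="beagle-removable-index")
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument("--build", action="store_true")
    action.add_argument("--mount", action="store_true")
    action.add_argument("--unmount", action="store_true")
    parser.add_argument("--medium", help="name of the medium (for --build)")
    parser.add_argument("--config", help="path of removableconfig.xml")
    parser.add_argument("--target", required=True, help="index directory")
    args = parser.parse_args(argv)
    if args.build and not args.medium:
        parser.error("--build needs --medium")
    return args


def config_path(args):
    """Use --config if given, else <target>/removableconfig.xml."""
    if args.config:
        return args.config
    return os.path.join(args.target, "removableconfig.xml")


if __name__ == "__main__":
    opts = parse_args(["--build", "--medium", "disc07", "--target", "/idx/disc07/"])
    print(config_path(opts))
```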

$ beagle-removable-index --mount [--config /path/to/config] --target /path/to/index

will inform the running beagled that a removable index at /path/to/index
has been added. If --config is present, the name and other information
are read from there; otherwise it tries to read the configuration from
/path/to/index/removableconfig.xml.
The running beagled will inform StaticQueryable about the newly inserted
medium, which will in turn store the medium_name and /path/to/index in
its mapping table.
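Taken together with the real-time loading paragraph further down, the beagled side amounts to a small piece of bookkeeping: load an index the first time it is announced, and otherwise just record the medium_name to /path/to/index mapping. A minimal sketch under those assumptions; RemovableRegistry and the load_index callback are hypothetical stand-ins, not StaticQueryable's real API.

```python
class RemovableRegistry:
    """Hypothetical model of the StaticQueryable bookkeeping sketched here.

    load_index is a callback that actually opens an index directory (the
    mail suggests calling into QueryDriver.cs for this; here it is just a
    function passed in).
    """

    def __init__(self, load_index):
        self._load_index = load_index
        self.loaded = set()        # index directories already opened
        self.mapping_table = {}    # medium_name -> index directory

    def mount(self, medium_name, index_path):
        if index_path not in self.loaded:
            # First time we see this index: load it in real time.
            self._load_index(index_path)
            self.loaded.add(index_path)
        # Loaded before or just now: only the mapping needs (re)recording.
        self.mapping_table[medium_name] = index_path

    def unmount(self, medium_name):
        # Forget the medium; later lookups for it will find no mapping.
        self.mapping_table.pop(medium_name, None)


if __name__ == "__main__":
    opened = []
    registry = RemovableRegistry(opened.append)
    registry.mount("disc07", "/idx/disc07")
    registry.mount("disc07", "/idx/disc07")  # second mount: no reload
    print(opened, registry.mapping_table)
```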

$ beagle-removable-index --unmount ... similarly

It doesn't _have_ to work this way. This is just what I thought would
make everybody happy.

The last major piece which wasn't done (I think; I don't remember
completely) is the real-time loading of new indexes. When
StaticQueryable is informed about a new index and a mapping, and the
index at /path/to/index is not already loaded, it should load the new
static_index. This should not be too difficult; just call into
QueryDriver.cs (see LoadRemovableMediaQueryables). If the index at
/path/to/index is already loaded, then just update the
(medium_name,path) mapping. Lucene allows beagle to silently update
the index in the 

Re: HTML mimetype

2007-10-18 Thread Arun Raghavan
On 19/10/2007, Debajyoti Bera [EMAIL PROTECTED] wrote:
snip
 And no, beagle's HTML filter does not index 
 application/x-mozilla-bookmarks
 file. It's trivial to add the mimetype to the HTML filter but I wonder if that
 is the right thing to do. Till this issue is resolved, don't be surprised if
 your html files are not indexed! The problem is partly due to
 shared-mime-info, so anybody with shared-mime-info-0.22 [1] will face the
 same problem.
 Anyone knows anything ?

Found this two-month-old bug:
https://bugs.freedesktop.org/show_bug.cgi?id=11843
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com