Re: A question about filters

2007-04-30 Thread Alex Mac
On 28/04/07, Johann Petrak <[EMAIL PROTECTED]> wrote:
> Is it possible to have multiple filters for a group of files
> where each provides their own part of (meta)information about
> the files?
> For example, a user might have a collection of documents
> (e.g. PDF, plain text, OpenOffice) but there is also an
> application that stores meta information like authors,
> project, dates, internal numbers etc. that is associated
> with these documents.
> So ... would it be possible for the existing PDF, OpenOffice,
> Text etc. filters to provide the conventional information
> (e.g. the content) and for an *additional* filter to
> access the database and provide that additional information?
>
> How would a user then be able to query for files using those
> additional properties?
>
> I am considering using beagle as the back-end for some
> in-house desktop search solution that would utilize
> additional information about files (not only documents)
> so something like this would be essential ...

Soon beagle will support XMP sidecar files, which will be able to
contain arbitrary metadata. So if you have a file "thing.foo" you
would create a file called "thing.foo.xmp" containing whatever
properties you wanted.
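
A minimal sketch of the naming scheme, in Python purely for
illustration. Only the "thing.foo" -> "thing.foo.xmp" convention comes
from the description above. The function names, the choice of Dublin
Core properties and the exact layout of the XMP packet are assumptions;
the feature is still in development, so what beagle will actually read
may differ.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace URIs as used by XMP, RDF and Dublin Core.
NS_X = "adobe:ns:meta/"
NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_DC = "http://purl.org/dc/elements/1.1/"


def sidecar_path(path):
    """Return the sidecar name for a file: "thing.foo" -> "thing.foo.xmp"."""
    return Path(str(path) + ".xmp")


def write_sidecar(path, properties):
    """Write plain-text Dublin Core properties into a sidecar file.

    Assumption: every property is stored as a simple text value. Real XMP
    uses richer value types (lists, language alternatives) for some fields.
    """
    ET.register_namespace("x", NS_X)
    ET.register_namespace("rdf", NS_RDF)
    ET.register_namespace("dc", NS_DC)
    root = ET.Element(f"{{{NS_X}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{NS_RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{NS_RDF}}}Description",
                         {f"{{{NS_RDF}}}about": ""})
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{NS_DC}}}{name}").text = value
    target = sidecar_path(path)
    ET.ElementTree(root).write(str(target), encoding="utf-8",
                               xml_declaration=True)
    return target
```

An external application that already knows the authors or project of a
document could call write_sidecar() next to each file instead of
needing its own filter.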

It's still in development, but hopefully it will be included in the next release.
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


Re: regression testing

2007-04-27 Thread Alex Mac
On 27/04/07, Joe Shaw <[EMAIL PROTECTED]> wrote:
> We also have a repo which is stored in Novell Forge, which I run every
> now and then but mostly when I'm gearing up to do a release.  It was
> originally put in Novell Forge because it was SVN and maintaining it
> in GNOME CVS would cause major pain for people who were interested
> only in the code.  A lot of those test files were making sure we
> extracted the right info, got all the metadata, etc.  They weren't
> really there to test "broken" files so much.  At some point I'll look
> into moving that stuff over into our current SVN.
>
> A big problem I had when I was fixing and dealing with a lot of the MS
> Word crashers and misbehaving documents is that they contained private
> information and I couldn't add them to a regression suite.  Since the
> files were badly formed for whatever reason (or the parsers were
> broken), we couldn't recreate the situation in another file.
>
> Joe
>

Crashing errors are quite easy to detect as there will be lots of
stuff put in the error logs. I was thinking more along the lines of
monitoring beagle for more subtle errors such as only partially
indexing files which may go unnoticed for a while. Soft errors like
that are much worse as they might give people the impression that
beagle was just not very good (not that I'm saying there are any such
cases like that).

It would also make fair comparisons with other projects like
tracker possible.


regression testing

2007-04-27 Thread Alex Mac
I'm sure someone has mentioned this before, but it seems that many of
the bugs people complain about (broken filters, not all content
indexed) might be caught more quickly if beagle had a repository of
various files that indexing could be tested on every night.

Coupled with a program to generate stats about what keywords were
extracted, how long it took, how much RAM was used, etc., this would be
really cool and useful (the cairo folk do this rigorously to track how
their performance changes every release).
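
A rough sketch of what such a stats program could record per sample
file, in Python for illustration only. The extract argument stands in
for whatever runs a filter on a file; it is hypothetical, as are the
function names. Note that tracemalloc only sees allocations made by
Python code, so a real harness around beagle would have to measure the
memory of the indexing process some other way.

```python
import time
import tracemalloc


def measure(extract, data):
    """Run one (hypothetical) extractor; record keywords, time, peak memory."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        keywords = extract(data)
    finally:
        elapsed = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return {
        "keywords": sorted(set(keywords)),
        "seconds": elapsed,
        "peak_bytes": peak,
    }


def lost_keywords(baseline, current):
    """Keywords present in the baseline run but missing from the current one."""
    return sorted(set(baseline["keywords"]) - set(current["keywords"]))
```

Storing the result of measure() for every file each night and running
lost_keywords() against the previous night would flag a filter that has
quietly started extracting less than it used to.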

I'm too busy to write the code but I have plenty of sample files I can
donate to the test repository :)


Re: build system broken?

2007-03-17 Thread Alex Mac
On 17/03/07, Joe Shaw <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Alex Mac wrote:
> > checking for automake >= 1.8...
> >
> >   testing automake-1.8...
> > found 1.8.5
>
> Try changing the REQUIRED_AUTOMAKE_VERSION in autogen.sh to 1.9 and
> rerun it.  I'm assuming you also have automake 1.9 installed. :)
>
> Joe
>

that seems to do the trick, thanks

Alex


Re: build system broken?

2007-03-16 Thread Alex Mac
On 16/03/07, Joe Shaw <[EMAIL PROTECTED]> wrote:
> Hi,
> This has been brought up before, but I thought it was fixed now.
> There's a bug in some versions of automake which build directories out
> of order.
>
> What versions of automake do you have installed, and which one is chosen
> by autogen.sh when you run?  Also, is it using the system-provided
> autogen.sh, or the one included in beagle?  (It should be the first line
> of the output.)
>
> Joe

I'm running autogen.sh that comes with beagle, here's the output:

<[EMAIL PROTECTED]:~/local/src/beagle> ./autogen.sh
Using system-provided gnome-autogen script

checking for autoconf >= 2.53...

  testing autoconf2.50...
found 2.61

checking for automake >= 1.8...

  testing automake-1.8...
found 1.8.5

checking for libtool >= 1.5...

  testing libtoolize...
found 1.5.22

checking for glib-gettext >= 2.2.0...

  testing glib-gettextize...
found 2.12.11

checking for intltool >= 0.30...

  testing intltoolize...
found 0.35.5

checking for pkg-config >= 0.14.0...

  testing pkg-config...
found 0.21

Checking for required M4 macros...


Checking for forbidden M4 macros...

**Warning**: I am going to run `configure' with no arguments.
If you wish to pass any to it, please specify them on the
`./autogen.sh' command line.


Processing ./configure.in


Running libtoolize...


Running glib-gettextize... Ignore non-fatal messages.

Copying file mkinstalldirs
Copying file po/Makefile.in.in

Please add the files
  codeset.m4 gettext.m4 glibc21.m4 iconv.m4 isc-posix.m4 lcmessage.m4
  progtest.m4
from the /aclocal directory to your autoconf macro directory
or directly to your aclocal.m4 file.
You will also need config.guess and config.sub, which you can get from
ftp://ftp.gnu.org/pub/gnu/config/.


Running intltoolize...


Running aclocal-1.8...

acinclude.m4:124: warning: underquoted definition of AM_CHECK_PYMOD
  run info '(automake)Extending aclocal'
  or see http://sources.redhat.com/automake/automake.html#Extending%20aclocal
/usr/share/aclocal/tulip.m4:2: warning: underquoted definition of AC_PATH_TULIP

Running autoconf2.50...


Running autoheader2.50...


Running automake-1.8...

INSTALL INSTALL.autogen_bak differ: byte 1, line 1

Running ./configure --enable-maintainer-mode ...


build system broken?

2007-03-16 Thread Alex Mac
If I do a clean checkout of the repository and run autogen.sh followed
by make, it tries to build things in the wrong order and fails straight
away with the error below. I noticed this a while back but assumed
someone else would have noticed it by now, so I didn't bother reporting
it. I'm running the latest version of ubuntu feisty.

If I manually 'cd' into various dirs I can get it to build fine, but
that's a bit lame.

Making all in images
make[2]: Entering directory `/home/alex/local/src/beagle/images'
/usr/bin/gmcs -debug -out:Images.dll -target:library ./Images.cs
-r:../Util/Util.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/pango-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/atk-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gdk-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gtk-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/glib-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gconf-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gconf-sharp-peditors.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gnome-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/art-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gnome-vfs-sharp.dll
-r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/glade-sharp.dll
-r:/usr/lib/cli/gmime-sharp-2.2/gmime-sharp.dll
-resource:./bug.png,bug.png
-resource:./contact-icon.png,contact-icon.png
-resource:./emblem-blog.png,emblem-blog.png
-resource:./emblem-bugzilla.png,emblem-bugzilla.png
-resource:./emblem-calendar.png,emblem-calendar.png
-resource:./emblem-contact.png,emblem-contact.png
-resource:./emblem-file.png,emblem-file.png
-resource:./emblem-folder.png,emblem-folder.png
-resource:./emblem-fspot.png,emblem-fspot.png
-resource:./emblem-google.png,emblem-google.png
-resource:./emblem-im-log.png,emblem-im-log.png
-resource:./emblem-mail-message.png,emblem-mail-message.png
-resource:./emblem-music.png,emblem-music.png
-resource:./emblem-note.png,emblem-note.png
-resource:./emblem-picture.png,emblem-picture.png
-resource:./emblem-web-history.png,emblem-web-history.png
-resource:./gnome-gaim.png,gnome-gaim.png
-resource:./icon-blog.png,icon-blog.png
-resource:./icon-monodoc.png,icon-monodoc.png
-resource:./icon-search.png,icon-search.png
-resource:./icon-web.png,icon-web.png -resource:./mail.png,mail.png
-resource:./music.png,music.png -resource:./no-match.png,no-match.png
-resource:./note.png,note.png -resource:./person.png,person.png
-resource:./quick-tips.png,quick-tips.png
-resource:./status-away.png,status-away.png
-resource:./status-online.png,status-online.png
-resource:./system-search.png,system-search.png
-resource:./tip-arrow.png,tip-arrow.png
error CS0006: cannot find metadata file `../Util/Util.dll'
Compilation failed: 1 error(s), 0 warnings
make[2]: *** [Images.dll] Error 1
make[2]: Leaving directory `/home/alex/local/src/beagle/images'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/alex/local/src/beagle'
make: *** [all] Error 2


Re: Xslt filter problem

2007-03-04 Thread Alex Mac
dbera: I've posted a patch to the xslt filter in the bugzilla entry,
can you apply that please? (It just tidies up the code to use the
recommended xml parsing methods.)

As I said in the bugzilla entry, your file is not well-formed; fixing
that makes the error go away.

That said, I have no idea why beagle is treating the debug output as
the content. Joe or dbera will need to have a look into this one.

On 04/03/07, Stephan Hegel <[EMAIL PROTECTED]> wrote:
> D Bera wrote:
> > Can you file a bug and attach the file to it ? I will ask the Xslt
> > filter author to have a look at it.
> > Thanks in advance,
> Bug 414498 submitted.
>
> Regards,
> Stephan.


svn access

2007-02-06 Thread Alex Mac
I've got a few improvements I want to make to some of the various
filters I've got in beagle, and dbera has just opened up another bug
on my svg filter. Getting these things done would be a lot quicker and
easier for me if I had an svn account. How do I go about getting one?

Alex


Mars Filter

2006-12-07 Thread Alex Mac
No, not a filter that searches for water on Mars, something much more
interesting than that! Mars is Adobe's new PDF format:

"The Mars (code name) Project is an XML-friendly implementation of PDF syntax.
Already an open specification, PDF is the global standard for trusted, high
fidelity electronic documentation. The Mars file format incorporates additional
industry standards such as SVG, PNG, JPG, JPG2000, OpenType, Xpath and XML into
ZIP-based document container. The Mars plug-ins enable recognition of the Mars
file format by Adobe Acrobat 8 and Adobe Reader 8 software."

If you've got a windows or mac with acrobat 8 then grab the plugin:
http://labs.adobe.com/downloads/mars.html and you can convert your
pdfs into mars documents.

I've started to write a filter for mars files:
http://bugzilla.gnome.org/show_bug.cgi?id=383312 which people should
play with if they are interested.

I've also included a sample mars document
http://bugzilla.gnome.org/attachment.cgi?id=77880&action=view if you
just want to see what's going on. Apart from missing fonts, inkscape
seems to do a decent job of rendering the pages.

Alex


Re: searching in subversion repos

2006-12-07 Thread Alex Mac
On 06/12/06, Richard Boulton <[EMAIL PROTECTED]> wrote:
> > A thought crossed my mind recently about having beagle search inside
> > subversion repos. Would I have to implement that as a backend? I've
> > only written fairly trivial filters so far so I'm not sure how much
> > work that would involve or if it would be practical but it would be
> > quite cool to be able to search back in time through all my code and
> > documents in the repo that I use to store all my work.
>
> I don't know the ins and outs of implementing this for beagle, but being
> able to search through subversion (or CVS) repositories would be a useful
> feature.  Don't forget that such repositories also contain a lot of
> potentially useful information in the form of log messages attached to
> each commit.
>
> I'd encourage anyone thinking of working on this kind of search to take a
> look at the "cvssearch" project (at http://cvssearch.sourceforge.net/) for
> ideas, to avoid having to reinvent the wheel.  In particular, take a look
> at the papers linked to from that page:
>
> http://www.cse.unsw.edu.au/~amichail/cvssearch/paper.pdf
> and
> http://www.cse.unsw.edu.au/~amichail/cvssearch/paper2.pdf

thanks, I'll take a look at those

> You might be able to get the cvssearch code working, but development on it
> has been fairly dead lately due to lack of time.  The ideas described in
> the papers are well worth reading, though.

yeah, I would probably just implement from scratch.

> --
> Richard
>


Re: searching in subversion repos

2006-12-07 Thread Alex Mac
On 06/12/06, Joe Shaw <[EMAIL PROTECTED]> wrote:
> Hi Alex,
>
> On Wed, 2006-12-06 at 20:38 +, Alex Mac wrote:
> > Any possible pitfalls or things that would make this not worth doing?
>
> The idea is to find .svn directories and deal with their contents,
> correct?  Or is there something else here I'm not following?

nope, I'm not talking about monitoring a checked out copy of a repo
but the actual repo itself where all the data (including the logs as
Richard mentioned) is stored. The .svn directories that get created
when you check out a repo just have duplicates of all the files in for
diffing purposes, nothing interesting.

So the idea would be that searching for "foo" will show me all the
revisions of all files in my repository that had foo in them, not just
the most recent.
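
To make the idea concrete, here is a toy sketch in Python (illustration
only, not beagle code) of an index keyed by file and revision rather
than by file alone. The class and method names are made up, and how the
text of each revision is pulled out of the repository is left out
entirely.

```python
from collections import defaultdict


class RevisionIndex:
    """Toy inverted index: word -> set of (path, revision) pairs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, path, revision, text):
        """Index the text of one file as it was at one revision."""
        for word in text.lower().split():
            self._postings[word].add((path, revision))

    def search(self, word):
        """Return every (path, revision) containing the word, oldest first."""
        hits = self._postings.get(word.lower(), ())
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

Because every revision is a separate entry, a search for "foo" still
finds a file whose latest revision no longer mentions it.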

It occurs to me as well that with the svn api this would be able to
handle remote svn repositories too. That would be cool, if a
little bandwidth intensive on the first index.

> A backend is definitely the way to go here.  You can take a look at the
> Tomboy backend for an example of a pretty simple one.  You'll probably
> need to set up inotify watches and such on the .svn directories to watch
> for changes, and this could get tricky depending on how often those
> change and how much data has to be reindexed.
>
> As you mentioned, how to display this will be tricky.  We could possibly
> have some basic support in beagle-search for it, and either integrate
> with an existing SVN visualization tool or write our own to give more
> advanced features.
>
> > I know someone has half wrapped the svn api in c# so that might need a
> > little fixing first...
>
> A worthwhile project in any case, IMO. :)
>

yeah, probably a little bit much for me to take on at the moment but
it might make a good SoC project for next year.

> Thanks,
> Joe
>
>


searching in subversion repos

2006-12-06 Thread Alex Mac
A thought crossed my mind recently about having beagle search inside
subversion repos. Would I have to implement that as a backend? I've
only written fairly trivial filters so far so I'm not sure how much
work that would involve or if it would be practical but it would be
quite cool to be able to search back in time through all my code and
documents in the repo that I use to store all my work.

Once indexed I'm not too sure how best to display the results in the
beagle search app but it seems like a cool feature to have.

Any possible pitfalls or things that would make this not worth doing?
I know someone has half wrapped the svn api in c# so that might need a
little fixing first...

Alex


Re: Scribus filter / c# advice needed

2006-12-02 Thread Alex Mac
On 02/12/06, D Bera <[EMAIL PROTECTED]> wrote:
> > On further inspection it seems to be because I am using the .Net 1.0
> > method of creating an XmlTextReader which does not enable character
> > checking:
> >
> > XmlTextReader reader = new XmlTextReader(thestream);
> >
> > msdn says this is deprecated in .Net 2.0 in favour of:
> >
> > XmlReader r = XmlReader.Create(thestream);
> >
> > Using the new method, character checking seems to be enabled by
> > default. So it's not a bug in mono; it's just that the XmlTextReader
> > is slightly lax by default.
>
> There is a plan to move beagle to .Net-2.0 pretty soon (this or next
> release). Would the filter break in that case ?

Nope, all the XmlTextReader based filters will continue to work the
same, but for the sake of future proofing they should probably be
fixed to use the new XmlReader creation method


Re: Scribus filter / c# advice needed

2006-12-02 Thread Alex Mac
On 02/12/06, D Bera <[EMAIL PROTECTED]> wrote:
> > Thanks for the advice, although I just tried running the filter on an
> > old scribus file and it seems the XmlReader is quite happy to process
> > xml files that are malformed in this way so it looks like there's no
> > need for buffering.
>
> Are you sure that this is not a bug (in mono implementation) that
> XmlReader processes malformed files w/out any problem ? What does the
> spec or msdn documentation say (you can also test on a .Net windows
> machine) ?
>
> - dBera

On further inspection it seems to be because I am using the .Net 1.0
method of creating an XmlTextReader which does not enable character
checking:

XmlTextReader reader = new XmlTextReader(thestream);

msdn says this is deprecated in .Net 2.0 in favour of:

XmlReader r = XmlReader.Create(thestream);

Using the new method, character checking seems to be enabled by
default. So it's not a bug in mono; it's just that the XmlTextReader
is slightly lax by default.


Re: Scribus filter / c# advice needed

2006-12-02 Thread Alex Mac
On 01/12/06, D Bera <[EMAIL PROTECTED]> wrote:
> > efficiently do this using stream readers/ writers so I can just
> > connect the output from that into the XmlReader and never load the
> > whole thing into ram?
>
> One way I would try is to create a BufferedStream stream and create
> your XmlTextReader from it. Use a byte[] as the storage - check the
> XmlTextReader source to see what buffer size they use to read from
> stream. If your filter is well behaved, then it most probably won't be
> using anything other than Read(byte[], int offset, int count),
> ReadByte(), Close(). Implement them using your internal buffer. And
> whenever you read anything into your buffer, do a search and replace in
> the buffer.
>
> - dBera

Thanks for the advice, although I just tried running the filter on an
old scribus file and it seems the XmlReader is quite happy to process
xml files that are malformed in this way so it looks like there's no
need for buffering.

I've attached a tweaked version of the filter to the bugzilla page
(http://bugzilla.gnome.org/show_bug.cgi?id=380950) which is now ready
to be committed by someone.

Alex Mac


Scribus filter / c# advice needed

2006-12-01 Thread Alex Mac
I just added a filter for handling Scribus (http://www.scribus.net)
files to bugzilla. One problem with it at the moment is that it can only
handle Scribus files from version 1.3.4 onwards, as version 1.3.3.x and
earlier generate slightly invalid xml.

The thing stopping me from using an XML parser on earlier files is
simply that they use character entities that are not allowed ().
Obviously I could load the whole file into ram and do a search and
replace, but maybe some c# gurus can give me some advice on how to
efficiently do this using stream readers/ writers so I can just
connect the output from that into the XmlReader and never load the
whole thing into ram?
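
For reference, the chunked search-and-replace idea can be sketched as
follows, in Python rather than c# and purely as an illustration. It
assumes the offending entities are numeric character references to
code points that XML 1.0 does not allow (the exact entity is not shown
above), and all the names here are made up. Only one chunk plus a few
held-back characters is ever in memory.

```python
import re

_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")
_MAX_REF_LEN = 12  # assumed upper bound on the length of one reference


def _is_xml_char(cp):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def _drop_if_illegal(match):
    body = match.group(1)
    cp = int(body[1:], 16) if body[0] == "x" else int(body)
    return match.group(0) if _is_xml_char(cp) else ""


def clean_chunks(stream, chunk_size=4096):
    """Yield the text of a stream with illegal character references removed."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Hold back a trailing "&..." that has no ";" yet: it may be a
        # reference split across two chunks.
        cut = len(pending)
        amp = pending.rfind("&", max(0, len(pending) - _MAX_REF_LEN))
        if amp != -1 and ";" not in pending[amp:]:
            cut = amp
        yield _CHAR_REF.sub(_drop_if_illegal, pending[:cut])
        pending = pending[cut:]
    if pending:
        yield _CHAR_REF.sub(_drop_if_illegal, pending)
```

The cleaned chunks can then be fed to an incremental parser one at a
time, so the parser never sees the illegal references and the whole
file is never held in ram.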

Alex Mac