Re: A question about filters
On 28/04/07, Johann Petrak <[EMAIL PROTECTED]> wrote:
> Is it possible to have multiple filters for a group of files
> where each provides their own part of (meta)information about
> the files?
> For example, a user might have a collection of documents
> (e.g. PDF, plain text, OpenOffice) but there is also an
> application that stores meta information like authors,
> project, dates, internal numbers etc. that is associated
> with these documents.
> So ... would it be possible for the existing PDF, OpenOffice,
> Text etc. filters to provide the conventional information
> (e.g. the content) and for an *additional* filter to
> access the database and provide that additional information?
>
> How would a user then be able to query for files using those
> additional properties?
>
> I am considering using beagle as the back-end for some
> in-house desktop search solution that would utilize
> additional information about files (not only documents)
> so something like this would be essential ...

Soon beagle will support XMP sidecar files, which will be able to contain arbitrary metadata. So if you have a file "thing.foo" you would create a file called "thing.foo.xmp" containing whatever properties you wanted. It's still in development, but hopefully it will be included in the next release.

___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: regression testing
On 27/04/07, Joe Shaw <[EMAIL PROTECTED]> wrote:
> We also have a repo which is stored in Novell Forge, which I run every
> now and then but mostly when I'm gearing up to do a release. It was
> originally put in Novell Forge because it was SVN and maintaining it
> in GNOME CVS would cause major pain for people who were interested
> only in the code. A lot of those test files were making sure we
> extracted the right info, got all the metadata, etc. They weren't
> really there to test "broken" files so much. At some point I'll look
> into moving that stuff over into our current SVN.
>
> A big problem I had when I was fixing and dealing with a lot of the MS
> Word crashers and misbehaving documents is that they contained private
> information and I couldn't add them to a regression suite. Since the
> files were badly formed for whatever reason (or the parsers were
> broken), we couldn't recreate the situation in another file.
>
> Joe
>

Crashing errors are quite easy to detect, as there will be lots of stuff put in the error logs. I was thinking more along the lines of monitoring beagle for more subtle errors, such as only partially indexing files, which may go unnoticed for a while. Soft errors like that are much worse, as they might give people the impression that beagle is just not very good (not that I'm saying there are any such cases). It would also make fair comparisons with other projects like tracker possible.
regression testing
I'm sure someone has mentioned this before, but it seems that the bugs people complain about (broken filters, not all content indexed) might be caught more quickly if beagle had a repository of various files that indexing could be tested on every night. Coupled with a program to generate stats about what keywords were extracted, how long it took, how much RAM was used, etc., this would be really cool and useful (the cairo folk do this rigorously to track how their performance changes every release).

I'm too busy to write the code, but I have plenty of sample files I can donate to the test repository :)
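The stats program described above could start out as something like the following C# sketch. Everything here is hypothetical: the class name, the `Extractor` delegate, and the placeholder word splitter are assumptions for illustration and do not correspond to anything in the Beagle tree. A real harness would call the actual filters in place of `SplitWords`. The memory figure is only the change in managed heap size reported by the garbage collector, so it is a rough indicator at best.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical nightly harness; none of these names exist in Beagle.
public static class FilterStats
{
    // Stand-in for "run a filter on this file and return the extracted keywords".
    public delegate ICollection<string> Extractor(string path);

    // Writes one CSV row per sample file so that two nightly reports can be diffed.
    public static void Run(string sampleDir, Extractor extract, TextWriter report)
    {
        report.WriteLine("file,keywords,milliseconds,managed_bytes_delta");
        string[] files = Directory.GetFiles(sampleDir);
        Array.Sort(files, StringComparer.Ordinal); // stable order between runs

        foreach (string path in files) {
            long before = GC.GetTotalMemory(true);   // force a collection first
            Stopwatch timer = Stopwatch.StartNew();
            ICollection<string> keywords = extract(path);
            timer.Stop();
            long after = GC.GetTotalMemory(false);

            report.WriteLine("{0},{1},{2},{3}",
                             Path.GetFileName(path), keywords.Count,
                             timer.ElapsedMilliseconds, after - before);
        }
    }

    // Naive placeholder extractor: the unique whitespace-separated tokens.
    public static ICollection<string> SplitWords(string path)
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        string text = File.ReadAllText(path);
        foreach (string word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            seen[word] = true;
        return seen.Keys;
    }

    public static void Main(string[] args)
    {
        string dir = args.Length > 0 ? args[0] : ".";
        Run(dir, SplitWords, Console.Out);
    }
}
```

Keeping the report as plain sorted CSV means a drop in the keyword count for a given sample file shows up in an ordinary diff against the previous night's report, which is the kind of soft error that would otherwise go unnoticed.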
Re: build system broken?
On 17/03/07, Joe Shaw <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Alex Mac wrote:
> > checking for automake >= 1.8...
> >
> > testing automake-1.8...
> > found 1.8.5
>
> Try changing the REQUIRED_AUTOMAKE_VERSION in autogen.sh to 1.9 and
> rerun it. I'm assuming you also have automake 1.9 installed. :)
>
> Joe
>

That seems to do the trick, thanks.

Alex
Re: build system broken?
On 16/03/07, Joe Shaw <[EMAIL PROTECTED]> wrote:
> Hi,
> This has been brought up before, but I thought it was fixed now.
> There's a bug in some versions of automake which build directories out
> of order.
>
> What versions of automake do you have installed, and which one is chosen
> by autogen.sh when you run? Also, is it using the system-provided
> autogen.sh, or the one included in beagle? (It should be the first line
> of the output.)
>
> Joe

I'm running the autogen.sh that comes with beagle. Here's the output:

<[EMAIL PROTECTED]:~/local/src/beagle> ./autogen.sh
Using system-provided gnome-autogen script
checking for autoconf >= 2.53...
  testing autoconf2.50... found 2.61
checking for automake >= 1.8...
  testing automake-1.8... found 1.8.5
checking for libtool >= 1.5...
  testing libtoolize... found 1.5.22
checking for glib-gettext >= 2.2.0...
  testing glib-gettextize... found 2.12.11
checking for intltool >= 0.30...
  testing intltoolize... found 0.35.5
checking for pkg-config >= 0.14.0...
  testing pkg-config... found 0.21
Checking for required M4 macros...
Checking for forbidden M4 macros...
**Warning**: I am going to run `configure' with no arguments.
If you wish to pass any to it, please specify them on the
`./autogen.sh' command line.
Processing ./configure.in
Running libtoolize...
Running glib-gettextize... Ignore non-fatal messages.
Copying file mkinstalldirs
Copying file po/Makefile.in.in
Please add the files
  codeset.m4 gettext.m4 glibc21.m4 iconv.m4 isc-posix.m4 lcmessage.m4
  progtest.m4
from the /aclocal directory to your autoconf macro directory
or directly to your aclocal.m4 file.
You will also need config.guess and config.sub, which you can get from
ftp://ftp.gnu.org/pub/gnu/config/.
Running intltoolize...
Running aclocal-1.8...
acinclude.m4:124: warning: underquoted definition of AM_CHECK_PYMOD
  run info '(automake)Extending aclocal'
  or see http://sources.redhat.com/automake/automake.html#Extending%20aclocal
/usr/share/aclocal/tulip.m4:2: warning: underquoted definition of AC_PATH_TULIP
Running autoconf2.50...
Running autoheader2.50...
Running automake-1.8...
INSTALL INSTALL.autogen_bak differ: byte 1, line 1
Running ./configure --enable-maintainer-mode ...
build system broken?
If I do a clean checkout of the repository and run autogen.sh followed by make, it tries to build things in the wrong order and fails straight away with the error below. I noticed this a while back but assumed someone else would have reported it by now, so I didn't bother. I'm running the latest version of Ubuntu Feisty. If I manually 'cd' into the various dirs I can get it to build fine, but that's a bit lame.

Making all in images
make[2]: Entering directory `/home/alex/local/src/beagle/images'
/usr/bin/gmcs -debug -out:Images.dll -target:library ./Images.cs -r:../Util/Util.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/pango-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/atk-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gdk-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gtk-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/glib-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gconf-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gconf-sharp-peditors.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gnome-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/art-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/gnome-vfs-sharp.dll -r:/usr/lib/pkgconfig/../../lib/mono/gtk-sharp-2.0/glade-sharp.dll -r:/usr/lib/cli/gmime-sharp-2.2/gmime-sharp.dll -resource:./bug.png,bug.png -resource:./contact-icon.png,contact-icon.png -resource:./emblem-blog.png,emblem-blog.png -resource:./emblem-bugzilla.png,emblem-bugzilla.png -resource:./emblem-calendar.png,emblem-calendar.png -resource:./emblem-contact.png,emblem-contact.png -resource:./emblem-file.png,emblem-file.png -resource:./emblem-folder.png,emblem-folder.png -resource:./emblem-fspot.png,emblem-fspot.png -resource:./emblem-google.png,emblem-google.png -resource:./emblem-im-log.png,emblem-im-log.png -resource:./emblem-mail-message.png,emblem-mail-message.png -resource:./emblem-music.png,emblem-music.png -resource:./emblem-note.png,emblem-note.png -resource:./emblem-picture.png,emblem-picture.png -resource:./emblem-web-history.png,emblem-web-history.png -resource:./gnome-gaim.png,gnome-gaim.png -resource:./icon-blog.png,icon-blog.png -resource:./icon-monodoc.png,icon-monodoc.png -resource:./icon-search.png,icon-search.png -resource:./icon-web.png,icon-web.png -resource:./mail.png,mail.png -resource:./music.png,music.png -resource:./no-match.png,no-match.png -resource:./note.png,note.png -resource:./person.png,person.png -resource:./quick-tips.png,quick-tips.png -resource:./status-away.png,status-away.png -resource:./status-online.png,status-online.png -resource:./system-search.png,system-search.png -resource:./tip-arrow.png,tip-arrow.png
error CS0006: cannot find metadata file `../Util/Util.dll'
Compilation failed: 1 error(s), 0 warnings
make[2]: *** [Images.dll] Error 1
make[2]: Leaving directory `/home/alex/local/src/beagle/images'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/alex/local/src/beagle'
make: *** [all] Error 2
Re: Xslt filter problem
dbera: I've posted a patch to the xslt filter in the bugzilla entry; can you apply that please? It just tidies up the code to use the recommended XML parsing methods.

As I said in the bugzilla entry, your file is not well-formed, and fixing that makes the error go away. That said, I have no idea why beagle is treating the debug output as the content. Joe or dbera will need to have a look into this one.

On 04/03/07, Stephan Hegel <[EMAIL PROTECTED]> wrote:
> D Bera wrote:
> > Can you file a bug and attach the file to it ? I will ask the Xslt
> > filter author to have a look at it.
>
> Thanks in advance,
> Bug 414498 submitted.
>
> Regards,
> Stephan.
svn access
I've got a few improvements I want to make to some of the filters I've got in beagle, and dbera has just opened another bug on my svg filter. Getting these things done would be a lot quicker and easier for me if I had an svn account. How do I go about getting one?

Alex
Mars Filter
No, not a filter that searches for water on Mars, something much more interesting than that! Mars is Adobe's new PDF format:

"The Mars (code name) Project is an XML-friendly implementation of PDF syntax. Already an open specification, PDF is the global standard for trusted, high fidelity electronic documentation. The Mars file format incorporates additional industry standards such as SVG, PNG, JPG, JPG2000, OpenType, Xpath and XML into ZIP-based document container. The Mars plug-ins enable recognition of the Mars file format by Adobe Acrobat 8 and Adobe Reader 8 software."

If you've got a Windows machine or a Mac with Acrobat 8, grab the plugin:

http://labs.adobe.com/downloads/mars.html

and you can convert your PDFs into Mars documents. I've started to write a filter for Mars files:

http://bugzilla.gnome.org/show_bug.cgi?id=383312

which people should play with if they are interested. I've also included a sample Mars document:

http://bugzilla.gnome.org/attachment.cgi?id=77880&action=view

if you just want to see what's going on. Apart from missing fonts, inkscape seems to do a decent job of rendering the pages.

Alex
Re: searching in subversion repos
On 06/12/06, Richard Boulton <[EMAIL PROTECTED]> wrote:
> > A thought crossed my mind recently about having beagle search inside
> > subversion repos. Would I have to implement that as a backend? I've
> > only written fairly trivial filters so far so I'm not sure how much
> > work that would involve or if it would be practical but it would be
> > quite cool to be able to search back in time through all my code and
> > documents in the repo that I use to store all my work.
>
> I don't know the ins and outs of implementing this for beagle, but being
> able to search through subversion (or CVS) repositories would be a useful
> feature. Don't forget that such repositories also contain a lot of
> potentially useful information in the form of log messages attached to
> each commit.
>
> I'd encourage anyone thinking of working on this kind of search to take a
> look at the "cvssearch" project (at http://cvssearch.sourceforge.net/) for
> ideas, to avoid having to reinvent the wheel. In particular, take a look
> at the papers linked to from that page:
>
> http://www.cse.unsw.edu.au/~amichail/cvssearch/paper.pdf
> and
> http://www.cse.unsw.edu.au/~amichail/cvssearch/paper2.pdf

Thanks, I'll take a look at those.

> You might be able to get the cvssearch code working, but development on it
> has been fairly dead lately due to lack of time. The ideas described in
> the papers are well worth reading, though.

Yeah, I would probably just implement it from scratch.

> --
> Richard
>
Re: searching in subversion repos
On 06/12/06, Joe Shaw <[EMAIL PROTECTED]> wrote:
> Hi Alex,
>
> On Wed, 2006-12-06 at 20:38 +, Alex Mac wrote:
> > Any possible pitfalls or things that would make this not worth doing?
>
> The idea is to find .svn directories and deal with their contents,
> correct? Or is there something else here I'm not following?

Nope, I'm not talking about monitoring a checked-out copy of a repo but the actual repo itself, where all the data (including the logs, as Richard mentioned) is stored. The .svn directories that get created when you check out a repo just hold duplicates of all the files for diffing purposes, nothing interesting.

So the idea would be that searching for "foo" will show me all the revisions of all files in my repository that had foo in them, not just the most recent.

It occurs to me as well that with the svn API this would be able to handle remote svn repositories too. That would be cool, if a little bandwidth-intensive on the first index.

> A backend is definitely the way to go here. You can take a look at the
> Tomboy backend for an example of a pretty simple one. You'll probably
> need to set up inotify watches and such on the .svn directories to watch
> for changes, and this could get tricky depending on how often those
> change and how much data has to be reindexed.
>
> As you mentioned, how to display this will be tricky. We could possibly
> have some basic support in beagle-search for it, and either integrate
> with an existing SVN visualization tool or write our own to give more
> advanced features.
>
> > I know someone has half wrapped the svn api in c# so that might need a
> > little fixing first...
>
> A worthwhile project in any case, IMO. :)
>

Yeah, it's probably a little bit much for me to take on at the moment, but it might make a good SoC project for next year.

> Thanks,
> Joe
>
searching in subversion repos
A thought crossed my mind recently about having beagle search inside subversion repos. Would I have to implement that as a backend? I've only written fairly trivial filters so far, so I'm not sure how much work that would involve or if it would be practical, but it would be quite cool to be able to search back in time through all my code and documents in the repo that I use to store all my work.

Once indexed, I'm not too sure how best to display the results in the beagle search app, but it seems like a cool feature to have.

Any possible pitfalls or things that would make this not worth doing? I know someone has half wrapped the svn API in C#, so that might need a little fixing first...

Alex
Re: Scribus filter / c# advice needed
On 02/12/06, D Bera <[EMAIL PROTECTED]> wrote:
> > On further inspection it seems to be because I am using the .Net 1.0
> > method of creating an XmlTextReader which does not enable character
> > checking:
> >
> > XmlTextReader reader = new XmlTextReader(thestream);
> >
> > msdn says this is deprecated in .Net 2.0 in favour of:
> >
> > XmlReader r = XmlReader.Create(thestream);
> >
> > using the new method character checking seems to be enabled by
> > default. So its not a bug in mono its just that the XmlReader is
> > slightly lax by default.
>
> There is a plan to move beagle to .Net-2.0 pretty soon (this or next
> release). Would the filter break in that case ?

Nope, all the XmlTextReader-based filters will continue to work the same, but for the sake of future-proofing they should probably be changed to use the new XmlReader creation method.
Re: Scribus filter / c# advice needed
On 02/12/06, D Bera <[EMAIL PROTECTED]> wrote:
> > Thanks for the advice, although I just tried running the filter on an
> > old scribus file and it seems the XmlReader is quite happy to process
> > xml files that are malformed in this way so it looks like there's no
> > need for buffering.
>
> Are sure that this it not a bug (in mono implementation) that
> XmlReader processes malformed files w/out any problem ? What does the
> spec or msdn documentation say (you can also test on a .Net windows
> machine) ?
>
> - dBera

On further inspection, it seems to be because I am using the .Net 1.0 method of creating an XmlTextReader, which does not enable character checking:

XmlTextReader reader = new XmlTextReader(thestream);

msdn says this is deprecated in .Net 2.0 in favour of:

XmlReader r = XmlReader.Create(thestream);

With the new method, character checking seems to be enabled by default. So it's not a bug in mono; it's just that an XmlTextReader created the old way is slightly lax by default.
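The difference between the two creation methods can be checked with a small stand-alone C# program along these lines. This is only a sketch: the test document and its `&#x5;` reference are assumptions chosen for illustration (a numeric reference to a control character that XML 1.0 does not allow), not necessarily what old Scribus files contain. The third case uses `XmlReaderSettings.CheckCharacters`, the .Net 2.0 setting that controls this check for readers made with `XmlReader.Create`.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CheckCharactersDemo
{
    // Reads every node; returns null on success or the parser's error message.
    static string TryParse(XmlReader reader)
    {
        try {
            while (reader.Read()) { }
            return null;
        } catch (XmlException e) {
            return e.Message;
        }
    }

    public static void Main()
    {
        // Illustrative input only: a reference to a character XML 1.0 forbids.
        string xml = "<doc>before&#x5;after</doc>";

        // .Net 1.0 style: no range check on character references by default.
        XmlTextReader legacy = new XmlTextReader(new StringReader(xml));
        Console.WriteLine("XmlTextReader:            " + (TryParse(legacy) ?? "ok"));

        // .Net 2.0 style with default settings: CheckCharacters is true.
        XmlReader strict = XmlReader.Create(new StringReader(xml));
        Console.WriteLine("XmlReader.Create:         " + (TryParse(strict) ?? "ok"));

        // .Net 2.0 style with the check switched off explicitly.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = false;
        XmlReader relaxed = XmlReader.Create(new StringReader(xml), settings);
        Console.WriteLine("CheckCharacters = false:  " + (TryParse(relaxed) ?? "ok"));
    }
}
```

If a filter has to keep accepting files with such references after moving to `XmlReader.Create`, setting `CheckCharacters` to false is probably the smallest change; the reader still rejects markup that is malformed in other ways.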
Re: Scribus filter / c# advice needed
On 01/12/06, D Bera <[EMAIL PROTECTED]> wrote:
> > efficiently do this using stream readers/ writers so I can just
> > connect the output from that into the XmlReader and never load the
> > whole thing into ram?
>
> One way I would try is to create a BufferedStream stream and create
> your XmlTextReader from it. Use a byte[] as the storage - check the
> XmlTextReader source to see what buffer size they use to read from
> stream. If your filter is well behaved, then it most probably won't be
> using anything other than Read(byte[], int offset, int count),
> ReadByte(), Close(). Implement them using your internal buffer. And
> whenever you read anything in your buffer, do a search and replace in
> the buffer.
>
> - dBera

Thanks for the advice, although I just tried running the filter on an old scribus file and it seems the XmlReader is quite happy to process XML files that are malformed in this way, so it looks like there's no need for buffering. I've attached a tweaked version of the filter to the bugzilla page (http://bugzilla.gnome.org/show_bug.cgi?id=380950), which is now ready to be committed by someone.

Alex Mac
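For completeness, the streaming search-and-replace idea could be sketched as below. This is a variation on the suggestion above, not the approach the Scribus filter ended up using: it wraps a TextReader instead of a byte stream, so a reference split across a buffer boundary is not an issue. The class name is hypothetical, and because the archive does not preserve which entities old Scribus files contain, the sketch simply drops any numeric character reference whose code point XML 1.0 disallows. It relies on the inner reader's Peek(), which should be fine for files but may be unreliable on streams that return data in short bursts.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of Beagle: removes numeric character
// references (e.g. "&#x5;") that refer to characters XML 1.0 forbids.
public class CharRefFilterReader : TextReader
{
    private readonly TextReader inner;
    private readonly Queue<char> pending = new Queue<char>();

    public CharRefFilterReader(TextReader inner) { this.inner = inner; }

    public override int Peek() { Fill(); return pending.Count > 0 ? pending.Peek() : -1; }
    public override int Read() { Fill(); return pending.Count > 0 ? pending.Dequeue() : -1; }

    // Queue at least one output character unless the input is exhausted.
    private void Fill()
    {
        while (pending.Count == 0) {
            int c = inner.Read();
            if (c < 0) return;
            if (c != '&' || inner.Peek() != '#') { pending.Enqueue((char)c); continue; }

            // Collect a candidate reference: "&#", then up to ten more characters.
            StringBuilder token = new StringBuilder("&");
            token.Append((char)inner.Read());
            while (token.Length < 12) {
                int next = inner.Peek();
                if (next < 0 || next == '&' || next == '<') break;
                token.Append((char)inner.Read());
                if (next == ';') break;
            }
            string text = token.ToString();
            if (!IsIllegalReference(text))
                foreach (char ch in text) pending.Enqueue(ch); // pass through unchanged
        }
    }

    private static bool IsIllegalReference(string token)
    {
        if (token.Length < 4 || token[token.Length - 1] != ';') return false;
        string body = token.Substring(2, token.Length - 3);
        bool hex = body[0] == 'x';
        if (hex) body = body.Substring(1);
        NumberStyles style = hex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;
        int code;
        if (!int.TryParse(body, style, CultureInfo.InvariantCulture, out code)) return false;
        // Char production from the XML 1.0 specification.
        bool legal = code == 0x9 || code == 0xA || code == 0xD
            || (code >= 0x20 && code <= 0xD7FF)
            || (code >= 0xE000 && code <= 0xFFFD)
            || (code >= 0x10000 && code <= 0x10FFFF);
        return !legal;
    }

    public static void Main()
    {
        // Illustrative input only.
        TextReader source = new StringReader("<doc>a&#x5;b&amp;c</doc>");
        using (XmlReader reader = XmlReader.Create(new CharRefFilterReader(source))) {
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Text) Console.WriteLine(reader.Value);
        }
    }
}
```

In a filter, the wrapped reader would sit between a StreamReader over the file and the XML parser, so only a few characters are ever held in memory at once.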
Scribus filter / c# advice needed
I just added a filter for handling Scribus (http://www.scribus.net) files to bugzilla. One problem with it at the moment is that it can only handle scribus files from version 1.3.4 onwards, as versions 1.3.3.x and earlier generate slightly invalid XML. The only thing stopping me from using an XML parser on the earlier files is that they use character entities that are not allowed (). Obviously I could load the whole file into RAM and do a search and replace, but maybe some C# gurus can give me some advice on how to do this efficiently using stream readers/writers, so I can just connect the output from that into the XmlReader and never load the whole thing into RAM?

Alex Mac