Re: [Framework-Team] Plip : indexing files
Hi Martin,

We are completing a new version of ARFilePreview, based entirely on a Five approach. It is available in the collective trunk; older versions are in branches. It provides a preview on a normal file object by adapting it. It also indexes the file's content (some work is still in progress to install it properly). This new version is not quite finished, but it gives a good overview of what we want to do and how we intend to achieve it.

We now have issues related to the ATCT File content type, which doesn't handle many kinds of events. We have a regression compared to the last version: ATFile handles events badly after a PUT, no rules (use cases?) are defined for when a file is renamed via WebDAV, etc. All this work is done in the AT version of ARFilePreview.

Should we work on ATFile? Would someone join us in this task?

Best regards,
Thierry.
Re: [Framework-Team] Plip : indexing files
> At this time there are 2 new things to consider:
>
> - portal transforms may overload the Zope server
> - there may be decorators that should be applied to files in order to
>   properly handle specific extra fields (especially for multimedia
>   files: metadata etc.)
>
> * Concerning overload of the Zope server: I think that we should have
> an asynchronous portal transform that may run as a separate Twisted
> daemon. This may live together with portal_transforms and may be
> called asynchronous_portal_transform (APT). The only difference with
> portal_transforms is that we need to give a callback method to APT in
> order to allow it to send the result of the transform after a while.
> Therefore if a content type is APT-aware and APT is activated, APT is
> used instead of portal_transforms. This allows the load to be moved to
> one or many dedicated servers, for example. We may also take a look at
> BlueDCS (I just heard of it but never tried it)

wicked.fieldevent allows you to register subscribers for fieldstorage events. These subscribers could be asynchronous (maybe just some instructions that ClockServer could handle later). I would try this first.

-w

--
d. whit morriss
senior engineer, opencore
http://www.openplans.org
"If you don't know where you are, you don't know anything at all"
Dr. Edgar Spencer, Ph.D., 1995

___
Framework-Team mailing list
Framework-Team@lists.plone.org
http://lists.plone.org/mailman/listinfo/framework-team
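Editorial sketch of the deferred-subscriber idea suggested in the message above. It does not use the real wicked.fieldevent or ClockServer APIs; `FieldStoredEvent`, `DeferredIndexer`, `subscribers` and `notify` are hypothetical stand-ins written in plain Python. The point is only that the subscriber records work when the event fires and a later, clock-driven call does the expensive part.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FieldStoredEvent:
    """Hypothetical event: the file field of the object at `path` got new data."""
    path: str
    data: bytes


class DeferredIndexer:
    """Subscriber that only queues work; something else drains the queue later."""

    def __init__(self, index_func):
        self._pending = deque()
        self._index_func = index_func

    def __call__(self, event):
        # Runs when the event fires: cheap, no transform or indexing here.
        self._pending.append(event)

    def run_pending(self):
        # Meant to be called later, e.g. by a periodic (clock-driven) request.
        done = 0
        while self._pending:
            event = self._pending.popleft()
            self._index_func(event.path, event.data)
            done += 1
        return done


# Stand-in for an event subscriber registry (an assumption, not a Zope API).
subscribers = []


def notify(event):
    for subscriber in subscribers:
        subscriber(event)


indexed = {}


def index_text(path, data):
    # Placeholder for "transform, then index"; here we just decode the bytes.
    indexed[path] = data.decode("utf-8")


indexer = DeferredIndexer(index_text)
subscribers.append(indexer)

notify(FieldStoredEvent("/site/report.txt", b"hello"))
pending_before = dict(indexed)      # still empty: nothing was indexed inline
processed = indexer.run_pending()   # the later, clock-driven step
```

With this split, the request that stores the file returns quickly, and the cost of the transform is paid when `run_pending` is called.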
Re: [Framework-Team] Plip : indexing files
On 1/29/07, Thierry Benita <[EMAIL PROTECTED]> wrote:
> This makes sense. I'll work with other sprinters in order to make this
> piece of work conform more closely to the Z3 model and use adapters on
> items that implement IFileContent.
> Is there some work in progress on the rewrite of portal_transforms?
>
> There is still an issue if we want to store sub-objects either in ZODB
> or on the filesystem. We'll investigate that.
> Also the view will stay in skins as long as Plone can't support Z3
> views with correct caching policies.

I don't know if anything has been done on transform work. If I were doing it, I'd start with a blank sheet of paper and design a generic Zope 3 transform service. Off the top of my head, this would involve:

- An ITransformer interface; objects could be adapted to this to perform a transform.

- An ITransformData interface, which the ITransformer would adapt its context to in order to get data to transform. This means that code needing to transform "the data" of an object could adapt to the ITransformer interface, which would support generic functionality, but would rely on the object having an ITransformData adapter to fetch the data part.

- The transformer would probably take a callback object, an adaptation to ITransformReceiver. This could be called synchronously or asynchronously depending on the ITransformer implementation, and would be responsible for doing something with the transformed data. Alternatively, sync/async could be an option when calling the transform, e.g. have one method which would get the transformed data (e.g. during rendering, when async makes no sense) and another which could do it asynchronously (e.g. when saving an object).

- Mimetypes could be registered as named utilities implementing IMimetype. No need for a separate tool, I don't think. Persistent configuration could happen with local utilities.

- Available transforms would be named utilities as well. Or they could be adapters of two mimetypes, but that gets a bit tricky because you could quite easily have ambiguity. On the other hand, a transform will be closely related to an input and an output mimetype, so maybe it's a named multi-adapter from IMimetypeOne to IMimetypeTwo, where both those are sub-interfaces of IMimetype.

I think a generic design like that wouldn't be too hard to dream up. It could be a plain Zope 3 package (plone.transform); best to check no-one has one of those yet. A lot of the actual transform code could be lifted straight from AT (but the API to the transform function should be less dumb).

I don't think this is terribly hard, and possibly quite fun. :)

Martin
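Editorial sketch of how the pieces listed above might fit together. This is not an existing plone.transform package. To keep it runnable with only the standard library, `typing.Protocol` stands in for zope.interface interfaces and a plain dict stands in for named utilities; real code would presumably register adapters and utilities with zope.component instead. All method names (`get_data`, `receive`, `transform`) and the helper `register_transform` are assumptions made for illustration.

```python
import html
from typing import Callable, Dict, List, Protocol, Tuple


class ITransformData(Protocol):
    """Fetches the raw data of an object together with its mimetype."""

    def get_data(self) -> Tuple[str, bytes]: ...


class ITransformReceiver(Protocol):
    """Consumes the transformed data (the callback object)."""

    def receive(self, mimetype: str, data: str) -> None: ...


# Stand-in for named transform utilities, keyed by (input, output) mimetype.
TransformFunc = Callable[[bytes], str]
transforms: Dict[Tuple[str, str], TransformFunc] = {}


def register_transform(source: str, target: str, func: TransformFunc) -> None:
    transforms[(source, target)] = func


class SyncTransformer:
    """ITransformer-like object that calls the receiver immediately.

    An asynchronous implementation could keep the same signature and
    call receiver.receive() later instead.
    """

    def __init__(self, data_source: ITransformData) -> None:
        self.data_source = data_source

    def transform(self, target: str, receiver: ITransformReceiver) -> None:
        source, raw = self.data_source.get_data()
        func = transforms[(source, target)]  # KeyError if no such transform
        receiver.receive(target, func(raw))


# --- usage with hypothetical content and receiver classes ---

class PlainTextFile:
    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    def get_data(self) -> Tuple[str, bytes]:
        return "text/plain", self.raw


class Collector:
    def __init__(self) -> None:
        self.results: List[Tuple[str, str]] = []

    def receive(self, mimetype: str, data: str) -> None:
        self.results.append((mimetype, data))


def text_to_html(raw: bytes) -> str:
    return "<pre>%s</pre>" % html.escape(raw.decode("utf-8"))


register_transform("text/plain", "text/html", text_to_html)

collector = Collector()
SyncTransformer(PlainTextFile(b"a < b")).transform("text/html", collector)
```

Keying the registry on the (input, output) pair mirrors the "named multi-adapter from one mimetype to another" idea; it sidesteps the ambiguity mentioned above only because each pair maps to exactly one function here.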
Re: [Framework-Team] Plip : indexing files
Hi Martin,

This makes sense. I'll work with other sprinters in order to make this piece of work conform more closely to the Z3 model and use adapters on items that implement IFileContent.
Is there some work in progress on the rewrite of portal_transforms?

There is still an issue if we want to store sub-objects either in ZODB or on the filesystem. We'll investigate that.
Also the view will stay in skins as long as Plone can't support Z3 views with correct caching policies.

Any other ideas are welcome ;)

Thierry.
Re: [Framework-Team] Plip : indexing files
On 29 January 2007 13:23:18 +0100, [EMAIL PROTECTED] wrote:
> Hi,
>
> - we don't need all the TXNG mechanics in order to get the full-text
> indexing: we only need the UnicodeLexicon as long as portal transforms
> send unicode results (tested in France; you can imagine ;-) )

My two cents: TXNG has been a working solution for years and has been running successfully in large installations. Is there any need to reinvent the wheel from scratch?

Andreas
Re: [Framework-Team] Plip : indexing files
Hi Thierry,

I think this sounds quite interesting. Certainly, a better "document" story (which includes full-text indexing and a strategy to avoid ZODB bloat, e.g. blobfile) is pretty high on my wishlist for 3.5 (and limi's as well, fwiw).

I would like to see a proposal that is somewhat less AT-centric, though. It may be wishful to think that we can achieve this, but ideally we'd decouple portal_transform entirely, replacing it with a lighter framework based on Zope 3 adapters and utilities (a transform is a utility, adapters take care of the actual extraction of data to transform and consumption of the transformed text). This should also allow some async option (register a consumer for the transform that is called when the transform is complete).

At this point, we could extend ATFile relatively easily to use this. I don't think we'd want a new content type, but rather to extend ATFile as necessary.

I think BLOB storage and transform should be two separate proposals and two separate implementations.

Martin

On 1/29/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'd like to make a proposal that extends PLIP #177
> http://plone.org/products/plone/roadmap/177
>
> We developed a Plone component that stores a file with its HTML
> preview: ATFilePreview.
>
> This does the following:
>
> - makes the file available for download
> - creates an HTML preview of the file
> - indexes the file's content in full text
>
> It has the following advantages:
>
> - it uses the mimetypes registry in order to detect mimetypes
>
> - it uses portal transforms in order to create the preview and uses
>   this preview in order to extract the text that has to be indexed
>
> - it stores both the HTML preview and all subobjects in the object,
>   as persistent sub-objects
>
> - it's totally generic: obviously it does preview and indexing for
>   OpenDocument files, MS documents, PDF, RTF, HTML, Python etc. It
>   may also show a preview for zip files, video files, audio files or
>   whatever you can imagine. Let's take the example of a video file:
>   you may decide that every video that is uploaded will be transcoded
>   to mkv format and streamed in the page via a Java applet that
>   displays the video. You only need to have a video_to_html transform
>   that will achieve it. The result will be stored together with the
>   original file and the HTML preview will be displayed.
>
> - the trunk (it's in collective) stores everything inside the object
>   in ZODB, so it has no dependency and can take the place of normal
>   file objects
>
> - there is another version that stores file, HTML and subobjects in
>   the filesystem. It currently uses FSS but we'd like to move that to
>   BlobFile as FSS is a bit too complex for our use case.
>
> - we don't need all the TXNG mechanics in order to get the full-text
>   indexing: we only need the UnicodeLexicon as long as portal
>   transforms send unicode results (tested in France; you can
>   imagine ;-) )
>
> - we already have the transforms for all office files in
>   AROfficesTransform, for which we are currently doing the
>   integration into Archetypes.
>
> At this time there are 2 new things to consider:
>
> - portal transforms may overload the Zope server
>
> - there may be decorators that should be applied to files in order to
>   properly handle specific extra fields (especially for multimedia
>   files: metadata etc.)
>
> * Concerning overload of the Zope server: I think that we should have
> an asynchronous portal transform that may run as a separate Twisted
> daemon. This may live together with portal_transforms and may be
> called asynchronous_portal_transform (APT). The only difference with
> portal_transforms is that we need to give a callback method to APT in
> order to allow it to send the result of the transform after a while.
> Therefore if a content type is APT-aware and APT is activated, APT is
> used instead of portal_transforms. This allows the load to be moved to
> one or many dedicated servers, for example. We may also take a look at
> BlueDCS (I just heard of it but never tried it)
>
> * Concerning the decorators: there should be a kind of
> decorators_registry that would allow decorators to be added based on
> mimetypes
>
> What do you think of all these points?
>
> Best regards,
>
> Thierry.
>
> --
> atReal
> http://www.atreal.net
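Editorial sketch of the callback-based asynchronous transform (APT) described in the quoted proposal. It is only an illustration: a thread pool from the standard library stands in for the separate Twisted daemon, and `AsyncTransformService`, `run_transform` and `text_to_html` are hypothetical names, not part of portal_transforms. It shows the two behaviours the proposal asks for: the result is delivered through a callback, and the caller falls back to an inline transform when no asynchronous service is active.

```python
from concurrent.futures import ThreadPoolExecutor


def text_to_html(data):
    """Placeholder transform; a real one would call an external converter."""
    return "<p>%s</p>" % data.decode("utf-8")


class AsyncTransformService:
    """Runs transforms off the calling thread and reports via a callback."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def convert(self, transform, data, callback):
        future = self._pool.submit(transform, data)
        # The callback receives the transformed data once it is ready.
        future.add_done_callback(lambda f: callback(f.result()))
        return future

    def shutdown(self):
        # Waits for running transforms (and their callbacks) to finish.
        self._pool.shutdown(wait=True)


def run_transform(transform, data, callback, apt=None):
    """Use the async service when one is active, else transform inline."""
    if apt is not None:
        return apt.convert(transform, data, callback)
    callback(transform(data))
    return None


results = []

# APT activated: the work happens in the pool, the result arrives later.
apt = AsyncTransformService()
run_transform(text_to_html, b"hello", results.append, apt=apt)
apt.shutdown()

# APT not activated: same call, but the transform runs synchronously.
run_transform(text_to_html, b"inline", results.append)
```

Because both paths deliver the result through the same callback, a content type written against `run_transform` would not need to know which mode is in use.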