Re: [Framework-Team] Plip : indexing files

2007-03-16 Thread Thierry Benita
Hi Martin,

We are completing a new version of ARFilePreview, based entirely on a
Five approach. It is available in the collective trunk; older versions
are in branches.
It provides a preview for a normal file object by adapting it. It also
indexes the file's content (work is still in progress to get it
installed properly).

This new version is nearly finished, and it already gives a good overview
of what we want to do and how we intend to achieve it. We now have
issues related to the ATCT File content type, which does not handle
many kinds of events.

We have a regression compared to the last version: ATFile handles events
poorly after a PUT, no rules (use cases?) are defined for when a file is
renamed via WebDAV, etc. All of this is handled in the AT version of
ARFilePreview.

Should we work on ATFile? Would someone join us in this task?

Best regards,

Thierry.


Re: [Framework-Team] Plip : indexing files

2007-01-29 Thread whit



> At this time there are 2 new things to consider:
>
> - portal transforms may overload the Zope server
>
> - there may be decorators that should be applied to files in order to
> properly handle specific extra fields (especially for multimedia files:
> metadata etc.)
>
> * Concerning overload of the Zope server: I think that we should have an
> asynchronous portal transform that may run as a separate Twisted daemon.
> This may live together with portal_transforms and may be called
> asynchronous_portal_transform (APT). The only difference from
> portal_transforms is that we need to give a callback method to APT so
> that it can send back the result of the transform after a while.
> Therefore, if a content type is APT-aware and APT is activated, APT is
> used instead of portal_transforms. This allows the load to be moved to
> one or more dedicated servers, for example. We may also take a look at
> BlueDCS (I just heard of it but never tried it).
  
wicked.fieldevent allows you to register subscribers for fieldstorage
events. These subscribers could be asynchronous (maybe just recording some
instructions that clockserver could handle later). I would try this first.
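
A minimal sketch of that idea in plain Python: the subscriber only records
an instruction when a field is stored, and a periodic job (the role
clockserver would play) runs the expensive transform later. The names
FieldStoredEvent, queue_transform and run_pending are hypothetical and are
not the wicked.fieldevent API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FieldStoredEvent:
    """Hypothetical stand-in for a 'field was stored' event."""
    object_id: str
    field_name: str
    data: bytes


# Instructions recorded during the request, processed later.
pending = deque()


def queue_transform(event):
    """Subscriber: cheap, it only records what has to be done."""
    pending.append((event.object_id, event.field_name, event.data))


def run_pending(transform):
    """Periodic job: runs the expensive transform outside the request."""
    results = {}
    while pending:
        object_id, field_name, data = pending.popleft()
        results[(object_id, field_name)] = transform(data)
    return results


queue_transform(FieldStoredEvent("doc-1", "file", b"hello"))
done = run_pending(lambda data: data.decode("utf-8").upper())
```

The in-memory queue is only for illustration; a real deployment would need
the pending instructions to survive a restart.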


-w

--

-- d. whit morriss --
- senior engineer, opencore -
- http://www.openplans.org  -
- m: 415-710-8975   -

"If you don't know where you are,   
you don't know anything at all" 


Dr. Edgar Spencer, Ph.D., 1995


___
Framework-Team mailing list
Framework-Team@lists.plone.org
http://lists.plone.org/mailman/listinfo/framework-team


Re: [Framework-Team] Plip : indexing files

2007-01-29 Thread Martin Aspeli

On 1/29/07, Thierry Benita <[EMAIL PROTECTED]> wrote:


> This makes sense. I'll work with other sprinters to make this piece of
> work conform more closely to the Z3 model and use adapters on items that
> implement IFileContent.
> Is there some work in progress on the rewrite of portal_transforms?
> There is still an issue if we want to store sub-objects either in the
> ZODB or on the filesystem. We'll investigate that.
> Also, the view will stay in skins as long as Plone can't support Z3
> views with correct caching policies.


I don't know if anything has been done on transform work. If I were
doing it, I'd start with a blank sheet of paper and design a generic
Zope 3 transform service. Off the top of my head, this would involve:

- An ITransformer interface; objects could be adapted to this to
perform a transform

- An ITransformData interface, which the ITransformer would adapt its
context to in order to get data to transform. This means that code
needing to transform "the data" of an object could adapt to the
ITransformer interface, which would support generic functionality, but
would rely on the object having an ITransformData adapter to fetch the
data part.

- The transformer would probably take a callback object, an
adaptation to ITransformReceiver. This could be called synchronously
or asynchronously depending on the ITransformer implementation, and
would be responsible for doing something with the transformed data.
Alternatively, sync/async could be an option when calling the
transform, e.g. have one method which would get the transformed data
(e.g. during rendering, when async makes no sense) and another which
could do it asynchronously (e.g. when saving an object).

- Mimetypes could be registered as named utilities implementing
IMimetype. No need for a separate tool, I don't think. Persistent
configuration could happen with local utilities.

- Available transforms would be named utilities as well. Or they
could be adapters of two mimetypes, but that gets a bit tricky because
you could quite easily have ambiguity. On the other hand, a transform
will be closely related to an input and an output mimetype, so maybe
it's a named multi-adapter from IMimetypeOne to IMimetypeTwo, where
both those are sub-interfaces of IMimetype.
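
As a rough illustration of the three roles listed above, here is a sketch
using plain Python classes as stand-ins, so that it runs without Zope
installed. A real implementation would declare ITransformer,
ITransformData and ITransformReceiver with zope.interface and register the
classes as adapters. The class and method names below are assumptions made
for this sketch, not an existing API.

```python
class FileTransformData:
    """ITransformData role: fetches the raw data from the context."""
    def __init__(self, context):
        self.context = context

    def get_data(self):
        return self.context["data"]


class IndexingReceiver:
    """ITransformReceiver role: consumes the transformed result."""
    def __init__(self, context):
        self.context = context

    def receive(self, text):
        self.context["searchable_text"] = text


class Transformer:
    """ITransformer role: generic logic that delegates data access
    and consumption of the result to the two other roles."""
    def __init__(self, context, transform):
        self.context = context
        self.transform = transform  # callable: bytes -> str

    def transformed(self):
        """Synchronous variant: return the result (e.g. while rendering)."""
        return self.transform(FileTransformData(self.context).get_data())

    def transform_to(self, receiver):
        """Callback variant: hand the result to a receiver (e.g. on save).
        This sketch calls the receiver immediately; an asynchronous
        implementation could schedule the call instead."""
        receiver.receive(self.transformed())


# A dict stands in for the content object in this sketch.
document = {"data": b"Some file content"}
transformer = Transformer(document, lambda raw: raw.decode("utf-8").lower())
preview = transformer.transformed()
transformer.transform_to(IndexingReceiver(document))
```

The caller only ever talks to the transformer; where the data comes from
and what happens to the result stay pluggable.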

I think a generic design like that wouldn't be too hard to dream up.
It could be a plain Zope 3 package (plone.transform) - best to check
no-one has one of those yet. A lot of the actual transform code could
be lifted straight from AT (but the API to the transform function
should be less dumb). I don't think this is terribly hard and possibly
quite fun. :)
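
The ambiguity mentioned in the list above for transforms looked up by a
pair of mimetypes can be shown with a small sketch. TransformRegistry is a
hypothetical plain-Python stand-in for named utility or multi-adapter
lookup, not an existing package.

```python
import html


class TransformRegistry:
    """Hypothetical registry: (source mimetype, target mimetype, name)."""
    def __init__(self):
        self._transforms = {}

    def register(self, source, target, name, func):
        self._transforms[(source, target, name)] = func

    def lookup(self, source, target, name=None):
        matches = {n: f for (s, t, n), f in self._transforms.items()
                   if s == source and t == target}
        if name is not None:
            return matches[name]
        if len(matches) != 1:
            # Zero or several candidates for the pair: the caller has
            # to pick one by name.
            raise LookupError(
                f"{len(matches)} transforms for {source} -> {target}")
        return next(iter(matches.values()))


registry = TransformRegistry()
registry.register("text/plain", "text/html", "pre",
                  lambda text: "<pre>" + html.escape(text) + "</pre>")
registry.register("text/plain", "text/html", "paragraph",
                  lambda text: "<p>" + html.escape(text) + "</p>")

# Two transforms share the same mimetype pair, so a name is required.
as_pre = registry.lookup("text/plain", "text/html", name="pre")("a < b")
```

Looking up the pair without a name raises LookupError here, which is the
case where a pure mimetype-pair adapter would be ambiguous.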

Martin



Re: [Framework-Team] Plip : indexing files

2007-01-29 Thread Thierry Benita
Hi Martin,

This makes sense. I'll work with other sprinters to make this piece of
work conform more closely to the Z3 model and use adapters on items that
implement IFileContent.
Is there some work in progress on the rewrite of portal_transforms?
There is still an issue if we want to store sub-objects either in the
ZODB or on the filesystem. We'll investigate that.
Also, the view will stay in skins as long as Plone can't support Z3
views with correct caching policies.

Any other ideas are welcome ;)

Thierry.





Re: [Framework-Team] Plip : indexing files

2007-01-29 Thread Andreas Jung



--On 29 January 2007 13:23:18 +0100 [EMAIL PROTECTED] wrote:

> Hi,
> - we don't need all the TXNG mechanics in order to get fulltext
> indexing: we only need the UnicodeLexicon, as long as portal transforms
> return unicode results (tested in France; you can imagine ;-) )


My two cents: TXNG has been a working solution for years and has been
running successfully in large installations. Is there any need to
reinvent the wheel (from scratch)?

Andreas



Re: [Framework-Team] Plip : indexing files

2007-01-29 Thread Martin Aspeli

Hi Thierry,

I think this sounds quite interesting. Certainly, a better "document"
story (which includes full-text indexing and a strategy to avoid ZODB
bloat, e.g. blobfile) is pretty high on my wishlist for 3.5 (and
limi's as well, fwiw).

I would like to see a proposal that is somewhat less AT-centric,
though. It may be wishful thinking that we can achieve this, but
ideally we'd decouple portal_transforms entirely, replacing it with a
lighter framework based on Zope 3 adapters and utilities (a transform
is a utility, adapters take care of the actual extraction of data to
transform and consumption of the transformed text). This should also
allow some async option (register a consumer for the transform that is
called when the transform is complete).

At this point, we could extend ATFile relatively easily to use this. I
don't think we'd want a new content type, but rather to extend ATFile
as necessary.

I think BLOB storage and transform should be two separate proposals
and two separate implementations.

Martin

On 1/29/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Hi,

I'd like to make a proposal that extends PLIP #177
http://plone.org/products/plone/roadmap/177

We developed a Plone component that stores a file with its HTML preview:
ATFilePreview.

It does the following:

- makes the file available for download

- creates an HTML preview of the file

- indexes the file's content in full text


It has the following advantages:

- it uses the mimetypes registry to detect mimetypes

- it uses portal transforms to create the preview, and uses this
preview to extract the text that has to be indexed

- it stores both the HTML preview and all subobjects in the object, as
persistent sub-objects

- it's totally generic: it obviously does previews and indexing for
OpenDocument files, MS Office documents, PDF, RTF, HTML, Python etc. It
may also show a preview for zip files, video files, audio files or
whatever you can imagine. Take the example of a video file: you may
decide that every video that is uploaded will be transcoded to MKV
format and streamed in the page via a Java applet that displays the
video. You only need a video_to_html transform that achieves this. The
result will be stored together with the original file, and the HTML
preview will be displayed.

- the trunk (it's in the collective) stores everything inside the object
in the ZODB, so it has no dependency and can take the place of normal
file objects

- there is another version that stores the file, HTML and subobjects on
the filesystem. It currently uses FSS, but we'd like to move that to
BlobFile, as FSS is a bit too complex for our use case.

- we don't need all the TXNG mechanics in order to get fulltext
indexing: we only need the UnicodeLexicon, as long as portal transforms
return unicode results (tested in France; you can imagine ;-) )

- we already have the transforms for all office files in
AROfficesTransform, which we are currently integrating into
Archetypes.
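
To make the chain described above concrete (detect the mimetype, transform
to an HTML preview, extract the text to index from that preview), here is
a sketch that uses only the Python standard library. It stands in for the
mimetypes registry and portal transforms and is not the ATFilePreview
code; text_to_html, TRANSFORMS and preview_and_index are names invented
for this example.

```python
import html
import mimetypes
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML preview for full-text indexing."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def text_to_html(raw):
    """Hypothetical transform for text/plain; other types need their own."""
    return "<pre>" + html.escape(raw.decode("utf-8")) + "</pre>"


# Stand-in for the transform lookup: mimetype -> "to HTML" transform.
TRANSFORMS = {"text/plain": text_to_html}


def preview_and_index(filename, raw):
    """Return (mimetype, HTML preview, text to index) for an upload."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    transform = TRANSFORMS.get(mimetype)
    if transform is None:
        return mimetype, None, ""  # no preview, nothing to index
    preview = transform(raw)
    extractor = TextExtractor()
    extractor.feed(preview)
    extractor.close()
    return mimetype, preview, " ".join(extractor.chunks)


mimetype, preview, searchable = preview_and_index("notes.txt", b"Plone & Zope")
```

Supporting another file type then comes down to registering one more
"to HTML" transform; the indexing step stays the same.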



At this time there are 2 new things to consider:

- portal transforms may overload the Zope server

- there may be decorators that should be applied to files in order to
properly handle specific extra fields (especially for multimedia files:
metadata etc.)

* Concerning overload of the Zope server: I think that we should have an
asynchronous portal transform that may run as a separate Twisted daemon.
This may live together with portal_transforms and may be called
asynchronous_portal_transform (APT). The only difference from
portal_transforms is that we need to give a callback method to APT so
that it can send back the result of the transform after a while.
Therefore, if a content type is APT-aware and APT is activated, APT is
used instead of portal_transforms. This allows the load to be moved to
one or more dedicated servers, for example. We may also take a look at
BlueDCS (I just heard of it but never tried it).
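
The callback contract described above can be sketched as follows. To stay
self-contained, this example uses a thread pool from the standard library
in place of a separate Twisted daemon, so it only illustrates the calling
convention, not the deployment. AsyncTransformService and its methods are
hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncTransformService:
    """Hypothetical APT stand-in: runs transforms off the calling thread."""
    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def convert(self, transform, data, callback):
        """Schedule transform(data); callback(result) is called when done."""
        future = self._pool.submit(transform, data)
        future.add_done_callback(lambda done: callback(done.result()))
        return future

    def shutdown(self):
        """Wait for all outstanding transforms to finish."""
        self._pool.shutdown(wait=True)


results = []
apt = AsyncTransformService()
apt.convert(lambda raw: raw.decode("utf-8").title(),
            b"slow transform", results.append)
apt.shutdown()  # wait for outstanding work before reading the results
```

An APT-aware content type would pass a callback that stores the preview
and reindexes the object, while the upload request returns right away.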

* Concerning the decorators: there should be a kind of
decorators_registry that would allow adding decorators based on mimetypes
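
One possible shape for such a registry, as a plain-Python sketch: the
DecoratorsRegistry class, its methods and the wildcard patterns are
assumptions made for illustration, and a "decorator" is taken here to be a
callable that adds extra fields to a metadata dict.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class DecoratorsRegistry:
    """Hypothetical registry mapping mimetype patterns to decorators."""
    def __init__(self):
        self._decorators = defaultdict(list)

    def register(self, pattern, decorator):
        self._decorators[pattern].append(decorator)

    def decorate(self, mimetype, metadata):
        """Apply every decorator whose pattern matches the mimetype."""
        for pattern, decorators in self._decorators.items():
            if fnmatchcase(mimetype, pattern):
                for decorator in decorators:
                    metadata = decorator(metadata)
        return metadata


registry = DecoratorsRegistry()
registry.register("video/*", lambda meta: {**meta, "has_duration": True})
registry.register("audio/*", lambda meta: {**meta, "has_bitrate": True})

video_meta = registry.decorate("video/x-matroska", {"title": "demo"})
```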

What do you think of all these points?

Best regards,

Thierry.

--
atReal
http://www.atreal.net


