Re: [translate-pootle] New backend API

F Wolff Fri, 23 Jun 2006 01:39:48 -0700

On Tue, 2006-06-20 at 02:44 +0300, Gintautas Miliauskas wrote:
> Hello,
> 
> > I'll just comment loosely on a few things as I notice them or think
> > about them. Forgive me if my comments are not quite in the direction
> > or on the detail level you were asking for :-)
> 
> Your comments are always very useful.  A new revision of the API is
> attached.
>


Thanks, Gintautas

Sorry for only responding now. I only looked through it quickly and
there are only a few small comments. Some of them probably don't even
need a change - just comments.  I think the other guys are a bit tied up
as well :-/  

> > ILanguageInfo: ISO639 is not enough - we need to include the
> > possibility of ISO3166 country codes, although they shouldn't be
> > mandatory.
> 
> Done.

You mark the language code as optional - I assume we agree that it
really is necessary? The country code (dialect code) should be optional,
though.
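
Roughly what I have in mind, as a minimal sketch only. The class name
LanguageInfo and the attribute names language, country and code are made up
for illustration; they are not taken from your API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageInfo:
    """Hypothetical stand-in for ILanguageInfo (names are assumptions)."""

    language: str                  # ISO 639 language code, required
    country: Optional[str] = None  # ISO 3166 country code, optional

    def __post_init__(self):
        # The language code is the one thing we cannot do without.
        if not self.language:
            raise ValueError("language code is required")

    @property
    def code(self) -> str:
        # "pt_BR" when a country is given, plain "af" otherwise.
        if self.country:
            return f"{self.language}_{self.country}"
        return self.language


print(LanguageInfo("af").code)        # af
print(LanguageInfo("pt", "BR").code)  # pt_BR
```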

> 
> > Please explain what you mean with a 'module'. To use gettext
> > terminology, does it correspond to a PO file in GNOME, or to a
> > directory of files, or to a PO domain?
> 
> I renamed it to a UnitCollection.
> 

Ok, if you stick with this, please document that this corresponds to a
Store/file in our base classes. (Perhaps we should still align this?)

> > For ISuggestionList you can look at the existing
> > lookupserver/lookupclient implementation - it is still very basic,
> > and I have some uncommitted work there, but it is a start and already
> > exists.
> 
> In this case I meant more of a Rosetta-like suggestion, i.e., an
> unapproved translation.  I think that the suggestion model you speak
> about should live in a different part of the system.
> 

Ok, so you distinguish between TM matches/suggestions and suggestions by
translators who are not allowed to change the "official" translation, do
I understand right? Does your API now refer to the second type I
mention? Then we might also want to store who submitted it (could also
be anonymous), which can be useful for facilitating inter-translator
discussions on other media.

Perhaps these two are just variations of the same thing? A TM
suggestion should, however, also indicate the TM engine's score (match
percentage) and possibly the location it came from (project,
filename).
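
If they do turn out to be variations of the same thing, it could look
something like the sketch below. This is only an illustration: Suggestion,
TMSuggestion and all the attribute names are invented here and are not part
of the proposed API or of the existing lookupserver code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    """An unapproved translation offered by a translator (hypothetical)."""

    target: str
    submitter: Optional[str] = None  # None means an anonymous suggestion


@dataclass
class TMSuggestion(Suggestion):
    """A suggestion produced by a TM engine (hypothetical)."""

    score: int = 0                  # match percentage, 0-100
    project: Optional[str] = None   # where the match was found, if known
    filename: Optional[str] = None


human = Suggestion("Open", submitter="translator1")
tm = TMSuggestion("Open", score=87, project="gnome", filename="gedit.po")
```

Code that only displays suggestions could then treat both kinds alike, and
code that cares about the TM details can check for the extra fields.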

> > In terms of the statistics, I don't know if we necessarily need
> > separate types for a project and for language. We need to determine
> > the needs, but it might be simpler to have one type.
> 
> I gave it a little thought.  Couldn't think of anything special, so I
> collapsed it for now.  Thanks for the suggestion.
> 

Quick note: Perhaps it could be useful to access the statistics in a
dictionary-like way, since we will possibly be changing the set of
statistics a lot, or disabling some for certain languages, etc.  The ones
you have are the important ones, of course, but we also count by words (the
translation industry way), though probably not for all languages.  We also
store statistics on the results of the quality checkers and it might be
useful to access all of these in a simple, uniform way without
necessarily knowing if something is supported/measured by a certain
project/language/Pootle version/whatever.
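
As a rough sketch of what I mean by dictionary-style access (the class name
Statistics and the key names are assumptions made up for this example, not
existing Pootle names):

```python
from collections.abc import Mapping


class Statistics(Mapping):
    """Read-only, dictionary-like view on whatever was measured."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, name):
        return self._values[name]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


stats = Statistics({"translated": 120, "fuzzy": 7, "untranslated": 13})

print(stats["translated"])            # 120
print(stats.get("translated_words"))  # None: not measured here
print("fuzzy" in stats)               # True
```

A caller can then ask for a statistic by name and simply get nothing back
if this project, language or version does not measure it.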

Your API refers to ICollectionStatistics, which is not declared anywhere.

> > IProject: we might need information like accelerators, etc. - what we
> > currently call checkerstyle. Probably just a checkerstyle, although we
> > might want to define an interface for that as well, some day in the
> > future. We might want to store optional version control information.
> 
> Yep, I added checkers in a low-tech way, as a set of string ids.
> 
> > I would suggest aligning the terminology (and API in general) with
> > what we have in the base classes (which was based on XLIFF, as I
> > recall), so for example rather 'unit' than 'message' or
> > 'translation'. 
> 
> Done, thanks.  I consider this very important.
> 
> > In terms of data, we'll probably need fairly rich ways of supporting
> > comments, context, states (fuzzy, needs-review, etc.), formats
> > (c-format, etc.). There is _lots_ of stuff in XLIFF; we can't
> > realistically support all of it immediately. But perhaps we should at
> > least support most of these that I mention. People will probably
> > disagree about what is important, but this list is a start. On the
> > other hand, we want to work towards handling process information,
> > etc. so we'll probably need a lot more.
> 
> Agreed.  We can extend the interfaces later.
> 
> Actually I was thinking of having an "annotations" style attribute on
> most objects so that arbitrary data could be put in there.  It would be
> best to minimize the amount of data put in there, because it's better
> to have things declared explicitly in the interface, then their
> semantics are clear, and they can be stored sanely in rich formats like
> XLIFF.  Still, such an attribute might prove useful for storing things
> such as translation owner, etc. used by other subsystems.
> 
> I would imagine a dict {string->string} on the implementation level;
> that should be easy to store on most backend formats (RDB, .po, XLIFF)
> without much fuss.
> 
> > In terms of actions, we'll need methods for pushing updates,
> > specifying which actions to take and in what way (join, overwrite,
> > overwrite if empty, ignore, turn into suggestion, etc.).  We'll
> > probably want some way to trigger an action, like updating from
> > version control. We'll probably need some authentication system,
> > although this whole area probably needs far more consideration.
> 
> I would prefer to keep this storage layer dumb.  Of course we will need
> authentication, merging, etc., but I think that these can be split
> off into separate components.
> 
> At this moment authentication worries me a bit.  A lot of things can be
> just postponed until they are needed, but I have some tough experiences
> with security tacked on after the fact ;)  It can probably wait just a
> little bit more though.
> 
> > > I wanted to use Zope interfaces for declaring the API, but decided
> > > that it may not be worth it here to add another dependency.
> 
> The more I go into this, the more I want an interfaces package.  If we
> want a modular system, we definitely want interfaces. zope.interface is
> relatively standard, not bound to Zope in any way. AFAIK it's used in
> Twisted and a lot of other projects.  The disadvantage is that it
> contains C files which would require a C compiler.  For now we can
> probably live with the current hacked-up style, but a long-term
> solution would be nice.
> 
> > > I ran into two design problems here.  I think that they would hold
> > > for any API, not just the one I sketched, so please bear with me :)
> > > 
> > > 1) how to add a new item to a container, let's say, a new module to
> > > a language translation set.  I see two ways:
> > > * use a special factory class (Abstract Factory pattern) that
> > > builds the needed objects, then add them (I prefer this)
> > > * have each container implement the add() method so that it
> > > instantiates an empty item, adds it and returns it.  The new empty
> > > item can then be updated with the required data.  This works a bit
> > > like the Prototype pattern.
> > > 
> > I don't quite see the advantage of the first approach, since I don't
> > foresee complex requirements for item creation.  For
> > base.TranslationStore, we already have addsourceunit(source), so
> > perhaps we can use something similar unless there is need for
> > something else.
> 
> The problem here is that I want to generalize manipulation of all
> containers, and this problem recurs in several places.  In case of
> pofile you use the UnitClass attribute that points to the class of
> the children.  Something similar could work here too I guess.
> 
> > > 2) when to save data.  Again, several choices:
> > > * straight-through: always carry out the operation at once.  Grossly
> > > inefficient for strings (imagine adding strings to a module one by
> > > one), but might work for higher-level containers
> > > * completely explicit: serialization happens when you explicitly
> > > call a method save().  This is prone to bugs and not very nice
> > > design: it may break the abstraction.
> > > * transactional: when you modify an object, it marks itself as
> > > "dirty".  The Pootle main function calls "db.startTransaction()" at
> > > the beginning of processing a request and calls
> > > "db.endTransaction()" at the end.  endTransaction() would collect
> > > the "dirty" objects and write them to disk.  I like this one best,
> > > as it leaves it to the implementation of the API how to efficiently
> > > deal with changes.
> > 
> > The third approach might have been necessary if we had big data
> > dependencies, but it might be overkill. Then again, I guess we can
> > implement something simple within that API.  I'll let others comment
> > on this more. I don't see why the second is necessarily that bad, but
> > we'll discuss this more later.
> 
> I guess you are right.  I'm currently leaning towards this: all actions
> are performed immediately with explicit exceptions (such as editing
> unit collections and individual translation units).  There will probably
> be a few more exceptions.  What do you think about this approach?
> 

I think most actions are small (updating a single translation, comment,
suggestion, etc.). For merging operations (update from version control,
file upload, etc.)  we have bigger sets and probably need to commit the
whole bunch together.
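
One possible shape for this, purely as a sketch: small edits are saved at
once, and a merge wraps its changes in a batch that is saved once at the
end. UnitCollection here is a toy class, and set_target(), batch() and the
saves counter are names I made up; they are not in your API.

```python
from contextlib import contextmanager


class UnitCollection:
    """Toy collection that counts how often it writes to the backend."""

    def __init__(self):
        self.units = {}        # unit id -> target text
        self.saves = 0         # number of writes to the backend
        self._batching = False

    def _save(self):
        # A real backend would write to disk or a database here.
        self.saves += 1

    def set_target(self, unit_id, target):
        self.units[unit_id] = target
        if not self._batching:
            self._save()       # small edits are saved immediately

    @contextmanager
    def batch(self):
        # Collect many edits and save them together at the end.
        # If the body raises, nothing is saved (rollback is not shown).
        self._batching = True
        try:
            yield self
        finally:
            self._batching = False
        self._save()


collection = UnitCollection()
collection.set_target("file", "Leer")      # one edit, one save
with collection.batch():                   # e.g. a file upload
    for unit_id, target in [("open", "Oop"), ("close", "Sluit")]:
        collection.set_target(unit_id, target)
print(collection.saves)  # 2
```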

> I also consulted your wiki document on base classes.  I am not
> completely convinced that we need multifiles and the distinction
> between multistrings and translation units at this stage.  Do you think
> that we can get by for now without these, or that they should be
> introduced into the API early on?  Your call here ;)
> 
> Best regards,

The multistrings are already implemented and used by the po and poxliff
classes. Multifiles have not yet been coded as a base API. That page was
written in the planning phase of moving to a base API.

About ITranslationUnit: I think we should allow multiple references. You
currently use "key" and "translation": does this correspond to the XLIFF
"source" and "target"?  We use source and target in the base API. I
think "key" will be a non-ideal choice. XLIFF units have ids, for
example :-)  I don't quite see how the plurals will work, especially how
the plurals for the source and the target are split.

I agree that we don't want something more complex for states, but it
might be useful to remember that XLIFF distinguishes between datatype
(c-format, etc.) and state (fuzzy, needs-review, etc). Always easier to
throw things together than to separate :-)
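
To make the last two points a bit more concrete, here is a rough sketch of
a unit with source/target, an optional id, several references, and datatype
kept apart from state. The class and attribute names are assumptions for
illustration only (they are not the base class API), and plurals are left
out on purpose since that part is still unclear to me.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranslationUnit:
    """Hypothetical unit; names loosely follow XLIFF terminology."""

    source: str
    target: str = ""
    unit_id: Optional[str] = None       # XLIFF-style id, if the format has one
    locations: List[str] = field(default_factory=list)  # multiple references
    datatype: Optional[str] = None      # e.g. "c-format"
    state: Optional[str] = None         # e.g. "fuzzy", "needs-review"


unit = TranslationUnit(
    source="%d files copied",
    target="%d leers gekopieer",
    locations=["src/copy.c:42", "src/move.c:17"],
    datatype="c-format",
    state="fuzzy",
)
```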

F

