Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-03 Thread Michiel Meeuwissen
Eduard Witteveen wrote:
> Michiel Meeuwissen wrote:
> 
> >CALL FOR:   New attributes on StorageManager.
> >
> >What follows is a long story; I'll add an abstract near the end.
> > 
> >
> Shouldn't this be on the DatabaseStorageManager (protected on the class)?
> From what I understand, the database chars are stored in the
> wrong format.

Its implementation turned into the very general possibility of specifying a
filter on the retrieval and storage of a String. I figured that might also
come in useful with implementations other than the database one (of which
there are, er, none), so those options are on StorageManagerFactory.
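A minimal sketch of what such a general "filter on storage and retrieval" could look like. The names `StringFilter`, `FilteringStore`, `REVERSING` and the in-memory map standing in for the database are all hypothetical and made up for illustration; this is not MMBase's actual StorageManagerFactory API.

```java
import java.util.HashMap;
import java.util.Map;

public class FilterSketch {

    /** Hypothetical filter with one transformation per direction. */
    interface StringFilter {
        String onStore(String value);    // applied before a value is written
        String onRetrieve(String value); // applied after a value is read back
    }

    /** Toy filter for demonstration: stores values reversed and reverses them back on retrieval. */
    static final StringFilter REVERSING = new StringFilter() {
        @Override
        public String onStore(String value) {
            return new StringBuilder(value).reverse().toString();
        }

        @Override
        public String onRetrieve(String value) {
            return new StringBuilder(value).reverse().toString();
        }
    };

    /** Hypothetical storage wrapper that runs every String value through the configured filter. */
    static class FilteringStore {
        private final Map<String, String> backend = new HashMap<>(); // stands in for a database table
        private final StringFilter filter;

        FilteringStore(StringFilter filter) {
            this.filter = filter;
        }

        void set(String key, String value) {
            backend.put(key, filter.onStore(value));
        }

        String get(String key) {
            String stored = backend.get(key);
            return stored == null ? null : filter.onRetrieve(stored);
        }

        /** What actually ended up in the backend, bypassing the filter. */
        String raw(String key) {
            return backend.get(key);
        }
    }

    public static void main(String[] args) {
        FilteringStore store = new FilteringStore(REVERSING);
        store.set("title", "MMBase");
        System.out.println(store.raw("title")); // esaBMM
        System.out.println(store.get("title")); // MMBase
    }
}
```

The idea is that the storage code only ever calls the two hooks; what the filter actually does (escaping, re-decoding, surrogating) becomes a matter of configuration.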

The 'lie-cp1252' option is more of a hack, so I confined its scope to
only the database implementation.
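The 'lie-cp1252' idea can be sketched with nothing but the JDK's charset support, assuming the JDBC driver hands back Strings decoded as ISO-8859-1 while the stored bytes are really CP1252 (Java's name for it is `windows-1252`). The class and method names are hypothetical; this is not the actual MMBase code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LieCp1252 {

    // Java's canonical name for CP1252.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /**
     * After retrieval: the driver decoded the stored bytes as ISO-8859-1, so every char
     * corresponds to exactly one stored byte. Recover those bytes and decode them as CP1252.
     */
    public static String fromDatabase(String fromDriver) {
        byte[] storedBytes = fromDriver.getBytes(StandardCharsets.ISO_8859_1);
        return new String(storedBytes, CP1252);
    }

    /**
     * Before storage: encode as CP1252 and disguise the bytes as ISO-8859-1 chars, so the
     * driver writes the CP1252 bytes unchanged. String.getBytes replaces characters that
     * CP1252 cannot represent with '?'.
     */
    public static String toDatabase(String javaString) {
        byte[] cp1252Bytes = javaString.getBytes(CP1252);
        return new String(cp1252Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+0080 is a C1 control character in ISO-8859-1; byte 0x80 is the euro sign in CP1252.
        String fromDriver = "Prijs: 10 \u0080";
        System.out.println(fromDatabase(fromDriver).equals("Prijs: 10 \u20ac")); // true
        System.out.println(toDatabase("\u20ac").equals("\u0080"));               // true
    }
}
```

For bytes 0xA0 to 0xFF both charsets agree, so ordinary Latin-1 text passes through unchanged; only the range 0x80 to 0x9F is reinterpreted.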

> By using these attributes, the String retrieved from the DatabaseStorage
> will undergo a conversion after the value is retrieved from the database
> (which was faulty and is what we want to solve)

Yes, but since it is a very general filter, I can imagine that one would
want to put it to some other use. E.g. if you want to use property files as
storage (hey, there _are_ crazy people out there), you may plug in
'UnicodeEscaper' or so, and that problem would be solved out of the box...
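To illustrate what such an escaping filter would do, here is a self-written sketch of the same kind of transformation. It is not the UnicodeEscaper mentioned above (whose exact behaviour is not described here); it only assumes escapes of the form Java property files use (a backslash, `u`, four hex digits). `UnicodeEscapeSketch`, `escape` and `unescape` are hypothetical names.

```java
import java.util.Locale;

public class UnicodeEscapeSketch {

    /** For storage: a backslash is doubled, chars outside printable ASCII become backslash-u escapes. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                out.append("\\\\");
            } else if (c < 0x20 || c > 0x7e) {
                // Surrogate pairs are escaped as two separate escapes, which unescape() rejoins.
                out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** For retrieval: reverses escape(). Assumes the input was produced by escape(). */
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '\\') {
                    out.append('\\');
                    i += 2;
                    continue;
                }
                if (next == 'u' && i + 6 <= s.length()) {
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = escape("Prijs: \u20ac 10");
        System.out.println(stored.equals("Prijs: \\u20ac 10"));          // true
        System.out.println(unescape(stored).equals("Prijs: \u20ac 10")); // true
    }
}
```

Doubling literal backslashes keeps the round trip unambiguous: the only backslashes in the escaped form are ones that `escape` itself produced.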


Michiel

-- 
Michiel Meeuwissen  mihxil'
Mediacentrum 140 H'sum[] ()
+31 (0)35 6772979 nl_NL eo_XX en_US



___
Developers mailing list
Developers@lists.mmbase.org
http://lists.mmbase.org/mailman/listinfo/developers


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-03 Thread Eduard Witteveen
Michiel Meeuwissen wrote:
> CALL FOR:   New attributes on StorageManager.
> What follows is a long story; I'll add an abstract near the end.

Shouldn't this be on the DatabaseStorageManager (protected on the class)?
From what I understand, the database chars are stored in the
wrong format.
By using these attributes, the String retrieved from the DatabaseStorage
will undergo a conversion after the value is retrieved from the database
(which was faulty and is what we want to solve)


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-02 Thread Rob van Maris
On Jan 31, 2005, at 9:30 PM, Michiel Meeuwissen wrote:
> CALL FOR:   New attributes on StorageManager.
 [X] +0 (ABSTAIN)

Regards,
Rob van Maris


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-02 Thread Daniel Ockeloen
On Jan 31, 2005, at 9:30 PM, Michiel Meeuwissen wrote:
 [X] +1 (YES)

Daniel.


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-02 Thread Rob Vermeulen

 [X] +1 (YES)
 [_] +0 (ABSTAIN)
 [_] -1 (NO), because :
 [_] VETO, because:



Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-02 Thread Marcel Maatkamp

>  [x] +1 (YES)

marcel maatkamp
VPRO Digitaal


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-02 Thread Pierre van Rooden
Michiel Meeuwissen wrote:
 [X] +1 (YES)
--
Pierre van Rooden
Mediapark, C 107 tel. +31 (0)35 6772815
"Anything worth doing is worth overdoing."


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-02 Thread Rico Jansen

 [X] +1 (YES)

--
Rico Jansen
"You call it untidy, I call it LRU ordered" -- Daniel Barlow


Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-01 Thread Michiel Meeuwissen
Arjan Lamers wrote:
> database searches, for example, can produce awkward results. Just to warn you
> about this sort of 'dirty hack'. I don't know the internals of MMBase that
> well, so I don't know if there are any other problems with charsets in
> MMBase.

I am aware of the risks. My higher goal, though, is that in any case you
must ensure that the Java strings are correct. Actually, the current
situation was that the database already contained something 'impossible',
which resulted in 'incorrect' Java strings. My goal is to offer the
possibility of confining the incorrectness to the database.

I agree that generally you should never want something impossible anywhere.

> There are actually two problems regarding websites and charsets: one is
> having a database which is in a limited encoding (such as ISO-8859-1 when
> you also want to support CP1252), the other is discovering what encoding
> the browser is using. If you fix one you'll also need to fix the other one
> as you already mentioned.

Yes, I agree, but this hack is only about the database layer, which should
IMHO always assume that it receives 'correct' strings. If that is not yet
true in some case, then that must be fixed _too_.
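Whether a 'correct' Java string fits into the database encoding at all is a separate question, and it is exactly where 'impossible characters' come from. A minimal sketch of such a check with the JDK's `CharsetEncoder.canEncode`; the helper name `isStorable` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class StorableCheck {

    /** True if the charset's encoder can map every character of value, i.e. nothing would be replaced. */
    public static boolean isStorable(String value, Charset databaseCharset) {
        // CharsetEncoder is stateful and not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = databaseCharset.newEncoder();
        return encoder.canEncode(value);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(isStorable("caf\u00e9", latin1)); // true
        System.out.println(isStorable("10 \u20ac", latin1)); // false: no euro sign in ISO-8859-1
        System.out.println(isStorable("10 \u20ac", cp1252)); // true
    }
}
```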


> IMHO the best solution for the database kind of problem is to create a new
> JDBC driver, and not to pollute MMBase itself with workarounds for a
> wrongly encoded database.

I don't think I want to create a new JDBC driver... The closest thing is the
Storage Layer in MMBase. Actually, this whole hack is only a few lines
of code there. Certainly a whole lot less than a completely new JDBC
driver...

> For the browser encoding problem a few workarounds exist (mainly by forcing
> UTF-8). If I understand your solution correctly, you are trying to
> compensate for a badly interpreted request from a browser and a badly
> encoded database inside the JVM, resulting in unpredictable db queries and
> string lengths for multibyte encodings.

No, on the contrary. I want to make sure that requests are never badly
interpreted, but offer a kind of workaround if until now you depended on
that. My mantra is that Java strings must be correct in _any_ case, even if
for some legacy reason the database isn't. Of course, if you make a new
setup, I'd always recommend arranging some Unicode-capable backend with
UTF-8 pages on the front-end.

> > Then, I propose the possibility to provide 'surrogators' on database
> > level. A surrogator is something which translates 'impossible
> > characters' to something which comes close enough but is not the real
> > thing. E.g. it can replace the euro-sign with the word 'EURO'.
> 
> IBM has a product, ICU4J, which does exactly this sort of thing. It too can
> convert unsupported characters to alternative representations, but also a
> lot more: http://www.icu4j.org/
> I don't know if its open source license is compatible with MMBase's
> license, but it seems to me it is worth investigating.

Thanks, that's very interesting, and I will keep it in mind. For the moment I
only need surrogating of those 27 odd CP1252 characters, for which I simply
made a very straightforward filter.
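A straightforward surrogating filter of this kind might look roughly as follows. This is a guess at its shape, not the filter described above: only the euro-sign-to-'EURO' mapping comes from the proposal, and every other replacement in the table is an assumption chosen for illustration. The class name `Cp1252Surrogator` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class Cp1252Surrogator {

    // Replacements for the 27 printable CP1252 characters in the range 0x80-0x9F, none of
    // which exists in ISO-8859-1. Apart from EURO, the chosen replacements are illustrative.
    private static final String[][] TABLE = {
        {"\u20ac", "EURO"}, {"\u201a", ","},    {"\u0192", "f"},  {"\u201e", ",,"},
        {"\u2026", "..."},  {"\u2020", "+"},    {"\u2021", "++"}, {"\u02c6", "^"},
        {"\u2030", "o/oo"}, {"\u0160", "S"},    {"\u2039", "<"},  {"\u0152", "OE"},
        {"\u017d", "Z"},    {"\u2018", "'"},    {"\u2019", "'"},  {"\u201c", "\""},
        {"\u201d", "\""},   {"\u2022", "*"},    {"\u2013", "-"},  {"\u2014", "--"},
        {"\u02dc", "~"},    {"\u2122", "(TM)"}, {"\u0161", "s"},  {"\u203a", ">"},
        {"\u0153", "oe"},   {"\u017e", "z"},    {"\u0178", "Y"},
    };

    private static final Map<Character, String> SURROGATES = new HashMap<>();
    static {
        for (String[] row : TABLE) {
            SURROGATES.put(row[0].charAt(0), row[1]);
        }
    }

    /** Replaces each known CP1252-only character by its surrogate; other characters pass through. */
    public static String surrogate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = SURROGATES.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(surrogate("\u201cPrijs\u201d \u2013 10 \u20ac")); // "Prijs" - 10 EURO
    }
}
```

Such a filter would only be applied on storage into a plain ISO-8859-1 database; it is lossy by design, so retrieval cannot restore the original characters.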


Michiel







Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-02-01 Thread Arjan Lamers


On Mon, 31 Jan 2005, Michiel Meeuwissen wrote:
[...]
> So, I'll make sure that CP1252 is interpreted as such if the database is
> ISO-8859-1. That can never do harm, because there are no printable
> ISO-8859-1 characters which are not in the same place in CP1252. Now, you
> fetch 'correct' strings in any case.

I've seen a project which also did this sort of charset overriding in
the database. As long as you have single-byte charsets, this will work
more or less. But when you start working on multibyte charsets,
database searches, for example, can produce awkward results. Just to warn you
about this sort of 'dirty hack'. I don't know the internals of MMBase that
well, so I don't know if there are any other problems with charsets in
MMBase.

There are actually two problems regarding websites and charsets: one is
having a database which is in a limited encoding (such as ISO-8859-1 when
you also want to support CP1252), the other is discovering what encoding
the browser is using. If you fix one you'll also need to fix the other one
as you already mentioned.

IMHO the best solution for the database kind of problem is to create a new
JDBC driver, and not to pollute MMBase itself with workarounds for a
wrongly encoded database. For the browser encoding problem a few
workarounds exist (mainly by forcing UTF-8). If I understand your solution
correctly, you are trying to compensate for a badly interpreted request from
a browser and a badly encoded database inside the JVM, resulting in
unpredictable db queries and string lengths for multibyte encodings.
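The multibyte concern can be made concrete with a small sketch. It assumes the same trick (treating stored bytes as ISO-8859-1) were applied to UTF-8 data; `misdecoded` is a hypothetical helper that simulates such a driver.

```java
import java.nio.charset.StandardCharsets;

public class MultibyteLie {

    /** Simulates a driver that decodes UTF-8 bytes as if they were ISO-8859-1. */
    static String misdecoded(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String price = "10 \u20ac";
        String stored = misdecoded(price);

        // The euro sign is 3 bytes in UTF-8, so the database sees 3 characters instead of 1.
        System.out.println(price.length());  // 4
        System.out.println(stored.length()); // 6

        // A search for the real character no longer matches what the database holds.
        System.out.println(stored.contains("\u20ac")); // false
    }
}
```

With single-byte charsets one byte is always one character, which is why the CP1252 reinterpretation does not have this problem.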

Again, I don't know the internals and I may be misunderstanding your HACK, 
so there may be other reasons.

> Then, I propose the possibility to provide 'surrogators' on database
> level. A surrogator is something which translates 'impossible
> characters' to something which comes close enough but is not the real
> thing. E.g. it can replace the euro-sign with the word 'EURO'.

IBM has a product, ICU4J, which does exactly this sort of thing. It too can
convert unsupported characters to alternative representations, but also a
lot more: http://www.icu4j.org/
I don't know if its open source license is compatible with MMBase's
license, but it seems to me it is worth investigating.
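ICU4J itself is not shown here. As a rough, JDK-only stand-in for a small part of what such a library offers, `java.text.Normalizer` with the NFKD form can already approximate some characters that a target charset cannot hold. The class and method names are hypothetical, and this is a different (much weaker) technique than ICU4J's transliterators.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizerFallback {

    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    /**
     * Keeps characters the target charset can encode. Any other character is replaced by its
     * NFKD compatibility decomposition with combining marks removed, if that result is
     * non-empty and encodable, and by '?' otherwise.
     */
    public static String approximate(String s, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int codePoint = s.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                String decomposed = Normalizer.normalize(ch, Normalizer.Form.NFKD);
                String stripped = COMBINING_MARKS.matcher(decomposed).replaceAll("");
                boolean usable = !stripped.isEmpty() && encoder.canEncode(stripped);
                out.append(usable ? stripped : "?");
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(approximate("Wait\u2026", latin1));   // Wait...
        System.out.println(approximate("MMBase\u2122", latin1)); // MMBaseTM
        System.out.println(approximate("Erd\u0151s", latin1));   // Erdos
        System.out.println(approximate("10 \u20ac", latin1));    // 10 ? (the euro sign has no decomposition)
    }
}
```

NFKD does nothing for the euro sign or the curly quotes, so a hand-made replacement table is still needed for those.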



Re: [Developers] HACK: New attributes on StorageManager, related to encoding problems.

2005-01-31 Thread Kees Jongenburger
>  [X] +1 (YES)