Thanks for the reply, Marcel. But my problem is a little more specific... I have posted a JIRA issue <http://issues.apache.org/jira/browse/JCR-1727> that illustrates my problem more accurately.
So let's wait for a response...

On Tue, Aug 26, 2008 at 4:30 AM, Marcel Reutegger <[EMAIL PROTECTED]> wrote:

> Hi Danilo,
>
> this indicates that the default encoding of your platform is ISO-8859-1. See
> [1]. you should rather use [2] instead and specify "UTF-8".
>
> regards
> marcel
>
> [1] http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes()
> [2] http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes(java.lang.String)
>
> Danilo Barboza wrote:
> > Hail!!
> >
> > I am having some problems when trying to search over HTML content in a
> > jcr:content node with these properties:
> >
> > jcr:mimeType = "text/html"
> > jcr:encoding = "UTF-8"
> > jcr:data = "<html><head></head><body> Some content with acute á á á </body></html>"
> >
> > When I try to search using
> >
> > //element(*, nt:resource)[jcr:contains(., "á")]
> >
> > I receive no results... All my Strings are UTF-8 encoded, which is the JVM
> > default.
> >
> > When I try to search using
> >
> > //element(*, nt:resource)[jcr:contains(., "á")]
> >
> > I receive the expected result, but with this latin-converted string in place
> > of my "á" UTF-8 string.
> >
> > I've written a simple sample demonstrating the problem (see attachment).
> >
> > When you run the sample you must set the default JVM encoding to UTF-8 by
> > passing the -Dfile.encoding=UTF-8 argument to the JVM.
> >
> > I have also tested with other binary content (like MS Word DOC) and
> > everything works fine...
> >
> > The sample code says more than I can explain.
> >
> > Does anyone know why this occurs only with HTML binary content? Maybe the
> > HTMLTextExtractor?
> >
> > Thanks,
> >
> > Danilo Barboza
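
For anyone following the thread, the difference between the two getBytes variants Marcel links to can be sketched roughly as below. This is only an illustration and not the attached sample: the class EncodingSketch and its helper methods are hypothetical names, and the sketch uses the getBytes(Charset) overload (available since Java 6) instead of getBytes(String) so that no checked exception has to be handled. It also shows how the UTF-8 bytes of "á" become the two characters "á" when they are decoded as ISO-8859-1, which is the string that matched in the second query. It does not say where in the indexing chain such a decoding would happen.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class EncodingSketch {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Encode with an explicit charset, independent of the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(UTF8);
    }

    // Simulate a consumer that wrongly decodes UTF-8 bytes as ISO-8859-1.
    static String decodeUtf8BytesAsLatin1(String s) {
        return new String(utf8Bytes(s), LATIN1);
    }

    public static void main(String[] args) {
        // a-acute (U+00E1), escaped so the source file encoding does not matter
        String acute = "\u00e1";

        // getBytes() without an argument uses the platform default charset,
        // so this length depends on how the JVM was started.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + acute.getBytes().length);

        // Explicit charsets give the same result on every platform.
        System.out.println("UTF-8 bytes:     " + utf8Bytes(acute).length);       // 0xC3 0xA1
        System.out.println("Latin-1 bytes:   " + acute.getBytes(LATIN1).length); // 0xE1

        // The two-character string U+00C3 U+00A1.
        System.out.println("misdecoded:      " + decodeUtf8BytesAsLatin1(acute));
    }
}
```

On a JVM whose default charset is ISO-8859-1 the "default bytes" line reports 1, and with UTF-8 as the default it reports 2; the lines that pass an explicit charset do not change.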
