Forgot to mention: if you have a shared file system, the best practice is not to serialize your content (the Sofa) from the JD to the service. Instead, have the CR put the path of the file containing the Subject of Analysis into the CAS, and have the CM in the pipeline read the content from the shared file system.

-jerry
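
A minimal sketch of the path-in-the-CAS idea, using only the JDK so it runs without UIMA or DUCC. The class and helper names (SharedFileSofaSketch, referenceFor, readSubject, roundTrip) are hypothetical, a temp file stands in for the shared file system, and the files are assumed to be UTF-8. In a real pipeline the string returned by referenceFor would be what the CR stores in the CAS, and readSubject would be what the CM does with it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SharedFileSofaSketch {

    // CR side (hypothetical helper): the string placed in the CAS is the
    // file path, not the document text itself.
    static String referenceFor(Path document) {
        return document.toAbsolutePath().toString();
    }

    // CM side (hypothetical helper): resolve the path taken from the CAS and
    // decode the bytes with an explicit charset, so the result does not
    // depend on file.encoding or LANG. Assumes the file is UTF-8.
    static String readSubject(String reference) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(reference));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temp file (standing in for the shared file system)
    // and reads it back through the two helpers above.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("sofa", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return readSubject(referenceFor(file));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Arabic word from the thread, written with escapes so the encoding
        // of this source file does not matter.
        String arabic = "\u0642\u0648\u0627\u062A";
        System.out.println(arabic.equals(roundTrip(arabic)));
    }
}
```

Only the path string travels from the JD to the service, so the document text itself never passes through the HTTP/XML serialization step.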
On Tue, Nov 6, 2018 at 9:37 AM Jaroslaw Cwiklik <[email protected]> wrote:

> Can you try setting -Dfile.encoding=ISO-8859-1 for the service (job)
> process, and -Djavax.servlet.request.encoding=ISO-8859-1
> -Dfile.encoding=ISO-8859-1 for the JD process?
>
> The JD actually uses a Jetty web server to serve service requests over HTTP.
> I went as far as extracting the Jetty server code from the JD into a simple
> HTTP server process, and the HttpClient-related code from the service
> into a simple client process, to be able to test.
>
> So on the server side I have:
>
>   String text = new String("استعرض المتحدث باسم قوات «التحالف العربي لدعم".getBytes("UTF-8"), "ISO-8859-1");
>   response.setHeader("content-type", "text/xml");
>   String body = marshall(text); // XStream serialization
>   response.getWriter().write(body);
>
> On the client side:
>
>   System.out.println("Default Locale: " + Locale.getDefault());
>   System.out.println("Default Charset: " + Charset.defaultCharset());
>   System.out.println("file.encoding; " + System.getProperty("file.encoding"));
>
>   HttpResponse response = httpClient.execute(postMethod);
>   HttpEntity entity = response.getEntity();
>   String content = EntityUtils.toString(entity);
>   String result = (String) unmarshall(content); // XStream unmarshall
>   String o = new String(result.getBytes());
>   System.out.println(o);
>
> When I run with the above -D settings the client console shows:
>
>   Default Locale: en_US
>   Default Charset: ISO-8859-1
>   file.encoding; ISO-8859-1
>
>   استعرض المتحدث باسم قوات «التحالف العربي لدعم
>
> Without the -D's I don't see Arabic text and instead see garbage on the
> console.
>
> On Fri, Jul 6, 2018 at 3:00 AM [email protected] <[email protected]> wrote:
>
>> Yes, if I run the AE as a DUCC UIMA-AS service and send it CASes from a
>> UIMA-AS client, it works fine.
>> In fact the environment, i.e. the LANG setting, is the same for the
>> UIMA-AS service and the DUCC job.
>>
>> Environ[3] = LANG=en_IN
>>
>> And if I change to LANG=ar, then by the time the data arrives in the JD the
>> Arabic text is already replaced with ??? (question marks), and the encoding
>> of the data arriving in the JD or CR shows as ASCII.
>> I don't understand why this is happening.
>>
>> Best
>> Rohit
>>
>>
>> On 2018/07/05 13:35:11, Eddie Epstein <[email protected]> wrote:
>> > So if you run the AE as a DUCC UIMA-AS service and send it CASes from some
>> > UIMA-AS client it works OK? The full environment for all processes that
>> > DUCC launches is available via ducc-mon under the Specification or
>> > Registry tab for that job or managed reservation or service. Please see if
>> > the LANG setting for the service is different from the LANG setting for
>> > the job.
>> >
>> > One can also see the LANG setting for a Linux process ID by doing:
>> >
>> > cat /proc/<pid>/environ
>> >
>> > The LANG to be used for a DUCC process can be set by adding "LANG=xxx"
>> > to the --environment argument as needed.
>> >
>> > Thanks,
>> > Eddie
>> >
>> >
>> >
>> > On Thu, Jul 5, 2018 at 6:47 AM, [email protected] <[email protected]> wrote:
>> >
>> > > Hey,
>> > > Yeah, you got it right: the first snippet is in the CR, before the data
>> > > goes into the CAS.
>> > > And the second snippet is in the first annotator or analysis engine (AE)
>> > > of my aggregate descriptor.
>> > > I am pretty sure this is an issue with the CAS used by DUCC, because if I
>> > > use a DUCC service, where we send the CAS and receive the same CAS back
>> > > with added features, I get accurate results.
>> > >
>> > > The only problem comes when submitting a job, where the CAS is generated
>> > > by DUCC.
>> > > This could also be an issue with the environment (language) of DUCC,
>> > > because the default language is English.
>> > >
>> > > Best Regards
>> > > Rohit
>> > >
>> > > On 2018/07/03 13:11:50, Eddie Epstein <[email protected]> wrote:
>> > > > Rohit,
>> > > >
>> > > > Before sending the data into the jcas, if I force-encode it:
>> > > > >
>> > > > > String content2 = null;
>> > > > > content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
>> > > > > jcas.setDocumentText(content2);
>> > > > >
>> > > >
>> > > > Where is this code, in the job CR?
>> > > >
>> > > > >
>> > > > > And when I get to my first annotator, I force-decode it:
>> > > > >
>> > > > > String content = null;
>> > > > > content = new String(jcas.getDocumentText().getBytes("ISO-8859-1"),
>> > > > > "UTF-8");
>> > > > >
>> > > >
>> > > > And is this in the first annotator of the job process, i.e. the CM?
>> > > >
>> > > > Please be as specific as possible.
>> > > >
>> > > > Thanks,
>> > > > Eddie
>> > > >
>> > >
>> >
>> >
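
For reference, the force-encode / force-decode workaround quoted in this thread can be reproduced outside DUCC with plain JDK calls. The sketch below is only an illustration: the class and method names (Latin1TunnelSketch, wrap, unwrap, throughAscii) are hypothetical, and throughAscii is an assumed stand-in for a channel that only passes US-ASCII, not the actual JD code path.

```java
import java.nio.charset.StandardCharsets;

public class Latin1TunnelSketch {

    // Sender side: reinterpret the UTF-8 bytes as ISO-8859-1, so every char
    // in the resulting string is in the range U+0000..U+00FF.
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the original bytes and decode them as UTF-8.
    static String unwrap(String tunneled) {
        return new String(tunneled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Assumed stand-in for an ASCII-only channel: String.getBytes(Charset)
    // substitutes the charset's replacement byte for unmappable characters.
    static String throughAscii(String text) {
        return new String(text.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Arabic word from the thread, written with escapes so the encoding
        // of this source file does not matter.
        String arabic = "\u0642\u0648\u0627\u062A";
        System.out.println("wrap/unwrap intact: " + arabic.equals(unwrap(wrap(arabic))));
        System.out.println("through ASCII:      " + throughAscii(arabic));
    }
}
```

The wrapped string only survives if every hop preserves chars up to U+00FF. An ASCII-only hop would damage it as well, which is consistent with the suggestion above to run both processes with -Dfile.encoding=ISO-8859-1.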
