Forgot to mention that if you have a shared file system, the best practice
is not to serialize your content (SOFA)
from the JD to the service. Instead, in the CR add the path of the file
containing the Subject of Analysis to the CAS, and have
the CM in the pipeline read the content from the shared file system.
-jerr
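The CR-side detail (putting the path into the CAS) depends on the job's type system, but the read side can be sketched with plain JDK I/O. This is a minimal standalone sketch, not DUCC code; the class and helper names are hypothetical. Reading with an explicit charset also keeps the JVM default encoding out of the picture:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SharedFsRead {
    // Hypothetical helper for the CM/AE side: read the SOFA content from the
    // path the CR placed in the CAS, with an explicit charset so the JVM
    // default encoding never touches the text.
    static String readSofa(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("sofa", ".txt");
        Files.write(p, "مرحبا بالعالم".getBytes(StandardCharsets.UTF_8));
        System.out.println(readSofa(p)); // Arabic text survives intact
        Files.delete(p);
    }
}
```

The point of the pattern is that only a small path string crosses the JD-to-service boundary, so the SOFA text is never exposed to whatever encoding the transport layer happens to use.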
Can you try setting -Dfile.encoding=ISO-8859-1 for the service (job)
process, and -Djavax.servlet.request.encoding=ISO-8859-1
-Dfile.encoding=ISO-8859-1 for the JD process?
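One quick way to confirm such flags actually took effect is to log the defaults from inside each process (a standalone sketch; in a real job you would log this from the CR or the first AE). Note that -Dfile.encoding is only read at JVM startup, so setting it later has no effect:

```java
import java.nio.charset.Charset;

public class ShowEncoding {
    public static void main(String[] args) {
        // These two values show what the running process actually uses.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
    }
}
```

If the two processes print different defaults, text handed from one to the other through default-charset Readers/Writers can be corrupted.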
The JD actually uses Jetty webserver to serve service requests over HTTP. I
went as far as extracting Jetty server code from J
Yes, if I run the AE as a DUCC UIMA-AS service and send it CASes from a
UIMA-AS client, it works fine.
In fact the environment, i.e. the LANG argument, is the same for the UIMA-AS
service and the DUCC job.
Environ[3] = LANG=en_IN
And if I change LANG=ar, then while getting the data coming into the JD, the
Arabic text
So if you run the AE as a DUCC UIMA-AS service and send it CASes from some
UIMA-AS client, it works OK? The full environment for every process that
DUCC launches is available via ducc-mon under the Specification or
Registry tab for that job, managed reservation, or service. Please see if
the LANG
Hey,
Yeah, you got it right: the first snippet runs in the CR before the data goes
into the CAS.
And the second snippet is in the first annotator, or analysis engine (AE), of
my Aggregate Descriptor.
I am pretty sure this is an issue with the CAS used by DUCC, because if I use
a service of DUCC in which we are sup
Rohit,
> Before sending the data into the jcas, I force-encode it:
>
> String content2 = null;
> content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
> jcas.setDocumentText(content2);
>
Where is this code, in the job CR?
>
> And when I go into my first annotator, I force-decode it:
>
>
Hey Eddie,
Before sending the data into the jcas, I force-encode it:
String content2 = null;
content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
jcas.setDocumentText(content2);
And when I go into my first annotator, I force-decode it:
String content = null;
content = new String(jcas.g
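The decode snippet above is cut off in the digest, but for what it's worth, re-encoding after the fact is risky rather than a fix: once a character outside ISO-8859-1 is encoded into that charset, Java substitutes a literal '?' for it, and no subsequent decode can recover the original. A small standalone demonstration:

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarks {
    // Encoding text into a charset that cannot represent it replaces every
    // unmappable character with '?'; the loss is permanent.
    static String throughLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughLatin1("مرحبا")); // prints ?????
    }
}
```

This matches the symptom reported below: every Arabic character arriving as a question mark suggests the text passed through a single-byte charset (or charset-sensitive step) somewhere between the CR and the annotator.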
Hi Rohit,
In a DUCC job, the CAS created by the user's CR in the Job Driver is
serialized into cas.xmi format and transported to the Job Process, where it
is deserialized and given to the user's analytics. Likely the problem is in
CAS serialization or deserialization, perhaps due to the active LANG
environment
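This is not the UIMA serializer itself, but the underlying principle can be shown with plain JDK XML: XMI is XML, and XML written and read through byte streams with an explicit UTF-8 declaration round-trips Arabic regardless of LANG. Trouble typically starts when a Reader or Writer using the platform default charset gets in between. A standalone sketch (the element name is made up for the demo):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlRoundTrip {
    static String roundTrip(String text) throws Exception {
        // Build a tiny document holding the text.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("sofa")).setTextContent(text);

        // Serialize to *bytes* with an explicit UTF-8 declaration.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(out));

        // Deserialize from bytes; the parser honors the declared encoding.
        Document parsed = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(out.toByteArray()));
        return parsed.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("مرحبا"));
    }
}
```

If the same round trip were funneled through a default-charset Writer in a LANG that maps to a single-byte locale, the unmappable characters would be lost before the XML ever hit the wire.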
Hey,
I use DUCC for the English language and it works without any problem.
But lately I tried deploying a job for the Arabic language, and all the
Arabic text content is replaced by *'?'* (question mark).
I am extracting data from Accumulo, and after processing I send it to ES6.
When I checked the lo